Skip to content

Conversation

@akukanov
Copy link
Contributor

The patch removes seemingly redundant (most likely outdated) code in the single-group implementation of copy_if.

Copy link
Contributor Author

@akukanov akukanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments below explain why certain changes are made.

Comment on lines +357 to +361
template <typename _ValueType, bool _Inclusive, typename _Group, typename _InPtr, typename _OutPtr,
typename _BinaryOperation>
void
__scan_work_group(const _Group& __group, _Begin __begin, _End __end, _OutIt __out_it, _BinaryOperation __bin_op,
unseq_backend::__no_init_value<_ValueType>)
__scan_work_group(const _Group& __group, _InPtr __begin, _InPtr __end, _OutPtr __out_it, _BinaryOperation __bin_op,
unseq_backend::__no_init_value<_ValueType> = {})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change aligns the template parameters with that of SYCL group scans. It also allows omitting __no_init_value in the call (see the lines 533-534 below).

Comment on lines +433 to +437
template <bool _Inclusive, std::uint16_t _ElemsPerItem, std::uint16_t _WGSize, typename _KernelName>
struct __parallel_transform_scan_static_single_group_submitter;

template <bool _Inclusive, ::std::uint16_t _ElemsPerItem, ::std::uint16_t _WGSize, bool _IsFullGroup,
typename... _ScanKernelName>
struct __parallel_transform_scan_static_single_group_submitter<_Inclusive, _ElemsPerItem, _WGSize, _IsFullGroup,
template <bool _Inclusive, std::uint16_t _ElemsPerItem, std::uint16_t _WGSize, typename... _ScanKernelName>
struct __parallel_transform_scan_static_single_group_submitter<_Inclusive, _ElemsPerItem, _WGSize,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_IsFullGroup is no more used by the class.

Comment on lines 489 to 490
typename _KernelName>
template <typename _Size, std::uint16_t _ElemsPerItem, std::uint16_t _WGSize, typename _KernelName>
struct __parallel_copy_if_static_single_group_submitter;

template <typename _Size, ::std::uint16_t _ElemsPerItem, ::std::uint16_t _WGSize, bool _IsFullGroup,
typename... _ScanKernelName>
struct __parallel_copy_if_static_single_group_submitter<_Size, _ElemsPerItem, _WGSize, _IsFullGroup,
template <typename _Size, std::uint16_t _ElemsPerItem, std::uint16_t _WGSize, typename... _ScanKernelName>
struct __parallel_copy_if_static_single_group_submitter<_Size, _ElemsPerItem, _WGSize,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Comment on lines 498 to 496
typename _Assign>
template <typename _InRng, typename _OutRng, typename _UnaryOp, typename _Assign>
__future<sycl::event, __result_and_scratch_storage<_Size>>
operator()(sycl::queue& __q, _InRng&& __in_rng, _OutRng&& __out_rng, std::size_t __n, _InitType __init,
_BinaryOperation __bin_op, _UnaryOp __unary_op, _Assign __assign)
operator()(sycl::queue& __q, _InRng&& __in_rng, _OutRng&& __out_rng, std::size_t __n, _UnaryOp __unary_op,
_Assign __assign)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial value type and the binary operation for scans are always defined the same way for copy_if, as scans operate with the mask and indices (and not user types). There is therefore no reason to expose those in the interface of __parallel_copy_if_static_single_group_submitter.

Comment on lines 548 to 557
const ::std::uint16_t __residual = __n % _WGSize;
const ::std::uint16_t __residual_start = __n - __residual;
if (__item_id < __residual)
{
auto __idx = __residual_start + __item_id;
if (__lacc[__idx])
__assign(static_cast<__tuple_type>(__in_rng[__idx]),
__out_rng[__lacc[__idx + __elems_per_wg]]);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The for loop right above already deals correctly with the residual, this code is completely unnecessary.

Comment on lines -589 to -592
const bool __is_full_group = __n == __wg_size;

if (__is_full_group)
return __parallel_transform_scan_static_single_group_submitter<
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since _IsFullGroup was not used, the runtime dispatch here was completely pointless.

@akukanov akukanov force-pushed the dev/simplify-single-group-copy_if-akukanov branch 3 times, most recently from 7cab0e5 to e452faf Compare October 31, 2025 00:13
@akukanov akukanov force-pushed the dev/simplify-single-group-copy_if-akukanov branch from e452faf to 72cc15c Compare November 3, 2025 13:25
Comment on lines -856 to -858
// Note that the integers provided in the integer_sequence must be monotonically increasing
template <typename>
class __static_monotonic_dispatcher;
Copy link
Contributor Author

@akukanov akukanov Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to remove __static_monotonic_dispatcher together with its only use in __parallel_copy_if.

While it is a nice utility to avoid the verbose size-based runtime-to-template-instantiation dispatch with multiple if branches or switch statements, I think these explicitly verbose constructs are in a sense better - exactly because of "highlighting" the dispatch. This sort of dispatch is quite problematic due to multiple kernel instantiations, and it should be used consciously only where absolutely necessary. The nice wrapper both "hides" the ugly pattern and also adds an extra layer (if not two) to go through in case of refactoring or adjustments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant