-
Notifications
You must be signed in to change notification settings - Fork 115
Simplify single group copy_if #2502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments below explain why certain changes are made.
| template <typename _ValueType, bool _Inclusive, typename _Group, typename _InPtr, typename _OutPtr, | ||
| typename _BinaryOperation> | ||
| void | ||
| __scan_work_group(const _Group& __group, _Begin __begin, _End __end, _OutIt __out_it, _BinaryOperation __bin_op, | ||
| unseq_backend::__no_init_value<_ValueType>) | ||
| __scan_work_group(const _Group& __group, _InPtr __begin, _InPtr __end, _OutPtr __out_it, _BinaryOperation __bin_op, | ||
| unseq_backend::__no_init_value<_ValueType> = {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change aligns the template parameters with that of SYCL group scans. It also allows omitting __no_init_value in the call (see the lines 533-534 below).
| template <bool _Inclusive, std::uint16_t _ElemsPerItem, std::uint16_t _WGSize, typename _KernelName> | ||
| struct __parallel_transform_scan_static_single_group_submitter; | ||
|
|
||
| template <bool _Inclusive, ::std::uint16_t _ElemsPerItem, ::std::uint16_t _WGSize, bool _IsFullGroup, | ||
| typename... _ScanKernelName> | ||
| struct __parallel_transform_scan_static_single_group_submitter<_Inclusive, _ElemsPerItem, _WGSize, _IsFullGroup, | ||
| template <bool _Inclusive, std::uint16_t _ElemsPerItem, std::uint16_t _WGSize, typename... _ScanKernelName> | ||
| struct __parallel_transform_scan_static_single_group_submitter<_Inclusive, _ElemsPerItem, _WGSize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_IsFullGroup is no more used by the class.
| typename _KernelName> | ||
| template <typename _Size, std::uint16_t _ElemsPerItem, std::uint16_t _WGSize, typename _KernelName> | ||
| struct __parallel_copy_if_static_single_group_submitter; | ||
|
|
||
| template <typename _Size, ::std::uint16_t _ElemsPerItem, ::std::uint16_t _WGSize, bool _IsFullGroup, | ||
| typename... _ScanKernelName> | ||
| struct __parallel_copy_if_static_single_group_submitter<_Size, _ElemsPerItem, _WGSize, _IsFullGroup, | ||
| template <typename _Size, std::uint16_t _ElemsPerItem, std::uint16_t _WGSize, typename... _ScanKernelName> | ||
| struct __parallel_copy_if_static_single_group_submitter<_Size, _ElemsPerItem, _WGSize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
| typename _Assign> | ||
| template <typename _InRng, typename _OutRng, typename _UnaryOp, typename _Assign> | ||
| __future<sycl::event, __result_and_scratch_storage<_Size>> | ||
| operator()(sycl::queue& __q, _InRng&& __in_rng, _OutRng&& __out_rng, std::size_t __n, _InitType __init, | ||
| _BinaryOperation __bin_op, _UnaryOp __unary_op, _Assign __assign) | ||
| operator()(sycl::queue& __q, _InRng&& __in_rng, _OutRng&& __out_rng, std::size_t __n, _UnaryOp __unary_op, | ||
| _Assign __assign) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initial value type and the binary operation for scans are always defined the same way for copy_if, as scans operate with the mask and indices (and not user types). There is therefore no reason to expose those in the interface of __parallel_copy_if_static_single_group_submitter.
| const ::std::uint16_t __residual = __n % _WGSize; | ||
| const ::std::uint16_t __residual_start = __n - __residual; | ||
| if (__item_id < __residual) | ||
| { | ||
| auto __idx = __residual_start + __item_id; | ||
| if (__lacc[__idx]) | ||
| __assign(static_cast<__tuple_type>(__in_rng[__idx]), | ||
| __out_rng[__lacc[__idx + __elems_per_wg]]); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The for loop right above already deals correctly with the residual, this code is completely unnecessary.
| const bool __is_full_group = __n == __wg_size; | ||
|
|
||
| if (__is_full_group) | ||
| return __parallel_transform_scan_static_single_group_submitter< |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since _IsFullGroup was not used, the runtime dispatch here was completely pointless.
7cab0e5 to
e452faf
Compare
e452faf to
72cc15c
Compare
| // Note that the integers provided in the integer_sequence must be monotonically increasing | ||
| template <typename> | ||
| class __static_monotonic_dispatcher; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to remove __static_monotonic_dispatcher together with its only use in __parallel_copy_if.
While it is a nice utility to avoid the verbose size-based runtime-to-template-instantiation dispatch with multiple if branches or switch statements, I think these explicitly verbose constructs are in a sense better - exactly because of "highlighting" the dispatch. This sort of dispatch is quite problematic due to multiple kernel instantiations, and it should be used consciously only where absolutely necessary. The nice wrapper both "hides" the ugly pattern and also adds an extra layer (if not two) to go through in case of refactoring or adjustments.
The patch removes seemingly redundant (most likely outdated) code in the single-group implementation of
copy_if.