Functor that wraps a lambda for use in a reduction. It is needed when a variadic lambda has to be called on an NVIDIA GPU: CUDA appears unable to expand the variadic arguments, whereas calling the lambda directly works for OpenMP and serial builds. To work around this limitation, KokkosNDLambdaWrapperReduction packs the indices into an array. It uses compile-time index sequences to extract the first dim arguments as indices and the last argument as the reduction value, which avoids the per-thread overhead of recursive tuple_first/tuple_cat calls on the GPU.
#include <kokkos.hh>
| template<size_t... Is, typename... Args> KOKKOS_FORCEINLINE_FUNCTION void | impl (device::integer_sequence< size_t, Is... >, Args &&...args) const |
template<int dim, typename FUN>
struct DiFfRG::KokkosNDLambdaWrapperReduction< dim, FUN >
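- Template Parameters
| dim | Number of index arguments; operator() takes dim + 1 arguments, the last being the reduction value |
| FUN | Type of the lambda to which the packed indices and the reduction value are forwarded |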
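The index-sequence technique described for this struct can be illustrated with a minimal sketch in plain C++17. It is not the DiFfRG source: it assumes the wrapped lambda takes a std::array of indices plus a reference to the reduction value, it swaps device::integer_sequence and the Kokkos macros for their standard-library counterparts, and the names sketch::NDLambdaWrapperReduction and sum_of_index_products are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

namespace sketch {

// Hypothetical stand-in for the documented wrapper; no Kokkos or CUDA involved.
template <int dim, typename FUN>
struct NDLambdaWrapperReduction {
  FUN fun;

  // Callers pass dim indices followed by one reduction value.
  template <typename... Args>
  void operator()(Args &&...args) const {
    static_assert(sizeof...(Args) == static_cast<std::size_t>(dim) + 1,
                  "expected dim indices plus one reduction value");
    impl(std::make_index_sequence<dim>{}, std::forward<Args>(args)...);
  }

  // Is... = 0, ..., dim - 1 selects the index arguments;
  // position dim holds the reduction value.
  template <std::size_t... Is, typename... Args>
  void impl(std::index_sequence<Is...>, Args &&...args) const {
    auto refs = std::forward_as_tuple(std::forward<Args>(args)...);
    const std::array<std::size_t, dim> idx{
        static_cast<std::size_t>(std::get<Is>(refs))...};
    fun(idx, std::get<dim>(refs));
  }
};

// Serial stand-in for a 2D reduction: sums i * j over a 3 x 3 grid.
inline double sum_of_index_products() {
  double sum = 0.0;
  auto body = [](const std::array<std::size_t, 2> &idx, double &update) {
    assert(idx[0] < 3 && idx[1] < 3);
    update += static_cast<double>(idx[0] * idx[1]);
  };
  NDLambdaWrapperReduction<2, decltype(body)> wrapped{body};
  for (int i = 0; i < 3; ++i)
    for (int j = 0; j < 3; ++j)
      wrapped(i, j, sum);  // two indices, then the reduction value
  return sum;
}

}  // namespace sketch
```

The pack expansion over Is... builds the index array in a single expression, so no recursive tuple splitting is needed to separate the indices from the trailing reduction value.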
◆ KokkosNDLambdaWrapperReduction()
template<int dim, typename FUN >
◆ impl()
template<int dim, typename FUN >
template<size_t... Is, typename... Args>
◆ operator()()
template<int dim, typename FUN >
template<typename... Args>
requires (sizeof...(Args) == dim + 1)
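The requires clause above restricts operator() to exactly dim + 1 arguments. The effect of such an arity constraint can be sketched as follows. The sketch substitutes std::enable_if_t for the C++20 requires clause so that it also builds as C++17, and the names arity_sketch::ArityChecked and accepted_arity are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace arity_sketch {

// Hypothetical functor whose call operator only participates in overload
// resolution when it receives exactly dim + 1 arguments.
template <int dim>
struct ArityChecked {
  template <typename... Args>
  auto operator()(Args &&...) const
      -> std::enable_if_t<sizeof...(Args) == static_cast<std::size_t>(dim) + 1,
                          std::size_t> {
    return sizeof...(Args);
  }
};

// Two indices plus a reduction value: accepted for dim = 2.
static_assert(std::is_invocable_v<ArityChecked<2>, int, int, double &>,
              "dim + 1 arguments should be accepted");
// Missing reduction value: rejected at compile time.
static_assert(!std::is_invocable_v<ArityChecked<2>, int, int>,
              "dim arguments should be rejected");
// One index too many: rejected at compile time.
static_assert(!std::is_invocable_v<ArityChecked<2>, int, int, int, double &>,
              "dim + 2 arguments should be rejected");

// Runtime counterpart of the first check above.
inline std::size_t accepted_arity() {
  double value = 0.0;
  const std::size_t n = ArityChecked<2>{}(0, 1, value);
  assert(n == 3);  // dim + 1 arguments were received
  return n;
}

}  // namespace arity_sketch
```

With a constraint of this kind, a call with the wrong number of arguments fails at overload resolution and is reported at the call site, not deep inside the index-sequence expansion.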
◆ fun
template<int dim, typename FUN >
The documentation for this struct was generated from the following file:
- /home/runner/work/DiFfRG_current/DiFfRG_current/DiFfRG/include/DiFfRG/common/kokkos.hh