Changes
#17 (Aug 28, 2024, 8:09:30 AM)
- Fix bogus icpc -Werror — ndellin / githubweb
- Fix macro guards — Daniel Arndt / githubweb
#16 (Aug 27, 2024, 10:37:19 PM)
- Use raw pointers for std::sort if possible — Daniel Arndt / githubweb
- core(graph): allow submission onto an arbitrary exec space instance — romin.tomasetti / githubweb
- Fix typo in macro guard — noreply / githubweb
#15 (Aug 27, 2024, 3:20:37 PM)
- Make sure the graph death test has the `_DeathTest` suffix (#7262) — noreply / githubweb
- Allow extracting host and device views from DualView with const value type (#7242) — noreply / githubweb
- Split test — Daniel Arndt / githubweb
#13 (Aug 25, 2024, 11:37:05 AM)
- Fix DynRankView::operator[](index_type) constraint — Daniel Arndt / githubweb
- Move and rename Kokkos_View.hpp — crtrott / githubweb
- Add new Kokkos_View.hpp to include Kokkos_ViewLegacy.hpp — crtrott / githubweb
- Fix test — Daniel Arndt / githubweb
#12 (Aug 23, 2024, 4:44:38 PM)
- SYCL: Address oneAPI 2025.0.0 deprecations — Daniel Arndt / githubweb
- Add workflow to create releases with SLSA provenance generation — Damien L-G / githubweb
- Drop (unused) .clang-format-ignore file — Damien L-G / githubweb
- Remove (unused) code coverage configuration file — Damien L-G / githubweb
- Remove master_history.txt — Damien L-G / githubweb
- Prefer dot-file-style build configuration file with AppVeyor — Damien L-G / githubweb
- Remove obscure test scripts — Damien L-G / githubweb
- Update CONTRIBUTING and remove outdated comment to reflect our using clang-foramt-16 [ci ckip] — Damien L-G / githubweb
- Drop outdated file describing how to snapshot Kokkos into Trilinos [ci skip] — Damien L-G / githubweb
- Ignore codebase formatting commits in the blame view (#7204) — noreply / githubweb
- Update error message clang-format version not 16 (#7208) — noreply / githubweb
- [ci skip] Do not add CI and other tools configuration files to archive files — Damien L-G / githubweb
- GitHub CI: Use hashes for versions — Daniel Arndt / githubweb
- OpenMPTarget: FunctorAdapter to centralize tag evaluation. (#7200) — noreply / githubweb
- Pin dependencies in Dockerfiles (#7210) — noreply / githubweb
- OpenMPTarget: FunctorAdapter bug in AMD GPU code path. — rgayatri / githubweb
- Bump actions/checkout from 4.1.6 to 4.1.7 — noreply / githubweb
- Bump actions/upload-artifact from 4.3.5 to 4.3.6 — noreply / githubweb
- Bump github/codeql-action from 3.25.6 to 3.26.0 — noreply / githubweb
- Bump DoozyX/clang-format-lint-action from 0.17 to 0.18 — noreply / githubweb
- Simplify constraints for trivial types — Damien L-G / githubweb
- Drop pointless constrait on Impl::ViewValueFunctor<ScalarType>::construct_shared_allocation() — Damien L-G / githubweb
- Specialize Impl::ViewValueFunctor for trivial types — Damien L-G / githubweb
- Improve fence labels in View init and delete — Damien L-G / githubweb
- Add functions for retrieving the default tile sizes in MDRangePolicy (#6839) — noreply / githubweb
- Fix ub in bin sort when sorting within bins on host (#7223) — noreply / githubweb
- Hands off reserved identifiers (#7224) — noreply / githubweb
- Hide IMPL_CUDA_MALLOC_ASYNC option when CUDA is not enabled — Damien L-G / githubweb
- Mark HIP_MULTIPLE_KERNEL_INSTANTIATIONS and IMPL_HIP_UNIFIED_MEMORY options as backend specific — Damien L-G / githubweb
- Improve further view initialization/destruction (#7225) — noreply / githubweb
- Cuda: Check if device support cudaMallocAsync (#7217) — noreply / githubweb
- Cherry-pick 4.4.00 changelog — Damien L-G / githubweb
- Bump github/codeql-action from 3.26.0 to 3.26.2 — noreply / githubweb
- Improve GH action to produce release artifacts (#7231) — noreply / githubweb
- Fix clang-tidy header guard to ignore tpls — Daniel Arndt / githubweb
- Fix misspelled cmake variable, for some systems leading to not compiling all tests — crtrott / githubweb
- Fix overlooked naming test in #7222 — crtrott / githubweb
- Introduce new `SequentialHostInit` view allocation property (#7229) — noreply / githubweb
- Adding occupancy tuning for CUDA architectures (#6788) — noreply / githubweb
- Fixup [Experimental::]SYCL — Damien L-G / githubweb
- Use macros to protect use of SharedSpace — crtrott / githubweb
- core(graph): promote `instantiate` to public API — romin.tomasetti / githubweb
- Enforce modernize-type-traits (#7227) — noreply / githubweb
- typo: kokkkos -> kokkos — romin.tomasetti / githubweb
- core(view): aligning `HostMirror` and `host_mirror_type` — romin.tomasetti / githubweb
- Add support for CUDA unified memory architectures i.e. Grace Hopper (#6823) — noreply / githubweb
- Using CUDA limits to set extents for blocks,grids (#7235) — noreply / githubweb
- Fix some more clang-tidy complains — Daniel Arndt / githubweb
- Update nightly CI from ROCm 6.1 to ROCm 6.2 — Bruno Turcksin / githubweb
- Don't use modulo — Daniel Arndt / githubweb
#12 (Aug 23, 2024, 4:44:38 PM)
- SYCL: Address oneAPI 2025.0.0 deprecations — Daniel Arndt / githubweb
- Add workflow to create releases with SLSA provenance generation — Damien L-G / githubweb
- Drop (unused) .clang-format-ignore file — Damien L-G / githubweb
- Remove (unused) code coverage configuration file — Damien L-G / githubweb
- Remove master_history.txt — Damien L-G / githubweb
- Prefer dot-file-style build configuration file with AppVeyor — Damien L-G / githubweb
- Remove obscure test scripts — Damien L-G / githubweb
- Update CONTRIBUTING and remove outdated comment to reflect our using clang-foramt-16 [ci ckip] — Damien L-G / githubweb
- Drop outdated file describing how to snapshot Kokkos into Trilinos [ci skip] — Damien L-G / githubweb
- Ignore codebase formatting commits in the blame view (#7204) — noreply / githubweb
- Update error message clang-format version not 16 (#7208) — noreply / githubweb
- [ci skip] Do not add CI and other tools configuration files to archive files — Damien L-G / githubweb
- GitHub CI: Use hashes for versions — Daniel Arndt / githubweb
- OpenMPTarget: FunctorAdapter to centralize tag evaluation. (#7200) — noreply / githubweb
- Pin dependencies in Dockerfiles (#7210) — noreply / githubweb
- OpenMPTarget: FunctorAdapter bug in AMD GPU code path. — rgayatri / githubweb
- Bump actions/checkout from 4.1.6 to 4.1.7 — noreply / githubweb
- Bump actions/upload-artifact from 4.3.5 to 4.3.6 — noreply / githubweb
- Bump github/codeql-action from 3.25.6 to 3.26.0 — noreply / githubweb
- Bump DoozyX/clang-format-lint-action from 0.17 to 0.18 — noreply / githubweb
- Simplify constraints for trivial types — Damien L-G / githubweb
- Drop pointless constrait on Impl::ViewValueFunctor<ScalarType>::construct_shared_allocation() — Damien L-G / githubweb
- Specialize Impl::ViewValueFunctor for trivial types — Damien L-G / githubweb
- Improve fence labels in View init and delete — Damien L-G / githubweb
- Add functions for retrieving the default tile sizes in MDRangePolicy (#6839) — noreply / githubweb
- Fix ub in bin sort when sorting within bins on host (#7223) — noreply / githubweb
- Hands off reserved identifiers (#7224) — noreply / githubweb
- Hide IMPL_CUDA_MALLOC_ASYNC option when CUDA is not enabled — Damien L-G / githubweb
- Mark HIP_MULTIPLE_KERNEL_INSTANTIATIONS and IMPL_HIP_UNIFIED_MEMORY options as backend specific — Damien L-G / githubweb
- Improve further view initialization/destruction (#7225) — noreply / githubweb
- Cuda: Check if device support cudaMallocAsync (#7217) — noreply / githubweb
- Cherry-pick 4.4.00 changelog — Damien L-G / githubweb
- Bump github/codeql-action from 3.26.0 to 3.26.2 — noreply / githubweb
- Improve GH action to produce release artifacts (#7231) — noreply / githubweb
- Fix clang-tidy header guard to ignore tpls — Daniel Arndt / githubweb
- Fix misspelled cmake variable, for some systems leading to not compiling all tests — crtrott / githubweb
- Fix overlooked naming test in #7222 — crtrott / githubweb
- Introduce new `SequentialHostInit` view allocation property (#7229) — noreply / githubweb
- Adding occupancy tuning for CUDA architectures (#6788) — noreply / githubweb
- Fixup [Experimental::]SYCL — Damien L-G / githubweb
- Use macros to protect use of SharedSpace — crtrott / githubweb
- core(graph): promote `instantiate` to public API — romin.tomasetti / githubweb
- Enforce modernize-type-traits (#7227) — noreply / githubweb
- typo: kokkkos -> kokkos — romin.tomasetti / githubweb
- core(view): aligning `HostMirror` and `host_mirror_type` — romin.tomasetti / githubweb
- Add support for CUDA unified memory architectures i.e. Grace Hopper (#6823) — noreply / githubweb
- Using CUDA limits to set extents for blocks,grids (#7235) — noreply / githubweb
- Fix some more clang-tidy complains — Daniel Arndt / githubweb
- Update nightly CI from ROCm 6.1 to ROCm 6.2 — Bruno Turcksin / githubweb
- Don't use modulo — Daniel Arndt / githubweb
#12 (Aug 23, 2024, 4:44:38 PM)
- SYCL: Address oneAPI 2025.0.0 deprecations — Daniel Arndt / githubweb
- Add workflow to create releases with SLSA provenance generation — Damien L-G / githubweb
- Drop (unused) .clang-format-ignore file — Damien L-G / githubweb
- Remove (unused) code coverage configuration file — Damien L-G / githubweb
- Remove master_history.txt — Damien L-G / githubweb
- Prefer dot-file-style build configuration file with AppVeyor — Damien L-G / githubweb
- Remove obscure test scripts — Damien L-G / githubweb
- Update CONTRIBUTING and remove outdated comment to reflect our using clang-foramt-16 [ci ckip] — Damien L-G / githubweb
- Drop outdated file describing how to snapshot Kokkos into Trilinos [ci skip] — Damien L-G / githubweb
- Ignore codebase formatting commits in the blame view (#7204) — noreply / githubweb
- Update error message clang-format version not 16 (#7208) — noreply / githubweb
- [ci skip] Do not add CI and other tools configuration files to archive files — Damien L-G / githubweb
- GitHub CI: Use hashes for versions — Daniel Arndt / githubweb
- OpenMPTarget: FunctorAdapter to centralize tag evaluation. (#7200) — noreply / githubweb
- Pin dependencies in Dockerfiles (#7210) — noreply / githubweb
- OpenMPTarget: FunctorAdapter bug in AMD GPU code path. — rgayatri / githubweb
- Bump actions/checkout from 4.1.6 to 4.1.7 — noreply / githubweb
- Bump actions/upload-artifact from 4.3.5 to 4.3.6 — noreply / githubweb
- Bump github/codeql-action from 3.25.6 to 3.26.0 — noreply / githubweb
- Bump DoozyX/clang-format-lint-action from 0.17 to 0.18 — noreply / githubweb
- Simplify constraints for trivial types — Damien L-G / githubweb
- Drop pointless constrait on Impl::ViewValueFunctor<ScalarType>::construct_shared_allocation() — Damien L-G / githubweb
- Specialize Impl::ViewValueFunctor for trivial types — Damien L-G / githubweb
- Improve fence labels in View init and delete — Damien L-G / githubweb
- Add functions for retrieving the default tile sizes in MDRangePolicy (#6839) — noreply / githubweb
- Fix ub in bin sort when sorting within bins on host (#7223) — noreply / githubweb
- Hands off reserved identifiers (#7224) — noreply / githubweb
- Hide IMPL_CUDA_MALLOC_ASYNC option when CUDA is not enabled — Damien L-G / githubweb
- Mark HIP_MULTIPLE_KERNEL_INSTANTIATIONS and IMPL_HIP_UNIFIED_MEMORY options as backend specific — Damien L-G / githubweb
- Improve further view initialization/destruction (#7225) — noreply / githubweb
- Cuda: Check if device support cudaMallocAsync (#7217) — noreply / githubweb
- Cherry-pick 4.4.00 changelog — Damien L-G / githubweb
- Bump github/codeql-action from 3.26.0 to 3.26.2 — noreply / githubweb
- Improve GH action to produce release artifacts (#7231) — noreply / githubweb
- Fix clang-tidy header guard to ignore tpls — Daniel Arndt / githubweb
- Fix misspelled cmake variable, for some systems leading to not compiling all tests — crtrott / githubweb
- Fix overlooked naming test in #7222 — crtrott / githubweb
- Introduce new `SequentialHostInit` view allocation property (#7229) — noreply / githubweb
- Adding occupancy tuning for CUDA architectures (#6788) — noreply / githubweb
- Fixup [Experimental::]SYCL — Damien L-G / githubweb
- Use macros to protect use of SharedSpace — crtrott / githubweb
- core(graph): promote `instantiate` to public API — romin.tomasetti / githubweb
- Enforce modernize-type-traits (#7227) — noreply / githubweb
- typo: kokkkos -> kokkos — romin.tomasetti / githubweb
- core(view): aligning `HostMirror` and `host_mirror_type` — romin.tomasetti / githubweb
- Add support for CUDA unified memory architectures i.e. Grace Hopper (#6823) — noreply / githubweb
- Using CUDA limits to set extents for blocks,grids (#7235) — noreply / githubweb
- Fix some more clang-tidy complains — Daniel Arndt / githubweb
- Update nightly CI from ROCm 6.1 to ROCm 6.2 — Bruno Turcksin / githubweb
- Don't use modulo — Daniel Arndt / githubweb
#12 (Aug 23, 2024, 4:44:38 PM)
- SYCL: Address oneAPI 2025.0.0 deprecations — Daniel Arndt / githubweb
- Add workflow to create releases with SLSA provenance generation — Damien L-G / githubweb
- Drop (unused) .clang-format-ignore file — Damien L-G / githubweb
- Remove (unused) code coverage configuration file — Damien L-G / githubweb
- Remove master_history.txt — Damien L-G / githubweb
- Prefer dot-file-style build configuration file with AppVeyor — Damien L-G / githubweb
- Remove obscure test scripts — Damien L-G / githubweb
- Update CONTRIBUTING and remove outdated comment to reflect our using clang-foramt-16 [ci ckip] — Damien L-G / githubweb
- Drop outdated file describing how to snapshot Kokkos into Trilinos [ci skip] — Damien L-G / githubweb
- Ignore codebase formatting commits in the blame view (#7204) — noreply / githubweb
- Update error message clang-format version not 16 (#7208) — noreply / githubweb
- [ci skip] Do not add CI and other tools configuration files to archive files — Damien L-G / githubweb
- GitHub CI: Use hashes for versions — Daniel Arndt / githubweb
- OpenMPTarget: FunctorAdapter to centralize tag evaluation. (#7200) — noreply / githubweb
- Pin dependencies in Dockerfiles (#7210) — noreply / githubweb
- OpenMPTarget: FunctorAdapter bug in AMD GPU code path. — rgayatri / githubweb
- Bump actions/checkout from 4.1.6 to 4.1.7 — noreply / githubweb
- Bump actions/upload-artifact from 4.3.5 to 4.3.6 — noreply / githubweb
- Bump github/codeql-action from 3.25.6 to 3.26.0 — noreply / githubweb
- Bump DoozyX/clang-format-lint-action from 0.17 to 0.18 — noreply / githubweb
- Simplify constraints for trivial types — Damien L-G / githubweb
- Drop pointless constrait on Impl::ViewValueFunctor<ScalarType>::construct_shared_allocation() — Damien L-G / githubweb
- Specialize Impl::ViewValueFunctor for trivial types — Damien L-G / githubweb
- Improve fence labels in View init and delete — Damien L-G / githubweb
- Add functions for retrieving the default tile sizes in MDRangePolicy (#6839) — noreply / githubweb
- Fix ub in bin sort when sorting within bins on host (#7223) — noreply / githubweb
- Hands off reserved identifiers (#7224) — noreply / githubweb
- Hide IMPL_CUDA_MALLOC_ASYNC option when CUDA is not enabled — Damien L-G / githubweb
- Mark HIP_MULTIPLE_KERNEL_INSTANTIATIONS and IMPL_HIP_UNIFIED_MEMORY options as backend specific — Damien L-G / githubweb
- Improve further view initialization/destruction (#7225) — noreply / githubweb
- Cuda: Check if device support cudaMallocAsync (#7217) — noreply / githubweb
- Cherry-pick 4.4.00 changelog — Damien L-G / githubweb
- Bump github/codeql-action from 3.26.0 to 3.26.2 — noreply / githubweb
- Improve GH action to produce release artifacts (#7231) — noreply / githubweb
- Fix clang-tidy header guard to ignore tpls — Daniel Arndt / githubweb
- Fix misspelled cmake variable, for some systems leading to not compiling all tests — crtrott / githubweb
- Fix overlooked naming test in #7222 — crtrott / githubweb
- Introduce new `SequentialHostInit` view allocation property (#7229) — noreply / githubweb
- Adding occupancy tuning for CUDA architectures (#6788) — noreply / githubweb
- Fixup [Experimental::]SYCL — Damien L-G / githubweb
- Use macros to protect use of SharedSpace — crtrott / githubweb
- core(graph): promote `instantiate` to public API — romin.tomasetti / githubweb
- Enforce modernize-type-traits (#7227) — noreply / githubweb
- typo: kokkkos -> kokkos — romin.tomasetti / githubweb
- core(view): aligning `HostMirror` and `host_mirror_type` — romin.tomasetti / githubweb
- Add support for CUDA unified memory architectures i.e. Grace Hopper (#6823) — noreply / githubweb
- Using CUDA limits to set extents for blocks,grids (#7235) — noreply / githubweb
- Fix some more clang-tidy complains — Daniel Arndt / githubweb
- Update nightly CI from ROCm 6.1 to ROCm 6.2 — Bruno Turcksin / githubweb
- Don't use modulo — Daniel Arndt / githubweb
#12 (Aug 23, 2024, 4:44:38 PM)
- SYCL: Address oneAPI 2025.0.0 deprecations — Daniel Arndt / githubweb
- Add workflow to create releases with SLSA provenance generation — Damien L-G / githubweb
- Drop (unused) .clang-format-ignore file — Damien L-G / githubweb
- Remove (unused) code coverage configuration file — Damien L-G / githubweb
- Remove master_history.txt — Damien L-G / githubweb
- Prefer dot-file-style build configuration file with AppVeyor — Damien L-G / githubweb
- Remove obscure test scripts — Damien L-G / githubweb
- Update CONTRIBUTING and remove outdated comment to reflect our using clang-foramt-16 [ci ckip] — Damien L-G / githubweb
- Drop outdated file describing how to snapshot Kokkos into Trilinos [ci skip] — Damien L-G / githubweb
- Ignore codebase formatting commits in the blame view (#7204) — noreply / githubweb
- Update error message clang-format version not 16 (#7208) — noreply / githubweb
- [ci skip] Do not add CI and other tools configuration files to archive files — Damien L-G / githubweb
- GitHub CI: Use hashes for versions — Daniel Arndt / githubweb
- OpenMPTarget: FunctorAdapter to centralize tag evaluation. (#7200) — noreply / githubweb
- Pin dependencies in Dockerfiles (#7210) — noreply / githubweb
- OpenMPTarget: FunctorAdapter bug in AMD GPU code path. — rgayatri / githubweb
- Bump actions/checkout from 4.1.6 to 4.1.7 — noreply / githubweb
- Bump actions/upload-artifact from 4.3.5 to 4.3.6 — noreply / githubweb
- Bump github/codeql-action from 3.25.6 to 3.26.0 — noreply / githubweb
- Bump DoozyX/clang-format-lint-action from 0.17 to 0.18 — noreply / githubweb
- Simplify constraints for trivial types — Damien L-G / githubweb
- Drop pointless constrait on Impl::ViewValueFunctor<ScalarType>::construct_shared_allocation() — Damien L-G / githubweb
- Specialize Impl::ViewValueFunctor for trivial types — Damien L-G / githubweb
- Improve fence labels in View init and delete — Damien L-G / githubweb
- Add functions for retrieving the default tile sizes in MDRangePolicy (#6839) — noreply / githubweb
- Fix ub in bin sort when sorting within bins on host (#7223) — noreply / githubweb
- Hands off reserved identifiers (#7224) — noreply / githubweb
- Hide IMPL_CUDA_MALLOC_ASYNC option when CUDA is not enabled — Damien L-G / githubweb
- Mark HIP_MULTIPLE_KERNEL_INSTANTIATIONS and IMPL_HIP_UNIFIED_MEMORY options as backend specific — Damien L-G / githubweb
- Improve further view initialization/destruction (#7225) — noreply / githubweb
- Cuda: Check if device support cudaMallocAsync (#7217) — noreply / githubweb
- Cherry-pick 4.4.00 changelog — Damien L-G / githubweb
- Bump github/codeql-action from 3.26.0 to 3.26.2 — noreply / githubweb
- Improve GH action to produce release artifacts (#7231) — noreply / githubweb
- Fix clang-tidy header guard to ignore tpls — Daniel Arndt / githubweb
- Fix misspelled cmake variable, for some systems leading to not compiling all tests — crtrott / githubweb
- Fix overlooked naming test in #7222 — crtrott / githubweb
- Introduce new `SequentialHostInit` view allocation property (#7229) — noreply / githubweb
- Adding occupancy tuning for CUDA architectures (#6788) — noreply / githubweb
- Fixup [Experimental::]SYCL — Damien L-G / githubweb
- Use macros to protect use of SharedSpace — crtrott / githubweb
- core(graph): promote `instantiate` to public API — romin.tomasetti / githubweb
- Enforce modernize-type-traits (#7227) — noreply / githubweb
- typo: kokkkos -> kokkos — romin.tomasetti / githubweb
- core(view): aligning `HostMirror` and `host_mirror_type` — romin.tomasetti / githubweb
- Add support for CUDA unified memory architectures i.e. Grace Hopper (#6823) — noreply / githubweb
- Using CUDA limits to set extents for blocks,grids (#7235) — noreply / githubweb
- Fix some more clang-tidy complains — Daniel Arndt / githubweb
- Update nightly CI from ROCm 6.1 to ROCm 6.2 — Bruno Turcksin / githubweb
- Don't use modulo — Daniel Arndt / githubweb
#12 (Aug 23, 2024, 4:44:38 PM)
- SYCL: Address oneAPI 2025.0.0 deprecations — Daniel Arndt / githubweb
- Add workflow to create releases with SLSA provenance generation — Damien L-G / githubweb
- Drop (unused) .clang-format-ignore file — Damien L-G / githubweb
- Remove (unused) code coverage configuration file — Damien L-G / githubweb
- Remove master_history.txt — Damien L-G / githubweb
- Prefer dot-file-style build configuration file with AppVeyor — Damien L-G / githubweb
- Remove obscure test scripts — Damien L-G / githubweb
- Update CONTRIBUTING and remove outdated comment to reflect our using clang-foramt-16 [ci ckip] — Damien L-G / githubweb
- Drop outdated file describing how to snapshot Kokkos into Trilinos [ci skip] — Damien L-G / githubweb
- Ignore codebase formatting commits in the blame view (#7204) — noreply / githubweb
- Update error message clang-format version not 16 (#7208) — noreply / githubweb
- [ci skip] Do not add CI and other tools configuration files to archive files — Damien L-G / githubweb
- GitHub CI: Use hashes for versions — Daniel Arndt / githubweb
- OpenMPTarget: FunctorAdapter to centralize tag evaluation. (#7200) — noreply / githubweb
- Pin dependencies in Dockerfiles (#7210) — noreply / githubweb
- OpenMPTarget: FunctorAdapter bug in AMD GPU code path. — rgayatri / githubweb
- Bump actions/checkout from 4.1.6 to 4.1.7 — noreply / githubweb
- Bump actions/upload-artifact from 4.3.5 to 4.3.6 — noreply / githubweb
- Bump github/codeql-action from 3.25.6 to 3.26.0 — noreply / githubweb
- Bump DoozyX/clang-format-lint-action from 0.17 to 0.18 — noreply / githubweb
- Simplify constraints for trivial types — Damien L-G / githubweb
- Drop pointless constrait on Impl::ViewValueFunctor<ScalarType>::construct_shared_allocation() — Damien L-G / githubweb
- Specialize Impl::ViewValueFunctor for trivial types — Damien L-G / githubweb
- Improve fence labels in View init and delete — Damien L-G / githubweb
- Add functions for retrieving the default tile sizes in MDRangePolicy (#6839) — noreply / githubweb
- Fix ub in bin sort when sorting within bins on host (#7223) — noreply / githubweb
- Hands off reserved identifiers (#7224) — noreply / githubweb
- Hide IMPL_CUDA_MALLOC_ASYNC option when CUDA is not enabled — Damien L-G / githubweb
- Mark HIP_MULTIPLE_KERNEL_INSTANTIATIONS and IMPL_HIP_UNIFIED_MEMORY options as backend specific — Damien L-G / githubweb
- Improve further view initialization/destruction (#7225) — noreply / githubweb
- Cuda: Check if device support cudaMallocAsync (#7217) — noreply / githubweb
- Cherry-pick 4.4.00 changelog — Damien L-G / githubweb
- Bump github/codeql-action from 3.26.0 to 3.26.2 — noreply / githubweb
- Improve GH action to produce release artifacts (#7231) — noreply / githubweb
- Fix clang-tidy header guard to ignore tpls — Daniel Arndt / githubweb
- Fix misspelled cmake variable, for some systems leading to not compiling all tests — crtrott / githubweb
- Fix overlooked naming test in #7222 — crtrott / githubweb
- Introduce new `SequentialHostInit` view allocation property (#7229) — noreply / githubweb
- Adding occupancy tuning for CUDA architectures (#6788) — noreply / githubweb
- Fixup [Experimental::]SYCL — Damien L-G / githubweb
- Use macros to protect use of SharedSpace — crtrott / githubweb
- core(graph): promote `instantiate` to public API — romin.tomasetti / githubweb
- Enforce modernize-type-traits (#7227) — noreply / githubweb
- typo: kokkkos -> kokkos — romin.tomasetti / githubweb
- core(view): aligning `HostMirror` and `host_mirror_type` — romin.tomasetti / githubweb
- Add support for CUDA unified memory architectures i.e. Grace Hopper (#6823) — noreply / githubweb
- Using CUDA limits to set extents for blocks,grids (#7235) — noreply / githubweb
- Fix some more clang-tidy complains — Daniel Arndt / githubweb
- Update nightly CI from ROCm 6.1 to ROCm 6.2 — Bruno Turcksin / githubweb
- Don't use modulo — Daniel Arndt / githubweb
#12 (Aug 23, 2024, 4:44:38 PM)
- SYCL: Address oneAPI 2025.0.0 deprecations — Daniel Arndt / githubweb
- Add workflow to create releases with SLSA provenance generation — Damien L-G / githubweb
- Drop (unused) .clang-format-ignore file — Damien L-G / githubweb
- Remove (unused) code coverage configuration file — Damien L-G / githubweb
- Remove master_history.txt — Damien L-G / githubweb
- Prefer dot-file-style build configuration file with AppVeyor — Damien L-G / githubweb
- Remove obscure test scripts — Damien L-G / githubweb
- Update CONTRIBUTING and remove outdated comment to reflect our using clang-foramt-16 [ci ckip] — Damien L-G / githubweb
- Drop outdated file describing how to snapshot Kokkos into Trilinos [ci skip] — Damien L-G / githubweb
- Ignore codebase formatting commits in the blame view (#7204) — noreply / githubweb
- Update error message clang-format version not 16 (#7208) — noreply / githubweb
- [ci skip] Do not add CI and other tools configuration files to archive files — Damien L-G / githubweb
- GitHub CI: Use hashes for versions — Daniel Arndt / githubweb
- OpenMPTarget: FunctorAdapter to centralize tag evaluation. (#7200) — noreply / githubweb
- Pin dependencies in Dockerfiles (#7210) — noreply / githubweb
- OpenMPTarget: FunctorAdapter bug in AMD GPU code path. — rgayatri / githubweb
- Bump actions/checkout from 4.1.6 to 4.1.7 — noreply / githubweb
- Bump actions/upload-artifact from 4.3.5 to 4.3.6 — noreply / githubweb
- Bump github/codeql-action from 3.25.6 to 3.26.0 — noreply / githubweb
- Bump DoozyX/clang-format-lint-action from 0.17 to 0.18 — noreply / githubweb
- Simplify constraints for trivial types — Damien L-G / githubweb
- Drop pointless constrait on Impl::ViewValueFunctor<ScalarType>::construct_shared_allocation() — Damien L-G / githubweb
- Specialize Impl::ViewValueFunctor for trivial types — Damien L-G / githubweb
- Improve fence labels in View init and delete — Damien L-G / githubweb
- Add functions for retrieving the default tile sizes in MDRangePolicy (#6839) — noreply / githubweb
- Fix ub in bin sort when sorting within bins on host (#7223) — noreply / githubweb
- Hands off reserved identifiers (#7224) — noreply / githubweb
- Hide IMPL_CUDA_MALLOC_ASYNC option when CUDA is not enabled — Damien L-G / githubweb
- Mark HIP_MULTIPLE_KERNEL_INSTANTIATIONS and IMPL_HIP_UNIFIED_MEMORY options as backend specific — Damien L-G / githubweb
- Improve further view initialization/destruction (#7225) — noreply / githubweb
- Cuda: Check if device support cudaMallocAsync (#7217) — noreply / githubweb
- Cherry-pick 4.4.00 changelog — Damien L-G / githubweb
- Bump github/codeql-action from 3.26.0 to 3.26.2 — noreply / githubweb
- Improve GH action to produce release artifacts (#7231) — noreply / githubweb
- Fix clang-tidy header guard to ignore tpls — Daniel Arndt / githubweb
- Fix misspelled cmake variable, for some systems leading to not compiling all tests — crtrott / githubweb
- Fix overlooked naming test in #7222 — crtrott / githubweb
- Introduce new `SequentialHostInit` view allocation property (#7229) — noreply / githubweb
- Adding occupancy tuning for CUDA architectures (#6788) — noreply / githubweb
- Fixup [Experimental::]SYCL — Damien L-G / githubweb
- Use macros to protect use of SharedSpace — crtrott / githubweb
- core(graph): promote `instantiate` to public API — romin.tomasetti / githubweb
- Enforce modernize-type-traits (#7227) — noreply / githubweb
- typo: kokkkos -> kokkos — romin.tomasetti / githubweb
- core(view): aligning `HostMirror` and `host_mirror_type` — romin.tomasetti / githubweb
- Add support for CUDA unified memory architectures i.e. Grace Hopper (#6823) — noreply / githubweb
- Using CUDA limits to set extents for blocks,grids (#7235) — noreply / githubweb
- Fix some more clang-tidy complains — Daniel Arndt / githubweb
- Update nightly CI from ROCm 6.1 to ROCm 6.2 — Bruno Turcksin / githubweb
- Don't use modulo — Daniel Arndt / githubweb
#12 (Aug 23, 2024, 4:44:38 PM)
- SYCL: Address oneAPI 2025.0.0 deprecations — Daniel Arndt / githubweb
- Add workflow to create releases with SLSA provenance generation — Damien L-G / githubweb
- Drop (unused) .clang-format-ignore file — Damien L-G / githubweb
- Remove (unused) code coverage configuration file — Damien L-G / githubweb
- Remove master_history.txt — Damien L-G / githubweb
- Prefer dot-file-style build configuration file with AppVeyor — Damien L-G / githubweb
- Remove obscure test scripts — Damien L-G / githubweb
- Update CONTRIBUTING and remove outdated comment to reflect our using clang-foramt-16 [ci ckip] — Damien L-G / githubweb
- Drop outdated file describing how to snapshot Kokkos into Trilinos [ci skip] — Damien L-G / githubweb
- Ignore codebase formatting commits in the blame view (#7204) — noreply / githubweb
- Update error message clang-format version not 16 (#7208) — noreply / githubweb
- [ci skip] Do not add CI and other tools configuration files to archive files — Damien L-G / githubweb
- GitHub CI: Use hashes for versions — Daniel Arndt / githubweb
- OpenMPTarget: FunctorAdapter to centralize tag evaluation. (#7200) — noreply / githubweb
- Pin dependencies in Dockerfiles (#7210) — noreply / githubweb
- OpenMPTarget: FunctorAdapter bug in AMD GPU code path. — rgayatri / githubweb
- Bump actions/checkout from 4.1.6 to 4.1.7 — noreply / githubweb
- Bump actions/upload-artifact from 4.3.5 to 4.3.6 — noreply / githubweb
- Bump github/codeql-action from 3.25.6 to 3.26.0 — noreply / githubweb
- Bump DoozyX/clang-format-lint-action from 0.17 to 0.18 — noreply / githubweb
- Simplify constraints for trivial types — Damien L-G / githubweb
- Drop pointless constrait on Impl::ViewValueFunctor<ScalarType>::construct_shared_allocation() — Damien L-G / githubweb
- Specialize Impl::ViewValueFunctor for trivial types — Damien L-G / githubweb
- Improve fence labels in View init and delete — Damien L-G / githubweb
- Add functions for retrieving the default tile sizes in MDRangePolicy (#6839) — noreply / githubweb
- Fix ub in bin sort when sorting within bins on host (#7223) — noreply / githubweb
- Hands off reserved identifiers (#7224) — noreply / githubweb
- Hide IMPL_CUDA_MALLOC_ASYNC option when CUDA is not enabled — Damien L-G / githubweb
- Mark HIP_MULTIPLE_KERNEL_INSTANTIATIONS and IMPL_HIP_UNIFIED_MEMORY options as backend specific — Damien L-G / githubweb
- Improve further view initialization/destruction (#7225) — noreply / githubweb
- Cuda: Check if device support cudaMallocAsync (#7217) — noreply / githubweb
- Cherry-pick 4.4.00 changelog — Damien L-G / githubweb
- Bump github/codeql-action from 3.26.0 to 3.26.2 — noreply / githubweb
- Improve GH action to produce release artifacts (#7231) — noreply / githubweb
- Fix clang-tidy header guard to ignore tpls — Daniel Arndt / githubweb
- Fix misspelled cmake variable, for some systems leading to not compiling all tests — crtrott / githubweb
- Fix overlooked naming test in #7222 — crtrott / githubweb
- Introduce new `SequentialHostInit` view allocation property (#7229) — noreply / githubweb
- Adding occupancy tuning for CUDA architectures (#6788) — noreply / githubweb
- Fixup [Experimental::]SYCL — Damien L-G / githubweb
- Use macros to protect use of SharedSpace — crtrott / githubweb
- core(graph): promote `instantiate` to public API — romin.tomasetti / githubweb
- Enforce modernize-type-traits (#7227) — noreply / githubweb
- typo: kokkkos -> kokkos — romin.tomasetti / githubweb
- core(view): aligning `HostMirror` and `host_mirror_type` — romin.tomasetti / githubweb
- Add support for CUDA unified memory architectures i.e. Grace Hopper (#6823) — noreply / githubweb
- Using CUDA limits to set extents for blocks,grids (#7235) — noreply / githubweb
- Fix some more clang-tidy complains — Daniel Arndt / githubweb
- Update nightly CI from ROCm 6.1 to ROCm 6.2 — Bruno Turcksin / githubweb
- Don't use modulo — Daniel Arndt / githubweb
#11 (Aug 8, 2024, 10:52:26 AM)
- SYCL: Use sycl::shift_group_[left|right] and sycl::select_from_group (#7146) — noreply / githubweb
- Hidden friend operator== for Kokkos::Array (#7148) — noreply / githubweb
- OpenMPTarget: Update docker clang build. (#7147) — noreply / githubweb
- Make struct "ChunkSize" constructor explicit to avoid implicit construction in RangePolicy (#7151) — noreply / githubweb
- Fix Kokkos::Array<T, 0> default initialization for icpc (#7154) — noreply / githubweb
- Make ExecutionSpace constructors explicit (#7156) — noreply / githubweb
- Bump ossf/scorecard-action from 2.3.3 to 2.4.0 — noreply / githubweb
- Fix Kokkos_CoreUnitTest_DeviceAndThreads (#7159) — noreply / githubweb
- Add nvidia Grace Architecture (#7158) — noreply / githubweb
- tutorials: do not mention requiring c++11 — timo.heister / githubweb
- Enable deprecation warnings in the GCC 8.4 build — Damien L-G / githubweb
- Disable deprecated warnings with GCC < 11.1 for Pair<T1, void> — Damien L-G / githubweb
- Prefer ExecutionSpace::name() to a typeid expression in hello world — Damien L-G / githubweb
- OpenMPTarget: Delete unused code. — rgayatri / githubweb
- Implement KOKKOS_ENABLE_IMPL_VIEW_OF_VIEWS_DESTRUCTOR_PRECONDITION_VIOLATION_WORKAROUND (#7168) — noreply / githubweb
- Hide `IMPL_REF_COUNT_BRANCH_UNLIKELY` option (#7175) — noreply / githubweb
- [ci skip] Bump develop to version 4.4.99 — Damien L-G / githubweb
- remove usage of ENABLE_CXX11_DISPATCH_LAMBDA (#7176) — noreply / githubweb
- Add support for AMD Phoenix APUs with Radeon 740M/760M/780M/880M/890M (#7162) — noreply / githubweb
- mention indent/formatting script — timo.heister / githubweb
- add missing tutorials to CMake configuration — timo.heister / githubweb
- add NOLINT statement — timo.heister / githubweb
- Fix bogus warnings for cuda/11.4 with gcc/8.5 (#7181) — noreply / githubweb
- OpenMPTarget: Remove OpenMPTargetExec (#6594) — noreply / githubweb
- Avoid nesting fences into parallel_for when initializing/deleting views — Damien L-G / githubweb
- Fix atomic accessor for pre-volta GPU architectures (#7189) — noreply / githubweb
- Bump actions/upload-artifact from 4.3.4 to 4.3.5 — noreply / githubweb
- OpenMPTarget: DeepCopy in separate file. (#7192) — noreply / githubweb
- Move SYCL out of Experimental (#7171) — noreply / githubweb
- clang-format 16 — Daniel Arndt / githubweb
#10 (Jul 22, 2024, 4:25:43 PM)
- Fix HIP — Daniel Arndt / githubweb
#9 (Jul 22, 2024, 4:23:38 PM)
- Fix HIP — Daniel Arndt / githubweb
#8 (Jul 22, 2024, 2:15:48 PM)
- Fix sign comparison warnings — Daniel Arndt / githubweb
#5 (Jan 29, 2024, 7:27:58 PM)
- Add runtime function to query the number of devices and make device ID consistent with `KOKKOS_VISIBLE_DEVICES` (#6713) — noreply / githubweb
- Clean up test case — Daniel Arndt / githubweb
#3 (Jan 25, 2024, 10:41:36 AM)
- implementation and tests — fnrizzi / githubweb
- implementation and tests — fnrizzi / githubweb
- only compute with relevant entries — tccleve / githubweb
- subset of team level impl of std algorithms — fnrizzi / githubweb
- fix copyright — fnrizzi / githubweb
- guard for openmptarget — fnrizzi / githubweb
- fix for openmptarget — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- SIMD: suppress a uninitialized variable warning (#6294) — noreply / githubweb
- OpenACC CMakechange Clacc (#6250) — noreply / githubweb
- `Kokkos::sort` support custom comparator (#6253) — noreply / githubweb
- Add nightly build using latest gcc and c++23 — Bruno Turcksin / githubweb
- Explicitly capture this in lambda function — Bruno Turcksin / githubweb
- Fix typo in nightly jenkins configuration — Bruno Turcksin / githubweb
- Fix a memory bug in the `free_state` function of random pools. (#6290) — noreply / githubweb
- std_algos: for_each: try condense the impl — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Enforce create_mirror restrictions w.r.t. ViewCtorArgs on all variants — Daniel Arndt / githubweb
- Use KOKKOS_IF_ON_HOST — Daniel Arndt / githubweb
- Suppress warnings — Daniel Arndt / githubweb
- Don't suppress warnings for NVHPC — Daniel Arndt / githubweb
- Improve macro definitions — Daniel Arndt / githubweb
- Enable Serial backend in HPX build — cezary.skrzynski / githubweb
- Modify fences in View API test — cezary.skrzynski / githubweb
- Check for overflow during backend initialization (Cuda, HIP, SYCL) (#6159) — noreply / githubweb
- Improve SYCL reduction performance: MDRangePolicy (#6271) — noreply / githubweb
- bring back previous code as discussed in meeting — fnrizzi / githubweb
- create cudaAPI function wrappers — tccleve / githubweb
- Reorganize #include <cuda_runtime_api.h> — tccleve / githubweb
- Some api function require cuda11.2+ — tccleve / githubweb
- Cuda10 requires "stream=nullptr" as default arg — tccleve / githubweb
- Rework stream inputs — tccleve / githubweb
- Use "if constexpr" for setCudaDevice — tccleve / githubweb
- Remove static in comment — tccleve / githubweb
- Update CI: use CUDA 11.6.2 instead of 11.6.0 (#6314) — noreply / githubweb
- add helper variable templates `are_*_iterators_v` (#6312) — noreply / githubweb
- [HIP] Optimize parallel_reduce (#6229) — noreply / githubweb
- Split Kokkos_SYCL_Parallel_Range.hpp — Daniel Arndt / githubweb
- Split Kokkos_SYCL_Parallel_Reduce.hpp — Daniel Arndt / githubweb
- Move Kokkos_SYCL_Scan.hpp — Daniel Arndt / githubweb
- Split Kokkos_Parallel_Team.hpp — Daniel Arndt / githubweb
- Add comment that we don't need to link against clang_rt with ROCm 5.5 — Bruno Turcksin / githubweb
- Error out when CXX standard is not set when using amdclang or cray clang — Bruno Turcksin / githubweb
- Disable failing tests for ROCm 5.5 and 5.6 — Bruno Turcksin / githubweb
- #5635: Add parallel_scan overload with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Serial/OpenMP: Parallel_scan with return value for TeamThreadRange (#6090) — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Allow using the SYCL execution space on AMD GPUs (#6321) — noreply / githubweb
- Update Clang+Cuda CI from 10.1 to 11.0.3 (#6318) — noreply / githubweb
- SIMD: add shift ops for all int types (#6109) — noreply / githubweb
- SYCL: Use sycl::bit_cast from oneAPI 2024.0.0 on (#6300) — noreply / githubweb
- remove spurious undefs — fnrizzi / githubweb
- Decrease maximum memory available to ccache — Bruno Turcksin / githubweb
- Fix gtest when using C++20 — Bruno Turcksin / githubweb
- Fully qualify Experimental::SYCL in algorithms to avoid conflicting namespaces — Daniel Arndt / githubweb
- address reviews [skip ci] — antoine.meyer54 / githubweb
- formatting — fnrizzi / githubweb
- SIMD: split simd unit tests into separate files (#6278) — noreply / githubweb
- Run NVHPC only on V100 — Bruno Turcksin / githubweb
- Use checked arithmetic builtins for overflow detection (#6313) — noreply / githubweb
- Passing size==0 to DeepCopy/memcpy/omp_target_memcpy (#6273) — noreply / githubweb
- allow sorting via native oneDPL to support views with stride = 1 (#6322) — noreply / githubweb
- Adopt new HIP cmake's way of finding clang-rt — nicurtis / githubweb
- Fix #6334: intel compiler does not like returning inside if constexpr (#6335) — noreply / githubweb
- Fix #6336: capturing `this` implicitly is deprecated in C++20 (#6337) — noreply / githubweb
- fix lambda capture — fnrizzi / githubweb
- remove unnecessary file, fix constraints — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix — fnrizzi / githubweb
- fix lambda capture and constraints — fnrizzi / githubweb
- keep only subset — fnrizzi / githubweb
- revert files — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix syntax — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Disable default oneDPL support in Trilinos — Daniel Arndt / githubweb
- Rename AMD GPU architectures (#6266) — noreply / githubweb
- Fix compiling SIMD library with NEON and gcc-13 — Daniel Arndt / githubweb
- SIMD: add generator constructors (#6347) — noreply / githubweb
- Use /usr/bin/env bash in nvcc_wrapper and kokkos_launch_compiler for portability — mikael.simberg / githubweb
- Fix cudaAPI wrapper errors for CUDA_MALLOC_ASYNC (#6346) — noreply / githubweb
- Use std::aligned_alloc for allocations (#6341) — noreply / githubweb
- Initial implementation of gfx942 (#6358) — noreply / githubweb
- Introduce constructor for multi-GPU support. — Daniel Arndt / githubweb
- Only set KOKKOS_ARCH_AMD_GPU if a AMD GPU architecture is enabled — Daniel Arndt / githubweb
- Extend 'hip_driver_check_page_migration' (#6364) — noreply / githubweb
- Replace volatile m_ready_count loads by desul::atomic_loads in task queue implementation — mikael.simberg / githubweb
- Add Kokkos_Atomic.hpp headers to OpenMP and HPX task headers — mikael.simberg / githubweb
- Add support for HIP Graph — Bruno Turcksin / githubweb
- Replace one of the ROCM 5.2 configuration with ROCM 5.6 — Bruno Turcksin / githubweb
- Use aligned version of operator new instead of aligned_alloc — Daniel Arndt / githubweb
- HPX: Don't interfere with exception handling — Daniel Arndt / githubweb
- Fix -Wformat-truncation warnings in CI (#6354) — noreply / githubweb
- Do not use HIP Graph with ROCm 5.2 — Bruno Turcksin / githubweb
- Use constexpr West in src — Bruno Turcksin / githubweb
- Use constexpr West in test — Bruno Turcksin / githubweb
- SIMD: convert binary operators to hidden friends (#6320) — noreply / githubweb
- simd: make mask and condition unit test to check with all data types (#6360) — noreply / githubweb
- team-level std algos: part 2 (#6205) — noreply / githubweb
- Add deprecated attribute to HostSpace(AllocationMechanism) definition — Damien L-G / githubweb
- Added gather_from and scatter_to for AVX2 and AVX512 simd — donlee / githubweb
- Rebased and applied feedbacks — donlee / githubweb
- Changed gen ctor usage to pass in KOKKOS_LAMBDA — donlee / githubweb
- Suppress NVCC attribute ignored warning on SIMD operators as hidden friends — Damien L-G / githubweb
- Remove deprecated code 3 support for volatile join — crtrott / githubweb
- Disable a test not working with nvhpc-23.1 — crtrott / githubweb
- Reenabling tests for nvhpc 23.7 — crtrott / githubweb
- Update containers and algorithms for NVC++ 23.7 — crtrott / githubweb
- More NVC++ 23.7 updates — crtrott / githubweb
- NVC++ clang-format fixes — crtrott / githubweb
- Update nvhpc to version 23.7 in the CI — crtrott / githubweb
- NVHPC 23.7 update: address reviewer comments — crtrott / githubweb
- OpenACC: Guard tests relying on abort — crtrott / githubweb
- Fix TestAtomic to use the test execspace — crtrott / githubweb
- Use NVHPC 23.7 for testing of OpenACC — crtrott / githubweb
- Update nvhpc gtest skip message — crtrott / githubweb
- Work around OpenMPTarget failure — crtrott / githubweb
- Update base docker file for nvhpc — crtrott / githubweb
- Remove stray Cuda graph pattern specialization from tag — Damien L-G / githubweb
- Fix typo — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Fix uninitialized variable warning with gcc 13 — Bruno Turcksin / githubweb
- HIP: Update print_configuration (#6387) — noreply / githubweb
- Add test — Daniel Arndt / githubweb
- Fix typo. — noreply / githubweb
- Check AVX, AVX2, AVX512XEON compiler macros on KokkosCore_config.h (#6248) — noreply / githubweb
- Don't use local headers or runtime in HPX backend due to deprecation — mikael.simberg / githubweb
- Explicitly check for valid device id — Daniel Arndt / githubweb
- Add a more comprehensive `kokkos_{malloc, free}` perf_test (#6377) — noreply / githubweb
- team-level std algos: part 3 (#6207) — noreply / githubweb
- Adding is_scoped_enum & to_underlying (#6356) — noreply / githubweb
- SIMD: add float simd support (#6177) — noreply / githubweb
- team-level std algos: part 4 (#6208) — noreply / githubweb
- Added a gen ctor for float (#6397) — noreply / githubweb
- team-level std algos: part 5 (#6209) — noreply / githubweb
- Deprecate Cuda(cudaStream_t, bool) — Damien L-G / githubweb
- Fixup checked integer operations death test — Damien L-G / githubweb
- Deprecate HIP(hipStream_t, bool) — Damien L-G / githubweb
- Let Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC be ON by default — Damien L-G / githubweb
- Print whether KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC is defined — Damien L-G / githubweb
- Introduce disable_malloc_async Cuda option with generated makefiles — Damien L-G / githubweb
- Preserve one build that disables Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC — Damien L-G / githubweb
- Use archive extraction time for timestamps — cezary.skrzynski / githubweb
- Disable performance benchmarks in AppVeyor CI — cezary.skrzynski / githubweb
- team-level std algos: part 6 (#6210) — noreply / githubweb
- address comments — fnrizzi / githubweb
- OpenMP backend refactor files. (#6403) — noreply / githubweb
- Drop (unused) `Cuda::cuda_internal_maximum_shared_words` — Damien L-G / githubweb
- Drop check that the host backend is initialized before the Cuda/HIP/SYCL one — Damien L-G / githubweb
- Drop unused HIPInternal::m_maxSharedWords data member — Damien L-G / githubweb
- Drop unused HIPInternal::m_hipArch static data member — Damien L-G / githubweb
- !initialized() should be a precondition for calling {Cuda,HIP,SYCL}Internal::initialize — Damien L-G / githubweb
- Drop pre-Kepler logic in Cuda::impl_initialize — Damien L-G / githubweb
- use single — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- formatting — fnrizzi / githubweb
- Team-level std algos: part 7 (#6211) — noreply / githubweb
- formatting — fnrizzi / githubweb
- Enable death tests for fedora rawhide — cezary.skrzynski / githubweb
- core/src: Add half math functions to private header (#6124) — noreply / githubweb
- Drop check whether device supports unified addressing — Damien L-G / githubweb
- fix single as per Christian's suggestion — fnrizzi / githubweb
- Only warn once (at initialization) when forcing allocation in unified memory — Damien L-G / githubweb
- check-copyright improvements (#6399) — noreply / githubweb
- Use execution space instance argument to get device properties in block size deduction — Damien L-G / githubweb
- Address reviewer' comments — Bruno Turcksin / githubweb
- Fix to avoid #186-D pointless comparison warning. — maarten.arnst / githubweb
- add comment — fnrizzi / githubweb
- improve tests to address review — fnrizzi / githubweb
- Fix guard for isnan test for bhalf_t — Daniel Arndt / githubweb
- avoid potential race condition HIP — tccleve / githubweb
- Set the device id in cuda_kernel_arch — Daniel Arndt / githubweb
- [SYCL][Reduction] Group counter should use at least memory_order::acq_rel — andrei.elovikov / githubweb
- Same for scan — andrei.elovikov / githubweb
- Initialize m_num_scratch_locks for Cuda parallel_for TeamPolicy — Daniel Arndt / githubweb
- improve tests with intra-team result check — fnrizzi / githubweb
- Fixes for Kokkos::Array (#6372) — noreply / githubweb
- try fix for unique, previous impl to remove later — fnrizzi / githubweb
- #5635: Add parallel_scan changes for CUDA and TeamThreadRange — cezary.skrzynski / githubweb
- remove old impl — fnrizzi / githubweb
- #5635: Enable TeamThreadRange test for CUDA — cezary.skrzynski / githubweb
- Clean up benchmarks/gups — cwpears / githubweb
- benchmark/gups: use CMake — cwpears / githubweb
- OpenMPTarget: Disable check for SIMD compiler macros — Daniel Arndt / githubweb
- #5635: Add parallel_scan with value for CUDA and ThreadVectorRange — cezary.skrzynski / githubweb
- add missing assert — fnrizzi / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- add intra team check for missing test — fnrizzi / githubweb
- fix intel compile error — fnrizzi / githubweb
- fix unreachable for intel — fnrizzi / githubweb
- re-enable unit tests for sort and random via makefile (#6422) — noreply / githubweb
- OpenMPTarget init-join fix (#6444) — noreply / githubweb
- Fix Cuda parallel_scan ThreadVectorRange range — Daniel Arndt / githubweb
- Assign final sum in Cuda parallel_scan ThreadVectorRange — Daniel Arndt / githubweb
- Fix compiling code using Kokkos::printf for OpenMPTarget on Intel GPUs (#6443) — noreply / githubweb
- std_algos: improving min, max, minmax (#6421) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 2/7) (#6426) — noreply / githubweb
- Skip bessel function tests known to fail on Intel GPUs (#6434) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 6/7) (#6436) — noreply / githubweb
- Fix race condition in functor_vec_scan_ret_val test — Daniel Arndt / githubweb
- Fix parallel_scan_with_reducers test — Daniel Arndt / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 3/7) (#6425) — noreply / githubweb
- improve tests (#6432) — noreply / githubweb
- improve tests (#6437) — noreply / githubweb
- Move final assignment to correct scope — cezary.skrzynski / githubweb
- fix casting warning in Random test — fnrizzi / githubweb
- Workaround for ROCm 5.6+ failing to compile with AVX2 SIMD support (#6449) — noreply / githubweb
- HIP: Restrict AVX2 workaround to ROCm 5.6 and 5.7 — Daniel Arndt / githubweb
- fixes build error for TeamReduce and TeamTranformReduced tests for specific GCC (#6459) — noreply / githubweb
- improve tests to check intra-team result (#6431) — noreply / githubweb
- SIMD: Math functions should be in namespace Kokkos — Daniel Arndt / githubweb
- SYCL: Disable another bessel function test for Intel GPUs — Daniel Arndt / githubweb
- Check for default device — Daniel Arndt / githubweb
- team-level std algos: part 10 (#6256) — noreply / githubweb
- team-level std algos: part 11 (#6258) — noreply / githubweb
- #5635: HIP: Add Overloads for parallel_scan with return value for TeamThreadRange (#6302) — noreply / githubweb
- #5635: Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- use shortcut — cezary.skrzynski / githubweb
- Skip testing for non-power-of-two team sizes — cezary.skrzynski / githubweb
- Fix formatting — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for HIP backend — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- #5635: SYCL: Add parallel_scan overload with return value — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- OpenMP: Fix TeamThreadRange parallel_scan with return value for team_size > 1 — Daniel Arndt / githubweb
- Add compatible copy assignment operator to DualView — maarten.arnst / githubweb
- Fix minimum version for Google benchmark — Daniel Arndt / githubweb
- Add test of copy constructor/assignment operator for DualView. — maarten.arnst / githubweb
- Compute concurrency on HIP using Kokkos hardcoded m_maxWavesPerCU — maarten.arnst / githubweb
- Modify test so that source and destination view are of different type — maarten.arnst / githubweb
- Use call operator instead of run_me function — maarten.arnst / githubweb
- team-level std algos: part 12 (#6350) — noreply / githubweb
- core/src: Add half single and double mixed compare (LT,GT,LE,GE) (#6407) — noreply / githubweb
- Moving abort and assert into their own public headers (#6445) — noreply / githubweb
- Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- Allow detecting SIMD types based on compiler macros (#6188) — noreply / githubweb
- Add KOKKOS_ARCH_ARM_NEON — Daniel Arndt / githubweb
- Fix implementation for cyl_bessel_i0 — Daniel Arndt / githubweb
- cleaning: remove iostream from headers where possible (IWYU) — romin.tomasetti / githubweb
- Fix compiling SIMD unit tests on NVIDIA — Daniel Arndt / githubweb
- guards to ensure DBL_EPSILON return for POWER8,9 — ajpowel / githubweb
- Fix NVCC warnings (#6483) — noreply / githubweb
- team-level std algos: part 13 (#6351) — noreply / githubweb
- Also fix annotations for generator constructor for AVX512 and NEON — Daniel Arndt / githubweb
- #5635: SYCL: Add parallel_scan overload with value for ThreadVectorRange — cezary.skrzynski / githubweb
- Fix atomic operations bug for Min and Max (#6435) — noreply / githubweb
- Fix example/build_cmake_installed_different_compiler — Daniel Arndt / githubweb
- fix impl — fnrizzi / githubweb
- Update core/src/HIP/Kokkos_HIP_KernelLaunch.hpp — noreply / githubweb
- Split Kokkos_Threads_Parallel files — Bruno Turcksin / githubweb
- HPX: Implement TeamThread and ThreadVector parallel_scan with return value — Daniel Arndt / githubweb
- Serial: Allow for distinct execution space instances (#6441) — noreply / githubweb
- add overload for TeamThreadRange — fnrizzi / githubweb
- address review comment — fnrizzi / githubweb
- Update to HIP TeamPolicy Block number heuristic (#6284) — noreply / githubweb
- SIMD: Split math functions from SIMD_Common.hpp (#6487) — noreply / githubweb
- Allow NVHPC as device compiler only with Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER=ON — Daniel Arndt / githubweb
- add threadvector — fnrizzi / githubweb
- fix order — fnrizzi / githubweb
- remove guards — fnrizzi / githubweb
- UnorderedMap(space instance): proposal for #6067 — romin.tomasetti / githubweb
- Rename Kokkos_ThreadsExec to align with the other backends — Bruno Turcksin / githubweb
- Promote Kokkos_Printf.hpp to public include — Daniel Arndt / githubweb
- simd: add floor, ceil, round, trunc operations (#6393) — noreply / githubweb
- Update CI in OpenMPTarget to use llvm-17 (#6472) — noreply / githubweb
- Rename Kokkos_ThreadsTeam.hpp to Kokkos_Threads_Team.hpp — Bruno Turcksin / githubweb
- Kokkos_SIMD_Scalar.hpp: remove extra ';' — ndellin / githubweb
- add guards — fnrizzi / githubweb
- avoid auto — fnrizzi / githubweb
- [ci skip] Update Kokkos version to 4.2.99 — Daniel Arndt / githubweb
- [deprecated code 3] remove all default device init tests — Damien L-G / githubweb
- [deprecated code 3] remove InitArguments — Damien L-G / githubweb
- [deprecated code 3] remove KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macros — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for clamp, min, max, and minmax — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math constants — Damien L-G / githubweb
- [deprecated code 3] remove {OpenMP,HPX}::partition_master — Damien L-G / githubweb
- OpenMP backend cleanup following removal of deprecated code 3 — Damien L-G / githubweb
- Get rid of FIXME_OPENMP — Damien L-G / githubweb
- [deprecated code 3] remove MasterLock — Damien L-G / githubweb
- [deprecated code 3] remove ENABLE_DEPRECATED_CODE_3 option — Damien L-G / githubweb
- fixup! [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- Do not append " - blocks" to the bitset label — Damien L-G / githubweb
- with_updated_label -> append_to_label — Daniel Arndt / githubweb
- SYCL: Use SYCL_EXT_ONEAPI_DEVICE_GLOBAL to detect support for device global variables — Daniel Arndt / githubweb
- Fixup in README (github -> GitHub) — Damien L-G / githubweb
- Check that device associated with stream matches requested device — Daniel Arndt / githubweb
- Threads: replace enum with constexpr int and enum class (#6514) — noreply / githubweb
- Added unit tests for reduction ops and few intel svml intrinsics — donlee / githubweb
- Remove sleep and wake functions — Bruno Turcksin / githubweb
- Remove extra constructor — Daniel Arndt / githubweb
- Prefer defaulted default constructor for Bitset (#6524) — noreply / githubweb
- SYCL: Use host-pinned memory to copy reduction/scan result (#6500) — noreply / githubweb
- Remove unused variables — Bruno Turcksin / githubweb
- Remove Sentinel struct from Threads — Bruno Turcksin / githubweb
- Small cleanup of ThreadsInternal::initialize — Bruno Turcksin / githubweb
- Remove useless forward declaration — Bruno Turcksin / githubweb
- Remove spawn function — Bruno Turcksin / githubweb
- Add comments — Bruno Turcksin / githubweb
- Fix indentation — Bruno Turcksin / githubweb
- Fix typo in macro guard — Bruno Turcksin / githubweb
- Reduce number of View constructor instantiations — Damien L-G / githubweb
- Bump HPX version used in CI to 1.9.0 — mikael.simberg / githubweb
- Split files in HIP backend — Bruno Turcksin / githubweb
- Trim some fat in `CudaInternal` (towards multiple GPUs support) (#6544) — noreply / githubweb
- Only define STDALGO_TEAM_SOURCES_* once — Daniel Arndt / githubweb
- Remove logical memory spaces — Damien L-G / githubweb
- Rollback changes to view constructors to reduce the number of instantiations (#6564) — noreply / githubweb
- Address reviewer comments — Daniel Arndt / githubweb
- Threads remove unused variables and functions (#6566) — noreply / githubweb
- Remove unused Sandia testing files (#6568) — noreply / githubweb
- fallback implementation cleanup — donlee / githubweb
- Remove empty quotation marks for static_assert — Daniel Arndt / githubweb
- [ci skip] Drop unused <impl/Kokkos_Memory_Fence.hpp> header — Damien L-G / githubweb
- Drop Clang+CUDA workaround — Damien L-G / githubweb
- OpenMPTarget: CI compiler upgrade. (#6545) — noreply / githubweb
- m_cudaDev isn't static anymore — Daniel Arndt / githubweb
- Add crtrott's launch_latency benchmark (#6379) — noreply / githubweb
- Simplify fence functions in the Threads backend (#6571) — noreply / githubweb
- simd: temporarily skip device math ops unit test for OpenMPTarget build (#6574) — noreply / githubweb
- Set the device id explicitly for CUDA API calls in impl_initialize — Daniel Arndt / githubweb
- OpenMPTarget: Guard scratch memory usage in ParallelReduce — Daniel Arndt / githubweb
- SYCL: Implement DESUL_ATOMICS_ENABLE_SYCL_SEPARABLE_COMPILATION path (#6534) — noreply / githubweb
- Desul atomics: Trade SYCL-specific compile definition for a macro defintion in the configuration header — Damien L-G / githubweb
- Replace Marsaglia polar method with Box-muller to generate a normally distributed random number (#6556) — noreply / githubweb
- OpenMP: No memset in viewfill (#6573) — noreply / githubweb
- Revert "Desul atomics: Trade SYCL-specific compile definition for a macro defintion in the configuration header" — noreply / githubweb
- OpenACC: add atomics support (#6446) — noreply / githubweb
- Fix infinity, quiet_NaN, signaling_Nan, isfinite, isnan, isinf for half_t and bhalf_t (#6543) — noreply / githubweb
- kokkos(unique): fix allocation of temporary view to enforce using the provided space instance — romin.tomasetti / githubweb
- Use binary wrapper for consistency in definition of half types numeric traits (#6590) — noreply / githubweb
- try fix — fnrizzi / githubweb
- Fix TestNumericTraits.hpp for SYCL with bfloat16 support — Daniel Arndt / githubweb
- Fix generated Makefile when using gnu_generate_makefile.sh and make >= 4.3 — crtrott / githubweb
- Threads: add missing broadcast to TeamThreadRange parallel_scan (#6601) — noreply / githubweb
- Added missing operator* to NEON simd — crtrott / githubweb
- [ci skip] Update changelog on develop for 4.2.00 (#6592) — noreply / githubweb
- Remove KOKKOS_IMPL_DO_NOT_USE_PRINTF (#6593) — noreply / githubweb
- try fix — fnrizzi / githubweb
- avoid pyt package — fnrizzi / githubweb
- try — fnrizzi / githubweb
- fix for macos — fnrizzi / githubweb
- remove comments — fnrizzi / githubweb
- use reference — crtrott / githubweb
- add branching — fnrizzi / githubweb
- [ci skip] fix formatting — cezary.skrzynski / githubweb
- GitHub Workflows: Use Ubuntu 22.04 instead of Fedora for Intel compiler testing — Daniel Arndt / githubweb
- nvcc(wrapper): adding missing `--generate-line-info` arg — romin.tomasetti / githubweb
- Add clang-format check to GitHub workflows (#6612) — noreply / githubweb
- graph(HIP): adding inline keyword to fix #6623 — romin.tomasetti / githubweb
- Add jenkins multibranch pipeline options — Bruno Turcksin / githubweb
- Add warp sync for Cuda parallel reduce — tccleve / githubweb
- kokkos(profiling): do not finalize in any backend — romin.tomasetti / githubweb
- Replace ubuntu:18.04 with ubuntu:20.04 as base image for clang-format — Bruno Turcksin / githubweb
- update comment to include final() mention — tccleve / githubweb
- Disabling OpenACC in the CI because it emits too many warnings — Bruno Turcksin / githubweb
- unorderedmap: modernize traits — romin.tomasetti / githubweb
- nvcc wrapper: remove troubling flag to fix 6628 (#6629) — noreply / githubweb
- tools(profiling): type (related to kokkos/kokkos-tools/pull/221) — romin.tomasetti / githubweb
- This PR fixes the too-much-OpenACC-warning issue, mentioned in PR #6639. — lees2 / githubweb
- add missing header fix #6644 — fnrizzi / githubweb
- SYCL: Restrict workaround for is_device_copyable to oneAPI versions before 2024.0.0 (#6532) — noreply / githubweb
- Fixup test math functions ulp should double -> int — Damien L-G / githubweb
- Drop DualView converting copy assignment operator — Damien L-G / githubweb
- Don't use rocm-docker for clang-format — Daniel Arndt / githubweb
- Disable HIP CI — Daniel Arndt / githubweb
- Remove deprecation warning for AllocationMechanism for gcc <11.0 — Daniel Arndt / githubweb
- OpenMPTarget: clang extensions for dynamic shared memory. (#6380) — noreply / githubweb
- Fix builtin_unreachable use for MSVC/CUDA — crtrott / githubweb
- Fix missing include on msvc/cuda — crtrott / githubweb
- Avoid lambdas in constexpr branch for msvc/cuda — crtrott / githubweb
- Sidestep lacking CTAD support msvc/cuda — crtrott / githubweb
- Fix formatting — crtrott / githubweb
- Move header for Damien because he is right — crtrott / githubweb
- Unit test for issue 3371 (negative vector length should not yield a negative max_team_size) (#6076) — noreply / githubweb
- Add CMakeLists.txt for stream benchmark — cwpears / githubweb
- Do not negate the dependent true traits helper — Damien L-G / githubweb
- Drop guards to accommodate external code defining KOKKOS_ASSERT — Damien L-G / githubweb
- Use omp_get_max_active_levels() when supported — Daniel Arndt / githubweb
- Add missing gfx940 — rberger / githubweb
- Add Impl::always_false type-dependent false trait — Damien L-G / githubweb
- Per review prefer always_false<Arg>::value to is_void_v<Arg> — Damien L-G / githubweb
- Improve "no copy mechanism" exception message — bmkelle / githubweb
- Add a unit test for new deep_copy exception msg — bmkelle / githubweb
- Add missing include sstream — bmkelle / githubweb
- src->source, dst->destination — bmkelle / githubweb
- Workaround for ROCm 6.0 failing to compile with AVX2 SIMD support — Bruno Turcksin / githubweb
- SYCL: Force inlining of Kokkos::printf (#6650) — noreply / githubweb
- Improve handling of printf in OMPT on Intel GPUs — Daniel Arndt / githubweb
- OpenMP: Use `omp_get_nested` for older gcc versions (#6685) — noreply / githubweb
- Disable more Bessel tests for SYCL on Intel GPUs — Daniel Arndt / githubweb
- fill_random without execution space instance should fence — Daniel Arndt / githubweb
- Drop unnecessary guarding for a tool library being loaded in ProfilingSection — Damien L-G / githubweb
- Drop unnecessary header include in Kokkos_Profiling_ProfileSection.hpp — Damien L-G / githubweb
- #5333: CUDA: Use scratch space appropriate to small reduction elements in Team reductions (#5334) — noreply / githubweb
- Cuda: Allocate using the correct device (#6392) — noreply / githubweb
- Let `Profiling::ProfilingSection(std::string)` constructor be explicit and nodiscard (#6690) — noreply / githubweb
- Cosmetic changes to ProfilingSection — Damien L-G / githubweb
- GitHub CI: Test with AddressSanitizer (#6676) — noreply / githubweb
- Kokkos::Array deduction guide (#6373) — noreply / githubweb
- Add CI for MSVC+Cuda (#6661) — noreply / githubweb
- SYCL: Address deprecations after oneAPI 2023.2.0 (#6577) — noreply / githubweb
- Fixup cast tolerance to double before printing — Damien L-G / githubweb
- Try linking against CUDA libraries even with KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE — Daniel Arndt / githubweb
- test_array_ctad: disable test for intel versions < 2021 — ndellin / githubweb
- Enable `{transform_}exclusive_scan` in place (#6667) — noreply / githubweb
- Add `ATOMICS_BYPASS` configuration option to disable atomics (#6692) — noreply / githubweb
- Check matching static extents in View constructor (#5190) — noreply / githubweb
- Remove Kokkos::[b]half_t volatile overloads (#6579) — noreply / githubweb
- add tests — fnrizzi / githubweb
- Provide `kokkos_swap` as part of Core and deprecate `Experimental::swap` in Algorithms (#6697) — noreply / githubweb
- Provide new public headers `<Kokkos_Clamp.hpp>` and `<Kokkos_MinMax.hpp>` (#6687) — noreply / githubweb
- Fix TeamThreadMDRange parallel_reduce (#6511) — noreply / githubweb
- add tests for in-place `inclusive_scan` (#6682) — noreply / githubweb
- Drop pointless Kokkos::Impl::CudaExec forward declaration — Damien L-G / githubweb
- Don't use the compiler launcher script if the compile language is CUDA. (#6704) — noreply / githubweb
- Deprecate `{Cuda,HIP}::detect_device_count()` and `Cuda::[detect_]device_arch()` (#6710) — noreply / githubweb
- Get rid of CudaInternal::cuda_get_error_{name,string}_wrapper — Damien L-G / githubweb
- No need to jump through so many hoops to print the error message — Damien L-G / githubweb
- HIP: Forgot to delete matching brace closing the namespace — Damien L-G / githubweb
- Make initialize and finalize of the Cuda/HIP singleton less special (#6714) — noreply / githubweb
- Kokkos_HIP.cpp: include Kokkos_Core.hpp to resolve errors — ndellin / githubweb
- Add bound checks in RangePolicy and MDRangePolicy (#6617) — noreply / githubweb
- Temporary fix to reenable HIP CI — Bruno Turcksin / githubweb
- Let the smart pointer manage the CUDA/HIP stream (#6721) — noreply / githubweb
- Fix Docker env variables — Bruno Turcksin / githubweb
- Ensure view_allocation_error does not silently ignore that no exception was thrown — Damien L-G / githubweb
- Add RawMemoryAllocationFailure::AllocationMechanism::OpenACCMalloc enumerator — Damien L-G / githubweb
- [OpenACC] throw if acc_malloc returned nullptr — Damien L-G / githubweb
- Fixup using declaration — Damien L-G / githubweb
- Disable openacc.view_allocation_error test — Damien L-G / githubweb
- Guard `[MD]RangePolicy` precondition check for deprecated code 4 (#6726) — noreply / githubweb
- Add C++26 standard to CMake Setup — dev / githubweb
- Add support for C++26 in generated makefiles — Damien L-G / githubweb
- Add KOKKOS_ENABLE_CXX26 to the configuration metadata — Damien L-G / githubweb
- Reenable HIP testing — Bruno Turcksin / githubweb
- Disabling failing HIP test in the CI — Bruno Turcksin / githubweb
- Use team_size_max to fix "Team size too large" error in reducer test (#6725) — noreply / githubweb
- Don't touch my records! (refactor Cuda/HIP/SYCL/Threads to not directly mess with `SharedAllocationRecord`) (#6732) — noreply / githubweb
- Untangle SharedAllocationRecord spaghetti code — Damien L-G / githubweb
- Fix TestThreadVectorMDRangeParallelReduce (#6734) — noreply / githubweb
- Cuda multi-GPU support: Allow execution space instance constructor to run (#6706) — noreply / githubweb
- Drop support for deprecated command-line arguments and environment variables (#6744) — noreply / githubweb
- Avoid unnecessary zero-memset of the scratch flags in SYCL (#6739) — noreply / githubweb
#3 (Jan 25, 2024, 10:41:36 AM)
- implementation and tests — fnrizzi / githubweb
- implementation and tests — fnrizzi / githubweb
- only compute with relevant entries — tccleve / githubweb
- subset of team level impl of std algorithms — fnrizzi / githubweb
- fix copyright — fnrizzi / githubweb
- guard for openmptarget — fnrizzi / githubweb
- fix for openmptarget — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- SIMD: suppress an uninitialized variable warning (#6294) — noreply / githubweb
- OpenACC CMakechange Clacc (#6250) — noreply / githubweb
- `Kokkos::sort` support custom comparator (#6253) — noreply / githubweb
- Add nightly build using latest gcc and c++23 — Bruno Turcksin / githubweb
- Explicitly capture this in lambda function — Bruno Turcksin / githubweb
- Fix typo in nightly jenkins configuration — Bruno Turcksin / githubweb
- Fix a memory bug in the `free_state` function of random pools. (#6290) — noreply / githubweb
- std_algos: for_each: try condense the impl — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Enforce create_mirror restrictions w.r.t. ViewCtorArgs on all variants — Daniel Arndt / githubweb
- Use KOKKOS_IF_ON_HOST — Daniel Arndt / githubweb
- Suppress warnings — Daniel Arndt / githubweb
- Don't suppress warnings for NVHPC — Daniel Arndt / githubweb
- Improve macro definitions — Daniel Arndt / githubweb
- Enable Serial backend in HPX build — cezary.skrzynski / githubweb
- Modify fences in View API test — cezary.skrzynski / githubweb
- Check for overflow during backend initialization (Cuda, HIP, SYCL) (#6159) — noreply / githubweb
- Improve SYCL reduction performance: MDRangePolicy (#6271) — noreply / githubweb
- bring back previous code as discussed in meeting — fnrizzi / githubweb
- create cudaAPI function wrappers — tccleve / githubweb
- Reorganize #include <cuda_runtime_api.h> — tccleve / githubweb
- Some api function require cuda11.2+ — tccleve / githubweb
- Cuda10 requires "stream=nullptr" as default arg — tccleve / githubweb
- Rework stream inputs — tccleve / githubweb
- Use "if constexpr" for setCudaDevice — tccleve / githubweb
- Remove static in comment — tccleve / githubweb
- Update CI: use CUDA 11.6.2 instead of 11.6.0 (#6314) — noreply / githubweb
- add helper variable templates `are_*_iterators_v` (#6312) — noreply / githubweb
- [HIP] Optimize parallel_reduce (#6229) — noreply / githubweb
- Split Kokkos_SYCL_Parallel_Range.hpp — Daniel Arndt / githubweb
- Split Kokkos_SYCL_Parallel_Reduce.hpp — Daniel Arndt / githubweb
- Move Kokkos_SYCL_Scan.hpp — Daniel Arndt / githubweb
- Split Kokkos_Parallel_Team.hpp — Daniel Arndt / githubweb
- Add comment that we don't need to link against clang_rt with ROCm 5.5 — Bruno Turcksin / githubweb
- Error out when CXX standard is not set when using amdclang or cray clang — Bruno Turcksin / githubweb
- Disable failing tests for ROCm 5.5 and 5.6 — Bruno Turcksin / githubweb
- #5635: Add parallel_scan overload with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Serial/OpenMP: Parallel_scan with return value for TeamThreadRange (#6090) — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Allow using the SYCL execution space on AMD GPUs (#6321) — noreply / githubweb
- Update Clang+Cuda CI from 10.1 to 11.0.3 (#6318) — noreply / githubweb
- SIMD: add shift ops for all int types (#6109) — noreply / githubweb
- SYCL: Use sycl::bit_cast from oneAPI 2024.0.0 on (#6300) — noreply / githubweb
- remove spurious undefs — fnrizzi / githubweb
- Decrease maximum memory available to ccache — Bruno Turcksin / githubweb
- Fix gtest when using C++20 — Bruno Turcksin / githubweb
- Fully qualify Experimental::SYCL in algorithms to avoid conflicting namespaces — Daniel Arndt / githubweb
- address reviews [skip ci] — antoine.meyer54 / githubweb
- formatting — fnrizzi / githubweb
- SIMD: split simd unit tests into separate files (#6278) — noreply / githubweb
- Run NVHPC only on V100 — Bruno Turcksin / githubweb
- Use checked arithmetic builtins for overflow detection (#6313) — noreply / githubweb
- Passing size==0 to DeepCopy/memcpy/omp_target_memcpy (#6273) — noreply / githubweb
- allow sorting via native oneDPL to support views with stride = 1 (#6322) — noreply / githubweb
- Adopt new HIP cmake's way of finding clang-rt — nicurtis / githubweb
- Fix #6334: intel compiler does not like returning inside if constexpr (#6335) — noreply / githubweb
- Fix #6336: capturing `this` implicitly is deprecated in C++20 (#6337) — noreply / githubweb
- fix lambda capture — fnrizzi / githubweb
- remove unnecessary file, fix constraints — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix — fnrizzi / githubweb
- fix lambda capture and constraints — fnrizzi / githubweb
- keep only subset — fnrizzi / githubweb
- revert files — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix syntax — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Disable default oneDPL support in Trilinos — Daniel Arndt / githubweb
- Rename AMD GPU architectures (#6266) — noreply / githubweb
- Fix compiling SIMD library with NEON and gcc-13 — Daniel Arndt / githubweb
- SIMD: add generator constructors (#6347) — noreply / githubweb
- Use /usr/bin/env bash in nvcc_wrapper and kokkos_launch_compiler for portability — mikael.simberg / githubweb
- Fix cudaAPI wrapper errors for CUDA_MALLOC_ASYNC (#6346) — noreply / githubweb
- Use std::aligned_alloc for allocations (#6341) — noreply / githubweb
- Initial implementation of gfx942 (#6358) — noreply / githubweb
- Introduce constructor for multi-GPU support. — Daniel Arndt / githubweb
- Only set KOKKOS_ARCH_AMD_GPU if an AMD GPU architecture is enabled — Daniel Arndt / githubweb
- Extend 'hip_driver_check_page_migration' (#6364) — noreply / githubweb
- Replace volatile m_ready_count loads by desul::atomic_loads in task queue implementation — mikael.simberg / githubweb
- Add Kokkos_Atomic.hpp headers to OpenMP and HPX task headers — mikael.simberg / githubweb
- Add support for HIP Graph — Bruno Turcksin / githubweb
- Replace one of the ROCM 5.2 configuration with ROCM 5.6 — Bruno Turcksin / githubweb
- Use aligned version of operator new instead of aligned_alloc — Daniel Arndt / githubweb
- HPX: Don't interfere with exception handling — Daniel Arndt / githubweb
- Fix -Wformat-truncation warnings in CI (#6354) — noreply / githubweb
- Do not use HIP Graph with ROCm 5.2 — Bruno Turcksin / githubweb
- Use constexpr West in src — Bruno Turcksin / githubweb
- Use constexpr West in test — Bruno Turcksin / githubweb
- SIMD: convert binary operators to hidden friends (#6320) — noreply / githubweb
- simd: make mask and condition unit test to check with all data types (#6360) — noreply / githubweb
- team-level std algos: part 2 (#6205) — noreply / githubweb
- Add deprecated attribute to HostSpace(AllocationMechanism) definition — Damien L-G / githubweb
- Added gather_from and scatter_to for AVX2 and AVX512 simd — donlee / githubweb
- Rebased and applied feedbacks — donlee / githubweb
- Changed gen ctor usage to pass in KOKKOS_LAMBDA — donlee / githubweb
- Suppress NVCC attribute ignored warning on SIMD operators as hidden friends — Damien L-G / githubweb
- Remove deprecated code 3 support for volatile join — crtrott / githubweb
- Disable a test not working with nvhpc-23.1 — crtrott / githubweb
- Reenabling tests for nvhpc 23.7 — crtrott / githubweb
- Update containers and algorithms for NVC++ 23.7 — crtrott / githubweb
- More NVC++ 23.7 updates — crtrott / githubweb
- NVC++ clang-format fixes — crtrott / githubweb
- Update nvhpc to version 23.7 in the CI — crtrott / githubweb
- NVHPC 23.7 update: address reviewer comments — crtrott / githubweb
- OpenACC: Guard tests relying on abort — crtrott / githubweb
- Fix TestAtomic to use the test execspace — crtrott / githubweb
- Use NVHPC 23.7 for testing of OpenACC — crtrott / githubweb
- Update nvhpc gtest skip message — crtrott / githubweb
- Work around OpenMPTarget failure — crtrott / githubweb
- Update base docker file for nvhpc — crtrott / githubweb
- Remove stray Cuda graph pattern specialization from tag — Damien L-G / githubweb
- Fix typo — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Fix uninitialized variable warning with gcc 13 — Bruno Turcksin / githubweb
- HIP: Update print_configuration (#6387) — noreply / githubweb
- Add test — Daniel Arndt / githubweb
- Fix typo. — noreply / githubweb
- Check AVX, AVX2, AVX512XEON compiler macros on KokkosCore_config.h (#6248) — noreply / githubweb
- Don't use local headers or runtime in HPX backend due to deprecation — mikael.simberg / githubweb
- Explicitly check for valid device id — Daniel Arndt / githubweb
- Add a more comprehensive `kokkos_{malloc, free}` perf_test (#6377) — noreply / githubweb
- team-level std algos: part 3 (#6207) — noreply / githubweb
- Adding is_scoped_enum & to_underlying (#6356) — noreply / githubweb
- SIMD: add float simd support (#6177) — noreply / githubweb
- team-level std algos: part 4 (#6208) — noreply / githubweb
- Added a gen ctor for float (#6397) — noreply / githubweb
- team-level std algos: part 5 (#6209) — noreply / githubweb
- Deprecate Cuda(cudaStream_t, bool) — Damien L-G / githubweb
- Fixup checked integer operations death test — Damien L-G / githubweb
- Deprecate HIP(hipStream_t, bool) — Damien L-G / githubweb
- Let Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC be ON by default — Damien L-G / githubweb
- Print whether KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC is defined — Damien L-G / githubweb
- Introduce disable_malloc_async Cuda option with generated makefiles — Damien L-G / githubweb
- Preserve one build that disables Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC — Damien L-G / githubweb
- Use archive extraction time for timestamps — cezary.skrzynski / githubweb
- Disable performance benchmarks in AppVeyor CI — cezary.skrzynski / githubweb
- team-level std algos: part 6 (#6210) — noreply / githubweb
- address comments — fnrizzi / githubweb
- OpenMP backend refactor files. (#6403) — noreply / githubweb
- Drop (unused) `Cuda::cuda_internal_maximum_shared_words` — Damien L-G / githubweb
- Drop check that the host backend is initialized before the Cuda/HIP/SYCL one — Damien L-G / githubweb
- Drop unused HIPInternal::m_maxSharedWords data member — Damien L-G / githubweb
- Drop unused HIPInternal::m_hipArch static data member — Damien L-G / githubweb
- !initialized() should be a precondition for calling {Cuda,HIP,SYCL}Internal::initialize — Damien L-G / githubweb
- Drop pre-Kepler logic in Cuda::impl_initialize — Damien L-G / githubweb
- use single — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- formatting — fnrizzi / githubweb
- Team-level std algos: part 7 (#6211) — noreply / githubweb
- formatting — fnrizzi / githubweb
- Enable death tests for fedora rawhide — cezary.skrzynski / githubweb
- core/src: Add half math functions to private header (#6124) — noreply / githubweb
- Drop check whether device supports unified addressing — Damien L-G / githubweb
- fix single as per Christian's suggestion — fnrizzi / githubweb
- Only warn once (at initialization) when forcing allocation in unified memory — Damien L-G / githubweb
- check-copyright improvements (#6399) — noreply / githubweb
- Use execution space instance argument to get device properties in block size deduction — Damien L-G / githubweb
- Address reviewer's comments — Bruno Turcksin / githubweb
- Fix to avoid #186-D pointless comparison warning. — maarten.arnst / githubweb
- add comment — fnrizzi / githubweb
- improve tests to address review — fnrizzi / githubweb
- Fix guard for isnan test for bhalf_t — Daniel Arndt / githubweb
- avoid potential race condition HIP — tccleve / githubweb
- Set the device id in cuda_kernel_arch — Daniel Arndt / githubweb
- [SYCL][Reduction] Group counter should use at least memory_order::acq_rel — andrei.elovikov / githubweb
- Same for scan — andrei.elovikov / githubweb
- Initialize m_num_scratch_locks for Cuda parallel_for TeamPolicy — Daniel Arndt / githubweb
- improve tests with intra-team result check — fnrizzi / githubweb
- Fixes for Kokkos::Array (#6372) — noreply / githubweb
- try fix for unique, previous impl to remove later — fnrizzi / githubweb
- #5635: Add parallel_scan changes for CUDA and TeamThreadRange — cezary.skrzynski / githubweb
- remove old impl — fnrizzi / githubweb
- #5635: Enable TeamThreadRange test for CUDA — cezary.skrzynski / githubweb
- Clean up benchmarks/gups — cwpears / githubweb
- benchmark/gups: use CMake — cwpears / githubweb
- OpenMPTarget: Disable check for SIMD compiler macros — Daniel Arndt / githubweb
- #5635: Add parallel_scan with value for CUDA and ThreadVectorRange — cezary.skrzynski / githubweb
- add missing assert — fnrizzi / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- add intra team check for missing test — fnrizzi / githubweb
- fix intel compile error — fnrizzi / githubweb
- fix unreachable for intel — fnrizzi / githubweb
- re-enable unit tests for sort and random via makefile (#6422) — noreply / githubweb
- OpenMPTarget init-join fix (#6444) — noreply / githubweb
- Fix Cuda parallel_scan ThreadVectorRange range — Daniel Arndt / githubweb
- Assign final sum in Cuda parallel_scan ThreadVectorRange — Daniel Arndt / githubweb
- Fix compiling code using Kokkos::printf for OpenMPTarget on Intel GPUs (#6443) — noreply / githubweb
- std_algos: improving min, max, minmax (#6421) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 2/7) (#6426) — noreply / githubweb
- Skip bessel function tests known to fail on Intel GPUs (#6434) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 6/7) (#6436) — noreply / githubweb
- Fix race condition in functor_vec_scan_ret_val test — Daniel Arndt / githubweb
- Fix parallel_scan_with_reducers test — Daniel Arndt / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 3/7) (#6425) — noreply / githubweb
- improve tests (#6432) — noreply / githubweb
- improve tests (#6437) — noreply / githubweb
- Move final assignment to correct scope — cezary.skrzynski / githubweb
- fix casting warning in Random test — fnrizzi / githubweb
- Workaround for ROCm 5.6+ failing to compile with AVX2 SIMD support (#6449) — noreply / githubweb
- HIP: Restrict AVX2 workaround to ROCm 5.6 and 5.7 — Daniel Arndt / githubweb
- fixes build error for TeamReduce and TeamTranformReduced tests for specific GCC (#6459) — noreply / githubweb
- improve tests to check intra-team result (#6431) — noreply / githubweb
- SIMD: Math functions should be in namespace Kokkos — Daniel Arndt / githubweb
- SYCL: Disable another bessel function test for Intel GPUs — Daniel Arndt / githubweb
- Check for default device — Daniel Arndt / githubweb
- team-level std algos: part 10 (#6256) — noreply / githubweb
- team-level std algos: part 11 (#6258) — noreply / githubweb
- #5635: HIP: Add Overloads for parallel_scan with return value for TeamThreadRange (#6302) — noreply / githubweb
- #5635: Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- use shortcut — cezary.skrzynski / githubweb
- Skip testing for non-power-of-two team sizes — cezary.skrzynski / githubweb
- Fix formatting — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for HIP backend — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- #5635: SYCL: Add parallel_scan overload with return value — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- OpenMP: Fix TeamThreadRange parallel_scan with return value for team_size > 1 — Daniel Arndt / githubweb
- Add compatible copy assignment operator to DualView — maarten.arnst / githubweb
- Fix minimum version for Google benchmark — Daniel Arndt / githubweb
- Add test of copy constructor/assignment operator for DualView. — maarten.arnst / githubweb
- Compute concurrency on HIP using Kokkos hardcoded m_maxWavesPerCU — maarten.arnst / githubweb
- Modify test so that source and destination view are of different type — maarten.arnst / githubweb
- Use call operator instead of run_me function — maarten.arnst / githubweb
- team-level std algos: part 12 (#6350) — noreply / githubweb
- core/src: Add half single and double mixed compare (LT,GT,LE,GE) (#6407) — noreply / githubweb
- Moving abort and assert into their own public headers (#6445) — noreply / githubweb
- Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- Allow detecting SIMD types based on compiler macros (#6188) — noreply / githubweb
- Add KOKKOS_ARCH_ARM_NEON — Daniel Arndt / githubweb
- Fix implementation for cyl_bessel_i0 — Daniel Arndt / githubweb
- cleaning: remove iostream from headers where possible (IWYU) — romin.tomasetti / githubweb
- Fix compiling SIMD unit tests on NVIDIA — Daniel Arndt / githubweb
- guards to ensure DBL_EPSILON return for POWER8,9 — ajpowel / githubweb
- Fix NVCC warnings (#6483) — noreply / githubweb
- team-level std algos: part 13 (#6351) — noreply / githubweb
- Also fix annotations for generator constructor for AVX512 and NEON — Daniel Arndt / githubweb
- #5635: SYCL: Add parallel_scan overload with value for ThreadVectorRange — cezary.skrzynski / githubweb
- Fix atomic operations bug for Min and Max (#6435) — noreply / githubweb
- Fix example/build_cmake_installed_different_compiler — Daniel Arndt / githubweb
- fix impl — fnrizzi / githubweb
- Update core/src/HIP/Kokkos_HIP_KernelLaunch.hpp — noreply / githubweb
- Split Kokkos_Threads_Parallel files — Bruno Turcksin / githubweb
- HPX: Implement TeamThread and ThreadVector parallel_scan with return value — Daniel Arndt / githubweb
- Serial: Allow for distinct execution space instances (#6441) — noreply / githubweb
- add overload for TeamThreadRange — fnrizzi / githubweb
- address review comment — fnrizzi / githubweb
- Update to HIP TeamPolicy Block number heuristic (#6284) — noreply / githubweb
- SIMD: Split math functions from SIMD_Common.hpp (#6487) — noreply / githubweb
- Allow NVHPC as device compiler only with Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER=ON — Daniel Arndt / githubweb
- ad threadvector — fnrizzi / githubweb
- fix order — fnrizzi / githubweb
- remove guards — fnrizzi / githubweb
- UnorderedMap(space instance): proposal for #6067 — romin.tomasetti / githubweb
- Rename Kokkos_ThreadsExec to align with the other backends — Bruno Turcksin / githubweb
- Promote Kokkos_Printf.hpp to public include — Daniel Arndt / githubweb
- simd: add floor, ceil, round, trunc operations (#6393) — noreply / githubweb
- Update CI in OpenMPTarget to use llvm-17 (#6472) — noreply / githubweb
- Rename Kokkos_ThreadsTeam.hpp to Kokkos_Threads_Team.hpp — Bruno Turcksin / githubweb
- Kokkos_SIMD_Scalar.hpp: remove extra ';' — ndellin / githubweb
- add guards — fnrizzi / githubweb
- avoid auto — fnrizzi / githubweb
- [ci skip] Update Kokkos version to 4.2.99 — Daniel Arndt / githubweb
- [deprecated code 3] remove all default device init tests — Damien L-G / githubweb
- [deprecated code 3] remove InitArguments — Damien L-G / githubweb
- [deprecated code 3] remove KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macros — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for clamp, min, max, and minmax — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math constants — Damien L-G / githubweb
- [deprecated code 3] remove {OpenMP,HPX}::partition_master — Damien L-G / githubweb
- OpenMP backend cleanup following removal of deprecated code 3 — Damien L-G / githubweb
- Get rid of FIXME_OPENMP — Damien L-G / githubweb
- [deprecated code 3] remove MasterLock — Damien L-G / githubweb
- [deprecated code 3] remove ENABLE_DEPRECATED_CODE_3 option — Damien L-G / githubweb
- fixup! [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- Do not append " - blocks" to the bitset label — Damien L-G / githubweb
- with_updated_label -> append_to_label — Daniel Arndt / githubweb
- SYCL: Use SYCL_EXT_ONEAPI_DEVICE_GLOBAL to detect support for device global variables — Daniel Arndt / githubweb
- Fixup in README (github -> GitHub) — Damien L-G / githubweb
- Check that device associated with stream matches requested device — Daniel Arndt / githubweb
- Threads: replace enum with constexpr int and enum class (#6514) — noreply / githubweb
- Added unit tests for reduction ops and few intel svml intrinsics — donlee / githubweb
- Remove sleep and wake functions — Bruno Turcksin / githubweb
- Remove extra constructor — Daniel Arndt / githubweb
- Prefer defaulted default constructor for Bitset (#6524) — noreply / githubweb
- SYCL: Use host-pinned memory to copy reduction/scan result (#6500) — noreply / githubweb
- Remove unused variables — Bruno Turcksin / githubweb
- Remove Sentinel struct from Threads — Bruno Turcksin / githubweb
- Small cleanup of ThreadsInternal::initialize — Bruno Turcksin / githubweb
- Remove useless forward declaration — Bruno Turcksin / githubweb
- Remove spawn function — Bruno Turcksin / githubweb
- Add comments — Bruno Turcksin / githubweb
- Fix indentation — Bruno Turcksin / githubweb
- Fix typo in macro guard — Bruno Turcksin / githubweb
- Reduce number of View constructor instantiations — Damien L-G / githubweb
- Bump HPX version used in CI to 1.9.0 — mikael.simberg / githubweb
- Split files in HIP backend — Bruno Turcksin / githubweb
- Trim some fat in `CudaInternal` (towards multiple GPUs support) (#6544) — noreply / githubweb
- Only define STDALGO_TEAM_SOURCES_* once — Daniel Arndt / githubweb
- Remove logical memory spaces — Damien L-G / githubweb
- Rollback changes to view constructors to reduce the number of instantiations (#6564) — noreply / githubweb
- Address reviewer comments — Daniel Arndt / githubweb
- Threads remove unused variables and functions (#6566) — noreply / githubweb
- Remove unused Sandia testing files (#6568) — noreply / githubweb
- fallback implementation cleanup — donlee / githubweb
- Remove empty quotation marks for static_assert — Daniel Arndt / githubweb
- [ci skip] Drop unused <impl/Kokkos_Memory_Fence.hpp> header — Damien L-G / githubweb
- Drop Clang+CUDA workaround — Damien L-G / githubweb
- OpenMPTarget: CI compiler upgrade. (#6545) — noreply / githubweb
- m_cudaDev isn't static anymore — Daniel Arndt / githubweb
- Add crtrott's launch_latency benchmark (#6379) — noreply / githubweb
- Simplify fence functions in the Threads backend (#6571) — noreply / githubweb
- simd: temporarily skip device math ops unit test for OpenMPTarget build (#6574) — noreply / githubweb
- Set the device id explicitly for CUDA API calls in impl_initialize — Daniel Arndt / githubweb
- OpenMPTarget: Guard scratch memory usage in ParallelReduce — Daniel Arndt / githubweb
- SYCL: Implement DESUL_ATOMICS_ENABLE_SYCL_SEPARABLE_COMPILATION path (#6534) — noreply / githubweb
- Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header — Damien L-G / githubweb
- Replace Marsaglia polar method with Box-muller to generate a normally distributed random number (#6556) — noreply / githubweb
- OpenMP: No memset in viewfill (#6573) — noreply / githubweb
- Revert "Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header" — noreply / githubweb
- OpenACC: add atomics support (#6446) — noreply / githubweb
- Fix infinity, quiet_NaN, signaling_NaN, isfinite, isnan, isinf for half_t and bhalf_t (#6543) — noreply / githubweb
- kokkos(unique): fix allocation of temporary view to enforce using the provided space instance — romin.tomasetti / githubweb
- Use binary wrapper for consistency in definition of half types numeric traits (#6590) — noreply / githubweb
- try fix — fnrizzi / githubweb
- Fix TestNumericTraits.hpp for SYCL with bfloat16 support — Daniel Arndt / githubweb
- Fix generated Makefile when using gnu_generate_makefile.sh and make >= 4.3 — crtrott / githubweb
- Threads: add missing broadcast to TeamThreadRange parallel_scan (#6601) — noreply / githubweb
- Added missing operator* to NEON simd — crtrott / githubweb
- [ci skip] Update changelog on develop for 4.2.00 (#6592) — noreply / githubweb
- Remove KOKKOS_IMPL_DO_NOT_USE_PRINTF (#6593) — noreply / githubweb
- try fix — fnrizzi / githubweb
- avoid pyt package — fnrizzi / githubweb
- try — fnrizzi / githubweb
- fix for macos — fnrizzi / githubweb
- remove comments — fnrizzi / githubweb
- use reference — crtrott / githubweb
- add branching — fnrizzi / githubweb
- [ci skip] fix formatting — cezary.skrzynski / githubweb
- GitHub Workflows: Use Ubuntu 22.04 instead of Fedora for Intel compiler testing — Daniel Arndt / githubweb
- nvcc(wrapper): adding missing `--generate-line-info` arg — romin.tomasetti / githubweb
- Add clang-format check to GitHub workflows (#6612) — noreply / githubweb
- graph(HIP): adding inline keyword to fix #6623 — romin.tomasetti / githubweb
- Add jenkins multibranch pipeline options — Bruno Turcksin / githubweb
- Add warp sync for Cuda parallel reduce — tccleve / githubweb
- kokkos(profiling): do not finalize in any backend — romin.tomasetti / githubweb
- Replace ubuntu:18.04 with ubuntu:20.04 as base image for clang-format — Bruno Turcksin / githubweb
- update comment to include final() mention — tccleve / githubweb
- Disabling OpenACC in the CI because it emits too many warnings — Bruno Turcksin / githubweb
- unorderedmap: modernize traits — romin.tomasetti / githubweb
- nvcc wrapper: remove troubling flag to fix 6628 (#6629) — noreply / githubweb
- tools(profiling): type (related to kokkos/kokkos-tools/pull/221) — romin.tomasetti / githubweb
- This PR fixes the too-much-OpenACC-warning issue, mentioned in PR #6639. — lees2 / githubweb
- add missing header fix #6644 — fnrizzi / githubweb
- SYCL: Restrict workaround for is_device_copyable to oneAPI versions before 2024.0.0 (#6532) — noreply / githubweb
- Fixup test math functions ulp should double -> int — Damien L-G / githubweb
- Drop DualView converting copy assignment operator — Damien L-G / githubweb
- Don't use rocm-docker for clang-format — Daniel Arndt / githubweb
- Disable HIP CI — Daniel Arndt / githubweb
- Remove deprecation warning for AllocationMechanism for gcc <11.0 — Daniel Arndt / githubweb
- OpenMPTarget: clang extensions for dynamic shared memory. (#6380) — noreply / githubweb
- Fix builtin_unreachable use for MSVC/CUDA — crtrott / githubweb
- Fix missing include on msvc/cuda — crtrott / githubweb
- Avoid lambdas in constexpr branch for msvc/cuda — crtrott / githubweb
- Sidestep lacking CTAD support msvc/cuda — crtrott / githubweb
- Fix formatting — crtrott / githubweb
- Move header for Damien because he is right — crtrott / githubweb
- Unit test for issue 3371 (negative vector length should not yield a negative max_team_size) (#6076) — noreply / githubweb
- Add CMakeLists.txt for stream benchmark — cwpears / githubweb
- Do not negate the dependent true traits helper — Damien L-G / githubweb
- Drop guards to accommodate external code defining KOKKOS_ASSERT — Damien L-G / githubweb
- Use omp_get_max_active_levels() when supported — Daniel Arndt / githubweb
- Add missing gfx940 — rberger / githubweb
- Add Impl::always_false type-dependent false trait — Damien L-G / githubweb
- Per review prefer always_false<Arg>::value to is_void_v<Arg> — Damien L-G / githubweb
- Improve "no copy mechanism" exception message — bmkelle / githubweb
- Add a unit test for new deep_copy exception msg — bmkelle / githubweb
- Add missing include sstream — bmkelle / githubweb
- src->source, dst->destination — bmkelle / githubweb
- Workaround for ROCm 6.0 failing to compile with AVX2 SIMD support — Bruno Turcksin / githubweb
- SYCL: Force inlining of Kokkos::printf (#6650) — noreply / githubweb
- Improve handling of printf in OMPT on Intel GPUs — Daniel Arndt / githubweb
- OpenMP: Use `omp_get_nested` for older gcc versions (#6685) — noreply / githubweb
- Disable more Bessel tests for SYCL on Intel GPUs — Daniel Arndt / githubweb
- fill_random without execution space instance should fence — Daniel Arndt / githubweb
- Drop unnecessary guarding for a tool library being loaded in ProfilingSection — Damien L-G / githubweb
- Drop unnecessary header include in Kokkos_Profiling_ProfileSection.hpp — Damien L-G / githubweb
- #5333: CUDA: Use scratch space appropriate to small reduction elements in Team reductions (#5334) — noreply / githubweb
- Cuda: Allocate using the correct device (#6392) — noreply / githubweb
- Let `Profiling::ProfilingSection(std::string)` constructor be explicit and nodiscard (#6690) — noreply / githubweb
- Cosmetic changes to ProfilingSection — Damien L-G / githubweb
- GitHub CI: Test with AddressSanitizer (#6676) — noreply / githubweb
- Kokkos::Array deduction guide (#6373) — noreply / githubweb
- Add CI for MSVC+Cuda (#6661) — noreply / githubweb
- SYCL: Address deprecations after oneAPI 2023.2.0 (#6577) — noreply / githubweb
- Fixup cast tolerance to double before printing — Damien L-G / githubweb
- Try linking against CUDA libraries even with KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE — Daniel Arndt / githubweb
- test_array_ctad: disable test for intel versions < 2021 — ndellin / githubweb
- Enable `{transform_}exclusive_scan` in place (#6667) — noreply / githubweb
- Add `ATOMICS_BYPASS` configuration option to disable atomics (#6692) — noreply / githubweb
- Check matching static extents in View constructor (#5190) — noreply / githubweb
- Remove Kokkos::[b]half_t volatile overloads (#6579) — noreply / githubweb
- add tests — fnrizzi / githubweb
- Provide `kokkos_swap` as part of Core and deprecate `Experimental::swap` in Algorithms (#6697) — noreply / githubweb
- Provide new public headers `<Kokkos_Clamp.hpp>` and `<Kokkos_MinMax.hpp>` (#6687) — noreply / githubweb
- Fix TeamThreadMDRange parallel_reduce (#6511) — noreply / githubweb
- add tests for in-place `inclusive_scan` (#6682) — noreply / githubweb
- Drop pointless Kokkos::Impl::CudaExec forward declaration — Damien L-G / githubweb
- Don't use the compiler launcher script if the compile language is CUDA. (#6704) — noreply / githubweb
- Deprecate `{Cuda,HIP}::detect_device_count()` and `Cuda::[detect_]device_arch()` (#6710) — noreply / githubweb
- Get rid of CudaInternal::cuda_get_error_{name,string}_wrapper — Damien L-G / githubweb
- No need to jump through so many hoops to print the error message — Damien L-G / githubweb
- HIP: Forgot to delete matching brace closing the namespace — Damien L-G / githubweb
- Make initialize and finalize of the Cuda/HIP singleton less special (#6714) — noreply / githubweb
- Kokkos_HIP.cpp: include Kokkos_Core.hpp to resolve errors — ndellin / githubweb
- Add bound checks in RangePolicy and MDRangePolicy (#6617) — noreply / githubweb
- Temporary fix to reenable HIP CI — Bruno Turcksin / githubweb
- Let the smart pointer manage the CUDA/HIP stream (#6721) — noreply / githubweb
- Fix Docker env variables — Bruno Turcksin / githubweb
- Ensure view_allocation_error does not silently ignore that no exception was thrown — Damien L-G / githubweb
- Add RawMemoryAllocationFailure::AllocationMechanism::OpenACCMalloc enumerator — Damien L-G / githubweb
- [OpenACC] throw if acc_malloc returned nullptr — Damien L-G / githubweb
- Fixup using declaration — Damien L-G / githubweb
- Disable openacc.view_allocation_error test — Damien L-G / githubweb
- Guard `[MD]RangePolicy` precondition check for deprecated code 4 (#6726) — noreply / githubweb
- Add C++26 standard to CMake Setup — dev / githubweb
- Add support for C++26 in generated makefiles — Damien L-G / githubweb
- Add KOKKOS_ENABLE_CXX26 to the configuration metadata — Damien L-G / githubweb
- Reenable HIP testing — Bruno Turcksin / githubweb
- Disabling failing HIP test in the CI — Bruno Turcksin / githubweb
- Use team_size_max to fix "Team size too large" error in reducer test (#6725) — noreply / githubweb
- Don't touch my records! (refactor Cuda/HIP/SYCL/Threads to not directly mess with `SharedAllocationRecord`) (#6732) — noreply / githubweb
- Untangle SharedAllocationRecord spaghetti code — Damien L-G / githubweb
- Fix TestThreadVectorMDRangeParallelReduce (#6734) — noreply / githubweb
- Cuda multi-GPU support: Allow execution space instance constructor to run (#6706) — noreply / githubweb
- Drop support for deprecated command-line arguments and environment variables (#6744) — noreply / githubweb
- Avoid unnecessary zero-memset of the scratch flags in SYCL (#6739) — noreply / githubweb
#3 (Jan 25, 2024, 10:41:36 AM)
- implementation and tests — fnrizzi / githubweb
- implementation and tests — fnrizzi / githubweb
- only compute with relevant entries — tccleve / githubweb
- subset of team level impl of std algorithms — fnrizzi / githubweb
- fix copyright — fnrizzi / githubweb
- guard for openmptarget — fnrizzi / githubweb
- fix for openmptarget — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- SIMD: suppress an uninitialized variable warning (#6294) — noreply / githubweb
- OpenACC CMakechange Clacc (#6250) — noreply / githubweb
- `Kokkos::sort` support custom comparator (#6253) — noreply / githubweb
- Add nightly build using latest gcc and c++23 — Bruno Turcksin / githubweb
- Explicitly capture this in lambda function — Bruno Turcksin / githubweb
- Fix typo in nightly jenkins configuration — Bruno Turcksin / githubweb
- Fix a memory bug in the `free_state` function of random pools. (#6290) — noreply / githubweb
- std_algos: for_each: try condense the impl — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Enforce create_mirror restrictions w.r.t. ViewCtorArgs on all variants — Daniel Arndt / githubweb
- Use KOKKOS_IF_ON_HOST — Daniel Arndt / githubweb
- Suppress warnings — Daniel Arndt / githubweb
- Don't suppress warnings for NVHPC — Daniel Arndt / githubweb
- Improve macro definitions — Daniel Arndt / githubweb
- Enable Serial backend in HPX build — cezary.skrzynski / githubweb
- Modify fences in View API test — cezary.skrzynski / githubweb
- Check for overflow during backend initialization (Cuda, HIP, SYCL) (#6159) — noreply / githubweb
- Improve SYCL reduction performance: MDRangePolicy (#6271) — noreply / githubweb
- bring back previous code as discussed in meeting — fnrizzi / githubweb
- create cudaAPI function wrappers — tccleve / githubweb
- Reorganize #include <cuda_runtime_api.h> — tccleve / githubweb
- Some api function require cuda11.2+ — tccleve / githubweb
- Cuda10 requires "stream=nullptr" as default arg — tccleve / githubweb
- Rework stream inputs — tccleve / githubweb
- Use "if constexpr" for setCudaDevice — tccleve / githubweb
- Remove static in comment — tccleve / githubweb
- Update CI: use CUDA 11.6.2 instead of 11.6.0 (#6314) — noreply / githubweb
- add helper variable templates `are_*_iterators_v` (#6312) — noreply / githubweb
- [HIP] Optimize parallel_reduce (#6229) — noreply / githubweb
- Split Kokkos_SYCL_Parallel_Range.hpp — Daniel Arndt / githubweb
- Split Kokkos_SYCL_Parallel_Reduce.hpp — Daniel Arndt / githubweb
- Move Kokkos_SYCL_Scan.hpp — Daniel Arndt / githubweb
- Split Kokkos_Parallel_Team.hpp — Daniel Arndt / githubweb
- Add comment that we don't need to link against clang_rt with ROCm 5.5 — Bruno Turcksin / githubweb
- Error out when CXX standard is not set when using amdclang or cray clang — Bruno Turcksin / githubweb
- Disable failing tests for ROCm 5.5 and 5.6 — Bruno Turcksin / githubweb
- #5635: Add parallel_scan overload with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Serial/OpenMP: Parallel_scan with return value for TeamThreadRange (#6090) — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Allow using the SYCL execution space on AMD GPUs (#6321) — noreply / githubweb
- Update Clang+Cuda CI from 10.1 to 11.0.3 (#6318) — noreply / githubweb
- SIMD: add shift ops for all int types (#6109) — noreply / githubweb
- SYCL: Use sycl::bit_cast from oneAPI 2024.0.0 on (#6300) — noreply / githubweb
- remove spurious undefs — fnrizzi / githubweb
- Decrease maximum memory available to ccache — Bruno Turcksin / githubweb
- Fix gtest when using C++20 — Bruno Turcksin / githubweb
- Fully qualify Experimental::SYCL in algorithms to avoid conflicting namespaces — Daniel Arndt / githubweb
- address reviews [skip ci] — antoine.meyer54 / githubweb
- formatting — fnrizzi / githubweb
- SIMD: split simd unit tests into separate files (#6278) — noreply / githubweb
- Run NVHPC only on V100 — Bruno Turcksin / githubweb
- Use checked arithmetic builtins for overflow detection (#6313) — noreply / githubweb
- Passing size==0 to DeepCopy/memcpy/omp_target_memcpy (#6273) — noreply / githubweb
- allow sorting via native oneDPL to support views with stride = 1 (#6322) — noreply / githubweb
- Adopt new HIP cmake's way of finding clang-rt — nicurtis / githubweb
- Fix #6334: intel compiler does not like returning inside if constexpr (#6335) — noreply / githubweb
- Fix #6336: capturing `this` implicitly is deprecated in C++20 (#6337) — noreply / githubweb
- fix lambda capture — fnrizzi / githubweb
- remove unnecessary file, fix constraints — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix — fnrizzi / githubweb
- fix lambda capture and constraints — fnrizzi / githubweb
- keep only subset — fnrizzi / githubweb
- revert files — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix syntax — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Disable default oneDPL support in Trilinos — Daniel Arndt / githubweb
- Rename AMD GPU architectures (#6266) — noreply / githubweb
- Fix compiling SIMD library with NEON and gcc-13 — Daniel Arndt / githubweb
- SIMD: add generator constructors (#6347) — noreply / githubweb
- Use /usr/bin/env bash in nvcc_wrapper and kokkos_launch_compiler for portability — mikael.simberg / githubweb
- Fix cudaAPI wrapper errors for CUDA_MALLOC_ASYNC (#6346) — noreply / githubweb
- Use std::aligned_alloc for allocations (#6341) — noreply / githubweb
- Initial implementation of gfx942 (#6358) — noreply / githubweb
- Introduce constructor for multi-GPU support. — Daniel Arndt / githubweb
- Only set KOKKOS_ARCH_AMD_GPU if an AMD GPU architecture is enabled — Daniel Arndt / githubweb
- Extend 'hip_driver_check_page_migration' (#6364) — noreply / githubweb
- Replace volatile m_ready_count loads by desul::atomic_loads in task queue implementation — mikael.simberg / githubweb
- Add Kokkos_Atomic.hpp headers to OpenMP and HPX task headers — mikael.simberg / githubweb
- Add support for HIP Graph — Bruno Turcksin / githubweb
- Replace one of the ROCM 5.2 configuration with ROCM 5.6 — Bruno Turcksin / githubweb
- Use aligned version of operator new instead of aligned_alloc — Daniel Arndt / githubweb
- HPX: Don't interfere with exception handling — Daniel Arndt / githubweb
- Fix -Wformat-truncation warnings in CI (#6354) — noreply / githubweb
- Do not use HIP Graph with ROCm 5.2 — Bruno Turcksin / githubweb
- Use constexpr West in src — Bruno Turcksin / githubweb
- Use constexpr West in test — Bruno Turcksin / githubweb
- SIMD: convert binary operators to hidden friends (#6320) — noreply / githubweb
- simd: make mask and condition unit test to check with all data types (#6360) — noreply / githubweb
- team-level std algos: part 2 (#6205) — noreply / githubweb
- Add deprecated attribute to HostSpace(AllocationMechanism) definition — Damien L-G / githubweb
- Added gather_from and scatter_to for AVX2 and AVX512 simd — donlee / githubweb
- Rebased and applied feedbacks — donlee / githubweb
- Changed gen ctor usage to pass in KOKKOS_LAMBDA — donlee / githubweb
- Suppress NVCC attribute ignored warning on SIMD operators as hidden friends — Damien L-G / githubweb
- Remove deprecated code 3 support for volatile join — crtrott / githubweb
- Disable a test not working with nvhpc-23.1 — crtrott / githubweb
- Reenabling tests for nvhpc 23.7 — crtrott / githubweb
- Update containers and algorithms for NVC++ 23.7 — crtrott / githubweb
- More NVC++ 23.7 updates — crtrott / githubweb
- NVC++ clang-format fixes — crtrott / githubweb
- Update nvhpc to version 23.7 in the CI — crtrott / githubweb
- NVHPC 23.7 update: address reviewer comments — crtrott / githubweb
- OpenACC: Guard tests relying on abort — crtrott / githubweb
- Fix TestAtomic to use the test execspace — crtrott / githubweb
- Use NVHPC 23.7 for testing of OpenACC — crtrott / githubweb
- Update nvhpc gtest skip message — crtrott / githubweb
- Work around OpenMPTarget failure — crtrott / githubweb
- Update base docker file for nvhpc — crtrott / githubweb
- Remove stray Cuda graph pattern specialization from tag — Damien L-G / githubweb
- Fix typo — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Fix uninitialized variable warning with gcc 13 — Bruno Turcksin / githubweb
- HIP: Update print_configuration (#6387) — noreply / githubweb
- Add test — Daniel Arndt / githubweb
- Fix typo. — noreply / githubweb
- Check AVX, AVX2, AVX512XEON compiler macros on KokkosCore_config.h (#6248) — noreply / githubweb
- Don't use local headers or runtime in HPX backend due to deprecation — mikael.simberg / githubweb
- Explicitly check for valid device id — Daniel Arndt / githubweb
- Add a more comprehensive `kokkos_{malloc, free}` perf_test (#6377) — noreply / githubweb
- team-level std algos: part 3 (#6207) — noreply / githubweb
- Adding is_scoped_enum & to_underlying (#6356) — noreply / githubweb
- SIMD: add float simd support (#6177) — noreply / githubweb
- team-level std algos: part 4 (#6208) — noreply / githubweb
- Added a gen ctor for float (#6397) — noreply / githubweb
- team-level std algos: part 5 (#6209) — noreply / githubweb
- Deprecate Cuda(cudaStream_t, bool) — Damien L-G / githubweb
- Fixup checked integer operations death test — Damien L-G / githubweb
- Deprecate HIP(hipStream_t, bool) — Damien L-G / githubweb
- Let Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC be ON by default — Damien L-G / githubweb
- Print whether KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC is defined — Damien L-G / githubweb
- Introduce disable_malloc_async Cuda option with generated makefiles — Damien L-G / githubweb
- Preserve one build that disables Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC — Damien L-G / githubweb
- Use archive extraction time for timestamps — cezary.skrzynski / githubweb
- Disable performance benchmarks in AppVeyor CI — cezary.skrzynski / githubweb
- team-level std algos: part 6 (#6210) — noreply / githubweb
- address comments — fnrizzi / githubweb
- OpenMP backend refactor files. (#6403) — noreply / githubweb
- Drop (unused) `Cuda::cuda_internal_maximum_shared_words` — Damien L-G / githubweb
- Drop check that the host backend is initialized before the Cuda/HIP/SYCL one — Damien L-G / githubweb
- Drop unused HIPInternal::m_maxSharedWords data member — Damien L-G / githubweb
- Drop unused HIPInternal::m_hipArch static data member — Damien L-G / githubweb
- !initialized() should be a precondition for calling {Cuda,HIP,SYCL}Internal::initialize — Damien L-G / githubweb
- Drop pre-Kepler logic in Cuda::impl_initialize — Damien L-G / githubweb
- use single — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- formatting — fnrizzi / githubweb
- Team-level std algos: part 7 (#6211) — noreply / githubweb
- formatting — fnrizzi / githubweb
- Enable death tests for fedora rawhide — cezary.skrzynski / githubweb
- core/src: Add half math functions to private header (#6124) — noreply / githubweb
- Drop check whether device supports unified addressing — Damien L-G / githubweb
- fix single as per Christian's suggestion — fnrizzi / githubweb
- Only warn once (at initialization) when forcing allocation in unified memory — Damien L-G / githubweb
- check-copyright improvements (#6399) — noreply / githubweb
- Use execution space instance argument to get device properties in block size deduction — Damien L-G / githubweb
- Address reviewer's comments — Bruno Turcksin / githubweb
- Fix to avoid #186-D pointless comparison warning. — maarten.arnst / githubweb
- add comment — fnrizzi / githubweb
- improve tests to address review — fnrizzi / githubweb
- Fix guard for isnan test for bhalf_t — Daniel Arndt / githubweb
- avoid potential race condition HIP — tccleve / githubweb
- Set the device id in cuda_kernel_arch — Daniel Arndt / githubweb
- [SYCL][Reduction] Group counter should use at least memory_order::acq_rel — andrei.elovikov / githubweb
- Same for scan — andrei.elovikov / githubweb
- Initialize m_num_scratch_locks for Cuda parallel_for TeamPolicy — Daniel Arndt / githubweb
- improve tests with intra-team result check — fnrizzi / githubweb
- Fixes for Kokkos::Array (#6372) — noreply / githubweb
- try fix for unique, previous impl to remove later — fnrizzi / githubweb
- #5635: Add parallel_scan changes for CUDA and TeamThreadRange — cezary.skrzynski / githubweb
- remove old impl — fnrizzi / githubweb
- #5635: Enable TeamThreadRange test for CUDA — cezary.skrzynski / githubweb
- Clean up benchmarks/gups — cwpears / githubweb
- benchmark/gups: use CMake — cwpears / githubweb
- OpenMPTarget: Disable check for SIMD compiler macros — Daniel Arndt / githubweb
- #5635: Add parallel_scan with value for CUDA and ThreadVectorRange — cezary.skrzynski / githubweb
- add missing assert — fnrizzi / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- add intra team check for missing test — fnrizzi / githubweb
- fix intel compile error — fnrizzi / githubweb
- fix unreachable for intel — fnrizzi / githubweb
- re-enable unit tests for sort and random via makefile (#6422) — noreply / githubweb
- OpenMPTarget init-join fix (#6444) — noreply / githubweb
- Fix Cuda parallel_scan ThreadVectorRange range — Daniel Arndt / githubweb
- Assign final sum in Cuda parallel_scan ThreadVectorRange — Daniel Arndt / githubweb
- Fix compiling code using Kokkos::printf for OpenMPTarget on Intel GPUs (#6443) — noreply / githubweb
- std_algos: improving min, max, minmax (#6421) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 2/7) (#6426) — noreply / githubweb
- Skip bessel function tests known to fail on Intel GPUs (#6434) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 6/7) (#6436) — noreply / githubweb
- Fix race condition in functor_vec_scan_ret_val test — Daniel Arndt / githubweb
- Fix parallel_scan_with_reducers test — Daniel Arndt / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 3/7) (#6425) — noreply / githubweb
- improve tests (#6432) — noreply / githubweb
- improve tests (#6437) — noreply / githubweb
- Move final assignment to correct scope — cezary.skrzynski / githubweb
- fix casting warning in Random test — fnrizzi / githubweb
- Workaround for ROCm 5.6+ failing to compile with AVX2 SIMD support (#6449) — noreply / githubweb
- HIP: Restrict AVX2 workaround to ROCm 5.6 and 5.7 — Daniel Arndt / githubweb
- fixes build error for TeamReduce and TeamTransformReduced tests for specific GCC (#6459) — noreply / githubweb
- improve tests to check intra-team result (#6431) — noreply / githubweb
- SIMD: Math functions should be in namespace Kokkos — Daniel Arndt / githubweb
- SYCL: Disable another bessel function test for Intel GPUs — Daniel Arndt / githubweb
- Check for default device — Daniel Arndt / githubweb
- team-level std algos: part 10 (#6256) — noreply / githubweb
- team-level std algos: part 11 (#6258) — noreply / githubweb
- #5635: HIP: Add Overloads for parallel_scan with return value for TeamThreadRange (#6302) — noreply / githubweb
- #5635: Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- use shortcut — cezary.skrzynski / githubweb
- Skip testing for non-power-of-two team sizes — cezary.skrzynski / githubweb
- Fix formatting — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for HIP backend — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- #5635: SYCL: Add parallel_scan overload with return value — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- OpenMP: Fix TeamThreadRange parallel_scan with return value for team_size > 1 — Daniel Arndt / githubweb
- Add compatible copy assignment operator to DualView — maarten.arnst / githubweb
- Fix minimum version for Google benchmark — Daniel Arndt / githubweb
- Add test of copy constructor/assignment operator for DualView. — maarten.arnst / githubweb
- Compute concurrency on HIP using Kokkos hardcoded m_maxWavesPerCU — maarten.arnst / githubweb
- Modify test so that source and destination view are of different type — maarten.arnst / githubweb
- Use call operator instead of run_me function — maarten.arnst / githubweb
- team-level std algos: part 12 (#6350) — noreply / githubweb
- core/src: Add half single and double mixed compare (LT,GT,LE,GE) (#6407) — noreply / githubweb
- Moving abort and assert into their own public headers (#6445) — noreply / githubweb
- Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- Allow detecting SIMD types based on compiler macros (#6188) — noreply / githubweb
- Add KOKKOS_ARCH_ARM_NEON — Daniel Arndt / githubweb
- Fix implementation for cyl_bessel_i0 — Daniel Arndt / githubweb
- cleaning: remove iostream from headers where possible (IWYU) — romin.tomasetti / githubweb
- Fix compiling SIMD unit tests on NVIDIA — Daniel Arndt / githubweb
- guards to ensure DBL_EPSILON return for POWER8,9 — ajpowel / githubweb
- Fix NVCC warnings (#6483) — noreply / githubweb
- team-level std algos: part 13 (#6351) — noreply / githubweb
- Also fix annotations for generator constructor for AVX512 and NEON — Daniel Arndt / githubweb
- #5635: SYCL: Add parallel_scan overload with value for ThreadVectorRange — cezary.skrzynski / githubweb
- Fix atomic operations bug for Min and Max (#6435) — noreply / githubweb
- Fix example/build_cmake_installed_different_compiler — Daniel Arndt / githubweb
- fix impl — fnrizzi / githubweb
- Update core/src/HIP/Kokkos_HIP_KernelLaunch.hpp — noreply / githubweb
- Split Kokkos_Threads_Parallel files — Bruno Turcksin / githubweb
- HPX: Implement TeamThread and ThreadVector parallel_scan with return value — Daniel Arndt / githubweb
- Serial: Allow for distinct execution space instances (#6441) — noreply / githubweb
- add overload for TeamThreadRange — fnrizzi / githubweb
- address review comment — fnrizzi / githubweb
- Update to HIP TeamPolicy Block number heuristic (#6284) — noreply / githubweb
- SIMD: Split math functions from SIMD_Common.hpp (#6487) — noreply / githubweb
- Allow NVHPC as device compiler only with Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER=ON — Daniel Arndt / githubweb
- add threadvector — fnrizzi / githubweb
- fix order — fnrizzi / githubweb
- remove guards — fnrizzi / githubweb
- UnorderedMap(space instance): proposal for #6067 — romin.tomasetti / githubweb
- Rename Kokkos_ThreadsExec to align with the other backends — Bruno Turcksin / githubweb
- Promote Kokkos_Printf.hpp to public include — Daniel Arndt / githubweb
- simd: add floor, ceil, round, trunc operations (#6393) — noreply / githubweb
- Update CI in OpenMPTarget to use llvm-17 (#6472) — noreply / githubweb
- Rename Kokkos_ThreadsTeam.hpp to Kokkos_Threads_Team.hpp — Bruno Turcksin / githubweb
- Kokkos_SIMD_Scalar.hpp: remove extra ';' — ndellin / githubweb
- add guards — fnrizzi / githubweb
- avoid auto — fnrizzi / githubweb
- [ci skip] Update Kokkos version to 4.2.99 — Daniel Arndt / githubweb
- [deprecated code 3] remove all default device init tests — Damien L-G / githubweb
- [deprecated code 3] remove InitArguments — Damien L-G / githubweb
- [deprecated code 3] remove KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macros — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for clamp, min, max, and minmax — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math constants — Damien L-G / githubweb
- [deprecated code 3] remove {OpenMP,HPX}::partition_master — Damien L-G / githubweb
- OpenMP backend cleanup following removal of deprecated code 3 — Damien L-G / githubweb
- Get rid of FIXME_OPENMP — Damien L-G / githubweb
- [deprecated code 3] remove MasterLock — Damien L-G / githubweb
- [deprecated code 3] remove ENABLE_DEPRECATED_CODE_3 option — Damien L-G / githubweb
- fixup! [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- Do not append " - blocks" to the bitset label — Damien L-G / githubweb
- with_updated_label -> append_to_label — Daniel Arndt / githubweb
- SYCL: Use SYCL_EXT_ONEAPI_DEVICE_GLOBAL to detect support for device global variables — Daniel Arndt / githubweb
- Fixup in README (github -> GitHub) — Damien L-G / githubweb
- Check that device associated with stream matches requested device — Daniel Arndt / githubweb
- Threads: replace enum with constexpr int and enum class (#6514) — noreply / githubweb
- Added unit tests for reduction ops and few intel svml intrinsics — donlee / githubweb
- Remove sleep and wake functions — Bruno Turcksin / githubweb
- Remove extra constructor — Daniel Arndt / githubweb
- Prefer defaulted default constructor for Bitset (#6524) — noreply / githubweb
- SYCL: Use host-pinned memory to copy reduction/scan result (#6500) — noreply / githubweb
- Remove unused variables — Bruno Turcksin / githubweb
- Remove Sentinel struct from Threads — Bruno Turcksin / githubweb
- Small cleanup of ThreadsInternal::initialize — Bruno Turcksin / githubweb
- Remove useless forward declaration — Bruno Turcksin / githubweb
- Remove spawn function — Bruno Turcksin / githubweb
- Add comments — Bruno Turcksin / githubweb
- Fix indentation — Bruno Turcksin / githubweb
- Fix typo in macro guard — Bruno Turcksin / githubweb
- Reduce number of View constructor instantiations — Damien L-G / githubweb
- Bump HPX version used in CI to 1.9.0 — mikael.simberg / githubweb
- Split files in HIP backend — Bruno Turcksin / githubweb
- Trim some fat in `CudaInternal` (towards multiple GPUs support) (#6544) — noreply / githubweb
- Only define STDALGO_TEAM_SOURCES_* once — Daniel Arndt / githubweb
- Remove logical memory spaces — Damien L-G / githubweb
- Rollback changes to view constructors to reduce the number of instantiations (#6564) — noreply / githubweb
- Address reviewer comments — Daniel Arndt / githubweb
- Threads remove unused variables and functions (#6566) — noreply / githubweb
- Remove unused Sandia testing files (#6568) — noreply / githubweb
- fallback implementation cleanup — donlee / githubweb
- Remove empty quotation marks for static_assert — Daniel Arndt / githubweb
- [ci skip] Drop unused <impl/Kokkos_Memory_Fence.hpp> header — Damien L-G / githubweb
- Drop Clang+CUDA workaround — Damien L-G / githubweb
- OpenMPTarget: CI compiler upgrade. (#6545) — noreply / githubweb
- m_cudaDev isn't static anymore — Daniel Arndt / githubweb
- Add crtrott's launch_latency benchmark (#6379) — noreply / githubweb
- Simplify fence functions in the Threads backend (#6571) — noreply / githubweb
- simd: temporarily skip device math ops unit test for OpenMPTarget build (#6574) — noreply / githubweb
- Set the device id explicitly for CUDA API calls in impl_initialize — Daniel Arndt / githubweb
- OpenMPTarget: Guard scratch memory usage in ParallelReduce — Daniel Arndt / githubweb
- SYCL: Implement DESUL_ATOMICS_ENABLE_SYCL_SEPARABLE_COMPILATION path (#6534) — noreply / githubweb
- Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header — Damien L-G / githubweb
- Replace Marsaglia polar method with Box-muller to generate a normally distributed random number (#6556) — noreply / githubweb
- OpenMP: No memset in viewfill (#6573) — noreply / githubweb
- Revert "Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header" — noreply / githubweb
- OpenACC: add atomics support (#6446) — noreply / githubweb
- Fix infinity, quiet_NaN, signaling_NaN, isfinite, isnan, isinf for half_t and bhalf_t (#6543) — noreply / githubweb
- kokkos(unique): fix allocation of temporary view to enforce using the provided space instance — romin.tomasetti / githubweb
- Use binary wrapper for consistency in definition of half types numeric traits (#6590) — noreply / githubweb
- try fix — fnrizzi / githubweb
- Fix TestNumericTraits.hpp for SYCL with bfloat16 support — Daniel Arndt / githubweb
- Fix generated Makefile when using gnu_generate_makefile.sh and make >= 4.3 — crtrott / githubweb
- Threads: add missing broadcast to TeamThreadRange parallel_scan (#6601) — noreply / githubweb
- Added missing operator* to NEON simd — crtrott / githubweb
- [ci skip] Update changelog on develop for 4.2.00 (#6592) — noreply / githubweb
- Remove KOKKOS_IMPL_DO_NOT_USE_PRINTF (#6593) — noreply / githubweb
- try fix — fnrizzi / githubweb
- avoid pyt package — fnrizzi / githubweb
- try — fnrizzi / githubweb
- fix for macos — fnrizzi / githubweb
- remove comments — fnrizzi / githubweb
- use reference — crtrott / githubweb
- add branching — fnrizzi / githubweb
- [ci skip] fix formatting — cezary.skrzynski / githubweb
- GitHub Workflows: Use Ubuntu 22.04 instead of Fedora for Intel compiler testing — Daniel Arndt / githubweb
- nvcc(wrapper): adding missing `--generate-line-info` arg — romin.tomasetti / githubweb
- Add clang-format check to GitHub workflows (#6612) — noreply / githubweb
- graph(HIP): adding inline keyword to fix #6623 — romin.tomasetti / githubweb
- Add jenkins multibranch pipeline options — Bruno Turcksin / githubweb
- Add warp sync for Cuda parallel reduce — tccleve / githubweb
- kokkos(profiling): do not finalize in any backend — romin.tomasetti / githubweb
- Replace ubuntu:18.04 with ubuntu:20.04 as base image for clang-format — Bruno Turcksin / githubweb
- update comment to include final() mention — tccleve / githubweb
- Disabling OpenACC in the CI because it emits too many warnings — Bruno Turcksin / githubweb
- unorderedmap: modernize traits — romin.tomasetti / githubweb
- nvcc wrapper: remove troubling flag to fix 6628 (#6629) — noreply / githubweb
- tools(profiling): type (related to kokkos/kokkos-tools/pull/221) — romin.tomasetti / githubweb
- This PR fixes the too-much-OpenACC-warning issue, mentioned in PR #6639. — lees2 / githubweb
- add missing header fix #6644 — fnrizzi / githubweb
- SYCL: Restrict workaround for is_device_copyable to oneAPI versions before 2024.0.0 (#6532) — noreply / githubweb
- Fixup test math functions ulp should double -> int — Damien L-G / githubweb
- Drop DualView converting copy assignment operator — Damien L-G / githubweb
- Don't use rocm-docker for clang-format — Daniel Arndt / githubweb
- Disable HIP CI — Daniel Arndt / githubweb
- Remove deprecation warning for AllocationMechanism for gcc <11.0 — Daniel Arndt / githubweb
- OpenMPTarget: clang extensions for dynamic shared memory. (#6380) — noreply / githubweb
- Fix builtin_unreachable use for MSVC/CUDA — crtrott / githubweb
- Fix missing include on msvc/cuda — crtrott / githubweb
- Avoid lambdas in constexpr branch for msvc/cuda — crtrott / githubweb
- Sidestep lacking CTAD support msvc/cuda — crtrott / githubweb
- Fix formatting — crtrott / githubweb
- Move header for Damien because he is right — crtrott / githubweb
- Unit test for issue 3371 (negative vector length should not yield a negative max_team_size) (#6076) — noreply / githubweb
- Add CMakeLists.txt for stream benchmark — cwpears / githubweb
- Do not negate the dependent true traits helper — Damien L-G / githubweb
- Drop guards to accommodate external code defining KOKKOS_ASSERT — Damien L-G / githubweb
- Use omp_get_max_active_levels() when supported — Daniel Arndt / githubweb
- Add missing gfx940 — rberger / githubweb
- Add Impl::always_false type-dependent false trait — Damien L-G / githubweb
- Per review prefer always_false<Arg>::value to is_void_v<Arg> — Damien L-G / githubweb
- Improve "no copy mechanism" exception message — bmkelle / githubweb
- Add a unit test for new deep_copy exception msg — bmkelle / githubweb
- Add missing include sstream — bmkelle / githubweb
- src->source, dst->destination — bmkelle / githubweb
- Workaround for ROCm 6.0 failing to compile with AVX2 SIMD support — Bruno Turcksin / githubweb
- SYCL: Force inlining of Kokkos::printf (#6650) — noreply / githubweb
- Improve handling of printf in OMPT on Intel GPUs — Daniel Arndt / githubweb
- OpenMP: Use `omp_get_nested` for older gcc versions (#6685) — noreply / githubweb
- Disable more Bessel tests for SYCL on Intel GPUs — Daniel Arndt / githubweb
- fill_random without execution space instance should fence — Daniel Arndt / githubweb
- Drop unnecessary guarding for a tool library being loaded in ProfilingSection — Damien L-G / githubweb
- Drop unnecessary header include in Kokkos_Profiling_ProfileSection.hpp — Damien L-G / githubweb
- #5333: CUDA: Use scratch space appropriate to small reduction elements in Team reductions (#5334) — noreply / githubweb
- Cuda: Allocate using the correct device (#6392) — noreply / githubweb
- Let `Profiling::ProfilingSection(std::string)` constructor be explicit and nodiscard (#6690) — noreply / githubweb
- Cosmetic changes to ProfilingSection — Damien L-G / githubweb
- GitHub CI: Test with AddressSanitizer (#6676) — noreply / githubweb
- Kokkos::Array deduction guide (#6373) — noreply / githubweb
- Add CI for MSVC+Cuda (#6661) — noreply / githubweb
- SYCL: Address deprecations after oneAPI 2023.2.0 (#6577) — noreply / githubweb
- Fixup cast tolerance to double before printing — Damien L-G / githubweb
- Try linking against CUDA libraries even with KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE — Daniel Arndt / githubweb
- test_array_ctad: disable test for intel versions < 2021 — ndellin / githubweb
- Enable `{transform_}exclusive_scan` in place (#6667) — noreply / githubweb
- Add `ATOMICS_BYPASS` configuration option to disable atomics (#6692) — noreply / githubweb
- Check matching static extents in View constructor (#5190) — noreply / githubweb
- Remove Kokkos::[b]half_t volatile overloads (#6579) — noreply / githubweb
- add tests — fnrizzi / githubweb
- Provide `kokkos_swap` as part of Core and deprecate `Experimental::swap` in Algorithms (#6697) — noreply / githubweb
- Provide new public headers `<Kokkos_Clamp.hpp>` and `<Kokkos_MinMax.hpp>` (#6687) — noreply / githubweb
- Fix TeamThreadMDRange parallel_reduce (#6511) — noreply / githubweb
- add tests for in-place `inclusive_scan` (#6682) — noreply / githubweb
- Drop pointless Kokkos::Impl::CudaExec forward declaration — Damien L-G / githubweb
- Don't use the compiler launcher script if the compile language is CUDA. (#6704) — noreply / githubweb
- Deprecate `{Cuda,HIP}::detect_device_count()` and `Cuda::[detect_]device_arch()` (#6710) — noreply / githubweb
- Get rid of CudaInternal::cuda_get_error_{name,string}_wrapper — Damien L-G / githubweb
- No need to jump through so many hoops to print the error message — Damien L-G / githubweb
- HIP: Forgot to delete matching brace closing the namespace — Damien L-G / githubweb
- Make initialize and finalize of the Cuda/HIP singleton less special (#6714) — noreply / githubweb
- Kokkos_HIP.cpp: include Kokkos_Core.hpp to resolve errors — ndellin / githubweb
- Add bound checks in RangePolicy and MDRangePolicy (#6617) — noreply / githubweb
- Temporary fix to reenable HIP CI — Bruno Turcksin / githubweb
- Let the smart pointer manage the CUDA/HIP stream (#6721) — noreply / githubweb
- Fix Docker env variables — Bruno Turcksin / githubweb
- Ensure view_allocation_error does not silently ignore that no exception was thrown — Damien L-G / githubweb
- Add RawMemoryAllocationFailure::AllocationMechanism::OpenACCMalloc enumerator — Damien L-G / githubweb
- [OpenACC] throw if acc_malloc returned nullptr — Damien L-G / githubweb
- Fixup using declaration — Damien L-G / githubweb
- Disable openacc.view_allocation_error test — Damien L-G / githubweb
- Guard `[MD]RangePolicy` precondition check for deprecated code 4 (#6726) — noreply / githubweb
- Add C++26 standard to CMake Setup — dev / githubweb
- Add support for C++26 in generated makefiles — Damien L-G / githubweb
- Add KOKKOS_ENABLE_CXX26 to the configuration metadata — Damien L-G / githubweb
- Reenable HIP testing — Bruno Turcksin / githubweb
- Disabling failing HIP test in the CI — Bruno Turcksin / githubweb
- Use team_size_max to fix "Team size too large" error in reducer test (#6725) — noreply / githubweb
- Don't touch my records! (refactor Cuda/HIP/SYCL/Threads to not directly mess with `SharedAllocationRecord`) (#6732) — noreply / githubweb
- Untangle SharedAllocationRecord spaghetti code — Damien L-G / githubweb
- Fix TestThreadVectorMDRangeParallelReduce (#6734) — noreply / githubweb
- Cuda multi-GPU support: Allow execution space instance constructor to run (#6706) — noreply / githubweb
- Drop support for deprecated command-line arguments and environment variables (#6744) — noreply / githubweb
- Avoid unnecessary zero-memset of the scratch flags in SYCL (#6739) — noreply / githubweb
#3 (Jan 25, 2024, 10:41:36 AM)
- implementation and tests — fnrizzi / githubweb
- implementation and tests — fnrizzi / githubweb
- only compute with relevant entries — tccleve / githubweb
- subset of team level impl of std algorithms — fnrizzi / githubweb
- fix copyright — fnrizzi / githubweb
- guard for openmptarget — fnrizzi / githubweb
- fix for openmptarget — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- SIMD: suppress an uninitialized variable warning (#6294) — noreply / githubweb
- OpenACC CMakechange Clacc (#6250) — noreply / githubweb
- `Kokkos::sort` support custom comparator (#6253) — noreply / githubweb
- Add nightly build using latest gcc and c++23 — Bruno Turcksin / githubweb
- Explicitly capture this in lambda function — Bruno Turcksin / githubweb
- Fix typo in nightly jenkins configuration — Bruno Turcksin / githubweb
- Fix a memory bug in the `free_state` function of random pools. (#6290) — noreply / githubweb
- std_algos: for_each: try condense the impl — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Enforce create_mirror restrictions w.r.t. ViewCtorArgs on all variants — Daniel Arndt / githubweb
- Use KOKKOS_IF_ON_HOST — Daniel Arndt / githubweb
- Suppress warnings — Daniel Arndt / githubweb
- Don't suppress warnings for NVHPC — Daniel Arndt / githubweb
- Improve macro definitions — Daniel Arndt / githubweb
- Enable Serial backend in HPX build — cezary.skrzynski / githubweb
- Modify fences in View API test — cezary.skrzynski / githubweb
- Check for overflow during backend initialization (Cuda, HIP, SYCL) (#6159) — noreply / githubweb
- Improve SYCL reduction performance: MDRangePolicy (#6271) — noreply / githubweb
- bring back previous code as discussed in meeting — fnrizzi / githubweb
- create cudaAPI function wrappers — tccleve / githubweb
- Reorganize #include <cuda_runtime_api.h> — tccleve / githubweb
- Some API functions require CUDA 11.2+ — tccleve / githubweb
- Cuda10 requires "stream=nullptr" as default arg — tccleve / githubweb
- Rework stream inputs — tccleve / githubweb
- Use "if constexpr" for setCudaDevice — tccleve / githubweb
- Remove static in comment — tccleve / githubweb
- Update CI: use CUDA 11.6.2 instead of 11.6.0 (#6314) — noreply / githubweb
- add helper variable templates `are_*_iterators_v` (#6312) — noreply / githubweb
- [HIP] Optimize parallel_reduce (#6229) — noreply / githubweb
- Split Kokkos_SYCL_Parallel_Range.hpp — Daniel Arndt / githubweb
- Split Kokkos_SYCL_Parallel_Reduce.hpp — Daniel Arndt / githubweb
- Move Kokkos_SYCL_Scan.hpp — Daniel Arndt / githubweb
- Split Kokkos_Parallel_Team.hpp — Daniel Arndt / githubweb
- Add comment that we don't need to link against clang_rt with ROCm 5.5 — Bruno Turcksin / githubweb
- Error out when CXX standard is not set when using amdclang or cray clang — Bruno Turcksin / githubweb
- Disable failing tests for ROCm 5.5 and 5.6 — Bruno Turcksin / githubweb
- #5635: Add parallel_scan overload with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Serial/OpenMP: Parallel_scan with return value for TeamThreadRange (#6090) — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Allow using the SYCL execution space on AMD GPUs (#6321) — noreply / githubweb
- Update Clang+Cuda CI from 10.1 to 11.0.3 (#6318) — noreply / githubweb
- SIMD: add shift ops for all int types (#6109) — noreply / githubweb
- SYCL: Use sycl::bit_cast from oneAPI 2024.0.0 on (#6300) — noreply / githubweb
- remove spurious undefs — fnrizzi / githubweb
- Decrease maximum memory available to ccache — Bruno Turcksin / githubweb
- Fix gtest when using C++20 — Bruno Turcksin / githubweb
- Fully qualify Experimental::SYCL in algorithms to avoid conflicting namespaces — Daniel Arndt / githubweb
- address reviews [skip ci] — antoine.meyer54 / githubweb
- formatting — fnrizzi / githubweb
- SIMD: split simd unit tests into separate files (#6278) — noreply / githubweb
- Run NVHPC only on V100 — Bruno Turcksin / githubweb
- Use checked arithmetic builtins for overflow detection (#6313) — noreply / githubweb
- Passing size==0 to DeepCopy/memcpy/omp_target_memcpy (#6273) — noreply / githubweb
- allow sorting via native oneDPL to support views with stride = 1 (#6322) — noreply / githubweb
- Adopt new HIP cmake's way of finding clang-rt — nicurtis / githubweb
- Fix #6334: intel compiler does not like returning inside if constexpr (#6335) — noreply / githubweb
- Fix #6336: capturing `this` implicitly is deprecated in C++20 (#6337) — noreply / githubweb
- fix lambda capture — fnrizzi / githubweb
- remove unnecessary file, fix constraints — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix — fnrizzi / githubweb
- fix lambda capture and constraints — fnrizzi / githubweb
- keep only subset — fnrizzi / githubweb
- revert files — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix syntax — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Disable default oneDPL support in Trilinos — Daniel Arndt / githubweb
- Rename AMD GPU architectures (#6266) — noreply / githubweb
- Fix compiling SIMD library with NEON and gcc-13 — Daniel Arndt / githubweb
- SIMD: add generator constructors (#6347) — noreply / githubweb
- Use /usr/bin/env bash in nvcc_wrapper and kokkos_launch_compiler for portability — mikael.simberg / githubweb
- Fix cudaAPI wrapper errors for CUDA_MALLOC_ASYNC (#6346) — noreply / githubweb
- Use std::aligned_alloc for allocations (#6341) — noreply / githubweb
- Initial implementation of gfx942 (#6358) — noreply / githubweb
- Introduce constructor for multi-GPU support. — Daniel Arndt / githubweb
- Only set KOKKOS_ARCH_AMD_GPU if an AMD GPU architecture is enabled — Daniel Arndt / githubweb
- Extend 'hip_driver_check_page_migration' (#6364) — noreply / githubweb
- Replace volatile m_ready_count loads by desul::atomic_loads in task queue implementation — mikael.simberg / githubweb
- Add Kokkos_Atomic.hpp headers to OpenMP and HPX task headers — mikael.simberg / githubweb
- Add support for HIP Graph — Bruno Turcksin / githubweb
- Replace one of the ROCM 5.2 configurations with ROCM 5.6 — Bruno Turcksin / githubweb
- Use aligned version of operator new instead of aligned_alloc — Daniel Arndt / githubweb
- HPX: Don't interfere with exception handling — Daniel Arndt / githubweb
- Fix -Wformat-truncation warnings in CI (#6354) — noreply / githubweb
- Do not use HIP Graph with ROCm 5.2 — Bruno Turcksin / githubweb
- Use constexpr West in src — Bruno Turcksin / githubweb
- Use constexpr West in test — Bruno Turcksin / githubweb
- SIMD: convert binary operators to hidden friends (#6320) — noreply / githubweb
- simd: make mask and condition unit test to check with all data types (#6360) — noreply / githubweb
- team-level std algos: part 2 (#6205) — noreply / githubweb
- Add deprecated attribute to HostSpace(AllocationMechanism) definition — Damien L-G / githubweb
- Added gather_from and scatter_to for AVX2 and AVX512 simd — donlee / githubweb
- Rebased and applied feedback — donlee / githubweb
- Changed gen ctor usage to pass in KOKKOS_LAMBDA — donlee / githubweb
- Suppress NVCC attribute ignored warning on SIMD operators as hidden friends — Damien L-G / githubweb
- Remove deprecated code 3 support for volatile join — crtrott / githubweb
- Disable a test not working with nvhpc-23.1 — crtrott / githubweb
- Reenabling tests for nvhpc 23.7 — crtrott / githubweb
- Update containers and algorithms for NVC++ 23.7 — crtrott / githubweb
- More NVC++ 23.7 updates — crtrott / githubweb
- NVC++ clang-format fixes — crtrott / githubweb
- Update nvhpc to version 23.7 in the CI — crtrott / githubweb
- NVHPC 23.7 update: address reviewer comments — crtrott / githubweb
- OpenACC: Guard tests relying on abort — crtrott / githubweb
- Fix TestAtomic to use the test execspace — crtrott / githubweb
- Use NVHPC 23.7 for testing of OpenACC — crtrott / githubweb
- Update nvhpc gtest skip message — crtrott / githubweb
- Work around OpenMPTarget failure — crtrott / githubweb
- Update base docker file for nvhpc — crtrott / githubweb
- Remove stray Cuda graph pattern specialization from tag — Damien L-G / githubweb
- Fix typo — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Fix uninitialized variable warning with gcc 13 — Bruno Turcksin / githubweb
- HIP: Update print_configuration (#6387) — noreply / githubweb
- Add test — Daniel Arndt / githubweb
- Fix typo. — noreply / githubweb
- Check AVX, AVX2, AVX512XEON compiler macros on KokkosCore_config.h (#6248) — noreply / githubweb
- Don't use local headers or runtime in HPX backend due to deprecation — mikael.simberg / githubweb
- Explicitly check for valid device id — Daniel Arndt / githubweb
- Add a more comprehensive `kokkos_{malloc, free}` perf_test (#6377) — noreply / githubweb
- team-level std algos: part 3 (#6207) — noreply / githubweb
- Adding is_scoped_enum & to_underlying (#6356) — noreply / githubweb
- SIMD: add float simd support (#6177) — noreply / githubweb
- team-level std algos: part 4 (#6208) — noreply / githubweb
- Added a gen ctor for float (#6397) — noreply / githubweb
- team-level std algos: part 5 (#6209) — noreply / githubweb
- Deprecate Cuda(cudaStream_t, bool) — Damien L-G / githubweb
- Fixup checked integer operations death test — Damien L-G / githubweb
- Deprecate HIP(hipStream_t, bool) — Damien L-G / githubweb
- Let Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC be ON by default — Damien L-G / githubweb
- Print whether KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC is defined — Damien L-G / githubweb
- Introduce disable_malloc_async Cuda option with generated makefiles — Damien L-G / githubweb
- Preserve one build that disables Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC — Damien L-G / githubweb
- Use archive extraction time for timestamps — cezary.skrzynski / githubweb
- Disable performance benchmarks in AppVeyor CI — cezary.skrzynski / githubweb
- team-level std algos: part 6 (#6210) — noreply / githubweb
- address comments — fnrizzi / githubweb
- OpenMP backend refactor files. (#6403) — noreply / githubweb
- Drop (unused) `Cuda::cuda_internal_maximum_shared_words` — Damien L-G / githubweb
- Drop check that the host backend is initialized before the Cuda/HIP/SYCL one — Damien L-G / githubweb
- Drop unused HIPInternal::m_maxSharedWords data member — Damien L-G / githubweb
- Drop unused HIPInternal::m_hipArch static data member — Damien L-G / githubweb
- !initialized() should be a precondition for calling {Cuda,HIP,SYCL}Internal::initialize — Damien L-G / githubweb
- Drop pre-Kepler logic in Cuda::impl_initialize — Damien L-G / githubweb
- use single — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- formatting — fnrizzi / githubweb
- Team-level std algos: part 7 (#6211) — noreply / githubweb
- formatting — fnrizzi / githubweb
- Enable death tests for fedora rawhide — cezary.skrzynski / githubweb
- core/src: Add half math functions to private header (#6124) — noreply / githubweb
- Drop check whether device supports unified addressing — Damien L-G / githubweb
- fix single as per Christian's suggestion — fnrizzi / githubweb
- Only warn once (at initialization) when forcing allocation in unified memory — Damien L-G / githubweb
- check-copyright improvements (#6399) — noreply / githubweb
- Use execution space instance argument to get device properties in block size deduction — Damien L-G / githubweb
- Address reviewer's comments — Bruno Turcksin / githubweb
- Fix to avoid #186-D pointless comparison warning. — maarten.arnst / githubweb
- add comment — fnrizzi / githubweb
- improve tests to address review — fnrizzi / githubweb
- Fix guard for isnan test for bhalf_t — Daniel Arndt / githubweb
- avoid potential race condition HIP — tccleve / githubweb
- Set the device id in cuda_kernel_arch — Daniel Arndt / githubweb
- [SYCL][Reduction] Group counter should use at least memory_order::acq_rel — andrei.elovikov / githubweb
- Same for scan — andrei.elovikov / githubweb
- Initialize m_num_scratch_locks for Cuda parallel_for TeamPolicy — Daniel Arndt / githubweb
- improve tests with intra-team result check — fnrizzi / githubweb
- Fixes for Kokkos::Array (#6372) — noreply / githubweb
- try fix for unique, previous impl to remove later — fnrizzi / githubweb
- #5635: Add parallel_scan changes for CUDA and TeamThreadRange — cezary.skrzynski / githubweb
- remove old impl — fnrizzi / githubweb
- #5635: Enable TeamThreadRange test for CUDA — cezary.skrzynski / githubweb
- Clean up benchmarks/gups — cwpears / githubweb
- benchmark/gups: use CMake — cwpears / githubweb
- OpenMPTarget: Disable check for SIMD compiler macros — Daniel Arndt / githubweb
- #5635: Add parallel_scan with value for CUDA and ThreadVectorRange — cezary.skrzynski / githubweb
- add missing assert — fnrizzi / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- add intra team check for missing test — fnrizzi / githubweb
- fix intel compile error — fnrizzi / githubweb
- fix unreachable for intel — fnrizzi / githubweb
- re-enable unit tests for sort and random via makefile (#6422) — noreply / githubweb
- OpenMPTarget init-join fix (#6444) — noreply / githubweb
- Fix Cuda parallel_scan ThreadVectorRange range — Daniel Arndt / githubweb
- Assign final sum in Cuda parallel_scan ThreadVectorRange — Daniel Arndt / githubweb
- Fix compiling code using Kokkos::printf for OpenMPTarget on Intel GPUs (#6443) — noreply / githubweb
- std_algos: improving min, max, minmax (#6421) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 2/7) (#6426) — noreply / githubweb
- Skip bessel function tests known to fail on Intel GPUs (#6434) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 6/7) (#6436) — noreply / githubweb
- Fix race condition in functor_vec_scan_ret_val test — Daniel Arndt / githubweb
- Fix parallel_scan_with_reducers test — Daniel Arndt / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 3/7) (#6425) — noreply / githubweb
- improve tests (#6432) — noreply / githubweb
- improve tests (#6437) — noreply / githubweb
- Move final assignment to correct scope — cezary.skrzynski / githubweb
- fix casting warning in Random test — fnrizzi / githubweb
- Workaround for ROCm 5.6+ failing to compile with AVX2 SIMD support (#6449) — noreply / githubweb
- HIP: Restrict AVX2 workaround to ROCm 5.6 and 5.7 — Daniel Arndt / githubweb
- fixes build error for TeamReduce and TeamTranformReduced tests for specific GCC (#6459) — noreply / githubweb
- improve tests to check intra-team result (#6431) — noreply / githubweb
- SIMD: Math functions should be in namespace Kokkos — Daniel Arndt / githubweb
- SYCL: Disable another bessel function test for Intel GPUs — Daniel Arndt / githubweb
- Check for default device — Daniel Arndt / githubweb
- team-level std algos: part 10 (#6256) — noreply / githubweb
- team-level std algos: part 11 (#6258) — noreply / githubweb
- #5635: HIP: Add Overloads for parallel_scan with return value for TeamThreadRange (#6302) — noreply / githubweb
- #5635: Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- use shortcut — cezary.skrzynski / githubweb
- Skip testing for non-power-of-two team sizes — cezary.skrzynski / githubweb
- Fix formatting — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for HIP backend — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- #5635: SYCL: Add parallel_scan overload with return value — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- OpenMP: Fix TeamThreadRange parallel_scan with return value for team_size > 1 — Daniel Arndt / githubweb
- Add compatible copy assignment operator to DualView — maarten.arnst / githubweb
- Fix minimum version for Google benchmark — Daniel Arndt / githubweb
- Add test of copy constructor/assignment operator for DualView. — maarten.arnst / githubweb
- Compute concurrency on HIP using Kokkos hardcoded m_maxWavesPerCU — maarten.arnst / githubweb
- Modify test so that source and destination view are of different type — maarten.arnst / githubweb
- Use call operator instead of run_me function — maarten.arnst / githubweb
- team-level std algos: part 12 (#6350) — noreply / githubweb
- core/src: Add half single and double mixed compare (LT,GT,LE,GE) (#6407) — noreply / githubweb
- Moving abort and assert into their own public headers (#6445) — noreply / githubweb
- Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- Allow detecting SIMD types based on compiler macros (#6188) — noreply / githubweb
- Add KOKKOS_ARCH_ARM_NEON — Daniel Arndt / githubweb
- Fix implementation for cyl_bessel_i0 — Daniel Arndt / githubweb
- cleaning: remove iostream from headers where possible (IWYU) — romin.tomasetti / githubweb
- Fix compiling SIMD unit tests on NVIDIA — Daniel Arndt / githubweb
- guards to ensure DBL_EPSILON return for POWER8,9 — ajpowel / githubweb
- Fix NVCC warnings (#6483) — noreply / githubweb
- team-level std algos: part 13 (#6351) — noreply / githubweb
- Also fix annotations for generator constructor for AVX512 and NEON — Daniel Arndt / githubweb
- #5635: SYCL: Add parallel_scan overload with value for ThreadVectorRange — cezary.skrzynski / githubweb
- Fix atomic operations bug for Min and Max (#6435) — noreply / githubweb
- Fix example/build_cmake_installed_different_compiler — Daniel Arndt / githubweb
- fix impl — fnrizzi / githubweb
- Update core/src/HIP/Kokkos_HIP_KernelLaunch.hpp — noreply / githubweb
- Split Kokkos_Threads_Parallel files — Bruno Turcksin / githubweb
- HPX: Implement TeamThread and ThreadVector parallel_scan with return value — Daniel Arndt / githubweb
- Serial: Allow for distinct execution space instances (#6441) — noreply / githubweb
- add overload for TeamThreadRange — fnrizzi / githubweb
- address review comment — fnrizzi / githubweb
- Update to HIP TeamPolicy Block number heuristic (#6284) — noreply / githubweb
- SIMD: Split math functions from SIMD_Common.hpp (#6487) — noreply / githubweb
- Allow NVHPC as device compiler only with Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER=ON — Daniel Arndt / githubweb
- add threadvector — fnrizzi / githubweb
- fix order — fnrizzi / githubweb
- remove guards — fnrizzi / githubweb
- UnorderedMap(space instance): proposal for #6067 — romin.tomasetti / githubweb
- Rename Kokkos_ThreadsExec to align with the other backends — Bruno Turcksin / githubweb
- Promote Kokkos_Printf.hpp to public include — Daniel Arndt / githubweb
- simd: add floor, ceil, round, trunc operations (#6393) — noreply / githubweb
- Update CI in OpenMPTarget to use llvm-17 (#6472) — noreply / githubweb
- Rename Kokkos_ThreadsTeam.hpp to Kokkos_Threads_Team.hpp — Bruno Turcksin / githubweb
- Kokkos_SIMD_Scalar.hpp: remove extra ';' — ndellin / githubweb
- add guards — fnrizzi / githubweb
- avoid auto — fnrizzi / githubweb
- [ci skip] Update Kokkos version to 4.2.99 — Daniel Arndt / githubweb
- [deprecated code 3] remove all default device init tests — Damien L-G / githubweb
- [deprecated code 3] remove InitArguments — Damien L-G / githubweb
- [deprecated code 3] remove KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macros — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for clamp, min, max, and minmax — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math constants — Damien L-G / githubweb
- [deprecated code 3] remove {OpenMP,HPX}::partition_master — Damien L-G / githubweb
- OpenMP backend cleanup following removal of deprecated code 3 — Damien L-G / githubweb
- Get rid of FIXME_OPENMP — Damien L-G / githubweb
- [deprecated code 3] remove MasterLock — Damien L-G / githubweb
- [deprecated code 3] remove ENABLE_DEPRECATED_CODE_3 option — Damien L-G / githubweb
- fixup! [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- Do not append " - blocks" to the bitset label — Damien L-G / githubweb
- with_updated_label -> append_to_label — Daniel Arndt / githubweb
- SYCL: Use SYCL_EXT_ONEAPI_DEVICE_GLOBAL to detect support for device global variables — Daniel Arndt / githubweb
- Fixup in README (github -> GitHub) — Damien L-G / githubweb
- Check that device associated with stream matches requested device — Daniel Arndt / githubweb
- Threads: replace enum with constexpr int and enum class (#6514) — noreply / githubweb
- Added unit tests for reduction ops and few intel svml intrinsics — donlee / githubweb
- Remove sleep and wake functions — Bruno Turcksin / githubweb
- Remove extra constructor — Daniel Arndt / githubweb
- Prefer defaulted default constructor for Bitset (#6524) — noreply / githubweb
- SYCL: Use host-pinned memory to copy reduction/scan result (#6500) — noreply / githubweb
- Remove unused variables — Bruno Turcksin / githubweb
- Remove Sentinel struct from Threads — Bruno Turcksin / githubweb
- Small cleanup of ThreadsInternal::initialize — Bruno Turcksin / githubweb
- Remove useless forward declaration — Bruno Turcksin / githubweb
- Remove spawn function — Bruno Turcksin / githubweb
- Add comments — Bruno Turcksin / githubweb
- Fix indentation — Bruno Turcksin / githubweb
- Fix typo in macro guard — Bruno Turcksin / githubweb
- Reduce number of View constructor instantiations — Damien L-G / githubweb
- Bump HPX version used in CI to 1.9.0 — mikael.simberg / githubweb
- Split files in HIP backend — Bruno Turcksin / githubweb
- Trim some fat in `CudaInternal` (towards multiple GPUs support) (#6544) — noreply / githubweb
- Only define STDALGO_TEAM_SOURCES_* once — Daniel Arndt / githubweb
- Remove logical memory spaces — Damien L-G / githubweb
- Rollback changes to view constructors to reduce the number of instantiations (#6564) — noreply / githubweb
- Address reviewer comments — Daniel Arndt / githubweb
- Threads remove unused variables and functions (#6566) — noreply / githubweb
- Remove unused Sandia testing files (#6568) — noreply / githubweb
- fallback implementation cleanup — donlee / githubweb
- Remove empty quotation marks for static_assert — Daniel Arndt / githubweb
- [ci skip] Drop unused <impl/Kokkos_Memory_Fence.hpp> header — Damien L-G / githubweb
- Drop Clang+CUDA workaround — Damien L-G / githubweb
- OpenMPTarget: CI compiler upgrade. (#6545) — noreply / githubweb
- m_cudaDev isn't static anymore — Daniel Arndt / githubweb
- Add crtrott's launch_latency benchmark (#6379) — noreply / githubweb
- Simplify fence functions in the Threads backend (#6571) — noreply / githubweb
- simd: temporarily skip device math ops unit test for OpenMPTarget build (#6574) — noreply / githubweb
- Set the device id explicitly for CUDA API calls in impl_initialize — Daniel Arndt / githubweb
- OpenMPTarget: Guard scratch memory usage in ParallelReduce — Daniel Arndt / githubweb
- SYCL: Implement DESUL_ATOMICS_ENABLE_SYCL_SEPARABLE_COMPILATION path (#6534) — noreply / githubweb
- Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header — Damien L-G / githubweb
- Replace Marsaglia polar method with Box-muller to generate a normally distributed random number (#6556) — noreply / githubweb
- OpenMP: No memset in viewfill (#6573) — noreply / githubweb
- Revert "Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header" — noreply / githubweb
- OpenACC: add atomics support (#6446) — noreply / githubweb
- Fix infinity, quiet_NaN, signaling_NaN, isfinite, isnan, isinf for half_t and bhalf_t (#6543) — noreply / githubweb
- kokkos(unique): fix allocation of temporary view to enforce using the provided space instance — romin.tomasetti / githubweb
- Use binary wrapper for consistency in definition of half types numeric traits (#6590) — noreply / githubweb
- try fix — fnrizzi / githubweb
- Fix TestNumericTraits.hpp for SYCL with bfloat16 support — Daniel Arndt / githubweb
- Fix generated Makefile when using gnu_generate_makefile.sh and make >= 4.3 — crtrott / githubweb
- Threads: add missing broadcast to TeamThreadRange parallel_scan (#6601) — noreply / githubweb
- Added missing operator* to NEON simd — crtrott / githubweb
- [ci skip] Update changelog on develop for 4.2.00 (#6592) — noreply / githubweb
- Remove KOKKOS_IMPL_DO_NOT_USE_PRINTF (#6593) — noreply / githubweb
- try fix — fnrizzi / githubweb
- avoid pyt package — fnrizzi / githubweb
- try — fnrizzi / githubweb
- fix for macos — fnrizzi / githubweb
- remove comments — fnrizzi / githubweb
- use reference — crtrott / githubweb
- add branching — fnrizzi / githubweb
- [ci skip] fix formatting — cezary.skrzynski / githubweb
- GitHub Workflows: Use Ubuntu 22.04 instead of Fedora for Intel compiler testing — Daniel Arndt / githubweb
- nvcc(wrapper): adding missing `--generate-line-info` arg — romin.tomasetti / githubweb
- Add clang-format check to GitHub workflows (#6612) — noreply / githubweb
- graph(HIP): adding inline keyword to fix #6623 — romin.tomasetti / githubweb
- Add jenkins multibranch pipeline options — Bruno Turcksin / githubweb
- Add warp sync for Cuda parallel reduce — tccleve / githubweb
- kokkos(profiling): do not finalize in any backend — romin.tomasetti / githubweb
- Replace ubuntu:18.04 with ubuntu:20.04 as base image for clang-format — Bruno Turcksin / githubweb
- update comment to include final() mention — tccleve / githubweb
- Disabling OpenACC in the CI because it emits too many warnings — Bruno Turcksin / githubweb
- unorderedmap: modernize traits — romin.tomasetti / githubweb
- nvcc wrapper: remove troubling flag to fix 6628 (#6629) — noreply / githubweb
- tools(profiling): type (related to kokkos/kokkos-tools/pull/221) — romin.tomasetti / githubweb
- This PR fixes the too-much-OpenACC-warning issue, mentioned in PR #6639. — lees2 / githubweb
- add missing header fix #6644 — fnrizzi / githubweb
- SYCL: Restrict workaround for is_device_copyable to oneAPI versions before 2024.0.0 (#6532) — noreply / githubweb
- Fixup test math functions ulp should double -> int — Damien L-G / githubweb
- Drop DualView converting copy assignment operator — Damien L-G / githubweb
- Don't use rocm-docker for clang-format — Daniel Arndt / githubweb
- Disable HIP CI — Daniel Arndt / githubweb
- Remove deprecation warning for AllocationMechanism for gcc <11.0 — Daniel Arndt / githubweb
- OpenMPTarget: clang extensions for dynamic shared memory. (#6380) — noreply / githubweb
- Fix builtin_unreachable use for MSVC/CUDA — crtrott / githubweb
- Fix missing include on msvc/cuda — crtrott / githubweb
- Avoid lambdas in constexpr branch for msvc/cuda — crtrott / githubweb
- Sidestep lacking CTAD support msvc/cuda — crtrott / githubweb
- Fix formatting — crtrott / githubweb
- Move header for Damien because he is right — crtrott / githubweb
- Unit test for issue 3371 (negative vector length should not yield a negative max_team_size) (#6076) — noreply / githubweb
- Add CMakeLists.txt for stream benchmark — cwpears / githubweb
- Do not negate the dependent true traits helper — Damien L-G / githubweb
- Drop guards to accommodate external code defining KOKKOS_ASSERT — Damien L-G / githubweb
- Use omp_get_max_active_levels() when supported — Daniel Arndt / githubweb
- Add missing gfx940 — rberger / githubweb
- Add Impl::always_false type-dependent false trait — Damien L-G / githubweb
- Per review prefer always_false<Arg>::value to is_void_v<Arg> — Damien L-G / githubweb
- Improve "no copy mechanism" exception message — bmkelle / githubweb
- Add a unit test for new deep_copy exception msg — bmkelle / githubweb
- Add missing include sstream — bmkelle / githubweb
- src->source, dst->destination — bmkelle / githubweb
- Workaround for ROCm 6.0 failing to compile with AVX2 SIMD support — Bruno Turcksin / githubweb
- SYCL: Force inlining of Kokkos::printf (#6650) — noreply / githubweb
- Improve handling of printf in OMPT on Intel GPUs — Daniel Arndt / githubweb
- OpenMP: Use `omp_get_nested` for older gcc versions (#6685) — noreply / githubweb
- Disable more Bessel tests for SYCL on Intel GPUs — Daniel Arndt / githubweb
- fill_random without execution space instance should fence — Daniel Arndt / githubweb
- Drop unnecessary guarding for a tool library being loaded in ProfilingSection — Damien L-G / githubweb
- Drop unnecessary header include in Kokkos_Profiling_ProfileSection.hpp — Damien L-G / githubweb
- #5333: CUDA: Use scratch space appropriate to small reduction elements in Team reductions (#5334) — noreply / githubweb
- Cuda: Allocate using the correct device (#6392) — noreply / githubweb
- Let `Profiling::ProfilingSection(std::string)` constructor be explicit and nodiscard (#6690) — noreply / githubweb
- Cosmetic changes to ProfilingSection — Damien L-G / githubweb
- GitHub CI: Test with AddressSanitizer (#6676) — noreply / githubweb
- Kokkos::Array deduction guide (#6373) — noreply / githubweb
- Add CI for MSVC+Cuda (#6661) — noreply / githubweb
- SYCL: Address deprecations after oneAPI 2023.2.0 (#6577) — noreply / githubweb
- Fixup cast tolerance to double before printing — Damien L-G / githubweb
- Try linking against CUDA libraries even with KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE — Daniel Arndt / githubweb
- test_array_ctad: disable test for intel versions < 2021 — ndellin / githubweb
- Enable `{transform_}exclusive_scan` in place (#6667) — noreply / githubweb
- Add `ATOMICS_BYPASS` configuration option to disable atomics (#6692) — noreply / githubweb
- Check matching static extents in View constructor (#5190) — noreply / githubweb
- Remove Kokkos::[b]half_t volatile overloads (#6579) — noreply / githubweb
- add tests — fnrizzi / githubweb
- Provide `kokkos_swap` as part of Core and deprecate `Experimental::swap` in Algorithms (#6697) — noreply / githubweb
- Provide new public headers `<Kokkos_Clamp.hpp>` and `<Kokkos_MinMax.hpp>` (#6687) — noreply / githubweb
- Fix TeamThreadMDRange parallel_reduce (#6511) — noreply / githubweb
- add tests for in-place `inclusive_scan` (#6682) — noreply / githubweb
- Drop pointless Kokkos::Impl::CudaExec forward declaration — Damien L-G / githubweb
- Don't use the compiler launcher script if the compile language is CUDA. (#6704) — noreply / githubweb
- Deprecate `{Cuda,HIP}::detect_device_count()` and `Cuda::[detect_]device_arch()` (#6710) — noreply / githubweb
- Get rid of CudaInternal::cuda_get_error_{name,string}_wrapper — Damien L-G / githubweb
- No need to jump through so many hoops to print the error message — Damien L-G / githubweb
- HIP: Forgot to delete matching brace closing the namespace — Damien L-G / githubweb
- Make initialize and finalize of the Cuda/HIP singleton less special (#6714) — noreply / githubweb
- Kokkos_HIP.cpp: include Kokkos_Core.hpp to resolve errors — ndellin / githubweb
- Add bound checks in RangePolicy and MDRangePolicy (#6617) — noreply / githubweb
- Temporary fix to reenable HIP CI — Bruno Turcksin / githubweb
- Let the smart pointer manage the CUDA/HIP stream (#6721) — noreply / githubweb
- Fix Docker env variables — Bruno Turcksin / githubweb
- Ensure view_allocation_error does not silently ignore that no exception was thrown — Damien L-G / githubweb
- Add RawMemoryAllocationFailure::AllocationMechanism::OpenACCMalloc enumerator — Damien L-G / githubweb
- [OpenACC] throw if acc_malloc returned nullptr — Damien L-G / githubweb
- Fixup using declaration — Damien L-G / githubweb
- Disable openacc.view_allocation_error test — Damien L-G / githubweb
- Guard `[MD]RangePolicy` precondition check for deprecated code 4 (#6726) — noreply / githubweb
- Add C++26 standard to CMake Setup — dev / githubweb
- Add support for C++26 in generated makefiles — Damien L-G / githubweb
- Add KOKKOS_ENABLE_CXX26 to the configuration metadata — Damien L-G / githubweb
- Reenable HIP testing — Bruno Turcksin / githubweb
- Disabling failing HIP test in the CI — Bruno Turcksin / githubweb
- Use team_size_max to fix "Team size too large" error in reducer test (#6725) — noreply / githubweb
- Don't touch my records! (refactor Cuda/HIP/SYCL/Threads to not directly mess with `SharedAllocationRecord`) (#6732) — noreply / githubweb
- Untangle SharedAllocationRecord spaghetti code — Damien L-G / githubweb
- Fix TestThreadVectorMDRangeParallelReduce (#6734) — noreply / githubweb
- Cuda multi-GPU support: Allow execution space instance constructor to run (#6706) — noreply / githubweb
- Drop support for deprecated command-line arguments and environment variables (#6744) — noreply / githubweb
- Avoid unnecessary zero-memset of the scratch flags in SYCL (#6739) — noreply / githubweb
#3 (Jan 25, 2024, 10:41:36 AM)
- implementation and tests — fnrizzi / githubweb
- implementation and tests — fnrizzi / githubweb
- only compute with relevant entries — tccleve / githubweb
- subset of team level impl of std algorithms — fnrizzi / githubweb
- fix copyright — fnrizzi / githubweb
- guard for openmptarget — fnrizzi / githubweb
- fix for openmptarget — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- SIMD: suppress an uninitialized variable warning (#6294) — noreply / githubweb
- OpenACC CMakechange Clacc (#6250) — noreply / githubweb
- `Kokkos::sort` support custom comparator (#6253) — noreply / githubweb
- Add nightly build using latest gcc and c++23 — Bruno Turcksin / githubweb
- Explicitly capture this in lambda function — Bruno Turcksin / githubweb
- Fix typo in nightly jenkins configuration — Bruno Turcksin / githubweb
- Fix a memory bug in the `free_state` function of random pools. (#6290) — noreply / githubweb
- std_algos: for_each: try condense the impl — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Enforce create_mirror restrictions w.r.t. ViewCtorArgs on all variants — Daniel Arndt / githubweb
- Use KOKKOS_IF_ON_HOST — Daniel Arndt / githubweb
- Suppress warnings — Daniel Arndt / githubweb
- Don't suppress warnings for NVHPC — Daniel Arndt / githubweb
- Improve macro definitions — Daniel Arndt / githubweb
- Enable Serial backend in HPX build — cezary.skrzynski / githubweb
- Modify fences in View API test — cezary.skrzynski / githubweb
- Check for overflow during backend initialization (Cuda, HIP, SYCL) (#6159) — noreply / githubweb
- Improve SYCL reduction performance: MDRangePolicy (#6271) — noreply / githubweb
- bring back previous code as discussed in meeting — fnrizzi / githubweb
- create cudaAPI function wrappers — tccleve / githubweb
- Reorganize #include <cuda_runtime_api.h> — tccleve / githubweb
- Some api function require cuda11.2+ — tccleve / githubweb
- Cuda10 requires "stream=nullptr" as default arg — tccleve / githubweb
- Rework stream inputs — tccleve / githubweb
- Use "if constexpr" for setCudaDevice — tccleve / githubweb
- Remove static in comment — tccleve / githubweb
- Update CI: use CUDA 11.6.2 instead of 11.6.0 (#6314) — noreply / githubweb
- add helper variable templates `are_*_iterators_v` (#6312) — noreply / githubweb
- [HIP] Optimize parallel_reduce (#6229) — noreply / githubweb
- Split Kokkos_SYCL_Parallel_Range.hpp — Daniel Arndt / githubweb
- Split Kokkos_SYCL_Parallel_Reduce.hpp — Daniel Arndt / githubweb
- Move Kokkos_SYCL_Scan.hpp — Daniel Arndt / githubweb
- Split Kokkos_Parallel_Team.hpp — Daniel Arndt / githubweb
- Add comment that we don't need to link against clang_rt with ROCm 5.5 — Bruno Turcksin / githubweb
- Error out when CXX standard is not set when using amdclang or cray clang — Bruno Turcksin / githubweb
- Disable failing tests for ROCm 5.5 and 5.6 — Bruno Turcksin / githubweb
- #5635: Add parallel_scan overload with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Serial/OpenMP: Parallel_scan with return value for TeamThreadRange (#6090) — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Allow using the SYCL execution space on AMD GPUs (#6321) — noreply / githubweb
- Update Clang+Cuda CI from 10.1 to 11.0.3 (#6318) — noreply / githubweb
- SIMD: add shift ops for all int types (#6109) — noreply / githubweb
- SYCL: Use sycl::bit_cast from oneAPI 2024.0.0 on (#6300) — noreply / githubweb
- remove spurious undefs — fnrizzi / githubweb
- Decrease maximum memory available to ccache — Bruno Turcksin / githubweb
- Fix gtest when using C++20 — Bruno Turcksin / githubweb
- Fully qualify Experimental::SYCL in algorithms to avoid conflicting namespaces — Daniel Arndt / githubweb
- address reviews [skip ci] — antoine.meyer54 / githubweb
- formatting — fnrizzi / githubweb
- SIMD: split simd unit tests into separate files (#6278) — noreply / githubweb
- Run NVHPC only on V100 — Bruno Turcksin / githubweb
- Use checked arithmetic builtins for overflow detection (#6313) — noreply / githubweb
- Passing size==0 to DeepCopy/memcpy/omp_target_memcpy (#6273) — noreply / githubweb
- allow sorting via native oneDPL to support views with stride = 1 (#6322) — noreply / githubweb
- Adopt new HIP cmake's way of finding clang-rt — nicurtis / githubweb
- Fix #6334: intel compiler does not like returning inside if constexpr (#6335) — noreply / githubweb
- Fix #6336: capturing `this` implicitly is deprecated in C++20 (#6337) — noreply / githubweb
- fix lambda capture — fnrizzi / githubweb
- remove unnecessary file, fix constraints — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix — fnrizzi / githubweb
- fix lambda capture and constraints — fnrizzi / githubweb
- keep only subset — fnrizzi / githubweb
- revert files — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix syntax — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Disable default oneDPL support in Trilinos — Daniel Arndt / githubweb
- Rename AMD GPU architectures (#6266) — noreply / githubweb
- Fix compiling SIMD library with NEON and gcc-13 — Daniel Arndt / githubweb
- SIMD: add generator constructors (#6347) — noreply / githubweb
- Use /usr/bin/env bash in nvcc_wrapper and kokkos_launch_compiler for portability — mikael.simberg / githubweb
- Fix cudaAPI wrapper errors for CUDA_MALLOC_ASYNC (#6346) — noreply / githubweb
- Use std::aligned_alloc for allocations (#6341) — noreply / githubweb
- Initial implementation of gfx942 (#6358) — noreply / githubweb
- Introduce constructor for multi-GPU support. — Daniel Arndt / githubweb
- Only set KOKKOS_ARCH_AMD_GPU if an AMD GPU architecture is enabled — Daniel Arndt / githubweb
- Extend 'hip_driver_check_page_migration' (#6364) — noreply / githubweb
- Replace volatile m_ready_count loads by desul::atomic_loads in task queue implementation — mikael.simberg / githubweb
- Add Kokkos_Atomic.hpp headers to OpenMP and HPX task headers — mikael.simberg / githubweb
- Add support for HIP Graph — Bruno Turcksin / githubweb
- Replace one of the ROCM 5.2 configuration with ROCM 5.6 — Bruno Turcksin / githubweb
- Use aligned version of operator new instead of aligned_alloc — Daniel Arndt / githubweb
- HPX: Don't interfere with exception handling — Daniel Arndt / githubweb
- Fix -Wformat-truncation warnings in CI (#6354) — noreply / githubweb
- Do not use HIP Graph with ROCm 5.2 — Bruno Turcksin / githubweb
- Use constexpr West in src — Bruno Turcksin / githubweb
- Use constexpr West in test — Bruno Turcksin / githubweb
- SIMD: convert binary operators to hidden friends (#6320) — noreply / githubweb
- simd: make mask and condition unit test to check with all data types (#6360) — noreply / githubweb
- team-level std algos: part 2 (#6205) — noreply / githubweb
- Add deprecated attribute to HostSpace(AllocationMechanism) definition — Damien L-G / githubweb
- Added gather_from and scatter_to for AVX2 and AVX512 simd — donlee / githubweb
- Rebased and applied feedbacks — donlee / githubweb
- Changed gen ctor usage to pass in KOKKOS_LAMBDA — donlee / githubweb
- Suppress NVCC attribute ignored warning on SIMD operators as hidden friends — Damien L-G / githubweb
- Remove deprecated code 3 support for volatile join — crtrott / githubweb
- Disable a test not working with nvhpc-23.1 — crtrott / githubweb
- Reenabling tests for nvhpc 23.7 — crtrott / githubweb
- Update containers and algorithms for NVC++ 23.7 — crtrott / githubweb
- More NVC++ 23.7 updates — crtrott / githubweb
- NVC++ clang-format fixes — crtrott / githubweb
- Update nvhpc to version 23.7 in the CI — crtrott / githubweb
- NVHPC 23.7 update: address reviewer comments — crtrott / githubweb
- OpenACC: Guard tests relying on abort — crtrott / githubweb
- Fix TestAtomic to use the test execspace — crtrott / githubweb
- Use NVHPC 23.7 for testing of OpenACC — crtrott / githubweb
- Update nvhpc gtest skip message — crtrott / githubweb
- Work around OpenMPTarget failure — crtrott / githubweb
- Update base docker file for nvhpc — crtrott / githubweb
- Remove stray Cuda graph pattern specialization from tag — Damien L-G / githubweb
- Fix typo — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Fix uninitialized variable warning with gcc 13 — Bruno Turcksin / githubweb
- HIP: Update print_configuration (#6387) — noreply / githubweb
- Add test — Daniel Arndt / githubweb
- Fix typo. — noreply / githubweb
- Check AVX, AVX2, AVX512XEON compiler macros on KokkosCore_config.h (#6248) — noreply / githubweb
- Don't use local headers or runtime in HPX backend due to deprecation — mikael.simberg / githubweb
- Explicitly check for valid device id — Daniel Arndt / githubweb
- Add a more comprehensive `kokkos_{malloc, free}` perf_test (#6377) — noreply / githubweb
- team-level std algos: part 3 (#6207) — noreply / githubweb
- Adding is_scoped_enum & to_underlying (#6356) — noreply / githubweb
- SIMD: add float simd support (#6177) — noreply / githubweb
- team-level std algos: part 4 (#6208) — noreply / githubweb
- Added a gen ctor for float (#6397) — noreply / githubweb
- team-level std algos: part 5 (#6209) — noreply / githubweb
- Deprecate Cuda(cudaStream_t, bool) — Damien L-G / githubweb
- Fixup checked integer operations death test — Damien L-G / githubweb
- Deprecate HIP(hipStream_t, bool) — Damien L-G / githubweb
- Let Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC be ON by default — Damien L-G / githubweb
- Print whether KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC is defined — Damien L-G / githubweb
- Introduce disable_malloc_async Cuda option with generated makefiles — Damien L-G / githubweb
- Preserve one build that disables Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC — Damien L-G / githubweb
- Use archive extraction time for timestamps — cezary.skrzynski / githubweb
- Disable performance benchmarks in AppVeyor CI — cezary.skrzynski / githubweb
- team-level std algos: part 6 (#6210) — noreply / githubweb
- address comments — fnrizzi / githubweb
- OpenMP backend refactor files. (#6403) — noreply / githubweb
- Drop (unused) `Cuda::cuda_internal_maximum_shared_words` — Damien L-G / githubweb
- Drop check that the host backend is initialized before the Cuda/HIP/SYCL one — Damien L-G / githubweb
- Drop unused HIPInternal::m_maxSharedWords data member — Damien L-G / githubweb
- Drop unused HIPInternal::m_hipArch static data member — Damien L-G / githubweb
- !initialized() should be a precondition for calling {Cuda,HIP,SYCL}Internal::initialize — Damien L-G / githubweb
- Drop pre-Kepler logic in Cuda::impl_initialize — Damien L-G / githubweb
- use single — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- formatting — fnrizzi / githubweb
- Team-level std algos: part 7 (#6211) — noreply / githubweb
- formatting — fnrizzi / githubweb
- Enable death tests for fedora rawhide — cezary.skrzynski / githubweb
- core/src: Add half math functions to private header (#6124) — noreply / githubweb
- Drop check whether device supports unified addressing — Damien L-G / githubweb
- fix single as per Christian's suggestion — fnrizzi / githubweb
- Only warn once (at initialization) when forcing allocation in unified memory — Damien L-G / githubweb
- check-copyright improvements (#6399) — noreply / githubweb
- Use execution space instance argument to get device properties in block size deduction — Damien L-G / githubweb
- Address reviewer's comments — Bruno Turcksin / githubweb
- Fix to avoid #186-D pointless comparison warning. — maarten.arnst / githubweb
- add comment — fnrizzi / githubweb
- improve tests to address review — fnrizzi / githubweb
- Fix guard for isnan test for bhalf_t — Daniel Arndt / githubweb
- avoid potential race condition HIP — tccleve / githubweb
- Set the device id in cuda_kernel_arch — Daniel Arndt / githubweb
- [SYCL][Reduction] Group counter should use at least memory_order::acq_rel — andrei.elovikov / githubweb
- Same for scan — andrei.elovikov / githubweb
- Initialize m_num_scratch_locks for Cuda parallel_for TeamPolicy — Daniel Arndt / githubweb
- improve tests with intra-team result check — fnrizzi / githubweb
- Fixes for Kokkos::Array (#6372) — noreply / githubweb
- try fix for unique, previous impl to remove later — fnrizzi / githubweb
- #5635: Add parallel_scan changes for CUDA and TeamThreadRange — cezary.skrzynski / githubweb
- remove old impl — fnrizzi / githubweb
- #5635: Enable TeamThreadRange test for CUDA — cezary.skrzynski / githubweb
- Clean up benchmarks/gups — cwpears / githubweb
- benchmark/gups: use CMake — cwpears / githubweb
- OpenMPTarget: Disable check for SIMD compiler macros — Daniel Arndt / githubweb
- #5635: Add parallel_scan with value for CUDA and ThreadVectorRange — cezary.skrzynski / githubweb
- add missing assert — fnrizzi / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- add intra team check for missing test — fnrizzi / githubweb
- fix intel compile error — fnrizzi / githubweb
- fix unreachable for intel — fnrizzi / githubweb
- re-enable unit tests for sort and random via makefile (#6422) — noreply / githubweb
- OpenMPTarget init-join fix (#6444) — noreply / githubweb
- Fix Cuda parallel_scan ThreadVectorRange range — Daniel Arndt / githubweb
- Assign final sum in Cuda parallel_scan ThreadVectorRange — Daniel Arndt / githubweb
- Fix compiling code using Kokkos::printf for OpenMPTarget on Intel GPUs (#6443) — noreply / githubweb
- std_algos: improving min, max, minmax (#6421) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 2/7) (#6426) — noreply / githubweb
- Skip bessel function tests known to fail on Intel GPUs (#6434) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 6/7) (#6436) — noreply / githubweb
- Fix race condition in functor_vec_scan_ret_val test — Daniel Arndt / githubweb
- Fix parallel_scan_with_reducers test — Daniel Arndt / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 3/7) (#6425) — noreply / githubweb
- improve tests (#6432) — noreply / githubweb
- improve tests (#6437) — noreply / githubweb
- Move final assignment to correct scope — cezary.skrzynski / githubweb
- fix casting warning in Random test — fnrizzi / githubweb
- Workaround for ROCm 5.6+ failing to compile with AVX2 SIMD support (#6449) — noreply / githubweb
- HIP: Restrict AVX2 workaround to ROCm 5.6 and 5.7 — Daniel Arndt / githubweb
- fixes build error for TeamReduce and TeamTranformReduced tests for specific GCC (#6459) — noreply / githubweb
- improve tests to check intra-team result (#6431) — noreply / githubweb
- SIMD: Math functions should be in namespace Kokkos — Daniel Arndt / githubweb
- SYCL: Disable another bessel function test for Intel GPUs — Daniel Arndt / githubweb
- Check for default device — Daniel Arndt / githubweb
- team-level std algos: part 10 (#6256) — noreply / githubweb
- team-level std algos: part 11 (#6258) — noreply / githubweb
- #5635: HIP: Add Overloads for parallel_scan with return value for TeamThreadRange (#6302) — noreply / githubweb
- #5635: Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- use shortcut — cezary.skrzynski / githubweb
- Skip testing for non-power-of-two team sizes — cezary.skrzynski / githubweb
- Fix formatting — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for HIP backend — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- #5635: SYCL: Add parallel_scan overload with return value — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- OpenMP: Fix TeamThreadRange parallel_scan with return value for team_size > 1 — Daniel Arndt / githubweb
- Add compatible copy assignment operator to DualView — maarten.arnst / githubweb
- Fix minimum version for Google benchmark — Daniel Arndt / githubweb
- Add test of copy constructor/assignment operator for DualView. — maarten.arnst / githubweb
- Compute concurrency on HIP using Kokkos hardcoded m_maxWavesPerCU — maarten.arnst / githubweb
- Modify test so that source and destination view are of different type — maarten.arnst / githubweb
- Use call operator instead of run_me function — maarten.arnst / githubweb
- team-level std algos: part 12 (#6350) — noreply / githubweb
- core/src: Add half single and double mixed compare (LT,GT,LE,GE) (#6407) — noreply / githubweb
- Moving abort and assert into their own public headers (#6445) — noreply / githubweb
- Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- Allow detecting SIMD types based on compiler macros (#6188) — noreply / githubweb
- Add KOKKOS_ARCH_ARM_NEON — Daniel Arndt / githubweb
- Fix implementation for cyl_bessel_i0 — Daniel Arndt / githubweb
- cleaning: remove iostream from headers where possible (IWYU) — romin.tomasetti / githubweb
- Fix compiling SIMD unit tests on NVIDIA — Daniel Arndt / githubweb
- guards to ensure DBL_EPSILON return for POWER8,9 — ajpowel / githubweb
- Fix NVCC warnings (#6483) — noreply / githubweb
- team-level std algos: part 13 (#6351) — noreply / githubweb
- Also fix annotations for generator constructor for AVX512 and NEON — Daniel Arndt / githubweb
- #5635: SYCL: Add parallel_scan overload with value for ThreadVectorRange — cezary.skrzynski / githubweb
- Fix atomic operations bug for Min and Max (#6435) — noreply / githubweb
- Fix example/build_cmake_installed_different_compiler — Daniel Arndt / githubweb
- fix impl — fnrizzi / githubweb
- Update core/src/HIP/Kokkos_HIP_KernelLaunch.hpp — noreply / githubweb
- Split Kokkos_Threads_Parallel files — Bruno Turcksin / githubweb
- HPX: Implement TeamThread and ThreadVector parallel_scan with return value — Daniel Arndt / githubweb
- Serial: Allow for distinct execution space instances (#6441) — noreply / githubweb
- add overload for TeamThreadRange — fnrizzi / githubweb
- address review comment — fnrizzi / githubweb
- Update to HIP TeamPolicy Block number heuristic (#6284) — noreply / githubweb
- SIMD: Split math functions from SIMD_Common.hpp (#6487) — noreply / githubweb
- Allow NVHPC as device compiler only with Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER=ON — Daniel Arndt / githubweb
- add threadvector — fnrizzi / githubweb
- fix order — fnrizzi / githubweb
- remove guards — fnrizzi / githubweb
- UnorderedMap(space instance): proposal for #6067 — romin.tomasetti / githubweb
- Rename Kokkos_ThreadsExec to align with the other backends — Bruno Turcksin / githubweb
- Promote Kokkos_Printf.hpp to public include — Daniel Arndt / githubweb
- simd: add floor, ceil, round, trunc operations (#6393) — noreply / githubweb
- Update CI in OpenMPTarget to use llvm-17 (#6472) — noreply / githubweb
- Rename Kokkos_ThreadsTeam.hpp to Kokkos_Threads_Team.hpp — Bruno Turcksin / githubweb
- Kokkos_SIMD_Scalar.hpp: remove extra ';' — ndellin / githubweb
- add guards — fnrizzi / githubweb
- avoid auto — fnrizzi / githubweb
- [ci skip] Update Kokkos version to 4.2.99 — Daniel Arndt / githubweb
- [deprecated code 3] remove all default device init tests — Damien L-G / githubweb
- [deprecated code 3] remove InitArguments — Damien L-G / githubweb
- [deprecated code 3] remove KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macros — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for clamp, min, max, and minmax — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math constants — Damien L-G / githubweb
- [deprecated code 3] remove {OpenMP,HPX}::partition_master — Damien L-G / githubweb
- OpenMP backend cleanup following removal of deprecated code 3 — Damien L-G / githubweb
- Get rid of FIXME_OPENMP — Damien L-G / githubweb
- [deprecated code 3] remove MasterLock — Damien L-G / githubweb
- [deprecated code 3] remove ENABLE_DEPRECATED_CODE_3 option — Damien L-G / githubweb
- fixup! [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- Do not append " - blocks" to the bitset label — Damien L-G / githubweb
- with_updated_label -> append_to_label — Daniel Arndt / githubweb
- SYCL: Use SYCL_EXT_ONEAPI_DEVICE_GLOBAL to detect support for device global variables — Daniel Arndt / githubweb
- Fixup in README (github -> GitHub) — Damien L-G / githubweb
- Check that device associated with stream matches requested device — Daniel Arndt / githubweb
- Threads: replace enum with constexpr int and enum class (#6514) — noreply / githubweb
- Added unit tests for reduction ops and few intel svml intrinsics — donlee / githubweb
- Remove sleep and wake functions — Bruno Turcksin / githubweb
- Remove extra constructor — Daniel Arndt / githubweb
- Prefer defaulted default constructor for Bitset (#6524) — noreply / githubweb
- SYCL: Use host-pinned memory to copy reduction/scan result (#6500) — noreply / githubweb
- Remove unused variables — Bruno Turcksin / githubweb
- Remove Sentinel struct from Threads — Bruno Turcksin / githubweb
- Small cleanup of ThreadsInternal::initialize — Bruno Turcksin / githubweb
- Remove useless forward declaration — Bruno Turcksin / githubweb
- Remove spawn function — Bruno Turcksin / githubweb
- Add comments — Bruno Turcksin / githubweb
- Fix indentation — Bruno Turcksin / githubweb
- Fix typo in macro guard — Bruno Turcksin / githubweb
- Reduce number of View constructor instantiations — Damien L-G / githubweb
- Bump HPX version used in CI to 1.9.0 — mikael.simberg / githubweb
- Split files in HIP backend — Bruno Turcksin / githubweb
- Trim some fat in `CudaInternal` (towards multiple GPUs support) (#6544) — noreply / githubweb
- Only define STDALGO_TEAM_SOURCES_* once — Daniel Arndt / githubweb
- Remove logical memory spaces — Damien L-G / githubweb
- Rollback changes to view constructors to reduce the number of instantiations (#6564) — noreply / githubweb
- Address reviewer comments — Daniel Arndt / githubweb
- Threads remove unused variables and functions (#6566) — noreply / githubweb
- Remove unused Sandia testing files (#6568) — noreply / githubweb
- fallback implementation cleanup — donlee / githubweb
- Remove empty quotation marks for static_assert — Daniel Arndt / githubweb
- [ci skip] Drop unused <impl/Kokkos_Memory_Fence.hpp> header — Damien L-G / githubweb
- Drop Clang+CUDA workaround — Damien L-G / githubweb
- OpenMPTarget: CI compiler upgrade. (#6545) — noreply / githubweb
- m_cudaDev isn't static anymore — Daniel Arndt / githubweb
- Add crtrott's launch_latency benchmark (#6379) — noreply / githubweb
- Simplify fence functions in the Threads backend (#6571) — noreply / githubweb
- simd: temporarily skip device math ops unit test for OpenMPTarget build (#6574) — noreply / githubweb
- Set the device id explicitly for CUDA API calls in impl_initialize — Daniel Arndt / githubweb
- OpenMPTarget: Guard scratch memory usage in ParallelReduce — Daniel Arndt / githubweb
- SYCL: Implement DESUL_ATOMICS_ENABLE_SYCL_SEPARABLE_COMPILATION path (#6534) — noreply / githubweb
- Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header — Damien L-G / githubweb
- Replace Marsaglia polar method with Box-Muller to generate a normally distributed random number (#6556) — noreply / githubweb
- OpenMP: No memset in viewfill (#6573) — noreply / githubweb
- Revert "Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header" — noreply / githubweb
- OpenACC: add atomics support (#6446) — noreply / githubweb
- Fix infinity, quiet_NaN, signaling_NaN, isfinite, isnan, isinf for half_t and bhalf_t (#6543) — noreply / githubweb
- kokkos(unique): fix allocation of temporary view to enforce using the provided space instance — romin.tomasetti / githubweb
- Use binary wrapper for consistency in definition of half types numeric traits (#6590) — noreply / githubweb
- try fix — fnrizzi / githubweb
- Fix TestNumericTraits.hpp for SYCL with bfloat16 support — Daniel Arndt / githubweb
- Fix generated Makefile when using gnu_generate_makefile.sh and make >= 4.3 — crtrott / githubweb
- Threads: add missing broadcast to TeamThreadRange parallel_scan (#6601) — noreply / githubweb
- Added missing operator* to NEON simd — crtrott / githubweb
- [ci skip] Update changelog on develop for 4.2.00 (#6592) — noreply / githubweb
- Remove KOKKOS_IMPL_DO_NOT_USE_PRINTF (#6593) — noreply / githubweb
- try fix — fnrizzi / githubweb
- avoid pyt package — fnrizzi / githubweb
- try — fnrizzi / githubweb
- fix for macos — fnrizzi / githubweb
- remove comments — fnrizzi / githubweb
- use reference — crtrott / githubweb
- add branching — fnrizzi / githubweb
- [ci skip] fix formatting — cezary.skrzynski / githubweb
- GitHub Workflows: Use Ubuntu 22.04 instead of Fedora for Intel compiler testing — Daniel Arndt / githubweb
- nvcc(wrapper): adding missing `--generate-line-info` arg — romin.tomasetti / githubweb
- Add clang-format check to GitHub workflows (#6612) — noreply / githubweb
- graph(HIP): adding inline keyword to fix #6623 — romin.tomasetti / githubweb
- Add jenkins multibranch pipeline options — Bruno Turcksin / githubweb
- Add warp sync for Cuda parallel reduce — tccleve / githubweb
- kokkos(profiling): do not finalize in any backend — romin.tomasetti / githubweb
- Replace ubuntu:18.04 with ubuntu:20.04 as base image for clang-format — Bruno Turcksin / githubweb
- update comment to include final() mention — tccleve / githubweb
- Disabling OpenACC in the CI because it emits too many warnings — Bruno Turcksin / githubweb
- unorderedmap: modernize traits — romin.tomasetti / githubweb
- nvcc wrapper: remove troubling flag to fix 6628 (#6629) — noreply / githubweb
- tools(profiling): type (related to kokkos/kokkos-tools/pull/221) — romin.tomasetti / githubweb
- This PR fixes the too-much-OpenACC-warning issue, mentioned in PR #6639. — lees2 / githubweb
- add missing header fix #6644 — fnrizzi / githubweb
- SYCL: Restrict workaround for is_device_copyable to oneAPI versions before 2024.0.0 (#6532) — noreply / githubweb
- Fixup test math functions ulp should double -> int — Damien L-G / githubweb
- Drop DualView converting copy assignment operator — Damien L-G / githubweb
- Don't use rocm-docker for clang-format — Daniel Arndt / githubweb
- Disable HIP CI — Daniel Arndt / githubweb
- Remove deprecation warning for AllocationMechanism for gcc <11.0 — Daniel Arndt / githubweb
- OpenMPTarget: clang extensions for dynamic shared memory. (#6380) — noreply / githubweb
- Fix builtin_unreachable use for MSVC/CUDA — crtrott / githubweb
- Fix missing include on msvc/cuda — crtrott / githubweb
- Avoid lambdas in constexpr branch for msvc/cuda — crtrott / githubweb
- Sidestep lacking CTAD support msvc/cuda — crtrott / githubweb
- Fix formatting — crtrott / githubweb
- Move header for Damien because he is right — crtrott / githubweb
- Unit test for issue 3371 (negative vector length should not yield a negative max_team_size) (#6076) — noreply / githubweb
- Add CMakeLists.txt for stream benchmark — cwpears / githubweb
- Do not negate the dependent true traits helper — Damien L-G / githubweb
- Drop guards to accommodate external code defining KOKKOS_ASSERT — Damien L-G / githubweb
- Use omp_get_max_active_levels() when supported — Daniel Arndt / githubweb
- Add missing gfx940 — rberger / githubweb
- Add Impl::always_false type-dependent false trait — Damien L-G / githubweb
- Per review prefer always_false<Arg>::value to is_void_v<Arg> — Damien L-G / githubweb
- Improve "no copy mechanism" exception message — bmkelle / githubweb
- Add a unit test for new deep_copy exception msg — bmkelle / githubweb
- Add missing include sstream — bmkelle / githubweb
- src->source, dst->destination — bmkelle / githubweb
- Workaround for ROCm 6.0 failing to compile with AVX2 SIMD support — Bruno Turcksin / githubweb
- SYCL: Force inlining of Kokkos::printf (#6650) — noreply / githubweb
- Improve handling of printf in OMPT on Intel GPUs — Daniel Arndt / githubweb
- OpenMP: Use `omp_get_nested` for older gcc versions (#6685) — noreply / githubweb
- Disable more Bessel tests for SYCL on Intel GPUs — Daniel Arndt / githubweb
- fill_random without execution space instance should fence — Daniel Arndt / githubweb
- Drop unnecessary guarding for a tool library being loaded in ProfilingSection — Damien L-G / githubweb
- Drop unnecessary header include in Kokkos_Profiling_ProfileSection.hpp — Damien L-G / githubweb
- #5333: CUDA: Use scratch space appropriate to small reduction elements in Team reductions (#5334) — noreply / githubweb
- Cuda: Allocate using the correct device (#6392) — noreply / githubweb
- Let `Profiling::ProfilingSection(std::string)` constructor be explicit and nodiscard (#6690) — noreply / githubweb
- Cosmetic changes to ProfilingSection — Damien L-G / githubweb
- GitHub CI: Test with AddressSanitizer (#6676) — noreply / githubweb
- Kokkos::Array deduction guide (#6373) — noreply / githubweb
- Add CI for MSVC+Cuda (#6661) — noreply / githubweb
- SYCL: Address deprecations after oneAPI 2023.2.0 (#6577) — noreply / githubweb
- Fixup cast tolerance to double before printing — Damien L-G / githubweb
- Try linking against CUDA libraries even with KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE — Daniel Arndt / githubweb
- test_array_ctad: disable test for intel versions < 2021 — ndellin / githubweb
- Enable `{transform_}exclusive_scan` in place (#6667) — noreply / githubweb
- Add `ATOMICS_BYPASS` configuration option to disable atomics (#6692) — noreply / githubweb
- Check matching static extents in View constructor (#5190) — noreply / githubweb
- Remove Kokkos::[b]half_t volatile overloads (#6579) — noreply / githubweb
- add tests — fnrizzi / githubweb
- Provide `kokkos_swap` as part of Core and deprecate `Experimental::swap` in Algorithms (#6697) — noreply / githubweb
- Provide new public headers `<Kokkos_Clamp.hpp>` and `<Kokkos_MinMax.hpp>` (#6687) — noreply / githubweb
- Fix TeamThreadMDRange parallel_reduce (#6511) — noreply / githubweb
- add tests for in-place `inclusive_scan` (#6682) — noreply / githubweb
- Drop pointless Kokkos::Impl::CudaExec forward declaration — Damien L-G / githubweb
- Don't use the compiler launcher script if the compile language is CUDA. (#6704) — noreply / githubweb
- Deprecate `{Cuda,HIP}::detect_device_count()` and `Cuda::[detect_]device_arch()` (#6710) — noreply / githubweb
- Get rid of CudaInternal::cuda_get_error_{name,string}_wrapper — Damien L-G / githubweb
- No need to jump through so many hoops to print the error message — Damien L-G / githubweb
- HIP: Forgot to delete matching brace closing the namespace — Damien L-G / githubweb
- Make initialize and finalize of the Cuda/HIP singleton less special (#6714) — noreply / githubweb
- Kokkos_HIP.cpp: include Kokkos_Core.hpp to resolve errors — ndellin / githubweb
- Add bound checks in RangePolicy and MDRangePolicy (#6617) — noreply / githubweb
- Temporary fix to reenable HIP CI — Bruno Turcksin / githubweb
- Let the smart pointer manage the CUDA/HIP stream (#6721) — noreply / githubweb
- Fix Docker env variables — Bruno Turcksin / githubweb
- Ensure view_allocation_error does not silently ignore that no exception was thrown — Damien L-G / githubweb
- Add RawMemoryAllocationFailure::AllocationMechanism::OpenACCMalloc enumerator — Damien L-G / githubweb
- [OpenACC] throw if acc_malloc returned nullptr — Damien L-G / githubweb
- Fixup using declaration — Damien L-G / githubweb
- Disable openacc.view_allocation_error test — Damien L-G / githubweb
- Guard `[MD]RangePolicy` precondition check for deprecated code 4 (#6726) — noreply / githubweb
- Add C++26 standard to CMake Setup — dev / githubweb
- Add support for C++26 in generated makefiles — Damien L-G / githubweb
- Add KOKKOS_ENABLE_CXX26 to the configuration metadata — Damien L-G / githubweb
- Reenable HIP testing — Bruno Turcksin / githubweb
- Disabling failing HIP test in the CI — Bruno Turcksin / githubweb
- Use team_size_max to fix "Team size too large" error in reducer test (#6725) — noreply / githubweb
- Don't touch my records! (refactor Cuda/HIP/SYCL/Threads to not directly mess with `SharedAllocationRecord`) (#6732) — noreply / githubweb
- Untangle SharedAllocationRecord spaghetti code — Damien L-G / githubweb
- Fix TestThreadVectorMDRangeParallelReduce (#6734) — noreply / githubweb
- Cuda multi-GPU support: Allow execution space instance constructor to run (#6706) — noreply / githubweb
- Drop support for deprecated command-line arguments and environment variables (#6744) — noreply / githubweb
- Avoid unnecessary zero-memset of the scratch flags in SYCL (#6739) — noreply / githubweb
#3 (Jan 25, 2024, 10:41:36 AM)
- implementation and tests — fnrizzi / githubweb
- implementation and tests — fnrizzi / githubweb
- only compute with relevant entries — tccleve / githubweb
- subset of team level impl of std algorithms — fnrizzi / githubweb
- fix copyright — fnrizzi / githubweb
- guard for openmptarget — fnrizzi / githubweb
- fix for openmptarget — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- SIMD: suppress an uninitialized variable warning (#6294) — noreply / githubweb
- OpenACC CMakechange Clacc (#6250) — noreply / githubweb
- `Kokkos::sort` support custom comparator (#6253) — noreply / githubweb
- Add nightly build using latest gcc and c++23 — Bruno Turcksin / githubweb
- Explicitly capture this in lambda function — Bruno Turcksin / githubweb
- Fix typo in nightly jenkins configuration — Bruno Turcksin / githubweb
- Fix a memory bug in the `free_state` function of random pools. (#6290) — noreply / githubweb
- std_algos: for_each: try condense the impl — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Enforce create_mirror restrictions w.r.t. ViewCtorArgs on all variants — Daniel Arndt / githubweb
- Use KOKKOS_IF_ON_HOST — Daniel Arndt / githubweb
- Suppress warnings — Daniel Arndt / githubweb
- Don't suppress warnings for NVHPC — Daniel Arndt / githubweb
- Improve macro definitions — Daniel Arndt / githubweb
- Enable Serial backend in HPX build — cezary.skrzynski / githubweb
- Modify fences in View API test — cezary.skrzynski / githubweb
- Check for overflow during backend initialization (Cuda, HIP, SYCL) (#6159) — noreply / githubweb
- Improve SYCL reduction performance: MDRangePolicy (#6271) — noreply / githubweb
- bring back previous code as discussed in meeting — fnrizzi / githubweb
- create cudaAPI function wrappers — tccleve / githubweb
- Reorganize #include <cuda_runtime_api.h> — tccleve / githubweb
- Some api function require cuda11.2+ — tccleve / githubweb
- Cuda10 requires "stream=nullptr" as default arg — tccleve / githubweb
- Rework stream inputs — tccleve / githubweb
- Use "if constexpr" for setCudaDevice — tccleve / githubweb
- Remove static in comment — tccleve / githubweb
- Update CI: use CUDA 11.6.2 instead of 11.6.0 (#6314) — noreply / githubweb
- add helper variable templates `are_*_iterators_v` (#6312) — noreply / githubweb
- [HIP] Optimize parallel_reduce (#6229) — noreply / githubweb
- Split Kokkos_SYCL_Parallel_Range.hpp — Daniel Arndt / githubweb
- Split Kokkos_SYCL_Parallel_Reduce.hpp — Daniel Arndt / githubweb
- Move Kokkos_SYCL_Scan.hpp — Daniel Arndt / githubweb
- Split Kokkos_Parallel_Team.hpp — Daniel Arndt / githubweb
- Add comment that we don't need to link against clang_rt with ROCm 5.5 — Bruno Turcksin / githubweb
- Error out when CXX standard is not set when using amdclang or cray clang — Bruno Turcksin / githubweb
- Disable failing tests for ROCm 5.5 and 5.6 — Bruno Turcksin / githubweb
- #5635: Add parallel_scan overload with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Serial/OpenMP: Parallel_scan with return value for TeamThreadRange (#6090) — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Allow using the SYCL execution space on AMD GPUs (#6321) — noreply / githubweb
- Update Clang+Cuda CI from 10.1 to 11.0.3 (#6318) — noreply / githubweb
- SIMD: add shift ops for all int types (#6109) — noreply / githubweb
- SYCL: Use sycl::bit_cast from oneAPI 2024.0.0 on (#6300) — noreply / githubweb
- remove spurious undefs — fnrizzi / githubweb
- Decrease maximum memory available to ccache — Bruno Turcksin / githubweb
- Fix gtest when using C++20 — Bruno Turcksin / githubweb
- Fully qualify Experimental::SYCL in algorithms to avoid conflicting namespaces — Daniel Arndt / githubweb
- address reviews [skip ci] — antoine.meyer54 / githubweb
- formatting — fnrizzi / githubweb
- SIMD: split simd unit tests into separate files (#6278) — noreply / githubweb
- Run NVHPC only on V100 — Bruno Turcksin / githubweb
- Use checked arithmetic builtins for overflow detection (#6313) — noreply / githubweb
- Passing size==0 to DeepCopy/memcpy/omp_target_memcpy (#6273) — noreply / githubweb
- allow sorting via native oneDPL to support views with stride = 1 (#6322) — noreply / githubweb
- Adopt new HIP cmake's way of finding clang-rt — nicurtis / githubweb
- Fix #6334: intel compiler does not like returning inside if constexpr (#6335) — noreply / githubweb
- Fix #6336: capturing `this` implicitly is deprecated in C++20 (#6337) — noreply / githubweb
- fix lambda capture — fnrizzi / githubweb
- remove unnecessary file, fix constraints — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix — fnrizzi / githubweb
- fix lambda capture and constraints — fnrizzi / githubweb
- keep only subset — fnrizzi / githubweb
- revert files — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix syntax — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Disable default oneDPL support in Trilinos — Daniel Arndt / githubweb
- Rename AMD GPU architectures (#6266) — noreply / githubweb
- Fix compiling SIMD library with NEON and gcc-13 — Daniel Arndt / githubweb
- SIMD: add generator constructors (#6347) — noreply / githubweb
- Use /usr/bin/env bash in nvcc_wrapper and kokkos_launch_compiler for portability — mikael.simberg / githubweb
- Fix cudaAPI wrapper errors for CUDA_MALLOC_ASYNC (#6346) — noreply / githubweb
- Use std::aligned_alloc for allocations (#6341) — noreply / githubweb
- Initial implementation of gfx942 (#6358) — noreply / githubweb
- Introduce constructor for multi-GPU support. — Daniel Arndt / githubweb
- Only set KOKKOS_ARCH_AMD_GPU if an AMD GPU architecture is enabled — Daniel Arndt / githubweb
- Extend 'hip_driver_check_page_migration' (#6364) — noreply / githubweb
- Replace volatile m_ready_count loads by desul::atomic_loads in task queue implementation — mikael.simberg / githubweb
- Add Kokkos_Atomic.hpp headers to OpenMP and HPX task headers — mikael.simberg / githubweb
- Add support for HIP Graph — Bruno Turcksin / githubweb
- Replace one of the ROCM 5.2 configuration with ROCM 5.6 — Bruno Turcksin / githubweb
- Use aligned version of operator new instead of aligned_alloc — Daniel Arndt / githubweb
- HPX: Don't interfere with exception handling — Daniel Arndt / githubweb
- Fix -Wformat-truncation warnings in CI (#6354) — noreply / githubweb
- Do not use HIP Graph with ROCm 5.2 — Bruno Turcksin / githubweb
- Use constexpr West in src — Bruno Turcksin / githubweb
- Use constexpr West in test — Bruno Turcksin / githubweb
- SIMD: convert binary operators to hidden friends (#6320) — noreply / githubweb
- simd: make mask and condition unit test to check with all data types (#6360) — noreply / githubweb
- team-level std algos: part 2 (#6205) — noreply / githubweb
- Add deprecated attribute to HostSpace(AllocationMechanism) definition — Damien L-G / githubweb
- Added gather_from and scatter_to for AVX2 and AVX512 simd — donlee / githubweb
- Rebased and applied feedbacks — donlee / githubweb
- Changed gen ctor usage to pass in KOKKOS_LAMBDA — donlee / githubweb
- Suppress NVCC attribute ignored warning on SIMD operators as hidden friends — Damien L-G / githubweb
- Remove deprecated code 3 support for volatile join — crtrott / githubweb
- Disable a test not working with nvhpc-23.1 — crtrott / githubweb
- Reenabling tests for nvhpc 23.7 — crtrott / githubweb
- Update containers and algorithms for NVC++ 23.7 — crtrott / githubweb
- More NVC++ 23.7 updates — crtrott / githubweb
- NVC++ clang-format fixes — crtrott / githubweb
- Update nvhpc to version 23.7 in the CI — crtrott / githubweb
- NVHPC 23.7 update: address reviewer comments — crtrott / githubweb
- OpenACC: Guard tests relying on abort — crtrott / githubweb
- Fix TestAtomic to use the test execspace — crtrott / githubweb
- Use NVHPC 23.7 for testing of OpenACC — crtrott / githubweb
- Update nvhpc gtest skip message — crtrott / githubweb
- Work around OpenMPTarget failure — crtrott / githubweb
- Update base docker file for nvhpc — crtrott / githubweb
- Remove stray Cuda graph pattern specialization from tag — Damien L-G / githubweb
- Fix typo — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Fix uninitialized variable warning with gcc 13 — Bruno Turcksin / githubweb
- HIP: Update print_configuration (#6387) — noreply / githubweb
- Add test — Daniel Arndt / githubweb
- Fix typo. — noreply / githubweb
- Check AVX, AVX2, AVX512XEON compiler macros on KokkosCore_config.h (#6248) — noreply / githubweb
- Don't use local headers or runtime in HPX backend due to deprecation — mikael.simberg / githubweb
- Explicitly check for valid device id — Daniel Arndt / githubweb
- Add a more comprehensive `kokkos_{malloc, free}` perf_test (#6377) — noreply / githubweb
- team-level std algos: part 3 (#6207) — noreply / githubweb
- Adding is_scoped_enum & to_underlying (#6356) — noreply / githubweb
- SIMD: add float simd support (#6177) — noreply / githubweb
- team-level std algos: part 4 (#6208) — noreply / githubweb
- Added a gen ctor for float (#6397) — noreply / githubweb
- team-level std algos: part 5 (#6209) — noreply / githubweb
- Deprecate Cuda(cudaStream_t, bool) — Damien L-G / githubweb
- Fixup checked integer operations death test — Damien L-G / githubweb
- Deprecate HIP(hipStream_t, bool) — Damien L-G / githubweb
- Let Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC be ON by default — Damien L-G / githubweb
- Print whether KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC is defined — Damien L-G / githubweb
- Introduce disable_malloc_async Cuda option with generated makefiles — Damien L-G / githubweb
- Preserve one build that disables Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC — Damien L-G / githubweb
- Use archive extraction time for timestamps — cezary.skrzynski / githubweb
- Disable performance benchmarks in AppVeyor CI — cezary.skrzynski / githubweb
- team-level std algos: part 6 (#6210) — noreply / githubweb
- address comments — fnrizzi / githubweb
- OpenMP backend refactor files. (#6403) — noreply / githubweb
- Drop (unused) `Cuda::cuda_internal_maximum_shared_words` — Damien L-G / githubweb
- Drop check that the host backend is initialized before the Cuda/HIP/SYCL one — Damien L-G / githubweb
- Drop unused HIPInternal::m_maxSharedWords data member — Damien L-G / githubweb
- Drop unused HIPInternal::m_hipArch static data member — Damien L-G / githubweb
- !initialized() should be a precondition for calling {Cuda,HIP,SYCL}Internal::initialize — Damien L-G / githubweb
- Drop pre-Kepler logic in Cuda::impl_initialize — Damien L-G / githubweb
- use single — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- formatting — fnrizzi / githubweb
- Team-level std algos: part 7 (#6211) — noreply / githubweb
- formatting — fnrizzi / githubweb
- Enable death tests for fedora rawhide — cezary.skrzynski / githubweb
- core/src: Add half math functions to private header (#6124) — noreply / githubweb
- Drop check whether device supports unified addressing — Damien L-G / githubweb
- fix single as per Christian's suggestion — fnrizzi / githubweb
- Only warn once (at initialization) when forcing allocation in unified memory — Damien L-G / githubweb
- check-copyright improvements (#6399) — noreply / githubweb
- Use execution space instance argument to get device properties in block size deduction — Damien L-G / githubweb
- Address reviewer's comments — Bruno Turcksin / githubweb
- Fix to avoid #186-D pointless comparison warning. — maarten.arnst / githubweb
- add comment — fnrizzi / githubweb
- improve tests to address review — fnrizzi / githubweb
- Fix guard for isnan test for bhalf_t — Daniel Arndt / githubweb
- avoid potential race condition HIP — tccleve / githubweb
- Set the device id in cuda_kernel_arch — Daniel Arndt / githubweb
- [SYCL][Reduction] Group counter should use at least memory_order::acq_rel — andrei.elovikov / githubweb
- Same for scan — andrei.elovikov / githubweb
- Initialize m_num_scratch_locks for Cuda parallel_for TeamPolicy — Daniel Arndt / githubweb
- improve tests with intra-team result check — fnrizzi / githubweb
- Fixes for Kokkos::Array (#6372) — noreply / githubweb
- try fix for unique, previous impl to remove later — fnrizzi / githubweb
- #5635: Add parallel_scan changes for CUDA and TeamThreadRange — cezary.skrzynski / githubweb
- remove old impl — fnrizzi / githubweb
- #5635: Enable TeamThreadRange test for CUDA — cezary.skrzynski / githubweb
- Clean up benchmarks/gups — cwpears / githubweb
- benchmark/gups: use CMake — cwpears / githubweb
- OpenMPTarget: Disable check for SIMD compiler macros — Daniel Arndt / githubweb
- #5635: Add parallel_scan with value for CUDA and ThreadVectorRange — cezary.skrzynski / githubweb
- add missing assert — fnrizzi / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- add intra team check for missing test — fnrizzi / githubweb
- fix intel compile error — fnrizzi / githubweb
- fix unreachable for intel — fnrizzi / githubweb
- re-enable unit tests for sort and random via makefile (#6422) — noreply / githubweb
- OpenMPTarget init-join fix (#6444) — noreply / githubweb
- Fix Cuda parallel_scan ThreadVectorRange range — Daniel Arndt / githubweb
- Assign final sum in Cuda parallel_scan ThreadVectorRange — Daniel Arndt / githubweb
- Fix compiling code using Kokkos::printf for OpenMPTarget on Intel GPUs (#6443) — noreply / githubweb
- std_algos: improving min, max, minmax (#6421) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 2/7) (#6426) — noreply / githubweb
- Skip bessel function tests known to fail on Intel GPUs (#6434) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 6/7) (#6436) — noreply / githubweb
- Fix race condition in functor_vec_scan_ret_val test — Daniel Arndt / githubweb
- Fix parallel_scan_with_reducers test — Daniel Arndt / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 3/7) (#6425) — noreply / githubweb
- improve tests (#6432) — noreply / githubweb
- improve tests (#6437) — noreply / githubweb
- Move final assignment to correct scope — cezary.skrzynski / githubweb
- fix casting warning in Random test — fnrizzi / githubweb
- Workaround for ROCm 5.6+ failing to compile with AVX2 SIMD support (#6449) — noreply / githubweb
- HIP: Restrict AVX2 workaround to ROCm 5.6 and 5.7 — Daniel Arndt / githubweb
- fixes build error for TeamReduce and TeamTransformReduced tests for specific GCC (#6459) — noreply / githubweb
- improve tests to check intra-team result (#6431) — noreply / githubweb
- SIMD: Math functions should be in namespace Kokkos — Daniel Arndt / githubweb
- SYCL: Disable another bessel function test for Intel GPUs — Daniel Arndt / githubweb
- Check for default device — Daniel Arndt / githubweb
- team-level std algos: part 10 (#6256) — noreply / githubweb
- team-level std algos: part 11 (#6258) — noreply / githubweb
- #5635: HIP: Add Overloads for parallel_scan with return value for TeamThreadRange (#6302) — noreply / githubweb
- #5635: Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- use shortcut — cezary.skrzynski / githubweb
- Skip testing for non-power-of-two team sizes — cezary.skrzynski / githubweb
- Fix formatting — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for HIP backend — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- #5635: SYCL: Add parallel_scan overload with return value — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- OpenMP: Fix TeamThreadRange parallel_scan with return value for team_size > 1 — Daniel Arndt / githubweb
- Add compatible copy assignment operator to DualView — maarten.arnst / githubweb
- Fix minimum version for Google benchmark — Daniel Arndt / githubweb
- Add test of copy constructor/assignment operator for DualView. — maarten.arnst / githubweb
- Compute concurrency on HIP using Kokkos hardcoded m_maxWavesPerCU — maarten.arnst / githubweb
- Modify test so that source and destination view are of different type — maarten.arnst / githubweb
- Use call operator instead of run_me function — maarten.arnst / githubweb
- team-level std algos: part 12 (#6350) — noreply / githubweb
- core/src: Add half single and double mixed compare (LT,GT,LE,GE) (#6407) — noreply / githubweb
- Moving abort and assert into their own public headers (#6445) — noreply / githubweb
- Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- Allow detecting SIMD types based on compiler macros (#6188) — noreply / githubweb
- Add KOKKOS_ARCH_ARM_NEON — Daniel Arndt / githubweb
- Fix implementation for cyl_bessel_i0 — Daniel Arndt / githubweb
- cleaning: remove iostream from headers where possible (IWYU) — romin.tomasetti / githubweb
- Fix compiling SIMD unit tests on NVIDIA — Daniel Arndt / githubweb
- guards to ensure DBL_EPSILON return for POWER8,9 — ajpowel / githubweb
- Fix NVCC warnings (#6483) — noreply / githubweb
- team-level std algos: part 13 (#6351) — noreply / githubweb
- Also fix annotations for generator constructor for AVX512 and NEON — Daniel Arndt / githubweb
- #5635: SYCL: Add parallel_scan overload with value for ThreadVectorRange — cezary.skrzynski / githubweb
- Fix atomic operations bug for Min and Max (#6435) — noreply / githubweb
- Fix example/build_cmake_installed_different_compiler — Daniel Arndt / githubweb
- fix impl — fnrizzi / githubweb
- Update core/src/HIP/Kokkos_HIP_KernelLaunch.hpp — noreply / githubweb
- Split Kokkos_Threads_Parallel files — Bruno Turcksin / githubweb
- HPX: Implement TeamThread and ThreadVector parallel_scan with return value — Daniel Arndt / githubweb
- Serial: Allow for distinct execution space instances (#6441) — noreply / githubweb
- add overload for TeamThreadRange — fnrizzi / githubweb
- address review comment — fnrizzi / githubweb
- Update to HIP TeamPolicy Block number heuristic (#6284) — noreply / githubweb
- SIMD: Split math functions from SIMD_Common.hpp (#6487) — noreply / githubweb
- Allow NVHPC as device compiler only with Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER=ON — Daniel Arndt / githubweb
- add threadvector — fnrizzi / githubweb
- fix order — fnrizzi / githubweb
- remove guards — fnrizzi / githubweb
- UnorderedMap(space instance): proposal for #6067 — romin.tomasetti / githubweb
- Rename Kokkos_ThreadsExec to align with the other backends — Bruno Turcksin / githubweb
- Promote Kokkos_Printf.hpp to public include — Daniel Arndt / githubweb
- simd: add floor, ceil, round, trunc operations (#6393) — noreply / githubweb
- Update CI in OpenMPTarget to use llvm-17 (#6472) — noreply / githubweb
- Rename Kokkos_ThreadsTeam.hpp to Kokkos_Threads_Team.hpp — Bruno Turcksin / githubweb
- Kokkos_SIMD_Scalar.hpp: remove extra ';' — ndellin / githubweb
- add guards — fnrizzi / githubweb
- avoid auto — fnrizzi / githubweb
- [ci skip] Update Kokkos version to 4.2.99 — Daniel Arndt / githubweb
- [deprecated code 3] remove all default device init tests — Damien L-G / githubweb
- [deprecated code 3] remove InitArguments — Damien L-G / githubweb
- [deprecated code 3] remove KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macros — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for clamp, min, max, and minmax — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math constants — Damien L-G / githubweb
- [deprecated code 3] remove {OpenMP,HPX}::partition_master — Damien L-G / githubweb
- OpenMP backend cleanup following removal of deprecated code 3 — Damien L-G / githubweb
- Get rid of FIXME_OPENMP — Damien L-G / githubweb
- [deprecated code 3] remove MasterLock — Damien L-G / githubweb
- [deprecated code 3] remove ENABLE_DEPRECATED_CODE_3 option — Damien L-G / githubweb
- fixup! [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- Do not append " - blocks" to the bitset label — Damien L-G / githubweb
- with_updated_label -> append_to_label — Daniel Arndt / githubweb
- SYCL: Use SYCL_EXT_ONEAPI_DEVICE_GLOBAL to detect support for device global variables — Daniel Arndt / githubweb
- Fixup in README (github -> GitHub) — Damien L-G / githubweb
- Check that device associated with stream matches requested device — Daniel Arndt / githubweb
- Threads: replace enum with constexpr int and enum class (#6514) — noreply / githubweb
- Added unit tests for reduction ops and few intel svml intrinsics — donlee / githubweb
- Remove sleep and wake functions — Bruno Turcksin / githubweb
- Remove extra constructor — Daniel Arndt / githubweb
- Prefer defaulted default constructor for Bitset (#6524) — noreply / githubweb
- SYCL: Use host-pinned memory to copy reduction/scan result (#6500) — noreply / githubweb
- Remove unused variables — Bruno Turcksin / githubweb
- Remove Sentinel struct from Threads — Bruno Turcksin / githubweb
- Small cleanup of ThreadsInternal::initialize — Bruno Turcksin / githubweb
- Remove useless forward declaration — Bruno Turcksin / githubweb
- Remove spawn function — Bruno Turcksin / githubweb
- Add comments — Bruno Turcksin / githubweb
- Fix indentation — Bruno Turcksin / githubweb
- Fix typo in macro guard — Bruno Turcksin / githubweb
- Reduce number of View constructor instantiations — Damien L-G / githubweb
- Bump HPX version used in CI to 1.9.0 — mikael.simberg / githubweb
- Split files in HIP backend — Bruno Turcksin / githubweb
- Trim some fat in `CudaInternal` (towards multiple GPUs support) (#6544) — noreply / githubweb
- Only define STDALGO_TEAM_SOURCES_* once — Daniel Arndt / githubweb
- Remove logical memory spaces — Damien L-G / githubweb
- Rollback changes to view constructors to reduce the number of instantiations (#6564) — noreply / githubweb
- Address reviewer comments — Daniel Arndt / githubweb
- Threads remove unused variables and functions (#6566) — noreply / githubweb
- Remove unused Sandia testing files (#6568) — noreply / githubweb
- fallback implementation cleanup — donlee / githubweb
- Remove empty quotation marks for static_assert — Daniel Arndt / githubweb
- [ci skip] Drop unused <impl/Kokkos_Memory_Fence.hpp> header — Damien L-G / githubweb
- Drop Clang+CUDA workaround — Damien L-G / githubweb
- OpenMPTarget: CI compiler upgrade. (#6545) — noreply / githubweb
- m_cudaDev isn't static anymore — Daniel Arndt / githubweb
- Add crtrott's launch_latency benchmark (#6379) — noreply / githubweb
- Simplify fence functions in the Threads backend (#6571) — noreply / githubweb
- simd: temporarily skip device math ops unit test for OpenMPTarget build (#6574) — noreply / githubweb
- Set the device id explicitly for CUDA API calls in impl_initialize — Daniel Arndt / githubweb
- OpenMPTarget: Guard scratch memory usage in ParallelReduce — Daniel Arndt / githubweb
- SYCL: Implement DESUL_ATOMICS_ENABLE_SYCL_SEPARABLE_COMPILATION path (#6534) — noreply / githubweb
- Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header — Damien L-G / githubweb
- Replace Marsaglia polar method with Box-Muller to generate a normally distributed random number (#6556) — noreply / githubweb
- OpenMP: No memset in viewfill (#6573) — noreply / githubweb
- Revert "Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header" — noreply / githubweb
- OpenACC: add atomics support (#6446) — noreply / githubweb
- Fix infinity, quiet_NaN, signaling_NaN, isfinite, isnan, isinf for half_t and bhalf_t (#6543) — noreply / githubweb
- kokkos(unique): fix allocation of temporary view to enforce using the provided space instance — romin.tomasetti / githubweb
- Use binary wrapper for consistency in definition of half types numeric traits (#6590) — noreply / githubweb
- try fix — fnrizzi / githubweb
- Fix TestNumericTraits.hpp for SYCL with bfloat16 support — Daniel Arndt / githubweb
- Fix generated Makefile when using gnu_generate_makefile.sh and make >= 4.3 — crtrott / githubweb
- Threads: add missing broadcast to TeamThreadRange parallel_scan (#6601) — noreply / githubweb
- Added missing operator* to NEON simd — crtrott / githubweb
- [ci skip] Update changelog on develop for 4.2.00 (#6592) — noreply / githubweb
- Remove KOKKOS_IMPL_DO_NOT_USE_PRINTF (#6593) — noreply / githubweb
- try fix — fnrizzi / githubweb
- avoid pyt package — fnrizzi / githubweb
- try — fnrizzi / githubweb
- fix for macos — fnrizzi / githubweb
- remove comments — fnrizzi / githubweb
- use reference — crtrott / githubweb
- add branching — fnrizzi / githubweb
- [ci skip] fix formatting — cezary.skrzynski / githubweb
- GitHub Workflows: Use Ubuntu 22.04 instead of Fedora for Intel compiler testing — Daniel Arndt / githubweb
- nvcc(wrapper): adding missing `--generate-line-info` arg — romin.tomasetti / githubweb
- Add clang-format check to GitHub workflows (#6612) — noreply / githubweb
- graph(HIP): adding inline keyword to fix #6623 — romin.tomasetti / githubweb
- Add jenkins multibranch pipeline options — Bruno Turcksin / githubweb
- Add warp sync for Cuda parallel reduce — tccleve / githubweb
- kokkos(profiling): do not finalize in any backend — romin.tomasetti / githubweb
- Replace ubuntu:18.04 with ubuntu:20.04 as base image for clang-format — Bruno Turcksin / githubweb
- update comment to include final() mention — tccleve / githubweb
- Disabling OpenACC in the CI because it emits too many warnings — Bruno Turcksin / githubweb
- unorderedmap: modernize traits — romin.tomasetti / githubweb
- nvcc wrapper: remove troubling flag to fix 6628 (#6629) — noreply / githubweb
- tools(profiling): type (related to kokkos/kokkos-tools/pull/221) — romin.tomasetti / githubweb
- This PR fixes the too-much-OpenACC-warning issue, mentioned in PR #6639. — lees2 / githubweb
- add missing header fix #6644 — fnrizzi / githubweb
- SYCL: Restrict workaround for is_device_copyable to oneAPI versions before 2024.0.0 (#6532) — noreply / githubweb
- Fixup test math functions ulp should double -> int — Damien L-G / githubweb
- Drop DualView converting copy assignment operator — Damien L-G / githubweb
- Don't use rocm-docker for clang-format — Daniel Arndt / githubweb
- Disable HIP CI — Daniel Arndt / githubweb
- Remove deprecation warning for AllocationMechanism for gcc <11.0 — Daniel Arndt / githubweb
- OpenMPTarget: clang extensions for dynamic shared memory. (#6380) — noreply / githubweb
- Fix builtin_unreachable use for MSVC/CUDA — crtrott / githubweb
- Fix missing include on msvc/cuda — crtrott / githubweb
- Avoid lambdas in constexpr branch for msvc/cuda — crtrott / githubweb
- Sidestep lacking CTAD support msvc/cuda — crtrott / githubweb
- Fix formatting — crtrott / githubweb
- Move header for Damien because he is right — crtrott / githubweb
- Unit test for issue 3371 (negative vector length should not yield a negative max_team_size) (#6076) — noreply / githubweb
- Add CMakeLists.txt for stream benchmark — cwpears / githubweb
- Do not negate the dependent true traits helper — Damien L-G / githubweb
- Drop guards to accommodate external code defining KOKKOS_ASSERT — Damien L-G / githubweb
- Use omp_get_max_active_levels() when supported — Daniel Arndt / githubweb
- Add missing gfx940 — rberger / githubweb
- Add Impl::always_false type-dependent false trait — Damien L-G / githubweb
- Per review prefer always_false<Arg>::value to is_void_v<Arg> — Damien L-G / githubweb
- Improve "no copy mechanism" exception message — bmkelle / githubweb
- Add a unit test for new deep_copy exception msg — bmkelle / githubweb
- Add missing include sstream — bmkelle / githubweb
- src->source, dst->destination — bmkelle / githubweb
- Workaround for ROCm 6.0 failing to compile with AVX2 SIMD support — Bruno Turcksin / githubweb
- SYCL: Force inlining of Kokkos::printf (#6650) — noreply / githubweb
- Improve handling of printf in OMPT on Intel GPUs — Daniel Arndt / githubweb
- OpenMP: Use `omp_get_nested` for older gcc versions (#6685) — noreply / githubweb
- Disable more Bessel tests for SYCL on Intel GPUs — Daniel Arndt / githubweb
- fill_random without execution space instance should fence — Daniel Arndt / githubweb
- Drop unnecessary guarding for a tool library being loaded in ProfilingSection — Damien L-G / githubweb
- Drop unnecessary header include in Kokkos_Profiling_ProfileSection.hpp — Damien L-G / githubweb
- #5333: CUDA: Use scratch space appropriate to small reduction elements in Team reductions (#5334) — noreply / githubweb
- Cuda: Allocate using the correct device (#6392) — noreply / githubweb
- Let `Profiling::ProfilingSection(std::string)` constructor be explicit and nodiscard (#6690) — noreply / githubweb
- Cosmetic changes to ProfilingSection — Damien L-G / githubweb
- GitHub CI: Test with AddressSanitizer (#6676) — noreply / githubweb
- Kokkos::Array deduction guide (#6373) — noreply / githubweb
- Add CI for MSVC+Cuda (#6661) — noreply / githubweb
- SYCL: Address deprecations after oneAPI 2023.2.0 (#6577) — noreply / githubweb
- Fixup cast tolerance to double before printing — Damien L-G / githubweb
- Try linking against CUDA libraries even with KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE — Daniel Arndt / githubweb
- test_array_ctad: disable test for intel versions < 2021 — ndellin / githubweb
- Enable `{transform_}exclusive_scan` in place (#6667) — noreply / githubweb
- Add `ATOMICS_BYPASS` configuration option to disable atomics (#6692) — noreply / githubweb
- Check matching static extents in View constructor (#5190) — noreply / githubweb
- Remove Kokkos::[b]half_t volatile overloads (#6579) — noreply / githubweb
- add tests — fnrizzi / githubweb
- Provide `kokkos_swap` as part of Core and deprecate `Experimental::swap` in Algorithms (#6697) — noreply / githubweb
- Provide new public headers `<Kokkos_Clamp.hpp>` and `<Kokkos_MinMax.hpp>` (#6687) — noreply / githubweb
- Fix TeamThreadMDRange parallel_reduce (#6511) — noreply / githubweb
- add tests for in-place `inclusive_scan` (#6682) — noreply / githubweb
- Drop pointless Kokkos::Impl::CudaExec forward declaration — Damien L-G / githubweb
- Don't use the compiler launcher script if the compile language is CUDA. (#6704) — noreply / githubweb
- Deprecate `{Cuda,HIP}::detect_device_count()` and `Cuda::[detect_]device_arch()` (#6710) — noreply / githubweb
- Get rid of CudaInternal::cuda_get_error_{name,string}_wrapper — Damien L-G / githubweb
- No need to jump through so many hoops to print the error message — Damien L-G / githubweb
- HIP: Forgot to delete matching brace closing the namespace — Damien L-G / githubweb
- Make initialize and finalize of the Cuda/HIP singleton less special (#6714) — noreply / githubweb
- Kokkos_HIP.cpp: include Kokkos_Core.hpp to resolve errors — ndellin / githubweb
- Add bound checks in RangePolicy and MDRangePolicy (#6617) — noreply / githubweb
- Temporary fix to reenable HIP CI — Bruno Turcksin / githubweb
- Let the smart pointer manage the CUDA/HIP stream (#6721) — noreply / githubweb
- Fix Docker env variables — Bruno Turcksin / githubweb
- Ensure view_allocation_error does not silently ignore that no exception was thrown — Damien L-G / githubweb
- Add RawMemoryAllocationFailure::AllocationMechanism::OpenACCMalloc enumerator — Damien L-G / githubweb
- [OpenACC] throw if acc_malloc returned nullptr — Damien L-G / githubweb
- Fixup using declaration — Damien L-G / githubweb
- Disable openacc.view_allocation_error test — Damien L-G / githubweb
- Guard `[MD]RangePolicy` precondition check for deprecated code 4 (#6726) — noreply / githubweb
- Add C++26 standard to CMake Setup — dev / githubweb
- Add support for C++26 in generated makefiles — Damien L-G / githubweb
- Add KOKKOS_ENABLE_CXX26 to the configuration metadata — Damien L-G / githubweb
- Reenable HIP testing — Bruno Turcksin / githubweb
- Disabling failing HIP test in the CI — Bruno Turcksin / githubweb
- Use team_size_max to fix "Team size too large" error in reducer test (#6725) — noreply / githubweb
- Don't touch my records! (refactor Cuda/HIP/SYCL/Threads to not directly mess with `SharedAllocationRecord`) (#6732) — noreply / githubweb
- Untangle SharedAllocationRecord spaghetti code — Damien L-G / githubweb
- Fix TestThreadVectorMDRangeParallelReduce (#6734) — noreply / githubweb
- Cuda multi-GPU support: Allow execution space instance constructor to run (#6706) — noreply / githubweb
- Drop support for deprecated command-line arguments and environment variables (#6744) — noreply / githubweb
- Avoid unnecessary zero-memset of the scratch flags in SYCL (#6739) — noreply / githubweb
#3 (Jan 25, 2024, 10:41:36 AM)
- implementation and tests — fnrizzi / githubweb
- implementation and tests — fnrizzi / githubweb
- only compute with relevant entries — tccleve / githubweb
- subset of team level impl of std algorithms — fnrizzi / githubweb
- fix copyright — fnrizzi / githubweb
- guard for openmptarget — fnrizzi / githubweb
- fix for openmptarget — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- SIMD: suppress an uninitialized variable warning (#6294) — noreply / githubweb
- OpenACC: CMake change for Clacc (#6250) — noreply / githubweb
- `Kokkos::sort` support custom comparator (#6253) — noreply / githubweb
- Add nightly build using latest gcc and c++23 — Bruno Turcksin / githubweb
- Explicitly capture this in lambda function — Bruno Turcksin / githubweb
- Fix typo in nightly jenkins configuration — Bruno Turcksin / githubweb
- Fix a memory bug in the `free_state` function of random pools. (#6290) — noreply / githubweb
- std_algos: for_each: try condense the impl — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Enforce create_mirror restrictions w.r.t. ViewCtorArgs on all variants — Daniel Arndt / githubweb
- Use KOKKOS_IF_ON_HOST — Daniel Arndt / githubweb
- Suppress warnings — Daniel Arndt / githubweb
- Don't suppress warnings for NVHPC — Daniel Arndt / githubweb
- Improve macro definitions — Daniel Arndt / githubweb
- Enable Serial backend in HPX build — cezary.skrzynski / githubweb
- Modify fences in View API test — cezary.skrzynski / githubweb
- Check for overflow during backend initialization (Cuda, HIP, SYCL) (#6159) — noreply / githubweb
- Improve SYCL reduction performance: MDRangePolicy (#6271) — noreply / githubweb
- bring back previous code as discussed in meeting — fnrizzi / githubweb
- create cudaAPI function wrappers — tccleve / githubweb
- Reorganize #include <cuda_runtime_api.h> — tccleve / githubweb
- Some api function require cuda11.2+ — tccleve / githubweb
- Cuda10 requires "stream=nullptr" as default arg — tccleve / githubweb
- Rework stream inputs — tccleve / githubweb
- Use "if constexpr" for setCudaDevice — tccleve / githubweb
- Remove static in comment — tccleve / githubweb
- Update CI: use CUDA 11.6.2 instead of 11.6.0 (#6314) — noreply / githubweb
- add helper variable templates `are_*_iterators_v` (#6312) — noreply / githubweb
- [HIP] Optimize parallel_reduce (#6229) — noreply / githubweb
- Split Kokkos_SYCL_Parallel_Range.hpp — Daniel Arndt / githubweb
- Split Kokkos_SYCL_Parallel_Reduce.hpp — Daniel Arndt / githubweb
- Move Kokkos_SYCL_Scan.hpp — Daniel Arndt / githubweb
- Split Kokkos_Parallel_Team.hpp — Daniel Arndt / githubweb
- Add comment that we don't need to link against clang_rt with ROCm 5.5 — Bruno Turcksin / githubweb
- Error out when CXX standard is not set when using amdclang or cray clang — Bruno Turcksin / githubweb
- Disable failing tests for ROCm 5.5 and 5.6 — Bruno Turcksin / githubweb
- #5635: Add parallel_scan overload with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Serial/OpenMP: Parallel_scan with return value for TeamThreadRange (#6090) — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Allow using the SYCL execution space on AMD GPUs (#6321) — noreply / githubweb
- Update Clang+Cuda CI from 10.1 to 11.0.3 (#6318) — noreply / githubweb
- SIMD: add shift ops for all int types (#6109) — noreply / githubweb
- SYCL: Use sycl::bit_cast from oneAPI 2024.0.0 on (#6300) — noreply / githubweb
- remove spurious undefs — fnrizzi / githubweb
- Decrease maximum memory available to ccache — Bruno Turcksin / githubweb
- Fix gtest when using C++20 — Bruno Turcksin / githubweb
- Fully qualify Experimental::SYCL in algorithms to avoid conflicting namespaces — Daniel Arndt / githubweb
- address reviews [skip ci] — antoine.meyer54 / githubweb
- formatting — fnrizzi / githubweb
- SIMD: split simd unit tests into separate files (#6278) — noreply / githubweb
- Run NVHPC only on V100 — Bruno Turcksin / githubweb
- Use checked arithmetic builtins for overflow detection (#6313) — noreply / githubweb
- Passing size==0 to DeepCopy/memcpy/omp_target_memcpy (#6273) — noreply / githubweb
- allow sorting via native oneDPL to support views with stride = 1 (#6322) — noreply / githubweb
- Adopt new HIP cmake's way of finding clang-rt — nicurtis / githubweb
- Fix #6334: intel compiler does not like returning inside if constexpr (#6335) — noreply / githubweb
- Fix #6336: capturing `this` implicitly is deprecated in C++20 (#6337) — noreply / githubweb
- fix lambda capture — fnrizzi / githubweb
- remove unnecessary file, fix constraints — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix — fnrizzi / githubweb
- fix lambda capture and constraints — fnrizzi / githubweb
- keep only subset — fnrizzi / githubweb
- revert files — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix syntax — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Disable default oneDPL support in Trilinos — Daniel Arndt / githubweb
- Rename AMD GPU architectures (#6266) — noreply / githubweb
- Fix compiling SIMD library with NEON and gcc-13 — Daniel Arndt / githubweb
- SIMD: add generator constructors (#6347) — noreply / githubweb
- Use /usr/bin/env bash in nvcc_wrapper and kokkos_launch_compiler for portability — mikael.simberg / githubweb
- Fix cudaAPI wrapper errors for CUDA_MALLOC_ASYNC (#6346) — noreply / githubweb
- Use std::aligned_alloc for allocations (#6341) — noreply / githubweb
- Initial implementation of gfx942 (#6358) — noreply / githubweb
- Introduce constructor for multi-GPU support. — Daniel Arndt / githubweb
- Only set KOKKOS_ARCH_AMD_GPU if an AMD GPU architecture is enabled — Daniel Arndt / githubweb
- Extend 'hip_driver_check_page_migration' (#6364) — noreply / githubweb
- Replace volatile m_ready_count loads by desul::atomic_loads in task queue implementation — mikael.simberg / githubweb
- Add Kokkos_Atomic.hpp headers to OpenMP and HPX task headers — mikael.simberg / githubweb
- Add support for HIP Graph — Bruno Turcksin / githubweb
- Replace one of the ROCm 5.2 configurations with ROCm 5.6 — Bruno Turcksin / githubweb
- Use aligned version of operator new instead of aligned_alloc — Daniel Arndt / githubweb
- HPX: Don't interfere with exception handling — Daniel Arndt / githubweb
- Fix -Wformat-truncation warnings in CI (#6354) — noreply / githubweb
- Do not use HIP Graph with ROCm 5.2 — Bruno Turcksin / githubweb
- Use constexpr West in src — Bruno Turcksin / githubweb
- Use constexpr West in test — Bruno Turcksin / githubweb
- SIMD: convert binary operators to hidden friends (#6320) — noreply / githubweb
- simd: make mask and condition unit test to check with all data types (#6360) — noreply / githubweb
- team-level std algos: part 2 (#6205) — noreply / githubweb
- Add deprecated attribute to HostSpace(AllocationMechanism) definition — Damien L-G / githubweb
- Added gather_from and scatter_to for AVX2 and AVX512 simd — donlee / githubweb
- Rebased and applied feedbacks — donlee / githubweb
- Changed gen ctor usage to pass in KOKKOS_LAMBDA — donlee / githubweb
- Suppress NVCC attribute ignored warning on SIMD operators as hidden friends — Damien L-G / githubweb
- Remove deprecated code 3 support for volatile join — crtrott / githubweb
- Disable a test not working with nvhpc-23.1 — crtrott / githubweb
- Reenabling tests for nvhpc 23.7 — crtrott / githubweb
- Update containers and algorithms for NVC++ 23.7 — crtrott / githubweb
- More NVC++ 23.7 updates — crtrott / githubweb
- NVC++ clang-format fixes — crtrott / githubweb
- Update nvhpc to version 23.7 in the CI — crtrott / githubweb
- NVHPC 23.7 update: address reviewer comments — crtrott / githubweb
- OpenACC: Guard tests relying on abort — crtrott / githubweb
- Fix TestAtomic to use the test execspace — crtrott / githubweb
- Use NVHPC 23.7 for testing of OpenACC — crtrott / githubweb
- Update nvhpc gtest skip message — crtrott / githubweb
- Work around OpenMPTarget failure — crtrott / githubweb
- Update base docker file for nvhpc — crtrott / githubweb
- Remove stray Cuda graph pattern specialization from tag — Damien L-G / githubweb
- Fix typo — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Fix uninitialized variable warning with gcc 13 — Bruno Turcksin / githubweb
- HIP: Update print_configuration (#6387) — noreply / githubweb
- Add test — Daniel Arndt / githubweb
- Fix typo. — noreply / githubweb
- Check AVX, AVX2, AVX512XEON compiler macros on KokkosCore_config.h (#6248) — noreply / githubweb
- Don't use local headers or runtime in HPX backend due to deprecation — mikael.simberg / githubweb
- Explicitly check for valid device id — Daniel Arndt / githubweb
- Add a more comprehensive `kokkos_{malloc, free}` perf_test (#6377) — noreply / githubweb
- team-level std algos: part 3 (#6207) — noreply / githubweb
- Adding is_scoped_enum & to_underlying (#6356) — noreply / githubweb
- SIMD: add float simd support (#6177) — noreply / githubweb
- team-level std algos: part 4 (#6208) — noreply / githubweb
- Added a gen ctor for float (#6397) — noreply / githubweb
- team-level std algos: part 5 (#6209) — noreply / githubweb
- Deprecate Cuda(cudaStream_t, bool) — Damien L-G / githubweb
- Fixup checked integer operations death test — Damien L-G / githubweb
- Deprecate HIP(hipStream_t, bool) — Damien L-G / githubweb
- Let Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC be ON by default — Damien L-G / githubweb
- Print whether KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC is defined — Damien L-G / githubweb
- Introduce disable_malloc_async Cuda option with generated makefiles — Damien L-G / githubweb
- Preserve one build that disables Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC — Damien L-G / githubweb
- Use archive extraction time for timestamps — cezary.skrzynski / githubweb
- Disable performance benchmarks in AppVeyor CI — cezary.skrzynski / githubweb
- team-level std algos: part 6 (#6210) — noreply / githubweb
- address comments — fnrizzi / githubweb
- OpenMP backend refactor files. (#6403) — noreply / githubweb
- Drop (unused) `Cuda::cuda_internal_maximum_shared_words` — Damien L-G / githubweb
- Drop check that the host backend is initialized before the Cuda/HIP/SYCL one — Damien L-G / githubweb
- Drop unused HIPInternal::m_maxSharedWords data member — Damien L-G / githubweb
- Drop unused HIPInternal::m_hipArch static data member — Damien L-G / githubweb
- !initialized() should be a precondition for calling {Cuda,HIP,SYCL}Internal::initialize — Damien L-G / githubweb
- Drop pre-Kepler logic in Cuda::impl_initialize — Damien L-G / githubweb
- use single — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- formatting — fnrizzi / githubweb
- Team-level std algos: part 7 (#6211) — noreply / githubweb
- formatting — fnrizzi / githubweb
- Enable death tests for fedora rawhide — cezary.skrzynski / githubweb
- core/src: Add half math functions to private header (#6124) — noreply / githubweb
- Drop check whether device supports unified addressing — Damien L-G / githubweb
- fix single as per Christian's suggestion — fnrizzi / githubweb
- Only warn once (at initialization) when forcing allocation in unified memory — Damien L-G / githubweb
- check-copyright improvements (#6399) — noreply / githubweb
- Use execution space instance argument to get device properties in block size deduction — Damien L-G / githubweb
- Address reviewer's comments — Bruno Turcksin / githubweb
- Fix to avoid #186-D pointless comparison warning. — maarten.arnst / githubweb
- add comment — fnrizzi / githubweb
- improve tests to address review — fnrizzi / githubweb
- Fix guard for isnan test for bhalf_t — Daniel Arndt / githubweb
- avoid potential race condition HIP — tccleve / githubweb
- Set the device id in cuda_kernel_arch — Daniel Arndt / githubweb
- [SYCL][Reduction] Group counter should use at least memory_order::acq_rel — andrei.elovikov / githubweb
- Same for scan — andrei.elovikov / githubweb
- Initialize m_num_scratch_locks for Cuda parallel_for TeamPolicy — Daniel Arndt / githubweb
- improve tests with intra-team result check — fnrizzi / githubweb
- Fixes for Kokkos::Array (#6372) — noreply / githubweb
- try fix for unique, previous impl to remove later — fnrizzi / githubweb
- #5635: Add parallel_scan changes for CUDA and TeamThreadRange — cezary.skrzynski / githubweb
- remove old impl — fnrizzi / githubweb
- #5635: Enable TeamThreadRange test for CUDA — cezary.skrzynski / githubweb
- Clean up benchmarks/gups — cwpears / githubweb
- benchmark/gups: use CMake — cwpears / githubweb
- OpenMPTarget: Disable check for SIMD compiler macros — Daniel Arndt / githubweb
- #5635: Add parallel_scan with value for CUDA and ThreadVectorRange — cezary.skrzynski / githubweb
- add missing assert — fnrizzi / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- add intra team check for missing test — fnrizzi / githubweb
- fix intel compile error — fnrizzi / githubweb
- fix unreachable for intel — fnrizzi / githubweb
- re-enable unit tests for sort and random via makefile (#6422) — noreply / githubweb
- OpenMPTarget init-join fix (#6444) — noreply / githubweb
- Fix Cuda parallel_scan ThreadVectorRange range — Daniel Arndt / githubweb
- Assign final sum in Cuda parallel_scan ThreadVectorRange — Daniel Arndt / githubweb
- Fix compiling code using Kokkos::printf for OpenMPTarget on Intel GPUs (#6443) — noreply / githubweb
- std_algos: improving min, max, minmax (#6421) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 2/7) (#6426) — noreply / githubweb
- Skip bessel function tests known to fail on Intel GPUs (#6434) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 6/7) (#6436) — noreply / githubweb
- Fix race condition in functor_vec_scan_ret_val test — Daniel Arndt / githubweb
- Fix parallel_scan_with_reducers test — Daniel Arndt / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 3/7) (#6425) — noreply / githubweb
- improve tests (#6432) — noreply / githubweb
- improve tests (#6437) — noreply / githubweb
- Move final assignment to correct scope — cezary.skrzynski / githubweb
- fix casting warning in Random test — fnrizzi / githubweb
- Workaround for ROCm 5.6+ failing to compile with AVX2 SIMD support (#6449) — noreply / githubweb
- HIP: Restrict AVX2 workaround to ROCm 5.6 and 5.7 — Daniel Arndt / githubweb
- fixes build error for TeamReduce and TeamTransformReduced tests for specific GCC (#6459) — noreply / githubweb
- improve tests to check intra-team result (#6431) — noreply / githubweb
- SIMD: Math functions should be in namespace Kokkos — Daniel Arndt / githubweb
- SYCL: Disable another bessel function test for Intel GPUs — Daniel Arndt / githubweb
- Check for default device — Daniel Arndt / githubweb
- team-level std algos: part 10 (#6256) — noreply / githubweb
- team-level std algos: part 11 (#6258) — noreply / githubweb
- #5635: HIP: Add Overloads for parallel_scan with return value for TeamThreadRange (#6302) — noreply / githubweb
- #5635: Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- use shortcut — cezary.skrzynski / githubweb
- Skip testing for non-power-of-two team sizes — cezary.skrzynski / githubweb
- Fix formatting — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for HIP backend — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- #5635: SYCL: Add parallel_scan overload with return value — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- OpenMP: Fix TeamThreadRange parallel_scan with return value for team_size > 1 — Daniel Arndt / githubweb
- Add compatible copy assignment operator to DualView — maarten.arnst / githubweb
- Fix minimum version for Google benchmark — Daniel Arndt / githubweb
- Add test of copy constructor/assignment operator for DualView. — maarten.arnst / githubweb
- Compute concurrency on HIP using Kokkos hardcoded m_maxWavesPerCU — maarten.arnst / githubweb
- Modify test so that source and destination view are of different type — maarten.arnst / githubweb
- Use call operator instead of run_me function — maarten.arnst / githubweb
- team-level std algos: part 12 (#6350) — noreply / githubweb
- core/src: Add half single and double mixed compare (LT,GT,LE,GE) (#6407) — noreply / githubweb
- Moving abort and assert into their own public headers (#6445) — noreply / githubweb
- Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- Allow detecting SIMD types based on compiler macros (#6188) — noreply / githubweb
- Add KOKKOS_ARCH_ARM_NEON — Daniel Arndt / githubweb
- Fix implementation for cyl_bessel_i0 — Daniel Arndt / githubweb
- cleaning: remove iostream from headers where possible (IWYU) — romin.tomasetti / githubweb
- Fix compiling SIMD unit tests on NVIDIA — Daniel Arndt / githubweb
- guards to ensure DBL_EPSILON return for POWER8,9 — ajpowel / githubweb
- Fix NVCC warnings (#6483) — noreply / githubweb
- team-level std algos: part 13 (#6351) — noreply / githubweb
- Also fix annotations for generator constructor for AVX512 and NEON — Daniel Arndt / githubweb
- #5635: SYCL: Add parallel_scan overload with value for ThreadVectorRange — cezary.skrzynski / githubweb
- Fix atomic operations bug for Min and Max (#6435) — noreply / githubweb
- Fix example/build_cmake_installed_different_compiler — Daniel Arndt / githubweb
- fix impl — fnrizzi / githubweb
- Update core/src/HIP/Kokkos_HIP_KernelLaunch.hpp — noreply / githubweb
- Split Kokkos_Threads_Parallel files — Bruno Turcksin / githubweb
- HPX: Implement TeamThread and ThreadVector parallel_scan with return value — Daniel Arndt / githubweb
- Serial: Allow for distinct execution space instances (#6441) — noreply / githubweb
- add overload for TeamThreadRange — fnrizzi / githubweb
- address review comment — fnrizzi / githubweb
- Update to HIP TeamPolicy Block number heuristic (#6284) — noreply / githubweb
- SIMD: Split math functions from SIMD_Common.hpp (#6487) — noreply / githubweb
- Allow NVHPC as device compiler only with Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER=ON — Daniel Arndt / githubweb
- ad threadvector — fnrizzi / githubweb
- fix order — fnrizzi / githubweb
- remove guards — fnrizzi / githubweb
- UnorderedMap(space instance): proposal for #6067 — romin.tomasetti / githubweb
- Rename Kokkos_ThreadsExec to align with the other backends — Bruno Turcksin / githubweb
- Promote Kokkos_Printf.hpp to public include — Daniel Arndt / githubweb
- simd: add floor, ceil, round, trunc operations (#6393) — noreply / githubweb
- Update CI in OpenMPTarget to use llvm-17 (#6472) — noreply / githubweb
- Rename Kokkos_ThreadsTeam.hpp to Kokkos_Threads_Team.hpp — Bruno Turcksin / githubweb
- Kokkos_SIMD_Scalar.hpp: remove extra ';' — ndellin / githubweb
- add guards — fnrizzi / githubweb
- avoid auto — fnrizzi / githubweb
- [ci skip] Update Kokkos version to 4.2.99 — Daniel Arndt / githubweb
- [deprecated code 3] remove all default device init tests — Damien L-G / githubweb
- [deprecated code 3] remove InitArguments — Damien L-G / githubweb
- [deprecated code 3] remove KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macros — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for clamp, min, max, and minmax — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math constants — Damien L-G / githubweb
- [deprecated code 3] remove {OpenMP,HPX}::partition_master — Damien L-G / githubweb
- OpenMP backend cleanup following removal of deprecated code 3 — Damien L-G / githubweb
- Get rid of FIXME_OPENMP — Damien L-G / githubweb
- [deprecated code 3] remove MasterLock — Damien L-G / githubweb
- [deprecated code 3] remove ENABLE_DEPRECATED_CODE_3 option — Damien L-G / githubweb
- fixup! [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- Do not append " - blocks" to the bitset label — Damien L-G / githubweb
- with_updated_label -> append_to_label — Daniel Arndt / githubweb
- SYCL: Use SYCL_EXT_ONEAPI_DEVICE_GLOBAL to detect support for device global variables — Daniel Arndt / githubweb
- Fixup in README (github -> GitHub) — Damien L-G / githubweb
- Check that device associated with stream matches requested device — Daniel Arndt / githubweb
- Threads: replace enum with constexpr int and enum class (#6514) — noreply / githubweb
- Added unit tests for reduction ops and few intel svml intrinsics — donlee / githubweb
- Remove sleep and wake functions — Bruno Turcksin / githubweb
- Remove extra constructor — Daniel Arndt / githubweb
- Prefer defaulted default constructor for Bitset (#6524) — noreply / githubweb
- SYCL: Use host-pinned memory to copy reduction/scan result (#6500) — noreply / githubweb
- Remove unused variables — Bruno Turcksin / githubweb
- Remove Sentinel struct from Threads — Bruno Turcksin / githubweb
- Small cleanup of ThreadsInternal::initialize — Bruno Turcksin / githubweb
- Remove useless forward declaration — Bruno Turcksin / githubweb
- Remove spawn function — Bruno Turcksin / githubweb
- Add comments — Bruno Turcksin / githubweb
- Fix indentation — Bruno Turcksin / githubweb
- Fix typo in macro guard — Bruno Turcksin / githubweb
- Reduce number of View constructor instantiations — Damien L-G / githubweb
- Bump HPX version used in CI to 1.9.0 — mikael.simberg / githubweb
- Split files in HIP backend — Bruno Turcksin / githubweb
- Trim some fat in `CudaInternal` (towards multiple GPUs support) (#6544) — noreply / githubweb
- Only define STDALGO_TEAM_SOURCES_* once — Daniel Arndt / githubweb
- Remove logical memory spaces — Damien L-G / githubweb
- Rollback changes to view constructors to reduce the number of instantiations (#6564) — noreply / githubweb
- Address reviewer comments — Daniel Arndt / githubweb
- Threads remove unused variables and functions (#6566) — noreply / githubweb
- Remove unused Sandia testing files (#6568) — noreply / githubweb
- fallback implementation cleanup — donlee / githubweb
- Remove empty quotation marks for static_assert — Daniel Arndt / githubweb
- [ci skip] Drop unused <impl/Kokkos_Memory_Fence.hpp> header — Damien L-G / githubweb
- Drop Clang+CUDA workaround — Damien L-G / githubweb
- OpenMPTarget: CI compiler upgrade. (#6545) — noreply / githubweb
- m_cudaDev isn't static anymore — Daniel Arndt / githubweb
- Add crtrott's launch_latency benchmark (#6379) — noreply / githubweb
- Simplify fence functions in the Threads backend (#6571) — noreply / githubweb
- simd: temporarily skip device math ops unit test for OpenMPTarget build (#6574) — noreply / githubweb
- Set the device id explicitly for CUDA API calls in impl_initialize — Daniel Arndt / githubweb
- OpenMPTarget: Guard scratch memory usage in ParallelReduce — Daniel Arndt / githubweb
- SYCL: Implement DESUL_ATOMICS_ENABLE_SYCL_SEPARABLE_COMPILATION path (#6534) — noreply / githubweb
- Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header — Damien L-G / githubweb
- Replace Marsaglia polar method with Box-muller to generate a normally distributed random number (#6556) — noreply / githubweb
- OpenMP: No memset in viewfill (#6573) — noreply / githubweb
- Revert "Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header" — noreply / githubweb
- OpenACC: add atomics support (#6446) — noreply / githubweb
- Fix infinity, quiet_NaN, signaling_Nan, isfinite, isnan, isinf for half_t and bhalf_t (#6543) — noreply / githubweb
- kokkos(unique): fix allocation of temporary view to enforce using the provided space instance — romin.tomasetti / githubweb
- Use binary wrapper for consistency in definition of half types numeric traits (#6590) — noreply / githubweb
- try fix — fnrizzi / githubweb
- Fix TestNumericTraits.hpp for SYCL with bfloat16 support — Daniel Arndt / githubweb
- Fix generated Makefile when using gnu_generate_makefile.sh and make >= 4.3 — crtrott / githubweb
- Threads: add missing broadcast to TeamThreadRange parallel_scan (#6601) — noreply / githubweb
- Added missing operator* to NEON simd — crtrott / githubweb
- [ci skip] Update changelog on develop for 4.2.00 (#6592) — noreply / githubweb
- Remove KOKKOS_IMPL_DO_NOT_USE_PRINTF (#6593) — noreply / githubweb
- try fix — fnrizzi / githubweb
- avoid pyt package — fnrizzi / githubweb
- try — fnrizzi / githubweb
- fix for macos — fnrizzi / githubweb
- remove comments — fnrizzi / githubweb
- use reference — crtrott / githubweb
- add branching — fnrizzi / githubweb
- [ci skip] fix formatting — cezary.skrzynski / githubweb
- GitHub Workflows: Use Ubuntu 22.04 instead of Fedora for Intel compiler testing — Daniel Arndt / githubweb
- nvcc(wrapper): adding missing `--generate-line-info` arg — romin.tomasetti / githubweb
- Add clang-format check to GitHub workflows (#6612) — noreply / githubweb
- graph(HIP): adding inline keyword to fix #6623 — romin.tomasetti / githubweb
- Add jenkins multibranch pipeline options — Bruno Turcksin / githubweb
- Add warp sync for Cuda parallel reduce — tccleve / githubweb
- kokkos(profiling): do not finalize in any backend — romin.tomasetti / githubweb
- Replace ubuntu:18.04 with ubuntu:20.04 as base image for clang-format — Bruno Turcksin / githubweb
- update comment to include final() mention — tccleve / githubweb
- Disabling OpenACC in the CI because it emits too many warnings — Bruno Turcksin / githubweb
- unorderedmap: modernize traits — romin.tomasetti / githubweb
- nvcc wrapper: remove troubling flag to fix 6628 (#6629) — noreply / githubweb
- tools(profiling): type (related to kokkos/kokkos-tools/pull/221) — romin.tomasetti / githubweb
- This PR fixes the too-much-OpenACC-warning issue, mentioned in PR #6639. — lees2 / githubweb
- add missing header fix #6644 — fnrizzi / githubweb
- SYCL: Restrict workaround for is_device_copyable to oneAPI versions before 2024.0.0 (#6532) — noreply / githubweb
- Fixup test math functions ulp should double -> int — Damien L-G / githubweb
- Drop DualView converting copy assignment operator — Damien L-G / githubweb
- Don't use rocm-docker for clang-format — Daniel Arndt / githubweb
- Disable HIP CI — Daniel Arndt / githubweb
- Remove deprecation warning for AllocationMechanism for gcc <11.0 — Daniel Arndt / githubweb
- OpenMPTarget: clang extensions for dynamic shared memory. (#6380) — noreply / githubweb
- Fix builtin_unreachable use for MSVC/CUDA — crtrott / githubweb
- Fix missing include on msvc/cuda — crtrott / githubweb
- Avoid lambdas in constexpr branch for msvc/cuda — crtrott / githubweb
- Sidestep lacking CTAD support msvc/cuda — crtrott / githubweb
- Fix formatting — crtrott / githubweb
- Move header for Damien because he is right — crtrott / githubweb
- Unit test for issue 3371 (negative vector length should not yield a negative max_team_size) (#6076) — noreply / githubweb
- Add CMakeLists.txt for stream benchmark — cwpears / githubweb
- Do not negate the dependent true traits helper — Damien L-G / githubweb
- Drop guards to accommodate external code defining KOKKOS_ASSERT — Damien L-G / githubweb
- Use omp_get_max_active_levels() when supported — Daniel Arndt / githubweb
- Add missing gfx940 — rberger / githubweb
- Add Impl::always_false type-dependent false trait — Damien L-G / githubweb
- Per review prefer always_false<Arg>::value to is_void_v<Arg> — Damien L-G / githubweb
- Improve "no copy mechanism" exception message — bmkelle / githubweb
- Add a unit test for new deep_copy exception msg — bmkelle / githubweb
- Add missing include sstream — bmkelle / githubweb
- src->source, dst->destination — bmkelle / githubweb
- Workaround for ROCm 6.0 failing to compile with AVX2 SIMD support — Bruno Turcksin / githubweb
- SYCL: Force inlining of Kokkos::printf (#6650) — noreply / githubweb
- Improve handling of printf in OMPT on Intel GPUs — Daniel Arndt / githubweb
- OpenMP: Use `omp_get_nested` for older gcc versions (#6685) — noreply / githubweb
- Disable more Bessel tests for SYCL on Intel GPUs — Daniel Arndt / githubweb
- fill_random without execution space instance should fence — Daniel Arndt / githubweb
- Drop unnecessary guarding for a tool library being loaded in ProfilingSection — Damien L-G / githubweb
- Drop unnecessary header include in Kokkos_Profiling_ProfileSection.hpp — Damien L-G / githubweb
- #5333: CUDA: Use scratch space appropriate to small reduction elements in Team reductions (#5334) — noreply / githubweb
- Cuda: Allocate using the correct device (#6392) — noreply / githubweb
- Let `Profiling::ProfilingSection(std::string)` constructor be explicit and nodiscard (#6690) — noreply / githubweb
- Cosmetic changes to ProfilingSection — Damien L-G / githubweb
- GitHub CI: Test with AddressSanitizer (#6676) — noreply / githubweb
- Kokkos::Array deduction guide (#6373) — noreply / githubweb
- Add CI for MSVC+Cuda (#6661) — noreply / githubweb
- SYCL: Address deprecations after oneAPI 2023.2.0 (#6577) — noreply / githubweb
- Fixup cast tolerance to double before printing — Damien L-G / githubweb
- Try linking against CUDA libraries even with KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE — Daniel Arndt / githubweb
- test_array_ctad: disable test for intel versions < 2021 — ndellin / githubweb
- Enable `{transform_}exclusive_scan` in place (#6667) — noreply / githubweb
- Add `ATOMICS_BYPASS` configuration option to disable atomics (#6692) — noreply / githubweb
- Check matching static extents in View constructor (#5190) — noreply / githubweb
- Remove Kokkos::[b]half_t volatile overloads (#6579) — noreply / githubweb
- add tests — fnrizzi / githubweb
- Provide `kokkos_swap` as part of Core and deprecate `Experimental::swap` in Algorithms (#6697) — noreply / githubweb
- Provide new public headers `<Kokkos_Clamp.hpp>` and `<Kokkos_MinMax.hpp>` (#6687) — noreply / githubweb
- Fix TeamThreadMDRange parallel_reduce (#6511) — noreply / githubweb
- add tests for in-place `inclusive_scan` (#6682) — noreply / githubweb
- Drop pointless Kokkos::Impl::CudaExec forward declaration — Damien L-G / githubweb
- Don't use the compiler launcher script if the compile language is CUDA. (#6704) — noreply / githubweb
- Deprecate `{Cuda,HIP}::detect_device_count()` and `Cuda::[detect_]device_arch()` (#6710) — noreply / githubweb
- Get rid of CudaInternal::cuda_get_error_{name,string}_wrapper — Damien L-G / githubweb
- No need to jump through so many hoops to print the error message — Damien L-G / githubweb
- HIP: Forgot to delete matching brace closing the namespace — Damien L-G / githubweb
- Make initialize and finalize of the Cuda/HIP singleton less special (#6714) — noreply / githubweb
- Kokkos_HIP.cpp: include Kokkos_Core.hpp to resolve errors — ndellin / githubweb
- Add bound checks in RangePolicy and MDRangePolicy (#6617) — noreply / githubweb
- Temporary fix to reenable HIP CI — Bruno Turcksin / githubweb
- Let the smart pointer manage the CUDA/HIP stream (#6721) — noreply / githubweb
- Fix Docker env variables — Bruno Turcksin / githubweb
- Ensure view_allocation_error does not silently ignore that no exception was thrown — Damien L-G / githubweb
- Add RawMemoryAllocationFailure::AllocationMechanism::OpenACCMalloc enumerator — Damien L-G / githubweb
- [OpenACC] throw if acc_malloc returned nullptr — Damien L-G / githubweb
- Fixup using declaration — Damien L-G / githubweb
- Disable openacc.view_allocation_error test — Damien L-G / githubweb
- Guard `[MD]RangePolicy` precondition check for deprecated code 4 (#6726) — noreply / githubweb
- Add C++26 standard to CMake Setup — dev / githubweb
- Add support for C++26 in generated makefiles — Damien L-G / githubweb
- Add KOKKOS_ENABLE_CXX26 to the configuration metadata — Damien L-G / githubweb
- Reenable HIP testing — Bruno Turcksin / githubweb
- Disabling failing HIP test in the CI — Bruno Turcksin / githubweb
- Use team_size_max to fix "Team size too large" error in reducer test (#6725) — noreply / githubweb
- Don't touch my records! (refactor Cuda/HIP/SYCL/Threads to not directly mess with `SharedAllocationRecord`) (#6732) — noreply / githubweb
- Untangle SharedAllocationRecord spaghetti code — Damien L-G / githubweb
- Fix TestThreadVectorMDRangeParallelReduce (#6734) — noreply / githubweb
- Cuda multi-GPU support: Allow execution space instance constructor to run (#6706) — noreply / githubweb
- Drop support for deprecated command-line arguments and environment variables (#6744) — noreply / githubweb
- Avoid unnecessary zero-memset of the scratch flags in SYCL (#6739) — noreply / githubweb
#3 (Jan 25, 2024, 10:41:36 AM)
- implementation and tests — fnrizzi / githubweb
- implementation and tests — fnrizzi / githubweb
- only compute with relevant entries — tccleve / githubweb
- subset of team level impl of std algorithms — fnrizzi / githubweb
- fix copyright — fnrizzi / githubweb
- guard for openmptarget — fnrizzi / githubweb
- fix for openmptarget — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- SIMD: suppress a uninitialized variable warning (#6294) — noreply / githubweb
- OpenACC CMakechange Clacc (#6250) — noreply / githubweb
- `Kokkos::sort` support custom comparator (#6253) — noreply / githubweb
- Add nightly build using latest gcc and c++23 — Bruno Turcksin / githubweb
- Explicitly capture this in lambda function — Bruno Turcksin / githubweb
- Fix typo in nightly jenkins configuration — Bruno Turcksin / githubweb
- Fix a memory bug in the `free_state` function of random pools. (#6290) — noreply / githubweb
- std_algos: for_each: try condense the impl — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Enforce create_mirror restrictions w.r.t. ViewCtorArgs on all variants — Daniel Arndt / githubweb
- Use KOKKOS_IF_ON_HOST — Daniel Arndt / githubweb
- Suppress warnings — Daniel Arndt / githubweb
- Don't suppress warnings for NVHPC — Daniel Arndt / githubweb
- Improve macro definitions — Daniel Arndt / githubweb
- Enable Serial backend in HPX build — cezary.skrzynski / githubweb
- Modify fences in View API test — cezary.skrzynski / githubweb
- Check for overflow during backend initialization (Cuda, HIP, SYCL) (#6159) — noreply / githubweb
- Improve SYCL reduction performance: MDRangePolicy (#6271) — noreply / githubweb
- bring back previous code as discussed in meeting — fnrizzi / githubweb
- create cudaAPI function wrappers — tccleve / githubweb
- Reorganize #include <cuda_runtime_api.h> — tccleve / githubweb
- Some api function require cuda11.2+ — tccleve / githubweb
- Cuda10 requires "stream=nullptr" as default arg — tccleve / githubweb
- Rework stream inputs — tccleve / githubweb
- Use "if constexpr" for setCudaDevice — tccleve / githubweb
- Remove static in comment — tccleve / githubweb
- Update CI: use CUDA 11.6.2 instead of 11.6.0 (#6314) — noreply / githubweb
- add helper variable templates `are_*_iterators_v` (#6312) — noreply / githubweb
- [HIP] Optimize parallel_reduce (#6229) — noreply / githubweb
- Split Kokkos_SYCL_Parallel_Range.hpp — Daniel Arndt / githubweb
- Split Kokkos_SYCL_Parallel_Reduce.hpp — Daniel Arndt / githubweb
- Move Kokkos_SYCL_Scan.hpp — Daniel Arndt / githubweb
- Split Kokkos_Parallel_Team.hpp — Daniel Arndt / githubweb
- Add comment that we don't need to link against clang_rt with ROCm 5.5 — Bruno Turcksin / githubweb
- Error out when CXX standard is not set when using amdclang or cray clang — Bruno Turcksin / githubweb
- Disable failing tests for ROCm 5.5 and 5.6 — Bruno Turcksin / githubweb
- #5635: Add parallel_scan overload with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Serial/OpenMP: Parallel_scan with return value for TeamThreadRange (#6090) — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Allow using the SYCL execution space on AMD GPUs (#6321) — noreply / githubweb
- Update Clang+Cuda CI from 10.1 to 11.0.3 (#6318) — noreply / githubweb
- SIMD: add shift ops for all int types (#6109) — noreply / githubweb
- SYCL: Use sycl::bit_cast from oneAPI 2024.0.0 on (#6300) — noreply / githubweb
- remove spurious undefs — fnrizzi / githubweb
- Decrease maximum memory available to ccache — Bruno Turcksin / githubweb
- Fix gtest when using C++20 — Bruno Turcksin / githubweb
- Fully qualify Experimental::SYCL in algorithms to avoid conflicting namespaces — Daniel Arndt / githubweb
- address reviews [skip ci] — antoine.meyer54 / githubweb
- formatting — fnrizzi / githubweb
- SIMD: split simd unit tests into separate files (#6278) — noreply / githubweb
- Run NVHPC only on V100 — Bruno Turcksin / githubweb
- Use checked arithmetic builtins for overflow detection (#6313) — noreply / githubweb
- Passing size==0 to DeepCopy/memcpy/omp_target_memcpy (#6273) — noreply / githubweb
- allow sorting via native oneDPL to support views with stride = 1 (#6322) — noreply / githubweb
- Adopt new HIP cmake's way of finding clang-rt — nicurtis / githubweb
- Fix #6334: intel compiler does not like returning inside if constexpr (#6335) — noreply / githubweb
- Fix #6336: capturing `this` implicitly is deprecated in C++20 (#6337) — noreply / githubweb
- fix lambda capture — fnrizzi / githubweb
- remove unnecessary file, fix constraints — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix — fnrizzi / githubweb
- fix lambda capture and constraints — fnrizzi / githubweb
- keep only subset — fnrizzi / githubweb
- revert files — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix syntax — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Disable default oneDPL support in Trilinos — Daniel Arndt / githubweb
- Rename AMD GPU architectures (#6266) — noreply / githubweb
- Fix compiling SIMD library with NEON and gcc-13 — Daniel Arndt / githubweb
- SIMD: add generator constructors (#6347) — noreply / githubweb
- Use /usr/bin/env bash in nvcc_wrapper and kokkos_launch_compiler for portability — mikael.simberg / githubweb
- Fix cudaAPI wrapper errors for CUDA_MALLOC_ASYNC (#6346) — noreply / githubweb
- Use std::aligned_alloc for allocations (#6341) — noreply / githubweb
- Initial implementation of gfx942 (#6358) — noreply / githubweb
- Introduce constructor for multi-GPU support. — Daniel Arndt / githubweb
- Only set KOKKOS_ARCH_AMD_GPU if a AMD GPU architecture is enabled — Daniel Arndt / githubweb
- Extend 'hip_driver_check_page_migration' (#6364) — noreply / githubweb
- Replace volatile m_ready_count loads by desul::atomic_loads in task queue implementation — mikael.simberg / githubweb
- Add Kokkos_Atomic.hpp headers to OpenMP and HPX task headers — mikael.simberg / githubweb
- Add support for HIP Graph — Bruno Turcksin / githubweb
- Replace one of the ROCM 5.2 configuration with ROCM 5.6 — Bruno Turcksin / githubweb
- Use aligned version of operator new instead of aligned_alloc — Daniel Arndt / githubweb
- HPX: Don't interfere with exception handling — Daniel Arndt / githubweb
- Fix -Wformat-truncation warnings in CI (#6354) — noreply / githubweb
- Do not use HIP Graph with ROCm 5.2 — Bruno Turcksin / githubweb
- Use constexpr West in src — Bruno Turcksin / githubweb
- Use constexpr West in test — Bruno Turcksin / githubweb
- SIMD: convert binary operators to hidden friends (#6320) — noreply / githubweb
- simd: make mask and condition unit test to check with all data types (#6360) — noreply / githubweb
- team-level std algos: part 2 (#6205) — noreply / githubweb
- Add deprecated attribute to HostSpace(AllocationMechanism) definition — Damien L-G / githubweb
- Added gather_from and scatter_to for AVX2 and AVX512 simd — donlee / githubweb
- Rebased and applied feedbacks — donlee / githubweb
- Changed gen ctor usage to pass in KOKKOS_LAMBDA — donlee / githubweb
- Suppress NVCC attribute ignored warning on SIMD operators as hidden friends — Damien L-G / githubweb
- Remove deprecated code 3 support for volatile join — crtrott / githubweb
- Disable a test not working with nvhpc-23.1 — crtrott / githubweb
- Reenabling tests for nvhpc 23.7 — crtrott / githubweb
- Update containers and algorithms for NVC++ 23.7 — crtrott / githubweb
- More NVC++ 23.7 updates — crtrott / githubweb
- NVC++ clang-format fixes — crtrott / githubweb
- Update nvhpc to version 23.7 in the CI — crtrott / githubweb
- NVHPC 23.7 update: address reviewer comments — crtrott / githubweb
- OpenACC: Guard tests relying on abort — crtrott / githubweb
- Fix TestAtomic to use the test execspace — crtrott / githubweb
- Use NVHPC 23.7 for testing of OpenACC — crtrott / githubweb
- Update nvhpc gtest skip message — crtrott / githubweb
- Work around OpenMPTarget failure — crtrott / githubweb
- Update base docker file for nvhpc — crtrott / githubweb
- Remove stray Cuda graph pattern specialization from tag — Damien L-G / githubweb
- Fix typo — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Fix uninitialized variable warning with gcc 13 — Bruno Turcksin / githubweb
- HIP: Update print_configuration (#6387) — noreply / githubweb
- Add test — Daniel Arndt / githubweb
- Fix typo. — noreply / githubweb
- Check AVX, AVX2, AVX512XEON compiler macros on KokkosCore_config.h (#6248) — noreply / githubweb
- Don't use local headers or runtime in HPX backend due to deprecation — mikael.simberg / githubweb
- Explicitly check for valid device id — Daniel Arndt / githubweb
- Add a more comprehensive `kokkos_{malloc, free}` perf_test (#6377) — noreply / githubweb
- team-level std algos: part 3 (#6207) — noreply / githubweb
- Adding is_scoped_enum & to_underlying (#6356) — noreply / githubweb
- SIMD: add float simd support (#6177) — noreply / githubweb
- team-level std algos: part 4 (#6208) — noreply / githubweb
- Added a gen ctor for float (#6397) — noreply / githubweb
- team-level std algos: part 5 (#6209) — noreply / githubweb
- Deprecate Cuda(cudaStream_t, bool) — Damien L-G / githubweb
- Fixup checked integer operations death test — Damien L-G / githubweb
- Deprecate HIP(hipStream_t, bool) — Damien L-G / githubweb
- Let Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC be ON by default — Damien L-G / githubweb
- Print whether KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC is defined — Damien L-G / githubweb
- Introduce disable_malloc_async Cuda option with generated makefiles — Damien L-G / githubweb
- Preserve one build that disables Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC — Damien L-G / githubweb
- Use archive extraction time for timestamps — cezary.skrzynski / githubweb
- Disable performance benchmarks in AppVeyor CI — cezary.skrzynski / githubweb
- team-level std algos: part 6 (#6210) — noreply / githubweb
- address comments — fnrizzi / githubweb
- OpenMP backend refactor files. (#6403) — noreply / githubweb
- Drop (unused) `Cuda::cuda_internal_maximum_shared_words` — Damien L-G / githubweb
- Drop check that the host backend is initialized before the Cuda/HIP/SYCL one — Damien L-G / githubweb
- Drop unused HIPInternal::m_maxSharedWords data member — Damien L-G / githubweb
- Drop unused HIPInternal::m_hipArch static data member — Damien L-G / githubweb
- !initialized() should be a precondition for calling {Cuda,HIP,SYCL}Internal::initialize — Damien L-G / githubweb
- Drop pre-Kepler logic in Cuda::impl_initialize — Damien L-G / githubweb
- use single — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- formatting — fnrizzi / githubweb
- Team-level std algos: part 7 (#6211) — noreply / githubweb
- formatting — fnrizzi / githubweb
- Enable death tests for fedora rawhide — cezary.skrzynski / githubweb
- core/src: Add half math functions to private header (#6124) — noreply / githubweb
- Drop check whether device supports unified addressing — Damien L-G / githubweb
- fix single as per Christian's suggestion — fnrizzi / githubweb
- Only warn once (at initialization) when forcing allocation in unified memory — Damien L-G / githubweb
- check-copyright improvements (#6399) — noreply / githubweb
- Use execution space instance argument to get device properties in block size deduction — Damien L-G / githubweb
- Address reviewer's comments — Bruno Turcksin / githubweb
- Fix to avoid #186-D pointless comparison warning. — maarten.arnst / githubweb
- add comment — fnrizzi / githubweb
- improve tests to address review — fnrizzi / githubweb
- Fix guard for isnan test for bhalf_t — Daniel Arndt / githubweb
- avoid potential race condition HIP — tccleve / githubweb
- Set the device id in cuda_kernel_arch — Daniel Arndt / githubweb
- [SYCL][Reduction] Group counter should use at least memory_order::acq_rel — andrei.elovikov / githubweb
- Same for scan — andrei.elovikov / githubweb
- Initialize m_num_scratch_locks for Cuda parallel_for TeamPolicy — Daniel Arndt / githubweb
- improve tests with intra-team result check — fnrizzi / githubweb
- Fixes for Kokkos::Array (#6372) — noreply / githubweb
- try fix for unique, previous impl to remove later — fnrizzi / githubweb
- #5635: Add parallel_scan changes for CUDA and TeamThreadRange — cezary.skrzynski / githubweb
- remove old impl — fnrizzi / githubweb
- #5635: Enable TeamThreadRange test for CUDA — cezary.skrzynski / githubweb
- Clean up benchmarks/gups — cwpears / githubweb
- benchmark/gups: use CMake — cwpears / githubweb
- OpenMPTarget: Disable check for SIMD compiler macros — Daniel Arndt / githubweb
- #5635: Add parallel_scan with value for CUDA and ThreadVectorRange — cezary.skrzynski / githubweb
- add missing assert — fnrizzi / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- add intra team check for missing test — fnrizzi / githubweb
- fix intel compile error — fnrizzi / githubweb
- fix unreachable for intel — fnrizzi / githubweb
- re-enable unit tests for sort and random via makefile (#6422) — noreply / githubweb
- OpenMPTarget init-join fix (#6444) — noreply / githubweb
- Fix Cuda parallel_scan ThreadVectorRange range — Daniel Arndt / githubweb
- Assign final sum in Cuda parallel_scan ThreadVectorRange — Daniel Arndt / githubweb
- Fix compiling code using Kokkos::printf for OpenMPTarget on Intel GPUs (#6443) — noreply / githubweb
- std_algos: improving min, max, minmax (#6421) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 2/7) (#6426) — noreply / githubweb
- Skip bessel function tests known to fail on Intel GPUs (#6434) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 6/7) (#6436) — noreply / githubweb
- Fix race condition in functor_vec_scan_ret_val test — Daniel Arndt / githubweb
- Fix parallel_scan_with_reducers test — Daniel Arndt / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 3/7) (#6425) — noreply / githubweb
- improve tests (#6432) — noreply / githubweb
- improve tests (#6437) — noreply / githubweb
- Move final assignment to correct scope — cezary.skrzynski / githubweb
- fix casting warning in Random test — fnrizzi / githubweb
- Workaround for ROCm 5.6+ failing to compile with AVX2 SIMD support (#6449) — noreply / githubweb
- HIP: Restrict AVX2 workaround to ROCm 5.6 and 5.7 — Daniel Arndt / githubweb
- fixes build error for TeamReduce and TeamTranformReduced tests for specific GCC (#6459) — noreply / githubweb
- improve tests to check intra-team result (#6431) — noreply / githubweb
- SIMD: Math functions should be in namespace Kokkos — Daniel Arndt / githubweb
- SYCL: Disable another bessel function test for Intel GPUs — Daniel Arndt / githubweb
- Check for default device — Daniel Arndt / githubweb
- team-level std algos: part 10 (#6256) — noreply / githubweb
- team-level std algos: part 11 (#6258) — noreply / githubweb
- #5635: HIP: Add Overloads for parallel_scan with return value for TeamThreadRange (#6302) — noreply / githubweb
- #5635: Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- use shortcut — cezary.skrzynski / githubweb
- Skip testing for non-power-of-two team sizes — cezary.skrzynski / githubweb
- Fix formatting — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for HIP backend — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- #5635: SYCL: Add parallel_scan overload with return value — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- OpenMP: Fix TeamThreadRange parallel_scan with return value for team_size > 1 — Daniel Arndt / githubweb
- Add compatible copy assignment operator to DualView — maarten.arnst / githubweb
- Fix minimum version for Google benchmark — Daniel Arndt / githubweb
- Add test of copy constructor/assignment operator for DualView. — maarten.arnst / githubweb
- Compute concurrency on HIP using Kokkos hardcoded m_maxWavesPerCU — maarten.arnst / githubweb
- Modify test so that source and destination view are of different type — maarten.arnst / githubweb
- Use call operator instead of run_me function — maarten.arnst / githubweb
- team-level std algos: part 12 (#6350) — noreply / githubweb
- core/src: Add half single and double mixed compare (LT,GT,LE,GE) (#6407) — noreply / githubweb
- Moving abort and assert into their own public headers (#6445) — noreply / githubweb
- Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- Allow detecting SIMD types based on compiler macros (#6188) — noreply / githubweb
- Add KOKKOS_ARCH_ARM_NEON — Daniel Arndt / githubweb
- Fix implementation for cyl_bessel_i0 — Daniel Arndt / githubweb
- cleaning: remove iostream from headers where possible (IWYU) — romin.tomasetti / githubweb
- Fix compiling SIMD unit tests on NVIDIA — Daniel Arndt / githubweb
- guards to ensure DBL_EPSILON return for POWER8,9 — ajpowel / githubweb
- Fix NVCC warnings (#6483) — noreply / githubweb
- team-level std algos: part 13 (#6351) — noreply / githubweb
- Also fix annotations for generator constructor for AVX512 and NEON — Daniel Arndt / githubweb
- #5635: SYCL: Add parallel_scan overload with value for ThreadVectorRange — cezary.skrzynski / githubweb
- Fix atomic operations bug for Min and Max (#6435) — noreply / githubweb
- Fix example/build_cmake_installed_different_compiler — Daniel Arndt / githubweb
- fix impl — fnrizzi / githubweb
- Update core/src/HIP/Kokkos_HIP_KernelLaunch.hpp — noreply / githubweb
- Split Kokkos_Threads_Parallel files — Bruno Turcksin / githubweb
- HPX: Implement TeamThread and ThreadVector parallel_scan with return value — Daniel Arndt / githubweb
- Serial: Allow for distinct execution space instances (#6441) — noreply / githubweb
- add overload for TeamThreadRange — fnrizzi / githubweb
- address review comment — fnrizzi / githubweb
- Update to HIP TeamPolicy Block number heuristic (#6284) — noreply / githubweb
- SIMD: Split math functions from SIMD_Common.hpp (#6487) — noreply / githubweb
- Allow NVHPC as device compiler only with Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER=ON — Daniel Arndt / githubweb
- add threadvector — fnrizzi / githubweb
- fix order — fnrizzi / githubweb
- remove guards — fnrizzi / githubweb
- UnorderedMap(space instance): proposal for #6067 — romin.tomasetti / githubweb
- Rename Kokkos_ThreadsExec to align with the other backends — Bruno Turcksin / githubweb
- Promote Kokkos_Printf.hpp to public include — Daniel Arndt / githubweb
- simd: add floor, ceil, round, trunc operations (#6393) — noreply / githubweb
- Update CI in OpenMPTarget to use llvm-17 (#6472) — noreply / githubweb
- Rename Kokkos_ThreadsTeam.hpp to Kokkos_Threads_Team.hpp — Bruno Turcksin / githubweb
- Kokkos_SIMD_Scalar.hpp: remove extra ';' — ndellin / githubweb
- add guards — fnrizzi / githubweb
- avoid auto — fnrizzi / githubweb
- [ci skip] Update Kokkos version to 4.2.99 — Daniel Arndt / githubweb
- [deprecated code 3] remove all default device init tests — Damien L-G / githubweb
- [deprecated code 3] remove InitArguments — Damien L-G / githubweb
- [deprecated code 3] remove KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macros — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for clamp, min, max, and minmax — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math constants — Damien L-G / githubweb
- [deprecated code 3] remove {OpenMP,HPX}::partition_master — Damien L-G / githubweb
- OpenMP backend cleanup following removal of deprecated code 3 — Damien L-G / githubweb
- Get rid of FIXME_OPENMP — Damien L-G / githubweb
- [deprecated code 3] remove MasterLock — Damien L-G / githubweb
- [deprecated code 3] remove ENABLE_DEPRECATED_CODE_3 option — Damien L-G / githubweb
- fixup! [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- Do not append " - blocks" to the bitset label — Damien L-G / githubweb
- with_updated_label -> append_to_label — Daniel Arndt / githubweb
- SYCL: Use SYCL_EXT_ONEAPI_DEVICE_GLOBAL to detect support for device global variables — Daniel Arndt / githubweb
- Fixup in README (github -> GitHub) — Damien L-G / githubweb
- Check that device associated with stream matches requested device — Daniel Arndt / githubweb
- Threads: replace enum with constexpr int and enum class (#6514) — noreply / githubweb
- Added unit tests for reduction ops and few intel svml intrinsics — donlee / githubweb
- Remove sleep and wake functions — Bruno Turcksin / githubweb
- Remove extra constructor — Daniel Arndt / githubweb
- Prefer defaulted default constructor for Bitset (#6524) — noreply / githubweb
- SYCL: Use host-pinned memory to copy reduction/scan result (#6500) — noreply / githubweb
- Remove unused variables — Bruno Turcksin / githubweb
- Remove Sentinel struct from Threads — Bruno Turcksin / githubweb
- Small cleanup of ThreadsInternal::initialize — Bruno Turcksin / githubweb
- Remove useless forward declaration — Bruno Turcksin / githubweb
- Remove spawn function — Bruno Turcksin / githubweb
- Add comments — Bruno Turcksin / githubweb
- Fix indentation — Bruno Turcksin / githubweb
- Fix typo in macro guard — Bruno Turcksin / githubweb
- Reduce number of View constructor instantiations — Damien L-G / githubweb
- Bump HPX version used in CI to 1.9.0 — mikael.simberg / githubweb
- Split files in HIP backend — Bruno Turcksin / githubweb
- Trim some fat in `CudaInternal` (towards multiple GPUs support) (#6544) — noreply / githubweb
- Only define STDALGO_TEAM_SOURCES_* once — Daniel Arndt / githubweb
- Remove logical memory spaces — Damien L-G / githubweb
- Rollback changes to view constructors to reduce the number of instantiations (#6564) — noreply / githubweb
- Address reviewer comments — Daniel Arndt / githubweb
- Threads remove unused variables and functions (#6566) — noreply / githubweb
- Remove unused Sandia testing files (#6568) — noreply / githubweb
- fallback implementation cleanup — donlee / githubweb
- Remove empty quotation marks for static_assert — Daniel Arndt / githubweb
- [ci skip] Drop unused <impl/Kokkos_Memory_Fence.hpp> header — Damien L-G / githubweb
- Drop Clang+CUDA workaround — Damien L-G / githubweb
- OpenMPTarget: CI compiler upgrade. (#6545) — noreply / githubweb
- m_cudaDev isn't static anymore — Daniel Arndt / githubweb
- Add crtrott's launch_latency benchmark (#6379) — noreply / githubweb
- Simplify fence functions in the Threads backend (#6571) — noreply / githubweb
- simd: temporarily skip device math ops unit test for OpenMPTarget build (#6574) — noreply / githubweb
- Set the device id explicitly for CUDA API calls in impl_initialize — Daniel Arndt / githubweb
- OpenMPTarget: Guard scratch memory usage in ParallelReduce — Daniel Arndt / githubweb
- SYCL: Implement DESUL_ATOMICS_ENABLE_SYCL_SEPARABLE_COMPILATION path (#6534) — noreply / githubweb
- Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header — Damien L-G / githubweb
- Replace Marsaglia polar method with Box-muller to generate a normally distributed random number (#6556) — noreply / githubweb
- OpenMP: No memset in viewfill (#6573) — noreply / githubweb
- Revert "Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header" — noreply / githubweb
- OpenACC: add atomics support (#6446) — noreply / githubweb
- Fix infinity, quiet_NaN, signaling_Nan, isfinite, isnan, isinf for half_t and bhalf_t (#6543) — noreply / githubweb
- kokkos(unique): fix allocation of temporary view to enforce using the provided space instance — romin.tomasetti / githubweb
- Use binary wrapper for consistency in definition of half types numeric traits (#6590) — noreply / githubweb
- try fix — fnrizzi / githubweb
- Fix TestNumericTraits.hpp for SYCL with bfloat16 support — Daniel Arndt / githubweb
- Fix generated Makefile when using gnu_generate_makefile.sh and make >= 4.3 — crtrott / githubweb
- Threads: add missing broadcast to TeamThreadRange parallel_scan (#6601) — noreply / githubweb
- Added missing operator* to NEON simd — crtrott / githubweb
- [ci skip] Update changelog on develop for 4.2.00 (#6592) — noreply / githubweb
- Remove KOKKOS_IMPL_DO_NOT_USE_PRINTF (#6593) — noreply / githubweb
- try fix — fnrizzi / githubweb
- avoid pyt package — fnrizzi / githubweb
- try — fnrizzi / githubweb
- fix for macos — fnrizzi / githubweb
- remove comments — fnrizzi / githubweb
- use reference — crtrott / githubweb
- add branching — fnrizzi / githubweb
- [ci skip] fix formatting — cezary.skrzynski / githubweb
- GitHub Workflows: Use Ubuntu 22.04 instead of Fedora for Intel compiler testing — Daniel Arndt / githubweb
- nvcc(wrapper): adding missing `--generate-line-info` arg — romin.tomasetti / githubweb
- Add clang-format check to GitHub workflows (#6612) — noreply / githubweb
- graph(HIP): adding inline keyword to fix #6623 — romin.tomasetti / githubweb
- Add jenkins multibranch pipeline options — Bruno Turcksin / githubweb
- Add warp sync for Cuda parallel reduce — tccleve / githubweb
- kokkos(profiling): do not finalize in any backend — romin.tomasetti / githubweb
- Replace ubuntu:18.04 with ubuntu:20.04 as base image for clang-format — Bruno Turcksin / githubweb
- update comment to include final() mention — tccleve / githubweb
- Disabling OpenACC in the CI because it emits too many warnings — Bruno Turcksin / githubweb
- unorderedmap: modernize traits — romin.tomasetti / githubweb
- nvcc wrapper: remove troubling flag to fix 6628 (#6629) — noreply / githubweb
- tools(profiling): type (related to kokkos/kokkos-tools/pull/221) — romin.tomasetti / githubweb
- This PR fixes the too-much-OpenACC-warning issue, mentioned in PR #6639. — lees2 / githubweb
- add missing header fix #6644 — fnrizzi / githubweb
- SYCL: Restrict workaround for is_device_copyable to oneAPI versions before 2024.0.0 (#6532) — noreply / githubweb
- Fixup test math functions ulp should double -> int — Damien L-G / githubweb
- Drop DualView converting copy assignment operator — Damien L-G / githubweb
- Don't use rocm-docker for clang-format — Daniel Arndt / githubweb
- Disable HIP CI — Daniel Arndt / githubweb
- Remove deprecation warning for AllocationMechanism for gcc <11.0 — Daniel Arndt / githubweb
- OpenMPTarget: clang extensions for dynamic shared memory. (#6380) — noreply / githubweb
- Fix builtin_unreachable use for MSVC/CUDA — crtrott / githubweb
- Fix missing include on msvc/cuda — crtrott / githubweb
- Avoid lambdas in constexpr branch for msvc/cuda — crtrott / githubweb
- Sidestep lacking CTAD support msvc/cuda — crtrott / githubweb
- Fix formatting — crtrott / githubweb
- Move header for Damien because he is right — crtrott / githubweb
- Unit test for issue 3371 (negative vector length should not yield a negative max_team_size) (#6076) — noreply / githubweb
- Add CMakeLists.txt for stream benchmark — cwpears / githubweb
- Do not negate the dependent true traits helper — Damien L-G / githubweb
- Drop guards to accommodate external code defining KOKKOS_ASSERT — Damien L-G / githubweb
- Use omp_get_max_active_levels() when supported — Daniel Arndt / githubweb
- Add missing gfx940 — rberger / githubweb
- Add Impl::always_false type-dependent false trait — Damien L-G / githubweb
- Per review prefer always_false<Arg>::value to is_void_v<Arg> — Damien L-G / githubweb
- Improve "no copy mechanism" exception message — bmkelle / githubweb
- Add a unit test for new deep_copy exception msg — bmkelle / githubweb
- Add missing include sstream — bmkelle / githubweb
- src->source, dst->destination — bmkelle / githubweb
- Workaround for ROCm 6.0 failing to compile with AVX2 SIMD support — Bruno Turcksin / githubweb
- SYCL: Force inlining of Kokkos::printf (#6650) — noreply / githubweb
- Improve handling of printf in OMPT on Intel GPUs — Daniel Arndt / githubweb
- OpenMP: Use `omp_get_nested` for older gcc versions (#6685) — noreply / githubweb
- Disable more Bessel tests for SYCL on Intel GPUs — Daniel Arndt / githubweb
- fill_random without execution space instance should fence — Daniel Arndt / githubweb
- Drop unnecessary guarding for a tool library being loaded in ProfilingSection — Damien L-G / githubweb
- Drop unnecessary header include in Kokkos_Profiling_ProfileSection.hpp — Damien L-G / githubweb
- #5333: CUDA: Use scratch space appropriate to small reduction elements in Team reductions (#5334) — noreply / githubweb
- Cuda: Allocate using the correct device (#6392) — noreply / githubweb
- Let `Profiling::ProfilingSection(std::string)` constructor be explicit and nodiscard (#6690) — noreply / githubweb
- Cosmetic changes to ProfilingSection — Damien L-G / githubweb
- GitHub CI: Test with AddressSanitizer (#6676) — noreply / githubweb
- Kokkos::Array deduction guide (#6373) — noreply / githubweb
- Add CI for MSVC+Cuda (#6661) — noreply / githubweb
- SYCL: Address deprecations after oneAPI 2023.2.0 (#6577) — noreply / githubweb
- Fixup cast tolerance to double before printing — Damien L-G / githubweb
- Try linking against CUDA libraries even with KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE — Daniel Arndt / githubweb
- test_array_ctad: disable test for intel versions < 2021 — ndellin / githubweb
- Enable `{transform_}exclusive_scan` in place (#6667) — noreply / githubweb
- Add `ATOMICS_BYPASS` configuration option to disable atomics (#6692) — noreply / githubweb
- Check matching static extents in View constructor (#5190) — noreply / githubweb
- Remove Kokkos::[b]half_t volatile overloads (#6579) — noreply / githubweb
- add tests — fnrizzi / githubweb
- Provide `kokkos_swap` as part of Core and deprecate `Experimental::swap` in Algorithms (#6697) — noreply / githubweb
- Provide new public headers `<Kokkos_Clamp.hpp>` and `<Kokkos_MinMax.hpp>` (#6687) — noreply / githubweb
- Fix TeamThreadMDRange parallel_reduce (#6511) — noreply / githubweb
- add tests for in-place `inclusive_scan` (#6682) — noreply / githubweb
- Drop pointless Kokkos::Impl::CudaExec forward declaration — Damien L-G / githubweb
- Don't use the compiler launcher script if the compile language is CUDA. (#6704) — noreply / githubweb
- Deprecate `{Cuda,HIP}::detect_device_count()` and `Cuda::[detect_]device_arch()` (#6710) — noreply / githubweb
- Get rid of CudaInternal::cuda_get_error_{name,string}_wrapper — Damien L-G / githubweb
- No need to jump through so many hoops to print the error message — Damien L-G / githubweb
- HIP: Forgot to delete matching brace closing the namespace — Damien L-G / githubweb
- Make initialize and finalize of the Cuda/HIP singleton less special (#6714) — noreply / githubweb
- Kokkos_HIP.cpp: include Kokkos_Core.hpp to resolve errors — ndellin / githubweb
- Add bound checks in RangePolicy and MDRangePolicy (#6617) — noreply / githubweb
- Temporary fix to reenable HIP CI — Bruno Turcksin / githubweb
- Let the smart pointer manage the CUDA/HIP stream (#6721) — noreply / githubweb
- Fix Docker env variables — Bruno Turcksin / githubweb
- Ensure view_allocation_error does not silently ignore that no exception was thrown — Damien L-G / githubweb
- Add RawMemoryAllocationFailure::AllocationMechanism::OpenACCMalloc enumerator — Damien L-G / githubweb
- [OpenACC] throw if acc_malloc returned nullptr — Damien L-G / githubweb
- Fixup using declaration — Damien L-G / githubweb
- Disable openacc.view_allocation_error test — Damien L-G / githubweb
- Guard `[MD]RangePolicy` precondition check for deprecated code 4 (#6726) — noreply / githubweb
- Add C++26 standard to CMake Setup — dev / githubweb
- Add support for C++26 in generated makefiles — Damien L-G / githubweb
- Add KOKKOS_ENABLE_CXX26 to the configuration metadata — Damien L-G / githubweb
- Reenable HIP testing — Bruno Turcksin / githubweb
- Disabling failing HIP test in the CI — Bruno Turcksin / githubweb
- Use team_size_max to fix "Team size too large" error in reducer test (#6725) — noreply / githubweb
- Don't touch my records! (refactor Cuda/HIP/SYCL/Threads to not directly mess with `SharedAllocationRecord`) (#6732) — noreply / githubweb
- Untangle SharedAllocationRecord spaghetti code — Damien L-G / githubweb
- Fix TestThreadVectorMDRangeParallelReduce (#6734) — noreply / githubweb
- Cuda multi-GPU support: Allow execution space instance constructor to run (#6706) — noreply / githubweb
- Drop support for deprecated command-line arguments and environment variables (#6744) — noreply / githubweb
- Avoid unnecessary zero-memset of the scratch flags in SYCL (#6739) — noreply / githubweb
#3 (Jan 25, 2024, 10:41:36 AM)
- implementation and tests — fnrizzi / githubweb
- implementation and tests — fnrizzi / githubweb
- only compute with relevant entries — tccleve / githubweb
- subset of team level impl of std algorithms — fnrizzi / githubweb
- fix copyright — fnrizzi / githubweb
- guard for openmptarget — fnrizzi / githubweb
- fix for openmptarget — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- SIMD: suppress an uninitialized variable warning (#6294) — noreply / githubweb
- OpenACC CMakechange Clacc (#6250) — noreply / githubweb
- `Kokkos::sort` support custom comparator (#6253) — noreply / githubweb
- Add nightly build using latest gcc and c++23 — Bruno Turcksin / githubweb
- Explicitly capture this in lambda function — Bruno Turcksin / githubweb
- Fix typo in nightly jenkins configuration — Bruno Turcksin / githubweb
- Fix a memory bug in the `free_state` function of random pools. (#6290) — noreply / githubweb
- std_algos: for_each: try condense the impl — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Enforce create_mirror restrictions w.r.t. ViewCtorArgs on all variants — Daniel Arndt / githubweb
- Use KOKKOS_IF_ON_HOST — Daniel Arndt / githubweb
- Suppress warnings — Daniel Arndt / githubweb
- Don't suppress warnings for NVHPC — Daniel Arndt / githubweb
- Improve macro definitions — Daniel Arndt / githubweb
- Enable Serial backend in HPX build — cezary.skrzynski / githubweb
- Modify fences in View API test — cezary.skrzynski / githubweb
- Check for overflow during backend initialization (Cuda, HIP, SYCL) (#6159) — noreply / githubweb
- Improve SYCL reduction performance: MDRangePolicy (#6271) — noreply / githubweb
- bring back previous code as discussed in meeting — fnrizzi / githubweb
- create cudaAPI function wrappers — tccleve / githubweb
- Reorganize #include <cuda_runtime_api.h> — tccleve / githubweb
- Some API functions require CUDA 11.2+ — tccleve / githubweb
- Cuda10 requires "stream=nullptr" as default arg — tccleve / githubweb
- Rework stream inputs — tccleve / githubweb
- Use "if constexpr" for setCudaDevice — tccleve / githubweb
- Remove static in comment — tccleve / githubweb
- Update CI: use CUDA 11.6.2 instead of 11.6.0 (#6314) — noreply / githubweb
- add helper variable templates `are_*_iterators_v` (#6312) — noreply / githubweb
- [HIP] Optimize parallel_reduce (#6229) — noreply / githubweb
- Split Kokkos_SYCL_Parallel_Range.hpp — Daniel Arndt / githubweb
- Split Kokkos_SYCL_Parallel_Reduce.hpp — Daniel Arndt / githubweb
- Move Kokkos_SYCL_Scan.hpp — Daniel Arndt / githubweb
- Split Kokkos_Parallel_Team.hpp — Daniel Arndt / githubweb
- Add comment that we don't need to link against clang_rt with ROCm 5.5 — Bruno Turcksin / githubweb
- Error out when CXX standard is not set when using amdclang or cray clang — Bruno Turcksin / githubweb
- Disable failing tests for ROCm 5.5 and 5.6 — Bruno Turcksin / githubweb
- #5635: Add parallel_scan overload with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Serial/OpenMP: Parallel_scan with return value for TeamThreadRange (#6090) — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Allow using the SYCL execution space on AMD GPUs (#6321) — noreply / githubweb
- Update Clang+Cuda CI from 10.1 to 11.0.3 (#6318) — noreply / githubweb
- SIMD: add shift ops for all int types (#6109) — noreply / githubweb
- SYCL: Use sycl::bit_cast from oneAPI 2024.0.0 on (#6300) — noreply / githubweb
- remove spurious undefs — fnrizzi / githubweb
- Decrease maximum memory available to ccache — Bruno Turcksin / githubweb
- Fix gtest when using C++20 — Bruno Turcksin / githubweb
- Fully qualify Experimental::SYCL in algorithms to avoid conflicting namespaces — Daniel Arndt / githubweb
- address reviews [skip ci] — antoine.meyer54 / githubweb
- formatting — fnrizzi / githubweb
- SIMD: split simd unit tests into separate files (#6278) — noreply / githubweb
- Run NVHPC only on V100 — Bruno Turcksin / githubweb
- Use checked arithmetic builtins for overflow detection (#6313) — noreply / githubweb
- Passing size==0 to DeepCopy/memcpy/omp_target_memcpy (#6273) — noreply / githubweb
- allow sorting via native oneDPL to support views with stride = 1 (#6322) — noreply / githubweb
- Adopt new HIP cmake's way of finding clang-rt — nicurtis / githubweb
- Fix #6334: intel compiler does not like returning inside if constexpr (#6335) — noreply / githubweb
- Fix #6336: capturing `this` implicitly is deprecated in C++20 (#6337) — noreply / githubweb
- fix lambda capture — fnrizzi / githubweb
- remove unnecessary file, fix constraints — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix — fnrizzi / githubweb
- fix lambda capture and constraints — fnrizzi / githubweb
- keep only subset — fnrizzi / githubweb
- revert files — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix syntax — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Disable default oneDPL support in Trilinos — Daniel Arndt / githubweb
- Rename AMD GPU architectures (#6266) — noreply / githubweb
- Fix compiling SIMD library with NEON and gcc-13 — Daniel Arndt / githubweb
- SIMD: add generator constructors (#6347) — noreply / githubweb
- Use /usr/bin/env bash in nvcc_wrapper and kokkos_launch_compiler for portability — mikael.simberg / githubweb
- Fix cudaAPI wrapper errors for CUDA_MALLOC_ASYNC (#6346) — noreply / githubweb
- Use std::aligned_alloc for allocations (#6341) — noreply / githubweb
- Initial implementation of gfx942 (#6358) — noreply / githubweb
- Introduce constructor for multi-GPU support. — Daniel Arndt / githubweb
- Only set KOKKOS_ARCH_AMD_GPU if an AMD GPU architecture is enabled — Daniel Arndt / githubweb
- Extend 'hip_driver_check_page_migration' (#6364) — noreply / githubweb
- Replace volatile m_ready_count loads by desul::atomic_loads in task queue implementation — mikael.simberg / githubweb
- Add Kokkos_Atomic.hpp headers to OpenMP and HPX task headers — mikael.simberg / githubweb
- Add support for HIP Graph — Bruno Turcksin / githubweb
- Replace one of the ROCM 5.2 configurations with ROCM 5.6 — Bruno Turcksin / githubweb
- Use aligned version of operator new instead of aligned_alloc — Daniel Arndt / githubweb
- HPX: Don't interfere with exception handling — Daniel Arndt / githubweb
- Fix -Wformat-truncation warnings in CI (#6354) — noreply / githubweb
- Do not use HIP Graph with ROCm 5.2 — Bruno Turcksin / githubweb
- Use constexpr West in src — Bruno Turcksin / githubweb
- Use constexpr West in test — Bruno Turcksin / githubweb
- SIMD: convert binary operators to hidden friends (#6320) — noreply / githubweb
- simd: make mask and condition unit test to check with all data types (#6360) — noreply / githubweb
- team-level std algos: part 2 (#6205) — noreply / githubweb
- Add deprecated attribute to HostSpace(AllocationMechanism) definition — Damien L-G / githubweb
- Added gather_from and scatter_to for AVX2 and AVX512 simd — donlee / githubweb
- Rebased and applied feedback — donlee / githubweb
- Changed gen ctor usage to pass in KOKKOS_LAMBDA — donlee / githubweb
- Suppress NVCC attribute ignored warning on SIMD operators as hidden friends — Damien L-G / githubweb
- Remove deprecated code 3 support for volatile join — crtrott / githubweb
- Disable a test not working with nvhpc-23.1 — crtrott / githubweb
- Reenabling tests for nvhpc 23.7 — crtrott / githubweb
- Update containers and algorithms for NVC++ 23.7 — crtrott / githubweb
- More NVC++ 23.7 updates — crtrott / githubweb
- NVC++ clang-format fixes — crtrott / githubweb
- Update nvhpc to version 23.7 in the CI — crtrott / githubweb
- NVHPC 23.7 update: address reviewer comments — crtrott / githubweb
- OpenACC: Guard tests relying on abort — crtrott / githubweb
- Fix TestAtomic to use the test execspace — crtrott / githubweb
- Use NVHPC 23.7 for testing of OpenACC — crtrott / githubweb
- Update nvhpc gtest skip message — crtrott / githubweb
- Work around OpenMPTarget failure — crtrott / githubweb
- Update base docker file for nvhpc — crtrott / githubweb
- Remove stray Cuda graph pattern specialization from tag — Damien L-G / githubweb
- Fix typo — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Fix uninitialized variable warning with gcc 13 — Bruno Turcksin / githubweb
- HIP: Update print_configuration (#6387) — noreply / githubweb
- Add test — Daniel Arndt / githubweb
- Fix typo. — noreply / githubweb
- Check AVX, AVX2, AVX512XEON compiler macros on KokkosCore_config.h (#6248) — noreply / githubweb
- Don't use local headers or runtime in HPX backend due to deprecation — mikael.simberg / githubweb
- Explicitly check for valid device id — Daniel Arndt / githubweb
- Add a more comprehensive `kokkos_{malloc, free}` perf_test (#6377) — noreply / githubweb
- team-level std algos: part 3 (#6207) — noreply / githubweb
- Adding is_scoped_enum & to_underlying (#6356) — noreply / githubweb
- SIMD: add float simd support (#6177) — noreply / githubweb
- team-level std algos: part 4 (#6208) — noreply / githubweb
- Added a gen ctor for float (#6397) — noreply / githubweb
- team-level std algos: part 5 (#6209) — noreply / githubweb
- Deprecate Cuda(cudaStream_t, bool) — Damien L-G / githubweb
- Fixup checked integer operations death test — Damien L-G / githubweb
- Deprecate HIP(hipStream_t, bool) — Damien L-G / githubweb
- Let Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC be ON by default — Damien L-G / githubweb
- Print whether KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC is defined — Damien L-G / githubweb
- Introduce disable_malloc_async Cuda option with generated makefiles — Damien L-G / githubweb
- Preserve one build that disables Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC — Damien L-G / githubweb
- Use archive extraction time for timestamps — cezary.skrzynski / githubweb
- Disable performance benchmarks in AppVeyor CI — cezary.skrzynski / githubweb
- team-level std algos: part 6 (#6210) — noreply / githubweb
- address comments — fnrizzi / githubweb
- OpenMP backend refactor files. (#6403) — noreply / githubweb
- Drop (unused) `Cuda::cuda_internal_maximum_shared_words` — Damien L-G / githubweb
- Drop check that the host backend is initialized before the Cuda/HIP/SYCL one — Damien L-G / githubweb
- Drop unused HIPInternal::m_maxSharedWords data member — Damien L-G / githubweb
- Drop unused HIPInternal::m_hipArch static data member — Damien L-G / githubweb
- !initialized() should be a precondition for calling {Cuda,HIP,SYCL}Internal::initialize — Damien L-G / githubweb
- Drop pre-Kepler logic in Cuda::impl_initialize — Damien L-G / githubweb
- use single — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- formatting — fnrizzi / githubweb
- Team-level std algos: part 7 (#6211) — noreply / githubweb
- formatting — fnrizzi / githubweb
- Enable death tests for fedora rawhide — cezary.skrzynski / githubweb
- core/src: Add half math functions to private header (#6124) — noreply / githubweb
- Drop check whether device supports unified addressing — Damien L-G / githubweb
- fix single as per Christian's suggestion — fnrizzi / githubweb
- Only warn once (at initialization) when forcing allocation in unified memory — Damien L-G / githubweb
- check-copyright improvements (#6399) — noreply / githubweb
- Use execution space instance argument to get device properties in block size deduction — Damien L-G / githubweb
- Address reviewer's comments — Bruno Turcksin / githubweb
- Fix to avoid #186-D pointless comparison warning. — maarten.arnst / githubweb
- add comment — fnrizzi / githubweb
- improve tests to address review — fnrizzi / githubweb
- Fix guard for isnan test for bhalf_t — Daniel Arndt / githubweb
- avoid potential race condition HIP — tccleve / githubweb
- Set the device id in cuda_kernel_arch — Daniel Arndt / githubweb
- [SYCL][Reduction] Group counter should use at least memory_order::acq_rel — andrei.elovikov / githubweb
- Same for scan — andrei.elovikov / githubweb
- Initialize m_num_scratch_locks for Cuda parallel_for TeamPolicy — Daniel Arndt / githubweb
- improve tests with intra-team result check — fnrizzi / githubweb
- Fixes for Kokkos::Array (#6372) — noreply / githubweb
- try fix for unique, previous impl to remove later — fnrizzi / githubweb
- #5635: Add parallel_scan changes for CUDA and TeamThreadRange — cezary.skrzynski / githubweb
- remove old impl — fnrizzi / githubweb
- #5635: Enable TeamThreadRange test for CUDA — cezary.skrzynski / githubweb
- Clean up benchmarks/gups — cwpears / githubweb
- benchmark/gups: use CMake — cwpears / githubweb
- OpenMPTarget: Disable check for SIMD compiler macros — Daniel Arndt / githubweb
- #5635: Add parallel_scan with value for CUDA and ThreadVectorRange — cezary.skrzynski / githubweb
- add missing assert — fnrizzi / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- add intra team check for missing test — fnrizzi / githubweb
- fix intel compile error — fnrizzi / githubweb
- fix unreachable for intel — fnrizzi / githubweb
- re-enable unit tests for sort and random via makefile (#6422) — noreply / githubweb
- OpenMPTarget init-join fix (#6444) — noreply / githubweb
- Fix Cuda parallel_scan ThreadVectorRange range — Daniel Arndt / githubweb
- Assign final sum in Cuda parallel_scan ThreadVectorRange — Daniel Arndt / githubweb
- Fix compiling code using Kokkos::printf for OpenMPTarget on Intel GPUs (#6443) — noreply / githubweb
- std_algos: improving min, max, minmax (#6421) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 2/7) (#6426) — noreply / githubweb
- Skip bessel function tests known to fail on Intel GPUs (#6434) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 6/7) (#6436) — noreply / githubweb
- Fix race condition in functor_vec_scan_ret_val test — Daniel Arndt / githubweb
- Fix parallel_scan_with_reducers test — Daniel Arndt / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 3/7) (#6425) — noreply / githubweb
- improve tests (#6432) — noreply / githubweb
- improve tests (#6437) — noreply / githubweb
- Move final assignment to correct scope — cezary.skrzynski / githubweb
- fix casting warning in Random test — fnrizzi / githubweb
- Workaround for ROCm 5.6+ failing to compile with AVX2 SIMD support (#6449) — noreply / githubweb
- HIP: Restrict AVX2 workaround to ROCm 5.6 and 5.7 — Daniel Arndt / githubweb
- fixes build error for TeamReduce and TeamTranformReduced tests for specific GCC (#6459) — noreply / githubweb
- improve tests to check intra-team result (#6431) — noreply / githubweb
- SIMD: Math functions should be in namespace Kokkos — Daniel Arndt / githubweb
- SYCL: Disable another bessel function test for Intel GPUs — Daniel Arndt / githubweb
- Check for default device — Daniel Arndt / githubweb
- team-level std algos: part 10 (#6256) — noreply / githubweb
- team-level std algos: part 11 (#6258) — noreply / githubweb
- #5635: HIP: Add Overloads for parallel_scan with return value for TeamThreadRange (#6302) — noreply / githubweb
- #5635: Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- use shortcut — cezary.skrzynski / githubweb
- Skip testing for non-power-of-two team sizes — cezary.skrzynski / githubweb
- Fix formatting — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for HIP backend — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- #5635: SYCL: Add parallel_scan overload with return value — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- OpenMP: Fix TeamThreadRange parallel_scan with return value for team_size > 1 — Daniel Arndt / githubweb
- Add compatible copy assignment operator to DualView — maarten.arnst / githubweb
- Fix minimum version for Google benchmark — Daniel Arndt / githubweb
- Add test of copy constructor/assignment operator for DualView. — maarten.arnst / githubweb
- Compute concurrency on HIP using Kokkos hardcoded m_maxWavesPerCU — maarten.arnst / githubweb
- Modify test so that source and destination view are of different type — maarten.arnst / githubweb
- Use call operator instead of run_me function — maarten.arnst / githubweb
- team-level std algos: part 12 (#6350) — noreply / githubweb
- core/src: Add half single and double mixed compare (LT,GT,LE,GE) (#6407) — noreply / githubweb
- Moving abort and assert into their own public headers (#6445) — noreply / githubweb
- Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- Allow detecting SIMD types based on compiler macros (#6188) — noreply / githubweb
- Add KOKKOS_ARCH_ARM_NEON — Daniel Arndt / githubweb
- Fix implementation for cyl_bessel_i0 — Daniel Arndt / githubweb
- cleaning: remove iostream from headers where possible (IWYU) — romin.tomasetti / githubweb
- Fix compiling SIMD unit tests on NVIDIA — Daniel Arndt / githubweb
- guards to ensure DBL_EPSILON return for POWER8,9 — ajpowel / githubweb
- Fix NVCC warnings (#6483) — noreply / githubweb
- team-level std algos: part 13 (#6351) — noreply / githubweb
- Also fix annotations for generator constructor for AVX512 and NEON — Daniel Arndt / githubweb
- #5635: SYCL: Add parallel_scan overload with value for ThreadVectorRange — cezary.skrzynski / githubweb
- Fix atomic operations bug for Min and Max (#6435) — noreply / githubweb
- Fix example/build_cmake_installed_different_compiler — Daniel Arndt / githubweb
- fix impl — fnrizzi / githubweb
- Update core/src/HIP/Kokkos_HIP_KernelLaunch.hpp — noreply / githubweb
- Split Kokkos_Threads_Parallel files — Bruno Turcksin / githubweb
- HPX: Implement TeamThread and ThreadVector parallel_scan with return value — Daniel Arndt / githubweb
- Serial: Allow for distinct execution space instances (#6441) — noreply / githubweb
- add overload for TeamThreadRange — fnrizzi / githubweb
- address review comment — fnrizzi / githubweb
- Update to HIP TeamPolicy Block number heuristic (#6284) — noreply / githubweb
- SIMD: Split math functions from SIMD_Common.hpp (#6487) — noreply / githubweb
- Allow NVHPC as device compiler only with Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER=ON — Daniel Arndt / githubweb
- add threadvector — fnrizzi / githubweb
- fix order — fnrizzi / githubweb
- remove guards — fnrizzi / githubweb
- UnorderedMap(space instance): proposal for #6067 — romin.tomasetti / githubweb
- Rename Kokkos_ThreadsExec to align with the other backends — Bruno Turcksin / githubweb
- Promote Kokkos_Printf.hpp to public include — Daniel Arndt / githubweb
- simd: add floor, ceil, round, trunc operations (#6393) — noreply / githubweb
- Update CI in OpenMPTarget to use llvm-17 (#6472) — noreply / githubweb
- Rename Kokkos_ThreadsTeam.hpp to Kokkos_Threads_Team.hpp — Bruno Turcksin / githubweb
- Kokkos_SIMD_Scalar.hpp: remove extra ';' — ndellin / githubweb
- add guards — fnrizzi / githubweb
- avoid auto — fnrizzi / githubweb
- [ci skip] Update Kokkos version to 4.2.99 — Daniel Arndt / githubweb
- [deprecated code 3] remove all default device init tests — Damien L-G / githubweb
- [deprecated code 3] remove InitArguments — Damien L-G / githubweb
- [deprecated code 3] remove KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macros — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for clamp, min, max, and minmax — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math constants — Damien L-G / githubweb
- [deprecated code 3] remove {OpenMP,HPX}::partition_master — Damien L-G / githubweb
- OpenMP backend cleanup following removal of deprecated code 3 — Damien L-G / githubweb
- Get rid of FIXME_OPENMP — Damien L-G / githubweb
- [deprecated code 3] remove MasterLock — Damien L-G / githubweb
- [deprecated code 3] remove ENABLE_DEPRECATED_CODE_3 option — Damien L-G / githubweb
- fixup! [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- Do not append " - blocks" to the bitset label — Damien L-G / githubweb
- with_updated_label -> append_to_label — Daniel Arndt / githubweb
- SYCL: Use SYCL_EXT_ONEAPI_DEVICE_GLOBAL to detect support for device global variables — Daniel Arndt / githubweb
- Fixup in README (github -> GitHub) — Damien L-G / githubweb
- Check that device associated with stream matches requested device — Daniel Arndt / githubweb
- Threads: replace enum with constexpr int and enum class (#6514) — noreply / githubweb
- Added unit tests for reduction ops and few intel svml intrinsics — donlee / githubweb
- Remove sleep and wake functions — Bruno Turcksin / githubweb
- Remove extra constructor — Daniel Arndt / githubweb
- Prefer defaulted default constructor for Bitset (#6524) — noreply / githubweb
- SYCL: Use host-pinned memory to copy reduction/scan result (#6500) — noreply / githubweb
- Remove unused variables — Bruno Turcksin / githubweb
- Remove Sentinel struct from Threads — Bruno Turcksin / githubweb
- Small cleanup of ThreadsInternal::initialize — Bruno Turcksin / githubweb
- Remove useless forward declaration — Bruno Turcksin / githubweb
- Remove spawn function — Bruno Turcksin / githubweb
- Add comments — Bruno Turcksin / githubweb
- Fix indentation — Bruno Turcksin / githubweb
- Fix typo in macro guard — Bruno Turcksin / githubweb
- Reduce number of View constructor instantiations — Damien L-G / githubweb
- Bump HPX version used in CI to 1.9.0 — mikael.simberg / githubweb
- Split files in HIP backend — Bruno Turcksin / githubweb
- Trim some fat in `CudaInternal` (towards multiple GPUs support) (#6544) — noreply / githubweb
- Only define STDALGO_TEAM_SOURCES_* once — Daniel Arndt / githubweb
- Remove logical memory spaces — Damien L-G / githubweb
- Rollback changes to view constructors to reduce the number of instantiations (#6564) — noreply / githubweb
- Address reviewer comments — Daniel Arndt / githubweb
- Threads remove unused variables and functions (#6566) — noreply / githubweb
- Remove unused Sandia testing files (#6568) — noreply / githubweb
- fallback implementation cleanup — donlee / githubweb
- Remove empty quotation marks for static_assert — Daniel Arndt / githubweb
- [ci skip] Drop unused <impl/Kokkos_Memory_Fence.hpp> header — Damien L-G / githubweb
- Drop Clang+CUDA workaround — Damien L-G / githubweb
- OpenMPTarget: CI compiler upgrade. (#6545) — noreply / githubweb
- m_cudaDev isn't static anymore — Daniel Arndt / githubweb
- Add crtrott's launch_latency benchmark (#6379) — noreply / githubweb
- Simplify fence functions in the Threads backend (#6571) — noreply / githubweb
- simd: temporarily skip device math ops unit test for OpenMPTarget build (#6574) — noreply / githubweb
- Set the device id explicitly for CUDA API calls in impl_initialize — Daniel Arndt / githubweb
- OpenMPTarget: Guard scratch memory usage in ParallelReduce — Daniel Arndt / githubweb
- SYCL: Implement DESUL_ATOMICS_ENABLE_SYCL_SEPARABLE_COMPILATION path (#6534) — noreply / githubweb
- Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header — Damien L-G / githubweb
- Replace Marsaglia polar method with Box-muller to generate a normally distributed random number (#6556) — noreply / githubweb
- OpenMP: No memset in viewfill (#6573) — noreply / githubweb
- Revert "Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header" — noreply / githubweb
- OpenACC: add atomics support (#6446) — noreply / githubweb
- Fix infinity, quiet_NaN, signaling_NaN, isfinite, isnan, isinf for half_t and bhalf_t (#6543) — noreply / githubweb
- kokkos(unique): fix allocation of temporary view to enforce using the provided space instance — romin.tomasetti / githubweb
- Use binary wrapper for consistency in definition of half types numeric traits (#6590) — noreply / githubweb
- try fix — fnrizzi / githubweb
- Fix TestNumericTraits.hpp for SYCL with bfloat16 support — Daniel Arndt / githubweb
- Fix generated Makefile when using gnu_generate_makefile.sh and make >= 4.3 — crtrott / githubweb
- Threads: add missing broadcast to TeamThreadRange parallel_scan (#6601) — noreply / githubweb
- Added missing operator* to NEON simd — crtrott / githubweb
- [ci skip] Update changelog on develop for 4.2.00 (#6592) — noreply / githubweb
- Remove KOKKOS_IMPL_DO_NOT_USE_PRINTF (#6593) — noreply / githubweb
- try fix — fnrizzi / githubweb
- avoid pyt package — fnrizzi / githubweb
- try — fnrizzi / githubweb
- fix for macos — fnrizzi / githubweb
- remove comments — fnrizzi / githubweb
- use reference — crtrott / githubweb
- add branching — fnrizzi / githubweb
- [ci skip] fix formatting — cezary.skrzynski / githubweb
- GitHub Workflows: Use Ubuntu 22.04 instead of Fedora for Intel compiler testing — Daniel Arndt / githubweb
- nvcc(wrapper): adding missing `--generate-line-info` arg — romin.tomasetti / githubweb
- Add clang-format check to GitHub workflows (#6612) — noreply / githubweb
- graph(HIP): adding inline keyword to fix #6623 — romin.tomasetti / githubweb
- Add jenkins multibranch pipeline options — Bruno Turcksin / githubweb
- Add warp sync for Cuda parallel reduce — tccleve / githubweb
- kokkos(profiling): do not finalize in any backend — romin.tomasetti / githubweb
- Replace ubuntu:18.04 with ubuntu:20.04 as base image for clang-format — Bruno Turcksin / githubweb
- update comment to include final() mention — tccleve / githubweb
- Disabling OpenACC in the CI because it emits too many warnings — Bruno Turcksin / githubweb
- unorderedmap: modernize traits — romin.tomasetti / githubweb
- nvcc wrapper: remove troubling flag to fix 6628 (#6629) — noreply / githubweb
- tools(profiling): type (related to kokkos/kokkos-tools/pull/221) — romin.tomasetti / githubweb
- This PR fixes the too-much-OpenACC-warning issue, mentioned in PR #6639. — lees2 / githubweb
- add missing header fix #6644 — fnrizzi / githubweb
- SYCL: Restrict workaround for is_device_copyable to oneAPI versions before 2024.0.0 (#6532) — noreply / githubweb
- Fixup test math functions ulp should double -> int — Damien L-G / githubweb
- Drop DualView converting copy assignment operator — Damien L-G / githubweb
- Don't use rocm-docker for clang-format — Daniel Arndt / githubweb
- Disable HIP CI — Daniel Arndt / githubweb
- Remove deprecation warning for AllocationMechanism for gcc <11.0 — Daniel Arndt / githubweb
- OpenMPTarget: clang extensions for dynamic shared memory. (#6380) — noreply / githubweb
- Fix builtin_unreachable use for MSVC/CUDA — crtrott / githubweb
- Fix missing include on msvc/cuda — crtrott / githubweb
- Avoid lambdas in constexpr branch for msvc/cuda — crtrott / githubweb
- Sidestep lacking CTAD support msvc/cuda — crtrott / githubweb
- Fix formatting — crtrott / githubweb
- Move header for Damien because he is right — crtrott / githubweb
- Unit test for issue 3371 (negative vector length should not yield a negative max_team_size) (#6076) — noreply / githubweb
- Add CMakeLists.txt for stream benchmark — cwpears / githubweb
- Do not negate the dependent true traits helper — Damien L-G / githubweb
- Drop guards to accommodate external code defining KOKKOS_ASSERT — Damien L-G / githubweb
- Use omp_get_max_active_levels() when supported — Daniel Arndt / githubweb
- Add missing gfx940 — rberger / githubweb
- Add Impl::always_false type-dependent false trait — Damien L-G / githubweb
- Per review prefer always_false<Arg>::value to is_void_v<Arg> — Damien L-G / githubweb
- Improve "no copy mechanism" exception message — bmkelle / githubweb
- Add a unit test for new deep_copy exception msg — bmkelle / githubweb
- Add missing include sstream — bmkelle / githubweb
- src->source, dst->destination — bmkelle / githubweb
- Workaround for ROCm 6.0 failing to compile with AVX2 SIMD support — Bruno Turcksin / githubweb
- SYCL: Force inlining of Kokkos::printf (#6650) — noreply / githubweb
- Improve handling of printf in OMPT on Intel GPUs — Daniel Arndt / githubweb
- OpenMP: Use `omp_get_nested` for older gcc versions (#6685) — noreply / githubweb
- Disable more Bessel tests for SYCL on Intel GPUs — Daniel Arndt / githubweb
- fill_random without execution space instance should fence — Daniel Arndt / githubweb
- Drop unnecessary guarding for a tool library being loaded in ProfilingSection — Damien L-G / githubweb
- Drop unnecessary header include in Kokkos_Profiling_ProfileSection.hpp — Damien L-G / githubweb
- #5333: CUDA: Use scratch space appropriate to small reduction elements in Team reductions (#5334) — noreply / githubweb
- Cuda: Allocate using the correct device (#6392) — noreply / githubweb
- Let `Profiling::ProfilingSection(std::string)` constructor be explicit and nodiscard (#6690) — noreply / githubweb
- Cosmetic changes to ProfilingSection — Damien L-G / githubweb
- GitHub CI: Test with AddressSanitizer (#6676) — noreply / githubweb
- Kokkos::Array deduction guide (#6373) — noreply / githubweb
- Add CI for MSVC+Cuda (#6661) — noreply / githubweb
- SYCL: Address deprecations after oneAPI 2023.2.0 (#6577) — noreply / githubweb
- Fixup cast tolerance to double before printing — Damien L-G / githubweb
- Try linking against CUDA libraries even with KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE — Daniel Arndt / githubweb
- test_array_ctad: disable test for intel versions < 2021 — ndellin / githubweb
- Enable `{transform_}exclusive_scan` in place (#6667) — noreply / githubweb
- Add `ATOMICS_BYPASS` configuration option to disable atomics (#6692) — noreply / githubweb
- Check matching static extents in View constructor (#5190) — noreply / githubweb
- Remove Kokkos::[b]half_t volatile overloads (#6579) — noreply / githubweb
- add tests — fnrizzi / githubweb
- Provide `kokkos_swap` as part of Core and deprecate `Experimental::swap` in Algorithms (#6697) — noreply / githubweb
- Provide new public headers `<Kokkos_Clamp.hpp>` and `<Kokkos_MinMax.hpp>` (#6687) — noreply / githubweb
- Fix TeamThreadMDRange parallel_reduce (#6511) — noreply / githubweb
- add tests for in-place `inclusive_scan` (#6682) — noreply / githubweb
- Drop pointless Kokkos::Impl::CudaExec forward declaration — Damien L-G / githubweb
- Don't use the compiler launcher script if the compile language is CUDA. (#6704) — noreply / githubweb
- Deprecate `{Cuda,HIP}::detect_device_count()` and `Cuda::[detect_]device_arch()` (#6710) — noreply / githubweb
- Get rid of CudaInternal::cuda_get_error_{name,string}_wrapper — Damien L-G / githubweb
- No need to jump through so many hoops to print the error message — Damien L-G / githubweb
- HIP: Forgot to delete matching brace closing the namespace — Damien L-G / githubweb
- Make initialize and finalize of the Cuda/HIP singleton less special (#6714) — noreply / githubweb
- Kokkos_HIP.cpp: include Kokkos_Core.hpp to resolve errors — ndellin / githubweb
- Add bound checks in RangePolicy and MDRangePolicy (#6617) — noreply / githubweb
- Temporary fix to reenable HIP CI — Bruno Turcksin / githubweb
- Let the smart pointer manage the CUDA/HIP stream (#6721) — noreply / githubweb
- Fix Docker env variables — Bruno Turcksin / githubweb
- Ensure view_allocation_error does not silently ignore that no exception was thrown — Damien L-G / githubweb
- Add RawMemoryAllocationFailure::AllocationMechanism::OpenACCMalloc enumerator — Damien L-G / githubweb
- [OpenACC] throw if acc_malloc returned nullptr — Damien L-G / githubweb
- Fixup using declaration — Damien L-G / githubweb
- Disable openacc.view_allocation_error test — Damien L-G / githubweb
- Guard `[MD]RangePolicy` precondition check for deprecated code 4 (#6726) — noreply / githubweb
- Add C++26 standard to CMake Setup — dev / githubweb
- Add support for C++26 in generated makefiles — Damien L-G / githubweb
- Add KOKKOS_ENABLE_CXX26 to the configuration metadata — Damien L-G / githubweb
- Reenable HIP testing — Bruno Turcksin / githubweb
- Disabling failing HIP test in the CI — Bruno Turcksin / githubweb
- Use team_size_max to fix "Team size too large" error in reducer test (#6725) — noreply / githubweb
- Don't touch my records! (refactor Cuda/HIP/SYCL/Threads to not directly mess with `SharedAllocationRecord`) (#6732) — noreply / githubweb
- Untangle SharedAllocationRecord spaghetti code — Damien L-G / githubweb
- Fix TestThreadVectorMDRangeParallelReduce (#6734) — noreply / githubweb
- Cuda multi-GPU support: Allow execution space instance constructor to run (#6706) — noreply / githubweb
- Drop support for deprecated command-line arguments and environment variables (#6744) — noreply / githubweb
- Avoid unnecessary zero-memset of the scratch flags in SYCL (#6739) — noreply / githubweb
#3 (Jan 25, 2024, 10:41:36 AM)
- implementation and tests — fnrizzi / githubweb
- implementation and tests — fnrizzi / githubweb
- only compute with relevant entries — tccleve / githubweb
- subset of team level impl of std algorithms — fnrizzi / githubweb
- fix copyright — fnrizzi / githubweb
- guard for openmptarget — fnrizzi / githubweb
- fix for openmptarget — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- SIMD: suppress an uninitialized variable warning (#6294) — noreply / githubweb
- OpenACC CMakechange Clacc (#6250) — noreply / githubweb
- `Kokkos::sort` support custom comparator (#6253) — noreply / githubweb
- Add nightly build using latest gcc and c++23 — Bruno Turcksin / githubweb
- Explicitly capture this in lambda function — Bruno Turcksin / githubweb
- Fix typo in nightly jenkins configuration — Bruno Turcksin / githubweb
- Fix a memory bug in the `free_state` function of random pools. (#6290) — noreply / githubweb
- std_algos: for_each: try condense the impl — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Enforce create_mirror restrictions w.r.t. ViewCtorArgs on all variants — Daniel Arndt / githubweb
- Use KOKKOS_IF_ON_HOST — Daniel Arndt / githubweb
- Suppress warnings — Daniel Arndt / githubweb
- Don't suppress warnings for NVHPC — Daniel Arndt / githubweb
- Improve macro definitions — Daniel Arndt / githubweb
- Enable Serial backend in HPX build — cezary.skrzynski / githubweb
- Modify fences in View API test — cezary.skrzynski / githubweb
- Check for overflow during backend initialization (Cuda, HIP, SYCL) (#6159) — noreply / githubweb
- Improve SYCL reduction performance: MDRangePolicy (#6271) — noreply / githubweb
- bring back previous code as discussed in meeting — fnrizzi / githubweb
- create cudaAPI function wrappers — tccleve / githubweb
- Reorganize #include <cuda_runtime_api.h> — tccleve / githubweb
- Some api function require cuda11.2+ — tccleve / githubweb
- Cuda10 requires "stream=nullptr" as default arg — tccleve / githubweb
- Rework stream inputs — tccleve / githubweb
- Use "if constexpr" for setCudaDevice — tccleve / githubweb
- Remove static in comment — tccleve / githubweb
- Update CI: use CUDA 11.6.2 instead of 11.6.0 (#6314) — noreply / githubweb
- add helper variable templates `are_*_iterators_v` (#6312) — noreply / githubweb
- [HIP] Optimize parallel_reduce (#6229) — noreply / githubweb
- Split Kokkos_SYCL_Parallel_Range.hpp — Daniel Arndt / githubweb
- Split Kokkos_SYCL_Parallel_Reduce.hpp — Daniel Arndt / githubweb
- Move Kokkos_SYCL_Scan.hpp — Daniel Arndt / githubweb
- Split Kokkos_Parallel_Team.hpp — Daniel Arndt / githubweb
- Add comment that we don't need to link against clang_rt with ROCm 5.5 — Bruno Turcksin / githubweb
- Error out when CXX standard is not set when using amdclang or cray clang — Bruno Turcksin / githubweb
- Disable failing tests for ROCm 5.5 and 5.6 — Bruno Turcksin / githubweb
- #5635: Add parallel_scan overload with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Serial/OpenMP: Parallel_scan with return value for TeamThreadRange (#6090) — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Allow using the SYCL execution space on AMD GPUs (#6321) — noreply / githubweb
- Update Clang+Cuda CI from 10.1 to 11.0.3 (#6318) — noreply / githubweb
- SIMD: add shift ops for all int types (#6109) — noreply / githubweb
- SYCL: Use sycl::bit_cast from oneAPI 2024.0.0 on (#6300) — noreply / githubweb
- remove spurious undefs — fnrizzi / githubweb
- Decrease maximum memory available to ccache — Bruno Turcksin / githubweb
- Fix gtest when using C++20 — Bruno Turcksin / githubweb
- Fully qualify Experimental::SYCL in algorithms to avoid conflicting namespaces — Daniel Arndt / githubweb
- address reviews [skip ci] — antoine.meyer54 / githubweb
- formatting — fnrizzi / githubweb
- SIMD: split simd unit tests into separate files (#6278) — noreply / githubweb
- Run NVHPC only on V100 — Bruno Turcksin / githubweb
- Use checked arithmetic builtins for overflow detection (#6313) — noreply / githubweb
- Passing size==0 to DeepCopy/memcpy/omp_target_memcpy (#6273) — noreply / githubweb
- allow sorting via native oneDPL to support views with stride = 1 (#6322) — noreply / githubweb
- Adopt new HIP cmake's way of finding clang-rt — nicurtis / githubweb
- Fix #6334: intel compiler does not like returning inside if constexpr (#6335) — noreply / githubweb
- Fix #6336: capturing `this` implicitly is deprecated in C++20 (#6337) — noreply / githubweb
- fix lambda capture — fnrizzi / githubweb
- remove unnecessary file, fix constraints — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix — fnrizzi / githubweb
- fix lambda capture and constraints — fnrizzi / githubweb
- keep only subset — fnrizzi / githubweb
- revert files — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix syntax — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Disable default oneDPL support in Trilinos — Daniel Arndt / githubweb
- Rename AMD GPU architectures (#6266) — noreply / githubweb
- Fix compiling SIMD library with NEON and gcc-13 — Daniel Arndt / githubweb
- SIMD: add generator constructors (#6347) — noreply / githubweb
- Use /usr/bin/env bash in nvcc_wrapper and kokkos_launch_compiler for portability — mikael.simberg / githubweb
- Fix cudaAPI wrapper errors for CUDA_MALLOC_ASYNC (#6346) — noreply / githubweb
- Use std::aligned_alloc for allocations (#6341) — noreply / githubweb
- Initial implementation of gfx942 (#6358) — noreply / githubweb
- Introduce constructor for multi-GPU support. — Daniel Arndt / githubweb
- Only set KOKKOS_ARCH_AMD_GPU if an AMD GPU architecture is enabled — Daniel Arndt / githubweb
- Extend 'hip_driver_check_page_migration' (#6364) — noreply / githubweb
- Replace volatile m_ready_count loads by desul::atomic_loads in task queue implementation — mikael.simberg / githubweb
- Add Kokkos_Atomic.hpp headers to OpenMP and HPX task headers — mikael.simberg / githubweb
- Add support for HIP Graph — Bruno Turcksin / githubweb
- Replace one of the ROCM 5.2 configuration with ROCM 5.6 — Bruno Turcksin / githubweb
- Use aligned version of operator new instead of aligned_alloc — Daniel Arndt / githubweb
- HPX: Don't interfere with exception handling — Daniel Arndt / githubweb
- Fix -Wformat-truncation warnings in CI (#6354) — noreply / githubweb
- Do not use HIP Graph with ROCm 5.2 — Bruno Turcksin / githubweb
- Use constexpr West in src — Bruno Turcksin / githubweb
- Use constexpr West in test — Bruno Turcksin / githubweb
- SIMD: convert binary operators to hidden friends (#6320) — noreply / githubweb
- simd: make mask and condition unit test to check with all data types (#6360) — noreply / githubweb
- team-level std algos: part 2 (#6205) — noreply / githubweb
- Add deprecated attribute to HostSpace(AllocationMechanism) definition — Damien L-G / githubweb
- Added gather_from and scatter_to for AVX2 and AVX512 simd — donlee / githubweb
- Rebased and applied feedbacks — donlee / githubweb
- Changed gen ctor usage to pass in KOKKOS_LAMBDA — donlee / githubweb
- Suppress NVCC attribute ignored warning on SIMD operators as hidden friends — Damien L-G / githubweb
- Remove deprecated code 3 support for volatile join — crtrott / githubweb
- Disable a test not working with nvhpc-23.1 — crtrott / githubweb
- Reenabling tests for nvhpc 23.7 — crtrott / githubweb
- Update containers and algorithms for NVC++ 23.7 — crtrott / githubweb
- More NVC++ 23.7 updates — crtrott / githubweb
- NVC++ clang-format fixes — crtrott / githubweb
- Update nvhpc to version 23.7 in the CI — crtrott / githubweb
- NVHPC 23.7 update: address reviewer comments — crtrott / githubweb
- OpenACC: Guard tests relying on abort — crtrott / githubweb
- Fix TestAtomic to use the test execspace — crtrott / githubweb
- Use NVHPC 23.7 for testing of OpenACC — crtrott / githubweb
- Update nvhpc gtest skip message — crtrott / githubweb
- Work around OpenMPTarget failure — crtrott / githubweb
- Update base docker file for nvhpc — crtrott / githubweb
- Remove stray Cuda graph pattern specialization from tag — Damien L-G / githubweb
- Fix typo — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Fix uninitialized variable warning with gcc 13 — Bruno Turcksin / githubweb
- HIP: Update print_configuration (#6387) — noreply / githubweb
- Add test — Daniel Arndt / githubweb
- Fix typo. — noreply / githubweb
- Check AVX, AVX2, AVX512XEON compiler macros on KokkosCore_config.h (#6248) — noreply / githubweb
- Don't use local headers or runtime in HPX backend due to deprecation — mikael.simberg / githubweb
- Explicitly check for valid device id — Daniel Arndt / githubweb
- Add a more comprehensive `kokkos_{malloc, free}` perf_test (#6377) — noreply / githubweb
- team-level std algos: part 3 (#6207) — noreply / githubweb
- Adding is_scoped_enum & to_underlying (#6356) — noreply / githubweb
- SIMD: add float simd support (#6177) — noreply / githubweb
- team-level std algos: part 4 (#6208) — noreply / githubweb
- Added a gen ctor for float (#6397) — noreply / githubweb
- team-level std algos: part 5 (#6209) — noreply / githubweb
- Deprecate Cuda(cudaStream_t, bool) — Damien L-G / githubweb
- Fixup checked integer operations death test — Damien L-G / githubweb
- Deprecate HIP(hipStream_t, bool) — Damien L-G / githubweb
- Let Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC be ON by default — Damien L-G / githubweb
- Print whether KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC is defined — Damien L-G / githubweb
- Introduce disable_malloc_async Cuda option with generated makefiles — Damien L-G / githubweb
- Preserve one build that disables Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC — Damien L-G / githubweb
- Use archive extraction time for timestamps — cezary.skrzynski / githubweb
- Disable performance benchmarks in AppVeyor CI — cezary.skrzynski / githubweb
- team-level std algos: part 6 (#6210) — noreply / githubweb
- address comments — fnrizzi / githubweb
- OpenMP backend refactor files. (#6403) — noreply / githubweb
- Drop (unused) `Cuda::cuda_internal_maximum_shared_words` — Damien L-G / githubweb
- Drop check that the host backend is initialized before the Cuda/HIP/SYCL one — Damien L-G / githubweb
- Drop unused HIPInternal::m_maxSharedWords data member — Damien L-G / githubweb
- Drop unused HIPInternal::m_hipArch static data member — Damien L-G / githubweb
- !initialized() should be a precondition for calling {Cuda,HIP,SYCL}Internal::initialize — Damien L-G / githubweb
- Drop pre-Kepler logic in Cuda::impl_initialize — Damien L-G / githubweb
- use single — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- formatting — fnrizzi / githubweb
- Team-level std algos: part 7 (#6211) — noreply / githubweb
- formatting — fnrizzi / githubweb
- Enable death tests for fedora rawhide — cezary.skrzynski / githubweb
- core/src: Add half math functions to private header (#6124) — noreply / githubweb
- Drop check whether device supports unified addressing — Damien L-G / githubweb
- fix single as per Christian's suggestion — fnrizzi / githubweb
- Only warn once (at initialization) when forcing allocation in unified memory — Damien L-G / githubweb
- check-copyright improvements (#6399) — noreply / githubweb
- Use execution space instance argument to get device properties in block size deduction — Damien L-G / githubweb
- Address reviewer's comments — Bruno Turcksin / githubweb
- Fix to avoid #186-D pointless comparison warning. — maarten.arnst / githubweb
- add comment — fnrizzi / githubweb
- improve tests to address review — fnrizzi / githubweb
- Fix guard for isnan test for bhalf_t — Daniel Arndt / githubweb
- avoid potential race condition HIP — tccleve / githubweb
- Set the device id in cuda_kernel_arch — Daniel Arndt / githubweb
- [SYCL][Reduction] Group counter should use at least memory_order::acq_rel — andrei.elovikov / githubweb
- Same for scan — andrei.elovikov / githubweb
- Initialize m_num_scratch_locks for Cuda parallel_for TeamPolicy — Daniel Arndt / githubweb
- improve tests with intra-team result check — fnrizzi / githubweb
- Fixes for Kokkos::Array (#6372) — noreply / githubweb
- try fix for unique, previous impl to remove later — fnrizzi / githubweb
- #5635: Add parallel_scan changes for CUDA and TeamThreadRange — cezary.skrzynski / githubweb
- remove old impl — fnrizzi / githubweb
- #5635: Enable TeamThreadRange test for CUDA — cezary.skrzynski / githubweb
- Clean up benchmarks/gups — cwpears / githubweb
- benchmark/gups: use CMake — cwpears / githubweb
- OpenMPTarget: Disable check for SIMD compiler macros — Daniel Arndt / githubweb
- #5635: Add parallel_scan with value for CUDA and ThreadVectorRange — cezary.skrzynski / githubweb
- add missing assert — fnrizzi / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- add intra team check for missing test — fnrizzi / githubweb
- fix intel compile error — fnrizzi / githubweb
- fix unreachable for intel — fnrizzi / githubweb
- re-enable unit tests for sort and random via makefile (#6422) — noreply / githubweb
- OpenMPTarget init-join fix (#6444) — noreply / githubweb
- Fix Cuda parallel_scan ThreadVectorRange range — Daniel Arndt / githubweb
- Assign final sum in Cuda parallel_scan ThreadVectorRange — Daniel Arndt / githubweb
- Fix compiling code using Kokkos::printf for OpenMPTarget on Intel GPUs (#6443) — noreply / githubweb
- std_algos: improving min, max, minmax (#6421) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 2/7) (#6426) — noreply / githubweb
- Skip bessel function tests known to fail on Intel GPUs (#6434) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 6/7) (#6436) — noreply / githubweb
- Fix race condition in functor_vec_scan_ret_val test — Daniel Arndt / githubweb
- Fix parallel_scan_with_reducers test — Daniel Arndt / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 3/7) (#6425) — noreply / githubweb
- improve tests (#6432) — noreply / githubweb
- improve tests (#6437) — noreply / githubweb
- Move final assignment to correct scope — cezary.skrzynski / githubweb
- fix casting warning in Random test — fnrizzi / githubweb
- Workaround for ROCm 5.6+ failing to compile with AVX2 SIMD support (#6449) — noreply / githubweb
- HIP: Restrict AVX2 workaround to ROCm 5.6 and 5.7 — Daniel Arndt / githubweb
- fixes build error for TeamReduce and TeamTransformReduced tests for specific GCC (#6459) — noreply / githubweb
- improve tests to check intra-team result (#6431) — noreply / githubweb
- SIMD: Math functions should be in namespace Kokkos — Daniel Arndt / githubweb
- SYCL: Disable another bessel function test for Intel GPUs — Daniel Arndt / githubweb
- Check for default device — Daniel Arndt / githubweb
- team-level std algos: part 10 (#6256) — noreply / githubweb
- team-level std algos: part 11 (#6258) — noreply / githubweb
- #5635: HIP: Add Overloads for parallel_scan with return value for TeamThreadRange (#6302) — noreply / githubweb
- #5635: Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- use shortcut — cezary.skrzynski / githubweb
- Skip testing for non-power-of-two team sizes — cezary.skrzynski / githubweb
- Fix formatting — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for HIP backend — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- #5635: SYCL: Add parallel_scan overload with return value — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- OpenMP: Fix TeamThreadRange parallel_scan with return value for team_size > 1 — Daniel Arndt / githubweb
- Add compatible copy assignment operator to DualView — maarten.arnst / githubweb
- Fix minimum version for Google benchmark — Daniel Arndt / githubweb
- Add test of copy constructor/assignment operator for DualView. — maarten.arnst / githubweb
- Compute concurrency on HIP using Kokkos hardcoded m_maxWavesPerCU — maarten.arnst / githubweb
- Modify test so that source and destination view are of different type — maarten.arnst / githubweb
- Use call operator instead of run_me function — maarten.arnst / githubweb
- team-level std algos: part 12 (#6350) — noreply / githubweb
- core/src: Add half single and double mixed compare (LT,GT,LE,GE) (#6407) — noreply / githubweb
- Moving abort and assert into their own public headers (#6445) — noreply / githubweb
- Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- Allow detecting SIMD types based on compiler macros (#6188) — noreply / githubweb
- Add KOKKOS_ARCH_ARM_NEON — Daniel Arndt / githubweb
- Fix implementation for cyl_bessel_i0 — Daniel Arndt / githubweb
- cleaning: remove iostream from headers where possible (IWYU) — romin.tomasetti / githubweb
- Fix compiling SIMD unit tests on NVIDIA — Daniel Arndt / githubweb
- guards to ensure DBL_EPSILON return for POWER8,9 — ajpowel / githubweb
- Fix NVCC warnings (#6483) — noreply / githubweb
- team-level std algos: part 13 (#6351) — noreply / githubweb
- Also fix annotations for generator constructor for AVX512 and NEON — Daniel Arndt / githubweb
- #5635: SYCL: Add parallel_scan overload with value for ThreadVectorRange — cezary.skrzynski / githubweb
- Fix atomic operations bug for Min and Max (#6435) — noreply / githubweb
- Fix example/build_cmake_installed_different_compiler — Daniel Arndt / githubweb
- fix impl — fnrizzi / githubweb
- Update core/src/HIP/Kokkos_HIP_KernelLaunch.hpp — noreply / githubweb
- Split Kokkos_Threads_Parallel files — Bruno Turcksin / githubweb
- HPX: Implement TeamThread and ThreadVector parallel_scan with return value — Daniel Arndt / githubweb
- Serial: Allow for distinct execution space instances (#6441) — noreply / githubweb
- add overload for TeamThreadRange — fnrizzi / githubweb
- address review comment — fnrizzi / githubweb
- Update to HIP TeamPolicy Block number heuristic (#6284) — noreply / githubweb
- SIMD: Split math functions from SIMD_Common.hpp (#6487) — noreply / githubweb
- Allow NVHPC as device compiler only with Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER=ON — Daniel Arndt / githubweb
- add threadvector — fnrizzi / githubweb
- fix order — fnrizzi / githubweb
- remove guards — fnrizzi / githubweb
- UnorderedMap(space instance): proposal for #6067 — romin.tomasetti / githubweb
- Rename Kokkos_ThreadsExec to align with the other backends — Bruno Turcksin / githubweb
- Promote Kokkos_Printf.hpp to public include — Daniel Arndt / githubweb
- simd: add floor, ceil, round, trunc operations (#6393) — noreply / githubweb
- Update CI in OpenMPTarget to use llvm-17 (#6472) — noreply / githubweb
- Rename Kokkos_ThreadsTeam.hpp to Kokkos_Threads_Team.hpp — Bruno Turcksin / githubweb
- Kokkos_SIMD_Scalar.hpp: remove extra ';' — ndellin / githubweb
- add guards — fnrizzi / githubweb
- avoid auto — fnrizzi / githubweb
- [ci skip] Update Kokkos version to 4.2.99 — Daniel Arndt / githubweb
- [deprecated code 3] remove all default device init tests — Damien L-G / githubweb
- [deprecated code 3] remove InitArguments — Damien L-G / githubweb
- [deprecated code 3] remove KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macros — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for clamp, min, max, and minmax — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math constants — Damien L-G / githubweb
- [deprecated code 3] remove {OpenMP,HPX}::partition_master — Damien L-G / githubweb
- OpenMP backend cleanup following removal of deprecated code 3 — Damien L-G / githubweb
- Get rid of FIXME_OPENMP — Damien L-G / githubweb
- [deprecated code 3] remove MasterLock — Damien L-G / githubweb
- [deprecated code 3] remove ENABLE_DEPRECATED_CODE_3 option — Damien L-G / githubweb
- fixup! [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- Do not append " - blocks" to the bitset label — Damien L-G / githubweb
- with_updated_label -> append_to_label — Daniel Arndt / githubweb
- SYCL: Use SYCL_EXT_ONEAPI_DEVICE_GLOBAL to detect support for device global variables — Daniel Arndt / githubweb
- Fixup in README (github -> GitHub) — Damien L-G / githubweb
- Check that device associated with stream matches requested device — Daniel Arndt / githubweb
- Threads: replace enum with constexpr int and enum class (#6514) — noreply / githubweb
- Added unit tests for reduction ops and few intel svml intrinsics — donlee / githubweb
- Remove sleep and wake functions — Bruno Turcksin / githubweb
- Remove extra constructor — Daniel Arndt / githubweb
- Prefer defaulted default constructor for Bitset (#6524) — noreply / githubweb
- SYCL: Use host-pinned memory to copy reduction/scan result (#6500) — noreply / githubweb
- Remove unused variables — Bruno Turcksin / githubweb
- Remove Sentinel struct from Threads — Bruno Turcksin / githubweb
- Small cleanup of ThreadsInternal::initialize — Bruno Turcksin / githubweb
- Remove useless forward declaration — Bruno Turcksin / githubweb
- Remove spawn function — Bruno Turcksin / githubweb
- Add comments — Bruno Turcksin / githubweb
- Fix indentation — Bruno Turcksin / githubweb
- Fix typo in macro guard — Bruno Turcksin / githubweb
- Reduce number of View constructor instantiations — Damien L-G / githubweb
- Bump HPX version used in CI to 1.9.0 — mikael.simberg / githubweb
- Split files in HIP backend — Bruno Turcksin / githubweb
- Trim some fat in `CudaInternal` (towards multiple GPUs support) (#6544) — noreply / githubweb
- Only define STDALGO_TEAM_SOURCES_* once — Daniel Arndt / githubweb
- Remove logical memory spaces — Damien L-G / githubweb
- Rollback changes to view constructors to reduce the number of instantiations (#6564) — noreply / githubweb
- Address reviewer comments — Daniel Arndt / githubweb
- Threads remove unused variables and functions (#6566) — noreply / githubweb
- Remove unused Sandia testing files (#6568) — noreply / githubweb
- fallback implementation cleanup — donlee / githubweb
- Remove empty quotation marks for static_assert — Daniel Arndt / githubweb
- [ci skip] Drop unused <impl/Kokkos_Memory_Fence.hpp> header — Damien L-G / githubweb
- Drop Clang+CUDA workaround — Damien L-G / githubweb
- OpenMPTarget: CI compiler upgrade. (#6545) — noreply / githubweb
- m_cudaDev isn't static anymore — Daniel Arndt / githubweb
- Add crtrott's launch_latency benchmark (#6379) — noreply / githubweb
- Simplify fence functions in the Threads backend (#6571) — noreply / githubweb
- simd: temporarily skip device math ops unit test for OpenMPTarget build (#6574) — noreply / githubweb
- Set the device id explicitly for CUDA API calls in impl_initialize — Daniel Arndt / githubweb
- OpenMPTarget: Guard scratch memory usage in ParallelReduce — Daniel Arndt / githubweb
- SYCL: Implement DESUL_ATOMICS_ENABLE_SYCL_SEPARABLE_COMPILATION path (#6534) — noreply / githubweb
- Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header — Damien L-G / githubweb
- Replace Marsaglia polar method with Box-muller to generate a normally distributed random number (#6556) — noreply / githubweb
- OpenMP: No memset in viewfill (#6573) — noreply / githubweb
- Revert "Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header" — noreply / githubweb
- OpenACC: add atomics support (#6446) — noreply / githubweb
- Fix infinity, quiet_NaN, signaling_Nan, isfinite, isnan, isinf for half_t and bhalf_t (#6543) — noreply / githubweb
- kokkos(unique): fix allocation of temporary view to enforce using the provided space instance — romin.tomasetti / githubweb
- Use binary wrapper for consistency in definition of half types numeric traits (#6590) — noreply / githubweb
- try fix — fnrizzi / githubweb
- Fix TestNumericTraits.hpp for SYCL with bfloat16 support — Daniel Arndt / githubweb
- Fix generated Makefile when using gnu_generate_makefile.sh and make >= 4.3 — crtrott / githubweb
- Threads: add missing broadcast to TeamThreadRange parallel_scan (#6601) — noreply / githubweb
- Added missing operator* to NEON simd — crtrott / githubweb
- [ci skip] Update changelog on develop for 4.2.00 (#6592) — noreply / githubweb
- Remove KOKKOS_IMPL_DO_NOT_USE_PRINTF (#6593) — noreply / githubweb
- try fix — fnrizzi / githubweb
- avoid pyt package — fnrizzi / githubweb
- try — fnrizzi / githubweb
- fix for macos — fnrizzi / githubweb
- remove comments — fnrizzi / githubweb
- use reference — crtrott / githubweb
- add branching — fnrizzi / githubweb
- [ci skip] fix formatting — cezary.skrzynski / githubweb
- GitHub Workflows: Use Ubuntu 22.04 instead of Fedora for Intel compiler testing — Daniel Arndt / githubweb
- nvcc(wrapper): adding missing `--generate-line-info` arg — romin.tomasetti / githubweb
- Add clang-format check to GitHub workflows (#6612) — noreply / githubweb
- graph(HIP): adding inline keyword to fix #6623 — romin.tomasetti / githubweb
- Add jenkins multibranch pipeline options — Bruno Turcksin / githubweb
- Add warp sync for Cuda parallel reduce — tccleve / githubweb
- kokkos(profiling): do not finalize in any backend — romin.tomasetti / githubweb
- Replace ubuntu:18.04 with ubuntu:20.04 as base image for clang-format — Bruno Turcksin / githubweb
- update comment to include final() mention — tccleve / githubweb
- Disabling OpenACC in the CI because it emits too many warnings — Bruno Turcksin / githubweb
- unorderedmap: modernize traits — romin.tomasetti / githubweb
- nvcc wrapper: remove troubling flag to fix 6628 (#6629) — noreply / githubweb
- tools(profiling): type (related to kokkos/kokkos-tools/pull/221) — romin.tomasetti / githubweb
- This PR fixes the too-much-OpenACC-warning issue, mentioned in PR #6639. — lees2 / githubweb
- add missing header fix #6644 — fnrizzi / githubweb
- SYCL: Restrict workaround for is_device_copyable to oneAPI versions before 2024.0.0 (#6532) — noreply / githubweb
- Fixup test math functions ulp should double -> int — Damien L-G / githubweb
- Drop DualView converting copy assignment operator — Damien L-G / githubweb
- Don't use rocm-docker for clang-format — Daniel Arndt / githubweb
- Disable HIP CI — Daniel Arndt / githubweb
- Remove deprecation warning for AllocationMechanism for gcc <11.0 — Daniel Arndt / githubweb
- OpenMPTarget: clang extensions for dynamic shared memory. (#6380) — noreply / githubweb
- Fix builtin_unreachable use for MSVC/CUDA — crtrott / githubweb
- Fix missing include on msvc/cuda — crtrott / githubweb
- Avoid lambdas in constexpr branch for msvc/cuda — crtrott / githubweb
- Sidestep lacking CTAD support msvc/cuda — crtrott / githubweb
- Fix formatting — crtrott / githubweb
- Move header for Damien because he is right — crtrott / githubweb
- Unit test for issue 3371 (negative vector length should not yield a negative max_team_size) (#6076) — noreply / githubweb
- Add CMakeLists.txt for stream benchmark — cwpears / githubweb
- Do not negate the dependent true traits helper — Damien L-G / githubweb
- Drop guards to accommodate external code defining KOKKOS_ASSERT — Damien L-G / githubweb
- Use omp_get_max_active_levels() when supported — Daniel Arndt / githubweb
- Add missing gfx940 — rberger / githubweb
- Add Impl::always_false type-dependent false trait — Damien L-G / githubweb
- Per review prefer always_false<Arg>::value to is_void_v<Arg> — Damien L-G / githubweb
- Improve "no copy mechanism" exception message — bmkelle / githubweb
- Add a unit test for new deep_copy exception msg — bmkelle / githubweb
- Add missing include sstream — bmkelle / githubweb
- src->source, dst->destination — bmkelle / githubweb
- Workaround for ROCm 6.0 failing to compile with AVX2 SIMD support — Bruno Turcksin / githubweb
- SYCL: Force inlining of Kokkos::printf (#6650) — noreply / githubweb
- Improve handling of printf in OMPT on Intel GPUs — Daniel Arndt / githubweb
- OpenMP: Use `omp_get_nested` for older gcc versions (#6685) — noreply / githubweb
- Disable more Bessel tests for SYCL on Intel GPUs — Daniel Arndt / githubweb
- fill_random without execution space instance should fence — Daniel Arndt / githubweb
- Drop unnecessary guarding for a tool library being loaded in ProfilingSection — Damien L-G / githubweb
- Drop unnecessary header include in Kokkos_Profiling_ProfileSection.hpp — Damien L-G / githubweb
- #5333: CUDA: Use scratch space appropriate to small reduction elements in Team reductions (#5334) — noreply / githubweb
- Cuda: Allocate using the correct device (#6392) — noreply / githubweb
- Let `Profiling::ProfilingSection(std::string)` constructor be explicit and nodiscard (#6690) — noreply / githubweb
- Cosmetic changes to ProfilingSection — Damien L-G / githubweb
- GitHub CI: Test with AddressSanitizer (#6676) — noreply / githubweb
- Kokkos::Array deduction guide (#6373) — noreply / githubweb
- Add CI for MSVC+Cuda (#6661) — noreply / githubweb
- SYCL: Address deprecations after oneAPI 2023.2.0 (#6577) — noreply / githubweb
- Fixup cast tolerance to double before printing — Damien L-G / githubweb
- Try linking against CUDA libraries even with KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE — Daniel Arndt / githubweb
- test_array_ctad: disable test for intel versions < 2021 — ndellin / githubweb
- Enable `{transform_}exclusive_scan` in place (#6667) — noreply / githubweb
- Add `ATOMICS_BYPASS` configuration option to disable atomics (#6692) — noreply / githubweb
- Check matching static extents in View constructor (#5190) — noreply / githubweb
- Remove Kokkos::[b]half_t volatile overloads (#6579) — noreply / githubweb
- add tests — fnrizzi / githubweb
- Provide `kokkos_swap` as part of Core and deprecate `Experimental::swap` in Algorithms (#6697) — noreply / githubweb
- Provide new public headers `<Kokkos_Clamp.hpp>` and `<Kokkos_MinMax.hpp>` (#6687) — noreply / githubweb
- Fix TeamThreadMDRange parallel_reduce (#6511) — noreply / githubweb
- add tests for in-place `inclusive_scan` (#6682) — noreply / githubweb
- Drop pointless Kokkos::Impl::CudaExec forward declaration — Damien L-G / githubweb
- Don't use the compiler launcher script if the compile language is CUDA. (#6704) — noreply / githubweb
- Deprecate `{Cuda,HIP}::detect_device_count()` and `Cuda::[detect_]device_arch()` (#6710) — noreply / githubweb
- Get rid of CudaInternal::cuda_get_error_{name,string}_wrapper — Damien L-G / githubweb
- No need to jump through so many hoops to print the error message — Damien L-G / githubweb
- HIP: Forgot to delete matching brace closing the namespace — Damien L-G / githubweb
- Make initialize and finalize of the Cuda/HIP singleton less special (#6714) — noreply / githubweb
- Kokkos_HIP.cpp: include Kokkos_Core.hpp to resolve errors — ndellin / githubweb
- Add bound checks in RangePolicy and MDRangePolicy (#6617) — noreply / githubweb
- Temporary fix to reenable HIP CI — Bruno Turcksin / githubweb
- Let the smart pointer manage the CUDA/HIP stream (#6721) — noreply / githubweb
- Fix Docker env variables — Bruno Turcksin / githubweb
- Ensure view_allocation_error does not silently ignore that no exception was thrown — Damien L-G / githubweb
- Add RawMemoryAllocationFailure::AllocationMechanism::OpenACCMalloc enumerator — Damien L-G / githubweb
- [OpenACC] throw if acc_malloc returned nullptr — Damien L-G / githubweb
- Fixup using declaration — Damien L-G / githubweb
- Disable openacc.view_allocation_error test — Damien L-G / githubweb
- Guard `[MD]RangePolicy` precondition check for deprecated code 4 (#6726) — noreply / githubweb
- Add C++26 standard to CMake Setup — dev / githubweb
- Add support for C++26 in generated makefiles — Damien L-G / githubweb
- Add KOKKOS_ENABLE_CXX26 to the configuration metadata — Damien L-G / githubweb
- Reenable HIP testing — Bruno Turcksin / githubweb
- Disabling failing HIP test in the CI — Bruno Turcksin / githubweb
- Use team_size_max to fix "Team size too large" error in reducer test (#6725) — noreply / githubweb
- Don't touch my records! (refactor Cuda/HIP/SYCL/Threads to not directly mess with `SharedAllocationRecord`) (#6732) — noreply / githubweb
- Untangle SharedAllocationRecord spaghetti code — Damien L-G / githubweb
- Fix TestThreadVectorMDRangeParallelReduce (#6734) — noreply / githubweb
- Cuda multi-GPU support: Allow execution space instance constructor to run (#6706) — noreply / githubweb
- Drop support for deprecated command-line arguments and environment variables (#6744) — noreply / githubweb
- Avoid unnecessary zero-memset of the scratch flags in SYCL (#6739) — noreply / githubweb
#3 (Jan 25, 2024, 10:41:36 AM)
- implementation and tests — fnrizzi / githubweb
- implementation and tests — fnrizzi / githubweb
- only compute with relevant entries — tccleve / githubweb
- subset of team level impl of std algorithms — fnrizzi / githubweb
- fix copyright — fnrizzi / githubweb
- guard for openmptarget — fnrizzi / githubweb
- fix for openmptarget — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- SIMD: suppress an uninitialized variable warning (#6294) — noreply / githubweb
- OpenACC CMakechange Clacc (#6250) — noreply / githubweb
- `Kokkos::sort` support custom comparator (#6253) — noreply / githubweb
- Add nightly build using latest gcc and c++23 — Bruno Turcksin / githubweb
- Explicitly capture this in lambda function — Bruno Turcksin / githubweb
- Fix typo in nightly jenkins configuration — Bruno Turcksin / githubweb
- Fix a memory bug in the `free_state` function of random pools. (#6290) — noreply / githubweb
- std_algos: for_each: try condense the impl — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Enforce create_mirror restrictions w.r.t. ViewCtorArgs on all variants — Daniel Arndt / githubweb
- Use KOKKOS_IF_ON_HOST — Daniel Arndt / githubweb
- Suppress warnings — Daniel Arndt / githubweb
- Don't suppress warnings for NVHPC — Daniel Arndt / githubweb
- Improve macro definitions — Daniel Arndt / githubweb
- Enable Serial backend in HPX build — cezary.skrzynski / githubweb
- Modify fences in View API test — cezary.skrzynski / githubweb
- Check for overflow during backend initialization (Cuda, HIP, SYCL) (#6159) — noreply / githubweb
- Improve SYCL reduction performance: MDRangePolicy (#6271) — noreply / githubweb
- bring back previous code as discussed in meeting — fnrizzi / githubweb
- create cudaAPI function wrappers — tccleve / githubweb
- Reorganize #include <cuda_runtime_api.h> — tccleve / githubweb
- Some API functions require CUDA 11.2+ — tccleve / githubweb
- Cuda10 requires "stream=nullptr" as default arg — tccleve / githubweb
- Rework stream inputs — tccleve / githubweb
- Use "if constexpr" for setCudaDevice — tccleve / githubweb
- Remove static in comment — tccleve / githubweb
- Update CI: use CUDA 11.6.2 instead of 11.6.0 (#6314) — noreply / githubweb
- add helper variable templates `are_*_iterators_v` (#6312) — noreply / githubweb
- [HIP] Optimize parallel_reduce (#6229) — noreply / githubweb
- Split Kokkos_SYCL_Parallel_Range.hpp — Daniel Arndt / githubweb
- Split Kokkos_SYCL_Parallel_Reduce.hpp — Daniel Arndt / githubweb
- Move Kokkos_SYCL_Scan.hpp — Daniel Arndt / githubweb
- Split Kokkos_Parallel_Team.hpp — Daniel Arndt / githubweb
- Add comment that we don't need to link against clang_rt with ROCm 5.5 — Bruno Turcksin / githubweb
- Error out when CXX standard is not set when using amdclang or cray clang — Bruno Turcksin / githubweb
- Disable failing tests for ROCm 5.5 and 5.6 — Bruno Turcksin / githubweb
- #5635: Add parallel_scan overload with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Serial/OpenMP: Parallel_scan with return value for TeamThreadRange (#6090) — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Allow using the SYCL execution space on AMD GPUs (#6321) — noreply / githubweb
- Update Clang+Cuda CI from 10.1 to 11.0.3 (#6318) — noreply / githubweb
- SIMD: add shift ops for all int types (#6109) — noreply / githubweb
- SYCL: Use sycl::bit_cast from oneAPI 2024.0.0 on (#6300) — noreply / githubweb
- remove spurious undefs — fnrizzi / githubweb
- Decrease maximum memory available to ccache — Bruno Turcksin / githubweb
- Fix gtest when using C++20 — Bruno Turcksin / githubweb
- Fully qualify Experimental::SYCL in algorithms to avoid conflicting namespaces — Daniel Arndt / githubweb
- address reviews [skip ci] — antoine.meyer54 / githubweb
- formatting — fnrizzi / githubweb
- SIMD: split simd unit tests into separate files (#6278) — noreply / githubweb
- Run NVHPC only on V100 — Bruno Turcksin / githubweb
- Use checked arithmetic builtins for overflow detection (#6313) — noreply / githubweb
- Passing size==0 to DeepCopy/memcpy/omp_target_memcpy (#6273) — noreply / githubweb
- allow sorting via native oneDPL to support views with stride = 1 (#6322) — noreply / githubweb
- Adopt new HIP cmake's way of finding clang-rt — nicurtis / githubweb
- Fix #6334: intel compiler does not like returning inside if constexpr (#6335) — noreply / githubweb
- Fix #6336: capturing `this` implicitly is deprecated in C++20 (#6337) — noreply / githubweb
- fix lambda capture — fnrizzi / githubweb
- remove unnecessary file, fix constraints — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix — fnrizzi / githubweb
- fix lambda capture and constraints — fnrizzi / githubweb
- keep only subset — fnrizzi / githubweb
- revert files — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix syntax — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Disable default oneDPL support in Trilinos — Daniel Arndt / githubweb
- Rename AMD GPU architectures (#6266) — noreply / githubweb
- Fix compiling SIMD library with NEON and gcc-13 — Daniel Arndt / githubweb
- SIMD: add generator constructors (#6347) — noreply / githubweb
- Use /usr/bin/env bash in nvcc_wrapper and kokkos_launch_compiler for portability — mikael.simberg / githubweb
- Fix cudaAPI wrapper errors for CUDA_MALLOC_ASYNC (#6346) — noreply / githubweb
- Use std::aligned_alloc for allocations (#6341) — noreply / githubweb
- Initial implementation of gfx942 (#6358) — noreply / githubweb
- Introduce constructor for multi-GPU support. — Daniel Arndt / githubweb
- Only set KOKKOS_ARCH_AMD_GPU if an AMD GPU architecture is enabled — Daniel Arndt / githubweb
- Extend 'hip_driver_check_page_migration' (#6364) — noreply / githubweb
- Replace volatile m_ready_count loads by desul::atomic_loads in task queue implementation — mikael.simberg / githubweb
- Add Kokkos_Atomic.hpp headers to OpenMP and HPX task headers — mikael.simberg / githubweb
- Add support for HIP Graph — Bruno Turcksin / githubweb
- Replace one of the ROCM 5.2 configuration with ROCM 5.6 — Bruno Turcksin / githubweb
- Use aligned version of operator new instead of aligned_alloc — Daniel Arndt / githubweb
- HPX: Don't interfere with exception handling — Daniel Arndt / githubweb
- Fix -Wformat-truncation warnings in CI (#6354) — noreply / githubweb
- Do not use HIP Graph with ROCm 5.2 — Bruno Turcksin / githubweb
- Use constexpr West in src — Bruno Turcksin / githubweb
- Use constexpr West in test — Bruno Turcksin / githubweb
- SIMD: convert binary operators to hidden friends (#6320) — noreply / githubweb
- simd: make mask and condition unit test to check with all data types (#6360) — noreply / githubweb
- team-level std algos: part 2 (#6205) — noreply / githubweb
- Add deprecated attribute to HostSpace(AllocationMechanism) definition — Damien L-G / githubweb
- Added gather_from and scatter_to for AVX2 and AVX512 simd — donlee / githubweb
- Rebased and applied feedback — donlee / githubweb
- Changed gen ctor usage to pass in KOKKOS_LAMBDA — donlee / githubweb
- Suppress NVCC attribute ignored warning on SIMD operators as hidden friends — Damien L-G / githubweb
- Remove deprecated code 3 support for volatile join — crtrott / githubweb
- Disable a test not working with nvhpc-23.1 — crtrott / githubweb
- Reenabling tests for nvhpc 23.7 — crtrott / githubweb
- Update containers and algorithms for NVC++ 23.7 — crtrott / githubweb
- More NVC++ 23.7 updates — crtrott / githubweb
- NVC++ clang-format fixes — crtrott / githubweb
- Update nvhpc to version 23.7 in the CI — crtrott / githubweb
- NVHPC 23.7 update: address reviewer comments — crtrott / githubweb
- OpenACC: Guard tests relying on abort — crtrott / githubweb
- Fix TestAtomic to use the test execspace — crtrott / githubweb
- Use NVHPC 23.7 for testing of OpenACC — crtrott / githubweb
- Update nvhpc gtest skip message — crtrott / githubweb
- Work around OpenMPTarget failure — crtrott / githubweb
- Update base docker file for nvhpc — crtrott / githubweb
- Remove stray Cuda graph pattern specialization from tag — Damien L-G / githubweb
- Fix typo — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Fix uninitialized variable warning with gcc 13 — Bruno Turcksin / githubweb
- HIP: Update print_configuration (#6387) — noreply / githubweb
- Add test — Daniel Arndt / githubweb
- Fix typo. — noreply / githubweb
- Check AVX, AVX2, AVX512XEON compiler macros on KokkosCore_config.h (#6248) — noreply / githubweb
- Don't use local headers or runtime in HPX backend due to deprecation — mikael.simberg / githubweb
- Explicitly check for valid device id — Daniel Arndt / githubweb
- Add a more comprehensive `kokkos_{malloc, free}` perf_test (#6377) — noreply / githubweb
- team-level std algos: part 3 (#6207) — noreply / githubweb
- Adding is_scoped_enum & to_underlying (#6356) — noreply / githubweb
- SIMD: add float simd support (#6177) — noreply / githubweb
- team-level std algos: part 4 (#6208) — noreply / githubweb
- Added a gen ctor for float (#6397) — noreply / githubweb
- team-level std algos: part 5 (#6209) — noreply / githubweb
- Deprecate Cuda(cudaStream_t, bool) — Damien L-G / githubweb
- Fixup checked integer operations death test — Damien L-G / githubweb
- Deprecate HIP(hipStream_t, bool) — Damien L-G / githubweb
- Let Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC be ON by default — Damien L-G / githubweb
- Print whether KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC is defined — Damien L-G / githubweb
- Introduce disable_malloc_async Cuda option with generated makefiles — Damien L-G / githubweb
- Preserve one build that disables Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC — Damien L-G / githubweb
- Use archive extraction time for timestamps — cezary.skrzynski / githubweb
- Disable performance benchmarks in AppVeyor CI — cezary.skrzynski / githubweb
- team-level std algos: part 6 (#6210) — noreply / githubweb
- address comments — fnrizzi / githubweb
- OpenMP backend refactor files. (#6403) — noreply / githubweb
- Drop (unused) `Cuda::cuda_internal_maximum_shared_words` — Damien L-G / githubweb
- Drop check that the host backend is initialized before the Cuda/HIP/SYCL one — Damien L-G / githubweb
- Drop unused HIPInternal::m_maxSharedWords data member — Damien L-G / githubweb
- Drop unused HIPInternal::m_hipArch static data member — Damien L-G / githubweb
- !initialized() should be a precondition for calling {Cuda,HIP,SYCL}Internal::initialize — Damien L-G / githubweb
- Drop pre-Kepler logic in Cuda::impl_initialize — Damien L-G / githubweb
- use single — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- formatting — fnrizzi / githubweb
- Team-level std algos: part 7 (#6211) — noreply / githubweb
- formatting — fnrizzi / githubweb
- Enable death tests for fedora rawhide — cezary.skrzynski / githubweb
- core/src: Add half math functions to private header (#6124) — noreply / githubweb
- Drop check whether device supports unified addressing — Damien L-G / githubweb
- fix single as per Christian's suggestion — fnrizzi / githubweb
- Only warn once (at initialization) when forcing allocation in unified memory — Damien L-G / githubweb
- check-copyright improvements (#6399) — noreply / githubweb
- Use execution space instance argument to get device properties in block size deduction — Damien L-G / githubweb
- Address reviewer's comments — Bruno Turcksin / githubweb
- Fix to avoid #186-D pointless comparison warning. — maarten.arnst / githubweb
- add comment — fnrizzi / githubweb
- improve tests to address review — fnrizzi / githubweb
- Fix guard for isnan test for bhalf_t — Daniel Arndt / githubweb
- avoid potential race condition HIP — tccleve / githubweb
- Set the device id in cuda_kernel_arch — Daniel Arndt / githubweb
- [SYCL][Reduction] Group counter should use at least memory_order::acq_rel — andrei.elovikov / githubweb
- Same for scan — andrei.elovikov / githubweb
- Initialize m_num_scratch_locks for Cuda parallel_for TeamPolicy — Daniel Arndt / githubweb
- improve tests with intra-team result check — fnrizzi / githubweb
- Fixes for Kokkos::Array (#6372) — noreply / githubweb
- try fix for unique, previous impl to remove later — fnrizzi / githubweb
- #5635: Add parallel_scan changes for CUDA and TeamThreadRange — cezary.skrzynski / githubweb
- remove old impl — fnrizzi / githubweb
- #5635: Enable TeamThreadRange test for CUDA — cezary.skrzynski / githubweb
- Clean up benchmarks/gups — cwpears / githubweb
- benchmark/gups: use CMake — cwpears / githubweb
- OpenMPTarget: Disable check for SIMD compiler macros — Daniel Arndt / githubweb
- #5635: Add parallel_scan with value for CUDA and ThreadVectorRange — cezary.skrzynski / githubweb
- add missing assert — fnrizzi / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- add intra team check for missing test — fnrizzi / githubweb
- fix intel compile error — fnrizzi / githubweb
- fix unreachable for intel — fnrizzi / githubweb
- re-enable unit tests for sort and random via makefile (#6422) — noreply / githubweb
- OpenMPTarget init-join fix (#6444) — noreply / githubweb
- Fix Cuda parallel_scan ThreadVectorRange range — Daniel Arndt / githubweb
- Assign final sum in Cuda parallel_scan ThreadVectorRange — Daniel Arndt / githubweb
- Fix compiling code using Kokkos::printf for OpenMPTarget on Intel GPUs (#6443) — noreply / githubweb
- std_algos: improving min, max, minmax (#6421) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 2/7) (#6426) — noreply / githubweb
- Skip bessel function tests known to fail on Intel GPUs (#6434) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 6/7) (#6436) — noreply / githubweb
- Fix race condition in functor_vec_scan_ret_val test — Daniel Arndt / githubweb
- Fix parallel_scan_with_reducers test — Daniel Arndt / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 3/7) (#6425) — noreply / githubweb
- improve tests (#6432) — noreply / githubweb
- improve tests (#6437) — noreply / githubweb
- Move final assignment to correct scope — cezary.skrzynski / githubweb
- fix casting warning in Random test — fnrizzi / githubweb
- Workaround for ROCm 5.6+ failing to compile with AVX2 SIMD support (#6449) — noreply / githubweb
- HIP: Restrict AVX2 workaround to ROCm 5.6 and 5.7 — Daniel Arndt / githubweb
- fixes build error for TeamReduce and TeamTranformReduced tests for specific GCC (#6459) — noreply / githubweb
- improve tests to check intra-team result (#6431) — noreply / githubweb
- SIMD: Math functions should be in namespace Kokkos — Daniel Arndt / githubweb
- SYCL: Disable another bessel function test for Intel GPUs — Daniel Arndt / githubweb
- Check for default device — Daniel Arndt / githubweb
- team-level std algos: part 10 (#6256) — noreply / githubweb
- team-level std algos: part 11 (#6258) — noreply / githubweb
- #5635: HIP: Add Overloads for parallel_scan with return value for TeamThreadRange (#6302) — noreply / githubweb
- #5635: Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- use shortcut — cezary.skrzynski / githubweb
- Skip testing for non-power-of-two team sizes — cezary.skrzynski / githubweb
- Fix formatting — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for HIP backend — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- #5635: SYCL: Add parallel_scan overload with return value — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- OpenMP: Fix TeamThreadRange parallel_scan with return value for team_size > 1 — Daniel Arndt / githubweb
- Add compatible copy assignment operator to DualView — maarten.arnst / githubweb
- Fix minimum version for Google benchmark — Daniel Arndt / githubweb
- Add test of copy constructor/assignment operator for DualView. — maarten.arnst / githubweb
- Compute concurrency on HIP using Kokkos hardcoded m_maxWavesPerCU — maarten.arnst / githubweb
- Modify test so that source and destination view are of different type — maarten.arnst / githubweb
- Use call operator instead of run_me function — maarten.arnst / githubweb
- team-level std algos: part 12 (#6350) — noreply / githubweb
- core/src: Add half single and double mixed compare (LT,GT,LE,GE) (#6407) — noreply / githubweb
- Moving abort and assert into their own public headers (#6445) — noreply / githubweb
- Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- Allow detecting SIMD types based on compiler macros (#6188) — noreply / githubweb
- Add KOKKOS_ARCH_ARM_NEON — Daniel Arndt / githubweb
- Fix implementation for cyl_bessel_i0 — Daniel Arndt / githubweb
- cleaning: remove iostream from headers where possible (IWYU) — romin.tomasetti / githubweb
- Fix compiling SIMD unit tests on NVIDIA — Daniel Arndt / githubweb
- guards to ensure DBL_EPSILON return for POWER8,9 — ajpowel / githubweb
- Fix NVCC warnings (#6483) — noreply / githubweb
- team-level std algos: part 13 (#6351) — noreply / githubweb
- Also fix annotations for generator constructor for AVX512 and NEON — Daniel Arndt / githubweb
- #5635: SYCL: Add parallel_scan overload with value for ThreadVectorRange — cezary.skrzynski / githubweb
- Fix atomic operations bug for Min and Max (#6435) — noreply / githubweb
- Fix example/build_cmake_installed_different_compiler — Daniel Arndt / githubweb
- fix impl — fnrizzi / githubweb
- Update core/src/HIP/Kokkos_HIP_KernelLaunch.hpp — noreply / githubweb
- Split Kokkos_Threads_Parallel files — Bruno Turcksin / githubweb
- HPX: Implement TeamThread and ThreadVector parallel_scan with return value — Daniel Arndt / githubweb
- Serial: Allow for distinct execution space instances (#6441) — noreply / githubweb
- add overload for TeamThreadRange — fnrizzi / githubweb
- address review comment — fnrizzi / githubweb
- Update to HIP TeamPolicy Block number heuristic (#6284) — noreply / githubweb
- SIMD: Split math functions from SIMD_Common.hpp (#6487) — noreply / githubweb
- Allow NVHPC as device compiler only with Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER=ON — Daniel Arndt / githubweb
- add threadvector — fnrizzi / githubweb
- fix order — fnrizzi / githubweb
- remove guards — fnrizzi / githubweb
- UnorderedMap(space instance): proposal for #6067 — romin.tomasetti / githubweb
- Rename Kokkos_ThreadsExec to align with the other backends — Bruno Turcksin / githubweb
- Promote Kokkos_Printf.hpp to public include — Daniel Arndt / githubweb
- simd: add floor, ceil, round, trunc operations (#6393) — noreply / githubweb
- Update CI in OpenMPTarget to use llvm-17 (#6472) — noreply / githubweb
- Rename Kokkos_ThreadsTeam.hpp to Kokkos_Threads_Team.hpp — Bruno Turcksin / githubweb
- Kokkos_SIMD_Scalar.hpp: remove extra ';' — ndellin / githubweb
- add guards — fnrizzi / githubweb
- avoid auto — fnrizzi / githubweb
- [ci skip] Update Kokkos version to 4.2.99 — Daniel Arndt / githubweb
- [deprecated code 3] remove all default device init tests — Damien L-G / githubweb
- [deprecated code 3] remove InitArguments — Damien L-G / githubweb
- [deprecated code 3] remove KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macros — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for clamp, min, max, and minmax — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math constants — Damien L-G / githubweb
- [deprecated code 3] remove {OpenMP,HPX}::partition_master — Damien L-G / githubweb
- OpenMP backend cleanup following removal of deprecated code 3 — Damien L-G / githubweb
- Get rid of FIXME_OPENMP — Damien L-G / githubweb
- [deprecated code 3] remove MasterLock — Damien L-G / githubweb
- [deprecated code 3] remove ENABLE_DEPRECATED_CODE_3 option — Damien L-G / githubweb
- fixup! [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- Do not append " - blocks" to the bitset label — Damien L-G / githubweb
- with_updated_label -> append_to_label — Daniel Arndt / githubweb
- SYCL: Use SYCL_EXT_ONEAPI_DEVICE_GLOBAL to detect support for device global variables — Daniel Arndt / githubweb
- Fixup in README (github -> GitHub) — Damien L-G / githubweb
- Check that device associated with stream matches requested device — Daniel Arndt / githubweb
- Threads: replace enum with constexpr int and enum class (#6514) — noreply / githubweb
- Added unit tests for reduction ops and few intel svml intrinsics — donlee / githubweb
- Remove sleep and wake functions — Bruno Turcksin / githubweb
- Remove extra constructor — Daniel Arndt / githubweb
- Prefer defaulted default constructor for Bitset (#6524) — noreply / githubweb
- SYCL: Use host-pinned memory to copy reduction/scan result (#6500) — noreply / githubweb
- Remove unused variables — Bruno Turcksin / githubweb
- Remove Sentinel struct from Threads — Bruno Turcksin / githubweb
- Small cleanup of ThreadsInternal::initialize — Bruno Turcksin / githubweb
- Remove useless forward declaration — Bruno Turcksin / githubweb
- Remove spawn function — Bruno Turcksin / githubweb
- Add comments — Bruno Turcksin / githubweb
- Fix indentation — Bruno Turcksin / githubweb
- Fix typo in macro guard — Bruno Turcksin / githubweb
- Reduce number of View constructor instantiations — Damien L-G / githubweb
- Bump HPX version used in CI to 1.9.0 — mikael.simberg / githubweb
- Split files in HIP backend — Bruno Turcksin / githubweb
- Trim some fat in `CudaInternal` (towards multiple GPUs support) (#6544) — noreply / githubweb
- Only define STDALGO_TEAM_SOURCES_* once — Daniel Arndt / githubweb
- Remove logical memory spaces — Damien L-G / githubweb
- Rollback changes to view constructors to reduce the number of instantiations (#6564) — noreply / githubweb
- Address reviewer comments — Daniel Arndt / githubweb
- Threads remove unused variables and functions (#6566) — noreply / githubweb
- Remove unused Sandia testing files (#6568) — noreply / githubweb
- fallback implementation cleanup — donlee / githubweb
- Remove empty quotation marks for static_assert — Daniel Arndt / githubweb
- [ci skip] Drop unused <impl/Kokkos_Memory_Fence.hpp> header — Damien L-G / githubweb
- Drop Clang+CUDA workaround — Damien L-G / githubweb
- OpenMPTarget: CI compiler upgrade. (#6545) — noreply / githubweb
- m_cudaDev isn't static anymore — Daniel Arndt / githubweb
- Add crtrott's launch_latency benchmark (#6379) — noreply / githubweb
- Simplify fence functions in the Threads backend (#6571) — noreply / githubweb
- simd: temporarily skip device math ops unit test for OpenMPTarget build (#6574) — noreply / githubweb
- Set the device id explicitly for CUDA API calls in impl_initialize — Daniel Arndt / githubweb
- OpenMPTarget: Guard scratch memory usage in ParallelReduce — Daniel Arndt / githubweb
- SYCL: Implement DESUL_ATOMICS_ENABLE_SYCL_SEPARABLE_COMPILATION path (#6534) — noreply / githubweb
- Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header — Damien L-G / githubweb
- Replace Marsaglia polar method with Box-muller to generate a normally distributed random number (#6556) — noreply / githubweb
- OpenMP: No memset in viewfill (#6573) — noreply / githubweb
- Revert "Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header" — noreply / githubweb
- OpenACC: add atomics support (#6446) — noreply / githubweb
- Fix infinity, quiet_NaN, signaling_Nan, isfinite, isnan, isinf for half_t and bhalf_t (#6543) — noreply / githubweb
- kokkos(unique): fix allocation of temporary view to enforce using the provided space instance — romin.tomasetti / githubweb
- Use binary wrapper for consistency in definition of half types numeric traits (#6590) — noreply / githubweb
- try fix — fnrizzi / githubweb
- Fix TestNumericTraits.hpp for SYCL with bfloat16 support — Daniel Arndt / githubweb
- Fix generated Makefile when using gnu_generate_makefile.sh and make >= 4.3 — crtrott / githubweb
- Threads: add missing broadcast to TeamThreadRange parallel_scan (#6601) — noreply / githubweb
- Added missing operator* to NEON simd — crtrott / githubweb
- [ci skip] Update changelog on develop for 4.2.00 (#6592) — noreply / githubweb
- Remove KOKKOS_IMPL_DO_NOT_USE_PRINTF (#6593) — noreply / githubweb
- try fix — fnrizzi / githubweb
- avoid pyt package — fnrizzi / githubweb
- try — fnrizzi / githubweb
- fix for macos — fnrizzi / githubweb
- remove comments — fnrizzi / githubweb
- use reference — crtrott / githubweb
- add branching — fnrizzi / githubweb
- [ci skip] fix formatting — cezary.skrzynski / githubweb
- GitHub Workflows: Use Ubuntu 22.04 instead of Fedora for Intel compiler testing — Daniel Arndt / githubweb
- nvcc(wrapper): adding missing `--generate-line-info` arg — romin.tomasetti / githubweb
- Add clang-format check to GitHub workflows (#6612) — noreply / githubweb
- graph(HIP): adding inline keyword to fix #6623 — romin.tomasetti / githubweb
- Add jenkins multibranch pipeline options — Bruno Turcksin / githubweb
- Add warp sync for Cuda parallel reduce — tccleve / githubweb
- kokkos(profiling): do not finalize in any backend — romin.tomasetti / githubweb
- Replace ubuntu:18.04 with ubuntu:20.04 as base image for clang-format — Bruno Turcksin / githubweb
- update comment to include final() mention — tccleve / githubweb
- Disabling OpenACC in the CI because it emits too many warnings — Bruno Turcksin / githubweb
- unorderedmap: modernize traits — romin.tomasetti / githubweb
- nvcc wrapper: remove troubling flag to fix 6628 (#6629) — noreply / githubweb
- tools(profiling): type (related to kokkos/kokkos-tools/pull/221) — romin.tomasetti / githubweb
- This PR fixes the too-much-OpenACC-warning issue, mentioned in PR #6639. — lees2 / githubweb
- add missing header fix #6644 — fnrizzi / githubweb
- SYCL: Restrict workaround for is_device_copyable to oneAPI versions before 2024.0.0 (#6532) — noreply / githubweb
- Fixup test math functions ulp should double -> int — Damien L-G / githubweb
- Drop DualView converting copy assignment operator — Damien L-G / githubweb
- Don't use rocm-docker for clang-format — Daniel Arndt / githubweb
- Disable HIP CI — Daniel Arndt / githubweb
- Remove deprecation warning for AllocationMechanism for gcc <11.0 — Daniel Arndt / githubweb
- OpenMPTarget: clang extensions for dynamic shared memory. (#6380) — noreply / githubweb
- Fix builtin_unreachable use for MSVC/CUDA — crtrott / githubweb
- Fix missing include on msvc/cuda — crtrott / githubweb
- Avoid lambdas in constexpr branch for msvc/cuda — crtrott / githubweb
- Sidestep lacking CTAD support msvc/cuda — crtrott / githubweb
- Fix formatting — crtrott / githubweb
- Move header for Damien because he is right — crtrott / githubweb
- Unit test for issue 3371 (negative vector length should not yield a negative max_team_size) (#6076) — noreply / githubweb
- Add CMakeLists.txt for stream benchmark — cwpears / githubweb
- Do not negate the dependent true traits helper — Damien L-G / githubweb
- Drop guards to accommodate external code defining KOKKOS_ASSERT — Damien L-G / githubweb
- Use omp_get_max_active_levels() when supported — Daniel Arndt / githubweb
- Add missing gfx940 — rberger / githubweb
- Add Impl::always_false type-dependent false trait — Damien L-G / githubweb
- Per review prefer always_false<Arg>::value to is_void_v<Arg> — Damien L-G / githubweb
- Improve "no copy mechanism" exception message — bmkelle / githubweb
- Add a unit test for new deep_copy exception msg — bmkelle / githubweb
- Add missing include sstream — bmkelle / githubweb
- src->source, dst->destination — bmkelle / githubweb
- Workaround for ROCm 6.0 failing to compile with AVX2 SIMD support — Bruno Turcksin / githubweb
- SYCL: Force inlining of Kokkos::printf (#6650) — noreply / githubweb
- Improve handling of printf in OMPT on Intel GPUs — Daniel Arndt / githubweb
- OpenMP: Use `omp_get_nested` for older gcc versions (#6685) — noreply / githubweb
- Disable more Bessel tests for SYCL on Intel GPUs — Daniel Arndt / githubweb
- fill_random without execution space instance should fence — Daniel Arndt / githubweb
- Drop unnecessary guarding for a tool library being loaded in ProfilingSection — Damien L-G / githubweb
- Drop unnecessary header include in Kokkos_Profiling_ProfileSection.hpp — Damien L-G / githubweb
- #5333: CUDA: Use scratch space appropriate to small reduction elements in Team reductions (#5334) — noreply / githubweb
- Cuda: Allocate using the correct device (#6392) — noreply / githubweb
- Let `Profiling::ProfilingSection(std::string)` constructor be explicit and nodiscard (#6690) — noreply / githubweb
- Cosmetic changes to ProfilingSection — Damien L-G / githubweb
- GitHub CI: Test with AddressSanitizer (#6676) — noreply / githubweb
- Kokkos::Array deduction guide (#6373) — noreply / githubweb
- Add CI for MSVC+Cuda (#6661) — noreply / githubweb
- SYCL: Address deprecations after oneAPI 2023.2.0 (#6577) — noreply / githubweb
- Fixup cast tolerance to double before printing — Damien L-G / githubweb
- Try linking against CUDA libraries even with KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE — Daniel Arndt / githubweb
- test_array_ctad: disable test for intel versions < 2021 — ndellin / githubweb
- Enable `{transform_}exclusive_scan` in place (#6667) — noreply / githubweb
- Add `ATOMICS_BYPASS` configuration option to disable atomics (#6692) — noreply / githubweb
- Check matching static extents in View constructor (#5190) — noreply / githubweb
- Remove Kokkos::[b]half_t volatile overloads (#6579) — noreply / githubweb
- add tests — fnrizzi / githubweb
- Provide `kokkos_swap` as part of Core and deprecate `Experimental::swap` in Algorithms (#6697) — noreply / githubweb
- Provide new public headers `<Kokkos_Clamp.hpp>` and `<Kokkos_MinMax.hpp>` (#6687) — noreply / githubweb
- Fix TeamThreadMDRange parallel_reduce (#6511) — noreply / githubweb
- add tests for in-place `inclusive_scan` (#6682) — noreply / githubweb
- Drop pointless Kokkos::Impl::CudaExec forward declaration — Damien L-G / githubweb
- Don't use the compiler launcher script if the compile language is CUDA. (#6704) — noreply / githubweb
- Deprecate `{Cuda,HIP}::detect_device_count()` and `Cuda::[detect_]device_arch()` (#6710) — noreply / githubweb
- Get rid of CudaInternal::cuda_get_error_{name,string}_wrapper — Damien L-G / githubweb
- No need to jump through so many hoops to print the error message — Damien L-G / githubweb
- HIP: Forgot to delete matching brace closing the namespace — Damien L-G / githubweb
- Make initialize and finalize of the Cuda/HIP singleton less special (#6714) — noreply / githubweb
- Kokkos_HIP.cpp: include Kokkos_Core.hpp to resolve errors — ndellin / githubweb
- Add bound checks in RangePolicy and MDRangePolicy (#6617) — noreply / githubweb
- Temporary fix to reenable HIP CI — Bruno Turcksin / githubweb
- Let the smart pointer manage the CUDA/HIP stream (#6721) — noreply / githubweb
- Fix Docker env variables — Bruno Turcksin / githubweb
- Ensure view_allocation_error does not silently ignore that no exception was thrown — Damien L-G / githubweb
- Add RawMemoryAllocationFailure::AllocationMechanism::OpenACCMalloc enumerator — Damien L-G / githubweb
- [OpenACC] throw if acc_malloc returned nullptr — Damien L-G / githubweb
- Fixup using declaration — Damien L-G / githubweb
- Disable openacc.view_allocation_error test — Damien L-G / githubweb
- Guard `[MD]RangePolicy` precondition check for deprecated code 4 (#6726) — noreply / githubweb
- Add C++26 standard to CMake Setup — dev / githubweb
- Add support for C++26 in generated makefiles — Damien L-G / githubweb
- Add KOKKOS_ENABLE_CXX26 to the configuration metadata — Damien L-G / githubweb
- Reenable HIP testing — Bruno Turcksin / githubweb
- Disabling failing HIP test in the CI — Bruno Turcksin / githubweb
- Use team_size_max to fix "Team size too large" error in reducer test (#6725) — noreply / githubweb
- Don't touch my records! (refactor Cuda/HIP/SYCL/Threads to not directly mess with `SharedAllocationRecord`) (#6732) — noreply / githubweb
- Untangle SharedAllocationRecord spaghetti code — Damien L-G / githubweb
- Fix TestThreadVectorMDRangeParallelReduce (#6734) — noreply / githubweb
- Cuda multi-GPU support: Allow execution space instance constructor to run (#6706) — noreply / githubweb
- Drop support for deprecated command-line arguments and environment variables (#6744) — noreply / githubweb
- Avoid unnecessary zero-memset of the scratch flags in SYCL (#6739) — noreply / githubweb
#3 (Jan 25, 2024, 10:41:36 AM)
- implementation and tests — fnrizzi / githubweb
- implementation and tests — fnrizzi / githubweb
- only compute with relevant entries — tccleve / githubweb
- subset of team level impl of std algorithms — fnrizzi / githubweb
- fix copyright — fnrizzi / githubweb
- guard for openmptarget — fnrizzi / githubweb
- fix for openmptarget — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- SIMD: suppress an uninitialized variable warning (#6294) — noreply / githubweb
- OpenACC CMakechange Clacc (#6250) — noreply / githubweb
- `Kokkos::sort` support custom comparator (#6253) — noreply / githubweb
- Add nightly build using latest gcc and c++23 — Bruno Turcksin / githubweb
- Explicitly capture this in lambda function — Bruno Turcksin / githubweb
- Fix typo in nightly jenkins configuration — Bruno Turcksin / githubweb
- Fix a memory bug in the `free_state` function of random pools. (#6290) — noreply / githubweb
- std_algos: for_each: try condense the impl — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Enforce create_mirror restrictions w.r.t. ViewCtorArgs on all variants — Daniel Arndt / githubweb
- Use KOKKOS_IF_ON_HOST — Daniel Arndt / githubweb
- Suppress warnings — Daniel Arndt / githubweb
- Don't suppress warnings for NVHPC — Daniel Arndt / githubweb
- Improve macro definitions — Daniel Arndt / githubweb
- Enable Serial backend in HPX build — cezary.skrzynski / githubweb
- Modify fences in View API test — cezary.skrzynski / githubweb
- Check for overflow during backend initialization (Cuda, HIP, SYCL) (#6159) — noreply / githubweb
- Improve SYCL reduction performance: MDRangePolicy (#6271) — noreply / githubweb
- bring back previous code as discussed in meeting — fnrizzi / githubweb
- create cudaAPI function wrappers — tccleve / githubweb
- Reorganize #include <cuda_runtime_api.h> — tccleve / githubweb
- Some api functions require cuda11.2+ — tccleve / githubweb
- Cuda10 requires "stream=nullptr" as default arg — tccleve / githubweb
- Rework stream inputs — tccleve / githubweb
- Use "if constexpr" for setCudaDevice — tccleve / githubweb
- Remove static in comment — tccleve / githubweb
- Update CI: use CUDA 11.6.2 instead of 11.6.0 (#6314) — noreply / githubweb
- add helper variable templates `are_*_iterators_v` (#6312) — noreply / githubweb
- [HIP] Optimize parallel_reduce (#6229) — noreply / githubweb
- Split Kokkos_SYCL_Parallel_Range.hpp — Daniel Arndt / githubweb
- Split Kokkos_SYCL_Parallel_Reduce.hpp — Daniel Arndt / githubweb
- Move Kokkos_SYCL_Scan.hpp — Daniel Arndt / githubweb
- Split Kokkos_Parallel_Team.hpp — Daniel Arndt / githubweb
- Add comment that we don't need to link against clang_rt with ROCm 5.5 — Bruno Turcksin / githubweb
- Error out when CXX standard is not set when using amdclang or cray clang — Bruno Turcksin / githubweb
- Disable failing tests for ROCm 5.5 and 5.6 — Bruno Turcksin / githubweb
- #5635: Add parallel_scan overload with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
- #5635: Serial/OpenMP: Parallel_scan with return value for TeamThreadRange (#6090) — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Allow using the SYCL execution space on AMD GPUs (#6321) — noreply / githubweb
- Update Clang+Cuda CI from 10.1 to 11.0.3 (#6318) — noreply / githubweb
- SIMD: add shift ops for all int types (#6109) — noreply / githubweb
- SYCL: Use sycl::bit_cast from oneAPI 2024.0.0 on (#6300) — noreply / githubweb
- remove spurious undefs — fnrizzi / githubweb
- Decrease maximum memory available to ccache — Bruno Turcksin / githubweb
- Fix gtest when using C++20 — Bruno Turcksin / githubweb
- Fully qualify Experimental::SYCL in algorithms to avoid conflicting namespaces — Daniel Arndt / githubweb
- address reviews [skip ci] — antoine.meyer54 / githubweb
- formatting — fnrizzi / githubweb
- SIMD: split simd unit tests into separate files (#6278) — noreply / githubweb
- Run NVHPC only on V100 — Bruno Turcksin / githubweb
- Use checked arithmetic builtins for overflow detection (#6313) — noreply / githubweb
- Passing size==0 to DeepCopy/memcpy/omp_target_memcpy (#6273) — noreply / githubweb
- allow sorting via native oneDPL to support views with stride = 1 (#6322) — noreply / githubweb
- Adopt new HIP cmake's way of finding clang-rt — nicurtis / githubweb
- Fix #6334: intel compiler does not like returning inside if constexpr (#6335) — noreply / githubweb
- Fix #6336: capturing `this` implicitly is deprecated in C++20 (#6337) — noreply / githubweb
- fix lambda capture — fnrizzi / githubweb
- remove unnecessary file, fix constraints — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix — fnrizzi / githubweb
- fix lambda capture and constraints — fnrizzi / githubweb
- keep only subset — fnrizzi / githubweb
- revert files — fnrizzi / githubweb
- remove file — fnrizzi / githubweb
- fix syntax — fnrizzi / githubweb
- format — fnrizzi / githubweb
- Disable default oneDPL support in Trilinos — Daniel Arndt / githubweb
- Rename AMD GPU architectures (#6266) — noreply / githubweb
- Fix compiling SIMD library with NEON and gcc-13 — Daniel Arndt / githubweb
- SIMD: add generator constructors (#6347) — noreply / githubweb
- Use /usr/bin/env bash in nvcc_wrapper and kokkos_launch_compiler for portability — mikael.simberg / githubweb
- Fix cudaAPI wrapper errors for CUDA_MALLOC_ASYNC (#6346) — noreply / githubweb
- Use std::aligned_alloc for allocations (#6341) — noreply / githubweb
- Initial implementation of gfx942 (#6358) — noreply / githubweb
- Introduce constructor for multi-GPU support. — Daniel Arndt / githubweb
- Only set KOKKOS_ARCH_AMD_GPU if an AMD GPU architecture is enabled — Daniel Arndt / githubweb
- Extend 'hip_driver_check_page_migration' (#6364) — noreply / githubweb
- Replace volatile m_ready_count loads by desul::atomic_loads in task queue implementation — mikael.simberg / githubweb
- Add Kokkos_Atomic.hpp headers to OpenMP and HPX task headers — mikael.simberg / githubweb
- Add support for HIP Graph — Bruno Turcksin / githubweb
- Replace one of the ROCM 5.2 configurations with ROCM 5.6 — Bruno Turcksin / githubweb
- Use aligned version of operator new instead of aligned_alloc — Daniel Arndt / githubweb
- HPX: Don't interfere with exception handling — Daniel Arndt / githubweb
- Fix -Wformat-truncation warnings in CI (#6354) — noreply / githubweb
- Do not use HIP Graph with ROCm 5.2 — Bruno Turcksin / githubweb
- Use constexpr West in src — Bruno Turcksin / githubweb
- Use constexpr West in test — Bruno Turcksin / githubweb
- SIMD: convert binary operators to hidden friends (#6320) — noreply / githubweb
- simd: make mask and condition unit test to check with all data types (#6360) — noreply / githubweb
- team-level std algos: part 2 (#6205) — noreply / githubweb
- Add deprecated attribute to HostSpace(AllocationMechanism) definition — Damien L-G / githubweb
- Added gather_from and scatter_to for AVX2 and AVX512 simd — donlee / githubweb
- Rebased and applied feedback — donlee / githubweb
- Changed gen ctor usage to pass in KOKKOS_LAMBDA — donlee / githubweb
- Suppress NVCC attribute ignored warning on SIMD operators as hidden friends — Damien L-G / githubweb
- Remove deprecated code 3 support for volatile join — crtrott / githubweb
- Disable a test not working with nvhpc-23.1 — crtrott / githubweb
- Reenabling tests for nvhpc 23.7 — crtrott / githubweb
- Update containers and algorithms for NVC++ 23.7 — crtrott / githubweb
- More NVC++ 23.7 updates — crtrott / githubweb
- NVC++ clang-format fixes — crtrott / githubweb
- Update nvhpc to version 23.7 in the CI — crtrott / githubweb
- NVHPC 23.7 update: address reviewer comments — crtrott / githubweb
- OpenACC: Guard tests relying on abort — crtrott / githubweb
- Fix TestAtomic to use the test execspace — crtrott / githubweb
- Use NVHPC 23.7 for testing of OpenACC — crtrott / githubweb
- Update nvhpc gtest skip message — crtrott / githubweb
- Work around OpenMPTarget failure — crtrott / githubweb
- Update base docker file for nvhpc — crtrott / githubweb
- Remove stray Cuda graph pattern specialization from tag — Damien L-G / githubweb
- Fix typo — noreply / githubweb
- Fix reviewer's comments — Bruno Turcksin / githubweb
- Fix uninitialized variable warning with gcc 13 — Bruno Turcksin / githubweb
- HIP: Update print_configuration (#6387) — noreply / githubweb
- Add test — Daniel Arndt / githubweb
- Fix typo. — noreply / githubweb
- Check AVX, AVX2, AVX512XEON compiler macros on KokkosCore_config.h (#6248) — noreply / githubweb
- Don't use local headers or runtime in HPX backend due to deprecation — mikael.simberg / githubweb
- Explicitly check for valid device id — Daniel Arndt / githubweb
- Add a more comprehensive `kokkos_{malloc, free}` perf_test (#6377) — noreply / githubweb
- team-level std algos: part 3 (#6207) — noreply / githubweb
- Adding is_scoped_enum & to_underlying (#6356) — noreply / githubweb
- SIMD: add float simd support (#6177) — noreply / githubweb
- team-level std algos: part 4 (#6208) — noreply / githubweb
- Added a gen ctor for float (#6397) — noreply / githubweb
- team-level std algos: part 5 (#6209) — noreply / githubweb
- Deprecate Cuda(cudaStream_t, bool) — Damien L-G / githubweb
- Fixup checked integer operations death test — Damien L-G / githubweb
- Deprecate HIP(hipStream_t, bool) — Damien L-G / githubweb
- Let Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC be ON by default — Damien L-G / githubweb
- Print whether KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC is defined — Damien L-G / githubweb
- Introduce disable_malloc_async Cuda option with generated makefiles — Damien L-G / githubweb
- Preserve one build that disables Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC — Damien L-G / githubweb
- Use archive extraction time for timestamps — cezary.skrzynski / githubweb
- Disable performance benchmarks in AppVeyor CI — cezary.skrzynski / githubweb
- team-level std algos: part 6 (#6210) — noreply / githubweb
- address comments — fnrizzi / githubweb
- OpenMP backend refactor files. (#6403) — noreply / githubweb
- Drop (unused) `Cuda::cuda_internal_maximum_shared_words` — Damien L-G / githubweb
- Drop check that the host backend is initialized before the Cuda/HIP/SYCL one — Damien L-G / githubweb
- Drop unused HIPInternal::m_maxSharedWords data member — Damien L-G / githubweb
- Drop unused HIPInternal::m_hipArch static data member — Damien L-G / githubweb
- !initialized() should be a precondition for calling {Cuda,HIP,SYCL}Internal::initialize — Damien L-G / githubweb
- Drop pre-Kepler logic in Cuda::impl_initialize — Damien L-G / githubweb
- use single — fnrizzi / githubweb
- address comments — fnrizzi / githubweb
- formatting — fnrizzi / githubweb
- Team-level std algos: part 7 (#6211) — noreply / githubweb
- formatting — fnrizzi / githubweb
- Enable death tests for fedora rawhide — cezary.skrzynski / githubweb
- core/src: Add half math functions to private header (#6124) — noreply / githubweb
- Drop check whether device supports unified addressing — Damien L-G / githubweb
- fix single as per Christian's suggestion — fnrizzi / githubweb
- Only warn once (at initialization) when forcing allocation in unified memory — Damien L-G / githubweb
- check-copyright improvements (#6399) — noreply / githubweb
- Use execution space instance argument to get device properties in block size deduction — Damien L-G / githubweb
- Address reviewer's comments — Bruno Turcksin / githubweb
- Fix to avoid #186-D pointless comparison warning. — maarten.arnst / githubweb
- add comment — fnrizzi / githubweb
- improve tests to address review — fnrizzi / githubweb
- Fix guard for isnan test for bhalf_t — Daniel Arndt / githubweb
- avoid potential race condition HIP — tccleve / githubweb
- Set the device id in cuda_kernel_arch — Daniel Arndt / githubweb
- [SYCL][Reduction] Group counter should use at least memory_order::acq_rel — andrei.elovikov / githubweb
- Same for scan — andrei.elovikov / githubweb
- Initialize m_num_scratch_locks for Cuda parallel_for TeamPolicy — Daniel Arndt / githubweb
- improve tests with intra-team result check — fnrizzi / githubweb
- Fixes for Kokkos::Array (#6372) — noreply / githubweb
- try fix for unique, previous impl to remove later — fnrizzi / githubweb
- #5635: Add parallel_scan changes for CUDA and TeamThreadRange — cezary.skrzynski / githubweb
- remove old impl — fnrizzi / githubweb
- #5635: Enable TeamThreadRange test for CUDA — cezary.skrzynski / githubweb
- Clean up benchmarks/gups — cwpears / githubweb
- benchmark/gups: use CMake — cwpears / githubweb
- OpenMPTarget: Disable check for SIMD compiler macros — Daniel Arndt / githubweb
- #5635: Add parallel_scan with value for CUDA and ThreadVectorRange — cezary.skrzynski / githubweb
- add missing assert — fnrizzi / githubweb
- #5635: Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- add intra team check for missing test — fnrizzi / githubweb
- fix intel compile error — fnrizzi / githubweb
- fix unreachable for intel — fnrizzi / githubweb
- re-enable unit tests for sort and random via makefile (#6422) — noreply / githubweb
- OpenMPTarget init-join fix (#6444) — noreply / githubweb
- Fix Cuda parallel_scan ThreadVectorRange range — Daniel Arndt / githubweb
- Assign final sum in Cuda parallel_scan ThreadVectorRange — Daniel Arndt / githubweb
- Fix compiling code using Kokkos::printf for OpenMPTarget on Intel GPUs (#6443) — noreply / githubweb
- std_algos: improving min, max, minmax (#6421) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 2/7) (#6426) — noreply / githubweb
- Skip bessel function tests known to fail on Intel GPUs (#6434) — noreply / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 6/7) (#6436) — noreply / githubweb
- Fix race condition in functor_vec_scan_ret_val test — Daniel Arndt / githubweb
- Fix parallel_scan_with_reducers test — Daniel Arndt / githubweb
- team-level stdalgos: improve tests, check intra-team result matching (part 3/7) (#6425) — noreply / githubweb
- improve tests (#6432) — noreply / githubweb
- improve tests (#6437) — noreply / githubweb
- Move final assignment to correct scope — cezary.skrzynski / githubweb
- fix casting warning in Random test — fnrizzi / githubweb
- Workaround for ROCm 5.6+ failing to compile with AVX2 SIMD support (#6449) — noreply / githubweb
- HIP: Restrict AVX2 workaround to ROCm 5.6 and 5.7 — Daniel Arndt / githubweb
- fixes build error for TeamReduce and TeamTranformReduced tests for specific GCC (#6459) — noreply / githubweb
- improve tests to check intra-team result (#6431) — noreply / githubweb
- SIMD: Math functions should be in namespace Kokkos — Daniel Arndt / githubweb
- SYCL: Disable another bessel function test for Intel GPUs — Daniel Arndt / githubweb
- Check for default device — Daniel Arndt / githubweb
- team-level std algos: part 10 (#6256) — noreply / githubweb
- team-level std algos: part 11 (#6258) — noreply / githubweb
- #5635: HIP: Add Overloads for parallel_scan with return value for TeamThreadRange (#6302) — noreply / githubweb
- #5635: Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- use shortcut — cezary.skrzynski / githubweb
- Skip testing for non-power-of-two team sizes — cezary.skrzynski / githubweb
- Fix formatting — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for HIP backend — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
- #5635: SYCL: Add parallel_scan overload with return value — cezary.skrzynski / githubweb
- Use std::is_same_v — cezary.skrzynski / githubweb
- OpenMP: Fix TeamThreadRange parallel_scan with return value for team_size > 1 — Daniel Arndt / githubweb
- Add compatible copy assignment operator to DualView — maarten.arnst / githubweb
- Fix minimum version for Google benchmark — Daniel Arndt / githubweb
- Add test of copy constructor/assignment operator for DualView. — maarten.arnst / githubweb
- Compute concurrency on HIP using Kokkos hardcoded m_maxWavesPerCU — maarten.arnst / githubweb
- Modify test so that source and destination view are of different type — maarten.arnst / githubweb
- Use call operator instead of run_me function — maarten.arnst / githubweb
- team-level std algos: part 12 (#6350) — noreply / githubweb
- core/src: Add half single and double mixed compare (LT,GT,LE,GE) (#6407) — noreply / githubweb
- Moving abort and assert into their own public headers (#6445) — noreply / githubweb
- Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
- Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
- Allow detecting SIMD types based on compiler macros (#6188) — noreply / githubweb
- Add KOKKOS_ARCH_ARM_NEON — Daniel Arndt / githubweb
- Fix implementation for cyl_bessel_i0 — Daniel Arndt / githubweb
- cleaning: remove iostream from headers where possible (IWYU) — romin.tomasetti / githubweb
- Fix compiling SIMD unit tests on NVIDIA — Daniel Arndt / githubweb
- guards to ensure DBL_EPSILON return for POWER8,9 — ajpowel / githubweb
- Fix NVCC warnings (#6483) — noreply / githubweb
- team-level std algos: part 13 (#6351) — noreply / githubweb
- Also fix annotations for generator constructor for AVX512 and NEON — Daniel Arndt / githubweb
- #5635: SYCL: Add parallel_scan overload with value for ThreadVectorRange — cezary.skrzynski / githubweb
- Fix atomic operations bug for Min and Max (#6435) — noreply / githubweb
- Fix example/build_cmake_installed_different_compiler — Daniel Arndt / githubweb
- fix impl — fnrizzi / githubweb
- Update core/src/HIP/Kokkos_HIP_KernelLaunch.hpp — noreply / githubweb
- Split Kokkos_Threads_Parallel files — Bruno Turcksin / githubweb
- HPX: Implement TeamThread and ThreadVector parallel_scan with return value — Daniel Arndt / githubweb
- Serial: Allow for distinct execution space instances (#6441) — noreply / githubweb
- add overload for TeamThreadRange — fnrizzi / githubweb
- address review comment — fnrizzi / githubweb
- Update to HIP TeamPolicy Block number heuristic (#6284) — noreply / githubweb
- SIMD: Split math functions from SIMD_Common.hpp (#6487) — noreply / githubweb
- Allow NVHPC as device compiler only with Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER=ON — Daniel Arndt / githubweb
- add threadvector — fnrizzi / githubweb
- fix order — fnrizzi / githubweb
- remove guards — fnrizzi / githubweb
- UnorderedMap(space instance): proposal for #6067 — romin.tomasetti / githubweb
- Rename Kokkos_ThreadsExec to align with the other backends — Bruno Turcksin / githubweb
- Promote Kokkos_Printf.hpp to public include — Daniel Arndt / githubweb
- simd: add floor, ceil, round, trunc operations (#6393) — noreply / githubweb
- Update CI in OpenMPTarget to use llvm-17 (#6472) — noreply / githubweb
- Rename Kokkos_ThreadsTeam.hpp to Kokkos_Threads_Team.hpp — Bruno Turcksin / githubweb
- Kokkos_SIMD_Scalar.hpp: remove extra ';' — ndellin / githubweb
- add guards — fnrizzi / githubweb
- avoid auto — fnrizzi / githubweb
- [ci skip] Update Kokkos version to 4.2.99 — Daniel Arndt / githubweb
- [deprecated code 3] remove all default device init tests — Damien L-G / githubweb
- [deprecated code 3] remove InitArguments — Damien L-G / githubweb
- [deprecated code 3] remove KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macros — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for clamp, min, max, and minmax — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math constants — Damien L-G / githubweb
- [deprecated code 3] remove {OpenMP,HPX}::partition_master — Damien L-G / githubweb
- OpenMP backend cleanup following removal of deprecated code 3 — Damien L-G / githubweb
- Get rid of FIXME_OPENMP — Damien L-G / githubweb
- [deprecated code 3] remove MasterLock — Damien L-G / githubweb
- [deprecated code 3] remove ENABLE_DEPRECATED_CODE_3 option — Damien L-G / githubweb
- fixup! [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
- Do not append " - blocks" to the bitset label — Damien L-G / githubweb
- with_updated_label -> append_to_label — Daniel Arndt / githubweb
- SYCL: Use SYCL_EXT_ONEAPI_DEVICE_GLOBAL to detect support for device global variables — Daniel Arndt / githubweb
- Fixup in README (github -> GitHub) — Damien L-G / githubweb
- Check that device associated with stream matches requested device — Daniel Arndt / githubweb
- Threads: replace enum with constexpr int and enum class (#6514) — noreply / githubweb
- Added unit tests for reduction ops and few intel svml intrinsics — donlee / githubweb
- Remove sleep and wake functions — Bruno Turcksin / githubweb
- Remove extra constructor — Daniel Arndt / githubweb
- Prefer defaulted default constructor for Bitset (#6524) — noreply / githubweb
- SYCL: Use host-pinned memory to copy reduction/scan result (#6500) — noreply / githubweb
- Remove unused variables — Bruno Turcksin / githubweb
- Remove Sentinel struct from Threads — Bruno Turcksin / githubweb
- Small cleanup of ThreadsInternal::initialize — Bruno Turcksin / githubweb
- Remove useless forward declaration — Bruno Turcksin / githubweb
- Remove spawn function — Bruno Turcksin / githubweb
- Add comments — Bruno Turcksin / githubweb
- Fix indentation — Bruno Turcksin / githubweb
- Fix typo in macro guard — Bruno Turcksin / githubweb
- Reduce number of View constructor instantiations — Damien L-G / githubweb
- Bump HPX version used in CI to 1.9.0 — mikael.simberg / githubweb
- Split files in HIP backend — Bruno Turcksin / githubweb
- Trim some fat in `CudaInternal` (towards multiple GPUs support) (#6544) — noreply / githubweb
- Only define STDALGO_TEAM_SOURCES_* once — Daniel Arndt / githubweb
- Remove logical memory spaces — Damien L-G / githubweb
- Rollback changes to view constructors to reduce the number of instantiations (#6564) — noreply / githubweb
- Address reviewer comments — Daniel Arndt / githubweb
- Threads remove unused variables and functions (#6566) — noreply / githubweb
- Remove unused Sandia testing files (#6568) — noreply / githubweb
- fallback implementation cleanup — donlee / githubweb
- Remove empty quotation marks for static_assert — Daniel Arndt / githubweb
- [ci skip] Drop unused <impl/Kokkos_Memory_Fence.hpp> header — Damien L-G / githubweb
- Drop Clang+CUDA workaround — Damien L-G / githubweb
- OpenMPTarget: CI compiler upgrade. (#6545) — noreply / githubweb
- m_cudaDev isn't static anymore — Daniel Arndt / githubweb
- Add crtrott's launch_latency benchmark (#6379) — noreply / githubweb
- Simplify fence functions in the Threads backend (#6571) — noreply / githubweb
- simd: temporarily skip device math ops unit test for OpenMPTarget build (#6574) — noreply / githubweb
- Set the device id explicitly for CUDA API calls in impl_initialize — Daniel Arndt / githubweb
- OpenMPTarget: Guard scratch memory usage in ParallelReduce — Daniel Arndt / githubweb
- SYCL: Implement DESUL_ATOMICS_ENABLE_SYCL_SEPARABLE_COMPILATION path (#6534) — noreply / githubweb
- Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header — Damien L-G / githubweb
- Replace Marsaglia polar method with Box-muller to generate a normally distributed random number (#6556) — noreply / githubweb
- OpenMP: No memset in viewfill (#6573) — noreply / githubweb
- Revert "Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header" — noreply / githubweb
- OpenACC: add atomics support (#6446) — noreply / githubweb
- Fix infinity, quiet_NaN, signaling_NaN, isfinite, isnan, isinf for half_t and bhalf_t (#6543) — noreply / githubweb
- kokkos(unique): fix allocation of temporary view to enforce using the provided space instance — romin.tomasetti / githubweb
- Use binary wrapper for consistency in definition of half types numeric traits (#6590) — noreply / githubweb
- try fix — fnrizzi / githubweb
- Fix TestNumericTraits.hpp for SYCL with bfloat16 support — Daniel Arndt / githubweb
- Fix generated Makefile when using gnu_generate_makefile.sh and make >= 4.3 — crtrott / githubweb
- Threads: add missing broadcast to TeamThreadRange parallel_scan (#6601) — noreply / githubweb
- Added missing operator* to NEON simd — crtrott / githubweb
- [ci skip] Update changelog on develop for 4.2.00 (#6592) — noreply / githubweb
- Remove KOKKOS_IMPL_DO_NOT_USE_PRINTF (#6593) — noreply / githubweb
- try fix — fnrizzi / githubweb
- avoid pyt package — fnrizzi / githubweb
- try — fnrizzi / githubweb
- fix for macos — fnrizzi / githubweb
- remove comments — fnrizzi / githubweb
- use reference — crtrott / githubweb
- add branching — fnrizzi / githubweb
- [ci skip] fix formatting — cezary.skrzynski / githubweb
- GitHub Workflows: Use Ubuntu 22.04 instead of Fedora for Intel compiler testing — Daniel Arndt / githubweb
- nvcc(wrapper): adding missing `--generate-line-info` arg — romin.tomasetti / githubweb
- Add clang-format check to GitHub workflows (#6612) — noreply / githubweb
- graph(HIP): adding inline keyword to fix #6623 — romin.tomasetti / githubweb
- Add jenkins multibranch pipeline options — Bruno Turcksin / githubweb
- Add warp sync for Cuda parallel reduce — tccleve / githubweb
- kokkos(profiling): do not finalize in any backend — romin.tomasetti / githubweb
- Replace ubuntu:18.04 with ubuntu:20.04 as base image for clang-format — Bruno Turcksin / githubweb
- update comment to include final() mention — tccleve / githubweb
- Disabling OpenACC in the CI because it emits too many warnings — Bruno Turcksin / githubweb
- unorderedmap: modernize traits — romin.tomasetti / githubweb
- nvcc wrapper: remove troubling flag to fix 6628 (#6629) — noreply / githubweb
- tools(profiling): type (related to kokkos/kokkos-tools/pull/221) — romin.tomasetti / githubweb
- This PR fixes the too-much-OpenACC-warning issue, mentioned in PR #6639. — lees2 / githubweb
- add missing header fix #6644 — fnrizzi / githubweb
- SYCL: Restrict workaround for is_device_copyable to oneAPI versions before 2024.0.0 (#6532) — noreply / githubweb
- Fixup test math functions ulp should double -> int — Damien L-G / githubweb
- Drop DualView converting copy assignment operator — Damien L-G / githubweb
- Don't use rocm-docker for clang-format — Daniel Arndt / githubweb
- Disable HIP CI — Daniel Arndt / githubweb
- Remove deprecation warning for AllocationMechanism for gcc <11.0 — Daniel Arndt / githubweb
- OpenMPTarget: clang extensions for dynamic shared memory. (#6380) — noreply / githubweb
- Fix builtin_unreachable use for MSVC/CUDA — crtrott / githubweb
- Fix missing include on msvc/cuda — crtrott / githubweb
- Avoid lambdas in constexpr branch for msvc/cuda — crtrott / githubweb
- Sidestep lacking CTAD support msvc/cuda — crtrott / githubweb
- Fix formatting — crtrott / githubweb
- Move header for Damien because he is right — crtrott / githubweb
- Unit test for issue 3371 (negative vector length should not yield a negative max_team_size) (#6076) — noreply / githubweb
- Add CMakeLists.txt for stream benchmark — cwpears / githubweb
- Do not negate the dependent true traits helper — Damien L-G / githubweb
- Drop guards to accommodate external code defining KOKKOS_ASSERT — Damien L-G / githubweb
- Use omp_get_max_active_levels() when supported — Daniel Arndt / githubweb
- Add missing gfx940 — rberger / githubweb
- Add Impl::always_false type-dependent false trait — Damien L-G / githubweb
- Per review prefer always_false<Arg>::value to is_void_v<Arg> — Damien L-G / githubweb
- Improve "no copy mechanism" exception message — bmkelle / githubweb
- Add a unit test for new deep_copy exception msg — bmkelle / githubweb
- Add missing include sstream — bmkelle / githubweb
- src->source, dst->destination — bmkelle / githubweb
- Workaround for ROCm 6.0 failing to compile with AVX2 SIMD support — Bruno Turcksin / githubweb
- SYCL: Force inlining of Kokkos::printf (#6650) — noreply / githubweb
- Improve handling of printf in OMPT on Intel GPUs — Daniel Arndt / githubweb
- OpenMP: Use `omp_get_nested` for older gcc versions (#6685) — noreply / githubweb
- Disable more Bessel tests for SYCL on Intel GPUs — Daniel Arndt / githubweb
- fill_random without execution space instance should fence — Daniel Arndt / githubweb
- Drop unnecessary guarding for a tool library being loaded in ProfilingSection — Damien L-G / githubweb
- Drop unnecessary header include in Kokkos_Profiling_ProfileSection.hpp — Damien L-G / githubweb
- #5333: CUDA: Use scratch space appropriate to small reduction elements in Team reductions (#5334) — noreply / githubweb
- Cuda: Allocate using the correct device (#6392) — noreply / githubweb
- Let `Profiling::ProfilingSection(std::string)` constructor be explicit and nodiscard (#6690) — noreply / githubweb
- Cosmetic changes to ProfilingSection — Damien L-G / githubweb
- GitHub CI: Test with AddressSanitizer (#6676) — noreply / githubweb
- Kokkos::Array deduction guide (#6373) — noreply / githubweb
- Add CI for MSVC+Cuda (#6661) — noreply / githubweb
- SYCL: Address deprecations after oneAPI 2023.2.0 (#6577) — noreply / githubweb
- Fixup cast tolerance to double before printing — Damien L-G / githubweb
- Try linking against CUDA libraries even with KOKKOS_ENABLE_COMPILE_AS_CMAKE_LANGUAGE — Daniel Arndt / githubweb
- test_array_ctad: disable test for intel versions < 2021 — ndellin / githubweb
- Enable `{transform_}exclusive_scan` in place (#6667) — noreply / githubweb
- Add `ATOMICS_BYPASS` configuration option to disable atomics (#6692) — noreply / githubweb
- Check matching static extents in View constructor (#5190) — noreply / githubweb
- Remove Kokkos::[b]half_t volatile overloads (#6579) — noreply / githubweb
- add tests — fnrizzi / githubweb
- Provide `kokkos_swap` as part of Core and deprecate `Experimental::swap` in Algorithms (#6697) — noreply / githubweb
- Provide new public headers `<Kokkos_Clamp.hpp>` and `<Kokkos_MinMax.hpp>` (#6687) — noreply / githubweb
- Fix TeamThreadMDRange parallel_reduce (#6511) — noreply / githubweb
- add tests for in-place `inclusive_scan` (#6682) — noreply / githubweb
- Drop pointless Kokkos::Impl::CudaExec forward declaration — Damien L-G / githubweb
- Don't use the compiler launcher script if the compile language is CUDA. (#6704) — noreply / githubweb
- Deprecate `{Cuda,HIP}::detect_device_count()` and `Cuda::[detect_]device_arch()` (#6710) — noreply / githubweb
- Get rid of CudaInternal::cuda_get_error_{name,string}_wrapper — Damien L-G / githubweb
- No need to jump through so many hoops to print the error message — Damien L-G / githubweb
- HIP: Forgot to delete matching brace closing the namespace — Damien L-G / githubweb
- Make initialize and finalize of the Cuda/HIP singleton less special (#6714) — noreply / githubweb
- Kokkos_HIP.cpp: include Kokkos_Core.hpp to resolve errors — ndellin / githubweb
- Add bound checks in RangePolicy and MDRangePolicy (#6617) — noreply / githubweb
- Temporary fix to reenable HIP CI — Bruno Turcksin / githubweb
- Let the smart pointer manage the CUDA/HIP stream (#6721) — noreply / githubweb
- Fix Docker env variables — Bruno Turcksin / githubweb
- Ensure view_allocation_error does not silently ignore that no exception was thrown — Damien L-G / githubweb
- Add RawMemoryAllocationFailure::AllocationMechanism::OpenACCMalloc enumerator — Damien L-G / githubweb
- [OpenACC] throw if acc_malloc returned nullptr — Damien L-G / githubweb
- Fixup using declaration — Damien L-G / githubweb
- Disable openacc.view_allocation_error test — Damien L-G / githubweb
- Guard `[MD]RangePolicy` precondition check for deprecated code 4 (#6726) — noreply / githubweb
- Add C++26 standard to CMake Setup — dev / githubweb
- Add support for C++26 in generated makefiles — Damien L-G / githubweb
- Add KOKKOS_ENABLE_CXX26 to the configuration metadata — Damien L-G / githubweb
- Reenable HIP testing — Bruno Turcksin / githubweb
- Disabling failing HIP test in the CI — Bruno Turcksin / githubweb
- Use team_size_max to fix "Team size too large" error in reducer test (#6725) — noreply / githubweb
- Don't touch my records! (refactor Cuda/HIP/SYCL/Threads to not directly mess with `SharedAllocationRecord`) (#6732) — noreply / githubweb
- Untangle SharedAllocationRecord spaghetti code — Damien L-G / githubweb
- Fix TestThreadVectorMDRangeParallelReduce (#6734) — noreply / githubweb
- Cuda multi-GPU support: Allow execution space instance constructor to run (#6706) — noreply / githubweb
- Drop support for deprecated command-line arguments and environment variables (#6744) — noreply / githubweb
- Avoid unnecessary zero-memset of the scratch flags in SYCL (#6739) — noreply / githubweb