Changes

#2 (Dec 1, 2023, 11:53:27 AM)

  1. Only pass one wrapper object in SYCL reductions — Daniel Arndt / githubweb
  2. Explicitly cast to CombinedFunctorReducerType — Daniel Arndt / githubweb
  3. Improve SYCL parallel_scan — Daniel Arndt / githubweb
  4. Compiling with auto deduction of workgroup sizes — Daniel Arndt / githubweb
  5. Unconditionally enable CUDA extended lambda support — pbmille / githubweb
  6. Tentative arguments switch for nvcc 12+ — pbmille / githubweb
  7. Change Makefile.kokkos too — pbmille / githubweb
  8. Implement CMake messages per team decision — pbmille / githubweb
  9. Fix definitions and docs to remove CUDA Lambda option — pbmille / githubweb
  10. Don't fail to define broader 'lambdas are available' macro — pbmille / githubweb
  11. Always expect KOKKOS_ENABLE_CXX11_DISPATCH_LAMBDA to be set — pbmille / githubweb
  12. Remove various test exclusions based on KOKKOS_ENABLE_CUDA_LAMBDA — pbmille / githubweb
  13. Update changelog — ndellin / githubweb
  14. [ci skip] Fixup changelog — ndellin / githubweb
  15. Work around nvcc issue for view_mapping and add FIXME_NVCC comment — pbmille / githubweb
  16. OpenMPTarget: Update hierarchical parallelism. (#6043) — noreply / githubweb
  17. Enable OpenMP in CUDA-11.0-NVCC-RDC to test DEPRECATED_CODE_3=ON (#5978) — noreply / githubweb
  18. fix ternary op in subset of std algorithms not working with nvhpc (#6095) — noreply / githubweb
  19. Add implementation of bit_cast in <Kokkos_BitManipulation.hpp> — Damien L-G / githubweb
  20. Add compile time tests for the constraints on the bit_cast function template — Damien L-G / githubweb
  21. Add the Experimental:: builtin variant (just defer to regular bit_cast) — Damien L-G / githubweb
  22. Add runtime tests for bit_cast — Damien L-G / githubweb
  23. Use Kokkos::bit_cast in SIMD instead of rolling its own — Damien L-G / githubweb
  24. Clang-format glitch — Damien L-G / githubweb
  25. view(uvm): fence if needed in allocation (#6005) — romin.tomasetti / githubweb
  26. Disable tests that fail at runtime with NVHPC (likely not liking the class declaration within the body of the functor) — Damien L-G / githubweb
  27. change impl of `is_sorted_until` to use reduce (#6097) — noreply / githubweb
  28. Fix typo and remove accidentally committed assertions — noreply / githubweb
  29. Added multiple reducers support for team-level parallel reduce (#5727) — noreply / githubweb
  30. Work around NVHPC issue with enum types — crtrott / githubweb
  31. Work around NVHPC 23.x issues — crtrott / githubweb
  32. Kokkos: Remove TriBITS Kokkos subpackages (trilinos/Trilinos#11545) (#6104) — noreply / githubweb
  33. Drop pointless Kokkos{Algorithms,Containers}_config.h files — Damien L-G / githubweb
  34. Revert "Merge pull request #5964 from PhilMiller/cuda-lambda-default" — Damien L-G / githubweb
  35. Update the OpenACC parallel_reduce() constructs with Range/MDRange/Team (#6072) — noreply / githubweb
  36. Always pass -extended-lambda option to NVCC and force Kokkos_ENABLE_CUDA_LAMBDA ON — Damien L-G / githubweb
  37. Reorganize ZeroMemset (#6087) — noreply / githubweb
  38. Drop CUDA_LAMBDA guards in Cuda headers — Damien L-G / githubweb
  39. Work around NVHPC 23.x not dealing with __isGlobal — crtrott / githubweb
  40. Drop unused cmake macros — Damien L-G / githubweb
  41. Fixup cmake style — Damien L-G / githubweb
  42. use ASSERT_EQ in all std algorithms tests — fnrizzi / githubweb
  43. Reintroduce test skip for nvhpc < 23.3 — crtrott / githubweb
  44. hpcbind: check for correct Slurm variable — rberger / githubweb
  45. Fix macro guards in test for NVC++ as the CUDA compiler — Damien L-G / githubweb
  46. Allow templated functors in parallel_for, parallel_reduce and parallel_scan (#5976) — noreply / githubweb
  47. Import sycl::bit_cast into the Kokkos namespace — Daniel Arndt / githubweb
  48. Qualify possibly ambiguous calls to bit_cast — Daniel Arndt / githubweb
  49. Fix nightlies -- workaround compiler bug in GCC 9.1 and 9.2 (#6118) — noreply / githubweb
  50. Kokkos_BitManipulation: KOKKOS_COMPILER_GCC->KOKKOS_COMPILER_GNU (#6119) — noreply / githubweb
  51. Cuda: Remove unused attach_texture_object — Daniel Arndt / githubweb
  52. Move half traits to private header and add half/bhalf infinity trait (#6055) — noreply / githubweb
  53. Increase minimum required HPX version to 1.8.0 — mikael.simberg / githubweb
  54. Conditionally use hpx::post instead of hpx::apply based on HPX version — mikael.simberg / githubweb
  55. Don't restrict index type in builtin reducers — Daniel Arndt / githubweb
  56. dual view: update template types (#6085) — romin.tomasetti / githubweb
  57. sorting an empty view should exit early and not fail (#6130) — noreply / githubweb
  58. core/src: Move floating_point_wrapper to private header — eharvey / githubweb
  59. Disable tests failing with NVHPC — Daniel Arndt / githubweb
  60. Fix bit_cast for SYCL again — Daniel Arndt / githubweb
  61. Disable tests for OpenMPTarget — Daniel Arndt / githubweb
  62. Improve indentation of comments — Daniel Arndt / githubweb
  63. Allow deprecated declarations in SYCL+Cuda CI — Daniel Arndt / githubweb
  64. Try running for other execution spaces — Daniel Arndt / githubweb
  65. Add guards for Cuda — Daniel Arndt / githubweb
  66. Expand list of kokkos options not to export with cmake — Damien L-G / githubweb
  67. Do not append to Kokkos_OPTIONS variables those in the do not export list — Damien L-G / githubweb
  68. Drop Kokkos_ENABLE_LAUNCH_COMPILER option — Damien L-G / githubweb
  69. Export Kokkos_ENABLE_<OPTION> that are relevant — Damien L-G / githubweb
  70. Drop Kokkos_ENABLE_PROFILING_LOAD_PRINT option — Damien L-G / githubweb
  71. Suppress bogus warning about CUDA_LAMBDA being ON — Damien L-G / githubweb
  72. [ci skip] Add nightly ci for spack (#6135) — noreply / githubweb
  73. OpenMPTarget: Enable Cray compiler for the OpenMPTarget backend. (#5889) — noreply / githubweb
  74. Revert to `DualView<class,class=void,class=void,class=void>` when deprecated code 4 is enabled — Damien L-G / githubweb
  75. Fix Kokkos_ENABLE_CUDA_LAMBDA for Trilinos — Daniel Arndt / githubweb
  76. Fix bogus warnings in nested CUDA parallel_reduce — Daniel Arndt / githubweb
  77. `BinSort`, `BinOp1D`, `BinOp3D`: mark default constructor as deleted (#6131) — noreply / githubweb
  78. KokkosTools: Don't call callbacks before backends are initialized (#6114) — noreply / githubweb
  79. Add `Kokkos::all_libs` alias target for compatibility with TriBITS/Trilinos (#6157) — noreply / githubweb
  80. sorting: add to binsort support for strided views and reorg tests (#6081) — noreply / githubweb
  81. Allow linking against build tree (#6078) — noreply / githubweb
  82. Implement `HPX::in_parallel` (#6143) — noreply / githubweb
  83. OpenMPTarget: Changes for OpenMPTarget backend with nvhpc compiler. — rgayatri / githubweb
  84. OpenMPTarget: Add a fixme. — rgayatri / githubweb
  85. Update Makefile.kokkos — noreply / githubweb
  86. Remove extended_namespace template parameter for SYCLMemoryOrder/Scope — Daniel Arndt / githubweb
  87. OpenMPTarget: update fixme comment. — rgayatri / githubweb
  88. OpenMPTarget: Replace kokkos macros in desul. — rgayatri / githubweb
  89. OpenMPTarget: Restore desul changes. — rgayatri / githubweb
  90. Cherry-pick v3.7.02 changelog into develop [ci skip] — Damien L-G / githubweb
  91. Kokkos: Mark HWLOC as a TriBITS TPL as well (trilinos/Trilinos#11938) — rabartl / githubweb
  92. Clean up FunctorAnalysis — Daniel Arndt / githubweb
  93. SIMD: make binary op tests to test against all data types (#5913) — noreply / githubweb
  94. Also create symlinks for CMake configuration files to cmake_packages/Kokkos for TriBITS (#6163) — noreply / githubweb
  95. Allow passing a temporary std::vector to partition_space (#6167) — noreply / githubweb
  96. .github/workflows: Remove push trigger — eharvey / githubweb
  97. .github/workflows: Only trigger upon push to develop — eharvey / githubweb
  98. Replace _mm512_loadu_epi64 and _mm512_storeu_epi64 with _mm512_loadu_si512 and _mm512_storeu_si512 — donlee / githubweb
  99. OpenMPTarget: include desul changes. — rgayatri / githubweb
  100. Weed out verbose output from `dynamic_view` container unit test (#6173) — noreply / githubweb
  101. shortcut value for is_dynamic_view — fnrizzi / githubweb
  102. add trait and test — fnrizzi / githubweb
  103. Fix global fence in Kokkos::resize(DynRankView) (#6184) — noreply / githubweb
  104. Left align demangled stacktrace output. (#6191) — noreply / githubweb
  105. [HIP] Improve heuristic deciding the number of blocks used in parallel_reduce (#6160) — noreply / githubweb
  106. Improve OpenMP affinity warning to include MPI concerns (#6185) — noreply / githubweb
  107. Update version number on develop after branching off for 4.1.00 — Daniel Arndt / githubweb
  108. Fix test_quad_precision_math_constants test — Daniel Arndt / githubweb
  109. implementation and tests — fnrizzi / githubweb
  110. implementation and tests — fnrizzi / githubweb
  111. [ci skip] test_all_sandia: update compilers and queues — ndellin / githubweb
  112. team-level std algos: common code needed (#6199) — noreply / githubweb
  113. Fix compiling SYCL with KOKKOS_IMPL_DO_NOT_USE_PRINTF_USAGE — Daniel Arndt / githubweb
  114. snapshot mdspan namespace changes (#6162) — noreply / githubweb
  115. Disable AVX512 support for NVHPC — Daniel Arndt / githubweb
  116. Fix host-annotations of AVX2, AVX512, and NEON constructors — Daniel Arndt / githubweb
  117. Introduce impl_get_value/impl_get_mask — Daniel Arndt / githubweb
  118. Fix a gcc-8.4.0 warning — Daniel Arndt / githubweb
  119. Fix host-device annotation for where_expression/const_where_expression — Daniel Arndt / githubweb
  120. Make in-order queues the default via macro — Daniel Arndt / githubweb
  121. Avoid SFINAE in favor of overloads — Daniel Arndt / githubweb
  122. Move scalar overloads to Scalar header — Daniel Arndt / githubweb
  123. Disable KOKKOS_ARCH_AVX512XEON for NVHPC — Daniel Arndt / githubweb
  124. Changelog for 4.1.00 (#6225) — noreply / githubweb
  125. reorganize sort headers (#6230) — noreply / githubweb
  126. SYCL: Support for bhalf_t (#6204) — noreply / githubweb
  127. only compute with relevant entries — tccleve / githubweb
  128. make constraints on `Kokkos::sort` more visible/clear (#6234) — noreply / githubweb
  129. slim API and move code to impl — fnrizzi / githubweb
  130. This PR contains minor code changes and bug fixes needed for LLVM-Clacc — lees2 / githubweb
  131. improve all other corner cases as per review comment — fnrizzi / githubweb
  132. refine for cuda uvm — fnrizzi / githubweb
  133. use exespace to check rather than mem space — fnrizzi / githubweb
  134. Fix AVX2 simd support for ZEN2 AMD CPU. (#6238) — noreply / githubweb
  135. fix corner case — fnrizzi / githubweb
  136. Fix windows symlink configure issue (#6241) — noreply / githubweb
  137. fix corner cases — fnrizzi / githubweb
  138. bug_report.md: new PR branching from `develop` (#5034) — noreply / githubweb
  139. Fix whitespace in bug_report.md (#6244) — noreply / githubweb
  140. Avoid undefined behavior in TestTaskScheduler.hpp — Daniel Arndt / githubweb
  141. Remove calling tribits_exclude_autotools_files() — rabartl / githubweb
  142. Ensure that complex is only instantiated for cv-unqualified floating-point type — Damien L-G / githubweb
  143. Deprecated Kokkos::vector — Damien L-G / githubweb
  144. Warn if <Kokkos_Vector.hpp> is included — Damien L-G / githubweb
  145. Drop Vector test with makefiles and conditionally remove it with CMake — Damien L-G / githubweb
  146. Ignore <Kokkos_Vector.hpp> in the header self-containment tests — Damien L-G / githubweb
  147. SYCL: Use in-order queues in InterOp tests (#6246) — noreply / githubweb
  148. std_algos: fix wrong corner case for `is_partitioned` (#6257) — noreply / githubweb
  149. Make sure macros are defined — Daniel Arndt / githubweb
  150. Error out when Kokkos_Vector.hpp is included with deprecated code disabled — Daniel Arndt / githubweb
  151. SIMD: Add abs() for all int types (#6069) — noreply / githubweb
  152. Fix SIMD abs unit test accidentally using complex overload — Damien L-G / githubweb
  153. Fix SIMD tests on NEON — Daniel Arndt / githubweb
  154. Add default ParallelFor copy constructor for HIP — Bruno Turcksin / githubweb
  155. Workaround gcc/8.2.0 compiler issue with _mm512_abs_pd — ndellin / githubweb
  156. Implement Kokkos::printf (#6083) — noreply / githubweb
  157. Improve SYCL TeamPolicy reduction — Daniel Arndt / githubweb
  158. make Kokkos_CXX_COMPILER_VERSION available to CMake consumers — cwpears / githubweb
  159. Fully qualify Experimental::SYCL in algorithms to avoid finding conflicting namespaces — Daniel Arndt / githubweb
  160. subset of team level impl of std algorithms — fnrizzi / githubweb
  161. fix copyright — fnrizzi / githubweb
  162. guard for openmptarget — fnrizzi / githubweb
  163. fix for openmptarget — fnrizzi / githubweb
  164. address comments — fnrizzi / githubweb
  165. Update CMakeLists for unit tests with OpenMPTarget, OpenACC with NVHPC (#6260) — noreply / githubweb
  166. Update CI from CUDA 11.7.0 to 11.7.1 — Bruno Turcksin / githubweb
  167. Improve SYCL reduction performance: RangePolicy (#6264) — noreply / githubweb
  168. Improve SYCL reduction performance: workgroup_reduction (#6270) — noreply / githubweb
  169. SYCL TeamPolicy: Fix sign comparison warning — Daniel Arndt / githubweb
  170. SIMD: suppress an uninitialized variable warning (#6294) — noreply / githubweb
  171. OpenACC CMakechange Clacc (#6250) — noreply / githubweb
  172. `Kokkos::sort` support custom comparator (#6253) — noreply / githubweb
  173. Add nightly build using latest gcc and c++23 — Bruno Turcksin / githubweb
  174. Explicitly capture this in lambda function — Bruno Turcksin / githubweb
  175. Fix typo in nightly jenkins configuration — Bruno Turcksin / githubweb
  176. Fix a memory bug in the `free_state` function of random pools. (#6290) — noreply / githubweb
  177. std_algos: for_each: try condense the impl — fnrizzi / githubweb
  178. format — fnrizzi / githubweb
  179. Enforce create_mirror restrictions w.r.t. ViewCtorArgs on all variants — Daniel Arndt / githubweb
  180. Use KOKKOS_IF_ON_HOST — Daniel Arndt / githubweb
  181. Suppress warnings — Daniel Arndt / githubweb
  182. Don't suppress warnings for NVHPC — Daniel Arndt / githubweb
  183. Improve macro definitions — Daniel Arndt / githubweb
  184. Enable Serial backend in HPX build — cezary.skrzynski / githubweb
  185. Modify fences in View API test — cezary.skrzynski / githubweb
  186. Check for overflow during backend initialization (Cuda, HIP, SYCL) (#6159) — noreply / githubweb
  187. Improve SYCL reduction performance: MDRangePolicy (#6271) — noreply / githubweb
  188. bring back previous code as discussed in meeting — fnrizzi / githubweb
  189. create cudaAPI function wrappers — tccleve / githubweb
  190. Reorganize #include <cuda_runtime_api.h> — tccleve / githubweb
  191. Some api function require cuda11.2+ — tccleve / githubweb
  192. Cuda10 requires "stream=nullptr" as default arg — tccleve / githubweb
  193. Rework stream inputs — tccleve / githubweb
  194. Use "if constexpr" for setCudaDevice — tccleve / githubweb
  195. Remove static in comment — tccleve / githubweb
  196. Update CI: use CUDA 11.6.2 instead of 11.6.0 (#6314) — noreply / githubweb
  197. add helper variable templates `are_*_iterators_v` (#6312) — noreply / githubweb
  198. [HIP] Optimize parallel_reduce (#6229) — noreply / githubweb
  199. Split Kokkos_SYCL_Parallel_Range.hpp — Daniel Arndt / githubweb
  200. Split Kokkos_SYCL_Parallel_Reduce.hpp — Daniel Arndt / githubweb
  201. Move Kokkos_SYCL_Scan.hpp — Daniel Arndt / githubweb
  202. Split Kokkos_Parallel_Team.hpp — Daniel Arndt / githubweb
  203. Add comment that we don't need to link against clang_rt with ROCm 5.5 — Bruno Turcksin / githubweb
  204. Error out when CXX standard is not set when using amdclang or cray clang — Bruno Turcksin / githubweb
  205. Disable failing tests for ROCm 5.5 and 5.6 — Bruno Turcksin / githubweb
  206. #5635: Add parallel_scan overload with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
  207. #5635: Add test for parallel_scan with return value for ThreadVectorRange — arek.szczepkowicz / githubweb
  208. #5635: Serial/OpenMP: Parallel_scan with return value for TeamThreadRange (#6090) — noreply / githubweb
  209. Fix reviewer's comments — Bruno Turcksin / githubweb
  210. Allow using the SYCL execution space on AMD GPUs (#6321) — noreply / githubweb
  211. Update Clang+Cuda CI from 10.1 to 11.0.3 (#6318) — noreply / githubweb
  212. SIMD: add shift ops for all int types (#6109) — noreply / githubweb
  213. SYCL: Use sycl::bit_cast from oneAPI 2024.0.0 on  (#6300) — noreply / githubweb
  214. remove spurious undefs — fnrizzi / githubweb
  215. Decrease maximum memory available to ccache — Bruno Turcksin / githubweb
  216. Fix gtest when using C++20 — Bruno Turcksin / githubweb
  217. Fully qualify Experimental::SYCL in algorithms to avoid conflicting namespaces — Daniel Arndt / githubweb
  218. address reviews [skip ci] — antoine.meyer54 / githubweb
  219. formatting — fnrizzi / githubweb
  220. SIMD: split simd unit tests into separate files (#6278) — noreply / githubweb
  221. Run NVHPC only on V100 — Bruno Turcksin / githubweb
  222. Use checked arithmetic builtins for overflow detection (#6313) — noreply / githubweb
  223. Passing size==0 to DeepCopy/memcpy/omp_target_memcpy (#6273) — noreply / githubweb
  224. allow sorting via native oneDPL to support views with stride = 1 (#6322) — noreply / githubweb
  225. Adopt new HIP cmake's way of finding clang-rt — nicurtis / githubweb
  226. Fix #6334: intel compiler does not like returning inside if constexpr (#6335) — noreply / githubweb
  227. Fix #6336: capturing `this` implicitly is deprecated in C++20 (#6337) — noreply / githubweb
  228. fix lambda capture — fnrizzi / githubweb
  229. remove unnecessary file, fix constraints — fnrizzi / githubweb
  230. remove file — fnrizzi / githubweb
  231. fix — fnrizzi / githubweb
  232. fix lambda capture and constraints — fnrizzi / githubweb
  233. keep only subset — fnrizzi / githubweb
  234. revert files — fnrizzi / githubweb
  235. remove file — fnrizzi / githubweb
  236. fix syntax — fnrizzi / githubweb
  237. format — fnrizzi / githubweb
  238. Disable default oneDPL support in Trilinos — Daniel Arndt / githubweb
  239. Rename AMD GPU architectures (#6266) — noreply / githubweb
  240. Fix compiling SIMD library with NEON and gcc-13 — Daniel Arndt / githubweb
  241. SIMD: add generator constructors (#6347) — noreply / githubweb
  242. Use /usr/bin/env bash in nvcc_wrapper and kokkos_launch_compiler for portability — mikael.simberg / githubweb
  243. Fix cudaAPI wrapper errors for CUDA_MALLOC_ASYNC (#6346) — noreply / githubweb
  244. Use std::aligned_alloc for allocations (#6341) — noreply / githubweb
  245. Initial implementation of gfx942 (#6358) — noreply / githubweb
  246. Only set KOKKOS_ARCH_AMD_GPU if an AMD GPU architecture is enabled — Daniel Arndt / githubweb
  247. Extend 'hip_driver_check_page_migration' (#6364) — noreply / githubweb
  248. Replace volatile m_ready_count loads by desul::atomic_loads in task queue implementation — mikael.simberg / githubweb
  249. Add Kokkos_Atomic.hpp headers to OpenMP and HPX task headers — mikael.simberg / githubweb
  250. Add support for HIP Graph — Bruno Turcksin / githubweb
  251. Replace one of the ROCM 5.2 configuration with ROCM 5.6 — Bruno Turcksin / githubweb
  252. Use aligned version of operator new instead of aligned_alloc — Daniel Arndt / githubweb
  253. HPX: Don't interfere with exception handling — Daniel Arndt / githubweb
  254. Fix -Wformat-truncation warnings in CI (#6354) — noreply / githubweb
  255. Do not use HIP Graph with ROCm 5.2 — Bruno Turcksin / githubweb
  256. Use constexpr West in src — Bruno Turcksin / githubweb
  257. Use constexpr West in test — Bruno Turcksin / githubweb
  258. SIMD: convert binary operators to hidden friends (#6320) — noreply / githubweb
  259. simd: make mask and condition unit test to check with all data types (#6360) — noreply / githubweb
  260. team-level std algos: part 2 (#6205) — noreply / githubweb
  261. Add deprecated attribute to HostSpace(AllocationMechanism) definition — Damien L-G / githubweb
  262. Added gather_from and scatter_to for AVX2 and AVX512 simd — donlee / githubweb
  263. Rebased and applied feedback — donlee / githubweb
  264. Changed gen ctor usage to pass in KOKKOS_LAMBDA — donlee / githubweb
  265. Suppress NVCC attribute ignored warning on SIMD operators as hidden friends — Damien L-G / githubweb
  266. Remove deprecated code 3 support for volatile join — crtrott / githubweb
  267. Disable a test not working with nvhpc-23.1 — crtrott / githubweb
  268. Reenabling tests for nvhpc 23.7 — crtrott / githubweb
  269. Update containers and algorithms for NVC++ 23.7 — crtrott / githubweb
  270. More NVC++ 23.7 updates — crtrott / githubweb
  271. NVC++ clang-format fixes — crtrott / githubweb
  272. Update nvhpc to version 23.7 in the CI — crtrott / githubweb
  273. NVHPC 23.7 update: address reviewer comments — crtrott / githubweb
  274. OpenACC: Guard tests relying on abort — crtrott / githubweb
  275. Fix TestAtomic to use the test execspace — crtrott / githubweb
  276. Use NVHPC 23.7 for testing of OpenACC — crtrott / githubweb
  277. Update nvhpc gtest skip message — crtrott / githubweb
  278. Work around OpenMPTarget failure — crtrott / githubweb
  279. Update base docker file for nvhpc — crtrott / githubweb
  280. Remove stray Cuda graph pattern specialization from tag — Damien L-G / githubweb
  281. Fix reviewer's comments — Bruno Turcksin / githubweb
  282. Fix uninitialized variable warning with gcc 13 — Bruno Turcksin / githubweb
  283. HIP: Update print_configuration (#6387) — noreply / githubweb
  284. Check AVX, AVX2, AVX512XEON compiler macros on KokkosCore_config.h (#6248) — noreply / githubweb
  285. Don't use local headers or runtime in HPX backend due to deprecation — mikael.simberg / githubweb
  286. Add a more comprehensive `kokkos_{malloc, free}` perf_test (#6377) — noreply / githubweb
  287. team-level std algos: part 3 (#6207) — noreply / githubweb
  288. Adding is_scoped_enum & to_underlying (#6356) — noreply / githubweb
  289. SIMD: add float simd support (#6177) — noreply / githubweb
  290. team-level std algos: part 4 (#6208) — noreply / githubweb
  291. Added a gen ctor for float (#6397) — noreply / githubweb
  292. team-level std algos: part 5 (#6209) — noreply / githubweb
  293. Deprecate Cuda(cudaStream_t, bool) — Damien L-G / githubweb
  294. Fixup checked integer operations death test — Damien L-G / githubweb
  295. Deprecate HIP(hipStream_t, bool) — Damien L-G / githubweb
  296. Let Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC be ON by default — Damien L-G / githubweb
  297. Print whether KOKKOS_ENABLE_IMPL_CUDA_MALLOC_ASYNC is defined — Damien L-G / githubweb
  298. Introduce disable_malloc_async Cuda option with generated makefiles — Damien L-G / githubweb
  299. Preserve one build that disables Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC — Damien L-G / githubweb
  300. Use archive extraction time for timestamps — cezary.skrzynski / githubweb
  301. Disable performance benchmarks in AppVeyor CI — cezary.skrzynski / githubweb
  302. team-level std algos: part 6 (#6210) — noreply / githubweb
  303. address comments — fnrizzi / githubweb
  304. OpenMP backend refactor files. (#6403) — noreply / githubweb
  305. Drop (unused) `Cuda::cuda_internal_maximum_shared_words` — Damien L-G / githubweb
  306. Drop check that the host backend is initialized before the Cuda/HIP/SYCL one — Damien L-G / githubweb
  307. Drop unused HIPInternal::m_maxSharedWords data member — Damien L-G / githubweb
  308. Drop unused HIPInternal::m_hipArch static data member — Damien L-G / githubweb
  309. !initialized() should be a precondition for calling {Cuda,HIP,SYCL}Internal::initialize — Damien L-G / githubweb
  310. Drop pre-Kepler logic in Cuda::impl_initialize — Damien L-G / githubweb
  311. use single — fnrizzi / githubweb
  312. address comments — fnrizzi / githubweb
  313. formatting — fnrizzi / githubweb
  314. Team-level std algos: part 7 (#6211) — noreply / githubweb
  315. formatting — fnrizzi / githubweb
  316. Enable death tests for fedora rawhide — cezary.skrzynski / githubweb
  317. core/src: Add half math functions to private header (#6124) — noreply / githubweb
  318. Drop check whether device supports unified addressing — Damien L-G / githubweb
  319. fix single as per Christian's suggestion — fnrizzi / githubweb
  320. Only warn once (at initialization) when forcing allocation in unified memory — Damien L-G / githubweb
  321. check-copyright improvements (#6399) — noreply / githubweb
  322. Use execution space instance argument to get device properties in block size deduction — Damien L-G / githubweb
  323. Address reviewer's comments — Bruno Turcksin / githubweb
  324. Fix to avoid #186-D pointless comparison warning. — maarten.arnst / githubweb
  325. add comment — fnrizzi / githubweb
  326. improve tests to address review — fnrizzi / githubweb
  327. Fix guard for isnan test for bhalf_t — Daniel Arndt / githubweb
  328. avoid potential race condition HIP — tccleve / githubweb
  329. [SYCL][Reduction] Group counter should use at least memory_order::acq_rel — andrei.elovikov / githubweb
  330. Same for scan — andrei.elovikov / githubweb
  331. Initialize m_num_scratch_locks for Cuda parallel_for TeamPolicy — Daniel Arndt / githubweb
  332. improve tests with intra-team result check — fnrizzi / githubweb
  333. Fixes for Kokkos::Array (#6372) — noreply / githubweb
  334. try fix for unique, previous impl to remove later — fnrizzi / githubweb
  335. #5635: Add parallel_scan changes for CUDA and TeamThreadRange — cezary.skrzynski / githubweb
  336. remove old impl — fnrizzi / githubweb
  337. #5635: Enable TeamThreadRange test for CUDA — cezary.skrzynski / githubweb
  338. Clean up benchmarks/gups — cwpears / githubweb
  339. benchmark/gups: use CMake — cwpears / githubweb
  340. OpenMPTarget: Disable check for SIMD compiler macros — Daniel Arndt / githubweb
  341. #5635: Add parallel_scan with value for CUDA and ThreadVectorRange — cezary.skrzynski / githubweb
  342. add missing assert — fnrizzi / githubweb
  343. #5635: Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
  344. add intra team check for missing test — fnrizzi / githubweb
  345. fix intel compile error — fnrizzi / githubweb
  346. fix unreachable for intel — fnrizzi / githubweb
  347. re-enable unit tests for sort and random via makefile (#6422) — noreply / githubweb
  348. OpenMPTarget init-join fix (#6444) — noreply / githubweb
  349. Fix Cuda parallel_scan ThreadVectorRange range — Daniel Arndt / githubweb
  350. Assign final sum in Cuda parallel_scan ThreadVectorRange — Daniel Arndt / githubweb
  351. Fix compiling code using Kokkos::printf for OpenMPTarget on Intel GPUs (#6443) — noreply / githubweb
  352. std_algos: improving min, max, minmax (#6421) — noreply / githubweb
  353. team-level stdalgos: improve tests, check intra-team result matching (part 2/7) (#6426) — noreply / githubweb
  354. Skip bessel function tests known to fail on Intel GPUs (#6434) — noreply / githubweb
  355. team-level stdalgos: improve tests, check intra-team result matching (part 6/7) (#6436) — noreply / githubweb
  356. Fix race condition in functor_vec_scan_ret_val test — Daniel Arndt / githubweb
  357. Fix parallel_scan_with_reducers test — Daniel Arndt / githubweb
  358. team-level stdalgos: improve tests, check intra-team result matching (part 3/7) (#6425) — noreply / githubweb
  359. improve tests (#6432) — noreply / githubweb
  360. improve tests (#6437) — noreply / githubweb
  361. Move final assignment to correct scope — cezary.skrzynski / githubweb
  362. fix casting warning in Random test — fnrizzi / githubweb
  363. Workaround for ROCm 5.6+ failing to compile with AVX2 SIMD support (#6449) — noreply / githubweb
  364. HIP: Restrict AVX2 workaround to ROCm 5.6 and 5.7 — Daniel Arndt / githubweb
  365. fixes build error for TeamReduce and TeamTransformReduce tests for specific GCC (#6459) — noreply / githubweb
  366. improve tests to check intra-team result (#6431) — noreply / githubweb
  367. SIMD: Math functions should be in namespace Kokkos — Daniel Arndt / githubweb
  368. SYCL: Disable another bessel function test for Intel GPUs — Daniel Arndt / githubweb
  369. team-level std algos: part 10 (#6256) — noreply / githubweb
  370. team-level std algos: part 11 (#6258) — noreply / githubweb
  371. #5635: HIP: Add Overloads for parallel_scan with return value for TeamThreadRange (#6302) — noreply / githubweb
  372. #5635: Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
  373. #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
  374. use shortcut — cezary.skrzynski / githubweb
  375. Skip testing for non-power-of-two team sizes — cezary.skrzynski / githubweb
  376. Fix formatting — cezary.skrzynski / githubweb
  377. Add parallel_scan overloads with value for HIP backend — cezary.skrzynski / githubweb
  378. Use std::is_same_v — cezary.skrzynski / githubweb
  379. #5635: Move some tests for parallel_scan to TestTeamScan — cezary.skrzynski / githubweb
  380. #5635: SYCL: Add parallel_scan overload with return value — cezary.skrzynski / githubweb
  381. Use std::is_same_v — cezary.skrzynski / githubweb
  382. OpenMP: Fix TeamThreadRange parallel_scan with return value for team_size > 1 — Daniel Arndt / githubweb
  383. Add compatible copy assignment operator to DualView — maarten.arnst / githubweb
  384. Fix minimum version for Google benchmark — Daniel Arndt / githubweb
  385. Add test of copy constructor/assignment operator for DualView. — maarten.arnst / githubweb
  386. Compute concurrency on HIP using Kokkos hardcoded m_maxWavesPerCU — maarten.arnst / githubweb
  387. Modify test so that source and destination view are of different type — maarten.arnst / githubweb
  388. Use call operator instead of run_me function — maarten.arnst / githubweb
  389. team-level std algos: part 12 (#6350) — noreply / githubweb
  390. core/src: Add half single and double mixed compare (LT,GT,LE,GE) (#6407) — noreply / githubweb
  391. Moving abort and assert into their own public headers (#6445) — noreply / githubweb
  392. Add test for parallel_scan with return value for ThreadVectorRange — cezary.skrzynski / githubweb
  393. Add parallel_scan overloads with value for Threads — cezary.skrzynski / githubweb
  394. Allow detecting SIMD types based on compiler macros (#6188) — noreply / githubweb
  395. Add KOKKOS_ARCH_ARM_NEON — Daniel Arndt / githubweb
  396. Fix implementation for cyl_bessel_i0 — Daniel Arndt / githubweb
  397. cleaning: remove iostream from headers where possible (IWYU) — romin.tomasetti / githubweb
  398. Fix compiling SIMD unit tests on NVIDIA — Daniel Arndt / githubweb
  399. guards to ensure DBL_EPSILON return for POWER8,9 — ajpowel / githubweb
  400. Fix NVCC warnings (#6483) — noreply / githubweb
  401. team-level std algos: part 13 (#6351) — noreply / githubweb
  402. Also fix annotations for generator constructor for AVX512 and NEON — Daniel Arndt / githubweb
  403. #5635: SYCL: Add parallel_scan overload with value for ThreadVectorRange — cezary.skrzynski / githubweb
  404. Fix atomic operations bug for Min and Max (#6435) — noreply / githubweb
  405. Fix example/build_cmake_installed_different_compiler — Daniel Arndt / githubweb
  406. fix impl — fnrizzi / githubweb
  407. Update core/src/HIP/Kokkos_HIP_KernelLaunch.hpp — noreply / githubweb
  408. Split Kokkos_Threads_Parallel files — Bruno Turcksin / githubweb
  409. HPX: Implement TeamThread and ThreadVector parallel_scan with return value — Daniel Arndt / githubweb
  410. Serial: Allow for distinct execution space instances (#6441) — noreply / githubweb
  411. add overload for TeamThreadRange — fnrizzi / githubweb
  412. address review comment — fnrizzi / githubweb
  413. Update to HIP TeamPolicy Block number heuristic (#6284) — noreply / githubweb
  414. SIMD: Split math functions from SIMD_Common.hpp (#6487) — noreply / githubweb
  415. Allow NVHPC as device compiler only with Kokkos_ENABLE_IMPL_NVHPC_AS_DEVICE_COMPILER=ON — Daniel Arndt / githubweb
  416. add threadvector — fnrizzi / githubweb
  417. fix order — fnrizzi / githubweb
  418. remove guards — fnrizzi / githubweb
  419. UnorderedMap(space instance): proposal for #6067 — romin.tomasetti / githubweb
  420. Rename Kokkos_ThreadsExec to align with the other backends — Bruno Turcksin / githubweb
  421. Promote Kokkos_Printf.hpp to public include — Daniel Arndt / githubweb
  422. simd: add floor, ceil, round, trunc operations (#6393) — noreply / githubweb
  423. Update CI in OpenMPTarget to use llvm-17 (#6472) — noreply / githubweb
  424. Rename Kokkos_ThreadsTeam.hpp to Kokkos_Threads_Team.hpp — Bruno Turcksin / githubweb
  425. Kokkos_SIMD_Scalar.hpp: remove extra ';' — ndellin / githubweb
  426. add guards — fnrizzi / githubweb
  427. avoid auto — fnrizzi / githubweb
  428. [ci skip] Update Kokkos version to 4.2.99 — Daniel Arndt / githubweb
  429. [deprecated code 3] remove all default device init tests — Damien L-G / githubweb
  430. [deprecated code 3] remove InitArguments — Damien L-G / githubweb
  431. [deprecated code 3] remove KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macros — Damien L-G / githubweb
  432. [deprecated code 3] remove using declaration in Kokkos::Experimental:: for clamp, min, max, and minmax — Damien L-G / githubweb
  433. [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
  434. [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math constants — Damien L-G / githubweb
  435. [deprecated code 3] remove {OpenMP,HPX}::partition_master — Damien L-G / githubweb
  436. OpenMP backend cleanup following removal of deprecated code 3 — Damien L-G / githubweb
  437. Get rid of FIXME_OPENMP — Damien L-G / githubweb
  438. [deprecated code 3] remove MasterLock — Damien L-G / githubweb
  439. [deprecated code 3] remove ENABLE_DEPRECATED_CODE_3 option — Damien L-G / githubweb
  440. fixup! [deprecated code 3] remove using declaration in Kokkos::Experimental:: for all math functions — Damien L-G / githubweb
  441. Do not append " - blocks" to the bitset label — Damien L-G / githubweb
  442. with_updated_label -> append_to_label — Daniel Arndt / githubweb
  443. SYCL: Use SYCL_EXT_ONEAPI_DEVICE_GLOBAL to detect support for device global variables — Daniel Arndt / githubweb
  444. Fixup in README (github -> GitHub) — Damien L-G / githubweb
  445. Threads: replace enum with constexpr int and enum class (#6514) — noreply / githubweb
  446. Added unit tests for reduction ops and few intel svml intrinsics — donlee / githubweb
  447. Remove sleep and wake functions — Bruno Turcksin / githubweb
  448. Prefer defaulted default constructor for Bitset (#6524) — noreply / githubweb
  449. SYCL: Use host-pinned memory to copy reduction/scan result (#6500) — noreply / githubweb
  450. Remove unused variables — Bruno Turcksin / githubweb
  451. Remove Sentinel struct from Threads — Bruno Turcksin / githubweb
  452. Small cleanup of ThreadsInternal::initialize — Bruno Turcksin / githubweb
  453. Remove useless forward declaration — Bruno Turcksin / githubweb
  454. Remove spawn function — Bruno Turcksin / githubweb
  455. Add comments — Bruno Turcksin / githubweb
  456. Fix indentation — Bruno Turcksin / githubweb
  457. Fix typo in macro guard — Bruno Turcksin / githubweb
  458. Reduce number of View constructor instantiations — Damien L-G / githubweb
  459. Bump HPX version used in CI to 1.9.0 — mikael.simberg / githubweb
  460. Split files in HIP backend — Bruno Turcksin / githubweb
  461. Trim some fat in `CudaInternal` (towards multiple GPUs support) (#6544) — noreply / githubweb
  462. Only define STDALGO_TEAM_SOURCES_* once — Daniel Arndt / githubweb
  463. Rollback changes to view constructors to reduce the number of instantiations (#6564) — noreply / githubweb
  464. Threads remove unused variables and functions (#6566) — noreply / githubweb
  465. Remove unused Sandia testing files (#6568) — noreply / githubweb
  466. fallback implementation cleanup — donlee / githubweb
  467. Remove empty quotation marks for static_assert — Daniel Arndt / githubweb
  468. [ci skip] Drop unused <impl/Kokkos_Memory_Fence.hpp> header — Damien L-G / githubweb
  469. Drop Clang+CUDA workaround — Damien L-G / githubweb
  470. OpenMPTarget: CI compiler upgrade. (#6545) — noreply / githubweb
  471. Add crtrott's launch_latency benchmark (#6379) — noreply / githubweb
  472. Simplify fence functions in the Threads backend (#6571) — noreply / githubweb
  473. simd: temporarily skip device math ops unit test for OpenMPTarget build (#6574) — noreply / githubweb
  474. OpenMPTarget: Guard scratch memory usage in ParallelReduce — Daniel Arndt / githubweb
  475. SYCL: Implement DESUL_ATOMICS_ENABLE_SYCL_SEPARABLE_COMPILATION path (#6534) — noreply / githubweb
  476. Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header — Damien L-G / githubweb
  477. Replace Marsaglia polar method with Box-Muller to generate a normally distributed random number (#6556) — noreply / githubweb
  478. OpenMP: No memset in viewfill (#6573) — noreply / githubweb
  479. Revert "Desul atomics: Trade SYCL-specific compile definition for a macro definition in the configuration header" — noreply / githubweb
  480. OpenACC: add atomics support (#6446) — noreply / githubweb
  481. Fix infinity, quiet_NaN, signaling_NaN, isfinite, isnan, isinf for half_t and bhalf_t (#6543) — noreply / githubweb
  482. kokkos(unique): fix allocation of temporary view to enforce using the provided space instance — romin.tomasetti / githubweb
  483. Use binary wrapper for consistency in definition of half types numeric traits (#6590) — noreply / githubweb
  484. try fix — fnrizzi / githubweb
  485. Fix TestNumericTraits.hpp for SYCL with bfloat16 support — Daniel Arndt / githubweb
  486. Fix generated Makefile when using gnu_generate_makefile.sh and make >= 4.3 — crtrott / githubweb
  487. Threads: add missing broadcast to TeamThreadRange parallel_scan (#6601) — noreply / githubweb
  488. Added missing operator* to NEON simd — crtrott / githubweb
  489. [ci skip] Update changelog on develop for 4.2.00 (#6592) — noreply / githubweb
  490. Remove KOKKOS_IMPL_DO_NOT_USE_PRINTF (#6593) — noreply / githubweb
  491. try fix — fnrizzi / githubweb
  492. avoid pyt package — fnrizzi / githubweb
  493. try — fnrizzi / githubweb
  494. fix for macos — fnrizzi / githubweb
  495. remove comments — fnrizzi / githubweb
  496. use reference — crtrott / githubweb
  497. add branching — fnrizzi / githubweb
  498. [ci skip] fix formatting — cezary.skrzynski / githubweb
  499. GitHub Workflows: Use Ubuntu 22.04 instead of Fedora for Intel compiler testing — Daniel Arndt / githubweb
  500. nvcc(wrapper): adding missing `--generate-line-info` arg — romin.tomasetti / githubweb
  501. Add clang-format check to GitHub workflows (#6612) — noreply / githubweb
  502. graph(HIP): adding inline keyword to fix #6623 — romin.tomasetti / githubweb
  503. Add jenkins multibranch pipeline options — Bruno Turcksin / githubweb
  504. kokkos(profiling): do not finalize in any backend — romin.tomasetti / githubweb
  505. Replace ubuntu:18.04 with ubuntu:20.04 as base image for clang-format — Bruno Turcksin / githubweb
  506. Disabling OpenACC in the CI because it emits too many warnings — Bruno Turcksin / githubweb
  507. unorderedmap: modernize traits — romin.tomasetti / githubweb
  508. nvcc wrapper: remove troubling flag to fix 6628 (#6629) — noreply / githubweb