Hi! I'm Torsten Keßler (tpkessler in AUR and on GitHub) from Saarland, a federal state in the south west of Germany. With this email I'm applying to become a trusted user. After graduating with a PhD in applied mathematics this year I'm now a post-doc with a focus on numerical analysis, the art of solving physical problems with mathematically sound algorithms on a computer. I've been using Arch Linux on my private machines (and at work) since my first weeks at university ten years ago. After initial distro hopping a friend recommended Arch. I immediately liked the way it handles packages via pacman, its wiki and the flexibility of its installation process.
Owing to their massively parallel architecture, GPUs have emerged as the leading platform for computationally expensive problems: Machine Learning/AI, real-world engineering problems, simulation of complex physical systems. For a long time, nVidia's CUDA framework (closed source, exclusively for their GPUs) has dominated this field. In 2015, AMD announced ROCm, their open source compute framework for GPUs. A common interface to CUDA, called HIP, makes it possible to write code that compiles and runs both on AMD and nVidia hardware. I've been closely following the development of ROCm on GitHub, trying to compile the stack from time to time. But only since 2020 has the kernel included all the necessary code to compile the ROCm stack on Arch Linux. Around this time I started to contribute to rocm-arch on GitHub, a collection of PKGBUILDs for ROCm (with around 50 packages). Soon after that, I became the main contributor to the repository and, since 2021, I've been the maintainer of the whole ROCm stack.
We have an active issue tracker and recently started a discussion page for rocm-arch. Most of the open issues as of now are for bookkeeping of patches we applied to run ROCm on Arch Linux. Many of them are linked to an upstream issue and a corresponding pull request that fixes the issue. This way I've already contributed code to a couple of libraries of the ROCm stack.
Over the years, many libraries have added official support for ROCm, including tensorflow, pytorch, python-cupy, python-numba (not actively maintained anymore) and blender. Support of ROCm for the latter generated great interest in the community and is one reason Sven contacted me, asking me if I would be interested in taking care of ROCm in [community]. In its current version, ROCm support for blender works out of the box. Just install hip-runtime-amd from the AUR and enable the HIP backend in blender's settings for rendering. The machine learning libraries require more dependencies from the AUR. Once installed, pytorch and tensorflow are known to work on Vega GPUs and the recent RDNA architecture.
My first action as a TU would be to add basic support of ROCm to [community], i.e. the low level libraries, including HIP and an open source runtime for OpenCL based on ROCm. That would be enough to run blender with its ROCm backend. At the same time, I would expand the wiki article on ROCm. The interaction with the community would also move from the issue tracker of rocm-arch to the Arch Linux bug tracker and the forums. In a second phase I would add the high level libraries that would enable users to quickly compile and run complex libraries such as tensorflow, pytorch or cupy.
#BEGIN Technical details
The minimal package list for HIP, which includes the runtime libraries for basic GPU programming and the GPU compiler (hipcc), comprises eight packages:
* rocm-cmake (basic cmake files for ROCm)
* rocm-llvm (upstream llvm with to-be-merged changes by AMD)
* rocm-device-libs (implements math functions for all GPU architectures)
* comgr (runtime library, "compiler support" for rocm-llvm)
* hsakmt-roct (interface to the amdgpu kernel driver)
* hsa-rocr (runtime for HSA compute kernels)
* rocminfo (display information on HSA agents: GPU and possibly CPU)
* hip-runtime-amd (runtime and compiler for HIP, a C++ dialect inspired by CUDA C++)
All but rocm-llvm are small libraries under the permissive MIT license. Since ROCm 5.2, all packages successfully build in a clean chroot and are distributed in the community repo arch4edu.
The application libraries for numerical linear algebra, sparse matrices or random numbers start with roc and hip (rocblas, rocsparse, rocrand). The hip* packages are designed in such a way that they would also work with CUDA if hip is configured with CUDA instead of a ROCm/HSA backend. With few exceptions (rocthrust, rccl) these packages are licensed under MIT.
Possible issues: There are three packages that are not fully working under Arch Linux or lack an open source license. The first is rocm-gdb, a fork of gdb with GPU support. To work properly it needs a kernel module currently not available in upstream linux but only as part of AMD's dkms modules. But they only work with specific kernel versions. Support for this from my side on Arch Linux was dropped a while ago. One closed source package is hsa-amd-aqlprofile. As the name suggests it is used for profiling as part of rocprofiler. The above-mentioned packages are only required for debugging and profiling but are not runtime dependencies of the big machine learning libraries or any other package with ROCm support I'm aware of. The third package is rocm-core, a package that is only part of the meta packages for ROCm, with no influence on the ROCm runtime. It provides a single header and a library with a single function that returns the current ROCm version. No source code has been published by AMD so far and the official package lacks a license file.
A second issue is GPU support. AMD officially only supports the professional compute GPUs. This does not mean that ROCm is not working on consumer cards but merely that AMD cannot guarantee all functionality through extensive testing. Recently, ROCm added support for Navi 21 (RX 6800 onwards), see
https://docs.amd.com/bundle/Hardware_and_Software_Reference_Guide/page/Hardw...
I own a Vega 56 (gfx900) that is officially supported, so I can test all packages before publishing them on the AUR / in [community].
#END Technical details
In the long term, I would like to foster Arch Linux as the leading platform for scientific computing. This includes Machine Learning libraries in the official repositories as well as packages for classical "number crunching" such as petsc, trilinos and packages that depend on them: deal-ii, dune or ngsolve.
The sponsors of my application are Sven (svenstaro) and Bruno (archange).
I'm looking forward to the upcoming discussion and your feedback on my application.
Best, Torsten
On 26.10.22 08:30, Torsten Keßler wrote:
Hi! I'm Torsten Keßler (tpkessler in AUR and on GitHub) from Saarland, a federal state in the south west of Germany. With this email I'm applying to be become a trusted user. After graduating with a PhD in applied mathematics this year I'm now a post-doc with a focus on numerical analysis, the art of solving physical problems with mathematically sound algorithms on a computer. I've been using Arch Linux on my private machines (and at work) since my first weeks at university ten years ago. After initial distro hopping a friend recommended Arch. I immediately liked the way it handles packages via pacman, its wiki and the flexibility of its installation process.
Owing to their massively parallel architecture, GPUs have emerged as the leading platform for computationally expensive problems: Machine Learning/AI, real-world engineering problems, simulation of complex physical systems. For a long time, nVidia's CUDA framework (closed source, exclusively for their GPUs) has dominated this field. In 2015, AMD announced ROCm, their open source compute framework for GPUs. A common interface to CUDA, called HIP, makes it possible to write code that compiles and runs both on AMD and nVidia hardware. I've been closely following the development of ROCm on GitHub, trying to compile the stack from time to time. But only since 2020, the kernel includes all the necessary code to compile the ROCm stack on Arch Linux. Around this time I've started to contribute to rocm-arch on GitHub, a collection of PKGBUILDs for ROCm (with around 50 packages). Soon after that, I became the main contributor to the repository and, since 2021, I've been the maintainer of the whole ROCm stack.
We have an active issue tracker and recently started a discussion page for rocm-arch. Most of the open issues as of now are for bookkeeping of patches we applied to run ROCm on Arch Linux. Many of them are linked to an upstream issue and a corresponding pull request that fixes the issues. This way I've already contributed code to a couple of libraries of the ROCm stack.
Over the years, many libraries have added official support for ROCm, including tensorflow, pytorch, python-cupy, python-numba (not actively maintained anymore) and blender. Support of ROCm for the latter generated large interest in the community and is one reason Sven contacted me, asking me if I would be interested to take care of ROCm in [community]. In its current version, ROCm support for blender works out of the box. Just install hip-runtime-amd from the AUR and enable the HIP backend in blender's settings for rendering. The machine learning libraries require more dependencies from the AUR. Once installed, pytorch and tensorflow are known to work on Vega GPUs and the recent RDNA architecture.
My first action as a TU would be to add basic support of ROCm to [community], i.e. the low level libraries, including HIP and an open source runtime for OpenCL based on ROCm. That would be enough to run blender with its ROCm backend. At the same time, I would expand the wiki article on ROCm. The interaction with the community would also move from the issue tracker of rocm-arch to the Arch Linux bug tracker and the forums. In a second phase I would add the high level libraries that would enable users to quickly compile and run complex libraries such as tensorflow, pytorch or cupy.
#BEGIN Technical details
The minimal package list for HIP which includes the runtime libraries for basic GPU programming and the GPU compiler (hipcc) comprises eight packages
* rocm-cmake (basic cmake files for ROCm) * rocm-llvm (upstream llvm with to-be-merged changes by AMD) * rocm-device-libs (implements math functions for all GPU architectures) * comgr (runtime library, "compiler support" for rocm-llvm) * hsakmt-roct (interface to the amdgpu kernel driver) * hsa-rocr (runtime for HSA compute kernels) * rocminfo (display information on HSA agents: GPU and possibly CPU) * hip-runtime-amd (runtime and compiler for HIP, a C++ dialect inspired by CUDA C++)
All but rocm-llvm are small libraries under the permissive MIT license. Since ROCm 5.2, all packages successfully build in a clean chroot and are distributed in the community repo arch4edu.
The application libraries for numerical linear algebra, sparse matrices or random numbers start with roc and hip (rocblas, rocsparse, rocrand). The hip* packages are designed in such a way that they would also work with CUDA if hip is configured with CUDA instead of a ROCm/HSA backend. With few exceptions (rocthrust, rccl) these packages are licensed under MIT.
Possible issues: There are three packages that are not fully working under Arch Linux or lack an open source license. The first is rocm-gdb, a fork of gdb with GPU support. To work properly it needs a kernel module currently not available in upstream linux but only as part of AMD's dkms modules. But they only work with specific kernel versions. Support for this from my side on Arch Linux was dropped a while ago. One closed source package is hsa-amd-aqlprofile. As the name suggests it is used for profiling as part of rocprofiler. Above mentioned packages are only required for debugging and profiling but are no runtime dependencies of the big machine learning libraries or any other package with ROCm support I'm aware of. The third package is rocm-core, a package only part of the meta packages for ROCm with no influence on the ROCm runtime. It provides a single header and a library with a single function that returns the current ROCm version. No source code has been published by AMD so far and the official package lacks a license file.
A second issue is GPU support. AMD officially only supports the professional compute GPUs. This does not mean that ROCm is not working on consumer cards but merely that AMD cannot guarantee all functionalities through excessive testing. Recently, ROCm added support for Navi 21 (RX 6800 onwards), see
https://docs.amd.com/bundle/Hardware_and_Software_Reference_Guide/page/Hardw...
I own a Vega 56 (gfx900) that is officially supported, so I can test all packages before publishing them on the AUR / in [community].
#END Technical details
On the long term, I would like to foster Arch Linux as the leading platform for scientific computing. This includes Machine Learning libraries in the official repositories as well as packages for classical "number crunching" such as petsc, trilinos and packages that depend on them: deal-ii, dune or ngsolve.
The sponsors of my application are Sven (svenstaro) and Bruno (archange).
I'm looking forward to the upcoming the discussion and your feedback on my application.
Best, Torsten
I'm indeed sponsoring Torsten. 😄
On 26/10/2022 at 10:30, Torsten Keßler wrote:
[…]
The sponsors of my application are Sven (svenstaro) and Bruno (archange).
[…]
I hereby confirm my sponsorship of Torsten Keßler. Bruno/Archange
On 27.10.22 09:45, Archange wrote:
On 26/10/2022 at 10:30, Torsten Keßler wrote:
[…]
The sponsors of my application are Sven (svenstaro) and Bruno (archange).
[…]
I hereby confirm my sponsorship of Torsten Keßler.
Bruno/Archange
It hasn't been stated explicitly but Bruno's confirmation begins the discussion period which will conclude in two weeks on 2022-11-11. The voting will start on the same day and conclude on 2022-11-18. Cheers, Sven
On 29.10.22 05:30, Sven-Hendrik Haase wrote:
On 27.10.22 09:45, Archange wrote:
On 26/10/2022 at 10:30, Torsten Keßler wrote:
[…]
The sponsors of my application are Sven (svenstaro) and Bruno (archange).
[…]
I hereby confirm my sponsorship of Torsten Keßler.
Bruno/Archange
It hasn't been stated explicitly but Bruno's confirmation begins the discussion period which will conclude in two weeks on 2022-11-11. The voting will start on the same day and conclude on 2022-11-18.
Cheers, Sven
Just a reminder: We only have five more days to go and no one has roasted Torsten yet! Sven
On Wed, 2022-10-26 at 06:30 +0000, Torsten Keßler wrote:
Hi! I'm Torsten Keßler (tpkessler in AUR and on GitHub) from Saarland, a federal state in the south west of Germany. With this email I'm applying to be become a trusted user. After graduating with a PhD in applied mathematics this year I'm now a post-doc with a focus on numerical analysis, the art of solving physical problems with mathematically sound algorithms on a computer. I've been using Arch Linux on my private machines (and at work) since my first weeks at university ten years ago. After initial distro hopping a friend recommended Arch. I immediately liked the way it handles packages via pacman, its wiki and the flexibility of its installation process.
Owing to their massively parallel architecture, GPUs have emerged as the leading platform for computationally expensive problems: Machine Learning/AI, real-world engineering problems, simulation of complex physical systems. For a long time, nVidia's CUDA framework (closed source, exclusively for their GPUs) has dominated this field. In 2015, AMD announced ROCm, their open source compute framework for GPUs. A common interface to CUDA, called HIP, makes it possible to write code that compiles and runs both on AMD and nVidia hardware. I've been closely following the development of ROCm on GitHub, trying to compile the stack from time to time. But only since 2020, the kernel includes all the necessary code to compile the ROCm stack on Arch Linux. Around this time I've started to contribute to rocm-arch on GitHub, a collection of PKGBUILDs for ROCm (with around 50 packages). Soon after that, I became the main contributor to the repository and, since 2021, I've been the maintainer of the whole ROCm stack.
We have an active issue tracker and recently started a discussion page for rocm-arch. Most of the open issues as of now are for bookkeeping of patches we applied to run ROCm on Arch Linux. Many of them are linked to an upstream issue and a corresponding pull request that fixes the issues. This way I've already contributed code to a couple of libraries of the ROCm stack.
Over the years, many libraries have added official support for ROCm, including tensorflow, pytorch, python-cupy, python-numba (not actively maintained anymore) and blender. Support of ROCm for the latter generated large interest in the community and is one reason Sven contacted me, asking me if I would be interested to take care of ROCm in [community]. In its current version, ROCm support for blender works out of the box. Just install hip-runtime-amd from the AUR and enable the HIP backend in blender's settings for rendering. The machine learning libraries require more dependencies from the AUR. Once installed, pytorch and tensorflow are known to work on Vega GPUs and the recent RDNA architecture.
My first action as a TU would be to add basic support of ROCm to [community], i.e. the low level libraries, including HIP and an open source runtime for OpenCL based on ROCm. That would be enough to run blender with its ROCm backend. At the same time, I would expand the wiki article on ROCm. The interaction with the community would also move from the issue tracker of rocm-arch to the Arch Linux bug tracker and the forums. In a second phase I would add the high level libraries that would enable users to quickly compile and run complex libraries such as tensorflow, pytorch or cupy.
Huge +1 from me here. It would be awesome to bring ROCm to the official repos. I have not done it as currently I am split between tons of projects, which makes it hard to find the time for the initial work and then commit to maintaining the stack, so I am very excited to have someone take this item off my endless TODO list!
#BEGIN Technical details
The minimal package list for HIP which includes the runtime libraries for basic GPU programming and the GPU compiler (hipcc) comprises eight packages
* rocm-cmake (basic cmake files for ROCm) * rocm-llvm (upstream llvm with to-be-merged changes by AMD) * rocm-device-libs (implements math functions for all GPU architectures) * comgr (runtime library, "compiler support" for rocm-llvm) * hsakmt-roct (interface to the amdgpu kernel driver) * hsa-rocr (runtime for HSA compute kernels) * rocminfo (display information on HSA agents: GPU and possibly CPU) * hip-runtime-amd (runtime and compiler for HIP, a C++ dialect inspired by CUDA C++)
All but rocm-llvm are small libraries under the permissive MIT license. Since ROCm 5.2, all packages successfully build in a clean chroot and are distributed in the community repo arch4edu.
The application libraries for numerical linear algebra, sparse matrices or random numbers start with roc and hip (rocblas, rocsparse, rocrand). The hip* packages are designed in such a way that they would also work with CUDA if hip is configured with CUDA instead of a ROCm/HSA backend. With few exceptions (rocthrust, rccl) these packages are licensed under MIT.
Possible issues: There are three packages that are not fully working under Arch Linux or lack an open source license. The first is rocm-gdb, a fork of gdb with GPU support. To work properly it needs a kernel module currently not available in upstream linux but only as part of AMD's dkms modules. But they only work with specific kernel versions. Support for this from my side on Arch Linux was dropped a while ago. One closed source package is hsa-amd-aqlprofile. As the name suggests it is used for profiling as part of rocprofiler. Above mentioned packages are only required for debugging and profiling but are no runtime dependencies of the big machine learning libraries or any other package with ROCm support I'm aware of. The third package is rocm-core, a package only part of the meta packages for ROCm with no influence on the ROCm runtime. It provides a single header and a library with a single function that returns the current ROCm version. No source code has been published by AMD so far and the official package lacks a license file.
A second issue is GPU support. AMD officially only supports the professional compute GPUs. This does not mean that ROCm is not working on consumer cards but merely that AMD cannot guarantee all functionalities through excessive testing. Recently, ROCm added support for Navi 21 (RX 6800 onwards), see
https://docs.amd.com/bundle/Hardware_and_Software_Reference_Guide/page/Hardw...
I own a Vega 56 (gfx900) that is officially supported, so I can test all packages before publishing them on the AUR / in [community].
I own a RX 5700 XT (gfx1010), if specific testing is required.
#END Technical details
On the long term, I would like to foster Arch Linux as the leading platform for scientific computing. This includes Machine Learning libraries in the official repositories as well as packages for classical "number crunching" such as petsc, trilinos and packages that depend on them: deal-ii, dune or ngsolve.
+1 on this too. My day job is supporting the Python scientific computing / data science ecosystem, with a focus on packaging, so I am looking forward to this, and helping out where I can.
The sponsors of my application are Sven (svenstaro) and Bruno (archange).
I'm looking forward to the upcoming the discussion and your feedback on my application.
Best, Torsten
That said, I skimmed Torsten's PKGBUILDs, and the only thing I noticed was that the CMake packages are missing the -DCMAKE_BUILD_TYPE=None argument, against the recommendations from [1], which I wouldn't consider a big deal anyway. So no roast from me, against Sven's expectations :P
Overall, I am very happy we have someone interested in working on ROCm support in the official repos, and am looking forward to working with Torsten.
+1 on the candidate for me!
[1] https://wiki.archlinux.org/title/CMake_package_guidelines
Cheers, Filipe Laíns
Hi Torsten! On Wed, 26 Oct 2022 06:30:33 +0000 Torsten Keßler <t.kessler@posteo.de> wrote:
Hi! I'm Torsten Keßler (tpkessler in AUR and on GitHub) from Saarland, a federal state in the south west of Germany. With this email I'm applying to be become a trusted user. After graduating with a PhD in applied mathematics this year I'm now a post-doc with a focus on numerical analysis, the art of solving physical problems with mathematically sound algorithms on a computer. I've been using Arch Linux on my private machines (and at work) since my first weeks at university ten years ago. After initial distro hopping a friend recommended Arch. I immediately liked the way it handles packages via pacman, its wiki and the flexibility of its installation process.
Soon we can switch the Arch Linux IRC main language to German!
Owing to their massively parallel architecture, GPUs have emerged as the leading platform for computationally expensive problems: Machine Learning/AI, real-world engineering problems, simulation of complex physical systems. For a long time, nVidia's CUDA framework (closed source, exclusively for their GPUs) has dominated this field. In 2015, AMD announced ROCm, their open source compute framework for GPUs. A common interface to CUDA, called HIP, makes it possible to write code that compiles and runs both on AMD and nVidia hardware. I've been closely following the development of ROCm on GitHub, trying to compile the stack from time to time. But only since 2020, the kernel includes all the necessary code to compile the ROCm stack on Arch Linux. Around this time I've started to contribute to rocm-arch on GitHub, a collection of PKGBUILDs for ROCm (with around 50 packages). Soon after that, I became the main contributor to the repository and, since 2021, I've been the maintainer of the whole ROCm stack.
We have an active issue tracker and recently started a discussion page for rocm-arch. Most of the open issues as of now are for bookkeeping of patches we applied to run ROCm on Arch Linux. Many of them are linked to an upstream issue and a corresponding pull request that fixes the issues. This way I've already contributed code to a couple of libraries of the ROCm stack.
Over the years, many libraries have added official support for ROCm, including tensorflow, pytorch, python-cupy, python-numba (not actively maintained anymore) and blender. Support of ROCm for the latter generated large interest in the community and is one reason Sven contacted me, asking me if I would be interested to take care of ROCm in [community]. In its current version, ROCm support for blender works out of the box. Just install hip-runtime-amd from the AUR and enable the HIP backend in blender's settings for rendering. The machine learning libraries require more dependencies from the AUR. Once installed, pytorch and tensorflow are known to work on Vega GPUs and the recent RDNA architecture.
My first action as a TU would be to add basic support of ROCm to [community], i.e. the low level libraries, including HIP and an open source runtime for OpenCL based on ROCm. That would be enough to run blender with its ROCm backend. At the same time, I would expand the wiki article on ROCm. The interaction with the community would also move from the issue tracker of rocm-arch to the Arch Linux bug tracker and the forums. In a second phase I would add the high level libraries that would enable users to quickly compile and run complex libraries such as tensorflow, pytorch or cupy.
The limited support of ROCm has been one of the main things locking me into Nvidia for my workstations. Having stuff in community would certainly help with that!
#BEGIN Technical details
The minimal package list for HIP which includes the runtime libraries for basic GPU programming and the GPU compiler (hipcc) comprises eight packages
* rocm-cmake (basic cmake files for ROCm) * rocm-llvm (upstream llvm with to-be-merged changes by AMD) * rocm-device-libs (implements math functions for all GPU architectures) * comgr (runtime library, "compiler support" for rocm-llvm) * hsakmt-roct (interface to the amdgpu kernel driver) * hsa-rocr (runtime for HSA compute kernels) * rocminfo (display information on HSA agents: GPU and possibly CPU) * hip-runtime-amd (runtime and compiler for HIP, a C++ dialect inspired by CUDA C++)
PKGBUILDs look good to me. Some ROC repositories include documentation (cmake, device libs, hip), maybe it would make sense to include those in `/usr/share/doc/${pkgname}`?
All but rocm-llvm are small libraries under the permissive MIT license. Since ROCm 5.2, all packages successfully build in a clean chroot and are distributed in the community repo arch4edu.
The application libraries for numerical linear algebra, sparse matrices or random numbers start with roc and hip (rocblas, rocsparse, rocrand). The hip* packages are designed in such a way that they would also work with CUDA if hip is configured with CUDA instead of a ROCm/HSA backend. With few exceptions (rocthrust, rccl) these packages are licensed under MIT.
Possible issues: There are three packages that are not fully working under Arch Linux or lack an open source license. The first is rocm-gdb, a fork of gdb with GPU support. To work properly it needs a kernel module currently not available in upstream linux but only as part of AMD's dkms modules. But they only work with specific kernel versions. Support for this from my side on Arch Linux was dropped a while ago. One closed source package is hsa-amd-aqlprofile. As the name suggests it is used for profiling as part of rocprofiler. Above mentioned packages are only required for debugging and profiling but are no runtime dependencies of the big machine learning libraries or any other package with ROCm support I'm aware of. The third package is rocm-core, a package only part of the meta packages for ROCm with no influence on the ROCm runtime. It provides a single header and a library with a single function that returns the current ROCm version. No source code has been published by AMD so far and the official package lacks a license file.
Have you tried contacting AMD about `rocm-core`? It seems odd to keep such a small thing closed source / without a license.
A second issue is GPU support. AMD officially only supports the professional compute GPUs. This does not mean that ROCm is not working on consumer cards but merely that AMD cannot guarantee all functionalities through excessive testing. Recently, ROCm added support for Navi 21 (RX 6800 onwards), see
https://docs.amd.com/bundle/Hardware_and_Software_Reference_Guide/page/Hardw...
I own a Vega 56 (gfx900) that is officially supported, so I can test all packages before publishing them on the AUR / in [community].
Finding information about ROCm support in consumer cards really isn't easy – but I guess with CUDA I just expect it to work with recent Nvidia cards? I would guess that we have a bunch of TUs with Radeon RX 5000/6000 (and soon 7000) series cards, but without the needed knowledge / use case for ROCm. Maybe it would be a good idea to provide testing scripts / documents for them, so they can report back once you push things into testing? Having a list of tested cards in the wiki would be great as well.
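A testing script of the kind suggested here could start by reporting which GPU targets the installed stack detects. The following is only a minimal sketch: it assumes that rocminfo prints target names of the form gfxNNN (such as the gfx900 and gfx1010 mentioned in this thread), and the helper name list_gfx_targets is made up for illustration.

```shell
#!/bin/sh
# list_gfx_targets (hypothetical helper): read rocminfo-style text on stdin
# and print every distinct gfx target name once, sorted.
# Assumption: targets appear as "gfx" followed by hex digits, e.g. gfx900.
list_gfx_targets() {
  grep -o 'gfx[0-9a-f][0-9a-f]*' | sort -u
}

# Query the hardware only if rocminfo is actually installed.
if command -v rocminfo >/dev/null 2>&1; then
  rocminfo | list_gfx_targets
fi
```

A tester could paste the resulting list into a bug report or into a table of tested cards in the wiki.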
#END Technical details
On the long term, I would like to foster Arch Linux as the leading platform for scientific computing. This includes Machine Learning libraries in the official repositories as well as packages for classical "number crunching" such as petsc, trilinos and packages that depend on them: deal-ii, dune or ngsolve.
The sponsors of my application are Sven (svenstaro) and Bruno (archange).
I'm looking forward to the upcoming the discussion and your feedback on my application.
Best, Torsten
Best Regards Justin -- hashworks Web https://hashworks.net Public Key 0x4FE7F4FEAC8EBE67
Hi Filipe! It's great to meet a further Arch TU who's excited about the ROCm stack. Thank you for your feedback on my application.
I own a RX 5700 XT (gfx1010), if specific testing is required.
Very nice! Hopefully, we will find more people who would like to help with the testing. So far, I can only extrapolate from my own experience and the issues on GitHub. Most of them are concerned with build failures and hardly address runtime issues with supported GPUs. Having first-hand information on the performance of the ROCm stack on different GPU architectures is very important when shipping the binary packages.
Python scientific computing / data science ecosystem, with a focus on packaging, so I am looking forward to this, and helping out where I can.
Awesome! Regarding ML with ROCm I already had a short conversation with Konstantin (kgizdov) who's maintaining the ML packages for Arch Linux. He's eager to include a ROCm backend and is looking forward to my possible future contributions.
I noticed was the missing -DCMAKE_BUILD_TYPE=None argument from CMake packages
That's (partly) done intentionally. Most of the ROCm packages enable a "Release" build by default, see [1,2] for instance. As I've been trying to stay as close as possible to upstream (and the official Debian packages) I haven't touched CMAKE_BUILD_TYPE. I'm of course willing to change this.
+1 on the candidate for me!
Thank you! :)
Best! Torsten
[1] https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/f4826cbfce09fb1ed6292d8...
[2] https://github.com/ROCmSoftwarePlatform/rocSPARSE/blob/3a05469742e91841676f5...
On Sun, 2022-11-06 at 19:22 +0000, Torsten Keßler wrote:
I noticed was the missing -DCMAKE_BUILD_TYPE=None argument from CMake packages
That's (partly) done intentionally. Most of the ROCm packages enable a "Release" build by default, see [1,2] for instance. As I've been trying to stay a close as possible to upstream (and the official Debian packages) I haven't touched CMAKE_BUILD_TYPE. I'm of course willing to change this.
Staying as close to upstream as possible is the correct approach when talking about code, i.e. we try to avoid patching the production code as much as possible, but it is not the right call for compiler flags. Granted, there may be some cases where for specific reasons we might want to do that, but the general rule is that we should use Arch's compiler flags. Lots of people have misconceptions about -O3 vs -O2, and set the release build to -O3. However, Arch has decided that -O2 should be used for our builds, so we should (generally) set -DCMAKE_BUILD_TYPE=None. See [1], specifically 2.1.2, which highlights this issue.

[1] https://wiki.archlinux.org/title/CMake_package_guidelines#CMake_can_automati...

Cheers,
Filipe Laíns
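To make the rule concrete, here is a minimal sketch of a build() function as it might appear in a PKGBUILD for a CMake-based package. It is not taken from any of the ROCm PKGBUILDs; pkgname and pkgver are placeholders, and it assumes the project needs no further options. With a "Release" build type, CMake typically appends its own optimisation flags (-O3 with GCC) after the flags from makepkg.conf, while an unrecognised type such as None adds nothing.

```bash
# Hypothetical PKGBUILD fragment; pkgname and pkgver are placeholders.
pkgname=example-rocm-lib
pkgver=0.0.0

build() {
  # CMAKE_BUILD_TYPE=None is not a build type CMake knows, so no
  # per-configuration flags are appended and the CFLAGS/CXXFLAGS
  # exported by makepkg stay in effect.
  cmake \
    -Wno-dev \
    -B build \
    -S "$pkgname-$pkgver" \
    -DCMAKE_BUILD_TYPE=None \
    -DCMAKE_INSTALL_PREFIX=/usr
  cmake --build build
}
```

Some projects key additional behaviour off the build type, so the result is worth checking per package.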
Hi Torsten,

good luck for your application :)

My first question would be how you keep track of upstream locations and which ones have new releases available? It looks like the whole stack has version 5.3.1 released.

-DCMAKE_BUILD_TYPE=None has already been mentioned and explained just as I would have, hence I will leave that one out of all reviews (while agreeing with Filipe).

Now, a "little bit" of feedback after reviewing your current AUR packages. Please don't feel overwhelmed. I've been reviewing for nearly 3 hours now and will send the current results over :)

comgr
- looks like upstream provides some tests, it would be useful to always try running tests whenever available:
  https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/tree/amd-stg-open/...

hip-runtime-amd
- I'm not sure how that stacked package really works and which ones its real tests are, but there seem to be some available in:
  https://github.com/ROCm-Developer-Tools/HIP/tree/develop/tests

hip-runtime-nvidia
- same question about tests as hip-runtime-amd
- the nvcc.patch:: prefix for the pull request patch is not good as it's not really a unique name. The reason: whenever sources are placed into the same dir, like when setting SRCDEST in makepkg, this leads to issues if any other package ever specifies nvcc.patch. For example:
  hip-runtime-fix-logic-for-finding-nvcc.patch::
  Depending on the filename, it sometimes makes sense to also include the $pkgver in the prefix name
- the pull request #2623 which this patch depends on has been rejected upstream and looks to be superseded by #2849. Worth checking out an approach that upstream is not against.

hipblas
- Again wondering a bit about tests, as this time we even seem to disable them on purpose: -DBUILD_CLIENTS_TESTS=OFF

hipcub
- reference-previous-mention: tests
- The git dependency made me wonder, and actually the cmake ecosystem seems to clone rocPRIM.git and doesn't seem to pin it to a specific hash, which means this package isn't reproducible when the repo changes. Instead we need to specify all downloaded repos in the sources array with a fixed hash and link the $srcdir repos into the proper place with a small patch to the cmake build env to avoid fresh clones. This also applies to googletest and googlebenchmark. On top of that it seems to download cub/thrust as well:
  https://github.com/ROCmSoftwarePlatform/hipCUB/blob/develop/cmake/Dependenci...
  Some info about reproducible builds: https://reproducible-builds.org/

hipfft
- reference-previous-mention: tests
- has similar non-reproducible download issues as hipcub which need to be pinned and passed in sources(). It seems to download rocm-cmake from master
  https://github.com/ROCmSoftwarePlatform/hipFFT/blob/develop/cmake/dependenci...
  This package should instead depend on rocm-cmake

hipfort
- reference-previous-mention: tests

hipify-clang
- reference-previous-mention: tests

hipsolver
- reference-previous-mention: reproducible
  seems to download rocm-cmake and should instead depend on it
  https://github.com/ROCmSoftwarePlatform/hipSOLVER/blob/develop/CMakeLists.tx...

hipsparse
- reference-previous-mention: tests
- reference-previous-mention: reproducible / rocm-cmake

hsa-amd-aqlprofile-bin
- doesn't seem to distribute the proprietary license.

hsa-rocr
- CMAKE_CXX_FLAGS='-DNDEBUG' seems to discard our distro CXXFLAGS

hsakmt-roct
- reference-previous-mention: tests
- wondering if it wouldn't be better to use BUILD_SHARED_LIBS instead of statically linking?

mathtime-professional
- don't quite understand this package with sources to local://mtp2fonts.zip.tpm
  Does this package make sense for the general public?

migraphx
- reference-previous-mention: tests
- patch in PR #1435 has been merged and should be replaced by the upstream url:
  https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/pull/1435
  https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/commit/ba0913b1e9c86e944...
- some non-unique source file prefix, reference explained in hip-runtime-nvidia
- reference-previous-mention: reproducible
  https://github.com/ROCmSoftwarePlatform/AMDMIGraphX/blob/develop/install_dep...

miopen-hip
- reference-previous-mention: tests
- reference-previous-mention: non-deterministic
  https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/fin/install_deps...

miopen-opencl
- same as miopen-hip

miopengemm
- reference-previous-mention: tests

mivisionx
- reference-previous-mention: tests

pcg-c-git
- missing conflicts=("$_pkgname") for correctness, even if it doesn't currently exist

python-meshio
- reference-previous-mention: tests

rccl
- reference-previous-mention: tests
  BUILD_TESTS=OFF
- reference-previous-mention: reproducible / rocm-cmake
  https://github.com/ROCmSoftwarePlatform/rccl/blob/develop/cmake/Dependencies...

cheers,
Levente
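Three of the recurring points in this review (unique source file names, pinned downloads, running tests) could look roughly like the following PKGBUILD fragment. This is only a sketch under stated assumptions: the package name, version, URLs and the all-zero commit hash are placeholders, the checksums are left as SKIP for brevity, and the project is assumed to register its tests with CTest. How the pinned checkout is wired into the project's CMake files is package-specific and not shown.

```bash
# Hypothetical PKGBUILD fragment; names, URLs and the commit hash are placeholders.
pkgname=example-hip-lib
pkgver=0.0.0
_dep_commit=0000000000000000000000000000000000000000  # placeholder, pin a real commit

source=(
  # Archive renamed so that it cannot collide with other packages in a shared SRCDEST.
  "$pkgname-$pkgver.tar.gz::https://example.org/$pkgname/archive/v$pkgver.tar.gz"
  # Patch with a package-specific prefix instead of a generic name such as nvcc.patch.
  "$pkgname-$pkgver-fix-nvcc-lookup.patch::https://example.org/$pkgname/pull/0.patch"
  # Dependency the build system would otherwise clone at build time,
  # pinned to one commit so that every rebuild sees the same sources.
  "example-dep::git+https://example.org/example-dep.git#commit=$_dep_commit"
)
sha256sums=('SKIP' 'SKIP' 'SKIP')  # placeholders; a real package lists actual checksums

prepare() {
  cd "$pkgname-$pkgver"
  patch -Np1 -i "$srcdir/$pkgname-$pkgver-fix-nvcc-lookup.patch"
}

check() {
  # Assumes the build directory is called "build" and tests are registered with CTest.
  ctest --test-dir build --output-on-failure
}
```

The `name::url` renaming and the `#commit=` fragment are standard makepkg source syntax; everything else here is illustrative.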
Hi Justin!

Some ROC repositories include documentation (cmake, device libs, hip), maybe it would make sense to include those in `/usr/share/doc/${pkgname}`?
That's a very good idea. For some packages, AMD bundles them with the package (rocm-dbgapi) and sometimes it's shipped separately, see hip-doc [1].

The limited support of ROCm has been one of the main things locking me into Nvidia for my workstations.
Yes, that's really the main drawback of ROCm. CUDA works on almost any Nvidia GPU (even on mobile variants). I hope AMD will change their policy with Navi 30+.

Have you tried contacting AMD about `rocm-core`?
Others already did. AMD support promised to release the source code in March [2].

Finding information about ROCm support in consumer cards really isn't easy – but I guess with CUDA I just expect it to work with recent Nvidia cards?
Do you mean the common HIP abstraction layer (like hipfft, hipblas,...)? Yes, that should work with any recent CUDA version. But I haven't tried this as I don't have access to an Nvidia GPU. Furthermore, this feature (HIP with CUDA) has never been requested by the community at rocm-arch. I think Nvidia users just stick with CUDA and don't need HIP.

Maybe it would be a good idea to provide testing scripts / documents for them, so they can report back once you push things into testing?
Absolutely! There are HIP examples [3] from AMD which check basic HIP language features. Additionally, we have `rocm-validation-suite` which offers several tests.

Having a list of tested cards in the wiki would be great as well.
I agree! Once we have an established test suite, this should be straightforward.
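As a starting point for such tester reports, a small helper could extract the GPU targets that ROCm detects, so that results can be filed per architecture (e.g. gfx900 or gfx1010, as mentioned in this thread). The sketch below is an assumption, not an existing script from rocm-arch: the function name is made up, and it relies only on `rocminfo` mentioning targets as `gfx` followed by hexadecimal digits somewhere in its output.

```bash
# Print the distinct gfx targets found in rocminfo-style text read from stdin.
# Hypothetical helper; it only assumes that targets appear as "gfx" + hex digits.
list_gfx_targets() {
  grep -oE 'gfx[0-9a-f]+' | sort -u
}

# Usage on a machine with ROCm installed:
#   rocminfo | list_gfx_targets
```

The output of this helper together with the pass/fail result of the test suite would be enough for one row in a table of tested cards.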
Best!
Torsten

[1] http://repo.radeon.com/rocm/apt/5.3/pool/main/h/hip-doc/
[2] https://github.com/RadeonOpenCompute/ROCm/issues/1705#issuecomment-108159928...
[3] https://github.com/ROCm-Developer-Tools/HIP-Examples

On 06.11.22 at 23:10, aur-general-request@lists.archlinux.org wrote:
Send Aur-general mailing list submissions to aur-general@lists.archlinux.org
To subscribe or unsubscribe via email, send a message with subject or body 'help' to aur-general-request@lists.archlinux.org
You can reach the person managing the list at aur-general-owner@lists.archlinux.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Aur-general digest..."
Today's Topics:
1. Re: TU Application - tpkessler (Justin Kromlinger) 2. Re: TU Application - tpkessler (Torsten Keßler) 3. Re: TU Application - tpkessler (Filipe Laíns)
----------------------------------------------------------------------
Message: 1
Date: Sun, 6 Nov 2022 20:01:14 +0100
From: Justin Kromlinger <hashworks@archlinux.org>
Subject: Re: TU Application - tpkessler
To: aur-general@lists.archlinux.org
Message-ID: <20221106200114.438404af@maker.hashworks.net>
Content-Type: multipart/signed; boundary="Sig_//b6Alp4sEqgQD9YkXUpB1=1"; protocol="application/pgp-signature"; micalg=pgp-sha256
Hi Torsten!
On Wed, 26 Oct 2022 06:30:33 +0000 Torsten Keßler <t.kessler@posteo.de> wrote:
Hi! I'm Torsten Keßler (tpkessler in AUR and on GitHub) from Saarland, a federal state in the south west of Germany. With this email I'm applying to be become a trusted user. After graduating with a PhD in applied mathematics this year I'm now a post-doc with a focus on numerical analysis, the art of solving physical problems with mathematically sound algorithms on a computer. I've been using Arch Linux on my private machines (and at work) since my first weeks at university ten years ago. After initial distro hopping a friend recommended Arch. I immediately liked the way it handles packages via pacman, its wiki and the flexibility of its installation process.

Soon we can switch the Arch Linux IRC main language to German!
Owing to their massively parallel architecture, GPUs have emerged as the leading platform for computationally expensive problems: Machine Learning/AI, real-world engineering problems, simulation of complex physical systems. For a long time, nVidia's CUDA framework (closed source, exclusively for their GPUs) has dominated this field. In 2015, AMD announced ROCm, their open source compute framework for GPUs. A common interface to CUDA, called HIP, makes it possible to write code that compiles and runs both on AMD and nVidia hardware. I've been closely following the development of ROCm on GitHub, trying to compile the stack from time to time. But only since 2020, the kernel includes all the necessary code to compile the ROCm stack on Arch Linux. Around this time I've started to contribute to rocm-arch on GitHub, a collection of PKGBUILDs for ROCm (with around 50 packages). Soon after that, I became the main contributor to the repository and, since 2021, I've been the maintainer of the whole ROCm stack. We have an active issue tracker and recently started a discussion page for rocm-arch. Most of the open issues as of now are for bookkeeping of patches we applied to run ROCm on Arch Linux. Many of them are linked to an upstream issue and a corresponding pull request that fixes the issues. This way I've already contributed code to a couple of libraries of the ROCm stack.
Over the years, many libraries have added official support for ROCm, including tensorflow, pytorch, python-cupy, python-numba (not actively maintained anymore) and blender. Support of ROCm for the latter generated large interest in the community and is one reason Sven contacted me, asking me if I would be interested to take care of ROCm in [community]. In its current version, ROCm support for blender works out of the box. Just install hip-runtime-amd from the AUR and enable the HIP backend in blender's settings for rendering. The machine learning libraries require more dependencies from the AUR. Once installed, pytorch and tensorflow are known to work on Vega GPUs and the recent RDNA architecture.
My first action as a TU would be to add basic support of ROCm to [community], i.e. the low level libraries, including HIP and an open source runtime for OpenCL based on ROCm. That would be enough to run blender with its ROCm backend. At the same time, I would expand the wiki article on ROCm. The interaction with the community would also move from the issue tracker of rocm-arch to the Arch Linux bug tracker and the forums. In a second phase I would add the high level libraries that would enable users to quickly compile and run complex libraries such as tensorflow, pytorch or cupy.

The limited support of ROCm has been one of the main things locking me into Nvidia for my workstations. Having stuff in community would certainly help with that!
#BEGIN Technical details
The minimal package list for HIP which includes the runtime libraries for basic GPU programming and the GPU compiler (hipcc) comprises eight packages

* rocm-cmake (basic cmake files for ROCm)
* rocm-llvm (upstream llvm with to-be-merged changes by AMD)
* rocm-device-libs (implements math functions for all GPU architectures)
* comgr (runtime library, "compiler support" for rocm-llvm)
* hsakmt-roct (interface to the amdgpu kernel driver)
* hsa-rocr (runtime for HSA compute kernels)
* rocminfo (display information on HSA agents: GPU and possibly CPU)
* hip-runtime-amd (runtime and compiler for HIP, a C++ dialect inspired by CUDA C++)

PKGBUILDs look good to me. Some ROC repositories include documentation (cmake, device libs, hip), maybe it would make sense to include those in `/usr/share/doc/${pkgname}`?
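For the documentation suggestion, a package() function could copy upstream's documentation files into the package's doc directory. The following is a sketch only: pkgname and pkgver are placeholders, and it assumes that upstream keeps Markdown files in a top-level docs/ directory, which has to be checked for each repository.

```bash
# Hypothetical PKGBUILD fragment; pkgname, pkgver and the docs/ layout are assumptions.
pkgname=example-rocm-lib
pkgver=0.0.0

package() {
  local docdir="$pkgdir/usr/share/doc/$pkgname"
  # Create the doc directory inside the package root ...
  install -d "$docdir"
  # ... and copy the Markdown documentation with non-executable permissions.
  install -m644 -t "$docdir" "$srcdir/$pkgname-$pkgver"/docs/*.md
}
```

A real package() would also install the software itself, for example with `DESTDIR="$pkgdir" cmake --install build`.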
All but rocm-llvm are small libraries under the permissive MIT license. Since ROCm 5.2, all packages successfully build in a clean chroot and are distributed in the community repo arch4edu.
The application libraries for numerical linear algebra, sparse matrices or random numbers start with roc and hip (rocblas, rocsparse, rocrand). The hip* packages are designed in such a way that they would also work with CUDA if hip is configured with CUDA instead of a ROCm/HSA backend. With few exceptions (rocthrust, rccl) these packages are licensed under MIT.
Possible issues: There are three packages that are not fully working under Arch Linux or lack an open source license. The first is rocm-gdb, a fork of gdb with GPU support. To work properly it needs a kernel module currently not available in upstream linux but only as part of AMD's dkms modules. But they only work with specific kernel versions. Support for this from my side on Arch Linux was dropped a while ago. One closed source package is hsa-amd-aqlprofile. As the name suggests it is used for profiling as part of rocprofiler. Above mentioned packages are only required for debugging and profiling but are no runtime dependencies of the big machine learning libraries or any other package with ROCm support I'm aware of. The third package is rocm-core, a package only part of the meta packages for ROCm with no influence on the ROCm runtime. It provides a single header and a library with a single function that returns the current ROCm version. No source code has been published by AMD so far and the official package lacks a license file.

Have you tried contacting AMD about `rocm-core`? It seems odd to keep such a small thing closed source / without a license.
A second issue is GPU support. AMD officially only supports the professional compute GPUs. This does not mean that ROCm is not working on consumer cards but merely that AMD cannot guarantee all functionalities through excessive testing. Recently, ROCm added support for Navi 21 (RX 6800 onwards), see
https://docs.amd.com/bundle/Hardware_and_Software_Reference_Guide/page/Hardw...
I own a Vega 56 (gfx900) that is officially supported, so I can test all packages before publishing them on the AUR / in [community].

Finding information about ROCm support in consumer cards really isn't easy – but I guess with CUDA I just expect it to work with recent Nvidia cards?
I would guess that we have a bunch of TUs with Radeon RX 5000/6000 (and soon 7000) series cards, but without the needed knowledge / use case for ROCm. Maybe it would be a good idea to provide testing scripts / documents for them, so they can report back once you push things into testing?
Having a list of tested cards in the wiki would be great as well.
#END Technical details
On the long term, I would like to foster Arch Linux as the leading platform for scientific computing. This includes Machine Learning libraries in the official repositories as well as packages for classical "number crunching" such as petsc, trilinos and packages that depend on them: deal-ii, dune or ngsolve.
The sponsors of my application are Sven (svenstaro) and Bruno (archange).
I'm looking forward to the upcoming the discussion and your feedback on my application.
Best,
Torsten

Best Regards
Justin
On Mon, 7 Nov 2022 19:05:16 +0000 Torsten Keßler <t.kessler@posteo.de> wrote:
Hi Justin!
Some ROC repositories include documentation (cmake, device libs, hip), maybe it would make sense to include those in `/usr/share/doc/${pkgname}`?
That's a very good idea. For some packages, AMD bundles them with the package (rocm-dbgapi) and sometimes it's shipped separately, see hip-doc [1].

The limited support of ROCm has been one of the main things locking me into Nvidia for my workstations.
Yes, that's really the main drawback of ROCm. CUDA works on almost any Nvidia GPU (even on mobile variants). I hope AMD will change their policy with Navi 30+.

Have you tried contacting AMD about `rocm-core`?
Others already did. AMD support promised to release the source code in March [2].

Finding information about ROCm support in consumer cards really isn't easy – but I guess with CUDA I just expect it to work with recent Nvidia cards?
Do you mean the common HIP abstraction layer (like hipfft, hipblas,...)? Yes, that should work with any recent CUDA version. But I haven't tried this as I don't have access to an Nvidia GPU. Furthermore, this feature (HIP with CUDA) has never been requested by the community at rocm-arch. I think Nvidia users just stick with CUDA and don't need HIP.

I mean with ROCm I'm not sure if a GPU I'm going to buy will support it.

Maybe it would be a good idea to provide testing scripts / documents for them, so they can report back once you push things into testing?
Absolutely! There are HIP examples [3] from AMD which check basic HIP language features. Additionally, we have `rocm-validation-suite` which offers several tests.

Having a list of tested cards in the wiki would be great as well.
I agree! Once we have an established test suite, this should be straightforward.

Best!
Torsten

[1] http://repo.radeon.com/rocm/apt/5.3/pool/main/h/hip-doc/
[2] https://github.com/RadeonOpenCompute/ROCm/issues/1705#issuecomment-108159928...
[3] https://github.com/ROCm-Developer-Tools/HIP-Examples
Am 06.11.22 um 23:10 schrieb aur-general-request@lists.archlinux.org:
Send Aur-general mailing list submissions to aur-general@lists.archlinux.org
To subscribe or unsubscribe via email, send a message with subject or body 'help' to aur-general-request@lists.archlinux.org
You can reach the person managing the list at aur-general-owner@lists.archlinux.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Aur-general digest..."
Today's Topics:
1. Re: TU Application - tpkessler (Justin Kromlinger) 2. Re: TU Application - tpkessler (Torsten Keßler) 3. Re: TU Application - tpkessler (Filipe Laíns)
----------------------------------------------------------------------
Message: 1 Date: Sun, 6 Nov 2022 20:01:14 +0100 From: Justin Kromlinger <hashworks@archlinux.org> Subject: Re: TU Application - tpkessler To: aur-general@lists.archlinux.org Message-ID: <20221106200114.438404af@maker.hashworks.net> Content-Type: multipart/signed; boundary="Sig_//b6Alp4sEqgQD9YkXUpB1=1"; protocol="application/pgp-signature"; micalg=pgp-sha256
Hi Torsten!
On Wed, 26 Oct 2022 06:30:33 +0000 Torsten Keßler <t.kessler@posteo.de> wrote:
Hi! I'm Torsten Keßler (tpkessler in AUR and on GitHub) from Saarland, a federal state in the south west of Germany. With this email I'm applying to be become a trusted user. After graduating with a PhD in applied mathematics this year I'm now a post-doc with a focus on numerical analysis, the art of solving physical problems with mathematically sound algorithms on a computer. I've been using Arch Linux on my private machines (and at work) since my first weeks at university ten years ago. After initial distro hopping a friend recommended Arch. I immediately liked the way it handles packages via pacman, its wiki and the flexibility of its installation process. Soon we can switch the Arch Linux IRC main language to German!
Owing to their massively parallel architecture, GPUs have emerged as the leading platform for computationally expensive problems: Machine Learning/AI, real-world engineering problems, simulation of complex physical systems. For a long time, nVidia's CUDA framework (closed source, exclusively for their GPUs) has dominated this field. In 2015, AMD announced ROCm, their open source compute framework for GPUs. A common interface to CUDA, called HIP, makes it possible to write code that compiles and runs both on AMD and nVidia hardware. I've been closely following the development of ROCm on GitHub, trying to compile the stack from time to time. But only since 2020, the kernel includes all the necessary code to compile the ROCm stack on Arch Linux. Around this time I've started to contribute to rocm-arch on GitHub, a collection of PKGBUILDs for ROCm (with around 50 packages). Soon after that, I became the main contributor to the repository and, since 2021, I've been the maintainer of the whole ROCm stack. We have an active issue tracker and recently started a discussion page for rocm-arch. Most of the open issues as of now are for bookkeeping of patches we applied to run ROCm on Arch Linux. Many of them are linked to an upstream issue and a corresponding pull request that fixes the issues. This way I've already contributed code to a couple of libraries of the ROCm stack.
Over the years, many libraries have added official support for ROCm, including tensorflow, pytorch, python-cupy, python-numba (not actively maintained anymore) and blender. Support of ROCm for the latter generated large interest in the community and is one reason Sven contacted me, asking me if I would be interested to take care of ROCm in [community]. In its current version, ROCm support for blender works out of the box. Just install hip-runtime-amd from the AUR and enable the HIP backend in blender's settings for rendering. The machine learning libraries require more dependencies from the AUR. Once installed, pytorch and tensorflow are known to work on Vega GPUs and the recent RDNA architecture.
My first action as a TU would be to add basic support of ROCm to [community], i.e. the low level libraries, including HIP and an open source runtime for OpenCL based on ROCm. That would be enough to run blender with its ROCm backend. At the same time, I would expand the wiki article on ROCm. The interaction with the community would also move from the issue tracker of rocm-arch to the Arch Linux bug tracker and the forums. In a second phase I would add the high level libraries that would enable users to quickly compile and run complex libraries such as tensorflow, pytorch or cupy. The limited support of ROCm has been one of the main things locking me into Nvidia for my workstations. Having stuff in community would certainly help with that!
#BEGIN Technical details
The minimal package list for HIP which includes the runtime libraries for basic GPU programming and the GPU compiler (hipcc) comprises eight packages
* rocm-cmake (basic cmake files for ROCm) * rocm-llvm (upstream llvm with to-be-merged changes by AMD) * rocm-device-libs (implements math functions for all GPU architectures) * comgr (runtime library, "compiler support" for rocm-llvm) * hsakmt-roct (interface to the amdgpu kernel driver) * hsa-rocr (runtime for HSA compute kernels) * rocminfo (display information on HSA agents: GPU and possibly CPU) * hip-runtime-amd (runtime and compiler for HIP, a C++ dialect inspired by CUDA C++) PKGBUILDs look good to me. Some ROC repositories include documentation (cmake, device libs, hip), maybe it would make sense to include those in `/usr/share/doc/${pkgname}`?
All but rocm-llvm are small libraries under the permissive MIT license. Since ROCm 5.2, all packages successfully build in a clean chroot and are distributed in the community repo arch4edu.
The application libraries for numerical linear algebra, sparse matrices or random numbers start with roc and hip (rocblas, rocsparse, rocrand). The hip* packages are designed in such a way that they would also work with CUDA if hip is configured with CUDA instead of a ROCm/HSA backend. With few exceptions (rocthrust, rccl) these packages are licensed under MIT.
Possible issues: There are three packages that are not fully working under Arch Linux or lack an open source license. The first is rocm-gdb, a fork of gdb with GPU support. To work properly it needs a kernel module currently not available in upstream linux but only as part of AMD's dkms modules. But they only work with specific kernel versions. Support for this from my side on Arch Linux was dropped a while ago. One closed source package is hsa-amd-aqlprofile. As the name suggests it is used for profiling as part of rocprofiler. Above mentioned packages are only required for debugging and profiling but are no runtime dependencies of the big machine learning libraries or any other package with ROCm support I'm aware of. The third package is rocm-core, a package only part of the meta packages for ROCm with no influence on the ROCm runtime. It provides a single header and a library with a single function that returns the current ROCm version. No source code has been published by AMD so far and the official package lacks a license file. Have you tried contacting AMD about `rocm-core`? It seems odd to keep such a small thing closed source / without a license.
A second issue is GPU support. AMD officially only supports the professional compute GPUs. This does not mean that ROCm is not working on consumer cards but merely that AMD cannot guarantee all functionalities through excessive testing. Recently, ROCm added support for Navi 21 (RX 6800 onwards), see
https://docs.amd.com/bundle/Hardware_and_Software_Reference_Guide/page/Hardw...
I own a Vega 56 (gfx900) that is officially supported, so I can test all packages before publishing them on the AUR / in [community]. Finding information about ROCm support in consumer cards really isn't easy – but I guess with CUDA I just expect it to work with recent Nvidia cards?
I would guess that we have a bunch of TUs with Radeon RX 5000/6000 (and soon 7000) series cards, but without the needed knowledge / use case for ROCm. Maybe it would be a good idea to provide testing scripts / documents for them, so they can report back once you push things into testing?
Having a list of tested cards in the wiki would be great as well.
#END Technical details
On the long term, I would like to foster Arch Linux as the leading platform for scientific computing. This includes Machine Learning libraries in the official repositories as well as packages for classical "number crunching" such as petsc, trilinos and packages that depend on them: deal-ii, dune or ngsolve.
The sponsors of my application are Sven (svenstaro) and Bruno (archange).
I'm looking forward to the upcoming the discussion and your feedback on my application.
Best, Torsten Best Regards Justin
On Mon, 7 Nov 2022 19:05:16 +0000 Torsten Keßler <t.kessler@posteo.de> wrote:
Hi Justin!
Some ROC repositories include documentation (cmake, device libs, hip), maybe it would make sense to include those in `/usr/share/doc/${pkgname}`? That's a very good idea. For some packages, AMD bundles them with the package (rocm-dbgapi) and sometimes it's shipped separately, see hip-doc [1].
The limited support of ROCm has been one of the main things locking me into Nvidia for my workstations. Yes, that's really the main drawback of ROCm. CUDA works on almost any Nvidia GPU (even on mobile variants). I hope AMD will change their policy with Navi 30+.
Have you tried contacting AMD about `rocm-core`? Others already did. AMD supported promised to release the source code in March [2].
Finding information about ROCm support in consumer cards really isn't easy – but I guess with CUDA I just expect it to work with recent Nvidia cards? Do you mean the common HIP abstraction layer (like hipfft, hipblas,...)? Yes, that should work with any recent CUDA version. But I haven't tried this as I don't have access to an Nvidia GPU. Furthermore, this feature (HIP with CUDA) has never been requested by the community at rocm-arch. I think Nvidia users just stick with CUDA and don't need HIP.
I mean with ROCm I'm not sure if a GPU I'm going to buy will support it.
Maybe it would be a good idea to provide testing scripts / documents for them, so they can report back once you push things into testing? Absolutely! There's HIP examples [3] from AMD which checks basic HIP language features. Additionally, we have `rocm-validation-suite` which offers several tests.
Having a list of tested cards in the wiki would be great as well. I agree! Once we have an established test suite, this should be straightforward.
Best! Torsten
[1] http://repo.radeon.com/rocm/apt/5.3/pool/main/h/hip-doc/ [2] https://github.com/RadeonOpenCompute/ROCm/issues/1705#issuecomment-108159928... [3] https://github.com/ROCm-Developer-Tools/HIP-Examples
Am 06.11.22 um 23:10 schrieb aur-general-request@lists.archlinux.org:
Send Aur-general mailing list submissions to aur-general@lists.archlinux.org
To subscribe or unsubscribe via email, send a message with subject or body 'help' to aur-general-request@lists.archlinux.org
You can reach the person managing the list at aur-general-owner@lists.archlinux.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Aur-general digest..."
Today's Topics:
1. Re: TU Application - tpkessler (Justin Kromlinger) 2. Re: TU Application - tpkessler (Torsten Keßler) 3. Re: TU Application - tpkessler (Filipe Laíns)
----------------------------------------------------------------------
Message: 1 Date: Sun, 6 Nov 2022 20:01:14 +0100 From: Justin Kromlinger <hashworks@archlinux.org> Subject: Re: TU Application - tpkessler To: aur-general@lists.archlinux.org Message-ID: <20221106200114.438404af@maker.hashworks.net> Content-Type: multipart/signed; boundary="Sig_//b6Alp4sEqgQD9YkXUpB1=1"; protocol="application/pgp-signature"; micalg=pgp-sha256
Hi Torsten!
On Wed, 26 Oct 2022 06:30:33 +0000 Torsten Keßler <t.kessler@posteo.de> wrote:
Hi! I'm Torsten Keßler (tpkessler in AUR and on GitHub) from Saarland, a federal state in the south west of Germany. With this email I'm applying to be become a trusted user. After graduating with a PhD in applied mathematics this year I'm now a post-doc with a focus on numerical analysis, the art of solving physical problems with mathematically sound algorithms on a computer. I've been using Arch Linux on my private machines (and at work) since my first weeks at university ten years ago. After initial distro hopping a friend recommended Arch. I immediately liked the way it handles packages via pacman, its wiki and the flexibility of its installation process. Soon we can switch the Arch Linux IRC main language to German!
Owing to their massively parallel architecture, GPUs have emerged as the leading platform for computationally expensive problems: Machine Learning/AI, real-world engineering problems, simulation of complex physical systems. For a long time, nVidia's CUDA framework (closed source, exclusively for their GPUs) has dominated this field. In 2015, AMD announced ROCm, their open source compute framework for GPUs. A common interface to CUDA, called HIP, makes it possible to write code that compiles and runs both on AMD and nVidia hardware. I've been closely following the development of ROCm on GitHub, trying to compile the stack from time to time. But only since 2020 has the kernel included all the necessary code to compile the ROCm stack on Arch Linux. Around that time I started to contribute to rocm-arch on GitHub, a collection of PKGBUILDs for ROCm (with around 50 packages). Soon after that, I became the main contributor to the repository and, since 2021, I've been the maintainer of the whole ROCm stack. We have an active issue tracker and recently started a discussion page for rocm-arch. Most of the open issues as of now are for bookkeeping of patches we applied to run ROCm on Arch Linux. Many of them are linked to an upstream issue and a corresponding pull request that fixes the issue. This way I've already contributed code to a couple of libraries of the ROCm stack.
Over the years, many libraries have added official support for ROCm, including tensorflow, pytorch, python-cupy, python-numba (not actively maintained anymore) and blender. Support of ROCm for the latter generated considerable interest in the community and is one reason Sven contacted me, asking whether I would be interested in taking care of ROCm in [community]. In its current version, ROCm support for blender works out of the box. Just install hip-runtime-amd from the AUR and enable the HIP backend in blender's settings for rendering. The machine learning libraries require more dependencies from the AUR. Once installed, pytorch and tensorflow are known to work on Vega GPUs and the recent RDNA architecture.
My first action as a TU would be to add basic support of ROCm to [community], i.e. the low level libraries, including HIP and an open source runtime for OpenCL based on ROCm. That would be enough to run blender with its ROCm backend. At the same time, I would expand the wiki article on ROCm. The interaction with the community would also move from the issue tracker of rocm-arch to the Arch Linux bug tracker and the forums. In a second phase I would add the high level libraries that would enable users to quickly compile and run complex libraries such as tensorflow, pytorch or cupy.
The limited support of ROCm has been one of the main things locking me into Nvidia for my workstations. Having stuff in community would certainly help with that!
#BEGIN Technical details
The minimal package list for HIP, which includes the runtime libraries for basic GPU programming and the GPU compiler (hipcc), comprises eight packages:
* rocm-cmake (basic cmake files for ROCm)
* rocm-llvm (upstream llvm with to-be-merged changes by AMD)
* rocm-device-libs (implements math functions for all GPU architectures)
* comgr (runtime library, "compiler support" for rocm-llvm)
* hsakmt-roct (interface to the amdgpu kernel driver)
* hsa-rocr (runtime for HSA compute kernels)
* rocminfo (display information on HSA agents: GPU and possibly CPU)
* hip-runtime-amd (runtime and compiler for HIP, a C++ dialect inspired by CUDA C++)
PKGBUILDs look good to me. Some ROC repositories include documentation (cmake, device libs, hip), maybe it would make sense to include those in `/usr/share/doc/${pkgname}`?
All but rocm-llvm are small libraries under the permissive MIT license. Since ROCm 5.2, all packages successfully build in a clean chroot and are distributed in the community repo arch4edu.
The application libraries for numerical linear algebra, sparse matrices or random numbers start with roc and hip (rocblas, rocsparse, rocrand). The hip* packages are designed in such a way that they would also work with CUDA if hip is configured with CUDA instead of a ROCm/HSA backend. With few exceptions (rocthrust, rccl) these packages are licensed under MIT.
Possible issues: There are three packages that are not fully working under Arch Linux or lack an open source license. The first is rocm-gdb, a fork of gdb with GPU support. To work properly it needs a kernel module currently not available in upstream linux but only as part of AMD's dkms modules, and those only work with specific kernel versions. I dropped support for it on Arch Linux a while ago. One closed source package is hsa-amd-aqlprofile. As the name suggests, it is used for profiling as part of rocprofiler. The above-mentioned packages are only required for debugging and profiling and are not runtime dependencies of the big machine learning libraries or any other package with ROCm support I'm aware of. The third package is rocm-core, a package that is only part of the meta packages for ROCm and has no influence on the ROCm runtime. It provides a single header and a library with a single function that returns the current ROCm version. No source code has been published by AMD so far and the official package lacks a license file.
Have you tried contacting AMD about `rocm-core`? It seems odd to keep such a small thing closed source / without a license.
A second issue is GPU support. AMD officially only supports the professional compute GPUs. This does not mean that ROCm does not work on consumer cards but merely that AMD cannot guarantee all functionalities through extensive testing. Recently, ROCm added support for Navi 21 (RX 6800 onwards), see
https://docs.amd.com/bundle/Hardware_and_Software_Reference_Guide/page/Hardw...
I own a Vega 56 (gfx900) that is officially supported, so I can test all packages before publishing them on the AUR / in [community].
Finding information about ROCm support in consumer cards really isn't easy, but I guess with CUDA I just expect it to work with recent Nvidia cards?
I would guess that we have a bunch of TUs with Radeon RX 5000/6000 (and soon 7000) series cards, but without the needed knowledge / use case for ROCm. Maybe it would be a good idea to provide testing scripts / documents for them, so they can report back once you push things into testing?
Having a list of tested cards in the wiki would be great as well.
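A testing script of the kind suggested above could start by reporting which GPU architecture ROCm detects. The following is a minimal sketch in Python, not an existing tool. It assumes that `rocminfo` is installed and that its output names GPU agents with identifiers beginning with `gfx`, such as the gfx900 mentioned above. The function names and the sample excerpt are hypothetical and hand-written for illustration.

```python
import re
import shutil
import subprocess

# Assumption: GPU agents appear in rocminfo output as "gfx" followed by
# hexadecimal digits (for example gfx900).
GFX_PATTERN = re.compile(r"\bgfx[0-9a-f]+\b")


def gfx_targets(rocminfo_output: str) -> list[str]:
    """Return the distinct gfx identifiers in the text, in order of appearance."""
    seen: list[str] = []
    for name in GFX_PATTERN.findall(rocminfo_output):
        if name not in seen:
            seen.append(name)
    return seen


def detect_gfx_targets() -> list[str]:
    """Run rocminfo if it is on PATH; return an empty list otherwise."""
    if shutil.which("rocminfo") is None:
        return []
    result = subprocess.run(
        ["rocminfo"], capture_output=True, text=True, check=False
    )
    return gfx_targets(result.stdout)


# Hand-written, simplified excerpt used only to exercise the parser.
SAMPLE_OUTPUT = """
Agent 2
  Name:                    gfx900
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx900:xnack-
"""

if __name__ == "__main__":
    targets = detect_gfx_targets()
    if targets:
        print("Detected GPU targets:", ", ".join(targets))
    else:
        print("No GPU target detected; parser check on sample:",
              gfx_targets(SAMPLE_OUTPUT))
```

A tester without a ROCm use case could run such a script after installing the packages from testing and paste the printed target into a bug report or the wiki list of tested cards.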
#END Technical details
In the long term, I would like to foster Arch Linux as the leading platform for scientific computing. This includes Machine Learning libraries in the official repositories as well as packages for classical "number crunching" such as petsc, trilinos and packages that depend on them: deal-ii, dune or ngsolve.
The sponsors of my application are Sven (svenstaro) and Bruno (archange).
I'm looking forward to the upcoming discussion and your feedback on my application.
Best, Torsten
Best Regards
Justin
-- hashworks Web https://hashworks.net Public Key 0x4FE7F4FEAC8EBE67
Hi Levente!
how you keep track of upstream locations and which one has new releases available?
New releases are announced on the ROCm GitHub page [1]. All components of the ROCm stack are hosted on GitHub, distributed over three projects: the core components [2], the "software platform" [3] and the developer tools [4]. Changes in the package structure are usually described in the release notes. For major releases (such as a potential ROCm 6) it also helps to browse AMD's Ubuntu repo [5].
It looks like the whole stack has version 5.3.1 released.
A patch release only affects a small part of the ROCm stack. So it's common that some developers tag a new release before the official release if there are no changes in their package.
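The release tracking described above can be sketched as a comparison of packaged versions against upstream tags. This is a hypothetical illustration in Python, not the workflow actually used for rocm-arch. The tag format `rocm-X.Y.Z`, the function names and the example version numbers are assumptions made for the sketch.

```python
def parse_version(tag: str) -> tuple[int, ...]:
    """Turn '5.3.1' or 'rocm-5.3.1' (assumed tag format) into (5, 3, 1)."""
    return tuple(int(part) for part in tag.removeprefix("rocm-").split("."))


def outdated(packaged: dict[str, str], upstream: dict[str, str]) -> list[str]:
    """Return the names of packages whose upstream tag is newer than the packaged version."""
    return sorted(
        name
        for name, tag in upstream.items()
        if name in packaged
        and parse_version(packaged[name]) < parse_version(tag)
    )


if __name__ == "__main__":
    # Hypothetical state: only some components were tagged for the patch release.
    packaged = {"comgr": "5.3.0", "rocminfo": "5.3.0", "hsa-rocr": "5.3.0"}
    upstream = {
        "comgr": "rocm-5.3.1",
        "rocminfo": "rocm-5.3.0",
        "hsa-rocr": "rocm-5.3.1",
    }
    print(outdated(packaged, upstream))
```

Comparing integer tuples rather than strings keeps the ordering correct once a component reaches a two-digit minor version.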
Now, a "little bit" of feedback after reviewing your current AUR packages.
Thank you very much for your review. I will implement as much as possible in the upcoming 5.3.1 release. As a first test I've updated comgr. Unit tests are now called in check() and CMAKE_BUILD_TYPE is set to None.
The packages mathtime-professional, pcg-c-git and python-meshio are not part of the ROCm stack. Regarding mathtime-professional: It's a popular (proprietary) math font for scientific texts. Several large publishers (Springer-Verlag, Cambridge University Press) use it in their books. The package integrates it in LaTeX, the most popular markup language for scientific texts in STEM.
Best!
Torsten
[1] https://github.com/RadeonOpenCompute/ROCm
[2] https://github.com/RadeonOpenCompute
[3] https://github.com/ROCmSoftwarePlatform
[4] https://github.com/ROCm-Developer-Tools
[5] http://repo.radeon.com/rocm/apt/
On 26.10.22 08:30, Torsten Keßler wrote:
[…]
Everyone's had ample time to discuss Torsten's application. It is time to cast your votes: https://aur.archlinux.org/tu/141
Voting will conclude in one week on 2022-11-20.
Sven
Hi everyone,
On 13/11/2022 at 09:41, Sven-Hendrik Haase wrote:
On 26.10.22 08:30, Torsten Keßler wrote:
[…]
Everyone's had ample time to discuss Torsten's application. It is time to cast your votes: https://aur.archlinux.org/tu/141
Voting will conclude in one week on 2022-11-20.
The voting period has ended.
Yes 46
No 1
Abstain 7
Total 54
Participation 88.52%
Result: Accepted
Congratulations, you are now officially accepted as TU. Please proceed with https://wiki.archlinux.org/title/AUR_Trusted_User_guidelines#TODO_list_for_n...
Regards,
Bruno/Archange
participants (7)
- Archange
- Filipe Laíns
- Justin Kromlinger
- Justin Kromlinger
- Levente Polyak
- Sven-Hendrik Haase
- Torsten Keßler