[arch-dev-public] Create guidelines regarding SIMD instructions/x86 extensions
Hello,

Currently there are no guidelines stating which x86 extensions (e.g. SSE2, SSE3, SSE4, AVX) we support. This is a bit problematic since it lets compilers do what they want and possibly generate code that can't run on some systems.

Even though this is an issue, it's not complete anarchy, at least not yet! Just kidding :p. The vast majority of our native packages are compiled with GCC and we default to `-mtune=generic`, which is good but not optimal. `-mtune=generic` tells GCC to compile for a generic processor, so it's up to GCC to decide which architecture extensions make up a generic processor. I haven't been able to find any documentation on which x86 extensions are enabled for a "generic" processor, but I was able to track them down to MMX, SSE (or KNI) and SSE2. Being undocumented, they could change at any time, so I don't think we should rely on `-mtune=generic`.

What I propose is to define a set of x86 extensions to support and make all compilers default to that if possible. I am fine with MMX, SSE and SSE2 but they should be *our* choice, not GCC's. This raises the question: would such an approach be doable? So I ask the compiler maintainers to please check whether this would be possible.

I would also like to explore the idea of adding a "high performance" architecture which would be able to make use of SSE{,2,3,4,4.1,4.2} and AVX, which seem to be the standard for newer processors (>=2013). This would only be available for packages that do high performance computing (e.g. openblas, sdrangel). Any thoughts on this?

Thanks,
Filipe Laíns
3DCE 51D6 0930 EBA4 7858 BA41 46F6 33CB B0EB 4BF2

On 25/5/19 10:17 am, Filipe Laíns via arch-dev-public wrote:
> The vast majority of our native packages are compiled with GCC and we do default to `-mtune=generic` which is good but not optimal. `-mtune=generic` tells GCC to compile for a generic processor so it's up to GCC to decide which architecture extensions would compose a generic processor. I haven't been able to find any documentation on what x86 extensions are enabled for a "generic" processor but I was able to track them down to MMX, SSE (or KNI) and SSE2. Being undocumented they could change at any time so I don't think we should rely on `-mtune=generic`.

I think you need to look at the difference between -march and -mtune. We use "-march=x86-64", which defines the instruction sets that can be used. Adding "-mtune=generic" does not allow the inclusion of additional instruction sets.

Look at the output of:

    gcc -march=x86-64 -Q --help=target

Allan

On Sat, 2019-05-25 at 10:35 +1000, Allan McRae wrote:
> I think you need to look at the difference between -march and -mtune. We use "-march=x86-64", which defines the instruction sets that can be used. Adding "-mtune=generic" does not allow the inclusion of additional instruction sets.
>
> Look at the output of: gcc -march=x86-64 -Q --help=target

Yes! My bad, I got confused. From the documentation for `-march=x86-64`:

    A generic CPU with 64-bit extensions.

Setting `-mtune` to generic won't add any additional instruction sets by itself, but it does not prevent instruction sets from being added. It looks like GCC enables MMX, SSE and SSE2 by default. That isn't related at all to `-march` like I stated in the email, but it still presents the same issue.

What do you think?

Filipe Laíns
3DCE 51D6 0930 EBA4 7858 BA41 46F6 33CB B0EB 4BF2

On Sat, 25 May 2019 at 04:27, Filipe Laíns via arch-dev-public <arch-dev-public@archlinux.org> wrote:
> Setting `-mtune` to generic won't add any additional instruction sets by itself, but it does not prevent instruction sets from being added. Looks like GCC enables MMX, SSE and SSE2 by default, it isn't related at all to `-march` like I stated in the email but it still presents the same issue.

As far as I know, MMX, SSE and SSE2 are a mandatory part of the AMD64 instruction set, so they are not enabled randomly just because someone felt like it, but because they are present on every x86_64 CPU.

On 25/5/19 5:22 pm, Lukas Jirkovsky via arch-dev-public wrote:
> As far as I know, MMX, SSE and SSE2 are a mandatory part of the AMD64 instruction set, so they are not enabled randomly just because someone felt like it, but because they are present on every x86_64 CPU.

Correct. Using the command I gave in my first reply:

    $ gcc -march=x86-64 -Q --help=target | grep sse
      -mfpmath=                   sse
      -mno-sse4                   [enabled]
      -msse                       [enabled]
      -msse2                      [enabled]
      -msse2avx                   [disabled]
      -msse3                      [disabled]
      -msse4                      [disabled]
      ...
    $ gcc -march=x86-64 -Q --help=target | grep mmx
      -mmmx                       [enabled]

-mtune just tunes instructions for a "representative" set of "current" CPUs that run as x86-64.

Allan
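
The same baseline can also be observed from inside a C translation unit, because GCC and Clang predefine macros such as `__SSE2__` for each extension that the chosen flags allow. The following is only a minimal sketch and was not posted in the thread: the names `baseline_has` and `print_baseline` are made up for illustration, and the list of extensions is an arbitrary selection.

```c
#include <stdio.h>
#include <string.h>

/* GCC and Clang predefine macros such as __SSE2__ for every extension the
 * chosen -march/-m<ext> flags allow the compiler to emit. Turn each one
 * into a 0/1 constant so it can be stored in a table. */
#ifdef __MMX__
#define BASE_MMX 1
#else
#define BASE_MMX 0
#endif
#ifdef __SSE__
#define BASE_SSE 1
#else
#define BASE_SSE 0
#endif
#ifdef __SSE2__
#define BASE_SSE2 1
#else
#define BASE_SSE2 0
#endif
#ifdef __SSE3__
#define BASE_SSE3 1
#else
#define BASE_SSE3 0
#endif
#ifdef __SSE4_1__
#define BASE_SSE4_1 1
#else
#define BASE_SSE4_1 0
#endif
#ifdef __AVX__
#define BASE_AVX 1
#else
#define BASE_AVX 0
#endif

/* Hypothetical table type, used only in this sketch. */
struct extension {
    const char *name;
    int enabled; /* 1 if the compiler may emit these instructions */
};

static const struct extension baseline[] = {
    { "mmx",    BASE_MMX },
    { "sse",    BASE_SSE },
    { "sse2",   BASE_SSE2 },
    { "sse3",   BASE_SSE3 },
    { "sse4.1", BASE_SSE4_1 },
    { "avx",    BASE_AVX },
};

/* Return 1 if the named extension is part of the compile-time baseline,
 * 0 if it is not or if the name is not in the table. */
int baseline_has(const char *name)
{
    for (size_t i = 0; i < sizeof baseline / sizeof baseline[0]; i++)
        if (strcmp(baseline[i].name, name) == 0)
            return baseline[i].enabled;
    return 0;
}

/* Print one line per extension with its compile-time state. */
void print_baseline(void)
{
    for (size_t i = 0; i < sizeof baseline / sizeof baseline[0]; i++)
        printf("%-8s [%s]\n", baseline[i].name,
               baseline[i].enabled ? "enabled" : "disabled");
}
```

Calling `print_baseline()` from a `main` built once with `-march=x86-64` and once with `-march=haswell` should make the difference between the two baselines visible, while adding or removing `-mtune=generic` should not change the table.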

Hi,

On 25/05/2019 at 02:17, Filipe Laíns via arch-dev-public wrote:
> I would also like to explore the idea of adding a "high performance" architecture which would be able to make use of SSE{,2,3,4,4.1,4.2} and AVX, which seem to be the standard for newer processors (>=2013). This would only be available for packages that do high performance computing (ex. openblas, sdrangel, etc.). Any thoughts on this?

As said on IRC, there have been discussions before on having multiple targets and corresponding repos, but the starting point is that we need automated builds before going in such a direction, and this in turn has several requirements. I’ve sent you a link to the pad where we put our ideas together regarding this.

In the meantime, we previously had the question of whether we should package e.g. $pkgname-{sse4,avx} in a case where it mattered a lot, but it turned out the software in question (embree) is able to do runtime detection of the available ISA. Maybe some other packages do this too; otherwise we could discuss whether allowing such flavours as a temporary measure would be acceptable for selected packages.

Regards,
Bruno

On 25/5/19 9:19 pm, Bruno Pagani via arch-dev-public wrote:
> As said on IRC, there have been discussions before on having multiple targets and corresponding repos, but the starting point is that we need automated build before going into such a direction, and this in turn has several requirements. I’ve linked to you the pad where we put our ideas together regarding this.
>
> In the meantime, we had the case before of whether we should package e.g. $pkgname-{sse4,avx} in a case where it mattered a lot, but it turned out the software in question (embree) is able to do runtime detection of available ISA. Maybe some other packages are doing this too, else we could discuss whether allowing such flavours as a temporary measure would be acceptable for selected packages.

glibc detects available instruction sets and uses the best for many functions.

I'd be very, very, very much against providing multiple variants of a package in our repos. Using asp and makepkg is not a hard solution for those who really need a few packages rebuilt.

PS - I rebuilt [core] with -march=haswell recently as a test. Automated building is not an issue. Unattended package/database signing is the major stumbling block.

Allan

On 25/05/2019 at 13:27, Allan McRae via arch-dev-public wrote:
> glibc detects available instruction sets and uses the best for many functions.

Great!

> I'd be very, very, very much against providing multiple variants of a package in our repos. Using asp and makepkg is not a hard solution for those who really need a few packages rebuilt.

I’m fine with that possibility too.

> PS - I rebuilt [core] with -march=haswell recently as a test. Automated building is not an issue. Unattended package/database signing is the major stumbling block.

Yes, in our discussions it boiled down to “Automated rebuilds” → “Unattended signing” → “Reproducible builds”.

Out of curiosity, what did your rebuild of [core] lead to?

Bruno

On 25/5/19 9:34 pm, Bruno Pagani wrote:
> Out of curiosity, what did your rebuild of [core] lead to?

I had a potentially slightly faster system for a week... It was mainly a test to see if I spotted any build issues or test suite failures beyond what is seen for x86_64. All was good.

A

On Sat, 2019-05-25 at 21:27 +1000, Allan McRae via arch-dev-public wrote:
> glibc detects available instruction sets and uses the best for many functions.
>
> I'd be very, very, very much against providing multiple variants of a package in our repos.

In cases where the instruction set is detected at runtime, a new variant of the package would not be needed, since we can guarantee the software isn't going to try to run any unsupported instructions. What we are discussing really only applies to packages without runtime SIMD code selection.

Thanks,
Filipe Laíns
3DCE 51D6 0930 EBA4 7858 BA41 46F6 33CB B0EB 4BF2
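
For readers unfamiliar with runtime SIMD code selection, the sketch below shows one common way it can be done in C. It is an illustration under stated assumptions, not the mechanism used by embree or glibc: it relies on the GCC/Clang extensions `__builtin_cpu_supports` and `__attribute__((target(...)))`, and the names `sum_scalar`, `sum_avx` and `pick_sum` are hypothetical.

```c
#include <stddef.h>

/* Portable fallback: only uses what the package-wide baseline allows. */
static float sum_scalar(const float *v, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

#if defined(__GNUC__) && defined(__x86_64__)
/* Same loop, but the target attribute lets the compiler use AVX inside
 * this one function only. It must never run on a CPU without AVX. */
__attribute__((target("avx")))
static float sum_avx(const float *v, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
#endif

typedef float (*sum_fn)(const float *, size_t);

/* Choose an implementation based on what the running CPU reports. */
sum_fn pick_sum(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __builtin_cpu_init();               /* harmless if already initialised */
    if (__builtin_cpu_supports("avx"))
        return sum_avx;
#endif
    return sum_scalar;
}
```

A caller would store the result of `pick_sum()` once and call through the pointer afterwards. A binary built this way keeps the plain x86-64 baseline everywhere except in the guarded function, which is why such packages would not need a separate variant.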

On Sat, 2019-05-25 at 13:19 +0200, Bruno Pagani via arch-dev-public wrote:
> As said on IRC, there have been discussions before on having multiple targets and corresponding repos, but the starting point is that we need automated build before going into such a direction, and this in turn has several requirements.
>
> In the meantime, we had the case before of whether we should package e.g. $pkgname-{sse4,avx} in a case where it mattered a lot, but it turned out the software in question (embree) is able to do runtime detection of available ISA. Maybe some other packages are doing this too, else we could discuss whether allowing such flavours as a temporary measure would be acceptable for selected packages.

This is fine by me. My biggest concern was the fact that C doesn't support __attribute__(("instruction set here")), but there are of course workarounds.

Creating a new architecture only makes sense if there are multiple packages needing this, but it seems there are not. I am fine with a suffix, although I was thinking of something more like -simd, as SSE4, AVX, etc. are usually available at the same time. In these cases I think we should add a post_install step that gives a warning if the user's CPU doesn't support the instruction sets used.

Thanks,
Filipe Laíns
3DCE 51D6 0930 EBA4 7858 BA41 46F6 33CB B0EB 4BF2
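
The check behind such a warning could look roughly like the sketch below. This is only an assumption about how it might be done, not an agreed design: it reads the `flags` line of `/proc/cpuinfo` on Linux, where the kernel spells the flags `sse4_1`, `sse4_2`, `avx` and so on, and the helper names `has_flag`, `read_cpu_flags` and `warn_missing` are invented for this example. A real post_install step would more likely be a few lines of shell in the package's install script.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `want` appears as a whole whitespace-separated word in
 * `flags`, so that "avx" does not match inside "avx2". */
int has_flag(const char *flags, const char *want)
{
    size_t len = strlen(want);
    const char *p = flags;

    if (len == 0)
        return 0;
    while ((p = strstr(p, want)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
        if (starts && ends)
            return 1;
        p++;
    }
    return 0;
}

/* Copy the first "flags" line of /proc/cpuinfo into buf. Returns 0 on
 * success and -1 if the file or the line is unavailable. The line is
 * long, so the caller should pass a generous buffer (e.g. 8192 bytes). */
int read_cpu_flags(char *buf, size_t size)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    int found = -1;

    if (f == NULL)
        return -1;
    while (fgets(buf, (int)size, f) != NULL) {
        if (strncmp(buf, "flags", 5) == 0) {
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Print a warning for every required flag that is missing from `flags`
 * and return how many were missing. */
int warn_missing(const char *flags, const char *const *required, size_t count)
{
    int missing = 0;

    for (size_t i = 0; i < count; i++) {
        if (!has_flag(flags, required[i])) {
            fprintf(stderr, "warning: this CPU does not report %s\n",
                    required[i]);
            missing++;
        }
    }
    return missing;
}
```

The whole-word match matters here because several flag names are prefixes of others; a plain substring search for `avx` would wrongly succeed on a CPU that only listed a longer name containing it.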

participants (4)
- Allan McRae
- Bruno Pagani
- Filipe Laíns
- Lukas Jirkovsky