[arch-dev-public] Discussion - Increasing our CPU requirements
Remember when Arch Linux was optimized out of the box? We had the blazingly fast i686 port while other distros hung out in i386 land. Those were the days when the idea of Arch being fast started. Now it has degraded to the stuff of legend.

Now, x86_64 is old. We should continue to push forward and add further optimization.

Reasonable optimizations to consider:
AVX2
FMA
SSE4.2

AVX2 is Intel Haswell and newer or AMD Ryzen and newer. These CPUs were released 2013 to 2015, so they are 5-7 years old.

Discuss.
If I see a SIGILL on my AMD Phenom II X6 1090T then Arch will have failed me. 😜

I believe your proposal should only be discussed as co-existing optimized port(s), and even then I'm not sure it's worth the trouble. Performance-critical applications can be, and frequently are, optimized for the running processor (I'm thinking of stuff like glibc and ffmpeg here).
On 29/3/20 8:52 pm, Evangelos Foutras wrote:
> If I see a SIGILL on my AMD Phenom II X6 1090T then Arch will have failed me. 😜
>
> I believe your proposal should only be discussed as co-existing optimized port(s), and even then I'm not sure it's worth the trouble. Performance-critical applications can be, and frequently are, optimized for the running processor (I'm thinking of stuff like glibc and ffmpeg here).
AVX2 was a bold choice, and really a place to get a discussion started. We are currently supporting processors from 2003. We can be better than that.

A
On Sun, 29 Mar 2020 21:44:38 +1000, Allan McRae via arch-dev-public <arch-dev-public@archlinux.org> wrote:
> We are currently supporting processors from 2003. We can be better than that.
>
> A
In the very early Linux days, many tasks maxed out the CPU and every CPU optimization was noticeable. This has changed a lot. Many CPUs, even very old ones, are still fast enough for useful tasks. Do not force users with such systems to leave Arch. My main workstation is still a Sandy Bridge 2600K and I guess it will last another 5-10 years.

I much prefer runtime extension detection, which should be implemented upstream. I'm strongly against increasing our main architecture's requirements. I'm not sure adding an additional, more optimized repo is worth the work.

-Andy
On Sun, 29 Mar 2020 at 12:26, Allan McRae via arch-dev-public <arch-dev-public@archlinux.org> wrote:
> Remember when Arch Linux was optimized out of the box? We had the blazingly fast i686 port while other distros hung out in i386 land. Those were the days when the idea of Arch being fast started. Now it has degraded to the stuff of legend.
>
> Now, x86_64 is old. We should continue to push forward and add further optimization.
>
> Reasonable optimizations to consider:
> AVX2
> FMA
> SSE4.2
>
> AVX2 is Intel Haswell and newer or AMD Ryzen and newer. These CPUs were released 2013 to 2015, so they are 5-7 years old.
>
> Discuss.
I'm definitely all for this. However, I'd strongly prefer it if we used some heavy automation for building all the variants. coderobe actually started an experimental project to explore this. It would also increase our mirror size requirements quite drastically, which I think is likely fine as our full mirror size is quite small, but it should be considered.

I suggest going by processor generation alone instead of per feature. For instance, Haswell introduced AVX2 as well as FMA3, so it doesn't really make much sense to separate those out, I think. Besides, if you have AVX2 support and care about speed, you'll also want to enable FMA3.

Suggested processor-generation-based optimization "tiers":
- nehalem (SSE4.2)
- sandybridge (SSE4.2, AVX)
- haswell (SSE4.2, AVX, AVX2, FMA3)
- (soon-ish) icelake (SSE4.2, AVX, AVX2, FMA3, AVX-512)

I know this sounds Intel-specific, so these names might not be optimal.

There is quite some work involved in this, but I also strongly believe that we have to keep pushing forward.
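As a rough illustration of the generation-based tiers suggested above, the Python sketch below picks the most optimized tier whose required flags a CPU reports. Several details are assumptions made for this example: the tier table, the helper name `highest_tier`, and the use of Linux /proc/cpuinfo flag names (`sse4_2`, `avx`, `avx2`, `fma`, `avx512f`). Real Ice Lake AVX-512 support spans several subsets, which the sketch collapses into `avx512f`.

```python
# Hypothetical tier table based on the processor-generation proposal.
# Flag names follow Linux /proc/cpuinfo ("fma" = FMA3, "avx512f" = AVX-512 Foundation).
TIERS: list[tuple[str, set[str]]] = [
    ("nehalem", {"sse4_2"}),
    ("sandybridge", {"sse4_2", "avx"}),
    ("haswell", {"sse4_2", "avx", "avx2", "fma"}),
    ("icelake", {"sse4_2", "avx", "avx2", "fma", "avx512f"}),
]


def highest_tier(cpu_flags: set[str]) -> str:
    """Return the most optimized tier whose required flags are all present.

    Falls back to plain "x86_64" when even the lowest tier is unsupported.
    """
    best = "x86_64"
    for name, required in TIERS:  # ordered from oldest to newest generation
        if required <= cpu_flags:  # subset test: every required flag is present
            best = name
    return best


if __name__ == "__main__":
    # A Sandy Bridge CPU such as the i7-2600K reports AVX but not AVX2 or FMA.
    print(highest_tier({"sse2", "sse4_2", "avx", "aes"}))  # sandybridge
```

Because each tier's flag set is a superset of the previous one, a CPU never matches a newer tier while failing an older one. This is the property that makes grouping by generation simpler than offering per-feature builds.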
On Sun, 2020-03-29 at 20:26 +1000, Allan McRae via arch-dev-public wrote:
> Remember when Arch Linux was optimized out of the box? We had the blazingly fast i686 port while other distros hung out in i386 land. Those were the days when the idea of Arch being fast started. Now it has degraded to the stuff of legend.
>
> Now, x86_64 is old. We should continue to push forward and add further optimization.
>
> Reasonable optimizations to consider:
> AVX2
> FMA
> SSE4.2
>
> AVX2 is Intel Haswell and newer or AMD Ryzen and newer. These CPUs were released 2013 to 2015, so they are 5-7 years old.
>
> Discuss.
Absolutely not! A huge number of systems do not meet those requirements. This would rule out all Intel 3rd-gen and older CPUs, which would, for example, impact laptop models such as the ThinkPad X220 and ThinkPad X230.

Instead of adding a requirement for newer CPU extensions, let's fix this the proper way. The correct approach is to push upstreams to support dynamic detection of CPU extensions. That means the performance-critical code is compiled for a range of different extensions and the binary detects at runtime which one to use. A lot of them do that already, and for the ones that don't, we should push for it (point them to [1]).

Unfortunately, we will always have upstreams that don't support that. For those, we should define a separate architecture. Right now I am building 2 variants of such projects (see srslte-avx2 and liquid-dsp-sse4.1); optimally we would have a different arch for them.

I would also like to note that rebuilding everything with forced support for AVX2 or whatever won't have much effect. Most packages do not have workloads where it would make sense to use these CPU extensions, and as such, GCC would not use them. There are maybe only a handful of packages in the repos that would benefit from this.

[1] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-tar...

Cheers,
Filipe Laíns
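In C, the runtime detection described above is what GCC's `target_clones` function attribute and the `__builtin_cpu_supports()` builtin provide. The Python sketch below only mirrors the dispatch idea and rests on some assumptions. Flags are read from the Linux /proc/cpuinfo "flags" line. `dot_avx2` is a hypothetical stand-in, written in plain Python, for a hand-tuned AVX2/FMA kernel. All function names are illustrative.

```python
from typing import Callable

DotFn = Callable[[list[float], list[float]], float]


def read_cpu_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Return the CPU feature flags reported by the Linux kernel.

    Returns an empty set when the file is missing or has no "flags" line
    (e.g. non-Linux or non-x86 hosts), which selects the generic path.
    """
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


def dot_generic(a: list[float], b: list[float]) -> float:
    # Baseline path: valid on every x86_64 CPU.
    return sum(x * y for x, y in zip(a, b))


def dot_avx2(a: list[float], b: list[float]) -> float:
    # Hypothetical stand-in for a hand-written AVX2/FMA kernel (same result).
    return sum(x * y for x, y in zip(a, b))


# Ordered from most to least specialized; the first entry whose flags are present wins.
IMPLEMENTATIONS: list[tuple[set[str], DotFn]] = [
    ({"avx2", "fma"}, dot_avx2),
    (set(), dot_generic),
]


def select_dot(flags: set[str]) -> DotFn:
    """Pick an implementation once, the way a runtime dispatcher would."""
    for required, impl in IMPLEMENTATIONS:
        if required <= flags:
            return impl
    return dot_generic


# Resolve the dispatch a single time at import, then call `dot` everywhere.
dot = select_dot(read_cpu_flags())
```

The same binary runs on every x86_64 CPU and only takes the optimized path where the flags allow it. This is why upstream runtime detection avoids raising the distribution-wide baseline.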
On 29/3/20 11:17 pm, Filipe Laíns wrote:
> I would also like to note that rebuilding everything with forced support for AVX2 or whatever won't have much effect. Most packages do not have workloads where it would make sense to use these CPU extensions, and as such, GCC would not use them.
That assumes we only add AVX2, whereas requiring a CPU that supports AVX2 would bring other optimizations that would be used.

As I replied earlier, AVX2 may be going too far, but it is a good starting point for discussion. If that is too far, what could we accept? SSE4.2? AVX? Surely we can do better than pure x86_64.

To have a separate architecture would require automated builds, which requires being able to sign packages automatically. And we have not achieved database signing in 9 years... I'm looking for a boost that could be achieved now.

Allan
On Sun, 2020-03-29 at 23:37 +1000, Allan McRae via arch-dev-public wrote:
> On 29/3/20 11:17 pm, Filipe Laíns wrote:
>> I would also like to note that rebuilding everything with forced support for AVX2 or whatever won't have much effect. Most packages do not have workloads where it would make sense to use these CPU extensions, and as such, GCC would not use them.
> That assumes we only add AVX2, whereas requiring a CPU that supports AVX2 would bring other optimizations that would be used.
No, it should be true for all extensions.
> As I replied earlier, AVX2 may be going too far, but it is a good starting point for discussion. If that is too far, what could we accept? SSE4.2? AVX? Surely we can do better than pure x86_64.
No, SSE4.2 is too far. For me, the minimum should be AVX.
> To have a separate architecture would require automated builds, which requires being able to sign packages automatically. And we have not achieved database signing in 9 years... I'm looking for a boost that could be achieved now.
No, it would not. Where is this coming from? I already build split packages with SIMD instructions; I make the PKGBUILD build for 2 architectures instead, with a minimal patch. If pacman is not able to handle parallel architectures, we should fix that. I think it's a valid use case.

Furthermore, if you do indeed wish to move this forward, please present us with reasonable data. Take a few packages that would benefit from this, build them with the proposed architecture and show us benchmarks. I think it's going to be very hard for you to find packages with considerable improvement, but I might be wrong; please show me.

Filipe Laíns
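A minimal sketch of the kind of benchmark requested here, assuming there are two builds of the same workload (a baseline x86_64 build and an optimized one) that can each be launched as a command. `time_command` and `speedup` are hypothetical helpers. The demo times the same trivial Python command twice, so its output only shows the mechanics and is not a real measurement.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv: list[str], runs: int = 5) -> list[float]:
    """Run a command several times and return wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def speedup(baseline: list[float], optimized: list[float]) -> float:
    """Median baseline time divided by median optimized time (>1.0 means faster)."""
    return statistics.median(baseline) / statistics.median(optimized)


if __name__ == "__main__":
    # Hypothetical usage: replace both commands with the same workload run against
    # an x86_64 build and an AVX2 build of the package under test.
    base = time_command([sys.executable, "-c", "sum(range(10**6))"])
    opt = time_command([sys.executable, "-c", "sum(range(10**6))"])
    print(f"speedup: {speedup(base, opt):.2f}x")
```

Using the median of several runs damps outliers from caching and frequency scaling. This matters here because, as noted later in the thread, wide vector instructions can themselves lower clock speeds.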
On Sun, 2020-03-29 at 15:39 +0100, Filipe Laíns via arch-dev-public wrote:
> I make the PKGBUILD build for 2
*I can make

Sorry, I am a little distracted today.

Filipe Laíns
On Sun, Mar 29, 2020 at 03:39:51PM +0100, Filipe Laíns via arch-dev-public wrote:
>> To have a separate architecture would require automated builds, which requires being able to sign packages automatically. And we have not achieved database signing in 9 years... I'm looking for a boost that could be achieved now.
> No, it would not. Where is this coming from? I already build split packages with SIMD instructions; I make the PKGBUILD build for 2 architectures instead, with a minimal patch.
> If pacman is not able to handle parallel architectures, we should fix that. I think it's a valid use case.
Well, how do you think we supported two architectures? Why do you think `extra-x86_64-build` is named the way it is?

The "problem" is that we have no intention of building 1 package 4 times and keeping things in sync by hand; it was tedious enough with i686, which was part of why it was dropped in the first place. Thus we want build servers to do this for us.

Allan is going to have a hard time arguing that the minimal improvements are going to justify the absurd amount of time we'll end up spending building things by hand; it's the crux of the problem, essentially. I'm also sure he knows this.

Surely we can bikeshed about which architectures to support, but what we should discuss is how we should accomplish the task in general.
> Furthermore, if you do indeed wish to move this forward, please present us with reasonable data. Take a few packages that would benefit from this, build them with the proposed architecture and show us benchmarks. I think it's going to be very hard for you to find packages with considerable improvement, but I might be wrong; please show me.
See last paragraph.

--
Morten Linderud
PGP: 9C02FF419FECBE16
I want to clarify what I am proposing.

I would not be an entirely new architecture in the sense of i686; CPU extensions are not different architectures and shouldn't be treated as such.

What I would for us to do is to create an x86-64-avx2, etc. that would complement x86-64. We would not add it as a target for all packages, just for the ones that make sense.

For this, pacman would have to support architecture priority. We could have something like this:

Architecture = x86-64-avx2 x86-64

This means that if an x86-64-avx2 package is available, it would be selected over the x86-64 one. That way we don't need to rebuild all packages.

My point here is that, to me, it does not really make sense to drop support for older CPUs. We will gain little on newer CPUs. Projects that need the performance already choose the CPU extensions to use dynamically at runtime; they will work on all x86-64 CPUs. If this did in fact bring a relevant performance improvement, like the original mail let's me to believe (but this may be just me), I would be all for it, but that's not the case.

On Sun, 2020-03-29 at 16:51 +0200, Morten Linderud via arch-dev-public wrote:
> Well, how do you think we supported two architectures? Why do you think `extra-x86_64-build` is named the way it is?
> The "problem" is that we have no intention of building 1 package 4 times and keeping things in sync by hand; it was tedious enough with i686, which was part of why it was dropped in the first place. Thus we want build servers to do this for us.
Then automate it? Is there any reason why we can't have the tooling build all architectures for us? Why not have an `extra-build` helper that will call extra-$arch-build for every architecture? This would have practically the same effect as my SIMD packages do now. The only difference would be how people consume them: it would just work out of the box instead of them having to install the -avx2 variant.
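The `extra-build` wrapper proposed here could look roughly like the Python sketch below. It assumes the per-architecture scripts follow the existing `extra-x86_64-build` naming from devtools. The `extra-x86_64_avx2-build` script and the helper names `build_commands` and `build_all` are hypothetical. By default the sketch only prints what it would run.

```python
import subprocess


def build_commands(arches: list[str], repo: str = "extra") -> list[list[str]]:
    """Build one "<repo>-<arch>-build" command per architecture.

    Only extra-x86_64-build exists in devtools today; other names are hypothetical.
    """
    return [[f"{repo}-{arch}-build"] for arch in arches]


def build_all(arches: list[str], dry_run: bool = True) -> None:
    """Run each per-architecture build in turn; check=True stops at the first failure."""
    for cmd in build_commands(arches):
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Run from a directory containing a PKGBUILD; "x86_64_avx2" is hypothetical.
    build_all(["x86_64", "x86_64_avx2"])
```

Building the architectures is the easy part. As Morten and Allan point out, signing and keeping the resulting packages in sync is where the real cost lies.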
> Allan is going to have a hard time arguing that the minimal improvements are going to justify the absurd amount of time we'll end up spending building things by hand; it's the crux of the problem, essentially. I'm also sure he knows this.
> Surely we can bikeshed about which architectures to support, but what we should discuss is how we should accomplish the task in general.
>> Furthermore, if you do indeed wish to move this forward, please present us with reasonable data. Take a few packages that would benefit from this, build them with the proposed architecture and show us benchmarks. I think it's going to be very hard for you to find packages with considerable improvement, but I might be wrong; please show me.
> See last paragraph.
Which paragraph are you referring to?

I will be taking a step back now. I will probably wait 1 or 2 days before replying. Feel free to reach me privately for a direct discussion if you want to.

Cheers,
Filipe Laíns
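The architecture-priority scheme proposed earlier in this message (Architecture = x86-64-avx2 x86-64) can be modeled roughly as follows. This is a minimal sketch under stated assumptions: pacman has no such priority mechanism, the helper `pick_arch` is hypothetical, and the package names and available builds are illustrative only.

```python
from typing import Optional

# Hypothetical "Architecture = x86-64-avx2 x86-64": earlier entries win.
ARCH_PRIORITY = ["x86-64-avx2", "x86-64"]

# Illustrative repository contents: which architecture builds exist per package.
AVAILABLE: dict[str, set[str]] = {
    "ffmpeg": {"x86-64-avx2", "x86-64"},  # an optimized build was added
    "coreutils": {"x86-64"},              # only the baseline build exists
}


def pick_arch(pkgname: str, priority: list[str], available: dict[str, set[str]]) -> Optional[str]:
    """Return the first architecture, in priority order, that has a build of pkgname."""
    builds = available.get(pkgname, set())
    for arch in priority:
        if arch in builds:
            return arch
    return None  # package not in the repository at all


if __name__ == "__main__":
    for pkg in ("ffmpeg", "coreutils"):
        print(pkg, "->", pick_arch(pkg, ARCH_PRIORITY, AVAILABLE))
```

Packages without an optimized build silently fall back to the baseline one, which is what lets only "the ones that make sense" be rebuilt.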
On Sun, 2020-03-29 at 16:25 +0100, Filipe Laíns via arch-dev-public wrote:
> I would not be an entirely
*It would

> What I would for us to do is to create an x86-64-avx2, etc. that would
*would like for us

> let's me to believe (but this may be just me), I would be
*let me to
Ugh, sorry again. Today I am only catching the errors after hitting send.
On Sun, Mar 29, 2020 at 04:25:48PM +0100, Filipe Laíns via arch-dev-public wrote:
> I want to clarify what I am proposing.
> I would not be an entirely new architecture in the sense of i686; CPU extensions are not different architectures and shouldn't be treated as such.
> What I would for us to do is to create an x86-64-avx2, etc. that would complement x86-64. We would not add it as a target for all packages, just for the ones that make sense.
> For this, pacman would have to support architecture priority. We could have something like this:
> Architecture = x86-64-avx2 x86-64
Couldn't this be as simple as having a package with AVX2 (or whatever) extensions compiled in, in a separate repository that takes precedence in pacman.conf?

Thanks,
-Santiago
On 3/29/20 11:25 AM, Filipe Laíns via arch-dev-public wrote:
> I want to clarify what I am proposing.
> I would not be an entirely new architecture in the sense of i686; CPU extensions are not different architectures and shouldn't be treated as such.
> What I would for us to do is to create an x86-64-avx2, etc. that would complement x86-64. We would not add it as a target for all packages, just for the ones that make sense.
> For this, pacman would have to support architecture priority. We could have something like this:
> Architecture = x86-64-avx2 x86-64
> This means that if an x86-64-avx2 package is available, it would be selected over the x86-64 one. That way we don't need to rebuild all packages.
Where would you store this package? The pkgname must be unique in each repository database, so you would need a community-avx2 repository. Then it is as simple as Santiago said: just have users add the additional repository if they need it, giving it precedence in pacman.conf. (Except I will go one step further and say this is the *only* way.)

--
Eli Schwartz
Bug Wrangler and Trusted User
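A sketch of the repository-precedence behaviour described above. pacman takes a package from the first repository, in pacman.conf order, that carries that name, regardless of version number. The `community-avx2` repository, the helper `resolve`, and the version strings below are hypothetical. In pacman.conf this corresponds to placing a [community-avx2] section above [community].

```python
from typing import Optional

# Repositories in pacman.conf order; "community-avx2" is hypothetical.
REPO_ORDER = ["community-avx2", "core", "extra", "community"]

# Illustrative package -> version tables; the version strings are made up.
REPOS: dict[str, dict[str, str]] = {
    "community-avx2": {"liquid-dsp": "1.0-1"},
    "community": {"liquid-dsp": "1.1-1", "srslte": "2.0-1"},
}


def resolve(pkgname: str, repo_order: list[str],
            repos: dict[str, dict[str, str]]) -> Optional[tuple[str, str]]:
    """Return (repo, version) from the first repository that carries pkgname.

    Mirrors pacman's documented rule: an earlier repository wins for an
    identical package name even if a later one has a newer version.
    """
    for repo in repo_order:
        version = repos.get(repo, {}).get(pkgname)
        if version is not None:
            return repo, version
    return None
```

Because the earlier repository wins even against a newer version, an optimized repository built this way must be rebuilt in lockstep with its baseline counterpart, or users will be held back on stale packages.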
[2020-03-29 16:25:48 +0100] Filipe Laíns via arch-dev-public:
> What I would for us to do is to create an x86-64-avx2, etc. that would complement x86-64. We would not add it as a target for all packages, just for the ones that make sense.
> For this, pacman would have to support architecture priority. We could have something like this:
> Architecture = x86-64-avx2 x86-64
I'd like to say why not, but everything remains to be done here, whereas pacman and our toolchain have mature support for multiple architectures, and they have it today.
> My point here is that, to me, it does not really make sense to drop support for older CPUs. We will gain little on newer CPUs.
Nothing is being dropped. Every CPU that does not support the new architecture can keep running the x86_64 packages they currently do.
> Then automate it? Is there any reason why we can't have the tooling build all architectures for us? Why not have an `extra-build` helper that will call extra-$arch-build for every architecture?
That would be awesome, but the tooling does not yet exist. Personally, I do not consider it terribly bothersome to build packages for multiple architectures like we did for i686 and x86_64. And I think it would be preferable to introduce a new architecture tomorrow rather than wait a few more months in the hope that someone implements your proposed scheme.

Cheers.
--
Gaetan
On 30/3/20 12:39 am, Filipe Laíns wrote:
> On Sun, 2020-03-29 at 23:37 +1000, Allan McRae via arch-dev-public wrote:
>> On 29/3/20 11:17 pm, Filipe Laíns wrote:
>>> I would also like to note that rebuilding everything with forced support for AVX2 or whatever won't have much effect. Most packages do not have workloads where it would make sense to use these CPU extensions, and as such, GCC would not use them.
>> That assumes we only add AVX2, whereas requiring a CPU that supports AVX2 would bring other optimizations that would be used.
> No, it should be true for all extensions.
>> As I replied earlier, AVX2 may be going too far, but it is a good starting point for discussion. If that is too far, what could we accept? SSE4.2? AVX? Surely we can do better than pure x86_64.
> No, SSE4.2 is too far. For me, the minimum should be AVX.
SSE4.2 arrived in 2008 for Intel and 2011 for AMD, though I guess some processors were released without it for some time after that. AVX was released by both in 2011.

So why is one too far and the other not?
>> To have a separate architecture would require automated builds, which requires being able to sign packages automatically. And we have not achieved database signing in 9 years... I'm looking for a boost that could be achieved now.
> No, it would not. Where is this coming from? I already build split packages with SIMD instructions; I make the PKGBUILD build for 2 architectures instead, with a minimal patch.
> If pacman is not able to handle parallel architectures, we should fix that. I think it's a valid use case.
No need for pacman support. Just add packages built for the higher instruction set to a new repo and give that repo higher priority.

But that involves developers choosing which packages to build with higher instruction sets, which requires extra developer time. Ideally, we would just autobuild for more optimized architectures, but this requires auto-signing packages, which has not happened in the last decade (but may in this one...).

Picking an instruction set that is ~10 years old and making it the default for the distro seems a reasonable approach to me.

A
On Mon, 2020-03-30 at 09:07 +1000, Allan McRae via arch-dev-public wrote:
> SSE4.2 arrived in 2008 for Intel and 2011 for AMD, though I guess some processors were released without it for some time after that. AVX was released by both in 2011.
> So why is one too far and the other not?
I was looking at some edge cases where CPUs had AVX but no SSE4.2. Intel's website is also a bit unreliable for older CPUs; in a lot of cases it does not list extensions that it should.
> No need for pacman support. Just add packages built for the higher instruction set to a new repo and give that repo higher priority.
Right, that works too. With that approach, though, we can't have the word "auto" in the architecture definition automatically identify and enable the supported extensions.
> But that involves developers choosing which packages to build with higher instruction sets, which requires extra developer time.
Well. The thing about these CPU extensions is that they require very specific workloads to be useful. They are more efficient when performing the tasks they were designed to do, but they have a higher power consumption, which generally results in the CPU lowering its clocks. To make proper use of them, you usually need to build your algorithm around them. Just enabling them in GCC might bring some performance improvements, but they are usually *very* minimal.

Upstreams that benefit from these extensions will, most of the time, write their own kernels and provide runtime detection, or add a switch in the build system. I don't believe this is too much of a burden for packagers. 99.9% of the time, just building a package with SSE/AVX enabled in GCC will not bring any relevant performance enhancement.

Furthermore, building an extra optimized version makes it much more reasonable for us to choose AVX2, which will have a much higher real-world impact than just building everything with SSE4.2 instead.

With that said, if you still want to bump the minimum requirements, could you please consider taking just a little bit of time to run some benchmarks on packages you think should get an improvement?
> Ideally, we would just autobuild for more optimized architectures, but this requires auto-signing packages, which has not happened in the last decade (but may in this one...).
> Picking an instruction set that is ~10 years old and making it the default for the distro seems a reasonable approach to me.
The instruction set might be 10 years old, but you have to look at the discontinuation date for CPU families that don't support it, and maybe add 1 or 2 years for the bulk of the inventory to be cleared out of stores.

Cheers,
Filipe Laíns
participants (9)
- Allan McRae
- Andreas Radke
- Eli Schwartz
- Evangelos Foutras
- Filipe Laíns
- Gaetan Bisson
- Morten Linderud
- Santiago Torres-Arias
- Sven-Hendrik Haase