Re: [arch-general] RFC: Use x86_64-v2 architecture
Hello,

I am going to benchmark the performance difference between the various x86 uarch levels. I will be using the Phoronix Test Suite, which has some support for performing compiler and compiler-flag benchmarks.

I am opposed to dropping support for older CPUs, but I will perform this test fairly. That's why I'm posting this advance notice before performing the actual benchmarks. I am an Ubuntu user and I am concerned that Ubuntu may require x86_64-v2 in the not so distant future. I will be performing this test on Ubuntu 20.04.2 with GCC 9.3.0 [1].

I am going to use selected tests from this Phoronix article [2], but I will exclude benchmarks that do not show much performance difference between the "-O1" and "-O3" compiler flags, as:
- their build scripts may ignore the CFLAGS/CXXFLAGS variables,
- they may use some assembly code or C asm intrinsics,
- they may have separate SSE4/AVX code paths,
- the compiler is unable to optimize the code much, due to its nature.

The remaining benchmarks are those that would probably benefit the most from compiling for different uarch levels, which should be taken into account when interpreting the results. My rough comparison between "-O1" and "-O3" is at [3].

So, I will use the following tests:
    pts/scimark2 (all tests)
    pts/john-the-ripper (all tests)
    pts/graphics-magick ("swirl", "resizing", "HWB Color space")
    pts/coremark
    pts/himeno
    pts/encode-flac
    pts/c-ray

Greetings,
Mateusz Jończyk

[1] GCC 9.3 does not support -march=x86-64-v2 and similar. I will use switches like -march=nehalem instead (the exact flag sets are sketched after the selection details below).
[2] https://www.phoronix.com/scan.php?page=article&item=gcc-10900k-compiler
[3] https://openbenchmarking.org/result/2103131-HA-DRAFTUARC92

----------------------
Benchmark selection details:

Page 2 of the Phoronix article:
    cryptopp - compiles some code with the flag "-msse4.2" (see the illustration below), so skipping it,
    smhasher - same,
    fftw - has some kernels that explicitly use AVX / AVX2, so there is little point in benchmarking it,
    scimark - OK; I will also run some other tests from this benchmark where the difference between "-O1" and "-O3" is nice,

Page 3:
    TSCP - no real difference between "-O1" and "-O3" performance data,
    John The Ripper - small difference between "-O1" and "-O3", but I'll keep it for now,
    GraphicsMagick - OK, I'll choose the tests with the biggest difference between "-O1" and "-O3": "swirl", "resizing", "HWB Color space",

Page 4:
    AOM AV1 - no performance difference between "-O1" and "-O3", so skipping,
    x265 - patent-encumbered format, skipping,
    Coremark - OK, "CoreMark Size 666 - Iterations Per Second",
    Himeno - OK,
    Stockfish - enables SSSE3 and SSE4.1 by default, so leaving it out,
    FLAC Audio Encoding - OK, but the difference probably won't be big,
    Minion - tries to install many package dependencies, leaving it out for now,
    LevelDB - the benchmark suite ignores the "-O1" flag and is highly susceptible to non-quiet systems (so the results were bogus),
    GROMACS - same as Minion; additionally has a long runtime IIRC,
    Darmstadt Automotive Parallel Heterogeneous Suite (daphne) - requires huge amounts of disk space, and it seems it would be IO-bound,
    pgbench - I still have spinning HDDs, so the benchmark was IO-bound,
    NGiNX - it looks like it tests both the kernel and userspace, so leaving it out.

Additional benchmarks:
    n-queens - no difference between "-O1" and "-O3"; the build scripts seem to ignore CFLAGS,
    OpenSSL - no real difference between "-O1" and "-O3",
    c-ray - OK, let's include it.
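To illustrate why cryptopp and smhasher were excluded: when a build system passes -msse4.2 on its own, the resulting binary contains SSE4.2 code even in the "baseline" configuration, so the -march value in CFLAGS changes little. A quick way to see this (a sketch; any reasonably recent GCC should behave the same):

    # __SSE4_2__ is not defined for the baseline architecture:
    echo | gcc -x c -march=x86-64 -dM -E - | grep __SSE4_2__
    # (no output)

    # ...but a build system that adds -msse4.2 itself enables it anyway,
    # so for such a package the baseline build is not really a baseline:
    echo | gcc -x c -march=x86-64 -msse4.2 -dM -E - | grep __SSE4_2__
    # #define __SSE4_2__ 1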
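For reference, the flag sets I intend to compare, with the uarch levels approximated by CPU names since GCC 9.3 predates the -march=x86-64-v2 / -march=x86-64-v3 options (assumption: nehalem and haswell are close enough to the v2 and v3 feature sets for this purpose; CXXFLAGS are set identically):

    # Baseline (plain x86-64):
    export CFLAGS="-O3 -mtune=generic -march=x86-64"

    # Approximates x86-64-v2 (adds SSE3/SSSE3/SSE4.1/SSE4.2/POPCNT):
    export CFLAGS="-O3 -mtune=generic -march=nehalem"

    # Approximates x86-64-v3 (additionally AVX/AVX2/BMI1/BMI2/FMA/MOVBE):
    export CFLAGS="-O3 -mtune=generic -march=haswell"

Keeping -mtune=generic constant means that only the available instruction set changes between configurations, not the instruction scheduling model.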
> I am going to benchmark the performance difference between the various
> x86 uarch levels. I will be using the Phoronix Test Suite, which has
> some support for performing compiler and compiler-flag benchmarks.
I believe that, for the purpose of this discussion, that will be a waste of your time. There is no disagreement on whether a move from v1 to v3 has the potential to improve performance. And this is what such benchmarks are intended to show.
Yes, synthetic tests are expected to show an improvement, and so are specific tasks which naturally benefit from tuning for newer CPUs. But those results will not answer the important question: whether there is a considerable improvement for a typical user. Anyway, the current plan is to provide two repositories: one generic x86_64 and one for x86_64-v3. This not only solves the problem: with two repositories a user may simply choose the optimized version, and everyone can compare the options during their normal, daily work.
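To make the idea concrete: the choice could be as simple as the repository order in pacman.conf. A purely hypothetical sketch - the "[core-v3]" name below is made up and the final mechanism is not settled:

    # /etc/pacman.conf (hypothetical): pacman prefers packages from
    # repositories listed earlier, so the optimized build wins when present.
    [core-v3]
    Include = /etc/pacman.d/mirrorlist

    [core]
    Include = /etc/pacman.d/mirrorlist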
Hello,

I have run the benchmarks and here are the results:
https://openbenchmarking.org/result/2103142-HA-UARCHLEVE55

TL;DR:
- there is no or negligible performance benefit from -march=nehalem, which corresponds to x86_64-v2,
- there is a moderate benefit from -march=haswell (x86_64-v3) - around 10%-20% compared to baseline for the tests performed.

Geometric Mean Of All Test Results
Result Composite - Geometric Mean > Higher Is Better
    O1_generic ....... 367.99
    O3_generic ....... 459.84
    O3_march_nehalem . 462.89
    O3_march_haswell . 531.99

x86_64-v2:
There were only two tests in which march=nehalem was meaningfully faster than march=x86-64 (the baseline architecture). These were "graphicsmagick/Swirl" and "FLAC audio encoding". The FLAC results were quite noisy (click the "Result confidence" button above the pie chart to show the data), so the benefits may not be statistically significant. Swirl appeared to be only around 4% faster.

I was surprised, because I had thought that the benefits would be somewhere around 5-10%. It looks like GCC's autovectorisation does not make much use of the instructions added in SSE3/SSSE3/SSE4.

x86_64-v3:
The geometric mean of the test results was around 15% higher on march=haswell than on baseline x86_64. Apart from john-the-ripper/md5, the tests were up to 36% faster, with a median performance increase of around 10%. [1]

As described in my previous email, I have excluded tests that use dedicated code paths for processors supporting AVX/AVX2/etc. - I saw little point in benchmarking them. I have also excluded some tests with little difference between the -O1 and -O3 optimization levels, as it appears that the compiler has little work to do there. So the real-world performance benefits of compiling the whole of Arch for x86_64-v3 would probably be smaller.

I think that many workloads of a "typical user" are I/O bound. The limiting factor is likely to be the HDD/SSD, network throughput / latency, or memory speed.

Limitations:
- GCC 9.3.0 was used, which is not the most recent compiler available.

Further research:
- benchmarking web browser performance, as this is what matters most for many users,
- comparing battery usage (the Phoronix Test Suite has support for this when running benchmarks). I do not think it will be much different from the performance data, though.

How to reproduce:
    export CFLAGS="-O1 -mtune=generic -march=x86-64"
    export CXXFLAGS="-O1 -mtune=generic -march=x86-64"
    phoronix-test-suite benchmark 2103142-HA-UARCHLEVE55

    export CFLAGS="-O3 -mtune=generic -march=x86-64"
    export CXXFLAGS="-O3 -mtune=generic -march=x86-64"
    phoronix-test-suite benchmark $name_of_test_identifier_specified_before
    #etc.

Conflict of interest: I am opposed to increasing baseline x86_64 requirements in general-purpose distributions.

Greetings,
Mateusz

[1] Visit https://openbenchmarking.org/result/2103142-HA-UARCHLEVE55&rmm=O1_generic%2CO3_march_nehalem and scroll slightly lower.
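P.S. For readers who want to check which levels their own CPU supports: on a system with glibc 2.33 or newer (an assumption - older loaders do not print this), the dynamic loader reports the detected levels, and GCC can be asked what it would pick for -march=native:

    # The loader lists the glibc-hwcaps levels it detected, e.g.:
    /lib64/ld-linux-x86-64.so.2 --help | grep "x86-64-v"
    #   x86-64-v3 (supported, searched)
    #   x86-64-v2 (supported, searched)

    # What -march=native resolves to on this machine:
    gcc -march=native -Q --help=target | grep -- "-march="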