[arch-devops] Let's get a big build box

Sven-Hendrik Haase svenstaro at gmail.com
Tue Jan 22 16:02:32 UTC 2019

Hi all,

so this has been a long time coming as you know from IRC but now I'm
actually taking the time to write an email. :P

## Suggested new server and finances

So I'd like us to get a big build box. Specifically this one:
This would be an upgrade to soyuz (and the current soyuz would go away).

Total cost with 2x1.92TiB NVMe disks and 256GiB of RAM is € 461.00/month +
€ 219.00 setup.

soyuz currently costs us € 54.00/month so we'd be paying € 407.00/month extra.
This is a big step up in cost but
1) our infra costs are very low all in all otherwise and
2) frankly we just have a ton of money lying around doing nothing and
while that doesn't mean we have to spend it needlessly, I believe that this
is a useful thing to do with the money.
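
To make the cost comparison concrete, here is a rough sketch of the arithmetic in Python. The monthly and setup figures are the ones quoted above; the function names, the 12-month horizon, and the assumption that soyuz is cancelled right away with no overlap period are illustrative only and not part of the actual offer.

```python
# Figures from the proposal above, in whole euros.
NEW_MONTHLY = 461    # suggested server with 2x1.92TiB NVMe and 256GiB RAM
NEW_SETUP = 219      # one-time setup fee
SOYUZ_MONTHLY = 54   # what the current soyuz costs per month


def extra_monthly(new=NEW_MONTHLY, old=SOYUZ_MONTHLY):
    """Additional monthly cost once soyuz is replaced (hypothetical helper)."""
    return new - old


def first_year_extra(months=12, setup=NEW_SETUP):
    """Additional cost over the first `months` months, including setup.

    Assumes prices stay constant and there is no overlap period in which
    both servers are paid for.
    """
    return months * extra_monthly() + setup


if __name__ == "__main__":
    print(f"extra per month:       EUR {extra_monthly()}")
    print(f"extra over first year: EUR {first_year_extra()}")
```

If we keep both machines running during a migration period, the first-year figure would go up by € 54.00 for each month of overlap.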

## Performance

### Processors

The suggested DX292 has two Intel Xeon Gold 6130 16-Core processors while
the current one has a single Intel Xeon CPU E3-1275 v5. From benchmarks,
I'm estimating the compute power of the suggested server to be roughly
4 times that of soyuz [0][1] for our workloads.

### Disks

We currently have spinning disks in soyuz and that isn't great for
building. While I believe soyuz instead puts chroots onto a tmpfs to
mitigate this, it takes away from the usable RAM that we have. This is
actually a problem as the server has run out of memory a few times before.
Using RAID1 NVMe disks (as in the suggested new server) for building would make
that workaround unnecessary as these should just generally be fast enough
for building.

## Reasoning

I believe that the current soyuz is too small to get bigger rebuilds and big
packages done quickly. I've heard some members of the team complain about
how long C++-heavy rebuilds take in the past as well. I
know that soyuz sits mostly idle currently but I suppose the reason for
that is that some people build big packages on their own, faster machines
(I know that I do this and some TUs as well). On my machine (12 threads),
tensorflow takes ~10h to compile while pytorch and arrayfire are at 2-3h.
Yes, these are certainly outliers but I imagine we have quite a few more of
these packages that I don't know about. Big rebuilds like KDE or boost
would benefit as well.

Ultimately, we all want Arch CI and then we could theoretically dynamically
spin up/down big build slaves automatically as we need. However, this is
currently blocked by reproducible builds AND the svn-git migration.
Therefore, I don't see that happening any time soon. This proposal is for
getting a practical solution now and not in a few months/years.

Additionally, this big server could also serve as a testbed for the CI.

### Alternatives

People have suggested this [2] alternative in the past and while it's quite
a bit cheaper, it's also only about half as powerful. While the CPU is
about the same speed [3], it only has one of them.

## Closing

I know that some people have been skeptical about getting a big, expensive
server but I hope I made a good case for why I think we should get one. If
not, well, at least we'll have it in the archive.


[0] cpubenchmark.net shows only the single processor version but we can
roughly double the performance given our workload to estimate dual
processor performance:
[1] geekbench.com has whole systems and I actually found a DELL R740 which
has the exact same processor configuration as the R640 DX292 from Hetzner
that I'm suggesting. From those numbers, 4x the compute power seems about
right: https://browser.geekbench.com/v4/cpu/11406589 vs
[2] https://www.hetzner.de/dedicated-rootserver/ax160
