[arch-devops] Let's get a big build box

Jelle van der Waa jelle at vdwaa.nl
Tue Jan 22 21:22:50 UTC 2019


On 01/22/19 at 05:02pm, Sven-Hendrik Haase via arch-devops wrote:
> Hi all,
> 
> so this has been a long time coming as you know from IRC but now I'm
> actually taking the time to write an email. :P

:)

> ## Suggested new server and finances
> 
> So I'd like us to get a big build box. Specifically this one:
> https://www.hetzner.de/dedicated-rootserver/dell/dx292
> This would be an upgrade to soyuz (and the current soyuz would go away).
> 
> Total cost with 2x1.92TiB NVMe disks and 256GiB of RAM is € 461.00/month +
> € 219.00 setup.
> 
> soyuz currently costs us € 54.00 so we'd be paying € 407.00/month extra.
> This is a big step up in cost but
> 1) our infra costs are very low all in all otherwise and
> 2) frankly we just have a ton of money laying around doing nothing and
> while that doesn't mean we have to spend it needlessly, I believe that this
> is a useful thing to do with the money.

> ### Disks
> 
> We currently have spinning disks in soyuz and that isn't great for
> building. While I believe soyuz instead puts chroots onto a tmpfs to
> mitigate this, it takes away from the usable RAM that we have. This is
> actually a problem as the server has run out of memory a few times before.
> Using RAID1 NVMe disks (as in the suggested new server) for building would make
> that workaround unnecessary as these should just generally be fast enough
> for building.

I agree. If we get something new, let's use NVMe disks for the chroots
and keep the RAM for building only. That saves the devops team from
having to reset a locked-up box because of these issues.

> 
> ## Reasoning
> 
> I believe that the current soyuz is too small for bigger rebuilds and big
> packages for them to get done quickly. I've heard some members of the team
> complain about rebuild times of C++-based rebuilds in the past as well. I
> know that soyuz sits mostly idle currently but I suppose the reason for
> that is that some people build big packages on their own, faster machines
> (I know that I do this and some TUs as well). On my machine (12 threads),
> tensorflow takes ~10h to compile while pytorch and arrayfire are at 2-3h.
> Yes, these are certainly outliers but imagine we have quite a few more of
> these packages that I don't know about. Also big rebuilds like KDE, boost
> would benefit.

I can't really complain here; soyuz is fast enough for me, but I don't
package heavy stuff.

I do however like this proposal, for the following reason: we now have
a build server where TUs/Devs build *official* packages and on which we
also run services that can be pwned, such as quassel/synapse and our IRC
bot. I want a nice separation of services and to keep the build server
"clean".

If this means getting a new (smaller) box for under ~54 euro/month,
that's fine from my side as long as things are separated.

> Ultimately, we all want Arch CI and then we could theoretically dynamically
> spin up/down big build slaves automatically as we need. However, this is
> currently blocked by reproducible builds AND the svn-git migration.
> Therefore, I don't see that happening any time soon. This proposal is for
> getting a practical solution now and not in a few months/years.
> 
> Additionally, this big server could also serve as a testbed for the CI.

For CI, we can (ab)use the four leftover PIA boxes. I want to use two of
them to set up a CI that reproduces our packages. The other two can be
used to test a CI, which could start by testing just [core], for example.

> ### Alternatives
> 
> People have suggested this [2] alternative in the past and while it's quite
> a bit cheaper, it's also only about half as powerful. While the CPU is
> about the same speed [3], it only has one of them.

Since I glanced over it, the difference is that with the DX292 we would
have 2 * 16 cores (32) instead of 24. The alternative is 164 euro,
however, versus ~ 2.5 times as much.
The alternative is also ~ 45% faster than our current setup, with more
threads and double the amount of RAM, which would resolve most C++
issues (if not using -j24 I guess???).
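
To make the price comparison concrete, here is a rough Python sketch
using only the figures quoted in this thread (54, 164 and 461 euro per
month, 219 euro setup, 32 versus 24 cores). The constant and function
names are made up for illustration, and the 461 figure is for the
configuration with the extra disks and RAM, so a comparison of base
prices may well come out differently.

```python
# Figures as quoted in this thread (EUR); names are illustrative only.
SOYUZ_MONTHLY = 54.00    # current soyuz
DX292_MONTHLY = 461.00   # with 2x1.92TiB NVMe and 256GiB RAM
DX292_SETUP = 219.00     # one-off setup fee
AX160_MONTHLY = 164.00   # the cheaper alternative
DX292_CORES = 2 * 16     # two 16-core CPUs
AX160_CORES = 24         # single 24-core CPU


def extra_per_month(new_monthly, old_monthly=SOYUZ_MONTHLY):
    """Additional monthly spend compared to the current box."""
    return new_monthly - old_monthly


def first_year_cost(monthly, setup=0.0):
    """Twelve monthly payments plus the one-off setup fee."""
    return 12 * monthly + setup


def cost_per_core(monthly, cores):
    """Monthly price divided by the number of physical cores."""
    return monthly / cores


if __name__ == "__main__":
    print("DX292 extra per month:", extra_per_month(DX292_MONTHLY))
    print("AX160 extra per month:", extra_per_month(AX160_MONTHLY))
    print("DX292 first year:", first_year_cost(DX292_MONTHLY, DX292_SETUP))
    print("price ratio DX292/AX160:", round(DX292_MONTHLY / AX160_MONTHLY, 2))
    print("DX292 per core:", round(cost_per_core(DX292_MONTHLY, DX292_CORES), 2))
    print("AX160 per core:", round(cost_per_core(AX160_MONTHLY, AX160_CORES), 2))
```

With these numbers the DX292 comes out at roughly 2.8 times the monthly
price of the alternative for a third more cores; treat that as a
back-of-the-envelope estimate, not a quote.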
 
> ## Closing
> 
> I know that some people have been skeptical about getting a big, expensive
> server but I hope I made a good case for why I think we should get one. If
> not, well, at least we'll have it in the archive.

I still think it's a very steep increase in monthly spending, i.e. a
~400 euro per month increase.

> 
> Sven
> 
> [0] cpubenchmark.net shows only the single processor version but we can
> roughly double the performance given our workload to estimate dual
> processor performance:
> https://www.cpubenchmark.net/compare/Intel-Xeon-Gold-6130-vs-Intel-Xeon-E3-1275-v5/3126vs2672
> [1] geekbench.com has whole systems and I actually found a DELL R740 which
> has the exact same processor configuration as the R640 DX292 from Hetzner
> that I'm suggesting. From those numbers, 4x the compute power seems about
> right: https://browser.geekbench.com/v4/cpu/11406589 vs
> https://browser.geekbench.com/v4/cpu/11568488
> [2] https://www.hetzner.de/dedicated-rootserver/ax160
> [3]
> https://www.cpubenchmark.net/compare/AMD-EPYC-7401P-vs-Intel-Xeon-Gold-6130/3118vs3126

-- 
Jelle van der Waa

