On 01/22/19 at 05:02pm, Sven-Hendrik Haase via arch-devops wrote:
Hi all,
so this has been a long time coming as you know from IRC but now I'm actually taking the time to write an email. :P
:)
## Suggested new server and finances
So I'd like us to get a big build box. Specifically this one: https://www.hetzner.de/dedicated-rootserver/dell/dx292 This would be an upgrade to soyuz (and the current soyuz would go away).
Total cost with 2x1.92TiB NVMe disks and 256GiB of RAM is € 461.00/month + € 219.00 setup.
soyuz currently costs us € 54.00/month so we'd be paying € 407.00/month extra. This is a big step up in cost but 1) our infra costs are otherwise very low all in all and 2) frankly we just have a ton of money lying around doing nothing, and while that doesn't mean we have to spend it needlessly, I believe that this is a useful thing to do with the money.
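For anyone who wants to check the numbers, here is a minimal sketch of the cost arithmetic, using only the prices quoted in this mail. The variable names are made up for illustration, and the first-year figures assume the prices stay fixed for twelve months, which is an assumption and not something Hetzner has confirmed.

```python
# Prices in euro, taken from the figures quoted in this thread.
NEW_MONTHLY = 461.00    # proposed DX292 with 2x NVMe and 256GiB RAM
NEW_SETUP = 219.00      # one-time setup fee
SOYUZ_MONTHLY = 54.00   # current soyuz

# Extra monthly spend if the new box replaces soyuz.
extra_per_month = NEW_MONTHLY - SOYUZ_MONTHLY

# First-year totals, assuming prices stay unchanged for 12 months (assumption).
first_year_total = NEW_MONTHLY * 12 + NEW_SETUP
first_year_extra = extra_per_month * 12 + NEW_SETUP

print(f"extra per month:  {extra_per_month:.2f}")
print(f"first year total: {first_year_total:.2f}")
print(f"first year extra: {first_year_extra:.2f}")
```

The first-year extra includes the setup fee because that fee would not be paid at all if we kept soyuz.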
### Disks
We currently have spinning disks in soyuz and that isn't great for building. While I believe soyuz instead puts chroots onto a tmpfs to mitigate this, it takes away from the usable RAM that we have. This is actually a problem as the server has run out of memory a few times before. Using RAID1 NVMe drives (as in the suggested new server) for building would make that workaround unnecessary as these should just generally be fast enough for building.
I agree, if we get something new, let's use NVMe drives for the chroots and keep the RAM for the builds themselves. That saves the devops team from resetting a locked box due to these issues.
## Reasoning
I believe that the current soyuz is too small to get bigger rebuilds and big packages done quickly. I've heard some members of the team complain about the duration of C++-based rebuilds in the past as well. I know that soyuz sits mostly idle currently but I suppose the reason for that is that some people build big packages on their own, faster machines (I know that I do this and some TUs as well). On my machine (12 threads), tensorflow takes ~10h to compile while pytorch and arrayfire are at 2-3h. Yes, these are certainly outliers but I imagine we have quite a few more of these packages that I don't know about. Big rebuilds like KDE and boost would benefit as well.
I can't really complain here, soyuz is fast enough for me but I don't package heavy stuff. I do however like this proposal, with the following reasoning. We now have a buildserver where TUs/Devs build *official* packages and on which we also run services that can be pwnd, such as quassel/synapse and our irc bot. I want to have a nice separation of services and keep the buildserver "clean". If this means getting a new (smaller) box for < ~ 54 euro / month, that's fine from my side as long as things are separated.
Ultimately, we all want Arch CI and then we could theoretically dynamically spin up/down big build slaves automatically as we need. However, this is currently blocked by reproducible builds AND the svn-git migration. Therefore, I don't see that happening any time soon. This proposal is for getting a practical solution now and not in a few months/years.
Additionally, this big server could also serve as a testbed for the CI.
For CI, we can (ab)use the four leftover PIA boxes of which two I want to use for setting up a reproducing CI for our packages. The other two can be used to test a CI, since it can just first test [core] for example.
### Alternatives
People have suggested this [2] alternative in the past and while it's quite a bit cheaper, it's also only about half as powerful. While the CPU is about the same speed [3], it only has one of them.
From a quick glance, the difference is that with the DX292 we would have 2 × 16 cores (32) instead of 24. The alternative is, however, 164 euro, versus ~2.5 times as much. It is also ~45% faster than our current setup, with more threads and double the amount of RAM, which would resolve most C++ issues (if not using -j24 I guess???).
## Closing
I know that some people have been skeptical about getting a big, expensive server but I hope I made a good case for why I think we should get one. If not, well, at least we'll have it in the archive.
I still think it's a very steep increase in monthly spending, i.e. roughly 400 euro more per month.
Sven
[0] cpubenchmark.net shows only the single processor version but we can roughly double the performance given our workload to estimate dual processor performance: https://www.cpubenchmark.net/compare/Intel-Xeon-Gold-6130-vs-Intel-Xeon-E3-1...
[1] geekbench.com has whole systems and I actually found a DELL R740 which has the exact same processor configuration as the R640 DX292 from Hetzner that I'm suggesting. From those numbers, 4x the compute power seems about right: https://browser.geekbench.com/v4/cpu/11406589 vs https://browser.geekbench.com/v4/cpu/11568488
[2] https://www.hetzner.de/dedicated-rootserver/ax160
[3] https://www.cpubenchmark.net/compare/AMD-EPYC-7401P-vs-Intel-Xeon-Gold-6130/...
-- Jelle van der Waa