[arch-devops] Let's get a big build box
Hi all,
so this has been a long time coming as you know from IRC but now I'm actually taking the time to write an email. :P
## Suggested new server and finances
So I'd like us to get a big build box. Specifically this one: https://www.hetzner.de/dedicated-rootserver/dell/dx292 This would be an upgrade to soyuz (and the current soyuz would go away).
Total cost with 2x1.92TiB NVMe disks and 256GiB of RAM is € 461.00/month + € 219.00 setup.
soyuz currently costs us € 54.00 so we'd be paying € 407.00/month extra. This is a big step up in cost but 1) our infra costs are very low all in all otherwise and 2) frankly we just have a ton of money lying around doing nothing and while that doesn't mean we have to spend it needlessly, I believe that this is a useful thing to do with the money.
## Performance
### Processors
The suggested DX292 has two Intel Xeon Gold 6130 16-Core processors while the current one has a single Intel Xeon CPU E3-1275 v5. From benchmarks, I'm estimating the compute power to be almost exactly 4 times as good in the suggested server [0][1] for our workloads.
### Disks
We currently have spinning disks in soyuz and that isn't great for building. While I believe soyuz instead puts chroots onto a tmpfs to mitigate this, it takes away from the usable RAM that we have. This is actually a problem as the server has run out of memory a few times before. Using RAID1 NVMe disks (as in the suggested new server) for building would make that workaround unnecessary as these should just generally be fast enough for building.
## Reasoning
I believe that the current soyuz is too small for bigger rebuilds and big packages for them to get done quickly. I've heard some members of the team complain about rebuild times of C++-based rebuilds in the past as well. I know that soyuz sits mostly idle currently but I suppose the reason for that is that some people build big packages on their own, faster machines (I know that I do this and some TUs as well). On my machine (12 threads), tensorflow takes ~10h to compile while pytorch and arrayfire are at 2-3h. Yes, these are certainly outliers but imagine we have quite a few more of these packages that I don't know about. Also big rebuilds like KDE, boost would benefit.
Ultimately, we all want Arch CI and then we could theoretically dynamically spin up/down big build slaves automatically as we need. However, this is currently blocked by reproducible builds AND the svn-git migration. Therefore, I don't see that happening any time soon. This proposal is for getting a practical solution now and not in a few months/years.
Additionally, this big server could also serve as a testbed for the CI.
### Alternatives
People have suggested this [2] alternative in the past and while it's quite a bit cheaper, it's also only about half as powerful. While the CPU is about the same speed [3], it only has one of them.
## Closing
I know that some people have been skeptical about getting a big, expensive server but I hope I made a good case for why I think we should get one. If not, well, at least we'll have it in the archive.
Sven
[0] cpubenchmark.net shows only the single processor version but we can roughly double the performance given our workload to estimate dual processor performance: https://www.cpubenchmark.net/compare/Intel-Xeon-Gold-6130-vs-Intel-Xeon-E3-1...
[1] geekbench.com has whole systems and I actually found a DELL R740 which has the exact same processor configuration as the R640 DX292 from Hetzner that I'm suggesting. From those numbers, 4x the compute power seems about right: https://browser.geekbench.com/v4/cpu/11406589 vs https://browser.geekbench.com/v4/cpu/11568488
[2] https://www.hetzner.de/dedicated-rootserver/ax160
[3] https://www.cpubenchmark.net/compare/AMD-EPYC-7401P-vs-Intel-Xeon-Gold-6130/...
On 01/22/19 at 05:02pm, Sven-Hendrik Haase via arch-devops wrote:
Hi all,
so this has been a long time coming as you know from IRC but now I'm actually taking the time to write an email. :P
:)
## Suggested new server and finances
So I'd like us to get a big build box. Specifically this one: https://www.hetzner.de/dedicated-rootserver/dell/dx292 This would be an upgrade to soyuz (and the current soyuz would go away).
Total cost with 2x1.92TiB NVMe disks and 256GiB of RAM is € 461.00/month + € 219.00 setup.
soyuz currently costs us € 54.00 so we'd be paying € 407.00/month extra. This is a big step up in cost but 1) our infra costs are very low all in all otherwise and 2) frankly we just have a ton of money lying around doing nothing and while that doesn't mean we have to spend it needlessly, I believe that this is a useful thing to do with the money.
### Disks
We currently have spinning disks in soyuz and that isn't great for building. While I believe soyuz instead puts chroots onto a tmpfs to mitigate this, it takes away from the usable RAM that we have. This is actually a problem as the server has run out of memory a few times before. Using RAID1 NVMe disks (as in the suggested new server) for building would make that workaround unnecessary as these should just generally be fast enough for building.
I agree: if we get something new, use NVMe disks and use RAM for building only. That saves the devops team from resetting a locked box due to these issues.
## Reasoning
I believe that the current soyuz is too small for bigger rebuilds and big packages for them to get done quickly. I've heard some members of the team complain about rebuild times of C++-based rebuilds in the past as well. I know that soyuz sits mostly idle currently but I suppose the reason for that is that some people build big packages on their own, faster machines (I know that I do this and some TUs as well). On my machine (12 threads), tensorflow takes ~10h to compile while pytorch and arrayfire are at 2-3h. Yes, these are certainly outliers but imagine we have quite a few more of these packages that I don't know about. Also big rebuilds like KDE, boost would benefit.
I can't really complain here, soyuz is fast enough for me but I don't package heavy stuff. I do however like this proposal, with the following reasoning. We now have a build server where TUs/devs build *official* packages and where we also run services which can be pwnd, such as quassel/synapse and our IRC bot. I want to have a nice separation of services and keep the build server "clean". If this means getting a new (smaller) box for < ~ 54 euro / month, that's fine from my side as long as things are separated.
Ultimately, we all want Arch CI and then we could theoretically dynamically spin up/down big build slaves automatically as we need. However, this is currently blocked by reproducible builds AND the svn-git migration. Therefore, I don't see that happening any time soon. This proposal is for getting a practical solution now and not in a few months/years.
Additionally, this big server could also serve as a testbed for the CI.
For CI, we can (ab)use the four leftover PIA boxes of which two I want to use for setting up a reproducing CI for our packages. The other two can be used to test a CI, since it can just first test [core] for example.
### Alternatives
People have suggested this [2] alternative in the past and while it's quite a bit cheaper, it's also only about half as powerful. While the CPU is about the same speed [3], it only has one of them.
Since I glanced over it, the difference is that we would then have 2 * 16 cores (32) instead of 24. It is however 164 euro versus ~ 2.5 times as much. It is however ~ 45% faster than our current setup and has more threads and double the amount of RAM, which would resolve most C++ issues (if not using -j24 I guess???).
## Closing
I know that some people have been skeptical about getting a big, expensive server but I hope I made a good case for why I think we should get one. If not, well, at least we'll have it in the archive.
I still think it's a very steep increase in spending per month, i.e. an increase of about 400 euro per month.
Sven
-- Jelle van der Waa
On 22/01/2019 17.02, Sven-Hendrik Haase via arch-devops wrote:
I believe that the current soyuz is too small for bigger rebuilds and big packages for them to get done quickly. I've heard some members of the team complain about rebuild times of C++-based rebuilds in the past as well. I know that soyuz sits mostly idle currently but I suppose the reason for that is that some people build big packages on their own, faster machines (I know that I do this and some TUs as well). On my machine (12 threads), tensorflow takes ~10h to compile while pytorch and arrayfire are at 2-3h. Yes, these are certainly outliers but imagine we have quite a few more of these packages that I don't know about. Also big rebuilds like KDE, boost would benefit.
Almost 500€ a month is complete overkill for what we do and what we actually need. This machine is going to stay mostly idle, and the fact that we received a huge donation does not justify burning money. I'm also pretty sure we don't know about more packages like yours because no one else adds them. Both KDE and boost rebuilds were doing fine so far.
Ultimately, we all want Arch CI and then we could theoretically dynamically spin up/down big build slaves automatically as we need. However, this is currently blocked by reproducible builds AND the svn-git migration. Therefore, I don't see that happening any time soon. This proposal is for getting a practical solution now and not in a few months/years.
I don't think it's blocked by anything but time. Neither the git migration nor reproducible builds affect the development of a service that would take a source tarball or svn directory as input and return ready packages. We have access to packet.net thanks to CNCF, I just haven't heard from anyone actually interested in picking up the slack.
Bartłomiej
On Wed, Jan 23, 2019 at 09:30:04AM +0100, Bartłomiej Piotrowski via arch-devops <arch-devops@lists.archlinux.org> wrote:
Almost 500€ a month is complete overkill for what we do and what we actually need.
500€ per month does indeed sound like too much. I can see why we'd maybe want a box with slightly more memory or with SSDs, but then again I don't know if building on HDDs really slows things down all that much compared to our current tmpfs builds. Maybe someone is interested in building some 5-10 minute package to get us some numbers to compare? If HDDs are really much slower, I can see why we might want a new machine with SSDs, but I don't see us needing as big of a machine as what you suggested.
Florian
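One way to collect the numbers asked for above is to time the same build command with its build tree on each kind of storage. The following is a minimal Python sketch, not an existing Arch tool: the function names are illustrative, and passing the target directory through the `BUILDDIR` environment variable assumes the build is driven by `makepkg`, which reads that variable.

```python
import os
import statistics
import subprocess
import time


def time_command(cmd, env_overrides=None, cwd=None):
    """Run cmd once and return the wall-clock duration in seconds."""
    env = dict(os.environ)
    env.update(env_overrides or {})
    start = time.perf_counter()
    # check=True raises if the build fails, so failed runs are never timed.
    subprocess.run(cmd, cwd=cwd, env=env, check=True)
    return time.perf_counter() - start


def compare_build_dirs(cmd, build_dirs, runs=3, cwd=None):
    """Time cmd `runs` times per build directory and return the median per label.

    build_dirs maps a label (e.g. "hdd") to a directory path. The path is
    handed to the command via BUILDDIR (assumption: the command is makepkg
    or something else that honors BUILDDIR).
    """
    results = {}
    for label, path in build_dirs.items():
        samples = [
            time_command(cmd, {"BUILDDIR": path}, cwd) for _ in range(runs)
        ]
        results[label] = statistics.median(samples)
    return results
```

Run from a directory containing a PKGBUILD, a call such as `compare_build_dirs(["makepkg", "-f", "-C"], {"hdd": "/path/on/hdd", "tmpfs": "/path/on/tmpfs"})` would return the median build time per location; both paths here are placeholders. Repeated runs benefit from the page cache, so the first HDD run is likely the most representative of a cold build.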
On 1/23/19 11:55 AM, Florian Pritz via arch-devops wrote:
On Wed, Jan 23, 2019 at 09:30:04AM +0100, Bartłomiej Piotrowski via arch-devops <arch-devops@lists.archlinux.org> wrote:
Almost 500€ a month is complete overkill for what we do and what we actually need.
500€ per month does indeed sound like too much.
I'd have to agree here... Another thing to consider is what the average monthly donation totals are for the distro. Even if we went with something beefy because "why not", my concern would be around sustaining it long term. That is, once the large sum we have burns down, do we have enough in monthly donations to keep it going...
I don't have insight in the accounting side of things but I would be surprised if we total more than 500€/month on average.
Just some commentary from the peanut gallery ;)
Regards,
Andrew
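The sustainability concern above comes down to simple arithmetic: how long a reserve lasts when monthly spending exceeds monthly donations. A minimal Python sketch follows. The thread gives no reserve or donation figures, so any numbers plugged in are placeholders, and `months_of_runway` is an illustrative name rather than anything used by the project.

```python
import math


def months_of_runway(reserve, monthly_cost, monthly_donations):
    """Return how many whole months the reserve can cover the monthly shortfall.

    Returns math.inf when donations cover the cost, because the reserve
    then never shrinks.
    """
    shortfall = monthly_cost - monthly_donations
    if shortfall <= 0:
        return math.inf
    return math.floor(reserve / shortfall)
```

With a purely hypothetical reserve of 10,000€ and hypothetical donations of 300€/month, `months_of_runway(10_000, 461, 300)` gives 62 months for the 461€/month server; only the 461€ figure comes from the thread.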
On Wed, 23 Jan 2019 at 14:33, Andrew Crerar <andrew@crerar.io> wrote:
On 1/23/19 11:55 AM, Florian Pritz via arch-devops wrote:
On Wed, Jan 23, 2019 at 09:30:04AM +0100, Bartłomiej Piotrowski via arch-devops <arch-devops@lists.archlinux.org> wrote:
Almost 500€ a month is complete overkill for what we do and what we actually need.
500€ per month does indeed sound like too much.
I'd have to agree here... Another thing to consider is what the average monthly donation totals are for the distro. Even if we went with something beefy because "why not", my concern would be around sustaining it long term. That is, once the large sum we have burns down, do we have enough in monthly donations to keep it going...
I don't have insight in the accounting side of things but I would be surprised if we total more than 500€/month on average.
Just some commentary from the peanut gallery ;)
Regards,
Andrew
Well, people seem to be overwhelmingly of the opinion that 500€/month is too much. In that case, I put forth the next best contender, the Hetzner AX160-NVMe at 164€/month base price. At its base configuration, it has half the memory, half the disk space and roughly half the compute power of the server I originally put forth but it's also 1/3 the price at this configuration. Given that we'd trade it for current soyuz at 54€/month, it means we'd pay 110€/month extra. What do you guys think about that?
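The prices quoted in this thread can be put side by side with a few lines of Python. This is only a sketch of the arithmetic: the monthly prices are the ones stated in the emails, while the dictionary layout and function names are illustrative. Setup fees are left out because the thread only gives one for the DX292.

```python
SOYUZ_MONTHLY = 54.0  # current build server, EUR/month (from the thread)

# Monthly prices in EUR as quoted in the thread.
OPTIONS = {
    "DX292": 461.0,
    "AX160-NVMe": 164.0,
}


def extra_per_month(option_monthly, current_monthly=SOYUZ_MONTHLY):
    """Additional monthly cost when the option replaces the current server."""
    return option_monthly - current_monthly


def summarize(options=OPTIONS):
    """Return {name: (extra EUR/month, extra EUR/year)} for each option."""
    return {
        name: (extra_per_month(price), 12 * extra_per_month(price))
        for name, price in options.items()
    }


for name, (monthly, yearly) in summarize().items():
    print(f"{name}: +{monthly:.0f} EUR/month, +{yearly:.0f} EUR/year")
```

This reproduces the 407€ and 110€ monthly differences mentioned in the thread, which amount to 4884€ and 1320€ per year respectively. The price ratio 164/461 is about 0.36, in line with the "1/3 the price" estimate.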
On Fri, Jan 25, 2019 at 02:30:43PM +0100, Sven-Hendrik Haase via arch-devops <arch-devops@lists.archlinux.org> wrote:
In that case, I put forth the next best contender, the Hetzner AX160-NVMe at 164€/month base price.
That's certainly a much more realistic option, but I'm still not sure if we really need it. If I look at the cpu graph of soyuz for the last month, I see a lot of idle time. There's a base load from quassel/matrix which should really be moved elsewhere (a hetzner cloud VM maybe?) and the occasional peak, but I don't really see us needing a bigger machine just yet.
I see the build server more as a support machine in case a packager doesn't have a suitable build machine themselves or if their network connection is too slow to upload the packages. For that purpose I'd say the load that soyuz has is perfectly fine and no upgrade is required. That said, I know that you want a faster machine for your big packages. Since I don't have any packages like that personally, I don't have a strong opinion here.
Also I fear that if we have a really beefy machine, it might attract more attention from packagers with slower machines and therefore it might be more loaded than what we have now. I mean, who in their right mind wouldn't want to build on the fancy, new, super-fast build server where the same build takes only 1/4 of the time.
I'd rather have a second machine similar to soyuz so that we can allow more people to build at the same time without stepping on each other's toes. Then again, we do have sgp.pkgbuild.com and we could probably convert 1-3 more machines if needed. I agree that these machines are "slow", but, to some degree, I see that as a good thing.
I hope this explanation makes sense. If not feel free to tell me.
Florian
participants (5)
- Andrew Crerar
- Bartłomiej Piotrowski
- Florian Pritz
- Jelle van der Waa
- Sven-Hendrik Haase