[arch-mirrors] Possibility of adding debug repositories
If Arch Linux were to add repositories containing split debug packages for all our x86_64 packages, this would obviously add a fair amount of space to the mirror requirements. It could possibly double or triple the size taken by non-data packages. I don't have real-world numbers for how much space it would take up, but I do have some comparisons.

For my custom repo as a sample, for which I do upload debug packages, I am using 2.9 GiB of space, and 1.7 GiB of that comes from debug packages.

On the other side of things, our two biggest official packages are cuda, 4137.96 MiB of proprietary blobs that aren't currently stripped (so even if we did split it into a debug package, it wouldn't increase space usage at all), and kicad-library-3d, 5171.97 MiB of pure data in /usr/share, which is not eligible for debug packages anyway.

A naive list of packages which would probably generate debug packages:

$ expac -H M '%m %n %a' | grep -v 'any$' | sort --human-numeric-sort
[...]
1109.50 MiB emscripten x86_64
1136.27 MiB python-tensorflow x86_64
1145.23 MiB python-tensorflow-opt x86_64
1286.69 MiB python-pytorch-cuda x86_64
1289.44 MiB python-pytorch-opt-cuda x86_64
1518.42 MiB ghc-static x86_64
2589.23 MiB python-tensorflow-cuda x86_64
2597.38 MiB python-tensorflow-opt-cuda x86_64
3757.68 MiB tensorflow-cuda x86_64
3765.75 MiB tensorflow-opt-cuda x86_64
4137.96 MiB cuda x86_64

(Basically all of the really big stuff is tensorflow/cuda/machine learning bits. We could selectively disable debug packages for two PKGBUILDs and avoid all the worst offenders, if we needed to. heh.)

Packages which definitely would not (there are some big, high-profile packages here, and my custom repo doesn't reflect this sort of spread at all):

$ expac -H M '%m %n %a' | grep 'any$' | sort --human-numeric-sort
[...]
1202.02 MiB texlive-fontsextra any
1307.98 MiB texlive-fontsextra any
2006.67 MiB 0ad-data any
3112.39 MiB nltk-data any
5171.97 MiB kicad-library-3d any

...

Anyway, providing these symbols would be generally desirable for users, and ideally it would be opt-out to make it easier for users to get access to them. It's something we've generally wanted to do, see for example https://bugs.archlinux.org/task/38755

And it's possible we may actually, at long last, get around to implementing this.

So, question to mirror admins: if Arch were to add debug repositories, would you be okay syncing them? And should it be opt-in or opt-out?

The answers to these questions will influence the direction I take in trying to devise a satisfactory resolution to this outstanding infrastructure request. So I would love to get some input from the people who would be affected by such a change.

--
Eli Schwartz
Bug Wrangler and Trusted User
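As a rough cross-check of the "double or triple" estimate, the custom-repo figures quoted above (2.9 GiB total, 1.7 GiB of it debug packages) can be turned into a growth factor. This is only a sketch: the function names are made up for illustration, and a single small repository is not necessarily representative of the official ones.

```sh
#!/bin/sh
# Sketch only: growth_factor and debug_share are illustrative helper names.
# Inputs are TOTAL and DEBUG sizes in the same unit (GiB here).

# How many times larger the repo is with debug packages than without them.
growth_factor() {
    LC_ALL=C awk -v total="$1" -v debug="$2" \
        'BEGIN { printf "%.2f\n", total / (total - debug) }'
}

# Percentage of the total repo size taken up by debug packages.
debug_share() {
    LC_ALL=C awk -v total="$1" -v debug="$2" \
        'BEGIN { printf "%.1f\n", 100 * debug / total }'
}

# Figures from the custom repo mentioned in the message.
growth_factor 2.9 1.7   # prints 2.42
debug_share 2.9 1.7     # prints 58.6
```

For that one repository, adding debug packages makes it roughly 2.4 times the size of the non-debug packages alone, which sits inside the "double or triple" range.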
Hi Eli,
So, question to mirror admins: if Arch were to add debug repositories, would you be okay syncing them? And should it be opt-in or opt-out?
We (ftp.snt.utwente.nl) would be okay syncing them. I really don't mind whether it will be an opt-in or opt-out solution. I'm okay with an increase in size of the current mirror, but I also don't mind setting up an extra rsync-cronjob if that is what it takes to opt-in.

Kind regards,

Erwin Bronkhorst
SNT FTPCom
-----Original message-----
From: arch-mirrors <arch-mirrors-bounces@archlinux.org> On behalf of Eli Schwartz
Sent: Friday, 5 June 2020 21:58
To: arch-mirrors@archlinux.org
Subject: [arch-mirrors] Possibility of adding debug repositories
Hacking & Coffee would be interested, but make it a separate "opt-in" dataset, especially if it's unpredictable how much space it will take. If it were suddenly enabled and set to opt-out, I imagine some mirror clusters would have a sudden, unexpected storage problem.

On Sat, 6 Jun 2020, 17:12 Erwin Bronkhorst - Studenten Net Twente <erwin@snt.utwente.nl> wrote:
I (arlm.tyzoid.com) would prefer an opt-in solution. Such a method would allow us to evaluate in advance the storage requirements to make sure that we don't run out of storage space on the virtual disk.

How it's implemented with rsync is an open question. Would it exist as a new endpoint? If so, wouldn't it necessitate extra config on the apache/nginx side?

On Sat, Jun 6, 2020, 6:37 PM Tails Hon1nbo <hon1nbo+mirror@hackingand.coffee> wrote:
On 6/6/20 7:05 PM, Tyler Dence wrote:
I (arlm.tyzoid.com) would prefer an opt-in solution. Such a method would allow us to evaluate in advance the storage requirements to make sure that we don't run out of storage space on the virtual disk.
How it's implemented with rsync is an open question. Would it exist as a new endpoint? If so, wouldn't it necessitate extra config on the apache/nginx side?
On Sat, Jun 6, 2020, 6:37 PM Tails Hon1nbo <hon1nbo+mirror@hackingand.coffee> wrote:
Hacking & Coffee would be interested, but make it a separate "opt-in" dataset especially if it's unpredictable how much space it will take. If suddenly enabled, and set to opt-out, I imagine some mirror clusters would have a sudden, unexpected storage problem

The storage requirements should not rise sharply unless we do a mass rebuild that enables debug packages. I think a reasonable thing to do from the packaging side is to rebuild some critical libraries, but leave most packages to simply acquire debug packages whenever they are rebuilt for other reasons.
We could also give advance notice, e.g. something along the lines of "we will enable debug packages next month, make sure your disks can handle irregular growth of up to XXX GB or set the following rsync exclusion in advance and re-evaluate after the churn".

Would this be a reasonable way to handle it? It would have the advantage of not requiring new rsync endpoints and server configs.

--
Eli Schwartz
Bug Wrangler and Trusted User
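The rsync exclusion suggested above could look roughly like the following. This is a sketch under stated assumptions: the thread does not say how the debug repositories would be named, so the `*-debug/` pattern is a guess, and the upstream URL, local path, and the function name `mirror_sync_command` are placeholders. The function only prints the command rather than running it.

```sh
#!/bin/sh
# Placeholders, not real endpoints: substitute your own upstream and local path.
UPSTREAM='rsync://mirror.example.org/archlinux/'
TARGET='/srv/mirror/archlinux/'

# Print the rsync command a mirror might run.
# $1 = "skip-debug" to opt out of the debug repositories.
mirror_sync_command() {
    exclude=''
    if [ "$1" = "skip-debug" ]; then
        # ASSUMPTION: debug repos live in directories ending in "-debug".
        exclude=" --exclude='*-debug/'"
    fi
    printf 'rsync -rtlH --delete-after --safe-links%s %s %s\n' \
        "$exclude" "$UPSTREAM" "$TARGET"
}

mirror_sync_command skip-debug   # command with the exclusion
mirror_sync_command              # command without it (opt-out default)
```

With this approach a mirror keeps its existing rsync module and cron job; opting out is a single extra `--exclude` argument, which can be dropped again once disk usage has been re-evaluated.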
ftp.acc.umu.se is good with either decision. Having the default rsync module include the debug repos would be easiest for us (i.e. opt-out).

On Fri, 5 Jun 2020, Eli Schwartz wrote:
/Nikke
--
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | nikke@acc.umu.se
On June 5, 2020 16:58, Eli Schwartz wrote:
Hi Eli,

Did we even investigate debuginfod? I really don't think we should add -debug packages. I have been taking a look at it, and it couples well with our reproducible builds effort, since build IDs are used to search for symbols on the debuginfod server.

Regards,
Giancarlo Razzolini
On 6/8/20 6:12 AM, Giancarlo Razzolini wrote:
Hi Eli,
Did we even investigate debuginfod? I really don't think we should add -debug packages. I have been taking a look at it, and it couples well with our reproducible builds effort, since build IDs are used to search for symbols on the debuginfod server.
debuginfod is an on-demand proxy for debug packages, so we need to build them anyway, and host them somewhere. I don't see how this relates to reproducible builds, since the debug packages must still exist, and are reproducible (or not) either way. Though in order to reproduce the debug packages using tools like makerepropkg or archlinux-repro, you cannot use debuginfod...

Anyway, we still need some tool to accept debug packages, and clean up old ones or ones from packages which have been deleted. The most convenient way to do this is by adding them to a pacman repository.

pacman repositories have additional advantages, in that anyone can get at the underlying debug packages, mirror them to run their own debuginfod server, re-host them in a downstream distro (Parabola), etc.

Furthermore, users can install a debug package and pacman will upgrade it automatically (getting rid of the old version), instead of downloading it to a cache which goes stale and needs to be occasionally cleaned up. If you're often debugging a specific package or library, you can install debug symbols up front and it works without delay and without configuring $DEBUGINFOD_URLS.

There are pros and cons to both ways of using it. debuginfod is wonderful for one-shot debug sessions, or if you're not sure which symbols for which libraries you'll need.

tl;dr I believe we should publicly provide pacman repositories containing debug packages (Debian and Fedora also host repositories, and debuginfod was written by Fedora/Red Hat people...) and do so in a way that is convenient for users to actually use.

We can then run debuginfod on any internally maintained mirror, and tell it to index the debug repository's pool directory. Third parties could host their own federated debuginfod servers as well. Mozilla could more easily import entire packages into their debug symbol server for use when resolving user crash reports that touch system-linked libraries. etc.

--
Eli Schwartz
Bug Wrangler and Trusted User
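From a user's point of view, the two routes compared above might look like this. It is a sketch with assumed values: the server URL is a placeholder, `foo-debug` is a hypothetical package name, and `enable_debuginfod` is just an illustrative helper. Only `DEBUGINFOD_URLS` itself is taken from the message.

```sh
#!/bin/sh
# Route 1: on-demand symbols through a debuginfod server.
# ASSUMPTION: the URL passed in below is a placeholder, not a real service.
enable_debuginfod() {
    DEBUGINFOD_URLS="$1"
    export DEBUGINFOD_URLS
}

enable_debuginfod 'https://debuginfod.example.org'
echo "DEBUGINFOD_URLS=$DEBUGINFOD_URLS"

# With the variable exported, a debuginfod-aware debugger can fetch symbols
# as needed into a local cache (not run here):
#   gdb /usr/bin/some-program

# Route 2: install symbols up front from a pacman debug repository, so that
# pacman upgrades them together with the package (not run here).
# ASSUMPTION: "foo-debug" is a hypothetical package name.
#   pacman -S foo-debug
```

The first route suits one-off sessions where the needed libraries are not known in advance; the second avoids both the download delay and any environment configuration for packages that are debugged regularly.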
Hello,

This is not a concern for archlinux.mirror.digitalpacific.com.au.

I think it should be mandatory for Tier 1 mirrors to support whatever the Arch community requires, although for Tier 2 it should be opt-in.

Thanks,
Matthew.

On Sat, Jun 6, 2020 at 5:58 AM Eli Schwartz <eschwartz@archlinux.org> wrote:
participants (7)
- Eli Schwartz
- Erwin Bronkhorst - Studenten Net Twente
- Giancarlo Razzolini
- Matthew Taylor
- Niklas Edmundsson
- Tails Hon1nbo
- Tyler Dence