[arch-mirrors] Possibility of adding debug repositories

Eli Schwartz eschwartz at archlinux.org
Fri Jun 12 18:20:46 UTC 2020


On 6/8/20 6:12 AM, Giancarlo Razzolini wrote:
> On June 5, 2020 16:58, Eli Schwartz wrote:
>> If Arch Linux were to add repositories containing split debug packages
>> for all our x86_64 packages, this would obviously add a fair amount of
>> space to the mirror requirements. It could possibly double or triple the
>> size taken by non-data packages. I don't have real-world numbers for how
>> much space it would take up, but I do have some comparisons.
>>
>> For my custom repo as a sample, which I do upload debug packages for, I
>> am using 2.9Gi of space, and 1.7Gi comes from debug packages.
>>
>> On the other side of things, our biggest 2 official packages are cuda,
>> 4137.96 MiB of proprietary blobs that aren't currently stripped and
>> therefore even if we did split it into a debug package it wouldn't
>> increase space usage at all, and kicad-library-3d, 5171.97 MiB of pure
>> data in /usr/share, which is not eligible for debug packages anyway.
>>
>> A naive list of packages which would probably generate debug packages:
>>
>> $ expac -H M '%m %n %a' | grep -v 'any$' | sort --human-numeric-sort
>> [...]
>> 1109.50 MiB emscripten x86_64
>> 1136.27 MiB python-tensorflow x86_64
>> 1145.23 MiB python-tensorflow-opt x86_64
>> 1286.69 MiB python-pytorch-cuda x86_64
>> 1289.44 MiB python-pytorch-opt-cuda x86_64
>> 1518.42 MiB ghc-static x86_64
>> 2589.23 MiB python-tensorflow-cuda x86_64
>> 2597.38 MiB python-tensorflow-opt-cuda x86_64
>> 3757.68 MiB tensorflow-cuda x86_64
>> 3765.75 MiB tensorflow-opt-cuda x86_64
>> 4137.96 MiB cuda x86_64
>>
>> (Basically all of the really big stuff is tensorflow/cuda/machine
>> learning bits. We could selectively disable debug packages for two
>> PKGBUILDs and avoid all the worst offenders, if we needed to. heh.)
>>
>> Packages which definitely would not (there's some big, high-profile
>> packages here, and my custom repo doesn't reflect this sort of spread at
>> all):
>>
>> $ expac -H M '%m %n %a' | grep 'any$' | sort --human-numeric-sort
>> [...]
>> 1202.02 MiB texlive-fontsextra any
>> 1307.98 MiB texlive-fontsextra any
>> 2006.67 MiB 0ad-data any
>> 3112.39 MiB nltk-data any
>> 5171.97 MiB kicad-library-3d any
>>
>> ...
>>
>> Anyway, providing these symbols would be generally desirable for users,
>> and ideally it would work opt-out to make it easier for users to get
>> access to them. It's something we've generally wanted to do, see for
>> example https://bugs.archlinux.org/task/38755
>> And it's possible we may actually, at long last, get around to
>> implementing this.
>>
>> So, question to mirror admins: if Arch were to add debug repositories,
>> would you be okay syncing them? And should it be opt-in or opt-out?
>>
>> The answers to these questions will influence the direction I will take
>> in trying to devise a satisfactory resolution to this outstanding
>> infrastructure request. So I would love to get some input from the
>> people who would be affected by such a change.
>>
> 
> Hi Eli,
> 
> Did we even investigate debuginfod? I really don't think we should add
> -debug packages. I have been taking a look at it, and it couples well
> with our reproducible builds effort, since build IDs are used to search
> for symbols on the debuginfod server.

debuginfod is an on-demand proxy for debug packages, so we need to build
them anyway, and host them somewhere.

I don't see how this relates to reproducible builds, since the debug
packages must still exist, and are reproducible -- or not -- either way.
Though in order to reproduce the debug packages using tools like
makerepropkg or archlinux-repro, you cannot use debuginfod...

Anyway, we still need some tool to accept debug packages, and clean up
old ones or ones from packages which have been deleted. The most
convenient way to do this is by adding it to a pacman repository.
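
To illustrate the cleanup half of that job, here is a minimal sketch of
what such a tool has to work out. It assumes two plain-text lists with
one package name per line and a "-debug" suffix on debug package names.
The function name and file layout are made up for illustration, this is
not how any existing repository tooling does it, and it only covers
deleted packages, not outdated versions.

```shell
#!/bin/sh
# stale_debug_packages CURRENT_LIST DEBUG_LIST
# Hypothetical helper: prints every debug package in DEBUG_LIST whose
# parent package no longer appears in CURRENT_LIST.
stale_debug_packages() {
    current_list=$1
    debug_list=$2
    while IFS= read -r dbg; do
        # Strip the assumed "-debug" suffix to get the parent package name.
        parent=${dbg%-debug}
        # Exact whole-line match against the packages still in the repo.
        grep -qxF -- "$parent" "$current_list" || printf '%s\n' "$dbg"
    done < "$debug_list"
}
```

With "foo" and "bar" in the first list and "foo-debug" and "baz-debug"
in the second, this prints "baz-debug".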

pacman repositories have additional advantages, in that anyone can get
at the underlying debug packages, mirror them to run their own
debuginfod server, re-host them in a downstream distro (Parabola), etc.

Furthermore, users can install a debug package and pacman will upgrade
it automatically (getting rid of the old version), instead of
downloading it to a cache which goes stale and needs to be occasionally
cleaned up. If you're often debugging a specific package or library, you
can install debug symbols up front and it works without delay and
without configuring $DEBUGINFOD_URLS.
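
As a rough sketch of the two routes, assuming a debug repository that
ships "<pkgname>-debug" packages and a debuginfod server at a
placeholder URL (neither exists yet, and both helper names are
hypothetical):

```shell
#!/bin/sh
# Sketch only: the package naming and the server URL are assumptions.

# Up-front route: install the split debug package once; pacman then
# upgrades it together with everything else. Not called here.
install_debug_symbols() {
    pacman -S --needed "$1-debug"
}

# On-demand route: point debuginfod clients (gdb, elfutils tools) at one
# or more servers. DEBUGINFOD_URLS is a space-separated list of URLs.
use_debuginfod() {
    DEBUGINFOD_URLS="$*"
    export DEBUGINFOD_URLS
}

use_debuginfod "https://debuginfod.example.org"
printf '%s\n' "$DEBUGINFOD_URLS"
```

The first route costs disk space for every installed debug package but
needs no network at debug time; the second fetches only what a given
session asks for.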

There are pros and cons to both ways of using it. debuginfod is
wonderful for one-shot debug sessions, or if you're not sure which
symbols for which libraries you'll need.

tl;dr I believe we should publicly provide pacman repositories
containing debug packages (Debian and Fedora also host repositories, and
debuginfod was written by Fedora/Red Hat people...) and do so in a way
that is convenient for users to actually use.

We can then run debuginfod on any internally maintained mirror, and tell
it to index the debug repository's pool directory. Third parties could
host their own federated debuginfo servers as well. Mozilla could more
easily import entire packages into their debug symbol server for use
when resolving user crash reports that touch system-linked libraries. etc.
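
For the mirror side, something along these lines might work. This is an
untested sketch: the pool path and port are placeholders, the helper
only prints the command rather than running it, and it assumes an
elfutils recent enough to have the -Z archive option, so check
debuginfod(8) before relying on it.

```shell
#!/bin/sh
# Sketch only: prints a debuginfod command line instead of running it.
# debuginfod_cmd PORT POOL_DIR
debuginfod_cmd() {
    port=$1
    pool_dir=$2
    # -p PORT    : HTTP port to listen on
    # -Z EXT=CMD : scan files ending in EXT as archives, unpacked via CMD
    printf 'debuginfod -p %s -Z .pkg.tar.zst=zstdcat %s\n' "$port" "$pool_dir"
}

# Hypothetical pool directory of a debug repository on a mirror.
debuginfod_cmd 8002 /srv/mirror/pool/packages-debug
```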

-- 
Eli Schwartz
Bug Wrangler and Trusted User
