[arch-general] Why are Archlinux packages stripped of (debugging) symbols?
Hello,
Why is it that makepkg strips symbols by default, and many packagers even make extra effort to get packages stripped; instead of building with "-g"?
Even Go software, which by Go's design makes use of debugging symbols at run time had been stripped as far as I remember (although it seems that has changed, thankfully).
It is quite nice to have debugging symbols in executables for learning and entertainment purposes (seriously, try Ghidra or radare2 once), and they are, of course, indispensable when bad luck strikes and one actually has to debug.
And there do not seem to be any significant downsides to extra symbols, it just means more permanent storage and bandwidth used. Especially in view of Arch's existing packaging practice patterns, like no "-dev" or "-doc" split packages.
I know some developers have some degree of desire for split packages with stripped symbols in separate files, but that would indeed be inconsistent with the lack of "-dev" or "-doc" packages. More importantly, splitting symbols from executable files is most of the time a harmful complication: it makes packaging more complicated, it makes using the separated symbols by humans more complicated, and it makes using the debugging symbols from the program they belong to harder (ref. Ian Lance Taylor's libbacktrace, which does not work with symbols in a separate file, very possibly for reasons fundamental to libbacktrace's purpose).
To conclude: besides arguing for debugging symbols to be installed as part of executable files, I am honestly asking what are the reasons for the apparent aversion towards them in Arch's (and wider) culture (because I am curious about that).
Regards, Neven Sajko
On 1/21/20 3:21 PM, Neven Sajko via arch-general wrote:
Hello,
Why is it that makepkg strips symbols by default,
Because Arch Linux's default vendor options for makepkg.conf include the optional strip option.
and many packagers even make extra effort to get packages stripped; instead of building with "-g"?
Packagers do not go to extra effort for this. makepkg provides this as a tuneable, and any PKGBUILD is supposed to build with debug symbols when the "debug" makepkg.conf configuration option is set; if it does not, then the PKGBUILD has a bug that should be fixed.
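As a rough illustration of the two makepkg.conf options mentioned above, an excerpt might look like the following. The other entries in OPTIONS are an assumption based on a typical default and differ between pacman releases; only strip and debug matter here.

```shell
# /etc/makepkg.conf (excerpt; the file is read as bash)
# The other OPTIONS entries are assumed, check your own makepkg.conf.

# Typical default: strip binaries, do not build with debug info.
OPTIONS=(strip docs !libtool !staticlibs emptydirs zipman purge !debug)

# With both strip and debug enabled, makepkg adds the debug build flags
# and puts the detached symbols into a separate debug package.
OPTIONS=(strip docs !libtool !staticlibs emptydirs zipman purge debug)
```

Only one of the two assignments would appear in a real file; they are shown together for comparison.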
Even Go software, which by Go's design makes use of debugging symbols at run time had been stripped as far as I remember (although it seems that has changed, thankfully).
There is no "even", here. The golang programming language is not *atypical*, it should not receive abnormal treatment. I'm not sure what you mean by "design makes use of debugging symbols at runtime". They're debug symbols, not runtime logic symbols.
It is quite nice to have debugging symbols in executables for learning and entertainment purposes (seriously, try Ghidra or radare2 once), and they are, of course, indispensable when bad luck strikes and one actually has to debug.
It is very nice indeed! Splitdebug symbols work fine in gdb, and I believe in radare2 as well: https://github.com/radareorg/radare2/issues/5758 Of course, archlinux doesn't really provide splitdebug packages by default, so you cannot generally use them unless you're using your own packages...
And there do not seem to be any significant downsides to extra symbols, it just means more permanent storage and bandwidth used. Especially in view of Arch's existing packaging practice patterns, like no "-dev" or "-doc" split packages.
Headers and such are distributed along with the main package because by definition, they are needed as a core part of the project. Anyone who wants to build reverse dependencies needs them, the *only* people who don't need development headers are the people who never build packages themselves. There's a simple solution for such people: pacman.conf supports "NoExtract = usr/include/" Also, "NoExtract = usr/share/doc/" if you do not want the help documentation which many end users do in fact need. Debug symbols, on the other hand, are *always* unnecessary unless you are debugging. Moreover, they tend to result in dramatically increased package size. Headers are tiny, and docs often are (but we have lint checkers to warn us if abnormal packages contain mostly docs, and there are several packages that do indeed split out *-docs, so this is not an absolute!) Have you tried building, say, a web browser with debugging symbols?
I know some developers have some degree of desire for split packages with stripped symbols in separate files, but that would indeed be inconsistent with the lack of "-dev" or "-doc" packages. More importantly, splitting symbols from executable files is most of the time a harmful complication: it makes packaging more complicated,
No it does not, makepkg handles this transparently with absolutely no effort on the part of the maintainer. In fact, makepkg can programmatically split out debug packages using trivial logic when it *cannot* do so for development files (which include more than headers) or documentation (which is sort of kind of standard except not really), which may well be a contributing factor to why makepkg supports it at all. ;)
it makes using the separated symbols by humans more complicated, and it makes using the debugging symbols from the program they belong to harder (ref. Ian Lance Taylor's libbacktrace, which does not work with symbols in a separate file, very possibly for reasons fundamental to libbacktrace's purpose).
Perhaps libbacktrace should work more like gdb then? It works fine with gdb, and the ELF metadata has .gnu_debuglink for this exact purpose -- it's fundamental to binutils, see the objcopy manpage for example. You're saying it's "harder and more complicated" to use detached debug symbols, but I'm really not seeing it.
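The detached-symbols workflow described in the objcopy manpage can be sketched roughly as below. This is a minimal sketch, assuming GNU binutils is installed and the binary was built with -g; the helper name split_debug and the ".debug" file suffix are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: detach debug info from an ELF binary built with -g.
# Assumes GNU binutils (objcopy) is on PATH.
split_debug() {
    bin=$1
    # 1. Copy only the debug sections into a separate file.
    objcopy --only-keep-debug "$bin" "$bin.debug" || return 1
    # 2. Remove the debug sections from the binary itself.
    objcopy --strip-debug "$bin" || return 1
    # 3. Record the debug file's name and checksum in .gnu_debuglink.
    objcopy --add-gnu-debuglink="$bin.debug" "$bin"
}
```

After running `split_debug ./prog`, gdb should be able to pick up `prog.debug` from the same directory when it is started on `./prog`.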
To conclude: besides arguing for debugging symbols to be installed as part of executable files, I am honestly asking what are the reasons for the apparent aversion towards them in Arch's (and wider) culture (because I am curious about that).
They're *huge*, and the standard gdb, when used to execute a program or to inspect a coredump file, can seamlessly merge the detached debug data and display enhanced debug info. This works even when you only install the split -debug package using pacman, *after* the program crashes. The coredump contains all the info you need. Programs like firefox have extensive upstream tooling for telemetry, whereby heavily stripped programs are distributed to end users, and if the program crashes it can send the backtrace to Mozilla.org; this backtrace is then merged with the debug info which is on Mozilla's servers, to produce meaningful output. Users don't have to suffer huge downloads. (Mozilla's symbol server can also be used with a trivial gdb script to let gdb download the debug info on-demand, if you're debugging firefox.) The Arch maintainer for firefox actually does exactly this -- our firefox package is stripped, but the symbols are uploaded to Mozilla right after makepkg completes. -- Eli Schwartz Bug Wrangler and Trusted User
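For context on how gdb can find detached debug data after the fact: besides .gnu_debuglink, gdb also looks up debug files by the binary's build ID under its global debug directory. The sketch below only computes that path. The function name is hypothetical, and /usr/lib/debug is assumed as the global debug directory, which may be configured differently on a given system.

```shell
# Hypothetical helper: print the path where gdb looks for detached debug
# info by build ID. The second argument overrides the assumed default
# global debug directory, /usr/lib/debug.
debug_path_for_build_id() {
    id=$1
    dir=${2:-/usr/lib/debug}
    prefix=$(printf '%s' "$id" | cut -c1-2)   # first two hex digits
    rest=$(printf '%s' "$id" | cut -c3-)      # remaining hex digits
    printf '%s/.build-id/%s/%s.debug\n' "$dir" "$prefix" "$rest"
}

debug_path_for_build_id abcdef1234
# prints /usr/lib/debug/.build-id/ab/cdef1234.debug
```

A real build ID is much longer than this shortened example; it can be read from the binary's notes, for instance with `readelf -n`.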
There is no "even", here. The golang programming language is not *atypical*, it should not receive abnormal treatment.
I'm not sure what you mean by "design makes use of debugging symbols at runtime". They're debug symbols, not runtime logic symbols.
Golang (and libbacktrace) use DWARF for backtraces at runtime.
It is very nice indeed! Splitdebug symbols work fine in gdb, and I believe in radare2 as well: https://github.com/radareorg/radare2/issues/5758
Of course, archlinux doesn't really provide splitdebug packages by default, so you cannot generally use them unless you're using your own packages...
I would of course prefer split debugging symbols to no symbols at all.
Debug symbols, on the other hand, are *always* unnecessary unless you are debugging. Moreover, they tend to result in dramatically increased package size. Headers are tiny, and docs often are (but we have lint checkers to warn us if abnormal packages contain mostly docs, and there are several packages that do indeed split out *-docs, so this is not an absolute!)
Have you tried building, say, a web browser with debugging symbols?
Sorry, I did not mean to argue that absolutely all executables must be installed with debugging symbols. The ideal situation I am imagining is that if a packager thinks the debugging symbols would be too much for some executable in the package, she simply disables them and enables stripping for the whole package. But most executables are small and stripping their debugging symbols does not gain much.
No it does not, makepkg handles this transparently with absolutely no effort on the part of the maintainer.
I was actually referring to the fact that this feature was not available before because of libalpm limitations (I think that required hooks or something, and was only added recently?). Anyway, I am not saying this is some great issue, but it certainly somewhat increases complexity of some Arch projects. But maybe that complexity is good if it is not exclusively needed for this use case, thus on further thought I probably should have done more research before raising this particular point.
Perhaps libbacktrace should work more like gdb then? It works fine with gdb, and the ELF metadata has .gnu_debuglink for this exact purpose -- it's fundamental to binutils, see the objcopy manpage for example.
I assumed libbacktrace could not do that because of constraints on memory allocation (whether on stack or on heap) or reentrancy, but apparently it has that functionality since 2017. Oops.
You're saying it's "harder and more complicated" to use detached debug symbols, but I'm really not seeing it.
Depending on an arbitrary file determined by a path is complicated, there are all kinds of concerns, like async-signal-safety (one has to use open instead of fopen), getting the file before somebody overwrites it or moves it (or just changes a symlink) ...
They're *huge*, and the standard gdb, when used to execute a program or to inspect a coredump file, can seamlessly merge the detached debug data and display enhanced debug info. This works even when you only install the split -debug package using pacman, *after* the program crashes. The coredump contains all the info you need.
Programs like firefox have extensive upstream tooling for telemetry, whereby heavily stripped programs are distributed to end users, and if the program crashes it can send the backtrace to Mozilla.org; this backtrace is then merged with the debug info which is on Mozilla's servers, to produce meaningful output. Users don't have to suffer huge downloads.
(Mozilla's symbol server can also be used with a trivial gdb script to let gdb download the debug info on-demand, if you're debugging firefox.)
The Arch maintainer for firefox actually does exactly this -- our firefox package is stripped, but the symbols are uploaded to Mozilla right after makepkg completes.
Well this is certainly *complicated*. But it is warranted because of the great size difference, most packages don't need this and could include debugging symbols, I think. To reiterate, I certainly think that split debugging symbols in split packages in official repos would be an improvement; but I would like to know why more packages are not built with included debugging symbols. Do you think that, e.g., all packages in "core" being built with debugging symbols would be OK? Maybe it would be OK if just function names were included, without source file line info? Sidenote: Do you know why split debug packages are not yet available? Regards, Neven Sajko
One thing that I should have said right away is that one can not know in advance when and which executable he will need to debug.
Regarding the firefox example, are the split debugging symbols files publicly available?
On 1/21/20 6:00 PM, Neven Sajko wrote:
Regarding the firefox example, are the split debugging symbols files publicly available?
Mozilla's symbol server is described here: https://developer.mozilla.org/en-US/docs/Mozilla/Using_the_Mozilla_symbol_se... -- Eli Schwartz Bug Wrangler and Trusted User
Hi Neven, On Tue, 21 Jan 2020, 23:58 Neven Sajko via arch-general, <arch-general@archlinux.org> wrote:
One thing that I should have said right away is that one can not know in advance when and which executable he will need to debug.
Clear Linux uses a daemon installed in the client to make debug symbols automatically available on access. The details can be found here: https://docs.01.org/clearlinux/latest/guides/clear/debug.html Best Regards, Tobias
The reason I'd really like native packages to be built with split symbols, even if they aren't included in the package but are available through some other means, is so that bug wranglers can more easily make sense of traces/coredumpctl info output, where rebuilding the package would just be a hassle and potentially result in different symbols, which defeats the point. Maybe one day users could submit coredumps / backtraces to a webservice that would reference the symbols, and "bucket" the traces to help triage/identify unique crashes.
On 22/01/2020 14.36, Justin Capella via arch-general wrote:
point. Maybe one day users could submit coredumps / backtraces to a webservice that would reference the symbols, and "bucket" the traces to help triage/identify unique crashes
For me it would be better if I could just download debugging symbols using pacman and analyze coredumps locally. Regards, Łukasz
On 1/21/20 5:44 PM, Neven Sajko wrote:
There is no "even", here. The golang programming language is not *atypical*, it should not receive abnormal treatment.
I'm not sure what you mean by "design makes use of debugging symbols at runtime". They're debug symbols, not runtime logic symbols.
Golang (and libbacktrace) use DWARF for backtraces at runtime.
Ah, should've guessed. :D That's a nice extra, but I'd still suspect that opening a coredump in gdb or similar is even better than getting a pretty backtrace on exit.
It is very nice indeed! Splitdebug symbols work fine in gdb, and I believe in radare2 as well: https://github.com/radareorg/radare2/issues/5758
Of course, archlinux doesn't really provide splitdebug packages by default, so you cannot generally use them unless you're using your own packages...
I would of course prefer split debugging symbols to no symbols at all.
Debug symbols, on the other hand, are *always* unnecessary unless you are debugging. Moreover, they tend to result in dramatically increased package size. Headers are tiny, and docs often are (but we have lint checkers to warn us if abnormal packages contain mostly docs, and there are several packages that do indeed split out *-docs, so this is not an absolute!)
Have you tried building, say, a web browser with debugging symbols?
Sorry, I did not mean to argue that absolutely all executables must be installed with debugging symbols. The ideal situation I am imagining is that if a packager thinks the debugging symbols would be too much for some executable in the package, she simply disables them and enables stripping for the whole package. But most executables are small and stripping their debugging symbols does not gain much.
No it does not, makepkg handles this transparently with absolutely no effort on the part of the maintainer.
I was actually referring to the fact that this feature was not available before because of libalpm limitations (I think that required hooks or something, and was only added recently?). Anyway, I am not saying this is some great issue, but it certainly somewhat increases complexity of some Arch projects. But maybe that complexity is good if it is not exclusively needed for this use case, thus on further thought I probably should have done more research before raising this particular point.
It's not a libalpm limitation. :) pacman doesn't know or care about this, it just appears as a package. You can see how this works for the glib2/gtk3 packages using my personal repo: https://wiki.archlinux.org/index.php/Debug_-_Getting_Traces#Gtk3/glib2 The changes needed to handle debug packages would be all in the dbscripts project, and would amount to tracking the packages when they are added, and dispatching them to their own repository e.g. in [community-debug]
Perhaps libbacktrace should work more like gdb then? It works fine with gdb, and the ELF metadata has .gnu_debuglink for this exact purpose -- it's fundamental to binutils, see the objcopy manpage for example.
I assumed libbacktrace could not do that because of constraints on memory allocation (whether on stack or on heap) or reentrancy, but apparently it has that functionality since 2017. Oops.
Cool. ;) Has golang also grown that feature?
You're saying it's "harder and more complicated" to use detached debug symbols, but I'm really not seeing it.
Depending on an arbitrary file determined by a path is complicated, there are all kinds of concerns, like async-signal-safety (one has to use open instead of fopen), getting the file before somebody overwrites it or moves it (or just changes a symlink) ...
Eh, I don't really think you need to worry about people overwriting or moving it, we're dealing with package manager managed files. You'd need to have the same worries about plugins which are loaded via dlopen(), or programming languages that use script interpreters rather than ld.so -- you can just assume there is consistency managed at the OS layer.
They're *huge*, and the standard gdb, when used to execute a program or to inspect a coredump file, can seamlessly merge the detached debug data and display enhanced debug info. This works even when you only install the split -debug package using pacman, *after* the program crashes. The coredump contains all the info you need.
Programs like firefox have extensive upstream tooling for telemetry, whereby heavily stripped programs are distributed to end users, and if the program crashes it can send the backtrace to Mozilla.org; this backtrace is then merged with the debug info which is on Mozilla's servers, to produce meaningful output. Users don't have to suffer huge downloads.
(Mozilla's symbol server can also be used with a trivial gdb script to let gdb download the debug info on-demand, if you're debugging firefox.)
The Arch maintainer for firefox actually does exactly this -- our firefox package is stripped, but the symbols are uploaded to Mozilla right after makepkg completes.
Well this is certainly *complicated*. But it is warranted because of the great size difference, most packages don't need this and could include debugging symbols, I think.
To reiterate, I certainly think that split debugging symbols in split packages in official repos would be an improvement; but I would like to know why more packages are not built with included debugging symbols. Do you think that, e.g., all packages in "core" being built with debugging symbols would be OK? Maybe it would be OK if just function names were included, without source file line info?
Sidenote: Do you know why split debug packages are not yet available?
I'm personally not a fan of bloating packages even by 10% or whatever for debug symbols that many users don't need. As I said above, split debug packages need "dbscripts" support to make sure they are correctly handled by our repository-building scripts. If dbscripts supported it, we could enable the debug option in devtools' makepkg.conf and start building all packages with debug info. (Patches welcome!) -- Eli Schwartz Bug Wrangler and Trusted User
Em janeiro 21, 2020 20:05 Eli Schwartz via arch-general escreveu:
I'm personally not a fan of bloating packages even by 10% or whatever for debug symbols that many users don't need.
Me neither.
As I said above, split debug packages need "dbscripts" support to make sure they are correctly handled by our repository-building scripts.
If dbscripts supported it, we could enable the debug option in devtools' makepkg.conf and start building all packages with debug info.
(Patches welcome!)
I wouldn't be opposed to having something like tecken [0] or some other software for this (not sure if there is one) where we would upload all the symbol artifacts for Arch built packages and that users could use when needed. This wouldn't require changing either dbscripts or devtools, but it would be helpful if devtools had some function to facilitate this. This solution wouldn't bloat either our packages or our mirrors and would be useful to all Arch users. Of course, we could keep only the last X versions on this service; I don't see the point in having something like the Arch Linux Archive where we try to preserve everything. Regards, Giancarlo Razzolini [0] https://github.com/mozilla-services/tecken
On 1/24/20 2:29 PM, Giancarlo Razzolini wrote:
I wouldn't be opposed to having something like tecken [0] or some other software for this (not sure if there is one) where we would upload all the symbol artifacts for Arch built packages and that users could use when needed.
This wouldn't require changing either dbscripts or devtools, but it would be helpful if devtools had some function to facilitate this. This solution wouldn't bloat either our packages or our mirrors and would be useful to all Arch users. Of course, we could keep only the last X versions on this service; I don't see the point in having something like the Arch Linux Archive where we try to preserve everything.
Regards, Giancarlo Razzolini
Oooh, that's actually a really interesting idea. I bet we could make this just consume a foo-debug package, then we could just modify devtools to add OPTIONS+=('debug') in makepkg.conf, and have commitpkg upload the debug package separately to the symbol server. -- Eli Schwartz Bug Wrangler and Trusted User
Em janeiro 24, 2020 16:41 Eli Schwartz via arch-general escreveu:
Oooh, that's actually a really interesting idea. I bet we could make this just consume a foo-debug package, then we could just modify devtools to add OPTIONS+=('debug') in makepkg.conf, and have commitpkg upload the debug package separately to the symbol server.
I thought more of a "build and they shall come" approach where we put this server out there and create a todo for people to upload symbols. Of course, -any packages don't need any changes; also, I don't think all packages would require symbols to be uploaded. Of course, if this is relatively low-hanging fruit to implement in our tooling, why not? I'm just not sure yet if tecken is usable outside mozilla things, but let's see. I have also been discussing this with a KDE dev who approached me about it and was unaware of this thread (and other discussions we had in the past). And he'll also propose that KDE implements a symbols server, regardless of what we do. But, let me say again, I'm completely against having actual debug packages. Regards, Giancarlo Razzolini
participants (6)
- Eli Schwartz
- Giancarlo Razzolini
- Justin Capella
- Neven Sajko
- Tobias Hunger
- Łukasz Michalski