[pacman-dev] [PATCH] makepkg: Include more source files in debug packages.
Currently only the file pointed to by the DW_AT_name is included as a source file in debug packages. This means many files that are useful for debugging are not included. For example, no header files are included but yet these may by referenced in the .debug_line section. This sed script converts into shell variables the debug dump information from readelf about compilation units, directory tables and file tables. This can be used to get the full path of all the source files from within the package being compiled that are referenced in the debugging information. Also, placeholder symbols (e.g. <builtin>) and paths outside the current source (e.g. linked libraries) will be more consistently ignored from inclusion in the debug packages. Signed-off-by: Austin Lund <austin.lund@gmail.com> --- scripts/libmakepkg/tidy/strip.sh.in | 46 +++++++++++++++++++++++++++-- 1 file changed, 44 insertions(+), 2 deletions(-) diff --git a/scripts/libmakepkg/tidy/strip.sh.in b/scripts/libmakepkg/tidy/strip.sh.in index 2b6f732d..2598d606 100644 --- a/scripts/libmakepkg/tidy/strip.sh.in +++ b/scripts/libmakepkg/tidy/strip.sh.in @@ -35,9 +35,51 @@ build_id() { LANG=C readelf -n "$1" | sed -n '/Build ID/ { s/.*: //p; q; }' } +_source_files_sedscript='1 { a \ +declare -A COMPDIR +} +/^Contents of the .debug_info section:$/,/^[^[:space:]]/ { +/^ <0>.*\(DW_TAG_compile_unit\)$/,/^ {0,3}[^[:space:]]/{ + /^ <0>.*\(DW_TAG_compile_unit\)$/{h;d} + /^ {0,3}[^[:space:]]/{ + x + s/(.*)/\n&\n/ + s/.*DW_AT_comp_dir[^\n]*: ([^\n]*)\n.*/'\''\1'\''&/ + s/.*DW_AT_stmt_list[^\n]*: ([^\n]*)\n.*/COMPDIR\[\1\]=&/ + P;d + } + H + d +} +} +/^Raw dump of debug contents of section .debug_line:$/,/^[^[:space:]]/ { +/^ Offset:/,/^ Line Number Statements:$/ { + /^ Offset:/ { + a \ +unset DIRTABLE\ +unset FILETABLE + + s/.*(0x[[:xdigit:]]+)$/OFFSET=\1/; p; d } + /^ The Directory Table/,/^$/ { + /^ [[:digit:]]+\t[^/.]/ { + s/^ ([[:digit:]]+)\t(.*)/DIRTABLE[\1]=\$\{COMPDIR\[\$OFFSET]\}\/\2/ + p; d + } + d + } + /^ The File Name Table/,/^$/ { + /^ [[:digit:]]+\t[1-9]/ { + s/^ [[:digit:]]+\t([[:digit:]]+)\t.*\t(.*)$/\[\[ -v DIRTABLE\[\1\] \]\] \&\& echo \$\{DIRTABLE\[\1\]\}\/\2/ + p; d + } + d + } +} +}' + source_files() { LANG=C readelf "$1" --debug-dump | \ - awk '/DW_AT_name +:/{name=$8}/DW_AT_comp_dir +:/{{if (name == "<artificial>") next}{if (name !~ /^[<\/]/) {printf "%s/", $8}}{print name}}' + sed -n -E -e "${_source_files_sedscript}" | /bin/bash } strip_file() { @@ -60,7 +102,7 @@ strip_file() { while IFS= read -r t; do f=${t/${dbgsrcdir}/"$srcdir"} mkdir -p "${dbgsrc/"$dbgsrcdir"/}${t%/*}" - cp -- "$f" "${dbgsrc/"$dbgsrcdir"/}$t" + cp -n -- "$f" "${dbgsrc/"$dbgsrcdir"/}$t" done < <(source_files "$binary") # copy debug symbols to debug directory -- 2.25.0
On 2/19/20 8:51 PM, Austin Lund wrote:
Currently only the file pointed to by the DW_AT_name is included as a source file in debug packages. This means many files that are useful for debugging are not included. For example, no header files are included but yet these may by referenced in the .debug_line section.
Why do we need headers? headers by definition are supposed to declare things that are then defined by a source file which is in DW_AT_name.
This sed script converts into shell variables the debug dump information from readelf about compilation units, directory tables and file tables. This can be used to get the full path of all the source files from within the package being compiled that are referenced in the debugging information. Also, placeholder symbols (e.g. <builtin>) and paths outside the current source (e.g. linked libraries) will be more consistently ignored from inclusion in the debug packages.
I... don't really follow what this sed thing is doing. Is it printing out a bash script? Why? -- Eli Schwartz Bug Wrangler and Trusted User
On Thu, 20 Feb 2020 at 13:41, Eli Schwartz <eschwartz@archlinux.org> wrote:
On 2/19/20 8:51 PM, Austin Lund wrote:
Currently only the file pointed to by the DW_AT_name is included as a source file in debug packages. This means many files that are useful for debugging are not included. For example, no header files are included but yet these may by referenced in the .debug_line section.
Why do we need headers? headers by definition are supposed to declare things that are then defined by a source file which is in DW_AT_name.
Some function definitions are presented within header files. For example, people will declare short inline functions in header files. The line debugging information then points into the header files. Just a random example: gdb ls b initialize_exit_failure r does_not_exist A code listing in GDB should point to the inline function definition in system.h in coreutils But currently that's missing from a built coreutils-debug package.
This sed script converts into shell variables the debug dump information from readelf about compilation units, directory tables and file tables. This can be used to get the full path of all the source files from within the package being compiled that are referenced in the debugging information. Also, placeholder symbols (e.g. <builtin>) and paths outside the current source (e.g. linked libraries) will be more consistently ignored from inclusion in the debug packages.
I... don't really follow what this sed thing is doing. Is it printing out a bash script? Why?
A fairly complex state needs to be saved to give a output useful for packaging. The things that are needed are the compilation directory for each compilation unit and the file and directory tables for the corresponding compilation units. The way this is structured in the file and the way readelf presents this information isn't totally linear. DW_AT_comp_dir is in the .debug_info section whilst the file and directory tables are in the .debug_line section where the file table refers to the directory table. readelf tries to resolve things with the --debug-dump=decodedline option. But the compilation directories are lost and the directory tables aren't used consistently in the output (at least in my version of readelf (i.e. 2.34). Keeping track of this state information is completely impractical in sed, and confusing in awk. Using bash associative arrays was an easy solution that came to mind, so I just used that and parsed the output in a subshell. Perhaps there is a better way.
On 20/2/20 11:51 am, Austin Lund wrote:
Currently only the file pointed to by the DW_AT_name is included as a source file in debug packages. This means many files that are useful for debugging are not included. For example, no header files are included but yet these may by referenced in the .debug_line section.
This sed script converts into shell variables the debug dump information from readelf about compilation units, directory tables and file tables. This can be used to get the full path of all the source files from within the package being compiled that are referenced in the debugging information. Also, placeholder symbols (e.g. <builtin>) and paths outside the current source (e.g. linked libraries) will be more consistently ignored from inclusion in the debug packages.
Signed-off-by: Austin Lund <austin.lund@gmail.com> ---
So... that sed script is horrendous! But let me see if I understand this correctly. We currently only look at the .debug_info section, finding DW_AT_name/DW_AT_comp_dir pair to grab file names. That appears to get the main compilation units, but misses header files. It looks like your sed does something slightly different to get that info, although I can't tell if there is a functional difference. Your patch additionally looks at the .debug_line section. This section has a table of directories that source files come from (which can be filtered to remove system directories), and a file name table with files from each directory. This does not include the files we currently grab. But there must be something I am missing... For the example of "ls" I see "selinux.h" in from directory "./lib/selinux" in that .debug_line output. Your script does not include this file. Using the rpmtool "debugedit -l" does include that file in its file list. There are quite a few other examples. Am I on the right track? Can you clarify what I am missing here? Thanks, Allan
On Mon, Feb 24, 2020 at 03:56:58PM +1000, Allan McRae wrote:
On 20/2/20 11:51 am, Austin Lund wrote:
Currently only the file pointed to by the DW_AT_name is included as a source file in debug packages. This means many files that are useful for debugging are not included. For example, no header files are included but yet these may by referenced in the .debug_line section.
This sed script converts into shell variables the debug dump information from readelf about compilation units, directory tables and file tables. This can be used to get the full path of all the source files from within the package being compiled that are referenced in the debugging information. Also, placeholder symbols (e.g. <builtin>) and paths outside the current source (e.g. linked libraries) will be more consistently ignored from inclusion in the debug packages.
Signed-off-by: Austin Lund <austin.lund@gmail.com> ---
So... that sed script is horrendous! But let me see if I understand this correctly.
We currently only look at the .debug_info section, finding DW_AT_name/DW_AT_comp_dir pair to grab file names. That appears to get the main compilation units, but misses header files. It looks like your sed does something slightly different to get that info, although I can't tell if there is a functional difference.
Your patch additionally looks at the .debug_line section. This section has a table of directories that source files come from (which can be filtered to remove system directories), and a file name table with files from each directory. This does not include the files we currently grab.
But there must be something I am missing... For the example of "ls" I see "selinux.h" in from directory "./lib/selinux" in that .debug_line output. Your script does not include this file. Using the rpmtool "debugedit -l" does include that file in its file list. There are quite a few other examples.
Am I on the right track? Can you clarify what I am missing here?
It would seem I made a bad assmption about what was in and not in the current source tree. There will be references to libc files that appear in the output as a relative path. My thought was that if there wasn't a preceeding './' then it must refer to another directory. Anyway, yes. You are on the right track. My hope was that you can install the debug package and get all available source listings. Without these functions within the header files, that isn't possible. What's a better approach to parsing the readelf output? Awk?
On 26/2/20 11:00 am, Austin Lund wrote:
On Mon, Feb 24, 2020 at 03:56:58PM +1000, Allan McRae wrote:
On 20/2/20 11:51 am, Austin Lund wrote:
Currently only the file pointed to by the DW_AT_name is included as a source file in debug packages. This means many files that are useful for debugging are not included. For example, no header files are included but yet these may by referenced in the .debug_line section.
This sed script converts into shell variables the debug dump information from readelf about compilation units, directory tables and file tables. This can be used to get the full path of all the source files from within the package being compiled that are referenced in the debugging information. Also, placeholder symbols (e.g. <builtin>) and paths outside the current source (e.g. linked libraries) will be more consistently ignored from inclusion in the debug packages.
Signed-off-by: Austin Lund <austin.lund@gmail.com> ---
So... that sed script is horrendous! But let me see if I understand this correctly.
We currently only look at the .debug_info section, finding DW_AT_name/DW_AT_comp_dir pair to grab file names. That appears to get the main compilation units, but misses header files. It looks like your sed does something slightly different to get that info, although I can't tell if there is a functional difference.
Your patch additionally looks at the .debug_line section. This section has a table of directories that source files come from (which can be filtered to remove system directories), and a file name table with files from each directory. This does not include the files we currently grab.
But there must be something I am missing... For the example of "ls" I see "selinux.h" in from directory "./lib/selinux" in that .debug_line output. Your script does not include this file. Using the rpmtool "debugedit -l" does include that file in its file list. There are quite a few other examples.
Am I on the right track? Can you clarify what I am missing here?
It would seem I made a bad assmption about what was in and not in the current source tree. There will be references to libc files that appear in the output as a relative path. My thought was that if there wasn't a preceeding './' then it must refer to another directory.
Anyway, yes. You are on the right track. My hope was that you can install the debug package and get all available source listings. Without these functions within the header files, that isn't possible.
What's a better approach to parsing the readelf output? Awk?
Short answer... I'm not sure of the best approach. I'd just like it to be readable. I have file a bug: https://bugs.archlinux.org/task/65677
participants (3)
-
Allan McRae
-
Austin Lund
-
Eli Schwartz