[arch-dev-public] [idea] global link database for all packages
Hi,

I just got an idea which might be worth thinking about. Namcap is quite useful, but because it only sees one package file at a time it cannot answer every question.

The idea is to create a database (similar to the file lists we already create) which lists files and the libraries they are linked to. Dumping this together with our pkgdb and file lists into a huge SQL db, we can answer questions such as:

* What are the hidden deps of a package?
* Which packages need a rebuild if I bump a certain package?
* Find missing deps
* What happens if I remove a dep from a package?
** e.g. if I remove openssl as a dep of Qt, I need to review every package that directly or indirectly depends on Qt and check whether it needs openssl (which was hidden by the Qt dep before)
* Maybe check for soname conflicts or packages providing the same one? (libgl, java)
* Check a package's deps without the need to actually install its deps. (like namcap)

I made a quick and dirty script based on the createFileLists script: http://users.archlinux.de/~pierre/tmp/createLinkLists.txt

It should work fine, even incrementally, but its runtime is awful. So, if you think this might be a good idea, there is a lot of room for improvement. I don't know if it's possible with bash, but we don't really need to extract every file:

* check if a file is an ELF file
* extract the header
* move to the next file

-- 
Pierre Schmitz, http://users.archlinux.de/~pierre
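The "check, read the header, move on" loop above could look roughly like the following in Python. This is only a sketch and not the createLinkLists script: the function name `elf_members` is made up for illustration, and it assumes the package is a tar archive readable by Python's `tarfile` module. Only the first four bytes of each regular file are read and compared against the ELF magic number, and nothing is unpacked to disk.

```python
import io
import tarfile

ELF_MAGIC = b"\x7fELF"  # the first four bytes of every ELF file


def elf_members(pkg_fileobj):
    """Yield the names of regular files in a package tarball that look like ELF files.

    pkg_fileobj is a binary file object for the (possibly compressed) tarball.
    """
    with tarfile.open(fileobj=pkg_fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, symlinks, device nodes, ...
            data = tar.extractfile(member)
            # Read just the magic bytes, then move on to the next member.
            if data is not None and data.read(4) == ELF_MAGIC:
                yield member.name
```

A caller would open the package in binary mode and iterate, e.g. `with open(path, "rb") as f: names = list(elf_members(f))`, where `path` is whatever package file is being examined.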
On Wed, Jun 24, 2009 at 15:36, Pierre Schmitz<pierre@archlinux.de> wrote:
* check if a file is an elf file
* extract the header
* move to next file

I think `file` and `readelf` would help.
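As an illustration of the `readelf` suggestion, here is a small Python sketch that collects the NEEDED entries printed by `readelf -d`. The helper names `parse_needed` and `needed_libs` are invented for this example. It assumes binutils' `readelf` is on the PATH and forces the C locale so the output text is not translated.

```python
import os
import re
import subprocess

# Matches dynamic-section lines such as:
#  0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[(.+?)\]")


def parse_needed(readelf_output):
    """Return the library names from the NEEDED lines of `readelf -d` output."""
    return NEEDED_RE.findall(readelf_output)


def needed_libs(path):
    """Run `readelf -d` on path; return [] if readelf fails or finds no NEEDED entries."""
    result = subprocess.run(
        ["readelf", "-d", path],
        capture_output=True,
        text=True,
        env=dict(os.environ, LC_ALL="C"),  # keep the output untranslated
    )
    if result.returncode != 0:
        return []
    return parse_needed(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it possible to check the regular expression against captured output without having any binaries at hand.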
On Wednesday 24 June 2009 23:00:02 Daenyth Blank wrote:
On Wed, Jun 24, 2009 at 15:36, Pierre Schmitz<pierre@archlinux.de> wrote:
* check if a file is an elf file * extract the header * move to next file
I think `file` and `readelf` would help.
file is several times slower than readelf. But I have found some other optimizations:

* Only extract files from /opt /lib /sbin /bin /usr/lib /usr/sbin /usr/bin
* Run the script for both arches in parallel
* Treat links to libs and executables as files (this way we can check for exact sonames)
* The script reuses results from previous runs and as a result should be fast enough to be run on gerolde (I can provide an initial data set)

The resulting db files are quite small: 20 KB for core and 320 KB for extra.

So, what do you think about adding this to our cron jobs? We could run it on a daily basis and update the db files. It should be easy to write clients which check for possible rebuild candidates and do all kinds of integrity checks.

-- 
Pierre Schmitz, http://users.archlinux.de/~pierre
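The directory restriction from the first optimization could be expressed as a simple prefix test. This is a sketch only: the name `in_scan_dirs` is hypothetical, and it assumes that "from /usr/lib" means the directory itself and everything below it, so a path such as usr/libexec/... would not match.

```python
import posixpath

# The directories named in the message, written without the leading slash
# because package archives usually store relative member paths.
SCAN_DIRS = ("opt", "lib", "sbin", "bin", "usr/lib", "usr/sbin", "usr/bin")


def in_scan_dirs(member_name):
    """Return True if an archive member lies in one of the scanned directories."""
    path = posixpath.normpath(member_name).lstrip("/")
    return any(path == d or path.startswith(d + "/") for d in SCAN_DIRS)
```

Applying such a filter to member names before reading any file content keeps the per-package work proportional to the number of candidate binaries instead of the total number of files.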
On Sunday 28 June 2009 18:13:54 Pierre Schmitz wrote:
The resulting db files are quite small: 20 KB for core and 320 KB for extra.
I have uploaded some example files to http://users.archlinux.de/~pierre/tmp/extra.links.tar.gz (also for core, community and testing) -- Pierre Schmitz, http://users.archlinux.de/~pierre
Pierre Schmitz wrote:
On Sunday 28 June 2009 18:13:54 Pierre Schmitz wrote:
The resulting db files are quite small: 20 KB for core and 320 KB for extra.
I have uploaded some example files to http://users.archlinux.de/~pierre/tmp/extra.links.tar.gz (also for core, community and testing)
You have given the links for every file examined. Do we need that much information? Is there use beyond a global list for the package? That would also have the advantage of simplifying the format (no need for % symbols) which would simplify writing clients to generate rebuild lists or integrity checks etc. Allan
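To make the "clients" idea a bit more concrete, here is a hedged Python sketch of what a rebuild-list query and a hidden-dependency check could look like on top of a per-package list. The data layout is an assumption made for this example, not the format of the uploaded files: `links` maps a package name to the set of sonames its files need, `providers` maps a soname to the package shipping it, and `depends` maps a package to its declared dependencies. All names and sample packages are invented.

```python
def rebuild_candidates(links, soname):
    """Return the packages that link against the given soname, sorted by name."""
    return sorted(pkg for pkg, needed in links.items() if soname in needed)


def undeclared_deps(pkg, links, providers, depends):
    """Return packages whose libraries pkg links to but does not list as a dependency.

    Dependencies satisfied only indirectly (through another dependency) show up
    here too, which is what makes them "hidden".
    """
    owners = {providers[s] for s in links.get(pkg, ()) if s in providers}
    return sorted(owners - set(depends.get(pkg, ())) - {pkg})
```

With a table like `{"kapp": {"libQtCore.so.4", "libssl.so.0.9.8"}}` and `kapp` declaring only `qt`, the second function would report `openssl`, which is the Qt/openssl situation described at the start of the thread.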
On Sunday 28 June 2009 18:44:54 Allan McRae wrote:
You have given the links for every file examined. Do we need that much information? Is there use beyond a global list for the package? That would also have the advantage of simplifying the format (no need for % symbols) which would simplify writing clients to generate rebuild lists or integrity checks etc.
I have thought about it. But the db files are already small, and if you don't need that information you can pipe it through "grep -v '%' | sort -u". On the other hand, one could use that information for optdepends, for possible split candidates, or to find out which feature of package a needs package b. -- Pierre Schmitz, http://users.archlinux.de/~pierre
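For a client written in Python, the effect of that pipeline can be approximated as below. This is a sketch: the function name `package_level_links` and the sample lines in the usage note are invented, and the only property of the db format relied on is the one stated above, namely that per-file lines contain a `%` character. Unlike the shell pipeline it also drops blank lines, and it sorts by code point instead of by locale.

```python
def package_level_links(lines):
    """Drop every line containing '%', then return the unique remaining lines, sorted."""
    return sorted({
        line.strip()
        for line in lines
        if "%" not in line and line.strip()  # also skips blank lines
    })
```

Given made-up input such as `["%usr/bin/foo%", "libc.so.6", "libssl.so.0.9.8", "libc.so.6"]`, only the two distinct library names remain.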
On Sunday 28 June 2009 18:52:45 Pierre Schmitz wrote:
I have thought about it.
Just forgot: I don't have a strong opinion on whether to keep this detailed information or not. If we think a simple list per package would be better, I am fine with that. -- Pierre Schmitz, http://users.archlinux.de/~pierre
On Sun, Jun 28, 2009 at 11:13 AM, Pierre Schmitz<pierre@archlinux.de> wrote:
On Wednesday 24 June 2009 23:00:02 Daenyth Blank wrote:
On Wed, Jun 24, 2009 at 15:36, Pierre Schmitz<pierre@archlinux.de> wrote:
* check if a file is an elf file * extract the header * move to next file
I think `file` and `readelf` would help.
file is several times slower than readelf. But I have found some other optimizations:

* Only extract files from /opt /lib /sbin /bin /usr/lib /usr/sbin /usr/bin
* Run the script for both arches in parallel
* Treat links to libs and executables as files (this way we can check for exact sonames)
* The script reuses results from previous runs and as a result should be fast enough to be run on gerolde (I can provide an initial data set)

The resulting db files are quite small: 20 KB for core and 320 KB for extra.
So, what do you think about adding this to our cron jobs? We could run it on a daily basis and update the db files.
It should be easy to write clients which check for possible rebuild candidates and do all kinds of integrity checks.
Part of me feels like we should adopt Gerardo's script for this purpose, as it seems a little more robust. Would you have a problem with that?
On Monday 29 June 2009 17:54:39 Aaron Griffin wrote:
Part of me feels like we should adopt Gerardo's script for this purpose, as it seems a little more robust. Would you have a problem with that?
Did not know about that. Do you mean this one? http://github.com/djgera/pkgdyn/blob/92191b7cff428159c42080d36d7db936c13a5d2... What do you mean by more robust? -- Pierre Schmitz, http://users.archlinux.de/~pierre
On Mon, Jun 29, 2009 at 11:53 AM, Pierre Schmitz<pierre@archlinux.de> wrote:
On Monday 29 June 2009 17:54:39 Aaron Griffin wrote:
Part of me feels like we should adopt Gerardo's script for this purpose, as it seems a little more robust. Would you have a problem with that?
Did not know about that. Do you mean this one? http://github.com/djgera/pkgdyn/blob/92191b7cff428159c42080d36d7db936c13a5d2...
What do you mean by more robust?
I meant pkgdyn itself: http://github.com/djgera/pkgdyn/tree/master
On Monday 29 June 2009 19:02:08 Aaron Griffin wrote:
What do you mean by more robust?
I meant pkgdyn itself: http://github.com/djgera/pkgdyn/tree/master
It does a lot more than my stupid script, so it's hard to compare. My goal was just to provide some raw data which can be used by and for a lot of things; pkgdyn could use that data, too. Anyway: the awk script in dynup does not seem to be a bad idea, so I might add something similar to my script. -- Pierre Schmitz, http://users.archlinux.de/~pierre
participants (4)
- Aaron Griffin
- Allan McRae
- Daenyth Blank
- Pierre Schmitz