[arch-dev-public] Filename search for Arch
It is a much requested feature: someone wants to search for a filename and get a package name, and there is currently no way to do that in Arch.
I have scripts to generate filelists from the package files which could run once a day or so. However, cactus (after a short jabber discussion) disagrees with me on how the search would be done:
My suggestion: Create a script on the webserver that "greps" through the prepared filelists or searches an SQL database, then gives you machine-readable output via XML or JSON that can be displayed in a client (for example a 'pacman -So' option). However, cactus thinks that this would put too much load on the server.
Another possibility: Make the filelists available for (optional) download and search them offline (with pacman or another tool). My problem here is:
total 140M
-rw-r--r-- 1 thomas users  35M Jan 21 17:06 filelist.community-i686
-rw-r--r-- 1 thomas users  34M Jan 21 17:11 filelist.community-x86_64
-rw-r--r-- 1 thomas users 1.6M Jan 21 16:52 filelist.core-i686
-rw-r--r-- 1 thomas users 1.6M Jan 21 16:52 filelist.core-x86_64
-rw-r--r-- 1 thomas users  32M Jan 21 16:57 filelist.extra-i686
-rw-r--r-- 1 thomas users  32M Jan 21 17:02 filelist.extra-x86_64
-rw-r--r-- 1 thomas users 2.1M Jan 21 17:11 filelist.testing-i686
-rw-r--r-- 1 thomas users 2.0M Jan 21 17:12 filelist.testing-x86_64
-rw-r--r-- 1 thomas users 1.2M Jan 21 17:11 filelist.unstable-i686
-rw-r--r-- 1 thomas users 673K Jan 21 17:11 filelist.unstable-x86_64
Which is 8.9MB compressed with bz2, a lot to download if you have to update the lists every day to have an up-to-date version.
What do you guys think?
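For illustration only, here is a minimal sketch of the server-side "grep the prepared filelists and return JSON" idea. The line format (`<package> <path>`), the function names, and the sample entries are assumptions made up for this example; the thread does not specify how the filelists are laid out.

```python
import json


def search_filelist(lines, filename):
    """Return the sorted package names that own a path ending in `filename`."""
    owners = set()
    for line in lines:
        # Assumed (hypothetical) line format: "<package> <path>"
        package, _, path = line.rstrip("\n").partition(" ")
        if path.rsplit("/", 1)[-1] == filename:
            owners.add(package)
    return sorted(owners)


def search_as_json(lines, filename):
    """Wrap the result in a small JSON document that a client could parse."""
    result = {"query": filename, "packages": search_filelist(lines, filename)}
    return json.dumps(result)


# Illustrative sample data, not taken from a real repository.
sample = [
    "mesa-apps usr/bin/glxgears\n",
    "mesa-apps usr/bin/glxinfo\n",
    "example-pkg usr/bin/example\n",
]
print(search_as_json(sample, "glxgears"))
```

A client would only need to fetch and decode that JSON, which is exactly the coupling between pacman and the website that is debated later in the thread.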
Thomas Bächler wrote:
Which is 8.9MB compressed with bz2, a lot to download if you have to update the lists every day to have an up-to-date version.
What do you guys think?
What about using a diff or something, to avoid downloading the whole list every time? It probably doesn't change a lot between two updates.
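A rough sketch of the diff idea, assuming each filelist can be treated as an unordered set of lines so that a client only downloads what was added or removed since its last copy. The function names and the `<package> <path>` line format are hypothetical.

```python
def make_delta(old_lines, new_lines):
    """Compute the lines removed from and added to a filelist between two snapshots."""
    old, new = set(old_lines), set(new_lines)
    return {"removed": sorted(old - new), "added": sorted(new - old)}


def apply_delta(old_lines, delta):
    """Rebuild the new snapshot from the old one plus a delta."""
    updated = (set(old_lines) - set(delta["removed"])) | set(delta["added"])
    return sorted(updated)


# Illustrative snapshots, one "<package> <path>" entry per line (assumed format).
yesterday = ["foo usr/bin/foo", "bar usr/bin/bar"]
today = ["foo usr/bin/foo", "bar usr/bin/bar2", "baz usr/lib/libbaz.so"]

delta = make_delta(yesterday, today)
rebuilt = apply_delta(yesterday, delta)
print(delta)
```

If only a few packages change per day, the delta stays small compared with the full 8.9MB download, although the server would have to keep one delta per starting snapshot it wants to support.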
On Wed, Jan 23, 2008 at 07:10:37PM +0100, Thomas Bächler wrote:
It is a much requested feature: Someone wants to search for a filename and get a package name, and there is no possibility in Arch.
Hmm... you used to be able to search through package lists through the web interface. Did this change when we moved to Django or something?
Jason
Jason Chu wrote:
Hmm... you used to be able to search through package lists through the web interface. Did this change when we moved to Django or something?
I don't think it still works. And it never worked that well; it often missed stuff and so on. And of course, you couldn't search using a script or pacman -So.
On Wed, Jan 23, 2008 at 10:23:55AM -0800, Jason Chu wrote:
Hmm... you used to be able to search through package lists through the web interface. Did this change when we moved to Django or something?
Jason
It did not. Filelists still get inserted into the SQL database when you run the update scripts on gerolde; however, there is no way to access them through the current Django interface. All the data is there and perfectly up to date though.
-S
On Jan 23, 2008 12:42 PM, Simo Leone <simo@archlinux.org> wrote:
It did not. Filelists still get inserted into the SQL database when you run the update scripts on gerolde; however, there is no way to access them through the current Django interface. All the data is there and perfectly up to date though.
I think this would be far more useful than the date field we removed, for instance. Can we add a file search somewhere to the web interface? Obviously you can throw "patches welcome" back in my face. :) -Dan
My main disagreement was with providing a pacman -So interface that queried the server directly, for three main reasons:
1. The load on the server from having pacman -So run on end users' systems. This is also not something that could be mirrored. It would *all* point to the arch webserver (of which we only have one right now).
2. This would *tightly* couple pacman to the arch website. It is hard to express how much I disagree with such a tight coupling.
3. Why would someone need to search to see what owns a file that they don't have on their system, with pacman?
eliott wrote:
3. Why would someone need to search to see what owns a file that they don't have on their system, with pacman?
This is apparently regularly needed; I see the !owns command of phrik in #archlinux used every day. A user might know that he needs some library file, or some binary, without knowing the package name. Here is a typical example: http://bugs.archlinux.org/task/4824#comment14448
I agree with everything eliott has said except:
On Jan 23, 2008 2:32 PM, eliott <eliott@cactuswax.net> wrote:
3. Why would someone need to search to see what owns a file that they don't have on their system, with pacman?
My favorite case for this is when I build something from source and either I don't look up or don't know the dependencies of the app. It spits out some linking error about some library file. Wouldn't it be awesome if I could just run pacman -So /lib/libyourmother.so.hot and it spit out that I was looking for extra/your-mother? This is just one very simple example. Not to mention it'll also help someone who has something that needs to be recompiled, because they can run pacman -So /lib/yourmother.so.hot.3 and see that it isn't in any up-to-date packages.
The current scenario is me asking someone else to search on THEIR system for it, or searching Google to find a relevant piece of software and then figuring out if Arch has that piece of software. Not awesome.
I think this is a useful feature IF implemented as phrakture states, which is merely throwing around filelists that are compressed on the mirrors. People who don't want to download them shouldn't be forced to.
// jeff
--
. : [ + carpe diem totus tuus + ] : .
On Jan 23, 2008 2:00 PM, Jeff Mickey <jeff@archlinux.org> wrote:
I agree with everything eliott has said except:
On Jan 23, 2008 2:32 PM, eliott <eliott@cactuswax.net> wrote:
3. Why would someone need to search to see what owns a file that they don't have on their system, with pacman?
After talking to cactus on jabber, he pointed out the fact that the critical phrase in that sentence is "with pacman". It appears that the common case for looking up library names and things like that is related to *building* packages and software, and as such, might fit better as a supplementary tool to makepkg (or even in devtools).
On Jan 23, 2008 3:11 PM, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
After talking to cactus on jabber, he pointed out the fact that the critical phrase in that sentence is "with pacman". It appears that the common case for looking up library names and things like that is related to *building* packages and software, and as such, might fit better as a supplementary tool to makepkg (or even in devtools).
For looking up library names, yes. There are other cases though - when (for example) glxgears and glxinfo moved into their own package (mesa-apps) tons of people were asking where they went. Even I didn't know for a while. There's already the uudecode example provided earlier. And which package is kde-app-X located inside? kdm? I don't know. In any case, there are valid use cases for this feature that don't necessarily include building packages.
On Jan 23, 2008 3:19 PM, Travis Willard <travis@archlinux.org> wrote:
And which package is kde-app-X located inside? kdm? I don't know.
Yay ambiguity. That should read: "And which package is kde-app-X located inside? Where do I find kdm, for example? I don't know."
On Jan 23, 2008 2:19 PM, Travis Willard <travis@archlinux.org> wrote:
For looking up library names, yes. There are other cases though - when (for example) glxgears and glxinfo moved into their own package (mesa-apps) tons of people were asking where they went. Even I didn't know for a while.
There's already the uudecode example provided earlier.
And which package is kde-app-X located inside? kdm? I don't know.
In any case, there are valid use cases for this feature that don't necessarily include building packages.
Ah thanks, these examples help too. I honestly have never needed something like -So, so I don't know the use cases
Aaron Griffin wrote:
Ah thanks, these examples help too. I honestly have never needed something like -So, so I don't know the use cases
I'd like to point out again that -So or something similar is one of the most requested features from users.
On Jan 23, 2008 2:48 PM, Thomas Bächler <thomas@archlinux.org> wrote:
I'd like to point out again that -So or something similar is one of the most requested features from users.
I'm neutral on this whole thing, except for integrating it into pacman: that seems like an awful idea, although the premise of "-So" makes sense. But when I saw this comment above I had to wonder - ***where is the bug report?***
-Dan
On Jan 23, 2008 3:59 PM, Dan McGee <dpmcgee@gmail.com> wrote:
But when I saw this comment above I had to wonder - ***where is the bug report?***
Uh - Xavier already posted it in this very ML thread:
On Jan 23, 2008 2:58 PM, Xavier <shiningxc@gmail.com> wrote:
This is apparently regularly needed; I see the !owns command of phrik in #archlinux used every day. A user might know that he needs some library file, or some binary, without knowing the package name. Here is a typical example: http://bugs.archlinux.org/task/4824#comment14448
On Jan 23, 2008 3:02 PM, Travis Willard <travis@archlinux.org> wrote:
Uh - Xavier already posted it in this very ML thread:
On Jan 23, 2008 2:58 PM, Xavier <shiningxc@gmail.com> wrote:
This is apparently regularly needed; I see the !owns command of phrik in #archlinux used every day. A user might know that he needs some library file, or some binary, without knowing the package name. Here is a typical example: http://bugs.archlinux.org/task/4824#comment14448
/me goes back to the corner. Of course, it hasn't been touched in a long time. :) -Dan
On Jan 23, 2008 2:48 PM, Thomas Bächler <thomas@archlinux.org> wrote:
I'd like to point out again that -So or something similar is one of the most requested features from users.
That doesn't sound empirical. According to flyspray votes (the only empirical evidence we have), signed packages are #1: http://bugs.archlinux.org/index/proj3?project=3&do=index&order=votes&sort=desc&order2=&sort2= I agree it's requested a lot, but no more than, say, sqlite backends and the like. Still, like Dan, I'm neutral. I don't care. But what I *do* care about is json/xml parsing integration into pacman. I'm just going to say "no" on that.
2008/1/23, Aaron Griffin <aaronmgriffin@gmail.com>:
That doesn't sound empirical. According to flyspray votes (the only empirical evidence we have), signed packages are #1:
Hey, votes for packages were introduced way after very demanded bugs were filed, so votes don't mean much for my opinion about a bug's importance.
http://bugs.archlinux.org/index/proj3?project=3&do=index&order=votes&sort=desc&order2=&sort2=
I agree it's requested a lot, but no more than, say, sqlite backends and the like.
Still, like Dan, I'm neutral. I don't care. But what I *do* care about is json/xml parsing integration into pacman. I'm just going to say "no" on that.
Sure, no json/xml/yaml/etc. parsing in pacman. A separate script - maybe. /me shrugs. As for a tarball consisting of all packages' filelists - it would be nice if the script could rsync it (like abs in future).
-- Roman Kyrylych (Роман Кирилич)
2008/1/23, Roman Kyrylych <roman.kyrylych@gmail.com>:
Sure, no json/xml/yaml/etc. parsing in pacman. A separate script - maybe. /me shrugs. As for a tarball consisting of all packages' filelists - it would be nice if the script could rsync it (like abs in future).
Heh, the last message in Repo filename search thread on arch-general is interesting. :-) -- Roman Kyrylych (Роман Кирилич)
Aaron Griffin wrote:
That doesn't sound empirical. According to flyspray votes (the only empirical evidence we have), signed packages are #1:
Just count how often someone asks the question "Which package contains file X" on IRC, or types "!owns foo". Some people even say that THE advantage of Debian over Arch is a filename search.
Still, like Dan, I'm neutral. I don't care. But what I *do* care about is json/xml parsing integration into pacman. I'm just going to say "no" on that.
It was just an idea which sounded feasible. Even if it won't be implemented, I must disagree that it would be archlinux centric: one could have a number of configurable filename search URLs in the configuration files, which could all be queried.
Back on topic: I'd still like to see a -So option in pacman, so it's more or less up to you to tell us what you think would fit into pacman.
In an earlier message, it was suggested that filelists should be added to the repo dir (like core.filelist.tar.gz), can optionally be synced and then queried by pacman. I also like this idea, but:
- The lists should be stored compressed on the local filesystem and only unpacked on demand - filelists are over 40MB for extra uncompressed.
- The lists should not be updated on every package update, that would take too much time. Updates should occur with a cronjob or so.
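To illustrate "stored compressed and only unpacked on demand", here is a small sketch that streams a gzip-compressed list and never writes the uncompressed text to disk. The file name `extra.filelist.gz`, the line format, and the function name are assumptions for the example only.

```python
import gzip
import os
import tempfile


def search_compressed(path, needle):
    """Yield matching lines from a gzip-compressed filelist, decompressing as a stream."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if needle in line:
                yield line.rstrip("\n")


# Demo on a throwaway file; the name and contents are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "extra.filelist.gz")
    with gzip.open(path, "wt", encoding="utf-8") as out:
        out.write("mesa-apps usr/bin/glxgears\nmesa-apps usr/bin/glxinfo\n")
    matches = list(search_compressed(path, "glxinfo"))

print(matches)
```

Memory use stays proportional to one line at a time, so the 40MB uncompressed size of extra matters for CPU time but not for disk space.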
On Jan 23, 2008 4:08 PM, Thomas Bächler <thomas@archlinux.org> wrote:
In an earlier message, it was suggested that filelists should be added to the repo dir (like core.filelist.tar.gz), can optionally be synced and then queried by pacman. I also like this idea, but: - The lists should be stored compressed on the local filesystem and only unpacked on demand - filelists are over 40MB for extra uncompressed.
How does apt-cache do it? Could you look into it?
- The lists should not be updated on every package update, that would take too much time. Updates should occur with a cronjob or so.
This is unrelated to pacman, but related to the DB scripts themselves, which are much easier for us to futz with considering they're really only used on one place.
On Jan 23, 2008 12:10 PM, Thomas Bächler <thomas@archlinux.org> wrote:
It is a much requested feature: Someone wants to search for a filename and get a package name, and there is no possibility in Arch.
I would implement this as follows:
Add some option to repo-add to generate a second list like above. "repo-add --filelists" would generate extra.files.tar.gz or something.
Then a pacman config option like so:
SyncFilelists = yes
And everything else should flow from there.
Integrating json and xml parsing into pacman just seems... wrong.
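As a hedged sketch of what such a second, files-only archive could look like, the following builds a small `.tar.gz` with one `<package>/files` member per package and searches it. The member layout and both function names are hypothetical; this is not the format that repo-add or pacman actually use.

```python
import io
import tarfile


def build_files_archive(packages):
    """Return the bytes of a .tar.gz holding one '<package>/files' member per package."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, paths in sorted(packages.items()):
            data = "\n".join(paths).encode("utf-8") + b"\n"
            info = tarfile.TarInfo(name=f"{name}/files")
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def find_owners(archive_bytes, path):
    """Return the packages whose file list contains exactly `path`."""
    owners = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            listed = archive.extractfile(member).read().decode("utf-8").splitlines()
            if path in listed:
                owners.append(member.name.split("/")[0])
    return owners


# Illustrative package data, not taken from a real repository.
blob = build_files_archive({
    "mesa-apps": ["usr/bin/glxgears", "usr/bin/glxinfo"],
    "example-pkg": ["usr/bin/example"],
})
print(find_owners(blob, "usr/bin/glxinfo"))
```

Because the archive is a plain file next to the package database, mirrors could carry it like any other repository file, and clients that never enable the option would never download it.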
I wrote a script, which you can read here: http://dev.archlinux.org/~thomas/genrepofilelist.sh.html
If there are no objections, I would add a daily cronjob to my user's crontab which runs this script for testing, core, extra, community and unstable for both architectures. The filelist for extra is about 2.7MB gz-compressed.
This would at least allow advanced users to download these file lists and use zgrep to search for files.
I would love to see options to sync and search those file lists in pacman. IMO, it should be possible to configure the following in pacman.conf:
- Always sync the filelists with -Sy for offline use
- Sync the filelists only on demand (when pacman -So is called and they are out of date)
On Thu, Jan 24, 2008 at 11:08:13AM +0100, Thomas Bächler wrote:
I wrote a script, which you can read here: http://dev.archlinux.org/~thomas/genrepofilelist.sh.html
Wow that script is intense on load. Wouldn't it be easier to just take the data out of the mysql db seeing as it's all already there? If I get a chance this afternoon, I'll write a script as well. -S
Simo Leone wrote:
Wow that script is intense on load. Wouldn't it be easier to just take the data out of the mysql db seeing as it's all already there? If I get a chance this afternoon, I'll write a script as well.
I don't know the structure of the mysql db, plus it only contains lists for i686 and not for community! And I suspect it may be otherwise incomplete. The load is high, but the script runs less than 5 minutes per repository, I think this would be okay to run once a day.
On Jan 24, 2008 10:42 AM, Thomas Bächler <thomas@archlinux.org> wrote:
The load is high, but the script runs less than 5 minutes per repository, I think this would be okay to run once a day.
Hrm, gerolde is already quite taxed as it is. Would it be possible to just use "tar -tzf" instead of pacman for listing files? It's going to prevent a complete DB reread for each of 3000 packages.
It seems to me that two separate issues are being discussed in this thread:
1. A mechanism to search for files in packages.
2. A mechanism to search for files in packages, from pacman.
thomas' script sounds like it is attempting to cover the first case. I think we can provide a web interface for it, since the data is in the database already. Something like a simplified package search.
The second case is a different issue and would require a different solution. If that is the case that people want covered, then thomas' script (from how it has been discussed so far) would not provide that.
I guess I don't see where this script fits in, and how it is supposed to be used. thomas made mention of using zgrep for advanced users, but that seems just as difficult as opening a web browser and typing into a search box.
I am trying to understand the use case, and work towards a solution that not only works, but is efficient and can coexist well within our infrastructure.
eliott wrote:
I guess I don't see where this script fits in, and how it is supposed to be used. thomas made mention of using zgrep for advanced users, but that seems just as difficult as opening a web browser and typing into a search box.
The purpose is to provide filelists for download, so they can be searched offline by pacman. My first idea (implementing an online search in pacman) was rejected, thus I thought about a "download the filelist and search it" offline solution.
On Jan 25, 2008 5:02 PM, Thomas Bächler <thomas@archlinux.org> wrote:
The purpose is to provide filelists for download, so they can be searched offline by pacman. My first idea (implementing an online search in pacman) was rejected, thus I thought about a "download the filelist and search it" offline solution.
Oh, I must have misunderstood too. If you're going to implement filelist search and all that stuff, we should:
a) Move this to the pacman-dev mailing list
b) Add external tools to do this as part of the "pacman source", i.e. as a patch to repo-add
c) Not use this script until pacman actually has this feature.
If the intent is to let users zgrep it, then I agree with cactus that that is significantly more complex than actually using the website to provide a search interface.
On 1/25/08, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
Oh, I must have misunderstood too. If you're going to implement filelist search and all that stuff, we should: a) Move this to the pacman-dev mailing list b) Add external tools to do this as part of the "pacman source", i.e. as a patch to repo-add c) Not use this script until pacman actually has this feature.
If the intent is to let users zgrep it, then I agree with cactus that that is significantly more complex than actually using the website to provide a search interface.
Yeah. I wasn't opposed to having a file search mechanism on the site. I was opposed to having pacman query the website. If a user opens up a browser and searches, no problem. It was tying this to pacman that I felt was a *really bad idea*.

Alternatively, if there is a pacman-only solution that involves some mirrored metadata in the repository, that is something else entirely, and should probably be talked about on the pacman-dev list, so as to make it as distribution-neutral as possible.
On Jan 25, 2008 5:31 PM, eliott <eliott@cactuswax.net> wrote:
On 1/25/08, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
On Jan 25, 2008 5:02 PM, Thomas Bächler <thomas@archlinux.org> wrote:
eliott wrote:
I guess I don't see where this script fits in, and how it is supposed to be used. thomas made mention of using zgrep for advanced users, but that seems just as difficult as opening a web browser and typing into a search box.
The purpose is to provide filelists for download, so they can be searched offline by pacman. My first idea (implementing an online search in pacman) was rejected, thus I thought about a "download the filelist and search it" offline solution.
Oh, I must have misunderstood too. If you're going to implement filelist search and all that stuff, we should: a) Move this to the pacman-dev mailing list b) Add external tools to do this as part of the "pacman source", i.e. as a patch to repo-add c) Not use this script until pacman actually has this feature.
If the intent is to let users zgrep it, then I agree with cactus that that is significantly more complex than actually using the website to provide a search interface.
Yeah. I wasn't opposed to having a file search mechanism on the site. I was opposed to having pacman query the website. If a user opens up a browser and searches, no problem. It was tying this to pacman that I felt was a *really bad idea*.
Alternatively, if there is a pacman only solution, that involves some mirrored meta in the repository, that is something else entirely, and should probably be talked about on the pacman dev list, so as to make it as distribution neutral as possible.
As I've said already, I really don't think this feature belongs in pacman. Obviously you can draw the connection with the -Ql operation, and the fact that we have -Ss, but this is something a bit different than that and I see it as feature creep. -Dan
On Jan 25, 2008 6:40 PM, Dan McGee <dpmcgee@gmail.com> wrote:
[...] this is something a bit different than that and I see it as feature creep.
-Dan
After a lil thinking, I agree that this doesn't belong in pacman. I also feel this is an important tool people should have offline. Maybe just an entirely new script that searches a provided filelist, or goes to a default location?

Something like the following sounds appealing to me:

    pacman -s findpkgfile # or whatever else would be a good name
    findpkgfile -y # download filelists from some given mirror
    findpkgfile '/lib/libyourmom.so.hot'
    => yourmom-5.2-2

Then we can add fun functionality and features to THAT script instead of pacman. One that comes to mind is asking if you'd like to install the found package that contains/owns the file, etc.

Thought I'd document my thoughts.

// jeff
-- . : [ + carpe diem totus tuus + ] : .
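The lookup step of such a standalone tool could be sketched roughly as below. This is only an illustration under stated assumptions, not the script discussed in this thread: the filelist format (one "<package> <path>" pair per line, separated by whitespace) and the function name find_owners are hypothetical, and the second sample entry is made up.

```python
from typing import Iterable, List


def find_owners(filelist_lines: Iterable[str], wanted: str) -> List[str]:
    """Return the packages whose file list contains the path `wanted`.

    Assumes (hypothetically) that every line looks like "<package> <path>".
    Leading slashes are ignored so "/lib/foo" and "lib/foo" both match.
    """
    wanted = wanted.lstrip("/")
    owners: List[str] = []
    for line in filelist_lines:
        # Split only once so that paths containing spaces stay intact.
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        package, path = parts
        if path.strip().lstrip("/") == wanted and package not in owners:
            owners.append(package)
    return owners


# Made-up sample data in the assumed format.
sample = [
    "yourmom-5.2-2 lib/libyourmom.so.hot",
    "examplepkg-1.0-1 usr/bin/example",
]

print(find_owners(sample, "/lib/libyourmom.so.hot"))
```

In a real tool the lines would come from a downloaded (and probably compressed) filelist rather than an in-memory list; since find_owners accepts any iterable of strings, an open file object would work the same way.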
On Jan 25, 2008 6:40 PM, Dan McGee <dpmcgee@gmail.com> wrote:
On Jan 25, 2008 5:31 PM, eliott <eliott@cactuswax.net> wrote:
On 1/25/08, Aaron Griffin <aaronmgriffin@gmail.com> wrote:
On Jan 25, 2008 5:02 PM, Thomas Bächler <thomas@archlinux.org> wrote:
eliott wrote:
I guess I don't see where this script fits in, and how it is supposed to be used. thomas made mention of using zgrep for advanced users, but that seems just as difficult as opening a web browser and typing into a search box.
The purpose is to provide filelists for download, so they can be searched offline by pacman. My first idea (implementing an online search in pacman) was rejected, thus I thought about a "download the filelist and search it" offline solution.
Oh, I must have misunderstood too. If you're going to implement filelist search and all that stuff, we should: a) Move this to the pacman-dev mailing list b) Add external tools to do this as part of the "pacman source", i.e. as a patch to repo-add c) Not use this script until pacman actually has this feature.
If the intent is to let users zgrep it, then I agree with cactus that that is significantly more complex than actually using the website to provide a search interface.
Yeah. I wasn't opposed to having a file search mechanism on the site. I was opposed to having pacman query the website. If a user opens up a browser and searches, no problem. It was tying this to pacman that I felt was a *really bad idea*.
Alternatively, if there is a pacman only solution, that involves some mirrored meta in the repository, that is something else entirely, and should probably be talked about on the pacman dev list, so as to make it as distribution neutral as possible.
As I've said already, I really don't think this feature belongs in pacman. Obviously you can draw the connection with the -Ql operation, and the fact that we have -Ss, but this is something a bit different than that and I see it as feature creep.
I disagree - pacman is the tool for managing local and remote collections of packages, and knowing what files are inside what packages certainly falls in that realm. I don't see how this feature is any more feature-creepish than pacman -Ql or pacman -Qo. There've been many valid use-cases suggested already, so it's not a fluff request. Maybe I'm missing something here, but I don't see what's so horrible about including it, aside from the fact it means we need to download more meta-info from the repos. I've skimmed through the thread, and haven't seen this yet, so I'll ask - can those who are opposed (Dan, and Jeff for instance) give reasons why you think it's improper to place this functionality inside pacman itself?
participants (10):
- Aaron Griffin
- Dan McGee
- eliott
- Jason Chu
- Jeff Mickey
- Roman Kyrylych
- Simo Leone
- Thomas Bächler
- Travis Willard
- Xavier