### query the sql directly

$ time sqlite3 archweb.db "select path, pkgname from packages p, packages_files pf where p.id = pf.pkg_id and path like '%usr/bin/gcc%'"
usr/bin/gcc-3.3|gcc3
usr/bin/gccbug-3.3|gcc3
usr/bin/gcc-3.4|gcc34
usr/bin/gccbug-3.4|gcc34
usr/bin/gccbug|gcc
usr/bin/gcc|gcc
usr/bin/gccmakedep|imake
sqlite3 archweb.db  3.45s user 0.14s system 99% cpu 3.622 total

### create a greppable file

$ time sqlite3 archweb.db "select path, pkgname from packages p, packages_files pf where p.id = pf.pkg_id and path not like '%/'" | sort > filelist
sqlite3 archweb.db  5.80s user 0.25s system 86% cpu 6.981 total
sort > filelist  1.30s user 0.41s system 21% cpu 7.843 total

$ time gzip -9 < filelist > filelist.gz
gzip -9 < filelist > filelist.gz  3.06s user 0.03s system 98% cpu 3.119 total

$ lh filelist*
-rw-r--r-- 1 nathanj users  32M 2008-01-27 11:46 filelist
-rw-r--r-- 1 nathanj users 2.7M 2008-01-27 11:46 filelist.gz

### query using grep

$ time grep usr/bin/gcc filelist
usr/bin/gcc-3.3|gcc3
usr/bin/gcc-3.4|gcc34
usr/bin/gccbug-3.3|gcc3
usr/bin/gccbug-3.4|gcc34
usr/bin/gccbug|gcc
usr/bin/gccmakedep|imake
usr/bin/gcc|gcc
grep usr/bin/gcc filelist  0.02s user 0.02s system 80% cpu 0.045 total

$ time zgrep usr/bin/gcc filelist.gz
usr/bin/gcc-3.3|gcc3
usr/bin/gcc-3.4|gcc34
usr/bin/gccbug-3.3|gcc3
usr/bin/gccbug-3.4|gcc34
usr/bin/gccbug|gcc
usr/bin/gccmakedep|imake
usr/bin/gcc|gcc
zgrep usr/bin/gcc filelist.gz  0.23s user 0.05s system 99% cpu 0.279 total

I think the best way to implement this would be to set up a cronjob on the main Arch Linux server to generate the greppable file. The generation is fast enough to run every day if wanted, but that is a bit overkill (FWIW, Debian's apt-file database regenerates weekly, I believe). I would not gzip the file, since searching the compressed file is quite a bit slower, as the timings above show.

All searches would go through the website. Server load shouldn't be a problem: having 50 people hit the page to search a few times each would still be more efficient than having those same 50 people each download a 2.7 MB file. I doubt there would be that many searches anyway; this type of search is very useful when needed, but it is not needed that often.

For command-line usage, it should be easy to write a ten-line Python script that queries the server and prints the results.
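To make the cron idea concrete, here is a minimal sketch of the generation step, reusing the exact query from above. The script name, output path, and the choice to sort in memory instead of piping through sort(1) are my assumptions, not a finished design:

#!/usr/bin/env python
# Sketch of the file-generation cron job. The table and column names come
# from the sqlite3 queries above; the output filename is an assumption.

import sqlite3

conn = sqlite3.connect("archweb.db")
rows = conn.execute(
    "select path, pkgname from packages p, packages_files pf "
    "where p.id = pf.pkg_id and path not like '%/'"
)

# Sort in memory to mirror the `sort > filelist` step; at roughly 32 MB
# the whole listing fits comfortably.
lines = sorted("%s|%s\n" % (path, pkgname) for path, pkgname in rows)

with open("filelist", "w") as out:
    out.writelines(lines)

conn.close()

A crontab entry along the lines of "@weekly python generate_filelist.py" would match the weekly schedule suggested above.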
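And a sketch of the ten-line client script. The search URL, the query parameter name, and the assumption that the server answers with plain "path|pkgname" lines (the same format as filelist) are all hypothetical, since no such endpoint exists yet:

#!/usr/bin/env python
# Sketch of the command-line client: ask a (hypothetical) server endpoint
# for matches and print them verbatim.

import sys
import urllib.parse
import urllib.request

SEARCH_URL = "http://www.archlinux.org/filelist/search"  # placeholder URL

query = urllib.parse.urlencode({"q": sys.argv[1]})
with urllib.request.urlopen(SEARCH_URL + "?" + query) as resp:
    sys.stdout.write(resp.read().decode())

Invoked as, say, "whichpkg usr/bin/gcc", it would print the same lines as the grep examples above, without anyone needing a local copy of the 32 MB file.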