On Sat, Dec 08, 2007 at 11:21:40PM +0100, JJDaNiMoTh wrote:
> Hello to all.
> After 2 weeks of intensive work (:P) I'm glad to post the first proof-of-concept code about using index in our life.
A few suggestions:

1. Make sure to test with packages that contain hyphens, like 'gcc-libs'.
   Your regular expression does not work well with those packages.

2. Store the actual byte offsets in the index file rather than (or in
   addition to) the line numbers. It is easier to seek to a position than
   a line number; see the man page for fseek.

3. You call writeIndexEntry() n times (n = # of pkgs), and each call reads
   in the entire huge database file. Change it so that it is only read
   once. Once you do this, you should find that the tot_lines being passed
   to the script is unnecessary.

Pseudocode:

    open_files()
    count = 0
    pkg = None
    for i in txtdb.readlines():
        if pkg is None:
            (pkg, ver) = parse_pkg()
            count = 0
        else:
            count += 1
        if i == "@@ENDS\n":
            write_index_line()
            pkg = None
    close_files()

I am interested in seeing what the performance differences would be between
this, the current backend, and a tar backend (FS#8586), so keep it up and
good luck.
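
For what it's worth, here is a rough Python sketch of how suggestions 1-3 could fit together in a single pass. It is not taken from the posted code. It assumes each entry starts with a header line of the form 'name-version-release' (so splitting from the right keeps hyphenated names intact) and ends with the "@@ENDS" line from the pseudocode. The names split_pkg(), build_index() and read_entry() are made up for illustration.

```python
END_MARKER = b"@@ENDS\n"  # entry terminator, as in the pseudocode above


def split_pkg(header):
    """Split 'name-version-release' from the right (assumed header format),
    so hyphens inside the package name are preserved."""
    name, version, release = header.rsplit("-", 2)
    return name, f"{version}-{release}"


def build_index(txtdb):
    """Read a binary file object once and return a list of
    (name, version, byte_offset, byte_length) tuples, one per entry."""
    entries = []
    pkg = None      # (name, version) of the entry being scanned, if any
    offset = 0      # running byte position in the database
    start = 0       # byte offset where the current entry began
    for line in txtdb:
        if pkg is None:
            # First line of a new entry: assumed to be the header line.
            pkg = split_pkg(line.decode().strip())
            start = offset
        offset += len(line)
        if line == END_MARKER:
            entries.append((pkg[0], pkg[1], start, offset - start))
            pkg = None
    return entries


def read_entry(txtdb, offset, length):
    """Jump straight to an entry using its stored byte offset."""
    txtdb.seek(offset)
    return txtdb.read(length)
```

Because the offset is accumulated while reading, nothing like tot_lines has to be passed in, and a lookup becomes one seek plus one read instead of counting lines.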