On 23/02/12 17:09, Dan McGee wrote:
On Tue, Feb 21, 2012 at 10:02 PM, Allan McRae <allan@archlinux.org> wrote:
When installing a package, write an mtree of the package files into the local database. This will be useful for doing validation of all files on a system.
Signed-off-by: Allan McRae <allan@archlinux.org>
---
Query: should we keep the info on .INSTALL and .CHANGELOG files? Changing a .INSTALL file would be an interesting tactic, but if someone is doing that then they can already adjust the mtree file...
Also, from http://goo.gl/Uq6X5 it appears that this could be made more efficient by reusing the file descriptor, but I could not get that working after many, many, many attempts.

Did you rewind the file descriptor? You should just have to call `lseek(fd, 0, SEEK_SET)` first. Of course, since the current version of _alpm_open_archive does both the open() and archive_read_new() business, the abstraction there would have to change.
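For reference, a minimal sketch of the rewind in plain POSIX, with no libarchive involved. The helper names read_some() and rewind_fd() are made up for illustration and are not part of libalpm. In pacman the rewound descriptor would presumably be handed to a fresh reader (for example via archive_read_open_fd()); that part is an assumption and is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to `cap` bytes starting at the current
 * offset of fd. Returns the number of bytes read, or -1 on error. */
static ssize_t read_some(int fd, char *buf, size_t cap)
{
	size_t total = 0;
	while (total < cap) {
		ssize_t n = read(fd, buf + total, cap - total);
		if (n < 0) {
			return -1;
		}
		if (n == 0) {
			break; /* end of file */
		}
		total += (size_t)n;
	}
	return (ssize_t)total;
}

/* Hypothetical helper: move the offset of fd back to byte 0 so that a
 * second pass sees the whole file again. Returns 0 on success, -1 on
 * failure (for example when fd is a pipe and cannot seek). */
static int rewind_fd(int fd)
{
	assert(fd >= 0);
	if (lseek(fd, 0, SEEK_SET) == (off_t)-1) {
		perror("lseek");
		return -1;
	}
	return 0;
}
```

Without the rewind_fd() call a second read_some() on the same descriptor returns 0, because the offset is still at end of file after the first pass. That would match the symptom of the second pass finding nothing to read.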
Ah... lseek was the key. I can do that and make the abstraction change to _alpm_open_archive(). But it will not be needed if...
With that said, not having to decompress everything twice would also be a win. I saw some chatter about this on IRC, but I would definitely prefer not to iterate again: removing the iteration from the diskspace check sped it up enough that I enabled it by default, and I don't want to lose those gains.
I think this can be done, but it is far from simple. It involves us doing an archive_read_data() to read the data into a buffer, duplicating that buffer, and then passing one copy to archive_write_data() for the file on disk and the other to the write for the mtree archive. It means that we cannot use the convenience function archive_read_extract(), and that is a big convenience...

archive_read_extract(), archive_read_extract_set_skip_file():
A convenience function that wraps the corresponding archive_write_disk(3) interfaces. The first call to archive_read_extract() creates a restore object using archive_write_disk_new(3) and archive_write_disk_set_standard_lookup(3), then transparently invokes archive_write_disk_set_options(3), archive_write_header(3), archive_write_data(3), and archive_write_finish_entry(3) to create the entry on disk and copy data into it. The flags argument is passed unmodified to archive_write_disk_set_options(3).

So we would have to duplicate that entire functionality...

<snip>
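To make the shape of that loop concrete, here is a rough sketch of "read once, write twice" using plain POSIX descriptors as stand-ins: read() plays the role of archive_read_data(), and the two write targets play the roles of the on-disk writer and the mtree writer. The names write_all() and tee_copy() are hypothetical. One assumption worth flagging: if neither writer keeps a pointer to the buffer after it returns, the same buffer can be passed to both in turn and no duplication is needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes, retrying on short writes
 * and EINTR. Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);
		if (n < 0) {
			if (errno == EINTR) {
				continue;
			}
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Hypothetical helper: read `src` exactly once and hand every chunk to
 * both sinks. Returns the number of bytes copied, or -1 on error. */
static long tee_copy(int src, int sink_a, int sink_b)
{
	char buf[8192];
	long total = 0;

	assert(src >= 0 && sink_a >= 0 && sink_b >= 0);
	for (;;) {
		ssize_t n = read(src, buf, sizeof buf);
		if (n < 0) {
			if (errno == EINTR) {
				continue;
			}
			return -1;
		}
		if (n == 0) {
			break; /* source exhausted */
		}
		/* The same buffer goes to both sinks; it is not duplicated. */
		if (write_all(sink_a, buf, (size_t)n) != 0) {
			return -1;
		}
		if (write_all(sink_b, buf, (size_t)n) != 0) {
			return -1;
		}
		total += n;
	}
	return total;
}
```

The copy loop itself is small. What this sketch leaves out is everything else archive_read_extract() sets up (the restore object, its options, header writing, and finishing each entry), and that is the part that would have to be duplicated.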
+ /* output the type, uid, gid, mode, size, time, md5 and link fields */
+ archive_write_set_options(mtree, "use-set,!device,!flags,!gname,!nlink,!uname,md5");

Did 'use-set' end up being a net-win on size and/or speed?
The size is much smaller for the raw file when using 'use-set', but that difference entirely disappears when compressing with gzip. In the brief tests I did, reading was slightly faster using 'use-set'.

So, should I go ahead and write a version of archive_read_extract() into a function that does both the extraction and mtree creation? Or do people see another way around this?

Allan