On Tuesday 16 March 2010 14:41:41 Nathan Wayde wrote:
On 16/03/10 00:48, Shridhar Daithankar wrote:
[...] But as far as file system performance goes, the overhead should be identical for both runs, no?
I'm not too sure about that. I'm guessing there is less seeking going on with Btrfs. Some file systems (ReiserFS and Reiser4, IIRC) are very good with many small files, better than the ext*fs family; this may be another case of that.
Yes, btrfs does have tail packing, i.e. storing the inode and the file data together in a single block. However, all the files I had in the tree were 50-55K in size, and that definitely does not fit in a block.
I still think you could achieve better times by not calling the external command that many times. Since you're already gonna store the checksums in a database, I'd just write a proper program in Python or something.
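Here is a minimal Python sketch of that suggestion. It assumes the goal is just a path-to-MD5 map per tree, which can then be compared or written to the database. `file_md5`, `tree_md5` and `compare_trees` are hypothetical names, not part of any existing tool. Hashing inside one process avoids spawning an `md5sum` process per file, which is the overhead being pointed at here. Whether that actually beats `find`/`md5sum` on a given file system is something this sketch does not measure.

```python
import hashlib
import os


def file_md5(path, chunk=1 << 20):
    """MD5 of one file, read in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def tree_md5(root):
    """Map each regular file's path (relative to root) to its MD5 hex digest.

    Symlinks are skipped; this is an assumption, not something the thread specifies.
    """
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                sums[os.path.relpath(full, root)] = file_md5(full)
    return sums


def compare_trees(a, b):
    """Return relative paths whose checksums differ or that exist in only one tree."""
    sa, sb = tree_md5(a), tree_md5(b)
    return sorted(p for p in sa.keys() | sb.keys() if sa.get(p) != sb.get(p))
```

For example, `compare_trees(original_root, copy_root)` should return an empty list when every file in the copy has the same checksum as the original. The dictionary from `tree_md5` is also the natural thing to insert into whatever database stores the checksums.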
The application I am developing already has copy/copytree and md5sum built in. I mmap the whole file and do memcpy/memcmp/md5sum in a single pass. That is already a bit faster than native cp, which goes through write() with its own buffer management. I changed/refactored the tree copy code and created a new tree, and I wanted to verify outside the application that the tree copy had gone well. Hence the find/md5sum run. This was a one-time exercise only, but the results were drastic enough to be worth publishing.

--
Regards
Shridhar
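The single-pass mmap copy described in the reply above can be sketched in Python with the standard `mmap` and `hashlib` modules. This only illustrates the shape of the idea, because the application's own code is not shown. `copy_verify_md5` and the 1 MiB chunk size are hypothetical, and slicing a Python `mmap` copies bytes, so it will not match a C `memcpy` loop for speed. The read-back comparison also checks the mapped pages, not what eventually reaches the disk.

```python
import hashlib
import mmap
import os


def copy_verify_md5(src_path, dst_path, chunk=1 << 20):
    """Copy src to dst via mmap, compare each chunk, and MD5 it in the same pass.

    Hypothetical helper; returns the MD5 hex digest of the copied data.
    """
    size = os.path.getsize(src_path)
    digest = hashlib.md5()
    with open(src_path, "rb") as fs, open(dst_path, "w+b") as fd:
        if size == 0:
            return digest.hexdigest()  # mmap cannot map an empty file
        fd.truncate(size)  # give the destination its final size so it can be mapped
        with mmap.mmap(fs.fileno(), 0, access=mmap.ACCESS_READ) as src, \
             mmap.mmap(fd.fileno(), size, access=mmap.ACCESS_WRITE) as dst:
            for off in range(0, size, chunk):
                end = min(off + chunk, size)
                block = src[off:end]
                dst[off:end] = block            # the "memcpy" step
                if dst[off:end] != block:       # the "memcmp" step (page cache only)
                    raise OSError(f"copy mismatch at offset {off}")
                digest.update(block)            # the "md5sum" step
            dst.flush()  # ask the OS to write the mapped pages back to the file
    return digest.hexdigest()
```

Doing all three operations per chunk means each byte of the source is touched once. That is the point of the single-pass design, as opposed to copying first and hashing afterwards.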