[arch-general] Btrfs more than twice as fast compared to ext4
Nathan Wayde
kumyco at konnichi.com
Tue Mar 16 10:11:41 CET 2010
On 16/03/10 00:48, Shridhar Daithankar wrote:
> [...]
> But as far as file system performance goes, the overhead should be identical
> for both the runs, no?
>
I'm not too sure about that. I'm guessing there is less seeking going on
with Btrfs. Some file systems (reiserfs + reiser4 IIRC) are very good
with many small files, better than the ext* family; this may be another
case of that.
> Besides, I need to run the comparison (rather, verification of file contents)
> many times over during the application life-cycle and I cannot afford to bring
> in another copy from disk. The working set is expected to be 30-40GB at a
> time, 3GB is just test setup.
>
> With md5sum, I can store it in database and verify it on one copy only.
>
Fair enough.
> And finally, it is terrible on timings. Running md5sum is a lot faster, about 3
> times in the best case.
> [...]
wow, that's slow!
> So when the source file system is btrfs, it is still a couple of times faster at
> least.
I still think you could achieve better times by not calling the external
command that many times.
Since you're already gonna store the checksums in a database, I'd just
write a proper program in python or something.
Or even just a shell script, but you might wanna refrain from
for .. in `find ..`; it's the slowest, and it relies on the fact that your
filenames don't have spaces in them.
[[ky] ~]# }} time find /usr/bin -type f -print0 | xargs -0 md5sum > /tmp/1
real 0m3.633s
[[ky] ~]# }} time find /usr/bin -type f -exec md5sum "{}" \; > /tmp/2
real 0m10.196s
[[ky] ~]# }} time for i in `find /usr/bin -type f`;do md5sum "$i";done > /tmp/3
real 0m11.245s
This last version missed a file because it has spaces in its name, and as
a result file 3 was inconsistent with files 1 and 2:
[[ky] ~]# }} diff /tmp/{1,2}
[[ky] ~]# }} diff /tmp/{3,2}
3054a3055
> 0c5d8f10aa0731671a00961f059dc46e /usr/bin/New SMB and DCERPC features in Impacket.pdf
That was a test against just 4008 files, so you can imagine the time savings
with 50000+ files.
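
For reference, here is a rough sketch of two other forms that should cope with
spaces in filenames without needing xargs. The function names checksum_tree and
checksum_tree_loop are made up for illustration, and I haven't timed either
one against the runs above. The first relies on the `+` terminator of
find -exec, which hands many filenames to each md5sum call instead of
starting one process per file; the second keeps a per-file loop for cases
where you need extra logic around each file.

```shell
#!/bin/sh
# Hypothetical helper: checksum every regular file under directory $1.
# The "+" terminator makes find pass many filenames to each md5sum
# invocation, rather than running md5sum once per file as "\;" does.
checksum_tree() {
    find "$1" -type f -exec md5sum {} +
}

# Hypothetical helper: same result, but with an explicit per-file loop.
# find passes filenames as separate arguments to the inner sh, so names
# containing spaces are never split on whitespace.
checksum_tree_loop() {
    find "$1" -type f -exec sh -c '
        for f do
            md5sum "$f"
        done
    ' sh {} +
}
```

Usage would be something like `checksum_tree /usr/bin > /tmp/4`. The loop
version still starts one md5sum per file, so I would expect it to be closer
to the slower runs; it is only there to show a loop that does not break on
spaces.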