[arch-general] Btrfs more than twice as fast compared to ext4

Nathan Wayde kumyco at konnichi.com
Tue Mar 16 10:11:41 CET 2010


On 16/03/10 00:48, Shridhar Daithankar wrote:
> [...]
> But as far as file system performance goes, the overhead should be identical
> for both the runs, no?
>
I'm not too sure about that. I'm guessing there is less seeking going on 
with Btrfs. Some file systems (reiserfs and reiser4, IIRC) are very good 
with many small files, better than the ext* file systems; this may be 
another case of that.

> Besides, I need to run the comparison(rather verification of file contents)
> many times over during the application life-cycle and I cannot afford to bring
> in another copy from disk. The working set is expected to be 30-40GB at a
> time, 3GB is just test setup.
>
> With md5sum, I can store it in database and verify it on one copy only.
>
Fair enough.

> And finally, it is terrible on timings. Running md5sum is lot faster, about 3
> times in the best case.
>  [...]
Wow, that's slow!

> So when the source file system is btrfs, it is still couple of times faster at
> least.
I still think you could achieve better times by not calling the external 
command that many times.
Since you're already gonna store the checksums in a database, I'd just 
write a proper program in python or something.

Or even just a shell script, but you might wanna refrain from 
for .. in `find ..`; it's the slowest option and it only works if your 
filenames don't have spaces in them.

[[ky] ~]# }} time find /usr/bin -type f -print0 | xargs -0 md5sum > /tmp/1
real	0m3.633s

[[ky] ~]# }} time find /usr/bin -type f -exec md5sum "{}" \; > /tmp/2
real	0m10.196s

[[ky] ~]# }} time for i in `find /usr/bin -type f`;do md5sum "$i";done > /tmp/3
real	0m11.245s

This last version missed a file because the file has spaces in its name, 
and as a result /tmp/3 is inconsistent with /tmp/1 and /tmp/2:

[[ky] ~]# }} diff /tmp/{1,2}
[[ky] ~]# }} diff /tmp/{3,2}
3054a3055
 > 0c5d8f10aa0731671a00961f059dc46e  /usr/bin/New SMB and DCERPC features in Impacket.pdf
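
The missing line comes from word splitting: the output of the backtick 
substitution is split on whitespace, so md5sum is handed "/usr/bin/New", 
"SMB" and so on as separate names, and the errors go to stderr rather 
than into the file. If a loop is really wanted, here is a rough sketch of 
one that survives spaces. It assumes bash (read -d '' is a bash 
extension) and a find that supports -print0; the name hash_tree is made 
up for illustration.

```shell
#!/usr/bin/env bash
# hash_tree is a hypothetical helper name, not something from the thread.
# Prints one md5sum line per regular file under the directory given as $1.
hash_tree() {
    # -print0 separates names with NUL bytes, which cannot appear in a
    # path, so spaces and even newlines in filenames survive intact.
    find "$1" -type f -print0 |
    while IFS= read -r -d '' file; do
        # Quoting "$file" keeps the name as a single argument.
        md5sum "$file"
    done
}
```

Called as hash_tree /usr/bin > /tmp/3. Note that this only fixes the 
correctness problem: it still starts one md5sum per file, so it should 
not be expected to beat the xargs version.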

That was a test against just 4008 files, so you can imagine the time 
savings with 50000+ files.
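
The gap between the first two timings is most likely process startup: 
-exec ... \; starts one md5sum per file, whereas xargs packs many names 
into each invocation. find can do the same batching itself with the 
-exec ... {} + form, which is specified by POSIX. A minimal sketch, with 
a made-up function name and no timing claims attached, since this 
variant was not part of the test above:

```shell
#!/usr/bin/env bash
# hash_tree_batched is a hypothetical helper name used for illustration.
# Same output format as the xargs pipeline, but find does the batching.
hash_tree_batched() {
    # The trailing '+' makes find append as many paths as fit on one
    # command line, so md5sum is started a few times, not once per file.
    find "$1" -type f -exec md5sum {} +
}
```

Called as hash_tree_batched /usr/bin > /tmp/4. Each path is passed as 
its own argument, so names with spaces are handled the same way as with 
-print0 and xargs -0.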

