[arch-releng] [archiso] saying good-bye to union-fs-method?

Tue May 31 22:59:55 EDT 2011

On 05/31/2011 09:43 PM, Gerardo Exequiel Pozzi wrote:
> On 05/31/2011 03:19 PM, Thomas Bächler wrote:
>> Am 31.05.2011 18:07, schrieb Thomas Bächler:
>>>>>> This changes from file-level to block-level logic.
>>>>> This actually sounds pretty cool. However, I don't understand how it
>>>>> works: The squashfs file system is a read-only one, so how can we put
>>>>> that into a block-level snapshot?
>>>>>
>>>> Yes, but there is at least one downside: since it is not a union-fs,
>>>> there is no concept of layers. We are currently using an "overlay" that
>>>> overlaps some files on the "root-image" layer.
>>> This is not problematic.
> OK.
>
> Maybe if we want to maintain an original root-image we can do this
> using a snapshot with all changes.
> This makes the build process a bit more complex.
>
>>>> About your question: The squashfs image contains only one file... an
>>>> image of a 4 GiB ext4 file system. There is another small squashfs
>>>> image with a single file inside that is an "lvm snapshot". So the dm
>>>> device is built from these loopback-mounted images...
>>> Some of this doesn't make sense to me right now (especially the second
>>> squashfs image).
>>>
>>> However: With an ext4 loopback inside the squashfs image you lose one
>>> squashfs feature: metadata compression. I don't know how bad that will
>>> be though.
> Yes. But the difference is small.
>
> # du -sh /tmp/archbase*
> 398M    /tmp/archbase (1)
> 456M    /tmp/archbase.ext4 (2)
> 148M    /tmp/archbase.ext4.sfs (3)
> 145M    /tmp/archbase.sfs (4)
>
> (1) "base" group with all deps.
> (2) 1 GiB sparse file with all files copied onto a fresh file system
> (this avoids space left in use by deleted files, in other words: trash)
> (3) squashfs of one file (/tmp/archbase.ext4)
> (4) squashfs of many files (/tmp/archbase/)
>>> I'll summarize what I think is going on: You mount the squashfs
>>> (loop0), and set up the ext4 image inside as loop1. You then create a
>>> large sparse file on tmpfs and set it up as loop2. Then, you create a
>>> snapshot device (can't use LVM here, can one use the device-mapper
>>> snapshot target directly?) with loop1 as read-only and loop2 as
>>> read-write layer. You then mount that device. Sounds doable, but not
>>> optimal (a VFS-based solution would be way cooler).
>> Okay, this is how a short test works:
>>
>> # dd if=/dev/zero of=ro.img bs=1 seek=100M count=1
>> # dd if=/dev/zero of=cow.img bs=1 seek=100M count=1
>> # mkfs.ext4 -F ro.img
>> # losetup /dev/loop0 ro.img
>> # losetup /dev/loop1 cow.img
>> # mount /dev/loop0 /some/path
>> (put some stuff into /some/path)
>> # umount /some/path
>> # echo "0 $(blockdev --getsize /dev/loop0) snapshot /dev/loop0 /dev/loop1 N 8" | dmsetup create snapshottest
>> (8 is the "chunk size" here, which is now 8 sectors. No idea what a
>> good value might be)
> I think 8 is optimal: 8*512 = 4096, which is the block size of the
> ext4 image and the block size of the loopback device.
>> # mount /dev/mapper/snapshottest /mnt/other
>> (do whatever you want in /mnt/other)
>> # umount /mnt/other
>> # dmsetup remove snapshottest
>> # losetup -d /dev/loop0
>> # losetup -d /dev/loop1
>>
>> In archiso, the file 'ro.img' would be inside squashfs, while 'cow.img'
>> would be in a tmpfs. Note again that we might get worse compression with
>> a single file in squashfs, compared to a real read-only squashfs - we
>> could consider making part of the file system entirely read-only, not
>> sure.
> Again not a big issue here, in terms of size.
>> This seems fairly easy to implement on the mounting side. Creation of
>> the ext4 should be pretty straightforward, too.
> There is another downside, only noticeable when you work inside the
> "live medium": if you create or modify a file and then delete it, the
> space is not released at the "physical level" (on tmpfs), unlike now,
> where new and modified files are written directly to tmpfs.
>
> So go ahead?
>
> What do you think: support both modes in case aufs comes back (*), or
> change all the code and simply keep this mode (if everyone is happy)?
>
> Thanks for your feedback.
>
> (*) or some official union layer in Linux 100.0 :P.
>
Before I start, I am thinking about some other areas:

* core-iso: I think we need (if I remember correctly) write access to
/src (where the core pkgs are mounted), which implies we need to mount
core-pkgs indirectly via a dm device with a snapshot, like root-image.
* dual images (depending on profile [-T])
** core-any-pkgs.sqfs will disappear (no more union-fs trick). Yes, this
can be done via symlinks... (about 16MB saved)
** usr-share.sqfs should be done in the same way as root-image,
otherwise updates are not possible.
** lib-modules.sqfs: same as usr-share and root-image.
* net-iso: no issues.

mkarchiso can take an optional parameter when creating the "fs.img" that
adds some percentage as "free space". """ USED=$(du -sk directory | cut -f1) ....
dd ..... of="fs.img" seek=$((USED*(100+PERCENT_FREE)/100))  """
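
The sizing idea above could be sketched roughly as follows. This is only an illustration, not what mkarchiso does: the function names img_size_kib and make_sparse_img are hypothetical, sizes are counted in KiB, and the space ext4 itself needs for metadata and the journal is not accounted for, so the real free-space percentage would have to be chosen generously.

```shell
#!/bin/sh
# Hypothetical helper: image size in KiB = used KiB plus a percentage of free space.
img_size_kib() {
    used_kib=$1
    percent_free=$2
    echo $(( used_kib + used_kib * percent_free / 100 ))
}

# Hypothetical helper: create a sparse image file sized for a directory.
#   $1 = directory to measure, $2 = percent of extra free space, $3 = image path
make_sparse_img() {
    dir=$1
    percent_free=$2
    img=$3
    # du -sk prints "<KiB><TAB><path>"; keep only the number
    used_kib=$(du -sk "$dir" | cut -f1)
    size_kib=$(img_size_kib "$used_kib" "$percent_free")
    # count=0 writes no data; seek only sets the file length, so the file is sparse
    dd if=/dev/zero of="$img" bs=1024 seek="$size_kib" count=0 2>/dev/null
}

# Example (paths are placeholders):
#   make_sparse_img /tmp/archbase 20 fs.img
#   mkfs.ext4 -F fs.img
```

With 20 percent requested, a tree that uses 1000 KiB yields a 1200 KiB image. The file would then still need mkfs.ext4 and a copy of the tree, as in the short test quoted above.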

-- 
Gerardo Exequiel Pozzi
\cos^2\alpha + \sin^2\alpha = 1