While I like the idea of the archive, it currently does not implement
any kind of cleanup, so it grows in size forever. Right now it is somewhere
around 50 million files with a total size of 1.7 TB. Those are very rough
numbers since calculating them for real takes considerable time.
This creates multiple problems:
- The disks only have ~700 GB of space left before they are full. We will
reach that eventually.
- The backup creation takes somewhere between 4 and 5 hours depending on
I am thinking about increasing the backup frequency where possible, but
if one backup takes 4 hours that won't happen.
I am also worried that if we ever need to restore, this could easily
take well over 24 hours. The initial backup (which read all data rather
than ignoring unchanged data) took something around 24 hours, but that
was with fewer files.
Essentially this means we need to put some kind of automatic deletion of
old data in place and reduce the size of the archive. What would be a
good time frame here? Have we defined a clear goal for the archive that
we can use as a guideline?
My gut says 6 months should be enough, but feel free to disagree.
Once we have decided how long data should be kept, we also need
something to actually delete it. Deleting files from the ./repos
directory should be simple, but the ./packages directory is more
complicated because it is not nicely separated into directories per day.
Sebastien, do you have an idea about how we could delete old data or do
you have a script for that?
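
To make the question more concrete, here is a rough sketch of what an
age-based cleanup could look like. It is only a starting point and rests
on assumptions that would need checking: it assumes a file's modification
time is a usable stand-in for its age in the archive, and the names
find_stale_files and prune are made up for illustration. If files in
./packages are still referenced from elsewhere (for example via links),
this would need an extra check before deleting anything.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_age_days, now=None):
    """Yield files under root whose mtime is older than max_age_days.

    Assumption: mtime reflects when the file entered the archive.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # lstat judges a symlink by its own mtime, not its target's
            if path.lstat().st_mtime < cutoff:
                yield path


def prune(root, max_age_days, dry_run=True, now=None):
    """Return the stale files; delete them only if dry_run is False."""
    stale = []
    for path in find_stale_files(root, max_age_days, now):
        if not dry_run:
            path.unlink()
        stale.append(path)
    return stale


if __name__ == "__main__":
    # Roughly 6 months, dry run only: print what would be deleted.
    for path in prune("./packages", max_age_days=180):
        print(path)
```

The dry run is the default on purpose, so the list of candidates can be
reviewed before anything is removed. With 50 million files a full walk
will itself take a while, so this would probably have to run outside the
backup window.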