On 5 March 2018 at 21:06, Florian Pritz via arch-devops <arch-devops@lists.archlinux.org> wrote:
In that case you could go to the list archive, open the post you want to reply to, and then click the email address of the sender, which will set the correct subject and In-Reply-To headers.
TIL. Thank you :)
The problem here is that restic doesn't work with Glacier according to this [1]. So we'd need to use S3, which is more expensive. How much mostly depends on how long we want to keep the data and how well restic compresses/deduplicates it.
restic has yet to implement compression [0]. The deduplication seems quite functional, especially since a restic repository can be shared between multiple clients (if desired), so common data across clients is deduplicated in the repo (rough sketch of that below). Have we committed to the idea of paying for Amazon or a similar service for this project?
I like the idea of using a different tool with (hopefully) good deduplication/compression though. This is certainly better than sending many gigabytes of tarballs around for each backup.
Definitely! :)
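To illustrate the shared-repository point above, each client would run something roughly like this against the same repository. This is only a sketch: the bucket name, paths and password file are placeholders I made up, and AWS credentials are assumed to be in the environment.

    #!/usr/bin/env python3
    # Rough sketch only: multiple clients backing up into one shared restic
    # repository, so chunks that already exist in the repo are not uploaded
    # or stored again. Bucket name, paths and password file are placeholders.
    import os
    import socket
    import subprocess

    env = dict(os.environ,
               RESTIC_REPOSITORY="s3:s3.amazonaws.com/arch-backups-example",
               RESTIC_PASSWORD_FILE="/root/.restic-pass")

    def restic(*args):
        subprocess.run(["restic", *args], check=True, env=env)

    # one-time, from any machine holding the key: create the repository
    restic("init")

    # per client: back up into the same repository, tagged with the hostname;
    # restic chunks the data client-side and skips chunks the repo already has
    restic("backup", "/etc", "/srv", "--tag", socket.gethostname())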
As for the cleanups, I understand that the server and the client would both have keys to access the backup data, correct? That means that the server can read all of the data which makes it a good target for an attacker. Currently we avoid this by only storing client-side encrypted data on the server. I'd like to keep it this way.
I don't see any way to allow the client to manage cleanups without having write access (and therefore the ability to delete) to the 2nd backup. Perhaps we could consider putting the 2nd backup on a snapshotting filesystem (e.g. ZFS) with something like rsync.net [1]. Then it would just be a dumb rsync from the primary backup to the secondary, and the 2nd host retains snapshots to protect against malicious 'cleanups' from the 1st backup host.
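As a rough illustration of the secondary host's side (hostnames, dataset names and retention below are made up, not a proposal): the primary just pushes with a plain rsync, while the secondary rotates ZFS snapshots that the primary can never touch.

    #!/usr/bin/env python3
    # Sketch of the snapshot rotation on the secondary host. The primary only
    # runs something like:
    #   rsync -aH --delete /backup/ backup2.example.org:/tank/backups/
    # (hostname and paths are placeholders).
    import subprocess
    from datetime import datetime, timezone

    DATASET = "tank/backups"   # placeholder: ZFS dataset receiving the rsync
    KEEP = 30                  # placeholder: keep the last 30 daily snapshots

    def run(*cmd):
        return subprocess.run(cmd, check=True, capture_output=True, text=True)

    # snapshot today's state; even if the primary later rsyncs over garbage or
    # deletions, this snapshot still holds the old data
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    run("zfs", "snapshot", f"{DATASET}@{stamp}")

    # expire old snapshots locally -- only the secondary can do this
    snapshots = run("zfs", "list", "-H", "-t", "snapshot", "-o", "name",
                    "-s", "creation", "-r", DATASET).stdout.split()
    for snap in snapshots[:-KEEP]:
        run("zfs", "destroy", snap)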
I also like the idea of having a WORM S3/Glacier bucket. However, I'm not sure how this can be combined sanely with anything other than tarballs. From looking at the restic documentation it seems that they also use an object store, so even old objects might still be used in recent backups. Is there another way to achieve cleanup with restic that doesn't require a server with access to the backup keys?
Indexes etc. would have to be updated, I'm sure, so I don't think there are any tricky ways to do this. I did read somewhere that the repo format is 'read-only' to ensure consistency (i.e. files only ever get added to the repo on disk). I can't find the reference for that right now though, sorry.
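As far as I can tell, the only supported way to remove data is restic's own forget/prune, which needs the repository key, so it would have to run from a machine we trust with that key (e.g. an admin machine) rather than on the backup server. Roughly (repository URL, password file and retention numbers below are just example placeholders):

    #!/usr/bin/env python3
    # Sketch of a cleanup run from an admin machine holding the repository key;
    # the repository URL, password file and retention policy are placeholders.
    import os
    import subprocess

    env = dict(os.environ,
               RESTIC_REPOSITORY="s3:s3.amazonaws.com/arch-backups-example",
               RESTIC_PASSWORD_FILE="/root/.restic-pass")

    def restic(*args):
        subprocess.run(["restic", *args], check=True, env=env)

    # drop snapshots according to a retention policy, then remove pack files
    # that are no longer referenced; this reads and rewrites encrypted indexes,
    # which is why it needs the key and cannot run on a key-less backup server
    restic("forget",
           "--keep-daily", "7",
           "--keep-weekly", "4",
           "--keep-monthly", "6",
           "--prune")

    # optional sanity check afterwards
    restic("check")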
Also, how badly do outside changes impact performance? Let's say we have the keys on the admin machines (which we need for restores anyway) and perform the cleanup there. How long would it take to run, how much data would it need to transfer (a few megabytes, a few hundred megabytes, gigabytes, ...?), and do the clients then need to regenerate their caches or can they run at full performance just like before?
I'll do some testing to get some idea of the answers for this.

[0] https://github.com/restic/restic/issues/21
[1] http://www.rsync.net/resources/howto/snapshots.html