On 5 March 2018 at 21:06, Florian Pritz via arch-devops <arch-devops@lists.archlinux.org> wrote:
In that case you could go to the list archive, open the post you want to reply to, and then click the email address of the sender, which will set the correct subject and In-Reply-To headers.
TIL. Thank you :)
The problem here is that restic doesn't work with Glacier according to this [1]. So we'd need to use S3, which is more expensive. How much mostly depends on how long we want to keep the data and how well restic compresses/deduplicates it.
restic has yet to implement compression [0]. The deduplication seems quite functional, especially since a restic repository can be shared between multiple clients (if desired), so common data across clients is deduplicated in the repo (rough sketch of that below). Have we committed to the idea of paying for Amazon or a similar service for this project?
I like the idea of using a different tool with (hopefully) good deduplication/compression though. This is certainly better than sending many gigabytes of tarballs around for each backup.
Definitely! :)
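To illustrate the shared-repository point above, each client would run something roughly like this against the same repository. This is only a sketch: the bucket name, paths and password file are placeholders I made up, and AWS credentials are assumed to be in the environment.

    #!/usr/bin/env python3
    # Rough sketch only: multiple clients backing up into one shared restic
    # repository, so chunks that already exist in the repo are not uploaded
    # or stored again. Bucket name, paths and password file are placeholders.
    import os
    import socket
    import subprocess

    env = dict(os.environ,
               RESTIC_REPOSITORY="s3:s3.amazonaws.com/arch-backups-example",
               RESTIC_PASSWORD_FILE="/root/.restic-pass")

    def restic(*args):
        subprocess.run(["restic", *args], check=True, env=env)

    # one-time, from any machine holding the key: create the repository
    restic("init")

    # per client: back up into the same repository, tagged with the hostname;
    # restic chunks the data client-side and skips chunks the repo already has
    restic("backup", "/etc", "/srv", "--tag", socket.gethostname())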
As for the cleanups, I understand that the server and the client would both have keys to access the backup data, correct? That means that the server can read all of the data which makes it a good target for an attacker. Currently we avoid this by only storing client-side encrypted data on the server. I'd like to keep it this way.
I don't see any way to allow the client to manage cleanups without having write access (and therefore the ability to delete) to the 2nd backup. Perhaps we could consider putting the 2nd backup on a snapshotting filesystem (e.g. ZFS) with something like rsync.net [1]. Then it would just be a dumb rsync from the primary backup to the secondary, and the 2nd host retains snapshots to protect against malicious 'cleanups' from the 1st backup host.
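As a rough illustration of the secondary host's side (hostnames, dataset names and retention below are made up, not a proposal): the primary just pushes with a plain rsync, while the secondary rotates ZFS snapshots that the primary can never touch.

    #!/usr/bin/env python3
    # Sketch of the snapshot rotation on the secondary host. The primary only
    # runs something like:
    #   rsync -aH --delete /backup/ backup2.example.org:/tank/backups/
    # (hostname and paths are placeholders).
    import subprocess
    from datetime import datetime, timezone

    DATASET = "tank/backups"   # placeholder: ZFS dataset receiving the rsync
    KEEP = 30                  # placeholder: keep the last 30 daily snapshots

    def run(*cmd):
        return subprocess.run(cmd, check=True, capture_output=True, text=True)

    # snapshot today's state; even if the primary later rsyncs over garbage or
    # deletions, this snapshot still holds the old data
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    run("zfs", "snapshot", f"{DATASET}@{stamp}")

    # expire old snapshots locally -- only the secondary can do this
    snapshots = run("zfs", "list", "-H", "-t", "snapshot", "-o", "name",
                    "-s", "creation", "-r", DATASET).stdout.split()
    for snap in snapshots[:-KEEP]:
        run("zfs", "destroy", snap)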
I also like the idea of having a WORM S3/Glacier bucket. However, I'm not sure how this can be combined sanely with anything other than tarballs. From looking at the restic documentation it seems that they also use an object store, so even old objects might still be used in recent backups. Is there another way to achieve cleanup with restic that doesn't require a server with access to the backup keys?
Indexes etc. would have to be updated, I'm sure, so I don't think there are any tricky ways to do this. I did read somewhere that the repo format is 'read-only' to ensure consistency (i.e. files only ever get added to the repo on disk). I can't find the reference for that right now though, sorry.
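As far as I can tell, the only supported way to remove data is restic's own forget/prune, which needs the repository key, so it would have to run from a machine we trust with that key (e.g. an admin machine) rather than on the backup server. Roughly (repository URL, password file and retention numbers below are just example placeholders):

    #!/usr/bin/env python3
    # Sketch of a cleanup run from an admin machine holding the repository key;
    # the repository URL, password file and retention policy are placeholders.
    import os
    import subprocess

    env = dict(os.environ,
               RESTIC_REPOSITORY="s3:s3.amazonaws.com/arch-backups-example",
               RESTIC_PASSWORD_FILE="/root/.restic-pass")

    def restic(*args):
        subprocess.run(["restic", *args], check=True, env=env)

    # drop snapshots according to a retention policy, then remove pack files
    # that are no longer referenced; this reads and rewrites encrypted indexes,
    # which is why it needs the key and cannot run on a key-less backup server
    restic("forget",
           "--keep-daily", "7",
           "--keep-weekly", "4",
           "--keep-monthly", "6",
           "--prune")

    # optional sanity check afterwards
    restic("check")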
Also, how badly do outside changes impact performance? Let's say we have the keys on the admin machines (which we need for restores anyway) and perform the cleanup there. How long would it take to run, how much data would it need to transfer (a few megabytes, a few hundred megabytes, gigabytes, ...?), and do the clients then need to regenerate their caches or can they run at full performance just like before?
I'll do some testing to get some idea of the answers for this.

[0] https://github.com/restic/restic/issues/21
[1] http://www.rsync.net/resources/howto/snapshots.html