[arch-devops] Secondary backup ideas (Task #50)

Phillip Smith fukawi2 at gmail.com
Mon Mar 5 23:38:57 UTC 2018


On 5 March 2018 at 21:06, Florian Pritz via arch-devops <
arch-devops at lists.archlinux.org> wrote:

> In that case you could go to the list archive, open the post you want to
> reply to and then click the email address of the sender which will set
> the correct subject and In-Reply-To headers.
>

TIL. Thank you :)


>
> The problem here is that restic doesn't work with glacier according to
> this[1]. So we'd need to use s3 which is more expensive. How much mostly
> depends on how long we want to keep the data and how well restic
> compresses/deduplicates it.
>

restic has yet to implement compression [0].  The deduplication seems quite
functional, though, especially since you can share a restic repository
between multiple clients (if desired), so data common across clients is
deduplicated in the repo.
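To illustrate the shared-repo idea, here is a rough sketch (host names,
paths and the repo location are all made up for the example; this is not
our actual setup):

```shell
# Point both clients at the same (hypothetical) repository:
export RESTIC_REPOSITORY=sftp:backup@backup.example.org:/srv/restic-repo

# Initialise the repo once, from either client:
restic init

# Each client backs up into the same repo; identical chunks
# (e.g. /etc content common to several Arch boxes) are stored once:
restic backup /etc /srv/http    # run on client A
restic backup /etc /home        # run on client B

# Snapshots from both hosts show up side by side:
restic snapshots
```

Sharing a repo does mean every client can read the whole repo, so that
trade-off would need to be weighed against the dedup savings.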

Have we committed to the idea of paying for Amazon or similar service
for this project?

I like the idea of using a different tool with (hopefully) good
> deduplication/compression though. This is certainly better than sending
> many gigabytes of tarballs around for each backup.
>

Definitely! :)


>
> As for the cleanups, I understand that the server and the client would
> both have keys to access the backup data, correct? That means that the
> server can read all of the data which makes it a good target for an
> attacker. Currently we avoid this by only storing client-side encrypted
> data on the server. I'd like to keep it this way.
>

I don't see any way to allow the client to manage cleanups without
having write access (and therefore the ability to delete) to the 2nd
backup.
Perhaps we could consider putting the 2nd backup on a snapshotting
file-system (e.g. ZFS) with something like rsync.net [1]. Then it would
just be a dumb rsync from the primary backup to the secondary backup,
and the 2nd host retains snapshots to protect against malicious
'cleanups' from the 1st backup host.
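Roughly, the scheme would look like this (the secondary host name is a
placeholder, and with rsync.net the snapshot side is handled by the
provider [1] rather than by us; shown here as plain ZFS for clarity):

```shell
# On the primary backup host: mirror the backup tree to the secondary.
# The primary only ever needs rsync-level write access.
rsync -aH --delete /srv/backup/ backup@secondary.example.org:backup/

# On the secondary host: take a read-only snapshot after each sync.
# A deletion on the primary propagates to the live copy only; the
# snapshots remain untouched by anything the primary can do.
zfs snapshot tank/backup@$(date +%Y-%m-%d)

# Expire old snapshots on the secondary's own schedule, e.g. keep the
# most recent 30 (GNU head; oldest-first via -s creation):
zfs list -H -t snapshot -s creation -o name tank/backup \
    | head -n -30 | xargs -r -n1 zfs destroy
```

The key property is that snapshot retention is controlled entirely on
the secondary, so a compromised primary can't shrink the history.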


> I also like the idea of having a WORM s3/glacier bucket. However, I'm
> not sure how this can be combined sanely with anything other than
> tarballs. From looking at the restic documentation it seems that they
> also use an object store so even old objects might still be used in
> recent backups. Is there another way to achieve cleanup with restic that
> doesn't require a server with access to the backup keys?
>

Indexes etc. would have to be updated, I'm sure, so I don't think there
are any tricky ways to do this. I did read somewhere that the repo
format is append-only to ensure consistency (i.e., files only ever get
added to the repo on disk). I can't find the reference to that right
now though, sorry.


> Also, how badly do outside changes impact the performance? Let's say we
> have the keys on the admin machines (which we need for restores anyway)
> and perform the cleanup there. How long would it take to run, how much
> data would it need to transfer (few megabytes, few hundred megabytes,
> gigabytes, ...?) and do the clients then need to regenerate their caches
> or can they run at full performance just like before?
>

I'll do some testing to get some idea of the answers to this.
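For reference, the admin-side cleanup I plan to test would be along
these lines (repo URL and retention values are placeholders, not a
proposal):

```shell
# Run from an admin machine that holds the repo key; the clients and
# the backup server never need deletion rights.
export RESTIC_REPOSITORY=s3:s3.amazonaws.com/our-backup-bucket

# Drop snapshots outside the retention policy...
restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12

# ...then repack and delete data no longer referenced by any snapshot.
# This is the expensive step: it rewrites pack files, so it's where
# the transfer-volume and cache-invalidation questions come in.
restic prune
```

I'll time both steps and watch the traffic to answer the
megabytes-vs-gigabytes question.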


[0] https://github.com/restic/restic/issues/21
[1] http://www.rsync.net/resources/howto/snapshots.html

