On 06.03.2018 00:38, Phillip Smith via arch-devops wrote:
restic has yet to implement compression [0]. The deduplication seems quite functional, especially since you can share a restic repository between multiple clients (if desired), so common data across clients is deduped in the repo.
I don't like the idea of deduplicating across clients because that potentially allows an attacker to gain more than just the information stored on the attacked machine. I know of at least one case where a company had serious problems because their backup server was hacked. They didn't have client-side encrypted backups, so the attacker could access keys that allowed them to connect to other machines, and so on. Compression would be nice, but maybe we can survive without it. The biggest chunk of data is our packages, and those are compressed anyway.
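To keep that isolation, each client could get its own repository with its own key, so neither the backup server nor a sibling client can read anyone else's data (restic always encrypts on the client side). A minimal sketch; the host name, paths, and client names are assumptions, and the script only prints the commands it would run:

```shell
#!/bin/sh
# Hypothetical layout: one restic repository per client, each with its own
# password, instead of a single shared deduplicating repo.
BACKUP_HOST="backup.example.org"   # assumed backup server

repo_for() {
    # map a client name to its own repository URL
    printf 'sftp:backup@%s:/srv/restic/%s\n' "$BACKUP_HOST" "$1"
}

# each client would then run something like (not executed here):
#   restic -r "$(repo_for mail01)" --password-file /root/.restic-pw init
#   restic -r "$(repo_for mail01)" --password-file /root/.restic-pw backup /etc /srv
repo_for mail01
```

The trade-off is the one discussed above: per-client repositories give up cross-client dedup in exchange for containment if one machine or the server is compromised.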
Have we committed to the idea of paying for Amazon or a similar service for this project?
Sure, infrastructure costs money, and if we pay for it ourselves we can actually count on getting good support and/or remaining independent. So paying something is certainly fine, but the final amount will have to be discussed once we have satisfactory solutions.
Perhaps we could consider having the 2nd backup on a snapshotting file-system (e.g., ZFS) with something like rsync.net [1]. Then it would just be a dumb rsync from primary backup to secondary backup, and the 2nd host retains snapshots to protect against malicious 'cleanups' from the 1st backup host.
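The second tier described above could be sketched roughly like this; the host names, the ZFS dataset, and the schedule are assumptions, and the rsync/zfs commands are only defined, not executed:

```shell
#!/bin/sh
# Hypothetical pull-style mirror: the secondary host rsyncs the primary's
# backup directory, then takes a read-only ZFS snapshot, so a malicious
# "cleanup" on the primary cannot erase already-snapshotted history.
SRC="backup1.example.org:/srv/backup/"   # assumed primary backup host
DST="/tank/backup/"                      # assumed ZFS dataset on the secondary
STAMP="$(date +%Y-%m-%d-%H%M)"

sync_and_snapshot() {
    # --delete mirrors the primary exactly; the snapshot preserves whatever
    # the mirror just deleted or overwrote
    rsync -a --delete "$SRC" "$DST" &&
    zfs snapshot "tank/backup@$STAMP"
}

# run from cron on the secondary, e.g. nightly:  sync_and_snapshot
echo "would snapshot as: tank/backup@$STAMP"
```

The important property is that the snapshot is taken on the secondary, with credentials the primary never holds, so compromising the primary only buys the attacker the ability to poison future mirrors, not to destroy past ones.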
That has been suggested elsewhere in the original thread already. I also have a note mentioning a combination of such a snapshotting solution on one of our own servers and more rarely created tarball backups on glacier with deletion restrictions enabled for disaster recovery.
Also, how badly do outside changes impact the performance? Let's say we have the keys on the admin machines (which we need for restores anyway) and perform the cleanup there. How long would it take to run, how much data would it need to transfer (a few megabytes, a few hundred megabytes, gigabytes, ...?), and do the clients then need to regenerate their caches, or can they run at full performance just like before?
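For concreteness, such a cleanup run from an admin machine might look like the following; the repository URL and the retention numbers are assumptions, not an agreed policy, and the script only prints the command rather than running it. In restic, `forget` drops snapshot references cheaply, while `prune` rewrites the repository's pack files, which is the expensive, traffic-heavy step the timing tests should measure:

```shell
#!/bin/sh
# Hypothetical cleanup invocation from an admin machine that already holds
# the repository key for restores.
REPO="sftp:backup@backup.example.org:/srv/restic/mail01"   # assumed repo

# assumed retention policy, purely for illustration
FORGET_ARGS="--keep-daily 7 --keep-weekly 4 --keep-monthly 6"

# the actual run would be:
#   restic -r "$REPO" forget $FORGET_ARGS --prune
printf 'restic -r %s forget %s --prune\n' "$REPO" "$FORGET_ARGS"
```

Timing that command, and watching the transfer volume while it runs, would answer the questions above; whether the clients' local caches survive a prune is exactly the part worth testing.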
I'll do some testing to get some idea of the answers for this.
Looking forward to the results!

Florian