[arch-devops] Secondary backup ideas
Hi,

Bartłomiej pointed out that we somehow stopped using the mailing list for discussion so here we go.

Currently we only create backups on vostok using borg. A possibly quite big problem with this is that if an attacker gains access to a server, they also have sufficient access to remove all backups of that server. We could restrict that in borg, but then borg wouldn't be able to clean up old backups regularly and our repos would grow forever.

A better solution is to create backups of the backups in a way that the front end servers can not delete any of these secondary backups.

Possible solutions include:

- Creation of a second layer of backups (backups of the backups) on vostok. This roughly doubles our space requirement and we are currently at 44% usage, so this won't work well/for long. Unless we can use file system level snapshots or similar to reduce the required space, it's out.
- Put the secondary backups on a different, possibly new, machine using borg. The backup would be created on vostok from the existing backup data.
- Put them on AWS Glacier. Roughly €4 per TB per month; suggested by Tyler from Arch Linux 32.

Using Glacier would require that we export tarballs (supported by borg) and then upload them. However, since the backups are encrypted and vostok is not supposed to be able to read them, the tarballs need to be created on and uploaded from the servers themselves. This may become a CPU/bandwidth/traffic concern if done often. Tyler is currently investigating this for Arch Linux 32's backups AFAIK.

Does anyone have other ideas for what we could do here to ensure that we have backups of the backups? The most important requirements are that no matter which server an attacker manages to get access to, they cannot read any user data from other servers and that they cannot remove all backups from that server alone.

Florian
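PS: The borg-side restriction mentioned above would be an SSH forced command on vostok; a rough sketch (the key and repo path below are made up):

    # /root/.ssh/authorized_keys on vostok: the client may only append
    # to its own repo and can never prune or delete, which is exactly
    # why the repo would then grow forever.
    command="borg serve --append-only --restrict-to-path /backup/orion",restrict ssh-ed25519 AAAA...orion-key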
On 11/01/2018 at 17:54, Florian Pritz via arch-devops wrote:
Hi,
Bartłomiej pointed out that we somehow stopped using the mailing list for discussion so here we go.
Currently we only create backups on vostok using borg. A possibly quite big problem with this is that if an attacker gains access to a server, they also have sufficient access to remove all backups of that server. We could restrict that in borg, but then borg wouldn't be able to clean up old backups regularly and our repos would grow forever.
I haven’t dug deep into the borg docs, but I suppose there is no mode where only the client has the encryption/decryption key, but only the server can remove data? If e.g. backup date metadata were available to the server this could be done, but I don’t know how it works exactly so…
A better solution is to create backups of the backups in a way that the front end servers can not delete any of these secondary backups.
Possible solutions include:

- Creation of a second layer of backups (backups of the backups) on vostok. This roughly doubles our space requirement and we are currently at 44% usage, so this won't work well/for long. Unless we can use file system level snapshots or similar to reduce the required space, it's out.
- Put the secondary backups on a different, possibly new, machine using borg. The backup would be created on vostok from the existing backup data.
- Put them on AWS Glacier. Roughly €4 per TB per month; suggested by Tyler from Arch Linux 32.
Using Glacier would require that we export tarballs (supported by borg) and then upload them. However, since the backups are encrypted and vostok is not supposed to be able to read them, the tarballs need to be created on and uploaded from the servers themselves. This may become a CPU/bandwidth/traffic concern if done often. Tyler is currently investigating this for Arch Linux 32's backups AFAIK.
Does anyone have other ideas for what we could do here to ensure that we have backups of the backups? The most important requirements are that no matter which server an attacker manages to get access to, they cannot read any user data from other servers and that they cannot remove all backups from that server alone.
Florian
I’m not sure how that works technically speaking, but I suppose that for AWS Glacier you intend to use borg append-only mode to avoid an attacker deleting the backups? How would the cleaning work in this case? Would we just not care about it because it’s Glacier?

In your ideas #1 and #2, it seems to me that an attacker gaining access to vostok is able to remove all backups from all servers, right? So “the most important requirement that no matter which server an attacker manages to get access to they cannot remove all backups [from that server alone]” seems flawed from the start. Or do you imply that if an attacker only gets access to vostok, that’s OK because all servers are still running, so there is no loss of data?

Bruno
On 11.01.2018 18:32, Archange wrote:
I haven’t dug deep into the borg docs, but I suppose there is no mode where only the client has the encryption/decryption key, but only the server can remove data? If e.g. backup date metadata were available to the server this could be done, but I don’t know how it works exactly so…
Borg stores pretty much everything, including metadata, in encrypted form so to do any cleanup you need to have the key. Also AFAIK it only supports a single key.
I’m not sure how that works technically speaking, but I suppose that for AWS Glacier you intend to use borg append-only mode to avoid an attacker deleting the backups? How would the cleaning work in this case? Would we just not care about it because it’s Glacier?
If we were to use Glacier, we would upload tarballs, not borg repos. We would use `borg export-tar` to create the tarball from the repo, but that's really just a minor detail: it creates a normal tarball in the end. That would then be encrypted with GPG or something and uploaded to Glacier.

Cleanup would either work via a dedicated access key to Glacier or via automated cleanup rules inside Glacier. I'm not sure if cleanup rules are generally supported or only available in special cases, but dedicated accounts should work. It's mostly an idea and not yet tested.

I actually dislike the part where it uploads the whole tarball each time. At least for orion that would be ~200GiB; even with 1Gbit/s that takes roughly 30 minutes and would probably impede performance too much. Also it will get worse if we ever add more data. In total I'm not too happy with this.
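To make this concrete, a rough sketch of such an export pipeline (untested; the repo path, key ID and bucket name are made up):

    # Stream the newest archive as a tarball, encrypt it for an offline
    # key and upload it without ever writing the tarball to disk.
    # Bandwidth cost per full run: 200 GiB * 8 / 1 Gbit/s = ~1700 s,
    # i.e. just under half an hour.
    borg export-tar /backup/repo::orion-2018-01-11 - \
        | gpg --encrypt --recipient 0xDEADBEEF \
        | aws s3 cp - s3://example-secondary-backups/orion-2018-01-11.tar.gpg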
In your ideas #1 and #2, it seems to me that an attacker gaining access to vostok is able to remove all backups from all servers, right?
Good point. vostok has a much smaller attack surface (just ssh and zabbix-agent) than the other machines, but this is certainly not ideal. With #1 and #2 vostok would have access to remove the files so we really just moved the problem. Still better than what we have right now, but not great.

Glacier with separated upload and removal accounts/access keys would be better here. I'll think some more about that. Thanks!

Florian
Hey,

On 11.01.18 - 17:54, Florian Pritz via arch-devops wrote:
Does anyone have other ideas for what we could do here to ensure that we have backups of the backups? The most important requirements are that no matter which server an attacker manages to get access to, they cannot read any user data from other servers and that they cannot remove all backups from that server alone.
Do we have the same constraints/requirements for the secondary (backup) backups that we have for the current primary ones? That is, do we need to have the same intervals for the secondary as for the primary?

(I'm going to explore all ideas even if already determined "not optimal")

What I first had in mind sadly does not work properly in the long run after thinking it through: Rent a second server (let's call it "B") that is being push-rsync'ed from vostok. Here we could use rrsync to deny deletion of files on "B", so that even if vostok is compromised, the attacker could not do something like:

    rsync --delete -a /empty B:/the/precious/backups/

The problem with this is: we lose the ability of borg prune cleanups and the backups on "B" would continue to grow and grow and grow... (It is not possible to use find -mtime +90 or similar due to the nature of borg.)

Second idea: Use rsnapshot or similar pull-based backups on "B" to pull the data from vostok. The access permissions of "B" would need to be limited to "read only" on vostok, so that an attacker on "B" is not able to delete anything on vostok. It *would* be possible to do something like rsync --delete-after now, so that the local copy on "B" would also be cleaned up, but this has another caveat: If vostok gets compromised and the attacker deletes all backups there, a subsequent rsync from "B" would propagate the deletion to "B" and all backups would be lost. So we are back at the problem of missing cleanup abilities again.

Third idea: Use a different backup interval on the individual servers and use borg to push the backups to "B". Doing all backups twice with the same interval would create a considerable (at least double) amount of backup data. Ideally we would deny prune/delete through borg on "B", so that even if a server gets compromised, the attacker would not be able to delete the backups of that server on "B". This in turn would again result in continuously growing backup/archive space.

Another thing I would like to mention: It would be best to use a completely different backup software AND filesystem for the secondary backups. So that even if there happens to be a real bad bug in borg or the filesystem where our current borg backups reside on (I suppose ext4?), it would ensure that our secondary backups remain unaffected by that.

Unfortunately I have no complete solution for the concerns raised by Bartłomiej and Florian (yet). I hope that someone might be able to pick up where I left off with my ideas and can put them into proper concepts.

Cheers,
Thore
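PS: To illustrate the second idea, the read-only restriction on vostok could also be an SSH forced command; a rough sketch (untested; the rrsync path and key are made up):

    # /root/.ssh/authorized_keys on vostok: "B"'s key may only run
    # rsync in read-only mode (-ro) below /backup, nothing else.
    command="/usr/lib/rsync/rrsync -ro /backup",restrict ssh-ed25519 AAAA...B-key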
After giving it a little more thought, I just came up with the following proposal: As before, rent a second storage server ("B"). Use a different filesystem than vostok for the backup-storage mountpoint. Use duplicity/duply on the servers for our secondary backup chain.

Duply/duplicity allow for asymmetric crypto using gpg and they support separate keys for signing and encrypting. Each server could have its own keypair so that it can sign its own backups. For encryption we could use the Arch Linux Master keys?! (The individual servers only need the public key parts, obviously.)

Although duplicity/duply are not as "fancy" as borg, we could follow some kind of "weekly full backup" with "incremental daily backups". Of course we would need to deny the servers deletion/prune permissions on "B", but this time we could use some sort of find -mtime +90 algorithm on "B" for cleanup.

This would be a considerable benefit over borg for the secondary backup chain as the servers themselves are not able to decrypt their own backups, hence an attacker couldn't do that either. As this would be part of a "disaster recovery" backup chain, it should not be an issue that the backups can only be decrypted by our Master keys (or a new set of privileged keys that only a very few selected members will be given access to), because for our day-to-day restore needs we could continue to use borg. (And the awesome borg-restore.pl script from Florian.)

I think this is a more complete concept and could suit us well. Feedback welcome! (For my previous "failed" attempts too!)

Cheers,
Thore
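PS: A rough sketch of what the duplicity invocation could look like (untested; key IDs, paths and the host name are made up, and with duply this would live in a profile instead):

    # On each server: weekly full, daily incremental backups, signed
    # with the server's own key and encrypted to a privileged key
    # whose private part the server never sees.
    duplicity --full-if-older-than 7D \
        --sign-key 0xSERVERKEY \
        --encrypt-key 0xMASTERKEY \
        /srv/backup-staging sftp://backup@B//srv/backups/$(hostname)

    # On "B": age-based cleanup, independent of the untrusted servers.
    # This would need to be aligned with full-backup boundaries so that
    # no incremental chain loses its base.
    find /srv/backups -type f -mtime +90 -delete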
On 11.01.2018 20:47, Thore Bödecker via arch-devops wrote:
Use duplicity/duply on the servers for our secondary backup chain.
I used to use duplicity for my personal backups and compared to borg it felt awfully slow. Also, we'd need at least twice the space of a full backup plus the incrementals, which might be a problem in the future.
This would be a considerable benefit over borg for the secondary backup chain as the servers themselves are not able to decrypt their own backups, hence an attacker couldn't do that either.
I'm not sure why an attacker would be interested in the backup data when they have access to the source data. Unless they are really interested in history rather than current data, that seems moot. Future data would be easy to get if they just stay hidden until that data is current.

Thinking about the rsnapshot/borg-bug situation some more, it might be nice to have monthly/bi-weekly tarballs on Glacier for 2-3 months so that we can roll back to an old borg version/operating system that worked. Also that would be a totally separate second chain, similar to what you aimed at with duplicity. The low frequency would also keep the additional load relatively low.

Florian
On 11.01.2018 20:17, Thore Bödecker wrote:
Do we have the same constraints/requirements for the secondary (backup) backups that we have for the current primary ones? That is, do we need to have the same intervals for the secondary as for the primary?
Intervals don't matter too much I'd say, but something like every one or two days would be good I think. Once a week might be a bit too much potential loss if it should ever come to us needing those backups.
Here we could use rrsync to deny deletion of files on "B", so that even if vostok is compromised, the attacker could not do something like:

    rsync --delete -a /empty B:/the/precious/backups/
There is actually a note in the rrsync source that it is not as secure as you might imagine. It only checks the rsync command that is executed, but apparently the rsync protocol itself may be vulnerable to attack even with the correct options set. Sure, it may be unlikely that someone tries this, but we're exploring all options here.
Second idea:
Use rsnapshot or similar pull-based backups on "B" to pull the data from vostok. The access permissions of "B" would need to be limited to "read only" on vostok, so that an attacker on "B" is not able to delete anything on vostok. It *would* be possible to do something like rsync --delete-after now, so that the local copy on "B" would also be cleaned up, but this has another caveat: If vostok gets compromised and the attacker deletes all backups there, a subsequent rsync from "B" would propagate the deletion to "B" and all backups would be lost.
They wouldn't be, because we use rsnapshot and thus still have old snapshots available. Sure, an attacker could delete the backups and after X days "B" would have finally rotated away all snapshots with data, but the point is that we have sufficient time to notice the problem and deal with it.

One potential problem here is that rsnapshot doesn't have verification support, so if we are unlucky, the files could get eaten by bit rot, and borg is somewhat susceptible to this thanks to deduplication and encryption. We could probably work around this by creating checksum files in rsnapshot (it can execute commands too) and then writing a custom verification script.

I like this idea. Does anyone see any more potential issues with it? Feel free to take it apart as best you can.
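A rough sketch of that workaround (untested; all paths are made up, and note that rsnapshot.conf fields must be separated by tabs):

    # /etc/rsnapshot.conf on "B": run a hook after every backup run
    cmd_postexec	/usr/local/bin/backup-checksums

    # /usr/local/bin/backup-checksums (sketch):
    #!/bin/bash
    # Record checksums of the newest snapshot; a periodic verification
    # job can re-check them against the files to catch bit rot before
    # all good snapshots have rotated away.
    cd /backups/daily.0 || exit 1
    find . -type f -print0 | xargs -0 sha256sum > /backups/checksums/daily.0.sha256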
Third idea:
Use a different backup interval on the individual servers and use borg to push the backups to "B". Doing all backups twice with the same interval would create a considerable (at least double) amount of backup data. Ideally we would deny prune/delete through borg on "B", so that even if a server gets compromised, the attacker would not be able to delete the backups of that server on "B". This in turn would again result in continuously growing backup/archive space.
I know that running borg twice is what upstream recommends, but I think we can do better without putting additional load on the servers. Also the append-only problem you mentioned is still there...
It would be best to use a completely different backup software AND filesystem for the secondary backups. So that even if there happens to be a real bad bug in borg or the filesystem where our current borg backups reside on (I suppose ext4?), it would ensure that our secondary backups remain unaffected by that.
The file system is a good point. We currently don't have automatic verification, so that should be improved; we might not notice bit rot or broken file systems otherwise. Using different software might be a nice idea, but that certainly puts additional load on the source servers and I'm not sure if it's really worth it. Borg is well tested and old versions are available. It might be worth thinking about this more though.

Florian
On January 11, 2018 at 13:54, Florian Pritz via arch-devops wrote:
Bartłomiej pointed out that we somehow stopped using the mailing list for discussion so here we go.
We discuss a lot of these things over IRC, and now that I'm currently without a bouncer, I feel this more.
Possible solutions include:

- Put the secondary backups on a different, possibly new, machine using borg. The backup would be created on vostok from the existing backup data.
- Put them on AWS Glacier. Roughly €4 per TB per month; suggested by Tyler from Arch Linux 32.
Using Glacier would require that we export tarballs (supported by borg) and then upload them. However, since the backups are encrypted and vostok is not supposed to be able to read them, the tarballs need to be created on and uploaded from the servers themselves. This may become a CPU/bandwidth/traffic concern if done often. Tyler is currently investigating this for Arch Linux 32's backups AFAIK.
Having used S3/Glacier a lot for backups, I can say it's a great option for this. You can actually upload everything to S3 and have it move automatically to Glacier after a period of time has passed. That way you have the most recent backups available right away and the older ones are on Glacier. S3 also has the infrequent access pricing tier for this. If we can wait for Glacier retrieval times, then we don't need to use S3.

As for security, we can have keys that can only upload to a specific S3 bucket/Glacier and can't remove data from there. But we would need to keep the main AWS account and any full-access IAM account very secure, because those could remove everything.

Regards,
Giancarlo Razzolini
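PS: A rough sketch of both pieces (untested; bucket, user and policy names are made up):

    # Lifecycle rule: migrate objects to Glacier 30 days after upload.
    aws s3api put-bucket-lifecycle-configuration \
        --bucket example-secondary-backups \
        --lifecycle-configuration '{
            "Rules": [{
                "ID": "to-glacier",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}]
            }]
        }'

    # Per-server upload credentials: s3:PutObject only, so a compromised
    # server can add new backups but never read or delete existing ones.
    aws iam put-user-policy --user-name backup-orion \
        --policy-name upload-only \
        --policy-document '{
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": "arn:aws:s3:::example-secondary-backups/*"
            }]
        }'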