Server mode is insecure! (server must be trusted)

I just configured a Kopia server, but I’m very surprised: I was expecting that at some point I would need to provide the password of the created repository when connecting, but that is not the case; I only need to provide the password of the current user. Similarly, the server doesn’t even ask for the password when rebooting. So is the repository really encrypted server side? Said differently, if someone compromises the server, can they decrypt all my backups? If not, how can the Kopia server encrypt the repository while sharing deduplication between users? (i.e. either all users or the server must “know” the password) I’m quite worried that Kopia’s server mode is insecure against a compromised server…

The Kopia server acts as a proxy for access control and therefore only needs to verify a user’s credentials. It does not have to know the repository password, as all data is encrypted and decrypted on the client’s end.

@christoph thanks, but after checking, your statement is wrong: the Kopia server IS insecure and data is NOT encrypted client side but server side, hence the server must be trusted. The first clue is that I don’t need to specify the repository key when connecting from the client: the only credential specified is the user’s password, and as far as I know that is in no way related to the repository key.

Worse, on the server I can run kopia repository status, which shows that the server itself is connected to the local filesystem repository, explaining why I don’t need to re-specify the repository key on restart: the connection is not removed upon restart. Since the repository is opened server side, I can run any mount command on the server to read user data, and this way I successfully decrypted all backups server side without knowing any user password. I don’t understand how the docs can be so misleading; a huge warning should be written there saying that the server must be trusted!!
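
To make this concrete, here is roughly what I ran on the server host (the snapshot ID and mount point are illustrative):

```
kopia repository status        # shows the server itself is connected to the repository
kopia snapshot list --all      # lists snapshots from every user
kopia mount k1a2b3c4d5e6f7a8b9 /mnt/anyones-backup   # mounts any snapshot, fully decrypted
```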

This leads me to my second question: how can we have both an untrusted server and forbid users from removing snapshots unless they specify a specific non-cached password? (I want to prevent a virus on a client from removing my snapshots on the backup server.)

Object locking by your cloud provider is the only way to ensure that your data is fully protected from a malicious user:

and Kopia is the only FOSS backup software I am aware of that has implemented it. Be careful when playing with it… there is really no way to delete your data before the lock expires (while of course paying for the storage space used), short of terminating your cloud storage account completely.
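
For reference, setting it up looks roughly like this; the bucket name is made up and the retention flags are the ones I remember from the docs, so verify with kopia repository create s3 --help:

```
# The bucket must be created with versioning + Object Lock enabled beforehand.
kopia repository create s3 \
  --bucket my-locked-backups \
  --retention-mode COMPLIANCE \
  --retention-period 720h   # 30 days
# Periodic maintenance is what keeps extending the locks on live data,
# so make sure maintenance actually runs against this repository.
```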

You cannot solve this reliably with a purely local solution, as in the end you have to trust some machine and user in order for backups to work :) The best you can do is to run your server in append-only mode.

Would you mind sharing the source of this statement?

Well, I can’t find official sources, but I just ran kopia snapshot list and kopia mount SNAPSHOTID FOLDER on the server and I could read all the data… so unless I misconfigured something, that is proof that it is not secure.

I’ll do that then if server mode can’t work, thanks!

It seems like s3:BypassGovernanceRetention + governance mode (Locking objects with Object Lock - Amazon Simple Storage Service) can be used to remove them before the lock expires, no? And I plan to self-host it using MinIO.
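
Something like this, if I read the AWS docs right (bucket, key, and version ID are made up):

```
# In GOVERNANCE mode, a principal granted s3:BypassGovernanceRetention
# can still delete a locked object version before the lock expires:
aws s3api delete-object \
  --bucket my-kopia-bucket \
  --key p0123456789abcdef \
  --version-id EXAMPLEVERSIONID123 \
  --bypass-governance-retention
# COMPLIANCE mode has no such escape hatch.
```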

You mean kopia server? It seems to be insecure from what I can see, as discussed above.

Well, you can assume that either the client or the server is honest and get meaningful security guarantees in either case with a proper protocol.

This is completely by design: client-side and server-side encryption have different use cases and trade-offs. Doing encryption client side is appropriate when you don’t trust the server, and Kopia supports it, just not in the “repository server” mode.

Kopia repository server mode is specifically designed to avoid having to distribute data encryption keys to semi-trusted/untrusted clients (typically employees in a company who can’t be fully trusted not to spy on/attack one another), hence the server must be the only one that can encrypt/decrypt content on their behalf. In server mode hashing/deduplication happens on the client side, but encryption happens on the server. The data between client and server is protected in transit using TLS, so it can’t be spied upon on the network.

The classic/non-server mode of Kopia supports end-to-end encryption, where all clients (who trust one another, typically a single user on multiple machines/home lab/family situation) have access to a shared repository encryption key and are responsible for encrypting data before it is sent. In this mode the encrypted data lands directly in S3, so no server ever sees the data unencrypted.
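
From the client’s point of view, the two modes look like this (the URL and fingerprint are illustrative):

```
# Classic / end-to-end mode: the client holds the repository password
# and encrypts locally before anything reaches the storage.
kopia repository connect s3 --bucket my-bucket    # prompts for the repository password

# Repository server mode: the client only authenticates to the server;
# the repository key never leaves the server.
kopia repository connect server \
  --url https://backup.example.com:51515 \
  --server-cert-fingerprint 0123abcd...           # prompts for the per-user password
```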

For example, in my home I use the classic mode, backing up ~15 machines to Wasabi. Many businesses use Kopia in server mode, since that allows the server to apply certain access controls and policies centrally and protect the encryption keys.

Thanks a lot for this clarification. This use case is indeed interesting in specific settings, but it should really be properly documented, as I was myself expecting client-side encryption for every client (and I’m not the only one).

It’s precisely because of the access controls that I wanted to use the server mode. It would be great then to provide two modes for the server: client-side encryption vs server-side encryption.

This is quite the opposite of what the server documentation says (which I linked to before):

Repository Server allows an instance of Kopia to proxy access to the underlying storage and has Kopia clients proxy all access through it, only requiring a username and password to talk to the server without any knowledge of repository storage credentials.

(Emphasis by me)

Now I’m utterly confused. :confused:

Edit: Oh wait, I got that wrong: It’s not the server but the clients who don’t need any knowledge about the credentials.

Edit 2: Could that sentence be made clearer, e.g.:

Repository Server acts as a gatekeeper: it is the only place that holds the repository’s actual storage credentials. Clients never use them. Instead, a client merely authenticates to the server with a username and a password; the server then forwards the client’s traffic to the underlying storage.

I think that this is indeed quite unclear and easy to miss. I’d add a big warning to the docs saying that, contrary to the other modes, the server has the capability to decrypt all backup data of all users. As it stands, saying that users don’t know the repository storage credentials does not imply that the server knows them (e.g. they could use some sort of multi-party computation to encrypt while nobody knows the key…).

Well… that has been the case for any backup server out there that has to be able to perform deduplication on the data it handles… As @jkowalski stated, this is by design, and rather than getting all worked up about this fact, I’d suggest thinking hard about how you can safeguard your Kopia server host. There’s no such thing as a free lunch, and you’ll always have to put in the work to secure such a system - or not, but then don’t try to shift this responsibility to the backup software. Keeping a backup server safe is, has been, and always will be paramount!

Repos do have ACLs, and a repo’s user will only be able to access their own data, so even if some client got compromised, it would only affect its own snapshots - but that would be the same for any other backup solution where a client got compromised.

Regarding your 2nd question… you could place an ACL on your repo which only allows a client to add new snapshots. Then you’d set the repo owner as the maintenance user for the repo, so the Kopia server can still remove snapshots that are no longer covered by the repo’s retention policy/policies.
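
A rough sketch of that setup, assuming the current kopia server acl syntax (the username is illustrative; check kopia server acl --help for specifics):

```
kopia server acl enable                    # switch from default to explicit ACLs
# Let the client append new snapshots for its own user/host, but nothing more:
kopia server acl add --user alice@laptop \
  --access APPEND \
  --target type=snapshot,username=OWN_USER,hostname=OWN_HOST
```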

“Has to” is, like, your opinion, man. I get that deduplication is important within a snapshot, and across different snapshots there could be dedup-able bytes, but you really can’t compare that to incremental backups of a single snapshot. Maybe there’s some use case that’s whooshing over my head here?

I got confused in the exact same way reading the docs. I think this is a perfectly reasonable thing to have concerns about: unlike the other backup targets, encryption is not E2E and should not be advertised as such. I think you really have to contort the definition of “E2E” to say “yeah because there’s encryption in transit, it’s E2E encrypted!”. (TIL Cloudflare’s reverse proxy feature is E2E encrypted too! /s)

By default, every user of Kopia repository directly connects to an underlying storage using read-write access. If the users who share the repository do not entirely trust each other, some malicious actors can delete repository data structures, causing data loss for others.

Repository Server allows an instance of Kopia to proxy access to the underlying storage and has Kopia clients proxy all access through it, only requiring a username and password to talk to the server without any knowledge of repository storage credentials.

In repository server mode, each user is limited to seeing their own snapshots and policy manifest without being able to access those from another user account.

NOTE: Only snapshot and policy manifests are access-controlled, not the underlying contents. If two users share the same file, it will be backed using identical content IDs. The consequence is that if a third user can guess the content ID of files in the repository, they can access the files. Because content IDs are one-way salted hashes of contents, it should be impossible to guess content ID without possessing original content.

The only hint you get that the server can decrypt all client data is “only requiring a username and password to talk to the server without any knowledge of repository storage credentials”, but you could be forgiven for assuming there would be some KDF from the username and password used to encrypt the data. The real giveaway is “If two users share the same file, it will be backed using identical content IDs.” The only way that would be possible is if the clients could decrypt each other’s contents or were using the same repository password…
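
That “salted hashes” note is the tell: content IDs behave like keyed hashes, so they can only match across users if everyone’s data is hashed under the same repository-wide key. A toy illustration of the principle (not Kopia’s actual scheme):

```
# Same content + same shared hashing key => same ID, which is what makes
# cross-user dedup work; without the key, the ID can’t be derived from content.
KEY="shared-per-repository-hashing-key"   # hypothetical
printf 'identical file contents' | openssl dgst -sha256 -hmac "$KEY"
# Every user hashing this file gets the same digest, hence the same content ID.
```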

It’s not a “free lunch”: Kopia literally does this when connecting to any other repository. To me, the value add of the repository server seemed to be the ability to have a shared storage system with access controls between users. Just run it on a NAS and you don’t have to deal with configuring Samba or another network filesystem: just do everything over a purpose-built, HTTP-proxyable API.

I have a really overkill homelab and I just decided to run an S3 gateway with buckets and creds per client. This gets me what I wanted out of the repository server, but I’d imagine a lot of folks just want an easy way to do the same thing from their NAS without much headache.
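
Concretely, that layout is just one Kopia repository per bucket, something like this (the endpoint, bucket, and keys are made up):

```
# Each client gets its own bucket + credentials, hence its own repository
# password and no key shared with any other client:
kopia repository create s3 \
  --bucket backups-laptop \
  --endpoint minio.home.lan:9000 \
  --access-key laptop-access-key \
  --secret-access-key laptop-secret-key
```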

Well… different buckets also mean distinct repos, no? In that case you lose the deduplication across your overall dataset and only get it per bucket. This is probably still better than incremental backups, and yes, you can keep the data private between the repo host and the actual Kopia client, no argument there.

However, dedup was mainly invented to tackle redundancies in large volumes of data across the boundaries of different hosts, and it surely depends on your type of setup whether it will be of use for you. In my case, I have the same files on a Nextcloud share as well as on multiple clients, so having a separate bucket for each client would void the deduplication for that use case.

I will concur that a lot of people want simple solutions, but blaming Kopia for not being one is not the way to go, because in the end most of these users actually want a simple, full-featured, but inexpensive - or better yet, free - solution. Kopia ticks two of those three boxes, but comes with the caveat that it’s not easy to use once you get into the weeds of it.

Yeah… but that’s the only way to do it, since architecturally Kopia ties all crypto to a single repository secret. I can back up multiple clients to the same storage system and:

  1. The clients can’t leak/learn each other’s data.
  2. The server can’t leak/learn any of the clients’ data.

Item (2) is what I think the repository server misses the mark on.

Yeah, that setup makes it fair to want such a feature, but OTOH why are you backing up those files from all your devices? I exclude caches and cloud-synced files from my backups, but I guess you could have them scattered across your FS?

If this is valuable to you, I’d question whether you also need to limit access control of the uncommon files. I could imagine an architecture where clients provide a username, a password, and the repository key; then you could take your tradeoff and I could take mine, and the server would still be blind.

I’d agree with you there. I’ve still gotta do a restore test, but I’ve been very pleased with Kopia so far. Yeah, I’ve struggled a bit using it, because the UX clearly doesn’t have 20 SWEs perfectly molding CUJs, but I don’t think I’ve fundamentally misconfigured anything in a way that jeopardizes my data. And if I did and I learn about it, I’ll similarly at least point out the footgun. This is a pretty big footgun if you’re just following along with the instructions and hoping the documentation is self-consistent.

The way I see it, there are three entities involved here: the client, obviously; secondly, the Kopia server; and thirdly, the owner of the storage, which could be some public S3 for instance. I am somewhat OK with running 2 out of those 3 roles myself and being certain that the S3 owner can’t see or read my files, regardless of whether I move data directly from Kopia or by proxy of a Kopia server. I might also add other people’s computers to my Kopia server setup, and while a compromised server would compromise their data, it would still not let the third party in this scenario read any of our data, and we would get dedup of common content to save space and allow more backups per GB.

Anyone who thinks my Kopia server might be insecure would be free to encrypt their data and only send that via the Kopia server; that would protect it from being read by both party 2 and party 3, but it would also make the data incompressible and seldom dedup-able at all, in which case this person might as well skip the whole server part and go directly from local Kopia client → encrypt/compress → storage, because the gains from the middle step would be zero.
It’s not a lot more complicated to run it directly than to set up a Kopia client against a Kopia server, so I don’t see that as too hard an issue either. If you want to share dedup and still have your own password for the connection and some (barring bugs) separation between clients, then run a Kopia server in between; if you don’t trust anyone but yourself with anything, then don’t.

Also, I’ve been doing “regular” backup systems (butc, Amanda, NetBackup, TSM and so on) at work for some 25+ years, and a LOT of them assume that each client trusts the “man in the white robe” who operates the tape robots.

Some of them can encrypt data before sending it to tape, so that you can later send tapes to cold storage using untrusted couriers or whatever. I’m not saying this is the best or even a good choice, but that is how many of them are designed, and very few of them take measures to allow clients not to trust the server software operator: those that do have client-side hooks for encrypting things before sending them over, but that also makes stored compression and dedup impossible at the server end. Only sending encrypted data is quite possible, but cumbersome for restores - especially bare-metal restores.

I think Kopia server somewhat replaces these setups, allowing clients not to have to trust each other (i.e., not using the same password to access any part of the repo) while retaining the things that Kopia does really well, namely dedup and fast inline encryption (so the storage provider can’t steal our data). But it is still in line with the “we have to trust the backup service operator” assumption in order to keep the things Kopia does well, OR you will have to run clients directly against storage, since a Kopia server wouldn’t be adding any value for you.

So while I get that it might sound like I am advocating for everyone trusting everyone else, or that there are no risks (“don’t worry, it will be safe”), and I do get that people should not put important data in someone else’s hands just like that, what I’m trying to argue is that Kopia, and specifically Kopia server, suits a certain subset of computer backups, and it suits it well. No single system will suit all the corner cases you can imagine, and there are A LOT of those, trust me. So if Kopia server isn’t for you, then it isn’t for you. It is that simple. No single design will be able to please everyone, and I would rather see the Kopia devs focus on the things Kopia does well than chase after every small (or large) feature that someone can imagine or that someone found in a previous system.

If your main goal is to have full dumps without dedup every weekend, so that you have lots and lots of full dumps stored in a .tar-compatible format and only do incrementals off the latest weekend 0-dump, then NetBackup will suit you. If you need to back up DB2 with an agent, or MS AD/Exchange to DLT tapes, then IBM will gladly sell you TSM and tape robots for a buttload of money. We do have choices, but not all choices are equal, and they never can be. If the Kopia devs think it should stay as it is and I don’t like it, I will have to ask for my $0 back, I guess.
