Offload Client Processing to Server Repository

Hello, I’ve gone through some of the documentation, but couldn’t find anything about offloading the processing of incrementals, deduplication, and compression to the server repository.

As an example, I’m looking to stream files from a host and then have Kopia create the incremental, deduplicate, and compress the data. The hosts in question can’t do this themselves, as it would impact the application.

Kopia doesn’t support that. I also don’t know of any application that runs dedupe on raw data, but there seem to be some storage solutions that do this. Just check a couple of posts back, where the question about disabling data encryption was raised.

What you want to achieve is often referred to as pull-based backup (the server pulling data from the client). As @budy already noted, this is not supported by Kopia out of the box. Kopia assumes that the repository is insecure and encrypts all data before sending it to the backend (a push-based setup).

You can, however, share your data from the host (e.g. via NFS, Samba, or SSHFS) and mount it on the server, then run the backups on the server by reading the mount. Note that reading data from a network share can have drawbacks.

Thanks.

That makes sense; it would never be possible to dedupe encrypted data. Encryption in transit would still result in the repository seeing the data unencrypted.
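
To illustrate why deduplication has to happen before encryption, here is a minimal Python sketch. It assumes each chunk is encrypted with a fresh random nonce; `toy_encrypt` is a made-up, insecure stand-in for a real cipher and is not what Kopia uses. Identical plaintext chunks hash to the same value, while their encrypted forms do not.

```python
import hashlib
import os


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy XOR stream cipher with a random nonce. Illustration only, NOT secure."""
    nonce = os.urandom(16)
    stream = b""
    counter = 0
    # Derive a keystream from key + nonce + counter until it covers the plaintext.
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + body


def unique_chunks(chunks) -> int:
    """Count distinct chunks by content hash, as a deduplicating store would."""
    return len({hashlib.sha256(c).hexdigest() for c in chunks})


chunk = b"same block of file data" * 100
plain = [chunk, chunk, chunk]            # three identical chunks
key = b"k" * 32                          # hypothetical key
encrypted = [toy_encrypt(key, c) for c in plain]

plain_unique = unique_chunks(plain)          # identical chunks collapse to one
encrypted_unique = unique_chunks(encrypted)  # each ciphertext looks different
print(plain_unique, encrypted_unique)
```

With the plaintext visible, three copies are stored once; after encryption the store sees three unrelated blobs.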

NFS and Samba are great for a within-rack solution, but spanning data centers would require private networking, and there might be some latency issues with those protocols. I’m not a huge fan of SSHFS from past experience, though I haven’t used it recently and it may have improved.

The solution I was hoping for was a way to back up data from servers anywhere, behind firewalls and on the internet, while offloading the heavy lifting to the repository server. IBM’s Tivoli Storage Manager does this; I’m only bringing it up as a reference, since it’s hugely expensive.

I suggested this approach as a workaround, but I’m no fan of it either.

In my experience, most of the heavy lifting is done by the initial snapshot and by maintenance. Subsequent snapshots are barely noticeable because only new data needs to be processed. Reducing the load on any active services by setting Nice and IOSchedulingPriority in my systemd service file has also worked well for me, but I’m not running any services that are sensitive to load.
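
For anyone launching the snapshot from a script instead of a systemd unit, a rough equivalent can be sketched in Python by prefixing the command with `nice` and `ionice`. The values (niceness 10, best-effort I/O priority 7) and the `/var/www` path are assumptions for illustration, not the settings from my service file.

```python
import shlex


def low_priority_command(path: str, nice: int = 10, io_priority: int = 7) -> list:
    """Build a kopia snapshot command wrapped in nice and ionice.

    nice: CPU niceness, -20 (highest priority) to 19 (lowest).
    io_priority: 0 (highest) to 7 (lowest) within the best-effort I/O class.
    """
    if not -20 <= nice <= 19:
        raise ValueError("nice must be between -20 and 19")
    if not 0 <= io_priority <= 7:
        raise ValueError("io_priority must be between 0 and 7")
    return [
        "nice", "-n", str(nice),
        "ionice", "-c", "2", "-n", str(io_priority),  # class 2 = best-effort
        "kopia", "snapshot", "create", path,
    ]


cmd = low_priority_command("/var/www")  # hypothetical path
command_line = shlex.join(cmd)
print(command_line)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a host where `kopia`, `nice`, and `ionice` are installed.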

I think everything has its place, even Kopia :wink: I do like to know the options, along with their advantages and limitations. I’ve had great success with NFS and Samba, but also major issues.

This would mostly be for servers with websites and Docker containers.

I guess the changed-files process is the issue, mostly because the sites would be WordPress- or PHP-based, so that could mean a ton of media files.
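
To make the concern concrete, here is a generic Python sketch of changed-file detection by comparing size and modification time against a previous scan. This is an assumption about how such a scan typically works, not Kopia’s actual implementation; the function and file names are made up. The point is that every file still has to be stat’ed on each run, even if only a few are re-read.

```python
import os
import tempfile


def scan(root: str) -> dict:
    """Map each file's relative path to its (size, mtime in ns)."""
    index = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            index[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return index


def changed_files(previous: dict, current: dict) -> list:
    """Return paths that are new or whose size/mtime differ from the last scan."""
    return sorted(p for p, meta in current.items() if previous.get(p) != meta)


with tempfile.TemporaryDirectory() as root:
    for name in ("a.jpg", "b.jpg"):
        with open(os.path.join(root, name), "wb") as f:
            f.write(b"media")
    before = scan(root)

    with open(os.path.join(root, "b.jpg"), "ab") as f:  # modify an existing file
        f.write(b" edited")
    with open(os.path.join(root, "c.jpg"), "wb") as f:  # add a new upload
        f.write(b"new upload")
    after = scan(root)

    to_process = changed_files(before, after)

print(to_process)  # ['b.jpg', 'c.jpg']
```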