Offload Client Processing to Server Repository

jordantrizz · August 21, 2024, 1:55pm

Hello, I’ve gone through some of the documentation but couldn’t find anything about offloading the processing of incrementals, deduplication, and compression to the server repository.

As an example, I’m looking to stream files from a host and then have Kopia create the incremental, deduplicate, and compress the data on the server. The hosts in question can’t do this themselves, as it would impact the application.

budy · August 22, 2024, 9:16am

Kopia doesn’t support that. I also don’t know of any application that runs dedupe on raw data like that, but there seem to be some storage solutions that do this. Just check a couple of posts back, where the question about disabling data encryption was raised.

dimejo · August 22, 2024, 12:13pm

What you want to achieve is often referred to as pull-based backup (the server pulling data from the client). As @budy already noted, this is not supported by Kopia out of the box. Kopia assumes that the repository is insecure and encrypts all data on the client before sending it to the backend (a push-based setup).

You can however share your data from the host (e.g. NFS, Samba, SSHFS) and mount it on the server. Then do backups on the server by reading the mount. Note that reading data from a network share can have drawbacks.

jordantrizz · August 22, 2024, 1:28pm

Thanks.

That makes sense; it would never be possible to dedupe encrypted data. Encryption in transit would still result in the repository seeing the data unencrypted.

NFS and Samba are great for a within-rack solution, but spanning data centers would require private networking, and there might be some latency issues with those protocols. I’m not a huge fan of SSHFS based on past experience, though I haven’t used it recently and it may have improved.

The solution I was hoping for was a method to back up data from servers anywhere, behind firewalls and on the internet, and offload the heavy lifting to the repository server. IBM’s Tivoli Storage Manager does this; I’m only bringing it up as a reference, since it’s hugely expensive.

dimejo · August 22, 2024, 2:49pm

I suggested this approach as a workaround, but I’m no fan of it either.

In my experience most of the heavy lifting is done by the initial snapshot and by maintenance. Subsequent snapshots are barely noticeable because only new data needs to be processed. Reducing the load on any active services by setting Nice and IOSchedulingPriority in my systemd service file has also worked well for me. But I’m not running any services that are sensitive to load.
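
For anyone wanting to try the systemd approach, a minimal sketch of such a service file might look like the following. Only the directive names Nice and IOSchedulingPriority come from the post above; the unit name, the chosen values, the added IOSchedulingClass line, the kopia binary location, and the snapshot path are all assumptions for illustration. It also assumes Kopia can find an already-connected repository configuration for the account the service runs as.

```ini
# /etc/systemd/system/kopia-snapshot.service (hypothetical unit name and location)
[Unit]
Description=Low-priority Kopia snapshot (illustrative sketch)

[Service]
Type=oneshot
# CPU niceness: 19 is the lowest scheduling priority a process can have.
Nice=19
# I/O scheduling: best-effort class, where 7 is the lowest priority level.
IOSchedulingClass=best-effort
IOSchedulingPriority=7
# Assumed binary location and source path; adjust both for the actual host.
ExecStart=/usr/bin/kopia snapshot create /var/www
```

The sketch uses the best-effort class at its lowest level rather than the idle class, so the snapshot should still make progress while the host is busy; whether that trade-off suits a given workload would need testing.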

jordantrizz · August 22, 2024, 5:39pm

I think everything has its place, even Kopia. I do like to know the options and their advantages and limitations. I’ve had great success with NFS and Samba, but also major issues.

This would mostly be for servers with websites and Docker containers.

I guess the changed-files detection is the issue, mostly because the sites would be WordPress or PHP based, which could mean a ton of media files to scan.
