Performance impact of an initial snapshot followed by snapshots of its subfolders?

On my Ubuntu server I made a filesystem snapshot of the folder xyz on a large external USB drive. That folder holds a copy of 3 different folders from a remote Windows machine (1.2 TB of data):

xyz/c_users_henri
xyz/d_
xyz/f_

Then I intend to run three separate snapshots from the original Windows machine via sftp:

Snapshot 1: c_users_henri
Snapshot 2: d_
Snapshot 3: f_

(if more context is needed see this post)

From a quick test on a small repo, where I made an initial snapshot followed by a second snapshot of one of its subfolders, kopia found that the subfolder's contents were already in the repository, and the total size of the repository did not change (deduplication works!! :) ).
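To illustrate why the repository size stays the same, here is a minimal sketch of content-addressed storage. It is a toy model and not Kopia's actual implementation: the `ToyRepo` class, the file names and the whole-file SHA-256 keys are assumptions made for illustration only. The point is just that data stored under a hash of its content is stored once, no matter which path or snapshot it arrives through.

```python
import hashlib


class ToyRepo:
    """Hypothetical toy store: each blob is kept once, keyed by its SHA-256."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Only store the data if this content has never been seen before.
        self.blobs.setdefault(key, data)
        return key

    def snapshot(self, files: dict) -> dict:
        # A snapshot is just a mapping from path to content hash.
        return {path: self.put(data) for path, data in files.items()}

    def size(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())


# Made-up folder contents standing in for the parent folder.
tree = {
    "xyz/a/report.txt": b"quarterly numbers",
    "xyz/b/photo.raw": b"\x00\x01\x02" * 1000,
    "xyz/c/notes.md": b"remember the milk",
}

repo = ToyRepo()
repo.snapshot(tree)
size_after_parent = repo.size()

# Snapshot one subfolder on its own, under different (shorter) paths.
prefix = "xyz/"
subfolder = {
    path[len(prefix):]: data
    for path, data in tree.items()
    if path.startswith("xyz/a/")
}
repo.snapshot(subfolder)
size_after_subfolder = repo.size()

print(size_after_parent, size_after_subfolder)
```

In this sketch the second snapshot adds a new path-to-hash mapping but no new blobs, which matches the behaviour observed in the small test above.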

I am wondering, however, whether with 1.2 TB of data, deduplication (that is, finding out that the data already exists) may slow down the incremental sftp snapshots significantly?

Would it be better to start over and, instead of making one initial filesystem snapshot of xyz, make 3 separate filesystem snapshots, one for each main subfolder, since that is how I will snapshot them from the remote machine?

Or should I instead, on the remote Windows machine, mount the 3 folders under a single parent with a junction or hard link (MS article)?

Not sure I understand the query, so someone else feel free to jump in. But deduplication is going to be way faster than reuploading already uploaded files, no?

The re-uploading is a one-time thing on my LAN, versus daily deduplication from the remote site over a 1 Mbps upload link.

I do not know how much overhead deduplication adds, if any.

This is a moot point because there is no way to disable deduplication in Kopia. If you don't want to deduplicate, you should not use Kopia.

Sorry, I was not very clear, let me clarify and report (amazing) performance results.

I made an initial filesystem snapshot of folder xyz. This folder contains 3 subfolders, a, b and c, for a total of 1.2 TB.

Then via sftp I made a new snapshot of a, a new snapshot of b and a new snapshot of c (it was not possible to make a snapshot of xyz).

I was concerned that because the new snapshots were not of xyz with its 3 subfolders, it would take a very long time every time to detect what was already in the repository.

Results:

  • The first 3 snapshots of ‘a’, ‘b’ and ‘c’ via sftp @ 1 Mbps took 17h46 (3h24+7h46+6h36). This was analysis only; no new files were uploaded.
  • The subsequent 3 snapshots of ‘a’, ‘b’ and ‘c’ via sftp @ 1 Mbps only took 7min29 (1:45+3:41+2:03), again with no new files to upload.
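As a quick sanity check of the figures above, the per-folder durations can be summed and compared. This is only arithmetic on the reported numbers: it reads the last duration of the second run as 2 min 3 s (consistent with the stated 7min29 total), and the final comparison assumes 1.2 TB means 1.2 × 10^12 bytes sent over an ideal 1 Mbps (10^6 bits per second) link with no protocol overhead.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert a duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# First sftp snapshots of a, b and c (analysis only, nothing uploaded).
first_run = [
    to_seconds(hours=3, minutes=24),
    to_seconds(hours=7, minutes=46),
    to_seconds(hours=6, minutes=36),
]

# Subsequent sftp snapshots of a, b and c.
second_run = [
    to_seconds(minutes=1, seconds=45),
    to_seconds(minutes=3, seconds=41),
    to_seconds(minutes=2, seconds=3),
]

first_total = sum(first_run)    # 17h46 expressed in seconds
second_total = sum(second_run)  # 7min29 expressed in seconds
speedup = first_total / second_total

# For scale: a full upload of the data set over the same link.
# Assumptions: decimal terabytes, ideal throughput, no overhead.
data_bits = 1.2e12 * 8
link_bits_per_second = 1e6
upload_days = data_bits / link_bits_per_second / 86400

print(f"first run:  {first_total} s")
print(f"second run: {second_total} s")
print(f"speedup:    {speedup:.0f}x")
print(f"full upload at 1 Mbps: about {upload_days:.0f} days")
```

Under those assumptions the reported totals add up, the second run is roughly 140 times faster than the first, and both are far below the time a full upload over the same link would need.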

I find it totally mind-boggling that 1.2 TB could be analyzed so fast. I am very impressed with kopia, congrats to Jarek and all contributors.
