On my Ubuntu server I made a filesystem snapshot of the folder `xyz` from a large external USB drive that holds a copy of 3 different folders from a remote Windows machine (1.2 TB of data; a command sketch follows the list):
xyz/c_users_henri
xyz/d_
xyz/f_
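A minimal sketch of that seeding step, assuming the repository lives at /srv/kopia-repo on the Ubuntu server and the USB drive is mounted at /mnt/usb (both paths are placeholders):

```
# On the Ubuntu server: connect to the existing repository,
# then seed it with the copy held on the USB drive.
kopia repository connect filesystem --path /srv/kopia-repo
kopia snapshot create /mnt/usb/xyz
```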
Then I intend to run three separate snapshots from the original Windows machine via SFTP (see the sketch after this list):
Snapshot 1: c_users_henri
Snapshot 2: d_
Snapshot 3: f_
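Roughly like this, assuming the three folders correspond to C:\Users\henri, D:\ and F:\, and treating the host name, user and key file below as placeholders:

```
:: On the Windows machine: connect to the same repository over SFTP,
:: then snapshot each folder so it deduplicates against the seeded copy.
kopia repository connect sftp --host myserver --username henri ^
  --keyfile C:\Users\henri\.ssh\id_rsa --path /srv/kopia-repo
kopia snapshot create C:\Users\henri
kopia snapshot create D:\
kopia snapshot create F:\
```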
(if more context is needed see this post)
From a quick test I ran on a small repo, where I did an initial snapshot followed by a second snapshot of a subfolder of the initial one, Kopia found that the subfolder was already in the repository, and the total size of the repository did not change (deduplication works!!).
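For anyone who wants to reproduce that quick test, a minimal sketch (the folder names are made up):

```
# Snapshot a folder, then a subfolder of it, and compare the
# repository statistics before and after the second snapshot.
kopia snapshot create ~/testdata
kopia content stats        # note the total content size
kopia snapshot create ~/testdata/subfolder
kopia content stats        # size barely moves: everything deduplicated
```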
I am wondering, however, whether with 1.2 TB of data, deduplication (finding out that the data already exists) might slow the incremental SFTP snapshots significantly?
Would it be better to start over and, instead of making the initial filesystem snapshot of `xyz`, make 3 separate filesystem snapshots, one per main subfolder, to match the 3 snapshots I will take from the remote machine?
Or should I instead, on the remote Windows machine, mount the 3 folders under one common parent with a junction or hard link (MS article)?
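Since hard links only work for files, directories would need junctions. A hypothetical layout for that option (C:\kopia-root is a placeholder):

```
:: Gather the three source folders under one parent via directory
:: junctions, so a single Kopia snapshot could cover all of them.
mkdir C:\kopia-root
mklink /J C:\kopia-root\c_users_henri C:\Users\henri
mklink /J C:\kopia-root\d_ D:\
mklink /J C:\kopia-root\f_ F:\
```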
Not sure I understand the query, so someone else feel free to jump in. But deduplication is going to be way faster than re-uploading already-uploaded files, no?
The re-uploading is a one-time thing on my LAN, versus daily deduplication from the remote site over a 1 Mbps upload link.
I do not know how much overhead deduplication adds, if any.
This is a moot point because there is no way to disable deduplication in Kopia. If you don't want to deduplicate, you should not use Kopia.
Sorry, I was not very clear; let me clarify and report some (amazing) performance results.
I made an initial filesystem snapshot of folder `xyz`. This folder contains 3 subfolders, `a`, `b` and `c`, for a total of 1.2 TB. Then via SFTP I made a new snapshot of `a`, a new snapshot of `b` and a new snapshot of `c` (it was not possible to make a snapshot of `xyz` itself).
I was concerned that because the new snapshots were not of `xyz` with its 3 subfolders, it would take a very long time each time to detect what was already in the repository.
Results:
- The first 3 snapshots of `a`, `b` and `c` via SFTP @ 1 Mbps took 17h46 (3h24 + 7h46 + 6h36). This pass only analyzed existing data; no new files were uploaded.
- The subsequent 3 snapshots of `a`, `b` and `c` via SFTP @ 1 Mbps took only 7min29 (1:45 + 3:41 + 2:03), again with no new files to upload.
I find it totally mind-boggling that 1.2 TB could be analyzed so fast. I am very impressed with Kopia; congrats to Jarek and all contributors.
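In case it helps others checking the same thing: the per-source snapshot history can be listed as below, and successive entries for an unchanged source report essentially the same size (paths are the placeholders used above):

```
:: List the snapshot history for each source on the Windows machine.
kopia snapshot list C:\Users\henri
kopia snapshot list D:\
kopia snapshot list F:\
```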