On my Ubuntu server I made a filesystem snapshot of the folder `xyz` from a large external USB drive that holds a copy of 3 different folders from a remote Windows machine (1.2 TB of data; a command sketch follows the list):
xyz/c_users_henri
xyz/d_
xyz/f_
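A minimal sketch of that seeding step, assuming the repository lives at /srv/kopia-repo on the Ubuntu server and the USB drive is mounted at /mnt/usb (both paths are placeholders):

```
# On the Ubuntu server: connect to the existing repository,
# then seed it with the copy held on the USB drive.
kopia repository connect filesystem --path /srv/kopia-repo
kopia snapshot create /mnt/usb/xyz
```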
Then I intend to run three separate snapshots from the original Windows machine via SFTP (see the sketch after this list):
Snapshot 1: c_users_henri
Snapshot 2: d_
Snapshot 3: f_
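Roughly like this, assuming the three folders correspond to C:\Users\henri, D:\ and F:\, and treating the host name, user and key file below as placeholders:

```
:: On the Windows machine: connect to the same repository over SFTP,
:: then snapshot each folder so it deduplicates against the seeded copy.
kopia repository connect sftp --host myserver --username henri ^
  --keyfile C:\Users\henri\.ssh\id_rsa --path /srv/kopia-repo
kopia snapshot create C:\Users\henri
kopia snapshot create D:\
kopia snapshot create F:\
```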
(if more context is needed see this post)
From a quick test I ran on a small repo, where I did an initial snapshot followed by a second snapshot of a subfolder of the initial one, Kopia found that the subfolder was already in the repository, and the total size of the repository did not change (deduplication works!!).
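For anyone who wants to reproduce that quick test, a minimal sketch (the folder names are made up):

```
# Snapshot a folder, then a subfolder of it, and compare the
# repository statistics before and after the second snapshot.
kopia snapshot create ~/testdata
kopia content stats        # note the total content size
kopia snapshot create ~/testdata/subfolder
kopia content stats        # size barely moves: everything deduplicated
```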
I am wondering, however, whether with 1.2 TB of data, deduplication (finding out that the data already exists) might slow the incremental SFTP snapshots significantly?
Would it be better to start over and, instead of making the initial filesystem snapshot of `xyz`, make 3 separate filesystem snapshots, one per main subfolder, to match the 3 snapshots I will take from the remote machine?
Or should I instead, on the remote Windows machine, mount the 3 folders under one common parent with a junction or hard link (MS article)?
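Since hard links only work for files, directories would need junctions. A hypothetical layout for that option (C:\kopia-root is a placeholder):

```
:: Gather the three source folders under one parent via directory
:: junctions, so a single Kopia snapshot could cover all of them.
mkdir C:\kopia-root
mklink /J C:\kopia-root\c_users_henri C:\Users\henri
mklink /J C:\kopia-root\d_ D:\
mklink /J C:\kopia-root\f_ F:\
```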
Not sure I understand the query, so someone else feel free to jump in. But deduplication is going to be way faster than re-uploading already-uploaded files, no?
The re-uploading is a one-time thing on my LAN, versus daily deduplication from the remote site over a 1 Mbps upload link.
I do not know how much overhead deduplication adds, if any.
This is a moot point because there is no way to disable deduplication in Kopia. If you don't want to deduplicate, you should not use Kopia.
Sorry, I was not very clear; let me clarify and report some (amazing) performance results.
I made an initial filesystem snapshot of folder `xyz`. This folder contains 3 subfolders, `a`, `b` and `c`, for a total of 1.2 TB. Then via SFTP I made a new snapshot of `a`, a new snapshot of `b` and a new snapshot of `c` (it was not possible to make a snapshot of `xyz` itself).
I was concerned that because the new snapshots were not of `xyz` with its 3 subfolders, it would take a very long time each time to detect what was already in the repository.
Results:
- The first 3 snapshots of `a`, `b` and `c` via SFTP @ 1 Mbps took 17h46 (3h24 + 7h46 + 6h36). This pass only analyzed existing data; no new files were uploaded.
- The subsequent 3 snapshots of `a`, `b` and `c` via SFTP @ 1 Mbps took only 7min29 (1:45 + 3:41 + 2:03), again with no new files to upload.
I find it totally mind-boggling that 1.2 TB could be analyzed so fast. I am very impressed with Kopia; congrats to Jarek and all contributors.
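In case it helps others checking the same thing: the per-source snapshot history can be listed as below, and successive entries for an unchanged source report essentially the same size (paths are the placeholders used above):

```
:: List the snapshot history for each source on the Windows machine.
kopia snapshot list C:\Users\henri
kopia snapshot list D:\
kopia snapshot list F:\
```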