New snapshot = rescan whole folder tree? How long?

I’m setting up kopia to back up a friend’s Windows PC to my personal Ubuntu server via sftp. As his internet upload speed is only 1 Mbps, I copied his 1.2 TB of data onto a USB drive at his place. Now that I’m back home, I’m about to generate a first snapshot from this USB drive to a new repo on my server (via my LAN). Then I’ll set up kopia on his PC to open the existing repo on my server via sftp through his ADSL connection, so that the next snapshot will only be incremental.

I’m wondering whether, for each new incremental snapshot, kopia will [A] scan all of my friend’s 1.2 TB of files and then incrementally sftp the changes (like rsync does), or whether [B] there is some sort of cache that allows kopia not to rescan all folders. I assume it will be [A], as kopia is probably not intercepting filesystem I/O to figure out what has changed since the last snapshot…

I’m afraid that for 1.2 TB the scan will take very long and may slow down his PC, so I may split the snapshots into 3 parts of 400 GB each and run one of the 3 snapshots each day.

Any experience on how long it takes to scan 1.2 TB of data before starting the upload of the changes?

I’m also hesitating between creating a batch launched from Windows’ Task Scheduler vs. using kopia-ui (if I understood correctly, if kopia-ui is launched at startup, it takes care of generating the snapshot as per the policy set?). Any recommendation?

The size of data is not the major concern, it’s the number of directory entries - so tons of very small files will be slower than fewer larger files. But in general kopia does pretty aggressive caching and once metadata blobs are downloaded this should be fairly quick (I take ~1TB snapshots twice/hour with very few changes and mostly large-ish files and that takes ~10 seconds total on my NAS)
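Since the entry count matters more than the byte count, it can help to count files and folders before guessing at scan time. A minimal Python sketch, not part of kopia; the function name count_entries is just illustrative:

```python
import os

def count_entries(root):
    """Return (directories, files) found under root, not counting root itself."""
    dirs = files = 0
    # os.walk visits every directory once and lists its immediate children.
    for _path, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        files += len(filenames)
    return dirs, files
```

Calling count_entries on the folder to be backed up gives the two numbers to compare against the figures quoted in this thread.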


Wow, these are impressive numbers. Would a 1 Mbps link with high latency (ADSL) have much impact on the performance you described on your LAN?

Jarek can provide more details on exactly how Kopia achieves it (although I am certain Kopia does not use a changed block tracker, which is actually a huge plus in my book), but, in general, scans are very quick with Kopia.

As an example, 800 GB of data (110,000 files in 1,000 folders) takes less than 10 seconds for me. Kopia is so quick that I can run three Kopia repos concurrently without any impact on my performance. This is on modern hardware with SSDs, so if your friend is on older hardware it will be slower. Still, Kopia will be one of the fastest backup solutions you could set up for your friend. For comparison, the same incremental backup with Veeam or Macrium takes several minutes for me.

The bottleneck with your friend’s backup will not be Kopia, but rather his 1 Mbps connection.
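To put a number on that bottleneck, here is a back-of-the-envelope sketch in Python. It assumes decimal units (1 TB = 10^12 bytes), a link that sustains its full 1 Mbps, and no protocol overhead, compression or deduplication; the 500 MB daily delta is a made-up figure for illustration.

```python
def upload_seconds(size_bytes, link_mbps):
    """Ideal transfer time in seconds: bits to send divided by bits per second."""
    return size_bytes * 8 / (link_mbps * 1_000_000)

full_seed = upload_seconds(1.2e12, 1)    # the whole 1.2 TB over 1 Mbps
daily_delta = upload_seconds(500e6, 1)   # hypothetical 500 MB of changes

print(f"full upload: {full_seed / 86400:.0f} days")
print(f"500 MB delta: {daily_delta / 60:.0f} minutes")
```

Under those assumptions the full 1.2 TB would need roughly 111 days, which is why seeding the repo over the LAN first makes sense, while a 500 MB delta would take a bit over an hour.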

Unless your friend is skilled at the command line, I strongly suggest he run KopiaUI. KopiaUI handles all the scheduling, etc.; you just need to connect to the repo and set up the policy for whatever you want to back up.

A tip, though.

Kopia uses different file structures depending on the type of repo used. So if you back up via the local directory/NAS repo initially, you may not be able to simply have your friend connect to the repo via SFTP without having to reupload everything. However, there are two solutions to your problem.

The most straightforward solution is that when you are creating the snapshot locally from the USB, do it via the SFTP repo rather than local directory/NAS repo. (So, connect to your Ubuntu server via SFTP rather than as a network share.) This way, if you create an SFTP repo initially, your friend will have no issue connecting the SFTP repo.

If you cannot do that, the other solution to your problem is called sync-to, which allows you to literally copy one repo to another: see Synchronization | Kopia and repository sync-to sftp | Kopia. The one thing to keep in mind when using sync-to is to pass the --times flag so that it also synchronizes the file times. Otherwise, you may run into issues with Kopia wanting to reupload files when you reconnect to the repo.

Thank you Jarek and basldfalksjdf.

I intended to make the first snapshot via sftp on my LAN as you suggested, though I read here that sftp and filesystem should be compatible.

Though to play it safe I plan to use sftp, I assume that it would be faster to use filesystem.

Ah, I did not know they were compatible. Thanks for letting me know. Might as well use filesystem if it is compatible with sftp.

I started the initial kopia filesystem snapshot, but I’m not satisfied with how I did it, could you recommend the best approach?

For my friend (Henri) to be able to make snapshots from his place, I created a user henri on my ubuntu server, then created a folder /mnt/5tb/repos/henri with henri:henri permissions to hold his repository. I will later edit /etc/ssh/sshd_config to limit henri to sftp access (no ssh, on a port other than 22) with ChrootDirectory to /mnt/5tb/repos/

henri is not in the sudoers and I wasn’t able to get kopia-ui to work from Gnome logged in as henri (I installed kopia from my user account with sudo apt install kopia-ui). Was it supposed to run?

On the other hand if I launch kopia-ui from my ubuntu user in gnome, then it fails because of wrong permissions to /mnt/5tb/repos/henri

So for now I have changed ownerships of /mnt/5tb/repos/henri to ubuntu:ubuntu and I created the snapshot from the ubuntu account.

In hindsight, I think I should have dropped kopia-ui and simply done it via the CLI, logged in as henri.

What would you recommend for this scenario in order to be able to use kopia-ui? (After the initial snapshot that is currently running, the snapshots set up on his computer will need to be made from kopia-ui.)

I assume I will have to chown /mnt/5tb/repos/henri to henri:henri for him to be able to connect via sftp?

I did a test on a new small snapshot (on a different filesystem repository than in the post above), initial snapshot from henri on my server via cli (k40f3ffaca967449ab03f5410fd71f21b), second snapshot (kf6261dea7ee45df40972d2947b284412, with one file added) via Windows kopia-ui and sftp to my ubuntu server:

henri@Ub:~$ kopia diff kf6261dea7ee45df40972d2947b284412 k40f3ffaca967449ab03f5410fd71f21b
. modes differ:  drwxrwxrwx drwxrwxr-x
. modification times differ:  2022-07-31 20:01:06.2235601 +0200 CEST 2022-07-31 23:07:11.973870113 +0200 CEST
. owner users differ:  0 1002
. owner groups differ:  0 10000
./Documents modes differ:  drwxrwxrwx drwxrwxr-x
./Documents modification times differ:  2022-07-31 14:16:25.8982158 +0200 CEST 2022-07-31 14:16:25.898215827 +0200 CEST
./Documents owner users differ:  0 1002
./Documents owner groups differ:  0 10000
./Documents/cuillères modes differ:  drwxrwxrwx drwxrwxr-x
./Documents/cuillères modification times differ:  2022-07-31 14:16:25.8982158 +0200 CEST 2022-07-31 14:16:25.898215827 +0200 CEST
./Documents/cuillères owner users differ:  0 1002
./Documents/cuillères owner groups differ:  0 10000
./Documents/cuillères/[Fichiers originaux] modes differ:  drwxrwxrwx drwxrwxr-x
./Documents/cuillères/[Fichiers originaux] owner users differ:  0 1002
./Documents/cuillères/[Fichiers originaux] owner groups differ:  0 10000
added file ./23h07.txt (442 bytes)

Kopia does not manage permissions to files or folders. You need to make sure whatever user you install or run Kopia with also has the appropriate permissions to the repo location. This holds true for both Kopia CLI and KopiaUI.

Also, not sure why you are messing with folder permissions. Why not just set up an SFTP user with read/write access to /mnt/5tb/repos/henri and then connect to an SFTP repo in Kopia with that SFTP user? That should work flawlessly.

That’s because of our previous comments, I thought filesystem would be faster than sftp (“Might as well use filesystem if it is compatible with sftp.”)

What my last test shows is that when comparing a snapshot done via filesystem on Ubuntu vs. a snapshot via sftp with the same user but the drive mapped on Windows via samba, the modes, users and groups differ, which I understand. I’m not sure about the first modification time delta though:
modification times differ: 2022-07-31 20:01:06.2235601 +0200 CEST 2022-07-31 23:07:11.973870113 +0200 CEST

These are linux / windows filesystem issues, not kopia. I just assume / hope this won’t impact recovery the day I need it.
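For the ./Documents lines in the diff, the two timestamps differ only in the last two digits, which looks like a resolution difference: Windows file times are stored in 100 ns units, while Linux filesystems typically keep nanoseconds. A small Python sketch of that truncation (the function name is illustrative, and this does not explain the roughly three-hour delta on the root folder):

```python
def truncate_to_100ns(nanoseconds):
    """Drop the last two decimal digits of a nanosecond count (100 ns ticks)."""
    return nanoseconds // 100 * 100

linux_fraction_ns = 898_215_827    # .898215827 from the ./Documents line
windows_fraction_ns = 898_215_800  # .8982158 from the same line

# True when the Windows value is just the Linux value cut to 100 ns precision.
print(truncate_to_100ns(linux_fraction_ns) == windows_fraction_ns)
```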

Make sure to test your backup after it has completed by running kopia snapshot verify --verify-files-percent=100. This will download the whole repo and decrypt/decompress the files. This helps ensure that the backup is valid and can be restored.

It is recommended to regularly test backups.

I noticed that a repo created with filesystem by default creates folders with permissions 700 and files with 600 (I found the --file-mode and --dir-mode options afterwards, but too late).

On the other hand a repository created via sftp creates folders with permissions 755 and files 644.

On the filesystem repo can I simply

sudo chmod go+r /repo_folder

Or do I need to also change something in a config file for future snapshots to use the proper mode?
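One thing to note about the chmod above: without -R it only touches /repo_folder itself, and directories also need the execute bit to be traversable. A hedged Python sketch that applies 755 to directories and 644 to files under a repo (the modes the sftp repo uses, per the observation above); the function name open_up_repo is made up, and this only changes existing entries, it says nothing about the modes kopia will use for new ones:

```python
import os

def open_up_repo(root, dir_mode=0o755, file_mode=0o644):
    """Set dir_mode on every directory and file_mode on every file under root."""
    os.chmod(root, dir_mode)
    # Top-down walk: each directory is chmod'ed before os.walk descends into it.
    for path, dirnames, filenames in os.walk(root):
        for name in dirnames:
            full = os.path.join(path, name)
            if not os.path.islink(full):  # leave symlink targets alone
                os.chmod(full, dir_mode)
        for name in filenames:
            full = os.path.join(path, name)
            if not os.path.islink(full):
                os.chmod(full, file_mode)
```

It would need to run as the owner of the repo files (or as root), and trying it on a copy first seems prudent.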

Note: kopia repository status reports the same “Storage config” section for the 2 repos (not sure what 384 / 448 refer to):

Storage config:      {
                       "path": "/repo_folder",
                       "fileMode": 384,
                       "dirMode": 448,
                       "dirShards": null

Duh, 384 is just 600 octal written in decimal.

Should I just change the 384 to 420 (644 octal) in ~/.config/kopia/repository.config ?

Or do I need to run a repository repair?
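For reference, the decimal values shown in the “Storage config” JSON map to the usual chmod notation as follows. This is a plain Python sketch of the base conversion only; it does not answer whether editing the config file is the right fix.

```python
def to_octal(mode):
    """Format a decimal permission value in the octal form chmod uses."""
    return format(mode, "o")

# 384 and 448 come from the status output; 420 and 493 are 644 and 755.
for decimal in (384, 448, 420, 493):
    print(decimal, "->", to_octal(decimal))

# Going the other way: octal 644 written as a decimal number.
print(int("644", 8))
```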