Questions I cannot find an answer to

  1. Kopia uses symmetric encryption today (AES-256-GCM and ChaCha20-Poly1305), but any authenticated encryption scheme is easily pluggable.

  2. Yes, Kopia is designed to handle all kinds of crashes and, ideally, not to redo work that has already been done. During long snapshots, Kopia writes checkpoints every 45 minutes or so. These are reused on the next snapshot attempt, not only to avoid uploading data again but, in many cases, also to avoid hashing it. The partial snapshot is transparently merged with the last full snapshot from the history to get good incremental performance.

  3. Yes. Kopia runs lock-free in all situations. It has an optional server mode, but that does not introduce locks; it is primarily for better access control and to avoid storing low-level repository credentials on client machines. Instead of locks, Kopia relies on the passage of time for the safety of its maintenance operations, so it requires reasonably synchronized clocks (drift of several seconds to minutes is fine, but drift of hours is not).

  4. Kopia uses a multi-stage maintenance routine to perform what prune does in Restic and others. It’s described in the forum post "Details of maintenance command - #8 by jkowalski", and I think it’s quite fast:

On my main personal repository of 730 GB and 1.5M contents (file chunks), I’m running full maintenance every 4 hours, and it currently takes less than 40 seconds to complete the full cycle (on my home internet connection, which is 500 Mbps symmetrical). Full maintenance walks the snapshot tree and deletes unreachable contents. This is possible thanks to efficient index structures, separate caches for metadata and data, and lots of heavy parallelization and sharding to make efficient use of all local machine and network resources.
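Going back to item 2 above, the checkpoint-and-merge behavior can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Kopia's implementation: the `Entry` record, the size-and-mtime comparison used to decide whether a file can skip hashing, and all function names are hypothetical; only the roughly 45-minute interval comes from the text.

```python
from dataclasses import dataclass

CHECKPOINT_INTERVAL_S = 45 * 60  # "every 45 minutes or so", per the description above


@dataclass(frozen=True)
class Entry:
    """Hypothetical snapshot entry: file metadata plus the ID of its stored object."""
    size: int
    mtime: float
    object_id: str


def should_checkpoint(last_checkpoint_s: float, now_s: float) -> bool:
    """True once enough time has passed since the previous checkpoint."""
    return now_s - last_checkpoint_s >= CHECKPOINT_INTERVAL_S


def merge_base(partial: dict, last_full: dict) -> dict:
    """Combine a partial (checkpointed) snapshot with the last full snapshot.

    Entries from the partial snapshot are newer, so they win on conflicts.
    """
    merged = dict(last_full)
    merged.update(partial)
    return merged


def plan(current: dict, base: dict):
    """Split files into reusable objects and files that must be hashed again.

    `current` maps path -> (size, mtime) as observed on disk right now.
    """
    reuse, rehash = {}, []
    for path, (size, mtime) in current.items():
        prev = base.get(path)
        if prev is not None and (prev.size, prev.mtime) == (size, mtime):
            reuse[path] = prev.object_id  # unchanged: no hashing, no upload
        else:
            rehash.append(path)
    return reuse, sorted(rehash)


if __name__ == "__main__":
    last_full = {"a.txt": Entry(10, 1.0, "obj-a"), "b.txt": Entry(20, 2.0, "obj-b")}
    # The interrupted run had already checkpointed a changed b.txt and a new c.txt.
    partial = {"b.txt": Entry(25, 3.0, "obj-b2"), "c.txt": Entry(5, 4.0, "obj-c")}
    on_disk = {"a.txt": (10, 1.0), "b.txt": (25, 3.0), "c.txt": (5, 4.0), "d.txt": (7, 5.0)}
    reuse, rehash = plan(on_disk, merge_base(partial, last_full))
    print(reuse)   # {'a.txt': 'obj-a', 'b.txt': 'obj-b2', 'c.txt': 'obj-c'}
    print(rehash)  # ['d.txt']
```

In this sketch only `d.txt`, which neither the checkpoint nor the last full snapshot had seen, needs hashing on the retry.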
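The full maintenance described above (walk the snapshot tree, delete unreachable contents) is essentially a mark-and-sweep garbage collector. Below is a minimal sketch over a toy in-memory repository. The `children` and `written_at` mappings, the one-hour `SAFETY_MARGIN_S`, and sharding by the first character of the content ID are hypothetical stand-ins, not Kopia's data structures or thresholds. The age check reflects item 3: without locks, a content that looks unreferenced may belong to a snapshot still in progress, so only contents older than some margin are deleted. That margin is also why clock drift of hours, rather than seconds or minutes, would be unsafe.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

# Hypothetical safety margin: unreachable contents younger than this are kept,
# because a concurrent (lock-free) snapshot may be about to reference them.
SAFETY_MARGIN_S = 60 * 60


def mark(roots, children):
    """Return every content ID reachable from the live snapshot roots.

    `children` maps a directory content ID to the content IDs it references;
    file chunks have no entry.
    """
    reachable, queue = set(roots), deque(roots)
    while queue:
        cid = queue.popleft()
        for child in children.get(cid, ()):
            if child not in reachable:
                reachable.add(child)
                queue.append(child)
    return reachable


def sweep_shard(shard, reachable, written_at, now_s):
    """Return contents in one shard that are unreachable AND older than the margin.

    In this sketch `written_at` comes from clocks that may drift; a drift larger
    than SAFETY_MARGIN_S could make a brand-new content look old enough to delete.
    """
    return [
        cid
        for cid in shard
        if cid not in reachable and now_s - written_at[cid] >= SAFETY_MARGIN_S
    ]


def maintain(roots, children, written_at, now_s, workers=4):
    """Mark from the roots, then sweep shards (keyed by first ID character) in parallel."""
    reachable = mark(roots, children)
    shards = {}
    for cid in written_at:
        shards.setdefault(cid[0], []).append(cid)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda shard: sweep_shard(shard, reachable, written_at, now_s),
            shards.values(),
        )
        return sorted(cid for part in parts for cid in part)


if __name__ == "__main__":
    now = 10_000.0
    children = {"d1": ["c1", "c2"], "d2": ["c3"]}
    # c4 is an orphan; c5 is unreferenced but was written only a minute ago.
    written_at = {"d1": 0, "c1": 0, "c2": 0, "d2": 0, "c3": 0, "c4": 0, "c5": now - 60}
    # Only d1 is still referenced by a live snapshot.
    print(maintain(["d1"], children, written_at, now))  # ['c3', 'c4', 'd2']
```

The thread pool here only illustrates sharding; in Python, threads pay off mainly for I/O-bound work such as issuing blob deletions against remote storage.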

The following stats show that Kopia maintenance repackages virtually all data into pack blobs of around 22.4 MB each.

$ kopia blob stats
Count: 32514
Total: 729.9 GB
Average: 22.4 MB
Histogram:

       70 between 100 B and 1 KB (total 12.6 KB)
      160 between 1 KB and 10 KB (total 689.2 KB)
        1 between 10 KB and 100 KB (total 49.5 KB)
       39 between 100 KB and 1 MB (total 14.2 MB)
        2 between 1 MB and 10 MB (total 12.4 MB)
    32242 between 10 MB and 100 MB (total 729.8 GB)
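As a quick sanity check of those numbers, the histogram buckets add up to the reported count, and nearly all blobs (and bytes) fall in the 10 MB to 100 MB bucket. The sketch below simply redoes that arithmetic from the output above, assuming decimal units (1 GB = 1000 MB), which is consistent with the reported 22.4 MB average.

```python
# Blob histogram from `kopia blob stats` above: bucket -> (count, total bytes).
# Decimal units are assumed (1 KB = 1e3 B, 1 MB = 1e6 B, 1 GB = 1e9 B).
KB, MB, GB = 1e3, 1e6, 1e9
buckets = {
    "100 B - 1 KB": (70, 12.6 * KB),
    "1 KB - 10 KB": (160, 689.2 * KB),
    "10 KB - 100 KB": (1, 49.5 * KB),
    "100 KB - 1 MB": (39, 14.2 * MB),
    "1 MB - 10 MB": (2, 12.4 * MB),
    "10 MB - 100 MB": (32242, 729.8 * GB),
}
reported_count, reported_total = 32514, 729.9 * GB

count = sum(c for c, _ in buckets.values())
average_mb = reported_total / reported_count / MB
big_count, big_bytes = buckets["10 MB - 100 MB"]

print(count == reported_count)                                  # True
print(f"average blob: {average_mb:.1f} MB")                     # average blob: 22.4 MB
print(f"blobs in 10-100 MB: {big_count / count:.1%}")           # blobs in 10-100 MB: 99.2%
print(f"bytes in 10-100 MB: {big_bytes / reported_total:.2%}")  # bytes in 10-100 MB: 99.99%
```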

Comparing these blob totals with the content stats below shows that the total size of live data is very close to the physical storage size: 729.9 GB of blobs (physical) vs 722.8 GB of contents (logical).

$ kopia content stats
Count: 1493817
Total: 722.8 GB
Average: 483.8 KB
Histogram:

    83096 between 10 B and 100 B (total 4.3 MB)
   618051 between 100 B and 1 KB (total 267 MB)
   352019 between 1 KB and 10 KB (total 1.1 GB)
   111774 between 10 KB and 100 KB (total 3.6 GB)
    78833 between 100 KB and 1 MB (total 35 GB)
   250044 between 1 MB and 10 MB (total 682.7 GB)
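The physical-vs-logical comparison works out to roughly 1% overhead. A small sketch of that arithmetic, using only the totals and counts printed above and assuming decimal units; the variable names are illustrative.

```python
# Totals from `kopia blob stats` (physical) and `kopia content stats` (logical).
GB, KB = 1e9, 1e3
physical_gb = 729.9
logical_gb = 722.8

# Content histogram counts from `kopia content stats` above.
content_counts = [83096, 618051, 352019, 111774, 78833, 250044]
reported_contents = 1493817

overhead = (physical_gb - logical_gb) / logical_gb
avg_content_kb = logical_gb * GB / reported_contents / KB

print(sum(content_counts) == reported_contents)      # True
print(f"physical overhead: {overhead:.1%}")          # physical overhead: 1.0%
print(f"average content: ~{avg_content_kb:.0f} KB")  # average content: ~484 KB
```

The recomputed average is within rounding of the reported 483.8 KB, since the inputs here are the rounded totals from the output.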

Note that content sizes are not related to source file sizes.
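Content sizes are decoupled from file sizes because large files are split into chunks, while small files (and, presumably, small metadata objects) end up as single small contents. The toy content-defined chunker below illustrates the general technique with a Gear-style rolling hash. It is not Kopia's splitter, and `MIN_SIZE`, `AVG_BITS` and `MAX_SIZE` are made-up parameters for the example.

```python
import random

# Hypothetical toy parameters (not Kopia's settings).
MIN_SIZE = 2 * 1024   # never cut a chunk smaller than this (except the last one)
AVG_BITS = 13         # cut when the top 13 hash bits are zero: ~8 KiB past MIN_SIZE
MAX_SIZE = 64 * 1024  # force a cut at this size
MASK64 = (1 << 64) - 1

# Fixed pseudo-random "gear" table: one 64-bit value per possible byte.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def chunk(data: bytes) -> list:
    """Split data at content-defined boundaries using a Gear rolling hash.

    Each step shifts the hash left by one bit, so the hash only depends on the
    most recent 64 bytes: boundaries follow the content, not file offsets.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        at_boundary = (h >> (64 - AVG_BITS)) == 0
        if (size >= MIN_SIZE and at_boundary) or size >= MAX_SIZE:
            chunks.append(data[start : i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder (or a whole small file)
    return chunks


if __name__ == "__main__":
    base = random.Random(7).randbytes(200_000)
    edited = b"X" + base  # one byte inserted at the front
    a, b = chunk(base), chunk(edited)
    print(len(a), len(b), len(set(a) & set(b)))  # most chunks are shared
    print(chunk(b"tiny file"))  # [b'tiny file']: a small file is one small chunk
```

Because boundaries depend on the bytes themselves rather than on offsets, inserting one byte at the front of a file only disturbs the first chunk or so; later chunks line up again and deduplicate.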

I know folks use Kopia for repositories of tens of TBs; I’d be curious to see their stats as well.
