I’m having a bad day with the docs: I can’t seem to find where to set kopia’s chunk-size settings for a repository. My backend is S3-compatible but performs really poorly with small files. Kopia seems to break everything down into 20-30 MB chunks, when what I really want is to pack things into 1-32 GB chunks, configurable by me.
Am I missing something, or is the chunk size just hardcoded for now?
How big is the data set you’re trying to back up? There are reasons why chunks are roughly 20-30 MB in size, mostly so they can be manipulated in memory more efficiently. This may become slightly tunable in the future (it isn’t right now), but I don’t think gigabytes are a good idea, as they would require a lot of RAM to operate the repository.
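If it helps while digging through the docs, you can at least see the values a repository is currently using; as far as I know the status output includes the splitter and the max pack length (exact field names may vary by version):

```
# Show the repository's format parameters; look for the splitter
# algorithm and the max pack length (the ~20-30 MB figure above).
kopia repository status
```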
I’ve got two initial target systems in mind for backups: my desktop has about an 80 GiB working set, and my server has about 500 GiB. I guess I don’t know enough about the design to understand why larger chunks would mean more RAM consumption; I’d expect something like mmap to be used so that this is pretty much a non-issue. But I’ll take your word for it that this is a Bad Idea with the current design.
Unfortunate, because the current chunk sizes basically make kopia a non-starter for me. Which is too bad, since it seems like a lot of the memory issues I’d seen before have calmed down.
What kind of performance are you seeing versus expecting to see? Folks are regularly able to saturate multi-gigabit links when uploading with high parallelism. Did you run any specific benchmark?
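One quick way to separate local chunking speed from network speed, assuming a reasonably recent kopia build:

```
# Purely local benchmark of the content splitters; if these numbers are
# high, the bottleneck is the upload path rather than CPU-side chunking.
kopia benchmark splitter
```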
With large-file uploads to this provider using standard S3 tools, I can get on the order of 600-900 Mbps sustained (over a 1 Gbps link). With kopia at any concurrency level, I barely ever get more than 100 Mbps, and that’s definitely not sustained. I’ve tried concurrency levels of 8, 16, 32, 128, and 256, although 128 and 256 are completely unusable given the existing memory constraints. During these tests, my system remains mostly idle, from CPU to disk I/O.
(The system is a Ryzen 5600X with a high-performance M.2 SSD and 64 GB of ECC RAM.)
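For reference, this is roughly how I was driving the concurrency. I’m going from memory on the flag names, so double-check them against `kopia snapshot create --help` and `kopia policy set --help`; `/data` is just a placeholder path:

```
# Upload files concurrently for a one-off snapshot.
kopia snapshot create /data --parallel=16

# Or set the equivalent per-path via policy so it sticks.
kopia policy set /data --max-parallel-file-reads=16
```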
I’ll see about digging that up when I get a chance. To be clear, though, this is expected behaviour with small files and this storage provider; it’s part of the design, so I’m not expecting kopia to have any magic that can improve it.
I’m rather sure I once asked kopia to change the chunk size on one of my S3 repos, and as said, repacking smaller chunks into larger ones eats memory that scales with the sizes involved, but it worked. Since then I make sure to set the size I want from the beginning when creating repos, so I don’t have to pull down many small pieces, rework them into a larger one, and put it all back up. But you can do it if you need to.
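Roughly what I do now. Caveat: the flag names are from memory and the bucket/endpoint values below are placeholders, so verify against `kopia repository create s3 --help` and `kopia repository set-parameters --help` on your build:

```
# Pick the splitter up front when creating the repo; DYNAMIC-8M-BUZHASH
# is one of the names `kopia benchmark splitter` lists on my build.
kopia repository create s3 \
  --bucket=my-bucket \
  --endpoint=s3.example.com \
  --object-splitter=DYNAMIC-8M-BUZHASH

# On an existing repo, I believe the pack size can be raised after the
# fact; kopia repacks as it goes, which is the memory-hungry step
# mentioned above.
kopia repository set-parameters --max-pack-size-mb=64
```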