Error: snapshot task: unable to save snapshot: error putting manifest

Hello,

I have a Kopia server (Docker) running behind a Traefik proxy, the repository is on an S3 bucket.
With an increasing number of clients (Windows, now about 100 servers), I have recently been getting the error message “Error: snapshot task: unable to save snapshot: error putting manifest: EOF: EOF” more and more often. What can I do to find out the reason?
The problem could be that all clients are trying to create their backup at the same time via scheduling. Is there a way to add a random time (0-30min) to the schedule (currently fixed times: 0:00, 2:00, 5:00, 10:00, 15:00, 19:00, 23:00) so that the backups are not all sent to the Kopia server at the same time?
Or what possibilities do I have to analyze the cause of the error in the logs? Unfortunately, I cannot see the user/host in the error messages (email). This makes it difficult for me to identify the client and the cause.
Is the problem:

  • the proxy (Traefik),
  • the Kopia server,
  • the connection to the S3 bucket, or
  • possibly the general resources of the server?

I would be very grateful for any information.

Tobias

I think the main problem was the Traefik timeout: it was set to 600s instead of zero, which disables it. The Traefik logs show that many of the POST requests ran into the 600s limit:
[22/Jul/2025:17:02:12 +0000] "POST /kopia_repository.KopiaRepository/Session HTTP/2.0" 200 409 "-" "-" 66250 "kopia@docker" "https://172.30.0.2:51515" 600000ms
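
In case it helps anyone checking their own proxy logs, here is a minimal Python sketch for picking out requests that ran into the limit. It assumes the Traefik access log uses the default common log format, with the request duration as the last field, as in the line above. The function name `slow_requests` and the 600000 ms threshold are my own choices for illustration and are not part of Traefik or Kopia.

```python
import re

# Matches the tail of a Traefik access-log line in the default common log
# format, from the quoted request line to the trailing duration in ms.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}|-) (?P<size>\d+|-) '
    r'"[^"]*" "[^"]*" (?P<count>\d+) '
    r'"(?P<router>[^"]*)" "(?P<server>[^"]*)" (?P<ms>\d+)ms'
)


def slow_requests(lines, threshold_ms=600_000):
    """Yield (method, path, router, duration_ms) for requests at or above the threshold."""
    for line in lines:
        # Forum software often turns straight quotes into curly ones; undo that.
        line = line.replace("\u201c", '"').replace("\u201d", '"')
        m = LINE_RE.search(line)
        if m and int(m["ms"]) >= threshold_ms:
            yield m["method"], m["path"], m["router"], int(m["ms"])


# The log line quoted above, used as sample input.
sample = (
    '[22/Jul/2025:17:02:12 +0000] '
    '"POST /kopia_repository.KopiaRepository/Session HTTP/2.0" 200 409 '
    '"-" "-" 66250 "kopia@docker" "https://172.30.0.2:51515" 600000ms'
)

for hit in slow_requests([sample]):
    print(hit)
```

To run it against a real log, pass an open file instead of the sample list, for example `slow_requests(open("access.log", encoding="utf-8"))`, where the file name is a placeholder for wherever your Traefik access log is written.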

I have also found a simple way to distribute the CPU and network load. In the global policy I set the following as the before action: powershell -command "Start-Sleep -Seconds (Get-Random -Minimum 0 -Maximum 1800)" and set the timeout for the action to 2000 seconds, so that it is longer than the longest possible sleep.

This means that the backups of the 100 hosts are randomly distributed over 30 minutes, so the CPU and network load no longer peaks all at once.
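
To get a rough feeling for what that spread looks like, here is a small Python simulation. It is only a sketch: it assumes every host draws its delay independently and uniformly from 0 to 1799 seconds (Get-Random treats -Maximum as exclusive for integers) and that all hosts are triggered at the same moment. The function name `starts_per_minute` and the one-minute buckets are hypothetical choices for illustration.

```python
import random
from collections import Counter


def starts_per_minute(hosts=100, window_s=1800, seed=None):
    """Return how many simulated hosts start their backup in each minute of the window."""
    rng = random.Random(seed)
    # Same range as Get-Random -Minimum 0 -Maximum 1800: 0 up to 1799 inclusive.
    delays = [rng.randrange(0, window_s) for _ in range(hosts)]
    per_minute = Counter(delay // 60 for delay in delays)
    n_buckets = (window_s + 59) // 60  # round up so a partial last minute still counts
    return [per_minute.get(minute, 0) for minute in range(n_buckets)]


buckets = starts_per_minute(seed=1)
print("average starts per minute:", sum(buckets) / len(buckets))
print("busiest minute:", max(buckets))
```

With 100 hosts over 30 minutes the average is a little over three starts per minute, instead of 100 in the same second. Because the delays are random, individual minutes can still be busier than the average.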

Maybe this also helps others reduce the load on the Kopia server when using fixed-time backup schedules.
