CLI progress bar

Trying to understand the CLI progress bar.

I have had a long-running snapshot going since last night. When I checked it this morning, I noticed that the most recent line showed a lower hashed count than the previous line. Is that because of a checkpoint, after which hashing has to start over? And what about the percentage all the way to the right: is it lower because of the rehash?

I doubt the progress bar too.

I was backing up a single vmdk file of around 85 GB, and the progress went above 87 GB :thinking:
I will test this in a fresh repo later.

I had come across a post saying that the UI shows an incorrect percentage; there seems to be something similar in the CLI.

Is there a verbose flag for debugging?

Yes, the progress bar was all messed up for long uploads due to checkpointing. I have a fix in the works. See this topic for info on upcoming improvements.


I tried this with and without the --progress flag in 0.7.0-rc1 and saw no difference in the output. It is not clear to me what the items are trying to say. The last percentage went to 100% and stayed there while the backup continued.

Besides, when the backup runs in the background under systemd, you don't get to see the output. I have to pipe it to a file and work out the progress from the logs, comparing the number of "finished reading directory" log lines with the length of the list given by kopia snapshot estimate.

The percentage Kopia shows is relative to the last known snapshot of the directory. If the directory changed significantly, or if the previous snapshot was incomplete, it will indeed sit at 100% for a long time.

What are you suggesting we do instead? I really don't want to pollute the output with directory names.

For a while I considered scanning the source tree in parallel to compute the actual source size which would make the progress bar accurate, but this will slow down the snapshotting process since we’ll be reading each directory twice.

As a new user, it isn't clear to me what the numbers are saying. Now that you have clarified that it is a directory-by-directory comparison, I'm not sure what the value is to the user. A user wants to know the overall progress towards the end of a snapshot. Directory-based progress is secondary, although it may be important for debugging or other purposes.

Even for overall progress at the directory level, it doesn't seem too computationally expensive to count the directories to be processed (a one-time pass at the start) and then count them off as you go to give an overall percentage, perhaps?
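To make the suggestion above concrete, here is a minimal sketch of the "count once, then count off" idea in plain shell. It is not how Kopia works internally; count_dirs and percent_done are hypothetical helper names, and the sketch assumes directory names contain no newline characters.

```shell
# Hypothetical helper: one pass over the tree to count directories.
# Assumes no directory name contains a newline.
count_dirs() {
    find "$1" -type d -print | wc -l | tr -d ' '
}

# Hypothetical helper: integer percentage of $1 (done) out of $2 (total).
# Prints 0 when the total is not yet known or is zero.
percent_done() {
    if [ "$2" -le 0 ]; then
        echo 0
    else
        echo $(( $1 * 100 / $2 ))
    fi
}
```

A wrapper script could call count_dirs once before starting and percent_done each time a directory finishes. The cost is the extra walk over the tree at the start.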

At the moment I have to use two log files to capture the outputs. Logging everything to file is the most important part, I think, because when systemd runs the job in the background the file logs are all you have. The logging is also a lot of work to set up and capture for every command, as I like to store my logs with each repo for easier lookup. For example, my create command and other commands have to look like this.

    kopia snapshot create \
        -p "$PW" \
        --description "" \
        --progress \
        --force-hash 0 \
        --log-dir "${LOGDIR}" \
        --log-file "${LOGPATH}" \
        --file-log-level debug \
        --log-level info \
        "${SOURCE[i]}" |& log

from which I get the overall progress by counting the directories completed.

I have a pending change to improve the progress indicator by computing the size of the directory tree to be uploaded in parallel to the actual upload.

This estimation tends to complete very quickly, and usually after a few seconds we're able to display a nice percentage progress and a relatively precise ETA:

Let me know whether it makes sense. BTW, please don’t rely on parsing the logs as their contents are not guaranteed. It’s better to make the required changes in the code directly if you think those are going to be beneficial to others.
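For readers who want a feel for the "estimate in parallel with the upload" approach, here is a rough shell sketch. It is not Kopia's implementation: estimate_total and progress_line are hypothetical names, and the estimate file is assumed to live outside the source tree.

```shell
# Hypothetical helper: sum the sizes of all regular files under $1 and write
# the total (in bytes) to $2. The result is written to a temporary file and
# renamed, so a reader never sees a half-written total.
estimate_total() {
    find "$1" -type f -exec sh -c 'for f; do wc -c < "$f"; done' sh {} + |
        awk '{ s += $1 } END { print s + 0 }' > "$2.tmp" &&
        mv "$2.tmp" "$2"
}

# Hypothetical helper: print a progress line for $1 bytes processed so far.
# Until the estimate file $2 exists (or if the total is 0), only the raw
# byte count is shown.
progress_line() {
    if [ -s "$2" ]; then
        total=$(cat "$2")
        if [ "$total" -gt 0 ]; then
            echo "$1 / $total bytes ($(( $1 * 100 / total ))%)"
            return
        fi
    fi
    echo "$1 bytes (estimating...)"
}
```

In use, the estimate would be started in the background (estimate_total "$SRC" "$EST" &) while the main work proceeds, and progress_line would be called periodically. The percentage appears as soon as the estimate lands.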

Seems to me that you are still only tallying up the progress of each directory not the overall progress of the job.

With what you displayed in the video I would change the arrangement from:

* 0 hashing, 27 hashed (1.8 MB), 139709 cached (22.3 GB), 3 uploaded (37.5 KB), 0 errors, estimated 22.3 GB (100.0%) 0s left

to the following, with the correct unit filled in wherever I've put a question, e.g. [files/chunks/?], as appropriate. I've also added an overall progress section.

* 0 hashing [files/chunks/?], 27 hashed (1.8 MB), 139709 [files/chunks/?] cached (22.3 GB / 22.3 GB (100.0%) 0s remaining), 3 uploaded [files/chunks/?] (37.5 KB), 0 errors | Directories (or Files) processed: 2345 / 245000 (45%) 1d:3h:45m:10s remaining

My initial snapshots are taking anything up to 30 hours to complete so the overall progress is helpful.

The estimation process is tallying up the file count and total file size before any deduplication. This is exactly the same as during upload so should be accurate.

This is now released in v0.7.0-rc2

Oh, I think I understand what you meant by “Seems to me that you are still only tallying up the progress of each directory not the overall progress of the job.” This command is actually taking multiple snapshots in sequence, and each of them has its own progress bar.

As a feature request, is there some way this progress bar could be logged to its own file, with each update written on a new line?

In its current form it is difficult to capture to a file via a pipe, because it uses the carriage return character (CR, ‘\r’, 0x0D).


(sorry for bumping an old thread - this is the thread that comes up when searching for carriage return or pipe or cli progress, and I think this will be helpful to others)

I’m running kopia snapshot in Kubernetes and I was also having trouble capturing progress logs. The problems were that every progress update overwrites the previous one (rather than appearing on a new line), and that stdout and stderr were buffered, so I would get nothing and then all the progress reports at once after it finished. This was slightly annoying for centralised logging.

The solution I found was to run kopia as:

kopia snapshot create [dir] --progress-update-interval 1m 2>&1 | stdbuf -oL -eL tr '\r' '\n'

This reduces the updates to one per minute. More importantly, stdbuf runs tr with its stdout and stderr in line-buffered mode, so output is printed after every line, and tr '\r' '\n' replaces each carriage return \r (used for overwriting a CLI line) with a “true” newline, so that each progress update lands on its own line. This makes the progress updates behave in a way that’s more helpful for pipes and central logging.
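To see the carriage-return conversion in isolation, here is a small sketch that does not need kopia at all. fake_progress is a made-up stand-in for any program that redraws a single status line, and cr_to_lines is just a named wrapper around the same tr call; stdbuf is left out so the sketch also runs where it is not installed.

```shell
# Stand-in for a program that redraws one status line using carriage returns.
fake_progress() {
    printf '10%%\r50%%\r100%%\n'
}

# Turn every carriage-return redraw into its own line.
cr_to_lines() {
    tr '\r' '\n'
}

# Each update now ends with a newline, so a log collector sees three lines.
fake_progress | cr_to_lines
```

Without the tr step, a file or log collector receives one long line containing embedded \r characters, which is why the updates appear to be missing or overwritten.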
