I’m not sure whether my problem lies here, with my storage pooling software, or with the drives themselves, but I’ll start here. (Actually, now that I think about it, the problem could even be the HBA I’m using.)
Basically, when I run a Kopia snapshot, the log shows several errors reading “The request could not be performed because of an I/O device error.” They happen when Kopia tries to read files from the source, which is a group of eight 6TB SAS drives pooled by StableBit DrivePool. Whenever this happens, I think Kopia just drops the file and doesn’t attempt to read it again.
Initially, I set the policy to ignore file errors and reran the snapshot, and I think it managed to grab everything in the directory on that attempt. Then I decided to recreate the entire repository on the USB drive I was saving to, because I wanted to reformat that drive with a different filesystem. I started a completely new snapshot from zero with the same error-handling policy, and when it finished, the new snapshot was again missing the files that had hit that I/O error.
I think I’ve also seen this error when writing to the drive pool with FastCopy, which suggests the problem is with the physical media (either the pooled drives or the USB devices I’m reading from and writing to). At the same time, every piece of data on the pool is perfectly readable. (I’m not sure whether that’s because the files are mirrored and I’m reading a good copy, or because the array’s integrity is simply fine.) I’ve copied out the files that trigger the error, and they copy off the pool without corruption. HD Sentinel also reports that all the drives in the pool are in good condition. They’re used drives, so manipulated SMART data isn’t impossible, but the uptime readings don’t look tampered with, and I’d expect HD Sentinel to tell more about a drive’s state than just what SMART reports. So I don’t know what’s triggering this error. Is there a way to work around it, such as telling Kopia to retry reading files that return this error, since I know they can be read?
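To be concrete about the retry behavior I mean, here’s a rough Python sketch. `read_with_retry` is my own hypothetical helper, not a Kopia or DrivePool feature, and the attempt count and delays are arbitrary guesses. It re-reads a file a few times with a growing pause before giving up, on the theory that a transient stall would clear:

```python
import time


def read_with_retry(path, attempts=5, delay=2.0):
    """Read a whole file, retrying on I/O errors with a growing pause.

    Hypothetical helper for illustration only; attempts/delay are guesses.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            with open(path, "rb") as f:
                return f.read()
        except (FileNotFoundError, PermissionError):
            raise  # retrying won't fix a missing file or a permissions problem
        except OSError as e:
            # On Windows, "The request could not be performed because of an
            # I/O device error" is ERROR_IO_DEVICE (winerror 1117). Give up
            # immediately on other Windows errors; elsewhere winerror is
            # absent, so any remaining OSError is retried.
            winerror = getattr(e, "winerror", None)
            if winerror is not None and winerror != 1117:
                raise
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait longer each time in case the pool is in an idle/stall phase.
            time.sleep(delay * attempt)
```

Ideally Kopia (or whatever is reading from the pool) would do something like this instead of dropping the file after the first error.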
I have also noticed that the array goes through waves of activity followed by periods where nothing happens, after which it resumes. I’m not sure if that’s the cause: for example, if one of these idle periods happens in the middle of a file access, the request might time out and be treated as an I/O failure.