Shingled Drives - A tip/warning

I had errors in my repository - but I’m not knowledgeable enough to establish causality. I would have thought that shingled drives might have fewer errors, since they apparently always perform a read-after-write verify?

The new CMR drive seems to be working fine…so it’s possible Kopia just doesn’t like shingled drives?

The problem with the SMR technology is that it works fine only as long as you mainly have write once / read many data. If you want to use it for continuous and repeating write operations you are in a bad spot very quickly, all the more so if you want to overwrite old data.

The reason is the S in SMR: shingled. The data tracks are laid onto the platters overlapping like shingles on a roof, and, as with a roof, if you want to replace one the disk has to disturb the surrounding shingles as well. Of course SMR disks have a cache where they can park the old shingles that must not be overwritten, but if you write a lot of data that buffer is exhausted, resulting in long I/O wait states.
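To put a rough number on the shingle problem, here is a small back-of-the-envelope sketch. It assumes a toy model in which changing data inside a shingled zone forces everything from that point to the end of the zone to be rewritten. The 256 MiB zone size and the function name `rewrite_cost` are my own assumptions for illustration, not vendor figures; drive-managed SMR disks generally don't publish their internal layout.

```python
from __future__ import annotations


def rewrite_cost(zone_bytes: int, offset_in_zone: int, update_bytes: int) -> tuple[int, float]:
    """Return (bytes physically rewritten, amplification factor) for one in-place update.

    Toy model: tracks in a zone overlap, so changing data at `offset_in_zone`
    forces everything from that offset to the end of the zone to be rewritten.
    """
    if not 0 <= offset_in_zone < zone_bytes:
        raise ValueError("offset must fall inside the zone")
    if update_bytes <= 0 or offset_in_zone + update_bytes > zone_bytes:
        raise ValueError("update must fit inside the zone")
    rewritten = zone_bytes - offset_in_zone
    return rewritten, rewritten / update_bytes


ZONE = 256 * 1024 * 1024  # assumed zone size, for illustration only
BLOCK = 4096              # one small file-system-sized update

for label, offset in [
    ("start of zone", 0),
    ("middle of zone", ZONE // 2),
    ("last block", ZONE - BLOCK),
]:
    rewritten, factor = rewrite_cost(ZONE, offset, BLOCK)
    print(f"{label:>15}: rewrites {rewritten / 2**20:9.3f} MiB, amplification x{factor:,.0f}")
```

Under these assumptions only an update at the very end of a zone is cheap, which is why appending works and overwriting old data hurts.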

Apparently ALL of the drive manufacturers are substituting shingled drives when they can get away with it. Those drives are NOT labeled as SMR.

I work with storage, and we tried SMR for storage once; it crashed and burned. We hoped it would be write-once-read-many, but when drives fail and you start recovery, the S in SMR starts hurting you. So I agree fully with this post: make as sure as you possibly can that you don’t get spinning drives that are SMR. It’s even worse if manufacturers try to hide this information, which makes it even harder on us consumers. Whatever you read or see about SMR being usable and good value, just forget it and pay that extra bit of money for “real” drives that not only can store data, but can also handle repairs and other small-I/O-intensive operations that will occur at some point on your storage system. SMR is to be avoided. There is a reason they are so cheap, and that reason will cause tears and sadness in the long run.

I’m very interested in your insights here because all of this is new to me.

So you are saying that the risks of SMR drives overwhelm their “advantages” as a kind of near non erasable magnetic “paper”?

I’ll admit that none of my current tools for evaluating disk health are trustworthy with these things. There is just too much sleight of hand with their fancy controllers. So it’s very difficult to “see” the actual disk.

But let’s say, knowing this, you use these things only for near-immutable archives. “Write once/Read many”. Are you claiming there still would be hell to pay?

Victoria shows random “slow blocks” on these things. And a Microsoft format with /p 5ea iterations took over 2 weeks with these drives. But I still suspect that if you had the time to mess with SMR, you could still move anything weak to the $badclus file. Perhaps that’s not doable for a “time is money” commercial operation…but would it be usable for cold archives?

At this point, I don’t see SMR as real drives. Instead I see them as paper. So my concern moves to “will this paper retain the writing”? There are plenty of rsync clones and hash change detectors out there. Wouldn’t such tools be enough? Stanford’s old rule “lotsa copies keep stuff safe” might apply? i.e. “You have a copy”, “the hash checks”? Would you be comfortable with such an abbreviated use knowing what you know now?
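For anyone wondering what a “hash change detector” boils down to, here is a minimal sketch using only the Python standard library. The function names (`file_sha256`, `build_manifest`, `verify`) are made up for this example and are not taken from Kopia, rsync, or any other tool; it just records a SHA-256 per file and later reports what is missing, changed, or new.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in 1 MiB pieces so large archives don't need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root against a previously saved manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(
            name for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
        "new": sorted(set(current) - set(manifest)),
    }
```

The idea would be to call `build_manifest` right after writing the archive, keep the result somewhere other than the archive drive (for example dumped to a JSON file), and run `verify` every so often. A non-empty "changed" list on files you never touched is the “the hash doesn’t check” case, and that is when the second copy earns its keep.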

Some background. We operated a snapraid array for maybe 14 years. During that time, we saw many disk failures - but no bitrot. I’m slowly coming to the realization that disk error correction is actually pretty good. So for non-changing data like movies…just having a spare copy is enough. That’s the kind of application I’m envisioning for SMR. “Just enough”.

I’d also be very interested in learning how long SMR platters will hold data. But that’s not information where any disk manufacturer can serve as a trusted source.

It does bother me that SMR when used in Kopia threw a bad hash warning. I would have expected data loss - but not data change. It’s particularly unnerving since SMR drives apparently perform an automatic read after write. Something a normal disk never does. I would think that SMR drives would be slower - but never “inaccurate” compared to CMR drives?

I haven’t thought of it as “paper” before, but more as “tape”. You can use SMRs if you vomit large blocks of data onto them (128k or more, to match the disk’s minimum size) and somehow keep tabs on where your blobs are on this “tape”. It will probably work great as a holder of data in that regard. You would not be able to take an object in the middle, grow it by 55k and write it back into the same place, just as you can’t do that with tapes. This parallel also shows why no one would run a file system on a tape: it would just be weird to think it would work. Even if disk and tape can both store data, that still doesn’t mean they are good at the same things.

People still use lots of tapes, but they have systems that talk to tapes in the manner that tapes want to be talked to, and they do not try to squeeze disk behaviour onto tapes. My view of SMRs is that if you can find a way (a driver, a “filesystem”, a program) that will treat the SMR disk as a tape, moving the index to something outside and only putting large blobs onto the SMR in a decent linear fashion, you can get great value for your money. You can read it as much as you like, in small or large chunks, but the writes need to know how SMR behaves, and from what I know now, zero file systems seem to know that. They act as if it would be optimal to make 512-byte or 4096-byte changes and then move to another cylinder and repeat. This will give you super poor performance on SMR drives.
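To make the “treat it as a tape” idea a bit more concrete, here is a rough sketch of the access pattern: blobs are only ever appended to one big data file, and the index (name, offset, length) lives in a separate file that you would keep on a different, non-SMR disk. The class name `BlobLog` and the JSON index format are invented for this example. Note the big assumption: this runs on top of an ordinary file system, so it does not control where the bytes physically land on the platters. It only shows the write discipline, not a real SMR-aware driver.

```python
from __future__ import annotations

import json
import os
from pathlib import Path


class BlobLog:
    """Append-only blob store: data file on the SMR disk, index kept elsewhere."""

    def __init__(self, data_path: Path, index_path: Path) -> None:
        self.data_path = Path(data_path)
        self.index_path = Path(index_path)
        self.index: dict[str, list[int]] = {}
        if self.index_path.exists():
            self.index = json.loads(self.index_path.read_text())

    def append(self, name: str, payload: bytes) -> None:
        """Write a blob at the end of the log; existing data is never touched."""
        if name in self.index:
            raise KeyError(f"{name!r} already stored; blobs are never rewritten in place")
        with self.data_path.open("ab") as fh:
            fh.seek(0, os.SEEK_END)
            offset = fh.tell()
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the blob is on disk before indexing it
        self.index[name] = [offset, len(payload)]
        self.index_path.write_text(json.dumps(self.index))

    def read(self, name: str) -> bytes:
        """Reads can be random and small; only writes need to be linear."""
        offset, length = self.index[name]
        with self.data_path.open("rb") as fh:
            fh.seek(offset)
            return fh.read(length)
```

Usage would look like `log = BlobLog(Path("/mnt/smr/blobs.dat"), Path("/mnt/ssd/index.json"))` followed by `log.append("backup-001", data)`; both paths are hypothetical. Growing or replacing a blob means appending a new one under a new name, exactly as you would on tape.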

The bad part of SMR is that these drives do have a “fast” part which acts like a normal disk, so you can benchmark poorly, get “decent”, normal results and then move on thinking that you were lucky. But as you use up this fast part, the disk will internally move data from the fast part onto the slow part while you are also trying to inject data. So you are not just getting the slow performance, you are getting even slower than that, because the disk is internally causing I/O on itself. People who end up in this situation do the wildest things, including resetting the computer/NAS box, but the SMR will just restart these internal ops after power-on, so it just keeps being bad until you stop writing and then wait a long time.
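If you want to avoid the “benchmark poorly and think you were lucky” trap, the test has to run long enough to get past the fast part and it has to record throughput over time, not just one average. Below is a sketch of that idea. The function names, the 8 MiB chunk size, and the “less than half of the initial speed” threshold are all my own assumptions; how much you need to write before a given drive slows down is not something manufacturers publish, and this writes sequentially, so small random writes may hit the wall sooner.

```python
from __future__ import annotations

import os
import time


def sustained_write_profile(path: str, total_bytes: int,
                            chunk_bytes: int = 8 * 1024 * 1024) -> list[float]:
    """Write total_bytes sequentially, fsync after every chunk, return MB/s per chunk."""
    chunk = os.urandom(chunk_bytes)  # random data, so compression can't flatter the result
    rates: list[float] = []
    written = 0
    with open(path, "wb") as fh:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            start = time.perf_counter()
            fh.write(chunk[:n])
            fh.flush()
            os.fsync(fh.fileno())  # force the data out of the OS cache
            elapsed = time.perf_counter() - start
            rates.append(n / max(elapsed, 1e-9) / 1e6)
            written += n
    return rates


def find_cliff(rates: list[float], window: int = 8, drop: float = 0.5) -> int | None:
    """Index of the first window whose average falls below drop * the initial average."""
    if len(rates) < 2 * window:
        return None
    baseline = sum(rates[:window]) / window
    for i in range(window, len(rates) - window + 1):
        if sum(rates[i:i + window]) / window < drop * baseline:
            return i
    return None
```

You would point `sustained_write_profile` at a scratch file on the drive under test with a `total_bytes` of many tens of gigabytes, then look at where `find_cliff` says the speed fell apart. If it returns `None`, either the drive kept up or you didn't write enough to use up the fast part.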

You will easily find posts like this:
https://www.reddit.com/r/DataHoarder/comments/i2zmum/i_knew_smr_had_slow_write_speeds_but_1mbsecond/?rdt=48564
where you get to learn how the drive is going to be busy for hours or days while you see 1990s hard drive speeds.

So, if you treat it like a tape and let it flush from the fast part into the slow part at some point of the day, you can get value from it, just like you can get value from tapes if you only value cost-per-TB. But if you value anything else, like interactive performance for small ops or disk-like behaviour (like you need when fsck/chkdsk runs), then SMR is not for you, regardless of the price.

Thank you for your comments. They’ve been a big help.

Your “tape” description is apt, and actually there were attempts to use tape as a random access device during the early days. A small version of the OS they used was included with the “8-track-looking” tape developed in the Netherlands. And it did work (cuz I owned one). But boy was it ridiculous back then going from one end of the tape to the other :slight_smile:

“Echo” software sounds about right. Their system maintained a parallel database to find your stuff.

The tape was fragile and not very reliable. Manufacturer made big claims for it tho :slight_smile: