[KLUG Hardware] re: "new" RAID level
Adam Williams
hardware@kalamazoolinux.org
02 Aug 2003 14:16:03 -0400
> >Yep, they should have called their other controllers RAID 0.9?
> At least it would have made them explain how broken their RAID 1
> implementation is.
> >>>There's no reason
> >>>not to when the drive contents are identical and it seems to me to be a
> >>>trivial amount of extra code.
> >You have to keep track of which requests have been dispatched to which
> >drives. Turns out to be much worse than "trivial".
> Still doesn't seem too bad. Just two queues rather than one, since
> ATAPI doesn't support out of order execution. You probably know more
> about it than I do, though.
Yeah, ATAPI stupidity probably makes this easier than on SCSI. But it
does require the controller to have more scratch RAM to keep track of
all those requests when they really start coming in heavy. You can't
ever say "Oops! Sorry, my read distribution hash table is full." The
overlying OS would probably be real unhappy. Then operations on drive B
may finish before the read operation on drive A, but the OS asked for A
first, and you have to (on ATAPI) return these things in order - that is
what the OS expects, after all, it is ATAPI - so you need to buffer B
till A comes in and you've sent that, and on and on. Some new IDE RAID
cards I've glanced at the specs for look like they actually emulate a
SCSI bus/controller to the OS; this is a lot smarter as it gives them
more flexibility in doling data back out to the OS, reordering
operations, etc...
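
To make the "buffer B till A comes in" point concrete, here is a rough
sketch of the bookkeeping. It is only an illustration: the class and
method names are made up, and real controller firmware would use a
fixed-size table in scratch RAM rather than a Python dict.

```python
class InOrderReturnBuffer:
    """Hold completed reads until every earlier request has been returned.

    Hypothetical sketch; not taken from any real controller firmware.
    """

    def __init__(self):
        self.next_issue = 0    # sequence number for the next request from the OS
        self.next_return = 0   # sequence number the OS expects back next
        self.completed = {}    # seq -> data, finished but not yet returnable

    def issue(self):
        """Tag a new request with its position in the OS's ordering."""
        seq = self.next_issue
        self.next_issue += 1
        return seq

    def complete(self, seq, data):
        """Record a completion; return whatever can now go back in order."""
        self.completed[seq] = data
        ready = []
        while self.next_return in self.completed:
            ready.append(self.completed.pop(self.next_return))
            self.next_return += 1
        return ready


buf = InOrderReturnBuffer()
a = buf.issue()                       # read dispatched to drive A
b = buf.issue()                       # read dispatched to drive B
first = buf.complete(b, "data-B")     # B finishes first: nothing is released
second = buf.complete(a, "data-A")    # A finishes: A goes back, then B
```

Note that `completed` grows whenever an early request is slow, which is
exactly the scratch RAM problem: the controller has to size that table
for the worst case, because it can't refuse a request when it fills up.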
> >>>The part that bothers me is that in the IDE world, broken RAID 1 has
> >>>become such a standard that they have to invent another name for it when
> >>>somebody finally does it right.
> >ATAPI RAID is just silly anyway. A stupid bus without any error
> >detection/correction; and duping drives is going to help?
> Depends on the logic that they use to decide which drive is right if
> they disagree. read-after-write would help, although the performance
No. I'm not talking about drives, I'm talking about the *BUS*. If I
send "000011110001000011100011100001111" across the bus to the drive,
what if (due to capacitance, voltage variation, bad connections, cosmic
rays, radio interference) the drive receives
"000011110001000011100011100000111"? It writes it to a sector and
responds "OK" to the controller. The error could go unnoticed for a
very long time, and get propagated to the rest of the array. SCSI
demands parity and error checksums. ATAPI does have a spec for this, but
it is not mandatory and a lot of controllers/drives don't bother to
implement it. If your data is really important to you, enough to bother
RAIDing it, then IMHO, this problem just rules out ATAPI anyway. If
you're just after faster reads (mirroring, true RAID 1) or more space
(drive aggregation) or just protection from HDA* failure, then ATAPI is
fine. But don't get fooled into thinking you're getting a robust
industrial data redundancy mechanism.
(*)HDA="Hard Drive Assembly", not "hda" as in "/dev/hda1". The physical
mechanism of the drive. A common acronym in the pSeries (RS/6000) /
iSeries / AS/400 world, but not one I see much used in PC land.
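
For what it's worth, here is a toy version of the bus check I mean,
using the two bit strings above. The single even-parity bit is an
assumption for illustration only, not how any particular bus actually
frames its data; it just shows that a sender-side check value lets the
receiver notice the flipped bit instead of writing it and saying "OK".

```python
def even_parity_bit(bits: str) -> int:
    """Return the bit that makes the total number of 1s even."""
    return bits.count("1") % 2


sent = "000011110001000011100011100001111"
received = "000011110001000011100011100000111"

# The sender computes the check bit and ships it alongside the data.
parity_sent = even_parity_bit(sent)

# How many bits were actually damaged in transit.
flipped = sum(1 for s, r in zip(sent, received) if s != r)

# The receiver recomputes parity over what it got and compares.
detected = even_parity_bit(received) != parity_sent

print(f"bits flipped: {flipped}, corruption detected: {detected}")
```

A single parity bit only catches an odd number of flipped bits; two
flips cancel out, which is why a stronger checksum is preferable where
the data matters.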
> penalty would be pretty bad. The FastTrack method of only using one
> drive sucks, though. When that one gets flaky (as has happened to me
> way too many times) there's a 50% or better chance that the whole array
> fails. I've had to rip out my RAID card and run the drive off the mobo
> connector about 3 times in the last year.
Yep. Localized data corruption without any reported errors is something
I've picked up several times on ATAPI drives. It could be the OS
screwed it up or it could be shoddy electronics or it could be the
problem I describe above about on-bus corruption (in which the drive
actually may have performed flawlessly).
Note that I use ATAPI all the time; I run software RAID (MD) ATAPI
configurations in production. Most of the time it is an adequate
solution (guarding against HDA failure). But it is important to
understand what you're getting - and what you're not getting. Use the
solution appropriate to the problem domain.
> >One should only use Western Digital drives in such a configuration, as
> >they die catastrophically instead of degrading over time.
> So noted. The gradual (OK, they fail too often to be called gradual,
> but it's not instant) failure of these IBMs has been disastrous.
Yeah, the WD was a bit of a joke, 'cause I burned through a slew of them
in a single year - Best Buy just kept replacing them. (Always buy IDE
hardware from somewhere you can take it back to and get in someone's
face!) I don't know how many places are left that actually MAKE drives,
and how many are relabeling shops. I'd wager a lot of these drives are
all the same HDA.