[SLL] Rant: ATA-over-Ethernet 0x88a2

Andrew Sweger andrew at sweger.net
Thu May 5 18:12:50 EDT 2005


On Thu, 5 May 2005, Jesse Keating wrote:

> I've been down this road before.  Unless the AoE has enclosure services,
> this will fail horribly.  

I disagree here. But...

> First) ATA is a horrible code set to start with.  I really hope that it
> is ATA in name only, and that it borrowed most of its code from the SCSI
> stack.

This was one of my first concerns as well. Even if the packets are ATA
frames on the wire, the aoe driver should be a SCSI-like interface at a
minimum.
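
To make "ATA frames on the wire" a little more concrete, here is a rough
Python sketch of pulling apart the common AoE header that follows the
Ethernet header. The layout (version/flags, error, shelf, slot, command, tag)
is my reading of the published AoE spec; the function name and the returned
dict keys are made up for illustration and are not the aoe driver's API.

```python
import struct

AOE_ETHERTYPE = 0x88A2  # the EtherType from the subject line


def parse_aoe_header(frame: bytes) -> dict:
    """Parse the Ethernet header plus the 10-byte common AoE header.

    Hypothetical helper: field names are illustrative, not taken from any driver.
    """
    if len(frame) < 24:
        raise ValueError("frame too short for Ethernet + AoE header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != AOE_ETHERTYPE:
        raise ValueError("not an AoE frame")
    # ver/flags (1 byte), error (1), major/shelf (2), minor/slot (1),
    # command (1), tag (4), all in network byte order
    verfl, error, major, minor, command, tag = struct.unpack("!BBHBBI", frame[14:24])
    return {
        "dst": dst,
        "src": src,
        "version": verfl >> 4,             # upper nibble
        "response": bool(verfl & 0x08),    # R flag
        "error_flag": bool(verfl & 0x04),  # E flag
        "error": error,
        "shelf": major,
        "slot": minor,
        "command": command,                # 0 = ATA command, 1 = query config
        "tag": tag,
    }
```

Everything past byte 24 is the command-specific payload, which for command 0
carries the ATA registers and data. Whatever interface the driver presents
upward is a separate question from this wire format.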

> Second) Without some high level enclosure services, any type of array
> you make out of these disks on the network will be extremely fragile.
> If one disk goes down, your array will stop responding at all until you
> completely power off and power back on minus the dead disk.  Or if you
> even have network congestion or a network failure, it will bring the
> whole thing crumbling down.

I don't buy it. The problem you describe is not a feature of the array or
network but of the application built on top of it. When my uplink router
goes nuts, I don't have to reboot all the computers in the house, nor do I
lose the use of my local network. The same can be done for this type of
storage design.

Also, most people would probably deploy the AoE array on a separate
network segment. ("See that wire there? That's Ethernet. But we don't call
it that. It's the storage system connection.") As in, who cares what the
link to the storage is (IDE, USB, SCSI, I2O, FC, ATM, etc.) as long as
it provides the storage abstraction required by the application?

> Third) Network overhead.  When you've got one or two of these disks
> attached to even a gigabit network you're OK.  When you've got 3TB worth
> of 40~80 gig drives sitting on your network, your network overhead is
> going to be huge.  That's just a ton of data to try and break up and send
> out to all those disks.  Even w/out the TCP/IP overhead, you've GOT to
> have some sort of delivery assurance mechanism to ensure your data gets
> where it is going.  That's going to require a bit of cross talk for each
> write/read operation.  If you're doing some sort of RAID level ( I can't
> imagine NOT ) then you're going to have metadata being shifted around a
> bunch as well.

A bad RAID strategy could melt the network with a lot of spindles. But the
same can be said for other storage topologies (I would not stripe across
15 SCSI disks if that results in swamping the channel). A GFS AoE array
with multiple initiators could be a serious problem. But that's a corner
case far from the problems I see AoE helping to solve.
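
The "swamping the channel" arithmetic is easy to sketch. The numbers below
are assumptions for illustration only: 125 MB/s is the raw line rate of a
1 Gb/s link before any framing overhead, and 40 MB/s is a guessed sustained
rate per drive, not a measurement.

```python
import math

GIGE_MB_S = 125.0  # 1 Gb/s raw line rate, ignoring protocol overhead
DISK_MB_S = 40.0   # assumed sustained rate per drive (illustrative)


def spindles_to_saturate(link_mb_s: float, disk_mb_s: float) -> int:
    """Smallest number of drives whose combined streaming rate fills the link."""
    return math.ceil(link_mb_s / disk_mb_s)


def oversubscription(n_disks: int, disk_mb_s: float, link_mb_s: float) -> float:
    """Ratio of aggregate drive bandwidth to link bandwidth for a full stripe."""
    return n_disks * disk_mb_s / link_mb_s


print(spindles_to_saturate(GIGE_MB_S, DISK_MB_S))  # 4
print(oversubscription(15, DISK_MB_S, GIGE_MB_S))  # 4.8
```

Under those assumptions, four drives streaming flat out already fill the
link, and a 15-wide stripe asks for nearly five times what the wire can
carry. The same calculation applies to a shared SCSI channel; only the
constants change.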

> So now you're looking at maybe a dedicated gigabit network JUST for the
> storage, and even that won't scale all that well with a LOT of small
> devices.  Just too many devices to talk to.

The head end doesn't talk to all the devices at once. Only those that it
needs data from. LVM knows which device it needs to retrieve a block from
and won't bother talking to any of the other devices (same goes for other
storage networks).
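
A minimal sketch of that lookup, assuming a simple linear (concatenated)
volume. This is not LVM's actual code; the device names and sizes are
hypothetical, and a striped layout would use different arithmetic.

```python
from bisect import bisect_right
from itertools import accumulate


def build_map(devices):
    """devices: list of (name, size_in_blocks), in concatenation order.

    Returns the device names and the cumulative end block of each device.
    """
    names = [name for name, _ in devices]
    ends = list(accumulate(size for _, size in devices))
    return names, ends


def locate(names, ends, block):
    """Map a logical block number to (device name, block offset on that device)."""
    if not 0 <= block < ends[-1]:
        raise IndexError("block outside the volume")
    i = bisect_right(ends, block)  # first device whose end is past this block
    start = ends[i - 1] if i else 0
    return names[i], block - start


# Hypothetical three-device volume
names, ends = build_map([("e0.0", 100), ("e0.1", 200), ("e0.2", 50)])
print(locate(names, ends, 150))  # ('e0.1', 50)
```

Only the one device returned by the lookup sees any traffic for that block;
the other devices on the segment are never contacted.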

> My opinion is that this technology is good for a single disk or two for
> family network storage.  Any type of array stuff is just beyond the
> scope of the technology and asking for trouble.  iSCSI is even having
> problems with this, and with multiple concurrent accesses and stuff.
> Fibre Channel seems to have it nailed down, but MAN what a price you
> pay.

Amen on the FC pricing. Bad economics. But GigE is faster than FC now,
right?

What I've come to recognize from these conversations (something I've known
for decades) is that even a humble 8086 with a single MFM disk attached
to an onboard connection is a network. The storage application is where
the flexibility needs to go (or the smarts, as Glenn is pointing out). I
see the enclosure service being the smarts of the storage application, and
that's only needed at the point of use (at the head unit).

-- 
Andrew B. Sweger -- The great thing about multitasking is that several
                                things can go wrong at once.



