Cloud Data Services Force Awakens

If you’ve been reading the storage blogs and analyst reports, you may conclude that storage growth is in flash arrays, hyper-converged systems, and maybe scale-out NAS or object storage. Most ignore the massive growth in self-served, fully integrated Data Services and their potential impact on the overall storage market.

It’s all part of the same trend of moving from an infrastructure focus (Private Clouds, IaaS, ..) to services and applications (PaaS, SaaS, Micro-Services, DevOps, ..). Cloud vendors deliver full turn-key solutions that are billed by the hour or by usage. Amazon has seen exponential growth several years in a row across all those Data Services (S3, RedShift, DynamoDB, Kinesis, Aurora, ..).

Did you know that the per-GB cost of higher-level services like DynamoDB or RedShift is 30-100x that of the simple S3 service? Why settle for 3 cents/GB when you can charge dollars? And yet many customers opt to pay the extra bucks, and adoption of these services is soaring, since they are robust, easy to use, and still lower TCO than legacy databases.
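To put that gap in perspective, here is a back-of-the-envelope calculation using only the figures quoted above (the 3 cents/GB S3 price and the 30-100x multiplier); actual pricing varies by region, capacity, and throughput:

```python
# Rough per-GB/month comparison using the numbers quoted in the post.
# Real pricing depends on region, provisioned throughput, and access patterns.
s3_per_gb = 0.03  # ~3 cents per GB/month

for multiplier in (30, 100):
    managed_per_gb = s3_per_gb * multiplier
    print(f"{multiplier:>3}x premium -> ${managed_per_gb:.2f} per GB/month")

# Output: 30x -> $0.90/GB/month, 100x -> $3.00/GB/month
```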

To be competitive, such services are built on the most cost-effective and bottleneck-free architectures. None of them leverage SAN, hyper-converged, or even NAS; those are simply not relevant, as I will explain below. Instead they use storage servers (DAS) with new, more relevant abstractions that maximize scale, efficiency, and performance. These technologies are starting to proliferate within the enterprise and will have an even bigger impact on the already struggling enterprise storage market, especially given the erosion in margins and the fact that the highest data growth is in unstructured, BI, and analytics data – not the VM images or legacy databases most storage vendors go after.

If you are still skeptical, check out Google Cloud Platform: 8 of its 14 main services are managed Data Services, and none of them use a SAN, NAS, or traditional hyper-converged (file/block) layer underneath.

GCP Data Services

Microsoft is already fully invested in this with Azure Data Services. Larry Ellison is now trying to steer the Oracle ship at full speed in that direction, selling per-usage cloud services instead of licenses. IBM is throwing off much of its hardware business baggage and re-focusing on data services and analytics – it remains to be seen whether it manages to take off. HP is (over) promising the future “Machine” as the best hardware for data services and analytics. Meanwhile the enterprise storage vendors fight over the shrinking pie, without internalizing the colossal changes that are about to come.

So what’s wrong with the current storage abstractions?

The most common enterprise storage architecture today is SAN or virtual SAN (hyper-converged), which pulls together a bunch of disks in some RAID scheme and exposes virtual disk (LUN) abstractions. Striping, caching, indexing, data layout, and compression are all owned by the storage layer, which has no clue how the applications work or lay out their data. This is why we constantly tune storage and applications against each other. That may be OK for general-purpose VM images, which are not application specific, but new trends such as micro-services are making those images much smaller and more stateless, with data stored in dedicated persistent Data Services (see my post on micro-services architecture).

Now imagine a database or NoSQL query. It is forced to scan and transfer many redundant disk blocks over the fabric, and it must implement its own layer of caching and indexing that gets no benefit from the underlying layer. Many of the new databases implement extremely efficient contextual columnar compression and dedup, so compression in the underlying layer is useless. A critical problem is that any update to a higher-level data service involves multiple changes to the disk which have to occur atomically; with SAN this means locking and journaling with redo and undo logs – an extremely high-overhead operation. If the storage is application-aware, it can also combine multiple types of media such as NV-RAM, flash, and disk and use each appropriately, rather than relying on statistics-based tiering. Why waste x86 CPU cycles on organizing, caching, and indexing fixed-size, randomly accessed blocks when we can do it in the context of an organized set of variable-size application records?
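To make the journaling overhead concrete, here is a minimal sketch of the redo-log discipline a database must layer on top of dumb block storage; the block size, file names, and record format are illustrative assumptions, not any particular engine’s implementation:

```python
import os, struct

BLOCK_SIZE = 4096  # illustrative block size

def atomic_multi_block_update(data_path, log_path, updates):
    """Apply several block updates atomically over plain block storage.

    `updates` is a list of (block_number, payload_bytes) pairs. Because the
    block layer only guarantees atomicity per block, the application must
    first persist a redo log, fsync it, and only then touch the data blocks.
    """
    # 1. Write a redo log entry describing every pending block change.
    with open(log_path, "wb") as log:
        for block_no, payload in updates:
            log.write(struct.pack("<QI", block_no, len(payload)))
            log.write(payload)
        log.flush()
        os.fsync(log.fileno())      # the log must be durable before the data

    # 2. Apply the changes to the data file.
    with open(data_path, "r+b") as data:
        for block_no, payload in updates:
            data.seek(block_no * BLOCK_SIZE)
            data.write(payload.ljust(BLOCK_SIZE, b"\0"))
        data.flush()
        os.fsync(data.fileno())

    # 3. Only now can the log be truncated, i.e. the transaction committed.
    os.truncate(log_path, 0)
```

Every logical update pays for two writes and two fsyncs; an application-aware storage layer (or a transactional key/value engine, discussed further below) can absorb this bookkeeping where the data actually lives.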

Oracle designed Exadata quite some time ago; it pioneered the notion of database-optimized storage and moved table indexing, scanning, and compression to the storage nodes. Most NoSQL and Hadoop components today are optimized for direct-attached storage (DAS) or key/value abstractions for the same reasons: running over SAN or NAS is more expensive and usually delivers worse results.

Last year Amazon exposed the internal architecture of Aurora (a scale-out SQL database), which performs 6x faster than alternatives, scales linearly, and is far simpler to operate. Oracle is in a panic, since Aurora is one of the fastest-growing AWS services. As you can see below, Amazon moved some of the traditional database functionality, such as journaling and caching, down to the storage nodes, and uses S3 for archiving. You can imagine how much cost that saves compared with SAN and all the operational overhead around it.

Aurora

Ok, so why not Scale-out NAS?

Indeed, many traditional data services run on top of local file systems, mainly to abstract the storage and avoid ties to the low-level layers. They usually open a few very big sparse files and work with them just as if they were block storage, but with higher overhead.
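A minimal sketch of that pattern (the file name and block size are illustrative): the engine pre-allocates a huge sparse file and then reads and writes fixed-size “blocks” at computed offsets, exactly as it would on a raw LUN, only with the file system’s lookups and journaling in the way.

```python
BLOCK_SIZE = 8192          # illustrative "page" size
FILE_SIZE = 1 << 40        # 1 TB logical size, allocated lazily by the FS

# Create a sparse file; only blocks actually written consume space.
with open("tablespace.dat", "wb") as f:
    f.truncate(FILE_SIZE)

def write_block(f, block_no, payload: bytes):
    f.seek(block_no * BLOCK_SIZE)
    f.write(payload.ljust(BLOCK_SIZE, b"\0"))

def read_block(f, block_no) -> bytes:
    f.seek(block_no * BLOCK_SIZE)
    return f.read(BLOCK_SIZE)

with open("tablespace.dat", "r+b") as f:
    write_block(f, 123456, b"row data goes here")
    assert read_block(f, 123456).startswith(b"row data")
```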

Add to that all the file system lookups and traversals/indirections, and since the application needs to update several files per transaction, we now have two layers of journaling – one in the file system and one in the application. In the previous post we also discussed the fact that traditional file systems don’t know how to take advantage of non-volatile, byte-addressable memory like Intel 3D XPoint, or how to maximize the value of NVMe.

If we use remote shared files (scale-out NAS), there will be too much protocol chatter and potential locking that degrade performance even further. Given that we need to guarantee transaction isolation and atomicity, we must implement replication as part of the application anyway and can’t depend solely on storage replication. There is now a growing trend to use optimized key/value abstractions under the data service (e.g. LevelDB, RocksDB) which own the data organization, fragmentation, and lookup, and those work best with direct-attached disks, flash, or non-volatile memory.
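As an illustration, here is how a data service can hand the multi-record atomicity problem to an embedded key/value engine instead of rolling its own block-level journaling. This sketch uses LevelDB through the plyvel Python binding; the key naming scheme is an assumption for the example, and RocksDB offers an equivalent write-batch API:

```python
import plyvel  # pip install plyvel (LevelDB binding)

# The engine owns data layout, compaction, and lookup on local disk/flash.
db = plyvel.DB("/tmp/orders-db", create_if_missing=True)

# A logical update that touches several records: the write batch is applied
# atomically by the engine, so the application needs no redo/undo log of its own.
with db.write_batch() as wb:
    wb.put(b"order:1001:status", b"shipped")
    wb.put(b"order:1001:shipped_at", b"2016-05-01T12:00:00Z")
    wb.put(b"index:status:shipped:1001", b"")   # secondary index entry

print(db.get(b"order:1001:status"))             # b'shipped'
db.close()
```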

When it comes to static files like images, logs, or archives, NAS still has some room. The challenge is that with the rapid growth in unstructured data, NAS and POSIX semantics become a huge burden: we cannot rely on central metadata services and directory traversals. We’d rather use the fast hashing and sharding approaches of object storage, which enable practically unlimited scale at very low cost. As the number of data items keeps growing, we are also becoming very reliant on extensible object metadata to describe and quickly look up objects/files.
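The core idea behind that scale is simple: instead of walking a directory tree or consulting a central metadata server, each object’s key is hashed and the hash alone determines which shard or node owns it. A minimal sketch follows; the node list and hash choice are made up for the example, and real systems add replication and consistent-hashing rings:

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical storage nodes

def owner_node(object_key: str) -> str:
    """Locate an object's home node from its key alone: O(1), no directory walk."""
    digest = hashlib.md5(object_key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

print(owner_node("images/2016/05/cat.jpg"))   # the same key always maps to the same node
print(owner_node("logs/web/2016-05-01.gz"))
```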

A somewhat overlooked fact is that much of the unstructured content is ingested or read as a whole object. While NAS can be fast on reads and writes, it’s extremely slow when creating/opening new files or doing metadata operations, which is again a big win for object storage.

Note that many of the new entrants in scale-out storage claim to have object implementations. These are usually emulated on top of their limited, POSIX/NFSv3-optimized file systems rather than being native object implementations. A simple test is to check how many objects they can create per second, how well and how fast they can query metadata across billions of objects, and whether they can really match Amazon S3 prices.
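A rough version of that object-creation test can be scripted against any S3-compatible endpoint with boto3. The endpoint URL, credentials, and bucket name below are placeholders, and a real benchmark would run many parallel clients rather than a single loop:

```python
import time
import boto3

# Placeholders: point these at the system under test (assumes an S3-compatible API).
s3 = boto3.client("s3", endpoint_url="http://storage-under-test:9000")
BUCKET = "objrate-test"
COUNT = 1000

start = time.time()
for i in range(COUNT):
    # Tiny objects keep the test focused on creation/metadata cost, not bandwidth.
    s3.put_object(Bucket=BUCKET, Key=f"bench/obj-{i:06d}", Body=b"x")
elapsed = time.time() - start

print(f"{COUNT / elapsed:.0f} objects/sec from a single client")
```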

Summary

Many organizations no longer want to consume fragmented hardware and software components and integrate or maintain them internally; they prefer the cloud self-service approach, where developers or users simply consume services and pay for what they use. IT is now a utility, and organizations would rather focus their most valuable asset (people) on business applications and their competitive edge.

Integrated Data Services mean we are no longer bound to legacy SAN and NAS approaches. New layering is required between storage and applications: we need to push critical processing closer to the data, while decoupling application logic from the data to allow better scaling and elasticity.

Will the Enterprise IT Federation be able to defend against the imminent attack by the Cloud Empire? Will the FORCE guide them to stop thinking infrastructure, and deliver optimized and self-served data platforms? Will someone come to the rescue?
