Cloud-Native Will Shake Up Enterprise Storage!

Enterprise IT is on the verge of a revolution, adopting hyper-scale and cloud methodologies such as micro-services, DevOps and Cloud-Native. As you might expect, the immediate reaction is to try to apply the same infrastructure, practices and vendor solutions to the new world, but many of those solutions and practices, SAN/VSAN and NAS among them, are becoming irrelevant.

Read my previous blog post for background on Cloud-Native, or this nice post from an eBay expert.

Overview

In the new paradigms we develop software the way cloud vendors do:

  • We assume everything can break
  • Services need to be elastic
  • Features are constantly added in an agile way
  • There is no notion of downtime

The way to achieve this nirvana is to use small, stateless, elastic and versioned micro-services deployed in lightweight VMs or Docker containers. When we need to scale, we add more micro-service instances. When we need to upgrade, DevOps guys replace the micro-service version on the fly and declare its dependencies. If things break, the overall service is not interrupted. The data and state of the application services are stored in a set of “persistent” services (elaborated on later), and those have unique attributes such as Atomicity, Concurrency and Elasticity that specifically target the new model.

Contrast this new model with current Enterprise IT. Today, application state is stored in virtual disks, which means we need complex and labor-intensive provisioning tools to build them, snapshot them, and back them up. Storage updates are not atomic, so we invented the “consistent snapshot”, which doesn’t always work. We don’t distinguish between shared OS/application files and data, so we must dedup all the overlap. The storage layer is not aware of the data semantics, so we deploy complex caching solutions, or just go for an expensive All-Flash or In-Memory solution. Why be bothered with app-specific performance tuning?

Managing Data in a Stateless World

Now that we understand the basic notion that everything can and will break, we have to adopt several key data storage paradigms:

  • All data updates must be atomic and to a shared persistency layer. We cannot have temporary dirty caches, cannot use local logs, cannot do partial updates to files, or maintain local journals in the micro-service. Micro-services are disposable!
  • Data access must be concurrent (asynchronous). Multiple micro-services can read/update the same data repository in parallel. Updates should be serialized by the data layer; no blocking, locking or exclusivity is allowed on the client side. This allows us to adjust the number of service instances according to demand.
  • Data layer must be elastic and durable – we need to support constant data growth or model changes without any disruption to the service. Failures of data nodes should not lead to data loss.
  • Everything needs to be versioned to detect and avoid inconsistencies.
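
To make the atomicity, concurrency and versioning requirements above more concrete, here is a minimal sketch of a versioned conditional update (optimistic concurrency). It is an illustration only: SharedStore, put_if_version and increment are hypothetical names, and an in-memory dictionary stands in for a real shared persistence layer. The internal mutex only makes each single call atomic inside the "service"; the micro-service instances themselves never take or hold a lock.

```python
import threading
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Versioned:
    value: Any
    version: int


class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class SharedStore:
    """Hypothetical in-memory stand-in for a shared persistence layer."""

    def __init__(self) -> None:
        self._items: Dict[str, Versioned] = {}
        # Makes each single call atomic; callers never hold this lock.
        self._mutex = threading.Lock()

    def get(self, key: str) -> Optional[Versioned]:
        with self._mutex:
            return self._items.get(key)

    def put_if_version(self, key: str, value: Any, expected_version: int) -> int:
        """Atomically replace the whole item if nobody updated it first."""
        with self._mutex:
            current = self._items.get(key)
            current_version = current.version if current else 0
            if current_version != expected_version:
                raise VersionConflict(key)
            new_version = current_version + 1
            self._items[key] = Versioned(value, new_version)
            return new_version


def increment(store: SharedStore, key: str) -> int:
    """Stateless read-modify-write: retry on conflict, keep no local state."""
    while True:
        current = store.get(key)
        value, version = (current.value, current.version) if current else (0, 0)
        try:
            return store.put_if_version(key, value + 1, version)
        except VersionConflict:
            continue  # another instance won the race; re-read and retry


def run_demo(instances: int = 8, updates_each: int = 50) -> Versioned:
    """Simulate several disposable service instances updating one item."""
    store = SharedStore()

    def worker() -> None:
        for _ in range(updates_each):
            increment(store, "orders/count")

    workers = [threading.Thread(target=worker) for _ in range(instances)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return store.get("orders/count")
```

Because every update is a whole-item replacement guarded by a version check, an instance can be killed at any point without leaving partial state behind, and adding instances needs no coordination between them.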

Note that Enterprise NAS and POSIX semantics, not to mention SAN/VSAN solutions, do not comply with the above requirements, specifically Atomicity, Concurrency, and Versioning. This may explain why Hyper-Scale Cloud vendors don’t widely use SAN or NAS internally.

With Cloud-Native Apps, services like Object Storage, Key/Value, Message Queues, and Log Streams are used to make the different types of data items persistent. Disk images may still exist to store small stateless application binaries (as Docker does); those are generated automatically by the build and CI/CD systems and don’t need to be backed up.

Data items and files are backed up in the object storage, which has built-in versioning, cloud tiering, and extensible, searchable metadata. There is no need for separate backup tools and processes or complex integrations, and no need to decipher VMDK (virtual disk) image snapshots to get to a specific file version, since data is stored and indexed in its native and most granular form.

Unlike traditional file storage, Cloud-Native data services have atomic and stateless semantics such as Put (to save an object or record), Get (to retrieve an object or record by key and version), List/Select (to retrieve a set of objects or records matching the query statement and relevant version), and Exec (to execute a DB-side procedure atomically).
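
The four operations can be sketched as a small interface. This is a hypothetical in-memory model, not the API of any specific service: RecordStore and its method names are made up for illustration, select matches on a simple key prefix instead of a real query statement, and execute takes a plain Python function in place of a DB-side procedure.

```python
import threading
from typing import Any, Callable, Dict, List, Optional, Tuple


class RecordStore:
    """Hypothetical in-memory model of Put/Get/List/Exec semantics."""

    def __init__(self) -> None:
        # key -> list of values; index i holds version i + 1
        self._history: Dict[str, List[Any]] = {}
        self._mutex = threading.Lock()  # makes each call atomic

    def put(self, key: str, value: Any) -> int:
        """Save a whole record as a new version and return its version."""
        with self._mutex:
            versions = self._history.setdefault(key, [])
            versions.append(value)
            return len(versions)

    def get(self, key: str, version: Optional[int] = None) -> Any:
        """Retrieve a record by key; latest version unless one is given."""
        with self._mutex:
            versions = self._history[key]
            if version is None:
                return versions[-1]
            if not 1 <= version <= len(versions):
                raise KeyError(f"{key} has no version {version}")
            return versions[version - 1]

    def select(self, prefix: str = "") -> List[Tuple[str, int]]:
        """Return (key, latest version) pairs for keys matching a prefix."""
        with self._mutex:
            return sorted(
                (key, len(versions))
                for key, versions in self._history.items()
                if key.startswith(prefix)
            )

    def execute(self, key: str, procedure: Callable[[Any], Any]) -> int:
        """Apply a procedure to the latest value and save the result atomically."""
        with self._mutex:
            versions = self._history.get(key, [])
            current = versions[-1] if versions else None
            result = procedure(current)  # nothing is stored if this raises
            self._history.setdefault(key, []).append(result)
            return len(self._history[key])


store = RecordStore()
store.put("users/42", {"name": "Ada", "visits": 1})
store.execute("users/42", lambda rec: {**rec, "visits": rec["visits"] + 1})
latest = store.get("users/42")               # newest version of the record
original = store.get("users/42", version=1)  # older version is still retrievable
```

Every call carries all the context it needs (key, version, procedure), so the caller keeps no open handles, file offsets or sessions between calls, in contrast to file-style open/seek/write access.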

The table below describes some of the key persistent services by category:

Category | Amazon AWS Service Name | Open-Source Alternatives | Focus
Object Storage | S3 | OpenStack Swift | Store mid–large objects cost effectively, extensible metadata & versioning, usually slow
NoSQL/NewSQL DB, Key/Value | DynamoDB, Aurora | Cassandra, MongoDB, etc. | Store small-mid size objects, data/column awareness, faster
Object Cache (in memory) | ElastiCache (Redis, Memcached) | Redis, Memcached | Store objects in memory (as shared cache), no/partial durability
Durable Message Queue | Kinesis | Kafka | Store and route message and task objects between services, fast
Log Streams | CloudWatch Logs | Elasticsearch (ELK), Solr | Store, map, and query semi-structured log streams
Time Series Streams | CloudWatch Monitoring | Graphite, InfluxDB | Store, compact, and query semi-structured time series data

One may raise the possibility of deploying those persistent services over a SAN or VSAN. But that won’t work well, since they must be atomic, keep the data, metadata, and state consistent across multiple nodes, and implement their own replication anyway. So using an underlying storage RAID/virtualization layer is not useful (in many cases it is even harmful). The same applies to snapshots/versioning, which those tools handle at transaction boundaries rather than at non-consistent intervals. In most cases such tools will use just a bunch of local drives.
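
As a rough illustration of what "implement their own replication" can look like, the sketch below uses quorum-style replication over nodes with local drives. This is one common approach, chosen here as an example and not a description of how any product in the table works. Node and ReplicatedStore are hypothetical names, a dictionary stands in for each local drive, and the sketch ignores concurrent writers and the cleanup of partially acknowledged writes.

```python
from typing import Any, Dict, List, Optional, Tuple


class Node:
    """Hypothetical data node; the dict stands in for a local drive."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self._data: Dict[str, Tuple[int, Any]] = {}

    def write(self, key: str, version: int, value: Any) -> bool:
        """Store the record if it is newer; return False if the node is down."""
        if not self.online:
            return False
        stored = self._data.get(key)
        if stored is None or version > stored[0]:
            self._data[key] = (version, value)
        return True

    def read(self, key: str) -> Optional[Tuple[int, Any]]:
        return self._data.get(key)


class ReplicatedStore:
    def __init__(self, nodes: List[Node], write_quorum: int, read_quorum: int) -> None:
        # R + W > N means every read set overlaps the latest successful write.
        if write_quorum + read_quorum <= len(nodes):
            raise ValueError("need read_quorum + write_quorum > number of nodes")
        self.nodes = nodes
        self.write_quorum = write_quorum
        self.read_quorum = read_quorum

    def _read_replies(self, key: str) -> List[Tuple[int, Any]]:
        online = [node for node in self.nodes if node.online]
        if len(online) < self.read_quorum:
            raise RuntimeError("read quorum not reached")
        replies = (node.read(key) for node in online)
        return [reply for reply in replies if reply is not None]

    def put(self, key: str, value: Any) -> int:
        """Write a new version to all reachable nodes; need write_quorum acks."""
        replies = self._read_replies(key)
        version = max((v for v, _ in replies), default=0) + 1
        acks = sum(1 for node in self.nodes if node.write(key, version, value))
        if acks < self.write_quorum:
            raise RuntimeError("write quorum not reached")
        return version

    def get(self, key: str) -> Any:
        """Return the value with the highest version among the replies."""
        replies = self._read_replies(key)
        if not replies:
            raise KeyError(key)
        return max(replies, key=lambda reply: reply[0])[1]
```

With three nodes and quorums of two, a write still succeeds while one node is down, and a later read returns the newest version even if a different node is the one that is unavailable. The service already gets its redundancy this way, which is why a RAID or virtualization layer underneath adds cost without adding protection.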

What to expect in the future?

The fact that each persistent service manages its own data pool, repeats similar functionality, is tied to local physical drives, and lacks data security, tiering, backup or data reduction is a challenge. One can also observe that there is a lot of overlap between the services, and that most of the difference lies in the trade-off between volume, velocity, and data awareness (variety). In the future many of these tools will be able to use shared Low-Latency, Atomic, and Concurrent Object Storage APIs as an alternative (already supported by MongoDB, CouchDB, Redis, Hadoop, Spark, etc.). This would lead to centralizing the storage resources and management, disaggregating the services from the physical media, allowing better governance and greater efficiency, and simplifying deployment. All are key for broader Enterprise adoption.

Summary

If you are about to deploy a micro-services and agile IT architecture, don’t be tempted to reuse your existing IT practices. Learn how cloud and SaaS vendors do it, and internalize that it may require a complete paradigm shift. Some of those brand-new SANs, VSANs, Hyper-Converged, AFAs, and even scale-out NAS solutions may not play very well in this new world.

9 thoughts on “Cloud-Native Will Shake Up Enterprise Storage!”

  1. Yaron, I really loved this blog post; it shows great understanding of the software stack. Well put: “centralizing the storage resources and management, disaggregating the services from the physical media, allowing better governance and greater efficiency, and simplifying deployment.”

    • Pankaj,

      Thanks for the note. I think the storage industry today is chasing incremental improvements: things we did years ago, just faster with Flash or running on the servers.
      It’s about time we look at the revolution in the app stacks, change storage accordingly, and improve things by a factor or more.
      Some of those ideas, like atomic updates and K/V storage, were pioneered by you guys at Fusion IO as you worked with the hyper-scale players. Now Enterprises have started adopting cloud software architectures, and Enterprise storage needs to evolve with them to gain the full micro-services benefit.

      Yaron

  2. If the Enterprise does not want to use public cloud, it still needs NAS/SAN to run all these message queues or key-value stores, right? So it is just another layer in between, maybe freeing the apps from the infrastructure but making the life of infrastructure engineers more complex.

    Either you outsource everything (data and problems) to public cloud and learn how to deal with volatility and resiliency, or you do it internally and just have more problems. I am maybe old school, but at the end of the day I see plenty of hype and coolness (I agree) for not much extra outcome, except that you do not look like an old fart. Nowadays the KISS principle is dying in favor of a few small advantages, which are often linked to speed improvements (which drive the competition). But at the end of the day, that’s where the industry is going, because our society is moving so fast (thanks to marketing).

    As said in a previous post, keep blogging. I like your posts; they are clear, concise and free of any BS. Even though I disagree with where IT is heading, this blog post is spot on, as usual.

    • Tom, Thanks for commenting

      The point I made was that when you build an internal (not public) Micro-services/Cloud-Native cluster and want the full gain, NAS/SAN don’t fit well. That also applies to the implementation of the K/V, MQ or Object services (see the section starting with “One may raise the possibility of deploying those persistent services over a SAN or VSAN”).

      The benefit of SAN vs. DAS is RAID, HA, scalability and so on, and those features are already handled by the various K/V (NoSQL), MQ and Object implementations in an app-specific way. If you use a SAN you end up with degraded performance, since things that could have been sequential I/O become random. A NAS or NFS abstraction is even worse (which is why those tools usually work on top of a local FS with sparse files), not to mention the FS journaling overhead, which can be avoided.

      If you want to centralize the storage and keep or improve the efficiency, you may want to layer it over basic object semantics. You can follow how Ceph, Scality and others use Seagate Kinetic APIs (a native object disk drive) instead of a local file system, the work done by Fusion IO to accelerate MySQL using atomic ops, how Aerospike works directly over NVMe and bypasses the FS to gain better efficiency, or how a bunch of NoSQL tools can run natively over LevelDB.

      If Private Clouds want the 10x better efficiency and scalability promised by the Public clouds, they should remove the extra fat introduced by SAN/NAS and move to an “Object Based” model, the same way public clouds do.

  3. Pingback: EMC/Dell, IBM, HP – Wake Up! | SDS Blog

  4. Pingback: Cloud Data Services Force Awakens | SDS Blog

  5. Chong Minnehan said: Hey friend, can I publish some paragraphs of your article on my little university blog? I have to publish good articles there, and I really think your post fits best. I will gladly give you a source link as well. I have two blogs, one my own and the other my college blog. I will publish some parts on the university blog. Hope you do not mind.

  6. Pingback: DC/OS Enables Data Center “App Stores” | Iguazio
