22 Aug 2014

I have lately been having the same discussion with many of my clients, colleagues and peers, so I felt it would probably be of interest to all.
The conversation usually starts in similar ways: “What is happening with the storage business?” or “Where should I invest my storage dollars?”. But the real question it usually boils down to is: “What are flash and cloud doing to the storage industry?”

A little history lesson

First, let’s go back about 10 years. Storage Area Networks (SAN) were all the rage. Fancy ERP systems required fancy storage arrays for their fancy databases. They were expensive, but we needed them and the cost-benefit worked, so we bought them. We were mainly fighting two issues. Either we needed more IOPS (Input/Output Operations Per Second) and had to buy a humongous number of small drives to get them, using only a portion of each drive to reach the performance we needed. Or we needed capacity and still had to buy a fairly large number of drives because they were relatively small. Both options resulted in very large and very expensive storage arrays.
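To make the two sizing problems concrete, here is a minimal sketch of the arithmetic. The function name and every number in it are illustrative assumptions, not measurements from any real array, and it ignores RAID overhead, caching and hot spares.

```python
import math


def drives_needed(required_iops, required_capacity_gb,
                  iops_per_drive, capacity_per_drive_gb):
    """Return (drive_count, used_fraction) for a simple spinning-disk array.

    drive_count is the larger of the count needed for IOPS and the count
    needed for capacity. used_fraction is the share of raw capacity the
    workload actually needs. Hypothetical helper for illustration only.
    """
    for_iops = math.ceil(required_iops / iops_per_drive)
    for_capacity = math.ceil(required_capacity_gb / capacity_per_drive_gb)
    count = max(for_iops, for_capacity)
    used_fraction = required_capacity_gb / (count * capacity_per_drive_gb)
    return count, used_fraction


# Assumed workload: 20,000 IOPS and 2 TB of data.
# Assumed drive: 180 IOPS and 146 GB each (made-up, era-plausible figures).
count, used = drives_needed(
    required_iops=20_000,
    required_capacity_gb=2_000,
    iops_per_drive=180,
    capacity_per_drive_gb=146,
)
print(f"{count} drives, {used:.0%} of raw capacity actually needed")
```

With these assumed figures the IOPS requirement, not the capacity, dictates the drive count, and most of the purchased space sits unused. That is the "using only a portion of each drive" situation described above.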

The new kid on the block

Then Solid State Drives (SSD) came along. They were small, they were fast and boy were they expensive, but they were the game changers. We now had an option to get those IOPS without having to buy tons of spinning disks. The initial Return on Investment (ROI) calculations were hard, but for the right workload, considering floor space and power usage, we could make it work. Now they are getting cheaper and bigger, so those ROI calculations are a lot easier to make. At the same time, spinning disks are getting humongous and cheap, which lets service providers (clouds) offer crazy amounts of storage at ridiculously low prices.

The application angle

Another factor to take into consideration is how applications consume storage and how they achieve resiliency. Storage and server memory were expensive, so applications used them conservatively, keeping a minimal amount of information in memory and flushing it to disk very quickly. When you have very expensive memory and very slow disks, cache management becomes very important. So very complex (expensive) algorithms were written and could be used by vendors as differentiators. At the same time, the application expected availability to come from the hardware (once that write confirmation came back from the disk, the application was hands off). But with memory becoming much cheaper, storage being both large and fast, and CPU cycles becoming cheap, applications are transforming. In-memory processing gives you quick access to all your data so you can run very performant simulation models. Large amounts of disk let you be less selective about what data you collect, allowing you to mine for trends you could not dream of before. And when you combine this with software being able to handle a good portion of the resiliency, what do you get? A storage array that no longer needs to be super big or über intelligent…

The cloud!

And once again there is the cloud… What do you do when you have tons (Tera? Peta? Exa???) of data you never use, but that you cannot really delete (regulation, anxiety…)? You find the cheapest place to store it and quickly forget about it. That’s where the storage clouds come in: they can open a virtual dumping ground within your environment. Many appliances already exist today that will sit in your environment with a minimal footprint (many are even virtual). They monitor your data usage and decide which data should be local (fresh, often used) and which should be pushed a couple of seconds away (into the cloud). You can even set it up so that redundancy is built in, removing the requirement for local backups. What’s not to like? This is where another two of my favorite topics come in: security and data sovereignty. For me they represent the same issue and have the same solution. The question is: what if someone else could read my data (a competitor? local and foreign government agencies?). It will take some time for the industry to catch up (1-3 years?), but eventually encryption will be accepted as the way to solve both problems: I do not care where my data actually is, I just want to make sure only I can read it. The precedent exists in IT; we did it with the data loss disclosure laws. If a laptop holding sensitive data gets stolen, the loss should be disclosed, but many of those laws waive the disclosure if the data is encrypted, because it is deemed unreadable and therefore not a risk.
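The local-versus-cloud decision those appliances make can be sketched as a simple age-based rule. This is only an illustration of the idea, not how any particular product works: the class, the function names and the 30-day window are all assumptions, and real appliances weigh many more signals than last access time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataObject:
    """A piece of stored data and the last time anyone touched it."""
    name: str
    last_accessed: datetime


def choose_tier(obj, now, hot_window=timedelta(days=30)):
    """Return 'local' for recently used data, 'cloud' for everything else.

    hot_window is an assumed policy knob, not a value from any real product.
    """
    return "local" if now - obj.last_accessed <= hot_window else "cloud"


def plan_placement(objects, now, hot_window=timedelta(days=30)):
    """Map each object name to the tier the policy would pick for it."""
    return {o.name: choose_tier(o, now, hot_window) for o in objects}


# Hypothetical inventory, evaluated as of a fixed date.
now = datetime(2014, 8, 22)
objects = [
    DataObject("quarterly-report.xlsx", now - timedelta(days=3)),
    DataObject("2009-audit-archive.tar", now - timedelta(days=900)),
]
placement = plan_placement(objects, now)
for name, tier in placement.items():
    print(f"{name}: {tier}")
```

Under this assumed policy the report that was opened a few days ago stays local, while the archive nobody has touched in years goes to the cloud.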

Where does that leave us?

Think about what is really important to you: cost? performance? geography? hug-ability? (the ability to wrap your arms around your storage array to reassure yourself that it is still there).
Review your encryption plans; they might become important.
Embrace data. It is the key to the future, and companies that understand where to get it and how to use it will rule.
Your storage needs will keep increasing (really fast). Have a plan and specialize your approach (have the right solution for the right problem).