Deep Learning Myths, Lies, and Videotape - Part 2: Balderdash!
In Part 1 of this blog post, we discussed the history and definitions of Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL), as well as Infinidat’s use of true Deep Learning in our Neural Cache software.
While there are no enterprise storage systems other than InfiniBox that claim to use DL (and none do), there are many that claim to use AI/ML, so let’s take a closer look.
For most vendors that claim to have AI or ML associated with their storage, the AI or ML is commonly not a native part of the storage system at all. Typically, telemetry data is gathered from systems and sent off to a cloud-based data lake, where independently run external AI/ML applications perform predictive analytics. Typical uses are preemptive system maintenance, repair, and problem avoidance, capacity planning, and performance planning. These are very good and useful things, and most of the better storage vendors do them. Infinidat does this with InfiniVerse.
There is one very large storage vendor in particular that aggressively markets some of their native product features using both the terms AI and ML very freely. In my opinion their use of these terms is usually inaccurate.
Here’s an examination of one prominent example. Last year that major storage vendor introduced a completely new storage system targeted to replace multiple incompatible storage system families that they had acquired over the years. As part of their announcement, and ongoing marketing, they continue to claim that this new system has “integrated machine learning” and an “AI/ML-based Resource Balancer.” In December 2020, their President of Global Sales and Customer Operations touted this feature in a press article, stating that their product “has a machine-learning engine to optimize performance and reduce cost by automating labor-intensive processes like initial volume placement.”
The feature being referred to appears to be called a resource balancer. After the storage administrator creates a new storage volume, instead of the administrator having to manually determine which of up to four appliances in the system should hold it, the system chooses the appliance and then automatically completes the process of volume placement.
As I believe any good storage admin will attest, when trying to determine where to manually place a volume, there are many important factors to take into consideration before making their decision. These include questions such as:
- What are the performance requirements for the volume?
- What are the data access patterns for the volume?
- What are the data protection and replication requirements for the volume?
- What are the performance profiles for each of the appliances where it might be placed?
- What are the performance profiles for the other volumes on those appliances?
- How much capacity is already on each appliance?
- What percentage of total capacity is being consumed on each appliance?
- Should data reduction options be enabled for the volume?
- Are there any special security or access concerns?
Unfortunately, the “machine-learning engine” for the product in question doesn’t seem to consider these kinds of important things. The product’s technical primer states that the placement is based on which appliance has the most unused capacity. That’s it.
Let that sink in. Since this feature appears to completely ignore all the real-world factors of volume placement mentioned above, there is a fair chance that it will create problems rather than solve any. Fortunately for users of that platform, it also offers the ability to override the resource balancer and place volumes manually. I can’t imagine any good storage admin who wouldn’t choose to do it manually on that platform, since doing so lets human intelligence weigh all the issues above, which the resource balancer seems to be unaware of.
An analogy: you have just returned from the grocery store and have different groceries to put away at home. There are four places you can put them: a refrigerator, a freezer, a storage shelf, or an oven heated to 450 degrees Fahrenheit. A truly intelligent approach would be to put ice cream in the freezer, milk in the refrigerator, tortilla chips on the storage shelf, and a pizza in the oven. Based upon the kind of “intelligence” in the product described above, however, any of these groceries could wind up in any of the locations, simply based upon how much free space each location has. The ice cream could go into the oven, or the milk into the freezer.
So how does this product’s AI/ML claim hold up against the AI/ML/DL definitions from Part 1? It doesn’t. Any regular human knows not to put milk in the freezer simply because the freezer has more room, just as any good storage admin knows not to put a performance-critical volume on the slowest appliance simply because it has the most free capacity. There’s no need to even consider testing this resource balancer as ML, since it doesn’t pass the more basic test to be AI. In terms of the Turing Test, it is very easy to tell the difference between what a storage admin (human) would do and what the machine (the resource balancer) might do. The resource balancer isn’t “intelligent” by definition.
But there is an accepted industry term for what it actually is. First, let me summarize what a basic software flowchart for the resource balancer probably would look like, based upon its described function:
- When there is a new volume to place, the resource balancing module is invoked, if enabled
- The module queries each appliance visible to the server that created the volume, and requests the available free capacity from each
- The module compares and sorts the responses from highest to lowest (there are at most four numbers to sort, so that shouldn’t take very long for either a machine OR a human!)
- The volume is placed on the appliance that had the highest amount of free capacity
The general computer science term for this is PROGRAMMING, and what appears to be particularly simple programming at that. It doesn’t pass the sniff tests to be AI or ML. In spite of this, every public mention of this feature that I’ve seen from independent sources has simply repeated the vendor’s claim that it’s “AI/ML,” without questioning either the accuracy of the claim or the merit of the feature.
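To make the point concrete, here is a minimal sketch of the flowchart steps listed above. It is my own illustration, not the vendor’s code: the Appliance class, the place_volume function, the appliance names, and the capacity figures are all hypothetical, and it assumes that free capacity is the only input, as the technical primer describes.

```python
from dataclasses import dataclass


@dataclass
class Appliance:
    """Hypothetical stand-in for one appliance in the system."""
    name: str
    free_capacity_tb: float


def place_volume(appliances, balancer_enabled=True):
    """Return the name of the appliance with the most free capacity.

    Returns None when the balancer is disabled (or there is nothing to
    choose from), meaning the admin places the volume manually.
    """
    if not balancer_enabled or not appliances:
        return None
    # Sort the (at most four) responses from highest to lowest free capacity.
    ranked = sorted(appliances, key=lambda a: a.free_capacity_tb, reverse=True)
    # Place the volume on whichever appliance came out on top.
    return ranked[0].name


# Made-up capacity numbers, purely for illustration.
cluster = [
    Appliance("appliance-1", 12.5),
    Appliance("appliance-2", 40.0),
    Appliance("appliance-3", 7.25),
    Appliance("appliance-4", 31.0),
]
print(place_volume(cluster))
```

Note what the sketch never looks at: performance requirements, access patterns, replication needs, or anything else from the list of questions a storage admin would ask. Nothing in it learns from data or changes its behavior over time.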
Speaking of merit, there is a more basic issue. This feature may not be AI/ML, but is it even useful? The claimed benefit, as stated above, is “automating labor-intensive processes like initial volume placement.” So how labor intensive is volume placement?
On our InfiniBox, this process is done automatically by the system, without ever requiring any human intervention. It is impossible to improve storage admin productivity relative to how InfiniBox has always done volume placement, since humans never need to do it. In practice, even the larger end-to-end process of both volume definition and the (fully automated) volume placement can take less than a minute with just a couple of clicks in the GUI. That is true not just on InfiniBox, but also on some of the better storage products from other vendors in today’s market. We don’t call this particular feature AI or ML on our system because it isn’t. It’s simply good programming.
As I was getting ready to publish this blog, the very same vendor mentioned in the example above did it again, publicly proclaiming “machine learning” for a new feature that again appears not to be even AI. This new feature is referred to under the category of “Intelligent data reduction enhancements (DRR).”
Prior to this new release of software, that storage system’s “deduplication and compression are always engaged inline and cannot be bypassed.” With this new updated version of the storage O/S, the system “will dynamically prioritize workload performance and defer deduplication operations to a period in time with a less extreme I/O demand.”
In simple terms, if performance demand rises to a challenging level, the system tells the deduplication process to turn itself off. Then, once performance demand has dropped back to a lower level, it tells the deduplication process to turn itself back on.
Here’s an analogy: I have a thermostat in my home that senses when the temperature gets too cold and automatically turns the heat on. Then when the temperature rises to the desired temperature, it turns the heat off. I do not consider that to be AI or ML.
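Both the thermostat and the deferral behavior described above come down to a threshold check. The sketch below is my own illustration under stated assumptions: the dedup_should_run function, the 80% and 50% thresholds, and the load samples are all hypothetical, and I have no knowledge of how the vendor actually implements the feature. The gap between the two thresholds is an assumption borrowed from how thermostats avoid flapping on and off.

```python
def dedup_should_run(io_load_pct, currently_running, high_pct=80.0, low_pct=50.0):
    """Decide whether inline deduplication should run right now.

    Thermostat-style rule with hypothetical thresholds:
    - at or above high_pct load, switch deduplication off
    - at or below low_pct load, switch it back on
    - in between, leave it in whatever state it is already in
    """
    if io_load_pct >= high_pct:
        return False
    if io_load_pct <= low_pct:
        return True
    return currently_running


# Walk through a made-up series of I/O load samples (percent busy).
state = True
history = []
for load in [30, 60, 85, 70, 45]:
    state = dedup_should_run(load, state)
    history.append(state)
print(history)  # [True, True, False, False, True]
```

Two comparisons and a remembered state are all it takes. As with the thermostat, nothing here is trained, inferred, or learned.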
Like the data placement example above, one might also ask if this is actually a useful feature. There is only one logical reason why such a feature would be introduced: the system may experience serious performance problems when it attempts to deduplicate data continuously. So it is a bit surprising that what appears to be a valid and noble attempt to mitigate a problem that was not understood or foreseen when the product was first introduced is now being intentionally highlighted as a “feature.” IT vendors make mid-course product corrections to address newly discovered problems all the time, but most don’t try to sell them as differentiated value-added features simply because they are new. And I still think that this kind of thing is just programming rather than AI, let alone ML.
The terms AI, ML, and DL have become popular because real AI, ML, and DL have proven able to add significant value, and the deeper you go from AI to ML to DL, usually the greater the value. At Infinidat, some of our software we’d simply call programming. But we pride ourselves on the AI/ML/DL in our Neural Cache software being the core of what makes our systems unique and, more importantly, of their ability to provide unique value for our customers. We don’t take these terms lightly.
When vendors attempt to exploit popular terms - like riding the coattails of AI/ML/DL value by claiming to offer AI, ML, or DL for features that don’t meet the accepted definitions for those concepts - they do our entire industry a disservice. When they push features that upon examination might create unwanted problems rather than solve any, they do their users a disservice.