November 7, 2019
Microsoft Azure’s Big Move in the Cloud with Synapse

Chances are, if an enterprise company has customers, it is operating in a hybrid cloud environment. The small fraction that are not are predicted to adopt this structure by the end of next year. According to Dave Bartoletti, VP and Principal Analyst at Forrester Research, 74% of enterprises describe their strategy as hybrid/multi-cloud today. So the hybrid cloud has gone beyond critical mass: it is now critical to operations. As the hybrid cloud matures, one element is the pillar on which its usability rests: Intelligent Data Virtualization.
What is Data Virtualization? Why does the enterprise need it?
Data virtualization is defined as an approach to data management allowing applications to use data without requiring technical details about the data, such as how it is structured, where it is physically located, and how the data is accessed.
Data virtualization provides a bridge across data warehouses, data marts, and data lakes, delivering a single view of an organization’s data without having to physically integrate it. This abstraction enables enterprises to use their data, no matter where it’s actually stored, to produce fast, timely insights.
Data Virtualization: Still the Champion of The Hybrid Cloud
To understand how data virtualization works, envision a database that has entries from SaaS applications, enterprise data warehouses, emails, and business applications. Information about formatting, sources, and application location makes this data hard to manipulate, store, and secure. When data is virtualized, only the metadata (or top-level data) is used. Businesses can access data repositories regardless of the location or platform on which they reside and conduct joins across different fact tables, allowing users to assemble richer pictures from which to draw insights.
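To make this concrete, here is a minimal federated-join sketch in Python. It is illustrative only, not any vendor’s implementation: an in-memory SQLite table stands in for the warehouse, a DataFrame stands in for a SaaS export, and every table and column name is hypothetical.

```python
# Minimal federated-join sketch (illustrative; all names are hypothetical).
import sqlite3
import pandas as pd

# Store 1: an enterprise data warehouse, simulated with SQLite.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
warehouse.executemany("INSERT INTO orders VALUES (?, ?)",
                      [(1, 250.0), (2, 99.5), (1, 40.0)])

# Store 2: a SaaS application export, simulated with a DataFrame.
crm = pd.DataFrame({"customer_id": [1, 2], "segment": ["enterprise", "smb"]})

# The virtualization step: fetch from each store in its native access
# pattern, then combine the results into one answer for the consumer.
orders = pd.read_sql_query("SELECT customer_id, amount FROM orders", warehouse)
answer = orders.merge(crm, on="customer_id").groupby("segment")["amount"].sum()
print(answer)  # revenue per CRM segment, spanning two data stores
```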
Why Do Companies Need Data Virtualization?
Without advancing data virtualization first, there will be no progress in making the hybrid cloud more agile or accessible. Virtualized data allows the folks in sales, marketing, IT and operations to get answers that span multiple datastores. Without virtualized data, the hybrid cloud would be a slow, insecure, unresponsive and expensive collection of data without the ability to answer the high-value questions every enterprise has.
Current positioning from RedHat and Microsoft has proven the concept. Said Joe Fernandes, RedHat vice president of Cloud Platforms Products:
“Virtualization provides a foundation for modern computing and entry point for hybrid cloud deployments, making a flexible, stable and open virtualization platform a key piece of an enterprise technology mix.”
He’s right. Virtualization is still the foundation.
What Data Virtualization Is Not
Data virtualization should not be confused with data blending via a visualization tool. Visualization tools that employ data blending require the end user to understand the structure of the data, how it joins, and how it was intended to be used. Data blending also comes with critical scale and security limitations.
Data virtualization is also not federation. Federation is the ability to join data across multiple data stores. While federation is an essential part of virtualization, it alone does not provide an abstraction that makes many complex things look like one simple thing.
How Does Data Virtualization Work?
Virtualizing data requires that a new combination of physical data be defined and stored in a metadata software layer, paired with algorithms that can decode requests for data that span multiple data stores, dispatch datasource-specific queries, and finally combine the multiple results into one answer for the data consumer.
Data virtualization can paper over differences in the access models for each data store, ranging from SQL-dialect-specific differences to entirely different access patterns such as API vs. SQL vs. file reads.
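As a rough illustration of this decode-dispatch-combine pattern, the sketch below hides two differently accessed stores behind one logical interface. It is a hypothetical design, not any vendor’s actual engine: each adapter encapsulates one store’s native access pattern, and the metadata layer resolves logical names so consumers never see dialects, file formats, or locations.

```python
# Hypothetical sketch of a metadata layer papering over access models.
from abc import ABC, abstractmethod
import csv
import io
import sqlite3

class SourceAdapter(ABC):
    @abstractmethod
    def fetch(self, columns):
        """Return rows as dicts, whatever the store's native access model."""

class SqlAdapter(SourceAdapter):
    """Access pattern: SQL over a database connection."""
    def __init__(self, connection, table):
        self.connection, self.table = connection, table
    def fetch(self, columns):
        cursor = self.connection.execute(
            f"SELECT {', '.join(columns)} FROM {self.table}")
        return [dict(zip(columns, row)) for row in cursor]

class FileAdapter(SourceAdapter):
    """Access pattern: reading a delimited file (here, an in-memory CSV)."""
    def __init__(self, text):
        self.text = text
    def fetch(self, columns):
        rows = csv.DictReader(io.StringIO(self.text))
        return [{col: row[col] for col in columns} for row in rows]

class VirtualizationLayer:
    """Metadata layer: maps a logical dataset name to a physical adapter."""
    def __init__(self):
        self.catalog = {}
    def register(self, name, adapter):
        self.catalog[name] = adapter
    def query(self, name, columns):
        # Consumers see one interface, never the underlying access model.
        return self.catalog[name].fetch(columns)

# Usage: one logical interface over a SQL store and a file store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
db.execute("INSERT INTO sales VALUES ('west', 10.0)")
layer = VirtualizationLayer()
layer.register("sales", SqlAdapter(db, "sales"))
layer.register("budgets", FileAdapter("region,amount\nwest,12.5\n"))
print(layer.query("sales", ["region", "amount"]))
print(layer.query("budgets", ["region", "amount"]))
```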
Data that becomes part of this abstraction does not need to be extracted, transformed, and loaded (ETL) into a more analytics-friendly format – it never has to leave its originating data store.
Data virtualization addresses the complexity of utilizing data at scale and delivers the automation necessary for an enterprise to access the hybrid cloud without concerns about security and accuracy.
Three Urgent Needs to be Addressed with Data Virtualization
As data virtualization becomes as ubiquitous as the hybrid cloud, it’s a good time to re-evaluate three urgent points in its development:
- Data Virtualization Is Not a Commodity: Typically, combining data that spans on-premises and multi-cloud sources is incredibly complex, usually involving IT-led software projects with requirements, build, and testing phases. Not all companies can run these projects effectively and efficiently enough to serve the business’s needs for speed, accuracy, and security. AtScale removes the need for complex software projects and enables the business to form a hypothesis, generate sophisticated questions, and get answers quickly and securely.
- No virtualization → No innovation: Machine Learning and Artificial Intelligence are currently the biggest drivers of data value. When an enterprise recognizes the value of data, its appetite to collect more grows. IDC predicts data creation will swell to a total of 163 zettabytes (ZB) by 2025. The last ten years centered on converting analog data to digital; the next ten are focused on getting value from data: collecting and leveraging the mission-critical data necessary for the smooth running of daily life for businesses, governments, and consumers. Data scientists spend a disproportionate amount of time on data acquisition and prep rather than maximizing their value by generating models and insights. The amount and variety of data an enterprise must manage is growing exponentially, and virtualizing that data makes the hybrid cloud agile and flexible enough to accommodate new technologies and applications.
- The Rise of Containers: Container technology and Kubernetes orchestration enable the operationalization of the hybrid cloud. Containers are built on a stripped-down OS to which applications and infrastructure can be added on an as-needed basis. Kubernetes and its competitors virtualize the entire set of infrastructure requirements for complex application systems.
Referring back to RedHat’s most recent positioning: “If we add Kubernetes to the (data) mix, which brings its own orchestration and scheduling, we end up with multiple layers of management and orchestration just to keep our clusters running.” Data virtualization and containers go together like peanut butter and chocolate: virtualization hides the complexity of data, driving massively increased consumption, while containers hide the complexity of scaling the infrastructure required to serve it. Bottom line: Kubernetes necessitates data virtualization as it integrates into the hybrid cloud.
Data Virtualization In Action
The leadership team of a Fortune 50 DIY retailer recently tasked their IT department with moving data from an on-premises enterprise data warehouse to a cloud enterprise data warehouse. The reason: the need for an economical rollout of broad analytics use cases, with a focus on doing more analytics at a scalable price. The retailer looked to improve BI performance, secure data that would be served outside their firewall, and advance their query capability and results. In short: the existing data needed to be virtualized to avoid costly business interruption.
The retailer’s IT department connected their different BI tools to their on-premises warehouse using AtScale’s Universal Semantic Layer. By breaking the hard-wired connection between BI tool and data warehouse, they migrated their analytics to the cloud within a weekend. No BI users were impacted (or even knew the data had moved!) when performing their reporting tasks the following Monday morning. Business analysts ran the exact same reports as they had on Friday without realizing that the data store architecture – and even the physical location of the data – had changed.
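A minimal sketch of why the cutover was invisible: when BI tools bind to a logical dataset name in a semantic layer rather than to a physical connection string, repointing the physical target is a one-line metadata change. The class and function names below are hypothetical illustrations, not AtScale’s API, and SQLite stands in for both warehouses.

```python
# Hypothetical sketch: a logical name shields BI tools from a migration.
import sqlite3

def make_store():
    """Stand-in for a physical warehouse; SQLite keeps the sketch runnable."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (amount REAL)")
    db.execute("INSERT INTO sales VALUES (100.0)")
    return db

class SemanticLayer:
    """Maps a logical dataset name to whatever physical store backs it."""
    def __init__(self):
        self.bindings = {}
    def bind(self, logical_name, store):
        self.bindings[logical_name] = store
    def query(self, logical_name, sql):
        return self.bindings[logical_name].execute(sql).fetchall()

layer = SemanticLayer()
layer.bind("sales", make_store())  # Friday: the on-premises warehouse
print(layer.query("sales", "SELECT SUM(amount) FROM sales"))  # Friday report

layer.bind("sales", make_store())  # Weekend: rebind to the cloud warehouse
print(layer.query("sales", "SELECT SUM(amount) FROM sales"))  # Monday report:
# identical query, identical result; the analyst changes nothing.
```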
Data virtualization is the foundation and the future of the hybrid cloud. No innovation can ignore it; smart companies will thrive on it.