VMware Puts Virtual Spin on Big Data
Updated · Aug 29, 2012
VMware has long been known as the king of virtualization. But during this week’s VMworld 2012 show in San Francisco, the company made it clear that it has far loftier aspirations. It wants to own the entire data center, including enterprise applications, databases and even business intelligence (BI).
“We aim to provision the entire data center,” said VMware CTO Steve Herrod. “Within the virtual data center, we can keep enterprise applications like SAP, Oracle and BI running while keeping them available and secure.”
VMware’s goal is to make enterprise apps easier to deploy and to have them run faster, more efficiently and more securely. The company has coined the term “the software-defined data center” for this vision.
“The software-defined data center is the platform of the future that will allow you to run all of your applications and, ultimately, be a competitive differentiator for your business,” said Herrod.
“In the end, it is the applications that matter,” he added. “It’s the applications that help a business make new revenue or be more efficient in how they are doing so. The infrastructure side is just a means to an end, a way to run these applications.”
As well as making its flagship virtualization software able to cope with the kinds of high-performance workloads increasingly demanded by large enterprises, VMware has made an acquisition and quietly launched a few projects aimed at staking its claim over the virtual data center. These include Project Serengeti for Big Data, the acquisition of Cetas for business intelligence, and the establishment of a VMware platform to deliver database-as-a-service (DBaaS), a variant on software-as-a-service (SaaS).
Many of these efforts dovetail with Big Data acquisitions and partnerships at VMware parent EMC.
Serengeti: Simplifying Hadoop
VMware’s immediate play for Big Data glory is via Serengeti, an open source toolkit designed to be used with Hadoop, a platform managed by the Apache Software Foundation that is gaining traction as a framework for storage and analysis of large volumes of data. Hadoop, itself open source, runs on commodity hardware instead of pricey proprietary boxes. Facebook, Google, Yahoo and Amazon are among its chief contributors.
A problem with Hadoop, though, is that it can be difficult to provision. June Yang, senior director of Product Management for VMware, explained that Serengeti will help organizations implement Hadoop and make managing it easier. In essence, it automates the involved process of building a Hadoop-based infrastructure to store and mine unstructured data.
“In a couple of clicks you can build what you need,” she said.
Serengeti, then, makes it feasible to deploy Hadoop across a large number of servers in a few minutes. It is backed up by another VMware contribution to the open source community, Hadoop Virtualization Extensions, which makes Hadoop virtualization-aware. VMware says organizations will enjoy a jump in performance by virtualizing Hadoop using VMware tools rather than running it on a few physical servers.
Cetas: Mining Unstructured Data
Like many others, VMware has taken note of the tremendous potential that lies within vast stores of unstructured data.
“Unstructured data could outstrip structured by 10 to one in the next 10 years,” said Yang. “All that data used to sit in a dump somewhere and never be analyzed. Now with Hadoop, we can make some sense of it.”
With the traditional database being stretched into a Big Data world, there are many areas outside of relational databases that require analysis. And with a shortage of so-called data scientists, there is a strong need to democratize business intelligence.
VMware recently acquired Cetas Software, which Yang said provides “Big Data as a service.” Cetas offers technology to isolate patterns within Big Data streams and stored application data, something that can’t be done by relational databases. This analytics application sits either on top of Hadoop or in the cloud and offers pre-built algorithms for specific verticals. As examples, Yang cited an eCommerce store and an online gaming company, either of which would be able to use Cetas for Big Data analytics.
Data Director: Virtual Databases
Yang explained the need for Data Director, VMware’s database-as-a-service (DBaaS) platform, by citing trends such as the rise of unstructured data, device proliferation and mobility.
“Data is changing in the modern world,” she said. “Relational databases have been used for everything over the last 20 years but one size no longer fits all.”
VMware’s Data Director automates routine tasks, including database provisioning and backup. Currently it supports mainly Oracle databases, giving them a level of flexibility similar to that of public cloud database services while maintaining enterprise-grade control. Perhaps more importantly, by virtualizing the database, organizations can reduce hardware and licensing costs while also speeding application development.
Migration from a physical to a virtual database setup is said to be a relatively simple task. This approach also makes it possible to manage thousands of databases through one pane of glass.
“A few years or more ago, virtualizing applications was the new thing and now nobody questions it,” said Ronaldo Ama, VMware's vice president of R&D. “Now the same thing is happening with data engines.”
Drew Robb is a freelance writer specializing in technology and engineering. Currently living in California, he is originally from Scotland, where he received a degree in geology and geography from the University of Strathclyde. He is the author of Server Disk Management in a Windows Environment (CRC Press).