Big and Fast Data: Upgrading Your Infrastructure
Updated · Oct 05, 2015
Providing the infrastructure for Big Data and the newer Fast Data is not yet a matter of applying cookie-cutter “best practices.” Both require significant tuning or outright changes to hardware and software infrastructure, and the newer Fast Data architectures differ significantly from Big Data architectures and from the tried-and-true OLTP (online transaction processing) solutions that Fast Data supplements.
Big Data Requirements
Big Data is about analyzing and gaining deeper insights from much larger pools of data, much of it accessible in public clouds. Social media data about customers is a good example. This kind of data places less emphasis on transactional consistency and more on fast access, which has led to a wide array of Hadoop-based solutions. Thus, the following changes in architecture and emphasis are common:
- Support for in-house Hadoop (software such as Hadoop and Hive; hardware that is typically scale-out and cloud-enabled) as a staging area for social media data and the like (see the staging sketch after this list)
- Private-cloud enablement software (e.g., virtualization) for existing analytics data architectures
- Software support for large-scale, deep-dive and ad-hoc analytics, plus tools that let data scientists customize analytics to the needs of the enterprise
- Massive expansions of storage capacity, particularly for near-real-time analytics
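To make the staging item above concrete, here is a minimal sketch in Python, using only the standard library, of how incoming social media records might be batched into date-partitioned files before being loaded into a Hadoop/Hive staging area. The directory layout, field names and batch size are illustrative assumptions, not a particular vendor's format; in a real deployment the files would land in HDFS rather than a local folder.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Illustrative staging root; in practice this would be an HDFS path
# written via an HDFS client, not a local directory.
STAGING_ROOT = Path("staging/social_media")

def stage_records(records, batch_size=1000):
    """Write social media records into date-partitioned batch files,
    a common layout for later loading into a Hive external table."""
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    partition = STAGING_ROOT / f"dt={day}"
    partition.mkdir(parents=True, exist_ok=True)

    batch, batch_no = [], 0
    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            _flush(partition, batch_no, batch)
            batch, batch_no = [], batch_no + 1
    if batch:
        _flush(partition, batch_no, batch)

def _flush(partition, batch_no, batch):
    # One JSON object per line ("JSON Lines"), a format Hive can read
    # through a JSON SerDe defined on an external table.
    path = partition / f"batch_{batch_no:05d}.json"
    with path.open("w") as f:
        for record in batch:
            f.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    sample = [{"user": "u1", "text": "great product", "ts": "2015-10-05T12:00:00Z"}]
    stage_records(sample)
```

A Hive external table can then be defined over the `dt=` partitions, giving analysts SQL access to the staged social media data alongside the rest of the analytics architecture.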
Fast Data Requirements
Fast Data is about handling streaming “sensor-driven” data in near-real time, such as data from the Internet of Things (IoT). That means a focus on very rapid updates, with frequent loosening of “lock until written to disk” constraints. The resulting data often also receives initial streaming analytics, either from existing (typically columnar) databases or from specially designed Hadoop-associated solutions. The following changes in architecture and emphasis so far appear to be common:
- Database software designed for rapid updates and streaming initial analytics (see the sketch after this list)
- Greatly expanded use of NVRAM (non-volatile random-access memory) and SSDs (solid-state drives) for Fast Data storage (e.g., a terabyte of main memory and a petabyte of SSD)
- Software constraints on response time that resemble those of the old RTOSes (real-time operating systems)
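The following is a minimal, self-contained Python sketch of the pattern these items describe: updates are acknowledged from memory rather than after a synchronous disk write (the loosened “lock until written to disk” constraint), a running aggregate stands in for initial streaming analytics, and a simple per-event latency budget plays the role of the RTOS-like response constraint. The thresholds, field names and file path are assumptions for illustration only.

```python
import json
import time
from collections import deque
from pathlib import Path

FAST_STORE = Path("fast_store/events.log")   # illustrative append-only log
FLUSH_EVERY = 500                            # relaxed durability: persist in batches
LATENCY_BUDGET_MS = 5.0                      # RTOS-like per-event response budget

class FastIngest:
    """Accept sensor events in memory, keep a running aggregate
    (the initial streaming analytics), and persist in deferred batches."""
    def __init__(self):
        self.buffer = deque()
        self.count = 0
        self.total = 0.0
        FAST_STORE.parent.mkdir(parents=True, exist_ok=True)

    def ingest(self, event):
        start = time.perf_counter()
        # Acknowledge from memory; durability is deferred to flush().
        self.buffer.append(event)
        self.count += 1
        self.total += event["value"]
        if len(self.buffer) >= FLUSH_EVERY:
            # A production system would flush on a background thread
            # (or rely on NVRAM/SSD-backed storage engines) to keep
            # the response-time budget free of I/O.
            self.flush()
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms > LATENCY_BUDGET_MS:
            print(f"warning: event took {elapsed_ms:.2f} ms, over budget")

    def running_mean(self):
        return self.total / self.count if self.count else 0.0

    def flush(self):
        with FAST_STORE.open("a") as f:
            while self.buffer:
                f.write(json.dumps(self.buffer.popleft()) + "\n")

if __name__ == "__main__":
    ingest = FastIngest()
    for i in range(1200):
        ingest.ingest({"sensor": "s1", "value": float(i % 10), "seq": i})
    ingest.flush()
    print("running mean:", ingest.running_mean())
```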
Putting Big Data and Fast Data Together
Fast Data is intended to work with Big Data architectures. Thus, to mesh the two:
- Data is separated on disk between Fast Data and the less-constrained Big Data data stores
- The architecture allows Big Data databases and analytics tools to access Fast Data data stores (a pattern sketched below)
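As a rough illustration of both points, here is a hedged Python sketch of a two-tier layout: a fast tier holding recent, rapidly updated data; a physically separate big-data tier holding historical data; a demotion step that moves aged files from one to the other; and a reader that lets analytics tools see both tiers as one logical data set. The directory names and age cut-off are hypothetical and carry over from the earlier Fast Data sketch.

```python
import shutil
import time
from pathlib import Path

FAST_TIER = Path("fast_store")      # recent, rapidly updated data
BIG_TIER = Path("big_store")        # historical, analytics-oriented data
AGE_LIMIT_SECONDS = 3600            # illustrative cut-off for "hot" data

def demote_cold_files(now=None):
    """Move fast-tier files older than the cut-off into the big-data tier,
    keeping the two data stores separated on disk."""
    now = now or time.time()
    BIG_TIER.mkdir(parents=True, exist_ok=True)
    for path in FAST_TIER.glob("*.log"):
        if now - path.stat().st_mtime > AGE_LIMIT_SECONDS:
            shutil.move(str(path), BIG_TIER / path.name)

def read_all_events():
    """Let an analytics job see both tiers as one logical data set,
    mirroring access by Big Data tools to the Fast Data store."""
    for tier in (BIG_TIER, FAST_TIER):
        if not tier.exists():
            continue
        for path in sorted(tier.glob("*.log")):
            with path.open() as f:
                for line in f:
                    yield line.rstrip("\n")

if __name__ == "__main__":
    demote_cold_files()
    print(sum(1 for _ in read_all_events()), "events visible to analytics")
```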
This is a very brief overview of typical implementations. Major vendors sell a wide variety of software and hardware to cover all of Big Data and much of Fast Data, while groups of open source vendors cover much of the same software territory.
Therefore, implementation is often a matter of balancing cost against speed to implement. While the buyer should beware, smart buyers can gain competitive advantage.
Wayne Kernochan has been an IT industry analyst and author for over 15 years. Over that period he has focused on the most important information-related technologies, as well as ways to measure their effectiveness. He has also done extensive research on SMB, Big Data, BI, database, development tool and data virtualization solutions. Wayne is a regular speaker at webinars and a writer for many publications.