Big data analytics: Tackling the historical data challenge
By Simon Garland, Chief Strategist, Kx Systems
Big data is noisy, messy and frequently inconsistent. To make sense of it, companies are investing time and money in an array of technology solutions. However, some much-touted approaches are costly and, in many ways, have yet to address the expanding volume of historical information that companies want to mine.
One answer to these shortcomings is a hybrid approach: combining the performance and speed of in-memory processing with the vast storage capacity that a traditional on-disk approach offers.
The challenge: speed versus space
To get meaningful results from big data, companies need to slice and dice their data in many different ways to work out which parts are worth using. Given the scale of the task, what many of them need above all is speed.
One approach is in-memory computing. Because the data is held in RAM rather than read from disk, it delivers both performance and speed, letting companies analyze data quickly and dynamically. For many companies, however, it is not a feasible solution across the board, because keeping and processing huge data sets in memory is expensive.
Hardware makers keep building machines with more and more memory. Yet even when the memory and storage problems are solved, those using more traditional database approaches to work with historical data will often struggle with speed at scale.
A hybrid solution
For years, the financial sector has successfully combined in-memory processing with on-disk storage to manage trade data, which adds up to billions of records per day, quickly and efficiently. Other industries facing similar data volumes can learn from its example.
With a hybrid approach, companies get the high performance and speed of in-memory processing while solving the storage problem by putting vast historical data sets on disk. By bridging the two technologies, companies can deliver on all counts, including cost.
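To make the hybrid layout concrete, here is a minimal Python sketch under stated assumptions; it is a toy illustration, not how Kx or any particular product implements it. The current day's records sit in an in-memory list, each completed day is written to its own JSON-lines file on disk, and a query reads from whichever tier holds each requested day. The class `HybridStore`, its methods, and the one-file-per-day layout are all hypothetical.

```python
import json
import os
import tempfile


class HybridStore:
    """Hypothetical sketch: current day in memory, past days on disk."""

    def __init__(self, root):
        self.root = root      # directory holding one file per historical day
        self.today = None     # the day currently held in memory
        self.memory = []      # that day's records, kept in RAM for fast access

    def _path(self, day):
        return os.path.join(self.root, f"{day}.jsonl")

    def insert(self, day, record):
        # A record for a new day first rolls the in-memory partition to disk.
        if self.today is not None and day != self.today:
            self.end_of_day()
        self.today = day
        self.memory.append(record)

    def end_of_day(self):
        # Append the in-memory partition to that day's file, then free RAM.
        if self.today is None:
            return
        with open(self._path(self.today), "a") as f:
            for rec in self.memory:
                f.write(json.dumps(rec) + "\n")
        self.memory = []
        self.today = None

    def query(self, days, predicate):
        # Serve the current day from RAM and historical days from disk.
        out = []
        for day in days:
            if day == self.today:
                out.extend(r for r in self.memory if predicate(r))
            elif os.path.exists(self._path(day)):
                with open(self._path(day)) as f:
                    out.extend(r for r in map(json.loads, f) if predicate(r))
        return out


# Usage with made-up toy records:
root = tempfile.mkdtemp()
store = HybridStore(root)
store.insert("2024-01-01", {"sym": "ABC", "price": 10.0})
store.insert("2024-01-01", {"sym": "XYZ", "price": 20.0})
store.insert("2024-01-02", {"sym": "ABC", "price": 11.0})  # rolls 2024-01-01 to disk
hits = store.query(["2024-01-01", "2024-01-02"], lambda r: r["sym"] == "ABC")
print(hits)  # [{'sym': 'ABC', 'price': 10.0}, {'sym': 'ABC', 'price': 11.0}]
```

Partitioning history by day means a query over a date range only opens the files for those days, while the newest data never leaves memory. A real system would likely use a compact, columnar on-disk format rather than JSON lines.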
Crucially, by building a high-performance programming language right into the database, users can interact directly with their data in one place. This puts a super-charged in-memory and on-disk historical database at their fingertips, with the ability to deliver results at speeds, and for queries of a complexity, previously out of reach. In a world where the most interesting information is also the trickiest to manage, this is without doubt the Holy Grail of data management.