Next generation realtime data warehouse: Bringing analytics to the data
In the blog "BI software vendors choose their strategies: on-premise, cloud or hybrid approach?" I promised to elaborate further on the future of data warehousing and how this might affect the way we work with analytics. If we require agile, fast-performing analytics that is available anywhere, we also require platforms and sources that allow our analytical software to deliver it.
Imagine the following situation: you are analysing the performance of product sales and wonder why a certain area in your country is doing better than others. You deep dive, slice and dice and use different perspectives to analyse, but can't find the answer to WHY sales are better for that region. You conclude you need data that is not available in your corporate systems. Some geographical data available through Hadoop might answer your question. How do you get it available for analysis quickly?
Bring the Analytics to the data
If we don't want to go the traditional way of specifying, remodeling the data warehouse, uploading and testing data, we need a whole new way of modern data warehousing. What we ultimately need is a kind of semantics that allows us to remodel our data warehouse in real time and on the fly. Semantics that allow us to leave the data where it is stored without loading it into the data warehouse. We need a way to bring our analytics to the data, instead of the other way around. So our wishlist would be:
access to data sources on the fly
ability to remodel the data warehouse on the fly
no replication of data - the data stays where it is
no time lost on data load jobs
analytical processing done on the fly with push back to an in-memory computing platform
drastic reduction of data objects to be stored and maintained
elimination of aggregates
Traditional data warehouses are probably the biggest hurdle when it comes to agile business analytics. Modern analytical tools make it easy to add data sources on the fly and even blend different data sources, but they remain analytical tools. When additional data is required that either needs to be available permanently for multiple users, or that is huge in scale and complexity, analytical tools lack the computing power and scalability needed. When multiple users require the same complex additional data, it simply doesn't make sense to have each of them blend it individually. A data warehouse provides an answer in this case. However, here the hurdle starts: traditional data warehouses require substantial effort to be adjusted to the new data needs:
adjust and adapt the modeling
develop load and transformation scripts
set up scheduling and lineage
test and maintain
2016 was the year in which the future of data warehousing started. In-memory technology now offers smart, native, real-time access both from analytics to the data warehouse and from the data warehouse to its core in-memory systems. Combined with push-back technology, where any required analytical calculation is pushed back onto the in-memory core data platform, we can state that analytics is brought back to the data. End-to-end in-memory processing has become reality and allows for true agility. End-to-end processing that is ready for the Internet of Things at petabyte scale. Are we happy with this? Sure, we are! Does it come as a surprise? Hell no, Digital Transformation simply demanded it!
Native, real-time access for Analytics
Now what do the next generation data warehouses bring to analytics? Well, they allow for native access from the top-end analytics components, through the data warehouse, all the way to the core in-memory platform with our operational data. Even more, this native access is real-time, ladies and gentlemen. And this is exactly the reason why we speak of bringing the analytics to the data: every interaction by an end user generates calculations to be done. With the described architecture, these calculations are pushed back en masse to the core platform where our data resides.
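To make the push-back idea concrete, here is a minimal sketch in Python. It is not SAP code: an in-memory SQLite database stands in for the core data platform, and the table `sales` and both function names are made up for illustration. The first function drags every row to the analytics side and aggregates there; the second sends the calculation to where the data resides and only gets the result back.

```python
import sqlite3


def build_store():
    """Create a tiny in-memory database that plays the role of the data platform."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("North", 120.0), ("North", 80.0), ("South", 50.0)],
    )
    conn.commit()
    return conn


def totals_client_side(conn):
    """Pull all rows to the client and aggregate in the analytics tool."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_down(conn):
    """Push the aggregation to the database; only one row per region travels back."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return dict(conn.execute(query))


if __name__ == "__main__":
    store = build_store()
    print(totals_client_side(store))
    print(totals_pushed_down(store))
```

Both functions return the same totals. The difference is where the work happens and how much data has to move, which is what matters once the table holds billions of rows instead of three.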
The same integrated architecture is also a game changer when it comes to agility and data optimization: when new, complex data is required, it can be added without data replication. Since there is no data replication, the data warehouse modeling can be done on the fly leveraging the semantics. In "normal-people-language" it means that we do not have to model, create and populate new tables and aggregates when additional data is required in the data warehouse, because there are no new tables needed! We only create additional semantics and this can be done on the fly.
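As a rough illustration of "only create additional semantics", the sketch below again uses SQLite as a stand-in, so none of this is BW/4HANA syntax. A second, attached database plays the role of the external source, and all names (`geo`, `region_facts`, `sales_per_capita`) are hypothetical. The new data is made available to the model through a view only: no new table is created in the warehouse and no rows are copied.

```python
import sqlite3


def build_federated_model():
    """Join warehouse data with an external source through a view, without copying data."""
    conn = sqlite3.connect(":memory:")
    # The attached database stands in for the external source (assumption for this sketch).
    conn.execute("ATTACH DATABASE ':memory:' AS geo")

    # Data that already lives in the "warehouse".
    conn.execute("CREATE TABLE main.sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO main.sales VALUES (?, ?)",
        [("North", 120.0), ("North", 80.0), ("South", 50.0)],
    )

    # Data that lives in the external source and stays there.
    conn.execute("CREATE TABLE geo.region_facts (region TEXT, population INTEGER)")
    conn.executemany(
        "INSERT INTO geo.region_facts VALUES (?, ?)",
        [("North", 100), ("South", 200)],
    )
    conn.commit()

    # The only thing added to the model is a definition (the "semantics").
    # SQLite requires a TEMP view for queries that span attached databases.
    conn.execute(
        """
        CREATE TEMP VIEW sales_per_capita AS
        SELECT s.region,
               SUM(s.amount) AS total,
               SUM(s.amount) / g.population AS per_capita
        FROM main.sales AS s
        JOIN geo.region_facts AS g ON g.region = s.region
        GROUP BY s.region, g.population
        """
    )
    return conn


if __name__ == "__main__":
    model = build_federated_model()
    for row in model.execute("SELECT * FROM sales_per_capita ORDER BY region"):
        print(row)
```

The warehouse side still contains just the one `sales` table. The view is evaluated at query time, so changes in the external source show up immediately, with no load job in between.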
Below is an example overview for SAP BusinessObjects Analytics (Cloud or on-premise) running on the next generation SAP BW/4HANA data warehouse, all driven by the in-memory SAP HANA Platform. End-to-end native and real-time access for all components of the architecture.
SAP BW/4HANA Next Generation real-time data warehouse
In the video example below I "sampled" the use case described at the beginning of this article: when analyzing certain data, the user notices some data is missing. The required data is available in Hadoop, and using BW/4HANA the analytics is brought to that data in real time. No replication, no users waiting, no new load or transformation scripts; just using ODP semantics, the user has real-time access to the new data. Stunning, yet reality! The video demonstrates the full scenario. Enjoy!
PART II: let's discuss Analytics on the next generation data warehouses
This evolution is huge and there is a lot to tell. Stay tuned: in the next article we will look at how analytics will really benefit from the next generation data warehouses. It might even permanently change the way we work with analytics in the near future!