VIDEO TRANSCRIPTION
Hi, and welcome back. This is the third video I have uploaded to my blog, techbizdesign.com, to explain something about Data Analytics and Big Data. I like to think of these videos as explanations that are deep enough for non-technical roles. Since this is the third one, there are two earlier ones. They cover the basic notions of Data Analytics and Big Data and how they integrate into organizations. If you have not seen them, I think they are worth watching.
The complete presentation, which you can view and download below, aims to go deeper into a topic that is difficult to explain to non-technical people: Big Data. For that reason, I wanted to title it "Big Data does not hurt, or how to understand something about a complex geek concept without feeling they are speaking gibberish."
To get you started in this world, the presentation covers different aspects of Big Data, from basic concepts and its great importance for society to the definitions and myths that surround it. Special mention goes to the paradigm shift that Big Data has brought to data management and Data Analytics. At the same time, there are abundant resources focused on the tools, processes and components that make up Big Data architectures. All of that information takes into account the state of the art, trends and proposals from key tech vendors.
The presentation begins with a simple question: what is Big Data? To answer it in a simple way, several slides delve into technical concepts such as distributed systems, MapReduce and the CAP theorem. But perhaps the easiest way to know what Big Data is, is to review the different Vs that express its main characteristics. People talk about the three Vs of Big Data: velocity, volume and variety. Some expand them to include seven other Vs that are equally important, such as veracity or value. I know. It’s a cliché. But that’s what works best to understand Big Data.
Knowing what this new concept is based on helps us ask the next big question. Why did this new way of extracting value from data appear? The answer lies in how the way of accessing data evolved. At first, data was scattered across multiple locations, which increased the time and money needed to analyze it and draw reasonably good conclusions. Over time, the Data Warehousing concept appeared. It aimed to concentrate data in a single point, a one-stop shop for our data. Combining these new warehouses with easy-to-query tools such as Business Intelligence dramatically improved data querying, making it much faster for us to answer business questions. But not enough. We’ve fallen short. The presentation explains several shortcomings that mean data warehouses alone cannot give us the right response to an increasingly fast, varied and data-filled world.
The tech response to these challenges came from Big Data. In a few slides I explain the substantial differences between the two disciplines. With highly representative diagrams, I try to explain the paradigm shift that Big Data represents for data acquisition and processing, and for modern end-to-end data analytics architectures. In the same way, I put special emphasis on the concept of the Data Lake, so fashionable in the world of Data Analytics today. Also a classic.
The climax of the presentation comes when I get into the most technical part. Trying to be as didactic as possible, I explain in enough detail (I guess) the history and guts of a game-changing Big Data platform: Hadoop. My aim is to help you understand how this ecosystem works by describing its parts, from the NoSQL databases to the data ingestion components, by way of all the valuable components that allow data to be queried using SQL. This module ends by showing the most widely used commercial and Open Source distributions worldwide these days.
To stun you a little bit more, I present a typical Big Data architecture. I show you one that could easily be implemented on different existing Cloud infrastructures. So that you can compare it, I provide some architectural diagrams proposed by top tech giants: Microsoft Azure, Amazon Web Services, Google Cloud Platform, Alibaba and Oracle. These diagrams give us a very faithful picture of the state of the art of Big Data architectures.
Finally, it’s time to warn you about some unfounded myths around Big Data. It’s nothing more than a list of the most important ones, those you should keep in mind if you decide to push ahead with the design and construction of such a challenging architecture for your organization.
Complete, don't you think? That was my intention: being deep without being boring or presumptuous. I believe the greatest value of these presentations lies in the fact that they are a well-balanced collection of the most relevant aspects you need to master to really squeeze the power out of Data Analytics and Big Data. Where do you apply this tremendous power? How? Answering these questions means you should analyze other external factors and apply a Computational Design. It’s the easiest path to success in Data Analytics and Big Data, among other disciplines.
I’m currently working on an upcoming presentation. It will talk about how Data Science fits into this complex equation of analytics, data and Big Data. I hope you enjoy it too.
See you soon.