Josep Curto is a renowned research analyst and professor who has helped many companies leverage data-driven decision-making. He is the CEO of Delfos Research and a Business Analytics (BA) professor at IE Business School (Madrid). AIN.UA has interviewed Josep about his vision of the future of Big Data, and here is our English coverage of the interview.
What is Big Data, and where does it start in terms of scale and scope? Let's say we have 10 million records - is that already Big Data or not yet? The term is used very often in different contexts, but we suspect many people still don't understand what it means.
It's a very good question. Many companies ask themselves: "When do we have a Big Data issue?" But sometimes Big Data is not the correct term to use. That's because of the complex nature of the data, which is usually described in terms of volume, velocity and variety. So, what does that mean? In some cases we deal with petabytes or even hundreds of petabytes of data that have to be processed and analyzed at very high speed (in fractions of a second). In that case we're working with time intervals from milliseconds to minutes, and we're analyzing data very quickly to make BI-driven decisions or benchmark trends.
And there are cases when our data set cannot be represented in the traditional way, as a spreadsheet with well-defined descriptive attributes. All of these scenarios demonstrate the complex nature of the information we have to deal with, and this is key to understanding Big Data. A company has a Big Data issue when it needs to extract specific business value from large and complex data sets and has to find a proper way to do so.
In my understanding, Big Data is a set of strategies and technologies that enable the capture, storage, processing, analysis and visualization of complex data sets.
So, a company has a Big Data issue if it's sitting on a huge pile of unstructured data and doesn't know what to do with it?
That's one way to explain it. But, as I've mentioned above, the Big Data issue has three aspects - volume, velocity and variety. Big Data may involve all three factors at once, or just one or two of them. And in the future, every company will have to deal with Big Data, because business as a whole is being digitalized. As a result, all transactions and interactions with stakeholders become data. And companies are asking themselves: "How can we better understand these transactions? Are there ways for us to better understand our clients?" From this perspective, all verticals, not only traditionally data-rich ones like banking or telecom, will be leveraging Big Data. Take a look at agriculture: a Big Data revolution is already happening there. Landowners are actively using sensors to track soil and plant conditions. Isn't it amazing?
Should businesses store any information, even insignificant data, in order to gain some value from it in the future? What are the limits of data storage?
That's an important question, too. Businesses should evaluate situations from the viewpoint of a current problem. It's definitely wrong to store huge amounts of data and simply hope to get some value from it in the future. We need to identify specific areas where Big Data can be applied, run pilot projects and assess them in terms of the value they bring. If the company finds these pilot projects truly valuable, it should put them into production and move on to the next problem. We need to set priorities properly and focus on problems that are really critical for the business.
For instance, if Big Data enables your company to move from customer segmentation to micro-segmentation, you'll be able to improve all aspects of doing business, including marketing, finance and customer service. In most cases, it's advisable to adopt Big Data technologies gradually and in a limited context.
Let's say your company uses a traditional data warehouse that fails to visualize data correctly and isn't scalable enough to cover your current volume of transactions. In this case, it makes sense to give up the traditional data warehouse and migrate to highly efficient, scalable transactional databases that are designed to process Big Data in real time. This will increase efficiency, allow a greater number of transactions to be processed and will likely help reduce expenses. And using this database as a foundation, the company will be able to execute many other Big Data projects the right way.
And how can Big Data benefit NGOs and citizens in general?
I believe Big Data is a perfect tool not only for businesses, but for the public sector, too. Let's take a city council as an example. We want our cities to become smart, i.e. more comfortable to live in, more efficient and economically successful. That means the city council should have a clear understanding of what's going on in the city: how much energy and water it consumes, how many people own a car, how many people live in different districts, and how they commute to work and move around the city. This knowledge will help organize public transportation better, and Big Data will become the platform that allows municipal authorities to answer all of those questions.
The same applies at a higher level. Governments can use Big Data to better understand what resources they have available, what their people need, and so on. Many countries have already realized that pooling data from different hospitals in a single hub allows for a better understanding of current healthcare issues and of the state of patient care in general. So, for governments Big Data is a platform that allows them to achieve greater transparency and make useful information publicly available.
Another trend I like a lot is open data, i.e. when the government understands that the data it holds can be useful for society and provides open access to it. In this case, it's not only the government that creates a value-added proposition, but also private companies and even individual citizens. For instance, if there is open data about current environmental issues, you, as a business, can build a mobile application that makes use of this data (e.g., a pollution map).
Should Big Data be deployed differently for different verticals? For instance, is there a difference in how we deploy Big Data in agriculture and telecom?
Each industry has its own path. Let's take agriculture - it's a great example because traditionally this industry has never relied on IT. Here everything happens one step at a time, and farmers get used to new technologies rather slowly. How do farmers typically buy in to technology? A farmer purchases a John Deere harvester and realizes the machine is equipped with GPS, i.e. he can track its location in the field at any point in time. The harvester is also fitted with a great variety of sensors, which make the farmer aware of the type of seeds planted in a certain patch of the field. Some time later, the farmer takes the next step and uses the data generated by those sensors to analyze and increase efficiency, i.e. to save water and energy, improve planting conditions, and so on.
Other industries may be moving to Big Data faster because they've historically used huge data pools and algorithms. A great example is insurance, which is impossible without algorithms used to assess risks for individual clients, identify fraud and make the best value proposition.
When it comes to telecom, the availability of infrastructure is key. These companies provide connectivity and telecommunication solutions and have to monitor their networks continuously to ensure the best quality.
While some companies are able to develop their own effective approaches to Big Data analytics, others may need external help with this!
Why is Big Data a promising niche? I've already said that in the future every company will be using Big Data and, as such, data specialists will be in great demand! A McKinsey study shows that future demand for Big Data specialists and managers will be tremendous; millions of such specialists will be sought after. So, if you want a good job in the IT of the future, you should consider Big Data.
What Big Data technologies will be used in the future? What about machine learning?
Although we think of Big Data as something new, the concept is not new at all. Some technologies we use in our work today emerged 10 years ago - Hadoop, for instance. Speaking about trends, a few can be distinguished.
First, there are enterprise Big Data platforms that satisfy large enterprises' demands for security and corporate governance. When Hadoop appeared 10 years ago, it mostly helped solve data volume issues. A few years later, Spark emerged, focused mainly on solving data velocity issues. Today, new platforms like Apache Flink are appearing that combine volume and velocity processing and thus eliminate the issues that arise when both kinds of platform are used at the same time. This is trend #2: the emergence of good, fast, integrated solutions for Big Data analysis.
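To make the volume/velocity distinction concrete, here is a minimal sketch in PySpark, which today covers both sides with one API. It assumes Spark is installed locally; the input file "events.txt" and the socket source are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("volume-and-velocity").getOrCreate()

# Batch side (volume): count words in a static text file.
batch_words = (
    spark.read.text("events.txt")                      # hypothetical input file
    .select(explode(split("value", " ")).alias("word"))
    .groupBy("word")
    .count()
)
batch_words.show()

# Streaming side (velocity): the same logic over a live socket source.
lines = (
    spark.readStream.format("socket")
    .option("host", "localhost")
    .option("port", 9999)                               # hypothetical source
    .load()
)
stream_words = (
    lines.select(explode(split("value", " ")).alias("word"))
    .groupBy("word")
    .count()
)
query = (
    stream_words.writeStream
    .outputMode("complete")
    .format("console")
    .start()
)
query.awaitTermination()
```

The same word-count logic runs unchanged over a static file and over a live stream, which is exactly the kind of integration the newer engines aim for.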
Trend #3 is machine learning. Several solutions are already available on the market: machine learning for Hadoop, Spark and Flink. Tech giants like Amazon, Google and Microsoft build their own Big Data solutions and make them openly available to developers. The result is highly competitive ecosystems fighting for software developers and users. That's not just a competition between Big Data platforms; rather, it's a competition of algorithms. And this means big new challenges for companies taking their first steps towards Big Data: how do you choose the right platform, and how do you understand which tools are better suited to solving a particular issue? Therefore, all companies will need data specialists and Big Data-skilled engineers to help them make the right choice!
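As an illustration of what machine learning on such a platform looks like, here is a minimal sketch using Spark's MLlib; the input file "transactions.csv", its column names and the choice of model are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Hypothetical data set: numeric feature columns plus a binary "label" column.
df = spark.read.csv("transactions.csv", header=True, inferSchema=True)

# MLlib expects the features packed into a single vector column.
assembler = VectorAssembler(inputCols=["amount", "frequency"], outputCol="features")
train = assembler.transform(df)

# Fit a simple classifier; the same code runs on a laptop or on a cluster.
model = LogisticRegression(labelCol="label", featuresCol="features").fit(train)
model.transform(train).select("label", "prediction").show(5)
```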
Interview and images source: ain.ua