Principal at CloudData LLC (www.cloud-data.biz)
Why is that so? It is a case of putting the cart before the horse. It is time for us to change that by taking a more methodical approach. We need to work on AI’s prerequisites first, so that we can derive the true value of AI in the future.
Yes, I am building a case for delayed gratification with the necessary justification. When we do this right, there will be no debate about AI’s value or its ability to succeed.
Purely from a business standpoint, any investment of resources needs to be done in a meaningful manner to create value. AI is important for our future and we need to get it right. Logic mandates that we take this time to focus on the foundational requirements of AI.
The need of the hour is to solve the rampant data problem. We first need to address the need to deliver high-quality, integrated and standardized data to the enterprise. When that’s done, each enterprise’s AI will have all the relevant data, exactly the way it needs it.
AI & Machine Learning (ML) failures don’t occur due to some perceived lack in data science. The single biggest contributor is poor-quality training data. Here are some telling examples that speak to that.
Build a world-class model, develop a fabulous algorithm, feed it bad training data (simulated, synthetic, real or otherwise) and you will be staring down the barrel of AI failure. Incorrect predictions, high error rates, biases, skews and a slew of other issues come to the fore. AI’s data problem usually boils down to:
1) Bad quality of data
2) Lack of the right data
3) Combination of #1 & #2
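As a rough illustration of the three cases above, here is a minimal sketch of a data profile that counts symptoms of each. The field names, the rules and the sample records are hypothetical, chosen only to show the idea; a real enterprise would define its own.

```python
from collections import Counter

# Hypothetical schema: the fields a model is assumed to need.
REQUIRED_FIELDS = ("customer_id", "region", "order_total")


def profile_training_data(records):
    """Count simple symptoms of problem #1 (bad quality) and #2 (missing data)."""
    issues = Counter()
    seen_ids = set()
    for row in records:
        for field in REQUIRED_FIELDS:
            if field not in row:
                # Problem #2: the required field is absent altogether.
                issues["missing_field"] += 1
            elif row[field] in (None, ""):
                # Problem #1: the field exists but holds no usable value.
                issues["empty_value"] += 1
        total = row.get("order_total")
        if isinstance(total, (int, float)) and total < 0:
            # Problem #1: a value that breaks an assumed business rule.
            issues["negative_total"] += 1
        key = row.get("customer_id")
        if key is not None and key in seen_ids:
            # Problem #1: the same customer_id appears more than once.
            issues["duplicate_id"] += 1
        seen_ids.add(key)
    return dict(issues)


# Illustrative records only, not real data.
sample = [
    {"customer_id": 1, "region": "EMEA", "order_total": 120.0},
    {"customer_id": 1, "region": "", "order_total": -5.0},
    {"customer_id": 2, "order_total": 40.0},
]
print(profile_training_data(sample))
```

A dataset that reports both missing fields and bad values falls under case #3.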
Consider this recent Forbes article regarding AI and COVID-19 (coronavirus). It clearly brings the issue at hand into focus. The AI model and algorithm in question were incorrect in forecasting the spread of the virus. A relevant question comes to mind: how could the predictions for Day #45 possibly be off by more than 5 orders of magnitude? On examining the details, it is easy to conclude that the predictions were made with inadequate data (#2), and that the model failed to account for a set of continuously changing environmental factors and conditions.
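To put "orders of magnitude" in perspective, each order is a factor of ten, so 5 orders means the prediction and the actual value differ by a factor of 100,000. The short sketch below computes that gap; the figures in it are hypothetical and are not taken from the article.

```python
import math


def orders_of_magnitude_off(predicted, actual):
    """Return how many powers of ten separate a prediction from the actual value."""
    if predicted <= 0 or actual <= 0:
        raise ValueError("both values must be positive")
    return abs(math.log10(predicted / actual))


# Hypothetical figures, chosen only to illustrate the scale.
gap = orders_of_magnitude_off(predicted=1_000, actual=100_000_000)
print(f"Prediction is off by about {gap:.1f} orders of magnitude")
```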
Now imagine a similar scenario back at your workplace, where business insights generated by AI are off by more than 5 orders of magnitude. What impact will that have on your organization’s AI projects, the data science team and the business?
Acting upon erroneous insights could trigger irrecoverable long-term consequences for the business. We need to take responsibility for the relevance and quality of the data we feed AI.
Bottom line – Every enterprise needs to feed AI the required data, with high levels of quality and consistent standards. Without high-quality, integrated and standardized data, we will be unable to practice ‘Responsible AI’. We all have a moral responsibility to create good behaviors and great outcomes from AI.
Using a running analogy, AI is comparable to running an ultra-marathon (any distance exceeding the marathon’s 42 km). It is unwise to start an ultra-marathon without any prior running experience. It is equally irresponsible to start an AI initiative without data management maturity.
A novice runner first needs to attain an optimum level of fitness and meticulously focus on diet, training regimen, rest, healthy lifestyle habits and more.
That significantly increases the probability of a successful 5K run. Once that is achieved, s/he works towards repeating that success for a 10K run, and then the half-marathon and so on.
As the runner gradually (methodically) progresses towards the ultra-marathon, s/he gets fitter, stronger and gains the required experience and maturity to run the longest race.
We need to adhere to a similar approach in data management to get an enterprise ‘AI Ready’, which is its ultra-marathon. We need to take a phased approach and progressively work on gaining data management maturity (data quality, business rules, compliance requirements and more).
Here is a simplified progression to data management maturity:
1) Cleanse, Comply & Integrate business data from all relevant relational ‘system of record’ data sources to deliver high-quality Key Performance Indicators (KPIs) – 5K
2) Incorporate Customer Relationship Management (CRM) data for Analytics-I, where meaningful data journeys generate customer, product and other relevant insights – 10K
3) Integrate first set of non-relational data sources (documents, images) for Analytics-II – Half-Marathon
4) Integrate second set of non-relational data sources (web logs, social media, chatbot conversations, telemetry) for ML-I – Full-Marathon (Phase I)
5) Integrate third set of non-relational data sources (sound clips, videos) for ML-II – Full-Marathon (Phase II)
6) Your data is now ready for AI – Ultra-Marathon
Most data scientists will tell you that at least 50% of their time is spent cleansing and wrangling data. This puts a huge burden on the data science team, which should primarily focus on models and algorithms.
Without a centralized high-quality data source that feeds AI, each data science project engages in one-off data cleansing. And that is not good data management, any way you look at it.
Data integration is a critical Information Technology (IT) function that needs to be centrally addressed by every organization. A Data Integration Hub delivers value to the business by enabling Analytics, ML and AI initiatives with high-quality, integrated and standardized data.
The elusive and much-coveted 360-degree view of the customer can be achieved only when data from all customer touchpoints is brought together.
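As a minimal sketch of what bringing touchpoint data together can look like, the example below merges records from three sources into one profile per customer. The source names, the fields and the choice of email as the matching key are all hypothetical; real integration hubs typically rely on far more robust matching.

```python
from collections import defaultdict


def standardize_email(value):
    """Normalize the join key so the same customer matches across systems."""
    return value.strip().lower()


def build_customer_view(*sources):
    """Merge records from several touchpoints into one profile per customer."""
    profiles = defaultdict(dict)
    for source_name, records in sources:
        for record in records:
            key = standardize_email(record["email"])
            # Prefix each attribute with its source so values do not overwrite each other.
            for field, value in record.items():
                if field != "email":
                    profiles[key][f"{source_name}.{field}"] = value
    return dict(profiles)


# Hypothetical touchpoints and fields, for illustration only.
crm = ("crm", [{"email": "Ana@Example.com ", "segment": "enterprise"}])
support = ("support", [{"email": "ana@example.com", "open_tickets": 2}])
web = ("web", [{"email": "ANA@example.com", "last_visit": "2020-03-01"}])

view = build_customer_view(crm, support, web)
print(view["ana@example.com"])
```

Without the standardization step, the three spellings of the same address would produce three separate customers, which is exactly the kind of one-off cleansing a central hub is meant to absorb.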
Engaging in AI without high-quality standardized data brings to mind Forrest Gump’s famous quote: "Mama always said life was like a box of chocolates. You never know what you're gonna get." All the very best in achieving enterprise data management maturity. Enjoy your Ultra-Marathon!