If you have worked on data science projects with companies or startups, you have probably seen projects that were later scrapped. Perhaps upper management changed the problem statement, or the client intervened because the desired results never materialized or the data was incomplete. In any case, the failure rate of data science initiatives is high, often estimated at around 70–80%.
In my experience, data science projects fail for several reasons:
Not involving the right stakeholders, people who speak the language of both data and business, in defining the problem
Too little research and brainstorming, even though defining a problem is hard work that takes multiple iterations to get right
No proper analysis of the data or the problem, and a lack of available resources, yet the project goes ahead anyway
The team starts analysing data before agreeing on the problem to be solved
Confusing the problem with its proposed solution
Defining the project's initial milestones is also necessary to keep stakeholders on the same page.
A better problem definition keeps stakeholders' expectations in check, saves a lot of time by reducing unnecessary iterations, and gives developers, analysts, data scientists, and product managers a better understanding of the product. Involving someone who speaks the language of both data and business is extremely useful in this process. Such a person acts as an organizational bridge between data science teams and business units, which makes them the ideal candidate to take overall responsibility for enforcing the principles applied during the problem definition process.
Some of these principles are described below:
Get the right stakeholders involved. This ensures that your problem definition has the correct inputs, achievable expectations, and initial milestones that keep everyone involved on the same page.
Leaders should allow plenty of time to rigorously define the problem. You have probably seen in your own team that problem statements often change as people work to get them right. Leaders of a data science project should therefore allow plenty of time and encourage brainstorming, debate, and detailed documentation of the problem statement as it evolves, so that all stakeholders stay on the same page.
Do a root cause analysis (RCA) to understand the problem better. Frame the problem in terms of data complexity, data availability, and data liability. A well-crafted problem definition is valuable, but it must be supported by the data and infrastructure available in the organization. Do not confuse the problem with its proposed solution. For example, suppose a social media product is getting less engagement than a similar competing service, and management believes the competitor is using an advanced recommendation engine. It is tempting to jump to a problem statement such as "build a better recommendation engine to increase product engagement." But that presupposes that a more sophisticated model is the solution, without considering other options such as improving the push notification algorithm or building a better UI. Confusing the problem with a proposed solution all but ensures that the problem is not well understood, limits creativity, and leaves potential problem-solvers confused.
Do not move past the problem definition until it meets the following objectives:
The problem definition is clear, and solving it should lead to a good business result
It considers all constraints, including time, initial milestones, budget, technology, data complexity, data availability, data liability, and the relevant people and stakeholders. These constraints should be clearly articulated to avoid misalignment between the problem statement and the business objectives
It must receive approval from all the involved stakeholders
Organizational alignment is key to success, so ensure that everyone responsible for solving the problem understands their role and responsibilities.
In conclusion, taking time to define the problem is a painful exercise, and it will sometimes feel uncomfortable, but there is no substitute for getting the right people involved, probing the problem more deeply, and taking time to understand the business objective the organization is trying to achieve. Every data science team needs to get better at defining problems the right way.
To close, I will leave you with a good quote often attributed to Albert Einstein:
“If I were given one hour to save the planet, I would spend 59 minutes defining the problem and one minute solving it.”