The value of using data to make rational, accurate decisions is not new. From generals to great fictional detectives, people have relied on data to make the right decision and find the answer. Businesses have been doing the same for the past century.
Using data to make better choices and more rational decisions existed long before data science was a term or the phrase data-driven decisions flew through every boardroom. Why? Because it works. With increased computing power and easy-to-use analytical programming libraries, analyzing massive amounts of data has become that much easier.
So why is it that, with so many libraries to help analysts, more data sources, increased computing power and so on, it is still difficult for some teams to produce concrete ROI?
Even with all of the improved capabilities and tools, executing good data analysis and data science is still not easy. This happens for a couple of reasons. Some of these include lack of experience, unclear goals, poor understanding of the subject matter and miscommunication.
Because executing good data analysis is difficult, our team wanted to share some tips we follow to help both experienced and new team members better approach any type of data problem, whether it be data science or data engineering.
State A Clear Problem And How The End User Will Use The Answers
One of the issues we noticed, especially with new hires, is that they will often head off on a problem without really knowing what problem they are actually solving. They have great intentions, but they just don’t grasp the problem yet.
Think back to when your English teachers provided you a prompt. Perhaps you or a friend read the prompt once and got a preconceived idea of what the prompt was asking. You then wrote a beautiful essay on what you understood. The essay was perfect in every way. It would have been an A+…had you actually written for the prompt. However, you didn’t, and suddenly you got a C or lower because you never understood what was actually asked.
This is a similar issue with some data professionals. They will develop a final product that is beautiful in every way except it doesn’t actually answer the customer’s question. Typically this is caught pretty early in a meeting with the customer. It is frustrating for both parties. One party has put in a lot of work to start developing a product and the other feels unheard. In these cases, it is great to catch the issue early enough so that things can be fixed.
From the beginning, the goal is to make sure both the developer and the stakeholder are on the same page about the problem. One way to do this is to make sure they both work on the user stories. This can be done in one of two ways: the developer can spend time with the stakeholder to figure out how they will use the data system and then write the stories, or the end user can write up some user stories themselves. User stories are descriptions of how the end user views a feature, or in this case a metric or algorithm output. How will they use the data point to make a decision? Who will the decision impact?
For some engineers, this seems as pointless as writing an outline to an essay and double checking that it lines up with the prompt. In most cases, it helps to take this step anyway.
Creating a clear understanding of the question and how the answers will be used allows the data team to have something to check as they are developing the final product. That way, they can ensure the metrics and algorithms they develop match up with the steps the stakeholders want to take. If the metrics don’t answer the questions or provide the correct information for the stakeholders to take the next step, then it will take more time to rework and redevelop the final product to match what everyone is looking for. This is why it is important that the question and end-user stories are understood from the beginning.
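One way to make that check concrete is to write the user stories down as data and compare them against the metrics the team plans to build. The sketch below is only an illustration: the UserStory fields, the example stories and the metric names are all hypothetical and would need to be replaced with whatever your stakeholders actually described.

```python
from dataclasses import dataclass, field


@dataclass
class UserStory:
    """A hypothetical, minimal user story: who decides what, using which metrics."""
    who: str
    decision: str
    needs: set = field(default_factory=set)


# Example stories (made up for illustration).
stories = [
    UserStory(
        who="fraud analyst",
        decision="which flagged transactions to review first",
        needs={"fraud_score", "transaction_amount"},
    ),
    UserStory(
        who="operations manager",
        decision="how many reviewers to staff next week",
        needs={"weekly_flag_count"},
    ),
]

# Metrics the data team currently plans to deliver (also made up).
planned_metrics = {"fraud_score", "transaction_amount", "average_basket_size"}


def unmet_needs(stories, metrics):
    """Map each end user to the metrics they need that are not planned."""
    return {s.who: sorted(s.needs - metrics) for s in stories if s.needs - metrics}


def unused_metrics(stories, metrics):
    """List planned metrics that no user story asks for."""
    needed = set().union(*(s.needs for s in stories))
    return sorted(metrics - needed)


gaps = unmet_needs(stories, planned_metrics)
extras = unused_metrics(stories, planned_metrics)
print("Stories missing metrics:", gaps)
print("Metrics no story asks for:", extras)
```

A gap means a stakeholder cannot take their next step with what is being built, and an extra metric is a prompt to ask whether the work is answering a question nobody raised.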
Analyze The Daylights Out Of Your Data But Summarize Your Findings
We, as data professionals, have a bad habit of trying to walk through every graph, figure, chart and line item we analyzed when going through our work with stakeholders. This is not necessary. As data engineers and data scientists, we want to show our work to depict value and to show the credibility of our findings. Yet, by displaying lots of data and graphs we cause confusion rather than clarity.
To avoid confusing the stakeholders and to get them more on board with your conclusions, it helps to avoid burying the lede. Don’t get too caught up in the fine details. Have a clear conclusion that is easy to explain and three or four specific data points that support it. This is much more helpful for a stakeholder than ten pages of research.
This is not to say get rid of the ten pages of research. They are very helpful for future projects and in case the stakeholder does want a few more specifics. The point is to avoid overwhelming the stakeholder with too much information. In addition, the more numbers you present, the more likely it is that one of them has been miscalculated, which can lead to a lack of trust.
Act On The Metrics And Track The Outcome
Once the system is actually developed it is important to get the end-users using it to understand what is working and what isn’t. In order to figure out the ROI of the current version of the tool, the data team needs to work with the stakeholders to gauge their outcomes.
For the things that are working, the data team or the stakeholders can analyze further to see if they can be improved. If they are targeting fraudulent behavior, then perhaps the algorithm is currently 70% accurate. The data team may be able to increase the accuracy to 80% or 90% by looking for patterns in the 30% of cases the algorithm gets wrong, such as false positives, and then removing those clusters with either a more refined algorithm or another follow-up algorithm. This feedback loop allows for improved accuracy and improved user satisfaction.
For the metrics and algorithms that aren’t working well, the approach isn’t that different from figuring out how to improve an algorithm that is already doing well. Besides looking at what isn’t working and at the false positives, another good step is to reach out to subject matter experts (SMEs). Hopefully, this isn’t the first time the data team is reaching out to the SMEs. This time, the team actually has data to share, and that data could show patterns that the SMEs are accustomed to seeing as false positives. This could in turn help correct the issues with the algorithm faster than having the data team look for new correlations or possible solutions on its own.
Data engineers and data scientists are smart people. That does not mean they know everything, and it doesn’t mean they never get overexcited. We need guidance and tips too!
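As a rough sketch of that feedback loop, the snippet below measures overall accuracy and then counts false positives per segment to see where they cluster. Everything in it is assumed for illustration: the records are invented, and the segment labels stand in for whatever attribute your team suspects might explain the errors.

```python
from collections import Counter

# Hypothetical review records: (segment, predicted_fraud, actually_fraud).
records = [
    ("new_account", True, True),
    ("new_account", True, False),
    ("new_account", True, False),
    ("large_transfer", True, True),
    ("large_transfer", False, False),
    ("foreign_card", True, False),
    ("foreign_card", False, False),
    ("foreign_card", True, True),
    ("large_transfer", True, True),
    ("new_account", False, False),
]


def accuracy(rows):
    """Share of records where the prediction matched the actual outcome."""
    correct = sum(1 for _, predicted, actual in rows if predicted == actual)
    return correct / len(rows)


def false_positives_by_segment(rows):
    """Count false positives (flagged but not fraud) per segment, largest first."""
    counts = Counter(
        segment for segment, predicted, actual in rows if predicted and not actual
    )
    return counts.most_common()


overall_accuracy = accuracy(records)
fp_clusters = false_positives_by_segment(records)
print(f"Overall accuracy: {overall_accuracy:.0%}")
print("False positives by segment:", fp_clusters)
```

In this made-up data most of the false positives sit in one segment, which is the kind of cluster a refined rule or a follow-up algorithm could target first.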
These are just a few tips to help your data teams improve their execution. None of these tips are rocket science. They just involve taking a moment to think through the problem, create systems that are robust and easy to reuse, and gain further insight through communication between stakeholders and data teams. Many of your data teams and project management professionals already have processes in place. If your team needs help setting up processes, or perhaps you are looking for a team to take on the development of an entire project, then feel free to reach out and we would love to help!