5 Most Important Tips Every Data Analyst Should Know

Written by xavierdeboisredon | Published 2021/11/11
Tech Story Tags: data-science | data-analysis | data-analytics | big-data-analytics | data-engineering | founders | data | data-bias

TL;DR: #1 If your analysis doesn’t have any bias, then look again. #2 Most first drafts can be done in Excel. #3 Get yourself a tool that keeps your query history. #4 Don’t fix the data, fix the process that creates it. #5 Read the article to find out.

#1 If your analysis doesn’t have any bias, then look again

Problem Definition
A bias is an inclination for or against an idea. Most of the time it is totally unconscious, and it creeps in mainly when our results are exactly what we expect them to be. We are all human: if we have expectations about something and, after digging into the data a bit, our first results match those expectations, we tend to stop right there. When the results aren’t what we expect, we keep digging until they are.

How to avoid that?

Think about what could make your analysis results wrong. I see two main drivers of such bias.
The scope of your analysis
Changing the date range, or even the data used, may get you different results. The classic challenges deal with seasonality and mix effects. Be mindful of cohort effects.
The methodology of your analysis
This one flirts with Statistics 101. Now that you’ve got the right scope of time and data points, think carefully about how you aggregate them to get results. Consider outliers, and the aggregation metric too. Always check the mean versus the median.
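To see why the aggregation metric matters, consider how a single outlier can drag the mean far away from what a typical data point looks like, while the median barely moves. A minimal sketch in Python, with made-up order values:

```python
import statistics

# Made-up order values: most customers spend around 100, one whale spends 10,000
order_values = [90, 95, 100, 105, 110, 10_000]

mean = statistics.mean(order_values)      # pulled up by the single outlier
median = statistics.median(order_values)  # barely affected by it

print(f"mean:   {mean:.2f}")    # 1750.00
print(f"median: {median:.2f}")  # 102.50
```

If you only reported the mean here, you would conclude the typical order is worth about 1,750 when it is really closer to 100.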

#2 Most first drafts can be done in Excel

That title is a bit provocative. Yes, Python is powerful and allows you to save and repeat your data processing. But there is a cost to that. First, it takes time, especially if you’re not a Python hotshot. Second, collaboration is tougher with non-tech users. If you need non-code-savvy people to work with you on your data app, then Python will slow them down.
As a data player, you’ll want to do projects in Python, simply to ramp up. But choose them carefully. If you have a super tight schedule and Excel does the job, then go for Excel. You can migrate to Python later, as it is always easier to learn one thing at a time. It’s hard to build a brand-new data app with a language you’re not comfortable with. First do the analysis with a tool you know well, then migrate it to the new language.
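When you do migrate, the translation is often direct: a pivot table in Excel maps almost one-to-one to a pandas groupby. A minimal sketch, assuming a hypothetical sales.csv export with country and revenue columns:

```python
import pandas as pd

# Hypothetical export of the Excel sheet: one row per order,
# with "country" and "revenue" columns.
orders = pd.read_csv("sales.csv")

# Equivalent of an Excel pivot table: revenue summed per country,
# sorted from largest to smallest.
revenue_by_country = (
    orders.groupby("country")["revenue"]
          .sum()
          .sort_values(ascending=False)
)

print(revenue_by_country)
```

The upside of doing it in code is that next month’s refresh is a re-run, not a rebuild.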

#3 Get yourself a tool that keeps your query history

Ever got a data request similar to one you had 3 months ago? It happens too many times a year, and each time you wish you had a nice history of all the queries you ran in the last 365 days…
Check out Castor, a tool built by me and my team, to do exactly that.
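If you want a stopgap in the meantime, even a thin wrapper that appends every query you run to a local log file goes a long way. A rough sketch in Python; the log path and the SQLite connection are placeholders for whatever warehouse client you actually use:

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

HISTORY_FILE = Path("query_history.log")  # hypothetical location for the log

def run_query(conn, sql):
    """Execute a query and append it, with a timestamp, to a local history file."""
    timestamp = datetime.now(timezone.utc).isoformat()
    with HISTORY_FILE.open("a") as log:
        log.write(f"{timestamp}\t{sql}\n")
    return conn.execute(sql).fetchall()

# Usage, with an in-memory SQLite database standing in for your warehouse
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (client_id TEXT, country TEXT)")
rows = run_query(conn, "SELECT client_id, country FROM clients")
```

Three months later, a grep through query_history.log is all it takes to find that query again.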

#4 Don’t fix the data, fix the process that creates it

Let’s start with a real-life example.
One of the data pipelines at one of my previous companies kept breaking because of a uniqueness issue: a table field was supposed to be a primary key, but there were duplicates. That field was client_id, and normally a client was supposed to be in one and only one country.
So whenever we had this issue, we had to find the client linked to several countries and fix it. We would also remind the sales team of the “one country rule”.
Should we build a dedicated alerting system for this specific matter? Should we add a transformation layer on top? Should we remove that “unique” check? None of these. We should (and haven’t yet) simply enforce that rule when the data is created at the source, i.e., in Salesforce by salespeople.
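For reference, the check that kept firing is roughly the following, sketched here with pandas on a hypothetical extract of the clients table:

```python
import pandas as pd

# Hypothetical extract of the clients table coming out of Salesforce
clients = pd.DataFrame({
    "client_id": ["A1", "A2", "A2", "A3"],
    "country":   ["FR", "US", "DE", "UK"],
})

# A client should map to one and only one country: any client_id appearing
# with more than one country breaks the primary-key assumption downstream.
countries_per_client = clients.groupby("client_id")["country"].nunique()
violations = countries_per_client[countries_per_client > 1]

print(violations)  # here: A2 is linked to 2 countries
```

Running a check like this tells you which client to fix, but it doesn’t stop the next salesperson from creating the same duplicate.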
As much as possible, get to the root cause of your data issues, and make people understand that good data requires processes optimized for it. Processes are indeed designed first to improve the business, but for the sake of having good data, they must also factor in the data that depends on them.

#5 Share your analysis as widely as possible

Too many data players wait for their data app to be perfect before sharing it. Share it now (with a “WIP” disclaimer at the beginning if you want). Do not spend more than a few days without having a peer review of your work. It will give you perspective.

Conclusion

Yes, hard skills (Python, SQL, R…) are key to getting started with your analysis, but personally, I pay more attention to soft skills (good communication, the ability to see the big picture, being straight to the point, being hacky).
Happy to have a constructive debate in the comments.
Also published on: https://www.castordoc.com/blog/the-5-things-every-data-analyst-should-know

Written by xavierdeboisredon | Castor (https://castordoc.com) - Bring trust and visibility to your data
Published by HackerNoon on 2021/11/11