Data Artist & Influencer, Human - Machine mediator, Commercial connect-the-dotter, ML Trainer
I recently attended a networking event where I spoke to a range of graduates who were looking at prospective careers in the data science and adjacent spaces.
While talking to many of them, I would ask:
What is the most underrated or under-valued skill or trait in data analytics?
It is a question that often gets you an impassioned response, as well as giving you an insight into what people consider to be their own differentiating traits.
As the conversation went along, many of the graduates would reciprocate and ask what I considered the most under-appreciated skill to be. I took a few of them aback by answering immediately and without hesitation, and surprised them further with my answer: creativity.
Often, when you read an article about creativity in data analytics, it will mostly be about the importance of visualisation. What good is all that great analytics processing power if you can’t make the message stick with a (non-technical) audience, right?
This conversation will take you into the world of drag-and-drop software like Tableau, the importance of selecting the right visual for the right data message, storytelling through ‘builds’ and other methods, and the general aesthetic veto against the ‘evil’ pie chart. (1)
However, this is only part of the story (and a part that I will get to later). There are other parts of the analytics process that either gain value when you apply creative thinking to them, or are in and of themselves symbolic of the creative process:
- Hypothesis generation
- Feature engineering
- Workflow management
- And finally, visualisation
Let’s look at each in turn, and at how data professionals would be wise to tap into it.
Start with hypothesis generation. The most valuable (albeit rare) insights are those that are:
1) Currently unknown to the business, or
2) Contrarian to a position commonly held by the business
So, although domain knowledge is very important, it will only get you so far. Once you understand the asset (data) at your disposal and what is known to date, you will need to ask yourself:
- What questions can I ask of the data to best unlock value for my organisation? and
- Which previously held assumptions are the ripest for stress testing?
The most valuable of those questions (as alluded to above) can include those that have no precedent within your organisation. Hence, you will be developing a hypothesis on something that may not have been tackled before. You aren’t only revisiting work previously done, or incrementally improving on something that already exists; in some instances you are looking to do 0 to 1 analysis (2). You generally need to be creative to identify these opportunities and craft the right hypothesis for testing.
Although some teams rely on ‘the business’ to guide data scientists and analysts by pointing them to the questions it would like answered, you will be much more valuable if you can produce these leads yourself. So learn how to create good hypotheses for testing: work backwards from what you know would be both valuable and implementable for the business, and add some creative flair to ensure that you examine the problem from every angle. Do that and you will enhance your value and impact.
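As a rough illustration of what stress testing a commonly held assumption can look like in practice, here is a minimal sketch of a permutation test using only the Python standard library. The scenario (a belief that loyalty-card holders spend more), the numbers, and the function name `permutation_p_value` are all made up for illustration; it is one simple option, not a recommended method for every question.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Share of random relabellings whose gap in means is at least the observed gap."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the group labels carry no information
        gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if gap >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible exact zero.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical weekly spend for customers with and without a loyalty card.
with_card = [52.0, 61.0, 58.0, 66.0, 70.0, 63.0]
without_card = [50.0, 59.0, 57.0, 64.0, 69.0, 60.0]
p_value = permutation_p_value(with_card, without_card)
print(f"Estimated p-value: {p_value:.3f}")
```

A large value suggests the gap the business takes for granted could easily arise by chance, which is exactly the kind of contrarian lead worth pursuing further.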
Feature engineering should be self-explanatory: when you curate your data source to come up with features that make the whole greater than the sum of its parts, you are engaged in an inherently creative process. Some of the most effective additions I have made to models I have built came not from iteratively improving those models, but from adding a feature that was previously unavailable, or simply not being used by the business. Learn to think about which data assets you have and how they can be combined when curating your final source, as well as which features can be created from scratch, to boost the quality of your insights or the real-life application of your models.
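To make the idea of combining data assets concrete, here is a small sketch in plain Python. The two sources (a customer table and an order log), the field names, and the derived features are hypothetical; the point is only that joining sources which are usually held apart can yield features neither one offers on its own.

```python
from datetime import date

# Hypothetical data: two sources a business might hold separately.
customers = {
    "c1": {"signup": date(2023, 1, 10)},
    "c2": {"signup": date(2023, 6, 1)},
}
orders = [
    {"customer": "c1", "amount": 40.0, "day": date(2023, 2, 1)},
    {"customer": "c1", "amount": 60.0, "day": date(2023, 3, 1)},
    {"customer": "c2", "amount": 25.0, "day": date(2023, 6, 15)},
]


def build_features(customers, orders, as_of):
    """Combine both sources into one set of derived features per customer."""
    features = {}
    for cid, info in customers.items():
        own = [o for o in orders if o["customer"] == cid]
        total = sum(o["amount"] for o in own)
        last = max((o["day"] for o in own), default=None)
        features[cid] = {
            # Needs only the customer table.
            "tenure_days": (as_of - info["signup"]).days,
            # Need only the order log.
            "order_count": len(own),
            "avg_order_value": total / len(own) if own else 0.0,
            "days_since_last_order": (as_of - last).days if last else None,
        }
    return features


features = build_features(customers, orders, as_of=date(2023, 7, 1))
for cid, row in features.items():
    print(cid, row)
```

From here you could go one step further and derive ratios across the two sources, such as orders per day of tenure, which is where the creative part really starts.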
One of the other most under-appreciated aspects of a data professional’s skill set is workflow management. Can you identify the tools and processes required to get you to your desired result (fastest)? Can you weigh the value of making incremental deviations from that approach against the return you can expect when proceeding down that path? Can you adjust on the fly and stack hypothesis on top of hypothesis as newfound insights reveal new information and present new leads to pursue? Can you quickly identify which hypotheses are proving futile and cut your losses?
Which tools should you leverage when moving through the various stages of data science and adjacent professions (data cleaning and wrangling, EDA, feature engineering, data mining, modelling, visualisation and then presentation)? Do you move away from RStudio and into Tableau when it comes time to produce a visualisation, or do you code up your graphs using ggplot? Hint: I think you are crazy if you write any lines of code (which in most instances means many lines of code) when doing visualisations. If you are a fan of ggplot, at least use a package like esquisse (3) to slash the time you spend on this exercise. Knowing what suits your personal coding style, and what maximises the impact of your work while minimising the time spent doing it, is a valuable skill.
So tap into your inner artist (or hacker) and creatively stitch together the processes that you feel are best suited to your personal (coding) style and get you closest to the best end result possible with minimum effort (time). Be brush agnostic as long as you are making beautiful art!
Simplicity is more of an ethos than a technique. There is art that is magnificently complex, but there is also value in art (and an aesthetic) that defaults to the simple. Err on the side of simplicity and embrace it in your workflow. Keep your visualisations simple but rich in information. Stress test whether something requires more rigour or whether enough has been done to establish an acceptable result, and only proceed with a more technical approach if you feel the returns will justify the time spent. Work on subsets of a problem (breaking it into small chunks) if the likelihood of success is uncertain. Default to simplicity rather than complexity, especially if you are working in a data-immature organisation (where in some instances you may find valuable insights simply through EDA).
Visualisation is the one that everybody talks about, and fair enough: it is the most visible and influential of the above. It is also an embodiment of the work that you have done to date, so it’s in your best interest to do your work justice and learn how to present it well.
Essentially, make sure you put the right amount of effort into effectively communicating the outcomes of your work to your stakeholders. Put some time into thinking about the way they digest information, and create a presentation or story that captures their imagination while getting your message across with the right amount of detail and flair. This is the obvious one, so you cannot neglect it.
So don’t under-appreciate the times that you will need to be creative, innovative or quick on your feet when working in data science and adjacent professions. Focus on where you can apply your unique view of the world and add some creativity to your analysis, and you will differentiate yourself from the competition.
This article originally appeared here.