How to make data science experiments agile

The tension between long-term planning and short-term flexibility is everywhere, including data science methodology. Is it possible for product development teams to reconcile rapid iteration with the slow-moving behemoth of the deep research process, or must they pick one?

Agile iteration or rigorously planned deep research: can data science have both, or must it choose? For our case study, we'll take BrainQ.

Spoiler alert: not only is it possible to reconcile fast and slow approaches to data science experiments, but the lessons BrainQ has learned along the way offer a roadmap your team can follow.

BrainQ's mission is to treat neuro-disorders with AI-powered technologies. If you were about to humor our discussion of agility, that probably stopped you in your tracks. We're dealing with a behemoth's behemoth: medical research data science.

Traditional data science is bad enough; most practitioners agree that if you have your heart set on trustworthy statistical inference, there's a lot of thinking and planning in your future. Every statistics 101 class teaches you to have your hypotheses, methodologies, and assumptions hammered out before your gruelling data collection process even begins. Measure twice, cut once, and don't even think about iterating!

Medical research is an especially difficult place to inject an agile approach into data science experiments. BrainQ's case study with EEG applications shows it can be done, which is good news for other startups whose industries come with less red tape.

Whenever you add even a whiff of medical research, it all gets slower. Now the process starts out coated in the superglue of regulatory approval and clinical trials coordination. Agility, with your process broken into small, predictable, iteration-friendly components, is the dream, but how do you inject an agile approach into deep medical research?
The trick is a two-punch approach. Data science lends itself well to both exploration and rigor, though not always at the same time. It turns out that data science best practices for exploring and triaging what's worth doing with depth and rigor are all about nimbleness. Not everything has to be done slowly and carefully...

Fixing a broken mindset

The first thing to fix is the mindset that can come with classical training in statistics and data science. A typical university exam in statistics presents a series of hypotheses for budding students to test, along with mathematically phrased assumptions. Most of the finesse is in carefully (properly! rigorously!) testing them. From my first STAT101 midterm to my statistics PhD qualifying exams, the format I experienced was pretty much the same. This kind of thing makes up the bulk of our training, so it's often the part a newly minted statistician treasures.

Have you ever noticed that the hypotheses are there all along, nicely thought out by the professor, and students rarely have to question their genesis? Once the sacred question is in place, of course we have to pursue answering it with utmost seriousness. Now turn the whole thing on its head: you have to come up with the hypothesis and assumptions. How do you do that?

It's time to think about where the rigid mindset commonly encountered with classically trained statisticians and data scientists comes from. Could it have something to do with traditional statistical education?

One option is to mimic what you'd be used to from class. Meditate in a closet and come up with the hypothesis and assumptions in advance. Design the data collection strategy and statistical testing in advance of any data. Get everything ready to go and then get it right in one shot. Sounds good? We forgot humility.
Chances are that we made a mistake in the setup. As someone with over a decade of experience at this, one of the best lessons I've learned is: it's too hard to think of everything up front. Locking in an approach up front and following it rigidly means we'll end up with a perfect solution to the wrong question. (Lovingly called making a Type III error in statistics.) What you never see in class is how everything can crash and burn if you messed up on figuring out how to ask your question. Those life lessons are hard to simulate and your head might pop as you imagine not imagining everything you forgot to imagine.

Permission to get agile

So if the inflexible approach that feels comfy from data science camp doesn't work, what to do? Blend in some agile thinking, of course. Here's the mind hack: allow your approach to be sloppy at first and burn some of your initial time, energy, and data on informing a good direction later. How do we go about doing that? This means you're encouraged to start with:

Allow yourself phases where the only result you're after is an idea of how to design your ultimate approach better.

Low-quality data: use small sample sizes, synthetic data, and non-randomly sampled data to gain insights about the data collection process itself.

Rough-and-dirty models: seek an understanding of what the payoff from minimum effort looks like. Start with bad algorithms which you know are only going to give you a benchmark, not your best solution.

Multiple comparisons: instead of picking a single hypothesis test, feel free to throw the kitchen sink at your data for inspiration. You're doing this to discover signals worth basing your final approach on.

Add deadlines and MVP milestones to avoid the trap of infinite polishing, poking, and prodding.
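To make the kitchen-sink idea concrete, here is a minimal sketch of a pilot pass, not anyone's production workflow. Everything in it is an assumption made for illustration: the data is synthetic, only feature 0 is wired to drive the outcome, and the helper names (`make_data`, `screen`, `confirm`) are hypothetical. It screens every feature on a tiny sample, shortlists the strongest-looking ones, and then re-checks only that shortlist on data the pilot never touched.

```python
import math
import random


def make_data(n_rows, n_features, rng):
    """Synthetic data; by assumption only feature 0 drives the outcome."""
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
    y = [2.0 * row[0] + rng.gauss(0, 1) for row in X]
    return X, y


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def screen(X, y):
    """Kitchen-sink pass: correlate every feature with y, strongest first."""
    n_features = len(X[0])
    scores = [(j, pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)


def confirm(X, y, candidates):
    """Re-check only the shortlisted features on data the pilot never saw."""
    return {j: pearson([row[j] for row in X], y) for j in candidates}


if __name__ == "__main__":
    rng = random.Random(0)
    pilot_X, pilot_y = make_data(30, 20, rng)    # small, cheap pilot sample
    shortlist = [j for j, _ in screen(pilot_X, pilot_y)[:3]]
    fresh_X, fresh_y = make_data(2000, 20, rng)  # separate, larger dataset
    for j, r in confirm(fresh_X, fresh_y, shortlist).items():
        print(f"feature {j}: r on fresh data = {r:+.2f}")
```

With only 30 rows and 20 features, some pure-noise features will usually look interesting in the pilot ranking. That is expected: the shortlist is a source of leads, and only the re-check on separate data says anything about which ones hold up.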
If the statistician in you isn't screaming yet, I admire your sangfroid. This advice breaks pretty much every rule you learned in class. So why am I endorsing these "bad behaviors"? Because it matters what project phase you're in. I'm all about following the standard advice later, but the early pilot phase has different rules. The important thing is to avoid rookie mistakes by remembering these two principles:

Don't take any findings from the early phase too seriously.

Do collect a clean new dataset when you're ready for the final version.

Pilot studies in data science

You're using your initial iterative exploratory efforts to inform your eventual approach (which you'll take just as seriously as the most studious statistician would). The trick is to use the best of exploratory nimbleness to inform what's worth considering along the way.

If you're used to the rigidity of traditional statistical inference, it's time to rediscover the benefits of pilot studies in science and find ways to embed the equivalent into your data science. The best source of inspiration for a bulletproof final version is the collection of lessons learned along the way to an MVP.

This is the approach that BrainQ has embraced and it has worked wonders for them. If you'd like to learn more about the nitty-gritty of BrainQ's process, check out the full case study on The Lever, a technically oriented source of applied AI advice for startups, operated by Google Developers Launchpad and co-edited by Peter Norvig and myself.

Looking for a detailed guide to help you start an applied ML/AI project? I've got your back. Enjoy!

Rethinking Fast and Slow in Data Science
