Here’s a reality check(list) to help you avoid the pain of learning the hard way

If you’re about to dive into a machine learning or AI project, here’s a checklist for you to cover before you dive into algorithms, data, and engineering. Think of it as your friendly consultant-in-a-box.

Don’t waste your time on AI for AI’s sake. Be motivated by what it will do for you, not by how sci-fi it sounds.

This is a super-short version of my 18-minute monster, the Ultimate Guide to Starting AI. If you’re about to embark on ML/AI, here’s hoping you can answer “yes” to all of these questions.

[Image: If you answer “no” to any of the checklist questions, this might be a portrait of your project.]

Step 1 of ML/AI in 22 parts: Outputs, objectives, and feasibility

1. Correct delegation: Does the person running your project and completing this checklist really understand your business? Delegate decision-making to the business-savvy person, not the garden-variety algorithms nerd.
2. Output-focused ideation: Can you explain what your system’s outputs will be and why they’re worth having? Focus first on what you’re making, not how you’re making it; don’t confuse the end with the means.
3. Source of inspiration: Have you at least considered data-mining as an approach for getting inspired about potential use cases? Though not mandatory, it can help you find a good direction.
4. Appropriate task for ML/AI: Are you automating many decisions/labels, where you can’t just look the answer up perfectly each time? Answering “no” is a fairly loud sign that ML/AI is not for you.
5. UX perspective: Can you articulate who your intended users are? How will they use your outputs? You’ll suffer from shoddy design if you’re not thinking about your users early.
6. Ethical development: Have you thought about all the humans your creation might impact? This is especially important for all technologies with the potential to scale rapidly.
7. Reasonable expectations: Do you understand that your system might be excellent, but it will not be flawless? Can you live with the occasional mistake? Have you thought about what this means from an ethics standpoint?
8. Possible in production: Regardless of where those decisions/labels come from, will you be able to serve them in production? Can you muster the engineering resources to do it at the scale you’re anticipating?
9. Data to learn from: Do potentially useful inputs exist? Can you gain access to them? (It’s okay if the data don’t exist yet as long as you have a plan to get them soon.)
10. Enough examples: Have you asked a statistician or machine learning engineer whether the amount of data you have is enough to learn from? Enough isn’t measured in bytes, so grab a coffee with someone whose intuition is well-trained and run it by them.
11. Computers: Do you have access to enough processing power to handle your dataset size? (Cloud technologies make this an automatic yes for anyone who’s open to considering using them.)
12. Team: Are you confident you can assemble a team with the necessary skills?
13. Ground truth: Unless you’re after unsupervised learning, do you have access to outputs? If not, can you pay humans to make them for you by performing the task over and over?
14. Logging sanity: It’s possible to tell which input goes with which output, right?
15. Logging quality: Do you trust that the dataset actually is what its purveyors claim it is? (To learn from examples, you need good examples to learn from.)
16. Indifference curves: Since your system will make mistakes, have you considered how much worse one type of mistake is relative to another?
17. Simulation: Have you considered working with an expert in simulation to help you visualize what you’re asking for? Not mandatory, but useful.
18. Metric creation: Have you stitched the scoring of individual outputs into a metric for the business performance of your system over many instances?
19. Metric review: Has your business performance metric been reviewed to ensure that it’s not possible to get a good score on it in some perverse and harmful way?
20. Metric-loss comparison: (Optional.) Does your business performance metric correlate well with a standard loss function? If not, what you’re asking for might be very difficult.
21. Population: Have you thought carefully about which instances you need your system to work for? The statistical population of interest defines which broad collection of instances your system’s performance tests will cover.
22. Minimum performance: Have you defined a strict minimum performance criterion for testing and committed to crushing your system if it doesn’t make this bar?

Once you’ve answered “yes” to all that, you’re ready to move to the next step of ML/AI, which involves data and hardware (and engineers, yay!). I’ll be putting out a guide on that soon.

If that summary was too short, the full guide to starting an AI project is here. Enjoy!
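For readers who think better in code, here is a minimal sketch of how the "Indifference curves", "Metric creation", and "Minimum performance" questions might fit together. Everything in it is an assumption made for illustration: the cost numbers, the `yes`/`no` labels, the function names, and the threshold are all hypothetical, and a real project would get them from the business decision-maker rather than from a code sample.

```python
# Hypothetical relative costs: how much worse is one mistake than another?
# These numbers are placeholders for illustration, not recommendations.
COST = {
    ("yes", "no"): 5.0,  # truth was "yes", system said "no" (a miss)
    ("no", "yes"): 1.0,  # truth was "no", system said "yes" (a false alarm)
}


def score_one(truth: str, prediction: str) -> float:
    """Score a single output: correct answers cost nothing."""
    if truth == prediction:
        return 0.0
    return COST[(truth, prediction)]


def business_metric(truths, predictions) -> float:
    """Stitch individual scores into one number: average cost per instance."""
    if len(truths) != len(predictions) or not truths:
        raise ValueError("need equal-length, non-empty sequences")
    total = sum(score_one(t, p) for t, p in zip(truths, predictions))
    return total / len(truths)


def passes_minimum_bar(metric_value: float, max_average_cost: float) -> bool:
    """Strict criterion, fixed before testing: lower cost is better."""
    return metric_value <= max_average_cost


# Toy data, invented for this example.
truths = ["yes", "no", "no", "yes", "no"]
predictions = ["yes", "yes", "no", "no", "no"]

metric = business_metric(truths, predictions)
print(metric, passes_minimum_bar(metric, max_average_cost=0.5))
# prints: 1.2 False
```

With these made-up costs, one miss and one false alarm in five instances give an average cost of 1.2, which fails the (equally made-up) bar of 0.5. The point of setting the bar before testing is that the system gets crushed when it misses it, instead of the bar being quietly moved.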