Kubeflow.

If you read on, you might just fall in love with Kubeflow. It’s not just a cuddly way to get going with machine learning; it stands for something huge for the data science community. You’ve been warned.

The Hard Way

Let’s talk about an ancient phenomenon: worshipping The Hard Way. Many of us have been guilty of it, but it’s especially common among STEM folk who’ve undergone decades of hazing. The kind of hazing that measures your worth in how well you can reinvent a wheel, how well you can do every bit yourself, and how little help you accept. When you’re in the thick of this mentality, you think the difficulty or complexity of a task matters more than its value. (And perhaps one day you’ll use your vacation to build a computer and its operating system from scratch just to make sure you understand the everything of everything.)

The Hard Way really means swimming through a sticky syrup of drudgery, taking forever to get things done.

I had this attitude in my wild youth, but today I’m embarrassed on behalf of forces that make life difficult for no good reason. Progress tends to move in the direction of making things easier. I’d prefer to celebrate progress instead of being proud I can still do things the hard way. Sure, I can invert a matrix with pen and paper… but why would I ever do it? Because all the world’s computers might be raptured?

Still waiting for this tech to make a comeback. Any day now. I’d better practice my backpropagation by hand.

Let’s welcome the reduction in toil that technology brings. Bad tools and processes that slow everyone down belong on the endangered species list. Let’s start being proud of doing what’s valuable instead of what’s difficult, and if you insist on tackling a real challenge, why not work to make difficult things easy for everyone else?

Kubeflow embodies taking a stand against doing things The Hard Way. It’s a ski lift for your mountain of chores.

Among the tools that embody a stand against The Hard Way, Kubeflow is a project that’s close to my heart: David Aronchick and I wrote some of the original blueprints in a manic fever right after our first cup of coffee together. I’m pretty sure I didn’t get up for sustenance that day, not while the first draft of the design doc was still on the inside of our skulls.

What is Kubeflow?

So, what is Kubeflow and why did it inspire all that intense passion? It’s a ski lift to help you with the mountain of machine learning setup chores. Let’s unpack that…

The document David and I wrote started under a different name. Eventually it was codenamed Grace, but we called the first version Beautiful Machine Learning. Okay, maybe not my most original naming moment, but cut me some slack. We statistician folk like a thing to do exactly what it says on the tin. (Thank goodness someone with some creativity fixed it later, right?)

That’s the point of Kubeflow: beautiful machine learning. Specifically, the beauty of the data scientist’s experience while wrestling into submission the beast that is machine learning in multicloud hybrid environments as an entire stack. Which, if you’ve tried to DIY it in the pre-Kubeflow days, was anything but beautiful.

In case you prefer videos to reading stuff, here’s me summarizing David’s talk on Kubeflow from Next 2018.

If you’re like me, you barely tolerate the boring parts of the data science process and it’s very hard to work with a song in your heart while you do the bits involving minimal cunning and maximal drudgery. (A personal exception is my perverse love of data cleaning, which I enjoy as a meditative activity in the same category as playing 2048. Mmmm… injecting order into chaos. Delicious.)

What data scientists want

You want to work on those interesting models, you want to focus on testing your hypotheses, you want to make gorgeous plots (are you the kind whose heart beats a little faster when I say interactive, shiny, animated?), and you want to get to the actionable insights. Yeah, me too. But first we have to spend an eternity on setup and operating systems and scaling and all kinds of painfully boring stuff.

I mean, come on, does anyone actually enjoy giving Big Data special treatment relative to small data? As far as most data scientists are concerned, it would be great if all the code we wrote were the same for big or small, laptop or cloud, prototype or production… Deep down inside we know that the only reason that this isn’t the case is that we live in the dark ages where our tools suck. So how about if we make them unsuck? And, while we’re at it, how about if we could have the ideal data science workflow with every dream bell and whistle all laid out so that using them was effortless and we could just get on with the fun parts of our work?

Kubeflow is about giving data scientists the experience they’d have if they got rid of all the fiddly bits they don’t like.

That is what beautiful machine learning, and ultimately Kubeflow, is all about. It’s a blueprint for living the dream, for giving data scientists the experience they’d have if they got rid of all the fiddly bits they don’t like. Kubeflow is not perfect yet (it first greeted the world at the end of last year), but it’s moving rapidly in the right direction.

Making our tools unsuck

If we want to go about fixing things, where should we start? The Kubeflow team picked composability, portability, and scalability.

Composable ML accommodates all the various data science tools out there.

Portable ML means it’s easy to go from prototyping models on your laptop to running them unchanged in production.

Scalable ML means it’s effortless to go from your small data prototype to huge data pipelines.

You know what’s really good at composability, portability, and scalability? Containers and Kubernetes!

Ugh, the gotcha. For most data scientists, even talking about these tech buzzwords is an exercise in improvisation. If you’re not already an expert, using machine learning on Kubernetes presents a world of pain, not only because there are so many things you have to become an expert in that are unrelated to your comfort zone, but also because all the primers are written for a different kind of engineer. You’ll need to earn a second black belt in addition to your machine learning one. Aaaand we’re face-to-face with the problem again!

A second black belt? Most data scientists consider learning all that stuff a special kind of torture, and those who are excited by it might simply not have the spare time. Can’t someone else go learn Kubernetes for us so we can get on with our actual jobs?

Kubeflow is a ski lift to help you with the mountain of machine learning setup chores.

Consider Kubeflow your volunteer buddy here. The whole point of it is to make it easy for everyone to develop, deploy, and manage portable, distributed machine learning on Kubernetes. No extra black belts needed. The goal is to have your entire stack made unannoying and complete with every beautiful toy you want. We’re not there yet, but we’re accelerating fast. For example, Kubeflow v0.2 gets you fully set up with one (!) single line of code.

You can deploy Kubeflow with the one-step script included in the download. For more info, see David’s talk.

It takes only one line of effort to get Jupyter notebooks, distributed training, and model serving fit for the hybrid cloud environment. Oh, and there’s customizable ksonnet packaging too, H2O.ai tooling, and powerful hyperparameter tuning to boot.
Data scientists, think on these additional things, look me in the eye, and tell me you want to figure out autoscaling based on job submission, cloud-specific VMs, and data exfiltration prevention. No? Well, luckily you don’t have to. Congratulations on waiting it out long enough to have it taken care of for you, kind of like you don’t need to build your own computer anymore.

A glimpse of what hyperparameter tuning with Katib looks like. What’s a hyperparameter? It’s that dial on your toaster. When your machine learning algorithm has a lot of knobs and dials, figuring out what to set them to can be annoying. Luckily, tools like Katib are here to help.

An attitude shift towards inclusion

Data science is the discipline of making data useful, and since the world is generating data like never before, the work data scientists do is becoming more necessary than ever. We all want a world where information is used to make life better instead of collecting cobwebs. There’s so much data and so much work to be done that we need the barriers to entry into analytics lowered as quickly as possible. I’m proud of the role Kubeflow is playing in that.

We’re entering an era that is about empowering everyone to stand on the shoulders of giants.

Actually, by talking about how lovably shiny Kubeflow is, I might have distracted you from the bigger point: what it represents. It’s an attitude shift towards inclusion and it’s one of the early steps our community is taking in an era that accelerates everyone’s ability to stand on the shoulders of giants. It points to a bright and exciting future!

Imagine a world where the tools are so easy absolutely everyone can use them. That’s where we’re headed!

I’m proud that as a community of data professionals we’re starting to shun The Hard Way and stepping up to make fiddly things easy for everyone else. We’re beginning to say, “Don’t know the gory behind-the-scenes details? That’s okay! We refuse to leave you behind.
Come have a seat at our table and join us in making incredible things.” That smells like progress to me. Let’s have more of it!

If you’re interested in trying out Kubeflow, get started here. Or learn more in David’s fabulous talk.