We are studying the emerging discipline of Machine Learning Engineering by investigating best practices for developing software systems that include ML components. In this article, we share the research motivation and approach, some initial results, and an invitation to help us by taking our 7-minute online survey on ML Engineering best practices.

(Photo by Franck V. on Unsplash)

Engineering Machines that Learn

Machine Learning is key to the new wave of AI

Artificial Intelligence (AI) is undeniably experiencing a new wave of attention, energy, and sky-high expectations. This wave is driven by the abundance of data generated in our connected, digital society, and by the low-barrier availability of enormous computational resources. Among the various AI techniques, Machine Learning (ML) in particular has come to play a key role.

Figure: The current surge of Artificial Intelligence is driven by Machine Learning, as indicated by relative interest in search terms according to Google Trends.

Learning complex behavior from examples

Machine Learning allows us to solve complex problems, not by arduously writing new code, but by letting an existing algorithm learn new behavior from examples. We are now witnessing breakthrough results in image recognition, speech processing, medical diagnostics, securities trading, autonomous driving, product design and manufacturing, and much more.

Does Machine Learning replace programming?

Does the rapid ascent of Machine Learning mean that software systems will no longer need to be programmed? Will we need data scientists instead of software developers? To those who have experienced software-related project delays, system outages, and indefinitely incomplete feature sets, a world without programmers might seem attractive.

Does Machine Learning require programming?

But not so fast. There are several reasons why Machine Learning will not replace programming, but rather make the software engineering discipline even richer and more complex.

First, ML algorithms are themselves software that needs to be developed, tested, and maintained. Second, using an ML algorithm requires programming: for ingesting, cleaning, merging, and enhancing data; for feeding the data into the ML algorithm; for running repeated training experiments to generate, evaluate, and optimize an ML model; and for testing, integrating, deploying, and operating ML models in production systems. Third, trained ML models are just one building block in the construction of complex software systems.

So, what is different?

Still, there are specific characteristics of Machine Learning that challenge traditional software development practices. The amount of data to manage is typically much larger for applications that involve Machine Learning components. The development process tends to involve more rapid-cycle experimentation, where alternative solutions are routinely attempted, compared, and discarded. And the level of inherent uncertainty in the final product is higher.

Figure: Emergence of the Machine Learning Engineer. Relative interest in search terms according to Google Trends.

ML Engineering

Around the globe, numerous organizations are learning step by step how to develop software systems that include ML components. With an increasing number of people self-identifying as ML Engineers, the discipline of Machine Learning Engineering is emerging.

This raises interesting questions. Is ML Engineering distinct from Software Engineering, or is one a sub-discipline of the other? Do established Software Engineering best practices apply equally when building software systems with ML components, or do these best practices need to be modified or replaced? Can a canonical set of ML Engineering best practices be identified by which practitioners can be guided and newcomers can be educated?

Investigating ML engineering practices

To investigate these questions, researchers in the fields of Software Engineering and Machine Learning have teamed up.
We have started with an extensive review of both scientific and popular literature, to identify which practices are described and recommended by practitioners and researchers. These practices range from data management (e.g. how to deal with storage and versioning of large data sets), through model training (e.g. how to run and evaluate training experiments), to operations (e.g. how to deploy and monitor trained models).

Figure: Aspects of ML Engineering organized into groups of practices.

Surveying the adoption of ML Engineering practices

We then embedded the identified practices in a survey among representatives of teams that build software with ML components. This survey is currently in progress and open for new participants (see below). At the time of writing, about 200 teams have participated. Early results show that larger teams tend to adopt more engineering practices.

Figure: Early results of our global survey on the adoption of engineering practices by Machine Learning teams. Larger teams tend to adopt more practices.

Early results also tell us that some practices are widely adopted, and can be considered basic, while other practices are only applied by more experienced teams in larger organizations, and can be considered advanced.

An example of a more advanced practice is the use of so-called automated machine learning techniques, where teams are able to do model selection and hyper-parameter optimization in an automated way. Early survey results indicate that these techniques enjoy much stronger adoption in tech companies and (academic) research labs than in non-tech companies and government.

Figure: Early results of our global survey on the adoption of engineering practices by Machine Learning teams. Teams in tech companies, universities, and non-commercial research labs tend to make much more use of automated machine learning techniques than teams in non-tech companies and governmental organizations.
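
To make the idea of automated model selection and hyper-parameter optimization concrete, here is a minimal sketch in plain Python. It is not taken from the survey or from any particular AutoML tool. The synthetic data, the two candidate model families, the `alpha` grid, and all function names (`make_data`, `fit_constant`, `fit_ridge`, `auto_select`) are assumptions made purely for illustration. The script tries every combination of model family and hyper-parameter value and keeps the one with the lowest error on held-out validation data.

```python
import random
from statistics import mean

def make_data(n, seed):
    """Synthetic 1-D regression data: y = 3x + 1 plus small noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]

def fit_constant(train, _params):
    """Baseline model: always predict the mean of the training targets."""
    m = mean(y for _, y in train)
    return lambda x: m

def fit_ridge(train, params):
    """1-D linear regression with an L2 penalty `alpha` on the slope."""
    alpha = params["alpha"]
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sxy / (sxx + alpha)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def mse(model, data):
    """Mean squared error of a fitted model on a list of (x, y) pairs."""
    return mean((model(x) - y) ** 2 for x, y in data)

# Hypothetical search space: each entry is (name, fit function, hyper-parameter grid).
SEARCH_SPACE = [
    ("constant", fit_constant, [{}]),
    ("ridge", fit_ridge, [{"alpha": a} for a in (0.0, 0.1, 1.0, 10.0, 100.0)]),
]

def auto_select(train, valid):
    """Try every (model family, hyper-parameters) pair; keep the lowest validation error."""
    best = None
    for name, fit, grid in SEARCH_SPACE:
        for params in grid:
            score = mse(fit(train, params), valid)
            if best is None or score < best[0]:
                best = (score, name, params)
    return best

data = make_data(200, seed=0)
train, valid = data[:150], data[150:]
best_score, best_name, best_params = auto_select(train, valid)
print(best_name, best_params, round(best_score, 4))
```

Real automated machine learning tools typically search far larger spaces with smarter strategies than exhaustive enumeration, but the loop structure (propose a configuration, train, validate, keep the best) is the same basic idea.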

Towards an ML Engineering best-practice catalogue

We are using the results of our survey to organize the best practices into a comprehensive catalogue. In the catalogue, each ML engineering practice is recorded in a uniform structure, much like design patterns and refactorings have been catalogued in the past.

Elements of the structure include the intent and motivation of the practice, its applicability in various contexts, the interdependencies with other practices, and a short and actionable description of how to apply the practice. We also provide references to literature and supporting tools.

Using the survey results, we are also able to quantify the difficulty of each practice. This helps us to sort them into difficulty levels from basic to advanced, giving teams guidance in prioritizing their adoption.

Our ultimate objective is that the resulting catalogue will help the formation and effectiveness of ML Engineering teams, not only in the larger tech companies where ML Engineering already enjoys strong adoption, but also in smaller and non-tech organizations.

Take the survey!

If you are part of a team that builds software that includes Machine Learning components, please help us by taking our survey.

Take the survey: https://se-ml.github.io/survey/

Joost Visser is professor of Software and Data Science at Leiden University. Previously, Joost held various leadership positions at the Software Improvement Group. He is the author of numerous publications on software quality and related topics.

Joint work with Alex Serban, Holger Hoos, and Koen van der Blom. For more information, consult the SE4ML project website. An earlier version of this article was published in Bits & Chips.

On the Relevance of Software Engineering for the Development of ML based Software Systems
