Jay Zalowitz

@jayzalowitz

Engineer to Engineer: How Mabl Builds Cutting Edge Machine Learning

Disclosure: mabl, the ML-driven test automation service, has previously sponsored Hacker Noon.

Today, we’re going to catch up with mabl engineer John Kinnebrew about how to build what’s next in machine learning. As an engineer myself, I find it very exciting to dig into how mabl is building what they’re building.

Jay: What do you think mabl is getting right ML-wise that others aren’t?

John: I think one of the biggest things we’ve gotten right in applying ML in this space is how we frame the interactions between mabl and the tester, which informs the rest of our ML and system design. In particular, our approach of having testers “train” mabl, rather than focusing on making it easier to record a test script that can be directly executed, has had major implications. Testers train mabl on a user’s journey through their app by showing mabl what a user would do and describing their expectations about important aspects of that experience. To test that journey, mabl attempts to simulate a user completing it based not only on her initial training but also on what she’s learned about the app through past attempts and even from other journeys and apps. Thinking about what mabl should do in these terms means that we were never going to be able to describe a journey with something as static as a Selenium script. Instead, we designed our own domain-specific language that is intended to capture the user’s actions and intents, but is only executable in concert with an evolving knowledge base of the context for those actions in the app. This has allowed us to incrementally incorporate different ML and AI techniques to seamlessly improve mabl’s ability to simulate users and learn from each attempt.

What made you join mabl?

There were a lot of factors, but I think the people and culture that I was introduced to while interviewing with mabl had the biggest impact. I immediately got a feel for the small, very experienced team at mabl. These were definitely people that I could learn a lot from, and they were brought together in an informal and cooperative, yet clearly excited and driven, atmosphere. When you combine this culture with the intriguing and ambitious vision that the founders, Dan and Izzy, laid out, the opportunity to help achieve it with AI and ML was very appealing. In fact, it was so appealing that I started later the same week that I interviewed, and it’s been a great experience ever since.

What type of ML do you guys use in your system?

We’re using a variety of approaches to make mabl a smarter and more effective tester. Some ML and AI techniques are enabling mabl to more effectively simulate users by identifying the current state of an app or the most appropriate action to take. For example, machine vision and image processing techniques are important for assessing whether mabl reached the expected state of the app after taking an action. We’re also evaluating decision-theoretic and cognitive approaches to improve mabl’s ability to simulate users attempting to complete a journey or achieve a specific goal in an app. Other ML techniques are helping us provide deeper insights beyond the robust testing functionality. For example, Bayesian modeling of load and execution time distributions helps us detect regressions and other aberrations in app behavior even when the functionality isn’t directly broken.
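To make the last point concrete, here is a minimal sketch of Bayesian modeling of execution times. It is not mabl's implementation; it assumes log execution times are roughly normal, uses a conjugate Normal prior on their mean, and plugs in the historical variance as if it were known. The function names, prior values, and sample timings are all hypothetical.

```python
import math
from statistics import NormalDist, pvariance


def posterior_predictive(log_times, prior_mean=0.0, prior_var=100.0):
    """Normal-Normal conjugate update on the mean of log execution times.

    Simplifying assumption: the observation variance is estimated from
    the history and then treated as known.
    """
    n = len(log_times)
    obs_var = max(pvariance(log_times), 1e-6)  # guard against zero variance
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    post_mean = post_var * (prior_mean / prior_var + sum(log_times) / obs_var)
    # Predictive variance = remaining uncertainty in the mean + run-to-run noise.
    return NormalDist(post_mean, math.sqrt(post_var + obs_var))


def tail_probability(history_seconds, new_seconds):
    """Predictive probability of a run at least as slow as the new one."""
    predictive = posterior_predictive([math.log(t) for t in history_seconds])
    return 1.0 - predictive.cdf(math.log(new_seconds))


# Hypothetical page-load times (seconds) from earlier runs of one journey step.
history = [1.20, 1.31, 1.25, 1.18, 1.27, 1.22, 1.29, 1.24]
typical = tail_probability(history, 1.26)   # an ordinary run
suspicious = tail_probability(history, 2.40)  # a much slower run
```

A very small tail probability suggests the new run is unlikely under the behavior seen so far, which is one way a slowdown could be flagged even though nothing is functionally broken. A fuller model would also place a prior on the variance.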

Do you have something your models have taught you that was surprising? What was it?

Not having done much front-end work in my career, trying to model whether an action was something as conceptually simple as “click a button” was surprisingly difficult. The sheer variety of approaches in web apps to something as simple as a clickable button really drives home the importance of two related aspects of mabl’s design: 1) both uncertainty and random variation are prevalent enough that probabilistic models/techniques are usually a strong contender, and 2) experimentation, aka “trying some options and seeing what happens”, is a vital part of how a machine can effectively adapt testing to evolving apps, especially in a CI/CD world (although that probably goes for humans too).

What opinion do you have (as a computer scientist) that is unusual? Why do you think that?

I usually don’t care about my algorithm or system getting exactly the right/best answer. If it gives a decent answer quickly most of the time and rarely gives a really bad answer, I’m probably quite happy. Of course, this could probably be a good heuristic for determining whether you’re working on an AI algorithm (and hopefully not just being a sloppy programmer), but it tends to go against the grain for a lot of developers.

What’s your favorite Python package?

I’d have to say that Toolz is my favorite Python package. At this point, I automatically include it in the dependencies for a new project because the chances are negligible that I won’t end up using it. If you want to write clear, concise, functional code in Python, I’d recommend getting to know it. Pandas is probably a close second for many analytics and data science projects and prototypes, although I have a little more of a love-hate relationship there. If the dataset’s not too big, and you can visualize how you want to manipulate it, you can probably do it in just a few lines with Pandas. Unfortunately, there are probably 10 different variations to choose from and another 10 that look nearly indistinguishable but actually do something else. Sometimes its flexibility can be a bit of a curse.

What do people get wrong when testing?

I can’t claim any great expertise in testing, but I can tell you some of the things that present challenges for mabl. One thing that’s always hard to capture is intent. This can be doubly hard in learning from training sessions when dealing with both dynamic data and an evolving UI. When a tester trains mabl to click on the “Add new workspace” button, the intent is pretty clear. When that button later moves 20 pixels, changes color, and has the text “Create a workspace”, mabl can be pretty confident in her identification of it as the most likely way to complete that action. Similarly, when a tester trains mabl to click on the first item in a list that meets specific criteria like starting with “Add” and ending with “to shopping cart”, that’s pretty straightforward. However, connecting that to a later choice that might or might not depend on which item was added, especially when there’s no text or image that exactly matches, becomes a lot harder. Did the tester intend to always choose the second item in a list on this page, or possibly the item with the word “baseball” in it, or… It’s those connections over multiple steps in testing a journey that can pose a particularly interesting challenge, especially when training can involve a mixture of specific criteria in one step and not in a later, related step.
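The single-step case (a button that moved and was renamed) can be illustrated with a small scoring sketch. This is a hypothetical illustration, not how mabl identifies elements: the attribute names, the weights, and the candidate elements are all assumptions, and text similarity comes from the standard library's `difflib.SequenceMatcher`.

```python
import math
from difflib import SequenceMatcher


def match_score(trained, candidate):
    """Combine text, tag, and position similarity into one score in [0, 1]."""
    text_sim = SequenceMatcher(
        None, trained["text"].lower(), candidate["text"].lower()
    ).ratio()
    tag_sim = 1.0 if trained["tag"] == candidate["tag"] else 0.0
    distance = math.hypot(
        trained["x"] - candidate["x"], trained["y"] - candidate["y"]
    )
    position_sim = 1.0 / (1.0 + distance / 100.0)  # decays with pixel distance
    # Hypothetical weights; a real system would learn or tune these.
    return 0.5 * text_sim + 0.3 * tag_sim + 0.2 * position_sim


def best_match(trained, candidates):
    """Return the candidate element most similar to the trained one."""
    return max(candidates, key=lambda c: match_score(trained, c))


# What the tester clicked during training (hypothetical attributes).
trained = {"tag": "button", "text": "Add new workspace", "x": 100, "y": 40}

# Elements found on the page in a later release.
candidates = [
    {"tag": "button", "text": "Create a workspace", "x": 120, "y": 40},
    {"tag": "a", "text": "Workspace settings", "x": 400, "y": 300},
    {"tag": "button", "text": "Log out", "x": 900, "y": 10},
]

best = best_match(trained, candidates)
```

A per-element score like this says nothing about the harder multi-step case described above, where the right choice in one step depends on what was selected in an earlier one.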

I noticed you guys are running periodic tests. What queuing system did you use? Is this on lambdas? (I’m assuming RabbitMQ and Celery from your past work experience.)

Our entire system is running on Google Cloud Platform, which immediately gives us a lot of scalability and enables us to focus on the unique features of mabl rather than managing the infrastructure. Each time mabl tests a user journey, a Cloud Function is actually picking up a request off a PubSub topic and submitting a Kubernetes job to Google Kubernetes Engine. So Kubernetes manages the actual container creation and execution (and retries, if needed) for tests, and we can keep different parts of our system loosely coupled through PubSub.
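The flow described here can be sketched roughly as follows. This is only an illustration under stated assumptions: the message fields (`journey_id`), the container image, and the function names are hypothetical, and the actual submission to Kubernetes is left out. The base64-encoded `data` field matches how Pub/Sub-triggered background functions receive messages, and `backoffLimit` is the Job field that controls retries.

```python
import base64
import json


def build_job_manifest(journey_id, image="example.invalid/test-runner:latest",
                       retries=2):
    """Build a Kubernetes batch/v1 Job manifest for one journey run.

    The image name and container arguments are placeholders.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"generateName": f"journey-{journey_id}-"},
        "spec": {
            "backoffLimit": retries,  # Kubernetes retries failed pods up to this limit
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "runner",
                            "image": image,
                            "args": ["--journey", str(journey_id)],
                        }
                    ],
                }
            },
        },
    }


def handle_pubsub(event, context=None):
    """Entry point in the style of a Pub/Sub-triggered Cloud Function."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    manifest = build_job_manifest(payload["journey_id"])
    # In a deployed function, the manifest would be submitted to the
    # cluster's API here (for example with a Kubernetes client library).
    return manifest


# Simulate a Pub/Sub message locally.
message = {"journey_id": "abc123"}
event = {"data": base64.b64encode(json.dumps(message).encode("utf-8")).decode("ascii")}
manifest = handle_pubsub(event)
```

Because the function only translates a message into a Job, the publisher never needs to know how or where the test container runs, which is the loose coupling the answer refers to.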

What do you think of AWS Athena?

I’ve found AWS Athena to be really convenient for moderately large quantities of data that need to be preserved primarily in case of future need or that were provided as an evaluation sample from various data providers. For those cases, where a one-off or exploratory analysis is needed, but there aren’t any strong time or availability constraints, it can be an excellent time saver to just shove it into S3 and be able to query it. It’s especially nice when the data gets provided to you in a bunch of cryptically named CSV or TXT files, which has happened to me more than once.

What’s your favorite recent ML advancement?

I find deep reinforcement learning to be incredibly compelling. The pairing of deep learning for abstracting a visual space into a more meaningful and, importantly, much smaller state space with reinforcement learning techniques was a fantastic inspiration. However, I’m not certain it will be as widely applicable as some hope. Personally, I’m looking forward to similar advancements with evolutionary techniques. In particular, work with only approximate evaluation criteria in evolutionary algorithms that can incorporate sparse, human judgments seems quite promising.

What do you think of Boston as a tech hub?

Boston has continued to invest in its position as a tech hub, and I think that support, together with the top-tier universities and students here, is a powerful combination. It creates a different sort of tech scene than someplace like Silicon Valley, but the variety of academic ties and New England culture make for a strong tech community that is committed to staying on the cutting edge.

I noticed you went to Vandy. As a Tennessean, I’d love to know what you think of the Volunteer State.

I had a fantastic time living in Nashville, and enjoyed many trips to either end of the state. I may be a little biased because my wife and her family are from the area, but I think it’s pretty hard not to love the people (and the food!) in Tennessee.
