This entry in our research page is more of a blog post than an actual research work; the goal is to share what we are working on and why we are doing it. TL;DR: bring me to the sneak peek.

1) Building apps and websites is insanely slow

Back when I was an undergraduate student, I used to work part-time as a front-end developer in a digital agency. I was fortunate enough to be part of a team of extremely talented people, with art directors and UI/UX designers crafting gorgeous interfaces and creative front-end engineers building cutting-edge applications using the latest technologies (remember when WebGL was getting cool and Adobe Flash was dying? That was that time). The team worked exclusively for high-profile clients and collected awards such as The FWA and Awwwards in recognition of the quality of their craftsmanship.

Working in the web industry was a lot of fun, but something struck me: the design workflow is completely broken.

The majority of designers I worked with preferred to sketch their creative ideas on the whiteboard or in their fancy notebook instead of using a wireframing tool like Balsamiq or Axure. They would argue that these tools constrain their thoughts and kill the creative flow, and quite frankly I agree with them. No surprise that graphic tablets are so popular among designers, as the device attempts to recreate the pen-and-paper feeling digitally. Some designers would actually draw their ideas directly in software using their graphic tablet in an attempt to save some time.

[Figure: The graphic tablet, designer's best friend. Source: Wikipedia]

Regardless of the method chosen to sketch ideas, designers would then have to recreate their drawings, either in a wireframing tool to get the layout validated by the customer or the project manager, or directly as a user interface in their favorite design tool such as Adobe Photoshop or Sketch. This essentially means doing the very same work twice by converting the work done in one format to another format. That's the first way in which the workflow is broken.

Once designers have finalized the look of a given user interface, they ship their work to a front-end developer to get it implemented in code. Implementing user interfaces basically consists of re-creating in code what the designers created graphically in software. Doesn't it sound like duplicated work once again? And here is the thing: as a developer, you want to focus on implementing the client-server logic and the core functionality, and on optimizing the interactive graphics, animations, and transitions, but you end up wasting the majority of your time coding user interfaces. Writing HTML/CSS is super boring, repetitive, frustrating, and so time-consuming that it prevents iteration cycles with designers. In some digital agencies, designers are in charge of implementing the user interfaces they sketched. But the problem remains the same: someone has to sit down and manually write cumbersome, boring, and repetitive UI code. That's the second way in which the workflow is broken.

[Figure: The classical workflow for building apps and websites.]

As shown in the figure above, these redundant steps bring zero value to the project since their one and only purpose is to convert a user interface encoded in one format to another in order to enable the next step in the workflow. Because these conversions are performed manually by people, they are expensive, time-consuming, and frustrating, and they prevent innovation because they consume precious time that should instead be spent on iteration cycles to improve the app being built.

2) Deep Learning at the core of a possible solution

As a graduate student focusing on Machine Learning, I was amazed by the breakthroughs made possible by Deep Learning. Computers were finally able to process images in a somewhat satisfying manner.
I remember being completely mind-blown reading the Show and Tell paper by Vinyals et al. at Google, where a deep neural network was trained to generate an English description given an input picture.

[Figure: Show and Tell: A Neural Image Caption Generator, Vinyals et al. 2015]

Inspired by this work and many others, I figured that generating computer code given a UI mockup should basically be the same as generating English descriptions given a photograph. In both cases, you want to produce a textual output given a visual input. After letting that idea gather dust in my notebook for a long time, I finally decided to write some code and see if my assumption was correct.

To my great surprise, it actually worked! Of course, it worked in a controlled environment, and a lot more work would be needed to improve the technology to meet real-world requirements. Nevertheless, this encouraging first step suggests that Deep Learning can indeed be leveraged for the automatic generation of code from user interface images (and Airbnb agrees with us). That was the moment I wrote the pix2code paper and decided to open-source a basic implementation and a toy dataset for educational purposes. Surprisingly, the project received quite a lot of media attention, was covered in a couple of ML-related podcasts, and was even the subject of a Two Minutes Papers episode.

3) Seeing the bigger picture

At Uizard, we are essentially teaching machines to understand graphical user interfaces the same way humans do in order to propose a more efficient workflow for building apps and websites. Our core technology has evolved quite a lot since the release of our pix2code paper, but the central idea remains the same: we are building a software pipeline made of neural network weights to convert pixel values (e.g. photograph, screenshot) to sequences of characters (e.g. iOS code, Sketch file). The workflow we are envisioning is pictured below.

[Figure: The modern AI-driven workflow we are envisioning for building apps and websites.]

For professional users such as designers and developers, such a technology would save critical time early in a project by enabling ideas to be tested quickly, speed up iteration cycles, and eventually enable the development of better apps. The goal is to save as much time as possible on trivial tasks; no one enjoys redundant work. Most importantly, this would allow designers and developers to focus on what matters: bringing value to end-users.

The barrier to entry for building simple apps would become really low. Learning to use a UI design tool takes time, and learning to code probably takes even more. However, most people are able to draw a user interface on a piece of paper, so this would allow your grandma to go from an idea to a working UI running on her phone in a matter of seconds. Our vision is to empower people with Artificial Intelligence because we believe in a future where machines assist humans, not replace them.

4) Sneak peek

We are working hard on making our vision a reality. In the meantime, the four of us are really excited to share some of our progress.
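As a rough illustration of the "pixel values to sequences of characters" idea from section 3, here is a minimal sketch of the encode-then-decode loop that image captioning systems rely on. It is not the pix2code or Uizard implementation: `encode`, `toy_next_token`, and the `button`/`label` tokens are hypothetical stand-ins invented for this example, and a hand-written rule takes the place of trained neural network weights.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale pixel values in [0, 1], one list per row

START, END = "<start>", "<end>"


def encode(image: Image) -> List[float]:
    """Hypothetical stand-in for a vision encoder: mean brightness per row."""
    return [sum(row) / len(row) for row in image]


def toy_next_token(features: Sequence[float], tokens: Sequence[str]) -> str:
    """Hypothetical stand-in for a trained decoder.

    A real decoder would score every token in its vocabulary given the image
    features and the tokens emitted so far. This toy rule emits one UI token
    per image row, then stops.
    """
    step = len(tokens) - 1  # number of tokens emitted after START
    if step >= len(features):
        return END
    return "button" if features[step] > 0.5 else "label"


def generate(
    image: Image,
    next_token: Callable[[Sequence[float], Sequence[str]], str] = toy_next_token,
    max_len: int = 20,
) -> List[str]:
    """Greedy decoding: extend the sequence one token at a time until END."""
    features = encode(image)
    tokens = [START]
    while len(tokens) < max_len:
        token = next_token(features, tokens)
        if token == END:
            break
        tokens.append(token)
    return tokens[1:]  # drop the START marker


# A 3x3 "mockup": bright row, dark row, bright row.
mockup = [[0.9, 0.8, 1.0], [0.1, 0.0, 0.2], [0.7, 0.6, 0.9]]
print(" ".join(generate(mockup)))  # prints: button label button
```

The part that carries over to a real system is the shape of `generate`: the visual input is encoded once, and the textual output is then produced token by token, each step conditioned on the image features and on everything emitted so far.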


Teaching Machines to Understand User Interfaces
