
Our AI Coding Tool Went Viral, Then Everything Broke. This is What We Learned.

by Yang Li, October 11th, 2024

Too Long; Didn't Read

Our AI coding tool went viral at the wrong time, forcing us to pivot, rebuild smarter, learn hard lessons, and ultimately develop a groundbreaking AI Engineer.


On March 2nd, 2023, we launched a simple VS Code extension to help developers search their codebases. Fast forward a few weeks, and we had over 7,000 users and far more traffic than we could afford or handle. This is the inside story of how we navigated that chaos, completely rebuilt our architecture, and learned some hard lessons about building AI-powered developer tools.


Things that don’t scale

We were a small team of three, fresh out of Y Combinator's Winter 2023 batch, with a product that was very much in the "do things that don't scale" phase. Our core functionality let developers "search for what your code does, not what it is" – enabling queries like "Find where we initialize Stripe in React and add some logging."


Our approach was simple but computationally intensive. We indexed entire codebases, passing each snippet through an LLM to generate a description, which we then embedded. It was a success, giving us the "what the code does" functionality, but at a significant cost.
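
To make this concrete, here's a minimal sketch of that indexing loop. It assumes the modern OpenAI Python client, and the model names and prompt are illustrative placeholders rather than what we actually ran:

```python
from openai import OpenAI  # assumes the openai-python v1 client

client = OpenAI()

def index_snippet(snippet: str) -> list[float]:
    # Step 1: ask an LLM to describe what the snippet *does*.
    # (Model and prompt are placeholders, not our production setup.)
    desc = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"In one sentence, describe what this code does:\n{snippet}",
        }],
    ).choices[0].message.content

    # Step 2: embed the description, so search matches behaviour, not syntax.
    return client.embeddings.create(
        model="text-embedding-3-small",
        input=desc,
    ).data[0].embedding
```

One LLM call per snippet, for every snippet in every codebase we indexed – that loop is where both the quality and the cost came from.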


Our launch tweet unexpectedly took off, and signups started pouring in. While this was exciting, it quickly became apparent that our backend architecture wasn't prepared for this level of demand. Worse, our approach was incredibly inefficient and expensive at scale.

At our peak, we were processing nearly a quarter of a billion tokens per day. This was both a technical challenge and a financial nightmare: we were burning through thousands of dollars' worth of tokens and constantly hitting our rate limits.
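
For a sense of scale, here's a back-of-envelope calculation – the per-token price is an illustrative 2023-era figure, not our actual rate:

```python
# Rough daily cost at peak load. Price is an illustrative 2023-era
# figure (~$0.02 per 1K tokens), not an actual billing rate.
TOKENS_PER_DAY = 250_000_000        # ~ a quarter of a billion tokens/day
PRICE_PER_1K_TOKENS = 0.02          # USD

daily_cost = TOKENS_PER_DAY / 1_000 * PRICE_PER_1K_TOKENS
print(f"~${daily_cost:,.0f}/day, ~${daily_cost * 30:,.0f}/month")
# ~$5,000/day, ~$150,000/month – untenable for a three-person startup
```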


Hard…but productive decisions

One of our YC group partners, Tom Blomfield, gave us crucial advice: stop onboarding customers who were going to have a bad experience. It was tough to turn away potential users, but we knew a little short-term disappointment would protect both sides from longer-term damage.


We found ourselves at a bit of a crossroads. Suddenly our minimum viable product had lost the “viable” bit. We needed to figure out a way to deliver the same functionality without relying on an LLM for every query.


We considered a self-hosted LLM as an alternative, but the costs would still have been prohibitive at our traffic levels. After deeper analysis, we concluded that keeping an LLM in the loop for every query simply wasn't feasible with the technology of the time.


This led us to a solution that, while not groundbreaking, is often overlooked: creating a custom matrix to bias embeddings for our specific use case. Think of this as reorganising cookbooks based on how chefs actually search for recipes (e.g. cuisine or ingredients), rather than using a standard alphabetical system. This tailored approach helps find the right recipe much faster and more accurately, just as the custom matrix improves matching code snippets to user queries.
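
In embedding terms, here is a minimal sketch of the idea, assuming 768-dimensional off-the-shelf embeddings. The matrix below is initialised near the identity purely for illustration; in practice it has to be learned, which is what the next step covers:

```python
import numpy as np

DIM = 768  # dimensionality of the base embedding model (an assumption)

# In production W is learned from (query, snippet) pairs; here we just
# initialise it near the identity to show the mechanics.
W = np.eye(DIM) + 0.01 * np.random.randn(DIM, DIM)

def biased_embedding(raw: np.ndarray) -> np.ndarray:
    """Re-project a base embedding into the task-specific space."""
    v = W @ raw
    return v / np.linalg.norm(v)  # renormalise for cosine similarity

# Both queries and snippets pass through the same transform before
# nearest-neighbour search, so "what the code does" clusters together.
```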


We gathered a corpus of open-source code snippets from GitHub and used an LLM to generate synthetic user queries for each snippet. We then trained a model to optimize the embeddings for our particular domain. The results were impressive: our accuracy jumped from 61% to around 90%, all without constant LLM queries. This let us keep the core functionality of our product while drastically reducing costs and computational overhead, and our approach to data curation became increasingly sophisticated from there.
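
Here's a sketch of what that training step can look like, using an in-batch contrastive (InfoNCE-style) loss in PyTorch. The loss choice, hyperparameters, and tensor names are our illustration, not the exact production recipe:

```python
import torch
import torch.nn.functional as F

DIM = 768  # base embedding dimensionality (an assumption)

# The "custom matrix": a single linear map, initialised as the identity.
W = torch.nn.Linear(DIM, DIM, bias=False)
torch.nn.init.eye_(W.weight)
opt = torch.optim.Adam(W.parameters(), lr=1e-4)

def train_step(query_emb: torch.Tensor, snippet_emb: torch.Tensor,
               temperature: float = 0.05) -> float:
    """One contrastive step on a batch of (synthetic query, snippet) pairs.

    query_emb / snippet_emb: [batch, DIM] base-model embeddings, where the
    i-th query was LLM-generated for the i-th snippet.
    """
    q = F.normalize(W(query_emb), dim=-1)
    s = F.normalize(W(snippet_emb), dim=-1)
    logits = q @ s.T / temperature     # other rows act as in-batch negatives
    labels = torch.arange(q.size(0))   # i-th query matches i-th snippet
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```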

What we learned


  • Be prepared to pivot: Our initial approach seemed promising, but it didn't hold up under real-world conditions. Being willing to completely rethink our architecture was what allowed us to survive.
  • Monitor aggressively: We were caught off guard by the sudden surge in usage. Better monitoring and alerting would have helped us respond far more quickly – see the sketch after this list.
  • Understand your cost structure: We didn't fully grasp how our costs would scale with usage. This led to some painful financial realisations in those early days.
  • Communicate clearly with users: Managing expectations during our rebuilding phase was challenging. We learned the importance of transparent communication, especially when dealing with technical users.
  • Don't underestimate data quality: In rebuilding our system, we found that carefully curated, high-quality data was far more valuable than simply having more data.

The road to Genie & the OpenAI validation

Our journey from a simple VS Code extension to Genie was marked by constant iteration and a willingness to pivot. We built what we could while waiting for AI technology to catch up to our vision. Our initial code search and retrieval system, cloud-based indexing platform, and experiments with emerging AI models all became crucial building blocks. When more advanced language models finally became available, we were uniquely positioned to integrate them with our existing tools, leading to the development of Genie – a fully autonomous AI software engineer that embodies our original vision.


By October 2023, we had developed a proprietary technique for generating datasets that truly captured the nuances of how human developers work. This caught the attention of OpenAI, who granted us then-rare early access to fine-tune GPT-4 beyond what was publicly available. That collaboration let us push the boundaries of our approach, and in August 2024 we hit a major milestone: Genie scored 43.8% on the industry-standard SWE-Bench Verified, significantly outperforming previous records held by major tech companies.


Our product nearly broke under its own success, but we're now excited about its potential to transform how developers work. The bottom line: the unexpected, underestimated success of a first product can derail an unprepared company – but it can also set it on a path to something far more meaningful than the founders initially imagined, if they have the humility to challenge their own assumptions, as we did.