From Devin to Microsoft AutoDev: Elevating AI Coding Assistants to Super-Powered Code Editors

by Sukhpinder Singh, March 22nd, 2024

Too Long; Didn't Read

Unveiling Microsoft AutoDev: How It’s Shaping the Future of Software Development.

Tools like ChatGPT have been a big help for programmers, offering code suggestions in chat and even directly within their coding environment. But these helpers have limitations. They can’t do everything a programmer can, like checking for errors or running the code to see if it works.

This is where AutoDev steps in to make a difference.

It’s like a super-powered coding assistant that can take action directly within your code files.

AutoDev can edit files, search for information within the code, build and run the code, test it for errors, and even use command-line tools.
This means it can help you complete complex coding tasks without you having to do everything manually.

In essence, AutoDev takes existing coding helper tools to the next level by giving them the ability to directly work on your code and automate more steps in the coding process.

Key Features:

Conversation Manager
Customizable Tools
Agent Scheduler
Evaluation Environment

AutoDev is a novel framework that empowers AI agents to manage complex software development workflows. The researchers demonstrate its effectiveness through evaluations on established datasets and showcase its ability to handle diverse software engineering objectives. The results highlight AutoDev’s potential to significantly automate software development tasks while maintaining a secure and user-controlled development environment.

How It Works Behind the Scenes

Imagine a team of AI helpers for your coding projects. That’s what AutoDev is! It lets these AI agents work together to tackle complex software development tasks without needing you to look over their shoulders constantly.

Here’s a breakdown of how it works:

Setting the Rules: First, you tell AutoDev what the AI helpers can and can’t do. You can choose from a list of pre-built commands, like editing files, running tests, or checking for bugs. You can even fine-tune these permissions to fit your specific needs. Think of it as giving your AI helpers a rulebook to follow.
Giving Instructions: Once the rules are set, you tell AutoDev what you want to achieve. For example, you might ask it to create test cases for your code, making sure they work properly and catch any errors. This is like giving your AI helpers a specific task to complete.
The Conversation Manager: This part acts as the project manager for your AI helpers. It keeps track of everything that’s said and done, making sure everyone is on the same page. It also decides when to pause things and check in with you, just in case.
Understanding the Helpers: The AI helpers might suggest actions, but the Conversation Manager needs to make sure they’re following the rules and using the right commands. It also translates their responses into clear messages for you to understand.
Keeping You Informed: After the AI helpers take action (like running a test), the Conversation Manager gathers the results and puts them in a clear report for you. This way, you can see what the AI helpers did and what the outcome was.
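
The rulebook and the Conversation Manager’s checks can be pictured with a small sketch. Everything in it is hypothetical: the command names, the Rules class, and parse_and_validate are illustrative assumptions, not AutoDev’s actual configuration format or API.

```python
from dataclasses import dataclass, field


@dataclass
class Rules:
    """Hypothetical rulebook: the commands the AI helpers may use."""
    allowed_commands: set = field(default_factory=set)


def parse_and_validate(suggestion: str, rules: Rules) -> tuple:
    """Split a helper's suggestion into (command, args) and check the rules.

    Raises ValueError for an empty suggestion and PermissionError for a
    command that the rulebook does not allow.
    """
    parts = suggestion.split()
    if not parts:
        raise ValueError("empty suggestion")
    command, args = parts[0], parts[1:]
    if command not in rules.allowed_commands:
        raise PermissionError(f"command not permitted: {command}")
    return command, args


# Example rulebook: helpers may write files and run tests, nothing else.
rules = Rules(allowed_commands={"write", "test"})

print(parse_and_validate("test test_HumanEval_91.py", rules))

try:
    parse_and_validate("delete human_answer.py", rules)
except PermissionError as err:
    print(err)
```

The point of the design is that a helper’s suggestion is only text until it passes this kind of check, so a command outside the rulebook never runs.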

Imagine AutoDev as a team manager for AI helpers in coding.

The Scheduler: This is like the team leader who assigns tasks to different AI helpers based on their strengths. Some helpers are better at understanding code (like OpenAI GPT-4), while others are better at creating code (like smaller language models). The leader decides who does what and in what order, using different methods like taking turns (round robin), letting someone work until finished (token-based), or prioritizing tasks based on importance.
The AI Helpers: These are like the coding assistants. They don’t change files directly; instead, they follow instructions and suggest commands that the tools carry out on their behalf. They communicate with each other and the team leader using plain English.
The Tool Library: This is like a toolbox with different functions for the helpers. There are tools for editing files (like code, instructions, or notes), searching for specific information within the code, building and running the code to see if it works, checking for errors, and even communicating with the team leader or you (the user) if they need help.
The Safe Zone: This is a secure environment where the helpers use the tools to work on the code. It keeps everything safe and makes sure no mistakes are accidentally made to the original code.
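
Two of the scheduling methods mentioned above, taking turns and prioritizing, can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names, the task list, and both functions are made up here and are not AutoDev’s scheduler.

```python
from itertools import cycle, islice


def round_robin(agents, turns):
    """Return the order in which agents act when they simply take turns."""
    return list(islice(cycle(agents), turns))


def by_priority(tasks):
    """Order (priority, task) pairs so the lowest number runs first."""
    return [task for _, task in sorted(tasks, key=lambda pair: pair[0])]


agents = ["reviewer", "coder"]  # hypothetical helper names
print(round_robin(agents, 5))
print(by_priority([(2, "write tests"), (1, "fix build"), (3, "update notes")]))
```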

How it works:

You tell AutoDev what you want the code to do (the objective).
It assigns different parts of the task to different AI helpers based on their skills.
The helpers use the tools and communicate with each other to achieve the objective.
Everything happens in the safe zone to avoid messing up the original code.
This process keeps going back and forth until the code is finished, you ask AutoDev to stop, or there have been too many attempts without success.
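
The three ways the back-and-forth can end (the work is finished, the user stops it, or the attempt limit is reached) map onto a simple loop. The sketch below is a hypothetical illustration; run_until_done, its parameters, and the example step are assumptions made for this article.

```python
def run_until_done(step, max_attempts=5, should_stop=lambda: False):
    """Call step() repeatedly and report why the loop ended.

    step() returns True once the objective is met. The loop also ends when
    the user asks to stop or when max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        if should_stop():
            return "stopped by user", attempt - 1
        if step():
            return "finished", attempt
    return "too many attempts", max_attempts


# Hypothetical step that succeeds on its third call.
calls = {"n": 0}


def flaky_step():
    calls["n"] += 1
    return calls["n"] >= 3


print(run_until_done(flaky_step))
```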

Overall, AutoDev helps automate complex coding tasks by using a team of AI helpers with different skills and a safe environment for them to work in.

Testing AutoDev’s Skills

Imagine you want to see how good AutoDev is at different coding tasks. That’s what the researchers did here. They set up some experiments to answer three main questions:

Can AutoDev write code?
Can AutoDev create tests for existing code?
How efficient is AutoDev at doing these tasks?

Write Code

The researchers gave AutoDev some unfinished code snippets and asked it to complete them.
They compared AutoDev’s performance to other approaches, including directly using a powerful AI model like GPT-4.
It did a very good job (almost as good as the best) at writing code, even without any extra training!
They measured success by checking if the code AutoDev generated passed all the existing tests for that code snippet.

Creating Tests

This test flipped things around. The researchers gave AutoDev complete code and asked it to create new tests for that code.
They compared AutoDev’s tests to tests written by humans.
Here’s the key finding: While AutoDev wasn’t perfect, it created tests that were almost as good as human-written tests in terms of how well they covered the code.

The Efficiency Test

This test looked at how many steps and resources AutoDev used to complete the tasks.
They looked at things like how many times it had to call on the helper AI model and how many words it used in its conversations.
This is important because you wouldn’t want AutoDev to take too long or use too much power to do its job.
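
As a rough picture of this kind of bookkeeping, the sketch below counts model calls and approximates tokens by counting whitespace-separated words. The UsageTally class and the sample messages are hypothetical, and real tokenizers split text differently, so actual counts would not match.

```python
from dataclasses import dataclass


@dataclass
class UsageTally:
    """Running totals for one task (hypothetical bookkeeping)."""
    model_calls: int = 0
    tokens: int = 0

    def record(self, prompt: str, reply: str) -> None:
        # Whitespace-separated words stand in for tokens here; real
        # tokenizers split text into smaller pieces, so counts differ.
        self.model_calls += 1
        self.tokens += len(prompt.split()) + len(reply.split())


tally = UsageTally()
tally.record("Write a test for is_bored", "write test_HumanEval_91.py ...")
tally.record("The test failed", "Update the assertion on line 5")
print(tally.model_calls, tally.tokens)
```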

Overall, these tests show that AutoDev is a promising tool that can both write code and create tests for existing code. It’s still under development, but it has the potential to be a valuable helper for programmers.

The Results: How Well Does AutoDev Do?

The researchers looked at how well AutoDev performed on different tasks:

Writing Code: They found that it did a great job (almost the best) at writing missing parts of code, even without any extra training! This is better than directly using a powerful AI model like GPT-4 on its own.
Creating Tests: AutoDev was also good at creating tests for existing code. Its tests were almost as good as human-written tests in terms of how well they covered the code.
How Many Steps Does It Take? AutoDev takes a few more steps than some other approaches, but that’s because it does more than just write code. It also checks the code it writes to make sure it works properly, which is an important step programmers normally do themselves.

AutoDev is a promising tool that can both write code and create tests for existing code.
It achieves good results without needing extra training data.
It takes a bit more effort (steps) than some other approaches, but that’s because it does more thorough work.

Technical Details (can be skipped by most readers):

The researchers compared AutoDev to other methods and found it performed well.
They measured success with Pass@1, the share of tasks completed correctly on the first try. On the HumanEval benchmark, the paper reports a Pass@1 of 91.5% for code generation and 87.8% for test generation.
It uses more “tokens” (the roughly word-sized pieces of text a model reads and writes) than some other approaches, but that’s because it takes extra steps, like testing the code it writes.
It uses a secure environment to run the code it creates.
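
When each task gets exactly one attempt, Pass@1 reduces to a simple fraction. The sketch below assumes that setup; the outcomes are made-up values for illustration, not results from the paper.

```python
def pass_at_1(first_try_results):
    """Fraction of tasks whose first generated solution passed all tests."""
    if not first_try_results:
        raise ValueError("need at least one task result")
    return sum(first_try_results) / len(first_try_results)


# Made-up outcomes for four tasks: True means the first attempt passed.
outcomes = [True, True, False, True]
print(f"Pass@1 = {pass_at_1(outcomes):.0%}")
```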

AutoDev in Action

We tell AutoDev to generate a Pytest test with specific assertions for a function in human_answer.py. Here's the Python code for the function:

def is_bored(S):
  """
  You'll be given a string of words, and your task is to count the number of boredoms.
  A boredom is a sentence that starts with the word "I". Sentences are delimited by '.', '?' or '!'.
  For example:
  >>> is_bored("Hello world")
  0
  >>> is_bored("The sky is blue. The sun is shining. I love this weather")
  1
  """
  import re
  # Split the text into sentences at '.', '?' or '!' (plus any spaces that follow).
  sentences = re.split(r'[.?!]\s*', S)
  # Count the sentences that begin with the word "I" followed by a space.
  return sum(sentence[0:2] == 'I ' for sentence in sentences)

We tell AutoDev to write the test to a new file named test_HumanEval_91.py with a specific format.

It creates a new test file (test_HumanEval_91.py) with the following code:

from .human_answer import *
import pytest
def test_is_bored():
  assert is_bored('Hello world') == 0
  assert is_bored('I am bored. This is boring!') == 2
  assert is_bored('The sky is blue. The sun is shining. I love this weather.') == 1
  assert is_bored('I think, therefore I am. I am bored?') == 2
  assert is_bored('') == 0

AutoDev then runs the new test and discovers an error: one of the assertions fails.

AutoDev Detects and Fixes the Error

It identifies that the test for 'I am bored. This is boring!' fails because it expects the function to return 2, but only the first of the two sentences starts with "I", so the function returns 1.
It corrects the failing test by updating the assertion statement in test_HumanEval_91.py (line 5) to:

assert is_bored('I am bored. This is boring!') == 1
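
To see why 1 is the right expected value, here is a quick trace that reuses the same regular expression and the same "I " check as is_bored. The variable names are only for illustration.

```python
import re

text = "I am bored. This is boring!"

# Same pattern as is_bored: split on '.', '?' or '!' plus trailing spaces.
sentences = re.split(r"[.?!]\s*", text)
print(sentences)

# Only a sentence beginning with "I " counts as a boredom.
starts_with_i = [s for s in sentences if s[0:2] == "I "]
print(starts_with_i, len(starts_with_i))
```

The split leaves an empty string after the final '!', which does not start with "I " and so does not affect the count.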

AutoDev runs the tests again, and this time, all tests pass. It has produced a working test suite for the is_bored function. This example highlights two key features:

Self-evaluation: It can identify errors in its generated code and fix them.
User Transparency: It keeps the user informed about its actions throughout the process.

Conclusion

Imagine AI agents like these as little code-writing helpers. Instead of doing all the work themselves, developers become supervisors.

And don’t worry, this won’t happen overnight. It’s a gradual shift, and developers will still play a crucial part in shaping the software world. So, while AutoDev is exciting, it won’t replace human developers in the long term. They’ll continue to be essential, making sure the robots stay in line! 🤖👩‍💻👨‍💻