How AI Bots Code: Comparing Bing, Claude+, Co-Pilot, GPT-4 and Bard

Written by jorgev | Published 2023/04/28

TL;DR: An AI bot for coding is an artificial intelligence program that can automatically generate code for a specific task. In this article, we will compare five of the most advanced AI bots: GPT-4, Bing, Claude+, Bard, and GitHub Co-Pilot. We will examine how they work, their strengths and weaknesses, and how they compare to each other.

As technology advances, more and more tasks are being automated. One area that is rapidly evolving is computer programming. Recently, several AI bots have been developed that can write code, freeing up programmers to work on other tasks.

In this article, we will compare five of the most advanced AI bots: GPT-4, Bing, Claude+, Bard, and GitHub Co-Pilot. We will examine how they work, their strengths and weaknesses, and how they compare to each other.

Testing the AI Bots for Coding

Before we dive into comparing these five AI bots, it’s essential to understand what an AI bot for coding is and how it works. An AI bot for coding is an artificial intelligence program that can automatically generate code for a specific task. These bots use natural language processing and machine learning algorithms to analyze human-written code and generate new code based on that analysis.

To start off, we are going to test the AI bots on a hard Leetcode question; after all, we want them to be able to solve complex coding problems. We also wanted to test them on a less well-known question, so for our experiment we will use Leetcode 214, Shortest Palindrome. It is somewhat similar to Leetcode 3, which is one of the most popular string problems.
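For readers who have not seen it, the problem asks for the shortest palindrome that can be formed by adding characters only to the front of a given string. The brute-force sketch below is not one of the bots' answers. The name `shortest_palindrome_reference` is a hypothetical helper, meant only as a slow yardstick for checking outputs locally.

```python
def shortest_palindrome_reference(s: str) -> str:
    """Slow reference: try the longest prefix first, stop at the first palindrome."""
    for end in range(len(s), 0, -1):
        prefix = s[:end]
        if prefix == prefix[::-1]:
            # Everything after the palindromic prefix must be mirrored in front.
            return s[end:][::-1] + s
    return s  # only reached for the empty string


print(shortest_palindrome_reference("aacecaaa"))  # aaacecaaa
print(shortest_palindrome_reference("abcd"))      # dcbabcd
```

The longer the palindromic prefix, the fewer characters need to be added, which is why every correct answer in this article boils down to finding that prefix.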

GPT-4: The Latest and Greatest

GPT-4 is OpenAI’s latest model and can be used for AI-assisted coding. It is billed as more powerful than any other generative AI tool. It is built on a large transformer-based language model and can generate high-quality code for a wide range of programming languages and applications.

Here is GPT-4’s answer in Python:

from typing import List  # Leetcode provides this import; add it when running locally

def shortestPalindrome(self, s: str) -> str:
        if not s:
            return s
        
        # Find the longest palindrome prefix
        def kmp_table(s: str) -> List[int]:
            table = [0] * len(s)
            i, j = 1, 0
            while i < len(s):
                if s[i] == s[j]:
                    table[i] = j + 1
                    i += 1
                    j += 1
                elif j > 0:
                    j = table[j - 1]
                else:
                    i += 1
            return table
        
        # Combine the original string and its reverse with a special character '#'
        s_extended = s + '#' + s[::-1]
        table = kmp_table(s_extended)
        
        # The last element of the table is the longest palindrome prefix
        longest_palindrome_prefix = table[-1]
        
        # Add characters to the beginning to form a palindrome
        return s[longest_palindrome_prefix:][::-1] + s

The submission passes all tests. It beat 47% of submissions on runtime and 8% on memory.
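The trick in this answer is easier to see with the table printed out. The sketch below recomputes a prefix table on its own and shows that, for the combined string, the last entry equals the length of the longest palindromic prefix. The helper name `prefix_table` is an assumption and not part of GPT-4's output. The sketch also assumes that `#` does not occur in the input.

```python
from typing import List


def prefix_table(text: str) -> List[int]:
    """table[i] = length of the longest proper prefix of text[:i+1] that is also its suffix."""
    table = [0] * len(text)
    j = 0
    for i in range(1, len(text)):
        # Fall back to shorter candidate prefixes until one can be extended.
        while j > 0 and text[i] != text[j]:
            j = table[j - 1]
        if text[i] == text[j]:
            j += 1
        table[i] = j
    return table


s = "aacecaaa"
combined = s + "#" + s[::-1]           # "aacecaaa#aaacecaa"
longest = prefix_table(combined)[-1]   # length of the longest palindromic prefix of s
print(longest)                         # 7, i.e. "aacecaa"
print(s[longest:][::-1] + s)           # aaacecaaa
```

A prefix of `s` that also appears as a suffix of the reversed string reads the same in both directions, so it is a palindrome. The separator keeps a match from running past the end of `s`.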

Strengths of GPT-4

GPT-4 is highly versatile in generating code for various programming languages and applications. The caveats are that responses are slow to arrive and that API usage is a lot more expensive, so costs could ramp up quickly. Overall, it got the answer right and passed the tests.

Bing: The Search Engine That Can Code

Bing is a search engine developed by Microsoft. Earlier this year, Microsoft integrated GPT into it as a new way to search. Bing uses deep neural networks and natural language processing to analyze human-written code and generate new code based on that analysis. Bing is not designed to generate complete programs but rather small code snippets, and as of now you only get 20 follow-up questions per chat.

Here is Bing’s answer in Python:

def shortestPalindrome(self, s: str) -> str:
        if not s:
            return ""
        rev_s = s[::-1]
        for i in range(len(s), -1, -1):
            if s[:i] == rev_s[len(s)-i:]:
                return rev_s[:len(s)-i] + s

The submission passed all the tests. It beat 47% of submissions on runtime and 37% on memory. This code looks a lot simpler than what GPT-4 generated: it beat GPT-4 on memory and it used less code. Bing seems to have the most efficient code so far; however, it gave only a very short explanation of how it solved the problem. Nonetheless, it is the best so far.

Claude+: The Anthropic AI Bot

Claude+ is an AI bot developed by Anthropic, which introduced a new version of Claude in March 2023. Unlike other AI chatbots, Claude aims to be a safer AI, with fewer toxic, biased, and hallucinated responses, so that the rapid developments in AI affect the world positively. This AI seems to be less talked about overall, so let’s see if it’s a hidden gem.

Here is Claude+’s code in Python:

def shortestPalindrome(self, s: str) -> str:
        n = len(s)
        # Find the longest suffix that is a palindrome
        suffix = ''
        for i in range(n-1, -1, -1):
            if s[-i:] == s[-i:][::-1]:
                suffix = s[-i:]
                break
                
        # Find the longest prefix that is a palindrome
        prefix = ''
        for i in range(n):
            if s[:i+1] == s[:i+1][::-1]:
                prefix = s[:i+1]
                break  
                
        # The palindrome is prefix + suffix + reverse of rest of the string
        return prefix + suffix + s[:n-len(prefix)-len(suffix)][::-1] + s[n-len(prefix)-len(suffix):]

The code does not pass the submission test: only 1 of the 121 test cases passed. Ouch! This one seemed promising, but it looks like Claude+ is not that well suited for programming.
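To see what goes wrong, the answer can be run locally on a small input such as "abcd", for which the shortest palindrome is "dcbabcd". The sketch below restates the same logic as a plain function without changing its behavior. The name `claude_plus_answer` and the `rest` variable are illustrative and not part of the bot's output.

```python
def claude_plus_answer(s: str) -> str:
    """Same logic as the answer above, restated as a standalone function."""
    n = len(s)
    suffix = ""
    for i in range(n - 1, -1, -1):
        if s[-i:] == s[-i:][::-1]:
            suffix = s[-i:]
            break
    prefix = ""
    for i in range(n):
        # A single character is always a palindrome, so this stops at i == 0.
        if s[:i + 1] == s[:i + 1][::-1]:
            prefix = s[:i + 1]
            break
    rest = n - len(prefix) - len(suffix)
    return prefix + suffix + s[:rest][::-1] + s[rest:]


result = claude_plus_answer("abcd")
print(result)                    # adbacd
print(result == result[::-1])    # False: the output is not a palindrome
print(result.endswith("abcd"))   # False: the original string is not kept intact
```

The second loop looks for the shortest palindromic prefix rather than the longest, and the final expression rearranges the input instead of only prepending to it.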

Bard: The AI Bot from Google

Now we move on to Bard. Like many of the others, Bard was recently released. Although it has been mocked in the past, let’s see how it fares with coding tasks. After all, Google is one of the leading tech companies, especially in the field of artificial intelligence.

Here is Bard’s code for solving the problem:

def shortestPalindrome(self, s):
        lps = [0] * len(s)
        for i in range(1, len(s)):
            j = lps[i - 1]
            while j > 0 and s[i] != s[j]:
                j = lps[j - 1]
            if s[i] == s[j]:
                j += 1
            lps[i] = j
        
        palindrome = ""
        i = len(s) - 1
        
        while i >= 0:
            if lps[i] == 0:
                palindrome = s[i] + palindrome
            else:
                palindrome = s[i:lps[i]] + palindrome
            i = lps[i] - 1
        return palindrome

To start off, I had to manually insert the “self” argument in the function since Bard didn’t include it. Bard’s code did not pass the submission test, passing only 2 of the 121 test cases. An unfortunate result, but it’s safe to say that, for now, Bard isn’t much of a coding expert.
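The failure is easy to reproduce locally. The sketch below restates Bard's logic as a standalone function (the name `bard_answer` is illustrative) and runs it on two small inputs.

```python
def bard_answer(s: str) -> str:
    """Same logic as Bard's answer, restated as a standalone function."""
    lps = [0] * len(s)
    for i in range(1, len(s)):
        j = lps[i - 1]
        while j > 0 and s[i] != s[j]:
            j = lps[j - 1]
        if s[i] == s[j]:
            j += 1
        lps[i] = j

    palindrome = ""
    i = len(s) - 1
    while i >= 0:
        if lps[i] == 0:
            palindrome = s[i] + palindrome
        else:
            palindrome = s[i:lps[i]] + palindrome
        i = lps[i] - 1
    return palindrome


print(repr(bard_answer("abcd")))  # 'd' (the shortest palindrome is 'dcbabcd')
print(repr(bard_answer("a")))     # 'a' (correct: a single character is already a palindrome)
```

The prefix table here is built from the string alone, without its reverse, so it describes repeated prefixes rather than palindromic ones, and the final loop drops most of the input.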

GitHub Co-Pilot

Although it is not technically similar to the rest, we thought we’d include Co-Pilot on this list to see how it stacks up against them. Co-Pilot is the most popular coding assistant as of now and is used by over a million developers.

I will start by writing a comment in VS Code asking it to solve the same Leetcode question.

Here is the code Co-Pilot returned:

def shortestPalindrome(self, s: str) -> str:
        if not s:
            return s
        n = len(s)
        for i in range(n-1, -1, -1):
            if s[:i+1] == s[:i+1][::-1]:
                return s[i+1:][::-1] + s
        return s

This passes all the tests. It scored better than 30% of submissions on runtime and 37% on memory.

Conclusion

Overall, it appears that Bing offers the most adequate coding assistance in the group: its answer passed every test and matched or beat the others on both runtime and memory. Interestingly, both Bing and Co-Pilot are powered by GPT under the hood. I provided all the AIs with the same prompt, "Solve Leetcode 214. Shortest Palindrome." While I could have asked follow-up questions, I chose to only consider the initial response.

It is unclear whether any of these models were pre-trained on Leetcode data. I tested these models because, as of April 2023, they seem to be the most effective at programming. Although there are some open source models such as Alpaca, LLaMA, Vicuna, and GPT-J, none of them has yet matched the effectiveness of the closed source models.

What are your thoughts? Which programming model have you found to be the most effective, and what have you discovered to be the most effective way to prompt it?




Written by jorgev | Tech, AI Enthusiast. Startups, Programming & Entrepreneurship
Published by HackerNoon on 2023/04/28