As technology advances, more and more tasks are being automated. One area that is rapidly evolving is computer programming. Recently, several AI bots have been developed that can write code, freeing up programmers to work on other tasks. In this article, we will compare five of the most advanced AI bots: GPT-4, Bing, Claude+, Bard, and GitHub Co-Pilot. We will examine how they work, their strengths and weaknesses, and how they compare to each other.

Testing the AI Bots for Coding

Before we dive into comparing these five AI bots, it’s essential to understand what an AI bot for coding is and how it works. An AI bot for coding is an artificial intelligence program that can automatically generate code for a specific task. These bots use natural language processing and machine learning algorithms to analyze human-written code and generate new code based on that analysis.

To start off, we are going to test the AI on a hard Leetcode question. After all, we want it to be able to solve complex coding problems. We also wanted to test it on a less well-known question. For our experiment, we will be testing Leetcode 214. Shortest Palindrome. Like Leetcode 3, one of the most popular string problems, it is a string problem, but it is much less well known.

GPT-4: The Latest and Greatest

GPT-4 is OpenAI’s latest offering in the field of AI-assisted coding. It is billed as more powerful than any other generative AI tool. It is built on a transformer-based language model and is arguably the most capable AI bot for coding to date. GPT-4 can generate high-quality code for a wide range of programming languages and applications.
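Before reading the generated answers, it may help to pin down the task. Leetcode 214 asks for the shortest palindrome that can be formed by adding characters only to the front of the input string. The sketch below is a brute-force reference written for this comparison, not the output of any of the bots. The function name `shortest_palindrome_reference` is hypothetical, and the approach is quadratic, so it is meant for checking small inputs by hand and not for submission.

```python
def shortest_palindrome_reference(s: str) -> str:
    """Brute-force reference for Leetcode 214 (illustrative, O(n^2))."""
    # Try the longest prefix first; the empty prefix is always a palindrome,
    # so the loop is guaranteed to return.
    for end in range(len(s), -1, -1):
        prefix = s[:end]
        if prefix == prefix[::-1]:
            # Everything after the palindromic prefix must be mirrored
            # in front of the string.
            return s[end:][::-1] + s
    return s  # not reached; kept so every path returns a string


print(shortest_palindrome_reference("aacecaaa"))  # aaacecaaa
print(shortest_palindrome_reference("abcd"))      # dcbabcd
```

The key observation is that the answer is determined by the longest prefix of the input that is already a palindrome. The answers that follow differ mainly in how they search for that prefix.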
Here is GPT-4’s answer in Python:

    def shortestPalindrome(self, s: str) -> str:
        if not s:
            return s

        # Find the longest palindrome prefix
        def kmp_table(s: str) -> List[int]:
            table = [0] * len(s)
            i, j = 1, 0
            while i < len(s):
                if s[i] == s[j]:
                    table[i] = j + 1
                    i += 1
                    j += 1
                elif j > 0:
                    j = table[j - 1]
                else:
                    i += 1
            return table

        # Combine the original string and its reverse with a special character '#'
        s_extended = s + '#' + s[::-1]
        table = kmp_table(s_extended)

        # The last element of the table is the longest palindrome prefix
        longest_palindrome_prefix = table[-1]

        # Add characters to the beginning to form a palindrome
        return s[longest_palindrome_prefix:][::-1] + s

The submission passes all tests. It beat 47% of submissions on runtime and 8% on memory.

Strengths of GPT-4

GPT-4 is highly versatile in generating code for various programming languages and applications. The caveats are that responses take much longer to arrive, and that API usage is a lot more expensive, so costs could ramp up quickly. Overall, it got the answer right and passed the test.

Bing: The Search Engine That Can Code

Bing is a search engine developed by Microsoft, which integrated GPT earlier this year as a new way to search. Bing uses deep neural networks and natural language processing to analyze human-written code and generate new code based on that analysis. Bing is not designed to generate complete programs but rather small code snippets, and as of now you only get 20 follow-up questions per chat.

Here is Bing’s answer in Python:

    def shortestPalindrome(self, s: str) -> str:
        if not s:
            return ""
        rev_s = s[::-1]
        for i in range(len(s), -1, -1):
            if s[:i] == rev_s[len(s)-i:]:
                return rev_s[:len(s)-i] + s

The submission passed all the tests. It beat 47% of submissions on runtime and 37% on memory. This code looks a lot simpler than what GPT-4 generated. It beat GPT-4 on memory and it used less code! Bing has the most memory-efficient and concise code so far; however, it gave only a very short explanation of how it solved the problem. Nonetheless, it is the best so far.

Claude+: The Anthropic AI Bot

Claude+ is an AI bot developed by Anthropic. A new version of Claude was released in March 2023. Unlike other AI chatbots, Claude aims to offer a safer AI with less toxic, biased, and hallucinatory responses, so that the rapid developments in AI affect the world positively. This AI seems to be less talked about overall, so let’s see if it’s a hidden gem.

Here is Claude+’s code in Python:

    def shortestPalindrome(self, s: str) -> str:
        n = len(s)
        # Find the longest suffix that is a palindrome
        suffix = ''
        for i in range(n-1, -1, -1):
            if s[-i:] == s[-i:][::-1]:
                suffix = s[-i:]
                break
        # Find the longest prefix that is a palindrome
        prefix = ''
        for i in range(n):
            if s[:i+1] == s[:i+1][::-1]:
                prefix = s[:i+1]
                break
        # The palindrome is prefix + suffix + reverse of rest of the string
        return prefix + suffix + s[:n-len(prefix)-len(suffix)][::-1] + s[n-len(prefix)-len(suffix):]

The code does not pass the submission test. Only 1 of the 121 tests passed. Ouch! This one seemed promising, but it looks like Claude is not that well suited for programming.

Bard: The AI Bot from Google

Now we move on to Bard. Like many of the others, Bard was recently released. Although it has been mocked in the past, let’s see how it fares with coding tasks. After all, Google is one of the leading tech companies, especially in the field of artificial intelligence.

Here is Bard’s code for solving the problem:

    def shortestPalindrome(self, s):
        lps = [0] * len(s)
        for i in range(1, len(s)):
            j = lps[i - 1]
            while j > 0 and s[i] != s[j]:
                j = lps[j - 1]
            if s[i] == s[j]:
                j += 1
            lps[i] = j
        palindrome = ""
        i = len(s) - 1
        while i >= 0:
            if lps[i] == 0:
                palindrome = s[i] + palindrome
            else:
                palindrome = s[i:lps[i]] + palindrome
            i = lps[i] - 1
        return palindrome

To start off, I had to manually insert the “self” argument into the function, since Bard didn’t include it.
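The missing “self” matters because Leetcode’s Python template defines the solution as a method on a `Solution` class, so the function is called on an instance. A minimal sketch of the difference follows. The class names `WithoutSelf` and `WithSelf` and the helper `call_fails` are hypothetical and exist only for illustration, and the method bodies are placeholders, not Bard’s logic.

```python
class WithoutSelf:
    # Hypothetical stand-in for a method pasted in without the instance parameter.
    def shortestPalindrome(s):
        return s


class WithSelf:
    # Same placeholder body, with `self` added so it can be called as a method.
    def shortestPalindrome(self, s):
        return s


def call_fails(solution, text):
    """Return True if calling the method on an instance raises TypeError."""
    try:
        solution.shortestPalindrome(text)
    except TypeError:
        # Python passes the instance as the first argument, so a method
        # declared with one parameter receives two and is rejected.
        return True
    return False


print(call_fails(WithoutSelf(), "abcd"))  # True
print(call_fails(WithSelf(), "abcd"))     # False
```

Adding the parameter only makes the function callable in that setting. It does not change what the function computes.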
From the result of the test, Bard’s code did not pass the submission test, passing only 2 of the 121 test cases. An unfortunate result, but it’s safe to say that, for now, Bard isn’t much of a coding expert.

GitHub Co-Pilot

Although it is not technically similar to the others, we thought we’d include Co-Pilot on this list to see how it stacks up against the rest. Co-Pilot is the most popular coding assistant as of now and is being used by over a million developers.

I will start by creating a comment in VS Code and asking it to solve the same Leetcode question. Here is the code Co-Pilot returned:

    def shortestPalindrome(self, s: str) -> str:
        if not s:
            return s
        n = len(s)
        for i in range(n-1, -1, -1):
            if s[:i+1] == s[:i+1][::-1]:
                return s[i+1:][::-1] + s
        return s

This passes all the tests. It scored better than 30% of submissions on runtime and 37% on memory.

Conclusion

Overall, it appears that Bing offers the most effective coding assistance in the group. Interestingly, both Bing and Co-Pilot are powered by GPT under the hood. I provided all the AIs with the same prompt, "Solve Leetcode 214. Shortest Palindrome." While I could have asked follow-up questions, I chose to consider only the initial response.

It is unclear whether any of these models were pre-trained on Leetcode data. I tested these models because, as of April 2023, they seem to be the most effective at programming. Although there are some open source models such as Alpaca, Llama, Vicuna, and GPT-J, none of them has yet matched the effectiveness of the closed source models.

What are your thoughts? Which programming model have you found to be the most effective, and what have you discovered to be the most effective way to prompt it?