Google Gemini CodeAssist: A Review

Cognition · 4 min read

tl;dr Google’s Gemini CodeAssist still has a way to go before it passes the basic sniff tests for a high-quality code assistant.

While we are always heads down improving Codeium, we constantly review the state of other tools on the market to (a) see if there are learnings that we can incorporate and (b) answer questions from our customers about these tools. We have gotten a number of questions about Google’s Gemini CodeAssist since it was rebranded from Google Duet and relaunched at the Google Cloud Next conference a couple of weeks ago. We had received questions in the past as well, but there was no GA’d product for us to test, and we never write about products that we cannot actually try and review. Those products end up being more hype than reality.

So, with that, let’s dive into CodeAssist!

Let’s start with what is right. For one, there is no explicit bias toward any other product in the developer toolkit. Unlike GitHub Copilot, CodeAssist is not preferentially better on any one source code management tool, and unlike Amazon CodeWhisperer, the sales pitch isn’t the GCP analog of “we are the best at writing AWS code.” On the surface, this seems like it is meant to be a general purpose code assistant, which, while it sounds simple, is not the stance that the other large tech companies have taken. The second good thing is latency. We have talked about latency extensively on this blog because of how critical it is for the product to even be usable, and when CodeAssist does produce a suggestion, it seems fast enough for autocomplete.

That is unfortunately where the positives end.

Sure, we can point to the fact that it is only available on VSCode and JetBrains as a negative today when it comes to availability, but the bigger issues were simply in the quality of the suggestions. We almost immediately hit tokenizer issues where CodeAssist started producing comments in another language:

Improper tokenization leading to tokens in wrong natural language.

This already points to fundamental issues with the system, but let us plow along. One of the basic aspects that make a code assistant different from other LLM applications is that a developer is often editing existing code rather than adding net new code at the end of a file. This means that the system needs to have fill-in-the-middle, or FIM, capabilities to reason about the text after the cursor (even better is inline FIM, but to this day Codeium is the only tool that has this). Let’s just say that the basic FIM sniff test wasn’t perfect.
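For readers less familiar with FIM, here is a minimal sketch of what "reasoning about the text after the cursor" means at the prompt level. It is not how CodeAssist or Codeium builds requests. The sentinel strings and the helper names `split_at_cursor` and `build_fim_prompt` are assumptions made for illustration, and real models define their own special tokens and ordering.

```python
# Hypothetical sentinel strings; real FIM-trained models define their own.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def split_at_cursor(document: str, cursor: int) -> tuple[str, str]:
    """Split the buffer into the text before and after the cursor offset."""
    return document[:cursor], document[cursor:]


def build_fim_prompt(document: str, cursor: int) -> str:
    """Assemble a prefix/suffix/middle prompt so the model sees both sides
    of the cursor and is asked to generate only the missing middle."""
    prefix, suffix = split_at_cursor(document, cursor)
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# The developer's cursor sits right after "return " in existing code.
source = "def area(w, h):\n    return \n\nprint(area(2, 3))\n"
cursor = source.index("return ") + len("return ")
prompt = build_fim_prompt(source, cursor)
```

An assistant that only sends the prefix has no way to know that `print(area(2, 3))` already exists below the cursor, which is the kind of situation the sniff test probes.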

When we continued trying some basic tasks, we noticed failures on some of the earliest edge cases that we ourselves hit over a year ago, like generating parentheses and seamlessly merging them with the existing code. CodeAssist would also just stop generating code altogether, without us pressing any special toggle to disable suggestions. This is frustrating once a developer gets used to using code assistants, because such simple situations are exactly where we would hope to get suggestions to speed us up.
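To illustrate the parentheses edge case, the sketch below shows one simple way to merge a suggestion with the code that already follows the cursor: drop the tail of the suggestion if it repeats the start of the existing suffix. This is an assumed heuristic for illustration only, not a description of any product's implementation. The function name `trim_suffix_overlap` and the sample model output are hypothetical.

```python
def trim_suffix_overlap(completion: str, suffix: str) -> str:
    """Drop the tail of `completion` that duplicates the start of `suffix`.

    Checks the longest possible overlap first and returns the completion
    unchanged when there is no overlap at all.
    """
    max_len = min(len(completion), len(suffix))
    for size in range(max_len, 0, -1):
        if completion.endswith(suffix[:size]):
            return completion[:-size]
    return completion


prefix = "total = sum("
suffix = ")\nprint(total)"
raw_completion = "values)"  # hypothetical model output repeating the ")"

merged = prefix + trim_suffix_overlap(raw_completion, suffix) + suffix
print(merged)
```

Without the trimming step, the merged line would read `total = sum(values))`, which is syntactically incorrect. A purely textual heuristic like this can also trim too much, for example when the suggestion legitimately needs its own closing parenthesis for a nested call, so a production system would need more than string matching.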

These poor experiences just kept accumulating: an inability to generate code when expected, syntactically incorrect code, and so on. CodeAssist even has a (likely larger) model that you can actively request instead of relying on the passive autocomplete model, but that has even worse FIM capabilities. All of these are shown in a single session:

Ok, so what about Chat? The most important thing for a code assistant Chat is how well it can reason about the context active in the editor and in checked-out repositories, because otherwise it is no better than a ChatGPT chatbot put into the IDE. While we often point out how GitHub Copilot only looks at the context of the open files, CodeAssist seems to join Amazon CodeWhisperer in only looking at the more limited context of the currently active file (not even other open tabs).
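The difference between these context scopes can be sketched in a few lines. Everything here is hypothetical: `ContextScope`, `EditorState`, `candidate_files`, and the file names are invented for illustration and do not reflect how any of the named products gathers context.

```python
from dataclasses import dataclass, field
from enum import Enum


class ContextScope(Enum):
    ACTIVE_FILE = 1   # only the file currently in focus
    OPEN_TABS = 2     # the active file plus other open tabs
    REPOSITORY = 3    # everything in the checked-out repository


@dataclass
class EditorState:
    """Hypothetical snapshot of the IDE at the time of a chat request."""
    active_file: str
    open_tabs: list[str]
    repo_files: dict[str, str] = field(default_factory=dict)  # path -> contents


def candidate_files(state: EditorState, scope: ContextScope) -> list[str]:
    """Return the paths a chat assistant could draw on at the given scope."""
    paths = [state.active_file]
    if scope in (ContextScope.OPEN_TABS, ContextScope.REPOSITORY):
        paths += [p for p in state.open_tabs if p not in paths]
    if scope is ContextScope.REPOSITORY:
        paths += [p for p in sorted(state.repo_files) if p not in paths]
    return paths


state = EditorState(
    active_file="app/views.py",
    open_tabs=["app/views.py", "app/models.py"],
    repo_files={"app/views.py": "", "app/models.py": "", "app/utils/auth.py": ""},
)

for scope in ContextScope:
    print(scope.name, candidate_files(state, scope))
```

With the active-file scope, a question about a helper defined in `app/utils/auth.py` can only be answered from whatever `app/views.py` happens to show, which is why a narrower scope makes for a less useful Chat.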

So, in conclusion, we take seriously anything coming from Google in the AI space, but CodeAssist seems like a very nascent product today that is still far behind tools like Codeium and GitHub Copilot. It is very reminiscent of where Amazon CodeWhisperer was after the AWS re:Invent announcements: there was a lot of buzz for a month or two after re:Invent, and then that died down as well. However, unlike CodeWhisperer, which was a focus at AWS re:Invent, CodeAssist was simply not a big focus of Google Cloud Next. That raises some questions about how seriously Google is taking this particular application of GenAI, as compared to more generic chatbot systems or integrations into their existing product suite.