What We Think of 2023 Q4 AI News

Cognition

What a crazy Q4 it has been for gen AI, specifically in the code assistant space. First, there was GitHub Universe, where GitHub rebranded itself as an AI company and announced a bunch of upcoming features for Copilot. Then the whole OpenAI saga highlighted the fragility of any product in this space that relies on third-party foundation models. And then AWS rolled out Q and a bunch of other features for Amazon CodeWhisperer at AWS re:Invent.

So what do we think about everything?

Honestly, we weren’t going to post anything, but we get some form of this question multiple times a day, so it would be nice to have something to point people to. So, let us walk through each one.

GitHub Universe

This one was jam-packed with announcements, but the quick tl;dr is that none of them really changed our roadmap or how we think about the space. If anything, some of the announcements, such as full repository context awareness if your code is on GitHub SaaS or some of the Chat UX announcements, reinforce that our competitors see value in the functionality we have uniquely built and deployed, and are playing a bit of catch-up.

That being said, there were some nuances in the announcements that perhaps should get more attention:

  • Going outside of the IDE: The demos with integrations into the PR process look really slick. Our goal with Codeium has always been to address the entire software development life cycle, so accelerating the PR process has always been on our radar, but as with everything we do, we are mindful of the balance between potential value and the value we can actually productionize. It was a bit amusing to see Copilot announce features like PR summaries, which were first announced as part of Copilot X this past March and still have not been deployed to production. We have tried these features, and they didn’t work consistently enough to trust. So we are waiting to see whether the announcements this time around are just hype generators like their Copilot Next announcement earlier this year, or whether they have materially improved the product.
  • Doubling down on GitHub SaaS: It seems like all of Copilot’s roadmap is tied to having your code in GitHub SaaS for source code management, not even GitHub self-hosted! An SCM migration is a very painful process for any moderately sized company, which is why we make sure Codeium is SCM-agnostic. With Codeium, you get this context awareness, PR acceleration, and more without having to swap out your SCM, which by and large is a commoditized technology. We are also totally fine if you have multiple SCMs, which is almost always the case for a large organization that has gone through acquisitions or simply does not want vendor lock-in. And GitHub SaaS? Given some of its track record, that is a whole lot of security being thrown away.
  • Everything is instructive: We think this one went under the radar. Not a single announcement addressed the quality or performance of Copilot’s Autocomplete, which is its bread and butter and the number one value-driving feature for any code assistant today because of its passive nature (developers get thousands of suggestions a day, as opposed to tens of chats). Why the silence? Sure, chat-based instructive UXs look better in demos, but we actually think this comes down to cost and margin. It was recently reported that Copilot spends $20-40/mo serving a developer on GitHub, with the bulk of that cost coming from Autocomplete. That is not a good margin, and it can be fixed either by raising prices (which they did: say hello to $39/user/mo) or by reducing costs, which is what we think this shift is about. We believe Copilot will try to move more user behavior to instructive UXs, not because that is actually better for end users, but because it improves margins. At Codeium, we do the opposite. We have deep expertise in ML infrastructure, which is why we don’t lose a whole lot of money giving our product out for free, and we want to push toward more and better passive interactions that actually drive value for our users.

At the end of the day, some of this is hypothesis, and we will wait a few more months to see what actually pans out. Definitely interesting though!

OpenAI Saga

Quite honestly, does anyone really know what happened? If so, please let us know. That was quite the rollercoaster of a week, but we are happy that it seems like things are (mostly) back to normal. It is such an exciting time for AI, and OpenAI crumbling, for whatever reason, would have definitely dampened the vibes around the AI world.

For us at Codeium personally, we are just curious to see whether this affects GitHub Copilot in any way. For those unaware, GitHub has historically not trained its own models; it has relied on OpenAI for them. Not being vertically integrated is why their pace of development has lagged behind that of many other players in the space, including us (Copilot launched 2 years before Codeium even started, and yet Codeium launched Chat 6 months before Copilot: that’s how quickly it flipped). That’s all we have to say on the OpenAI saga.

AWS re:Invent, CodeWhisperer, and Q

It felt like 80% of AWS re:Invent this year was about generative AI. How much a year changes the conversation! And of course, Amazon CodeWhisperer was a big part of it. Just like with GitHub, a big announcement was better context awareness for CodeWhisperer in the form of a new retrieval-like system dubbed Q. Again, this is another vote of confidence in our product direction and development so far. We reviewed CodeWhisperer when it came out of beta many months ago, and the results left a lot to be desired. So, given that these announcements were made on the biggest of stages, we decided to run it back. We downloaded Q and were immediately unimpressed:

The most shocking thing is how few suggestions CodeWhisperer gives now. Perhaps in an effort to juice their acceptance rate numbers, they added a bunch of filters on when to even attempt a suggestion, and that led to a very frustrating experience: we found ourselves waiting for a suggestion in places where we would expect one, only to get no dice.
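To see why filtering can inflate an acceptance rate without helping developers, here is a back-of-envelope sketch. Acceptance rate is accepted suggestions divided by shown suggestions, so suppressing suggestions can raise the ratio while lowering how many suggestions a developer actually accepts. The numbers below are purely hypothetical, chosen for illustration; they are not measurements of CodeWhisperer or any other tool.

```python
from dataclasses import dataclass


@dataclass
class SuggestionStats:
    shown: int     # suggestions displayed to the developer
    accepted: int  # suggestions the developer kept

    @property
    def acceptance_rate(self) -> float:
        # Fraction of displayed suggestions that were accepted.
        return self.accepted / self.shown if self.shown else 0.0


# Hypothetical numbers for one developer-day, for illustration only.
unfiltered = SuggestionStats(shown=1000, accepted=250)
# Hypothetical aggressive filter: hides most low-confidence suggestions,
# but also some that the developer would have accepted.
filtered = SuggestionStats(shown=300, accepted=120)

for name, stats in [("unfiltered", unfiltered), ("filtered", filtered)]:
    print(f"{name}: rate={stats.acceptance_rate:.0%}, accepted={stats.accepted}")
# unfiltered: rate=25%, accepted=250
# filtered: rate=40%, accepted=120
```

In this made-up scenario the filtered tool "wins" on acceptance rate while the developer accepts less than half as many suggestions, which is why we care more about total value delivered than about the ratio.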

Let’s see if they have fill-in-the-middle yet:

Kind of? It seems to recognize that we have the GCD algorithm, but it doesn’t actually do “filling in” and just generates the code all over again. Perhaps some prompt engineering to put the code following the cursor into the context would help.
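For readers unfamiliar with fill-in-the-middle (FIM): a FIM-trained model sees both the code before the cursor (the prefix) and the code after it (the suffix), and is asked to generate only the missing middle. Below is a minimal sketch of how such a prompt is commonly assembled in prefix-suffix-middle order. The sentinel strings `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` follow the convention of some open models such as StarCoder; they are an assumption here, not CodeWhisperer’s or Codeium’s actual prompt format, and the `lcm`/`gcd` file is a hypothetical stand-in for the example we tried.

```python
# Assumed StarCoder-style sentinel tokens; other models use different ones.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(document: str, cursor: int) -> str:
    """Split the document at the cursor and build a prefix-suffix-middle prompt.

    The model is expected to continue after FIM_MIDDLE with only the text
    that belongs between the prefix and the suffix.
    """
    prefix, suffix = document[:cursor], document[cursor:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def prefix_only_prompt(document: str, cursor: int) -> str:
    """What a left-to-right-only completion sees: nothing after the cursor."""
    return document[:cursor]


# Hypothetical file: the cursor sits in the body of lcm(), and gcd() is
# defined *below* the cursor, so it is only visible through the suffix.
before_cursor = "def lcm(a, b):\n    "
after_cursor = "\n\ndef gcd(a, b):\n    while b:\n        a, b = b, a % b\n    return a\n"
doc = before_cursor + after_cursor
cursor = len(before_cursor)

prompt = build_fim_prompt(doc, cursor)
print("def gcd" in prompt)                           # True
print("def gcd" in prefix_only_prompt(doc, cursor))  # False
```

A model given only the prefix has no way of knowing `gcd` already exists, so regenerating it is a natural failure mode; with the suffix in context, the ideal middle is a single line such as `return a * b // gcd(a, b)`.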

So maybe the focus with Q was purely on context retrieval, and we shouldn’t expect model improvements for the code modality. Let’s just ask Q to explain a function in the codebase, the simplest of code retrieval tasks:

OK, so context awareness doesn’t work either. If you need to have the file open, that already reduces the value of context awareness considerably. And it turns out that even with the file open, Q only has access to the active tab:

If we are supposed to use Q differently, we would love to know, but it definitely doesn’t work the way we would intuitively expect, given our experience with other tools such as Codeium, Copilot, or Cody, which offer varying degrees of sophistication in context awareness.
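To make the distinction concrete, here is a toy sketch contrasting context drawn from the active tab alone with context retrieved from the whole repository. It uses naive regex matching for a `def` line over an in-memory dict of files; the file names, contents, and helper functions are all hypothetical, and this is not how Q, Codeium, Copilot, or Cody actually retrieve context (real systems are far more sophisticated).

```python
import re
from typing import Dict, Optional

# Hypothetical repository: file path -> contents.
REPO: Dict[str, str] = {
    "billing/invoice.py": "def total_due(items):\n    return sum(apply_tax(i.price) for i in items)\n",
    "billing/tax.py": "TAX_RATE = 0.08\n\ndef apply_tax(amount):\n    return amount * (1 + TAX_RATE)\n",
    "app/main.py": "from billing.invoice import total_due\n",
}


def find_definition(name: str, files: Dict[str, str]) -> Optional[str]:
    """Return the path of the first file that defines function `name`, if any."""
    pattern = re.compile(rf"^def {re.escape(name)}\(", re.MULTILINE)
    for path, text in files.items():
        if pattern.search(text):
            return path
    return None


def active_tab_context(active: str, name: str) -> Optional[str]:
    # Only the currently open file is searchable.
    return find_definition(name, {active: REPO[active]})


def repo_context(name: str) -> Optional[str]:
    # Every file in the repository is searchable.
    return find_definition(name, REPO)


# "Explain apply_tax" while app/main.py is the active tab:
print(active_tab_context("app/main.py", "apply_tax"))  # None
print(repo_context("apply_tax"))                       # billing/tax.py
```

Even this crude repository-wide lookup finds the definition that an active-tab-only view misses, which is the behavior we intuitively expected from a context-aware assistant.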

It seems Amazon mainly views generative AI as a way to incentivize more AWS consumption, doubling down on messaging that Q is good for infrastructure-as-code, rather than as a way to build the best tools for developers. This is doubly interesting since tools such as Codeium are also trained on lots of public infrastructure-as-code data, a good fraction of which is AWS code, so it is unclear how much more valuable Q will be for this relatively narrow application.

Conclusion

So those are our thoughts. Lots of announcements, some more head-scratching than others, but at the end of the day, we will see how things play out. We always keep tabs on how the space is evolving in order to learn and react, but nothing here changes our vision: get the best AI to every developer, no matter where they code, what they code, and where their code is.