Why ChatGPT Highlights Copilot's Fundamental Flaw

Anshul, 5 min read

With all the recent AI hype, you may have forgotten about GitHub Copilot’s launch a year and a half ago. It was nuts: an AI product that kinda worked for code had finally brought the power of generative ML to software development. It wasn’t perfect, and it was sometimes more annoying than helpful, but I could not put my finger on whether something was fundamentally wrong with the product or whether I was just unlucky and getting bad or unwanted completions.

François Wouts @fwouts

I find GitHub Copilot more distracting than anything.

It often spits out code that is slightly incorrect so I spend more time figuring out why it doesn’t work than coming up with the correct code myself (even if I have to first make a mistake then correct it, it feels faster).

I could finally express what was off a couple of months ago, when two things happened: ChatGPT came out, and I read a study on Copilot.

I’ll start with the second. This study on how people interacted with Copilot put into words an important distinction in how we code: there are two “modes” in which we work, and what we want from a coding assistant is slightly different in each:

  • Acceleration: The programmer knows what they want to do next, and the assistant just helps them do it faster. Interactions need to be fast so as not to break the flow, and long suggestions are often seen as a hindrance to that flow (even when correct!).
  • Exploration: The programmer is trying to do something unfamiliar, and the assistant is used to explore options and get a starting point. Interactions are slow and deliberate, often involve explicit prompting that isn’t present in the existing context, and produce results that require more validation.

The fundamental flaw with Copilot is that it tries to be the assistant for both acceleration and exploration, and ends up imperfect at both.

When I’m in acceleration mode, the long suggestions that Copilot provides are at best distracting. Slightly worse, my limited human pattern matching accepts a suggestion that seems fine, only for it to contain some minor bug that takes hours to track down. At absolute worst, it becomes much easier to accept code that has a security vulnerability, or a long block that is actually verbatim from the training data. So while the interaction UI is fine, a better solution for acceleration mode is to provide smaller chunks, which would be faster to generate, quicker to reject, and less prone to these human acceptance errors.
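To make the “smaller chunks” idea concrete, here is a minimal sketch in Python. It is purely illustrative: the `Suggestion` type, the confidence score, and both thresholds are hypothetical names and values I made up for this example, not how Copilot or any other real assistant works.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    text: str          # full completion proposed by the model
    confidence: float  # hypothetical score in [0, 1]; an assumption of this sketch


def chunk_for_acceleration(
    suggestion: Suggestion,
    max_lines: int = 3,                # assumed "few lines" limit
    long_ok_confidence: float = 0.95,  # assumed bar for showing a long completion
) -> str:
    """Return the part of a suggestion to show while the programmer is in flow."""
    lines = suggestion.text.splitlines()
    # Short suggestions, or long ones the model is very sure about, pass through.
    if len(lines) <= max_lines or suggestion.confidence >= long_ok_confidence:
        return suggestion.text
    # Otherwise show only the first few lines, which are quicker to read and reject.
    return "\n".join(lines[:max_lines])
```

With these made-up defaults, a five-line suggestion at low confidence is cut to its first three lines, while the same suggestion at very high confidence is shown in full.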

recluse; @cosine_distance

I find the GitHub copilot thing really distracting. Like, it interrupts my own stream of though with a block of code that makes me think “huh, maybe this is what I want” then i take the time to evaluate it in my head and realize it’s not quite right, then have to refocus again

When I’m in exploration mode, I’m stuck trying to prompt Copilot with natural language comments to provide additional context, and I get back blocks of code with little to no explanation. That forces me to open up Google and search, which is antithetical to having a code assistant that lives in your IDE. So what would be a better solution? This is where ChatGPT comes in. And I’m not talking about the underlying models, just the UI. A conversational UI is ideal for exploration because it handles prompting with both explicit and implicit context and goals, and it supports iteration, explanation, and follow-up questions.

Yuri Sagalov @yuris

There’s a real difference between using GitHub Copilot and Chat GPT for programming.

GitHub Copilot definitely makes coding faster, but ChatGPT actually feels like a pair programmer because you can have a discussion with it.

I’ve talked to a lot of developers about which AI tools they use today. Many respond that they haven’t disabled Copilot entirely, but have started using ChatGPT in some situations. With a little more prodding, it becomes clear that they use ChatGPT for exploration and Copilot for acceleration, since the latency and out-of-IDE experience of ChatGPT make it unusable in the latter mode.

Now, is ChatGPT as a model good for exploration? As many a Twitter thread will point out, ChatGPT can be incredibly confident with wrong answers, so there’s a lot of work to be done to improve quality and add validation. But when it comes to UI? There’s something there.

So, wrapping up: GitHub Copilot undoubtedly set the gears in motion for an AI-based revolution in software development, but unless it is fundamentally changed so that it no longer tries to do it all under a single UI, it will remain flawed from a product perspective.


Shameless plug: Here at Codeium, we are building our own AI-powered code acceleration toolkit. We have started with autocomplete, but unlike Copilot, we focus on just the “acceleration” mode, prioritizing speed and reasonably chunked completions (O(few) lines at a time) unless we are incredibly confident in a longer completion (e.g. very standard ways of doing something). We already have thousands of developers using Codeium, with an active community and timely support on Discord, something that has been lacking recently with Copilot.

We have also made this AI-powered autocomplete free forever, so you can get ideal AI-powered assistants for both acceleration and exploration (via ChatGPT) for free. Well, that is, until ChatGPT becomes monetized…