We’ll start with a hot take: developer productivity should not be the ultimate goal of AI code assistants. Yes, that is how AI code assistants are framed today, and yes, it is a real and measurable benefit, but a narrow focus on “developer productivity” suggests that tools like Codeium can only make existing work go faster. In reality, we believe AI has the potential to make possible what was previously unachievable. “Dream bigger” has always been our mission.
However, dreaming bigger may not happen within the constraints of tools and processes built for a world without AI. As the frontier of AI capability charges forward, developers will need a new set of tools that can unleash innovation.
That’s where the Windsurf Editor comes in.
The Story Pre-Windsurf
Many people today know us as a company that has built AI-powered extensions into existing IDEs. If dreaming bigger and removing constraints were always the goals, why did we build extensions?
Well, simply put, extensions were enough to expose the frontier of what AI was capable of doing. As a company, we are no strangers to diving into new technical challenges when the need is clear. We were an ML infrastructure company before Codeium, but we extended that infrastructure into a fully vertical application once it was clear that we could drive outcomes from AI even further by delivering the end application ourselves. Our first versions of autocomplete used off-the-shelf models, but we shifted to training our own proprietary LLMs once it was clear that tasks like fill-in-the-middle and context-aware pretraining would be key to producing the best-quality results.
In contrast, we did not feel that the IDE needed to be reinvented to give a fantastic user experience.
That all changed when a few different advancements happened in quick succession:
- Riptide: This was actually covered by the press in August as Cortex, but we renamed it Riptide to be more fitting (and aquatic!). The core problem is semantic search on enterprise-scale codebases. Embedding-based systems naturally start to break down on larger and larger corpora because of a guaranteed non-zero false positive rate: embeddings can never capture the full nuance of the original raw text. This is even more problematic when semantic search is part of a RAG-like pipeline for LLM inference, since a larger absolute number of false positives corrupts the input context and causes hallucinations and poor results. Our approach with Riptide was to train a proprietary LLM that is really good at answering the question “how relevant is this snippet of code to this particular input query?” and then parallelize thousands of inference calls to this LLM for different code snippets across hundreds of GPUs at burst capacity. This outperforms state-of-the-art embedding search methods by 300% on accuracy, because we never switch into an embedding space and instead leverage vast amounts of compute to keep the same low-latency experience.
- Trajectory understanding: We always knew that the next level of personalization would be user-level personalization. While systems like our context awareness engine personalize the system to a particular organization, each user has their own preferences and intent that should be leveraged for better results. A quick example: if one developer were to watch another navigate some files, execute a couple of terminal commands, and make a couple of small edits, there is a very high chance the observer would know exactly what the active developer is trying to accomplish. The active developer would not have to turn to the observer and spell out their intent word for word, yet that is exactly what was happening with AI tools: a developer had to spell out the task at hand for the AI even though all of the information needed to decipher the intent already existed, even if it was not recorded. We started building many internal systems that could understand a user’s trajectory of actions within the IDE in order to derive intent, and this became a brand-new core competency.
- Model improvements: Unlike the other two, this was not internal. Simply put, models like Claude 3.5 Sonnet and GPT-4o rapidly got much better at reasoning, especially over code. Our model philosophy has always been about optionality: we use whatever model is best for each workload, whether it is trained internally or externally, because we are neither a model company nor a trivial model wrapper. We knew we needed a better general-purpose reasoning model to unlock the tool-calling and iterative behaviors we were hoping to achieve with a collaborative agentic system, and these models complemented the other proprietary models that would be required to pull this off.
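To make the Riptide idea concrete, here is a minimal sketch of the retrieval pattern it describes: instead of comparing embedding vectors, score every candidate snippet directly against the query and fan those scoring calls out in parallel. The function names are hypothetical, and `score_relevance` is a stand-in (a simple token-overlap heuristic) for the proprietary relevance LLM, whose interface has not been published. A real deployment would batch model calls across GPUs rather than use a local thread pool; the sketch only shows the shape of the pipeline and does not reproduce any of the accuracy figures above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def score_relevance(query: str, snippet: str) -> float:
    """Hypothetical stand-in for one LLM relevance call.

    A real system would ask a trained model "how relevant is this snippet
    to this query?"; here we use token overlap so the sketch runs offline.
    """
    q_tokens = set(query.lower().split())
    s_tokens = set(snippet.lower().split())
    if not q_tokens:
        return 0.0
    return len(q_tokens & s_tokens) / len(q_tokens)


def rank_snippets(query: str, snippets: List[str], top_k: int = 3,
                  max_workers: int = 8) -> List[Tuple[float, str]]:
    """Score every snippet in parallel and return the top_k by relevance."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One independent scoring call per snippet; no embedding space involved.
        scores = list(pool.map(lambda s: score_relevance(query, s), snippets))
    ranked = sorted(zip(scores, snippets), key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    corpus = [
        "def parse_config(path): load yaml config file",
        "class HttpClient: send http request with retry",
        "def retry(fn, attempts): retry a failing call",
    ]
    for score, snippet in rank_snippets("retry http request", corpus, top_k=2):
        print(f"{score:.2f}  {snippet}")
```

Because each snippet is scored independently, the work is embarrassingly parallel: adding compute shortens latency without changing the ranking logic, which is the tradeoff the Riptide description leans on.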
Combining all of these with our industry-leading context awareness engine for knowledge understanding and our expertise in ML infrastructure for latency optimization, we realized we could create a whole new way of working with AI: a paradigm that combined the collaborative nature of copilots with the independent power of an agent, which we later called an AI flow.
And for the first time, we felt that existing IDEs were truly constraining, both in the UX available to expose these flows and in the information we could get from the IDE. The time was right to rethink what an editor should look like, and that turned into the Windsurf Editor, powered by Cascade, the first-ever generally accessible flow-like collaborative agent.
Making of Windsurf
Let’s start with the tactical. Just like the previous frontier of AI capability did not require a custom IDE, today’s frontier of capability with AI flows did not require us to completely reimagine everything about an IDE. This is why we decided to take the somewhat-memed approach of forking Visual Studio Code; this massively increased the flexibility in UX by not being constrained by the APIs for VS Code extensions, while simultaneously not having to rebuild all of the core internals of an IDE.
With that out of the way, we started questioning everything about what an editor needed to have and not have if you had this AI flow that could search, traverse, and analyze code as well as make multi-file edits and execute terminal commands.
The first decision was to remove Chat entirely. “ChatGPT in a panel in the IDE” has been a staple of every code assistant, whether an extension or an AI-native IDE. It is an intuitive interface, familiar from ChatGPT and other conversational AI bots. However, we believed Cascade was a strict improvement over Chat: a system that can iterate on itself can answer strictly broader questions and take on strictly larger-scoped tasks than an AI whose capacity is capped by a single LLM inference (no matter how good the context is). Axing the most popular modality in AI code assistants seemed obvious once it was suggested internally, but it fundamentally changes what an AI-native IDE feels like.
Then there were user experience decisions. If there are multi-file edits that we want a human to review, a simple “tab to accept” isn’t enough. We realized we had to bring a true code review flow into the editor, which materialized as a simple bar at the bottom of the text editor that lets the developer jump through suggested diff blocks within and across files, accepting and rejecting them quickly.
Cascade review bar
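As an illustration of the review flow described above, here is a minimal sketch of the navigation model such a bar implies: suggested diff blocks ordered by file and position, with accept and reject moving the developer to the next undecided block, across file boundaries. `Hunk` and `ReviewQueue` are hypothetical names chosen for this sketch; they are not Windsurf's internals, and a real editor would also apply or discard the text edits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hunk:
    """One suggested diff block (hypothetical model, not Windsurf's API)."""
    path: str
    start_line: int
    new_text: str
    status: str = "pending"  # "pending", "accepted", or "rejected"


class ReviewQueue:
    """Steps through pending hunks within and across files, like a review bar."""

    def __init__(self, hunks: List[Hunk]) -> None:
        # Order by file, then position, so review finishes a file before moving on.
        self.hunks = sorted(hunks, key=lambda h: (h.path, h.start_line))
        self.index = 0

    def _next_pending(self, start: int) -> Optional[int]:
        """Index of the first pending hunk at or after start, wrapping around."""
        n = len(self.hunks)
        for offset in range(n):
            i = (start + offset) % n
            if self.hunks[i].status == "pending":
                return i
        return None

    def current(self) -> Optional[Hunk]:
        """The hunk the bar is focused on, or None when review is complete."""
        i = self._next_pending(self.index)
        if i is None:
            return None
        self.index = i
        return self.hunks[i]

    def accept(self) -> None:
        hunk = self.current()
        if hunk is not None:
            hunk.status = "accepted"

    def reject(self) -> None:
        hunk = self.current()
        if hunk is not None:
            hunk.status = "rejected"


if __name__ == "__main__":
    queue = ReviewQueue([
        Hunk("src/b.py", 10, "return cached"),
        Hunk("src/a.py", 40, "log.info(msg)"),
        Hunk("src/a.py", 5, "import logging"),
    ])
    queue.accept()  # src/a.py line 5
    queue.reject()  # src/a.py line 40
    queue.accept()  # src/b.py line 10 (crosses into the next file)
    print([(h.path, h.start_line, h.status) for h in queue.hunks])
```

Keeping the decision state on each hunk, rather than removing hunks from a list, makes it easy to show a summary of what was accepted and rejected once the queue is empty.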
We also architected the system under the hood, with the proper abstractions, language servers, and more, so that we aren’t constrained to the Windsurf Editor. While we might not be able to match the same UX snappiness and polish in other IDEs via an extension, those IDEs have powerful features that the Windsurf Editor does not have yet (e.g. Java debuggers in IntelliJ), so we should be thinking about how to bring as much of this frontier of AI possibility to them as well. Our Codeium extensions are not going anywhere, and we will continue to push them forward. For our enterprise customers, we want to have both the best AI-native IDE and the best AI-powered extensions so that they can get the maximum value from AI no matter where they are in their AI transformation journey.
These were just a couple of the many design decisions that went into reimagining the IDE. With a lot of A/B testing, user feedback, and our own personal experiences with the Windsurf Editor, we were confident that we had a product that would make every individual question whether they could be more capable than they initially thought they were.
How Windsurf Changes Everything
Again, our goal is not to just make developers more efficient - in our mind, that’s table stakes for these tools. The AI code assistant space has been around long enough and is saturated enough that if a tool is not driving developer productivity, then it isn’t really in the conversation. The real goal is to allow anyone to do things they could not do before.
In just a little over a month, we are already seeing how Windsurf is changing the calculus, both in tactical development and philosophically what software teams will look like.
Expanding the scope of every employee. Silos between teams are starting to break down. We have seen developers contribute to systems outside their scope that they previously would have been reluctant to touch because of unfamiliarity. And this has extended past engineering: product managers, analysts, architects, designers, and more have felt empowered to use Cascade to reason about relevant code or to write code or queries themselves. Where they previously would have filed a ticket for another team or waited for engineering resources, they can now take action on the production codebases directly, something they could not have done before.
Renewed focus on business-advancing, as opposed to business-maintaining, work. We are already seeing massive amounts of unit tests, migrations, and other boilerplate work being completed almost entirely by Cascade. What happens next when developers are in the flow state solving meaningful problems for the business, as opposed to the uninteresting work that has normally “come with the job”? We are just about to find out, but we can see a world where organizations can confidently trust that they have a culture of technological excellence that serves as a strong foundation for all future work.
Recalibration of the build vs. buy tradeoff. It is no secret that companies spend a lot of money on expensive B2B SaaS software, and it is also no secret that most companies don’t need all of the features each B2B SaaS platform provides. Historically, this has simply been accepted as the cost of buying rather than paying many engineers to build and maintain an internal alternative, work that has no connection to the business itself. But what if you don’t need to pay a lot of engineers? What if even non-developers can spin up one-off applications in a matter of minutes to solve their particular in-the-moment needs? These applications don’t need to be maintained by an engineer and would save both time and money. Already, almost every non-developer at Codeium has built an internal app that solves their particular needs, whether it be a quoting tool, trivia app, survey tool, recruiting dashboard automation, or partner management system. In some ways, this was the premise of the whole low-code/no-code movement, but Windsurf has flipped that paradigm on its head. In this world, it is all-code, but low-effort/no-effort. Instead of bringing the technology down to the software development skill level of non-developers, we are simply redefining what non-developers are skilled enough to do.
Jeff from business realized that he might not need to spend a lot of money on a quoting tool… Just built it himself!
None of these are what anyone would classify as standard “developer productivity,” and they may be hard to quantify, but the anecdotes have been so clear and widespread that it is undeniable a collaborative agentic system like Cascade within the Windsurf Editor will drive macro-level changes.
So, What’s Next?
Windsurf today looks like an IDE. This was intentional. As discussed earlier, that was all that was needed to expose AI flows, the new frontier of AI’s capability. It is already much more advanced than any other AI developer tool today. But that is just today; the frontier charges on.
We believe Windsurf will become a platform that extends across the entire software development life cycle, with a single “flow” passing through all of the surfaces where development happens. In fact, certain tasks might happen on completely different surfaces than they do today (e.g. does code review still happen in a web browser in the future, or in an editor-like surface? Are tickets annotated in JIRA or directly in a codebase?). Windsurf will also be able to ingest any store of knowledge at a company, not just the existing codebases, and learn from common flow “trajectories” to automatically pick up organizational best practices. AI has the potential to truly tie together work across people, time, data, and tasks, and this will continue to redefine what development teams are able to do, not just how fast they do it.
We aren’t sure yet exactly when any of these advancements will happen, or whether they are even possible, but this is why we stay close to the frontier and keep discovering what AI is capable of at any point in time.
The beauty of a motto like “dream bigger” is that it is never complete. There will always be more that will be possible, so our work is not even close to done. That being said, the Windsurf Editor is yet another large step - redefining the IDE so that you can redefine what you’re capable of.