What is Percentage of Code Written (PCW)?
All of our customers, regardless of scale or industry, want to understand whether their teams are getting more value from AI tools over time, whether through investment in education, product improvements, or simply growing familiarity with the tool.
This is why we created our Percentage of Code Written (PCW) metric. At a very high level, it measures the percentage of code committed to the codebase that can be attributed to Windsurf’s AI results. More on how we compute PCW later.
Unlike PCW, almost every metric reported by most AI coding tools can be easily gamed by the creators of the tool. For example, acceptance rates can be inflated by providing shorter, more frequent suggestions, or by providing suggestions only when confidence is extremely high, at the expense of suggestions that are still quite likely to be valuable. In a previous blog post, we talked a bit about how optimizing for metrics like this can lead to decisions that do not actually optimize for end-user performance.
The primary goal of PCW was to solve this trust problem: to have a metric that our customers can trust as a good directional proxy for productivity, and that we at Windsurf can only improve by improving the product, not by any fudging. If PCW goes up, customers should be able to trust that they are getting more value (and vice versa if PCW goes down). To be explicit, the goal is not for it to be a good absolute proxy for developer productivity. Developer productivity eludes everyone as a metric and has no clear definition. Each of our customers has its own way of measuring developer productivity that is trusted by its senior leadership. We have only general guidance, not an exact formula, on how the absolute value of PCW maps to absolute developer productivity gains.
Computing PCW
To compute PCW, we take the number of new, persisted bytes of code that can be attributed to an accepted AI result from Windsurf (i.e. Tab suggestion, Command generation, or Cascade edit) and the number of new, persisted bytes of code that can be attributed to the developer manually typing. For simplicity, call the former (bytes attributed to Windsurf) W and the latter (bytes attributed to the developer) D. PCW is simply (100 * W) / (W + D).
We take these measurements whenever a commit is made. This way, if the AI added a lot of code but the developer deleted much of it before committing, we do not incorrectly inflate W. Similarly, any bytes of code that come from the developer manually editing an AI result are attributed to the developer (D), not to Windsurf. It should be clear that we at Windsurf cannot really game this. If we provide fewer results, W goes down. If the results are shorter, W goes down relative to D. If the results are worse, fewer of them are accepted and persisted, so W will not be as high.
To be explicit, there are a few kinds of code changes that we choose not to attribute to either Windsurf or the developer, so they contribute to neither W nor D. For example, large copy-pastes and file moves are not attributed to either.
As some historical context: because of a lack of instrumentation on our end, we used to count only bytes added from accepted Tab suggestions when tracking W, and did not attribute accepted Cascade edits to either Windsurf or the developer. As of Wave 10 (June 12, 2025), W now includes code generated by Cascade. This naturally caused PCW values to increase dramatically overnight, since Cascade generates a lot of accepted and persisted code (and adds a lot of value).
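As a rough illustration of the formula and attribution rules above, the following Python sketch computes PCW from a list of changes measured at commit time. The `Change` record, the source labels, and the byte counts are all assumptions made for this example; they are not Windsurf's actual instrumentation or data model.

```python
from dataclasses import dataclass

# Hypothetical source labels, invented for this sketch.
AI = "ai"            # accepted Tab suggestion, Command generation, or Cascade edit
DEVELOPER = "dev"    # typed manually, including manual edits to an AI result
UNATTRIBUTED = "na"  # e.g. a large copy-paste or file move; counts toward neither


@dataclass(frozen=True)
class Change:
    source: str
    persisted_bytes: int  # new bytes still present at commit time


def pcw(changes):
    """Return PCW = (100 * W) / (W + D), or None if nothing was attributed."""
    w = sum(c.persisted_bytes for c in changes if c.source == AI)
    d = sum(c.persisted_bytes for c in changes if c.source == DEVELOPER)
    if w + d == 0:
        return None
    return 100 * w / (w + d)


# Hypothetical commit: only bytes that survived to the commit are counted.
commit = [
    Change(AI, 900),             # AI-generated code that persisted
    Change(DEVELOPER, 100),      # hand-typed code, including fixes to AI output
    Change(UNATTRIBUTED, 5000),  # pasted file, ignored by the metric
]
print(pcw(commit))  # 90.0
```

Note that the unattributed paste does not move the result in either direction, and that deleting AI output before the commit would simply lower `persisted_bytes` for the AI entries.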
Interpreting PCW
PCW is meant to be viewed and analyzed over a longer period of time (on the order of weeks, as opposed to hours or days). When users first start using Windsurf, they often spend most of their time with Cascade as they learn the potential and limitations of the AI system, so PCW numbers are likely to be larger in the beginning before flattening back out. That being said, customers should expect PCW values of 85%+, often 95%+. This is not a hallucination and is accurate given how we compute this metric, though there are a number of caveats that we will cover later in this section.
It is important to stress that this is indeed much higher than the 30-50% of code coming from AI quoted by competitors. The short answer is that this highlights the sheer value difference between agentic and non-agentic systems. At Windsurf, our overall internal PCW value is 94%. Broken down by agentic vs non-agentic experiences, our PCW on the Windsurf Editor (agentic) is 95%, on the JetBrains Plugin (agentic) is 94%, and on all other plugins (non-agentic) is 41%. As you can tell, we just use the agentic platforms a lot more than the non-agentic platforms at this point. The 30-50% range generally covers what most of our customers saw for PCW before we incorporated Cascade edits, which are the agentic contributions, into the metric.
At the same time, there are a lot of caveats on what PCW means and what it captures, and we want to be explicit about them. Again, PCW is meant to be an accurate directional proxy for value, as opposed to an absolute proxy for value. Said another way, two organizations might have similar PCW values but observe different uplifts in absolute productivity because of the caveats below. Over a reasonable period of time (a large amount of data), however, both can trust that changes in this metric will be reflected as changes in however they measure productivity.
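One way to read the breakdown above: if the overall figure is pooled over bytes from every surface (an assumption; how the overall value is aggregated is not spelled out here), then it is a byte-weighted average of the per-surface values. The sketch below uses hypothetical byte volumes, not Windsurf's internal data, to show how heavy agentic usage keeps the pooled value near the agentic numbers.

```python
def pooled_pcw(surfaces):
    """Byte-weighted PCW across surfaces.

    `surfaces` is a list of (pcw_percent, attributed_bytes) pairs, where
    attributed_bytes is W + D for that surface. Pooling sum(W) / sum(W + D)
    is equivalent to weighting each surface's PCW by its attributed bytes.
    """
    total = sum(volume for _, volume in surfaces)
    if total == 0:
        return None
    return sum(p * volume for p, volume in surfaces) / total


# Hypothetical volumes chosen for illustration only.
surfaces = [
    (95, 800),  # agentic editor
    (94, 190),  # agentic plugin
    (41, 10),   # non-agentic plugins
]
print(round(pooled_pcw(surfaces), 1))  # 94.3
```

Under these made-up volumes, the non-agentic surface barely affects the pooled value because it contributes so few attributed bytes.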
Some of the caveats include:
- Writing code is not the same as software development. PCW only captures some level of acceleration while writing code, and does not capture time spent on architecture, debugging, review, deployment, and a number of other steps. This is why we have generally heard that the overall productivity lift (by PR cycle time, story points, or whatever the customer measures productivity by) is approximately 50% of the PCW value (e.g. a 90% PCW corresponds to a ~45% actual lift in overall acceleration to shipping software). This also means that even if we entirely automate the process of writing code (unlikely), we will not hit our mission to accelerate software development by 99%. This is why we are expanding our offering to more surfaces.
- The code that the AI produces is much more likely to be boilerplate or “easy” code. The few percent of code written by the developer is much more likely to be core logic that takes a disproportionate amount of time to write, even if the resulting text is only a few characters long. It is very common for the amount of boilerplate code to be one or two orders of magnitude larger than the corresponding core logic, which would translate to 90%+ PCW.
- Adding new code is not always the “right thing” to do. Often, value is driven by deleting old code, but we don’t have a great way of capturing this at the moment. Similarly, PCW does not explicitly add negative weight to AI suggestions that were accepted and then deleted. Since PCW is measured at commit time, this would just be a no-op. We believe that if this happens excessively, users will stop using the tool, and that will be reflected negatively overall.
- We only measure the inputs to PCW at commit time within the currently active session. We currently do not have instrumentation to measure PCW across sessions. For example, it is possible that someone commits a lot of AI-generated code only for some of it to be deleted or modified at a later date. In fact, we do believe that this happens at some rate. Again, if this were frequent, we believe that developers would stop using the tool, as it would only add more work over time, and we have anecdotally heard that this happens very infrequently (tying back to the point that the code coming from AI is more boilerplate than core, complex logic). This also ties back to the point that PCW is a directionally accurate metric, not an absolutely accurate one.
- There is currently no separation of documentation and code when assessing contributions to the inputs to PCW. Windsurf, and especially Cascade, is often used to generate large amounts of documentation, and that would contribute to a higher PCW even if documentation is not technically logic-executing code. We do not believe that this takes away from being able to trust PCW as a directionally accurate metric for value driven by the platform, as long as it is measured over a reasonable period of time.
- As a corollary of the previous points, if individual users are vibe-coding toy projects or internal apps that don’t have a lot of complex core logic requiring manual edits, that can meaningfully skew the PCW number for the organization. We currently don’t have a great way to distinguish where the byte contributions are made, so people testing out the tool on toy repos, especially at the start of trials, can have an impact. This is why we suggest using filters, such as slicing by subteam or filtering by date range.
Overall, we want to be transparent that PCW is a powerful metric that is much more trustworthy than metrics provided by other vendors in this space, but it is not a silver bullet that can be used on its own to justify value and ROI. We recommend using it in conjunction with other signals, such as developer sentiment and qualitative feedback, tracking changes in the value over time, and using our filters to understand how value might vary across parts of your organization. PCW is queryable from our analytics API to make it easy for our customers to combine it with other signals that we would be unable to measure from our tool.
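Two of the caveats above come down to simple arithmetic, sketched here in Python. The function names are hypothetical. The sketch assumes that all boilerplate is attributed to the AI and all core logic to the developer, and it treats the roughly 50% figure as the approximate rule of thumb described above, not an exact formula.

```python
def pcw_from_ratio(ai_to_dev_ratio):
    """PCW when AI-attributed bytes are `ai_to_dev_ratio` times developer bytes.

    With W = r * D, PCW = 100 * W / (W + D) = 100 * r / (r + 1).
    """
    return 100 * ai_to_dev_ratio / (ai_to_dev_ratio + 1)


def approx_overall_lift(pcw_percent, factor=0.5):
    """Rough overall productivity lift implied by the ~50% rule of thumb."""
    return factor * pcw_percent


print(round(pcw_from_ratio(10), 1))   # 90.9  (one order of magnitude more boilerplate)
print(round(pcw_from_ratio(100), 1))  # 99.0  (two orders of magnitude)
print(approx_overall_lift(90))        # 45.0
```

This is why a PCW above 90% is consistent with developers still writing the code that takes most of their time, and why the implied overall lift is much smaller than the PCW value itself.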
Future Work
We have put a lot of engineering work into computing PCW, from character-level attribution to tracking edits and diff views over time. We believe that PCW can be trusted as a strong, directionally accurate metric that will only change if there are real improvements in the product, adoption, or familiarity. We have used this metric internally for over a year, and have been able to correlate changes in PCW with changes in our product over time. All that being said, given the caveats, it should be used as a proxy, not as the end-all metric for productivity, and there is still room for improvement.
As part of our enterprise promise, we will continue to iterate on PCW and other metrics to give our customers the most actionable insights into the value they are getting from the Windsurf platform.