One of Nubank’s most critical, company-wide projects for 2023-2024 was the migration of their core ETL, an 8-year-old monolith with millions of lines of code, to sub-modules. For a refactor this large, their only option was a multi-year effort that distributed repetitive refactoring work across more than one thousand of their engineers. With Devin, however, this changed: engineers were able to delegate their migrations to Devin and achieve a 12x efficiency improvement in engineering hours saved, along with over 20x cost savings. The Data, Collections, and Risk business units, among others, verified and completed their migrations in weeks instead of months or years.
Nubank was born into the tradition of centralized ETL FinServ architectures. For years, the monolith architecture had worked well for Nubank: it enabled the developer autonomy and flexibility that carried them through their hypergrowth phases. After 8 years, however, Nubank’s sheer volume of customer growth, along with geographic and product expansion beyond their original credit card business, had produced an entangled, behemoth ETL with countless cross-dependencies and no clear path to further scaling.
For Nubankers, business critical data transformations started taking increasingly long to run, with chains of dependencies as deep as 70 and insufficient formal agreements on who was responsible for maintaining what. As the company continued to grow, it became clear that the ETL would be a primary bottleneck to scale.
Nubank concluded that there was an urgent need to split their monolithic ETL repository, which had grown to over 6 million lines of code, into smaller, more flexible sub-modules.
Nubank’s code migration was filled with the monotonous, repetitive work that engineers dread. Moving each data class implementation from one architecture to another while tracing imports correctly, performing multiple delicate refactoring steps, and accounting for any number of edge cases was highly tedious, even to do just once or twice. At Nubank’s scale, however, the total migration scope involved more than 1,000 engineers moving ~100,000 data class implementations over an expected timeline of 18 months.
In a world where engineering resources are scarce, such large-scale migrations and modernizations become massively expensive, time-consuming projects that distract from any engineering team’s core mission: building better products for customers. Unfortunately, this is the reality for many of the world’s largest organizations.
At project outset in 2023, Nubank had no choice but to rely on their engineers to perform code changes manually. Migrating one data class was a highly discretionary task, with multiple variations, edge cases, and ad hoc decision-making — far too complex to be scriptable, but high-volume enough to be a significant manual effort.
Within weeks of Devin’s launch, Nubank identified a clear opportunity to accelerate their refactor at a fraction of the engineering hours. Migration or large refactoring tasks are often fantastic projects for Devin: after investing a small, fixed cost to teach Devin how to approach sub-tasks, Devin can go and complete the migration autonomously. A human is kept in the loop just to manage the project and approve Devin’s changes.
A task of this magnitude, with the vast number of variations that it had, was a ripe opportunity for fine-tuning. The Nubank team helped to collect examples of previous migrations their engineers had done manually, some of which were fed to Devin for fine-tuning. The rest were used to create a benchmark evaluation set. Against this evaluation set, we observed a doubling of Devin’s task completion scores after fine-tuning, as well as a 4x improvement in task speed. Roughly 40 minutes per sub-task dropped to 10, which made the whole migration start to look much cheaper and less time-consuming, allowing the company to devote more energy to new business and new value creation instead.
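The quoted per-sub-task times are consistent with the 4x figure (40 / 10 = 4). To see what that means at migration scale, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, one sub-task per data class implementation (~100,000) and that every sub-task takes the average time; neither assumption comes from the source.

```python
def agent_hours(num_subtasks: int, minutes_per_subtask: float) -> float:
    """Total agent time, in hours, for a batch of sub-tasks of equal average length."""
    return num_subtasks * minutes_per_subtask / 60


# Assumption for illustration only: one sub-task per data class implementation.
SUBTASKS = 100_000
before = agent_hours(SUBTASKS, 40)  # ~40 minutes per sub-task before fine-tuning
after = agent_hours(SUBTASKS, 10)   # ~10 minutes per sub-task after fine-tuning
print(f"{before:,.0f} h -> {after:,.0f} h ({before / after:.0f}x faster)")
# prints: 66,667 h -> 16,667 h (4x faster)
```

These are agent hours only; human review time is not included.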
Devin contributed to its own speed improvements by building classical tools and scripts for itself, which it would later use on the most common, mechanical components of the migration. For instance, detecting the country extension of a data class (‘br’, ‘co’, or ‘mx’) from its file path was a multi-step process in each sub-task. Devin’s script turned this into a single executable step, and those savings added up immensely across tens of thousands of sub-tasks.
There is also a compounding advantage to Devin’s learning. In the first weeks, it was common to see outstanding errors to fix, or small things Devin wasn’t sure how to solve. But as Devin saw more examples and gained familiarity with the task, it began to avoid rabbit holes more often and to find faster solutions to previously seen errors and edge cases. Much like with a human engineer, we observed clear speed and reliability improvements with every day Devin worked on the migration.
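Devin’s actual script is not published. A minimal sketch of the same idea follows, assuming (hypothetically) that the country code appears as one directory segment of the data class’s path; the example path is invented.

```python
from pathlib import PurePosixPath
from typing import Optional

COUNTRY_EXTENSIONS = {"br", "co", "mx"}


def detect_country(path: str) -> Optional[str]:
    """Return the country extension found among the path's segments.

    Assumes a hypothetical layout like .../<country>/... ; returns None for
    shared code and raises if more than one country segment appears.
    """
    matches = {part for part in PurePosixPath(path).parts if part in COUNTRY_EXTENSIONS}
    if len(matches) > 1:
        raise ValueError(f"ambiguous country in {path!r}: {sorted(matches)}")
    return matches.pop() if matches else None


print(detect_country("etl/datasets/br/credit_card/invoices"))  # br
```

Raising on ambiguity, rather than guessing, keeps a mechanical helper from silently making a discretionary call.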
“Devin provided an easy way to reduce the number of engineering hours for the migration, in a way that was more stable and less prone to human error. Rather than engineers having to work across several files and complete an entire migration task 100%, they could just review Devin’s changes, make minor adjustments, then merge their PR.”
Jose Carlos Castro, Senior Product Manager
Ramp is a financial operations platform helping companies save time and money. Their all-in-one platform combines corporate cards, bill payments, procurement and vendor management, travel booking, treasury, and more with built-in controls and intelligence to maximize the impact of every dollar and hour spent. Founded in 2019, Ramp is one of the fastest-growing companies in the US, powering tens of billions of dollars in purchases annually for more than 30,000 businesses.
As Ramp continues to grow and expand into new product areas, they struggle to manage an ever-growing technical backlog. Whether “business-as-usual” work (triaging on-call bugs, fixing broken tests) or “punch-list” tasks (resolving flaky tests or optimizing N+1 queries), accumulating technical debt quickly began to take time away from the engineering team’s focus on product execution.
After some experimentation, Ramp found that a very small team of Devin-savvy engineers could account for tens of thousands of engineering hours saved and up to 80 merged PRs per week, tackling tasks that cut horizontally across the engineering organization. As part of this process, the Applied AI team at Ramp built a new Devin workflow each week to tackle a different technical debt task, in some cases integrating the workflow directly into their ongoing SDLC.
There are three categories of solutions Ramp has implemented using Devin:
As any codebase grows, so does the number of rote, repetitive clean-up tasks needed to pay down ever-larger amounts of accumulated technical debt. These are typically the tasks engineers dread: repetitive, uninteresting maintenance work that accounts for up to 20% of engineering time.
A test suite full of unoptimized tests, for example, has a clear and direct negative effect on developer velocity. Since most Ramp developers regularly run tests locally as part of their SDLC, any speed improvement to the test suite gets multiplied by hundreds, if not thousands, of executions each day. Accelerating the test suite completion time by an entire minute can quickly add up to thousands of developer hours saved per year.
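The arithmetic behind that claim is simple. Only the one-minute saving comes from the text; the 1,000 runs per day and the 240-working-day year below are hypothetical inputs chosen for illustration.

```python
def dev_hours_saved_per_year(
    minutes_saved_per_run: float, runs_per_day: int, working_days: int = 240
) -> float:
    """Developer hours recovered per year by a fixed per-run test-suite speedup."""
    return minutes_saved_per_run * runs_per_day * working_days / 60


# Hypothetical: 1,000 local test-suite runs per day across the team, 1 minute saved each.
print(dev_hours_saved_per_year(1, 1_000))  # 4000.0
```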
Take feature flag removal as another example: hundreds of deprecated feature flags and the forked logic they leave across the codebase make the code slower and less readable. If an engineer triaging a critical bug encounters a feature flag, determining whether it is still in use means switching context, checking the feature flag manager, and understanding the code logic, which can add significant time and complexity to an already tedious task.
Feature flag removal also cannot be solved in a straightforward or scriptable way. It requires understanding the core logic of the code and identifying all of the potential downstream impacts of removing a code fork. With many hundreds of feature flags at Ramp, this was becoming a clear blocker to developer velocity.
Given the number of low-hanging-fruit technical debt tasks across the codebase, Ramp deployed Devin to automate one clean-up or optimization task at a time. To do so, they leveraged Devin’s API to trigger multiple Devins in parallel to return completed PRs en masse.
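The source does not describe Ramp’s orchestration code or the exact API requests it sends. A minimal sketch of the fan-out pattern follows, where `start_devin_session` is a hypothetical stub standing in for whatever call actually starts a session.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def start_devin_session(task: str) -> str:
    """HYPOTHETICAL stub for the real API call; returns a fake session id."""
    return f"session-{task}"


def fan_out(tasks: List[str], max_parallel: int = 8) -> Dict[str, str]:
    """Start one session per clean-up task, running at most `max_parallel` at a time."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        session_ids = list(pool.map(start_devin_session, tasks))  # map preserves order
    return dict(zip(tasks, session_ids))


print(fan_out(["flag-a", "flag-b"]))
# {'flag-a': 'session-flag-a', 'flag-b': 'session-flag-b'}
```

Capping `max_parallel` is one way to keep the number of open PRs in line with reviewer bandwidth.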
“It can take several days to remove a single feature flag. We’ve tried scripting it in the past, but only Devin can comprehensively remove the feature flag PLUS fix any breaking tests or other dependencies. In the past month alone this has saved us over 1000 engineering hours.”
—Rakesh Nori, Software Engineer, Ramp
Removing just a single feature flag can be challenging, particularly when the flag affects logic throughout the codebase in a variety of different formats. Identifying all dead code paths, conditionals, and edge cases is not always straightforward and consumes valuable engineering time.
Devin’s core differentiator here was the ability to run Ramp tests and confirm that core code logic wasn’t affected by the feature flag removal. In certain cases, running the tests and seeing the stack trace was the only way to identify all downstream dependencies and affected tests, meaning that any classical static analysis approach would have been far more difficult to make work.
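A toy illustration of what “behavior does not change” means once a flag is on everywhere. The `is_enabled` lookup and the `fee` logic below are hypothetical; removing the flag keeps only the `on` branch, and a test pins the outputs so any behavioral drift fails loudly.

```python
def is_enabled(flag: str) -> bool:
    """Hypothetical flag lookup: EXAMPLE_FLAG is now on in all cases."""
    return flag == "EXAMPLE_FLAG"


# Before: logic forked on the flag (hypothetical business rule).
def fee_before(amount: float) -> float:
    if is_enabled("EXAMPLE_FLAG"):
        return round(amount * 0.01, 2)
    return round(amount * 0.02, 2)  # dead branch once the flag is fully on


# After: the flag check and the dead branch are gone.
def fee_after(amount: float) -> float:
    return round(amount * 0.01, 2)


# Behavior-pinning check: the refactor must not change any output.
for amount in (0.0, 10.0, 1234.56):
    assert fee_before(amount) == fee_after(amount)
```

In a real codebase the equivalent check is the existing test suite, which is why being able to run Ramp’s tests mattered.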
As a result, Ramp saw an opportunity to work with Devin to build an easy-to-use feature flag removal system with minimal overhead:
You are Worker Devin, responsible for removing the feature flag EXAMPLE_FLAG from the codebase. Perform the following changes on `worker-devin/example-flag-1`
Your task:
- Remove all instances of the feature flag EXAMPLE_FLAG in the file assigned to you.
- The feature flag is now `on` in all cases. Ensure the behavior of the code does not change.
- Update any associated tests to reflect the changes.
Push your changes to your branch.
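Because each worker gets the same instructions with a different file and branch, the prompt lends itself to templating. This is a minimal sketch under two assumptions that are guesses, not Ramp’s documented scheme. First, branch names follow the `worker-devin/example-flag-1` pattern (flag lowercased, underscores turned into hyphens, numbered per worker). Second, the assigned file path is appended to the first task bullet.

```python
from typing import List, Tuple

PROMPT_TEMPLATE = (
    "You are Worker Devin, responsible for removing the feature flag {flag} "
    "from the codebase. Perform the following changes on `{branch}`\n"
    "Your task:\n"
    "- Remove all instances of the feature flag {flag} in the file assigned to you: {path}\n"
    "- The feature flag is now `on` in all cases. Ensure the behavior of the code does not change.\n"
    "- Update any associated tests to reflect the changes.\n"
    "Push your changes to your branch."
)


def worker_prompts(flag: str, files: List[str]) -> List[Tuple[str, str]]:
    """Build one (branch, prompt) pair per file; the branch naming is an assumption."""
    slug = flag.lower().replace("_", "-")
    pairs = []
    for i, path in enumerate(files, start=1):
        branch = f"worker-devin/{slug}-{i}"
        pairs.append((branch, PROMPT_TEMPLATE.format(flag=flag, branch=branch, path=path)))
    return pairs


for branch, _ in worker_prompts("EXAMPLE_FLAG", ["app/billing.py", "app/cards.py"]):
    print(branch)
# worker-devin/example-flag-1
# worker-devin/example-flag-2
```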
Now, any engineer at Ramp can simply enter their feature flag into an internal tool, and Devin will automatically tag them when the PR is complete. Today, Devin is a core piece of internal Ramp tooling, orchestrating and executing tasks that were previously unachievable without a coding agent.
“Having Devin be the first eyes on every Airflow error is a massive time-saver. Half of the time we can merge Devin’s PR as-is, which saves hours of debugging. Even when the solution isn’t perfect, Devin’s change almost always brings us to the solution much faster anyway.”
—Peyton McCullough, Staff Software Engineer, Ramp
Data Platform is the nexus for a large number of internal systems and teams at Ramp, home to a lot of cross-functional activity and contributions. As a result, things break often. Every morning, Ramp’s Platform Engineering team wakes up to a handful of Airflow errors or failure reports to debug manually. These are time sensitive errors that are often directly related to critical finance services, but the code fixes themselves are usually straightforward. The vast majority of the engineering pain comes from context switching to the error or triaging the site of the bug in the first place.
So, after early experiments showed promise, Ramp decided to leverage Devin to automate the first pass on every Airflow failure and relieve the pressure on platform engineers. It was an easy implementation: Ramp simply took the Airflow error logs, determined those that made sense for Devin to fix, then triggered Devin via its API to work on a fix. If the CI checks completed successfully for Devin’s fix, Devin would then open a PR and tag the appropriate engineer.
Instead of a manual debugging process, Ramp’s engineers now simply review and approve a Devin PR that already passes CI. Even in the case where the fix isn’t exactly right, it is almost always in the right ballpark, which will still dramatically accelerate time-to-fix.
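The first-pass flow can be sketched as a small gate around each failure. Everything here is hypothetical: the `AirflowFailure` fields, the log markers used to skip failures a code change is unlikely to fix, and the `attempt_fix` callable, which stands in for triggering Devin and waiting for CI.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class AirflowFailure:
    dag_id: str
    log: str
    owner: str  # engineer to tag on the PR


# Hypothetical triage rule: log markers for failures a code change is unlikely to fix.
NOT_CODE_FIXABLE = ("OutOfMemory", "Connection refused")


def first_pass(
    failure: AirflowFailure,
    attempt_fix: Callable[[AirflowFailure], Tuple[str, bool]],
) -> Optional[str]:
    """Return a PR description to surface, or None if nothing should reach a human."""
    if any(marker in failure.log for marker in NOT_CODE_FIXABLE):
        return None  # not a good candidate for an automated fix
    branch, ci_passed = attempt_fix(failure)  # hypothetical: start a session, wait for CI
    if not ci_passed:
        return None  # a fix that fails CI is never surfaced
    return f"PR from {branch} for {failure.dag_id}, review requested from @{failure.owner}"
```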
“Devin helped automate the process of reducing our test suite local runtime by an entire minute. For every single engineer on the team, that’s up to 20 minutes of dev time back every single day.”
—Maxim Enis, Software Engineer, Ramp
Ramp was sitting on a backlog of over 100 slow tests and over 500 legacy, undocumented endpoints. In both cases, the team built a prompt for Devin with objective validation criteria, which essentially gave Devin a test-driven development workflow. For slow tests, they created a verification script confirming that Devin’s optimizations maintained identical test coverage. Similarly, for endpoint documentation, the script passed only when the expected response types matched the API types Devin had documented.
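```python
import ast
from pathlib import Path
from typing import Callable, Dict, List, Set

Coverage = Dict[str, Set[int]]  # source file -> line numbers executed


def _is_call(stmt: ast.stmt) -> bool:
    return isinstance(stmt, ast.Expr) and isinstance(stmt.value, ast.Call)


def validate_test_split(
    old_test_id: str,
    new_test_ids: List[str],
    count_tests: Callable[[List[str]], int],
    run_coverage_analysis: Callable[[List[str]], Coverage],
) -> None:
    """Raise ValueError unless the split tests keep the original's count and coverage.
    Test ids are treated as file paths; the two helpers depend on the test runner."""
    # 1. Validate refactoring of new test files
    for test_id in new_test_ids:
        source = Path(test_id).read_text()
        if "this test was split" not in source:
            raise ValueError(f"{test_id}: missing 'this test was split' comment")
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef) and node.name.startswith("test"):
                body = node.body
                # Exactly one function call, or an assignment followed by a function call
                one = len(body) == 1 and _is_call(body[0])
                two = len(body) == 2 and isinstance(body[0], ast.Assign) and _is_call(body[1])
                if not (one or two):
                    raise ValueError(f"{test_id}::{node.name}: unexpected function body")
    # 2. Verify test count hasn't changed
    if count_tests([old_test_id]) != count_tests(new_test_ids):
        raise ValueError("Test count mismatch")
    # 3. Compare coverage between old and new tests
    new_coverage = run_coverage_analysis(new_test_ids)
    old_coverage = run_coverage_analysis([old_test_id])
    # 4. Check for coverage regression: lines covered by the original but not the split
    regressions = {}
    for file, lines in old_coverage.items():
        missing = lines - new_coverage.get(file, set())
        if missing:
            regressions[file] = sorted(missing)
    if regressions:
        raise ValueError(f"Coverage regression detected: {regressions}")
    print("No coverage regression")
```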
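The endpoint-documentation check described above is given only in prose. A minimal sketch of the same idea follows, assuming flat JSON responses and JSON-Schema-style type names for the documented fields; Ramp’s actual documentation format is not given.

```python
from typing import Any, Dict, List

# JSON-Schema-style names for the types json.loads produces. Exact type() lookup
# keeps True/False from being counted as integers.
JSON_TYPES = {str: "string", bool: "boolean", int: "integer", float: "number",
              list: "array", dict: "object", type(None): "null"}


def doc_mismatches(documented: Dict[str, str], response: Dict[str, Any]) -> List[str]:
    """List every disagreement between documented field types and a real response."""
    problems = []
    for field, value in response.items():
        expected, actual = documented.get(field), JSON_TYPES[type(value)]
        if expected is None:
            problems.append(f"{field}: not documented")
        elif expected != actual:
            problems.append(f"{field}: documented {expected}, got {actual}")
    for field in sorted(documented.keys() - response.keys()):
        problems.append(f"{field}: documented but absent from response")
    return problems


# The check passes only when the list is empty.
print(doc_mismatches({"id": "integer", "memo": "string"}, {"id": 7, "memo": None}))
# ['memo: documented string, got null']
```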
Using these validation scripts, Ramp was able to essentially remove all false positives and confidently process their entire backlog through Devin. Even if there were certain tasks that Devin couldn’t complete, the script would not pass and the PR would never get surfaced. This allowed Ramp to parallelize as many Devins as they wanted without fear of being noisy, bottlenecked only by bandwidth for human review.
This style of use case is one of the most common at Ramp: a single Devin workflow that can be applied to every backlog instance of a given type of technical debt.
“At Ramp, there’s an internal tracker of Devin use case suggestions that any engineer can contribute to. Every week, our Applied AI team selects and automates the highest impact ones using Devin. Devin has essentially given us a completely new way of eliminating the most tedious technical debt workflows that, until now, were all being done manually.”
—Rahul Sengottuvelu, Head of Applied AI, Ramp
Devin has saved thousands of hours of engineering work so far in technical debt cleanup across a variety of use cases, dramatically accelerating core product and engineering velocity at Ramp.