How Nubank refactors millions of lines of code to improve engineering efficiency with Devin

12x engineering time efficiency gain

20x cost savings

Overview

One of Nubank’s most critical, company-wide projects for 2023-2024 was a migration of their core ETL, an 8-year-old monolith with millions of lines of code, to sub-modules. To handle such a large refactor, their only option was a multi-year effort that distributed repetitive refactoring work across more than one thousand of their engineers. With Devin, however, this changed: engineers were able to delegate their migrations to Devin and achieve a 12x efficiency improvement in terms of engineering hours saved, and over 20x cost savings. The Data, Collections, and Risk business units, among others, verified and completed their migrations in weeks instead of months or years.

The Problem

Nubank was born into the tradition of centralized ETL FinServ architectures. To date, the monolith architecture had worked well for Nubank — it enabled the developer autonomy and flexibility that carried them through their hypergrowth phases. After 8 years, however, Nubank’s sheer volume of customer growth, as well as geographic and product expansion beyond their original credit card business, led to an entangled, behemoth ETL with countless cross-dependencies and no clear path to continuing to scale.

For Nubankers, business-critical data transformations started taking increasingly long to run, with chains of dependencies as deep as 70 and insufficient formal agreements on who was responsible for maintaining what. As the company continued to grow, it became clear that the ETL would be a primary bottleneck to scale.

Nubank concluded that there was an urgent need to split up their monolithic ETL repository, which had amassed over 6 million lines of code, into smaller, more flexible sub-modules.

Nubank’s code migration was filled with the monotonous, repetitive work that engineers dread. Moving each data class implementation from one architecture to another while tracing imports correctly, performing multiple delicate refactoring steps, and accounting for any number of edge cases was highly tedious, even to do just once or twice. At Nubank’s scale, however, the total migration scope involved more than 1,000 engineers moving ~100,000 data class implementations over an expected timeline of 18 months.

In a world where engineering resources are scarce, such large-scale migrations and modernizations become massively expensive, time-consuming projects that distract from any engineering team’s core mission: building better products for customers. Unfortunately, this is the reality for many of the world’s largest organizations.

The Decision: an army of Devins to tackle subtasks in parallel

At project outset in 2023, Nubank had no choice but to rely on their engineers to perform code changes manually. Migrating one data class was a highly discretionary task, with multiple variations, edge cases, and ad hoc decision-making — far too complex to be scriptable, but high-volume enough to be a significant manual effort.

Within weeks of Devin’s launch, Nubank identified a clear opportunity to accelerate their refactor at a fraction of the engineering hours. Migration or large refactoring tasks are often fantastic projects for Devin: after investing a small, fixed cost to teach Devin how to approach sub-tasks, Devin can go and complete the migration autonomously. A human is kept in the loop just to manage the project and approve Devin’s changes.

The Solution: Custom ETL Migration Devin

A task of this magnitude, with the vast number of variations that it had, was a ripe opportunity for fine-tuning. The Nubank team helped to collect examples of previous migrations their engineers had done manually, some of which were fed to Devin for fine-tuning. The rest were used to create a benchmark evaluation set. Against this evaluation set, we observed a doubling of Devin’s task completion scores after fine-tuning, as well as a 4x improvement in task speed. Roughly 40 minutes per sub-task dropped to 10, which made the whole migration start to look much cheaper and less time-consuming, allowing the company to devote more energy to new business and new value creation instead.

Devin contributed to its own speed improvements by building itself classical tools and scripts it would later use on the most common, mechanical components of the migration. For instance, detecting the country extension of a data class (either ‘br’, ‘co’, or ‘mx’) based on its file path was a few-step process for each sub-task. Devin’s script automatically turned this into a single step executable — improvements from which added up immensely across all tens of thousands of sub-tasks.
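
As a minimal sketch of the kind of helper described above, the function below infers a country code from a file path. The directory layout, the function name `detect_country`, and the example path are assumptions made for illustration; the case study does not describe Nubank's actual repository structure or the script Devin wrote.

```python
from pathlib import PurePosixPath
from typing import Optional

# Country codes named in the text; the path layout used below is hypothetical.
COUNTRY_CODES = {"br", "co", "mx"}


def detect_country(file_path: str) -> Optional[str]:
    """Return the first path component that is a known country code, if any."""
    for part in PurePosixPath(file_path).parts:
        if part.lower() in COUNTRY_CODES:
            return part.lower()
    return None


# Hypothetical path, for illustration only.
print(detect_country("etl/datasets/mx/cards/model.ext"))  # mx
```

Collapsing a lookup like this into one call is a small saving per sub-task, which is why it matters mainly at the scale of tens of thousands of sub-tasks.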

There is also a compounding advantage to Devin’s learning. In the first weeks, it was common to see outstanding errors to fix, or small things Devin wasn’t sure how to solve. But as Devin saw more examples and gained familiarity with the task, it started to avoid rabbit holes more often and find faster solutions to previously seen errors and edge cases. Much as with a human engineer, we observed obvious speed and reliability improvements with every day Devin worked on the migration.

    Results:
    Delivering an 8-12x faster migration, lifting a burden from every engineer, and slashing migration costs by 20x.

“Devin provided an easy way to reduce the number of engineering hours for the migration, in a way that was more stable and less prone to human error. Rather than engineers having to work across several files and complete an entire migration task 100%, they could just review Devin’s changes, make minor adjustments, then merge their PR”

Jose Carlos Castro, Senior Product Manager

    8-12x efficiency gains
    This is calculated by comparing the typical engineering hours required to complete a data class migration task against the total engineering hours spent prompting and reviewing Devin’s work on the same task.

    Over 20x cost savings on scope of the migration delegated to Devin
    This is calculated by comparing the cost of running Devin versus the hourly cost of an engineer completing that task. The significant savings are heavily driven by speed of task execution and cost effectiveness of Devin relative to human engineering time – it does not even consider the value captured by completing the entire project months ahead of schedule!
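
Both figures above are simple ratios. The sketch below spells out the arithmetic; the function names and every input value are placeholders chosen only to show the calculation, not figures reported by Nubank.

```python
def efficiency_gain(manual_hours: float, assisted_hours: float) -> float:
    """Ratio of manual engineering hours to hours spent prompting and reviewing."""
    return manual_hours / assisted_hours


def cost_savings(manual_hours: float, engineer_hourly_cost: float, devin_cost: float) -> float:
    """Ratio of the engineer cost for a task to the cost of running Devin on it."""
    return (manual_hours * engineer_hourly_cost) / devin_cost


# Placeholder inputs for a single hypothetical task (not reported figures).
print(efficiency_gain(manual_hours=4.0, assisted_hours=0.5))  # 8.0
print(cost_savings(manual_hours=4.0, engineer_hourly_cost=100.0, devin_cost=20.0))  # 20.0
```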

Fewer dreaded migration tasks for Nubank engineers

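Devin fixes tens of thousands of hours of technical debt, letting engineers focus on what matters most: saving their customers time and money

Overview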

As Ramp continues to grow and expand into new product areas, they fight to manage an ever-growing technical backlog. Whether “business-as-usual” work (triaging on-call bugs, fixing broken tests) or “punch-list” tasks (resolving flakey tests or optimizing N+1 queries), the technical debt accumulation quickly began to take time away from the engineering team’s focus on product execution.

After some experimentation, Ramp found that a very small team of Devin-savvy engineers could be responsible for tens of thousands of engineer hours saved and up to 80 merged PRs per week, tackling tasks that reached horizontally across the engineering organization. As part of their process, the Applied AI team at Ramp built a new Devin workflow to tackle a different technical debt task every week, in some cases integrating the workflow directly into their ongoing SDLC.

There are three categories of solutions Ramp has implemented using Devin:

Devin-powered internal tooling (e.g. feature flag removal tool)
Fully automated, event-driven tasks (e.g. automatic Airflow error resolution)
Backlog tasks (e.g. fixing hundreds of slow or flakey tests)

The problem: fighting technical debt accumulation

As any codebase grows, so too does the number of rote, repetitive clean-up tasks necessary to pay back larger and larger amounts of technical debt build-up. Typically, these are the types of tasks engineers dread – repetitive, uninteresting maintenance tasks that account for up to 20% of engineering time.

A test suite full of unoptimized tests, for example, has a clear and direct negative effect on developer velocity. Since most Ramp developers regularly run tests locally as part of their SDLC, any speed improvement to the test suite gets multiplied by hundreds, if not thousands, of executions each day. Accelerating the test suite completion time by an entire minute can quickly add up to thousands of developer hours saved per year.
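
To make the multiplication concrete, here is a back-of-the-envelope calculation. The run count and the number of working days are assumed values picked for illustration (the text only says "hundreds, if not thousands" of executions per day), and `hours_saved_per_year` is a hypothetical helper name.

```python
def hours_saved_per_year(
    minutes_saved_per_run: float, runs_per_day: int, working_days: int = 250
) -> float:
    """Convert a per-run saving into developer hours per year."""
    return minutes_saved_per_run * runs_per_day * working_days / 60


# Assumed inputs: 1 minute saved per run, 1,000 local runs per day across the team.
print(round(hours_saved_per_year(1.0, 1000)))  # 4167
```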

Take feature flag removal as another example: hundreds of deprecated feature flags and forked logic across the codebase make the code slower and less readable. If an engineer is triaging a critical bug and encounters a feature flag, determining whether it is still in use requires switching context, investigating the feature flag manager, and understanding the code logic, all of which can add significant time and complexity to an already tedious task.

Feature flag removal also cannot be solved in a straightforward or scriptable manner. It requires understanding the core logic of the code and identifying all of the potential downstream impacts of removing a code fork. Given many hundreds of feature flags at Ramp, this was starting to become a clear blocker to developer velocity.

The solution: one Devin workflow at a time

Given the number of low-hanging-fruit technical debt tasks across the codebase, Ramp deployed Devin to automate one clean-up or optimization task at a time. To do so, they leveraged Devin’s API to trigger multiple Devins in parallel to return completed PRs en masse.
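
The fan-out pattern described above can be sketched as follows. This is not Devin's API: `start_session` is a hypothetical callable that stands in for whatever client call actually creates a session, and `fake_start_session` exists only so the example runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_in_parallel(
    prompts: List[str],
    start_session: Callable[[str], str],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Start one session per prompt concurrently; map each prompt to its session id."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input prompts.
        session_ids = list(pool.map(start_session, prompts))
    return dict(zip(prompts, session_ids))


def fake_start_session(prompt: str) -> str:
    """Stand-in for a real API call (hypothetical); returns a made-up session id."""
    return f"session-for-{len(prompt)}-chars"


print(run_in_parallel(["fix test A", "fix test BB"], fake_start_session))
```

Passing the session-creating function in as a parameter keeps the concurrency logic independent of any particular client library.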

    Devin feature flag removal
    150 complex feature flags
    removed in a month, saving thousands of engineering hours

“It can take several days to remove a single feature flag. We’ve tried scripting it in the past, but only Devin can comprehensively remove the feature flag PLUS fix any breaking tests or other dependencies. In the past month alone this has saved us over 1000 engineering hours.”

—Rakesh Nori, Software Engineer, Ramp

Removing just a single feature flag can be challenging, particularly when the flag affects logic throughout the codebase in a variety of different formats. Identifying all dead code paths, conditionals, and edge cases is not always straightforward and consumes valuable engineering time.

Devin’s core differentiator here was the ability to run Ramp tests and confirm that core code logic wasn’t affected by the feature flag removal. In certain cases, running the tests and seeing the stack trace was the only way to identify all downstream dependencies and affected tests, meaning that any classical static analysis approach would have been far more difficult to make work.

As a result, Ramp saw an opportunity to work with Devin to build an easy-to-use feature flag removal system with minimal overhead:

Ramp developed a primary playbook – a standardized prompt that can be programmatically attached to recurring tasks – that coordinates multiple “worker” Devins to tackle different aspects of a feature flag removal in parallel. This allowed Ramp to scale the playbook to feature flags with high complexity, since the task could be divided up across multiple “workers.”
A “clean-up” Devin verifies that there are no conflicts in the “worker” Devin outputs and consolidates the changes into a single, easily-reviewable PR.
The entire Devin workflow was abstracted into an internal tool accessible and invokable from Ramp’s admin dashboard.
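
One check a "clean-up" step might perform is detecting files that more than one worker modified. The sketch below shows that single check under stated assumptions: the data shape (worker name mapped to a set of changed file paths) and the function name `find_conflicts` are hypothetical, and the case study does not say how Ramp's clean-up Devin actually detects conflicts.

```python
from collections import defaultdict
from typing import Dict, List, Set


def find_conflicts(changes_by_worker: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return files touched by more than one worker, mapped to the sorted worker names."""
    workers_by_file = defaultdict(list)
    for worker, files in changes_by_worker.items():
        for path in files:
            workers_by_file[path].append(worker)
    return {
        path: sorted(workers)
        for path, workers in workers_by_file.items()
        if len(workers) > 1
    }


# Hypothetical worker outputs.
changes = {
    "worker-1": {"billing/flags.py", "billing/views.py"},
    "worker-2": {"billing/views.py"},
}
print(find_conflicts(changes))  # {'billing/views.py': ['worker-1', 'worker-2']}
```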

Example worker Devin prompt:

    
    You are Worker Devin, responsible for removing the feature flag EXAMPLE_FLAG from the codebase. Perform the following changes on
    `worker-devin/example-flag-1`

    Your task:

    - Remove all instances of the feature flag EXAMPLE_FLAG in the file assigned to you.
    - The feature flag is now `on` in all cases. Ensure the behavior of the code does not change.
    - Update any associated tests to reflect the changes.

    Push your changes to your branch.
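
A prompt like the one above can be attached to recurring tasks programmatically by treating the flag name and branch as parameters. The sketch below uses Python's standard `string.Template`; the names `WORKER_PROMPT` and `build_worker_prompt` are hypothetical, and this is an illustration of templating in general rather than Ramp's tooling.

```python
from string import Template

# Template text copied from the example prompt above; $flag and $branch are placeholders.
WORKER_PROMPT = Template(
    "You are Worker Devin, responsible for removing the feature flag $flag "
    "from the codebase. Perform the following changes on `$branch`\n\n"
    "Your task:\n\n"
    "- Remove all instances of the feature flag $flag in the file assigned to you.\n"
    "- The feature flag is now `on` in all cases. Ensure the behavior of the code does not change.\n"
    "- Update any associated tests to reflect the changes.\n\n"
    "Push your changes to your branch."
)


def build_worker_prompt(flag: str, branch: str) -> str:
    """Fill in the flag name and branch for one worker."""
    return WORKER_PROMPT.substitute(flag=flag, branch=branch)


print(build_worker_prompt("EXAMPLE_FLAG", "worker-devin/example-flag-1"))
```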

Now, any engineer at Ramp can simply input their feature flag into their internal tool, and Devin will automatically tag them when the PR is completed. Today, Devin is a core piece of internal Ramp tooling that orchestrates and executes tasks that were previously unachievable without a coding agent.

    Automating Airflow fixes
    8 mins
    average Devin bug-to-PR time

“Having Devin be the first eyes on every Airflow error is a massive time-saver. Half of the time we can merge Devin’s PR as-is, which saves hours of debugging. Even when the solution isn’t perfect, Devin’s change almost always brings us to the solution much faster anyway.”

—Peyton McCullough, Staff Software Engineer, Ramp

Data Platform is the nexus for a large number of internal systems and teams at Ramp, home to a lot of cross-functional activity and contributions. As a result, things break often. Every morning, Ramp’s Platform Engineering team wakes up to a handful of Airflow errors or failure reports to debug manually. These are time-sensitive errors that are often directly related to critical finance services, but the code fixes themselves are usually straightforward. The vast majority of the engineering pain comes from context switching to the error or triaging the site of the bug in the first place.

So, after early experiments showed promise, Ramp decided to leverage Devin to automate the first pass on every Airflow failure and relieve the pressure on platform engineers. It was an easy implementation: Ramp simply took the Airflow error logs, determined those that made sense for Devin to fix, then triggered Devin via its API to work on a fix. If the CI checks completed successfully for Devin’s fix, Devin would then open a PR and tag the appropriate engineer.
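
The triage step in that pipeline can be sketched as a filter followed by prompt construction. Everything specific here is an assumption: the `AirflowFailure` record, the `SKIP_MARKERS` rule, and the prompt wording are hypothetical, since the case study does not say how Ramp decides which errors to hand to Devin.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AirflowFailure:
    dag_id: str
    task_id: str
    error_message: str


# Hypothetical rule: skip failures that look like infrastructure issues, not code bugs.
SKIP_MARKERS = ("timed out", "connection refused", "out of memory")


def select_for_devin(failures: Iterable[AirflowFailure]) -> List[AirflowFailure]:
    """Keep only failures whose error message contains none of the skip markers."""
    return [
        f for f in failures
        if not any(marker in f.error_message.lower() for marker in SKIP_MARKERS)
    ]


def build_fix_prompt(failure: AirflowFailure) -> str:
    """Turn one failure into a prompt that could be sent to a coding agent."""
    return (
        f"Airflow task `{failure.task_id}` in DAG `{failure.dag_id}` failed with:\n"
        f"{failure.error_message}\n"
        "Find the root cause, fix it, and open a PR only if CI passes."
    )
```

Filtering before dispatch keeps the agent focused on failures where a code change is plausibly the fix, which matches the observation that the fixes themselves are usually straightforward.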

Instead of a manual debugging process, Ramp’s engineers now simply review and approve a Devin PR that already passes CI. Even in the case where the fix isn’t exactly right, it is almost always in the right ballpark, which will still dramatically accelerate time-to-fix.

    Resolving slow tests and documenting endpoints
    20 mins
    of dev time per engineer saved each day

“Devin helped automate the process of reducing our test suite local runtime by an entire minute. For every single engineer on the team, that’s up to 20 minutes of dev time back every single day.”

—Maxim Enis, Software Engineer, Ramp

Ramp was sitting on a backlog of over 100 slow tests and over 500 legacy, undocumented endpoints. In both cases, the team built a prompt for Devin that used objective validation criteria, which essentially gave Devin a test-driven development workflow. For slow tests, they created a verification script that confirmed Devin’s optimizations maintained identical test coverage. Similarly, for endpoint documentation, the script completed only when the expected response types matched Devin’s documented API types.
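
For the endpoint documentation case, an objective check could compare the types found in a real response against the documented types. The sketch below is one simple way to do that, assuming flat JSON-like responses and documentation expressed as Python type names; `documented_types_match` and the sample data are hypothetical, not Ramp's script.

```python
from typing import Any, Dict


def documented_types_match(response: Dict[str, Any], documented: Dict[str, str]) -> bool:
    """True only if every field has the documented type name, with no extra or missing fields."""
    actual = {field: type(value).__name__ for field, value in response.items()}
    return actual == documented


# Hypothetical response and documentation.
sample_response = {"id": 42, "status": "active", "balance": 10.5}
print(documented_types_match(sample_response, {"id": "int", "status": "str", "balance": "float"}))  # True
print(documented_types_match(sample_response, {"id": "str", "status": "str", "balance": "float"}))  # False
```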

Slow test validation script pseudocode:

    
    function validate_test_split(old_test_id, new_test_ids):
        // 1. Validate refactoring of new test files
        for each test_id in new_test_ids:
            verify file contains "this test was split" comment
            for each test_function in file:
                verify function body contains either:
                    - exactly one statement (must be a function call)
                    - or exactly two statements (assignment + function call)

        // 2. Verify test count hasn't changed
        original_count = count_tests(old_test_id)
        new_count = count_tests(new_test_ids)
        if original_count != new_count:
            fail "Test count mismatch"

        // 3. Compare coverage between old and new tests
        new_coverage = run_coverage_analysis(new_test_ids)
        old_coverage = run_coverage_analysis(old_test_id)

        // 4. Check for coverage regression
        regressions = {}
        for each file in old_coverage:
            find lines that were covered in original test but not in new test
            if any lines are missing:
                record missing lines for this file in regressions

        if regressions is not empty:
            fail "Coverage regression detected"
        else:
            success "No coverage regression"
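
Step 4 of the pseudocode is a set difference per file. A runnable sketch of just that step follows, assuming coverage is represented as a mapping from file path to the set of covered line numbers; that representation and the name `find_coverage_regressions` are assumptions, and producing the coverage data itself is out of scope here.

```python
from typing import Dict, Set

# Assumed representation: file path -> set of covered line numbers.
Coverage = Dict[str, Set[int]]


def find_coverage_regressions(old: Coverage, new: Coverage) -> Coverage:
    """Return, per file, the lines covered by the old test but not by the new tests."""
    regressions: Coverage = {}
    for path, old_lines in old.items():
        missing = old_lines - new.get(path, set())
        if missing:
            regressions[path] = missing
    return regressions


# Hypothetical coverage data.
old_cov = {"app/models.py": {1, 2, 3}, "app/views.py": {10}}
new_cov = {"app/models.py": {1, 2}, "app/views.py": {10, 11}}
print(find_coverage_regressions(old_cov, new_cov))  # {'app/models.py': {3}}
```

Extra lines covered only by the new tests are ignored; only lost coverage counts as a regression.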

Using these validation scripts, Ramp was able to essentially remove all false positives and confidently process their entire backlog through Devin. Even if there were certain tasks that Devin couldn’t complete, the script would not pass and the PR would never get surfaced. This allowed Ramp to parallelize as many Devins as they wanted without fear of being noisy, bottlenecked only by bandwidth for human review.

This style of use case is one of the most common at Ramp – namely, a Devin workflow that can be applied to every backlog instance of a technical debt use case.

Results

“At Ramp, there’s an internal tracker of Devin use case suggestions that any engineer can contribute to. Every week, our Applied AI team selects and automates the highest impact ones using Devin. Devin has essentially given us a completely new way of eliminating the most tedious technical debt workflows that, until now, were all being done manually.”

—Rahul Sengottuvelu, Head of Applied AI, Ramp

Devin has saved thousands of hours of engineering work so far in technical debt cleanup across a variety of use cases, dramatically accelerating core product and engineering velocity at Ramp.

How Nubank refactors millions of lines of code to improve engineering efficiency with Devin

Overview

The Problem

The Decision: an army of Devins to tackle subtasks in parallel

The Solution: Custom ETL Migration Devin

Devin fixes tens of thousands of hours of technical debt, letting engineers focus on what matters most: saving their customers time and money

About the company

Overview

The problem: fighting technical debt accumulation

The solution: one Devin workflow at a time

Example worker Devin prompt:

Slow test validation script pseudocode:

Results
