How Nubank refactors millions of lines of code to improve engineering efficiency with Devin

8x
engineering time efficiency gain
20x
cost savings

Overview

One of Nubank’s most critical, company-wide projects for 2023-2024 was a migration of their core ETL, an 8-year-old monolith with millions of lines of code, to sub-modules. To handle such a large refactor, their only option was a multi-year effort that distributed repetitive refactoring work across more than one thousand of their engineers. With Devin, however, this changed: engineers were able to delegate their migrations to Devin and achieve an efficiency improvement of up to 12x in engineering hours saved, along with over 20x cost savings. Data, Collections, and Risk business units, among others, verified and completed their migrations in weeks instead of months or years.

The Problem

Nubank was born into the tradition of centralized ETL FinServ architectures. Until this point, the monolith architecture had worked well for Nubank: it enabled the developer autonomy and flexibility that carried them through their hypergrowth phases. After 8 years, however, Nubank’s sheer volume of customer growth, as well as geographic and product expansion beyond their original credit card business, led to an entangled, behemoth ETL with countless cross-dependencies and no clear path to continuing to scale.

For Nubankers, business-critical data transformations started taking increasingly long to run, with dependency chains as deep as 70 and insufficient formal agreements on who was responsible for maintaining what. As the company continued to grow, it became clear that the ETL would be a primary bottleneck to scale.

Nubank concluded that there was an urgent need to split up their monolithic ETL repository, which had amassed over 6 million lines of code, into smaller, more flexible sub-modules.

Nubank’s code migration was filled with the monotonous, repetitive work that engineers dread. Moving each data class implementation from one architecture to another while tracing imports correctly, performing multiple delicate refactoring steps, and accounting for any number of edge cases was highly tedious, even to do just once or twice. At Nubank’s scale, however, the total migration scope involved more than 1,000 engineers moving ~100,000 data class implementations over an expected timeline of 18 months.

In a world where engineering resources are scarce, such large-scale migrations and modernizations become massively expensive, time-consuming projects that distract from any engineering team’s core mission: building better products for customers. Unfortunately, this is the reality for many of the world’s largest organizations.

The Decision: an army of Devins to tackle subtasks in parallel

At project outset in 2023, Nubank had no choice but to rely on their engineers to perform code changes manually. Migrating one data class was a highly discretionary task, with multiple variations, edge cases, and ad hoc decision-making — far too complex to be scriptable, but high-volume enough to be a significant manual effort.

Within weeks of Devin’s launch, Nubank identified a clear opportunity to accelerate their refactor at a fraction of the engineering hours. Migration or large refactoring tasks are often fantastic projects for Devin: after investing a small, fixed cost to teach Devin how to approach sub-tasks, Devin can go and complete the migration autonomously. A human is kept in the loop just to manage the project and approve Devin’s changes.

The Solution: Custom ETL Migration Devin

A task of this magnitude, with the vast number of variations that it had, was a ripe opportunity for fine-tuning. The Nubank team helped to collect examples of previous migrations their engineers had done manually, some of which were fed to Devin for fine-tuning. The rest were used to create a benchmark evaluation set. Against this evaluation set, we observed a doubling of Devin’s task completion scores after fine-tuning, as well as a 4x improvement in task speed. Roughly 40 minutes per sub-task dropped to 10, which made the whole migration start to look much cheaper and less time-consuming, allowing the company to devote more energy to new business and new value creation instead.

Devin contributed to its own speed improvements by building itself classical tools and scripts that it would later use on the most common, mechanical components of the migration. For instance, detecting the country extension of a data class (either ‘br’, ‘co’, or ‘mx’) based on its file path was a few-step process for each sub-task. Devin’s script turned this into a single-step executable, and the time saved added up immensely across tens of thousands of sub-tasks.
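As a rough illustration of the kind of helper described above, the sketch below derives a country code from a file path in a single call. The directory layout, the example paths, and the function name `detect_country` are assumptions made for illustration; the source does not show Devin’s actual script or Nubank’s repository structure.

```python
from pathlib import PurePosixPath
from typing import Optional

# The three country codes come from the text; everything else is hypothetical.
COUNTRY_CODES = {"br", "co", "mx"}


def detect_country(file_path: str) -> Optional[str]:
    """Return the first path segment that is a known country code, or None."""
    for part in PurePosixPath(file_path).parts:
        if part.lower() in COUNTRY_CODES:
            return part.lower()
    return None


if __name__ == "__main__":
    # Hypothetical paths, not taken from Nubank's repository.
    print(detect_country("etl/datasets/br/billing/invoice.src"))
    print(detect_country("etl/shared/helpers.src"))
```

Matching whole path segments rather than substrings avoids false positives on names that merely contain "br" or "co", such as a directory called "common".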

There is also a compounding advantage to Devin’s learning. In the first weeks, it was common to see outstanding errors to fix, or small things Devin wasn’t sure how to solve. But as Devin saw more examples and gained familiarity with the task, it started to avoid rabbit holes more often and find faster solutions to previously seen errors and edge cases. Much as with a human engineer, we observed obvious speed and reliability improvements with every day Devin worked on the migration.

Results: Delivering an 8-12x faster migration, lifting a burden from every engineer, and slashing migration costs by 20x.

“Devin provided an easy way to reduce the number of engineering hours for the migration, in a way that was more stable and less prone to human error. Rather than engineers having to work across several files and complete an entire migration task 100%, they could just review Devin’s changes, make minor adjustments, then merge their PR.”

Jose Carlos Castro, Senior Product Manager

  • 8-12x efficiency gains: This is calculated by comparing the typical engineering hours required to complete a data class migration task against the total engineering hours spent prompting and reviewing Devin’s work on the same task.
  • Over 20x cost savings on the scope of the migration delegated to Devin: This is calculated by comparing the cost of running Devin versus the hourly cost of an engineer completing that task. The significant savings are heavily driven by the speed of task execution and the cost effectiveness of Devin relative to human engineering time. The figure does not even consider the value captured by completing the entire project months ahead of schedule!
  • Fewer dreaded migration tasks for Nubank engineers
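The two calculations described above can be sketched as simple ratios. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the input values are placeholders chosen only to show the arithmetic, not figures reported by Nubank.

```python
def efficiency_gain(manual_hours: float, prompt_and_review_hours: float) -> float:
    """Engineering hours for a manual migration task divided by the
    engineering hours spent prompting and reviewing Devin on the same task."""
    if prompt_and_review_hours <= 0:
        raise ValueError("prompt_and_review_hours must be positive")
    return manual_hours / prompt_and_review_hours


def cost_savings(engineer_hourly_cost: float, manual_hours: float,
                 devin_cost: float) -> float:
    """Cost of an engineer completing the task divided by the cost of
    running Devin on it."""
    if devin_cost <= 0:
        raise ValueError("devin_cost must be positive")
    return (engineer_hourly_cost * manual_hours) / devin_cost


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print(efficiency_gain(manual_hours=5.0, prompt_and_review_hours=0.5))
    print(cost_savings(engineer_hourly_cost=80.0, manual_hours=5.0,
                       devin_cost=16.0))
```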

How The Citation Group Measures Engineering ROI with Devin.

271
Merged PRs with an 80% merge rate
50%
Improvement in engineering efficiency
85%
Reduction in legacy app refresh time
180+
Weekly sessions, 1,200+ lifetime sessions logged (GitHub alone)

About the company

The Citation Group is a leading provider of legal, risk, compliance, and HR technology, supporting over 120,000 small and midsize businesses across the UK, Canada, Australia, and New Zealand.

Industry: Professional Services

About The Citation Group

The Citation Group provides legal, risk, compliance, and HR technology to more than 120,000 small and midsize businesses across the UK, Canada, Australia, and New Zealand.
Backed by private equity and expanded through acquisition, the company faced the challenge of scaling engineering output while managing legacy systems, distributed teams, and accumulated tech debt.

The Challenge: Establishing Trust in AI-Generated Code

Citation’s engineering leaders needed a reliable way to evaluate whether Devin was contributing production-quality work. Traditional metrics, such as sprint velocity, varied too much to isolate Devin’s impact. With distributed teams and partner developers, the concern was that AI-generated changes might not meet the same standards as human engineers, potentially adding technical debt instead of reducing it.

To validate quality, the team initially onboarded Devin to a set of projects and measured quality at the pull-request level. In the first three months, Devin generated 549 PRs, and 80% of them were merged after senior engineer review.

“By tracking Devin’s pull requests directly, we finally had a clean, production-level signal of impact.” — Anthony Wray, AI Engineering Lead

Structuring Work for Devin

Citation first onboarded Devin through a series of hackathons run by internal and partner teams. Each team designated a lead to document results and share practices. In these early projects, unstructured prompts often produced inconsistent results, but tasks scoped in Jira with clear requirements — and supplemented by documentation pulled directly into Devin sessions — generated PRs that passed review.

Three practices proved critical:

  • Jira ticket scoping to define tasks with clear acceptance criteria.
  • Devin search-to-session to pull documentation and architectural context directly into a Devin coding session.
  • Markdown specification files to provide structured inputs and reduce ambiguity.
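To make the third practice concrete, here is a minimal sketch of turning a scoped ticket into a markdown specification file. The `TaskSpec` fields, the section headings, and the example ticket are all hypothetical; the source does not describe the layout of Citation’s actual specification files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskSpec:
    """Hypothetical structure for a scoped task; field names are illustrative."""
    ticket_id: str
    summary: str
    acceptance_criteria: List[str]
    context_docs: List[str] = field(default_factory=list)


def render_spec(spec: TaskSpec) -> str:
    """Render the task as a markdown document with checklist-style criteria."""
    lines = [f"# {spec.ticket_id}: {spec.summary}", "", "## Acceptance criteria"]
    lines += [f"- [ ] {criterion}" for criterion in spec.acceptance_criteria]
    if spec.context_docs:
        lines += ["", "## Context"]
        lines += [f"- {doc}" for doc in spec.context_docs]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder ticket, not a real Citation task.
    spec = TaskSpec(
        ticket_id="EXAMPLE-1",
        summary="Upgrade a dependency",
        acceptance_criteria=["Build passes", "Existing tests still pass"],
        context_docs=["docs/architecture.md"],
    )
    print(render_spec(spec))
```

Keeping the acceptance criteria as an explicit checklist gives both the agent and the reviewer the same definition of done.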

“Once we paired Devin with structured specs, the results became consistent. We built a rinse-and-repeat pattern that engineers could trust,” Wray explained.

Use Cases in Practice

Once Devin’s workflow was established, Citation applied it across a range of projects:

  • Legacy modernization: A compliance tool originally estimated as a three-month migration (from .NET Framework and AngularJS to .NET Core and React 18) reached a working prototype in two weeks. Devin decomposed the monolith into a clean architecture and delivered vertical slices end to end, from user interface through database entries, with static analysis and automated tests driving coverage above 90%.
  • Backlog throughput: Medium-priority tasks such as dependency upgrades and bug fixes that previously slipped sprint to sprint were consistently completed. In one corrective action feature, Devin contributed 147 merged pull requests, about 367 story points of work. This output was comparable to a multi-sprint epic delivered in weeks rather than months.
  • Debugging by non-engineers: Business Analysts used Devin to explain unexpected system behavior by querying the codebase directly. In one case, a BA spotted a subtle prefix mismatch, submitted a PR, and had it approved by a senior engineer, preventing a customer-facing bug.
  • Customer support: Devin automated root-cause analysis of help desk tickets, reducing the number of issues escalated to engineering.
  • Rapid prototyping: Engineers used Devin to build working proofs of concept in a few days. These included extending existing tools and testing new features that would normally wait weeks while teams focused on core delivery.

Scaling Adoption

Devin is now part of daily engineering practice at Citation. Engineers run more than 180 sessions each week, with over 1,200 sessions recorded in GitHub since rollout. What started as using Devin in scoped hackathon projects has scaled into steady, ongoing use in production across the engineering organization, with adoption also extending to business analysts and support teams.

“Devin isn’t just another tool. To get value, you have to change your ways of working. Once we did that, it started delivering results we wouldn’t have reached otherwise.” — Anthony Wray, AI Engineering Lead

Looking Ahead

Citation plans to expand Devin’s role in platform modernization and legacy upgrades over the next twelve months. The team is also working on deeper process integration, including automated Jira scoping and standardized specification files to make AI-driven development more predictable.

For Anthony Wray, the significance goes beyond throughput.

“The real value is that it forces us to rethink process. You can’t just slap AI on the old way of working. You have to redesign for what’s possible now.” — Anthony Wray, AI Engineering Lead

The Citation Group at a Glance

Company: The Citation Group
Industry: Legal, Risk, Compliance, and HR Technology
Scale: Serves 120,000+ SMEs across the UK, Canada, Australia, and New Zealand
AI Use Cases:
  • Autonomous code development & PR submission
  • Tech debt remediation and refactors
  • Customer support automation
  • Bug reproduction & root cause analysis by non-engineers
  • Proof-of-concept prototyping
  • Documentation-driven development with structured prompts
Key Outcomes:
  • 549 PRs opened, 271 merged with an 80% merge rate
  • 50% improvement in engineering output efficiency (ACU-to-PR ratio) in first 3 months
  • ~300 story points delivered across 147 PRs, including epics that shipped in weeks instead of months
  • 180+ Devin sessions per week, 1,200+ lifetime sessions logged
  • Backlog items cleared without increasing headcount
  • Faster customer support resolution with fewer escalations
Innovation Approach: Citation rolled out Devin through structured hackathons, appointing AI champions to document best practices and standardize prompt design. By combining markdown-driven specifications with new efficiency metrics like ACU-to-PR ratio, the team built a repeatable framework for scaling AI across engineering and support workflows.
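One plausible reading of the ACU-to-PR ratio is compute usage divided by merged pull requests, with an improvement measured as the relative drop in that ratio between two periods. The sketch below follows that reading. The definition, the function names, and the input values are assumptions; the source does not say how Citation computes the metric.

```python
def acu_per_merged_pr(acus_consumed: float, merged_prs: int) -> float:
    """Assumed definition: compute units consumed per merged pull request."""
    if merged_prs <= 0:
        raise ValueError("merged_prs must be positive")
    return acus_consumed / merged_prs


def relative_improvement(before: float, after: float) -> float:
    """Fractional reduction in the ratio; positive means fewer ACUs per PR."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


if __name__ == "__main__":
    # Placeholder values for illustration only.
    first_period = acu_per_merged_pr(acus_consumed=900.0, merged_prs=30)
    second_period = acu_per_merged_pr(acus_consumed=800.0, merged_prs=40)
    print(f"{relative_improvement(first_period, second_period):.0%}")
```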