Corroding the Monolith: Building a Rust-Native MongoDB Client for Python with AI-First Development


For Comyar Zaheri and Silas Marvin, engineers on Rippling’s Core Platform team, improving system performance is a constant priority. The official Python MongoDB client, used by Rippling's query engine to handle over 800 million queries a day, sat directly in the hot path for its products, and they saw an opportunity to improve performance for customers. So, they built a custom solution from scratch.

The result was mongoxide, a faster, more efficient MongoDB client built in Rust. mongoxide reduced database query execution times by 45% and enabled a new high-performance object-document mapper to replace Rippling's existing implementation with 20x better performance.

Read Corroding the Monolith to find out how.


Introduction

At Rippling, we operate a large Python monolith serving dozens of products, many of which rely heavily on MongoDB. Most application code queries MongoDB through the official PyMongo driver – either directly or via our internal fork of MongoEngine, an object-document mapper built on top of it. Because PyMongo sits on the hot path for over 800M queries per day at Rippling, the client is an ideal target for a cross-cutting performance improvement by our Core Platform organization.

Based on production traces, profiling, and a few small-scale experiments, we believed we could achieve lower latency and more consistent query performance with a MongoDB driver that interleaved CPU and I/O work, performed fewer memory allocations, and reduced time spent under the Python Global Interpreter Lock (GIL). In pursuit of those gains, we determined that implementing a new driver in Rust gave us technical options that are not practical in pure Python.

Using PyO3 and maturin, we built mongoxide, a Rust-native MongoDB client for Python. mongoxide provides functional parity with PyMongo for reads, which allowed for drop-in replacement in our monolith. In support of our most performance-sensitive products, mongoxide also ships a new object-document mapper which performs zero-copy deserialization from bytes-on-the-wire to an application-defined Python object, bypassing all intermediate memory allocations and type coercions.

Using mongoxide, we reduced the latency of our read-heavy MongoDB workloads by up to 45%, and cut tail and customer-perceived latencies by as much as 60% across all percentiles.

To build mongoxide, we leaned into an AI-first, spec-driven workflow which allowed us to build a high-quality library – along with robust parity, correctness, and benchmarking suites – in just under three engineering months.

In this post, we’ll walk through what we built, how we validated compatibility with PyMongo, and how we leveraged an agentic coding workflow to enable a small team to quickly ship a low-level rewrite into production with confidence.

Understanding the Opportunity

Many of the read-heavy workloads at Rippling are driven by Rippling Query Language (RQL), our DSL that executes transactional queries in Python and can scan many thousands of MongoDB documents to answer customer queries. RQL is used in Supergroups as well as other critical platform components which govern permissions, benefits enrollments, payroll configuration, and more. In production traces of these paths, BSON deserialization and Python object materialization accounted for a significant amount of wall time in read spans, often matching or exceeding the time spent waiting on MongoDB. 

When receiving batches of documents from MongoDB, PyMongo serially and eagerly decodes each document’s BSON bytes into nested Python dict/list objects, allocating per container and per field, and it performs much of that deserialization while holding the Python GIL. For queries that return many documents, the allocation overhead and serial nature of the decoding work drove both CPU per request and tail-latency for Rippling’s read-heavy workloads. 

In RQL execution and many MongoEngine-backed call sites, our business logic typically reads only a small subset of fields from wide documents. While projections could limit what MongoDB returns, using MongoEngine makes that optimization difficult; it often requires proliferating narrowly-scoped models or dropping into PyMongo for special cases. In order to avoid the increased maintenance burden of either approach, our product teams often reuse broader models, which can lead to over-fetching from MongoDB. And because PyMongo eagerly decodes documents, we always pay the cost to deserialize everything even if business logic only reads a couple of fields. Rather than trying to push projection/model changes across all business logic at Rippling, we sought to remove the eager-deserialization penalty in a new driver implementation, along with a host of other performance improvements.
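To make the over-fetching cost concrete, here is an illustrative sketch – in pure Python, with `_id` handling omitted – of what a MongoDB inclusion projection does to a wide document. The real work happens server-side; the point is how little of a wide document the narrow read actually needs.

```python
# Illustrative only: mimic what a MongoDB inclusion projection such as
# {"name": 1, "address.city": 1} keeps from a wide document.
def apply_projection(doc: dict, paths: list[str]) -> dict:
    out: dict = {}
    for path in paths:
        parts = path.split(".")
        src = doc
        # Walk down the dotted path in the source document.
        for part in parts:
            if not isinstance(src, dict) or part not in src:
                break
            src = src[part]
        else:
            # Rebuild the nested structure for just this path.
            dst = out
            for part in parts[:-1]:
                dst = dst.setdefault(part, {})
            dst[parts[-1]] = src
    return out

doc = {"name": "Ada", "email": "ada@example.com",
       "address": {"city": "London", "zip": "EC1"},
       "salary_history": [1, 2, 3]}
apply_projection(doc, ["name", "address.city"])
# → {"name": "Ada", "address": {"city": "London"}}
```

Without the projection, every one of those fields is returned and eagerly decoded, whether or not business logic ever reads it.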

We set a narrow bar for a rewrite: preserve call-site ergonomics, match PyMongo’s observable behavior for the read APIs we rely on, reduce memory allocations, reduce GIL hold time, and interleave CPU and I/O.

Introducing mongoxide

mongoxide is a Python extension module built with PyO3 and packaged with maturin. For reads, it keeps a PyMongo-shaped surface area: find() returns a cursor, and the caller iterates results the same way they would with PyMongo. That lets existing call sites across Rippling swap the client without rewriting query logic.
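The value of keeping PyMongo's read surface is that call sites written against the cursor protocol don't care which client is behind them. A minimal sketch, using a stub collection (the real objects come from pymongo or mongoxide):

```python
# Call-site code written against PyMongo's find()/cursor shape works
# unchanged with any client that honors the same protocol. The stub below
# is a stand-in so the sketch is self-contained.
def active_user_emails(collection):
    emails = []
    for doc in collection.find({"active": True}, {"email": 1}):
        emails.append(doc["email"])
    return emails

class StubCollection:
    """Minimal stand-in honoring the find(filter, projection) shape."""
    def __init__(self, docs):
        self._docs = docs

    def find(self, filt, projection=None):
        for d in self._docs:
            if all(d.get(k) == v for k, v in filt.items()):
                yield {k: d[k] for k in projection} if projection else dict(d)

coll = StubCollection([
    {"email": "a@x.com", "active": True},
    {"email": "b@x.com", "active": False},
])
active_user_emails(coll)  # → ["a@x.com"]
```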

mongoxide is built around a streaming pipeline that keeps MongoDB reads, CPU work, and Python consumption decoupled but continuously in motion. Instead of waiting for each request to fully complete before issuing the next, a runtime issues asynchronous queries to MongoDB and hands off each received batch to a thread pool for field indexing, while continuing to fetch subsequent batches in parallel. The result is an overlap of network I/O and computation that avoids idle time and turns what would otherwise be a stop-and-go request loop into a continuously advancing pipeline. 

Media Item | Corroding the Monolith 1 | .png

Throughout pipeline execution, the Python GIL is never held. All network I/O, batching, and indexing remain entirely in Rust, and Python objects are only materialized when the caller actually accesses a document or field. Because this pipeline can run ahead of a Python consumer, mongoxide applies backpressure to keep resource usage under control. A semaphore caps the number of in-flight batches, ensuring memory remains bounded even when indexing outpaces consumption or the Python side stops iterating altogether.
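The shape of this pipeline can be sketched in Python – with the caveat that the real implementation lives in Rust, never touches the GIL, and fetches while indexing. A simplified asyncio version, where a semaphore bounds how many undelivered batches may be in flight:

```python
# Simplified sketch of a bounded-prefetch pipeline: a producer runs ahead
# of the consumer, and a semaphore applies backpressure so memory stays
# bounded even if the consumer stalls.
import asyncio

async def fetch_batch(n):
    await asyncio.sleep(0.001)               # stand-in for network I/O
    return [f"doc-{n}-{i}" for i in range(3)]

async def pipeline(num_batches, max_in_flight=2):
    sem = asyncio.Semaphore(max_in_flight)   # caps in-flight batches
    queue: asyncio.Queue = asyncio.Queue()

    async def producer():
        for n in range(num_batches):
            await sem.acquire()              # wait if consumer is behind
            await queue.put(await fetch_batch(n))
        await queue.put(None)                # end-of-stream marker

    prod = asyncio.create_task(producer())
    docs = []
    while (batch := await queue.get()) is not None:
        docs.extend(batch)                   # "Python consumption"
        sem.release()                        # lets the producer run ahead
    await prod
    return docs

docs = asyncio.run(pipeline(4))
```

If the consumer stops calling `queue.get()`, the producer blocks on the semaphore after `max_in_flight` batches, which is exactly the bounded-memory property described above.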

mongoxide also takes particular care to avoid memory copies. MongoDB batch responses arrive as contiguous BSON buffers, and mongoxide represents each document as a slice into that buffer, avoiding copies of owned BSON bytes anywhere in its deserialization logic. To achieve this, we collaborated with the engineering team at MongoDB to introduce a new, high-performance alternative to the standard cursor API that allows direct access to server response batches without per-document deserialization overhead. This new API is now publicly available.
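The slicing trick relies on BSON's framing: every document begins with its own little-endian int32 length, so a contiguous batch can be split into per-document views without copying. A small Python illustration using `memoryview` (mongoxide does the equivalent in Rust):

```python
# A batch buffer is split into per-document slices. memoryview slices share
# the underlying bytes; no document's bytes are copied.
import struct

def split_documents(batch: bytes):
    view = memoryview(batch)
    offset = 0
    docs = []
    while offset < len(view):
        # Every BSON document starts with its own little-endian int32
        # total length (which includes the length field and trailing 0x00).
        (length,) = struct.unpack_from("<i", view, offset)
        docs.append(view[offset:offset + length])   # zero-copy slice
        offset += length
    return docs

# Two hand-built BSON documents back to back: {} and {"a": 7}.
empty = b"\x05\x00\x00\x00\x00"
with_a = b"\x0c\x00\x00\x00\x10a\x00\x07\x00\x00\x00\x00"
docs = split_documents(empty + with_a)
# len(docs) == 2; both slices alias the original buffer
```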

Agentic Coding for Parity and Performance

Swapping the MongoDB driver used in Rippling’s monolith is incredibly risky; a semantic difference between mongoxide and PyMongo could result in data corruption, with real-world impact for companies using Rippling – such as affecting whether their employees are paid on time.

In addition to shadow testing and staged deployments, we knew we needed a comprehensive parity-testing suite to validate mongoxide’s behavior against PyMongo. To build that suite quickly, we leveraged AI.

At Rippling, we have found that agentic coding using a spec-driven workflow with strict guardrails can allow a small, experienced engineering team to deliver high-quality results extremely quickly, especially when significant care is given to designing prompts and managing agent context.

Our workflow was relatively simple but very strict. For each parity feature (e.g. skip/limit, projections, sort semantics, etc.), we wrote a short spec that included:

  • a canonical example test, 

  • a set of success and failure criteria, 

  • how to record and label divergences in behavior, 

  • authoritative sources, such as public PyMongo documentation or targeted PyMongo source code snippets,

  • and readability/maintainability constraints, such as naming, structure, and what not to assert

Using these specs, we generated human-reviewed implementation plans for each feature under test and, after human approval, directed agents to enumerate test combinations. With this agentic workflow, we implemented a 600-test parity suite in two engineering days that comprehensively exercised mongoxide's surface area and validated its parity with PyMongo.

An example test generated using this methodology looks as follows:

Beyond parity testing, we found that the same agentic, spec-driven approach unlocked the ability to build bespoke benchmarking and testing infrastructure that would normally be hard to justify from a cost or time perspective.

When benchmarking mongoxide against PyMongo, we initially saw significant noise in our results when executing queries against MongoDB. Even with identical queries and datasets, MongoDB’s execution latency varied enough between runs to muddy the signals we actually cared about: client-side I/O, buffering, and deserialization performance. This made it difficult to attribute performance differences to the driver itself rather than to fluctuations in the database.

To eliminate this source of variance, we used the same AI-assisted workflow to build a lightweight server that implements MongoDB’s wire protocol and serves precomputed responses directly from memory. By removing MongoDB from the benchmarking loop entirely, we were able to provide consistent, deterministic responses to both PyMongo and mongoxide, dramatically reducing noise and allowing us to isolate client-side performance characteristics with much higher confidence.
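The core mechanic of the mock is simple, even though the real server spoke full OP_MSG: parse the 16-byte MongoDB wire-protocol header (four little-endian int32s: messageLength, requestID, responseTo, opCode) and answer with a precomputed body, echoing the caller's requestID into responseTo so the driver can match the reply. A toy version of just that piece:

```python
# Toy canned-response logic for a MongoDB wire-protocol mock. The real
# mock served complete, precomputed OP_MSG replies from memory; this only
# shows the header handling.
import struct

OP_MSG = 2013
HEADER = struct.Struct("<iiii")  # messageLength, requestID, responseTo, opCode

def canned_reply(request: bytes, body: bytes) -> bytes:
    _, request_id, _, op_code = HEADER.unpack_from(request)
    assert op_code == OP_MSG
    header = HEADER.pack(HEADER.size + len(body),  # total message length
                         1,                        # our own requestID
                         request_id,               # responseTo = caller's ID
                         OP_MSG)
    return header + body

req = HEADER.pack(16, 42, 0, OP_MSG)               # header-only request
reply = canned_reply(req, b"precomputed-bson-here")
```

Because the body bytes are fixed, every benchmark run sees byte-identical responses, which is what eliminated the server-side variance.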

As with our parity-testing work, this effort was guided by a narrowly scoped spec and strict guardrails: which parts of the wire protocol needed to be supported, which behaviors could be stubbed or simplified, what correctness guarantees were required, and what was explicitly out of scope. With that structure in place, an agent was able to produce a working implementation in roughly an hour of engineering time – something that would typically be deferred or abandoned as “too much effort”.

While agentic coding workflows are often discussed with respect to feature development, we have also found extreme value in using them to quickly build bespoke internal tooling that can be used to directly improve correctness or assess performance in our production systems.

Performance

To compare mongoxide performance against PyMongo, we wrote an end-to-end benchmark that timed querying, iterating, and accessing a handful of fields per document across varying projection sizes. For each projection preset, the benchmark runs a collection scan via find() – with batch_size=1000 and limit=100000 – so the reported wall time includes server execution, network transfer, client-side decoding/materialization, Python iteration overhead, and field access on every returned document. The projection presets are as follows:

  • few: 4 projected fields, smallest payload; top-level only

  • small: 9 projected paths, mostly top-level, with a couple nested dot-paths

  • medium: 15 projected paths, mix of top-level and several nested dot-paths, including deeper nesting

  • large: 35 projected paths, many top-level and multiple nested dot-paths; includes more structured payload like subdocs/arrays

  • full: no projection, full document; the seeded synthetic shape has ~28 top-level keys and ~45 total keys including nested subdocuments, plus a small array and several optional/nullable fields
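The shape of the harness, roughly: time the full query → iterate → field-access loop several times and report mean ± standard deviation. `run_scan` below is a stand-in for the real find() loop, so the sketch is self-contained.

```python
# Minimal timing harness: repeat the workload and report mean/std dev in
# milliseconds. run_scan stands in for the real collection-scan loop.
import statistics
import time

def bench(run, repeats=10):
    samples_ms = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        run()
        samples_ms.append((time.perf_counter() - t0) * 1000)
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)

def run_scan():
    # Real version, per preset:
    #   for doc in coll.find({}, projection,
    #                        batch_size=1000, limit=100000):
    #       touch a handful of fields on each doc
    sum(i * i for i in range(10_000))      # placeholder workload

mean_ms, std_ms = bench(run_scan)
```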

The following benchmark results were taken on a 2024 MacBook Pro with an M4 Pro and 48 GB RAM on cpython 3.12.11.


| Benchmark | mongoxide Mean ± Std Dev | PyMongo Mean ± Std Dev |
|-----------|--------------------------|------------------------|
| few       | 68.863 ± 6.155           | 96.016 ± 7.550         |
| small     | 94.451 ± 2.000           | 157.040 ± 4.262        |
| medium    | 126.034 ± 1.603          | 274.751 ± 7.738        |
| large     | 152.041 ± 2.923          | 552.183 ± 8.380        |
| full      | 156.457 ± 3.601          | 610.234 ± 5.536        |

Our benchmark results show that mongoxide’s advantages compound as workloads become more demanding. It is faster than PyMongo across all our benchmarks, scales better as projections and batch sizes grow, and benefits from Rust’s more consistent execution characteristics, yielding both higher throughput and lower latency variance.

When enabled in production, we observed that mongoxide had the same effect on overall RQL execution latency. Below is a graph from our internal dashboard; the green vertical annotation represents enabling mongoxide for use in RQL execution, replacing PyMongo. We observed a significant drop in query latency across all percentiles and a similarly significant reduction in standard deviation, resulting in both customer-perceived latency improvements as well as more consistent query execution across the platform.

[Graph: RQL query latency across percentiles from our internal dashboard, with the annotation marking the switch to mongoxide]

Pushing the Envelope with Zero-Copy Deserialization

For drop-in compatibility, mongoxide can return dict-like documents whose values are converted into Python objects lazily, so reading one field does not require building a full dict/list of Python objects. However, much of our business logic involves querying from PyMongo and then converting returned dicts into Python classes. This means even when using mongoxide we still incurred the cost of materializing an intermediate dict-like document representation before the target Python object was constructed.
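The lazy dict-like idea can be sketched as follows – a minimal illustration, not the real implementation: the document holds raw, undecoded values and converts a field to a Python object only on first access, caching the result. mongoxide does this from BSON byte slices in Rust; the stub takes a decode callback so the example is self-contained.

```python
# Sketch of a lazy dict-like document: fields are materialized on first
# access and cached; untouched fields stay raw.
from collections.abc import Mapping

class LazyDocument(Mapping):
    def __init__(self, raw, decode):
        self._raw = raw          # e.g. field name -> raw BSON byte slice
        self._decode = decode
        self._cache = {}

    def __getitem__(self, key):
        if key not in self._cache:
            self._cache[key] = self._decode(self._raw[key])
        return self._cache[key]

    def __iter__(self):
        return iter(self._raw)

    def __len__(self):
        return len(self._raw)

doc = LazyDocument({"name": b"Ada", "age": b"36"},
                   decode=lambda b: b.decode())
doc["name"]  # only "name" is materialized; "age" stays raw
```

Reading one field never pays for decoding the rest, which is exactly the property that eager dict construction in PyMongo lacks.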

For our most latency-sensitive products, we wanted to enable full zero-copy deserialization, where we could directly transform bytes returned by MongoDB into application-defined Python objects without an intermediate dict-like document representation. To do so, we designed and built our own object-document mapper – similar to MongoEngine – that utilized mongoxide’s underlying deserialization pipeline and zero-copy document representation.

mongoxide’s object-document mapper is designed with user ergonomics in mind to ensure easy adoption by product teams across Rippling; for example, our core Model type provides ergonomics similar to a Python dataclass. The following is an example of how you might define a User model and query MongoDB.
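mongoxide's actual API isn't reproduced here; the sketch below, with hypothetical names (`Model`, `projection`, `from_document`), illustrates the shape: the projection and field metadata fall out of the model's type annotations.

```python
# Hypothetical sketch of the ODM shape: a model's annotations drive both
# the MongoDB projection and how documents map onto instances.
class Model:
    @classmethod
    def projection(cls) -> dict:
        # In the real system this is computed once per model, then cached.
        return {field: 1 for field in cls.__annotations__}

    @classmethod
    def from_document(cls, doc: dict):
        obj = cls.__new__(cls)
        for field in cls.__annotations__:
            setattr(obj, field, doc.get(field))
        return obj

class User(Model):
    name: str
    email: str
    age: int

# Roughly what a query against a `users` collection would look like:
#   users = [User.from_document(d)
#            for d in collection.find({"name": "Ada"}, User.projection())]
User.projection()  # → {"name": 1, "email": 1, "age": 1}
```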

In the above example, mongoxide finds documents in the users collection, filtering by name, and transforms retrieved documents into User objects. mongoxide avoids over-fetching from MongoDB by using a projection based on just the three fields defined in the User model: name, email, and age. This projection along with related deserialization metadata – such as field typing information – is computed only once the first time a model is used with mongoxide and then reused for all subsequent queries. 

Even when using find_model, we materialize Python objects as lazily as possible. After receiving a server response, we associate each document's slice of BSON bytes with its model instance and materialize Python objects for fields and nested structures only when they are accessed.

Performance

We benchmarked mongoxide’s object-document mapper against two patterns commonly used across Rippling’s codebase:

  • Using PyMongo and manually deserializing returned dicts into Python dataclasses

  • Using MongoEngine, which handles both querying and deserialization into Python objects 

The following benchmark results were taken on a 2024 MacBook Pro with an M4 Pro and 48 GB RAM on cpython 3.12.11. Our benchmarked models and client configurations mirror the projection presets we used for our find() benchmark described previously.


| Benchmark | mongoxide Mean ± Std Dev | PyMongo Mean ± Std Dev | MongoEngine Mean ± Std Dev |
|-----------|--------------------------|------------------------|----------------------------|
| few       | 40.211 ± 1.008           | 92.985 ± 4.176         | 797.007 ± 16.936           |
| small     | 80.025 ± 1.72            | 193.896 ± 4.606        | 2,892.95 ± 31.205          |
| medium    | 146.795 ± 2.492          | 353.798 ± 6.443        | 5,429.218 ± 55.326         |
| large     | 289.313 ± 3.967          | 747.658 ± 15.218       | 9,754.371 ± 101.709        |
| full      | 334.537 ± 7.052          | 830.027 ± 17.169       | 10,646.646 ± 177.655       |

Similar to our dict-like document representation benchmark, we see mongoxide’s latency is roughly 40% of PyMongo’s across all projection sizes, making it about 2.3x - 2.6x faster. More importantly, mongoxide’s zero-copy, lazy-deserialization approach dramatically outperforms MongoEngine end-to-end, delivering a latency improvement of 20x - 37x depending on projection size. For our most performance-sensitive products, mongoxide’s object-document mapper will have a huge impact on customer-perceived latency. We’re just beginning to integrate it into Rippling’s monolith and are excited to see the rollout in the coming weeks.

Where We’re Going and Why You Should Join Us

The real challenge wasn't making reads faster; it was doing so safely within a production system where a single error can impact payroll for millions. To get mongoxide into production at Rippling, we had to be intentional about correctness testing, telemetry, shadow testing, and our rollout strategy. We also had to be intentional about avoiding migration campaigns across our product teams, and about designing for ergonomics where we needed explicit adoption. That intentionality, not just rewriting a library in Rust, is what made the project successful.

Just as importantly, the way we built mongoxide shaped the outcome. Like the rest of the industry, we’re actively developing best practices for leveraging agentic coding workflows, and this project served as a proving ground for new workflows we expect to reuse for other high-impact projects across Rippling.

mongoxide is the first major deliverable at Rippling where we replaced a core monolith component with Rust in production. Given its success, we are excited about further investing in Rust across our core platform to benefit our most performance-sensitive and high-impact projects. 

If you’re excited about Rust, databases, and distributed systems, we hope you’ll join us in the Core Platform organization to help further “corrode” the monolith.

Disclaimer

Rippling and its affiliates do not provide tax, accounting, or legal advice. This material has been prepared for informational purposes only, and is not intended to provide or be relied on for tax, accounting, or legal advice. You should consult your own tax, accounting, and legal advisors before engaging in any related activities or transactions.

Author


Comyar Zaheri

Senior Staff Software Engineer

Comyar is a Senior Staff Software Engineer on Rippling’s Core Platform, focusing on performance, databases, and distributed systems.
