Stop building AI for perfect conditions
Rippling’s Ankur Bhatt shares how brittle workflows break AI agents — and the frameworks teams need to ship reliable, reasoning-driven agents into production.
Engineering teams are grappling with a different way of building right now.
Not iteratively different — fundamentally different. This is especially true for startups that are building AI agents, as industry best practices have not caught up with the pace of shipping agentic software into production.
At Rippling, our engineering team has been building through this seismic shift. Developers have gone from building simple text summarization to autonomous agents that investigate why employees didn't get paid.
In a fireside chat with LangChain CEO Harrison Chase, Rippling Head of AI Ankur Bhatt shared what he’s learned the hard way: deterministic workflows break in ways you can't predict, and pretty demos hide the problems you need to solve.
We take a closer look at these two blockers and what startups can do to avoid them.
The principle ⚛️
How to keep your AI agents from breaking in production

Rippling’s framework for shipping AI agents covers four key areas (see Ankur’s infographic above). Today, we’re going deep on the two that Ankur sees blocking the most teams from getting to production.
The deep agent shift (aka stop building brittle, deterministic workflows)
Fighting ‘AI slop’ with evals (aka use production traces to catch what breaks)
Let’s dive in.
Let the LLM reason instead of mapping out every path
You built a perfect demo. A sales agent takes a call, summarizes it, updates Salesforce, and creates a document. But as soon as you ship it to production and a prospect asks about security during the call, it breaks. “This is not in the flow,” Ankur says.
Conventional wisdom says the fix is more deterministic workflows: map every possible branch and edge case.
Because Rippling is a compound startup, the premise in the beginning was to be more deterministic and think in terms of different domains, like payroll and IT (two different product suites). “Then we'd create subagents that handled these domains with a simple router,” Ankur says. That approach broke immediately.
The challenge: Humans don't ask questions in a deterministic way. “You and I are different,” Ankur says. “You might ask, ‘How many people were onboarded last week?’ I may say ‘How many were hired?’ A simple change of words from an agent paradigm can mean completely different things.”
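To make that concrete, here is a minimal sketch (hypothetical route names, not Rippling's actual code) of keyword-style routing: one phrasing matches, its paraphrase falls through.

```python
# Hypothetical keyword router: maps the phrasings it anticipated to subagents.
ROUTES = {
    "onboarded": "onboarding_subagent",
    "payroll": "payroll_subagent",
    "laptop": "it_subagent",
}

def route(question: str) -> str:
    q = question.lower()
    for keyword, subagent in ROUTES.items():
        if keyword in q:
            return subagent
    return "fallback"  # every unanticipated phrasing lands here

print(route("How many people were onboarded last week?"))  # onboarding_subagent
print(route("How many were hired last week?"))             # fallback: same intent, no route
```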
From his POV, engineering teams keep doubling down on more comprehensive routing. But Ankur's take is the opposite: “There is a lot of belief that we have to make workflow agents more deterministic. What we're finding is the way people use agents cannot be defined in a box.”
What to do instead: In the last month, Ankur has moved away from brittle, deterministic workflow agents toward what he calls “the deep agent paradigm.”
"Lean into the power of LLMs,” he says. “Because they can do reasoning, thinking, and judgment. Give them ample context of the problem you're solving and the tools, you'll get much better outcomes.”
The shift: Treat workflows as tools the agent invokes when needed, not rigid paths it must follow. Workflows still matter for transactional sequences — “places where we're taking certain actions predictably are good tools to plug into our overall agent design,” Ankur says. But the agent chooses when to use them.
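Here's a rough sketch of that design, assuming LangChain's tool decorator and an OpenAI chat model (the model name and both tool bodies are placeholder assumptions, not Rippling's implementation): the deterministic sequence survives as a tool, and the model decides when to call it.

```python
# Workflows-as-tools sketch: deterministic sequences become tools the agent
# can choose, instead of a rigid path it must follow. Requires OPENAI_API_KEY.
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def summarize_and_update_crm(call_transcript: str) -> str:
    """Deterministic workflow: summarize a sales call and update the CRM."""
    return "CRM updated with call summary."  # placeholder for the real sequence

@tool
def answer_security_question(question: str) -> str:
    """Look up security and compliance docs to answer a prospect's question."""
    return "Pulled the relevant security docs."  # placeholder

# The model reasons over the request and picks a tool (or none), so the
# off-script security question has somewhere to go.
llm = ChatOpenAI(model="gpt-4o").bind_tools(
    [summarize_and_update_crm, answer_security_question]
)
response = llm.invoke("The prospect asked about SOC 2 during the call.")
print(response.tool_calls)  # expect a call to answer_security_question
```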
Test with production data, not perfect demos
Beautiful demos with curated data collapse in production. Ankur sees this blocking most teams.
“You really have to work with production data. You can't do a simple demo with a demo instance; you need really curated production snapshots where we can deploy our AI innovation and use it to fine-tune what's working.”
Example: dates. “We have a lot of information with dates. If users ask about dates, we have to have the right dates configured for when the user is logged on.” Demo instances don't have users across time zones asking about “last week” when pay periods don't align with calendar weeks. You only catch this with real production data.
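Here is a minimal standard-library sketch of that trap (the function name is illustrative): the same utterance, “last week,” can resolve to different date ranges depending on the user's timezone.

```python
# "Last week" depends on when the user's local week rolled over, and pay
# periods rarely align with calendar weeks on top of that.
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def last_calendar_week(user_tz: str):
    now = datetime.now(ZoneInfo(user_tz))
    monday = (now - timedelta(days=now.weekday())).date()  # start of this week
    return monday - timedelta(days=7), monday - timedelta(days=1)

# Near a week boundary, the same question resolves to different ranges:
print(last_calendar_week("Pacific/Auckland"))     # may already be in the new week
print(last_calendar_week("America/Los_Angeles"))  # may still be in the old one
```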
But there's a deeper cultural blocker. Software engineers and ML practitioners think about failure differently.
“If you design something and it doesn't work, it's a bug. You fix it,” Ankur explained. “If you come from an ML background, you run experiments. Some work, others fail, and you move on. That concept is very alien to people in software.”
This isn't about lowering standards — it's about accepting some experiments won't work and moving on. Teams that can't embrace this spend months perfecting demos instead of weeks iterating in production.
Here's what Rippling does instead — three feedback mechanisms that make production testing sustainable:
Dogfooding: “We dogfood everything. The first step is to switch it on at Rippling and get feedback immediately. Feedback can be brutal but important.” Rippling CEO Parker Conrad actively tests features and Slacks feedback on what works. The feedback loop is hours or days, not weeks.
Controlled rollout: A small group of sophisticated users who understand what you're building gives you fast signal collection and rapid feedback loops.
For Talent Signal, Rippling’s AI-powered performance tool for reviewing developers, CX, and sales roles, the team went to engineering managers first with thumbs up/down feedback.
Obsessive tracing: “The amount of time we spend in LangSmith tracing what's happening in production — how the LLM is performing, how routing logic is functioning — is extremely valuable.” They trace individual interactions: Did it route correctly? Pick the right tools? Where did reasoning break?
They run control tests internally — multiple people testing the same feature, comparing traces, capturing what users wanted to see. “We use traces to compare/contrast what worked/what didn't.”
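For a flavor of what that instrumentation can look like, here is a minimal sketch using LangSmith's @traceable decorator (this assumes LANGSMITH_API_KEY and tracing are enabled in the environment; the routing logic is a stand-in, not Rippling's):

```python
# Minimal LangSmith tracing sketch. With tracing enabled, each call is
# recorded as a run you can inspect: did it route correctly? Right tools?
from langsmith import traceable

@traceable(run_type="chain", name="route_question")
def route_question(question: str) -> str:
    # Stand-in routing logic; the real value is the recorded trace.
    return "payroll_subagent" if "paid" in question.lower() else "fallback"

print(route_question("Why didn't I get paid this week?"))  # payroll_subagent
```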
Unit tests aren't enough. Production traces catch the edge cases that matter — the date bugs, the routing issues, the questions that break your workflow.
🎬 Watch the full fireside chat
The flywheel from prototype to production
Why agents are harder than simple LLM features
How Rippling and LangChain think about security and data privacy when building
Watch here.
Recommended reads 📚
Why Gokul Rajaram thinks every CEO should be sending out a bi-weekly newsletter to the team + a template to get you started.
Inside the Browser Company: Why they killed Arc to build Dia
“You can memorize every cookbook and still not know how to scramble an egg.” I liked this essay from Saanya Ojha (Partner at BCV) on the contrarian hill Meta’s Chief Scientist Yann LeCun seems prepared to die on.
The edge case ⚡️
3 free GTM workflow templates

I stumbled upon this great walkthrough from Khushi Shelat at Parallel Labs. She open-sourced three incredibly useful agentic workflows for startup founders:
A competitive SEO analyzer
Personalized email outreach drafter
Prospect list extractor from URL
If you want the link or just want to see her walk through a live demo, the full post is here. (Not an ad, just a fan!)

See how Rippling works for your team
Get a personalized walkthrough and see how Rippling can save hours every week on onboarding, payroll, and compliance.