Rippling Data Cloud: Data Catalog


As part of today’s Rippling Data Cloud announcement, we launched Data Catalog, which allows users to inspect, govern, and understand every data object in Rippling Data Cloud, across native Rippling data, connected apps, Transformations, and external warehouse data. This gives both analysts and Rippling AI the context they need to find the right data, understand what it means, and trace where and how it is used.

What is Data Catalog?

Data Catalog is the central inventory of every data object in Rippling Data Cloud, across native Rippling data, Custom Objects, Transformations, and connected external data. It is a critical component of enabling AI-driven data analysis, because it answers these questions:

  • What data do we have?

  • What does the data mean?

  • Who is allowed to use the data?

  • Where did the data come from?

  • What workflows, views, dashboards, or derived datasets depend on the data?

Data Catalog turns Rippling Data Cloud into a governed, navigable data system for human users and AI.

Rippling AI needs a map of your data

When a user asks a question like “how much did we spend on recruiting this month?”, the hard part is not just generating a query. The hard part is knowing which version of “spend” the question refers to, which expense objects are authoritative, which vendors or categories count as recruiting, whether the answer should include invoices, card spend, reimbursements, purchase orders, or payroll allocations, and which fields the user is allowed to access.

In AI analytics, the first step is often mapping a natural-language question to the right objects, fields, joins, filters, grain, and business definitions. Field names alone are not enough. There may be many plausibly useful columns called amount, category, vendor, department, status, or date, each with a different meaning. Rippling’s Data Catalog gives AI the context required to make the right selection: object descriptions, field definitions, usage patterns, lineage, relationships, permissions, and business semantics. That context is what allows Rippling AI to stop guessing its way around a warehouse schema and instead reason over a governed map of your business.

More data, more problems

As your data estate grows across native Rippling data, third-party systems, and your warehouse, simply locating data, interpreting it, and navigating permission restrictions becomes a massive challenge. In addition to the overloaded column titles, multiple Transformations may exist, adding another dimension of confusion. Some of these Transformations may have been created by employees who have since left the company. Are they still running? How will you interpret what they’re calculating if you can’t ask the creator? What reports or workflows depend on them? A well-architected central registry like Data Catalog can help solve many of these problems.

One place for every object

The Data Catalog in Rippling Data Cloud is where every dataset in the system lives, regardless of origin. This includes native Rippling objects, Custom Objects from third-party systems, Transformations, and external objects from Snowflake. All are searchable and organized in a single interface. Human users can search and browse datasets by name, category, or keyword. They can pin frequently used objects for quick access or browse by logical category, such as Finance, Devices, or Store Locations, to discover what's available.

Every object and field can include plain-English documentation: what it means, when to use it, and how it relates to the rest of the business. For native Rippling data, many of these descriptions are populated out of the box, which is a huge leg up for AI analysis on Rippling. For custom data, documentation is pulled from the source or generated by AI, and admins can add to or adjust it.

Usage metadata helps the Catalog track which objects are actually being queried in reports, dashboards, and workflows. This identifies what your org relies on versus what was created once and forgotten, which is useful for governance and prioritization.
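
To make the idea concrete, here is a minimal sketch of how usage metadata could separate relied-upon objects from forgotten ones. It is an illustration only, not Rippling's implementation: the object names, the last_queried mapping, and the 90-day threshold are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical usage records: object name -> date it was last queried by a
# report, dashboard, or workflow (None means it has never been queried).
last_queried = {
    "expenses": date(2024, 6, 1),
    "recruiting_spend_v1": date(2023, 1, 15),
    "headcount_snapshot": None,
}


def find_stale_objects(last_queried, today, max_idle_days=90):
    """Return the sorted names of objects not queried within max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        name
        for name, last in last_queried.items()
        if last is None or last < cutoff
    )


# Objects that were never queried, or not queried in the last 90 days.
stale = find_stale_objects(last_queried, today=date(2024, 6, 30))
print(stale)
```

A list like this is a starting point for governance review, not an automatic deletion queue: an object that is rarely queried may still feed a quarterly report.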

Lineage from source to use

Data Catalog lets you click any object and see its complete lineage, from end to end: where the data originates (such as a pipeline or connector), how it's been transformed, and where it surfaces in reports or apps. Rippling AI can also directly answer questions about Lineage.

[fig. 9] Lineage

Lineage shows the end-to-end data flow and makes it easy to spot and fix issues

In many data stacks, metadata, lineage, permissions, and usage are split across the warehouse, transformation layer, BI tool, and governance system. Consolidating them often requires purchasing yet another tool from yet another vendor, and you still have to move through multiple tools to fix an issue. In Rippling, the complete data path is in one view, and changing or fixing a pipeline doesn’t require leaving the system.

Data Catalog becomes the working surface for all data management:

  • View and edit SQL: For objects derived from Transformations, you can view and edit the underlying SQL directly from the Catalog.

  • Unified across every data type: Most platforms have separate schema browsers or catalog interfaces for different data types. In Rippling, native objects, custom objects, transformed objects, zero copy objects, and managed connector objects all live in the Data Catalog.

  • Granular permissions management: From the Data Catalog, you can manage who has access to each dataset down to the field level. Permissions are tied to Rippling's role-based model and update automatically.

  • Automated metadata enrichment: As the data estate grows, Rippling AI generates plain-English descriptions, surfaces representative sample values, and computes basic field statistics. For example, when Salesforce data is brought in, the Catalog surfaces readable descriptions of each field, rather than a wall of cryptic API names.

  • Tagging: Objects can be added to favorites or marked as verified to guide other users and AI on how best to answer a question.
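
As a rough illustration of the field-level permissions described in the list above, the sketch below filters a requested set of fields down to what a role is allowed to read. The roles, the grants structure, and the visible_fields helper are hypothetical and are not part of any Rippling API.

```python
# Hypothetical field-level grants: role -> object -> set of readable fields.
grants = {
    "finance_analyst": {"expenses": {"amount", "category", "vendor"}},
    "recruiter": {"expenses": {"category"}},
}


def visible_fields(grants, role, obj, requested):
    """Return the requested fields the role may read, in the order requested."""
    allowed = grants.get(role, {}).get(obj, set())
    return [field for field in requested if field in allowed]


# A recruiter asking for amount and category only gets category back.
recruiter_view = visible_fields(
    grants, "recruiter", "expenses", ["amount", "category"]
)
print(recruiter_view)
```

The same check applies whether the request comes from a person or from an AI assistant acting on that person's behalf, which is why permissions belong in the same place as the rest of the metadata.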

The Catalog as a control panel

For the person responsible for the data estate, the Catalog is the operational interface for all of it. From a single object's Catalog entry, you can navigate directly to the pipeline feeding it, the Transformation shaping it, the reports consuming it, and the permission profiles governing it. It provides full lifecycle visibility without switching tools.

When a data change is planned, such as a new connector or schema update, the Catalog tells you what would break downstream before you make the change. When an audit requires demonstrating data access controls, the Catalog surfaces that directly.
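
Conceptually, this kind of impact analysis is a walk over the lineage graph. The sketch below shows one way it could work, assuming lineage is stored as a simple mapping from each object to the objects it feeds. The object names and the downstream_of helper are made up for illustration and do not describe how Rippling stores lineage.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the objects listed as its value.
lineage = {
    "salesforce_connector": ["sf_opportunities"],
    "sf_opportunities": ["pipeline_by_region"],
    "pipeline_by_region": ["sales_dashboard", "quota_alert_workflow"],
    "expenses": ["recruiting_spend"],
}


def downstream_of(lineage, start):
    """Breadth-first walk returning every object that depends on start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # visit each object once, even with cycles
                seen.add(child)
                queue.append(child)
    return sorted(seen)


# Everything that could break if sf_opportunities changes its schema.
impacted = downstream_of(lineage, "sf_opportunities")
print(impacted)
```

Running the same walk over reversed edges would answer the opposite question: where a given report's data originally came from.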

The Catalog and AI make each other better

Data Catalog enables sophisticated field selection for Rippling AI. But it works the other way, too. As the data estate grows, AI automatically generates descriptions for new objects. This means the Catalog gets richer without requiring manual curation for every new dataset. Richer metadata leads to more accurate AI field selection, which leads to more correct answers and drives more usage, improving the Catalog further.

While the Data Catalog is a powerful data discovery and management tool, it’s also the layer that makes every other Rippling Platform capability more intelligent over time.

Disclaimer

Rippling and its affiliates do not provide tax, accounting, or legal advice. This material has been prepared for informational purposes only, and is not intended to provide or be relied on for tax, accounting, or legal advice. You should consult your own tax, accounting, and legal advisors before engaging in any related activities or transactions.

Author


Matt MacInnis

Chief Operating Officer

Matt MacInnis is Chief Operating Officer at Rippling where he oversees business operations. He was previously co-founder and CEO of Inkling, a mobile learning platform that raised over $100 million in funding before being acquired in 2018. Before Inkling, Matt spent eight years at Apple, growing the use of its products in education and the sciences. He holds an Electrical and Computer Engineering degree from Harvard, and lives in San Francisco with his husband and kids.