How we overhauled our end-to-end testing to improve speed, reliability, and developer experience

Anoop Raveendran, Web Developer · Sep 23, 2022

At Rippling we’ve built one of the most extensive workforce management platforms out there—maybe the most extensive. And with the recent addition of Finance Cloud to our considerable portfolio of HR and IT products, we’re not slowing down. Today, we have 600+ developers, across more than six time zones, contributing to our codebase.  

Because of our size and speed, we follow trunk-based development: we merge small but frequent updates to master, which is assumed to always be stable, free of issues, and ready to deploy. To support this, we built a fast, efficient, and reliable diagnostic system that runs quality checks on every PR and safeguards the build's stability and integrity.

Updating our end-to-end testing workflow

In the past, one of the most important parts of this system was our suite of end-to-end (E2E) tests built on Selenium (WebdriverIO). But the test suites—like our products—started to grow exponentially. Our E2E configuration and workflow started to feel slow and flaky, and it became increasingly difficult for developers to author, execute, and debug tests.

So we decided to update the workflow and explore other E2E solutions. After gathering and documenting the requirements, exploring possible solutions, and weighing the pros and cons, we decided to go with the Playwright library for testing and a database-restore methodology for seeding test data.

During our initial comparison analysis, we observed that these changes could reduce our test run durations by as much as 40%. Going forward, we set ourselves two objectives:

  • Deliver an impeccable developer experience at each step of a test’s lifecycle (authoring, execution, debugging, and reporting).
  • Provide a reporting solution that would allow anybody at Rippling (in and outside Engineering) to understand exactly which user flows are being tested and monitored.

Authoring and execution of tests

One of the first decisions we made was to use TypeScript, for static type-checking, readability, and better auto-complete support. We also included the test files in our code quality checks (linting, formatting, etc.). 

Playwright test generator helped accelerate adoption for developers who were new to E2E testing. They could generate the test code from the interactions on the frontend app, clean it, and introduce it as a test flow.
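As a hedged illustration (the labels, URL, and credentials below are hypothetical, not Rippling's actual selectors), a recorded flow cleaned up into a Playwright test might look like this:

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical example of codegen output after cleanup:
// sign in as an employee and verify the landing page.
test('employee can sign in', async ({ page }) => {
  await page.goto('/login'); // resolved against the configured baseURL
  await page.getByLabel('Email').fill('test_admin_privileges@rippling.test');
  await page.getByLabel('Password').fill(process.env.TEST_PASSWORD ?? '');
  await page.getByRole('button', { name: 'Sign in' }).click();
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```

Starting from generated code like this, developers only need to replace brittle selectors with role- or label-based locators and extract shared steps into helpers.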

For most tests, we needed to set up some data as a prerequisite. For example, to sign in as an employee, we had to first set up a company and then add the employee. We used to do this via multiple API calls before each test, for every execution. But this increased our overall test run duration, and the seed APIs were another point of failure for the tests.

So we moved to a database-restore methodology, where the data snapshot required for all tests is generated beforehand and restored into the database before the test run starts. This not only reduced our test run duration but also removed the dependency on seed APIs.

For the pages or scenarios with golden expectations, we use screenshot testing. To normalize the environment (screenshots differ between browsers and platforms due to different rendering, fonts, etc.), we use the Playwright-Docker integration to take screenshots.
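A golden-screenshot check in Playwright can be sketched as follows (the page path and baseline name are hypothetical); because the baselines are generated inside the Playwright Docker image, fonts and rendering stay consistent across machines:

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical golden-screenshot test: compares the page against a
// stored baseline image and fails if more than 1% of pixels differ.
test('payroll summary matches golden screenshot', async ({ page }) => {
  await page.goto('/payroll/summary');
  await expect(page).toHaveScreenshot('payroll-summary.png', {
    maxDiffPixelRatio: 0.01,
  });
});
```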

Next, we added READMEs documenting the workflow, best practices (sample tests, folder structure, tagging, etc.), FAQs, common gotchas, and channels to reach out to for support. Another company might have documented this workflow on a page buried in a wiki. But this is Rippling—we have a Learning Management System (LMS), which helps our customers train employees on everything from compliance to internal processes. So, to introduce new engineers to this workflow, we added it to our frontend onboarding program in Rippling LMS.

Debugging and reporting

In the event of any test failure, we wanted to provide rapid visibility into how the changes affected the failing test(s). With Playwright, we were able to generate and share HTML reports with screenshots, video playback, and test traces for all the test runs. We also created playbooks on advanced debugging methods, with sections on common failure scenarios (e.g., quick redirection logic leading to Chrome aborting JS bundle downloads).
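The relevant reporting options can be sketched in a Playwright config fragment like this (option names follow Playwright's config API; keeping traces, videos, and screenshots only for failures limits artifact size):

```typescript
import { defineConfig } from '@playwright/test';

// Sketch: generate an HTML report, and retain debugging artifacts
// (trace, video, screenshot) only when a test fails.
export default defineConfig({
  reporter: [['html', { open: 'never' }]],
  use: {
    trace: 'retain-on-failure',
    video: 'retain-on-failure',
    screenshot: 'only-on-failure',
  },
});
```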

In the case of test failures on Continuous Integration (CI):

  • On a pull request, we add the HTML report link on the PR as a comment and also notify the PR owner via Slack.
  • On a master branch check, we notify on the #frontend Slack channel with the HTML report link, tagging the relevant stakeholders based on which tests failed.

Reports for everyone

We also created a dashboard that lists all the tests on master, provides additional details (like the module the tests belong to), defines the directly responsible individual, and links to a video recording of each test (generated using Playwright), so that non-technical teams, like Product and Design, can easily view the tests in action. This dashboard also helped us better track slow and flaky tests.

Efficient test execution on CI

Test selection

We wanted to detect and run only the relevant tests based on the files changed in the pull request. We also wanted developers to be able to define which tests to run when specific files had changes.

Let’s look at a simplified implementation of test selection:

Step 1: Detect the changed files

  • Compare the latest commit in the PR with the master branch to generate a changed file list.

Step 2: Define a mapping between source files and tests to run when they are changed
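The mapping can live in a JSON file; one plausible shape (the file name, module names, and patterns here are illustrative, not the exact production format) is:

```json
{
  "Payroll": {
    "sourcePatterns": ["src/modules/payroll/"],
    "testPattern": "end-to-end-tests/modules/Payroll/**/*.test.ts"
  },
  "Benefits": {
    "sourcePatterns": ["src/modules/benefits/"],
    "testPattern": "end-to-end-tests/modules/Benefits/**/*.test.ts"
  }
}
```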

Step 3: Generate the list of tests to run from the output of Step 1 and Step 2

  • Matching the file paths from Step 1 against the patterns in testMapping.json, we get “Payroll” and “Benefits” as the test sets to run, which resolve to end-to-end-tests/modules/Payroll/**/*.test.ts and end-to-end-tests/modules/Benefits/**/*.test.ts.
  • For running tests across modules, we support test tags (e.g., any change to config files like package.json triggers a run of all the tests, via the #all-tests tag).
  • For a test to be selectable using tags, add a comment with that tag to the test file (e.g., #smoke at the top of the file). This is similar to how Playwright annotations work for individual tests.

Now we have the list of tests to run based on the changes made in the PR.
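A simplified version of Steps 1–3 can be sketched in TypeScript. Real glob matching (e.g. with a library like minimatch) is replaced here by a prefix check, and the mapping is inlined, to keep the sketch self-contained:

```typescript
// Simplified sketch of test selection: match changed files (Step 1)
// against the source patterns in the mapping (Step 2) and collect the
// test patterns to run (Step 3).
type TestMapping = Record<string, { sourcePatterns: string[]; testPattern: string }>;

// Illustrative mapping; in practice this would be loaded from testMapping.json.
const testMapping: TestMapping = {
  Payroll: {
    sourcePatterns: ['src/modules/payroll/'],
    testPattern: 'end-to-end-tests/modules/Payroll/**/*.test.ts',
  },
  Benefits: {
    sourcePatterns: ['src/modules/benefits/'],
    testPattern: 'end-to-end-tests/modules/Benefits/**/*.test.ts',
  },
};

function selectTests(changedFiles: string[], mapping: TestMapping): string[] {
  const selected = new Set<string>();
  for (const { sourcePatterns, testPattern } of Object.values(mapping)) {
    // A test set is selected if any changed file matches any of its patterns.
    if (changedFiles.some((f) => sourcePatterns.some((p) => f.startsWith(p)))) {
      selected.add(testPattern);
    }
  }
  return [...selected];
}
```

For example, a PR touching only `src/modules/payroll/utils.ts` would select just the Payroll test pattern.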

Batching of tests

Playwright supports sharding but we wanted to make batching more efficient.

How? Here’s an example:

  • Let’s say we have three tests to run on two parallel processes:
    • Test 1, which takes 5 minutes to run
    • Test 2, which takes 3 minutes to run
    • Test 3, which takes 1 minute to run.
  • Now let’s distribute the tests across the two processes:
    • Process 1 – Test 1 & 2 (5 + 3 minutes)
    • Process 2 – Test 3 (1 minute)

The time taken for CI to complete is 8 minutes.

  • But if we batch the tests efficiently in the following way:
    • Process 1 – Test 1 (5 minutes)
    • Process 2 – Test 2 & 3 (3 + 1 minutes)

The time taken for CI to complete is only 5 minutes. So, using the time taken by each test in the previous CI run, we apply a bin-packing heuristic to batch the tests and minimize total run time.
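The batching above can be sketched as a greedy longest-processing-time heuristic for bin packing (the function and field names are illustrative): sort tests by their last-run duration, then always assign the next test to the least-loaded process.

```typescript
interface TimedTest {
  name: string;
  durationMinutes: number; // taken from the previous CI run
}

// Greedy LPT (longest processing time first) heuristic: place each test,
// longest first, on the process with the smallest current load.
function batchTests(tests: TimedTest[], processes: number): TimedTest[][] {
  const batches: TimedTest[][] = Array.from({ length: processes }, () => []);
  const loads: number[] = new Array(processes).fill(0);
  for (const t of [...tests].sort((a, b) => b.durationMinutes - a.durationMinutes)) {
    const i = loads.indexOf(Math.min(...loads)); // least-loaded process
    batches[i].push(t);
    loads[i] += t.durationMinutes;
  }
  return batches;
}
```

With the 5-, 3-, and 1-minute tests above on two processes, this yields batches of 5 minutes and 3 + 1 minutes, so CI finishes in 5 minutes instead of 8.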

Seeding test data

Populating data in your application for testing is called seeding. Let’s say you want to test if an employee has permission to access a particular third-party app.

  • Before you can run this test, you will need to:
    • Create a company
    • Finish onboarding and add an employee
      • For example, test_admin_privileges@rippling.test
    • Set the correct permissions for the employee

Instead of executing the above steps every time before running the test, we wanted the database to already contain the test_admin_privileges@rippling.test account with the necessary data, so the only step required to run the test would be signing in as test_admin_privileges@rippling.test.

To achieve this, we created a separate workflow in the backend services (let’s call it “generate_seed_data”) which executes the above steps and creates the test_admin_privileges@rippling.test account.

Once we execute this workflow, we perform a database dump, and the output is saved in storage. This workflow only needs to run when backend code changes, not before every E2E testing setup. So, while setting up the frontend and backend services for running end-to-end tests, we perform a database restore from the latest dump and then run the tests. For example, with MongoDB, the database dump is performed with the mongodump command and the restore with mongorestore.

Sample code
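As a minimal sketch of what a seed helper along these lines might look like (the class shape, fields, and email convention here are assumptions for illustration, not Rippling's actual implementation):

```typescript
// Hypothetical sketch of a seed helper: each variation becomes a seeded
// account in the database dump produced by the generate_seed_data workflow.
interface SeedAccount {
  email: string;
  role: 'admin' | 'employee';
  permissions: string[];
}

class HubSeedHelper {
  private accounts: SeedAccount[] = [];

  // Register an account variation; the name becomes the local part of
  // the test email address.
  addVariation(name: string, role: SeedAccount['role'], permissions: string[]): SeedAccount {
    const account: SeedAccount = { email: `${name}@rippling.test`, role, permissions };
    this.accounts.push(account);
    return account;
  }

  // Accounts to persist via the database dump.
  getAccounts(): SeedAccount[] {
    return this.accounts;
  }
}
```

Calling `addVariation('test_admin_privileges', 'admin', ['third_party_apps'])` would, under these assumptions, produce the test_admin_privileges@rippling.test account described above.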

The HubSeedHelper class can be used to create variations in accounts, which are used to generate the seeded accounts in the database dump.

Overview of the CI pipeline

Let’s look at a sample schematic of the CI pipeline with the discussed parts in action.

Deploy test environment 

Once the code is checked out, we deploy the frontend web app and backend code after resetting the DB with the latest snapshot. We expose WEBAPP_BASE_URL for the tests to run against.
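Assuming a standard Playwright setup, the exposed URL can be wired into the config as baseURL, so tests can navigate with relative paths:

```typescript
import { defineConfig } from '@playwright/test';

// WEBAPP_BASE_URL is exported by the deploy step; tests can then call
// page.goto('/some/path') relative to it.
export default defineConfig({
  use: {
    baseURL: process.env.WEBAPP_BASE_URL,
  },
});
```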

Select and batch tests

Then, based on the files changed in the pull request, we select the relevant tests and, based on the previous test run durations, batch them. We can also add support for running smoke tests in this step to have more confidence in the stability of the build.

Build image and run tests

Next, we use the versioned official Playwright-Docker integration to build the image with the test code (you can also use Docker volumes and skip this step) and run all the test batches in parallel. Using the official Docker image really helps us when upgrading Playwright in the codebase, as we don’t need to worry about the system dependencies to make it work.

Analyze and push reports, then notify stakeholders

Lastly, we analyze the test reports, push data to the reporting endpoint, and upload the artifacts (HTML report, videos, traces, and screenshots) to S3. Once the analysis is complete, we notify the developers via Slack and also leave a comment on the pull request.

Interested in building platform-level solutions and tackling user and developer experience challenges? We’re hiring in San Francisco, Bangalore, New York, Seattle, and more. Take a look at our careers page!