
Pipelines-as-code for Data Integration: CI/CD with REST & GitHub Actions

· 8 min read
Guy Raz
Product Manager, Data Integration @Boomi

Data Integration makes it easy to build Rivers in the UI: drag, drop, connect, and you’re done. But relying solely on the UI runs into problems, especially when you need to move a pipeline between environments without breaking anything while keeping deployments repeatable, reviewed, and traceable. Suddenly, that “clicky” workflow starts showing cracks.

This is where treating your integrations like code comes in! By leveraging REST APIs and treating your configuration as an artifact, you can automatically version, test, and deploy, giving your Boomi projects the same DevOps treatment your apps already enjoy.

Workflow overview: pipelines-as-code implementation

note

This is specifically about Data Integration and not Boomi Integration (iPaaS). The assets, concepts, and APIs here are for Data Integration Rivers.

This blog includes a working pipelines-as-code reference implementation: a GitHub Actions–driven workflow where a single YAML file serves as the source of truth for the Data Integration pipeline. Automation can create, update, activate, run, and discover schemas and tables using the Data Integration APIs. The YAML file (in the repository, under the configs directory) was chosen for its configurability, and the workflows also show how to retrieve the deployed pipeline configuration as JSON. Check out the Operation section to see the example in action and understand how everything fits together.

If you want to dig into the API details, all endpoints are documented here.

Why pipelines-as-code for a low-code data pipeline platform?

At first, pipelines-as-code on a low-code platform might sound unnecessary. Isn’t the UI the whole point?

DevOps isn’t “just deployments.” It’s about getting things done faster without breaking stuff: reducing lead time, deploying more often, lowering failure rates, and recovering quickly when something goes wrong.

This tension becomes most visible at scale. As pipelines, environments, and teams grow, UI-driven changes stop scaling: safe promotion from dev → prod becomes risky, rollbacks turn manual, audits rely on screenshots and institutional knowledge, and reproducing environments becomes guesswork.

The Boomi Low-Code DevOps White Paper highlights this challenge directly: while low-code accelerates delivery, teams still need DevOps-grade controls for promotion, rollback, traceability, and compliance in large environments.

In pro-code systems, Git is the obvious hero because your deliverable is already text. Low-code platforms are a bit trickier. There, the “deliverable” usually lives inside the platform as a structured configuration, so the reliable path to CI/CD is to externalize your intent (define what the pipeline should do) and then let the platform’s API-first design reconcile that intent with what’s actually running.

The accompanying repo is built around that idea: YAML becomes the contract, and the API becomes the deployment mechanism. Think of it as telling the system “here’s what I want” and letting it do the heavy lifting for you.

note

GitHub Actions is used here as a concrete example of an orchestration layer. The same pipelines-as-code approach works with any CI/CD or orchestration system that can call REST APIs (for example, GitLab CI, Jenkins, Azure DevOps, or internal schedulers).

Prerequisites

Before running any workflows, configure credentials and environment settings in GitHub.

GitHub secrets

In your repo, create the following GitHub secrets:

  • ACCOUNT_ID

  • ENV_ID

  • TOKEN (Data Integration API token)

    Check out the Creating a token Help Docs for more details.

    important

    Each Data Integration environment requires its own API token. So if you have Dev, Staging, and Prod, create and store a separate TOKEN for each environment you plan to run workflows against.

Base URL (region-specific)

Data Integration uses region-specific API base URLs. Ensure your scripts and workflows point to the correct one:

  • US: https://api.rivery.io
  • EU: https://api.eu-west-1.rivery.io
  • IL: https://api.il-central-1.rivery.io
  • AU: https://api.ap-southeast-2.rivery.io
tip

Expose this as an API_BASE environment variable in your workflows (or set it in GitHub Environments). That way, the same YAML works across different regions without any edits.
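As a sketch of that tip, a helper script might resolve the base URL and auth headers from environment variables. This is illustrative only: the `API_BASE` override, the `api_base`/`auth_headers` function names, and the Bearer-token header format are assumptions, not mandated by the platform.

```python
import os

# Region-specific base URLs from the table above.
REGION_BASES = {
    "us": "https://api.rivery.io",
    "eu": "https://api.eu-west-1.rivery.io",
    "il": "https://api.il-central-1.rivery.io",
    "au": "https://api.ap-southeast-2.rivery.io",
}

def api_base(region: str = "us") -> str:
    """Return the base URL, letting an API_BASE env var override the region."""
    return os.environ.get("API_BASE") or REGION_BASES[region]

def auth_headers(token: str) -> dict:
    """Auth headers built from the TOKEN secret (Bearer format assumed)."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
```

With `API_BASE` set per GitHub Environment, the same workflow YAML runs unchanged against any region.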

Environment and connection assumptions

This CI/CD workflow assumes your Data Integration environments and connection assets are already set up and managed outside the pipeline YAML.

In practice, teams typically:

  • Set up one or more Data Integration environments (Dev, Staging, Prod, etc.)
  • Provision the required source/target connections in each environment (credentials, network access, connector configuration)
  • Reference those pre-existing connections by ID inside the pipeline YAML

This separation is intentional. Think of it like keeping what you run separate from how it connects:

  • The pipeline YAML stays focused on deployable pipeline intent: what to run, what to load, schedules, schemas, and tables.
  • Connection lifecycle stuff (credentials, rotation, network policies) remains governed by your platform and security processes.
  • Automation can still read metadata (schemas and tables) and deploy or update Rivers consistently without trying to create, edit, or rotate connection details during pipeline deployment.

Repository layout

This is how the repo is organized and what each folder is responsible for:

  • .github/workflows/: CI/CD entry points, triggered via manual workflow_dispatch runs.
  • configs/pipelines/: YAML source of truth configs, templates, and exported configs.
  • connections/: Exported inventory of connections per environment.
  • exports/: Exported River JSON and schema, and table inventories.
  • scripts/: Python helper scripts that call the Data Integration REST API.

Operation

The following are the operations included in this repository, along with what they do and the API endpoints they call.

CI/CD Toolkit

Operation 1: Export Connections
  • Operation: .github/workflows/export-connections.yml
  • Why: Keep a versioned inventory of available connections so your team can reference IDs safely.
  • Endpoint used: Get Connections: GET /v1/accounts/{account_id}/environments/{environment_id}/connections
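As a minimal sketch, the Get Connections call above might be scripted in Python like this. The `requests` library, the Bearer-token header, and the function names are assumptions for illustration, not the repo’s actual script:

```python
import requests  # third-party HTTP client: pip install requests

def connections_url(base: str, account_id: str, env_id: str) -> str:
    # Endpoint from Operation 1: Get Connections
    return f"{base}/v1/accounts/{account_id}/environments/{env_id}/connections"

def export_connections(base: str, account_id: str, env_id: str, token: str) -> list:
    """Fetch the environment's connection inventory for versioning in Git."""
    resp = requests.get(
        connections_url(base, account_id, env_id),
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```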
Operation 2: Export River (JSON + YAML)
  • Operation: .github/workflows/export-river.yml

  • Why: Pull the deployed River definition from the environment, store the raw JSON, and generate an update-ready YAML.

  • Endpoint used: Get River: GET /v1/accounts/{account_id}/environments/{environment_id}/rivers/{river_cross_id}

    This operation returns the configuration details of existing pipelines.

    note

    The JSON configuration can also be exported from the UI.
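A rough Python sketch of the export step above could look like this (hypothetical helper names; `requests` and the Bearer header are assumptions):

```python
import json
import requests  # third-party HTTP client: pip install requests

def river_url(base: str, account_id: str, env_id: str, cross_id: str) -> str:
    # Endpoint from Operation 2: Get River
    return f"{base}/v1/accounts/{account_id}/environments/{env_id}/rivers/{cross_id}"

def export_river(base, account_id, env_id, cross_id, token, out_path):
    """Fetch the deployed River definition and store the raw JSON."""
    resp = requests.get(
        river_url(base, account_id, env_id, cross_id),
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    with open(out_path, "w") as f:
        # sort_keys keeps the exported JSON stable, so Git diffs stay readable
        json.dump(resp.json(), f, indent=2, sort_keys=True)
    return resp.json()
```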

Operation 3: Create River (YAML → new River)
  • Operation: .github/workflows/river-create.yml

  • Why: Create a new River from a YAML template.

  • Endpoint used: Add River: POST /v1/accounts/{account_id}/environments/{environment_id}/rivers

    This step prints the created River’s cross_id, which you can use in other workflows.
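The create step might be sketched like this in Python; the `requests` usage, the Bearer header, and the assumption that the response body carries a `cross_id` field are illustrative, not the repo’s actual script:

```python
import requests  # third-party HTTP client: pip install requests

def rivers_url(base: str, account_id: str, env_id: str) -> str:
    # Endpoint from Operation 3: Add River
    return f"{base}/v1/accounts/{account_id}/environments/{env_id}/rivers"

def create_river(base, account_id, env_id, token, payload: dict) -> str:
    """POST the desired configuration and surface the new River's cross_id."""
    resp = requests.post(
        rivers_url(base, account_id, env_id),
        headers={"Authorization": f"Bearer {token}"},
        json=payload,
        timeout=60,
    )
    resp.raise_for_status()
    cross_id = resp.json().get("cross_id")  # response field name assumed
    print(f"created river cross_id={cross_id}")  # reused by later workflows
    return cross_id
```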

Operation 4: Update River (Updated YAML = desired state)
  • Operation: .github/workflows/river-update.yml

  • Why: Use the updated YAML to update the River configuration.

  • Endpoint used: Edit River: PUT /v1/accounts/{account_id}/environments/{environment_id}/rivers/{river_cross_id}

    This is the core “pipelines-as-code” move: the update is not a patch, it’s a controlled replacement of the configuration based on what’s reviewed in Git.
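In Python, that PUT could be driven roughly as follows (hypothetical helper; `requests` and the Bearer header are assumptions):

```python
import requests  # third-party HTTP client: pip install requests

def river_url(base: str, account_id: str, env_id: str, cross_id: str) -> str:
    # Endpoint from Operation 4: Edit River
    return f"{base}/v1/accounts/{account_id}/environments/{env_id}/rivers/{cross_id}"

def update_river(base, account_id, env_id, cross_id, token, desired: dict) -> dict:
    """PUT the full reviewed configuration: a replacement, not a patch."""
    resp = requests.put(
        river_url(base, account_id, env_id, cross_id),
        headers={"Authorization": f"Bearer {token}"},
        json=desired,  # the desired state rendered from the YAML in Git
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```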

Operation 5: Activate River
  • Operation: .github/workflows/river-activate.yml

  • Why: Move a River from “deployed but gated” to “allowed to run.”

  • Endpoint used: Activate River: POST /v1/accounts/{account_id}/environments/{environment_id}/rivers/{river_cross_id}/activate_river

    note

    Once a river has been created, it must be activated before it can run.

Operation 6: Run River (on-demand execution)
  • Operation: .github/workflows/river-run.yml

  • Why: Trigger a run as a smoke test or manual execution.

  • Endpoint used: Run River: POST /v1/accounts/{account_id}/environments/{environment_id}/rivers/{river_cross_id}/run

    This runs the River once. Scheduling is handled in the pipeline configuration.

Operation 7: Disable River
  • Operation: .github/workflows/river-disable.yml

  • Why: Stop a River from running, for example before making changes or when retiring it.

  • Endpoint used: Disable River: POST /v1/accounts/{account_id}/environments/{environment_id}/rivers/{river_cross_id}/disable_river

    note

    You should disable a River before making changes, then re-enable it before running again.
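Since Operations 5–7 are all bare POSTs to action-specific endpoints, a single helper can cover activate, run, and disable. This is a sketch under the same assumptions as above (`requests`, Bearer auth, hypothetical helper names):

```python
import requests  # third-party HTTP client: pip install requests

# URL suffixes from the endpoints documented in Operations 5-7.
ACTIONS = {
    "activate": "activate_river",
    "run": "run",
    "disable": "disable_river",
}

def river_action_url(base, account_id, env_id, cross_id, action: str) -> str:
    return (f"{base}/v1/accounts/{account_id}/environments/{env_id}"
            f"/rivers/{cross_id}/{ACTIONS[action]}")

def river_action(base, account_id, env_id, cross_id, token, action: str) -> dict:
    """POST one lifecycle action: activate, run, or disable."""
    resp = requests.post(
        river_action_url(base, account_id, env_id, cross_id, action),
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

A deploy workflow might then call `disable` → `update` → `activate` → `run` in sequence, matching the note above about disabling before changes.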

Operation 8: Export Schemas and Tables
  • Operation: .github/workflows/export-schemas-tables.yml
  • Why: Get a JSON inventory of schemas and tables for a given connection. This step is useful when defining multi-table pipelines and deciding what to include in YAML.

Typical Dev → Prod lifecycle workflow example

A practical, production-friendly lifecycle workflow might look like this:

Production-ready Pipeline Lifecycle

Pipeline YAML: the fields you can tweak

The pipeline template is designed with one goal in mind: let engineers focus on intent, not platform internals. You can safely tweak the following in the pipeline YAML:

  • Source connector: type and connection_id
  • Target connector: type and connection_id
  • Target database/schema fields: Define where your data lands.
  • Run settings: Run type, schemas list, notifications, schedules
  • Advanced: advanced.cross_id once the River has been created.
Example (illustrative)
```yaml
name: mysql_to_snowflake_demo
description: "Ingest MySQL tables into Snowflake"
river_status: disabled

source:
  type: mysql
  connection_id: "6784f4ffe3d56a8e86ae2xxx"

target:
  type: snowflake
  connection_id: "67162612774a3ad04dxxx"
  database_name: "ANALYTICS"
  schema_name: "RAW"
  loading_method: merge
  merge_method: merge

schedule:
  cron: "0 2 * * *"
  enabled: false

advanced:
  cross_id: "24_hex_cross_id_here"
```
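Before a workflow deploys a config like the one above, it can sanity-check the fields the automation relies on. The sketch below validates an already-parsed config dict (parse the file first, e.g. with PyYAML’s `yaml.safe_load`); the required keys mirror the illustrative template, not an official schema:

```python
def validate_pipeline_config(config: dict) -> dict:
    """Check the keys the deployment workflows depend on (template-based)."""
    missing = [k for k in ("name", "source", "target") if k not in config]
    if missing:
        raise ValueError(f"pipeline YAML missing keys: {missing}")
    for side in ("source", "target"):
        # Connections are provisioned outside the pipeline YAML and
        # referenced by ID, per the environment assumptions above.
        if "connection_id" not in config[side]:
            raise ValueError(f"{side} must reference a pre-provisioned connection_id")
    return config
```

Failing fast in CI on a malformed config is much cheaper than a half-applied update in the target environment.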

Once pipelines live in code, UI clicks become optional and everything in your pipeline becomes reviewable, repeatable, and team-ready. Tweak your YAML, hit run, and let automation do the heavy lifting.