Simplify Data Movement with ingestr: Zero-Code Replication for Modern Teams
Data teams often wait on custom scripts or brittle ELT jobs just to land a new table in analytics. ingestr answers that with a self-contained CLI: point it at a source, choose a destination, and let it manage batching, retries, and incremental syncs. You can explore the code and roadmap on GitHub.
What is ingestr?
ingestr is an open-source command-line tool from Bruin Data that wraps the dlt and SQLAlchemy ecosystems into a single, opinionated workflow. The project focuses on:
- Connector breadth: databases like Postgres, BigQuery, ClickHouse, DuckDB, and Elasticsearch; SaaS APIs such as Salesforce, Shopify, Notion, and GitHub; flat files from S3 or local CSV/Parquet.
- Incremental strategies: choose append, merge, or delete+insert semantics per table without writing pipeline code.
- Backend-free operation: run it locally, in CI, or from a cron container—no control plane to babysit.
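To make the zero-code claim concrete, here is a small sketch that moves a Postgres table into a local DuckDB file using the same one-liner pattern shown throughout this post. The connection details are placeholders, and the duckdb:// destination scheme is an assumption you should confirm against ingestr's connector docs:
# Replicate one Postgres table into a local DuckDB file for ad-hoc analysis
ingestr ingest \
  --source-uri 'postgresql://admin:admin@localhost:5432/shop' \
  --source-table 'public.orders' \
  --dest-uri 'duckdb:///./analytics.db' \
  --dest-table 'raw.orders'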
Installation
The maintainers recommend uv for the fastest setup:
pip install uv
uvx ingestr
If you prefer a global install:
uv pip install --system ingestr
While pip install ingestr also works, uv dramatically shortens installation thanks to its fast dependency resolver and aggressive caching of pre-built wheels.
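Whichever route you choose, a quick sanity check confirms the CLI resolves and runs:
# Print available commands and global options
uvx ingestr --help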
Quickstart: Replicate Postgres to BigQuery
ingestr ingest \
--source-uri 'postgresql://admin:admin@localhost:8837/web?sslmode=disable' \
--source-table 'public.some_data' \
--dest-uri 'bigquery://demo-project?credentials_path=/tmp/sa.json' \
--dest-table 'landing.some_data'
This single command fetches public.some_data from Postgres, stages it, then loads it into landing.some_data inside BigQuery. ingestr auto-detects data types, manages batch sizes, and surfaces a run log you can ship to observability tooling.
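To make that run log durable, plain shell redirection is enough; the log path below is only illustrative:
# Append stdout and stderr to a file your log shipper already watches
ingestr ingest \
  --source-uri 'postgresql://admin:admin@localhost:8837/web?sslmode=disable' \
  --source-table 'public.some_data' \
  --dest-uri 'bigquery://demo-project?credentials_path=/tmp/sa.json' \
  --dest-table 'landing.some_data' \
  2>&1 | tee -a /var/log/ingestr/web_some_data.log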
Operating Modes You Should Know
- Incremental flags: use --load-method append, merge, or delete+insert to control deduplication and idempotency (see the sketch after this list).
- Schema evolution: ingestr inspects column metadata on each run and propagates new fields downstream, reducing manual DDL.
- Credential flexibility: connection strings carry secrets; for cloud warehouses, supply key files via query parameters (credentials_path, aws_profile, etc.).
- Dry runs and validation: pair --record-limit with a sandbox destination to sample data before the first full sync.
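As a sketch of how these options combine, the run below pairs the merge load method with a small record limit against a sandbox DuckDB file. The duckdb:// destination and exact flag spellings should be verified against ingestr --help for your installed version; merge loads typically also require a key-column flag, omitted here for brevity:
# Sample run: merge semantics, capped at 1,000 records, landing in a local sandbox
ingestr ingest \
  --source-uri 'postgresql://admin:admin@localhost:8837/web?sslmode=disable' \
  --source-table 'public.some_data' \
  --dest-uri 'duckdb:///./sandbox.db' \
  --dest-table 'raw.some_data' \
  --load-method merge \
  --record-limit 1000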
Production Tips
- Schedule safely: wrap ingestr inside GitHub Actions, Airflow, or a Kubernetes CronJob, treating each invocation as a stateless batch (see the crontab sketch after this list).
- Log retention: stream STDOUT/STDERR to your log stack to capture load metrics and retry hints.
- Catalog hygiene: split large workloads into multiple invocations so you can tune load methods per table.
- Community support: join the Bruin Data Slack to request new connectors or report edge cases.
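For the scheduling and logging tips above, a single crontab entry often suffices; the schedule, environment variables, and log path are placeholders:
# Nightly stateless batch at 02:00; PG_URI and BQ_URI must be defined in the crontab or a wrapper script
0 2 * * * uvx ingestr ingest --source-uri "$PG_URI" --source-table public.some_data --dest-uri "$BQ_URI" --dest-table landing.some_data >> /var/log/ingestr/nightly.log 2>&1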
Why SRE and Platform Teams Care
Simpler ingestion shortens incident-response loops. When an incident demands historical context from an operational database, ingestr can hydrate analytics stores in minutes without waiting on code deploys. It also lowers the barrier to replicating metrics into staging environments for chaos drills or load tests.
Conclusion
ingestr collapses weeks of pipeline plumbing into a reproducible CLI workflow. Install it with uv, point it at your source and target, and let the tool manage sync semantics while you focus on insights.
For efficient incident management and to prevent on-call burnout, consider using Akmatori. Akmatori automates incident response, reduces downtime, and simplifies troubleshooting.
Additionally, for reliable virtual machines and bare metal servers worldwide, check out Gcore.