# Miniray: Lightweight Distributed Python Compute

Running compute-intensive Python workloads on a single machine quickly hits limits. Whether you are training ML models, processing large datasets, or running simulations, scaling to multiple nodes typically requires complex frameworks. Miniray takes a different approach: extreme simplicity backed by Redis.

## What is Miniray?

Miniray is an open-source distributed compute library from comma.ai, the autonomous driving company. It dispatches arbitrary Python code to workers across a datacenter, using Redis as the message broker. The API mirrors Python's built-in `concurrent.futures`, making adoption straightforward for anyone familiar with `ThreadPoolExecutor` or `ProcessPoolExecutor`.

The project powers comma.ai's internal ML training infrastructure, handling everything from data preprocessing to on-policy reinforcement learning rollouts across 600+ GPUs.

## Key Features

- Familiar API: Uses `concurrent.futures` patterns with `submit()`, `map()`, and `as_completed()`.
- Redis-backed: Task queuing and result delivery go through Redis, keeping the architecture simple and debuggable.
- Zero configuration for workers: Workers pull tasks automatically based on job priority.
- GPU support: Built-in integration with Triton Inference Server for efficient model serving.
- Cgroup isolation: Tasks run in isolated cgroups with configurable memory limits and NUMA pinning.
- Cloudpickle serialization: Send arbitrary Python functions without writing manual serialization code (see the sketch below).
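
The serialization point is worth a closer look. The sketch below uses the `cloudpickle` package directly, not miniray's internal code path, to show why it matters: standard `pickle` refuses closures and lambdas, while cloudpickle turns them into plain bytes that can travel through Redis.

```python
import pickle
import cloudpickle

def make_scaler(factor):
    # A closure over `factor`: standard pickle cannot serialize nested
    # functions like this, which is exactly the case cloudpickle handles.
    def scale(x):
        return x * factor
    return scale

double = make_scaler(2)
payload = cloudpickle.dumps(double)  # plain bytes, safe to push through Redis
restored = pickle.loads(payload)     # the receiving side only needs stdlib pickle
assert restored(21) == 42
```

This is what lets an executor hand a worker any function defined at the call site, rather than only functions importable from a shared module.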

## Installation

Clone the repository and install with pip:

```bash
git clone https://github.com/commaai/miniray.git
cd miniray
pip install -e .
```

You will also need a Redis instance reachable from both the executor and the workers.
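
Before submitting jobs, it is worth confirming that every machine can actually reach that Redis instance. A minimal check with the `redis-py` client (the hostname below is a placeholder for your own deployment):

```python
import redis

# Replace the host with the address of your Redis instance.
r = redis.Redis(host="redis.internal", port=6379)
assert r.ping()  # returns True when the server is reachable
```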

## Usage

The executor API feels native to Python developers:

```python
import miniray
from concurrent.futures import as_completed

def process_item(item):
    # Your compute-intensive logic here
    return item * 2

data = range(1000)

with miniray.Executor(job_name='batch_process') as executor:
    # Map style
    results = list(executor.map(process_item, data, chunksize=10))

    # Submit style with futures
    futures = [executor.submit(process_item, x) for x in data]
    for future in as_completed(futures):
        print(future.result())
```

For ML workloads, miniray supports batching with `chunksize` to reduce Redis round-trips, and automatic function caching so pickled code is not resent with every task.
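
The round-trip arithmetic is simple: mapping `n` items with chunk size `c` produces roughly `n / c` task messages. A sketch of how you might probe this yourself; only `Executor(job_name=...)` and `map(..., chunksize=...)` come from the examples above, while the work function, job name, and timing harness are illustrative:

```python
import time
import miniray

def work(x):
    return x * x

data = range(10_000)

with miniray.Executor(job_name='chunksize_probe') as executor:
    for chunksize in (1, 100):
        start = time.monotonic()
        # Larger chunks mean fewer Redis round-trips (~len(data)/chunksize
        # task messages) but coarser-grained result delivery.
        list(executor.map(work, data, chunksize=chunksize))
        print(f"chunksize={chunksize}: {time.monotonic() - start:.2f}s")
```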

## Operational Tips

- Set appropriate chunksizes: Large chunksizes reduce overhead but increase latency for individual results.
- Use job priorities: Higher-priority jobs are scheduled first when workers become available.
- Monitor with Redis: Task queues are visible in Redis, so debugging is straightforward with `redis-cli` (see the sketch after this list).
- Configure timeouts: Set `limits.timeout_seconds` to prevent runaway tasks from blocking workers.
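
For a quick look at what is queued, you can also inspect Redis programmatically with `redis-py`. The `miniray*` key pattern below is a guess for illustration only; check `redis-cli --scan` against your instance to find the key names miniray actually uses:

```python
import redis

r = redis.Redis(host="redis.internal", port=6379)

# NOTE: 'miniray*' is an assumed pattern; adjust it to match the keys
# you see in your own instance.
for key in r.scan_iter(match="miniray*"):
    key_type = r.type(key).decode()
    if key_type == "list":
        print(key.decode(), "queue length:", r.llen(key))
    else:
        print(key.decode(), key_type)
```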

## Conclusion

Miniray proves that distributed compute does not require heavyweight orchestration. If you need to parallelize Python workloads across multiple machines without the complexity of Dask or Ray, Miniray offers a battle-tested alternative. It runs comma.ai's entire ML training pipeline, from data processing to model training.
