03.04.2026

TimesFM for SRE Capacity Forecasting


Capacity planning is rarely a clean spreadsheet exercise. Production metrics drift, demand spikes on weird schedules, and seasonality changes faster than static thresholds can keep up. TimesFM, an open time-series foundation model from Google Research, gives operators a practical way to forecast metric trends with a pretrained model instead of hand-tuning a classical pipeline for every signal.

What is TimesFM?

TimesFM stands for Time Series Foundation Model. It is a decoder-only model built for time-series forecasting and released openly through GitHub and Hugging Face. The current open release is TimesFM 2.5, which reduces the model size to 200M parameters, supports up to 16k context length, and adds continuous quantile forecasting for uncertainty-aware predictions.

That combination matters for SRE work. You can feed the model CPU usage, request rates, queue depth, or storage growth and use the forecast to spot likely saturation before it turns into an incident.

Key Features

  • Pretrained Forecasting Model: Start from an existing foundation model instead of training a forecasting model from zero.
  • Long Context Support: TimesFM 2.5 supports up to 16k context length, which is useful for long operational histories.
  • Quantile Forecasts: Predict a range, not just a point estimate, so alerting and capacity reviews can include uncertainty.
  • Multiple Backends: The project supports PyTorch today and is adding more inference options as the repo evolves.
  • Production-Friendly Fit: Works well for metrics such as traffic, latency trends, storage growth, and resource utilization.
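Long context is only useful if your history fits inside it. A common preparation step is downsampling high-resolution samples to the forecasting granularity and trimming to the model's context limit. This is a minimal sketch under the assumption of a 1-minute-resolution metric array; the function name and inputs are illustrative, not part of the TimesFM API:

```python
import numpy as np

def to_hourly_context(minutely: np.ndarray, max_context: int = 1024) -> np.ndarray:
    """Downsample a 1-minute-resolution series to hourly means and
    keep only the most recent max_context points."""
    n = len(minutely) // 60 * 60          # drop a trailing partial hour
    hourly = minutely[:n].reshape(-1, 60).mean(axis=1)
    return hourly[-max_context:]

# Example: 30 days of minutely CPU samples -> 720 hourly points.
minutely = np.random.default_rng(0).uniform(0.2, 0.9, size=30 * 24 * 60)
context = to_hourly_context(minutely)
print(context.shape)  # (720,)
```

The resulting array can be passed directly as one entry of the `inputs` list shown below.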

Installation

Clone the repository and install the package with the PyTorch backend:

git clone https://github.com/google-research/timesfm.git
cd timesfm
uv venv
source .venv/bin/activate
uv pip install -e .[torch]

Then load the latest open checkpoint in Python:

import numpy as np
import timesfm

model = timesfm.TimesFM_2p5_200M_torch.from_pretrained(
    "google/timesfm-2.5-200m-pytorch"
)

model.compile(
    timesfm.ForecastConfig(
        max_context=1024,
        max_horizon=168,
        normalize_inputs=True,
        use_continuous_quantile_head=True,
        infer_is_positive=True,
        fix_quantile_crossing=True,
    )
)

Usage

A simple SRE workflow is to export a metric series from Prometheus, convert it to a NumPy array, and ask TimesFM for the next forecast window. For example, you could forecast the next 168 hourly samples for API request volume or disk growth:

point_forecast, quantile_forecast = model.forecast(
    horizon=168,
    inputs=[metric_values],
)
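Getting `metric_values` out of Prometheus is mostly JSON plumbing. The `query_range` endpoint of the Prometheus HTTP API returns a matrix of `[timestamp, "value"]` pairs; a small parser turns the first series into a NumPy array. The payload below is a synthetic example in that shape, shown for illustration:

```python
import numpy as np

def series_from_query_range(payload: dict) -> np.ndarray:
    """Extract the first series from a Prometheus query_range JSON
    response as a float array ordered by timestamp."""
    values = payload["data"]["result"][0]["values"]  # [[ts, "value"], ...]
    values.sort(key=lambda pair: pair[0])
    return np.array([float(v) for _, v in values])

# Synthetic response in the query_range shape, for illustration only.
payload = {
    "status": "success",
    "data": {"resultType": "matrix", "result": [{
        "metric": {"job": "api"},
        "values": [[1700000000, "120.0"], [1700003600, "135.5"], [1700007200, "128.2"]],
    }]},
}
metric_values = series_from_query_range(payload)
print(metric_values)  # [120.  135.5 128.2]
```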

The point forecast is useful for baseline planning. The quantile output is often more useful operationally: it gives you a prediction band, so you can plan capacity against a pessimistic upper quantile rather than the average case.
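One concrete use of the upper quantile is answering "how many hours until we hit capacity?". The sketch below assumes you have already sliced a single upper-quantile series (e.g. p90) out of the quantile forecast; the exact layout of `quantile_forecast` depends on the TimesFM release, so check the repo for the quantile axis ordering:

```python
import numpy as np

def hours_until_breach(upper_quantile: np.ndarray, capacity: float):
    """Return the index (hours ahead) of the first forecast point whose
    upper quantile crosses capacity, or None if no breach is predicted."""
    breaches = np.flatnonzero(upper_quantile >= capacity)
    return int(breaches[0]) if breaches.size else None

# Hypothetical p90 disk-usage forecast (fraction of capacity) over 6 hours.
p90 = np.array([0.78, 0.81, 0.85, 0.88, 0.92, 0.95])
lead_time = hours_until_breach(p90, capacity=0.90)
print(lead_time)  # 4
```

A `None` result means the band stays under capacity for the whole horizon, which is a reasonable signal to defer a hardware purchase to the next review.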

Operational Tips

  • Forecast the right signals: Start with request rate, CPU, memory pressure, queue depth, and storage consumption.
  • Use quantiles for guardrails: Plan against upper quantiles when deciding autoscaling limits or hardware purchases.
  • Retrain your process, not the model: The model is pretrained, but your value comes from better metric hygiene and rollout decisions.
  • Pair forecasts with incident data: Compare prediction misses with deploys, campaigns, or outages to understand where human context still matters.
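The last tip, pairing forecasts with incident data, can be as simple as flagging large forecast misses that land near a deploy. This is a minimal sketch with hypothetical hourly data and an illustrative relative-error threshold:

```python
import numpy as np

def misses_near_deploys(actual, predicted, deploy_hours, threshold=0.2, window=1):
    """Indices where the forecast missed by more than `threshold`
    (relative error) within `window` hours of a deploy."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    rel_err = np.abs(actual - predicted) / np.maximum(actual, 1e-9)
    missed = np.flatnonzero(rel_err > threshold)
    return [int(i) for i in missed
            if any(abs(i - d) <= window for d in deploy_hours)]

# Hypothetical hourly request rates: a deploy at hour 3 coincides with a miss.
actual    = [100, 102, 99, 160, 101, 100]
predicted = [100, 100, 100, 100, 100, 100]
flagged = misses_near_deploys(actual, predicted, deploy_hours=[3])
print(flagged)  # [3]
```

Misses that cluster around deploys or campaigns are usually not a model problem; they mark the places where human context should override the forecast.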

Conclusion

TimesFM gives SRE teams a modern way to forecast operational data with less custom modeling work. If you want faster capacity reviews, better risk estimates, and a cleaner path from metrics to planning, it is worth a serious look.

For efficient incident management and to prevent on-call burnout, consider using Akmatori. Akmatori automates incident response, reduces downtime, and simplifies troubleshooting.

Additionally, for reliable virtual machines and bare metal servers worldwide, check out Gcore.
