16.04.2026

Magika: Fast File Type Detection for CI Pipelines

Modern delivery pipelines handle ZIPs, container layers, logs, configs, documents, and third-party artifacts all day. The hard part is that filenames lie: a file called invoice.pdf might actually be a script, and an upload with no extension at all can still move through your workflow unchecked. Magika helps SRE and platform teams classify files by their content, using an ML model designed for speed.

What Is Magika?

Magika is an open-source file type detection tool from Google. It uses a compact deep learning model to identify more than 200 content types from a limited subset of file bytes, which keeps inference fast even on a single CPU. According to the official project, the model was trained and evaluated on about 100 million samples and reaches about 99% average precision and recall on its test set.

For operators, the practical value is simple: route files to the right scanner, policy engine, or retention path without relying on extensions alone.

Key Features

  • Content-based detection instead of extension-based guessing
  • Rust CLI plus Python package and additional language bindings
  • JSON and JSONL output for automation
  • Recursive directory scanning for bulk triage
  • Prediction modes that can return generic labels when confidence is low
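
The last feature is easiest to picture as a thresholding rule. The sketch below is a conceptual illustration only, not Magika's actual implementation: the function name `resolve_label`, the 0.9 threshold, and the generic fallback labels "txt" and "unknown" are all assumptions chosen for the example.

```python
def resolve_label(predicted: str, score: float, is_text: bool,
                  threshold: float = 0.9) -> str:
    """Return the specific label when confident, else a generic fallback.

    Illustrative only: the threshold and the fallback labels are assumptions.
    """
    if score >= threshold:
        return predicted
    # Low confidence: prefer a safe generic answer over a specific guess.
    return "txt" if is_text else "unknown"


print(resolve_label("python", 0.98, is_text=True))
print(resolve_label("python", 0.41, is_text=True))
print(resolve_label("elf", 0.41, is_text=False))
```

A stricter threshold trades specific answers for fewer wrong ones, which is usually the right trade in a blocking policy step.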

Google says Magika already helps route files for Gmail, Drive, and Safe Browsing, and the project has also been integrated with VirusTotal and abuse.ch. That is a strong signal that the tool is built for real security workflows, not just demos.

Installation

Magika ships with a Rust CLI and is easy to install in a few ways:

# CLI-focused install
pipx install magika

# Python package
pip install magika

# Homebrew
brew install magika

Verify it after install:

magika --version

Usage

The CLI is straightforward and works well in scripts.

Classify a single file:

magika artifact.bin

Scan a directory recursively:

magika -r ./incoming-files

Emit machine-friendly JSON for a pipeline step:

magika ./build/output.tar.gz --json

Read from stdin when a file is streamed through a job:

cat suspicious.payload | magika -

You can also use the Python API when you need Magika inside a service or worker:

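from magika import Magika

# Create the instance once and reuse it; it loads the bundled model.
m = Magika()
res = m.identify_path("./artifact.bin")
# Print the detected content type label and its MIME type.
print(res.output.label, res.output.mime_type)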
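
When the artifact is already in memory, for example an upload body inside a worker, a thin wrapper keeps the classification call in one place. This is a minimal sketch under stated assumptions: `get_magika`, `describe`, and `classify_bytes` are hypothetical helper names, and it assumes the installed Magika version offers `identify_bytes` alongside `identify_path`, returning a result with the same `output.label` and `output.mime_type` fields.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_magika():
    """Create the Magika instance once and cache it for later calls."""
    # Imported lazily so this module still loads where magika is absent.
    from magika import Magika
    return Magika()


def describe(result) -> dict:
    """Reduce a result object to the two fields used in this article."""
    return {
        "label": str(result.output.label),
        "mime_type": str(result.output.mime_type),
    }


def classify_bytes(payload: bytes) -> dict:
    """Classify an in-memory payload (assumes Magika.identify_bytes exists)."""
    return describe(get_magika().identify_bytes(payload))
```

Calling `classify_bytes(request_body)` from a handler then returns a small dictionary that is easy to log or attach to a job record.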

Operational Tips

In CI, Magika fits best as an early classification step before deeper inspection. For example, you can detect whether an uploaded artifact is really a document, script, archive, or executable, then send it to the right scanner or block unexpected types.
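
The routing idea can be sketched as a lookup table. Everything here is hypothetical: the handler names are invented for illustration, and the label strings are assumptions that should be checked against what your Magika version actually reports.

```python
# Hypothetical mapping from detected labels to downstream handlers.
ROUTES = {
    "pdf": "document-scanner",
    "docx": "document-scanner",
    "shell": "script-review",
    "python": "script-review",
    "zip": "archive-unpacker",
    "tar": "archive-unpacker",
    "pebin": "malware-sandbox",
    "elf": "malware-sandbox",
}


def route(label: str, default: str = "quarantine") -> str:
    """Pick a handler for a label; unknown labels go to the default lane."""
    return ROUTES.get(label, default)


print(route("pdf"))
print(route("something-unexpected"))
```

Sending unmapped labels to a quarantine lane rather than a permissive default keeps new or unexpected types from slipping through unexamined.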

A simple pattern is to run Magika on every inbound artifact, parse the --json output, and enforce an allowlist. If your release job expects tar, gzip, and json, there is no reason for a Windows executable or shell script to appear in that lane.
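
A minimal sketch of that allowlist check follows. It assumes `--json` prints a JSON array with one record per file, and that each record carries a `path` plus a label nested under `result`, `value`, `output`; that nesting is an assumption, so inspect the output of your installed version and adjust `extract_label` if it differs. The function names are hypothetical.

```python
import json
import subprocess

ALLOWED = {"tar", "gzip", "json"}


def run_magika(paths):
    """Run the CLI with --json and return the parsed output."""
    proc = subprocess.run(
        ["magika", "--json", *paths],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)


def extract_label(record):
    """Pull the label out of one record; the nesting is an assumption."""
    try:
        return record["result"]["value"]["output"]["label"]
    except (KeyError, TypeError):
        return None  # unexpected or failed record: treated as a violation


def find_violations(records, allowed=ALLOWED):
    """Return (path, label) pairs whose label is not on the allowlist."""
    violations = []
    for record in records:
        label = extract_label(record)
        if label not in allowed:
            violations.append((record.get("path"), label))
    return violations


def main(paths) -> int:
    """Print blocked files and return a CI-friendly exit code."""
    bad = find_violations(run_magika(paths))
    for path, label in bad:
        print(f"blocked: {path} detected as {label}")
    return 1 if bad else 0
```

In a job step, pass the artifact paths to `main` and use its return value as the exit code. Records that cannot be parsed count as violations, so the check fails closed.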

Because inference time is nearly constant regardless of file size, and Magika reads only a limited subset of each file's bytes rather than the whole file, it is also useful in high-volume ingestion paths where latency matters.
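
To see why sampling keeps I/O cost flat, consider the sketch below, which reads a fixed number of bytes from the start and end of a file. It is illustrative only: which bytes Magika actually samples, and how many, is not specified here, so the head-and-tail choice, the 1024-byte block size, and the name `sample_edges` are assumptions.

```python
import os


def sample_edges(path: str, block_size: int = 1024) -> tuple[bytes, bytes]:
    """Read at most block_size bytes from the start and end of a file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(block_size)
        # Jump to the last block; clamp at 0 for files shorter than a block.
        f.seek(max(size - block_size, 0))
        tail = f.read(block_size)
    return head, tail
```

Whether the file is 4 KB or 40 GB, this reads at most two blocks, so the cost per file stays bounded.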

Conclusion

Magika solves a boring but important problem: knowing what a file really is. For SRE, platform, and security teams, that improves artifact validation, malware triage, and policy enforcement without adding much latency to the pipeline.

If you want to turn signals like file type, scanner output, and deployment metadata into operational decisions, Akmatori helps teams build AI agents for incident response and infrastructure automation. Infrastructure delivery powered by Gcore.
