03.04.2026

Samply: CPU Profiling With Firefox Profiler UI on macOS, Linux, and Windows


Most CPU profiling workflows involve multiple tools: a profiler to collect samples, a converter to transform the output, and a viewer to make sense of it. Samply collapses this into one step.

samply record ./my-application my-arguments

That is it. Samply profiles the command, opens the Firefox Profiler in your browser, and serves symbol information and source code from a local webserver. You get flame graphs, call trees, timelines, and source-level annotation without any setup beyond installing samply itself.

Quick Install

# macOS / Linux (one-liner)
curl --proto '=https' --tlsv1.2 -LsSf \
  https://github.com/mstange/samply/releases/download/samply-v0.13.1/samply-installer.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy Bypass -c "irm https://github.com/mstange/samply/releases/download/samply-v0.13.1/samply-installer.ps1 | iex"

# Or via Cargo
cargo install --locked samply

Why Samply Matters for SRE

When you are debugging a production performance issue, you often need to answer: "What is this process spending CPU time on?" The usual approach involves:

  1. Run perf record (Linux) or dtrace (macOS)
  2. Convert the output to a format a viewer understands
  3. Open a separate tool (FlameGraph scripts, Speedscope, etc.)
  4. Realize you forgot to include debug symbols

Samply handles all of this in one command. It collects stack traces at 1000Hz (1ms interval), resolves symbols (including inlined functions for Rust and C++), and serves everything through the Firefox Profiler, which is arguably the best profiling UI available today.

How It Works

Samply is a sampling profiler. It collects stack traces per thread at a configurable interval (default 1000Hz).

On Linux: Uses perf_events. You need to grant access:

# Temporary (until reboot); 1 allows per-process profiling without root
echo '1' | sudo tee /proc/sys/kernel/perf_event_paranoid

# Permanent (survives reboot)
echo 'kernel.perf_event_paranoid=1' | sudo tee /etc/sysctl.d/99-perf.conf
sudo sysctl --system

If you get mmap failed errors, increase the mlock limit (this setting also resets on reboot; persist it via /etc/sysctl.d the same way):

sudo sysctl kernel.perf_event_mlock_kb=2048
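Before a profiling session it can be handy to check both knobs in one go. A minimal pre-flight sketch (the /proc paths are standard on Linux; the script just reads and reports, it changes nothing):

```shell
#!/bin/sh
# Pre-flight check before `samply record` on Linux (sketch).
# Reads the two kernel settings samply depends on; falls back to
# "unknown" on systems where these files do not exist.
paranoid=$(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null || echo unknown)
mlock_kb=$(cat /proc/sys/kernel/perf_event_mlock_kb 2>/dev/null || echo unknown)
echo "perf_event_paranoid=${paranoid} (needs 1 or lower to profile without root)"
echo "perf_event_mlock_kb=${mlock_kb} (raise if you see mmap failed errors)"
```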

On macOS: Uses the Mach task API. Both on-CPU and off-CPU samples are collected, so you can see where threads block on locks or I/O.

On Windows: Uses ETW (Event Tracing for Windows). Use the -a flag to record all processes. Add symbol servers for Windows/Firefox/Chrome libraries:

samply record -a \
  --windows-symbol-server https://msdl.microsoft.com/download/symbols \
  --breakpad-symbol-server https://symbols.mozilla.org/try/ \
  --windows-symbol-server https://chromium-browser-symsrv.commondatastorage.googleapis.com

The Firefox Profiler UI

The Firefox Profiler gives you:

| View | What It Shows |
| --- | --- |
| Call Tree | Hierarchical breakdown of CPU time per function |
| Flame Graph | Aggregated stacks, widths proportional to CPU time |
| Stack Chart | Stack traces laid out on a timeline |
| Marker Chart | Events and annotations on a timeline |
| Marker Table | Searchable list of all markers |

You can filter by thread, zoom into time ranges, search for specific functions, and double-click any function to see source code with per-line sample counts.

All data stays local until you explicitly choose to upload. The Firefox Profiler runs in your browser but loads data from samply's local webserver.

Real-World Examples

Profile a Build Process

# Profile Hugo static site generation
samply record hugo

# Profile a Rust build
samply record cargo build --release

# Profile a Go binary
samply record ./my-go-service --config prod.yaml

Profile a Running Service

On macOS, you can attach to a running process:

# First time only: self-sign samply binary
samply setup

# Attach to PID
samply record --pid 12345

Profile a CLI Tool

# Profile kubectl operations
samply record kubectl get pods -A

# Profile a database migration
samply record ./migrate up

# Profile a log parser
samply record grep -r "ERROR" /var/log/

Tips for Better Profiles

Rust: Use a Profiling Build Profile

Create ~/.cargo/config.toml:

[profile.profiling]
inherits = "release"
debug = true

Then build and profile:

cargo build --profile profiling
samply record ./target/profiling/my-binary

This gives you release-level optimizations with debug symbols, so you get inline stacks and accurate source code mapping.
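If you would rather not touch a config file, Cargo also reads profile overrides from CARGO_PROFILE_<NAME>_<KEY> environment variables, so the same effect is available for a one-off build (a sketch; my-binary is a placeholder name):

```shell
# One-off alternative to the config file: enable debug info for a
# single release build via Cargo's environment-variable overrides.
CARGO_PROFILE_RELEASE_DEBUG=true cargo build --release
samply record ./target/release/my-binary
```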

C/C++: Include Debug Info

Make sure -g is in your compiler flags. Without debug info, you only see hex addresses instead of function names.

Go: Default Builds Work

Go binaries include symbol information by default (unless stripped with -ldflags "-s -w"). Samply works out of the box with standard Go builds.

Comparison With Other Profilers

| Tool | Platform | UI | Setup Effort | Off-CPU |
| --- | --- | --- | --- | --- |
| Samply | macOS, Linux, Windows | Firefox Profiler (browser) | Minimal | macOS + Windows |
| perf + FlameGraph | Linux only | SVG (static) | Medium | With additional flags |
| Instruments | macOS only | Xcode (heavy) | Medium | Yes |
| py-spy | Python only | Speedscope/console | Minimal | No |
| async-profiler | JVM only | FlameGraph/JFR | Low | Yes |
| Intel VTune | Linux, Windows | Custom GUI | High | Yes |

Samply's advantage is the combination of cross-platform support, zero-config operation, and the Firefox Profiler UI. For most DevOps profiling tasks (figuring out why a build is slow, why a CLI tool takes too long, or what a service is spending CPU on), samply is the fastest path from "I have a problem" to "I can see the flame graph."

Limitations

  • macOS: Cannot profile system commands (e.g., /usr/bin/sleep, the system Python). These binaries are protected by System Integrity Protection, which blocks the DYLD_INSERT_LIBRARIES injection samply relies on. Homebrew-installed and locally compiled binaries work fine
  • Linux: Only on-CPU samples are collected currently (off-CPU support is macOS and Windows only)
  • No continuous profiling. Samply is for ad-hoc profiling sessions, not production continuous profiling (use Pyroscope or Parca for that)
  • Needs debug symbols for useful output. Stripped binaries show hex addresses

Sharing Profiles

After recording, you can upload the profile directly from the Firefox Profiler UI. This generates a shareable link (for example, a shared profile of dump_syms on macOS).

This is useful for sharing performance findings with your team. The recipient does not need samply installed; they just open the link in a browser.

Quick Reference

# Profile a command
samply record ./my-app args

# Profile with custom sample rate (samples per second)
samply record --rate 4000 ./my-app

# Attach to running process (macOS, needs samply setup first)
samply record --pid <PID>

# Profile all processes (Windows)
samply record -a

# Import an existing perf.data file (Linux)
samply import perf.data

At Akmatori, our AI agents run diagnostics during incidents, including CPU profiling of misbehaving services. Tools like samply that produce rich, shareable profiles with minimal setup are exactly what you need when debugging at 3 AM.

Automate incident response and prevent on-call burnout with AI-driven agents!