Samply: CPU Profiling With Firefox Profiler UI on macOS, Linux, and Windows

Most CPU profiling workflows involve multiple tools: a profiler to collect samples, a converter to transform the output, and a viewer to make sense of it. Samply collapses this into one step.
samply record ./my-application my-arguments
That is it. Samply profiles the command, opens the Firefox Profiler in your browser, and serves symbol information and source code from a local webserver. You get flame graphs, call trees, timelines, and source-level annotation without any setup beyond installing samply itself.
Quick Install
# macOS / Linux (one-liner)
curl --proto '=https' --tlsv1.2 -LsSf \
https://github.com/mstange/samply/releases/download/samply-v0.13.1/samply-installer.sh | sh
# Windows (PowerShell)
powershell -ExecutionPolicy Bypass -c "irm https://github.com/mstange/samply/releases/download/samply-v0.13.1/samply-installer.ps1 | iex"
# Or via Cargo
cargo install --locked samply
Why Samply Matters for SRE
When you are debugging a production performance issue, you often need to answer: "What is this process spending CPU time on?" The usual approach involves:
- Run perf record (Linux) or dtrace (macOS)
- Convert the output to a format a viewer understands
- Open a separate tool (FlameGraph scripts, Speedscope, etc.)
- Realize you forgot to include debug symbols
Samply handles all of this in one command. It collects stack traces at 1000Hz (1ms interval), resolves symbols (including inlined functions for Rust and C++), and serves everything through the Firefox Profiler, which is arguably the best profiling UI available today.
How It Works
Samply is a sampling profiler. It collects stack traces per thread at a configurable interval (default 1000Hz).
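To make the relationship between sampling rate and sampling interval concrete, here is a small arithmetic sketch. The helper name rate_to_interval_us is hypothetical and is not part of samply; it only converts a rate in Hz into the time between samples.

```shell
# Hypothetical helper (not part of samply): convert a sampling rate in Hz
# to the interval between two samples, in microseconds.
rate_to_interval_us() {
  echo $(( 1000000 / $1 ))
}

rate_to_interval_us 1000   # default rate: prints 1000 (1 ms between samples)
rate_to_interval_us 4000   # higher rate: prints 250 (0.25 ms between samples)
```

A higher rate gives finer detail for short-lived work, at the cost of more samples to collect and store.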
On Linux: Uses perf_events. You need to grant access:
# Temporary (until reboot)
echo '-1' | sudo tee /proc/sys/kernel/perf_event_paranoid
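# Via sysctl (this also resets on reboot; to make it permanent, add the setting to a file under /etc/sysctl.d/)
sudo sysctl kernel.perf_event_paranoid=1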
If you get mmap failed errors, increase the mlock limit:
sudo sysctl kernel.perf_event_mlock_kb=2048
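Before recording on Linux, it can help to check the current setting first. The following is a minimal sketch, not something samply ships: the function name check_perf_paranoid is hypothetical, and the threshold of 1 is an assumption taken from the sysctl command above.

```shell
# Hypothetical pre-flight check (not part of samply).
# $1: file to read (defaults to the real procfs entry)
# $2: highest acceptable value (assumed to be 1, per the sysctl command above)
check_perf_paranoid() {
  file="${1:-/proc/sys/kernel/perf_event_paranoid}"
  max="${2:-1}"
  if [ ! -r "$file" ]; then
    echo "unknown: cannot read $file"
    return 2
  fi
  value=$(cat "$file")
  if [ "$value" -le "$max" ]; then
    echo "ok: perf_event_paranoid=$value"
  else
    echo "blocked: perf_event_paranoid=$value (want <= $max)"
    return 1
  fi
}
```

Calling check_perf_paranoid with no arguments reads the live kernel setting. The file argument exists only so the function can be tried against a test file.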
On macOS: Uses the Mach task API. Both on-CPU and off-CPU samples are collected, so you can see where threads block on locks or I/O.
On Windows: Uses ETW (Event Tracing for Windows). Use the -a flag to record all processes, and add symbol servers to resolve symbols for Windows, Firefox, and Chrome libraries:
samply record -a \
--windows-symbol-server https://msdl.microsoft.com/download/symbols \
--breakpad-symbol-server https://symbols.mozilla.org/try/ \
--windows-symbol-server https://chromium-browser-symsrv.commondatastorage.googleapis.com
The Firefox Profiler UI
The Firefox Profiler gives you:
| View | What It Shows |
|---|---|
| Call Tree | Hierarchical breakdown of CPU time per function |
| Flame Graph | Aggregated view of sampled stacks; box width shows share of samples, not time order |
| Stack Chart | Stack traces laid out on a timeline |
| Marker Chart | Events and annotations on a timeline |
| Marker Table | Searchable list of all markers |
You can filter by thread, zoom into time ranges, search for specific functions, and double-click any function to see source code with per-line sample counts.
All data stays local until you explicitly choose to upload. The Firefox Profiler runs in your browser but loads data from samply's local webserver.
Real-World Examples
Profile a Build Process
# Profile Hugo static site generation
samply record hugo
# Profile a Rust build
samply record cargo build --release
# Profile a Go binary
samply record ./my-go-service --config prod.yaml
Profile a Running Service
On macOS, you can attach to a running process:
# First time only: self-sign samply binary
samply setup
# Attach to PID
samply record --pid 12345
Profile a CLI Tool
# Profile kubectl operations
samply record kubectl get pods -A
# Profile a database migration
samply record ./migrate up
# Profile a log parser
samply record grep -r "ERROR" /var/log/
Tips for Better Profiles
Rust: Use a Profiling Build Profile
Create ~/.cargo/config.toml:
[profile.profiling]
inherits = "release"
debug = true
Then build and profile:
cargo build --profile profiling
samply record ./target/profiling/my-binary
This gives you release-level optimizations with debug symbols, so you get inline stacks and accurate source code mapping.
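If you set up several machines, the config step can be scripted. This is a sketch under assumptions: the function name ensure_profiling_profile is hypothetical, and it simply appends the profile shown above unless a [profile.profiling] section already exists in the target file.

```shell
# Hypothetical helper: add the [profile.profiling] section to a Cargo config
# file if it is not already there. Safe to run more than once.
# $1: config file path (defaults to the global ~/.cargo/config.toml)
ensure_profiling_profile() {
  config="${1:-$HOME/.cargo/config.toml}"
  mkdir -p "$(dirname "$config")"
  touch "$config"
  if grep -q '^\[profile\.profiling\]' "$config"; then
    echo "already present in $config"
    return 0
  fi
  # Append the same three lines shown in the section above.
  cat >> "$config" <<'EOF'

[profile.profiling]
inherits = "release"
debug = true
EOF
  echo "added to $config"
}
```

Run ensure_profiling_profile with no arguments to update the global config, or pass a path to target a different file. The grep guard keeps repeated runs from duplicating the section.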
C/C++: Include Debug Info
Make sure -g is in your compiler flags. Without debug info, you only see hex addresses instead of function names.
Go: Default Builds Work
Go binaries include symbol information by default (unless stripped with -ldflags "-s -w"). Samply works out of the box with standard Go builds.
Comparison With Other Profilers
| Tool | Platform | UI | Setup Effort | Off-CPU |
|---|---|---|---|---|
| Samply | macOS, Linux, Windows | Firefox Profiler (browser) | Minimal | macOS + Windows |
| perf + FlameGraph | Linux only | SVG (static) | Medium | With additional flags |
| Instruments | macOS only | Xcode (heavy) | Medium | Yes |
| py-spy | Python only | Speedscope/console | Minimal | No |
| async-profiler | JVM only | FlameGraph/JFR | Low | Yes |
| Intel VTune | Linux, Windows | Custom GUI | High | Yes |
Samply's advantage is the combination of cross-platform support, zero-config operation, and the Firefox Profiler UI. For most DevOps profiling tasks (figuring out why a build is slow, why a CLI tool takes too long, or what a service is spending CPU on), samply is the fastest path from "I have a problem" to "I can see the flame graph."
Limitations
- macOS: Cannot profile system commands (e.g., /usr/bin/sleep, system Python). These are signed in a way that blocks DYLD_INSERT_LIBRARIES. Homebrew-installed and locally-compiled binaries work fine
- Linux: Only on-CPU samples are collected currently (off-CPU support is macOS and Windows only)
- No continuous profiling. Samply is for ad-hoc profiling sessions, not production continuous profiling (use Pyroscope or Parca for that)
- Needs debug symbols for useful output. Stripped binaries show hex addresses
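One way to catch the missing-symbols case before recording is to look at what the file utility reports for the binary. The sketch below is an assumption-laden example, not a samply feature: the function names are hypothetical, and the string matching relies on the wording that file prints for ELF binaries ("with debug_info", "not stripped", "stripped"). Mach-O and PE binaries are described differently and will come back as "unknown".

```shell
# Hypothetical helper: classify the one-line description printed by `file`
# for an ELF binary. Order matters: "not stripped" contains "stripped".
classify_symbols() {
  case "$1" in
    *"with debug_info"*) echo "debug-info" ;;
    *"not stripped"*)    echo "symbols-only" ;;
    *"stripped"*)        echo "stripped" ;;
    *)                   echo "unknown" ;;
  esac
}

# Convenience wrapper: run `file -b` (brief output) on a path and classify it.
check_binary_symbols() {
  classify_symbols "$(file -b "$1")"
}
```

For example, check_binary_symbols ./target/profiling/my-binary should report debug-info for a build made with the profiling profile described earlier, assuming a reasonably recent version of file.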
Sharing Profiles
After recording, you can upload the profile directly from the Firefox Profiler UI, which generates a shareable link.
This is useful for sharing performance findings with your team. The recipient does not need samply installed; they just open the link in a browser.
Quick Reference
# Profile a command
samply record ./my-app args
# Profile with custom sample rate (samples per second)
samply record --rate 4000 ./my-app
# Attach to running process (macOS, needs samply setup first)
samply record --pid <PID>
# Profile all processes (Windows)
samply record -a
# Load an existing perf.data file
samply load perf.data
At Akmatori, our AI agents run diagnostics during incidents, including CPU profiling of misbehaving services. Tools like samply that produce rich, shareable profiles with minimal setup are exactly what you need when debugging at 3 AM.
