Tool & Skill Architecture

Akmatori documents agent behavior through incident-level AGENTS.md guidance and skill-level SKILL.md files that include an auto-generated Assigned Tools section. Instead of Python import snippets, modern Akmatori routes tool use through the MCP Gateway, with generated gateway_call(...) and execute_script examples tied to the skill's actual assigned tools.

Overview

Components: AGENTS.md, SKILL.md, Assigned Tools, gateway_call / execute_script.

The goal is still to reduce exploration, but the execution model changed. Agents should read the relevant skill doc first, follow the generated tool examples, and use the gateway's namespaced tools directly. This keeps authorization, configuration, and routing centralized instead of asking agents to discover import paths or wire credentials manually. Connector and MCP registration changes can be reloaded at runtime, and proxied MCP tool schemas can be refreshed so newly discovered external tools show up without rewriting the skill prompt model.

The backend repository still contains legacy docs such as docs/TOOL_ARCHITECTURE.md that describe the old Quick Start import-snippet model. The current runtime contract is the gateway-first Assigned Tools flow documented here.

File Responsibilities

AGENTS.md

Incident-level instructions loaded at the incident workspace root.

  • Defines the incident manager role and investigation workflow
  • Points agents to the relevant skill docs before they start exploring
  • Requires runbook search, then cross-incident memory search, before using infrastructure tools

SKILL.md

Skill prompt plus an auto-generated Assigned Tools section.

  • Stores YAML frontmatter and human-authored skill instructions
  • Lists only the tools currently assigned to that skill
  • Includes per-tool details and ready-to-use gateway_call(...) examples
  • Can reference shared context files through linked assets

Gateway Tool Registry

The MCP Gateway is the runtime contract agents execute against.

  • Built-in tool types include SSH, Zabbix, VictoriaMetrics, Catchpoint, PostgreSQL, Grafana, PagerDuty, ClickHouse, NetBox, Kubernetes, Jira, and the credentialless Incidents namespace
  • HTTP connectors can add declarative API-backed tools and trigger gateway reloads after CRUD changes
  • External MCP servers can be proxied under a custom namespace prefix, with runtime reloads and schema refreshes for discovered tools
  • System-level MCP servers can also stay registered across reloads, so platform-wide helper tools remain available without per-skill rewiring
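The registry behavior described in the bullets above can be sketched as a small in-memory model. This is only an illustration: `ToolRegistry`, its methods, and the `platform` and `acme` namespaces are hypothetical names, not Akmatori's actual gateway code.

```python
class ToolRegistry:
    """Toy registry: tool names take the form '<namespace>.<tool>'."""

    def __init__(self):
        self._namespaces = {}  # namespace -> (tools dict, is_system flag)

    def register(self, namespace, tools, system=False):
        # Reject a second registration under the same namespace prefix.
        if namespace in self._namespaces:
            raise ValueError(f"namespace conflict: {namespace}")
        self._namespaces[namespace] = (dict(tools), system)

    def reload(self, configured):
        """Rebuild from current config, keeping system-level namespaces."""
        self._namespaces = {
            ns: entry for ns, entry in self._namespaces.items() if entry[1]
        }
        for namespace, tools in configured.items():
            self.register(namespace, tools)

    def names(self):
        return sorted(
            f"{ns}.{tool}"
            for ns, (tools, _) in self._namespaces.items()
            for tool in tools
        )

    def call(self, name, args):
        namespace, tool = name.split(".", 1)
        return self._namespaces[namespace][0][tool](args)


registry = ToolRegistry()
registry.register("platform", {"ping": lambda args: "pong"}, system=True)
registry.register("postgresql", {"list_tables": lambda args: ["users"]})

# A reload picks up a newly proxied external server under the "acme" prefix.
registry.reload({
    "postgresql": {"list_tables": lambda args: ["users"]},
    "acme": {"search": lambda args: [args["q"]]},
})
print(registry.names())
```

In this sketch the system-level `platform` namespace survives the reload while everything else is rebuilt from configuration, which mirrors the distinction the last bullet draws.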

Assigned Tools Pattern

Generated skill docs now include a structured tool section with copyable gateway examples. This keeps the docs aligned with the authorized tool set for that skill and avoids stale import instructions.

Example generated output (markdown):

## Assigned Tools

### Production PostgreSQL (logical_name: "prod-db", type: postgresql)

Use:

```python
gateway_call("postgresql.list_tables", {}, "prod-db")
gateway_call("postgresql.describe_table", {"table_name": "users"}, "prod-db")
```

For multi-step work, use `execute_script` with built-in `gateway_call()`.

Akmatori no longer treats Python import snippets as the primary integration path. The preferred path is generated gateway usage via gateway_call, with execute_script available for multi-step workflows.
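As a rough illustration of a multi-step workflow, the script below chains the two PostgreSQL calls from the generated example. Inside execute_script the runtime supplies gateway_call; here it is replaced by a hypothetical stub so the sketch runs on its own, and the response shapes (a list of table names, a list of column dicts) are assumptions.

```python
# Hypothetical stand-in for the built-in gateway_call(); the real function
# is provided by the runtime. The canned data and return shapes are assumed.
_FAKE_DB = {
    "users": [{"name": "id", "type": "bigint"}, {"name": "email", "type": "text"}],
    "orders": [{"name": "id", "type": "bigint"}, {"name": "user_id", "type": "bigint"}],
}


def gateway_call(tool, args, logical_name):
    if tool == "postgresql.list_tables":
        return sorted(_FAKE_DB)
    if tool == "postgresql.describe_table":
        return _FAKE_DB[args["table_name"]]
    raise ValueError(f"unknown tool: {tool}")


def summarize_schema(logical_name):
    """List every table, then describe each one, in a single script run."""
    summary = {}
    for table in gateway_call("postgresql.list_tables", {}, logical_name):
        columns = gateway_call(
            "postgresql.describe_table", {"table_name": table}, logical_name
        )
        summary[table] = [col["name"] for col in columns]
    return summary


schema = summarize_schema("prod-db")
print(schema)
```

Batching the calls in one script keeps the intermediate results out of the agent's context and leaves only the summary to reason about.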

Investigation Guardrails

The incident manager prompt now makes context recall an explicit preflight. For every incident, agents search the runbook library first, then search cross-incident memory, and only then invoke infrastructure tools through skills or the gateway.

Preflight steps (markdown):

1. Understand the alert.
2. Delegate runbook recall:
   subagent({"agent": "runbook-searcher", "task": "<full Original alert text or concise summary>"})
3. Delegate cross-incident memory recall:
   subagent({"agent": "memory-searcher", "task": "<full Original alert text or concise summary>"})
4. Read the most relevant returned files.
5. Load skills and call infrastructure tools with runbook and memory context in hand.

Both searches are delegated to scoped read-only subagents. runbook-searcher can only inspect /akmatori/runbooks/, while memory-searcher can only inspect /akmatori/memory/. Each may retry up to two times with narrower terms before the parent agent continues.
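The ordering and retry budget can be modeled in a few lines. This is a sketch under assumptions: `recall`, `preflight`, the fake search functions, and the narrowing strategy are all hypothetical, and real subagents are replaced by plain callables.

```python
MAX_RETRIES = 2  # from the text: up to two retries with narrower terms


def recall(search, queries):
    """Try the first query, then up to MAX_RETRIES narrower ones."""
    for query in queries[: 1 + MAX_RETRIES]:
        hits = search(query)
        if hits:
            return hits
    return []


def preflight(alert, runbook_search, memory_search, narrow):
    """Runbook recall first, then cross-incident memory recall."""
    queries = [alert] + narrow(alert)
    return {
        "runbooks": recall(runbook_search, queries),
        "memory": recall(memory_search, queries),
    }


calls = []  # records the order in which the fake searchers are invoked


def fake_runbook_search(query):
    calls.append(("runbook", query))
    return ["/akmatori/runbooks/disk-full.md"] if query == "disk full" else []


def fake_memory_search(query):
    calls.append(("memory", query))
    return []


def narrow(alert):
    # Hypothetical narrowing: progressively shorter keyword sets.
    return ["disk full db-01", "disk full", "disk"]


context = preflight(
    "Disk full on db-01 /var at 97%", fake_runbook_search, fake_memory_search, narrow
)
print(context)
```

Only after `preflight` returns would the parent agent load skills and call infrastructure tools, whether or not either search produced hits.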

Scheduled Agents

Cron jobs use the same agent execution path as incidents, but each schedule owns its tool allowlist. A cron job can post to a configured Channel or fall back to the workspace default post channel, and its tool_instance_ids determine exactly which infrastructure tools the cron-agent may call. Empty tool assignments mean the scheduled run still has memory and runbook recall, but no gateway-backed infrastructure tools.

System cron rows are seeded by the backend for platform maintenance. Operators can disable them, but the API rejects deletion with 409 Conflict so maintenance schedules survive routine cleanup.
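A minimal model of these rules follows. It is a sketch, not the backend's data model: the `CronJob` fields other than tool_instance_ids, the helper names, and the 204 success status are assumptions; only the 409 Conflict for system rows comes from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CronJob:
    name: str
    tool_instance_ids: List[int] = field(default_factory=list)
    channel: Optional[str] = None
    is_system: bool = False
    enabled: bool = True


def allowed_tools(job, instances):
    """Logical names the cron-agent may call; empty means recall only."""
    return [instances[i] for i in job.tool_instance_ids if i in instances]


def post_channel(job, workspace_default):
    """Use the job's configured channel, else the workspace default."""
    return job.channel or workspace_default


def delete_status(job):
    """409 for system rows (per the text); 204 on success is an assumption."""
    return 409 if job.is_system else 204


instances = {1: "prod-db", 2: "prod-ssh"}  # hypothetical tool instances
nightly = CronJob("nightly-db-check", tool_instance_ids=[1])
cleanup = CronJob("memory-compaction", is_system=True)

print(allowed_tools(nightly, instances), post_channel(nightly, "#ops"))
print(allowed_tools(cleanup, instances), delete_status(cleanup))
```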

Key Principles

Skill-scoped authorization

Generated docs should reflect only the tools a skill is actually allowed to use, not the entire platform surface area.

Gateway-first execution

Agents should prefer gateway_call and execute_script so routing, auth, rate limiting, and auditing stay centralized.
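One way to picture why centralization helps: a single dispatch point can check skill-scoped authorization and record an audit entry for every call. The `Gateway` class, `NotAuthorized` exception, and skill names below are hypothetical and only illustrate the idea.

```python
class NotAuthorized(Exception):
    pass


class Gateway:
    def __init__(self, assignments, handlers):
        self.assignments = assignments  # skill -> set of assigned logical names
        self.handlers = handlers        # tool name -> callable(args, logical_name)
        self.audit_log = []

    def call(self, skill, tool, args, logical_name):
        allowed = logical_name in self.assignments.get(skill, set())
        # Every attempt is audited, including rejected ones.
        self.audit_log.append(
            {"skill": skill, "tool": tool, "instance": logical_name, "allowed": allowed}
        )
        if not allowed:
            raise NotAuthorized(f"{skill} may not use {logical_name}")
        return self.handlers[tool](args, logical_name)


gateway = Gateway(
    assignments={"db-investigator": {"prod-db"}},
    handlers={"postgresql.list_tables": lambda args, name: ["users"]},
)

tables = gateway.call("db-investigator", "postgresql.list_tables", {}, "prod-db")

try:
    gateway.call("network-skill", "postgresql.list_tables", {}, "prod-db")
    rejected = False
except NotAuthorized:
    rejected = True

print(tables, rejected, len(gateway.audit_log))
```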

Generated docs stay close to runtime behavior

Examples should be produced from current tool assignments and tool types, so docs change when the real execution path changes.

Less discovery, more investigation

Good docs remove avoidable exploration while still leaving room for runbook search, memory recall, incident analysis, and targeted debugging.

Generation Flow

  1. A user assigns one or more tool instances to a skill in the UI.
  2. The backend regenerates SKILL.md with the current Assigned Tools section.
  3. Each generated block includes the logical tool name and gateway usage examples for that tool type.
  4. HTTP connector and MCP server changes can trigger gateway reloads so new tools appear without a full rebuild.
  5. When an incident is created, Akmatori generates AGENTS.md with the current incident workflow and global memory manifest.
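Steps 2 and 3 can be sketched as a small renderer that turns tool assignments into an Assigned Tools section. This is not the backend's generateToolUsageExample(); the `EXAMPLES` table, the `display_name` field, and `render_assigned_tools` are hypothetical, and only the two PostgreSQL calls come from the example earlier on this page.

```python
import json

FENCE = "`" * 3  # built indirectly so this sketch can sit inside a fenced block

# Hypothetical per-type example table, keyed by tool type.
EXAMPLES = {
    "postgresql": [
        ("postgresql.list_tables", {}),
        ("postgresql.describe_table", {"table_name": "users"}),
    ],
}


def render_assigned_tools(assignments):
    """Render one heading plus gateway_call examples per assigned tool."""
    lines = ["## Assigned Tools", ""]
    for tool in assignments:
        lines.append(
            f'### {tool["display_name"]} '
            f'(logical_name: "{tool["logical_name"]}", type: {tool["type"]})'
        )
        lines += ["", FENCE + "python"]
        for name, args in EXAMPLES.get(tool["type"], []):
            lines.append(
                f'gateway_call("{name}", {json.dumps(args)}, "{tool["logical_name"]}")'
            )
        lines += [FENCE, ""]
    return "\n".join(lines)


section = render_assigned_tools(
    [{"display_name": "Production PostgreSQL", "logical_name": "prod-db", "type": "postgresql"}]
)
print(section)
```

Because the section is derived from the assignment list alone, unassigning a tool and regenerating removes its examples from the skill doc.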

Contributor Checklist

When adding a new built-in tool type

Update tool registration and add a concrete usage example in generateToolUsageExample() so assigned skills get useful gateway examples automatically.

When changing skill docs behavior

Keep generated section markers and stripping logic aligned so user-authored prompt content is preserved while auto-generated blocks stay fresh.
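The marker-and-strip idea can be illustrated as follows. The marker strings and function names are hypothetical, since this page does not show the real ones; the point of the sketch is that regeneration is idempotent and leaves user-authored content untouched.

```python
import re

# Hypothetical markers; the actual marker text is not documented on this page.
BEGIN = "<!-- BEGIN ASSIGNED TOOLS -->"
END = "<!-- END ASSIGNED TOOLS -->"
_BLOCK = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END) + r"\n?", re.DOTALL)


def strip_generated(doc):
    """Remove any generated block, leaving only user-authored content."""
    return _BLOCK.sub("", doc).rstrip() + "\n"


def regenerate(doc, generated):
    """Replace the generated block (if any) with a fresh one at the end."""
    return (
        strip_generated(doc) + "\n" + BEGIN + "\n" + generated.strip() + "\n" + END + "\n"
    )


user_doc = "---\nname: db\n---\nCheck replication lag first.\n"
first = regenerate(user_doc, "## Assigned Tools\n(old tools)")
second = regenerate(first, "## Assigned Tools\n(new tools)")
print(second)
```

If the markers written by the generator ever drift from the pattern used for stripping, old blocks stop being removed and accumulate, which is the failure this checklist item guards against.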

When extending integrations

Decide whether the new capability belongs in a built-in tool type, a declarative HTTP connector, or a proxied external MCP server, then document that runtime path clearly.

Troubleshooting

Assigned tools look wrong

Check the skill-to-tool assignment and regenerate skill docs so SKILL.md is rebuilt from the current mapping.

A connector or MCP server is missing

Verify the HTTP connector or MCP server configuration exists, has a non-conflicting namespace, and that the gateway reload completed successfully.

Agents are still exploring too much

Make sure AGENTS.md points agents to the right skill docs and that the generated gateway examples are specific enough to execute directly.