Observability Stack
Astromesh provides built-in observability across three pillars: distributed tracing (OpenTelemetry), metrics (Prometheus), and cost tracking. All three are optional and degrade gracefully when their dependencies are not installed.
Architecture
Section titled “Architecture”Agent Execution --> TelemetryManager --> OpenTelemetry Collector --> Jaeger / Zipkin | MetricsCollector --> Prometheus --> Grafana | CostTracker --> Usage records + budget alertsEach component operates independently. You can use tracing without metrics, metrics without cost tracking, or any combination.
Installing Dependencies
Section titled “Installing Dependencies”The observability dependencies are bundled in the observability extra:
uv sync --extra observabilityThis installs:
opentelemetry-api(>= 1.27.0)opentelemetry-sdk(>= 1.27.0)opentelemetry-exporter-otlp(>= 1.27.0)prometheus-client(>= 0.21.0)
If these packages are not installed, Astromesh uses no-op fallbacks and logs no warnings.
Tracing (OpenTelemetry)
Section titled “Tracing (OpenTelemetry)”Configuration
Section titled “Configuration”Set the OTLP endpoint via environment variable or TelemetryConfig:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317The TelemetryManager initializes on startup:
from astromesh.observability.telemetry import TelemetryManager, TelemetryConfig
telemetry = TelemetryManager(TelemetryConfig( service_name="astromesh", otlp_endpoint="http://localhost:4317", enabled=True, sample_rate=1.0,))telemetry.setup()What Gets Traced
Section titled “What Gets Traced”Three span types are created during agent execution:
| Span Name | Attributes | Description |
|---|---|---|
agent.run.<name> | agent.name, session.id | Full agent execution from query to response |
provider.call.<provider> | provider.name, model.name | Individual LLM provider call |
tool.execute.<tool> | tool.name | Tool execution within an orchestration step |
Spans are nested: an agent.run span contains one or more provider.call spans, which may contain tool.execute spans (in tool-calling flows like ReAct).
NoOpSpan Fallback
Section titled “NoOpSpan Fallback”When OpenTelemetry is not installed, TelemetryManager returns _NoOpSpan instances. These implement __enter__, __exit__, set_attribute, and add_event as no-ops. No code changes are needed — the tracing context managers work identically whether OTel is present or not.
Lightweight Tracing (TracingContext)
Section titled “Lightweight Tracing (TracingContext)”In addition to full OpenTelemetry export, Astromesh includes a lightweight built-in tracing system that works without any external dependencies. Every agent run automatically collects spans into a TracingContext:
from astromesh.observability.tracing import TracingContext
ctx = TracingContext(agent_name="my-agent", session_id="sess_123")span = ctx.start_span("agent.run", {"agent": "my-agent"})
# Nested spanschild = ctx.start_span("llm.complete", parent_span_id=span.span_id)child.set_attribute("input_tokens", 150)ctx.finish_span(child)
ctx.finish_span(span)trace_data = ctx.to_dict()Automatic spans in Agent.run():
Every call to Agent.run() produces these spans automatically:
| Span Name | Parent | Attributes | Description |
|---|---|---|---|
agent.run | (root) | agent, session | Full execution lifecycle |
memory_build | agent.run | — | Build memory context |
prompt_render | agent.run | — | Render Jinja2 prompt |
llm.complete | agent.run | input_tokens, output_tokens | Each LLM call |
tool.call | agent.run | tool | Each tool execution |
orchestration | agent.run | pattern | Orchestration pattern execution |
memory_persist | agent.run | — | Persist conversation turns |
The trace is attached to the agent response:
{ "answer": "...", "trace": { "trace_id": "abc123", "agent": "my-agent", "spans": [ { "name": "agent.run", "span_id": "s1", "start_time": "2026-03-10T...", "end_time": "2026-03-10T...", "duration_ms": 2340, "attributes": {"agent": "my-agent"}, "status": "OK" } ] }}Trace Collectors:
Traces are stored via pluggable collectors:
| Collector | Use Case | Storage |
|---|---|---|
StdoutCollector | Development/debugging | Writes JSON to stdout |
InternalCollector | Small deployments | In-memory deque (default 10K traces) |
OTLPCollector | Production | In-memory + bridges to OpenTelemetry via TelemetryManager |
from astromesh.observability.collector import InternalCollector, OTLPCollectorfrom astromesh.observability.telemetry import TelemetryManager
# Development: in-memory onlycollector = InternalCollector(max_traces=10000)
# Production: in-memory + OTel exporttelemetry = TelemetryManager()telemetry.setup()collector = OTLPCollector(telemetry_manager=telemetry)Viewing Traces
Section titled “Viewing Traces”Send traces to any OpenTelemetry-compatible backend:
- Jaeger —
docker run -p 16686:16686 jaegertracing/all-in-one:latest - Zipkin —
docker run -p 9411:9411 openzipkin/zipkin - Grafana Tempo — included in the
dev-fullDocker recipe
Open the backend UI and search by service name astromesh to see agent execution traces with timing breakdowns.
Traces API
Section titled “Traces API”Query collected traces via REST:
List traces:
GET /v1/traces/?agent=my-agent&limit=20{ "traces": [ { "trace_id": "abc123", "agent": "my-agent", "spans": [...] } ]}Get specific trace:
GET /v1/traces/{trace_id}Returns 404 if not found.
Metrics (Prometheus)
Section titled “Metrics (Prometheus)”Exposed Metrics
Section titled “Exposed Metrics”The MetricsCollector registers the following Prometheus metrics with the configurable prefix astromesh (default):
Counters
Section titled “Counters”| Metric | Labels | Description |
|---|---|---|
astromesh_agent_runs_total | agent_name, pattern, status | Total agent executions |
astromesh_provider_calls_total | provider, model, status | Total LLM provider calls |
astromesh_tool_executions_total | tool_name, status | Total tool executions |
astromesh_tokens_used_total | agent_name, direction | Total tokens consumed (direction: input or output) |
Histograms
Section titled “Histograms”| Metric | Labels | Description |
|---|---|---|
astromesh_agent_latency_seconds | agent_name, pattern | Agent run latency distribution |
astromesh_provider_latency_seconds | provider, model | Provider call latency distribution |
Gauges
Section titled “Gauges”| Metric | Labels | Description |
|---|---|---|
astromesh_active_sessions | agent_name | Currently active sessions per agent |
Scrape Endpoint
Section titled “Scrape Endpoint”When prometheus-client is installed, metrics are exposed on the default Prometheus client HTTP server. In Kubernetes, pods are auto-annotated for Prometheus scraping.
Configure your prometheus.yml scrape target:
scrape_configs: - job_name: "astromesh" static_configs: - targets: ["astromesh:8000"] metrics_path: "/metrics" scrape_interval: 15sPromQL Examples
Section titled “PromQL Examples”Common queries for dashboards and alerts:
# Agent runs per minute by agentrate(astromesh_agent_runs_total[1m])
# P95 agent latencyhistogram_quantile(0.95, rate(astromesh_agent_latency_seconds_bucket[5m]))
# Error rate by agentrate(astromesh_agent_runs_total{status="error"}[5m]) / rate(astromesh_agent_runs_total[5m])
# Token consumption per hour by agentincrease(astromesh_tokens_used_total[1h])
# Provider latency P99histogram_quantile(0.99, rate(astromesh_provider_latency_seconds_bucket[5m]))
# Currently active sessions across all agentssum(astromesh_active_sessions)Cost Tracking
Section titled “Cost Tracking”The CostTracker records per-call usage and enforces budgets. It does not require any external dependencies (it is part of the core Astromesh package).
Usage Records
Section titled “Usage Records”Every LLM provider call generates a UsageRecord:
@dataclassclass UsageRecord: agent_name: str session_id: str model: str provider: str input_tokens: int output_tokens: int cost_usd: float latency_ms: float pattern: str timestamp: datetimeRecords are stored in memory and can be queried with optional filters:
tracker.get_total_cost(agent_name="support-agent")tracker.get_total_cost(agent_name="support-agent", since=one_hour_ago)tracker.get_usage_summary(agent_name="support-agent")Budget Enforcement
Section titled “Budget Enforcement”Set a maximum spend per agent:
tracker.set_budget("support-agent", max_usd=10.0)
result = tracker.check_budget("support-agent")# {# "has_budget": True,# "budget": 10.0,# "spent": 3.42,# "remaining": 6.58,# "exceeded": False# }When the budget is exceeded, check_budget() returns exceeded: True. The runtime can use this to reject further requests for that agent.
Cost Reports
Section titled “Cost Reports”The get_usage_summary() method returns a detailed breakdown:
tracker.get_usage_summary(agent_name="support-agent")# {# "total_cost": 3.42,# "total_input_tokens": 125000,# "total_output_tokens": 42000,# "total_tokens": 167000,# "num_calls": 85,# "avg_latency_ms": 1250.5,# "by_provider": {# "openai": {"cost": 2.10, "calls": 50, "tokens": 100000},# "anthropic": {"cost": 1.32, "calls": 35, "tokens": 67000}# },# "by_model": {# "gpt-4o-mini": {"cost": 1.05, "calls": 40, "tokens": 80000},# "gpt-4o": {"cost": 1.05, "calls": 10, "tokens": 20000},# "claude-sonnet-4-20250514": {"cost": 1.32, "calls": 35, "tokens": 67000}# }# }When Rust native extensions are compiled, cost aggregation uses RustCostIndex for faster queries over large record sets. See the Rust Native Extensions guide.
Metrics API
Section titled “Metrics API”In addition to Prometheus metrics, Astromesh provides a simple REST API for in-memory counters and histograms:
Get current metrics:
GET /v1/metrics/{ "counters": { "agent.runs": 150, "tool.executions": 420 }, "histograms": { "agent.latency_ms": { "count": 150, "sum": 345000, "avg": 2300, "min": 450, "max": 8900 } }}Reset all metrics:
POST /v1/metrics/resetThese metrics are stored in memory and reset on daemon restart. For persistent metrics, use the Prometheus integration described above.
Docker Setup
Section titled “Docker Setup”The recipes/dev-full.yml Docker Compose file includes a complete observability stack:
docker compose -f recipes/dev-full.yml up -dThis starts:
| Service | URL | Purpose |
|---|---|---|
| Astromesh | http://localhost:8000 | Agent runtime |
| OTel Collector | localhost:4317 (gRPC), localhost:4318 (HTTP) | Receives traces and forwards to backends |
| Prometheus | http://localhost:9090 | Metrics storage and querying |
| Grafana | http://localhost:3000 | Dashboards and visualization |
The OTel Collector is pre-configured via docker/otel-config.yaml to receive OTLP traces and export them to the configured backend.
Grafana
Section titled “Grafana”Access Grafana at http://localhost:3000 with default credentials:
- Username:
admin - Password:
admin
To create a dashboard:
- Go to Dashboards > New > New Dashboard
- Add a panel and select Prometheus as the data source
- Use the PromQL queries from the section above
- Recommended panels:
- Agent Runs/min — time series with
rate(astromesh_agent_runs_total[1m]) - P95 Latency — time series with
histogram_quantile(0.95, ...) - Active Sessions — gauge with
sum(astromesh_active_sessions) - Token Usage — stacked bar with
increase(astromesh_tokens_used_total[1h]) - Error Rate — stat panel with the error rate query
- Agent Runs/min — time series with
Kubernetes Setup
Section titled “Kubernetes Setup”For Kubernetes deployments using the Astromesh Helm chart, enable the observability subcharts:
kube-prometheus-stack: enabled: true
opentelemetry-collector: enabled: trueWhen the OTel Collector subchart is enabled, the Astromesh deployment is automatically configured with the correct OTLP endpoint via the astromesh.otel.endpoint Helm helper. No manual endpoint configuration is needed.
The kube-prometheus-stack subchart bundles:
- Prometheus server with auto-discovery of annotated pods
- Grafana with the Prometheus data source pre-configured
- Alertmanager for alert routing
For production clusters that already have a Prometheus and Grafana deployment, leave the subcharts disabled and point Astromesh to your existing OTel Collector endpoint:
kube-prometheus-stack: enabled: false
opentelemetry-collector: enabled: false
observability: otel: endpoint: "otel-collector.monitoring.svc.cluster.local:4317"