
# Helm / Kubernetes

This guide covers deploying Astromesh on Kubernetes using the official Helm chart. The chart packages the Astromesh runtime with optional PostgreSQL, Redis, Ollama, vLLM, HuggingFace TEI, and a full observability stack.

The Helm chart provides a production-grade Kubernetes deployment with:

  • Infrastructure as subcharts — PostgreSQL, Redis, and Ollama can be deployed alongside Astromesh or replaced with external managed services
  • Model serving — optional vLLM and HuggingFace TEI deployments with GPU scheduling
  • Observability — Prometheus, Grafana (via kube-prometheus-stack), and OpenTelemetry Collector
  • Configuration as code — Astromesh YAML config (runtime, providers, agents, channels) defined inline in `values.yaml`
  • Environment profiles — pre-built `values-dev.yaml`, `values-staging.yaml`, and `values-prod.yaml`
  • External Secrets — optional ESO integration for AWS, GCP, and Vault secret management
  • CRDs — Kubernetes-native Agent, Provider, Channel, and RAGPipeline resources
| Requirement | Version | Check command |
| --- | --- | --- |
| Kubernetes | 1.26+ | `kubectl version` |
| Helm | 3.12+ | `helm version` |
| kubectl | configured | `kubectl cluster-info` |
| Container registry | accessible | varies |
```
deploy/helm/astromesh/
├── Chart.yaml                    # Chart metadata and dependencies
├── values.yaml                   # Default configuration
├── values-dev.yaml               # Development overrides
├── values-staging.yaml           # Staging overrides
├── values-prod.yaml              # Production overrides
├── crds/                         # Custom Resource Definitions
│   ├── agent-crd.yaml
│   ├── provider-crd.yaml
│   ├── channel-crd.yaml
│   └── ragpipeline-crd.yaml
└── templates/
    ├── _helpers.tpl              # Template helpers
    ├── deployment.yaml           # Astromesh API deployment
    ├── service.yaml              # ClusterIP service
    ├── ingress.yaml              # Optional ingress
    ├── hpa.yaml                  # Horizontal Pod Autoscaler
    ├── configmap-runtime.yaml
    ├── configmap-providers.yaml
    ├── configmap-channels.yaml
    ├── configmap-agents.yaml
    ├── secret.yaml
    ├── serviceaccount.yaml
    ├── deployment-vllm.yaml
    ├── service-vllm.yaml
    ├── deployment-tei.yaml
    ├── service-tei.yaml
    ├── external-secret.yaml
    ├── secret-store.yaml
    └── NOTES.txt
```
| Subchart | Repository | Default | Purpose |
| --- | --- | --- | --- |
| postgresql | Bitnami | enabled | Episodic memory, pgvector semantic search |
| redis | Bitnami | enabled | Conversational memory cache |
| ollama | ollama-helm | disabled | Local model serving |
| kube-prometheus-stack | prometheus-community | disabled | Prometheus + Grafana + AlertManager |
| opentelemetry-collector | open-telemetry | disabled | Trace and metric collection |
Fetch the subchart dependencies:

```sh
cd deploy/helm/astromesh
helm dependency update
```

Expected output:

```
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈Happy Helming!⎈
Saving 5 charts
Downloading postgresql from repo https://charts.bitnami.com/bitnami
Downloading redis from repo https://charts.bitnami.com/bitnami
Downloading ollama from repo https://otwld.github.io/ollama-helm
Deleting outdated charts
```
Terminal window
helm install astromesh ./deploy/helm/astromesh \
-f deploy/helm/astromesh/values-dev.yaml \
--namespace astromesh \
--create-namespace

Expected output:

```
NAME: astromesh
LAST DEPLOYED: Mon Mar 9 10:00:00 2026
NAMESPACE: astromesh
STATUS: deployed
REVISION: 1
NOTES:
Astromesh has been deployed!
API endpoint: http://astromesh.astromesh.svc:8000
Health check: kubectl port-forward svc/astromesh 8000:8000
To verify: curl http://localhost:8000/health
```
Check that the pods are running:

```sh
kubectl get pods -n astromesh
```

Expected output:

```
NAME                        READY   STATUS    RESTARTS   AGE
astromesh-5d8f9c7b6-x2k4m   1/1     Running   0          60s
astromesh-postgresql-0      1/1     Running   0          60s
astromesh-redis-master-0    1/1     Running   0          60s
```
Port-forward the API service:

```sh
kubectl port-forward svc/astromesh 8000:8000 -n astromesh
```

In another terminal:

```sh
curl http://localhost:8000/health
```

Expected output:

```json
{
  "status": "healthy",
  "version": "0.10.0"
}
```
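The `/health` endpoint is also a natural target for Kubernetes probes. The chart's deployment template presumably wires these up already; if you need to customize them, the shape would be roughly the following (illustrative values, not the chart's documented defaults):

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 5
```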

Astromesh configuration files are defined inline in values.yaml and mounted as ConfigMaps:

```yaml
config:
  runtime: |
    apiVersion: astromesh/v1
    kind: RuntimeConfig
    metadata:
      name: default
    spec:
      api:
        host: "0.0.0.0"
        port: 8000
      defaults:
        orchestration:
          pattern: react
          max_iterations: 10
  providers: |
    apiVersion: astromesh/v1
    kind: ProviderConfig
    metadata:
      name: default-providers
    spec:
      providers:
        ollama:
          type: ollama
          endpoint: "http://ollama:11434"
          models:
            - "llama3.1:8b"
          health_check_interval: 30
        openai:
          type: openai_compat
          endpoint: "https://api.openai.com/v1"
          api_key_env: OPENAI_API_KEY
          models:
            - "gpt-4o"
            - "gpt-4o-mini"
      routing:
        default_strategy: cost_optimized
        fallback_enabled: true
        circuit_breaker:
          failure_threshold: 3
          recovery_timeout: 60
  channels: |
    channels:
      whatsapp:
        verify_token: "${WHATSAPP_VERIFY_TOKEN}"
        access_token: "${WHATSAPP_ACCESS_TOKEN}"
        phone_number_id: "${WHATSAPP_PHONE_NUMBER_ID}"
        app_secret: "${WHATSAPP_APP_SECRET}"
        default_agent: "whatsapp-assistant"
  agents:
    support-agent.agent.yaml: |
      apiVersion: astromesh/v1
      kind: Agent
      metadata:
        name: support-agent
      spec:
        identity:
          display_name: "Support Agent"
        model:
          primary:
            provider: ollama
            model: llama3.1:8b
        orchestration:
          pattern: react
          max_iterations: 5
```

Each section becomes a ConfigMap mounted at `/app/config/` inside the pod.
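The `${WHATSAPP_VERIFY_TOKEN}`-style placeholders in the channels config are resolved from environment variables, which in turn come from the chart-managed Secret. As a rough illustration of that substitution step (not Astromesh's actual loader):

```python
import os
from string import Template

# Placeholder syntax as used in the channels config above.
raw = 'verify_token: "${WHATSAPP_VERIFY_TOKEN}"'

# In the pod, this variable is injected from the chart's Secret.
os.environ["WHATSAPP_VERIFY_TOKEN"] = "dev-verify-token"

# Substitute every ${VAR} from the environment.
resolved = Template(raw).substitute(os.environ)
print(resolved)  # verify_token: "dev-verify-token"
```

This is why a missing key in the Secret surfaces as a startup error rather than a Helm templating error: the placeholders are resolved at runtime, not at install time.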

For development, set secret values directly in your values file:

```yaml
secrets:
  create: true
  values:
    OPENAI_API_KEY: "sk-dev-key-here"
    WHATSAPP_VERIFY_TOKEN: "dev-verify-token"
    WHATSAPP_ACCESS_TOKEN: "dev-access-token"
```

For production, create a Kubernetes Secret separately and reference it:

```sh
kubectl create secret generic astromesh-secrets \
  --from-literal=OPENAI_API_KEY="sk-prod-..." \
  --from-literal=WHATSAPP_ACCESS_TOKEN="EAAx..." \
  -n astromesh
```
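`kubectl create secret` base64-encodes the literal values for you. If you instead write the Secret manifest by hand, each value under `data:` must be encoded first (the value shown is a placeholder):

```sh
# Base64-encode a secret value for use in a Secret manifest's data: map.
printf '%s' "sk-prod-example" | base64
# c2stcHJvZC1leGFtcGxl
```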

Then in values:

```yaml
secrets:
  create: false
  existingSecret: "astromesh-secrets"
```
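In either mode, the Secret's keys reach the Astromesh container as environment variables. A typical way for a chart to wire this is an `envFrom` block in the Deployment; the fragment below is illustrative, so check the chart's `deployment.yaml` for the exact mechanism:

```yaml
envFrom:
  - secretRef:
      name: astromesh-secrets   # or the release-managed Secret when create: true
```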

To use an external PostgreSQL instance (e.g., AWS RDS) instead of the subchart:

```yaml
postgresql:
  enabled: false
externalDatabase:
  host: "your-rds-instance.region.rds.amazonaws.com"
  port: "5432"
  database: "astromesh"
  username: "astromesh"
  existingSecret: "astromesh-db-credentials"  # Secret with key DATABASE_PASSWORD
```

Create the credentials secret:

```sh
kubectl create secret generic astromesh-db-credentials \
  --from-literal=DATABASE_PASSWORD="your-rds-password" \
  -n astromesh
```

To use an external Redis instance (e.g., AWS ElastiCache):

```yaml
redis:
  enabled: false
externalRedis:
  host: "your-elasticache.region.cache.amazonaws.com"
  port: "6379"
  existingSecret: "astromesh-redis-credentials"  # Secret with key REDIS_PASSWORD
```

Deploy vLLM for high-throughput, GPU-accelerated inference:

```yaml
vllm:
  enabled: true
  model: "mistralai/Mistral-7B-Instruct-v0.3"
  extraArgs:
    - "--max-model-len"
    - "4096"
  resources:
    requests:
      nvidia.com/gpu: "1"
    limits:
      nvidia.com/gpu: "1"
  nodeSelector:
    gpu: "true"
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
```

If the model is gated (requires HuggingFace authentication):

```yaml
vllm:
  huggingfaceToken: "hf_..."
  # Or reference an existing secret:
  # existingSecret: "hf-token"
```
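vLLM serves an OpenAI-compatible API, so once the pod is up it can be registered as a provider in the providers config. The service name and port below follow the chart's apparent naming convention and are assumptions; confirm them with `kubectl get svc -n astromesh`:

```yaml
providers:
  vllm:
    type: openai_compat
    endpoint: "http://astromesh-vllm:8000/v1"   # assumed service name and port
    models:
      - "mistralai/Mistral-7B-Instruct-v0.3"
```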

## HuggingFace TEI (embeddings and reranking)

Deploy Text Embeddings Inference for semantic search:

```yaml
tei:
  enabled: true
  instances:
    - name: embeddings
      modelId: "BAAI/bge-small-en-v1.5"
      port: 8002
      resources:
        limits:
          nvidia.com/gpu: "1"
      nodeSelector:
        gpu: "true"
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
    - name: reranker
      modelId: "BAAI/bge-reranker-base"
      port: 8003
      resources:
        limits:
          nvidia.com/gpu: "1"
      nodeSelector:
        gpu: "true"
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
```

Each `instances[]` entry creates a separate Deployment and Service.

GPU workloads (vLLM, TEI) need to be scheduled on nodes with GPUs. Configure per service:

```yaml
vllm:
  nodeSelector:
    gpu: "true"                # Only schedule on GPU-labeled nodes
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule       # Tolerate GPU node taints
  resources:
    requests:
      nvidia.com/gpu: "1"      # Request 1 GPU
    limits:
      nvidia.com/gpu: "1"      # Limit to 1 GPU
```

Note that Kubernetes treats `nvidia.com/gpu` as an extended resource: a limit must be set, and if a request is specified it must equal the limit.

Label your GPU nodes:

```sh
kubectl label node gpu-node-1 gpu=true
```

Taint GPU nodes to prevent non-GPU workloads:

```sh
kubectl taint nodes gpu-node-1 nvidia.com/gpu=:NoSchedule
```

Enabled by default. The Astromesh Service gets Prometheus scrape annotations:

```yaml
observability:
  prometheus:
    enabled: true  # Adds prometheus.io/scrape: "true" to the Service
```

Point Astromesh to an existing OTel collector:

```yaml
observability:
  otel:
    enabled: true
    endpoint: "http://otel-collector.monitoring:4317"
```

Deploy an OTel Collector alongside Astromesh. The endpoint is auto-resolved:

```yaml
opentelemetry-collector:
  enabled: true
  mode: deployment
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    exporters:
      debug: {}
      prometheus:
        endpoint: 0.0.0.0:8889
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [debug]
        metrics:
          receivers: [otlp]
          exporters: [prometheus]
```

Enable kube-prometheus-stack for Prometheus, Grafana, and AlertManager:

```yaml
kube-prometheus-stack:
  enabled: true
  prometheus:
    prometheusSpec:
      serviceMonitorSelectorNilUsesHelmValues: false
  grafana:
    adminPassword: admin
opentelemetry-collector:
  enabled: true
observability:
  prometheus:
    enabled: true
  otel:
    enabled: true
```

Access Grafana:

```sh
kubectl port-forward svc/astromesh-grafana 3000:80 -n astromesh
```

Open http://localhost:3000 (admin/admin).

```yaml
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
  hosts:
    - host: astromesh.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: astromesh-tls
      hosts:
        - astromesh.example.com
```

Verify:

```sh
kubectl get ingress -n astromesh
```

Expected output:

```
NAME        CLASS   HOSTS                   ADDRESS        PORTS     AGE
astromesh   nginx   astromesh.example.com   203.0.113.10   80, 443   5m
```

Enable the Horizontal Pod Autoscaler:

```yaml
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
```

Verify:

```sh
kubectl get hpa -n astromesh
```

Expected output:

```
NAME        REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
astromesh   Deployment/astromesh   35%/70%   2         10        2          5m
```
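For reference, values like these typically render to an `autoscaling/v2` HorizontalPodAutoscaler along the following lines (a sketch, not the chart's exact template output):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: astromesh
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: astromesh
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```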

The chart ships with three environment profiles:

| Setting | Dev | Staging | Prod |
| --- | --- | --- | --- |
| Replicas | 1 | 2 | 3 (HPA 3-10) |
| PostgreSQL | Subchart | Subchart | External (RDS) |
| Redis | Subchart | Subchart | External (ElastiCache) |
| Ollama | Subchart | | |
| vLLM | Enabled (no GPU limits) | Enabled (GPU) | Enabled (GPU) |
| TEI | Embeddings only | Embeddings | Embeddings + Reranker |
| Observability | Full stack (subcharts) | OTel only | OTel to external |
| Ingress | Disabled | Enabled | Enabled + TLS |
| Secrets | Inline | Inline | `existingSecret` |
| Resources | 250m/512Mi | 500m/1Gi | 1/2Gi - 4/4Gi |

Install with a specific profile:

```sh
# Development
helm install astromesh ./deploy/helm/astromesh -f deploy/helm/astromesh/values-dev.yaml -n astromesh-dev --create-namespace

# Staging
helm install astromesh ./deploy/helm/astromesh -f deploy/helm/astromesh/values-staging.yaml -n astromesh-staging --create-namespace

# Production
helm install astromesh ./deploy/helm/astromesh -f deploy/helm/astromesh/values-prod.yaml -n astromesh-prod --create-namespace
```

For production secret management, use the External Secrets Operator to sync secrets from AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault.

Prerequisite: ESO must be installed in your cluster. It is a cluster-level operator, not deployed per-application.

```yaml
externalSecrets:
  enabled: true
  refreshInterval: 1h
  secretStore:
    enabled: true
    kind: SecretStore
    provider:
      aws:
        service: SecretsManager
        region: us-east-1
        auth:
          secretRef:
            accessKeyIDSecretRef:
              name: aws-credentials
              key: access-key-id
            secretAccessKeySecretRef:
              name: aws-credentials
              key: secret-access-key
  keys:
    - secretKey: OPENAI_API_KEY
      remoteRef:
        key: astromesh/openai
        property: api_key
    - secretKey: WHATSAPP_ACCESS_TOKEN
      remoteRef:
        key: astromesh/whatsapp
        property: access_token
    - secretKey: DATABASE_PASSWORD
      remoteRef:
        key: astromesh/database
        property: password
```

When `externalSecrets.enabled=true`, the ExternalSecret resource creates a Kubernetes Secret with the same name as the Astromesh release. Set `secrets.create: false` to avoid conflicts with the inline secret.

AWS Secrets Manager:

```yaml
provider:
  aws:
    service: SecretsManager
    region: us-east-1
    auth:
      secretRef:
        accessKeyIDSecretRef:
          name: aws-credentials
          key: access-key-id
        secretAccessKeySecretRef:
          name: aws-credentials
          key: secret-access-key
```

GCP Secret Manager:

```yaml
provider:
  gcpsm:
    projectID: my-gcp-project
    auth:
      secretRef:
        secretAccessKeySecretRef:
          name: gcp-credentials
          key: secret-access-credentials
```

HashiCorp Vault:

```yaml
provider:
  vault:
    server: https://vault.example.com
    path: secret
    version: v2
    auth:
      kubernetes:
        mountPath: kubernetes
        role: astromesh
```

The chart installs four Custom Resource Definitions for Kubernetes-native agent management:

| CRD | Group | Kind | Scope |
| --- | --- | --- | --- |
| agents.astromesh.io | astromesh.io | Agent | Namespaced |
| providers.astromesh.io | astromesh.io | Provider | Namespaced |
| channels.astromesh.io | astromesh.io | Channel | Namespaced |
| ragpipelines.astromesh.io | astromesh.io | RAGPipeline | Namespaced |

All CRDs use API version `v1alpha1` with a status subresource.

Create an Agent:

```sh
kubectl apply -f - <<EOF
apiVersion: astromesh.io/v1alpha1
kind: Agent
metadata:
  name: support-agent
  namespace: astromesh
spec:
  identity:
    display_name: "Support Agent"
    description: "Handles customer support queries"
  model:
    primary:
      provider: ollama
      model: llama3.1:8b
  orchestration:
    pattern: react
    max_iterations: 5
EOF
```

Expected output:

```
agent.astromesh.io/support-agent created
```
List Agent resources:

```sh
kubectl get agents -n astromesh
```

Expected output:

```
NAME            AGE
support-agent   10s
```
Create a Provider:

```sh
kubectl apply -f - <<EOF
apiVersion: astromesh.io/v1alpha1
kind: Provider
metadata:
  name: openai
  namespace: astromesh
spec:
  type: openai_compat
  endpoint: "https://api.openai.com/v1"
  api_key_env: OPENAI_API_KEY
  models:
    - gpt-4o
    - gpt-4o-mini
EOF
```
Create a Channel:

```sh
kubectl apply -f - <<EOF
apiVersion: astromesh.io/v1alpha1
kind: Channel
metadata:
  name: whatsapp
  namespace: astromesh
spec:
  type: whatsapp
  default_agent: support-agent
  rate_limit:
    window_seconds: 60
    max_messages: 30
EOF
```
Create a RAGPipeline:

```sh
kubectl apply -f - <<EOF
apiVersion: astromesh.io/v1alpha1
kind: RAGPipeline
metadata:
  name: docs-search
  namespace: astromesh
spec:
  embeddings:
    provider: tei
    endpoint: "http://astromesh-tei-embeddings:8002"
  vector_store:
    type: pgvector
  chunking:
    strategy: recursive
    chunk_size: 512
    overlap: 50
EOF
```
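To make the `chunking` parameters concrete: `chunk_size` bounds each chunk, and `overlap` repeats the tail of one chunk at the head of the next so context isn't lost at chunk boundaries. A simplified sliding-window sketch (the actual `recursive` strategy also splits on separators such as paragraphs and sentences):

```python
def chunk(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size chars, overlapping by overlap."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

# A 1000-char document yields chunks [0:512], [462:974], [924:1000].
pieces = chunk("x" * 1000)
print([len(p) for p in pieces])  # [512, 512, 76]
```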

Note: CRD definitions are installed with the chart, but the reconciliation controller is a separate project. Currently, CRDs serve as documentation of intent and can be used by external automation.

Install:

```sh
helm install astromesh ./deploy/helm/astromesh \
  -f deploy/helm/astromesh/values-dev.yaml \
  -n astromesh --create-namespace
```
Upgrade with new values:

```sh
helm upgrade astromesh ./deploy/helm/astromesh \
  -f deploy/helm/astromesh/values-prod.yaml \
  -n astromesh
```
Dry-run an upgrade to preview changes:

```sh
helm upgrade astromesh ./deploy/helm/astromesh \
  -f deploy/helm/astromesh/values-prod.yaml \
  -n astromesh --dry-run --debug
```
Render the templates locally without installing:

```sh
helm template astromesh ./deploy/helm/astromesh \
  -f deploy/helm/astromesh/values-dev.yaml
```
Lint the chart:

```sh
helm lint ./deploy/helm/astromesh \
  -f deploy/helm/astromesh/values-prod.yaml
```

Expected output:

```
==> Linting ./deploy/helm/astromesh
[INFO] Chart.yaml: icon is recommended
1 chart(s) linted, 0 chart(s) failed
```
Uninstall the release:

```sh
helm uninstall astromesh -n astromesh
```

Note: CRDs are not removed on uninstall (Helm convention). Remove manually if needed:

```sh
kubectl delete crd agents.astromesh.io providers.astromesh.io channels.astromesh.io ragpipelines.astromesh.io
```
Show the values applied to the release:

```sh
helm get values astromesh -n astromesh
```
List all resources in the namespace:

```sh
kubectl get all -n astromesh
```

Expected output:

```
NAME                            READY   STATUS    RESTARTS   AGE
pod/astromesh-5d8f9c7b6-x2k4m   1/1     Running   0          5m
pod/astromesh-postgresql-0      1/1     Running   0          5m
pod/astromesh-redis-master-0    1/1     Running   0          5m

NAME                             TYPE        CLUSTER-IP    PORT(S)
service/astromesh                ClusterIP   10.96.100.1   8000/TCP
service/astromesh-postgresql     ClusterIP   10.96.100.2   5432/TCP
service/astromesh-redis-master   ClusterIP   10.96.100.3   6379/TCP

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/astromesh   1/1     1            1           5m
```

Before going to production, verify the following:

  • Secrets: Using `existingSecret` or External Secrets, not inline values
  • Database: External managed PostgreSQL (RDS, Cloud SQL) with backups
  • Redis: External managed Redis (ElastiCache, Memorystore) or Redis with persistence
  • Ingress: TLS enabled with valid certificate (cert-manager or manual)
  • Resources: CPU and memory requests/limits set for all workloads
  • Autoscaling: HPA enabled with appropriate min/max replicas
  • Observability: OTel tracing and Prometheus metrics enabled
  • GPU: nodeSelector and tolerations set for GPU workloads
  • Network policies: Restrict traffic between namespaces if required
  • RBAC: ServiceAccount with minimal permissions
  • Image: Using a specific image tag, not `latest`
  • Backups: Database backup schedule configured