DevOps / Infra agent for a zero-human AI startup — the keeper of the CI/CD pipeline, cloud infrastructure, and observability stack. Automates everything, monitors everything, and keeps infrastructure costs controlled.
DevOps / Infra Agent
The keeper of the pipeline and the infrastructure for a zero-human AI startup. This agent ensures code gets from merge to production safely (CI/CD), cloud resources are provisioned and right-sized (infrastructure), everything is observable (monitoring/alerting), and infrastructure spend stays controlled (cost optimization). If the Lead Engineer builds the house, DevOps keeps the electricity and plumbing running.
Quick Start
- Deploy the agent using OpenClaw with the ClawPack bundle:
    clawpack install @agentebox/devops
- Configure communication channels: DevOps needs to exchange messages with the CTO (upstream), the Lead Engineer (deployment coordination), and QA/Testing (test infrastructure).
- Set up the Remote Project Board: the primary tracker for infrastructure tasks, incidents, and cost optimization projects.
- Connect infrastructure tools: IaC (Terraform/Pulumi), CI/CD platform, monitoring stack, and cloud provider APIs.
- Configure cadences: daily infrastructure check (morning, 10 min), weekly infrastructure planning (Monday, 30 min), and monthly infrastructure review (last Monday of the month, 60 min).
- Initialize monitoring: set up dashboards and alerts for all existing services before the agent starts its regular cadence.
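The cadence rules in the setup steps can be made concrete with a small date helper. The following is a minimal sketch, not part of the bundle: the function names `is_last_monday` and `cadences_due` are hypothetical, and it assumes the daily check runs every calendar day and that the weekly and monthly sessions both run when they fall on the same Monday.

```python
from datetime import date, timedelta


def is_last_monday(day: date) -> bool:
    """True if `day` is a Monday and the next Monday is in a different month."""
    return day.weekday() == 0 and (day + timedelta(days=7)).month != day.month


def cadences_due(day: date) -> list[str]:
    """Return the cadences from the setup steps that fall on `day`.

    Assumption: the daily check also runs on weekends.
    """
    due = ["daily infrastructure check"]
    if day.weekday() == 0:  # Monday
        due.append("weekly infrastructure planning")
    if is_last_monday(day):
        due.append("monthly infrastructure review")
    return due


if __name__ == "__main__":
    print(cadences_due(date.today()))
```

A scheduler could call `cadences_due` once each morning and dispatch one task per returned entry.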
Environment Variables
| Variable | Description | Required |
|---|---|---|
| REMOTE_PROJECT_ID | Project ID on the Remote board | Yes |
| CTO_AGENT_ID | Session ID or label for the CTO agent | Yes |
| LEAD_ENGINEER_AGENT_ID | Session ID or label for the Lead Engineer agent | Yes |
| CLOUD_PROVIDER | Primary cloud provider (aws/gcp/azure) | Yes |
| INFRA_BUDGET_PER_SERVICE | Max monthly spend per service without CTO approval (default: $100) | No |
| ALERT_NOISE_TARGET | Maximum acceptable alert noise ratio (default: 0.10) | No |
| UPTIME_SLA_TARGET | Target uptime percentage (default: 99.5) | No |
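As an illustration of how these variables might be consumed, here is a minimal sketch that reads them, applies the documented defaults, and checks a projected spend against the per-service budget. `DevOpsConfig`, `load_config`, and `needs_cto_approval` are hypothetical names, not part of OpenClaw or ClawPack, and the sketch assumes `INFRA_BUDGET_PER_SERVICE` is given as a plain number of dollars without the `$` sign.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("REMOTE_PROJECT_ID", "CTO_AGENT_ID", "LEAD_ENGINEER_AGENT_ID", "CLOUD_PROVIDER")
PROVIDERS = {"aws", "gcp", "azure"}


@dataclass(frozen=True)
class DevOpsConfig:
    remote_project_id: str
    cto_agent_id: str
    lead_engineer_agent_id: str
    cloud_provider: str
    infra_budget_per_service: float = 100.0  # dollars per month
    alert_noise_target: float = 0.10         # ratio between 0 and 1
    uptime_sla_target: float = 99.5          # percent


def load_config(env: Optional[Mapping[str, str]] = None) -> DevOpsConfig:
    """Build the config from `env` (defaults to os.environ), failing fast on gaps."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    provider = env["CLOUD_PROVIDER"].lower()
    if provider not in PROVIDERS:
        raise ValueError(f"CLOUD_PROVIDER must be one of {sorted(PROVIDERS)}")
    return DevOpsConfig(
        remote_project_id=env["REMOTE_PROJECT_ID"],
        cto_agent_id=env["CTO_AGENT_ID"],
        lead_engineer_agent_id=env["LEAD_ENGINEER_AGENT_ID"],
        cloud_provider=provider,
        infra_budget_per_service=float(env.get("INFRA_BUDGET_PER_SERVICE", 100.0)),
        alert_noise_target=float(env.get("ALERT_NOISE_TARGET", 0.10)),
        uptime_sla_target=float(env.get("UPTIME_SLA_TARGET", 99.5)),
    )


def needs_cto_approval(projected_monthly_spend: float, cfg: DevOpsConfig) -> bool:
    """Spend above the per-service budget requires CTO approval."""
    return projected_monthly_spend > cfg.infra_budget_per_service
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in tests.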
File Listing
| File | Description |
|---|---|
| SOUL.md | Complete agent identity: behaviors, decision framework, communication protocols, boundaries, failure modes |
| IDENTITY.md | Quick-reference identity card (name, role, emoji) |
| manifest.json | Machine-readable configuration: skills, tools, cadences, autonomy levels |
| README.md | This file: setup guide and integration reference |
| skills/cicd-management/SKILL.md | Pipeline health, deployment configuration, fallback runbooks |
| skills/infrastructure-provisioning/SKILL.md | IaC-based provisioning, capacity planning, resource management |
| skills/monitoring-alerting/SKILL.md | Observability setup, alert tuning, incident detection, capacity monitoring |
| skills/cost-optimization/SKILL.md | Waste identification, right-sizing, commitment evaluation, cost-per-unit tracking |
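Before wiring the agent into a framework, it can help to confirm that the bundle on disk matches this listing. The sketch below is one way to do that; `missing_files` is a hypothetical helper, and the paths are copied from the table on the assumption that they are relative to the bundle root.

```python
from pathlib import Path

# Paths taken from the file listing, assumed relative to the bundle root.
EXPECTED_FILES = (
    "SOUL.md",
    "IDENTITY.md",
    "manifest.json",
    "README.md",
    "skills/cicd-management/SKILL.md",
    "skills/infrastructure-provisioning/SKILL.md",
    "skills/monitoring-alerting/SKILL.md",
    "skills/cost-optimization/SKILL.md",
)


def missing_files(bundle_root: str) -> list[str]:
    """Return the listed files that do not exist under `bundle_root`."""
    root = Path(bundle_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_files("."):
        print(f"missing: {rel}")
```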
Architecture
CTO
  ↕ (directives, incident reports, cost reports)
DevOps / Infra ──── 🔧
  ├── CI/CD Pipeline (automated build → test → deploy)
  ├── Cloud Infrastructure (compute, storage, networking)
  ├── Monitoring Stack (metrics, logs, traces, alerts)
  └── Cost Management (tracking, optimization, reporting)
Coordinates with:
  → Lead Engineer (deployment support, infrastructure requests)
  → QA / Testing (test infrastructure)
  → COO (cost reporting via CTO)
Framework Integration
OpenClaw (Native)
# openclaw.yaml
agent:
  name: devops
  soul: ./SOUL.md
  identity: ./IDENTITY.md
  skills:
    - ./skills/cicd-management/
    - ./skills/infrastructure-provisioning/
    - ./skills/monitoring-alerting/
    - ./skills/cost-optimization/
  heartbeat:
    interval: 15m
    file: ./HEARTBEAT.md
CrewAI
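from pathlib import Path

from crewai import Agent, Task, Crew

# iac_tool, cicd_tool, monitoring_tool, and remote_board_tool are placeholders:
# define them as CrewAI tool instances before running this snippet.
devops = Agent(
    role="DevOps / Infra",
    goal="Keep the pipeline running, infrastructure reliable, and costs controlled",
    backstory=Path("SOUL.md").read_text(encoding="utf-8"),  # full agent identity
    tools=[iac_tool, cicd_tool, monitoring_tool, remote_board_tool],
    verbose=True,
)

# One task per cadence; this one covers the daily infrastructure check.
daily_check = Task(
    description="Run daily infrastructure check: review alerts, verify service health, check deployment queue",
    agent=devops,
    expected_output="Infrastructure health report with any issues flagged",
)

crew = Crew(agents=[devops], tasks=[daily_check], verbose=True)
crew.kickoff()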
Monitoring
The DevOps agent is healthy when:
- Deployment success rate stays above 95%
- Mean time to deploy stays under 15 minutes
- Alert noise ratio stays below 10% (most alerts require action)
- Uptime stays above the 99.5% SLA target
- Infrastructure costs stay within budget (±10%)
- All production services have monitoring, alerts, and runbooks
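The health criteria above can be expressed as a single pass/fail evaluation. This is a minimal sketch under stated assumptions: `HealthSnapshot` and `failed_checks` are hypothetical names, the metrics are assumed to be collected elsewhere, and the comparisons are strict ("above", "under", "below") as the list is worded, so a value exactly on a threshold counts as a failure.

```python
from dataclasses import dataclass


@dataclass
class HealthSnapshot:
    deploy_success_rate: float      # fraction of successful deployments, 0 to 1
    mean_deploy_minutes: float      # mean time to deploy
    alert_noise_ratio: float        # non-actionable alerts / total alerts
    uptime_percent: float           # measured uptime over the period
    actual_cost: float              # infrastructure spend for the period
    budget: float                   # budgeted spend for the same period
    services_missing_coverage: int  # production services lacking monitoring, alerts, or runbooks


def failed_checks(s: HealthSnapshot) -> list[str]:
    """Return the names of the health criteria that `s` does not meet."""
    checks = {
        "deployment success rate above 95%": s.deploy_success_rate > 0.95,
        "mean time to deploy under 15 minutes": s.mean_deploy_minutes < 15,
        "alert noise ratio below 10%": s.alert_noise_ratio < 0.10,
        "uptime above 99.5%": s.uptime_percent > 99.5,
        "costs within ±10% of budget": abs(s.actual_cost - s.budget) <= 0.10 * s.budget,
        "full monitoring coverage": s.services_missing_coverage == 0,
    }
    return [name for name, ok in checks.items() if not ok]
```

An empty list means the agent is healthy by these criteria; any returned name points at the criterion to investigate.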
Warning signs:
- Deployment queue building up (pipeline bottleneck)
- Alert volume increasing without corresponding incidents
- Infrastructure costs rising faster than traffic/revenue
- IaC drift detected (manual changes in production)
- Any production service without monitoring coverage
- Incident MTTR increasing (harder to fix things)
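Several of these warning signs are trends rather than thresholds, so they need two measurement periods to detect. The sketch below compares a previous and a current period for three of them; the function names and the dictionary keys (`cost`, `traffic`, `alerts`, `incidents`, `mttr_minutes`) are hypothetical, and real detection would likely smooth over more than two data points.

```python
def growth(previous: float, current: float) -> float:
    """Relative change from `previous` to `current`; `previous` must be positive."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous


def warning_signs(prev: dict, curr: dict) -> list[str]:
    """Compare two periods and return the warning signs that apply."""
    signs = []
    # Cost growing faster than traffic suggests waste or poor right-sizing.
    if growth(prev["cost"], curr["cost"]) > growth(prev["traffic"], curr["traffic"]):
        signs.append("infrastructure costs rising faster than traffic")
    # More alerts without more incidents suggests growing alert noise.
    if curr["alerts"] > prev["alerts"] and curr["incidents"] <= prev["incidents"]:
        signs.append("alert volume increasing without corresponding incidents")
    if curr["mttr_minutes"] > prev["mttr_minutes"]:
        signs.append("incident MTTR increasing")
    return signs
```

For example, a month where cost grows 20% while traffic grows 10% would be flagged even if spend is still inside the budget.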
Version History
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-03-16 | Initial creation |