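Website/Service Monitoring - Operations Automation
Combine OpenClaw's Polls, Cron (scheduled jobs), and Webhooks to automate monitoring of company websites, API endpoints, and internal services. The setup supports minute-level availability checks, response time tracking, multi-channel alert delivery, and automatic generation of monthly SLA reports.
Overall Architecture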
┌──────────────────────────────────────────────────┐
│                 OpenClaw Gateway                 │
│                                                  │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  │
│  │   Polls    │  │    Cron    │  │  Webhooks  │  │
│  │ real-time  │  │ scheduled  │  │  inbound   │  │
│  │   checks   │  │  reports   │  │   events   │  │
│  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘  │
│        │               │               │         │
│        └───────────────┼───────────────┘         │
│                        ▼                         │
│               ┌─────────────────┐                │
│               │      Agent      │                │
│               │ analyze & alert │                │
│               └────────┬────────┘                │
│                        │                         │
└────────────────────────┼─────────────────────────┘
                         ▼
        ┌────────────────────────────────┐
        │ WeCom / DingTalk / SMS / Email │
        └────────────────────────────────┘
HTTP Endpoint Monitoring
Polls Configuration
Use Polls to run periodic HTTP health checks:
json
{
  "polls": {
    "entries": {
      "website-monitor": {
        "enabled": true,
        "interval": 60000,
        "source": {
          "type": "http",
          "targets": [
            {
              "name": "Website home page",
              "url": "https://www.example.com",
              "method": "GET",
              "timeout": 10000,
              "expectedStatus": 200
            },
            {
              "name": "API endpoint",
              "url": "https://api.example.com/health",
              "method": "GET",
              "timeout": 5000,
              "expectedStatus": 200,
              "expectedBody": "{\"status\":\"ok\"}"
            },
            {
              "name": "Admin console",
              "url": "https://admin.example.com/login",
              "method": "GET",
              "timeout": 15000,
              "expectedStatus": 200
            }
          ]
        },
        "action": {
          "onFailure": {
            "session": "isolated",
            "message": "Service alert: {target.name} is unreachable",
            "delivery": "announce"
          }
        }
      }
    }
  }
}
Single-Target Detailed Configuration
json
{
  "name": "Payment API",
  "url": "https://api.example.com/pay/health",
  "method": "POST",
  "headers": {
    "Authorization": "Bearer ${MONITOR_TOKEN}",
    "Content-Type": "application/json"
  },
  "body": "{\"action\":\"ping\"}",
  "timeout": 5000,
  "expectedStatus": 200,
  "expectedBody": "{\"alive\":true}",
  "priority": "critical"
}
Response Time Tracking
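Tracking response time means keeping recent latency samples per target and summarizing them so they can be compared against thresholds. The following is a minimal sketch of one way to do that, not OpenClaw's actual implementation: the `LatencyWindow` class, its default capacity of 60 samples, and the nearest-rank p95 are all assumptions made for illustration.

```typescript
// Hypothetical helper (not an OpenClaw API): summary of recent latency samples.
interface LatencySummary {
  count: number;
  avg: number; // mean latency in milliseconds
  p95: number; // 95th percentile, nearest-rank method
  max: number;
}

// Keeps the most recent `capacity` response times for one monitored target.
class LatencyWindow {
  private samples: number[] = [];
  private capacity: number;

  constructor(capacity: number = 60) {
    this.capacity = capacity;
  }

  // Record one probe's response time, dropping the oldest sample when full.
  record(ms: number): void {
    this.samples.push(ms);
    if (this.samples.length > this.capacity) this.samples.shift();
  }

  // Summarize the current window; returns null when no samples exist yet.
  summary(): LatencySummary | null {
    const n = this.samples.length;
    if (n === 0) return null;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const sum = sorted.reduce((acc, v) => acc + v, 0);
    const rank = Math.ceil(0.95 * n); // 1-based nearest rank
    return { count: n, avg: sum / n, p95: sorted[rank - 1], max: sorted[n - 1] };
  }
}
```

With the 60-second poll interval used in this guide, a capacity of 60 covers roughly the last hour of checks for a target.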
Alert Threshold Configuration
json
{
  "monitoring": {
    "thresholds": {
      "responseTime": {
        "warning": 2000,
        "critical": 5000,
        "timeout": 10000
      },
      "availability": {
        "warning": 99.5,
        "critical": 99.0
      },
      "consecutiveFailures": {
        "warning": 2,
        "critical": 5
      }
    }
  }
}
| Level | Response time | Availability | Consecutive failures | Action |
|---|---|---|---|---|
| normal | < 2s | ≥ 99.5% | 0 | Log only |
| warning | 2s - 5s | 99.0% - 99.5% | 2 | WeCom notification |
| critical | > 5s | < 99.0% | 5 | WeCom + SMS + phone call |
| timeout | > 10s | - | - | Emergency escalation |
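The threshold table above can be read as a small classification function. The sketch below is illustrative only: `Severity`, `Thresholds`, and `classify` are hypothetical names, the boundaries follow the table (2s and 5s both count as warning), the more severe of the two signals wins, and the availability dimension is left out for brevity.

```typescript
type Severity = "normal" | "warning" | "critical" | "timeout";

interface Thresholds {
  responseTime: { warning: number; critical: number; timeout: number };
  consecutiveFailures: { warning: number; critical: number };
}

// Values copied from the threshold configuration (milliseconds and counts).
const thresholds: Thresholds = {
  responseTime: { warning: 2000, critical: 5000, timeout: 10000 },
  consecutiveFailures: { warning: 2, critical: 5 },
};

// Severity levels from least to most severe, used to pick the worse signal.
const severityOrder: Severity[] = ["normal", "warning", "critical", "timeout"];

function classify(
  responseTimeMs: number,
  consecutiveFailures: number,
  t: Thresholds = thresholds,
): Severity {
  // Level implied by latency alone.
  let byLatency: Severity = "normal";
  if (responseTimeMs > t.responseTime.timeout) byLatency = "timeout";
  else if (responseTimeMs > t.responseTime.critical) byLatency = "critical";
  else if (responseTimeMs >= t.responseTime.warning) byLatency = "warning";

  // Level implied by the failure streak alone.
  let byFailures: Severity = "normal";
  if (consecutiveFailures >= t.consecutiveFailures.critical) byFailures = "critical";
  else if (consecutiveFailures >= t.consecutiveFailures.warning) byFailures = "warning";

  // Assumption: the more severe of the two levels is reported.
  return severityOrder.indexOf(byLatency) >= severityOrder.indexOf(byFailures)
    ? byLatency
    : byFailures;
}
```

For example, a fast response (300 ms) on a target that has already failed five checks in a row is still classified as critical, because the failure streak dominates.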
Alert Delivery Configuration
WeCom (企业微信)
json
{
  "alerts": {
    "channels": {
      "wecom": {
        "enabled": true,
        "webhook": "${WECOM_ALERT_WEBHOOK}",
        "template": {
          "msgtype": "markdown",
          "markdown": {
            "content": "🚨 **Service Alert**\n> Service: {target.name}\n> Status: <font color=\"warning\">{status}</font>\n> Response time: {responseTime}ms\n> Time: {timestamp}\n> Details: {errorMessage}"
          }
        }
      }
    }
  }
}
DingTalk (钉钉)
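DingTalk custom robots that use the signature security setting expect each webhook call to carry a `timestamp` and a `sign` query parameter, where `sign` is the Base64-encoded HMAC-SHA256 of `timestamp + "\n" + secret`, keyed with the secret. The `secret` field in this section's configuration presumably feeds that step. The sketch below shows the computation with Node.js's built-in `crypto` module; `signDingTalkWebhook` is a hypothetical helper name, not an OpenClaw API.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper: appends DingTalk's timestamp and sign parameters
// to a robot webhook URL. `timestampMs` is the current time in milliseconds.
function signDingTalkWebhook(
  webhook: string,
  secret: string,
  timestampMs: number = Date.now(),
): string {
  // DingTalk signs the string "<timestamp>\n<secret>" using the secret as the HMAC key.
  const stringToSign = `${timestampMs}\n${secret}`;
  const sign = createHmac("sha256", secret).update(stringToSign).digest("base64");

  // URLSearchParams percent-encodes the Base64 characters (+, /, =) for us.
  const url = new URL(webhook);
  url.searchParams.set("timestamp", String(timestampMs));
  url.searchParams.set("sign", sign);
  return url.toString();
}
```

The signed URL is only valid for a short time after `timestampMs`, so it should be computed per request rather than cached.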
json
{
  "alerts": {
    "channels": {
      "dingtalk": {
        "enabled": true,
        "webhook": "${DINGTALK_ALERT_WEBHOOK}",
        "secret": "${DINGTALK_SECRET}",
        "template": {
          "msgtype": "markdown",
          "markdown": {
            "title": "Service Alert",
            "text": "## 🚨 Service Alert\n- **Service**: {target.name}\n- **Status**: {status}\n- **Response time**: {responseTime}ms\n- **Time**: {timestamp}"
          }
        },
        "atMobiles": ["13800138000"]
      }
    }
  }
}
SMS Alerts
json
{
  "alerts": {
    "channels": {
      "sms": {
        "enabled": true,
        "provider": "aliyun",
        "accessKeyId": "${ALIYUN_SMS_KEY}",
        "accessKeySecret": "${ALIYUN_SMS_SECRET}",
        "signName": "OpenClaw监控",
        "templateCode": "SMS_123456",
        "phoneNumbers": ["13800138000", "13900139000"],
        "triggerLevel": "critical"
      }
    }
  }
}
Monthly SLA Report
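The figures in a monthly SLA report (availability, average response time, incident count, longest outage) can all be derived from the raw check results. A minimal sketch under stated assumptions: `CheckRecord`, `SlaSummary`, and `summarizeSla` are hypothetical names, each failed check is assumed to represent one full poll interval of downtime, and the average response time is computed over successful checks only.

```typescript
interface CheckRecord {
  timestamp: number; // epoch milliseconds when the check ran
  ok: boolean; // true when the check passed
  responseTimeMs: number;
}

interface SlaSummary {
  availabilityPct: number;
  avgResponseMs: number;
  incidents: number;
  longestOutageMs: number;
}

function summarizeSla(records: CheckRecord[], intervalMs: number = 60000): SlaSummary {
  if (records.length === 0) throw new Error("no check records to summarize");
  const sorted = [...records].sort((a, b) => a.timestamp - b.timestamp);

  let okCount = 0;
  let okLatencySum = 0;
  let incidents = 0;
  let longestOutageMs = 0;
  let currentOutageMs = 0;

  for (const r of sorted) {
    if (r.ok) {
      okCount += 1;
      okLatencySum += r.responseTimeMs;
      currentOutageMs = 0; // a passing check ends any ongoing outage
    } else {
      if (currentOutageMs === 0) incidents += 1; // first failure opens a new incident
      currentOutageMs += intervalMs;
      longestOutageMs = Math.max(longestOutageMs, currentOutageMs);
    }
  }

  return {
    availabilityPct: (okCount * 100) / sorted.length,
    avgResponseMs: okCount === 0 ? 0 : okLatencySum / okCount,
    incidents,
    longestOutageMs,
  };
}
```

As a sanity check on the scale involved: a 30-day month polled once a minute yields 43,200 checks, so 13 failed checks correspond to roughly 99.97% availability.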
Cron Configuration
Generate the previous month's SLA report automatically at 09:00 (Asia/Shanghai) on the 1st of each month:
bash
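openclaw cron add \
  --name "monthly-sla-report" \
  --cron "0 9 1 * *" \
  --timezone "Asia/Shanghai" \
  --session isolated \
  --message "Generate last month's SLA report: compute availability, average response time, incident count, and longest outage for every monitored target, then send a detailed report to the ops group and management"
Report Content Template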
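📊 Monthly SLA Report - {year}-{month}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[Overview]
Monitored targets: {totalTargets}
Average availability: {avgAvailability}%
Total alerts: {totalAlerts}
[Per-Service Details]
| Service | Availability | Avg response | Incidents | Longest outage |
|---------|--------------|--------------|-----------|----------------|
| Website | 99.97%       | 320ms        | 1         | 3min           |
| API     | 99.85%       | 180ms        | 3         | 12min          |
| Admin   | 99.99%       | 450ms        | 0         | 0min           |
[Incident Review]
1. 03-12 14:23 API endpoint timeout (12 min) - database connection pool exhausted
2. 03-18 09:45 Website slow to respond (3 min) - CDN node failure
[Recommendations]
- Increase the database connection pool size for the API
- Add multi-node CDN failover
Daily and Weekly Reports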
json
{
  "cron": {
    "entries": {
      "daily-monitor-summary": {
        "cron": "0 8 * * *",
        "timezone": "Asia/Shanghai",
        "session": "isolated",
        "message": "Generate yesterday's monitoring report: availability, alert summary, abnormal events"
      },
      "weekly-monitor-summary": {
        "cron": "0 9 * * 1",
        "timezone": "Asia/Shanghai",
        "session": "isolated",
        "message": "Generate last week's monitoring report: availability trend, response time changes, alert statistics"
      },
      "monthly-sla-report": {
        "cron": "0 9 1 * *",
        "timezone": "Asia/Shanghai",
        "session": "isolated",
        "message": "Generate the monthly SLA report"
      }
    }
  }
}
Custom Alert Rules
Use Hooks to implement more complex alerting logic:
typescript
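// hooks/smart-alert/handler.ts
import type { HookContext } from "@openclaw/sdk";
export default async function handler(ctx: HookContext) {
  const { target, status, responseTime, consecutiveFailures } = ctx.event;
  // Rule 1: alert only after 3 consecutive failures (avoids false positives)
  if (consecutiveFailures < 3) return;
  // Rule 2: escalate outside working hours (evaluated in the server's local time zone)
  const hour = new Date().getHours();
  const isOffHours = hour < 8 || hour > 22;
  // Rule 3: core services are escalated first
  const isCritical = target.priority === "critical";
  if (isCritical || isOffHours) {
    // Escalate by SMS; a voice-call tool could be invoked here as well
    await ctx.agent.tool("send-sms", {
      phones: ["13800138000"],
      message: `URGENT: ${target.name} failed ${consecutiveFailures} consecutive checks`,
    });
  }
  // Every alert is also sent to WeCom
  await ctx.agent.tool("send-wecom", {
    // Read the webhook from the environment: "${...}" inside a plain string is not expanded by TypeScript
    webhook: process.env.WECOM_ALERT_WEBHOOK,
    content: `🚨 ${target.name} is failing | Status: ${status} | Response: ${responseTime}ms`,
  });
}
Complete Configuration Summary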
json
{
  "polls": {
    "entries": {
      "website-monitor": {
        "enabled": true,
        "interval": 60000,
        "source": { "type": "http", "targets": ["..."] }
      }
    }
  },
  "cron": {
    "entries": {
      "daily-monitor-summary": { "cron": "0 8 * * *", "timezone": "Asia/Shanghai" },
      "weekly-monitor-summary": { "cron": "0 9 * * 1", "timezone": "Asia/Shanghai" },
      "monthly-sla-report": { "cron": "0 9 1 * *", "timezone": "Asia/Shanghai" }
    }
  },
  "monitoring": {
    "thresholds": {
      "responseTime": { "warning": 2000, "critical": 5000 },
      "consecutiveFailures": { "warning": 2, "critical": 5 }
    }
  },
  "alerts": {
    "channels": {
      "wecom": { "enabled": true, "webhook": "${WECOM_ALERT_WEBHOOK}" },
      "dingtalk": { "enabled": true, "webhook": "${DINGTALK_ALERT_WEBHOOK}" },
      "sms": { "enabled": true, "provider": "aliyun", "triggerLevel": "critical" }
    }
  }
}
🇨🇳 Notes for Users in China
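- SMS channel: Alibaba Cloud SMS or Tencent Cloud SMS is recommended; the SMS signature and template must pass review before use
- Phone alerts: for severe outages, the Alibaba Cloud voice notification service can place automated alert calls
- CDN monitoring: sites in mainland China usually sit behind a CDN, so probe from several regions (Beijing, Shanghai, Guangzhou) separately
- ICP filing: monitored domains must have completed ICP filing; otherwise health check results may be affected
- Compliance: keep monitoring data and alert records for at least 6 months to meet MLPS (等保, Multi-Level Protection Scheme) requirements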
