1

Deploy the webhook bridge

Create a small service that listens for PagerDuty incident.resolved events and starts a Devin session to draft the postmortem. Deploy it as a serverless function (AWS Lambda, Cloudflare Workers) or a lightweight container:
const express = require('express');
const crypto = require('crypto');
const app = express();
app.use(express.json());

// Constant-time comparison of the shared-secret header against WEBHOOK_SECRET.
function verifySignature(req) {
  const secret = Buffer.from(req.headers['x-webhook-secret'] || '');
  const expected = Buffer.from(process.env.WEBHOOK_SECRET || '');
  if (!expected.length) throw new Error('WEBHOOK_SECRET environment variable is not set');
  // timingSafeEqual requires equal-length buffers, so check length first.
  if (secret.length !== expected.length) return false;
  return crypto.timingSafeEqual(secret, expected);
}

app.post('/pagerduty-resolved', async (req, res) => {
  // verifySignature throws if WEBHOOK_SECRET is unset; fail closed with a 500
  // instead of leaving an unhandled rejection in the async handler.
  let verified;
  try {
    verified = verifySignature(req);
  } catch (err) {
    console.error(err);
    return res.sendStatus(500);
  }
  if (!verified) return res.status(401).send('Bad signature');

  const event = req.body?.event;
  if (!event || event.event_type !== 'incident.resolved') {
    return res.sendStatus(200);
  }

  const incident = event.data;
  const title = incident.title || 'Unknown incident';
  const service = incident.service?.summary || 'unknown-service';
  const urgency = incident.urgency || 'high';
  const incidentUrl = incident.html_url || '';
  const createdAt = incident.created_at || '';
  const resolvedAt = incident.resolved_at || new Date().toISOString();

  try {
    const orgId = process.env.DEVIN_ORG_ID;
    const response = await fetch(
      `https://api.devin.ai/v3/organizations/${orgId}/sessions`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.DEVIN_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        prompt: [
          `A PagerDuty incident has been resolved. Draft a postmortem.`,
          ``,
          `Incident: "${title}"`,
          `Service: ${service}`,
          `Urgency: ${urgency}`,
          `Created: ${createdAt}`,
          `Resolved: ${resolvedAt}`,
          `Incident URL: ${incidentUrl}`,
          ``,
          `Write a structured postmortem:`,
          `1. Use the Datadog MCP to pull logs and metrics for ${service} during the incident window`,
          `2. Identify the root cause — check for deploys, config changes, or upstream failures`,
          `3. Build a detailed timeline from first alert to resolution`,
          `4. List action items to prevent recurrence`,
          `5. Post the postmortem as a PR to our docs/postmortems/ directory`,
        ].join('\n'),
        tags: ['pagerduty-postmortem', `service:${service}`],
      }),
    });

    // Surface API failures instead of silently acknowledging the webhook.
    if (!response.ok) {
      console.error(`Devin API returned ${response.status}`);
      return res.sendStatus(502);
    }

    const { session_id } = await response.json();
    console.log(`Started postmortem session ${session_id} for: ${title}`);
    res.sendStatus(200);
  } catch (err) {
    console.error('Failed to start postmortem session:', err);
    res.sendStatus(500);
  }
});

app.listen(3000);
Create a Service User under Settings > Service Users at app.devin.ai and grant it the ManageOrgSessions permission. Copy the API token shown after creation and save it as DEVIN_API_KEY in your bridge service. Set DEVIN_ORG_ID to your organization ID, which you can retrieve by calling GET https://api.devin.ai/v3/enterprise/organizations with your token. Set WEBHOOK_SECRET to a shared secret that you will also configure in PagerDuty.
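Since all three variables are required, the bridge can fail fast at startup instead of erroring mid-request. A minimal sketch (the `requireEnv` helper and the example values are hypothetical, not part of the Devin API):

```javascript
// Hypothetical startup check: fail fast if a required variable is missing.
function requireEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return names.map((name) => env[name]);
}

// Example values only; in production these come from process.env.
const [apiKey, orgId, webhookSecret] = requireEnv(
  ['DEVIN_API_KEY', 'DEVIN_ORG_ID', 'WEBHOOK_SECRET'],
  { DEVIN_API_KEY: 'sk-example', DEVIN_ORG_ID: 'org_123', WEBHOOK_SECRET: 's3cret' }
);
```

Calling this once before `app.listen()` turns a misconfigured deploy into an immediate crash rather than a stream of 500s.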
2

Configure PagerDuty

  1. In PagerDuty, go to Services > [your service] > Integrations
  2. Click Add Integration and select Generic Webhooks (v3)
  3. Set the Webhook URL to your bridge endpoint (e.g. https://your-bridge.example.com/pagerduty-resolved)
  4. Under Custom Headers, add X-Webhook-Secret with the same value you saved as WEBHOOK_SECRET
  5. Under Event Subscription, filter to the incident.resolved event type so a postmortem is only triggered after the incident closes
If you want Devin to start gathering data while the incident is still in progress, you can also subscribe to incident.acknowledged and have it finish the postmortem once the incident resolves.
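If you subscribe to both event types, the bridge needs to map each one to a different instruction. A small sketch of that routing (the helper name and prompt wording are illustrative, not part of the PagerDuty or Devin APIs):

```javascript
// Hypothetical router: choose a Devin instruction per PagerDuty event type.
function promptForEvent(eventType) {
  switch (eventType) {
    case 'incident.acknowledged':
      // Incident still open: gather evidence, don't draw conclusions yet.
      return 'Collect logs and metrics for the incident window; do not draft conclusions yet.';
    case 'incident.resolved':
      // Incident closed: write the full postmortem.
      return 'Draft the full postmortem using the data gathered so far.';
    default:
      return null; // ignore all other event types
  }
}
```

The handler can then return early with a 200 whenever `promptForEvent` yields `null`, mirroring the existing `event_type` check.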
3

Connect observability MCPs (optional)

Devin writes higher-quality postmortems when it has access to your telemetry. Enable one or more MCPs so Devin can pull real data from the incident window:

  - Datadog MCP: go to Settings > MCP Marketplace, find Datadog, click Enable, and enter your API/Application keys. Devin will query logs, metrics, deploy events, and monitor history.
  - Sentry MCP: find Sentry in the MCP Marketplace, click Enable, and complete the OAuth flow. Devin will pull error details, stack traces, and release tags.

Once connected, Devin automatically correlates telemetry with the incident timeline, producing an evidence-backed postmortem. Learn more about connecting MCP servers.
4

What Devin produces

When the PagerDuty incident resolves, Devin analyzes the incident window and drafts a structured postmortem. An example postmortem generated by Devin:
# Postmortem: Database Connection Pool Exhaustion — orders-service
**Date:** 2026-02-10 | **Duration:** 46 minutes | **Severity:** P1

## Summary
orders-service experienced connection pool exhaustion between
14:32 and 15:18 UTC, causing 502 errors for ~12% of order
placement requests.

## Timeline
- 14:15 UTC — Deploy #387 released (commit e4f29a1)
- 14:28 UTC — Connection pool usage climbed from 60% to 92%
- 14:32 UTC — Pool exhausted; PagerDuty incident triggered
- 14:38 UTC — On-call engineer acknowledged
- 14:45 UTC — Identified Deploy #387 added a new inventory
              check that opens a DB connection per line item
              without releasing it in the finally block
- 15:02 UTC — Rollback to Deploy #386 initiated
- 15:18 UTC — Connection pool recovered; incident resolved

## Root Cause
Deploy #387 introduced `checkInventoryAvailability()` in
`src/services/orders.ts:142`. The function opens a new DB
connection for each line item in an order but only releases
it on the success path. When inventory checks fail (timeout
or stock unavailable), connections leak.

Orders with 5+ line items reliably exhausted the pool within
15 minutes of the deploy.

## Action Items
- [ ] Fix connection leak: add `finally` block to release
      connection (PR #388 opened)
- [ ] Add connection pool usage monitor with alert at 80%
- [ ] Add integration test for multi-item orders with
      simulated inventory failures
- [ ] Review other DB access patterns for similar leak risks
5

Customize the postmortem

Tailor this pipeline to your team's postmortem process:

  - Define your postmortem template with a Playbook: sections, severity grading, required fields, and where the output should be stored. Pass playbook_id in the API request to keep every postmortem consistently formatted.
  - Triage by severity. Add logic to your bridge service to generate postmortems only for P1/P2 incidents; lower-severity incidents may not need a full written summary.
  - Add Knowledge covering your architecture, service owners, and past incidents so Devin can connect related threads, for example: "orders-service depends on inventory-service, which is known to time out under heavy load."
  - Publish to your wiki. Instead of committing to the repo, have Devin publish the postmortem to Confluence, Notion, or your internal wiki via the session prompt.
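The severity triage can be a small guard in the webhook handler, evaluated before the Devin session is created. A sketch, assuming the PagerDuty v3 payload's priority reference object and a P1/P2 threshold (both the helper name and the threshold are assumptions to adapt):

```javascript
// Hypothetical triage: only P1/P2 incidents get a full postmortem session.
const POSTMORTEM_PRIORITIES = new Set(['P1', 'P2']);

function shouldDraftPostmortem(incident) {
  // PagerDuty v3 webhook payloads expose priority as a reference object
  // whose summary is the priority label (e.g. "P1").
  const priority = incident?.priority?.summary ?? '';
  return POSTMORTEM_PRIORITIES.has(priority);
}
```

In the handler, returning `res.sendStatus(200)` when this check fails acknowledges the webhook without starting a session.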