Setrip — Admin System Health Roadmap
Admins need visibility into automated (cron) jobs and a way to detect stuck states (stale Payments, overdue Payouts, stalled Refunds).
Real-world scenario: the auto-complete trip cron crashes because of a broken env variable. 50 trips whose dates have already passed stay `OPEN` for 3 days, until participants complain that they still cannot leave a review. Admins have no way to check cron status without SSH-ing into the server.
Baseline
- ✅ Cron infra exists (system crontab + `CRON_SECRET`); see docs/CRON_SETUP.md.
- ✅ Active cron jobs: `app/api/cron/auto-complete-trips/route.ts`, payout release (via `payoutService`), possibly refund timeout.
- ❌ No log/audit per cron run (success/fail/error).
- ❌ No `/admin/system` page to view status.
- ❌ No alerts for stale states (Payment AWAITING > 24h, Payout HELD past `heldUntil`, Refund APPROVED > 7d).
Phase 1 — Cron Run Log ⏳
A `CronRun` table that gets a row every time a cron runs. This is the foundation for all observability.
Assumptions and decisions:
- Append-only model. Retention: keep all (small table, ~365 rows/year/cron). Clean up later if needed.
- Wrap existing cron handlers with a `runCron(name, fn)` helper that automatically logs start/finish/error.
- No job library (BullMQ/Inngest): overkill. Stick with system cron + Next route handlers.
| # | Item | Status | File |
|---|---|---|---|
| 1.1 | Model `CronRun { id, jobName, startedAt, finishedAt?, status (RUNNING/SUCCESS/FAILED), errorMessage?, payload? Json }` + migration | ⏳ | `prisma/schema.prisma` |
| 1.2 | Helper `runCron(jobName, fn)`: wraps the handler, automatically creates a RUNNING row → SUCCESS/FAILED | ⏳ | `lib/cron-runner.ts` |
| 1.3 | Wire `runCron` into `app/api/cron/auto-complete-trips/route.ts` | ⏳ | `app/api/cron/auto-complete-trips/route.ts` |
| 1.4 | Wire `runCron` into the payout release cron (if it already exists; if not, list it as a gap) | ⏳ | TBD |
| 1.5 | Wire `runCron` into other crons (refund sweep, etc.) | ⏳ | TBD |
Manual actions: none.
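The wrapper in item 1.2 could look roughly like the sketch below. It is not the project's implementation: `CronRunStore` and `createMemoryStore` are hypothetical names introduced so the example runs without a database, and passing the store as a first argument is an assumption (the roadmap only specifies `runCron(jobName, fn)`). A real version would presumably write to the `CronRun` table instead.

```typescript
// Sketch of lib/cron-runner.ts under the assumptions stated above.
type CronStatus = "RUNNING" | "SUCCESS" | "FAILED";

interface CronRun {
  id: number;
  jobName: string;
  startedAt: Date;
  finishedAt?: Date;
  status: CronStatus;
  errorMessage?: string;
  payload?: unknown;
}

// Hypothetical abstraction over the CronRun table.
interface CronRunStore {
  create(run: Omit<CronRun, "id">): Promise<CronRun>;
  update(id: number, patch: Partial<CronRun>): Promise<CronRun>;
}

// In-memory store used only for this example.
function createMemoryStore(): CronRunStore & { rows: CronRun[] } {
  const rows: CronRun[] = [];
  return {
    rows,
    async create(run: Omit<CronRun, "id">): Promise<CronRun> {
      const row: CronRun = { id: rows.length + 1, ...run };
      rows.push(row);
      return row;
    },
    async update(id: number, patch: Partial<CronRun>): Promise<CronRun> {
      const row = rows.find((r) => r.id === id);
      if (!row) throw new Error(`CronRun ${id} not found`);
      Object.assign(row, patch);
      return row;
    },
  };
}

// Records a RUNNING row, runs the job, then marks the row SUCCESS or FAILED.
async function runCron<T>(
  store: CronRunStore,
  jobName: string,
  fn: () => Promise<T>,
): Promise<T> {
  const run = await store.create({ jobName, startedAt: new Date(), status: "RUNNING" });
  try {
    const result = await fn();
    await store.update(run.id, { status: "SUCCESS", finishedAt: new Date(), payload: result });
    return result;
  } catch (err) {
    const errorMessage = err instanceof Error ? err.message : String(err);
    await store.update(run.id, { status: "FAILED", finishedAt: new Date(), errorMessage });
    throw err; // rethrow so the route handler can still respond with an error
  }
}
```

A route handler would then call something like `runCron(store, "auto-complete-trips", () => doWork())`, where `doWork` stands in for the existing handler body. Rethrowing keeps the failure visible to system cron as well as in the log.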
Phase 2 — System Status Page ⏳
The `/admin/system` page shows the current system state.
Assumptions and decisions:
- One table row per cron job: last run, last success, total runs (7d), error count (7d).
- Manual refresh (a "Refresh" button), not auto-polling. That is enough for admins.
- Health badge: 🟢 OK (last success < 25 hours ago, for daily jobs), 🟡 STALE (> 25 hours), 🔴 FAILED (last run = FAILED).
- Show the 20 most recent cron runs in a table at the bottom for drill-down.
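The badge rule above can be expressed as a small pure function, which keeps it easy to test. This is a sketch only: `cronHealth` and `JobState` are hypothetical names, and three behaviours are assumptions not stated in the roadmap: FAILED takes precedence over age, a job that has never succeeded is shown as STALE, and an age of exactly 25 hours counts as STALE.

```typescript
type Health = "OK" | "STALE" | "FAILED";

interface JobState {
  lastRunStatus: "RUNNING" | "SUCCESS" | "FAILED" | null; // null = never ran
  lastSuccessAt: Date | null; // null = never succeeded
}

const HOUR_MS = 60 * 60 * 1000;

// maxAgeHours defaults to 25, the draft threshold for daily jobs.
function cronHealth(state: JobState, now: Date, maxAgeHours = 25): Health {
  if (state.lastRunStatus === "FAILED") return "FAILED";
  if (state.lastSuccessAt === null) return "STALE";
  const ageMs = now.getTime() - state.lastSuccessAt.getTime();
  return ageMs < maxAgeHours * HOUR_MS ? "OK" : "STALE";
}
```

Passing `now` in explicitly, rather than calling `new Date()` inside, makes the function deterministic in tests.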
| # | Item | Status | File |
|---|---|---|---|
| 2.1 | `cronRepo.getJobSummary(jobName)`: last run, last success, count 7d | ⏳ | `server/repositories/cron.repo.ts` |
| 2.2 | `cronRepo.listRecent(limit)`: the 20 most recent runs across all jobs | ⏳ | `server/repositories/cron.repo.ts` |
| 2.3 | Page `/admin/system`: job summary table + recent runs table | ⏳ | `app/admin/system/page.tsx` |
| 2.4 | Health badge logic (helper) | ⏳ | `lib/cron-health.ts` |
| 2.5 | "System" link in the admin navbar | ⏳ | `app/admin/layout.tsx` |
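To make the shape of the per-job summary in item 2.1 concrete, here is a sketch that computes it over an in-memory array of runs. The names `RunRow`, `JobSummary`, and `summarizeJob` are hypothetical, and a real `cronRepo.getJobSummary` would more likely push the filtering and counting down into database queries rather than load rows into memory.

```typescript
interface RunRow {
  jobName: string;
  startedAt: Date;
  status: "RUNNING" | "SUCCESS" | "FAILED";
}

interface JobSummary {
  jobName: string;
  lastRunAt: Date | null;
  lastRunStatus: RunRow["status"] | null;
  lastSuccessAt: Date | null;
  totalRuns7d: number;
  errorCount7d: number;
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

function summarizeJob(runs: RunRow[], jobName: string, now: Date): JobSummary {
  const since = now.getTime() - SEVEN_DAYS_MS;
  // filter() returns a copy, so sorting does not mutate the caller's array.
  const jobRuns = runs
    .filter((r) => r.jobName === jobName)
    .sort((a, b) => b.startedAt.getTime() - a.startedAt.getTime()); // newest first
  const last = jobRuns[0] ?? null;
  const lastSuccess = jobRuns.find((r) => r.status === "SUCCESS") ?? null;
  const recent = jobRuns.filter((r) => r.startedAt.getTime() >= since);
  return {
    jobName,
    lastRunAt: last ? last.startedAt : null,
    lastRunStatus: last ? last.status : null,
    lastSuccessAt: lastSuccess ? lastSuccess.startedAt : null,
    totalRuns7d: recent.length,
    errorCount7d: recent.filter((r) => r.status === "FAILED").length,
  };
}
```

The summary deliberately carries both the last run and the last success, since the health badge needs both.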
Manual actions:
- Set the expected SLA per cron (e.g. `auto-complete-trips` must run every day before 06:00 WIB).
- Brief admins: check `/admin/system` at least once a day, in the morning before starting work.
Phase 3 — Stale State Alerts ⏳
Detect entities that have been stuck in a non-final state for too long. Show them as banners on `/admin/system`.
Assumptions and decisions:
- Stale thresholds (draft, to be reviewed with stakeholders):
  - Payment status `PENDING` > 1 hour → suspect: failed to create the Snap token, needs manual cleanup
  - Payment status `AWAITING` > 25 hours (past `expiresAt`) → suspect: webhook failed, expiry not set yet, needs reconciliation
  - Booking status `AWAITING_PAY` + trip date < today → suspect: participant forgot to pay, needs cleanup
  - Payout status `HELD` + `heldUntil < now` by > 1 day → suspect: release cron is not running, needs a manual trigger
  - Refund status `APPROVED` > 7 days → suspect: admin forgot to process it, or the Midtrans refund failed
- Compute via parameterized queries on page load; no materialized view needed.
- Each category shows a count + a link to the relevant filtered list page.
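The draft thresholds above can be sketched as a pure function over already-loaded rows. Several things here are assumptions for illustration only: the record shapes, the `since` field (taken to mean "when the entity entered its current status"), and using the start of the UTC day for "today" (production code would more likely use WIB). A real `detectStale()` would presumably run count queries instead of filtering in memory.

```typescript
interface PaymentRow { status: string; since: Date }
interface BookingRow { status: string; tripDate: Date }
interface PayoutRow { status: string; heldUntil: Date }
interface RefundRow { status: string; since: Date }

interface StaleReport {
  stalePayments: number;
  expiredAwaiting: number;
  awaitingPayPastDeparture: number;
  overduePayouts: number;
  stuckRefunds: number;
}

const HOUR = 60 * 60 * 1000;
const DAY = 24 * HOUR;

function detectStale(
  data: { payments: PaymentRow[]; bookings: BookingRow[]; payouts: PayoutRow[]; refunds: RefundRow[] },
  now: Date,
): StaleReport {
  const t = now.getTime();
  // True when the given timestamp lies more than `ms` milliseconds before `now`.
  const olderThan = (d: Date, ms: number): boolean => t - d.getTime() > ms;
  // Assumption: "today" starts at midnight UTC.
  const todayStart = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  return {
    stalePayments: data.payments.filter((p) => p.status === "PENDING" && olderThan(p.since, HOUR)).length,
    expiredAwaiting: data.payments.filter((p) => p.status === "AWAITING" && olderThan(p.since, 25 * HOUR)).length,
    awaitingPayPastDeparture: data.bookings.filter(
      (b) => b.status === "AWAITING_PAY" && b.tripDate.getTime() < todayStart,
    ).length,
    overduePayouts: data.payouts.filter((p) => p.status === "HELD" && olderThan(p.heldUntil, DAY)).length,
    stuckRefunds: data.refunds.filter((r) => r.status === "APPROVED" && olderThan(r.since, 7 * DAY)).length,
  };
}
```

Keeping the thresholds as named constants in one place should make the planned tuning after 1-2 weeks a small change.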
| # | Item | Status | File |
|---|---|---|---|
| 3.1 | `systemHealthService.detectStale()` returns `{ stalePayments, expiredAwaiting, awaitingPayPastDeparture, overduePayouts, stuckRefunds }` | ⏳ | `server/services/system-health.service.ts` |
| 3.2 | Banner alerts on `/admin/system` when any count > 0 | ⏳ | `app/admin/system/page.tsx` |
| 3.3 | Link each alert to a filtered list (using the filters from ADMIN_AUDIT_ROADMAP.md Phase 1) | ⏳ | `app/admin/system/page.tsx` |
| 3.4 | Stat card on the main `/admin` dashboard when there are alerts | ⏳ | `app/admin/page.tsx` |
Manual actions:
- Tune the thresholds after 1-2 weeks in production (false positives vs misses).
- SOP per alert: what action admins must take when a banner appears.
Phase 4 — External Alerting (opsional) ⏳
Push notifications to an external channel (Discord/Telegram/email) when a cron run is FAILED or a stale state is critical. Skip this phase unless admins frequently miss the banners.
| # | Item | Status | File |
|---|---|---|---|
| 4.1 | Helper `notifyAdmins(message)`: POST to the Discord webhook URL from env | ⏳ | `lib/admin-notify.ts` |
| 4.2 | Trigger notify in `runCron` on FAILED | ⏳ | `lib/cron-runner.ts` |
| 4.3 | Trigger notify from `systemHealthService.detectStale` (rate-limited, max 1x/day per category) | ⏳ | `server/services/system-health.service.ts` |
Manual actions:
- Create an internal Discord channel + webhook URL → set env `ADMIN_ALERT_WEBHOOK_URL`.
- Test the alert by triggering a fake failure.
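Items 4.1 and 4.3 could be sketched as below. This is an illustration under assumptions, not the planned implementation: the extra `webhookUrl` and `post` parameters, the `HttpPost` type, and `createDailyLimiter` are hypothetical and exist so the example runs without network access. Discord webhooks accept a JSON body with a `content` field of at most 2000 characters, which is why the message is truncated.

```typescript
// Minimal subset of a fetch-like function; injected so the sketch is testable.
type HttpPost = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

// Returns true when the webhook accepted the message, false when alerting is
// disabled (no URL configured) or the request was rejected.
async function notifyAdmins(
  message: string,
  webhookUrl: string | undefined,
  post: HttpPost,
): Promise<boolean> {
  if (!webhookUrl) return false;
  const res = await post(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content: message.slice(0, 2000) }),
  });
  return res.ok;
}

// Allows at most one notification per category per window (default: 1 day).
// State lives in memory, so it resets whenever the process restarts.
function createDailyLimiter(windowMs: number = 24 * 60 * 60 * 1000) {
  const lastSent = new Map<string, number>();
  return (category: string, now: Date): boolean => {
    const prev = lastSent.get(category);
    if (prev !== undefined && now.getTime() - prev < windowMs) return false;
    lastSent.set(category, now.getTime());
    return true;
  };
}
```

In the app, the call would presumably be `notifyAdmins(msg, process.env.ADMIN_ALERT_WEBHOOK_URL, fetch)` using the global `fetch` available in Node 18+. If the limit has to survive restarts, the last-sent timestamps would need to be persisted somewhere, which this sketch does not attempt.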