# Setrip — Admin System Health Roadmap

Admins need visibility into automated (cron) jobs and detection of stuck states (stale Payments, overdue Payouts, stalled Refunds).

> **Real scenario:** the auto-complete trip cron crashes because of a broken env variable. 50 trips whose dates have already passed stay `OPEN` for 3 days, until participants complain: "why can't I leave a review yet?" Admins have no way to check cron status without SSH-ing into the server.

---

## Baseline

- ✅ Cron infra exists (system crontab + `CRON_SECRET`); see [docs/CRON_SETUP.md](docs/CRON_SETUP.md).
- ✅ Active cron jobs: `app/api/cron/auto-complete-trips/route.ts`, payout release (via `payoutService`), possibly refund timeout.
- ❌ No log/audit per cron run (success/fail/error).
- ❌ No `/admin/system` page to view status.
- ❌ No alerts for stale-state detection (Payment AWAITING > 24h, Payout HELD past `heldUntil`, Refund APPROVED > 7d).

---

## Phase 1 — Cron Run Log ⏳

A `CronRun` table that gets a row every time a cron job runs. This is the foundation for all observability.

**Assumed decisions:**
- Append-only model. Retention: keep everything (the table stays small, ~365 rows/year/cron). Clean up later if needed.
- Wrap the existing cron handlers with a `runCron(name, fn)` helper that automatically logs start/finish/error.
- No job library (BullMQ/Inngest): overkill. Stick with system cron + Next route handlers.

| # | Item | Status | File |
|---|---|---|---|
| 1.1 | Model `CronRun { id, jobName, startedAt, finishedAt?, status (RUNNING/SUCCESS/FAILED), errorMessage?, payload? Json }` + migration | ⏳ | [prisma/schema.prisma](prisma/schema.prisma) |
| 1.2 | Helper `runCron(jobName, fn)`: wraps the handler and automatically creates a RUNNING row → SUCCESS/FAILED | ⏳ | `lib/cron-runner.ts` |
| 1.3 | Wire `runCron` into `app/api/cron/auto-complete-trips/route.ts` | ⏳ | `app/api/cron/auto-complete-trips/route.ts` |
| 1.4 | Wire `runCron` into the payout release cron (if it already exists; if not, list it as a gap) | ⏳ | TBD |
| 1.5 | Wire `runCron` into the other crons (refund sweep, etc.) | ⏳ | TBD |

**Manual actions:** none.

---

## Phase 2 — System Status Page ⏳

A `/admin/system` page that shows the current state.

**Assumed decisions:**
- One table row per cron job: last run, last success, total runs (7d), error count (7d).
- Manual refresh ("Refresh" button), not auto-polling. That is enough for admins.
- Health badge: 🟢 OK (last success < 25 hours ago for daily jobs), 🟡 STALE (> 25 hours), 🔴 FAILED (last run = FAILED).
- Show the 20 most recent cron runs in a table underneath for drill-down.
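
As an illustration of the badge rule, here is a minimal sketch of what a helper in `lib/cron-health.ts` might contain. The `JobSnapshot` shape and the function name `cronHealth` are hypothetical. Three behaviours are assumptions the roadmap does not spell out: FAILED takes precedence over the age check, a job that has never succeeded is reported as STALE, and an age of exactly 25 hours counts as STALE.

```typescript
type Health = "OK" | "STALE" | "FAILED";

// Hypothetical input shape: the two facts the badge rule needs.
interface JobSnapshot {
  lastRunStatus: "RUNNING" | "SUCCESS" | "FAILED" | null;
  lastSuccessAt: Date | null;
}

const HOUR_MS = 60 * 60 * 1000;

// maxAgeHours defaults to 25, the draft threshold for daily jobs.
function cronHealth(job: JobSnapshot, now: Date, maxAgeHours: number = 25): Health {
  // Assumption: a failed last run wins over any recent success.
  if (job.lastRunStatus === "FAILED") return "FAILED";
  // Assumption: no success on record is treated as stale.
  if (job.lastSuccessAt === null) return "STALE";
  const ageMs = now.getTime() - job.lastSuccessAt.getTime();
  return ageMs < maxAgeHours * HOUR_MS ? "OK" : "STALE";
}
```

Passing `now` in explicitly keeps the helper pure, which makes the thresholds easy to unit test and to tune later.
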

| # | Item | Status | File |
|---|---|---|---|
| 2.1 | `cronRepo.getJobSummary(jobName)`: last run, last success, 7d counts | ⏳ | `server/repositories/cron.repo.ts` |
| 2.2 | `cronRepo.listRecent(limit)`: the latest 20 runs across all jobs | ⏳ | `server/repositories/cron.repo.ts` |
| 2.3 | Page `/admin/system`: job summary table + recent runs table | ⏳ | `app/admin/system/page.tsx` |
| 2.4 | Health badge logic (helper) | ⏳ | `lib/cron-health.ts` |
| 2.5 | "System" link in the admin navbar | ⏳ | [app/admin/layout.tsx](app/admin/layout.tsx) |
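
To make the expected output of items 2.1 and 2.2 concrete, the sketch below computes the same aggregates over a plain array of runs. This is only an illustration: the real repository would presumably run database queries, and the `CronRunRow` and `JobSummary` shapes and the function name `summarizeJob` are assumptions, not the planned API.

```typescript
type RunStatus = "RUNNING" | "SUCCESS" | "FAILED";

// Hypothetical row shape: the subset of CronRun fields the summary needs.
interface CronRunRow {
  jobName: string;
  startedAt: Date;
  status: RunStatus;
}

interface JobSummary {
  jobName: string;
  lastRunAt: Date | null;
  lastRunStatus: RunStatus | null;
  lastSuccessAt: Date | null;
  totalRuns7d: number;
  errorCount7d: number;
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

const newestFirst = (a: CronRunRow, b: CronRunRow) => b.startedAt.getTime() - a.startedAt.getTime();

// In-memory equivalent of getJobSummary(jobName).
function summarizeJob(runs: CronRunRow[], jobName: string, now: Date): JobSummary {
  // filter() copies the array, so sorting does not mutate the caller's data.
  const own = runs.filter((r) => r.jobName === jobName).sort(newestFirst);
  const last = own[0];
  const lastSuccess = own.find((r) => r.status === "SUCCESS");
  const windowStart = now.getTime() - WEEK_MS;
  const recent = own.filter((r) => r.startedAt.getTime() >= windowStart);
  return {
    jobName,
    lastRunAt: last ? last.startedAt : null,
    lastRunStatus: last ? last.status : null,
    lastSuccessAt: lastSuccess ? lastSuccess.startedAt : null,
    totalRuns7d: recent.length,
    errorCount7d: recent.filter((r) => r.status === "FAILED").length,
  };
}

// In-memory equivalent of listRecent(limit): newest runs across all jobs.
function listRecent(runs: CronRunRow[], limit: number = 20): CronRunRow[] {
  return [...runs].sort(newestFirst).slice(0, limit);
}
```
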

**Manual actions:**
1. Set SLA expectations per cron (e.g. `auto-complete-trips` must run every day before 06:00 WIB).
2. Brief admins: check `/admin/system` at least once a day, in the morning before starting work.

---

## Phase 3 — Stale State Alerts ⏳

Detect entities that have been stuck in a non-final state for too long. Show them as banners on `/admin/system`.

**Assumed decisions:**
- Stale thresholds (this is a draft; review with stakeholders):
  - Payment status `PENDING` > 1 hour → suspect: Snap token creation failed, needs manual cleanup
  - Payment status `AWAITING` > 25 hours (past `expiresAt`) → suspect: webhook failed, expiry not yet set, needs reconciliation
  - Booking status `AWAITING_PAY` + trip date < today → suspect: participant forgot to pay, needs cleanup
  - Payout status `HELD` + `heldUntil` more than 1 day in the past → suspect: release cron is not running, needs a manual trigger
  - Refund status `APPROVED` > 7 days → suspect: admin forgot to process it, or the Midtrans refund failed
- Computed by querying on page load; no materialized view needed.
- Each category shows a count + a link to the relevant filtered list page.

| # | Item | Status | File |
|---|---|---|---|
| 3.1 | `systemHealthService.detectStale()` returns `{ stalePayments, expiredAwaiting, awaitingPayPastDeparture, overduePayouts, stuckRefunds }` | ⏳ | `server/services/system-health.service.ts` |
| 3.2 | Alert banners on `/admin/system` when any count > 0 | ⏳ | `app/admin/system/page.tsx` |
| 3.3 | Link each alert to a filtered list (using the filters from [ADMIN_AUDIT_ROADMAP.md](ADMIN_AUDIT_ROADMAP.md) Phase 1) | ⏳ | `app/admin/system/page.tsx` |
| 3.4 | Stat card on the main `/admin` dashboard when there are alerts | ⏳ | [app/admin/page.tsx](app/admin/page.tsx) |
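
The draft thresholds can be expressed as a small pure function, sketched here over in-memory records rather than database queries. Several details are assumptions made for illustration: the field names `createdAt`, `tripDate`, and `approvedAt`, measuring Payment age from `createdAt`, returning counts rather than lists, and treating "today" as the start of the current UTC day (a real version would likely use WIB). Only the status names, `heldUntil`, and the returned keys come from the roadmap.

```typescript
// Hypothetical record shapes; only the status values and heldUntil are from the roadmap.
interface PaymentRow { status: string; createdAt: Date }
interface BookingRow { status: string; tripDate: Date }
interface PayoutRow { status: string; heldUntil: Date }
interface RefundRow { status: string; approvedAt: Date }

interface HealthSnapshot {
  payments: PaymentRow[];
  bookings: BookingRow[];
  payouts: PayoutRow[];
  refunds: RefundRow[];
}

interface StaleCounts {
  stalePayments: number;
  expiredAwaiting: number;
  awaitingPayPastDeparture: number;
  overduePayouts: number;
  stuckRefunds: number;
}

const ONE_HOUR_MS = 60 * 60 * 1000;
const ONE_DAY_MS = 24 * ONE_HOUR_MS;

function detectStale(data: HealthSnapshot, now: Date): StaleCounts {
  const nowMs = now.getTime();
  // True when `date` lies more than `ms` milliseconds before `now`.
  const olderThan = (date: Date, ms: number) => nowMs - date.getTime() > ms;
  // Assumption: "today" starts at 00:00 UTC.
  const startOfToday = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());

  return {
    stalePayments: data.payments.filter(
      (p) => p.status === "PENDING" && olderThan(p.createdAt, ONE_HOUR_MS),
    ).length,
    expiredAwaiting: data.payments.filter(
      (p) => p.status === "AWAITING" && olderThan(p.createdAt, 25 * ONE_HOUR_MS),
    ).length,
    awaitingPayPastDeparture: data.bookings.filter(
      (b) => b.status === "AWAITING_PAY" && b.tripDate.getTime() < startOfToday,
    ).length,
    overduePayouts: data.payouts.filter(
      (p) => p.status === "HELD" && olderThan(p.heldUntil, ONE_DAY_MS),
    ).length,
    stuckRefunds: data.refunds.filter(
      (r) => r.status === "APPROVED" && olderThan(r.approvedAt, 7 * ONE_DAY_MS),
    ).length,
  };
}
```

Keeping the thresholds in one function means the tuning step in the manual actions only touches a handful of constants.
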

**Manual actions:**
1. Tune the thresholds after running for 1-2 weeks (false positives vs. misses).
2. SOP per alert: what action admins should take when a banner appears.

---

## Phase 4 — External Alerting (optional) ⏳

Push notifications to an external channel (Discord/Telegram/email) when a cron run is FAILED or a stale state is critical. Skip this phase unless admins often miss the banners.

| # | Item | Status | File |
|---|---|---|---|
| 4.1 | Helper `notifyAdmins(message)`: POST to the Discord webhook URL from env | ⏳ | `lib/admin-notify.ts` |
| 4.2 | Trigger the notification in `runCron` on FAILED | ⏳ | `lib/cron-runner.ts` |
| 4.3 | Trigger the notification from `systemHealthService.detectStale` (rate-limited, max 1x/day per category) | ⏳ | `server/services/system-health.service.ts` |
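
A possible shape for items 4.1 and 4.3 is sketched below. It assumes a Discord webhook that accepts a JSON body with a `content` field, and a runtime with a global `fetch` (Node 18+). The `options` parameter, the injectable `send` function, and the `DailyRateLimiter` class are hypothetical additions for testability; the roadmap itself only specifies `notifyAdmins(message)`. The limiter here lives in memory, so it resets on restart and is not shared between instances; a persistent store would be needed for a strict "once per day" guarantee.

```typescript
type Sender = (url: string, message: string) => Promise<void>;

// Default sender: POSTs the message to a Discord webhook URL.
const postToDiscord: Sender = async (url, message) => {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content: message }),
  });
  if (!res.ok) throw new Error(`Webhook responded with HTTP ${res.status}`);
};

// Allows at most one notification per key within the window (24h by default).
class DailyRateLimiter {
  private readonly lastSent = new Map<string, number>();
  private readonly windowMs: number;

  constructor(windowMs: number = 24 * 60 * 60 * 1000) {
    this.windowMs = windowMs;
  }

  tryAcquire(key: string, now: Date): boolean {
    const previous = this.lastSent.get(key);
    if (previous !== undefined && now.getTime() - previous < this.windowMs) return false;
    this.lastSent.set(key, now.getTime());
    return true;
  }
}

// Hypothetical options; in the app, webhookUrl would come from ADMIN_ALERT_WEBHOOK_URL.
interface NotifyOptions {
  webhookUrl: string | undefined;
  category: string;
  limiter: DailyRateLimiter;
  send?: Sender;
  now?: Date;
}

// Returns true when a message was sent, false when skipped (no URL or rate-limited).
async function notifyAdmins(message: string, options: NotifyOptions): Promise<boolean> {
  const { webhookUrl, category, limiter, send = postToDiscord, now = new Date() } = options;
  if (!webhookUrl) return false;
  if (!limiter.tryAcquire(category, now)) return false;
  await send(webhookUrl, `[${category}] ${message}`);
  return true;
}
```

Returning `false` when the env variable is unset keeps the helper safe to call in environments where alerting has not been configured.
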

**Manual actions:**
1. Create an internal Discord channel + webhook URL → set env `ADMIN_ALERT_WEBHOOK_URL`.
2. Test the alert by triggering a fake failure.