# Setrip: Admin System Health Roadmap

Admins need visibility into automated jobs (cron) and a way to detect stuck states (stale Payments, overdue Payouts, stalled Refunds).

> **Concrete scenario:** the auto-complete-trips cron crashes because of a broken environment variable. 50 trips whose dates have already passed stay `OPEN` for 3 days, until participants complain "why can't I leave a review yet?". Admins have no way to check cron status without SSH-ing into the server.

---

## Baseline

- ✅ Cron infrastructure exists (system crontab + `CRON_SECRET`); see [docs/CRON_SETUP.md](docs/CRON_SETUP.md).
- ✅ Active cron jobs: `app/api/cron/auto-complete-trips/route.ts`, payout release (via `payoutService`), and possibly a refund timeout.
- ❌ No per-run cron log/audit (success/fail/error).
- ❌ No `/admin/system` page for viewing status.
- ❌ No alerts for stale states (Payment `AWAITING` > 25h, Payout `HELD` past `heldUntil`, Refund `APPROVED` > 7d).

---

## Phase 1: Cron Run Log ⏳

A `CronRun` table that records every cron execution. This is the foundation for all observability.

**Assumed decisions:**

- Append-only: one row per run, updated once to record its outcome and never deleted. Retention: keep everything (the table is small, ~365 rows/year per daily cron); clean up later if needed.
- Wrap the existing cron handlers with a `runCron(name, fn)` helper that automatically logs start/finish/error.
- No job library (BullMQ/Inngest); that would be overkill. Keep system cron + Next.js route handlers.

| # | Item | Status | File |
|---|---|---|---|
| 1.1 | Model `CronRun { id, jobName, startedAt, finishedAt?, status (RUNNING/SUCCESS/FAILED), errorMessage?, payload? Json }` + migration | ⏳ | [prisma/schema.prisma](prisma/schema.prisma) |
| 1.2 | Helper `runCron(jobName, fn)`: wraps the handler, automatically creates a `RUNNING` row, then marks it `SUCCESS`/`FAILED` | ⏳ | `lib/cron-runner.ts` |
| 1.3 | Wire `runCron` into `app/api/cron/auto-complete-trips/route.ts` | ⏳ | `app/api/cron/auto-complete-trips/route.ts` |
| 1.4 | Wire `runCron` into the payout release cron (if it already exists; if not, list it as a gap) | ⏳ | TBD |
| 1.5 | Wire `runCron` into the other crons (refund sweep, etc.) | ⏳ | TBD |

**Manual actions:** none.

---

## Phase 2: System Status Page ⏳

A `/admin/system` page that shows the current state of things.

**Assumed decisions:**

- Table per cron job: last run, last success, total runs (7d), error count (7d).
- Manual refresh (a "Refresh" button), no auto-polling. That is enough for admin use.
- Health badge: 🟢 OK (last success < 25 hours ago, for daily jobs), 🟡 STALE (> 25 hours), 🔴 FAILED (last run is `FAILED`).
- Show the 20 most recent cron runs in a table below for drill-down.

| # | Item | Status | File |
|---|---|---|---|
| 2.1 | `cronRepo.getJobSummary(jobName)`: last run, last success, run and error counts over 7 days | ⏳ | `server/repositories/cron.repo.ts` |
| 2.2 | `cronRepo.listRecent(limit)`: the 20 most recent runs across all jobs | ⏳ | `server/repositories/cron.repo.ts` |
| 2.3 | Page `/admin/system`: job summary table + recent runs table | ⏳ | `app/admin/system/page.tsx` |
| 2.4 | Health badge logic (helper) | ⏳ | `lib/cron-health.ts` |
| 2.5 | "System" link in the admin navbar | ⏳ | [app/admin/layout.tsx](app/admin/layout.tsx) |

**Manual actions:**

1. Set an SLA expectation per cron (e.g. `auto-complete-trips` must run every day before 06:00 WIB).
2. Brief admins: check `/admin/system` at least once every morning before starting work.

---

## Phase 3: Stale State Alerts ⏳

Detect entities stuck in a non-final state for too long. Show them as a banner on `/admin/system`.
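Every check in this phase has the same shape: a row is stale when a reference timestamp falls before a cutoff derived from `now`, so each count is one filtered query at page load. A minimal sketch of computing those cutoffs, assuming the draft thresholds listed under this phase's assumed decisions, that "today" means the start of the current day in WIB (UTC+7), and that Payments are aged by `createdAt` and Refunds by `updatedAt`. Those field names and the `staleCutoffs`/`startOfTodayWib`/`isStale` helpers are hypothetical.

```typescript
// Hypothetical helpers: one cutoff per stale category.
// A row is stale when its reference timestamp is strictly before the cutoff.

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;
// Assumption: business dates are in WIB (UTC+7, no daylight saving).
const WIB_OFFSET_MS = 7 * HOUR_MS;

export interface StaleCutoffs {
  stalePayments: Date;            // Payment PENDING, compared to assumed `createdAt`
  expiredAwaiting: Date;          // Payment AWAITING, compared to assumed `createdAt`
  awaitingPayPastDeparture: Date; // Booking AWAITING_PAY, compared to the trip date
  overduePayouts: Date;           // Payout HELD, compared to `heldUntil`
  stuckRefunds: Date;             // Refund APPROVED, compared to assumed `updatedAt`
}

/** Start of the current WIB calendar day, expressed as a UTC instant. */
export function startOfTodayWib(now: Date): Date {
  const shifted = now.getTime() + WIB_OFFSET_MS; // wall-clock time in WIB
  const dayStartShifted = shifted - (shifted % DAY_MS); // truncate to WIB midnight
  return new Date(dayStartShifted - WIB_OFFSET_MS); // back to a real UTC instant
}

/** Computes every cutoff from a single `now` so the five counts agree. */
export function staleCutoffs(now: Date): StaleCutoffs {
  const t = now.getTime();
  return {
    stalePayments: new Date(t - 1 * HOUR_MS),
    expiredAwaiting: new Date(t - 25 * HOUR_MS),
    awaitingPayPastDeparture: startOfTodayWib(now),
    overduePayouts: new Date(t - 1 * DAY_MS),
    stuckRefunds: new Date(t - 7 * DAY_MS),
  };
}

export function isStale(reference: Date, cutoff: Date): boolean {
  return reference.getTime() < cutoff.getTime();
}
```

In a Prisma-backed `detectStale()`, each cutoff would feed a `count` with an `lt` filter, for example `prisma.payment.count({ where: { status: "PENDING", createdAt: { lt: cutoffs.stalePayments } } })`, under the same field-name assumptions.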
**Assumed decisions:**

- Stale thresholds (draft; review with stakeholders):
  - Payment status `PENDING` > 1 hour → suspect: Snap token creation failed; needs manual cleanup.
  - Payment status `AWAITING` > 25 hours (past `expiresAt`) → suspect: the webhook failed or the payment was never marked expired; needs reconciliation.
  - Booking status `AWAITING_PAY` with a trip date before today → suspect: the participant forgot to pay; needs cleanup.
  - Payout status `HELD` with `heldUntil` more than 1 day in the past → suspect: the release cron did not run; trigger it manually.
  - Refund status `APPROVED` > 7 days → suspect: an admin forgot to process it, or the Midtrans refund failed.
- Compute with queries at page load; no materialized view needed.
- Each category shows a count plus a link to the relevant filtered list page.

| # | Item | Status | File |
|---|---|---|---|
| 3.1 | `systemHealthService.detectStale()` returns `{ stalePayments, expiredAwaiting, awaitingPayPastDeparture, overduePayouts, stuckRefunds }` | ⏳ | `server/services/system-health.service.ts` |
| 3.2 | Alert banner on `/admin/system` when any count > 0 | ⏳ | `app/admin/system/page.tsx` |
| 3.3 | Link each alert to a filtered list (using the filters from [ADMIN_AUDIT_ROADMAP.md](ADMIN_AUDIT_ROADMAP.md) Phase 1) | ⏳ | `app/admin/system/page.tsx` |
| 3.4 | Stat card on the main `/admin` dashboard when there are alerts | ⏳ | [app/admin/page.tsx](app/admin/page.tsx) |

**Manual actions:**

1. Tune the thresholds after 1-2 weeks in production (false positives vs. misses).
2. Write an SOP per alert: what action the admin must take when the banner appears.

---

## Phase 4: External Alerting (optional) ⏳

Push notifications to an external channel (Discord/Telegram/email) when a cron run is `FAILED` or a stale state is critical. Skip this unless admins regularly miss the banner.
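The Discord side of the notification helper is small. A minimal sketch, assuming Discord's JSON webhook payload (a `content` field, at most 2000 characters) and a `fetch`-compatible function injected as a dependency (in the app, Node 18+'s global `fetch`) so it can be tested offline. The `notifyAdmins` signature, `NotifyDeps` and `FetchLike` shapes are hypothetical.

```typescript
// Hypothetical sketch of lib/admin-notify.ts.
// Discord webhooks accept a JSON POST whose `content` field holds the text.

export type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

export interface NotifyDeps {
  webhookUrl: string | undefined; // e.g. process.env.ADMIN_ALERT_WEBHOOK_URL
  fetchImpl: FetchLike;           // pass the global `fetch` in the app
}

const DISCORD_CONTENT_LIMIT = 2000;

/**
 * Posts a message to the admin channel. Never throws: alerting must not
 * break the cron run or page that triggered it. Returns true on HTTP 2xx.
 */
export async function notifyAdmins(message: string, deps: NotifyDeps): Promise<boolean> {
  if (!deps.webhookUrl) {
    console.warn("[admin-notify] webhook URL not set; skipping alert");
    return false;
  }
  // Truncate so Discord does not reject an oversized message.
  const content =
    message.length > DISCORD_CONTENT_LIMIT
      ? message.slice(0, DISCORD_CONTENT_LIMIT - 1) + "…"
      : message;
  try {
    const res = await deps.fetchImpl(deps.webhookUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ content }),
    });
    if (!res.ok) console.warn(`[admin-notify] webhook responded ${res.status}`);
    return res.ok;
  } catch (err) {
    console.warn("[admin-notify] webhook request failed", err);
    return false;
  }
}
```

Swallowing errors here is deliberate: a Discord outage should show up as a warning in the logs, not as a second failure inside the job being reported.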
| # | Item | Status | File |
|---|---|---|---|
| 4.1 | Helper `notifyAdmins(message)`: POST to the Discord webhook URL from env | ⏳ | `lib/admin-notify.ts` |
| 4.2 | Trigger a notification in `runCron` on `FAILED` | ⏳ | `lib/cron-runner.ts` |
| 4.3 | Trigger a notification from `systemHealthService.detectStale` (rate-limited, at most once per day per category) | ⏳ | `server/services/system-health.service.ts` |

**Manual actions:**

1. Create an internal Discord channel + webhook URL → set env `ADMIN_ALERT_WEBHOOK_URL`.
2. Test the alert by triggering a fake failure.
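Item 4.2 only touches the `runCron` wrapper from Phase 1 (item 1.2). A minimal sketch of that wrapper with the failure hook, assuming a hypothetical `CronRunStore` interface in place of the Prisma `CronRun` model (in the app it would wrap `prisma.cronRun.create`/`update`) and an injected `notify` callback such as a bound `notifyAdmins`. The wrapper rethrows after logging so the route handler can still return an error status to the system cron.

```typescript
// Hypothetical sketch of lib/cron-runner.ts (items 1.2 + 4.2).
export type CronStatus = "RUNNING" | "SUCCESS" | "FAILED";

// Storage abstraction standing in for the Prisma CronRun model.
export interface CronRunStore {
  start(jobName: string, startedAt: Date): Promise<string>; // creates a RUNNING row, returns its id
  finish(
    id: string,
    result: {
      status: Exclude<CronStatus, "RUNNING">;
      finishedAt: Date;
      errorMessage?: string;
      payload?: unknown;
    },
  ): Promise<void>;
}

export interface RunCronDeps {
  store: CronRunStore;
  notify?: (message: string) => Promise<unknown>; // e.g. notifyAdmins bound to env
  now?: () => Date;
}

export async function runCron<T>(jobName: string, fn: () => Promise<T>, deps: RunCronDeps): Promise<T> {
  const now = deps.now ?? (() => new Date());
  const id = await deps.store.start(jobName, now());
  let result: T;
  try {
    result = await fn();
  } catch (err) {
    const errorMessage = err instanceof Error ? err.message : String(err);
    await deps.store.finish(id, { status: "FAILED", finishedAt: now(), errorMessage });
    if (deps.notify) {
      // A broken alert channel must not mask the original cron error.
      await deps.notify(`Cron ${jobName} FAILED: ${errorMessage}`).catch(() => undefined);
    }
    throw err;
  }
  // The handler's return value (e.g. { completed: 50 }) becomes the run payload.
  await deps.store.finish(id, { status: "SUCCESS", finishedAt: now(), payload: result });
  return result;
}
```

One design choice worth keeping: check `CRON_SECRET` in the route handler before calling `runCron`, so unauthenticated requests never create `CronRun` rows.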