Error Logs

Server-side error monitoring, surfaced on the admin dashboard at /admin-dashboard/logs. It captures unexpected errors from instrumented catch blocks across the codebase: Stripe webhooks, billing checkout & license flows, auth (magic link + OAuth callback), AI streaming, jobs runner, RAG document processing, and every admin mutation. The system is feature-gated via the LOGS_ENABLED environment variable (server-only, no NEXT_PUBLIC_ prefix). When it is disabled, the admin page returns 404, the dashboard link disappears, and the logger becomes a no-op.

Feature Gate

Set LOGS_ENABLED=true to activate writes and make the admin page reachable. The flag is read once inside config/app.ts and exposed as appConfig.logs.enabled. Components and routes must read appConfig.logs.enabled and never process.env directly. When the flag is off, the several hundred instrumented catch blocks across the codebase return early from the logger without touching the database, so the cost is negligible.
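A minimal sketch of the gate described above, assuming the flag is enabled only by the exact string 'true'. The helper names readLogsEnabled, buildAppConfig, and gateStatus are hypothetical and exist only for illustration; the real config/app.ts may be organized differently.

```typescript
// Hypothetical sketch: only LOGS_ENABLED and the logs.enabled field come from the docs.
type Env = Record<string, string | undefined>

// Parse the flag once; only the exact string 'true' turns the feature on.
function readLogsEnabled(env: Env): boolean {
  return env.LOGS_ENABLED === 'true'
}

// Stand-in for the config object that config/app.ts would export.
function buildAppConfig(env: Env): { logs: { enabled: boolean } } {
  return { logs: { enabled: readLogsEnabled(env) } }
}

// A gated surface answers 404 (not 403) when the feature is off.
function gateStatus(config: { logs: { enabled: boolean } }): 200 | 404 {
  return config.logs.enabled ? 200 : 404
}
```

In application code the env argument would typically be process.env, read in one place so that every other module depends on the config object rather than on the environment.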

Configuration (`config/app.ts`)

Field	Purpose
`logs.enabled`	Env-driven (`LOGS_ENABLED=true`). When false, `logError()` is a no-op and the admin page returns 404.
`logs.retentionDays`	Env-driven (`LOGS_RETENTION_DAYS`, default 90). Each run of the `purge-error-logs` job handler (normally scheduled daily) deletes rows older than this many days.
`logs.salt`	Env-driven (`LOGS_SALT`). Salt for hashing client IPs before storage. MUST be set in production.
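The three fields above could be derived from the environment roughly as follows. This is a sketch under assumptions: the function name parseLogsConfig is hypothetical, and throwing when the salt is missing in production is one possible way to enforce the "MUST be set" rule, not necessarily what the codebase does.

```typescript
// Hypothetical sketch of deriving the logs config from environment variables.
type Env = Record<string, string | undefined>

interface LogsConfig {
  enabled: boolean
  retentionDays: number
  salt: string
}

function parseLogsConfig(env: Env): LogsConfig {
  const parsed = Number.parseInt(env.LOGS_RETENTION_DAYS ?? '', 10)
  // Fall back to 90 when unset or unparsable; clamp to the documented minimum of 1.
  const retentionDays = Number.isNaN(parsed) ? 90 : Math.max(1, parsed)
  const enabled = env.LOGS_ENABLED === 'true'
  const salt = env.LOGS_SALT ?? ''
  // Assumption: fail fast instead of hashing IPs with an empty salt in production.
  if (enabled && env.NODE_ENV === 'production' && salt === '') {
    throw new Error('LOGS_SALT must be set in production')
  }
  return { enabled, retentionDays, salt }
}
```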

What Gets Logged

Only errors — never successful requests. Each entry captures:

level: error (default) or critical (reserved for things that should page oncall — signature verification failures, webhook handler crashes, jobs runner crashes, unhandled auth errors)
category: coarse grouping — stripe_webhook, auth, billing, ai, rag, jobs, push, newsletter, contact, referrals, admin
event: specific event name (e.g. checkout_session_completed_failed, stream_failed:chat, process_document_failed)
route, method, status_code: request context when available
actor_user_id, account_id: populated when known (both nullable — cron jobs and webhooks may have no user)
message: redacted error message (max 1000 chars)
stack_redacted: redacted, truncated stack trace (max 2000 chars)
ip_hash: SHA-256 of client IP salted with LOGS_SALT — never raw IPs
metadata JSONB: structured extras (Stripe event id, job id, session id, model id, etc.)
created_at: timestamp
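The fields listed above map naturally onto a row type, and the ip_hash column onto a small hashing helper. The sketch below is illustrative: the nullability of route, method, status_code, stack_redacted, and ip_hash is an assumption based on "when available", and the way the salt is combined with the IP (a simple prefix) is also an assumption, since the document only states that the hash is salted SHA-256.

```typescript
import { createHash } from 'node:crypto'

// Assumed row shape; column names follow the list above.
interface ErrorLogEntry {
  level: 'error' | 'critical'
  category: string
  event: string
  route: string | null
  method: string | null
  status_code: number | null
  actor_user_id: string | null // null for cron jobs and webhooks
  account_id: string | null
  message: string              // redacted, max 1000 chars
  stack_redacted: string | null // redacted, max 2000 chars
  ip_hash: string | null
  metadata: Record<string, unknown>
  created_at: string
}

// Hash the client IP with the salt so the raw address is never stored.
// Assumption: salt and IP are joined with ':' before hashing.
function hashIp(ip: string, salt: string): string {
  return createHash('sha256').update(`${salt}:${ip}`).digest('hex')
}

// Cap a string at a maximum length, as done for message and stack_redacted.
function cap(text: string, maxChars: number): string {
  return text.length > maxChars ? text.slice(0, maxChars) : text
}
```

A keyed construction such as HMAC-SHA-256 would be a reasonable alternative to plain salted hashing; which one the logger uses is not stated here.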

Secret Redaction

The logger runs every message and stack trace through a redaction pass before the row is inserted, so secrets never reach the database. Stripe keys (sk_*, whsec_*, rk_*, pk_*), Postgres connection strings, Bearer tokens, JWTs, and patterns matching api_key=... are replaced with *** placeholders. Stack traces are capped at 2000 chars. This complements the repo-wide anti-pattern rule D2 (no raw error messages in audit rows).
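A redaction pass of this kind can be sketched with a list of regular expressions applied in sequence. The patterns below are illustrative approximations of the categories named above, not the logger's actual rules; the names SECRET_PATTERNS, redact, and redactStack are hypothetical.

```typescript
// Illustrative patterns only; the real logger's expressions may differ.
const SECRET_PATTERNS: RegExp[] = [
  /\b(?:sk|rk|pk|whsec)_[A-Za-z0-9_]+/g,                      // Stripe-style keys
  /\bpostgres(?:ql)?:\/\/[^\s'"]+/gi,                         // Postgres connection strings
  /\bBearer\s+[A-Za-z0-9._~+\/=-]+/g,                         // Bearer tokens
  /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g,     // JWT-shaped strings
  /\bapi_key=[^\s&'"]+/gi,                                    // api_key=... pairs
]

// Replace every match of every pattern with a fixed placeholder.
function redact(text: string): string {
  return SECRET_PATTERNS.reduce((out, pattern) => out.replace(pattern, '***'), text)
}

// Redact first, then truncate, so a secret cut in half by the cap
// cannot slip past the patterns.
function redactStack(stack: string, maxChars = 2000): string {
  return redact(stack).slice(0, maxChars)
}
```

Redacting before truncating is a deliberate ordering in this sketch: truncating first could leave a partial key that no longer matches any pattern.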

Writer: `logError()` / `logApiError()`

Two call styles, both fire-and-forget (the caller must not await):

import { logError, logApiError } from '@/lib/logging'

// Full-control form — use inside domain code, jobs, cron handlers, streaming
logError({
  category: 'ai',
  event: 'stream_failed:chat',
  level: 'error',
  actorUserId: user.id,
  accountId,
  route: '/api/ai/stream',
  error,
  metadata: { agent_id: agentId, model_id, tokens_used },
})

// Short form for route handlers — auto-fills route/method/IP from the request
logApiError(request, 'billing', 'checkout_failed', error, {
  actorUserId: ctx.user?.id ?? null,
})

Both are wrapped in an internal try/catch around the Supabase insert, so logging itself can never throw or block the caller. If the DB write fails it's swallowed in production and logged to console in development.
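The fire-and-forget contract can be illustrated with a wrapper that returns void and contains every failure of the underlying insert. This is a sketch, not the code in lib/logging: makeLogError and InsertFn are hypothetical names, and the insert function stands in for the Supabase write.

```typescript
// Stand-in for the database write; rejects or throws when the insert fails.
type InsertFn = (row: Record<string, unknown>) => Promise<void>

function makeLogError(insert: InsertFn, enabled: boolean, isDev: boolean) {
  return function logError(row: Record<string, unknown>): void {
    // Feature gate: a disabled logger does nothing.
    if (!enabled) return
    // Starting from Promise.resolve() routes synchronous throws from insert()
    // into the same catch handler as asynchronous rejections.
    void Promise.resolve()
      .then(() => insert(row))
      .catch((err: unknown) => {
        // Swallowed in production; a short hint in development only.
        if (isDev) {
          console.warn('[logs] insert failed:', err instanceof Error ? err.name : 'unknown')
        }
      })
  }
}
```

Because logError returns void rather than a promise, callers have nothing to await, and a failing insert cannot surface as an unhandled rejection.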

Do not pair these helpers with a bare console.error(error) in production paths. The structured helpers perform secret redaction before persistence; raw console output does not, and can leak connection strings, Bearer tokens, provider keys, or stack traces into platform log aggregators. If a local debugging hint is useful, gate it behind process.env.NODE_ENV !== 'production' and log only a short sanitized message, following the Stripe webhook pattern.
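A development-only hint along these lines might look like the following. The helper name devHint is hypothetical, and logging only the error's class name (never its message or stack) is one conservative choice for "short sanitized message".

```typescript
// Hypothetical helper: emits a short hint outside production and nothing in production.
function devHint(nodeEnv: string | undefined, event: string, error: unknown): string | null {
  if (nodeEnv === 'production') return null
  // Only the error class name is printed; messages and stacks may contain secrets.
  const name = error instanceof Error ? error.name : 'UnknownError'
  const hint = `[dev] ${event}: ${name}`
  console.warn(hint)
  return hint
}
```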

Admin Dashboard

Accessible at /admin-dashboard/logs (requires is_admin = true). Features:

Stat cards: total rows, last 24h, last 7d, critical in last 24h (all via head: true counts, parallelized with Promise.all)
Filters: level (error / critical), category dropdown (distinct values from the table), free-text search on message/event, from/to datetime range
Cursor pagination: stable scroll even as new rows arrive (URL-driven via beforeCreatedAt)
Expandable details: redacted stack trace and metadata JSON inline per row
Force-dynamic: always fresh — an oncall engineer refreshing the page never sees cached data
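The reason cursor pagination stays stable while new rows arrive can be seen in a small in-memory model. The sketch below assumes created_at values are ISO 8601 UTC strings in a uniform format (so string comparison matches chronological order); pageBefore is a hypothetical helper, and the real queries run against the database rather than an array.

```typescript
interface LogRow {
  id: string
  created_at: string // ISO 8601 UTC, uniform format
}

// Return up to `limit` rows strictly older than the cursor, newest first.
function pageBefore<T extends LogRow>(
  rows: T[],
  beforeCreatedAt: string | null,
  limit: number,
): { page: T[]; nextCursor: string | null } {
  const sorted = [...rows].sort((a, b) => b.created_at.localeCompare(a.created_at))
  const older = beforeCreatedAt === null
    ? sorted
    : sorted.filter((row) => row.created_at < beforeCreatedAt)
  const page = older.slice(0, limit)
  const last = page[page.length - 1]
  // A next cursor exists only when older rows remain beyond this page.
  const nextCursor = last !== undefined && older.length > page.length ? last.created_at : null
  return { page, nextCursor }
}
```

Rows inserted after the first page was loaded are newer than the cursor, so they never shift the contents of later pages. One caveat of this simplified model: rows sharing an identical created_at at a page boundary could be skipped, which a tie-breaker on id would address.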

Database

Table error_logs with RLS enabled and zero policies — writes go through the service-role admin client in lib/logging/logger.ts, and reads happen in core/logs/queries.ts via createServiceClient() (admin is already gated by apiSecurity.admin()). Authenticated and anonymous clients see nothing.

Indexes on the error_logs table (defined in supabase/schema.sql):

(created_at DESC) — main timeline
(category, created_at DESC) — filter + sort composite
(level, created_at DESC) — filter + sort composite
(account_id, created_at DESC) partial index
(actor_user_id, created_at DESC) partial index

Retention & Purging

The purge-error-logs job handler deletes rows older than LOGS_RETENTION_DAYS via the purge_error_logs(n) SECURITY DEFINER RPC. Execute permission on the RPC is revoked from authenticated and anon and granted only to service_role, with an auth.role() guard as defense-in-depth. Register the job in the admin dashboard with a cron expression such as 0 3 * * * (daily at 3am) to keep the table size bounded.
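The cutoff arithmetic behind the purge is simple enough to show directly. The actual deletion happens inside the SQL RPC; the sketch below only models which rows count as expired, and the names retentionCutoff and isExpired are hypothetical.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000

// Compute the instant before which rows are eligible for deletion.
function retentionCutoff(now: Date, retentionDays: number): Date {
  // Clamp to the documented minimum of one day.
  const days = Math.max(1, Math.floor(retentionDays))
  return new Date(now.getTime() - days * MS_PER_DAY)
}

// A row is expired when it was created strictly before the cutoff.
function isExpired(createdAt: Date, now: Date, retentionDays: number): boolean {
  return createdAt.getTime() < retentionCutoff(now, retentionDays).getTime()
}
```

For example, a run at 2024-04-10T03:00:00Z with a 90-day retention period yields a cutoff of 2024-01-11T03:00:00Z.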

Environment Variables

Variable	Purpose
`LOGS_ENABLED`	Server-only feature gate. `'true'` to enable. Default `false`.
`LOGS_RETENTION_DAYS`	Retention period in days (minimum 1, recommended 30–90). Default `90`.
`LOGS_SALT`	Server-only 64-hex random salt used to hash client IPs before storage. Generate with `node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"`.
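The salt format in the table (64 hex characters, i.e. 32 random bytes) can be generated and checked as follows. Both helper names are hypothetical; the generation call mirrors the one-liner in the table, and whether the application validates the format at startup is not stated in this document.

```typescript
import { randomBytes } from 'node:crypto'

// 32 random bytes rendered as 64 lowercase hex characters.
function generateSalt(): string {
  return randomBytes(32).toString('hex')
}

// Check that a configured value has the expected 64-hex shape.
function isValidSalt(value: string | undefined): boolean {
  return typeof value === 'string' && /^[0-9a-f]{64}$/i.test(value)
}
```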

Invariants

error_logs RLS is enabled with zero policies — writes are service-role only; authenticated/anon clients see nothing
All writes go through the logError() helper, which redacts secrets before insert
Production catch blocks must not duplicate structured logging with raw console.error(error); use the helper only, or a development-only sanitized console hint
IPs are stored as SHA-256 hashes salted with LOGS_SALT — never raw
The logger never throws or blocks the caller — every failure mode is contained
Only unhandled / unexpected errors are logged; typed domain errors (ReferralError, DocumentError, LogsError) are treated as business-logic outcomes and skipped
The feature gate returns 404 (not 403) so the surface is invisible when disabled — same pattern as the referral system
The admin list API is read-only, so no CSRF protection is needed; it is rate-limited via apiSecurity.admin() (5/min)
Typed errors inside core/logs/ (LogsError, LogsErrorCode) map query failures to stable response codes
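The last two typed-error invariants can be illustrated together: a typed error carries a stable code that maps to a response, and typed domain errors are skipped by the logger. In the sketch below, only the names LogsError and LogsErrorCode come from this document; the code values, status mapping, and helper functions are hypothetical.

```typescript
// Hypothetical code values; the real LogsErrorCode members are not listed here.
type LogsErrorCode = 'INVALID_FILTER' | 'QUERY_FAILED'

class LogsError extends Error {
  readonly code: LogsErrorCode
  constructor(code: LogsErrorCode, message: string) {
    super(message)
    this.name = 'LogsError'
    this.code = code
    // Keeps instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype)
  }
}

// Assumed mapping from error codes to HTTP statuses.
const STATUS_BY_CODE: Record<LogsErrorCode, number> = {
  INVALID_FILTER: 400,
  QUERY_FAILED: 500,
}

// Typed errors map to stable codes; anything else becomes a generic 500.
function toResponse(error: unknown): { status: number; code: string } {
  if (error instanceof LogsError) {
    return { status: STATUS_BY_CODE[error.code], code: error.code }
  }
  return { status: 500, code: 'INTERNAL_ERROR' }
}

// Typed domain errors are business outcomes and are not written to error_logs.
function shouldLog(error: unknown): boolean {
  return !(error instanceof LogsError)
}
```

A catch block would then call the structured logger only when shouldLog(error) is true, and build the response from toResponse(error) in either case.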