RAG Document Chat

Upload documents (TXT, MD, PDF) and chat with them using the Knowledge Base AI agent. Documents are chunked and embedded with a configurable OpenAI embedding model. Vector similarity search runs inside Postgres through the match_document_chunks_text SECURITY DEFINER RPC, which enforces account membership and searches a pgvector HNSW index. An earlier raw pg.Pool path was removed because it bypassed the RPC's membership check. The match cutoff is configurable via embeddingConfig.ragMatchThreshold (default 0.1 in config/ai.ts). Credits are deducted 1:1 with the embedding tokens consumed. PDF text extraction uses unpdf.

📄

Document Upload

Upload TXT, MD, PDF files (10MB max). Drag-and-drop or file picker. Dedicated page at /private-dashboard/documents in the sidebar under AI Agents.

✂️

Text Chunking

Automatic splitting into 1000-char chunks with 200-char overlap. Max 500 chunks per document to prevent DoS.
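A minimal sketch of fixed-size chunking with overlap, using the numbers quoted above (1000 characters, 200 overlap, 500-chunk cap). The function name chunkTextSketch and the choice to throw when the cap is exceeded are assumptions for illustration; the project's real chunkText in lib/rag/index.ts may truncate or behave differently.

```typescript
// Hypothetical sketch, not the project's actual chunkText implementation.
const CHUNK_SIZE = 1000;   // characters per chunk
const CHUNK_OVERLAP = 200; // characters shared between neighbouring chunks
const MAX_CHUNKS = 500;    // hard cap per document

function chunkTextSketch(
  text: string,
  size: number = CHUNK_SIZE,
  overlap: number = CHUNK_OVERLAP,
  maxChunks: number = MAX_CHUNKS,
): string[] {
  if (overlap >= size) {
    throw new Error("overlap must be smaller than the chunk size");
  }
  const step = size - overlap; // with the defaults, each chunk starts 800 chars after the previous one
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    if (chunks.length >= maxChunks) {
      // Assumption: oversized documents are rejected outright.
      throw new Error("document exceeds " + maxChunks + " chunks");
    }
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) {
      break; // the chunk just pushed already reaches the end of the text
    }
  }
  return chunks;
}
```

With the defaults, a 2,500-character document yields three chunks starting at offsets 0, 800, and 1600, and the last 200 characters of each chunk repeat as the first 200 of the next.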

🔍

Similarity Search

Vector similarity runs inside Postgres via the membership-enforced match_document_chunks_text SECURITY DEFINER RPC over a pgvector HNSW index (<=> cosine distance operator). Threshold is configurable via embeddingConfig.ragMatchThreshold (default 0.1 in config/ai.ts). There is no raw connection pool: an earlier direct pg.Pool path was removed because it bypassed the RPC's membership check.
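To make the threshold concrete, here is an in-memory sketch of what the search amounts to: pgvector's <=> operator returns cosine distance, so similarity is 1 minus that distance. The real work happens inside the Postgres RPC, not in application code. The names StoredChunk, Match, and selectMatches are hypothetical, the strict "greater than" comparison is an assumption, and the limit of 5 mirrors the top-5 cutoff the Knowledge Base agent uses.

```typescript
// Illustrative only: the actual search runs in Postgres via match_document_chunks_text.
interface StoredChunk {
  id: string;
  embedding: number[];
}

interface Match {
  id: string;
  similarity: number;
}

// Cosine similarity; pgvector's `<=>` returns 1 minus this value.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("dimension mismatch");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // treat zero vectors as unrelated
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep chunks above the threshold, best first, capped at `limit`.
function selectMatches(
  query: number[],
  chunks: StoredChunk[],
  threshold: number = 0.1, // assumed to correspond to embeddingConfig.ragMatchThreshold
  limit: number = 5,
): Match[] {
  return chunks
    .map((chunk) => ({ id: chunk.id, similarity: cosineSimilarity(query, chunk.embedding) }))
    .filter((match) => match.similarity > threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

A low default such as 0.1 admits loosely related chunks and leaves it to the top-N cap to bound the context size; raising the threshold trades recall for precision.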

🤖

Knowledge Base Agent

Dedicated RAG agent that retrieves relevant document chunks and injects them into the user message (not system prompt). Uses a pre-fetch pattern via prepareRAGContext() called from the stream route. Cites sources as [Document N].

RAG Agent Integration

The Knowledge Base agent (lib/ai/agents/rag/index.ts) is fully integrated into the AI chat system. It uses a pre-fetch pattern — the stream route calls prepareRAGContext() before streaming, which caches the document context. Then buildMessages() injects it into the user message.

  1. Stream route calls agent.prepareRAGContext(accountId, message) before streaming
  2. Fetches all chunks for the account, generates query embedding via OpenAI
  3. Runs vector similarity search via the match_document_chunks_text Postgres RPC (pgvector HNSW index, membership enforced inside the function — no raw connection pool)
  4. Filters by threshold (configurable via embeddingConfig.ragMatchThreshold, default 0.1) and keeps top 5 matches
  5. buildMessages() injects cached context into the user message (not system prompt) wrapped in XML tags (<document_chunk>) for prompt injection protection
  6. The LLM responds using the document context, citing sources like [Document 1]
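Step 5 can be sketched as a pure function that wraps each retrieved chunk in <document_chunk> tags and appends the user's question. Only the tag name and the [Document N] citation style come from the description above. The function name, the index attribute, the instruction wording, and the escaping of angle brackets are assumptions; escaping is one way to stop a chunk from closing its own tag early.

```typescript
// Hypothetical sketch of context injection; not the project's actual buildMessages().
interface RetrievedChunk {
  content: string;
}

// Escape characters that could let chunk text break out of its wrapper tag.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function buildUserMessageSketch(question: string, chunks: RetrievedChunk[]): string {
  if (chunks.length === 0) {
    return question; // nothing retrieved: send the question unchanged
  }
  const context = chunks
    .map(
      (chunk, i) =>
        '<document_chunk index="' + (i + 1) + '">\n' +
        escapeXml(chunk.content) +
        "\n</document_chunk>",
    )
    .join("\n");
  return [
    "Answer using only the document chunks below. Cite sources as [Document N].",
    context,
    "Question: " + question,
  ].join("\n\n");
}
```

Placing retrieved text in the user message rather than the system prompt keeps untrusted document content at user-level authority, and the wrapper tags make it easier for the model to tell quoted material from instructions.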

The agent is registered in lib/ai/agents/index.ts and appears in the chat agent selector dropdown with a rose/pink color scheme.

PDF support: PDF text extraction uses unpdf (lazy-loaded to avoid build-time errors). TXT and MD files are read as plain text.

Configurable Embedding Models

The embedding model used for document processing is configurable in config/ai.ts via the embeddingConfig object:

Model                    Dimensions   Cost / 1K tokens   Notes
text-embedding-3-small   1536         $0.002             Default. Best cost/quality ratio.
text-embedding-3-large   3072         $0.013             Higher quality, higher cost.
text-embedding-ada-002   1536         $0.01              Legacy model.

To change the default model, update embeddingConfig.defaultModel in config/ai.ts. The model used is stored in documents.metadata.embedding_model and document_chunks.metadata.embedding_model for traceability.

Important: If you switch embedding models, existing document chunks keep the vectors produced by the old model, and vectors from different models are not comparable. Re-process affected documents to keep similarity search results consistent.
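Because the model name is stored in metadata.embedding_model, documents that need re-processing can be identified by comparing that field with the current default. A minimal sketch, assuming rows have already been loaded into memory; the DocumentRow shape and the function name are hypothetical, and treating a missing model name as stale is an assumption.

```typescript
// Hypothetical row shape; only metadata.embedding_model is taken from the docs above.
interface DocumentRow {
  id: string;
  metadata: { embedding_model?: string };
}

// Return ids of documents whose stored vectors came from a different model.
function findStaleDocuments(docs: DocumentRow[], currentModel: string): string[] {
  return docs
    .filter((doc) => doc.metadata.embedding_model !== currentModel) // missing counts as stale
    .map((doc) => doc.id);
}
```

The returned ids could then be fed back through the normal processing path, which would also incur the usual per-token credit charge.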

Credit Deduction on Upload

1 credit = 1 token. Credits are deducted from the account based on the actual usage.total_tokens reported by the OpenAI embeddings API. The per-chunk totals are summed across the whole document and deducted in a single decrement_credits RPC call after the embedding loop completes.

Total credits = sum of usage.total_tokens for each chunk embedding call

For example, a 10-page document producing 50 chunks of ~250 tokens each costs roughly 12,500 credits, billed exactly to the token. A pre-flight check ensures the account has at least aiConfig.minCreditsRequired (default: 100) before calling OpenAI. The post-loop decrement is best-effort: if a concurrent request drops the balance below the tokens owed, the failure is logged via logError but the document remains indexed.
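The arithmetic above can be sketched as two small helpers: one sums usage.total_tokens across the per-chunk embedding responses, the other applies the pre-flight minimum. The function names and the EmbeddingUsage shape are illustrative assumptions. Only the total_tokens field and the default of 100 come from the text.

```typescript
// Subset of the usage object returned with each embeddings response.
interface EmbeddingUsage {
  total_tokens: number;
}

// Total credits owed for a document: 1 credit per embedding token.
function totalCredits(usages: EmbeddingUsage[]): number {
  return usages.reduce((sum, usage) => sum + usage.total_tokens, 0);
}

// Pre-flight check: the balance must be at least the configured minimum.
function passesPreflight(balance: number, minCreditsRequired: number = 100): boolean {
  return balance >= minCreditsRequired;
}
```

Note that the pre-flight check only guards against near-empty accounts. It does not guarantee the balance covers the full document, which is why the final decrement is described as best-effort.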

Config Key                     Default                  Description
embeddingConfig.defaultModel   text-embedding-3-small   Default embedding model for new documents
aiConfig.minCreditsRequired    100                      Minimum balance for the pre-flight check before any LLM call (chat or RAG)

Knowledge Base Page

The document manager is accessible at /private-dashboard/documents and appears in the sidebar as "Knowledge Base" (right below "AI Agents"). The page provides:

  • Drag-and-drop file upload with progress tracking
  • Real-time status polling (pending → processing → ready)
  • Document list with status badges, chunk counts, and file sizes
  • Delete with confirmation and full cleanup (chunks + storage)

Database tables: documents (metadata, status, embedding model) and document_chunks (content, vector embeddings, model metadata). Both tables use membership-based RLS.

API Routes

API Route             Method   Description
/api/documents        GET      List documents for account
/api/documents        POST     Upload document (multipart, 10MB, strict rate limit). Triggers async processing with credit deduction.
/api/documents        DELETE   Delete document + chunks + storage file
/api/documents/[id]   GET      Document details + chunk summaries
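Before calling POST /api/documents, a client can reject obviously invalid files locally. The sketch below checks the extension and size limits stated in this document. The names are hypothetical, and it assumes "10MB" means 10 × 1024 × 1024 bytes. The server remains the authority and presumably enforces the same limits.

```typescript
// Hypothetical client-side pre-check; the API route performs its own validation.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024; // assumption: 10MB interpreted as 10 MiB
const ALLOWED_EXTENSIONS = ["txt", "md", "pdf"];

interface UploadCandidate {
  name: string;
  size: number; // bytes
}

// Returns an error message, or null when the file looks acceptable.
function validateUpload(file: UploadCandidate): string | null {
  const dot = file.name.lastIndexOf(".");
  const extension = dot === -1 ? "" : file.name.slice(dot + 1).toLowerCase();
  if (ALLOWED_EXTENSIONS.indexOf(extension) === -1) {
    return "Unsupported file type. Allowed: TXT, MD, PDF.";
  }
  if (file.size > MAX_UPLOAD_BYTES) {
    return "File exceeds the 10MB limit.";
  }
  return null;
}
```

A browser File object satisfies UploadCandidate, so the same function can run directly in a drag-and-drop handler before the multipart request is built.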

File Structure

lib/rag/
└── index.ts                 # chunkText, generateEmbedding, processDocument,
                             # searchDocuments (match_document_chunks_text RPC),
                             # buildRAGContext, parsePdf (unpdf)

lib/ai/agents/rag/
└── index.ts                 # RAGAgent class (extends BaseAgent)
                             # prepareRAGContext() + buildMessages() pattern

config/ai.ts
├── embeddingModels          # Available embedding models with dimensions + cost
└── embeddingConfig          # Default embedding model (credits are 1:1 with tokens, no flat rate)

app/api/documents/
├── route.ts                 # GET (list), POST (upload), DELETE (remove)
└── [id]/route.ts            # GET (details + chunk summaries)

app/[locale]/(private)/private-dashboard/documents/
└── page.tsx                 # Knowledge Base page (server component)

components/documents/
└── document-manager.tsx     # Upload UI with drag-and-drop + status polling