RAG Document Chat
Upload documents (TXT, MD, PDF) and chat with them using the Knowledge Base AI agent. Documents are chunked, embedded with a configurable OpenAI embedding model, and vector similarity search runs inside Postgres via the membership-enforced match_document_chunks_text SECURITY DEFINER RPC over a pgvector HNSW index (an earlier raw pg.Pool path was removed because it bypassed the RPC's membership check). The match cutoff is configurable via embeddingConfig.ragMatchThreshold (default 0.1 in config/ai.ts). Credits are deducted 1:1 with the embedding tokens consumed. PDF text extraction uses unpdf.
Document Upload
Upload TXT, MD, PDF files (10MB max). Drag-and-drop or file picker. Dedicated page at /private-dashboard/documents in the sidebar under AI Agents.
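The upload limits above can be expressed as a small validation helper. This is a minimal sketch, not the project's actual code: the names validateUpload, UploadCandidate, and MAX_UPLOAD_BYTES are hypothetical, and it assumes that "10MB" means 10 × 1024 × 1024 bytes and that the file type is checked by extension.

```typescript
// Hypothetical helper illustrating the documented limits (TXT, MD, PDF; 10MB max).
// Assumption: 10MB is interpreted as 10 * 1024 * 1024 bytes.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;
const ALLOWED_EXTENSIONS: readonly string[] = ["txt", "md", "pdf"];

interface UploadCandidate {
  name: string;
  size: number; // size in bytes
}

type ValidationResult = { ok: true } | { ok: false; reason: string };

function validateUpload(file: UploadCandidate): ValidationResult {
  // Take the text after the last dot as the extension, case-insensitively.
  const dot = file.name.lastIndexOf(".");
  const ext = dot >= 0 ? file.name.slice(dot + 1).toLowerCase() : "";
  if (ALLOWED_EXTENSIONS.indexOf(ext) === -1) {
    return { ok: false, reason: "Unsupported file type" };
  }
  if (file.size > MAX_UPLOAD_BYTES) {
    return { ok: false, reason: "File exceeds the 10MB limit" };
  }
  return { ok: true };
}
```

A check like this is cheap to run on both the client (before upload) and the server (before processing), so oversized or unsupported files are rejected early.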
Text Chunking
Automatic splitting into 1000-char chunks with 200-char overlap. Max 500 chunks per document to prevent DoS.
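As an illustration of the chunking rule above, here is a minimal sketch of a fixed-size splitter with overlap. It is not the chunkText implementation from lib/rag/index.ts: the function name chunkTextSketch is hypothetical, and throwing when the 500-chunk cap is exceeded is an assumption (the real code may truncate or reject the upload differently).

```typescript
// Values taken from the text: 1000-char chunks, 200-char overlap, 500 chunks max.
const CHUNK_SIZE = 1000;
const CHUNK_OVERLAP = 200;
const MAX_CHUNKS = 500;

function chunkTextSketch(text: string): string[] {
  const chunks: string[] = [];
  // Each chunk starts 800 characters after the previous one,
  // so consecutive chunks share 200 characters.
  const step = CHUNK_SIZE - CHUNK_OVERLAP;
  for (let start = 0; start < text.length; start += step) {
    if (chunks.length >= MAX_CHUNKS) {
      // Assumption: oversized documents are rejected rather than truncated.
      throw new Error("Document exceeds the maximum number of chunks");
    }
    chunks.push(text.slice(start, start + CHUNK_SIZE));
    // Stop once the current chunk already reaches the end of the text.
    if (start + CHUNK_SIZE >= text.length) break;
  }
  return chunks;
}
```

With these values, a 2000-character document yields three chunks covering characters 0-999, 800-1799, and 1600-1999.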
Similarity Search
Vector similarity runs inside Postgres via the membership-enforced match_document_chunks_text SECURITY DEFINER RPC over a pgvector HNSW index (<=> operator). Threshold is configurable via embeddingConfig.ragMatchThreshold (default 0.1 in config/ai.ts). No raw connection pool — an earlier direct-pg.Pool path was removed because it bypassed the RPC's membership check.
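The pgvector <=> operator returns cosine distance, which is 1 minus cosine similarity. The sketch below reproduces that arithmetic in TypeScript to show how a threshold could be applied. It assumes the RPC compares similarity (1 - distance) against ragMatchThreshold, which the text does not spell out; the function names are hypothetical, and in the real system this comparison runs in SQL inside Postgres.

```typescript
// Cosine distance as computed by pgvector's <=> operator: 1 - cosine similarity.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine distance is undefined for zero vectors");
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumption: a chunk matches when its similarity exceeds the threshold.
function passesThreshold(distance: number, matchThreshold: number = 0.1): boolean {
  return 1 - distance > matchThreshold;
}
```

A low default such as 0.1 keeps recall high and lets the top-N limit, rather than the threshold, do most of the pruning.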
Knowledge Base Agent
Dedicated RAG agent that retrieves relevant document chunks and injects them into the user message (not system prompt). Uses a pre-fetch pattern via prepareRAGContext() called from the stream route. Cites sources as [Document N].
RAG Agent Integration
The Knowledge Base agent (lib/ai/agents/rag/index.ts) is fully integrated into the AI chat system. It uses a pre-fetch pattern — the stream route calls prepareRAGContext() before streaming, which caches the document context. Then buildMessages() injects it into the user message.
- Stream route calls agent.prepareRAGContext(accountId, message) before streaming
- Fetches all chunks for the account, generates query embedding via OpenAI
- Runs vector similarity search via the match_document_chunks_text Postgres RPC (pgvector HNSW index, membership enforced inside the function, no raw connection pool)
- Filters by threshold (configurable via embeddingConfig.ragMatchThreshold, default 0.1) and keeps top 5 matches
- buildMessages() injects cached context into the user message (not system prompt) wrapped in XML tags (<document_chunk>) for prompt injection protection
- The LLM responds using the document context, citing sources like [Document 1]
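The filtering and injection steps in the list above can be sketched as pure functions. This is an illustration under stated assumptions, not the RAGAgent code: the ChunkMatch shape, the function names, the index attribute on the tag, and the escaping of angle brackets are all hypothetical choices. Only the threshold, the top-5 limit, the <document_chunk> wrapper, and the placement in the user message come from the text.

```typescript
interface ChunkMatch {
  content: string;
  similarity: number; // assumed: higher means closer to the query
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Keep matches above the threshold, best first, limited to topK.
function selectTopMatches(
  matches: ChunkMatch[],
  threshold: number = 0.1,
  topK: number = 5
): ChunkMatch[] {
  return matches
    .filter((m) => m.similarity > threshold)
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}

// Assumption: chunk text is escaped so it cannot close the wrapper tag itself.
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function formatContext(matches: ChunkMatch[]): string {
  return matches
    .map((m, i) => `<document_chunk index="${i + 1}">\n${escapeXml(m.content)}\n</document_chunk>`)
    .join("\n");
}

// The context goes into the user message; the system prompt is left untouched.
function buildMessagesSketch(systemPrompt: string, question: string, context: string): ChatMessage[] {
  const userContent =
    context.length > 0
      ? `${context}\n\nAnswer using the chunks above and cite them as [Document N].\n\nQuestion: ${question}`
      : question;
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: userContent },
  ];
}
```

In the pre-fetch pattern, the stream route would compute the context string once before streaming and hand it to the message builder, so the builder itself stays synchronous.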
The agent is registered in lib/ai/agents/index.ts and appears in the chat agent selector dropdown with a rose/pink color scheme.
PDF text extraction uses unpdf (lazy-loaded to avoid build-time errors). TXT and MD files are read as plain text.
Configurable Embedding Models
The embedding model used for document processing is configurable in config/ai.ts via the embeddingConfig object:
| Model | Dimensions | Cost / 1K tokens | Notes |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.002 | Default. Best cost/quality ratio. |
| text-embedding-3-large | 3072 | $0.013 | Higher quality, higher cost. |
| text-embedding-ada-002 | 1536 | $0.01 | Legacy model. |
To change the default model, update embeddingConfig.defaultModel in config/ai.ts. The model used is stored in documents.metadata.embedding_model and document_chunks.metadata.embedding_model for traceability.
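For orientation, the configuration might look roughly like the sketch below. The exact shape of config/ai.ts is not shown in this document, so the field layout, the EmbeddingModelInfo type, and the two helper functions are assumptions. The model names, dimensions, defaultModel, ragMatchThreshold, and the embedding_model metadata key come from the text.

```typescript
interface EmbeddingModelInfo {
  dimensions: number;
}

// Dimensions as listed in the table above.
const embeddingModels: Record<string, EmbeddingModelInfo> = {
  "text-embedding-3-small": { dimensions: 1536 },
  "text-embedding-3-large": { dimensions: 3072 },
  "text-embedding-ada-002": { dimensions: 1536 },
};

const embeddingConfig = {
  defaultModel: "text-embedding-3-small",
  ragMatchThreshold: 0.1,
};

// Hypothetical helper: the metadata stored on documents and document_chunks.
function embeddingMetadata(model: string = embeddingConfig.defaultModel): { embedding_model: string } {
  if (!Object.prototype.hasOwnProperty.call(embeddingModels, model)) {
    throw new Error(`Unknown embedding model: ${model}`);
  }
  return { embedding_model: model };
}

// Hypothetical guard: a vector must match the dimension count of its model.
function hasExpectedDimensions(model: string, vector: number[]): boolean {
  const info = embeddingModels[model];
  return info !== undefined && vector.length === info.dimensions;
}
```

Recording the model per chunk matters because vectors produced by different models are not comparable, and text-embedding-3-large produces 3072-dimensional vectors that would not fit a 1536-dimensional column.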
Credit Deduction on Upload
1 credit = 1 token. Credits are deducted from the account based on the actual usage.total_tokens reported by the OpenAI embeddings API. The values for all chunks of the document are summed and decremented in a single decrement_credits RPC after the loop completes.
Total credits = sum of usage.total_tokens for each chunk embedding call
For example, a 10-page document producing 50 chunks of ~250 tokens each will cost roughly 12,500 credits, billed exactly to the token. A pre-flight check ensures the account has at least aiConfig.minCreditsRequired (default: 100) before calling OpenAI; the post-loop decrement is best-effort — if a concurrent request races the balance below the required tokens, the failure is logged via logError but the document remains indexed.
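The arithmetic above can be sketched as follows. The helper names and the EmbeddingUsage type are hypothetical. Only usage.total_tokens, the 1:1 credit rate, and the default minimum of 100 come from the text, and the actual decrement happens through the decrement_credits RPC rather than in application code like this.

```typescript
// Mirrors the usage object returned with each embeddings response.
interface EmbeddingUsage {
  total_tokens: number;
}

// Total credits = sum of usage.total_tokens over every chunk embedding call.
function totalCredits(usages: EmbeddingUsage[]): number {
  return usages.reduce((sum, usage) => sum + usage.total_tokens, 0);
}

// Pre-flight check: the balance must reach the configured minimum
// before any call to OpenAI is made.
function passesPreflight(balance: number, minCreditsRequired: number = 100): boolean {
  return balance >= minCreditsRequired;
}

// Worked example from the text: 50 chunks of about 250 tokens each.
const exampleUsages: EmbeddingUsage[] = [];
for (let i = 0; i < 50; i++) {
  exampleUsages.push({ total_tokens: 250 });
}
const exampleCost = totalCredits(exampleUsages); // 50 * 250 = 12500
```

Note that the pre-flight check only guards the minimum balance, not the full cost, which is why the post-loop decrement is described as best-effort.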
| Config Key | Default | Description |
|---|---|---|
| embeddingConfig.defaultModel | text-embedding-3-small | Default embedding model for new documents |
| aiConfig.minCreditsRequired | 100 | Minimum balance for the pre-flight check before any LLM call (chat or RAG) |
Knowledge Base Page
The document manager is accessible at /private-dashboard/documents and appears in the sidebar as "Knowledge Base" (right below "AI Agents"). The page provides:
- Drag-and-drop file upload with progress tracking
- Real-time status polling (pending → processing → ready)
- Document list with status badges, chunk counts, and file sizes
- Delete with confirmation and full cleanup (chunks + storage)
Two database tables back the feature: documents (metadata + status + embedding model) and document_chunks (content + vector embeddings + model metadata). Both are protected by membership-based RLS.
API Routes
| API Route | Method | Description |
|---|---|---|
| /api/documents | GET | List documents for account |
| /api/documents | POST | Upload document (multipart, 10MB, strict rate limit). Triggers async processing with credit deduction. |
| /api/documents | DELETE | Delete document + chunks + storage file |
| /api/documents/[id] | GET | Document details + chunk summaries |
File Structure
lib/rag/
└── index.ts # chunkText, generateEmbedding, processDocument,
# searchDocuments (match_document_chunks_text RPC),
# buildRAGContext, parsePdf (unpdf)
lib/ai/agents/rag/
└── index.ts # RAGAgent class (extends BaseAgent)
# prepareRAGContext() + buildMessages() pattern
config/ai.ts
├── embeddingModels # Available embedding models with dimensions + cost
└── embeddingConfig # Default embedding model (credits are 1:1 with tokens, no flat rate)
app/api/documents/
├── route.ts # GET (list), POST (upload), DELETE (remove)
└── [id]/route.ts # GET (details + chunk summaries)
app/[locale]/(private)/private-dashboard/documents/
└── page.tsx # Knowledge Base page (server component)
components/documents/
└── document-manager.tsx # Upload UI with drag-and-drop + status polling