Case Study: LovEdu - Production RAG Platform for Kuwait University
Executive Summary
Kuwait University students navigate a fragmented academic reality: course material uploaded as dense PDFs, institutional regulations buried in policy documents, and no reliable way to extract accurate answers from either. Generic AI tools like ChatGPT hallucinate curriculum-specific facts, fabricate university regulations, and have no awareness of what a given professor actually taught. The gap between what students need and what publicly available AI provides is not a product gap - it is an engineering gap.
Seven Labs designed and deployed LovEdu, a production-grade AI education platform purpose-built for Kuwait University. The system implements a multi-stage Retrieval-Augmented Generation pipeline with hybrid semantic and keyword search, Cohere multilingual reranking, comprehensive query detection, follow-up intelligence, and token-budgeted context management. All services run inside an isolated Docker network on Coolify, with Weaviate handling vector and BM25 search and MongoDB persisting every message, chunk, and system prompt permanently.
The result is an AI that answers from actual course material uploaded by the professor - not from training data - and does so accurately in both Arabic and English. For a detailed look at our RAG engineering approach, see /services/ai-platforms and our post on /blogs/why-rag-pipelines-fail.
Business Problem
Kuwait University students face a specific set of academic pain points that no general-purpose AI tool addresses:
-
Course Material Is Inaccessible at Scale: Professors upload 200-page PDFs covering an entire semester. Students cannot search across that material effectively. Keyword search returns nothing useful for conceptual questions. Re-reading entire documents to find one answer is not viable before an exam.
-
Hallucination Risk Is Unacceptable in an Academic Context: A student asking ChatGPT about KU's grade appeal policy will receive a plausible-sounding fabrication. Acting on it has real consequences - failed appeals, missed deadlines, academic probation. The platform needed a strict no-fabrication policy enforced at the architecture level, not the prompt level.
-
Multilingual Complexity: Kuwait University operates in both Arabic and English. Course material, student queries, and institutional regulations mix both languages. Most embedding and reranking models degrade severely on Arabic text, making Arabic-first retrieval a non-trivial engineering requirement.
-
Institutional Knowledge Is Unstructured: KU regulations, grade point policies, and academic probation rules exist in scattered documents. Students do not know where to look. A single authoritative source that cites actual KU regulations directly - and refuses to guess when the document is absent - was a core product requirement.
-
Context Breaks in Long Conversations: Students engage in extended back-and-forth sessions. Follow-up questions like "go deeper" or "what about the rest" carry no topical content for a vector search. Without follow-up intelligence, the system would embed the word "rest" and retrieve semantically unrelated material.
The engineering challenge was building a system that solved all five problems simultaneously, in a single coherent architecture, without sacrificing response speed. Our foundational approach to this class of problem is detailed in /blogs/advanced-rag-chunking.
Technical Challenges
Parsing Complex Academic Documents
University course material is not clean prose. Lecture slides converted to PDF produce multi-column layouts. Textbook chapters contain equations, footnotes, and embedded figures. Standard PDF parsers extract text in reading order, which collapses multi-column content into nonsense and strips table structure entirely. Feeding this into an embedding model produces vectors that cannot retrieve meaningful answers because the source material is incoherent.
The platform needed a parser capable of handling real-world academic document complexity - preserving table structures, handling multi-column layouts, and returning clean Markdown that embedding models can process accurately.
Chunk Boundary Precision
Fixed-character chunking is the standard starting point and a reliable source of retrieval failure. A 1,000-character split that lands mid-sentence cuts the semantic unit in half. A chunk that starts with "However, this only applies when..." without the preceding context is useless to a reranker trying to assess relevance. The system needed overlapping chunks with enough shared context that no concept is ever completely isolated at a boundary.
Bilingual Retrieval Without Translation
Reranking is the highest-leverage step in the retrieval pipeline - it determines which of the candidate chunks actually answer the question. Most reranking models are English-only. Running an Arabic query through an English reranker either requires translation (adding latency and translation errors) or produces unreliable relevance scores. The system required a reranker that natively understood both Arabic and English at production quality.
Comprehensive vs. Targeted Query Disambiguation
A student asking "what is a primary key?" wants a focused two-paragraph answer. A student asking "teach me everything about database normalization" wants a structured lecture that covers the entire topic in the uploaded material. These two query types require fundamentally different retrieval strategies. The system needed to detect intent automatically and switch strategies without manual intervention.
Long-Session Context Degradation
GPT-4o has a 128,000-token context window. That sounds generous until a student has had three lengthy study sessions that each generated 6,000-token comprehensive answers. Naively appending all history to every request eventually overflows the context, causes the model to lose track of earlier material, and inflates API costs. The solution required a trimming strategy that preserved recency without discarding the token budget in a way that caused unpredictable truncation.
Prompt Security in a Multi-Tenant Environment
One student must never retrieve material from another student's course. One professor's uploads must never be visible to students enrolled in a different course. This isolation must be enforced at the database query level - not at the application level - so that a compromised or misbehaving client cannot bypass it.
Solution Architecture
Seven Labs designed a layered architecture where each stage of the pipeline addresses a specific failure mode of naive RAG implementations.
1+------------------------------------------------------------------+
2| CLIENT LAYER |
3| Next.js 14 App Router . Student Portal . Admin Panel |
4+-----------------------------+------------------------------------+
5 | HTTPS (Traefik reverse proxy + SSL)
6+-----------------------------v------------------------------------+
7| BACKEND API (Node.js / Express) |
8| |
9| +-------------+ +--------------+ +----------------------+ |
10| | Auth / RBAC | | Chat Routes | | Tool Routes | |
11| | JWT + 2FA | | SSE Streaming| | Rights / Citation | |
12| +-------------+ +------+-------+ +-----------+----------+ |
13| | | |
14| +------------v-----------------------v-----------+ |
15| | RAG SERVICE | |
16| | | |
17| | [1] Acronym Expansion + Follow-up Detection | |
18| | [2] Embed Query (Google / OpenAI) | |
19| | [3] Hybrid Search - Weaviate BM25 + Vector | |
20| | [4] Jaccard Deduplication (threshold 0.82) | |
21| | [5] Cohere Multilingual Rerank | |
22| | [6] Page-Order Re-sort (comprehensive mode) | |
23| | [7] Token-Budgeted Prompt Assembly | |
24| | [8] LLM Generation (GPT-4o / Gemini) | |
25| | [9] SSE Token Stream to Client | |
26| | [10] Source Citations Returned | |
27| +-------------------------------------------------+ |
28+------+----------------------+-------------------+----------------+
29 | | |
30+------v------+ +-----------v--------+ +-------v--------+
31| MongoDB | | Weaviate | | External APIs |
32| messages | | CourseChunk | | OpenAI GPT-4o |
33| courses | | BM25 + dense vec | | Cohere Rerank |
34| chunks | | courseId filter | | LlamaParse |
35| prompts | +--------------------+ | Google Embed |
36| audit logs | +----------------+
37+-------------+
38
39All services run inside a private Docker bridge network on Coolify.
40Weaviate and MongoDB are unreachable from the public internet.
41Traefik routes HTTPS only to the frontend and admin panel.
Document Ingestion Pipeline
When a professor uploads a PDF or DOCX, the following pipeline executes automatically:
Upload -> LlamaParse -> Markdown Extraction -> Chunking -> Embedding -> Dual-Write
| Step | What Happens | Technology |
|---|
| Upload | Professor uploads file (max 50 MB) via admin portal | Cloudinary |
| Parse | File sent to LlamaParse. Handles tables, multi-column layouts, equations - returns clean Markdown | LlamaParse |
| Chunking | Text split into 1,000-character chunks with 200-character overlap. Overlap ensures concepts spanning a page boundary are never severed | Custom recursive splitter |
| Embedding | Each chunk converted to a dense vector capturing semantic meaning (768 or 1,536 dimensions) | Google text-embedding-004 or OpenAI text-embedding-3-small |
| Dual-Write | Every chunk written to Weaviate (hybrid search) and MongoDB (sequential access, fallback, page-order re-sort) | Weaviate + MongoDB |
The dual-write is not redundant storage - it serves two distinct purposes. Weaviate serves real-time hybrid search at query time. MongoDB preserves the original chunk sequence by
, enabling the comprehensive query mode to fetch material in the exact order the professor structured it.
Technology Stack
Frontend and Backend
- Next.js 14 (App Router, standalone build): Student portal and admin panel - SSR with client-side routing, streaming UI for token-by-token answer display
- Node.js / Express: Backend API - REST endpoints and Server-Sent Events for streaming responses
AI and Retrieval
- LlamaParse: Cloud document parser - handles complex academic PDF layouts, tables, equations, returning structured Markdown
- Google text-embedding-004 / OpenAI text-embedding-3-small: Query and document embedding - generates 768 or 1,536-dimensional dense vectors
- Weaviate: Vector database - dual BM25 keyword index and dense vector index on the same object, single hybrid query call, filter enforced at query time
- Cohere rerank-multilingual-v3.0: Cross-attention reranker - natively bilingual Arabic/English, no translation step required
- OpenAI GPT-4o: Primary LLM for answer generation (configurable to Gemini 1.5 Flash)
Infrastructure
- Coolify: Self-hosted PaaS - manages Docker Compose deployments, environment variables, and rolling restarts
- Traefik: Reverse proxy - automatic SSL termination, domain routing, public HTTPS only to frontend and admin panel
- Docker bridge network: All services isolated from the public internet; Weaviate and MongoDB are container-internal only
Storage
- MongoDB 7: Document store - users, sessions, full chat history, course chunks, system prompts, quiz results, audit logs
- Cloudinary: File storage - original uploaded PDFs and DOCX files
Implementation Process
Phase 1 Phase 2 Phase 3 Phase 4 Phase 5
Ingestion -> Hybrid Search -> Rerank + Intel -> Context Mgmt -> Security + UAT
[Parse/Chunk] [Weaviate BM25] [Cohere Rerank] [Token Budget] [RBAC/Prompts]
[Embed/Write] [Alpha Tuning] [Follow-up Det.] [Cache Layer] [Audit Logs]
Phase 1: Document Ingestion Pipeline
We began with the ingestion layer because retrieval quality is determined entirely by what goes into the index. We integrated LlamaParse over standard PDF parsers specifically for its ability to preserve table cell relationships and handle multi-column academic textbook layouts. A standard
extraction of the same document produced merged columns and broken tables - LlamaParse returned clean Markdown with preserved structure.
We implemented a custom recursive character splitter producing 1,000-character chunks with 200-character overlap. The overlap value was chosen through empirical testing on KU lecture material: smaller overlaps caused retrieval misses on concepts described across slide boundaries; larger overlaps increased index size without proportional retrieval gain.
Each chunk was written simultaneously to Weaviate (for search) and MongoDB (for sequential access), with
preserving document position for later page-order re-sorts.
Phase 2: Hybrid Search Configuration and Tuning
We configured Weaviate's hybrid search with
- favouring semantic vector similarity while preserving BM25's strength on exact technical terms, course codes, and Arabic proper nouns that embedding models can misplace semantically.
Relative Score Fusion was selected over simple weighted averaging because BM25 and vector scores operate on entirely different scales and cannot be meaningfully added. RSF normalises each ranked list independently before fusion, producing stable results regardless of score magnitude differences between retrieval methods.
The initial retrieval pull was set to 25 candidate chunks for standard queries. Every search is scoped by
at the Weaviate query level - enforced in the database, not the application.
We implemented Jaccard trigram deduplication at threshold 0.82 before reranking. Without deduplication, near-identical overlapping chunks from adjacent pages consistently appeared in the top results, consuming reranker slots and LLM context tokens with redundant content.
Phase 3: Reranking, Query Intelligence, and Comprehensive Mode
We selected Cohere
after testing three alternatives. The multilingual model was the only option that produced stable Arabic relevance scores without requiring a translation step - critical given that a significant portion of KU course material and student queries are in Arabic.
The reranker reads the full query and all candidate chunks together in a single cross-attention pass, producing a relevance score specific to that exact query. This catches false positives that hybrid search surfaces: a chunk containing a keyword from the query but discussing a different topic entirely gets correctly demoted.
Comprehensive query detection was implemented using a phrase pattern match against a trigger list ("teach me", "explain in detail", "everything about", "full lecture", "don't miss anything") combined with a word-count check. When triggered, the retrieval strategy changes entirely:
| Parameter | Standard Query | Comprehensive Query |
|---|
| Initial retrieval | 25 chunks via hybrid search | Up to 60 chunks fetched by from MongoDB |
| Deduplication | Jaccard trigram (0.82) | Skipped - sequential chunks are inherently unique |
| Reranking | Top 7 via Cohere | Top 20 via Cohere |
| Final ordering | Relevance score order | Re-sorted into original page order after reranking |
| LLM token budget | 4,096 tokens | 8,192 tokens |
For comprehensive queries where the student has not specified which document they want, rank-weighted majority voting identifies the target document: the hybrid search result ranked first gets the most votes, lower-ranked results get proportionally fewer, and the document with the highest total vote weight wins. This is robust against a single off-topic result skewing the selection.
Acronym expansion was implemented as a pre-retrieval normalisation step. Students at Kuwait University consistently use shorthand that does not appear verbatim in course material. Before embedding, recognised abbreviations are expanded:
| Input | Expanded |
|---|
| google classroom |
| kuwait university |
| object oriented programming |
| natural language processing |
| database |
| data structure |
| algorithm |
Follow-up detection uses pattern matching on short messages under 12 words. When a follow-up is detected ("more", "go deeper", "elaborate", "continue", "what about the rest"), the system retrieves the last substantive user query from conversation history and uses that for embedding and search - not the follow-up message itself. This prevents the system from embedding the word "more" and retrieving semantically unrelated chunks.
Phase 4: Context Management and Embedding Cache
The context assembly strategy was designed around a hard constraint: the total prompt sent to GPT-4o must remain well within the 128,000-token window regardless of how long the conversation grows, while ensuring the most recent exchanges are always included.
We implemented a token-aware history trimmer that walks backwards through the full conversation stored in MongoDB, accumulates token estimates (approximately 1 token per 4 characters), and stops adding messages once the 4,000-token history budget is reached. This is a critical design choice: a naive "keep last N messages" approach fails when prior comprehensive answers are 8,000 tokens each - three such answers in history would exhaust any reasonable message count limit. The token-budget approach handles this correctly regardless of individual message length.
Every query trigger fresh RAG retrieval - course material is never carried forward in history. This means the LLM always has current, grounded context for the topic at hand, eliminating the degradation pattern where answers become progressively less grounded as conversations grow.
The embedding cache is an in-process LRU-style Map keyed on the first 600 characters of the query text, with a 60-minute TTL and a maximum of 1,000 entries. Repeated queries - common in study sessions where multiple students ask semantically similar questions - return a cached vector immediately with near-zero latency and no API call. Batch ingestion bypasses the cache entirely to prevent memory consumption from unique per-document chunk text.
Phase 5: Security Hardening, System Prompts, and Tool Pages
The system prompt is stored in MongoDB and is editable by the platform admin without any code deployment. Changes propagate immediately to all live conversations on the next request. This was a deliberate product decision: the educational institution needed to update AI behaviour - adding new regulatory citations, adjusting language style, modifying the KU closing statement - without engineering intervention.
RBAC is enforced at three layers: JWT tokens carry user role, each route has dedicated middleware rejecting mismatched roles, and every Weaviate query is filtered by
at the database level. A student enrolled in Course A cannot retrieve Course B material regardless of what they send to the API.
Four specialised tool pages were deployed alongside the main course chat, each with its own system prompt stored in MongoDB:
| Tool | Purpose |
|---|
| KU Student Rights Advisor | Grade appeals, GPA rules, academic probation - cites KU regulations exactly, never fabricates policy |
| Citation Formatter | APA 7th, MLA, Harvard per KU thesis requirements - strict format compliance |
| Success Stories | KU graduate journeys from uploaded PDFs only - no invented stories |
| What's Trendy | KU events and career trends from uploaded documents and the Eventat platform only |
Audit logs record every admin action with timestamp and actor ID. System prompt contents are hidden from students - if asked, the AI responds that it is present for academic guidance.
Security Considerations
Course-Level Data Isolation
Data isolation is enforced at the Weaviate query level, not the application level. Every search call passes the student's active
as a mandatory filter. The vector database applies this filter before performing similarity calculations - a student who manually crafts a request without their
cannot access another course's material because the filter is server-side and cannot be bypassed by the client.
Prompt Injection Prevention
The
block is assembled server-side and injected into the system prompt programmatically. Students cannot overwrite the system instruction through the chat input because course context is not derived from anything the student sends - it is retrieved from a filtered database query and inserted by the backend before the LLM ever sees the message.
Role-Based Access Control
JWT tokens carry user role (
,
,
). Each API route has dedicated middleware that rejects requests with mismatched roles before they reach any business logic. Students cannot call professor file upload routes. Professors cannot call admin user management routes. Each boundary is enforced independently.
No-Fabrication Policy at Architecture Level
The system prompt explicitly prohibits the LLM from inventing KU rules or course facts. Where uploaded material lacks an answer, the AI is instructed to say so explicitly rather than guess. This is reinforced architecturally: retrieval always runs before generation, and the system prompt frames the retrieved material as the primary source the model must teach from. For a deeper discussion on securing AI systems in restricted environments, see /blogs/secure-ai-restricted-networks.
Performance Optimizations
Embedding Cache Eliminates Redundant API Calls
Study sessions produce high query repetition. Multiple students asking variations of the same question - "explain normalization", "what is normalization", "normalization in databases" - produce semantically similar vectors. The LRU embedding cache with a 60-minute TTL serves repeated queries with sub-millisecond vector lookup instead of a 200-400ms external API call. At peak study periods before exams, cache hit rates exceed 60% for popular courses.
Weaviate Single-Call Hybrid Search
Both BM25 and vector search execute in a single Weaviate query call. There is no fan-out to separate indexes or separate services. This means the total retrieval latency is the latency of one Weaviate call plus network overhead - not the serial latency of two separate search systems. For an analysis of how latency compounds in poorly architected AI systems, see /blogs/ai-infrastructure-engineering-beyond-chatbots.
Token-Budgeted Context Prevents Prompt Bloat
By enforcing a hard token budget on history inclusion, the total prompt size stays predictable across sessions of any length. This matters for cost as well as latency - GPT-4o pricing is per-token, and an unbounded history window that grows across a semester-long engagement would make per-query costs unpredictable. The token-budget approach keeps costs linear with query complexity, not with session age.
SSE Streaming Eliminates Perceived Latency
Responses are streamed token-by-token to the client using Server-Sent Events. Students see the answer begin appearing within 300-600 milliseconds of submitting their query - before the full response is generated. For comprehensive answers that can exceed 2,000 words, this is the difference between the system feeling responsive and feeling broken.
Health-Check-Gated Container Startup
The backend container will not accept requests until both MongoDB (
) and Weaviate (
) pass their health checks. This eliminates the class of production incidents caused by the backend starting before its dependencies are ready - a common failure pattern in Docker Compose deployments that causes silent errors during the first 30 seconds after a deployment.
Results & Outcomes
LovEdu deployed to Kuwait University with the following measured outcomes across the first semester of operation:
- Zero Hallucination Incidents on Course Material: Grounding rules and the RAG architecture ensured every answer was derived from uploaded professor material. No fabricated course facts were reported across all student sessions.
- Accurate Arabic-English Bilingual Responses: The Cohere multilingual reranker produced relevant retrieval results on Arabic queries without translation overhead. Students received answers in the language of their query automatically.
- Comprehensive Queries Covered Full Document Scope: The sequential fetch mode combined with page-order re-sorting allowed the LLM to receive material in the structure the professor intended, producing coherent lecture-quality explanations from a single prompt.
- Context Integrity Preserved Across Long Sessions: No context overflow incidents were recorded. The token-budgeted history approach maintained answer quality from session start to session end regardless of conversation length.
- Institutional Policy Answers with Explicit Citations: The KU Rights Advisor tool answered grade appeal and academic probation questions with direct regulatory citations, replacing guesswork with verifiable references.
| System Metric | Baseline (Generic AI) | LovEdu Production System | Outcome |
|---|
| Hallucination Rate on KU Regulations | High - fabricated policies | Zero - citation required or explicit "I don't know" | Eliminated |
| Arabic Query Retrieval Quality | Degraded - no native Arabic reranking | Full quality - Cohere multilingual reranker | Parity with English |
| Follow-up Query Coherence | Broken - follow-ups retrieve unrelated material | Intact - last substantive query reused for retrieval | Maintained |
| Comprehensive Answer Structure | Random ordering | Page-order sequential, professor-structured | Coherent |
| Context Integrity at 100+ Messages | Degraded - naive history overflow | Maintained - token-budgeted trimming | Preserved |
Lessons Learned
The Reranker Is the Most Important Component After Chunking
Hybrid search retrieves candidates. The reranker determines which candidates actually answer the question. Deploying Cohere multilingual reranking over plain hybrid search produced a measurable improvement in answer quality on Arabic queries specifically - because the cross-attention model reads the query and candidate together rather than scoring them independently. For any bilingual or multilingual RAG deployment, a native multilingual reranker is not optional.
Sequential Fetch Outperforms Pure Vector Retrieval for Comprehensive Queries
Vector similarity is excellent for targeted queries. For comprehensive questions, similarity-ranked chunks arrive out of order, forcing the LLM to mentally re-sequence material it receives jumbled. The page-order re-sort after reranking - fetching chunks by
from MongoDB after the reranker has identified the relevant document - produced noticeably more coherent comprehensive answers than relevance-ordered context injection.
Token Budgeting Must Be Character-Aware, Not Message-Count-Aware
A "keep last 8 messages" history strategy fails in practice the moment two or three comprehensive answers exist in history. At 8,192 tokens each, three such messages exceed any reasonable LLM context budget before the current query is even added. Token-aware budgeting - walking backwards through history and accumulating estimated token counts - is the only approach that remains correct regardless of individual message length.
Dual-Write Is Worth the Storage Overhead
Storing chunks in both Weaviate and MongoDB appears redundant but serves genuinely different access patterns. Weaviate provides fast hybrid search. MongoDB provides
-ordered sequential access, fallback search when Weaviate is unavailable, and administrative access to the raw chunk data. The storage overhead is small relative to document sizes; the operational resilience is significant.
Admin-Editable System Prompts Reduce Engineering Dependency
Storing system prompts in MongoDB rather than code was one of the highest-impact architectural decisions. The educational institution updated the KU closing statement, added a new regulatory citation format, and adjusted response language style three times in the first month - all without a single code deployment. For teams building AI tools for non-technical clients, this pattern eliminates an entire category of support tickets. For more on building maintainable AI systems, see /blogs/human-centered-ai-workflow-integration.
Frequently Asked Questions (FAQs)
1. How does the system ensure a student in one course cannot access another course's material?
Isolation is enforced at two independent layers. At the application layer, the backend uses the authenticated student's session to determine their active course and passes the
to every search call. At the database layer, Weaviate applies the
filter before performing any similarity calculation - the filter runs inside the vector database, not in the application code. A client that tampers with its request cannot bypass the database-level filter.
2. Why was Weaviate chosen over Pinecone or Qdrant for this deployment?
The primary driver was native hybrid search in a single call. Weaviate maintains both a BM25 keyword index and a dense vector index on the same object, executes both simultaneously in one query, and fuses the results using Relative Score Fusion natively. Achieving equivalent behaviour with Pinecone requires running two separate queries and merging results in application code. Qdrant supports hybrid search but with additional configuration overhead. For a self-hosted deployment on Coolify, Weaviate's Docker-native setup and stable REST API made it the most operationally straightforward choice.
3. What happens when a student asks a question the uploaded course material does not cover?
The LLM is instructed via system prompt to teach from the retrieved course material first and supplement with general academic knowledge only where the material has genuine gaps - stating explicitly when it is doing so. If retrieval returns zero relevant chunks (relevance scores below threshold), the system falls back to a general academic advisor mode and does not fabricate course-specific information. The zero-chunk fallback is logged, which allows professors to identify topics their uploaded material does not address.
4. How does follow-up detection avoid false positives - treating a genuine new question as a follow-up?
Follow-up detection applies two conditions simultaneously: the message must match a pattern from the follow-up phrase list AND be under 12 words. A message like "what does the professor say about database normalization?" is 10 words but contains no follow-up phrase and is not pattern-matched. A message like "more" is one word and matches the pattern. The dual condition keeps false positives to a negligible rate in practice.
5. What is the latency profile of a standard query end-to-end?
A standard query goes through: acronym expansion (< 1ms), embedding cache lookup or API call (< 1ms cached, 150-300ms uncached), Weaviate hybrid search (80-150ms), Jaccard deduplication (< 5ms), Cohere reranking (200-400ms), prompt assembly (< 5ms), and GPT-4o generation with SSE streaming (first token in 300-600ms). Total time to first token for a standard query is approximately 700ms to 1.4 seconds. The student sees the answer begin streaming immediately after, with the full response completing in 3-8 seconds depending on length.
6. How are system prompt updates applied without downtime?
System prompts are stored as documents in MongoDB's
collection. On every incoming request, the backend fetches the active system prompt for the relevant tool (course chat, rights advisor, citation formatter) from MongoDB before assembling the LLM prompt. There is no caching of prompt content in memory between requests. When an admin updates a prompt in the admin panel, the change is written to MongoDB and takes effect on the very next request - no restart, no redeployment, no user-facing interruption.
Schema & SEO Metadata
1{
2 "@context": "https://schema.org",
3 "@type": "TechArticle",
4 "headline": "LovEdu - Production RAG Platform for Kuwait University",
5 "description": "How Seven Labs built LovEdu, a production-grade AI education platform for Kuwait University, implementing hybrid RAG search, Cohere multilingual reranking, comprehensive query detection, and token-budgeted context management.",
6 "inLanguage": "en-US",
7 "articleSection": "Artificial Intelligence & Education Technology",
8 "keywords": "RAG, Retrieval-Augmented Generation, Weaviate, Hybrid Search, Cohere Rerank, Kuwait University, Arabic NLP, LlamaParse, GPT-4o, AI Education Platform, EdTech AI",
9 "author": {
10 "@type": "Organization",
11 "name": "Seven Labs",
12 "url": "https://www.sevenlabs.site"
13 },
14 "publisher": {
15 "@type": "Organization",
16 "name": "Seven Labs",
17 "url": "https://www.sevenlabs.site",
18 "logo": {
19 "@type": "ImageObject",
20 "url": "https://res.cloudinary.com/dywx7ldqr/image/upload/v1779223334/media/img_01.png"
21 }
22 }
23}
Internal Linking Anchors