Legal professionals and developers have spent the last year hitting a recurring wall with Retrieval-Augmented Generation. The pattern is always the same: a user asks a complex legal question, the AI retrieves a few relevant paragraphs from a statute, and then produces a response that is either dangerously hallucinated or so generic it borders on useless. When a user provides insufficient facts, the AI guesses; when a user uploads a fifty-page contract, the AI returns a standard checklist that could apply to any document in the world. The tension lies in the gap between the fluid nature of human storytelling and the rigid, structural requirements of legal evidence.

The Architecture of Dual-Mode Legal Retrieval

Lemini addresses this gap by abandoning the idea of a single, all-purpose prompt in favor of a dual-mode pipeline designed specifically for the Korean legal landscape. The first path, known as Ouroboros mode, is designed for the initial phase of legal consultation where facts are often missing or ambiguous. Instead of allowing the LLM to fill in the blanks with plausible but fake details, Ouroboros implements a self-assessment loop. The model evaluates whether the provided facts are sufficient to reach a legal conclusion. If the information is lacking, the system pauses the generation process and presents the user with multiple-choice follow-up questions to converge on the actual facts of the case. Only after this convergence does the system trigger the RAG pipeline to produce a structured analysis covering favorable facts, cautionary notes, specific action plans, and statute of limitations warnings.
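The self-assessment loop described above can be sketched in a few lines. This is a minimal illustration, not Lemini's actual code: `Assessment`, `converge_facts`, `toy_assess`, and the `REQUIRED` fact list are all hypothetical names, and the toy assessor stands in for what would be an LLM call in practice.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Assessment:
    sufficient: bool
    # Each follow-up is (fact_key, multiple-choice options); empty when sufficient.
    follow_ups: list = field(default_factory=list)

def converge_facts(
    facts: dict,
    assess: Callable[[dict], Assessment],
    ask_user: Callable[[str, list], str],
    max_rounds: int = 5,
) -> dict:
    """Ask follow-up questions until the assessor judges the facts sufficient."""
    facts = dict(facts)  # work on a copy so the caller's dict is not mutated
    for _ in range(max_rounds):
        result = assess(facts)
        if result.sufficient:
            return facts  # only now would the RAG pipeline be triggered
        for key, choices in result.follow_ups:
            facts[key] = ask_user(key, choices)
    # Refuse to generate rather than let the model guess the missing facts.
    raise RuntimeError("facts did not converge; no analysis generated")

# Hypothetical stand-in for the model's sufficiency check.
REQUIRED = {
    "contract_type": ["lease", "employment"],
    "dispute_date": ["within 1 year", "over 1 year ago"],
}

def toy_assess(facts: dict) -> Assessment:
    missing = [(key, choices) for key, choices in REQUIRED.items() if key not in facts]
    return Assessment(sufficient=not missing, follow_ups=missing)

if __name__ == "__main__":
    # The lambda simulates a user who always picks the first option.
    print(converge_facts({"contract_type": "lease"}, toy_assess, lambda key, choices: choices[0]))
```

The important design choice is the hard stop: when the loop cannot converge, the sketch raises instead of falling through to generation.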

For users who already possess a document and need a deep dive, Lemini switches to a Professional Analysis mode. This mode operates as a six-step chain rather than a single request. It begins with a full scan to identify the document's nature and summarize its sections. The second step is the most critical: external institutional framework mapping. Here, the system defines the legal and institutional framework the document relies on before attempting any analysis. This ensures the AI is not merely summarizing text but analyzing it against a specific legal regime. The process then moves into axis-based RAG and clause-by-clause review, followed by three parallel analysis chains that simultaneously evaluate the alignment of purpose and means, institutional compliance, and potential risk scenarios. A final verdict is issued only if the user asks a specific judgment-based question.
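One way to picture the chain is as a sequence of awaited steps with a parallel fan-out near the end. The sketch below assumes the six steps are scan, framework mapping, axis-based RAG, clause review, the three parallel chains, and the optional verdict; that numbering is an interpretation of the description above. Every function is a hypothetical stub standing in for an LLM or retrieval call.

```python
import asyncio
from typing import Optional

async def scan_document(document: str) -> dict:
    """Step 1: identify the document's nature and split it into sections."""
    sections = [s.strip() for s in document.split("\n\n") if s.strip()]
    return {"nature": "unclassified", "sections": sections}

async def map_framework(scan: dict) -> str:
    """Step 2: declare the legal and institutional framework before any analysis."""
    return "placeholder-framework"

async def retrieve_sources(scan: dict, framework: str) -> list:
    """Step 3: axis-based RAG, scoped to the declared framework."""
    return [f"{framework}:source-{i}" for i, _ in enumerate(scan["sections"])]

async def review_clauses(scan: dict, sources: list) -> list:
    """Step 4: clause-by-clause review against the retrieved sources."""
    return [{"clause": c, "source": s} for c, s in zip(scan["sections"], sources)]

async def run_chain(name: str, reviews: list) -> dict:
    """Step 5 worker: one of the three parallel analysis chains."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"chain": name, "clauses_seen": len(reviews)}

async def analyze(document: str, question: Optional[str] = None) -> dict:
    scan = await scan_document(document)
    framework = await map_framework(scan)
    sources = await retrieve_sources(scan, framework)
    reviews = await review_clauses(scan, sources)
    # The three chains are independent of each other, so they can run concurrently.
    chains = await asyncio.gather(
        run_chain("purpose_means_alignment", reviews),
        run_chain("institutional_compliance", reviews),
        run_chain("risk_scenarios", reviews),
    )
    report = {"framework": framework, "reviews": reviews, "chains": list(chains)}
    if question is not None:
        # Step 6: a verdict is produced only for an explicit judgment question.
        report["verdict"] = f"verdict pending for: {question}"
    return report

if __name__ == "__main__":
    report = asyncio.run(analyze("Clause one.\n\nClause two."))
    print([c["chain"] for c in report["chains"]])
```

Because the framework is an argument to the retrieval step, the ordering constraint is enforced by the data flow rather than by prompt wording.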

This logic is powered by a three-tier data pool containing statutes, precedents, and self-regulatory codes. The self-regulatory pool is particularly nuanced, incorporating standard terms from the Fair Trade Commission, association bylaws, and guidelines from the Korea Internet & Security Agency (KISA) and the Personal Information Protection Commission (PIPC). To prevent the data from becoming stale, Lemini utilizes the Digital Law Resource Framework (DRF) API for weekly automatic updates of statutes, while precedents are managed via the National Law Information joint utilization API combined with an on-demand cache.

The entire system is orchestrated by Gemini, which manages the multi-chain transitions and ensures the final output is delivered in a strict JSON format for stability. The backend is built on FastAPI and deployed via Google Cloud Run, while the frontend utilizes Next.js and SQLite for lightweight data management. Lemini is designed to prioritize verified evidence over the raw generative intelligence of the LLM.
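The strict JSON contract mentioned above could be enforced with a small validation gate between the model and the user. The sketch below uses only the standard library. The field names are assumptions derived from the four analysis sections listed earlier (favorable facts, cautionary notes, action plans, limitation warnings), not Lemini's actual schema.

```python
import json

# Hypothetical schema: each analysis section is expected to be a JSON array.
REQUIRED_KEYS = {
    "favorable_facts": list,
    "cautionary_notes": list,
    "action_plan": list,
    "limitation_warnings": list,
}

def parse_strict(raw: str) -> dict:
    """Parse model output and reject anything that deviates from the schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    missing = set(REQUIRED_KEYS) - set(data)
    unexpected = set(data) - set(REQUIRED_KEYS)
    if missing or unexpected:
        raise ValueError(f"missing={sorted(missing)} unexpected={sorted(unexpected)}")
    for key, expected_type in REQUIRED_KEYS.items():
        if not isinstance(data[key], expected_type):
            raise ValueError(f"{key} must be a JSON array")
    return data
```

Rejecting unexpected keys as well as missing ones keeps the frontend from ever rendering a field the pipeline did not intend to produce.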

From Generative Guesswork to Verifiable Pipelines

The fundamental shift in Lemini is the move from prompt engineering to pipeline engineering. In a standard RAG setup, the model is asked to be a lawyer, which often leads to the AI mimicking the tone of a lawyer without performing the actual work of a lawyer. By splitting the analysis into six distinct chains, Lemini forces the model to anchor its reasoning in an external institutional framework before it is allowed to analyze a single clause. This removes the model's ability to drift into generalities. When the system declares the legal framework first, it creates a boundary that limits the scope of the subsequent RAG calls, ensuring that the retrieved laws and precedents are contextually relevant to the specific regime identified in step two.
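The boundary effect of declaring the framework first can be illustrated with a retrieval function that filters by regime before it ranks anything. This is a hedged sketch: the `Passage` type, its `regime` tag, and the term-overlap scoring are illustrative assumptions, and the sample passages are invented placeholders rather than real statutes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Passage:
    source: str   # placeholder citation label
    regime: str   # hypothetical tag naming the legal framework the passage belongs to
    text: str

def scoped_retrieve(query: str, regime: str, corpus: list, limit: int = 3) -> list:
    """Rank only the passages that belong to the declared framework."""
    in_scope = [p for p in corpus if p.regime == regime]
    terms = query.lower().split()
    scored = [(sum(term in p.text.lower() for term in terms), p) for p in in_scope]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep corpus order
    return [p for _, p in scored[:limit]]

CORPUS = [
    Passage("Sample Lease Act art. 3", "housing-lease", "deposit return on termination"),
    Passage("Sample Labor Act art. 7", "employment", "deposit of wages and return of property"),
]

if __name__ == "__main__":
    for hit in scoped_retrieve("deposit return", "housing-lease", CORPUS):
        print(hit.source)
```

Without the regime filter, both placeholder passages would match the query, which is the kind of drift the framework step is meant to prevent.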

To solve the most critical failure point in legal AI, the hallucination of non-existent statutes, Lemini implements a citation verification loop. This is not a simple prompt asking the AI to be honest. Instead, it is a programmatic process that takes every citation in the final response and cross-references it against the actual hits returned by the retrieval engine. If a cited article or clause does not exist in the retrieved source material, the system automatically strips that citation from the response before the user ever sees it. This transforms the AI from a creative writer into a curated index.
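A verification pass of this kind needs no model call at all. The sketch below assumes the model is instructed to wrap each citation in `[[...]]` markers and that retrieval hits are available as a set of identifier strings. Both conventions are assumptions made for illustration, and the "Sample Act" citations are placeholders.

```python
import re

# Hypothetical marker convention: the model wraps each citation as [[identifier]].
CITATION = re.compile(r"\[\[(?P<id>[^\[\]]+)\]\]")

def strip_unverified(response: str, retrieved_ids: set) -> tuple:
    """Remove every citation that is absent from the retrieval hits.

    Returns the cleaned text and the list of citations that were removed.
    """
    removed = []

    def keep_or_drop(match):
        cited = match.group("id").strip()
        if cited in retrieved_ids:
            return match.group(0)  # verified: leave the marker untouched
        removed.append(cited)
        return ""

    cleaned = CITATION.sub(keep_or_drop, response)
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned)     # collapse gaps left by removals
    cleaned = re.sub(r"\s+([.,;])", r"\1", cleaned)  # re-attach stranded punctuation
    return cleaned.strip(), removed

if __name__ == "__main__":
    text = ("The deposit must be returned [[Sample Act art. 3]]. "
            "Damages are trebled [[Sample Act art. 999]].")
    print(strip_unverified(text, {"Sample Act art. 3"}))
```

Note that this sketch removes only the citation marker and leaves the surrounding sentence in place. Returning the removed list lets a caller decide whether an unsupported claim should also be flagged or dropped.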

This precision is further supported by a hybrid search strategy. Because legal terminology is too exact for vector-based semantic search alone, Lemini employs a three-axis search matrix. It combines vector search over embeddings for conceptual similarity, lexical search for keyword matching, and exact-match search for specific legal identifiers. This ensures that a search for a specific article number does not return a conceptually similar but legally irrelevant section. From a development perspective, the system avoids complex branching logic for different legal domains, instead using a single `document_type` identifier. This allows the tool to scale across various legal fields without rewriting the underlying chain logic for every new type of contract or regulation.
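The three axes can be combined in several ways. The sketch below shows one plausible arrangement in which exact identifier matches act as a hard filter and the other two axes share the ranking. It makes loud assumptions: a bag-of-words cosine stands in for a real embedding model, the `art. N` identifier pattern and the equal weights are invented for illustration, and the documents are placeholders.

```python
import math
import re
from collections import Counter

# Hypothetical identifier pattern; real legal identifiers would need a richer grammar.
ARTICLE_ID = re.compile(r"art\.\s*\d+", re.IGNORECASE)

def tokens(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())

def identifiers(text: str) -> set:
    # Normalise "Art.  3" and "art. 3" to the same key, "art.3".
    return {re.sub(r"\s+", "", m.lower()) for m in ARTICLE_ID.findall(text)}

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def hybrid_search(query: str, docs: dict, top_k: int = 3) -> list:
    """Return document ids ranked by combined vector and lexical score."""
    wanted_ids = identifiers(query)
    # Exact-match axis: when the query names an article, only documents
    # carrying that exact identifier remain candidates.
    if wanted_ids:
        candidates = {d: t for d, t in docs.items() if wanted_ids & identifiers(t)}
    else:
        candidates = docs
    q_tokens = tokens(query)
    q_vec, q_set = Counter(q_tokens), set(q_tokens)
    ranked = []
    for doc_id, text in candidates.items():
        d_tokens = tokens(text)
        vector_score = cosine(q_vec, Counter(d_tokens))  # stand-in for embedding similarity
        lexical_score = len(q_set & set(d_tokens)) / len(q_set) if q_set else 0.0
        score = 0.5 * vector_score + 0.5 * lexical_score
        if score > 0:
            ranked.append((score, doc_id))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in ranked[:top_k]]

DOCS = {
    "a": "Sample Act art. 3 tenant deposit return",
    "b": "Sample Act art. 30 tenant deposit return obligations",
}

if __name__ == "__main__":
    print(hybrid_search("art. 3 deposit", DOCS))
```

In this toy corpus, document `b` is nearly identical in wording to `a` but is excluded from the `art. 3` query because its identifier is `art. 30`, which is the failure a purely semantic search would not catch.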

Privacy is handled through a stateless server architecture that minimizes data liability. Lemini eliminates the need for user accounts or login procedures entirely. Conversation histories are not stored in a central database but are kept exclusively in the browser's `localStorage`. User IP addresses are held only in memory, solely for the purpose of rate limiting, so neither sensitive legal queries nor the addresses they came from leave a permanent footprint on the server. This architectural choice lowers the psychological barrier for users dealing with sensitive information and removes the administrative burden of managing personally identifiable information (PII) in a high-risk domain.
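An in-memory rate limiter of the kind described here can be very small. The following is a sliding-window sketch under stated assumptions: the class name, limits, and injectable clock are hypothetical, and nothing is known about how Lemini actually wires this into FastAPI.

```python
import time
from collections import defaultdict, deque

class InMemoryRateLimiter:
    """Sliding-window limiter that keeps request timestamps in process memory only."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock                # injectable so tests can control time
        self._hits = defaultdict(deque)    # ip -> timestamps; never written to disk

    def allow(self, ip: str) -> bool:
        now = self._clock()
        hits = self._hits[ip]
        # Forget timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True

if __name__ == "__main__":
    limiter = InMemoryRateLimiter(max_requests=2, window_seconds=60)
    # 203.0.113.5 is a documentation-range address used purely as an example.
    print([limiter.allow("203.0.113.5") for _ in range(3)])
```

Because the state lives in process memory, it disappears on restart and is not shared between instances, so on a platform that runs several instances each would enforce its own limit. That trade-off is consistent with the goal of leaving no persistent record.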

By explicitly positioning itself as an information retrieval and analysis tool rather than a legal advisory service, Lemini navigates the regulatory complexities of Korea's Attorney-at-Law Act while providing a high-utility technical solution. The value provided here is not the intelligence of the model, but the control over the data pipeline.

This shift toward verifiable, chain-based architectures suggests that the future of professional AI lies in the systematic removal of model autonomy in favor of rigorous, multi-step verification.