For most people, the distance between a legal problem and a legal answer is measured in frustration. The traditional process of navigating Korean statutes requires a specific kind of literacy—the ability to guess the exact keywords a legislator used decades ago. If you do not know the precise legal term for your grievance, a keyword search returns nothing. Meanwhile, the recent surge in general-purpose AI has offered a seductive alternative, but with a dangerous catch. A user asks a chatbot about a housing dispute, and the AI provides a confident, fluent answer, complete with a clause number that does not actually exist. This tension between the rigidity of legal databases and the fluidity of large language models is where the current friction in legal tech resides.
The Architecture of Grounded Legal Intelligence
legalQ addresses this gap by implementing a Retrieval-Augmented Generation (RAG) pipeline designed specifically for the complexities of the Korean legal system. Rather than relying on the internal weights of a model to recall laws, the system treats the LLM as a reasoning engine that operates on external, verified data. When a user submits a query in natural language, the system transforms that input into a search-optimized query to scan the `legalize-kr` dataset, a project dedicated to refining and providing structured Korean legal data. The resulting citations and precedents are then fed back into the model to generate a response that is grounded in actual text, ensuring that every claim is accompanied by the specific clause or case law used as a reference.
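The flow described above—rewrite the question, retrieve cited passages, then prompt the model with only that evidence—can be sketched in a few functions. This is a minimal illustration, not legalQ's actual code: the function names, the toy lexical retriever (the real system queries a vector index), and the sample statute snippets are all assumptions.

```python
# Hypothetical sketch of a grounded legal Q&A flow; names and data are illustrative.

def to_search_query(user_question: str) -> str:
    """Stand-in for the LLM step that rewrites a natural-language
    question into retrieval-friendly keywords."""
    stopwords = {"what", "happens", "to", "my", "is", "the", "a"}
    words = (w.strip("?.,") for w in user_question.lower().split())
    return " ".join(w for w in words if w not in stopwords)

def retrieve(query: str, index: list, k: int = 2) -> list:
    """Toy keyword-overlap retriever; the production system would
    query a vector database instead."""
    terms = set(query.split())
    scored = sorted(index, key=lambda d: -len(terms & set(d["text"].lower().split())))
    return scored[:k]

def build_grounded_prompt(question: str, passages: list) -> str:
    """Each passage carries its citation, so the model can attach
    a specific clause reference to every claim it makes."""
    context = "\n".join(f"[{p['citation']}] {p['text']}" for p in passages)
    return (
        "Answer using ONLY the passages below and cite the clause for each claim.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

# Illustrative mini-index; real passages come from the legalize-kr dataset.
index = [
    {"citation": "Housing Lease Protection Act Art. 3(1)",
     "text": "the landlord must return the lease deposit when the tenancy ends"},
    {"citation": "Civil Act Art. 618",
     "text": "a lease contract obligates one party to allow use of an object"},
]
q = "What happens to my lease deposit?"
prompt = build_grounded_prompt(q, retrieve(to_search_query(q), index))
```

The key property is that the generation step only ever sees passages that already carry citations, so any clause number in the answer can be traced back to retrieved text.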
Technically, the platform is built on a modern, high-performance stack. The backend utilizes FastAPI to handle requests with minimal latency, while the frontend is constructed with React to provide a seamless user interface. To maintain flexibility in model selection, legalQ routes its LLM calls through OpenRouter, allowing the system to leverage various frontier models depending on the complexity of the task. The heavy lifting of data retrieval is handled by Seahorse Cloud, which serves as the vector database where legal indices are stored and queried.
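Routing through OpenRouter means the backend speaks one OpenAI-compatible API regardless of which frontier model handles a given request. A rough sketch of how such a request might be assembled is below; the model identifier, system prompt, and environment variable name are assumptions, and the actual POST would be issued by an HTTP client inside a FastAPI route handler.

```python
# Hypothetical request builder for OpenRouter's OpenAI-compatible
# chat completions endpoint; details are illustrative, not legalQ's code.
import json
import os

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(question: str, context: str,
                  model: str = "anthropic/claude-3.5-sonnet"):
    """Assemble headers and an OpenAI-style payload. The caller would
    POST this with any HTTP client (e.g. httpx inside a FastAPI route)."""
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,  # swappable per task complexity via OpenRouter
        "messages": [
            {"role": "system",
             "content": "Cite the specific statute clause for every claim."},
            {"role": "user", "content": f"{context}\n\n{question}"},
        ],
    }
    return OPENROUTER_URL, headers, json.dumps(payload)
```

Because the payload shape is identical across models, swapping a cheaper model in for simple lookups is a one-string change rather than a new integration.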
One of the most significant architectural choices is the implementation of the Model Context Protocol (MCP). By separating the tool-calling logic into an MCP layer, the developers have modularized the process of converting natural language into search queries and fetching the corresponding legal data. This separation ensures that the data extraction process remains efficient and decoupled from the final answer generation.

The system is also designed with a radical approach to privacy. There is no sign-up process and no login requirement. The architecture is entirely stateless, meaning the server does not store conversation histories in a database. Instead, all chat logs are kept exclusively in the browser's `localStorage`. To prevent abuse while maintaining anonymity, the system employs an in-memory bucket for rate limiting based on IP addresses, and standard access log metadata is purged every 30 days to ensure that sensitive legal inquiries do not leave a permanent digital footprint.
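An in-memory, per-IP bucket of the kind described above can be implemented in a few lines with only the standard library. The sketch below assumes a sliding-window variant with an illustrative limit and window size; legalQ's actual thresholds are not public. Note that nothing here touches a database or disk, matching the stateless design.

```python
# Minimal sketch of an in-memory, per-IP rate limiter; the limit and
# window values are assumptions, not legalQ's actual configuration.
import time
from collections import defaultdict, deque

class IPRateLimiter:
    """Sliding-window limiter: allow at most `limit` requests
    per `window` seconds from a single IP. State lives only in
    process memory, so restarting the server wipes it."""

    def __init__(self, limit: int = 20, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.hits[ip]
        while q and now - q[0] >= self.window:  # drop expired timestamps
            q.popleft()
        if len(q) >= self.limit:
            return False  # over budget; caller should return HTTP 429
        q.append(now)
        return True
```

In a FastAPI app this would typically run as middleware, reading the client IP from the request and rejecting over-limit calls before they reach the retrieval pipeline.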
Bridging the Gap Between Keywords and Hallucinations
To understand why this architecture matters, one must look at the failure points of previous legal search methods. For years, the gold standard was the keyword search. While highly accurate, it placed the entire burden of discovery on the user. If a layperson did not know the exact terminology, the system was useless. The barrier to entry was not the law itself, but the language used to search for it. General-purpose LLMs from providers like OpenAI or Anthropic attempted to lower this barrier by allowing natural language queries, but they introduced the problem of hallucinations. In a legal context, a hallucinated clause is not just a technical error; it is a liability.
legalQ creates a synthesis of these two worlds. It provides the accessibility of a chatbot with the evidentiary rigor of a database. By using the MCP layer to fetch real-time data from the search index, it bypasses the model's tendency to invent citations. This distinguishes legalQ from other developer-centric tools like the Korean Law MCP, which provide the plumbing for legal data but lack a direct, user-facing interface. legalQ is positioned not as a tool for developers to build with, but as a product for citizens to use immediately via a browser.
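The anti-hallucination property comes from the shape of the tool-calling loop: the model cannot emit a citation directly, it can only request data through a registered tool, and the answer step receives exactly what the tool returned. The following is a deliberately simplified stand-in for that MCP layer (a real MCP server uses the protocol's JSON-RPC tool schema); the tool name, registry, and statute snippet are all illustrative.

```python
# Simplified stand-in for an MCP-style tool-calling layer; the real
# protocol and legalQ's tool definitions are more involved.

TOOLS = {}

def tool(fn):
    """Register a function as a callable tool (stand-in for an MCP server)."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def search_statutes(query: str) -> list:
    """Pretend lookup against the legal index; results always carry
    citations, so the model never has to invent one."""
    corpus = [{"citation": "Housing Lease Protection Act Art. 3",
               "text": "lease deposit protections for tenants"}]
    return [d for d in corpus if any(t in d["text"] for t in query.split())]

def handle_tool_call(name: str, arguments: dict):
    """Dispatch a model-issued tool call; unknown tool names are
    rejected, so the model cannot fabricate a data source."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)

results = handle_tool_call("search_statutes", {"query": "lease deposit"})
```

The dispatch step is the safety boundary: anything the model wants to cite must have come back through `handle_tool_call`, and anything it cannot retrieve it cannot cite.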
However, the transition to natural language search is not without its technical hurdles. The current iteration of the system struggles with the processing of annex data—the detailed exceptions and supplementary explanations often found in the appendices of Korean laws. To mitigate this, the system does not rely solely on the AI's summary but provides direct links to the relevant statutes, allowing users to verify the fine print manually. Additionally, there is a performance trade-off inherent in the RAG process. Complex queries that require the synthesis of multiple laws trigger a higher number of tool calls, which can lead to a noticeable increase in response latency. This is the inevitable cost of accuracy; the system spends more time verifying facts than a standard LLM would spend inventing them.
The shift toward natural language legal search is effectively dismantling the traditional monopoly on legal information. By turning the law from a locked archive into a searchable conversation, the barrier between the citizen and the statute is finally beginning to dissolve.