Private AI Chat for Internal Knowledge

Abstract

Here we demonstrate how to run a Retrieval Augmented Generation (RAG) agent where everything is local, including the vector store and the language models (embedding model and LLM). This is useful if you want to make internal documents chattable without sending them across the Internet to be processed in the cloud. This page is part of the local_rag_demo repository, which contains all the code for the demo.

Why this matters for your business

The local setup keeps data private (nothing is sent to the cloud)
No accumulating token costs
Works with internal documents, e.g. PDFs
Auditable answers: the exact text chunk used to generate an answer can be identified
Offline / air-gapped mode is possible

Brief tech stack overview

The official Ollama Docker image is used for managing and running the language models.
We deploy Meta’s Llama3.2 large language model and Nomic’s nomic-embed-text embedding model.
Weaviate is used as the vector store.
LangChain is used to orchestrate the RAG agent.
The agent is deployed locally with the LangGraph server.
Finally, the Agent Chat UI app serves as the web UI.

The flow of information is shown below. The user enters a query in the UI, which passes it to the agent. The agent can use the vector store to retrieve relevant information and the LLM to generate the answer.
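The retrieve-then-generate step can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in backend/rag. The callables `retrieve` and `generate` are hypothetical stand-ins for the Weaviate search and the Llama3.2 call, so the flow runs without any services. The sketch is also simplified: it always retrieves, whereas the actual agent decides for itself whether to call its retrieval tool.

```python
from typing import Callable, List


def answer_query(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve context for the query, then let the LLM answer from that context."""
    # 1. Ask the vector store for chunks relevant to the query
    chunks = retrieve(query)
    # 2. Put the retrieved chunks into the prompt as context
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
    # 3. Let the LLM generate an answer grounded in that context
    return generate(prompt)


# Hypothetical stand-ins so the flow runs without Weaviate or Ollama
def fake_retrieve(query: str) -> List[str]:
    return ["The Fifth Coalition (1809) of Britain and Austria against France ..."]


def fake_generate(prompt: str) -> str:
    # Returns the prompt itself, to show exactly what the LLM would receive
    return prompt


print(answer_query("Which countries formed the Fifth Coalition?", fake_retrieve, fake_generate))
```

Injecting the two callables keeps the control flow independent of the concrete vector store and model, so either can be swapped without touching the logic.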

Deployment of models

First, we build and run the Ollama and Weaviate Docker images, and then we fetch the Llama3.2 and nomic-embed-text models. All of this is done with the following make commands (their output is discarded):

! make setup > /dev/null 2>&1
! make run > /dev/null 2>&1
! make fetch_models > /dev/null 2>&1

We now have the Llama3.2 model deployed locally.
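The make output is discarded, so it can be worth confirming that both models were actually pulled. The sketch below does this by querying Ollama's /api/tags endpoint, which lists the locally available models. It assumes the Ollama container exposes Ollama's default port 11434 on localhost; the Makefile may map it differently. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, List

# Assumption: Ollama's default port, exposed on localhost by the container
OLLAMA_URL = "http://localhost:11434"


def missing_models(tags: dict, required: Iterable[str]) -> List[str]:
    """Return the required model names absent from an /api/tags response.

    Installed names carry a tag suffix (e.g. "llama3.2:latest"), so only the
    base names before the colon are compared.
    """
    installed = {m["name"].split(":")[0] for m in tags.get("models", [])}
    return [name for name in required if name not in installed]


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """Ask the running Ollama server which models it has pulled."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


# Usage (requires the Ollama container to be running):
# missing_models(fetch_tags(), ["llama3.2", "nomic-embed-text"])  # [] once both are pulled
```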

As we have no internal business documents for this demo, we illustrate the principles with the Napoleonic Wars Wikipedia page instead. To begin, we test the model’s knowledge of the Napoleonic Wars by asking it the following question:

Under the Napoleonic wars, which countries took part in the fifth coalition against France?

The core members of the Fifth Coalition were the Austrian Empire and the United Kingdom. Since LLM output is sampled and varies from run to run, we repeat the query five times. The five answers below show two failure modes. Either the model states upfront that it does not know the answer, or it produces lists that contain more or different countries than the actual members of the coalition.

A larger model such as OpenAI’s GPT-5 would likely have answered correctly, but that is beside the point. We are considering internal documents, which neither GPT-5 nor our local Llama model has ever seen.

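from backend.rag import query_raw_model

q = "Under the Napoleonic wars, which countries took part in the fifth coalition against France?"

# Query the raw LLM (no retrieval) five times, since sampled answers vary between runs
responses = [query_raw_model(q) for _ in range(5)]

# Print each answer with a separator line
for i, response in enumerate(responses):
    print(f'Response {i+1}:')
    print(response)
    print("-" * 100)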
    print("-" * 100)

Response 1:
The Fifth Coalition was formed in 1809 and consisted of Austria, Britain, Russia, Sweden, and the Ottoman Empire.
----------------------------------------------------------------------------------------------------
Response 2:
I cannot verify which country took part in the Fifth Coalition of the Napoleonic Wars.
----------------------------------------------------------------------------------------------------
Response 3:
The fifth coalition against France consisted of Britain and Russia.
----------------------------------------------------------------------------------------------------
Response 4:
The Fifth Coalition was formed in 1809, during the Napoleonic Wars. The countries that took part in this coalition were:

1. Austria
2. Russia
3. Sweden-Norway
4. Great Britain (United Kingdom)
5. Ottoman Empire
----------------------------------------------------------------------------------------------------
Response 5:
The Fifth Coalition was formed against Napoleon's French Empire in 1809. The main countries that participated in this coalition were:

1. Austria
2. Britain (United Kingdom)
3. Russia
4. Sweden-Norway
5. Prussia

These countries came together to oppose Napoleon's expansionist policies and ultimately contributed to the defeat of France at the Battle of Wagram in July 1809.
----------------------------------------------------------------------------------------------------
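To make this comparison less anecdotal, each answer can be graded against the known members. The sketch below is rough and assumes simple keyword matching is good enough. The alias lists are illustrative assumptions, not part of the repository, and would miss unusual spellings.

```python
import re
from typing import Dict, List, Set

# Ground truth from the Wikipedia page: the core members of the Fifth Coalition
MEMBERS = {"Austria", "United Kingdom"}

# Countries to look for, with a few spellings each (illustrative, not exhaustive)
ALIASES: Dict[str, List[str]] = {
    "Austria": ["austria", "austrian"],
    "United Kingdom": ["britain", "united kingdom", "british"],
    "Russia": ["russia", "russian"],
    "Prussia": ["prussia", "prussian"],
    "Sweden": ["sweden", "swedish"],
    "Ottoman Empire": ["ottoman"],
}


def countries_in(text: str) -> Set[str]:
    """Return the countries mentioned in text, using whole-word matching."""
    lowered = text.lower()
    # \b keeps "russia" from matching inside "prussia"
    return {
        country
        for country, names in ALIASES.items()
        if any(re.search(rf"\b{re.escape(n)}\b", lowered) for n in names)
    }


def grade(response: str) -> Dict[str, Set[str]]:
    """Split the countries named in a response into correct and incorrect ones."""
    found = countries_in(response)
    return {"correct": found & MEMBERS, "wrong": found - MEMBERS}


# Response 3 from the run above
print(grade("The fifth coalition against France consisted of Britain and Russia."))
# → {'correct': {'United Kingdom'}, 'wrong': {'Russia'}}
```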

Weaviate vector store

We now move on to setting up the vector database containing the internal documents (in our case, the Wikipedia page on the Napoleonic Wars).

The script populate_db.py fetches the Napoleonic Wars Wikipedia page and splits each section of the page into chunks using an instance of LangChain’s RecursiveCharacterTextSplitter. Because every section is split separately, no chunk spans two sections. Chunk boundaries therefore follow the section layout of the page, which should give more coherent chunks. Each chunk is then stored in the Weaviate database together with its section title and an embedding vector.
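The section-wise splitting idea can be sketched in plain Python. This is not populate_db.py. The function `split_recursive` is a simplified, hypothetical stand-in for LangChain’s RecursiveCharacterTextSplitter: it has no chunk overlap and drops separators at chunk boundaries. The function `chunk_sections` assumes the page has already been parsed into a mapping from section title to section text.

```python
from typing import Dict, List

# Try paragraph breaks first, then line breaks, then spaces, then single characters
SEPARATORS = ["\n\n", "\n", " ", ""]


def split_recursive(text: str, chunk_size: int, separators: List[str] = SEPARATORS) -> List[str]:
    """Split text into chunks of at most chunk_size characters, using the
    coarsest separator present and recursing on pieces that are still too long."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    # "" is always "in" text, so it serves as the last resort
    sep = next(s for s in separators if s in text)
    rest = separators[separators.index(sep) + 1:]
    pieces = list(text) if sep == "" else text.split(sep)

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Flush the chunk being built, then split the oversized piece more finely
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(split_recursive(piece, chunk_size, rest))
            continue
        # Greedily merge pieces while the chunk stays within chunk_size
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


def chunk_sections(sections: Dict[str, str], chunk_size: int = 500) -> List[dict]:
    """Chunk each section on its own so no chunk spans two sections,
    keeping the section title next to each chunk as metadata."""
    return [
        {"title": title, "text": chunk}
        for title, body in sections.items()
        for chunk in split_recursive(body, chunk_size)
    ]


sections = {
    "War of the Fifth Coalition, 1809": "The Fifth Coalition (1809) of Britain and Austria against France formed as Britain engaged in the Peninsular War in Spain and Portugal.",
    "Invasion of Russia, 1812": "The central issue for both Emperor Napoleon I and Tsar Alexander I was control over Poland.",
}
for chunk in chunk_sections(sections, chunk_size=80):
    print(chunk)
```

Keeping the title alongside each chunk is what later lets an answer be traced back to its section.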

from scripts import populate_db
populate_db.run()

Imported 155 chunks into the Napoleonic Wars collection

With the chunks added to the database, let’s confirm that they look as expected. For the first chunk returned by the collection iterator, we print the section title, the first 100 characters of the text and the first five elements of the embedding vector. Everything looks fine:

import weaviate

# Connect to the local Weaviate instance; the context manager closes the connection
with weaviate.connect_to_local() as client:
    collection = client.collections.get('napoleonic_wars')

    # Inspect only the first object the iterator yields (not necessarily insertion order)
    for item in collection.iterator(include_vector=True):
        print(f'Section title: {item.properties["title"]}')
        print(f'Chunk text: {item.properties["text"][0:100]} ...')
        print(f'Embedding vector: {item.vector["default"][0:5]} ...')
        break

Section title: Invasion of Russia, 1812
Chunk text: The central issue for both Emperor Napoleon I and Tsar Alexander I was control over Poland. Each wan ...
Embedding vector: [-0.00635263929143548, 0.03718530014157295, -0.15025673806667328, -0.012791609391570091, 0.0685529112815857] ...

RAG

With the vector database set up, we are ready to build our RAG agent. The code for the agent can be found under backend/rag. Let’s import it and give it a spin!

First let’s verify that the vector store is able to find relevant chunks:

from backend import rag

docs = rag.vector_store.similarity_search_with_score("Under the Napoleonic wars, which countries took part in the fifth coalition against France?")
print(f'Found {len(docs)} chunks:')
for doc, score in docs:
    print(f'Section title: {doc.metadata["title"]}')
    print(f'{doc.page_content[:100]} ...')
    print(f'Score: {score}')
    print('-' * 100)

Found 4 chunks:
Section title: War of the Fifth Coalition, 1809
The Fifth Coalition (1809) of Britain and Austria against France formed as Britain engaged in the Pe ...
Score: 1.0
----------------------------------------------------------------------------------------------------
Section title: War of the Fifth Coalition, 1809
On land, the Fifth Coalition attempted few extensive military endeavours. One, the Walcheren Expedit ...
Score: 0.7424584031105042
----------------------------------------------------------------------------------------------------
Section title: War of the Fifth Coalition, 1809
in French territory, many breaches of the Continental System occurred and the French Continental Sys ...
Score: 0.7252911329269409
----------------------------------------------------------------------------------------------------
Section title: War of the Fifth Coalition, 1809
the kingdoms of Denmark–Norway
the Kingdom of Spain (under Joseph Bonaparte, Napoleon's elder brothe ...
Score: 0.7061325311660767
----------------------------------------------------------------------------------------------------

This looks good! The first chunk, which has the highest score, contains the answer we are looking for.
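Embedding-based retrieval ranks chunks by how close their vectors are to the embedded query. The sketch below illustrates that ranking in pure Python with cosine similarity and toy vectors, and shows the principle only. The scores printed above come from Weaviate through LangChain and may be computed or normalized differently, so the numbers are not directly comparable. A top score of exactly 1.0 suggests some normalization.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunks: Dict[str, Sequence[float]], k: int = 4) -> List[Tuple[str, float]]:
    """Rank stored chunk vectors by cosine similarity to the query vector."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional vectors; real nomic-embed-text vectors have far more dimensions
chunks = {
    "fifth coalition": [0.9, 0.1, 0.0],
    "invasion of russia": [0.1, 0.9, 0.2],
    "continental system": [0.5, 0.4, 0.3],
}
print(top_k([1.0, 0.0, 0.0], chunks, k=2))
```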

Let’s now ask the Agent the same question to see if it does better than the raw model did above:

query = "Under the Napoleonic wars, which countries took part in the fifth coalition against France?"
message = {"messages": [{"role": "user", "content": query}]}

for event in rag.agent.stream(message, stream_mode="values"):
    event['messages'][-1].pretty_print()

================================ Human Message =================================

Under the Napoleonic wars, which countries took part in the fifth coalition against France?
================================== Ai Message ==================================
Tool Calls:
  retrieve_context (7669cd28-30c3-475f-b084-1130d8573b48)
 Call ID: 7669cd28-30c3-475f-b084-1130d8573b48
  Args:
    query: Fifth Coalition against France
================================= Tool Message =================================
Name: retrieve_context

Source: {'title': 'War of the Fifth Coalition, 1809'}
Content: The Fifth Coalition (1809) of Britain and Austria against France formed as Britain engaged in the Peninsular War in Spain and Portugal. The sea became a major theatre of war against Napoleon's allies. Austria, previously an ally of France, took the opportunity to attempt to restore its imperial territories in Germany as held prior to Austerlitz. During the time of the Fifth Coalition, the Royal Navy won a succession of victories in the French colonies. On land the major battles included Battles

Source: {'title': 'War of the Fifth Coalition, 1809'}
Content: On land, the Fifth Coalition attempted few extensive military endeavours. One, the Walcheren Expedition of 1809, involved a dual effort by the British Army and the Royal Navy to relieve Austrian forces under intense French pressure. It ended in disaster after the Army commander, John Pitt, 2nd Earl of Chatham, failed to capture the objective, the naval base of French-controlled Antwerp. For the most part of the years of the Fifth Coalition, British military operations on land (apart from the
================================== Ai Message ==================================

The countries that took part in the fifth coalition against France were:

* Britain
* Austria

These two countries formed an alliance to counter the French Empire's expansion and influence during the Napoleonic Wars. The war was fought in 1809, with various battles taking place across Europe, including the Walcheren Expedition, which ended in disaster for the British forces.

In the output above, we first see our question under “Human Message”. Under the first “Ai Message”, the agent calls the retrieve_context tool to search the vector store for “Fifth Coalition against France”. Under “Tool Message”, we see that the tool returned two chunks, the first of which contains the answer we are looking for. Finally, under the last “Ai Message”, the agent answers the question correctly: Britain and Austria. Success!
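The Tool Message above suggests that the retrieval tool serializes each chunk as a Source line (its metadata) followed by a Content line. This is what makes answers auditable: the LLM sees, and can cite, which section each piece of context came from. Below is a hypothetical sketch of such a formatter. The minimal `Doc` class stands in for LangChain’s Document, which also carries `page_content` and `metadata`. The actual retrieve_context in backend/rag may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Minimal stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def format_for_llm(docs: List[Doc]) -> str:
    """Serialize retrieved chunks so the LLM sees where each one came from."""
    # One "Source/Content" block per chunk, separated by a blank line
    return "\n\n".join(
        f"Source: {doc.metadata}\nContent: {doc.page_content}" for doc in docs
    )


docs = [
    Doc("The Fifth Coalition (1809) of Britain and Austria ...",
        {"title": "War of the Fifth Coalition, 1809"}),
    Doc("On land, the Fifth Coalition attempted few extensive military endeavours. ...",
        {"title": "War of the Fifth Coalition, 1809"}),
]
print(format_for_llm(docs))
```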

Web page chat UI

As the last step, we set up the web chat interface. Running make langgraph-run starts the backend, and make ui-setup && make ui-run sets up and starts the frontend. In the screenshot below, we see that the agent works with the chat UI and is able to handle follow-up questions:

How to take this to production

While this demo works, there are several things to consider before deploying this approach in a production environment:

An ingestion ELT pipeline might be needed, depending on the nature of the source documents. PDF files need a text extraction step, possibly with OCR for scanned documents.
Again depending on the source documents, more advanced chunking methods such as semantic chunking could be considered.
Retrieval quality could be improved with hybrid search combining BM25 keyword matching and embeddings
Reranking could be used for more precise chunk selection
Metadata filtering in the vector store, e.g. by topic or client
Monitoring using e.g. Langfuse
Authentication on both UI and backend
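The hybrid search point is easy to illustrate. Reciprocal rank fusion (RRF) is one common way to combine a BM25 ranking with an embedding ranking. The sketch below applies it with the conventional constant k = 60, and the chunk IDs are made up. Weaviate also offers built-in hybrid search with its own fusion methods, so this sketch only shows the idea.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of chunk IDs into a single ranking.

    Each chunk earns 1 / (k + rank) from every list it appears in (rank starts
    at 1), so chunks ranked well by both retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25 = ["c3", "c1", "c7"]   # hypothetical keyword (BM25) ranking
dense = ["c1", "c2", "c3"]  # hypothetical embedding ranking
print(reciprocal_rank_fusion([bm25, dense]))
# → ['c1', 'c3', 'c2', 'c7']
```

RRF uses only ranks, not raw scores, which avoids having to calibrate BM25 scores against cosine similarities.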

TLDR

We have demonstrated how to set up a local RAG agent that makes internal documents chattable. The raw model struggled to answer a question about the Napoleonic Wars, while the RAG agent, grounded in retrieved chunks, answered it correctly. While this works, there are still a number of things to consider before deploying it in production.

How I can help

Please reach out on LinkedIn if I can help with local RAG deployment or other ML/AI endeavors.