Project Summary
Fully local RAG with FAISS indexing, LangChain chunking, GPT-2 generation, and Streamlit UI — no cloud API keys. Conceptual ancestor of the DocuMind production stack.
Technical deep dive
This repository implements a fully local Retrieval-Augmented Generation (RAG) pipeline using LangChain, FAISS vector search, and a GPT-2 language model — with no external API keys required. It predates my production DocuMind stack but captures the same core insight: grounding answers in retrieved context beats unconstrained generation for factual tasks. The Streamlit UI makes the retrieval-and-generation loop visible, which is ideal for demos, teaching, and SEO around terms like local RAG pipeline, FAISS LangChain tutorial, and offline LLM question answering.
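The grounding idea described above can be sketched as a prompt builder that places retrieved chunks ahead of the question. This is a minimal illustration, not the repository's code: `build_prompt`, the prompt wording, and the `max_chars` budget are all hypothetical. GPT-2 has a 1024-token context window, so some budget is needed; a character count is used here only as a crude stand-in for a token count.

```python
def build_prompt(question, chunks, max_chars=2000):
    """Place retrieved chunks ahead of the question, within a character budget.

    Hypothetical helper: the name, wording, and budget are illustrative only.
    """
    header = "Answer the question using only the context below.\n\nContext:\n"
    footer = f"\n\nQuestion: {question}\nAnswer:"
    budget = max_chars - len(header) - len(footer)

    kept = []
    for chunk in chunks:  # chunks are assumed to arrive best match first
        piece = f"- {chunk.strip()}\n"
        if len(piece) > budget:
            break  # stop at the first chunk that no longer fits
        kept.append(piece)
        budget -= len(piece)

    return header + "".join(kept).rstrip("\n") + footer


# Toy retrieved chunks; the last one is too long to fit and is dropped.
chunks = [
    "FAISS stores vectors in memory.",
    "Streamlit renders the UI.",
    "x" * 5000,
]
prompt = build_prompt("Where does FAISS store vectors?", chunks, max_chars=300)
```

The resulting string would be passed to the local model as-is. Dropping chunks that do not fit, rather than truncating them mid-sentence, is one possible design choice among several.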
Architecture overview
Key design decisions
- FAISS in-memory vector index for fast cosine similarity without a vector database server
- LangChain document loaders and text splitters for chunking with configurable overlap
- GPT-2 as a lightweight local LLM — no OpenAI or cloud inference dependency
- Streamlit frontend exposing query input, retrieved sources, and generated response
- End-to-end runnable on a laptop, with inference on a modest GPU or on CPU alone
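To make "chunking with configurable overlap" concrete, here is a minimal sketch using fixed-width character windows. It is a simplified stand-in, not the repository's code: LangChain splitters such as `RecursiveCharacterTextSplitter` expose `chunk_size` and `chunk_overlap` parameters but also try to break on separators like paragraphs and sentences, which this sketch does not attempt. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text, chunk_size=200, chunk_overlap=40):
    """Split text into fixed-width windows that share chunk_overlap characters.

    Hypothetical helper: a simplified stand-in for a library text splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Each chunk repeats the last two characters of the previous one.
pieces = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
```

Overlap trades index size for recall: a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.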
When to use this vs DocuMind
| Criteria | RAG Streamlit (this repo) | DocuMind (production) |
|---|---|---|
| Vector store | FAISS in-memory | ChromaDB with persistent collections |
| LLM backend | GPT-2 local | Ollama (llama3, embeddings) |
| API | Streamlit only | FastAPI + Next.js UI |
| Citations | Basic source display | Structured SourceCitation objects |
| Best for | Learning, quick demos | Production RAG reference |
Setup
git clone https://github.com/cdtalley/rag-streamlit-langchain
cd rag-streamlit-langchain
pip install -r requirements.txt
streamlit run app.py
Key Features & Capabilities
- FAISS in-memory vector index for cosine similarity search
- LangChain document loaders and configurable text splitters
- GPT-2 local generation conditioned on retrieved context
- Streamlit UI exposing query, sources, and generated answers
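The cosine similarity search in the list above can be illustrated without any dependencies. FAISS flat indexes rank by L2 distance or inner product, so cosine similarity is usually obtained by normalizing vectors to unit length and then taking the inner product. The sketch below does the same thing in plain Python. The helper names and the toy 3-dimensional vectors are hypothetical, and the index type the repository actually uses is not stated here.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)  # leave an all-zero vector unchanged
    return [x / norm for x in vec]


def top_k(query, vectors, k=2):
    """Return (index, score) pairs for the k most similar vectors, best first."""
    q = normalize(query)
    scored = []
    for i, v in enumerate(vectors):
        score = sum(a * b for a, b in zip(q, normalize(v)))
        scored.append((i, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings"; real ones would come from an embedding model.
chunk_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
results = top_k([1.0, 0.0, 0.0], chunk_vectors, k=2)
```

The returned indices point back into the list of chunks, which is how the UI can show the retrieved sources next to the generated answer. This brute-force scan is linear in the number of chunks, which is comparable to what an exact flat index does.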
Tech Stack & Components
Getting Started
1. Run locally
Install dependencies and launch Streamlit.
git clone https://github.com/cdtalley/rag-streamlit-langchain
cd rag-streamlit-langchain
pip install -r requirements.txt
streamlit run app.py
Frequently asked questions
- Does this RAG project require OpenAI or cloud APIs?
- No. It uses FAISS for vector search and GPT-2 for local text generation. The entire pipeline runs offline after initial model download.
- How is this different from DocuMind?
- This is a lightweight learning/demo stack with FAISS and Streamlit. DocuMind is a production reference with ChromaDB, Ollama embeddings, FastAPI, citation grounding, and dual library collections — documented at draketalley.ai/blog/documind-local-first-rag-platform.
- What documents can I index?
- Plain text and common document formats supported by LangChain loaders. Chunk size and overlap are configurable in the notebook/app configuration.
- Can I swap GPT-2 for a larger model?
- Yes — replace the LangChain LLM wrapper with any Hugging Face or local model compatible with your hardware. DocuMind demonstrates the Ollama-based production pattern.
