lab-02-rag-retriever
Rag Retriever
README
# lab-02-rag-retriever — RAG arc Retrieval without a vector database is still retrieval. You wire a toy corpus, score it with deterministic token overlap, simulate a reranker that breaks ties like grown-ups (explicit rules, not vibes), and return an answer that carries **citations** your validator can audit. This lab stays offline: no API keys, no embeddings download, no "it worked on my laptop" network flake. ## Run ```bash cd labs/langchain/lab-02-rag-retriever uv run python src/main.py ``` ## Test ```bash uv run pytest tests/ ``` ## Stretch goals - Swap overlap for BM25-ish scoring using only stdlib math. - Add a `context_budget` that trims quotes before answering.
pyproject.toml
[project] name = "lab-02-rag-retriever" version = "0.1.0" requires-python = ">=3.11" dependencies = [] [project.optional-dependencies] dev = ["pytest>=8.0"] [tool.uv] dev-dependencies = []
Starter Python
"""Demo offline RAG lab."""
from __future__ import annotations
import sys
from pathlib import Path
_SRC = Path(__file__).resolve().parent
if str(_SRC) not in sys.path:
sys.path.insert(0, str(_SRC))
from rag_pipeline import answer_with_citations # noqa: E402
def main() -> None:
for q in (
"titanium widget weight",
"PTO approval policy",
"what is Runnable invoke",
):
print(q, "->", answer_with_citations(q))
if __name__ == "__main__":
main()