lab-02-rag-retriever

Rag Retriever

README

# lab-02-rag-retriever — RAG arc

Retrieval without a vector database is still retrieval. You wire a toy corpus, score it with deterministic token overlap, simulate a reranker that breaks ties like grown-ups (explicit rules, not vibes), and return an answer that carries **citations** your validator can audit.

This lab stays offline: no API keys, no embeddings download, no "it worked on my laptop" network flake.

## Run

```bash
cd labs/langchain/lab-02-rag-retriever
uv run python src/main.py
```

## Test

```bash
uv run pytest tests/
```

## Stretch goals

- Swap overlap for BM25-ish scoring using only stdlib math.
- Add a `context_budget` that trims quotes before answering.

pyproject.toml

[project]
name = "lab-02-rag-retriever"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = []

[project.optional-dependencies]
dev = ["pytest>=8.0"]

[tool.uv]
dev-dependencies = []

Starter Python

"""Demo offline RAG lab."""

from __future__ import annotations

import sys
from pathlib import Path

_SRC = Path(__file__).resolve().parent
if str(_SRC) not in sys.path:
    sys.path.insert(0, str(_SRC))

from rag_pipeline import answer_with_citations  # noqa: E402


def main() -> None:
    for q in (
        "titanium widget weight",
        "PTO approval policy",
        "what is Runnable invoke",
    ):
        print(q, "->", answer_with_citations(q))


if __name__ == "__main__":
    main()