MRLBA: Natural Language Database Queries
Making Data Queryable Without SQL
This project bridges the gap between raw data and insight. Instead of requiring SQL knowledge, users ask questions in plain English and get instant answers, with no database expertise needed.
The Challenge
Large CSV datasets (1.7GB+) are hard to query without technical skills: SQL is a barrier, and most existing solutions depend on cloud services or expensive APIs. I built a system that keeps everything local and makes data exploration trivial.
The Solution: Text-to-SQL RAG
The system converts natural language questions into SQL queries, executes them against a local database, and synthesizes the results into readable answers. All processing happens on your machine; no data leaves it.
Pipeline: user question → intent classification (irrelevant questions rejected) → SQL generation via LLM → execution on SQLite → query results → answer synthesis via LLM → natural-language response.
Architecture
Data Layer: SQLite database created once from CSV. Pandas handles chunked ingestion (100K rows at a time) to manage memory efficiently.
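A minimal sketch of that ingestion step, assuming pandas' chunked CSV reader; the file name `data.csv`, database name `data.db`, and table name `records` are illustrative, not the project's actual identifiers:

```python
# Sketch of the one-time CSV -> SQLite ingestion step.
# File, database, and table names here are illustrative assumptions.
import sqlite3

import pandas as pd

def ingest(csv_path: str, db_path: str, table: str = "records") -> None:
    conn = sqlite3.connect(db_path)
    # Stream the CSV in 100K-row chunks so a 1.7GB+ file never has
    # to fit in memory all at once.
    for chunk in pd.read_csv(csv_path, chunksize=100_000):
        chunk.to_sql(table, conn, if_exists="append", index=False)
    conn.close()

if __name__ == "__main__":
    ingest("data.csv", "data.db")
```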
Query Engine: Two-stage LLM pipeline. First stage classifies intent (is this a database question?). Second stage generates SQL and synthesizes results.
Local LLM: Ollama + Llama 3.1 runs entirely locally. No cloud API calls, no rate limits, no costs per query.
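A condensed sketch of that two-stage pipeline using the `ollama` Python client. The prompts and helper names are assumptions, not the project's actual code; `_run_sql` is the hypothetical executor sketched under Technical Depth below:

```python
# Two-stage pipeline sketch: classify intent first, then generate SQL
# and synthesize an answer. Prompts here are illustrative.
import ollama

def llm(prompt: str) -> str:
    resp = ollama.chat(model="llama3.1",
                       messages=[{"role": "user", "content": prompt}])
    return resp["message"]["content"].strip()

def answer(question: str, schema: str) -> str:
    # Stage 1: intent classification. Bail out before generating SQL
    # if the question isn't about the database at all.
    verdict = llm(f"Can this question be answered from this schema?\n"
                  f"{schema}\nQuestion: {question}\nAnswer YES or NO.")
    if not verdict.upper().startswith("YES"):
        return "Sorry, that question isn't about this dataset."
    # Stage 2: generate SQL, execute it, then synthesize a readable answer
    # grounded in the actual rows returned.
    sql = llm(f"Schema:\n{schema}\n"
              f"Write one SQLite query answering: {question}\n"
              f"Return only the SQL.")
    rows = _run_sql(sql)  # hypothetical executor, sketched below
    return llm(f"Question: {question}\nSQL results: {rows}\n"
               f"Answer in plain English using only these results.")
```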
Example Queries
Users can ask questions like:
- "How many records match these criteria?"
- "What's the average value across the dataset?"
- "Show me entries with X greater than Y"
- "Which items are in location Z?"
Why This Matters
- No SQL needed - anyone can query the database in natural language.
- Private and fast - all processing is local, no data is sent to APIs, and results are instant.
- Grounded answers - the LLM synthesizes responses from actual query results, not hallucinations.
- Simple codebase - two Python scripts handle everything: ingestion and querying.
Technical Depth
The system handles SQL generation errors gracefully, validates questions before execution, and manages large result sets efficiently. It demonstrates understanding of LLM limitations: classifying intent before generation prevents wasted queries and hallucinated SQL.
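A sketch of what such guardrails might look like; the read-only URI mode, the `MAX_ROWS` cap, and returning the error as text are assumptions about the approach, not confirmed implementation details:

```python
# Defensive query execution sketch: read-only connection and a row cap.
import sqlite3

MAX_ROWS = 200  # cap rows handed back to the LLM to protect its context window

def _run_sql(sql: str, db_path: str = "data.db"):
    # Open read-only so generated SQL can't modify the data.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        cur = conn.execute(sql)
        return cur.fetchmany(MAX_ROWS)
    except sqlite3.Error as exc:
        # Surface the error as text so the caller can report it (or feed
        # it back to the LLM for a corrected query) instead of crashing.
        return f"SQL error: {exc}"
    finally:
        conn.close()
```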
What This Shows
This project proves I can build practical RAG systems that ground LLMs in real data. It's not just prompt engineering; it's architectural thinking about how to make AI systems reliable and useful at scale.