Master's Thesis Architecture

R&D on LLM-based document processing and reasoning systems using GraphRAG.

Role

AI Researcher & Engineer

Stack

Python, FastAPI, Neo4j, LangChain

Focus

Graph Knowledge Retrieval

Problem Statement

Traditional vector-based RAG systems often struggle with cross-document reasoning and with complex relationships between entities spread across large knowledge bases. The thesis addressed this limitation of flat vector search by adding structural context from a knowledge graph.

The Architecture (GraphRAG)

I designed a system that extracts entities and their relationships from unstructured text, storing them in a Neo4j Knowledge Graph while simultaneously maintaining vector embeddings for semantic search.
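As a rough illustration of the graph-write side, the sketch below turns one extracted relationship into a parameterized Cypher upsert. The `Triple` type, the `Entity` label, the `RELATED_TO` relationship and its `type` and `chunk_id` properties are all assumptions made for this example, not the schema used in the thesis. The relationship name is stored as a property because Cypher does not allow relationship types to be passed as query parameters.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One extracted relationship (hypothetical shape, for illustration only)."""
    head: str
    relation: str
    tail: str
    chunk_id: str  # id of the text chunk the triple was extracted from


# Assumed schema: (:Entity {name}) nodes joined by [:RELATED_TO {type, chunk_id}] edges.
# MERGE keeps the write idempotent, so re-ingesting a document does not duplicate nodes.
UPSERT_TRIPLE = """
MERGE (h:Entity {name: $head})
MERGE (t:Entity {name: $tail})
MERGE (h)-[r:RELATED_TO {type: $relation}]->(t)
SET r.chunk_id = $chunk_id
"""


def to_cypher(triple: Triple) -> tuple[str, dict[str, str]]:
    """Return a parameterized Cypher statement and its parameters for one triple."""
    return UPSERT_TRIPLE.strip(), triple._asdict()


query, params = to_cypher(Triple("Acme", "ACQUIRED", "Beta Labs", "c1"))
```

With the official Neo4j Python driver, the returned pair could be passed to `session.run(query, params)`. Chunk embeddings would be stored alongside, keyed by the same `chunk_id`, so that a vector hit can be linked back to its entities.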

Hybrid Retrieval

Combines Cypher queries for exact relational traversal with vector similarity search for semantic matching.
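The idea can be sketched with in-memory stand-ins for both stores. Everything below is illustrative: the `CHUNKS` and `EDGES` data, the three-dimensional vectors, and the function names are invented for the example. In the real system the `expand` step would be a Cypher traversal against Neo4j (for instance a variable-length pattern), and the vectors would come from an embedding model.

```python
import math

# Toy stand-ins for the vector store and the knowledge graph (made-up data).
CHUNKS = {
    "c1": {"text": "Acme acquired Beta Labs in 2019.", "vec": [1.0, 0.0, 0.0],
           "entities": ["Acme", "Beta Labs"]},
    "c2": {"text": "Beta Labs was founded by Dana Cole.", "vec": [0.0, 1.0, 0.0],
           "entities": ["Beta Labs", "Dana Cole"]},
    "c3": {"text": "Quarterly weather summary.", "vec": [0.0, 0.0, 1.0],
           "entities": []},
}
EDGES = [("Acme", "ACQUIRED", "Beta Labs"), ("Beta Labs", "FOUNDED_BY", "Dana Cole")]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, k=1):
    """Semantic step: ids of the k chunks most similar to the query vector."""
    ranked = sorted(CHUNKS, key=lambda cid: cosine(query_vec, CHUNKS[cid]["vec"]),
                    reverse=True)
    return ranked[:k]


def expand(entities, hops=1):
    """Relational step: edges reachable within `hops` of the seed entities."""
    seen, frontier, facts = set(entities), set(entities), []
    for _ in range(hops):
        reached = set()
        for head, rel, tail in EDGES:
            touches_frontier = head in frontier or tail in frontier
            if touches_frontier and (head, rel, tail) not in facts:
                facts.append((head, rel, tail))
                reached.update({head, tail} - seen)
        seen |= reached
        frontier = reached
    return facts


def hybrid_retrieve(query_vec, k=1, hops=2):
    """Use vector hits as entry points, then follow graph edges from their entities."""
    chunk_ids = vector_search(query_vec, k)
    seeds = [e for cid in chunk_ids for e in CHUNKS[cid]["entities"]]
    return {"chunks": chunk_ids, "facts": expand(seeds, hops)}
```

Here a query vector close to chunk `c1` retrieves that chunk alone by similarity, but the graph expansion also surfaces the `FOUNDED_BY` edge that is only stated in `c2`. That is the kind of multi-hop link a flat vector search would miss.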

Ingestion Pipeline

Automated chunking, NER (Named Entity Recognition), and relationship extraction using LLMs.
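A minimal sketch of such a pipeline follows, assuming character-window chunking and an LLM that is asked to answer in JSON. The prompt text, the window sizes, and the `llm` callable (any function that takes a prompt string and returns the model's reply) are assumptions for this example. The actual chunking strategy and prompts used in the thesis are not specified here.

```python
import json
from typing import Callable

# Hypothetical extraction prompt; a real one would likely carry examples and a schema.
PROMPT = (
    "Extract entities and relationships from the text below. "
    'Reply with a JSON list of objects with keys "head", "relation", "tail".\n\n'
)


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def extract_triples(chunk: str, llm: Callable[[str], str]) -> list[tuple[str, str, str]]:
    """Ask the model for triples and keep only the well-formed ones."""
    try:
        items = json.loads(llm(PROMPT + chunk))
    except json.JSONDecodeError:
        return []  # Malformed model output: skip the chunk rather than fail the run.
    if not isinstance(items, list):
        return []
    keys = ("head", "relation", "tail")
    return [
        (item["head"], item["relation"], item["tail"])
        for item in items
        if isinstance(item, dict) and all(isinstance(item.get(k), str) for k in keys)
    ]


def ingest(text: str, llm: Callable[[str], str], size: int = 200, overlap: int = 50):
    """Chunk the text, extract triples per chunk, and drop exact duplicates."""
    triples = []
    for chunk in chunk_text(text, size, overlap):
        for triple in extract_triples(chunk, llm):
            if triple not in triples:
                triples.append(triple)
    return triples
```

Passing the model in as a plain callable keeps the pipeline modular: the same code runs against a stub in tests and against a LangChain or raw API client in production. Validating the JSON before use matters because LLM output is not guaranteed to follow the requested format.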

Results & Impact

Achieved a significant improvement on multi-hop reasoning questions over naive RAG baselines.
Reduced hallucinations by anchoring answers in explicit graph relationships.
Created a scalable, modular ingestion pipeline that can be adapted for enterprise use cases.