TIL - Structured RAG using typeagent-py

TIL
LLMs
RAG
Exploring the typeagent-py package for Structured RAG
Author

Viswa Kumar

Published

October 28, 2025

Important

TIL1 is a new series I’m starting to quickly document things I explore, inspired by Simon Willison’s blog.

I accidentally followed Guido van Rossum2 on X and bookmarked his “structured RAG is better than RAG” talk from the PyBay25 workshop. The following are my notes based on those slides.


The talk starts with the pros and cons of traditional/classic RAG versus structured RAG.

Traditional / Classic RAG

  • KB sources are chunked, converted into embeddings3, and stored in a vector database.
  • When the user asks a query, the query is also turned into an embedding (usually using the same embedding model / LLM), and a similarity search is performed between the question embedding and the document embeddings. There are multiple similarity measures, but the most common one I have encountered is cosine similarity.
  • Based on the similarity scores, the top-K chunks are retrieved and converted back to text, which is used as additional context for the LLM to answer. A typical instruction to the LLM would be Based on the retrieved context {retrieved text}, answer the user query {query}. (A minimal sketch of this pipeline follows the list.)
  • The talk claims that traditional RAG falls short on more contextual queries, e.g. queries like “who ate that?” or “Find the most sold product in Q3 and summarize its profit by region”.
  • This is because retrieval is based purely on embedding similarity, which captures topical closeness but not the structured relationships between entities that such queries depend on.
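
To make the flow concrete, here is a minimal sketch of that loop. This is my own illustration, not code from the talk; `embed()` is a placeholder standing in for a real embedding model call.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real pipeline would call an embedding model here;
    # a seeded random vector just keeps the sketch self-contained.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Ingestion: chunk the KB and store (chunk, embedding) pairs in the "vector DB".
chunks = [
    "Q3 sales: WidgetX was the top seller.",
    "Profit by region: APAC led in Q3.",
    "Notes from the team lunch.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Query: embed the question, rank chunks by cosine similarity, keep the top-K.
query = "Find the most sold product in Q3"
q_vec = embed(query)
top_k = sorted(index, key=lambda p: cosine_similarity(q_vec, p[1]), reverse=True)[:2]

context = "\n".join(chunk for chunk, _ in top_k)
prompt = f"Based on the retrieved context {context}, answer the user query {query}"
```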

Structured RAG

  • In structured RAG, the ingestion pipeline is a little more nuanced. Instead of simply ingesting the embeddings of the KB chunks, an LLM is employed to extract knowledge nuggets using NER4 techniques. LLMs are quite good at it, apparently.
  • These knowledge nuggets are richer in semantics and can be stored in more traditional datastores like relational DBs, along with further metadata. They argue this then becomes a DB indexing problem, which can be optimized with existing, proven techniques. It is now a computer science problem, they say!
  • In the querying stage, NER is once again employed to convert the user’s query into abstract entities, which are then used to query the pre-populated DB and retrieve the context. The next step is the same as in classic RAG, i.e. Based on the retrieved context {retrieved text}, answer the user query {query}. (A sketch of both stages follows this list.)
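
Here is how I picture those two stages. This is my own sketch, not typeagent-py’s actual API; `extract_entities()` stands in for the LLM NER call, and the schema is made up for illustration.

```python
import sqlite3

KNOWN = {"Q3": "time", "WidgetX": "product", "Alice": "person"}

def extract_entities(chunk: str) -> list[tuple[str, str]]:
    # Placeholder for the LLM NER step: a real version would prompt an LLM
    # for structured JSON nuggets; here we just scan for known entity names.
    return [(name, etype) for name, etype in KNOWN.items() if name in chunk]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nuggets (chunk_id INTEGER, entity TEXT, type TEXT, source_text TEXT)"
)
conn.execute("CREATE INDEX idx_entity ON nuggets(entity)")  # the "DB indexing problem"

# Ingestion: store extracted nuggets in a relational DB instead of a vector store.
chunks = [
    "WidgetX topped Q3 sales across all regions.",
    "Profit by region for WidgetX grew steadily.",
]
for i, chunk in enumerate(chunks):
    for entity, etype in extract_entities(chunk):
        conn.execute("INSERT INTO nuggets VALUES (?, ?, ?, ?)", (i, entity, etype, chunk))

# Query stage: NER on the user query yields entities, which drive a plain SQL lookup.
query_entities = [e for e, _ in extract_entities("Find the most sold product in Q3")]
placeholders = ",".join("?" * len(query_entities))
rows = conn.execute(
    f"SELECT DISTINCT source_text FROM nuggets WHERE entity IN ({placeholders})",
    query_entities,
).fetchall()
context = "\n".join(r[0] for r in rows)
```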
Things I’m not sure about!

Even though the theory portion makes sense, I couldn’t find the corresponding implementation in Microsoft’s typeagent-py repo. Based on my brief reading, I still find references to classic RAG where embeddings are used. Need to dig further. Pplx is not helping much here.

As per the README, it is simply a Python port of TypeAgent KnowPro, which is a TypeScript implementation of the structured RAG concept by Microsoft. The TypeAgent architecture document provides more insight into this framework.

The distilled info from these docs, to me, is simply the following:

  • Structured RAG is being posed as a better framework for conversational use cases, where indirect references are recalled better than with classic RAG.
  • It works by extracting metadata such as timestamps, speakers, etc. from conversations / podcast transcripts (which are, in turn, conversations) and putting it in an indexed, DB-backed data store.
  • An LLM is then used to simply translate natural-language input into structured query input, and vice versa (see the sketch after the tip below).
Tip

In a nutshell, instead of using the LLM as an embedding translator, they are using it as a natural-language-to-SQL translator??
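
If that reading is right, the query side might look something like this. A rough sketch under my own assumptions: the `messages` schema and the `llm()` helper are hypothetical, not typeagent-py’s real interface.

```python
# Hypothetical schema for an indexed conversation store (not typeagent-py's real one).
SCHEMA = "messages(timestamp TEXT, speaker TEXT, text TEXT)"

def llm(prompt: str) -> str:
    # Placeholder: forward the prompt to whatever chat-completion API you use.
    raise NotImplementedError("wire this up to your LLM provider")

def nl_to_sql(question: str) -> str:
    # The LLM acts as a translator from natural language to a structured query.
    prompt = (
        f"Given the SQLite table {SCHEMA}, translate the question below into "
        f"a single SQL query. Return only the SQL.\nQuestion: {question}"
    )
    return llm(prompt)

# nl_to_sql("What did Alice say about the budget?") might come back as:
#   SELECT text FROM messages
#   WHERE speaker = 'Alice' AND text LIKE '%budget%'
#   ORDER BY timestamp;
```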

My gut feeling: probably over-engineered, but still a good project to explore. Definitely on my watchlist, but not in my toolbelt5.


Subscribe to Techno Adventure Newsletter

I also publish a newsletter where I share my techno adventures at the intersection of Telecom, AI/ML, SW Engineering, and Distributed Systems. If you’d like my posts delivered directly to your inbox whenever I publish, consider subscribing to my Substack.

I pinky promise 🤙🏻. I won’t sell your emails!

Footnotes

  1. For the uninitiated, TIL is the short form for Today I Learnt ↩︎

  2. The inventor of Python ↩︎

  3. Either by an embedding model or by an LLM ↩︎

  4. Named Entity Recognition ↩︎

  5. Skill set; one’s accumulated capabilities. ↩︎