Retrieval‑Augmented Generation (RAG) has quickly become one of the most practical ways to build AI systems that work with real‑world data instead of relying only on a model’s training knowledge.
Most RAG examples today are written in Python and often depend on frameworks like LangChain. Those frameworks are powerful, but they leave a gap for developers working primarily in .NET and C#.
In this post, I’ll walk through a .NET Console application I built that implements a complete RAG pipeline using GitHub Models for embeddings and chat completion.
What This Project Does
At a high level, this application allows you to:
- Provide a website URL
- Automatically scrape and clean the page content
- Convert the content into embeddings
- Store embeddings in an in‑memory vector store
- Ask questions about the website
- Get accurate, context‑aware answers generated by an AI model
All of this runs inside a .NET Console App, making it easy to debug, extend, and later expose as an API.
Architecture Overview
The application follows a simple but effective RAG architecture: scrape → chunk → embed → store → retrieve → generate.
Each step is implemented as a separate service, making the system easy to test and reuse.
How the RAG Pipeline Works
Web Scraping
The application fetches the website HTML and extracts visible content from tags such as p, h1–h6, li, span, and div.
Script and style tags are removed to reduce noise.
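The cleaning step can be sketched as follows. This is a simplified, regex-based illustration, not necessarily how the project does it (a real HTML parser such as HtmlAgilityPack would be more robust):

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlCleaner
{
    // Remove script/style blocks, strip remaining tags, and collapse whitespace.
    public static string ExtractVisibleText(string html)
    {
        // Drop <script> and <style> blocks entirely — they are noise, not content.
        string noScripts = Regex.Replace(html, @"<(script|style)[^>]*>.*?</\1>",
            " ", RegexOptions.Singleline | RegexOptions.IgnoreCase);

        // Strip all remaining tags, keeping the text between them.
        string noTags = Regex.Replace(noScripts, @"<[^>]+>", " ");

        // Decode HTML entities (&amp;, &nbsp;, ...) and normalize whitespace.
        string decoded = WebUtility.HtmlDecode(noTags);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```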
Text Chunking
The cleaned text is split into overlapping chunks. This improves retrieval accuracy and mirrors how modern RAG systems operate. For example: a chunk size of 300 characters with an overlap of 50 characters.
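A minimal sketch of the chunking logic, using the example numbers above (the project's actual implementation may differ in detail):

```csharp
using System;
using System.Collections.Generic;

public static class TextChunker
{
    // Split text into fixed-size chunks where each chunk repeats the last
    // `overlap` characters of the previous one, so a sentence cut at a
    // boundary still appears intact in at least one chunk.
    public static List<string> Chunk(string text, int chunkSize = 300, int overlap = 50)
    {
        if (overlap >= chunkSize)
            throw new ArgumentException("Overlap must be smaller than chunk size.");

        var chunks = new List<string>();
        int step = chunkSize - overlap; // advance 250 chars per chunk in the example
        for (int start = 0; start < text.Length; start += step)
        {
            int length = Math.Min(chunkSize, text.Length - start);
            chunks.Add(text.Substring(start, length));
            if (start + length >= text.Length) break; // last chunk reached the end
        }
        return chunks;
    }
}
```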
Embeddings with GitHub Models
Each chunk is converted into a vector embedding using GitHub Models, allowing the system to perform semantic similarity searches.
The same embedding model is used for:
- Website content
- User questions
This ensures meaningful vector comparisons.
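GitHub Models exposes an OpenAI-compatible REST API authenticated with a GitHub token. A minimal sketch of an embedding call might look like this — the endpoint URL and the model name are assumptions for illustration, so verify them against the current GitHub Models documentation:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class EmbeddingClient
{
    // Assumed OpenAI-compatible GitHub Models endpoint — confirm before use.
    private const string Endpoint = "https://models.inference.ai.azure.com/embeddings";

    // Build the JSON body for an embeddings request.
    public static string BuildRequestBody(string input, string model = "text-embedding-3-small")
        => JsonSerializer.Serialize(new { input, model });

    // Send one chunk and return its embedding vector.
    public static async Task<float[]> EmbedAsync(HttpClient http, string githubToken, string chunk)
    {
        using var request = new HttpRequestMessage(HttpMethod.Post, Endpoint)
        {
            Content = new StringContent(BuildRequestBody(chunk), Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", githubToken);

        using var response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();

        // OpenAI-compatible response shape: { "data": [ { "embedding": [...] } ] }
        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        var values = doc.RootElement.GetProperty("data")[0].GetProperty("embedding");
        var vector = new float[values.GetArrayLength()];
        int i = 0;
        foreach (var v in values.EnumerateArray()) vector[i++] = v.GetSingle();
        return vector;
    }
}
```

The same `EmbedAsync` call is used for both content chunks and user questions, which is what keeps the vector comparisons meaningful.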
Vector Similarity Search
When a question is asked:
- The query is embedded
- Cosine similarity is calculated against all stored vectors
- The top K most relevant chunks are selected
This acts as the knowledge retrieval step.
Prompt Construction
The final prompt sent to the chat model contains:
- Retrieved context chunks
- Conversation history
- The user’s question
This keeps responses accurate, grounded, and conversational.
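Assembling those three parts into a single prompt might look like the following sketch (the exact wording and message layout are illustrative, not the project's verbatim prompt):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PromptBuilder
{
    // Assemble the grounded prompt: retrieved context first, then the
    // running conversation, then the new question. The model is told to
    // admit when the context does not contain the answer.
    public static string Build(
        IEnumerable<string> contextChunks,
        IEnumerable<(string Role, string Content)> history,
        string question)
    {
        var sb = new StringBuilder();
        sb.AppendLine("Answer using ONLY the context below.");
        sb.AppendLine("If the context does not contain the answer, say so clearly.");
        sb.AppendLine();
        sb.AppendLine("Context:");
        foreach (var chunk in contextChunks)
            sb.AppendLine($"- {chunk}");
        sb.AppendLine();
        foreach (var (role, content) in history)
            sb.AppendLine($"{role}: {content}");
        sb.AppendLine($"User: {question}");
        return sb.ToString();
    }
}
```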
Chat Completion
The prompt is sent to a GitHub Models chat model, which generates the final answer. If the context does not contain relevant information, the model is instructed to clearly say so.
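The completion call mirrors the embedding call, only the endpoint and response shape differ. As before, the endpoint URL and model name are assumptions — check them against the GitHub Models catalog:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class ChatClient
{
    // Assumed OpenAI-compatible GitHub Models endpoint — confirm before use.
    private const string Endpoint = "https://models.inference.ai.azure.com/chat/completions";

    // Build the JSON body; "gpt-4o-mini" is a placeholder model name.
    public static string BuildBody(string prompt, string model = "gpt-4o-mini")
        => JsonSerializer.Serialize(new
        {
            model,
            messages = new[] { new { role = "user", content = prompt } }
        });

    // Send the grounded prompt and return the generated answer.
    public static async Task<string> CompleteAsync(HttpClient http, string githubToken, string prompt)
    {
        using var request = new HttpRequestMessage(HttpMethod.Post, Endpoint)
        {
            Content = new StringContent(BuildBody(prompt), Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", githubToken);

        using var response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();

        // OpenAI-compatible response shape: { "choices": [ { "message": { "content": "..." } } ] }
        using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
        return doc.RootElement.GetProperty("choices")[0]
                  .GetProperty("message").GetProperty("content").GetString()!;
    }
}
```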
Running the Application
```
dotnet restore
dotnet run
```
Console Commands
- Enter a website URL to initialize the RAG pipeline
- Ask questions about the website content
- Type new to load another website
- Type quit to exit
Check out the full source code on GitHub:
Happy coding!! 😊
