Tuesday, March 3, 2026

Building a Web Scraping RAG Pipeline in .NET Using GitHub Models

Retrieval‑Augmented Generation (RAG) has quickly become one of the most practical ways to build AI systems that work with real‑world data instead of relying only on a model’s training knowledge.

Most RAG examples today are written in Python and often depend on frameworks like LangChain. While powerful, this leaves a gap for developers working primarily in .NET and C#.

In this post, I’ll walk through a .NET Console application I built that implements a complete RAG pipeline using GitHub Models for embeddings and chat completion.

What This Project Does

At a high level, this application allows you to:

  1. Provide a website URL
  2. Automatically scrape and clean the page content
  3. Convert the content into embeddings
  4. Store embeddings in an in‑memory vector store
  5. Ask questions about the website
  6. Get accurate, context‑aware answers generated by an AI model

All of this runs inside a .NET Console App, making it easy to debug, extend, and later expose as an API.

Architecture Overview

The application follows a simple but effective RAG architecture:

[Diagram: Project RAG architecture]

Each step is implemented as a separate service, making the system easy to test and reuse.

How the RAG Pipeline Works

  1. Web Scraping

    The application fetches the website HTML and extracts visible content from tags such as p, h1–h6, li, span, and div.

    Script and style tags are removed to reduce noise.
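
The extraction step above can be sketched as follows. The post doesn't name the HTML parser used, so this assumes HtmlAgilityPack (a common NuGet choice for .NET scraping); the sample HTML and variable names are illustrative.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet: HtmlAgilityPack — assumed parser, not confirmed by the post

// A hardcoded page stands in for the fetched HTML (normally HttpClient.GetStringAsync(url)).
var html = "<html><body><script>var x=1;</script>" +
           "<h1>Title</h1><p>Hello world.</p><li>Item</li></body></html>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// Remove script/style tags first to reduce noise.
foreach (var noise in doc.DocumentNode.Descendants()
         .Where(n => n.Name == "script" || n.Name == "style").ToList())
    noise.Remove();

// Extract visible text from the content tags listed above.
var tags = new[] { "p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "span", "div" };
var text = string.Join(" ",
    doc.DocumentNode.Descendants()
       .Where(n => tags.Contains(n.Name))
       .Select(n => n.InnerText.Trim())
       .Where(t => t.Length > 0));

Console.WriteLine(text); // "Title Hello world. Item"
```

Note that nested containers like div and span can duplicate text through InnerText; the real service presumably de-duplicates or restricts the tag set.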

  2. Text Chunking

    The cleaned text is split into overlapping chunks. This improves retrieval accuracy and mirrors how modern RAG systems operate. Example: a chunk size of 300 characters with an overlap of 50 characters.
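
A minimal chunker using those sizes might look like this (the method name is illustrative, not taken from the repo):

```csharp
using System;
using System.Collections.Generic;

// Split cleaned text into overlapping chunks (300-char chunks, 50-char overlap).
static List<string> ChunkText(string text, int chunkSize = 300, int overlap = 50)
{
    var chunks = new List<string>();
    if (string.IsNullOrEmpty(text)) return chunks;

    int step = chunkSize - overlap;                 // advance window by size minus overlap
    for (int start = 0; start < text.Length; start += step)
    {
        int length = Math.Min(chunkSize, text.Length - start);
        chunks.Add(text.Substring(start, length));
        if (start + length >= text.Length) break;   // final (possibly shorter) chunk
    }
    return chunks;
}

var chunks = ChunkText(new string('a', 700));
Console.WriteLine(chunks.Count); // windows start at 0, 250, 500 -> prints 3
```

The 50-character overlap means a sentence cut at a chunk boundary still appears whole in the neighboring chunk, which is what helps retrieval.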

  3. Embeddings with GitHub Models

    Each chunk is converted into a vector embedding using GitHub Models, allowing the system to perform semantic similarity searches.

    The same embedding model is used for:

    • Website content
    • User questions

    This ensures meaningful vector comparisons.
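
As a sketch, embedding a chunk through GitHub Models' OpenAI-compatible REST API could look like this. The endpoint URL and model name here are assumptions — verify them against the current GitHub Models documentation — and a GITHUB_TOKEN with models access is required.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

var http = new HttpClient();
http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue(
    "Bearer", Environment.GetEnvironmentVariable("GITHUB_TOKEN"));

var payload = JsonSerializer.Serialize(new
{
    model = "text-embedding-3-small",   // assumed model name; reuse it for questions too
    input = new[] { "chunk text to embed" }
});

// Assumed GitHub Models inference endpoint (OpenAI-compatible).
var response = await http.PostAsync(
    "https://models.inference.ai.azure.com/embeddings",
    new StringContent(payload, Encoding.UTF8, "application/json"));
response.EnsureSuccessStatusCode();

using var json = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
float[] vector = json.RootElement.GetProperty("data")[0].GetProperty("embedding")
    .EnumerateArray().Select(e => e.GetSingle()).ToArray();

Console.WriteLine($"Embedding dimension: {vector.Length}");
```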

  4. Vector Similarity Search

    When a question is asked:

    • The query is embedded
    • Cosine similarity is calculated against all stored vectors
    • The top K most relevant chunks are selected

    This acts as the knowledge retrieval step.
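
The retrieval step reduces to cosine similarity plus a sort. Here is a self-contained sketch with toy 2-D vectors standing in for real embeddings (the store shape and names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot  += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB) + 1e-10); // epsilon avoids divide-by-zero
}

// Toy in-memory vector store: (chunk text, its embedding).
var store = new List<(string Chunk, float[] Vec)>
{
    ("about cats", new float[] { 1, 0 }),
    ("about dogs", new float[] { 0, 1 }),
    ("about both", new float[] { 1, 1 }),
};

float[] query = { 1, 0 };                        // the embedded question
var topK = store
    .Select(e => (e.Chunk, Score: CosineSimilarity(query, e.Vec)))
    .OrderByDescending(x => x.Score)
    .Take(2)                                     // K = 2
    .ToList();

foreach (var (chunk, score) in topK)
    Console.WriteLine($"{chunk} -> {score:F2}"); // highest-scoring chunks first
```

A linear scan like this is fine for a single scraped page; a dedicated vector database only becomes necessary at much larger scale.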

  5. Prompt Construction

    The final prompt sent to the chat model contains:

    • Retrieved context chunks
    • Conversation history
    • The user’s question

    This keeps responses accurate, grounded, and conversational.
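
Assembling those three parts is plain string building. A hypothetical helper (the section markers and wording are mine, not the repo's):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static string BuildPrompt(IReadOnlyList<string> contextChunks,
                          IReadOnlyList<string> history,
                          string question)
{
    var sb = new StringBuilder();
    sb.AppendLine("Answer using ONLY the context below. If the context does not " +
                  "contain the answer, say that clearly.");
    sb.AppendLine("--- Context ---");
    foreach (var chunk in contextChunks) sb.AppendLine(chunk);
    sb.AppendLine("--- Conversation history ---");
    foreach (var turn in history) sb.AppendLine(turn);
    sb.AppendLine("--- Question ---");
    sb.AppendLine(question);
    return sb.ToString();
}

var prompt = BuildPrompt(
    new[] { "Chunk about pricing.", "Chunk about features." },
    new[] { "User: hi", "Assistant: hello" },
    "What does the product cost?");

Console.WriteLine(prompt);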

  6. Chat Completion

    The prompt is sent to a GitHub Models chat model, which generates the final answer. If the context does not contain relevant information, the model is instructed to clearly say so.
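
The chat call mirrors the embeddings call. Again, the endpoint and model name are assumptions to check against the GitHub Models catalog; the grounded prompt from the previous step goes in as the user message:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

var http = new HttpClient();
http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue(
    "Bearer", Environment.GetEnvironmentVariable("GITHUB_TOKEN"));

var body = JsonSerializer.Serialize(new
{
    model = "gpt-4o-mini",   // assumed chat model name
    messages = new object[]
    {
        new { role = "system",
              content = "Answer only from the provided context. If it is missing, say so." },
        new { role = "user",
              content = "<retrieved context + history + question>" } // the built prompt
    }
});

// Assumed GitHub Models chat endpoint (OpenAI-compatible).
var resp = await http.PostAsync(
    "https://models.inference.ai.azure.com/chat/completions",
    new StringContent(body, Encoding.UTF8, "application/json"));
resp.EnsureSuccessStatusCode();

using var json = JsonDocument.Parse(await resp.Content.ReadAsStringAsync());
Console.WriteLine(json.RootElement
    .GetProperty("choices")[0].GetProperty("message")
    .GetProperty("content").GetString());
```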

Running the Application

dotnet restore
dotnet run

Console Commands

  • Enter a website URL to initialize the RAG pipeline
  • Ask questions about the website content
  • Type new to load another website
  • Type quit to exit
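
The command loop above can be sketched like this; IngestWebsiteAsync and AnswerAsync are hypothetical stand-ins for the real pipeline services, stubbed here so the sketch compiles on its own:

```csharp
using System;
using System.Threading.Tasks;

while (true)
{
    Console.Write("> ");
    var line = Console.ReadLine();
    if (line is null) break;                    // EOF ends the session
    var input = line.Trim();
    if (input.Length == 0) continue;

    if (input.Equals("quit", StringComparison.OrdinalIgnoreCase)) break;

    if (input.Equals("new", StringComparison.OrdinalIgnoreCase))
    {
        Console.Write("Website URL: ");
        await IngestWebsiteAsync(Console.ReadLine());
        continue;
    }

    Console.WriteLine(await AnswerAsync(input)); // answer questions about the site
}

// Hypothetical stubs — replace with the real scraping/embedding and RAG services.
static Task IngestWebsiteAsync(string? url)
    => Task.Run(() => Console.WriteLine($"(stub) ingesting {url}"));

static Task<string> AnswerAsync(string question)
    => Task.FromResult($"(stub) answer for: {question}");
```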

Check out the full source code on GitHub:


Happy coding!! 😊
