
Improving Chatbot Results with Retrieval Augmented Generation (RAG) and Node.js

November 13, 2024
Lucas Holmquist
Related topics: Artificial intelligence, Node.js
Related products: Red Hat build of Node.js


    Welcome back to this ongoing series of posts about using Large Language Models (LLMs) with Node.js. In the first post, we took a look at creating and using an LLM chat bot with Node.js. The second post added a feature to generate an email summarization while also making sure the data returned from the LLM was a properly structured JSON object.

    This next post will take a look at improving the chat bot’s results using a paradigm called Retrieval Augmented Generation, or RAG.

    What is RAG?

    Before we start, let’s see what this concept is all about. Like most human conversations, how someone responds to a question will depend on the context. If you ask multiple people the same question, you might get different answers depending on each person’s context or knowledge. LLMs are really no different: they are trained on a certain set of data, and if that data is wrong or outdated, the answers they give might not be what you expect.

    One option would be to retrain the LLM, but this is time-consuming, and there is also the possibility that we won’t have the ability to do that anyway. This is where Retrieval Augmented Generation, or RAG, comes in handy.

    At a high level, the process has two main concepts.

    The first is indexing, which usually happens “offline.” This is where we load and store the extra context. This can be almost anything: PDFs, Markdown files, or even data scraped from a webpage. These documents are usually split into chunks and stored in some sort of vector database.

    Offline indexing RAG flow (source: https://js.langchain.com/docs/tutorials/rag/)

    The next concept is retrieval and generation. This is where we find the chunks of data most closely related to our question. Using only the relevant chunks is important here, since we are limited in how much we can add to the prompt. Those bits of data are then added to our question prompt to give our model the proper context.

    Retrieval and generation RAG flow (source: https://js.langchain.com/docs/tutorials/rag/)
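    To make the retrieval step concrete, here is a minimal sketch, assuming we already have a populated LangChain.js vector store (the setup is covered below). The similaritySearch method returns the stored chunks most relevant to a query:

    // Find the four stored chunks most relevant to the user's question
    const relevantDocs = await vectorStore.similaritySearch('What is the rental car policy?', 4);

    // Each result is a Document whose pageContent can be injected into the prompt
    const context = relevantDocs.map((doc) => doc.pageContent).join('\n');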

    Parasol and RAG

    If you remember from the first post, the chat bot would reply based on the claims summary that we were providing, but if we wanted to know something like what the rental car policy is, or what Parasol’s contact information was, our model was not equipped with that knowledge.

    This is where the RAG concept comes into play. As mentioned before, retraining our model can be time-consuming, and you might not even have access to the model, which would make retraining impossible.

    For this use case, we have a PDF file that contains policy-related information, like the car rental policy as well as the contact information for our fictitious company. This information will be used as the extra context when querying our model.

    Prepare the Context

    As mentioned above, we need to add our extra context, which in this case is a PDF file, to some type of vector database for later access. In this example, we will be using an in-memory database. These next few steps are usually done “offline,” meaning that they aren’t part of the application itself, but happen at some point beforehand. Those steps, as outlined in the image in the previous section, are to load the document, split it into chunks, embed the chunks with the proper embeddings, then store them in the vector database.

    For those who would like to follow along, the code for this functionality can be found in this branch of our application.

    Load

    Since we are using LangChain.js, we can use the PDFLoader class to load our PDF document. This would look something like this:

    // Load the policy PDF; PDFLoader produces one Document per page by default
    const loader = new PDFLoader(path.join(__dirname, '../', 'resources', 'policies', 'policy-info.pdf'));
    
    const docs = await loader.load();

    Split

    The next part is to take those loaded documents and split them into smaller chunks. This is important both for better indexing and because what we can pass to the prompt is limited; splitting lets us retrieve only the pieces we need.

    // Split into 200-character chunks with a 20-character overlap so
    // related sentences aren't cut apart at chunk boundaries
    const textSplitter = new RecursiveCharacterTextSplitter({
        chunkSize: 200,
        chunkOverlap: 20
      });
    
    const splits = await textSplitter.splitDocuments(docs);

    Embed and Store

    The last step, once our docs are split, is to generate embeddings for those chunks. I won’t go in depth on embeddings, but this is the process of converting the text we just loaded from our documents into a numerical representation. This is important for getting all our data into a compatible format for a better relevancy search. All of that is then stored in a vector database of some kind.
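    As a quick illustration, here is a minimal sketch of what an embedding looks like, using the same HuggingFaceTransformersEmbeddings class we instantiate below. The embedQuery method turns a string into a fixed-length array of numbers:

    // Convert a piece of text into its numerical (vector) representation
    const embeddings = new HuggingFaceTransformersEmbeddings();
    const vector = await embeddings.embedQuery('What is the rental car policy?');

    // The vector's length is fixed by the embedding model (for example, 384 numbers)
    console.log(vector.length);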

    In this example, we are just using the in-memory store that LangChain.js provides:

      // Instantiate the embeddings model used to vectorize each chunk
      const embeddings = new HuggingFaceTransformersEmbeddings();
    
      // Embed the splits and index them in the in-memory vector store
      const vectorStore = await MemoryVectorStore.fromDocuments(
        splits,
        embeddings
      );

    Use the Context

    Once all the setup has been done (and as I mentioned earlier, that part is usually done at some other point, not as part of the application), we can use that new knowledge in our prompt to get a context-aware answer to our question.

    Creating the Prompt and Chain

    The key to any chatbot is the construction of the prompt and chain that we pass to the model. In this example, we need to add some “context” to our prompt that will be filled in with the relevant pieces of information from our loaded document.

    The full code for the prompt can be found here. Below is a shortened version.

    const prompt = ChatPromptTemplate.fromMessages([
        [ 'system',
          'You are a helpful, respectful and honest assistant named "Parasol Assistant".' +
      // ... (additional system instructions elided) ...
          'You must answer in 4 sentences or less.' +
          'Don\'t make up policy term limits by yourself' +
          'Context: {context}'
        ],
        [ 'human', '{input}' ]
      ]);
    

    Notice the {context} parameter; this is where our new context will be injected.
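    To see where that injection happens, here is a hypothetical sketch of filling the template by hand; the context string below is made up, and the document chain we create next does this step for us automatically:

    // Manually fill the template placeholders to inspect the resulting messages
    const messages = await prompt.formatMessages({
        context: 'Rental car coverage is limited to 10 days per claim.',  // hypothetical chunk text
        input: 'What is the rental car policy?'
      });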

    We can use LangChain.js here to create a document chain. And yes, createStuffDocumentsChain is the actual name of the function:

      // "Stuffs" the retrieved documents into the prompt's {context} placeholder
      const ragChain = await createStuffDocumentsChain({
        llm: model,
        prompt
      });

     

    And then we create our retrieval chain based on our document chain and the vector store retriever:

    // Wire the document chain to a retriever built from our vector store
    const retrievalChain = await createRetrievalChain({
        combineDocsChain: ragChain,
        retriever: vectorStore.asRetriever()
      });

    Finally, we can use that newly created chain to ask the question.

    // Stream the model's answer back as it is generated
    const result = await retrievalChain.stream({
        input: createQuestion(question)
      });
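    Since stream() returns an async iterable, we can write the answer out as it arrives. This is a minimal sketch, assuming the chunks emitted by the retrieval chain carry the partial answer text in an answer field:

    for await (const chunk of result) {
        // Chunks can also carry the retrieved context; we only print the answer text
        if (chunk.answer) {
          process.stdout.write(chunk.answer);
        }
      }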

    The result returned will now be more context aware.

    Screenshot: a context-aware chat bot response

    As we can see, the response references information from the PDF document we loaded. Here is a screenshot of the relevant parts of that document that our result references:

    Screenshot: the relevant policy information from the PDF document

    Conclusion

    As you can see, without much more code, we are able to make our chatbot more context aware without having to retrain it. This is very useful for industries that might not feel comfortable training a model with their sensitive data.

    Stay tuned for the next post in this series, where we will add some function tooling.

    As always, if you want to learn more about what the Red Hat Node.js team is up to, check these out:

    https://developers.redhat.com/topics/nodejs

    https://developers.redhat.com/topics/nodejs/ai

    https://github.com/nodeshift/nodejs-reference-architecture

    https://developers.redhat.com/e-books/developers-guide-nodejs-reference-architecture
