
Simplify AI data integration with RamaLama and RAG

How RamaLama makes sharing data with your AI model boring

April 3, 2025
Daniel Walsh
Related topics:
Artificial intelligence, Containers, Kubernetes, Open source
Related products:
Red Hat AI


    The RamaLama project makes it easy to run AI locally by combining AI models with container technology. It packages all the software needed to run an AI model into container images tailored to the local GPU accelerator. Check out How RamaLama makes working with AI models boring for an overview of the project.

    The RamaLama tool figures out which accelerator is available on the user's system and pulls the matching image. It then pulls the specified AI model to the local system and finally creates a container from the image with the AI model mounted inside it. You can use the run command to chat with the model, or the serve command to expose the model via an OpenAI-compatible REST API.
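
    For example (a minimal sketch; the model reference is illustrative, using RamaLama's ollama:// transport):

    $ ramalama run ollama://tinyllama     # chat with the model locally
    $ ramalama serve ollama://tinyllama   # expose an OpenAI-compatible REST API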

    Because everything runs in containers, RamaLama can also generate the configuration needed to put the REST API into production, either on edge devices using Quadlets or in a Kubernetes cluster.

    Integrating user-specific data into AI models with RAG

    This works great, but the AI model was often not trained on the user's data. In the AI world, the standard way to add user data to an AI model is retrieval-augmented generation (RAG). This technique enhances large language models (LLMs) by enabling them to access and incorporate external knowledge sources before generating responses, leading to more accurate and relevant outputs. User data is often stored as PDF, DOCX, or Markdown files.

    How do users translate these documents into something that the AI models can understand?

    IBM developed a helpful open source tool called Docling, which can parse most document formats into a simpler structured JSON representation. This JSON can then be converted into a RAG vector database for AI models to consume. See Figure 1.

    Figure 1: Processing document formats with Docling. PDF, DOCX, PPTX, and HTML files flow through Docling into a Docling document, which can be exported as JSON, Markdown, and figures; chunked for frameworks such as LlamaIndex and LangChain; and fed into your gen AI app.
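
    If you want to experiment with Docling outside of RamaLama, it is a normal Python package with a CLI (a minimal sketch; treat the exact flags as an assumption based on Docling's documentation):

    $ pip install docling
    $ docling --to json mydoc.pdf   # parse the document into Docling's JSON representation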

    This sounds great, but it can be very complex to set up.

    Introducing RamaLama RAG

    RamaLama has added variants of the GPU-accelerated container images with a -rag suffix. These images layer on top of the existing images and add Docling and all of its requirements, as well as the code to create a RAG vector database. See Figure 2.

    Figure 2: How a RAG vector database is created with RamaLama and Docling, with Docling routing documents into the RAG vector database.
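
    In normal use RamaLama selects and pulls the right image automatically, but you can inspect one of these images yourself (the image name below is an assumption based on the project's quay.io/ramalama namespace and a CUDA accelerator):

    $ podman pull quay.io/ramalama/cuda-rag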

    RamaLama is currently compatible with the Qdrant vector database. (The RamaLama project welcomes PRs to add compatibility for other databases.)

    To build a RAG image from your documents, simply execute:

    $ ramalama rag file.md document.docx https://example.com/mydoc.pdf quay.io/myrepository/ragdata

    This command launches a container, mounts the specified files into it, and executes the doc2rag Python script, which uses Docling and Qdrant to produce a vector database from the input files.

    Once the container completes, RamaLama creates the specified OCI image (an OCI artifact in the future) containing the vector database. This image can now be pushed to any OCI-compliant registry (quay.io, docker.io, Artifactory …) for others to consume.

    To serve up the model, execute the following command:

    $ ramalama run --rag quay.io/myrepository/ragdata MODEL

    RamaLama creates a container with the RAG vector database and the model mounted into it. Then it starts a chatbot that can interact with the AI model using the RAG data.

    Similarly, RamaLama can serve the model via the REST API:

    $ ramalama serve --rag quay.io/myrepository/ragdata MODEL
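
    Once the server is running, any OpenAI-compatible client can talk to it (a sketch; the default port 8080 and the endpoint shape are assumptions based on RamaLama's llama.cpp-backed server):

    $ curl http://localhost:8080/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{"messages": [{"role": "user", "content": "Summarize my documents."}]}'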

    Putting the RAG-served model into production

    To put the RAG model into production, you need to use an OCI-based model. If the model is from Ollama or Hugging Face, it is easy to convert it to OCI format, as follows:

    $ ramalama convert MODEL quay.io/myrepository/mymodel

    Now push the model and RAG images to a registry:

    $ ramalama push quay.io/myrepository/mymodel
    $ ramalama push quay.io/myrepository/ragdata

    Use the ramalama serve command to generate Kubernetes YAML for running in a cluster, or a Quadlet for running on edge devices.

    For Quadlets:

    $ ramalama serve --name myrag --generate quadlet --rag quay.io/myrepository/ragdata quay.io/myrepository/mymodel
    Generating quadlet file: myrag.volume
    Generating quadlet file: myrag.image
    Generating quadlet file: myrag-rag.volume
    Generating quadlet file: myrag-rag.image
    Generating quadlet file: myrag.container

    For Kubernetes:

    $ ramalama serve --name myrag --generate kube --rag quay.io/myrepository/ragdata quay.io/myrepository/mymodel
    Generating Kubernetes YAML file: myrag.yaml

    Now install these Quadlet files on multiple edge devices. When you update the RAG data image or the model image, the edge devices automatically pick up the latest content.
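
    On each device, installing the generated files is a copy plus a systemd reload (a minimal sketch assuming rootful Podman, where Quadlet units live in /etc/containers/systemd/):

    $ sudo cp myrag.container myrag.image myrag.volume myrag-rag.image myrag-rag.volume /etc/containers/systemd/
    $ sudo systemctl daemon-reload          # Quadlet generates myrag.service from myrag.container
    $ sudo systemctl start myrag.service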

    Similarly, with the Kubernetes YAML file you can update the container images for the model and the RAG data independently, and Kubernetes will take care of updating the application and its content on restart.
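
    Deploy the generated YAML like any other workload (assuming kubectl is configured for your cluster; the same file can also be tested locally with podman kube play myrag.yaml):

    $ kubectl apply -f myrag.yaml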

    Summary

    RAG is a powerful capability, but one that can be complicated to set up. RamaLama has made it trivial.

    Follow these installation instructions to try RamaLama on your machine.

