Vector Databases for Generative AI Applications

This post details a talk I gave at GIDS 2024 and takes a deeper look at the fundamentals of vector databases for generative AI applications.

By Abhishek Gupta · May 06, 2024 · Presentation


I first shared this material in my session at GIDS 2024. If you attended it, thank you for coming, and I hope you found it useful! If not, well, you have the resources and links anyway – I have written out the talk so that you can follow along with the slides if you need more context.

Hopefully, the folks at GIDS will publish the video as well. I will add the link once it's available.

Key Info

  • Slides available here
  • GitHub repository - Code and instructions on how to get the demo up and running

Summarized Version of the Talk

I had 30 minutes – so I kept it short and sweet!

Setting the Context

Foundation models (FMs) are at the heart of generative AI. These models are pre-trained on vast amounts of data. Large language models (LLMs) are a class of FMs; for instance, the Claude family from Anthropic, Llama from Meta, etc.

You generally access these using dedicated platforms; for example, Amazon Bedrock, which is a fully managed service with a wide range of models accessible via APIs. These models are pretty powerful, and they can be used standalone to build generative AI apps.
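To make this concrete, here is a minimal sketch of invoking a model on Amazon Bedrock with the AWS SDK for Python (boto3). I'm assuming access to Claude 3 Sonnet in your region; the prompt and region are illustrative, not part of the original talk.

```python
import json
import boto3

# Bedrock runtime client - assumes AWS credentials and model access are configured
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Claude 3 on Bedrock uses the Anthropic "messages" request format
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "What is a vector database?"}],
    }),
)

payload = json.loads(response["body"].read())
print(payload["content"][0]["text"])
```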

So Why Do We Need Vector Databases?

To better understand this, let's take a step back and talk about the limitations of LLMs. I will highlight a few common ones.

LLM Limitations

  • Knowledge cut-off: The knowledge of these models is limited to the data that was current at the time they were pre-trained or fine-tuned.
  • Hallucination: Sometimes, these models provide an incorrect response, quite "confidently."

[Figure: hallucination example]

Lack of Access To External Data Sources

Another limitation is the lack of access to external data sources.

Think about it: You can set up an AWS account and start using models on Amazon Bedrock. But, if you want to build generative AI applications that are specific to your business needs, you need domain or company-specific private data (for example, a customer service chatbot that can access customer details, order info, etc.).

Now, it's possible to train or fine-tune these models with your data – but it's not trivial or cost-effective. However, there are techniques to work around these constraints – RAG (discussed later) being one of them, and vector databases play a key role.

Dive Into Vector Databases

Before we get into it, let's understand the following:

What Is a Vector?

In simple terms, vectors are numerical representations of text.

  • There is input text (also called prompt).
  • You pass it through something called an embedding model - think of it as a stateless function.
  • You get an output which is an array of floating-point numbers.

What's important to understand is that vectors capture semantic meaning, so they can be used for relevance- or context-based search rather than simple text search.

[Figure: human text → embedding model → vector embeddings]
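As a hedged sketch (assuming access to the Amazon Titan text embeddings model on Bedrock; the input text is illustrative), here is what that stateless function looks like in practice – text in, an array of floating-point numbers out:

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

# Embedding models take text and return a fixed-length vector
resp = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": "Vector databases store embeddings."}),
)

embedding = json.loads(resp["body"].read())["embedding"]
print(len(embedding))   # 1536 floats for this model
print(embedding[:5])    # the first few dimensions
```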

Types of Vector Databases

I tend to categorize vector databases into two types:

  • Vector data type support within existing databases, such as PostgreSQL, Redis, OpenSearch, MongoDB, Cassandra, etc.
  • Specialized vector databases, like Pinecone, Weaviate, Milvus, Qdrant, ChromaDB, etc.

This field is also moving very fast, and I'm sure we will see a lot more in the near future!

You can run the specialized vector stores on AWS via their dedicated cloud offerings. But I want to quickly give you a glimpse of the choices in the first category I referred to.

The following are supported as native AWS databases:

  • Amazon OpenSearch service
  • Amazon Aurora with PostgreSQL compatibility
  • Amazon DocumentDB (with MongoDB compatibility)
  • Amazon MemoryDB for Redis, whose vector search was in preview at the time of writing

Vector Databases in Generative AI Solutions

Here is a simplified view of where vector databases sit in generative AI solutions:

  • You take your domain-specific data and split/chunk it up.
  • Pass the chunks through an embedding model: this gives you vectors, or embeddings.
  • Store these embeddings in a vector database.
  • Then, applications execute semantic search queries against that database and combine the results in various ways, RAG being one of them (a minimal ingestion sketch follows this list).
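Here is what that ingestion pipeline might look like with LangChain and OpenSearch. This is a sketch under assumptions: the file name, endpoint, and index name are placeholders, and I'm assuming the LangChain community integrations are installed (not the exact demo code from the repository).

```python
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores import OpenSearchVectorSearch
from langchain.text_splitter import RecursiveCharacterTextSplitter

# 1. Split/chunk the domain-specific data
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(open("domain_docs.txt").read())

# 2. Embedding model (Titan on Bedrock, as an example)
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")

# 3. Embed the chunks and store them in the vector database
vector_store = OpenSearchVectorSearch.from_texts(
    texts=chunks,
    embedding=embeddings,
    opensearch_url="https://localhost:9200",
    index_name="docs",
)
```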

Demo 1 (of 3): Semantic Search With OpenSearch and LangChain

Find the details on the GitHub repository linked earlier.

[Figure: semantic search with OpenSearch and LangChain]
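Continuing the ingestion sketch above, a semantic search query is then just a few lines (the query text is illustrative):

```python
# Returns the k chunks whose embeddings are closest to the query's embedding
results = vector_store.similarity_search("How do I reset my password?", k=3)
for doc in results:
    print(doc.page_content)
```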

RAG: Retrieval Augmented Generation

We covered the limitations of LLMs – knowledge cut-off, hallucination, no access to internal data, etc. Of course, there are multiple ways to overcome these.

  • Prompt-engineering techniques: Zero-shot, few-shot, etc. Sure, this is cost-effective, but how would it apply to domain-specific data?
  • Fine-tuning: Take an existing LLM and train it on a specific dataset. But what about the infrastructure and costs involved? Do you want to become a model development company, or focus on your core business?

These are just a few examples.

RAG Technique Adopts a Middle Ground

There are two key parts to a RAG workflow:

  • Part 1: Data ingestion is where you take your source data (PDF, text, images, etc.), break it down into chunks, pass it through an embedding model, and store it in the vector database.
  • Part 2: This involves the end-user application (e.g., a chatbot). The user sends a query – this input is converted to a vector embedding using the same embedding model that was used for the source data. We then execute a semantic (similarity) search to get the top-N closest results.

That’s not all.

  • Part 3: These results, also referred to as "context," are then combined with the user input and a specialized prompt. Finally, this is sent to an LLM – note this is not the embedding model; this is a large language model. The added context in the prompt helps the model provide a more accurate and relevant response to the user's query (a query-side sketch follows).
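Putting parts 2 and 3 together, here is a hedged query-side sketch that continues the LangChain example from earlier (the prompt wording and model ID are my assumptions, not the exact demo code):

```python
from langchain_community.chat_models import BedrockChat

llm = BedrockChat(model_id="anthropic.claude-3-sonnet-20240229-v1:0")

query = "What is the refund policy?"

# Part 2: the store embeds the query internally and fetches the top-N closest chunks
docs = vector_store.similarity_search(query, k=3)
context = "\n\n".join(d.page_content for d in docs)

# Part 3: combine the retrieved context with the user input in a specialized prompt
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)

# Send to an LLM (not the embedding model) for the final response
print(llm.invoke(prompt).content)
```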

Demo 2 (of 3): RAG With OpenSearch and LangChain

Find the details in the GitHub repository linked earlier.

[Figure: RAG with OpenSearch and LangChain]

Fully Managed RAG Experience: Knowledge Bases for Amazon Bedrock

Another approach is to have a managed solution to take care of the heavy lifting. For example, if you use Amazon Bedrock, then Knowledge Bases can make RAG easier and more manageable. It supports the entire RAG workflow, from ingestion to retrieval and prompt augmentation.

It supports multiple vector stores for storing the embedding data.

[Figure: vector database options for Knowledge Bases for Amazon Bedrock]

Demo 3 (of 3): Fully Managed RAG With Knowledge Bases for Amazon Bedrock

Find the details in the GitHub repository linked earlier.

[Figure: fully managed RAG with Knowledge Bases for Amazon Bedrock]

Now, how do we build RAG applications using this?

For application integration, this functionality is exposed via two APIs:

  • RetrieveAndGenerate: Call the API, get the response – that's it. Everything (query embedding, semantic search, prompt engineering, LLM orchestration) is handled for you!
  • Retrieve: For custom RAG workflows, where you simply fetch the top-N results (as in semantic search) and integrate the rest however you choose (both calls are sketched below).
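Here's a hedged sketch of both calls with boto3 – the knowledge base ID, model ARN, and query are placeholders, not values from the demo:

```python
import boto3

client = boto3.client("bedrock-agent-runtime")

# RetrieveAndGenerate: the fully managed path - one call, final answer
resp = client.retrieve_and_generate(
    input={"text": "What is our refund policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123EXAMPLE",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
print(resp["output"]["text"])

# Retrieve: just the top-N chunks, for custom RAG workflows
resp = client.retrieve(
    knowledgeBaseId="KB123EXAMPLE",
    retrievalQuery={"text": "What is our refund policy?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 3}},
)
for result in resp["retrievalResults"]:
    print(result["content"]["text"])
```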

Where Do I Learn More?

  • Documentation is a great place to start! Specifically, "Knowledge bases for Amazon Bedrock"
  • Code samples for Amazon Bedrock
  • Lots of content and practical solutions in the generative AI community space!
  • Ultimately, there is no replacement for hands-on learning. Head over to Amazon Bedrock and start building!

Wrapping Up

And that's it. Like I said, I had 30 minutes and kept it short and sweet! This area is evolving very quickly – vector databases, LLMs (there is a new one every week; it feels like the JavaScript frameworks era!), and frameworks (LangChain, etc.). It's hard to keep up, but remember: the fundamentals stay the same. The key is to grasp them – hopefully, this post helps with some of that.

Happy building!

Tags: AI, AWS, Database, Semantic search, Generative AI

Published at DZone with permission of Abhishek Gupta, DZone MVB. See the original article here.
