Simplify RAG Application With MongoDB Atlas and Amazon Bedrock

In this article, learn how to integrate MongoDB Atlas as the vector store and set up the entire workflow for your RAG application.

By Abhishek Gupta · May. 30, 24 · Tutorial

By fetching data from the organization’s internal or proprietary sources, Retrieval Augmented Generation (RAG) extends the capabilities of foundation models (FMs) to specific domains, without needing to retrain the model. It is a cost-effective approach to improving model output so it remains relevant, accurate, and useful in various contexts.

Knowledge Bases for Amazon Bedrock is a fully managed capability that helps you implement the entire RAG workflow from ingestion to retrieval and prompt augmentation without having to build custom integrations to data sources and manage data flows. With MongoDB Atlas vector store integration, you can build RAG solutions to securely connect your organization’s private data sources to FMs in Amazon Bedrock.

Let's see how the MongoDB Atlas integration with Knowledge Bases can simplify the process of building RAG applications.


Configure MongoDB Atlas

The process of creating a MongoDB Atlas cluster on AWS is well documented. Here are the high-level steps:

  • This integration requires an Atlas cluster tier of at least M10. During cluster creation, choose an M10 dedicated cluster tier.
  • Create a database and collection.
  • For authentication, create a database user. Select Password as the Authentication Method. Grant the user the "Read and write to any database" role.
  • Modify the IP Access List – add IP address 0.0.0.0/0 to allow access from anywhere. For production deployments, AWS PrivateLink is the recommended way to have Amazon Bedrock establish a secure connection to your MongoDB Atlas cluster.

Create the Vector Search Index in MongoDB Atlas

Use the definition below to create a Vector Search index.

{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "AMAZON_BEDROCK_CHUNK_VECTOR",
      "similarity": "cosine",
      "type": "vector"
    },
    {
      "path": "AMAZON_BEDROCK_METADATA",
      "type": "filter"
    },
    {
      "path": "AMAZON_BEDROCK_TEXT_CHUNK",
      "type": "filter"
    }
  ]
}


  • AMAZON_BEDROCK_CHUNK_VECTOR – Contains the vector embedding for the data chunk. We are using cosine similarity and embeddings of size 1536 (we will choose a matching embedding model in the upcoming steps).
  • AMAZON_BEDROCK_TEXT_CHUNK – Contains the raw text for each data chunk.
  • AMAZON_BEDROCK_METADATA – Contains additional data for source attribution and rich query capabilities.
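
To make the "similarity": "cosine" setting in the index definition more concrete, here is a minimal pure-Python sketch of what cosine similarity measures between two embedding vectors. The function name cosine_similarity is my own, and this only illustrates the metric; it is not how Atlas Vector Search computes or scales its scores internally.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real embeddings here would have 1536 dimensions.
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Vectors that point in the same direction score close to 1 regardless of their length, which is why cosine similarity is a common choice for text embeddings.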

Configure the Knowledge Base in Amazon Bedrock

Create an AWS Secrets Manager secret to securely store the MongoDB Atlas database user credentials.
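
If you prefer to script this step, the sketch below shows one way to do it with boto3. It assumes the secret value is a JSON object with username and password keys, which is my understanding of what the integration expects; the helper names, the secret name, and the default region are all hypothetical.

```python
import json


def build_atlas_secret(username, password):
    """Return the JSON string to store as the secret value.

    Assumption: the integration reads 'username' and 'password' keys.
    """
    if not username or not password:
        raise ValueError("username and password must both be non-empty")
    return json.dumps({"username": username, "password": password})


def create_atlas_secret(secret_name, username, password, region="us-east-1"):
    """Create the secret in AWS Secrets Manager and return its ARN."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("secretsmanager", region_name=region)
    response = client.create_secret(
        Name=secret_name,
        SecretString=build_atlas_secret(username, password),
    )
    return response["ARN"]
```

The ARN returned by create_atlas_secret is the value you will need when configuring the knowledge base.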


Create an Amazon Simple Storage Service (Amazon S3) bucket and upload one or more documents of your choice. Knowledge Bases supports multiple file formats (including text, HTML, and CSV). Later, you will use the knowledge base to ask questions about the contents of these documents.

Navigate to the Amazon Bedrock console and start configuring the knowledge base. In step 2, choose the S3 bucket you created earlier:


Select the Titan Embeddings G1 – Text embedding model, and choose MongoDB Atlas as the vector database.

Enter the basic information for the MongoDB Atlas cluster along with the ARN of the AWS Secrets Manager secret you created earlier. In the Metadata field mapping attributes, enter the vector store-specific details. They must match the field paths in the vector search index definition you used earlier.
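
A mismatch between the field mapping and the index definition is easy to introduce by hand, so here is a small sketch of a sanity check you could run before creating the knowledge base. The helper check_field_mapping is hypothetical, and the vectorField, textField, and metadataField key names are my assumption about how the mapping is expressed if you script this step instead of using the console.

```python
# Same definition as the Vector Search index created earlier.
INDEX_DEFINITION = {
    "fields": [
        {"numDimensions": 1536, "path": "AMAZON_BEDROCK_CHUNK_VECTOR",
         "similarity": "cosine", "type": "vector"},
        {"path": "AMAZON_BEDROCK_METADATA", "type": "filter"},
        {"path": "AMAZON_BEDROCK_TEXT_CHUNK", "type": "filter"},
    ]
}

# Assumed key names for the knowledge base field mapping.
FIELD_MAPPING = {
    "vectorField": "AMAZON_BEDROCK_CHUNK_VECTOR",
    "textField": "AMAZON_BEDROCK_TEXT_CHUNK",
    "metadataField": "AMAZON_BEDROCK_METADATA",
}


def check_field_mapping(index_definition, field_mapping):
    """Return a list of human-readable problems (empty if none were found)."""
    types_by_path = {f["path"]: f["type"] for f in index_definition["fields"]}
    problems = []
    vector_path = field_mapping["vectorField"]
    if types_by_path.get(vector_path) != "vector":
        problems.append(f"{vector_path} is not indexed as a vector field")
    for key in ("textField", "metadataField"):
        path = field_mapping[key]
        if path not in types_by_path:
            problems.append(f"{path} ({key}) is missing from the index definition")
    return problems


if __name__ == "__main__":
    print(check_field_mapping(INDEX_DEFINITION, FIELD_MAPPING) or "mapping looks consistent")
```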


Once the knowledge base is created, you need to synchronize the data source (S3 bucket data) with the MongoDB Atlas vector search index.
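
The console has a Sync button for this, but the same step can be scripted. The sketch below assumes a boto3 "bedrock-agent" client and uses its start_ingestion_job and get_ingestion_job calls; the helper name, the polling interval, and the set of terminal statuses I check for are my own choices, so verify them against the current API reference.

```python
import time

# Assumption: these are the statuses at which an ingestion job stops changing.
TERMINAL_STATUSES = {"COMPLETE", "FAILED"}


def sync_data_source(client, knowledge_base_id, data_source_id,
                     poll_seconds=5.0, max_polls=120):
    """Start an ingestion job and poll until it reaches a terminal status."""
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )["ingestionJob"]
    polls = 0
    while job["status"] not in TERMINAL_STATUSES:
        if polls >= max_polls:
            raise TimeoutError("ingestion job did not finish in time")
        time.sleep(poll_seconds)
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
        polls += 1
    return job
```

To use it, pass in boto3.client("bedrock-agent") along with the knowledge base ID and data source ID shown in the Amazon Bedrock console, then check that the returned job's status is COMPLETE.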


Once that's done, you can check the MongoDB Atlas collection to verify the data. As per the index definition, the vector embeddings have been stored in AMAZON_BEDROCK_CHUNK_VECTOR along with the text chunk and metadata in AMAZON_BEDROCK_TEXT_CHUNK and AMAZON_BEDROCK_METADATA, respectively.


Query the Knowledge Base

You can now ask questions about your documents by querying the knowledge base. Select Show source details to see the chunks cited for each footnote.

You can also change the foundation model. For example, I switched to Claude 3 Sonnet.


Use Retrieval APIs To Integrate Knowledge Base With Applications

To build RAG applications on top of Knowledge Bases for Amazon Bedrock, you can use the RetrieveAndGenerate API, which allows you to query the knowledge base and get a generated response in a single call.
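
As a rough illustration, here is a minimal sketch of calling RetrieveAndGenerate through a boto3 "bedrock-agent-runtime" client. The helper names are hypothetical, the knowledge base ID and model ARN are placeholders you need to supply, and the response fields I read (output, citations, retrievedReferences) reflect my understanding of the response shape.

```python
def build_rag_request(question, knowledge_base_id, model_arn):
    """Build the keyword arguments for retrieve_and_generate."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }


def ask_knowledge_base(client, question, knowledge_base_id, model_arn):
    """Return (answer, cited_chunks) for a question."""
    response = client.retrieve_and_generate(
        **build_rag_request(question, knowledge_base_id, model_arn)
    )
    answer = response["output"]["text"]
    # Flatten the chunks that back each citation in the answer.
    cited_chunks = [
        reference["content"]["text"]
        for citation in response.get("citations", [])
        for reference in citation.get("retrievedReferences", [])
    ]
    return answer, cited_chunks
```

Here, client would be boto3.client("bedrock-agent-runtime"), and model_arn would be the ARN of the foundation model you want to generate the answer, for example Claude 3 Sonnet in your Region.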

If you want to further customize your RAG solutions, consider using the Retrieve API, which returns the semantic search responses that you can use for the remaining part of the RAG workflow.
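
Here is a minimal sketch of that approach, again assuming a boto3 "bedrock-agent-runtime" client. The retrieve_chunks helper only fetches the matching text chunks; build_prompt is a hypothetical example of how you might augment the prompt yourself before sending it to a model of your choice, and its prompt wording is purely illustrative.

```python
def retrieve_chunks(client, query, knowledge_base_id, top_k=3):
    """Return the text of the top_k chunks that match the query."""
    response = client.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    return [result["content"]["text"] for result in response["retrievalResults"]]


def build_prompt(question, chunks):
    """Combine the retrieved chunks and the question into one prompt."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The string returned by build_prompt can then be passed to whichever model invocation API you use for the generation step.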

More Configurations

You can further customize your knowledge base queries using a different search type, additional filter, different prompt, etc.
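
When calling the retrieval APIs from code, these options go into the retrieval configuration. The sketch below builds such a configuration with a result count, an optional search type override, and an optional metadata filter. The helper names and the "department" metadata key are hypothetical, and whether a given search type (such as HYBRID) is available depends on the vector store, so treat that part as an assumption to verify.

```python
ALLOWED_SEARCH_TYPES = ("SEMANTIC", "HYBRID")


def equals_filter(key, value):
    """Build a filter that matches chunks whose metadata key equals value."""
    return {"equals": {"key": key, "value": value}}


def build_retrieval_configuration(number_of_results=5, search_type=None,
                                  metadata_filter=None):
    """Build the retrievalConfiguration argument for a retrieval call."""
    if number_of_results < 1:
        raise ValueError("number_of_results must be at least 1")
    vector_config = {"numberOfResults": number_of_results}
    if search_type is not None:
        if search_type not in ALLOWED_SEARCH_TYPES:
            raise ValueError(f"unsupported search type: {search_type}")
        vector_config["overrideSearchType"] = search_type
    if metadata_filter is not None:
        vector_config["filter"] = metadata_filter
    return {"vectorSearchConfiguration": vector_config}


if __name__ == "__main__":
    # "department" is a made-up metadata key used only for illustration.
    print(build_retrieval_configuration(
        number_of_results=3,
        metadata_filter=equals_filter("department", "finance"),
    ))
```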


Conclusion

Thanks to the MongoDB Atlas integration with Knowledge Bases for Amazon Bedrock, most of the heavy lifting is taken care of. Once the vector search index and knowledge base are configured, you can incorporate RAG into your applications. Behind the scenes, Amazon Bedrock will convert your input (prompt) into embeddings, query the knowledge base, augment the FM prompt with the search results as contextual information, and return the generated response.

Happy building!


Published at DZone with permission of Abhishek Gupta, DZone MVB. See the original article here.
