DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Low-Code Development: Leverage low and no code to streamline your workflow so that you can focus on higher priorities.

DZone Security Research: Tell us your top security strategies in 2024, influence our research, and enter for a chance to win $!

Launch your software development career: Dive head first into the SDLC and learn how to build high-quality software and teams.

Open Source Migration Practices and Patterns: Explore key traits of migrating open-source software and its impact on software development.

Related

  • A Comparative Exploration of LLM and RAG Technologies: Shaping the Future of AI
  • How To Select the Right Vector Database for Your Enterprise GENERATIVE-AI Stack
  • PostgresML: Extension That Turns PostgreSQL Into a Platform for AI Apps
  • How I Converted Regular RDBMS Into Vector Database To Store Embeddings

Trending

  • Build Your Business App With BPMN 2.0
  • Theme-Based Front-End Architecture Leveraging Tailwind CSS for White-Label Systems
  • Build an Advanced RAG App: Query Rewriting
  • Next-Gen Lie Detector: Stack Selection
  1. DZone
  2. Data Engineering
  3. AI/ML
  4. Maximizing the Potential of LLMs: Using Vector Databases

Maximizing the Potential of LLMs: Using Vector Databases

How can you work around the token limit problem in Large Language Models like GPT-4 using Vector Embedding Databases and LangChain?

By 
Roger Oriol user avatar
Roger Oriol
·
Apr. 20, 23 · Tutorial
Like (2)
Save
Tweet
Share
4.4K Views

Join the DZone community and get the full member experience.

Join For Free

LLMs do Natural Language Processing (NLP) to represent the meaning of the text as a vector. This representation of the words of the text is an embedding.

The Token Limit: The LLM Prompting Biggest Problem

Currently, one of the biggest problems with LLM prompting is the token limit. When GPT-3 was released, the limit for both the prompt and the output combined was 2,048 tokens. With GPT-3.5, this limit increased to 4,096 tokens. Now, GPT-4 comes in two variants. One with a limit of 8,192 tokens and another with a limit of 32,768 tokens, around 50 pages of text.

So, what can you do when you might want to do a prompt with a context larger than this limit? Of course, the only solution is to make the context shorter. But how can you make it shorter and at the same time have all the relevant information? The solution: store the context in a vector database and find the relevant context with a similarity search query.

What Are Vector Embeddings?

Let's start by explaining what vector embeddings are. Roy Keynes' definition is: "Embeddings are learned transformations to make data more useful." A neural network learns to transform the text into a vector space that contains its actual meaning. This is more useful because it can find synonyms and the syntactical and semantical relationships between words. This visual helps us understand how those vectors can encode meaning:

Word2Vec representation of Queen = King - Man + Woman

What Do Vector Databases Do?

A vector database stores and indexes vector embeddings. This is useful for the fast retrieval of vectors and looking for similar vectors.

Similarity Search

We can find the similarity of vectors by calculating a vector's distance from all other vectors. The nearest neighbors will be the most similar results to the query vector. This is how flat indexes in vector databases work. But this is not very efficient; in a large database, this might take a very long time.

To improve the search's performance, we can try to calculate the distance for only a subset of the vectors. This approach, called approximate nearest neighbors (ANN), improves speed but sacrifices the quality of results. Some popular ANN indexes are Locally Sensitive Hashing (LSH), Hierarchical Navigable Small Worlds (HNSW), or Inverted File Index (IVF).

Integrating Vector Stores and LLMs

For this example, I downloaded the whole Numpy documentation (with over 2000 pages) as a PDF from this URL.

We can write Python code to transform the context document into embeddings and save them to a vector store. We will use LangChain to load the document and split it into chunks and Faiss (Facebook AI Similarity Search) as a vector database.

Python
 
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("example_data/layout-parser-paper.pdf")
pages = loader.load_and_split()

embeddings = OpenAIEmbeddings()

db = FAISS.from_documents(pages, embeddings)
db.save_local("numpy_faiss_index")


Now, we can use this database to perform a similarity search query to find pages that might be related to our prompt. Then, we use the resulting chunks to fill the context of our prompt. We will use LangChain to make it easier:

Python
 
from langchain.vectorstores import FAISS
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.llms import OpenAI

query = "How to calculate the median of an array"

db = FAISS.load_local("numpy_faiss_index", embeddings)
docs = docsearch.similarity_search(query)

chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="stuff")
chain({"input_documents": docs, "question": query}, return_only_outputs=True)


Our question for the model is, "How to calculate the median of an array." Even though the context that we give it is way over the token limit, we have overcome this limitation and got an answer:

Python
 
To calculate the median, you can use the numpy.median() function, which takes an input array or object that can be converted to an array and computes the median along the specified axis. The axis parameter specifies the axis or axes along which the medians are computed, and the default is to compute the median along a flattened version of the array. The function returns the median of the array elements.

For example, to calculate the median of an array "arr" along the first axis, you can use the following code:

import numpy as np
median = np.median(arr, axis=0)

This will compute the median of the array elements along the first axis, and return the result in the variable "median".


This is just one clever solution for a very new problem. As LLMs keep evolving, maybe problems like this will be solved without the need for these kinds of clever solutions. However, I'm sure that this evolution will open the door for new capabilities that might need other new clever solutions for the challenges that they may bring.

AI Data structure Database NLP NumPy neural network

Published at DZone with permission of Roger Oriol. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • A Comparative Exploration of LLM and RAG Technologies: Shaping the Future of AI
  • How To Select the Right Vector Database for Your Enterprise GENERATIVE-AI Stack
  • PostgresML: Extension That Turns PostgreSQL Into a Platform for AI Apps
  • How I Converted Regular RDBMS Into Vector Database To Store Embeddings

Partner Resources


Comments

ABOUT US

  • About DZone
  • Send feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: