Embeddings - The Foundation of Semantic Search

Why do we need them?

We’ve got summaries.

We’ve extracted the keywords and can search through them.

However, we need to upgrade our game if we want to answer user questions.

This is where embeddings come into play!

This is where we are in the process of creating LexGPT!

Why embeddings?

Embeddings are the foundation of semantic search. They convert text (e.g. summaries or sentences) into lists of numbers called vectors.

Transformation of a question into an embedding.
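As a quick illustration, here's a minimal sketch of turning a piece of text into a vector. It uses the open-source sentence-transformers library and the all-MiniLM-L6-v2 model purely as an example; any embedding model works the same way, and this isn't necessarily what powers LexGPT under the hood.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

# Example model only - any embedding model can stand in here.
model = SentenceTransformer("all-MiniLM-L6-v2")

question = "What's the best advice for young people?"
vector = model.encode(question)  # a NumPy array of floats

print(vector.shape)  # (384,) for this particular model
print(vector[:5])    # the first few numbers of the embedding
```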

These vectors capture the essence and context of the text, and most importantly - they allow us to search for similar documents.

Searching for texts that have similar meanings is called semantic search.
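"Similar meaning" is usually measured as cosine similarity between vectors. Here's a small sketch, reusing the example model from above and a few made-up podcast summaries; in a real app, a vector database or nearest-neighbour index does the same job at scale.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model only

# Hypothetical podcast summaries standing in for the real documents.
documents = [
    "The guest discusses career advice for people in their twenties.",
    "A deep dive into reinforcement learning and robotics.",
    "Thoughts on the future of space exploration and Mars colonies.",
]

doc_vectors = model.encode(documents)  # shape: (3, 384)
query_vector = model.encode("What's the best advice for young people?")

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine_similarity(query_vector, d) for d in doc_vectors]
best = int(np.argmax(scores))
print(documents[best])  # the career-advice summary should score highest
```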

An example embedding space can look something like this:

An example of an embedding space

Each point on the graph represents one document from a given category. We can see that similar documents are grouped close together.
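A plot like this is typically made by projecting the high-dimensional vectors down to two dimensions, for example with PCA (t-SNE and UMAP are common alternatives). A rough sketch, assuming the doc_vectors and documents from the previous example:

```python
# pip install scikit-learn matplotlib
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# doc_vectors and documents come from the previous sketch.
points = PCA(n_components=2).fit_transform(doc_vectors)

plt.scatter(points[:, 0], points[:, 1])
for (x, y), text in zip(points, documents):
    plt.annotate(text[:30] + "...", (x, y))
plt.title("Documents projected into 2D")
plt.show()
```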

This is the mind-blowing property of text embeddings!

Embeddings for question-answering

What’s even crazier?

We can embed user questions into the same space and look for documents that answer them!

This is what the process looks like:

  • The user asks a question related to the podcast, e.g. “What’s the best advice for young people?”

  • We embed the question into the embedding space and look for documents “similar” to the question

  • We provide these documents as context to the LLM

  • We get an answer based on the question and the retrieved documents.

Simple - yet powerful!
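Putting these four steps together, here's a rough end-to-end sketch. It reuses the example embedding model from above and calls an LLM through the OpenAI Python client; the model names, prompt wording, and top_k value are placeholders for illustration, not the exact setup behind LexGPT or PodcastGPT.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from openai import OpenAI

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model
client = OpenAI()                                   # expects OPENAI_API_KEY to be set

# The same hypothetical episode summaries as before, standing in for the real documents.
documents = [
    "The guest discusses career advice for people in their twenties.",
    "A deep dive into reinforcement learning and robotics.",
    "Thoughts on the future of space exploration and Mars colonies.",
]
doc_vectors = embedder.encode(documents)

def answer(question: str, top_k: int = 2) -> str:
    # 1. Embed the question into the same space as the documents.
    q = embedder.encode(question)

    # 2. Find the documents most similar to the question (cosine similarity).
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    context = "\n".join(documents[i] for i in np.argsort(scores)[::-1][:top_k])

    # 3 + 4. Provide the documents as context and let the LLM write the answer.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

print(answer("What's the best advice for young people?"))
```

In a real app the documents and their vectors would live in a vector database, so step 2 becomes a single index query instead of a NumPy computation.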

This is the tenth day of the 30-day AI challenge. We are 1/3 through!

Over the next month, I will be building the Lex Fridman AI engine with you!

If you're reading this, I assume you'd like to build things. If you stick with this newsletter, you will have a running project after a month and know the technology needed to build AI apps.

I've recently built PodcastGPT and want to share the process with the community. If you haven't seen the app yet, you can get access here: PodcastGPT

This is all for now! See you tomorrow.

Stay focused!

Luke