Main Concept

Embeddings are numerical representations of real-world objects—such as words, images, or audio—stored as a list of numbers called a vector. Instead of treating data as simple text or pixels, embeddings capture the “meaning” or relationships between items by mapping them into a multidimensional space.

Context

  • Computers only understand numbers; however, human language and visual patterns are complex and full of nuances. Embeddings convert complex, unstructured data into a format that machine learning models can process efficiently.
  • For instance, embeddings allow LLMs to recognize that the words “king” and “queen” are more closely related than “king” and “apple.”
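The "king"/"queen" vs. "king"/"apple" relationship can be sketched with cosine similarity over toy vectors. The 3-dimensional values below are invented purely for illustration; real embedding models produce hundreds or thousands of dimensions.

```python
import math

# Toy 3-dimensional vectors -- invented for illustration only;
# real embedding models learn much higher-dimensional values.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "apple": [0.1, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(vectors["king"], vectors["queen"]))  # close to 1.0
print(cosine_similarity(vectors["king"], vectors["apple"]))  # much lower
```

Because "king" and "queen" point in nearly the same direction, their similarity is close to 1, while "king" and "apple" score far lower.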

Key Aspects

  • The embedding step comes after tokenization: each token (identified by its Token ID) is passed through an embedding model, which converts it into a multidimensional vector.
  • Vectors possess a high dimensionality to capture a vast array of features from a single token, such as semantic meaning, syntactic role, sentiment, and more.
  • By utilizing multidimensional vectors, models can identify similarities between words (or images) through mathematical operations between vectors (such as cosine similarity).
  • Embeddings are stored in specialized vector databases (such as Amazon OpenSearch or Amazon Aurora PostgreSQL with pgvector), which are optimized for efficient storage and high-speed similarity searches.
  • Words with a semantic relationship have similar embeddings; for example, “dog” and “puppy,” or “cat” and “kitten.”
  • Embeddings are what allow vector databases to perform similarity search.
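The tokenization-then-lookup pipeline described above can be sketched as a table lookup. The vocabulary, the 4-dimensional embedding matrix, and the whitespace tokenizer below are all simplifying assumptions; in a real model the matrix is learned during training and tokenization is subword-based.

```python
# Minimal sketch of the token-ID -> vector lookup step, assuming a
# tiny made-up vocabulary and invented 4-dimensional embedding values.
vocab = {"i": 0, "love": 1, "my": 2, "cat": 3}

# Embedding matrix: row i holds the vector for token ID i.
embedding_matrix = [
    [0.10, -0.20, 0.30, 0.05],   # "i"
    [0.12, -0.54, 0.88, 0.40],   # "love"
    [0.05,  0.01, 0.10, 0.02],   # "my"
    [0.70,  0.60, 0.20, 0.90],   # "cat"
]

def embed(sentence):
    """Tokenize (naive whitespace split), map words to IDs, look up vectors."""
    token_ids = [vocab[word] for word in sentence.lower().split()]
    return [embedding_matrix[tid] for tid in token_ids]

vectors = embed("I love my cat")
print(len(vectors))   # 4 tokens -> 4 vectors
print(vectors[1])     # the vector for "love"
```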

Applications

  • Semantic Search
  • Recommendation Engines
  • Retrieval Augmented Generation
  • Anomaly Detection
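Semantic search, the first application above, reduces to ranking stored vectors by similarity to a query vector. The in-memory "store" and its 3-dimensional vectors below are invented stand-ins; a vector database such as OpenSearch or pgvector performs the same ranking at scale with approximate-nearest-neighbor indexes.

```python
import math

# Toy semantic search: rank stored items by cosine similarity to a query.
# Keys and vector values are invented for illustration.
store = {
    "dog care tips":     [0.9, 0.1, 0.2],
    "puppy training":    [0.8, 0.2, 0.3],
    "stock market news": [0.1, 0.9, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def search(query_vec, k=2):
    """Return the k stored keys most similar to the query vector."""
    ranked = sorted(store, key=lambda key: cosine(store[key], query_vec),
                    reverse=True)
    return ranked[:k]

# A query vector that, by construction, points toward the dog-related items.
print(search([0.85, 0.15, 0.25]))
```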

Examples

  • Take the word “love”. Its vector representation captures that it is an emotion, a noun (or verb depending on context), carries a positive valence, and is semantically related to “affection” or “caring.”
  • Text: The sentence “I love my cat” is converted into a vector like [0.12, -0.54, 0.88…]. A similar sentence like “I adore my kitten” would result in a vector that is mathematically very close to the first one.
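The "I love my cat" / "I adore my kitten" example can be made concrete by mean-pooling word vectors into a sentence vector. Both the word vectors and the pooling strategy below are simplifying assumptions; real sentence encoders are far more sophisticated, but the core idea that similar sentences land close together in vector space is the same.

```python
import math

# Invented word vectors: synonyms get deliberately similar values.
word_vecs = {
    "i":      [0.10, 0.00, 0.10],
    "love":   [0.90, 0.10, 0.80],
    "adore":  [0.85, 0.15, 0.75],
    "my":     [0.00, 0.10, 0.00],
    "cat":    [0.20, 0.90, 0.30],
    "kitten": [0.25, 0.85, 0.35],
}

def sentence_vector(sentence):
    """Average the word vectors -- simple 'mean pooling'."""
    vecs = [word_vecs[w] for w in sentence.lower().split()]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

v1 = sentence_vector("I love my cat")
v2 = sentence_vector("I adore my kitten")
print(cosine(v1, v2))  # very close to 1.0
```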
