What the heck is a vector embedding?
Vectors, Vector Embeddings, Vector Databases, and HNSW
Vectors
Vectors are interesting; they can mean a lot of different things in different contexts. In the context of the famed first programming language of every computer science student (C++), a vector is essentially an array of N numbers. In the context of physics, a vector is something that has a magnitude and a direction: î (magnitude along the x axis), ĵ (magnitude along the y axis), k̂ (magnitude along the z axis). Any point in 3D space can be represented by [x, y, z] coordinates, which we can call a vector. Similarly, we represent a location on a map with a vector of 2 dimensions (latitude and longitude).
To put it in the context of the topic we're discussing, we can call vectors arrays of length N, where N might be 3 (for a point in 3D space), 2 (for a point on a map), or even 768, 1024, 1536, and so on. We just say N dimensions.
Vector embeddings
Models today are very powerful at one thing: taking in an arbitrary piece of data (a word, a text, an image, an audio clip, or even a video) and converting it into a vector embedding. These embeddings are just numerical representations of the input that aim to capture its meaning, in N dimensions. The value of N depends on the Embedding Model we're using, but to generalize, we could say that an Embedding Model is good at taking in an input and spitting out a vector embedding.
Let’s say, for example, that the piece of text “A red dress” has the vector embedding [1, 2, 3], generated (again, just for the example) by an Embedding Model that spits out 3-dimensional embeddings. Models today are so good that if you input an actual image of a red dress and the model had to spit out a 3-dimensional embedding, it would give out something like [1.1, 2.1, 2.9] as the output. Do you see how close it is to the vector of the original query? Our model inherently knows that the query “A red dress” and an actual image of a red dress are similar things, and so it outputs similar embeddings (vectors that are close together in vector space) for both.
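How do we measure “close together”? A common choice is cosine similarity, which looks at the angle between two vectors. Here’s a minimal sketch using our toy 3-dimensional vectors from above (these numbers are made up for illustration, not from a real model):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means pointing the same way."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

text_embedding = np.array([1.0, 2.0, 3.0])    # toy embedding for the text "A red dress"
image_embedding = np.array([1.1, 2.1, 2.9])   # toy embedding for the image of a red dress

print(cosine_similarity(text_embedding, image_embedding))  # ~0.999, i.e. very similar
```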
On a side note, a helpful tool to visualise vector embeddings for words in a 3D space is this embedding projector: https://projector.tensorflow.org/
What’s an Embedding model?
- I mentioned an Embedding model in a couple of places above, so I thought I should clarify: Embedding Models are models trained specifically to produce embeddings. They include open source models like BERT, CLIP (Contrastive Language-Image Pre-training), and Sentence Transformers.
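To make this concrete, here’s roughly what using one looks like with the sentence-transformers library (assuming you have it installed; “all-MiniLM-L6-v2” is one popular small model that produces 384-dimensional embeddings):

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# one 384-dimensional vector per input string
embeddings = model.encode(["A red dress", "A crimson gown", "A bowl of soup"])
print(embeddings.shape)  # (3, 384)
```

If you computed cosine similarities between those three embeddings, you’d expect “A red dress” and “A crimson gown” to score much higher with each other than either does with “A bowl of soup”.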
Vector databases
The bread and butter of a vector database is that a user should be able to put in a query (“A red dress”), and the Vector DB should be able to pull back the proper set of results (the image of the red dress, along with anything else that’s similar to “A red dress”) as a response. BTW, a vector database doesn’t need you to tag that image of the red dress with a label like “red dress”, or even “red” or “dress”. And it has to do this at scale, across billions of vectors (or however many vectors we’re storing).
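Conceptually, the naive version of this is easy: embed the query, compare it against every stored vector, and return the top matches. Here’s a minimal sketch of that exact (brute-force) search, using random vectors as stand-ins for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
db_vectors = rng.normal(size=(100_000, 384))                     # stand-in for stored embeddings
db_vectors /= np.linalg.norm(db_vectors, axis=1, keepdims=True)  # normalize once up front

def search(query: np.ndarray, k: int = 5) -> np.ndarray:
    """Exact nearest-neighbor search: score every stored vector, keep the top k."""
    query = query / np.linalg.norm(query)
    scores = db_vectors @ query            # cosine similarity, since everything is normalized
    return np.argsort(scores)[::-1][:k]    # indices of the k most similar vectors

query_vector = rng.normal(size=384)        # imagine this came from embedding "A red dress"
print(search(query_vector))                # ids of the 5 closest stored items
```

The problem is that this scores every single vector on every query, which falls apart at billions of vectors. That’s why real vector databases use approximate nearest neighbor (ANN) indexes instead.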
One of the most popular vector search data structures / algorithms that Vector DBs use is HNSW (Hierarchical Navigable Small Worlds). HNSW sits at the core of systems like Weaviate, Elasticsearch, Vespa, OpenSearch, pgvector, and many others.
HNSW
HNSW is a graph-based algorithm used for approximate nearest neighbor (ANN) search in high-dimensional spaces, particularly in vector databases.
It works by starting at an entry node in the graph and then greedily walking the graph, at each step hopping to whichever neighbor is closer to the vector embedding of your input query (see the sketch below).
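To give a flavor of that walk, here’s a heavily simplified sketch of a single-layer greedy search. Real HNSW stacks several such layers into a hierarchy and keeps a list of candidates rather than a single node, but the core “hop to the closer neighbor” idea is this:

```python
import numpy as np

def greedy_search(vectors, neighbors, query, entry_point):
    """Walk the graph from entry_point, repeatedly hopping to whichever
    neighbor is closer to the query, until no neighbor improves on the
    current node (a local minimum)."""
    current = entry_point
    current_dist = np.linalg.norm(vectors[current] - query)
    while True:
        best, best_dist = current, current_dist
        for n in neighbors[current]:                  # check every neighbor of the current node
            d = np.linalg.norm(vectors[n] - query)
            if d < best_dist:
                best, best_dist = n, d
        if best == current:                           # no neighbor is closer: we've converged
            return current
        current, current_dist = best, best_dist

# toy graph: 5 points on a line, each linked to its immediate neighbors
vectors = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(greedy_search(vectors, neighbors, np.array([3.2]), entry_point=0))  # -> 3
```

This greedy walk touches only a handful of nodes instead of scoring the whole database, which is what makes it so much faster than the brute-force search above.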
Stay tuned for another article where I go over HNSW in depth!