Vector databases

Machine learning, and neural networks in particular, make it possible to transform “unstructured data” into fixed-length vectors (usually float32) that preserve the semantics of the original object. For example, two similar texts will have similar vectors (a low Euclidean distance or a high cosine similarity).
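
As a rough sketch of this idea (assuming the sentence-transformers package; the model name all-MiniLM-L6-v2 is just one possible choice), two texts can be turned into vectors and compared like this:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

v1 = model.encode("How do I reset my password?")
v2 = model.encode("I forgot my password, how can I change it?")
v3 = model.encode("The weather is nice today.")

# Semantically close texts get a noticeably higher cosine similarity
print(util.cos_sim(v1, v2))  # high similarity
print(util.cos_sim(v1, v3))  # much lower similarity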

Vector databases:
These are specialized stores designed to efficiently search and compare vectors (usually embeddings) that represent objects such as text, images, audio, or video in numerical form. A typical record consists of the following components (a short Qdrant-based sketch follows the table):

| Component | What it is | Example |
|---|---|---|
| ID | Unique record identifier | "doc-001" or 123 |
| Vector | A list of numbers representing an object | [0.12, -0.56, 0.44, …, -0.03] (usually float32) |
| Document | Source text or file (optional) | "How are you?" |
| Metadata | Additional fields for filtering, tags, context | {"language": "ru", "user": "petya", "tags": ["faq"]} |
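
For illustration, here is how such a record (ID, vector, document text, and metadata) could be stored with the Qdrant Python client. This is a minimal sketch assuming the qdrant-client package and a local in-memory instance; the collection name "demo" and the 4-dimensional vector are arbitrary.

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # local in-memory instance, no server required

# Collection that stores 4-dimensional vectors compared by cosine similarity
client.create_collection(
    collection_name="demo",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

# One record: ID + vector + document text and metadata stored as payload
client.upsert(
    collection_name="demo",
    points=[
        PointStruct(
            id=123,
            vector=[0.12, -0.56, 0.44, -0.03],
            payload={"document": "How are you?", "language": "ru", "tags": ["faq"]},
        )
    ],
)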
Comparison of popular vector databases:

| Name | Developer | Indexing | Metrics | Pros | Cons |
|---|---|---|---|---|---|
| Milvus | Zilliz | IVF_FLAT, IVF_SQ8, IVF_PQ, HNSW, ANNOY, Flat | L2, IP, Cosine, Jaccard, Hamming | Scalability (billions of points); many index types; gRPC/REST | Requires Docker or a standalone deployment; harder to deploy |
| Qdrant | Qdrant (written in Rust) | HNSW (modified), Flat | Cosine, Dot, Euclidean (L2) | Fast Rust engine; easy installation; metadata filtering | Fewer index types for now; no built-in clustering |
| Weaviate | SeMI Technologies (now Weaviate) | HNSW + text (hybrid search) | Cosine, Dot, Euclidean | Hybrid search (BM25 + vector); GraphQL API; automatic data vectorization | Requires more memory; GraphQL is not always convenient |
| Chroma | Chroma | Flat (exact), HNSW (partial / on roadmap) | Cosine | Very easy installation; ideal for RAG and local use | Only Flat for now; metadata filtering only partially available |
| FAISS | Facebook/Meta | Flat, IVF, PQ, OPQ, HNSW, LSH | L2, Dot, Cosine (via normalization) | Very flexible; GPU support; best CPU/GPU performance | A library, not a server; manual setup and coding required |
| OpenSearch | Amazon | HNSW, Faiss backend, native ANN plugin | L2, Dot, Cosine | Hybrid search (BM25 + ANN); integration with full-text search; Elasticsearch compatible | Complex ANN setup; high memory requirements |

For example, the text “How are you?” can be converted into a vector of 384 values (the output size of all-MiniLM-L6-v2):

[0.12, -0.56, 0.44, ..., -0.03]

For this purpose, specialized embedding models are used, such as:

| Name | What it encodes and how it works | Advantages |
|---|---|---|
| all-MiniLM-L6-v2 | Lightweight, fast Transformer-based model; encodes phrases, questions, paragraphs | Compact (~80 MB); supported in sentence-transformers; works out of the box |
| text-embedding-ada-002 (OpenAI) | Commercial model from OpenAI; requires an API key; encodes any text | High-quality embeddings; multi-language support; well suited for RAG |
| bge-small-en | Modern model from BAAI; supports prompts such as "query: …" and "passage: …" | High precision; multilingual support (in the BGE-M3 variant); works well with Qdrant and LangChain |
| e5-base / e5-large | Universal models from Microsoft; suitable for search, clustering, QA | Strong performance on MTEB; support for multilingual tasks; works without fine-tuning |
| Instructor-XL | Encodes text according to the task; uses instructions of the form "Represent the … for …" | Higher accuracy; suitable for task-aware embeddings; great for RAG/FAQ |
| mpnet-base-v2 | From Microsoft; context-sensitive model; good for similar phrases | Good balance between accuracy and speed; suitable for paraphrase detection and general search |
| LaBSE | From Google; multilingual model; works best with short sentences | Supports 100+ languages; great choice for cross-language search |

Typical vector sizes (embedding lengths) for different models:

| Model | Vector length |
|---|---|
| all-MiniLM-L6-v2 | 384 |
| text-embedding-ada-002 (OpenAI) | 1536 |
| bge-small-en | 384 |
| bge-base-en | 768 |
| bge-large-en | 1024 |
| e5-small-v2 | 384 |
| e5-base-v2 | 768 |
| e5-large-v2 | 1024 |
| mpnet-base-v2 | 768 |
| LaBSE | 768 |
| Instructor-XL | 768 or 1024 |
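
A quick way to confirm these dimensions (a sketch assuming the sentence-transformers package; the two model names are taken from the table above):

from sentence_transformers import SentenceTransformer

for name in ["all-MiniLM-L6-v2", "intfloat/e5-base-v2"]:
    model = SentenceTransformer(name)
    vec = model.encode("How are you?")
    print(name, len(vec))  # 384 for all-MiniLM-L6-v2, 768 for e5-base-v2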

The stored vectors are indexed so that, given an incoming query vector, the database can quickly find the closest vectors it holds.

Types of indexes (a FAISS-based sketch follows the table):

| Name | How it works | Advantages | Flaws |
|---|---|---|---|
| Flat | Brute-force comparison of the query against every stored vector | Most accurate search; simple implementation; ideal for debugging and small datasets | Very slow on large volumes; computationally expensive; does not scale |
| HNSW | Navigates a multi-layer graph of neighboring vectors, starting from entry points in the upper layers | Very fast; high precision; suitable for large databases | Requires a lot of memory; long index build time; hard to tune parameters |
| IVF | Partitions vectors into clusters and searches only the most relevant clusters | Faster than Flat; flexible configuration (nprobe); scales well | May miss similar vectors; requires prior training |
| PQ | Replaces parts of a vector with short codes (product quantization) | Saves a lot of memory; fast table-based lookup; ideal for large datasets | Loss of precision; requires training a codebook; not for high-precision tasks |
| OPQ | Improved PQ: first applies a rotation that "corrects" the vectors, then quantizes | Higher accuracy than PQ; works well in FAISS and Milvus; combines with IVF | Harder to train; still an approximate method |
| Annoy | Builds many random projection trees and searches across them | Easy to use; modest resource requirements; suitable for CPU and mobile | Less accurate than HNSW; long index build time; cannot be updated after build |
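
To make the Flat and IVF rows concrete, here is a small sketch using the FAISS library (assuming the faiss-cpu package; the data is random and the nlist/nprobe values are illustrative):

import numpy as np
import faiss

d, n = 128, 10_000
xb = np.random.random((n, d)).astype("float32")  # stored vectors
xq = np.random.random((5, d)).astype("float32")  # query vectors

# Flat: exact brute-force search
flat = faiss.IndexFlatL2(d)
flat.add(xb)
dist_exact, idx_exact = flat.search(xq, 5)

# IVF: cluster the vectors, then search only the nearest clusters
nlist = 100                                   # number of clusters
quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(quantizer, d, nlist)
ivf.train(xb)                                 # IVF requires training
ivf.add(xb)
ivf.nprobe = 10                               # clusters visited per query
dist_approx, idx_approx = ivf.search(xq, 5)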

When a user enters a query, it is converted into a vector, and the database performs a k-nearest-neighbor (kNN) search using the chosen metric (a NumPy sketch of the metrics follows the table).

Types of metrics:

| Metric name | How it works | Advantages | Flaws |
|---|---|---|---|
| Cosine Similarity | Compares the angle between vectors; the closer the angle is to 0°, the greater the similarity | Considers direction only; works well for texts and embeddings; independent of vector length | Ignores scale (magnitude); not suitable when vector length matters |
| Euclidean (L2) | Measures the straight-line distance between points; closer means more similar | Simple and intuitive; suitable for coordinates and images | Does not normalize vectors (scale matters); not always good for text |
| Inner Product (Dot Product) | Sums the products of corresponding coordinates; the larger the sum, the higher the similarity | Very fast to compute; works well with unnormalized vectors | Sensitive to vector length; values can be hard to interpret |
| Manhattan (L1) | Sums the absolute differences of the coordinates, like moving along grid cells | Robust to outliers; works better with sparse vectors | Less commonly used; works worse with dense vectors |
| Hamming Distance | Counts the number of bits in which two binary vectors differ | Very fast for binary data; suitable for fingerprints and hashes | Works only with binary vectors; not applicable to float vectors |
| Jaccard Similarity | Ratio of the intersection to the union of sets or binary vectors | Ideal for tags and binary features; easy to interpret | Binary vectors only; does not work with float vectors |
| Tanimoto | A generalization of Jaccard that also applies to float vectors | Suitable for chemical structures and fingerprints; works with both binary and real values | Rarely used; limited library support |
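
These metrics are easy to reproduce directly; here is a short NumPy sketch (the vectors are arbitrary examples):

import numpy as np

a = np.array([0.12, -0.56, 0.44, -0.03])
b = np.array([0.10, -0.50, 0.40, 0.00])

cosine    = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
euclidean = np.linalg.norm(a - b)        # L2 distance
dot       = np.dot(a, b)                 # inner product
manhattan = np.sum(np.abs(a - b))        # L1 distance

# Hamming and Jaccard are defined on binary vectors
x = np.array([1, 0, 1, 1, 0])
y = np.array([1, 1, 1, 0, 0])
hamming = int(np.sum(x != y))
jaccard = np.sum(x & y) / np.sum(x | y)

print(cosine, euclidean, dot, manhattan, hamming, jaccard)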

The CRUD example below uses Python, Milvus, and the e5-base-v2 embedding model.

from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
from pymilvus.model.dense import SentenceTransformerEmbeddingFunction

# 1. Connect to Milvus (default localhost:19530)
connections.connect("default", host="localhost", port="19530")

# 2. Initialize the embedding function with the e5-base-v2 model
# This model requires:
# - Prefix "passage: " for documents
# - Prefix "query: " for search queries
ef = SentenceTransformerEmbeddingFunction("intfloat/e5-base-v2")

# 3. Defining the collection schema:
# - "id" — integer identifier (primary key)
# - "text" — the original text of the document (string)
# - "emb" — embedding vector of dimension 768
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=512),
    FieldSchema(name="emb", dtype=DataType.FLOAT_VECTOR, dim=768)
]
schema = CollectionSchema(fields, description="Collection with embeddings from e5-base-v2")

# 4. Creating a collection in Milvus with a given schema
collection = Collection("e5_collection", schema)

# 5. Preparing and inserting documents
# Important: Before submitting to the model, you need to add the prefix "passage: "
raw_docs = ["Hello world", "Milvus vector database", "Semantic search with e5 model"]
docs = [f"passage: {d}" for d in raw_docs] # add prefix
ids = [1, 2, 3]

# Calculate embeddings for documents using e5‑base
embs = ef.encode_documents(docs)

# Insert into the collection:
# - identifiers
# - original (clean) texts without prefixes
# - embeddings
collection.insert([ids, raw_docs, embs])

# 6. Create an index on the "emb" field to speed up searching
collection.create_index(
    field_name="emb",
    index_params={
        # index type
        "index_type": "IVF_FLAT",

        # clustering parameter
        "params": {"nlist": 128},

        # distance metric (euclidean distance)
        "metric_type": "L2"
    }
)

# 7. Loading the collection into RAM
# Without this the search will not work
collection.load()

# 8. Search query
# Similarly, we use the prefix "query: " before the query text
query_docs = ["query: vector database"]
q_emb = ef.encode_queries(query_docs)

# Performing semantic search by embeddings
results = collection.search(
    # search query embedding
    data=q_emb,

    # field by which the search is performed
    anns_field="emb",

    # search parameters
    param={"metric_type": "L2", "params": {"nprobe": 10}},

    # number of nearest neighbors
    limit=2,

    # additional fields to return
    output_fields=["text"]
)

# 9. Display search results
for i, hits in enumerate(results):
    print(f"Results for query: '{query_docs[i]}'")
    if not hits:
        print("Nothing found")
        continue
    for rank, hit in enumerate(hits, start=1):
        print(f"  {rank}:")
        print(f"    ID: {hit.id}")
        print(f" Text: {hit.entity.get('text')}")
        print(f" Distance: {hit.distance:.4f}")

# Results for query: 'query: vector database'
#  1:
#    ID: 2
#    Text: Milvus vector database
#    Distance: 2.8374
#  2:
#    ID: 3
#    Text: Semantic search with e5 model
#    Distance: 5.4931

# 10. Deleting a document by ID
# In this case, the document with id = 1 is deleted
collection.delete(expr="id in [1]")
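
# A possible sketch of the remaining CRUD operations (read and update).
# Collection.query reads records by a filter expression; an update is shown
# here as delete + insert (recent pymilvus versions also offer upsert).

# Read: fetch a record by its ID
rows = collection.query(expr="id in [2]", output_fields=["id", "text"])
print(rows)  # e.g. [{'id': 2, 'text': 'Milvus vector database'}]

# Update: replace the text and embedding of document 3
new_text = "Semantic search with the e5-base-v2 model"
new_emb = ef.encode_documents([f"passage: {new_text}"])
collection.delete(expr="id in [3]")
collection.insert([[3], [new_text], new_emb])
collection.flush()  # make the change visible to subsequent searches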

# 11. Delete the entire collection (if no longer needed)
collection.drop()