🤗 Huggingface vs ⚡ FastEmbed

Comparing the performance of Huggingface’s 🤗 Transformers and ⚡ FastEmbed on a simple embedding task, run on the following machine: Apple M2 Max, 32 GB RAM.


📦 Imports

Importing the necessary libraries for this comparison.

```python
!pip install matplotlib transformers torch fastembed -qq
```

```python
import time
from typing import Callable

import matplotlib.pyplot as plt
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoModel, AutoTokenizer

from fastembed import TextEmbedding
```

```python
import fastembed

fastembed.__version__
```

```
'0.2.6'
```

📖 Data

`documents` is a list of strings, where each string is one document.

```python
documents: list[str] = [
    "Chandrayaan-3 is India's third lunar mission",
    "It aimed to land a rover on the Moon's surface - joining the US, China and Russia",
    "The mission is a follow-up to Chandrayaan-2, which had partial success",
    "Chandrayaan-3 will be launched by the Indian Space Research Organisation (ISRO)",
    "The estimated cost of the mission is around $35 million",
    "It will carry instruments to study the lunar surface and atmosphere",
    "Chandrayaan-3 landed on the Moon's surface on 23rd August 2023",
    "It consists of a lander named Vikram and a rover named Pragyan similar to Chandrayaan-2. Its propulsion module would act like an orbiter.",
    "The propulsion module carries the lander and rover configuration until the spacecraft is in a 100-kilometre (62 mi) lunar orbit",
    "The mission used GSLV Mk III rocket for its launch",
    "Chandrayaan-3 was launched from the Satish Dhawan Space Centre in Sriharikota",
    "Chandrayaan-3 was launched earlier in the year 2023",
]
len(documents)
```

```
12
```

Setting up 🤗 Huggingface

We’ll be using Huggingface Transformers with PyTorch to generate embeddings, with the same model across both libraries for a fair(er?) comparison.

```python
class HF:
    """
    HuggingFace Transformers implementation of FlagEmbedding
    """

    def __init__(self, model_id: str) -> None:
        self.model = AutoModel.from_pretrained(model_id)
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)

    def embed(self, texts: list[str]):
        # Tokenize the batch, truncating anything longer than 512 tokens
        encoded_input = self.tokenizer(
            texts, max_length=512, padding=True, truncation=True, return_tensors="pt"
        )
        model_output = self.model(**encoded_input)
        # Take the [CLS] token embedding as the sentence embedding
        sentence_embeddings = model_output[0][:, 0]
        # L2-normalize so cosine similarity reduces to a dot product
        sentence_embeddings = F.normalize(sentence_embeddings)
        return sentence_embeddings


model_id = "BAAI/bge-small-en-v1.5"
hf = HF(model_id=model_id)
hf.embed(documents).shape
```

```
torch.Size([12, 384])
```
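Note that `embed` runs the forward pass with autograd enabled, which does unnecessary bookkeeping when we only want embeddings. A minimal sketch of a fairer-to-PyTorch variant (the `HFNoGrad` name is ours, not from the original notebook):

```python
import torch


class HFNoGrad(HF):
    """Variant of HF that disables gradient tracking during embedding."""

    def embed(self, texts: list[str]):
        # inference_mode() skips autograd bookkeeping entirely, which
        # typically reduces memory use and can speed up the forward pass
        with torch.inference_mode():
            return super().embed(texts)
```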

Setting up ⚡️ FastEmbed

Sorry, don’t have a lot to set up here. We pass FastEmbed the same Flag Embedding model we used with Huggingface, so the comparison stays apples-to-apples.

```python
embedding_model = TextEmbedding(model_name=model_id)
```

```
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
```
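FastEmbed’s `embed()` is lazy: it returns a generator that yields one numpy array per document, and no work happens until the generator is consumed. A quick sanity check (a sketch; the printed shape assumes the 384-dimensional `bge-small` model used above):

```python
# Materialize the generator to actually compute the embeddings
embeddings = list(embedding_model.embed(documents))
print(len(embeddings), embeddings[0].shape)
# expected: 12 (384,)
```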

📊 Comparison

We’ll be comparing the mean, maximum, and minimum embedding time across k runs. Let’s write a function to do that:

🚀 Calculating Stats

```python
import types


def calculate_time_stats(
    embed_func: Callable, documents: list, k: int
) -> tuple[float, float, float]:
    times = []
    for _ in range(k):
        # Timing the embed_func call
        start_time = time.time()
        embeddings = embed_func(documents)
        # Force computation if embed_func returns a generator
        if isinstance(embeddings, types.GeneratorType):
            list(embeddings)
        end_time = time.time()
        times.append(end_time - start_time)
    # Returning mean, max, and min time for the call
    return (sum(times) / k, max(times), min(times))


hf_stats = calculate_time_stats(hf.embed, documents, k=100)
print(f"Huggingface Transformers (Average, Max, Min): {hf_stats}")
fst_stats = calculate_time_stats(embedding_model.embed, documents, k=100)
print(f"FastEmbed (Average, Max, Min): {fst_stats}")
```

```
Huggingface Transformers (Average, Max, Min): (0.04711266994476318, 0.0658111572265625, 0.043084144592285156)
FastEmbed (Average, Max, Min): (0.04384247303009033, 0.05654191970825195, 0.04293417930603027)
```
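`time.time()` works, but wall-clock time can jump around; `time.perf_counter()` is monotonic and higher-resolution, and reporting a standard deviation makes run-to-run variance visible. A variant sketch (the `_v2` name and the stdev metric are our additions, not part of the original notebook):

```python
import statistics


def calculate_time_stats_v2(
    embed_func: Callable, documents: list, k: int
) -> tuple[float, float]:
    times = []
    for _ in range(k):
        start = time.perf_counter()  # monotonic, high-resolution clock
        result = embed_func(documents)
        if isinstance(result, types.GeneratorType):
            list(result)  # force lazy generators to compute
        times.append(time.perf_counter() - start)
    # Mean plus standard deviation instead of mean/max/min
    return statistics.mean(times), statistics.stdev(times)
```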

📈 Results

Let’s run the comparison and see the results.

```python
def plot_character_per_second_comparison(
    hf_stats: tuple[float, float, float],
    fst_stats: tuple[float, float, float],
    documents: list,
):
    # Calculating total characters in documents
    total_characters = sum(len(doc) for doc in documents)
    # Calculating characters per second for each model
    hf_chars_per_sec = total_characters / hf_stats[0]  # Mean time is at index 0
    fst_chars_per_sec = total_characters / fst_stats[0]
    # Plotting the bar chart
    models = ["HF Embed (Torch)", "FastEmbed"]
    chars_per_sec = [hf_chars_per_sec, fst_chars_per_sec]
    bars = plt.bar(models, chars_per_sec, color=["#1f356c", "#dd1f4b"])
    plt.ylabel("Characters per Second")
    plt.title("Characters Processed per Second Comparison")
    # Adding the number at the top of each bar
    for bar, chars in zip(bars, chars_per_sec):
        plt.text(
            bar.get_x() + bar.get_width() / 2,
            bar.get_height(),
            f"{chars:.1f}",
            ha="center",
            va="bottom",
            color="#1f356c",
            fontsize=12,
        )
    plt.show()


plot_character_per_second_comparison(hf_stats, fst_stats, documents)
```

*Bar chart: characters processed per second, HF Embed (Torch) vs FastEmbed.*
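On this run, FastEmbed’s lower mean time (~0.044 s vs ~0.047 s per batch of 12 documents) translates to a slightly higher characters-per-second throughput.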

Are the Embeddings the same?

This is a very important question. Let’s see if the embeddings are the same.

```python
def calculate_cosine_similarity(embeddings1: Tensor, embeddings2: Tensor) -> float:
    """
    Calculate cosine similarity between two sets of embeddings
    """
    return F.cosine_similarity(embeddings1, embeddings2).mean().item()


calculate_cosine_similarity(
    hf.embed(documents), Tensor(list(embedding_model.embed(documents)))
)
```

```
UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor.
```

```
0.9999992847442627
```
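The warning points at the fix: collapse the list of per-document arrays into a single numpy array before building the tensor. A minimal sketch (`numpy` is already pulled in as a dependency of the libraries above):

```python
import numpy as np
import torch

# Stack the per-document arrays into one (12, 384) array, then convert
# once; this avoids the slow list-of-ndarrays path the warning describes
fst_embeddings = torch.from_numpy(np.stack(list(embedding_model.embed(documents))))
calculate_cosine_similarity(hf.embed(documents), fst_embeddings)
```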

With a mean cosine similarity of ≈0.9999 for BAAI/bge-small-en-v1.5, the two implementations produce near-identical embeddings. This gives us confidence that we are not sacrificing accuracy for speed.