Text embedding models convert text into vectors of floating-point numbers. They play a pivotal role in the Retrieval-Augmented Generation (RAG) framework that underpins many chatbots. Without text embeddings, a chatbot would have to read the entire knowledge base to respond to each user prompt; with them, the chatbot is fed only the relevant information, so it can generate accurate responses. OpenAI's ada-002 embedding model has been especially popular among tech enthusiasts. In this blog, we analyse embedding models from several vendors and compare their performance.
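The retrieval idea above can be sketched with toy vectors. This is a minimal illustration only: the 3-dimensional "embeddings" and document names below are made up, standing in for real model output.

```python
import numpy as np

# Toy stand-ins for real embeddings: each "document" is a small float
# vector; a real RAG system would call an embedding API instead.
docs = {
    "refund policy": np.array([0.9, 0.1, 0.0]),
    "shipping times": np.array([0.1, 0.9, 0.2]),
    "warranty terms": np.array([0.8, 0.2, 0.1]),
}
query = np.array([0.85, 0.15, 0.05])  # pretend embedding of the user prompt

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank documents by similarity to the query and keep the best match --
# this is the chunk a RAG pipeline would feed to the chatbot as context.
best = max(docs, key=lambda name: cosine(docs[name], query))
print(best)  # → refund policy
```

In production the dictionary is replaced by a vector database, but the ranking step is the same cosine comparison.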
✍️ TL;DR: One-Minute Summary
Text embeddings convert text into numerical vectors and are crucial for RAG-based chatbot systems, enabling faster, more accurate responses. This blog compares top models from OpenAI, Voyage AI, Mistral, and Cohere, analyzing dimensions, tokenization, token counts, pricing, and statistical performance.
- OpenAI’s ada-002 and Text Embedding 3 models are widely used, offering high-dimensional (1536) vectors with lower token counts.
- Voyage AI’s models (like Voyage 3 Lite) have slightly higher token counts (~1.6% more than OpenAI) and support multiple dimensions.
- Lower embedding dimensions show higher maximum magnitudes and greater variance in value ranges.
- Statistical tests reveal no significant difference within the same vendor (e.g., OpenAI 1536d vs 512d), but significant differences across vendors (e.g., OpenAI vs Cohere, Voyage vs Mistral).
- Key takeaway: Not all embeddings perform equally. Choose embedding models based on use case, accuracy, and multilingual support.
Vendors of Text Embedding Models
Many vendors in the market offer text embedding models:
OpenAI:
- OpenAI introduced the ada-002 text embedding model, followed by the Text Embedding 3 Small and Large models in 2024.
Voyage AI:
- Voyage AI offers only text embedding models and a re-ranker. Their popular text embedding model is Voyage 3 Lite.
Mistral:
- Mistral AI offers the Mistral-embed model.
Cohere:
- Cohere offers v3 text embedding models.
Many vendors offer embedding dimensions from 512 to 3072, depending on the embedding model you choose. They are priced by the number of tokens (a token is approximately 4 characters). The comparison for each vendor is listed below:
| Aspect | OpenAI | Voyage AI | Mistral | Cohere |
| --- | --- | --- | --- | --- |
| Embedding model | Text Embedding 3 Small | Voyage 3 Lite | Mistral-embed | Cohere English v3 |
| Tokenization method | Tiktoken – cl100k_base | Built into the Voyage AI SDK | NA | Cohere's tokenizer model |
| Token counting method | `encoding.encode()` | `vo.count_tokens()` | NA | `co.tokenize()` |
| Embedding dimensions | 1536 | 512 | 1024 | 1024 |
| Output token count | 473,217 | 480,842 | NA | 572,673 |
| Pricing | $0.02 / 1M tokens | $0.02 / 1M tokens | $0.10 / 1M tokens | $0.10 / 1M tokens |
Inferences
- Voyage AI token counts are slightly higher than OpenAI's: across our test cases, the total token count for Voyage AI is about 1.6% higher. Voyage AI also uses different tokenization models (from Hugging Face) for different embedding models, whereas OpenAI uses the cl100k_base encoding for most of its models.
- We also observed that as the embedding dimension decreases, the maximum magnitude increases, resulting in a long-tail distribution of the text embedding values.
| Embedding dimensions | Maximum magnitude |
| --- | --- |
| 1536 | 0.0949307531118393 |
| 1024 | 0.10752172023057938 |
| 768 | 0.11955433338880539 |
| 512 | 0.1382475197315216 |
| 256 | 0.20418193936347961 |
Table: Text embedding dimensions versus maximum magnitude
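This trend has a simple geometric reading: the components of a unit-norm vector scale roughly as 1/√d, so lower-dimensional embeddings tend to have larger individual values. The simulation below illustrates this with random unit vectors, not vendor data.

```python
import numpy as np

def max_component(d, rng):
    """Max absolute component of a random unit-norm vector in d dimensions."""
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)  # embedding vectors are typically unit-norm
    return float(np.abs(v).max())

# Illustrative simulation (not the measured values in the table above):
# the maximum magnitude grows as the dimension shrinks.
rng = np.random.default_rng(0)
for d in (1536, 1024, 768, 512, 256):
    print(d, round(max_component(d, rng), 4))
```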
- We also analysed the number of digits in the embedding values of each model: OpenAI (1536 dimensions) ranges from 6 to 13 digits, Voyage AI (512 dimensions) from 13 to 21, Mistral AI (1024 dimensions) from 2 to 21, and Cohere v3 (1024 dimensions) from 4 to 13. The number of digits generated by a model appears to be inversely related to its dimensionality.
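Digit counting of this sort can be sketched in a few lines. `digit_count` is a hypothetical helper for illustration, not necessarily the method used in the analysis above, and the sample values are made up.

```python
def digit_count(value):
    """Count the digits in a float's shortest decimal representation,
    ignoring the sign, the decimal point, and any exponent marker."""
    return sum(ch.isdigit() for ch in repr(float(value)))

# Illustrative values only (not real model output):
print(digit_count(0.25))                # short value -> few digits
print(digit_count(0.0949307531118393))  # long value -> many digits
```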
Figure: Embedding distribution of OpenAI (text-embedding-3-small)
OpenAI 1536 vs OpenAI 512 dimensions
- Slicing the first 512 floating-point values from the 1536-dimension embedding, rescaling back to unit norm, and comparing similarity scores, we find the cosine similarities are very close, with a quantization error of approximately 0.000001.
- `np.linalg.norm(embeddings)` is used to compute the Euclidean norm of the embedding vector, which is the square root of the sum of the squares of its elements.
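The truncate-and-rescale experiment can be sketched as follows. The vectors here are random stand-ins; with actual output from a Matryoshka-style model such as text-embedding-3-small, the truncated similarity tracks the full-dimension similarity far more closely than random vectors would suggest.

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def truncate(v, d):
    """Keep the first d values and rescale back to unit norm."""
    t = v[:d]
    return t / np.linalg.norm(t)

# Random stand-ins for a query embedding and a chunk embedding.
rng = np.random.default_rng(0)
q = rng.standard_normal(1536); q /= np.linalg.norm(q)
c = rng.standard_normal(1536); c /= np.linalg.norm(c)

sim_full = cos(q, c)                              # similarity at 1536-d
sim_512 = cos(truncate(q, 512), truncate(c, 512)) # similarity at 512-d
print(sim_full, sim_512)
```

What matters in practice is that the similarity *rankings* of retrieved chunks computed at 512 dimensions stay close to those computed at 1536.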
- Statistical inferences:
- Comparing OpenAI (1536) vs OpenAI (512): comparing the normalized similarity scores of the top 5 chunks picked by both dimensions for 38 input standalone questions, we infer that there is no statistically significant difference between them, based on the Wilcoxon test (a non-parametric test).
- Comparing OpenAI (1536) vs Voyage AI (512): we infer that there is a statistically significant difference between the similarity scores of the top 5 chunks picked by each.
Comparing dimensions within the Voyage 3 Large model
- In Voyage AI, after truncating to the first 512 dimensions from the 1024-dimension embedding, we get an average quantization error of 0.00043 (mean squared error between the 1024-dimension and truncated 512-dimension vectors).
- In OpenAI, after truncating to the first 512 dimensions from the 1536-dimension embedding, we get an average quantization error of 0.00032 (mean squared error between the 1536-dimension and truncated 512-dimension vectors).
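One plausible reading of this quantization error is the mean squared difference between the raw leading slice of a unit-norm embedding and the same slice after renormalization, sketched below on a random stand-in vector rather than real model output.

```python
import numpy as np

def truncation_mse(full, d):
    """MSE between the first d values of a unit-norm embedding and the
    same slice rescaled back to unit length -- one plausible reading of
    the quantization error reported above, not the confirmed method."""
    head = full[:d]
    rescaled = head / np.linalg.norm(head)
    return float(np.mean((head - rescaled) ** 2))

# Random stand-in for a real 1536-dimension embedding.
rng = np.random.default_rng(0)
v = rng.standard_normal(1536)
v /= np.linalg.norm(v)
print(truncation_mse(v, 512))  # order of magnitude comparable to ~0.0003
```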
Statistical Significance Between the Models
This table shows the statistical significance between different text embedding models.
| Model comparison | Paired t-test (p-value) | Significance (t-test) | Wilcoxon test (p-value) | Significance (Wilcoxon) |
| --- | --- | --- | --- | --- |
| voyage3_large_1024d vs 512d | 6.87E-01 | No significant difference | 5.19E-01 | No significant difference |
| openai_1536d vs 512d | 9.64E-02 | No significant difference | 1.09E-01 | No significant difference |
| voyage3_lite vs cohere | 1.15E-12 | Significant difference | 5.68E-11 | Significant difference |
| voyage3_lite vs mistral | 5.24E-11 | Significant difference | 4.42E-09 | Significant difference |
| cohere vs mistral | 2.59E-09 | Significant difference | 6.62E-11 | Significant difference |
| openai_1536d vs cohere | 5.47E-05 | Significant difference | 9.18E-04 | Significant difference |
| openai_1536d vs mistral | 4.45E-25 | Significant difference | 1.08E-20 | Significant difference |
Conclusion
We observe statistically significant differences between text embedding models from different vendors, and no significant differences between models from the same vendor. Some text embedding models, such as Voyage 3 Lite, perform very well on multilingual benchmarks. We recommend doing thorough research before choosing the text embedding model that best suits your use case.