Want to use the newest Jina V5 model in OpenSearch? This guide walks you through the entire process—from downloading and packaging the ONNX weights to successfully registering the model without errors.
If you’ve been keeping an eye on the vector search space, you’ve probably heard about the newest models from Jina AI: v5-small and v5-nano. Despite their diminutive names, these models are state-of-the-art for semantic search, text matching, and retrieval-augmented generation, and according to Elastic’s latest benchmarks, they even outperform E5.
Elastic considers these models so state-of-the-art that they heavily promote using them within their own ecosystem (which makes sense, especially now that the Jina team has joined Elastic).
But what if you are using OpenSearch?
Since these models are readily available on Hugging Face, you don’t need to be locked into Elasticsearch to take advantage of them. In this guide, we’ll walk through the entire process, demonstrating how you can run Jina v5 embeddings natively in OpenSearch.
To run Jina V5 in OpenSearch, we first need to convert the model to the ONNX format.
However, according to the model’s Hugging Face page, Jina V5 handles things a bit differently than other models. Instead of one giant model that does everything, Jina provides task-specific versions where the “adapters” (the parts that fine-tune the model for specific jobs) are already merged into the weights.
For vector search, we want to use the retrieval version: jinaai/jina-embeddings-v5-text-small-retrieval.
To download the model in the correct format, we can use the optimum library.
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

model_id = "jinaai/jina-embeddings-v5-text-small-retrieval"
save_dir = "jina_temp_onnx"

# 1. Pull the tokenizer and the pre-exported ONNX model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = ORTModelForFeatureExtraction.from_pretrained(
    model_id,
    subfolder="onnx",
    file_name="model.onnx",
    trust_remote_code=True,
)

# 2. Save everything to a local temporary folder
tokenizer.save_pretrained(save_dir)
model.save_pretrained(save_dir)
To use the model in OpenSearch, you must package the model folder as a ZIP archive before uploading it.
zip -r jina_temp_onnx.zip jina_temp_onnx
The zip file must include both the tokenizer JSON and the model weights.
Note that modern models can exceed the 2 GB limit of a single ONNX file.
When this happens, the weights are stored separately in a file named model.onnx_data.
For more details on handling large ONNX models, see this guide.
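Before uploading, it’s worth sanity-checking the archive members. Here is a minimal sketch; the required-file list assumes the standard layout that optimum produces, so adjust it if your export differs:

```python
import zipfile

# Files OpenSearch needs inside the archive (assumption: standard optimum layout).
REQUIRED = {"tokenizer.json", "model.onnx"}

def missing_files(names):
    """Return required entries absent from a list of archive member names.
    Matches on basename so nested paths (e.g. jina_temp_onnx/model.onnx) count."""
    basenames = {n.rsplit("/", 1)[-1] for n in names}
    return sorted(REQUIRED - basenames)

def check_archive(path):
    """List missing required files in a model ZIP. Remember that models over
    2 GB also ship a separate model.onnx_data file alongside model.onnx."""
    with zipfile.ZipFile(path) as zf:
        return missing_files(zf.namelist())
```

An empty list from `check_archive("jina_temp_onnx.zip")` means both the tokenizer JSON and the model weights made it into the ZIP.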
Finally, you need to calculate a SHA-256 hash of the archive, which is required when registering the model. OpenSearch re-calculates this hash after downloading the ZIP to verify the file arrived intact and unmodified.
shasum -a 256 jina_temp_onnx.zip
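If you prefer to stay in Python, the same hash can be computed with the standard library, reading in chunks so a multi-gigabyte archive never has to fit in memory:

```python
import hashlib

def sha256_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```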
To register a custom model, OpenSearch must be able to access the ZIP file via HTTP or HTTPS. In my case, since I’m working locally, I started a temporary web server using:
python -m http.server 8000
This serves the current directory on http://localhost:8000, making the ZIP file accessible to OpenSearch.
If OpenSearch is running inside a Docker container, keep in mind that the container cannot reach your host machine via localhost or the host’s loopback address. On Docker Desktop, use http://host.docker.internal:8000 instead; on Linux, you can pass --add-host=host.docker.internal:host-gateway when starting the container to get the same alias.
Once this is done, it is time to open DevTools and register the model:
POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "jinaai/jina-embeddings-v5-text-small-retrieval",
  "version": "1.0.0",
  "model_format": "ONNX",
  "function_name": "TEXT_EMBEDDING",
  "model_content_hash_value": "<YOUR_SHA256_HASH_HERE>",
  "url": "<YOUR URL>",
  "model_config": {
    "model_type": "qwen3",
    "embedding_dimension": 1024,
    "framework_type": "sentence_transformers",
    "all_config": """{\"architectures\":[\"Qwen3Model\"],\"attention_bias\":false,\"attention_dropout\":0.0,\"bos_token_id\":151643,\"dtype\":\"bfloat16\",\"eos_token_id\":151645,\"head_dim\":128,\"hidden_act\":\"silu\",\"hidden_size\":1024,\"initializer_range\":0.02,\"intermediate_size\":3072,\"layer_types\":[\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\"],\"max_position_embeddings\":32768,\"max_window_layers\":28,\"model_type\":\"qwen3\",\"num_attention_heads\":16,\"num_hidden_layers\":28,\"num_key_value_heads\":8,\"pad_token_id\":null,\"rms_norm_eps\":1e-06,\"rope_parameters\":{\"rope_theta\":3500000,\"rope_type\":\"default\"},\"sliding_window\":null,\"task_names\":[\"retrieval\",\"text-matching\",\"clustering\",\"classification\"],\"tie_word_embeddings\":true,\"transformers_version\":\"5.1.0\",\"use_cache\":true,\"use_sliding_window\":false,\"vocab_size\":151936}"""
  }
}
A few words about this command:

- **embedding_dimension** must match the model’s output size (here, 1024).
- **function_name** (TEXT_EMBEDDING) indicates that this model will generate embeddings for text.
- **all_config** is copied from the config.json file inside the downloaded model folder.

Once the request succeeds, note the task ID in the response; you’ll need it to retrieve the model ID.
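Rather than hand-copying config.json into all_config, you can serialize it programmatically. A small sketch (the default path assumes the save directory used earlier):

```python
import json

def all_config_string(config_path="jina_temp_onnx/config.json"):
    """Read the model's config.json and re-serialize it as the compact,
    single-line JSON string that the all_config field expects."""
    with open(config_path) as f:
        return json.dumps(json.load(f), separators=(",", ":"))
```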
To get the model ID from DevTools, run:
GET /_plugins/_ml/tasks/<your_upload_task_id>
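A registration that is still in progress won’t have a model ID yet, so it’s worth checking the task state before reading it. A small helper, assuming the standard ml-commons task document shape (a state field and, once finished, a model_id field):

```python
def model_id_from_task(task_response):
    """Extract the model ID from a GET /_plugins/_ml/tasks/<id> response.
    Raises if the registration task has not completed yet."""
    state = task_response.get("state")
    if state != "COMPLETED":
        raise RuntimeError(f"registration not finished (state={state})")
    return task_response["model_id"]
```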
Once the registration task is complete, it’s time to see the state-of-the-art in action.
You don’t need to index thousands of documents just to see if it’s working; OpenSearch provides a _predict endpoint that allows you to test the model on the fly.
To do so, send a sample sentence to the model.
POST /_plugins/_ml/models/<model_id>/_predict
{
  "text_docs": [
    "Jina V5 is running natively on OpenSearch!"
  ]
}
Since we configured this as a TEXT_EMBEDDING function with the qwen3 architecture, OpenSearch will handle the tokenization and pooling automatically.
If everything is set up correctly, you’ll receive a response containing a 1024-dimensional vector.
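To pull the raw vector out of the response, you can navigate the inference results. This sketch assumes the usual response shape for TEXT_EMBEDDING models (an inference_results list whose output entries carry the data array); double-check it against your own output:

```python
def extract_embedding(predict_response, expected_dim=1024):
    """Return the first embedding from a _predict response, verifying its
    dimensionality (assumed response shape; inspect your own response)."""
    output = predict_response["inference_results"][0]["output"][0]
    vector = output["data"]
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dims, got {len(vector)}")
    return vector
```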
Now you can finally put the model to work: create an ingest pipeline with the text_embedding processor to generate embeddings at indexing time, and run semantic queries against your vector index.
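As a starting point, a minimal ingest pipeline might look like the following (the pipeline name and field names are illustrative; substitute your own model ID):

```
PUT /_ingest/pipeline/jina-v5-embedding-pipeline
{
  "description": "Generate Jina V5 embeddings at indexing time",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<model_id>",
        "field_map": {
          "text": "text_embedding"
        }
      }
    }
  ]
}
```

Here field_map tells the processor to read each document’s text field and write the resulting vector into text_embedding, which your index mapping should declare as a knn_vector field of dimension 1024.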
By successfully deploying Jina V5, we’ve proven that you don’t need to be locked into a specific ecosystem to access the latest breakthroughs in vector search. While Elastic has heavily integrated this state-of-the-art model into their stack, the availability of these weights on Hugging Face allows OpenSearch users to achieve that same elite performance natively. The secret lies in the packaging. By correctly bundling the split ONNX weights and configuring the qwen3 architecture in OpenSearch, you now have a 1024-dimensional embedding powerhouse at your disposal. This setup ensures that your search infrastructure remains at the absolute cutting edge, ready for high-performance semantic retrieval and RAG applications.