A Complete Guide to Running Jina V5 Natively in OpenSearch

Want to use the newest Jina V5 model in OpenSearch? This guide walks you through the entire process—from downloading and packaging the ONNX weights to successfully registering the model without errors.

Table of Contents

  1. Introduction
  2. Downloading and Preparing the Model
  3. Uploading to OpenSearch
  4. Testing Your New Vector Embeddings
  5. Takeaways

1. Introduction

If you’ve been keeping an eye on the vector search space, you’ve probably heard about the newest models from Jina AI: the v5-small and v5-nano. Despite the names, these models are state-of-the-art for semantic search, text matching, and retrieval-augmented generation, and according to Elastic’s latest benchmarks, they even outperform E5.

Elastic considers these models so state-of-the-art that they heavily promote using them within their own ecosystem (which makes sense, especially now that the Jina team has joined Elastic).

But what if you are using OpenSearch?

Since these models are readily available on Hugging Face, you don’t need to be locked into Elasticsearch to take advantage of them. In this guide, we’ll walk through the entire process, demonstrating how you can run Jina v5 embeddings natively in OpenSearch.

2. Downloading and Preparing the Model

To run Jina V5 in OpenSearch, we first need to convert the model to the ONNX format.

However, according to the model’s Hugging Face page, Jina V5 handles things a bit differently than other models. Instead of one giant model that does everything, Jina provides task-specific versions where the “adapters” (the parts that fine-tune the model for specific jobs) are already merged into the weights.

For vector search, we want to use the retrieval version: jinaai/jina-embeddings-v5-text-small-retrieval.

To download the model in the correct format, we can use the optimum library.

from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

model_id = "jinaai/jina-embeddings-v5-text-small-retrieval"
save_dir = "jina_temp_onnx"

# 1. Pull the tokenizer and the pre-exported ONNX model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = ORTModelForFeatureExtraction.from_pretrained(
    model_id,
    subfolder="onnx", 
    file_name="model.onnx",
    trust_remote_code=True,
)

# 2. Save everything to a local temporary folder
tokenizer.save_pretrained(save_dir)
model.save_pretrained(save_dir)

To use the model in OpenSearch, you must package the saved folder as a ZIP archive before uploading it.

zip -r jina_temp_onnx.zip jina_temp_onnx

The zip file must include both the tokenizer JSON and the model weights.

Note that modern models can exceed the 2 GB limit of a single ONNX file. When this happens, the weights are stored separately in a file named model.onnx_data.
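Before uploading, it's worth verifying that the archive actually contains everything OpenSearch expects. Here is a minimal sketch using Python's zipfile module; the required filenames below assume the standard output of the export step (add model.onnx_data to the list if your export produced one):

```python
import zipfile

REQUIRED_FILES = ("tokenizer.json", "model.onnx")

def missing_from_archive(zip_path, required=REQUIRED_FILES):
    """Return which required files are absent from the ZIP archive.

    Matches on basename so entries nested under the folder name still count.
    """
    with zipfile.ZipFile(zip_path) as zf:
        names = {name.rsplit("/", 1)[-1] for name in zf.namelist()}
    return [f for f in required if f not in names]
```

An empty list from missing_from_archive("jina_temp_onnx.zip") means the archive is ready to upload.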

For more details on handling large ONNX models, see this guide.

Finally, calculate the SHA-256 hash of the archive; you will need it when registering the model. OpenSearch re-calculates this hash after downloading the ZIP to verify that the file is exactly what you intended.

shasum -a 256 jina_temp_onnx.zip
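If shasum isn't available on your platform, the same digest can be computed in Python with the standard library, as a quick equivalent sketch:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading in 1 MB chunks
    so multi-gigabyte model archives never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha256_of("jina_temp_onnx.zip") prints the same value as the shasum command above.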

3. Uploading to OpenSearch

To register a custom model, OpenSearch must be able to access the ZIP file via HTTP or HTTPS. In my case, since I’m working locally, I started a temporary web server using:

python -m http.server 8000

This serves the current directory on http://localhost:8000, making the ZIP file accessible to OpenSearch.

If OpenSearch is running inside a Docker container, keep in mind that the container cannot reach your host machine via localhost or the host’s local address. Use host.docker.internal (available on Docker Desktop) or the Docker bridge gateway IP in the URL instead.

Once this is done, it is time to open DevTools and register the model:

POST /_plugins/_ml/models/_register?deploy=true
{
  "name": "jinaai/jina-embeddings-v5-text-small-retrieval",
  "version": "1.0.0",
  "model_format": "ONNX",
  "function_name": "TEXT_EMBEDDING",
  "model_content_hash_value": "<YOUR_SHA256_HASH_HERE>",
  "url": "<YOUR URL>",
  "model_config": {
    "model_type": "qwen3",
    "embedding_dimension": 1024,
    "framework_type": "sentence_transformers", 
    "all_config": """{\"architectures\":[\"Qwen3Model\"],\"attention_bias\":false,\"attention_dropout\":0.0,\"bos_token_id\":151643,\"dtype\":\"bfloat16\",\"eos_token_id\":151645,\"head_dim\":128,\"hidden_act\":\"silu\",\"hidden_size\":1024,\"initializer_range\":0.02,\"intermediate_size\":3072,\"layer_types\":[\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\",\"full_attention\"],\"max_position_embeddings\":32768,\"max_window_layers\":28,\"model_type\":\"qwen3\",\"num_attention_heads\":16,\"num_hidden_layers\":28,\"num_key_value_heads\":8,\"pad_token_id\":null,\"rms_norm_eps\":1e-06,\"rope_parameters\":{\"rope_theta\":3500000,\"rope_type\":\"default\"},\"sliding_window\":null,\"task_names\":[\"retrieval\",\"text-matching\",\"clustering\",\"classification\"],\"tie_word_embeddings\":true,\"transformers_version\":\"5.1.0\",\"use_cache\":true,\"use_sliding_window\":false,\"vocab_size\":151936}"""
  }
}

A few words about this command:

  1. embedding_dimension is the size of the vectors the model produces (here, 1024).
  2. function_name (TEXT_EMBEDDING) indicates that this model will generate embeddings for text.
  3. model_config contains both high-level settings and the full model configuration:
     • model_type and framework_type tell OpenSearch what kind of model and library it is working with.
     • all_config is a complete JSON dump of the underlying model architecture and hyperparameters, which can be found in the config.json file inside the downloaded model folder.
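Rather than hand-escaping config.json into the all_config string, you can generate the model_config block programmatically. The sketch below assumes the embedding dimension equals hidden_size in config.json, which holds for this model (1024) but is worth double-checking for other architectures:

```python
import json

def build_model_config(config_path, framework_type="sentence_transformers"):
    """Assemble the model_config block of the _register request from the
    model's config.json, serializing the whole file into all_config."""
    with open(config_path) as f:
        cfg = json.load(f)
    return {
        "model_type": cfg["model_type"],
        "embedding_dimension": cfg["hidden_size"],
        "framework_type": framework_type,
        "all_config": json.dumps(cfg, separators=(",", ":")),
    }
```

Passing "jina_temp_onnx/config.json" reproduces the values used in the request above.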

Once this is done, make a note of the task ID in the response. It is useful for:

  • Checking whether your task has completed, with or without errors.
  • Retrieving the model ID once the model is successfully uploaded (this can also be found on the Machine Learning page).

To get the model ID from DevTools, run:

GET /_plugins/_ml/tasks/<your_upload_task_id>
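If you prefer to poll the task from a script, a small helper can pull the model ID out of that response. This is a sketch based on the task response fields (state, model_id, error) returned by the ML tasks API:

```python
def model_id_from_task(task_response):
    """Extract the model ID from a GET /_plugins/_ml/tasks/<id> response.

    Returns None while the task is still running; raises if it failed.
    """
    state = task_response.get("state")
    if state == "FAILED":
        raise RuntimeError(task_response.get("error", "model registration failed"))
    if state != "COMPLETED":
        return None
    return task_response["model_id"]
```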


4. Testing Your New Vector Embeddings

Once the registration task is complete, it’s time to see the state-of-the-art in action. You don’t need to index thousands of documents just to see if it’s working; OpenSearch provides a _predict endpoint that allows you to test the model on the fly.

To do so, send a sample sentence to the model.

POST /_plugins/_ml/models/<model_id>/_predict
{
  "text_docs": [
    "Jina V5 is running natively on OpenSearch!"
  ]
}

Since we configured this as a TEXT_EMBEDDING function with the qwen3 architecture, OpenSearch will handle the tokenization and pooling automatically. If everything is set up correctly, you’ll receive a response containing a 1024-dimensional vector.
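When scripting against the endpoint, the vector sits a few levels deep in the response. Here is a small parsing sketch, assuming the usual text-embedding response layout (inference_results → output → data):

```python
def embedding_from_predict(response):
    """Pull the embedding vector out of a _predict response body.

    Expects the standard layout: inference_results[0].output[0].data.
    """
    output = response["inference_results"][0]["output"][0]
    return output["data"]
```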

Now you can finally put the model to work: create an ingest pipeline with the text_embedding processor to embed documents at indexing time, and search your vector database using semantic queries.
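As a sketch of that last step, the requests below wire everything together in DevTools. The pipeline name, index name, and field names are placeholders, and the neural query assumes the neural-search plugin that ships with standard OpenSearch distributions:

```
PUT /_ingest/pipeline/jina-v5-pipeline
{
  "processors": [
    {
      "text_embedding": {
        "model_id": "<model_id>",
        "field_map": { "content": "content_embedding" }
      }
    }
  ]
}

PUT /my-semantic-index
{
  "settings": { "index.knn": true, "default_pipeline": "jina-v5-pipeline" },
  "mappings": {
    "properties": {
      "content": { "type": "text" },
      "content_embedding": { "type": "knn_vector", "dimension": 1024 }
    }
  }
}

GET /my-semantic-index/_search
{
  "query": {
    "neural": {
      "content_embedding": {
        "query_text": "how do I run Jina V5 in OpenSearch?",
        "model_id": "<model_id>",
        "k": 10
      }
    }
  }
}
```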

5. Takeaways

By successfully deploying Jina V5, we’ve shown that you don’t need to be locked into a specific ecosystem to access the latest breakthroughs in vector search. While Elastic has integrated this state-of-the-art model deeply into their stack, the availability of the weights on Hugging Face lets OpenSearch users achieve the same performance natively. The secret lies in the packaging: by correctly bundling the split ONNX weights and configuring the qwen3 architecture in OpenSearch, you now have a 1024-dimensional embedding powerhouse at your disposal, ready for high-performance semantic retrieval and RAG applications.
