
Fine-tuning relevance, reranking techniques in hybrid search

In this post, we explore how reranking techniques like RRF and score normalization methods such as min-max, L2, and atan boost the relevance of hybrid search systems that combine semantic and lexical approaches.


Table of contents

  1. Introduction
  2. What is Hybrid Search?
  3. Reciprocal Rank Fusion (RRF)
  4. Score Normalization Techniques
  5. Conclusion

1. Introduction

In an age of information overload, the relevance of search results has never been more important. Users don’t just want answers—they want the right answers, fast. As hybrid search systems become more common, combining both keyword and semantic models, reranking is key to delivering the most relevant results.

This article dives into reranking techniques such as Reciprocal Rank Fusion (RRF) and score normalization methods like min-max, L2, and atan—tools that refine and optimize search result quality across multiple systems.

You’ll also gain practical insights on how to implement custom reranking methods within Elasticsearch.

2. What is Hybrid Search?

Hybrid search blends traditional lexical search (keyword-based) with semantic search (context- and intent-driven). Each method brings unique strengths:

  • Lexical Search: Precise keyword matching, fast indexing, but limited understanding of context.

  • Semantic Search: Uses machine learning models to understand meaning and user intent, making results smarter and more intuitive.

Hybrid systems run both types of searches and then merge their outputs. That’s where reranking comes in—to organize the combined results into a cohesive and relevant list.

When hybrid search systems return results from different sources, these outputs often differ in scoring methods, ranking logic, and precision. Simply merging them won’t do.

Reranking addresses this by:

  • Improving result consistency and coherence.
  • Elevating documents that are relevant in multiple rankings.
  • Harmonizing different score scales and ranking strategies.

This technique is especially effective in Retrieval-Augmented Generation (RAG), where the accuracy of the language model’s response depends heavily on the relevance of the retrieved documents. By ensuring better alignment between the query and the supporting content, reranking leads to more precise and contextually appropriate answers.


3. Reciprocal Rank Fusion (RRF)

One of the most efficient reranking methods is Reciprocal Rank Fusion (RRF). It’s score-agnostic and instead focuses on rank positions.

RRF(d) = Σ_{r ∈ R} 1 / (k + r(d))

  • R: Set of ranked result lists (e.g., lexical and semantic).
  • r(d): Position of document d in ranking r.
  • k: A smoothing constant (usually 60).

One of the key strengths of Reciprocal Rank Fusion (RRF) lies in its simplicity and robustness. Unlike more complex score-combination techniques, RRF operates purely on the position of a document in each result list, making it both easy to implement and remarkably effective in practice.

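For example, with k = 60, a document ranked 1st in the lexical results and 3rd in the semantic results receives RRF(d) = 1/(60 + 1) + 1/(60 + 3) ≈ 0.0164 + 0.0159 ≈ 0.0323.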

Because RRF doesn’t rely on raw scores, it sidesteps the challenge of score normalization entirely. This makes it especially attractive in hybrid search setups, where lexical and semantic engines produce scores on very different scales. There’s no need to calibrate or align these scores—RRF focuses solely on ranks.

Perhaps most importantly, RRF rewards consistency. Documents that appear in multiple result sets—even if they’re not at the very top—are boosted higher in the final ranking. This reflects a kind of “consensus relevance”: if different retrieval methods independently find the same document valuable, it’s likely to be truly relevant to the user’s query. In this way, RRF elegantly combines diversity with agreement, producing ranked lists that are both comprehensive and coherent.
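To make the mechanics concrete, here is a minimal sketch of RRF in plain Python; the function and the sample rankings are illustrative, not taken from any library:

def rrf_fuse(rankings, k=60):
    # Accumulate 1 / (k + rank) for every list a document appears in.
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    # Best fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical result lists from a lexical and a semantic query.
lexical = ["doc_a", "doc_b", "doc_c"]
semantic = ["doc_b", "doc_c", "doc_d"]

print(rrf_fuse([lexical, semantic]))
# doc_b comes first: it sits near the top of both lists.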


4. Score Normalization Techniques

If you want to combine raw scores instead of ranks, you’ll face a challenge: different systems use different scoring scales.


Score normalization offers a way to bring these diverse scales into alignment, allowing for fair comparison and effective reranking. Several techniques are commonly used, each with its own strengths and caveats.

🔹 Min-Max Normalization

The Min-Max normalization method is perhaps the most straightforward. It linearly scales scores so they fall within a common [0,1] interval. This makes comparison easier and preserves the relative distribution of values.

norm(s) = (s − min) / (max − min)
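For example, raw scores of 2, 5, and 10 (min = 2, max = 10) become 0, 0.375, and 1 respectively.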

Another practical advantage of min-max normalization is that it’s easy to implement within Elasticsearch. While it’s not natively supported, a simple script-based workaround can achieve the desired effect. This is typically done through a two-step approach:

  1. Run an initial query to retrieve the maximum lexical score for the given search (a sketch follows this list). This score will be used as the normalization reference.
  2. Use the retrieved max score in a second hybrid search query, applying a script to normalize the lexical scores by dividing each one by the max value. This ensures all lexical scores are scaled between 0 and 1.
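A minimal sketch of step 1, assuming the official Python Elasticsearch client, a local cluster, and a hypothetical index named my-index:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Step 1: run the lexical query alone; its best score becomes
# the "max" parameter of the scripted query in step 2.
response = es.search(
    index="my-index",  # hypothetical index name
    size=1,            # only the top hit's score is needed
    query={
        "multi_match": {
            "query": "The sound of space",
            "fields": ["text.lang"]
        }
    }
)
max_score = response["hits"]["max_score"]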

Here is an example of a scripted query using min-max (step 2):

{
    "from": 0,
    "size": 10,
    "knn": {
        "k": 1000,
        "num_candidates": 1500,
        "field": "vector",
        "boost": 0.5,
        "query_vector": [0.3, 0.6, 0.1]
    },
    "query": {
        "script_score": {
            "query": {
                "bool": {
                    "must": [
                        {
                            "multi_match": {
                                "query": "The sound of space",
                                "fields": ["text.lang"]
                            }
                        }
                    ]
                }
            },
            "script": {
                "source": "(_score / params.max) * params.boost",
                "params": {
                    "max": 10,
                    "boost": 0.5
                }
            }
        }
    }
}

Here we assume the minimum score is zero and the maximum score is 10 (the value retrieved in step 1); with a minimum of zero, dividing by the max alone is equivalent to full min-max scaling.

This method allows for effective score alignment, making it much easier to combine lexical and semantic results in a meaningful way.

🔹 L2 Normalization

L2 normalization offers an alternative to min-max for scaling scores.

norm(sᵢ) = sᵢ / √(s₁² + s₂² + … + sₙ²)

Implementing this method is a bit more complex and, like min-max, requires two steps:

  1. First, you need to compute the sum of the squared scores for all documents in your search results.
  2. Once the sum is known, take its square root (the Euclidean norm) and divide each score by this value, so that the resulting score vector has a length of 1.

This method can be computationally expensive, as you have to gather all the scores and sum their squares, so we won’t cover a scripted Elasticsearch implementation here.
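The computation itself is straightforward, though. Here is a minimal sketch in plain Python, with hypothetical scores:

import math

scores = [4.0, 3.0, 0.5]  # hypothetical raw scores from one result list

# Step 1: sum of squared scores.
sum_of_squares = sum(s * s for s in scores)

# Step 2: divide each score by the Euclidean norm.
norm = math.sqrt(sum_of_squares)
normalized = [s / norm for s in scores]

print(normalized)  # the normalized score vector has length 1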

🔹 Atan Normalization

Atan normalization takes a more nonlinear approach to scaling scores, using the arctangent function to smooth out extreme values. The formula for this transformation looks like this:

norm(s) = atan(s) / (π / 2)
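For example, a raw score of 10 becomes atan(10) / (π/2) ≈ 0.937, while a raw score of 1 already maps to atan(1) / (π/2) = 0.5.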

Atan is by far the easiest to implement, since it doesn’t require multiple steps. Here is a scripted query that implements this method:

{
    "from": 0,
    "size": 10,
    "knn": {
        "k": 1000,
        "num_candidates": 1500,
        "field": "vector",
        "boost": 0.5,
        "query_vector": [0.3, 0.6, 0.1]
    },
    "query": {
        "script_score": {
            "query": {
                "bool": {
                    "must": [
                        {
                            "multi_match": {
                                "query": "A tennis player winning his match",
                                "fields": ["text.lang"]
                            }
                        }
                    ]
                }
            },
            "script": {
                "source": "(Math.atan(_score) / (Math.PI / 2)) * params.boost",
                "params": {
                    "boost": 1
                }
            }
        }
    }
}

However, like any normalization method, Atan comes with its trade-offs. Since the distribution of scores is altered, the normalized values may not fully reflect their original importance when combined.

Distribution difference between min-max and atan.

Specifically, lower scores are often normalized to higher values by atan, which can lead to a bias toward the lexical part, potentially overemphasizing it in the final ranking.
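A quick way to see this bias is to normalize the same hypothetical scores with both methods and compare:

import math

scores = [1.0, 2.0, 5.0, 10.0]  # hypothetical lexical scores
max_score = max(scores)

for s in scores:
    min_max = s / max_score              # min-max, assuming a minimum of 0
    atan = math.atan(s) / (math.pi / 2)  # atan normalization
    print(f"raw={s:5.1f}  min-max={min_max:.3f}  atan={atan:.3f}")

# A raw score of 1.0 maps to 0.100 with min-max but to 0.500 with atan:
# atan pushes low scores upward.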

5. Conclusion

As hybrid search cements itself as the new standard, reranking plays a critical role in refining, unifying, and elevating search results. Whether through RRF to merge rankings or through score normalization to blend outputs meaningfully, reranking ensures that the most relevant results rise to the top—delivering a smarter, more intuitive search experience and ultimately leading to more satisfied users.

Choosing the right approach must be tailored to specific user needs and validated through rigorous relevance evaluation. As experts in the field, Adelean is here to guide you toward the most effective solution.
