Berlin Buzzwords 2025 unites leading voices in AI, Big Data, Search, and Scalable Data Infrastructure, all anchored around open-source innovation. From GPUs for LLMs to data sovereignty, from search platforms to real‑time pipelines, this is where ideas meet engineering and where tomorrow’s data landscape takes shape.
Once again this year, we had the opportunity to attend Berlin Buzzwords, a conference that stands as a benchmark for excellence in the search community. Like every edition, it was a chance to discover new technologies, connect with peers, learn from real-world use cases, and leave truly inspired.
It was also an opportunity to share our own knowledge and contribute to the celebration of the amazing search community we’re proud to be part of. Indeed, Adelean was there with two talks.
On the first day, Lucian Precup and Giovanna Monti took the stage to explore the topic of fake news and how AI can help detect and combat misinformation in a world where information travels fast (sometimes too fast).
On the second day, Pietro Mele and Radu Pop presented a session on hybrid search, highlighting the challenges of implementing effective semantic search at scale.
Both talks, and many others, are now online on the Plain Schwarz YouTube channel.
One of the main threads of this edition was semantic search. This technology seems to be increasingly permeating search engines, steadily gaining in maturity. Many of the talks at the conference reflected this trend, showcasing real-world applications, lessons learned, and innovative approaches to implementing semantic capabilities, from retrieval-augmented generation to vector search and hybrid architectures.
In the talk “Performance Tuning Apache Solr for Dense Vectors,” Kevin Liang shared an insightful use case from Bloomberg, illustrating how their search engine supports investors in making informed decisions.
With the introduction of semantic search, Liang and his team encountered several challenges in scaling this technology, ranging from increased latency and CPU saturation to broader technical hurdles that called for both vertical and horizontal scaling strategies. Scaling alone wasn’t enough, though: to significantly improve performance, the team also implemented several key optimizations.
A critical focus was reducing the memory footprint, which they achieved by quantizing vectors and avoiding storage of the original vector data.
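Vector quantization of this kind is easy to illustrate. The snippet below is a minimal, framework-free sketch of int8 scalar quantization (our own illustration, not Bloomberg’s or Solr’s actual implementation): storing one signed byte per dimension instead of a 32-bit float cuts the vector memory footprint by roughly 4x, at the cost of a small reconstruction error.

```python
# Minimal sketch of int8 scalar quantization for dense vectors.
# Illustration only -- not the implementation discussed in the talk.

def quantize(vector, bits=8):
    """Map each float to a signed integer using a per-vector scale."""
    max_abs = max(abs(x) for x in vector) or 1.0
    scale = (2 ** (bits - 1) - 1) / max_abs  # 127 for int8
    return [round(x * scale) for x in vector], 1.0 / scale

def dequantize(qvec, inv_scale):
    """Approximate reconstruction of the original floats."""
    return [q * inv_scale for q in qvec]

original = [0.12, -0.98, 0.45, 0.0]
qvec, inv_scale = quantize(original)
restored = dequantize(qvec, inv_scale)

print(qvec)      # [16, -127, 58, 0] -- one byte per dimension
print(restored)  # close to the original values
```

One byte per dimension is a quarter of float32 storage, and because the scale is kept per vector, the maximum reconstruction error stays below half a quantization step.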
In his talk, Dhrubo Saha demonstrated the machine learning capabilities OpenSearch offers today, and gave a glimpse into its future roadmap.
The highlight of his presentation was multi-modal search, an approach that leverages multiple types of input vectors, such as those derived from both text and images, to perform semantic and hybrid search. This capability is particularly valuable in scenarios where traditional metadata is insufficient or unavailable, enabling richer and more accurate search results.
Saha also discussed essential components that support this functionality, including model management, agent orchestration, and real-time inference. These foundational elements ensure that multi-modal search can be deployed efficiently and at scale, paving the way for more intelligent and flexible search experiences.
A key focus of Saha’s talk was the inference processor in OpenSearch, which stands out for its flexibility. It can be applied at multiple stages of the search pipeline: during indexing, at query time, and even post-search. This versatility enables a wide range of powerful applications.
For example, Named Entity Recognition (NER) models can be applied at index time to enrich documents with structured metadata. At search time, hybrid search can be enhanced with re-ranking models that refine the relevance of results. Finally, after search execution, the inference processor can be used to implement RAG, potentially in combination with connectors and large language models, to synthesize more informative and context-aware answers.
This flexible design makes OpenSearch a strong foundation for building advanced, AI-driven search experiences.
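As a small illustration of the index-time case, here is what a neural-search ingest pipeline definition looks like in OpenSearch (a minimal sketch: the model id is a placeholder for a model you have deployed, and the field names are made up for the example). The `text_embedding` processor runs inference on each incoming document and writes the resulting vector alongside the text.

```json
{
  "description": "Sketch: embed passage_text at index time (field names are placeholders)",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<your deployed model id>",
        "field_map": {
          "passage_text": "passage_embedding"
        }
      }
    }
  ]
}
```

Registered via `PUT _ingest/pipeline/<name>`, every document indexed through this pipeline arrives with its embedding precomputed, ready for vector or hybrid queries.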
Another interesting subject was the challenge of evolving RAG into something more dynamic. In her talk, Bilge Yücel showed how combining RAG with agentic behavior allows us to create AI systems that don’t just retrieve and respond, but also plan, reason, and act.
While standard RAG pipelines are great for fact-based queries, they struggle with complex requests that require query decomposition, iterative search strategies, and dynamic retrieval from multiple sources to improve accuracy.
That’s when agents come into play: an AI agent is a system that autonomously pursues a goal by interacting with its environment, using tools, and adapting actions based on feedback to reach the final goal.
Agentic behavior in AI involves giving a model the ability to choose its next action by incorporating components like planning, decision-making and reasoning.
The speaker demonstrated this concept using the Haystack framework, an orchestration system for building modular agent pipelines. Her demo featured an agent built with OpenAI’s GPT-4o mini, equipped with tools like web search, document parsing and weather APIs. The agent received a query, dynamically decided which tools to call, and looped through reasoning steps to produce a structured answer.
In summary, she showed that, with prompting strategies like “Let’s think step by step” and ReAct (Reason + Act), language models can be taught to refine their answers over multiple steps. The result: more intelligent, adaptable systems that are ideal for real-world enterprise applications.
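The reason–act–observe loop at the heart of such agents can be sketched in a few lines of plain Python. This is not Haystack’s API, just an illustration of the control flow: `fake_llm` and the single tool stand in for a real model call and real tool integrations.

```python
# Framework-free sketch of a ReAct-style agent loop.
# fake_llm and TOOLS are stand-ins, not real APIs.

def fake_llm(history):
    """Stand-in for a model call: picks the next action from the transcript."""
    if not any(line.startswith("Observation:") for line in history):
        return "Action: web_search[weather in Berlin]"
    return "Final Answer: It is sunny in Berlin."

TOOLS = {
    "web_search": lambda q: f"Search results for '{q}': sunny, 24 degrees",
}

def run_agent(query, max_steps=5):
    history = [f"Question: {query}"]
    for _ in range(max_steps):
        step = fake_llm(history)            # Reason: model chooses an action
        history.append(step)
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()
        # Act: parse "Action: tool[input]" and call the matching tool
        name, _, arg = step.removeprefix("Action: ").partition("[")
        observation = TOOLS[name.strip()](arg.rstrip("]"))
        history.append(f"Observation: {observation}")  # Observe: feed back
    return "No answer within step budget."

print(run_agent("What's the weather in Berlin?"))  # It is sunny in Berlin.
```

A real framework adds the hard parts elided here: robust action parsing, tool schemas, memory, and guardrails on the step budget.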
As experts in the domain of E-commerce search, we couldn’t miss the talk “What You See Is What You Mean: Intent-Based E-Commerce Search” by Dennis Berger, Marco Petris, and Volker Carlguth. Their talk addressed a recurring challenge in e-commerce: how to make search results both precise and diverse, especially when users enter broad or ambiguous queries.
Their solution? A novel intent-based clustering approach, powered by large language models. Instead of relying solely on traditional hybrid search systems, which often struggle with inspirational or vague queries, their method brings in-depth query understanding to the very beginning of the search pipeline. By using LLMs to interpret the user’s intent upfront, the system can more effectively guide retrieval, clustering, validation, and presentation steps.
This technique isn’t just academic; it’s been tested at scale. The team shared how LLMs help uncover hidden meanings in queries, boosting recall while still maintaining high precision. Ambiguity is resolved early, and the results are visually organized by disambiguated meanings, reducing the cognitive overload of the “paradox of choice” and giving users clearer paths through large product catalogs.
Of course, relying entirely on LLMs would be expensive. That’s why the team also fine-tuned a BERT-based model to handle binary classification of product relevance. Their model was trained on over a million intent-product pairs for the German e-commerce market and achieved an F-score of 0.779. A modular design allows for easy model comparison, whether between fine-tuned transformers or commercial LLMs like GPT or Gemini.
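For readers less familiar with the metric, an F-score like the reported 0.779 is the harmonic mean of precision and recall over the labeled intent-product pairs. The tiny label lists below are made-up illustration data, not the team’s dataset.

```python
# F1 score from scratch: 1 = product matches the query intent, 0 = it does not.
# The labels below are invented for illustration.

def f1(y_true, y_pred):
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [1, 1, 0, 1, 0, 1]   # ground-truth relevance labels
y_pred = [1, 0, 0, 1, 1, 1]   # classifier predictions
print(round(f1(y_true, y_pred), 3))  # 0.75
```

Because it balances precision against recall, F1 is a natural fit for a relevance classifier where both false matches and missed products hurt the experience.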
Performance wasn’t the only focus. The team also looked at search optimization strategies: caching works well for popular exploratory queries like “washing machines”, while dynamic processing remains key for seasonal or less frequent ones like “Christmas decorations”.
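That head/tail split is simple to express in code. The sketch below is our own illustration of the idea, not the team’s system: head queries go through a cache, everything else is processed fresh (`process_query` and the popularity set are made up).

```python
# Sketch of the caching split: cache popular head queries,
# process long-tail or seasonal ones dynamically. Illustration only.

from functools import lru_cache

POPULAR = {"washing machines", "refrigerators"}  # hypothetical head queries

@lru_cache(maxsize=1024)
def cached_process(query):
    return process_query(query)

def process_query(query):
    # Stand-in for the expensive LLM-based intent analysis
    return f"intents for '{query}'"

def search(query):
    # Head queries hit the cache; seasonal/rare ones stay dynamic
    return cached_process(query) if query in POPULAR else process_query(query)

print(search("washing machines"))       # cached after the first call
print(search("Christmas decorations"))  # always computed fresh
```

The trade-off mirrors the talk: cached head queries get latency wins, while seasonal queries keep fresh, dynamically computed intents.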
Another finding: reducing user queries from dozens to just a few shorter, intent-driven ones makes the experience faster and more semantically aligned, albeit with slightly less diversity in results.
The presentation closed with insights from UX testing, showing that users benefited from intent-based clustering as a visual and cognitive aid. It’s a promising direction for making e-commerce search more intuitive, more intelligent, and more aligned with what users truly mean when they type.
This year’s Berlin Buzzwords wasn’t just about innovative approaches to search and agentic AI models; it also spotlighted the human dynamics that make technical teams successful. Talks by Fatima Taj and the duo Marion Nehring & Jessie de Groot reminded us of the importance of building resilient company cultures and navigating career paths with intention.
Nehring and de Groot delivered a powerful message: great products and thriving open-source communities are not just built on tech. They’re built on people, culture, and values.
They emphasized that “people buy from people”: we’re more likely to try a product recommended by a trusted peer than one advertised by a company. A premium product like Nike Airs isn’t just about materials and design; it’s about the emotional story and trust built around it. That same principle applies in open source: value is created not just in features, but in the relationships and experiences surrounding a project.
That’s why strong company values like openness, kindness, and transparency aren’t just important to internal teams; they must extend outward, shaping how users and contributors interact with your community. The speakers shared several standout practices for putting this into everyday action.
Nehring and de Groot also stressed that leadership must model the values they want to see. In large companies, values can easily become hollow slogans unless leaders actively live them through small, everyday actions. Whether it’s giving kind, actionable feedback or making time to truly listen to a colleague, these moments matter.
In her talk, Fatima Taj shared some advice on how she avoided being boxed into the role of the “glue”: the person who keeps the team running with supportive, but often invisible, non-technical work, like writing documentation or managing cross-team communications.
Fatima’s journey began when she became the default owner of a critical rollout process at Yelp. As the person most familiar with the project, she was naturally leaned on for updates, communication, and coordination. The visibility felt good, until it didn’t. Her time and energy were spent in spreadsheets and stakeholder calls rather than on technical tasks. Despite being at the center of the project, her engineering role became secondary.
Reading Tanya Reilly’s article “Being Glue” was a wake-up call. She realized that, without intervention, she risked stalling her growth by becoming the team’s unofficial project manager rather than being recognized as a competent software engineer.
So she began a strategic shift: first, she started saying no with intention. Then she trained teammates to take ownership of recurring tasks and documented her processes, enabling a healthier distribution of responsibilities. She also built a “brag document”, a living record of accomplishments updated every few weeks: it helped her advocate for herself during performance reviews and kept her focused on maintaining substantial technical contributions.
Her key takeaway is that doing good work isn’t enough; you have to make the right people aware of it. Fatima’s message for managers: clarify how different contributions are valued, ensure technical and non-technical work are both recognized, and distribute team responsibilities intentionally. Her message for peers: learn to delegate, let go of control, and don’t be afraid to showcase your work.
Berlin Buzzwords 2025 reaffirmed why it remains a flagship event for the search and data community. This year’s edition showcased not only the growing maturity of semantic and hybrid search technologies, but also the exciting frontiers being explored, from multi-modal pipelines and agentic AI to intent-driven e-commerce and performance-optimized infrastructure.
Yet beyond the code and models, what resonated deeply was the human side of tech: how we build teams, support careers, and create inclusive, value-driven cultures that power innovation.
We left Berlin inspired, not only by the tools and trends shaping the future of search, but by the vibrant community of thinkers, builders, and leaders driving it forward. Until next year!