CVPR 2023 Tutorial on

Neural Search in Action

Time and venue

Date: June 19th, 2023
Time: Half Day - Afternoon (13:30 - 16:30)
Room: West113
Venue: Vancouver Convention Center, Vancouver, Canada
Zoom: Virtual site and Zoom link

Overview

Neural search, a technique for efficiently searching for similar items in a deep embedding space, is the most fundamental technique for handling large multimodal collections. With the advent of powerful technologies such as foundation models and prompt engineering, efficient neural search is becoming increasingly important. For example, multimodal encoders such as CLIP allow us to convert various problems into simple embedding-and-search. Another example is feeding information into LLMs, for which vector search engines are currently a promising direction. Despite this attention, it is not obvious how to design a search algorithm for given data. In this tutorial, we will focus on "million-scale search", "billion-scale search", and "query language" to show how to tackle real-world search problems:

  • First, we outline the theory and applications of graph-based nearest neighbor search methods. Graph-based methods are the current de facto standard for in-memory (million-scale) search, but they are difficult to understand because of their complex structure with many heuristics. We will explain their basic mathematical concepts, summarize recent improvements, and provide practical guidelines for choosing an algorithm.
  • The second part of the tutorial will cover current approaches and benchmarking efforts on billion-scale approximate nearest neighbor search. It will extend the discussion of the first part to this scale and outline the general search pipeline and the applicability of different methods (graph-based/cluster-based/quantization). It will conclude by summarizing interesting research directions.
  • Finally, we will provide an overview of query languages for neural search, covering their syntax, semantics, and applications. A query language is a crucial aspect of neural search: it allows users to express their information needs and constraints in a structured and compositional way that the system can understand and act on. We will discuss how a query language can be integrated with vector similarity search and BM25 to improve information retrieval performance. We will also cover common challenges and recent developments in the field, and provide guidance on designing and implementing query languages for neural search systems. This part is aimed at researchers and practitioners who are interested in using query languages for neural search in their work.
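
For readers new to the topic, the "embedding-and-search" pattern mentioned above can be illustrated with an exact brute-force search, which is the baseline that graph-based and billion-scale methods approximate. The sketch below is not taken from the tutorial material. The toy three-dimensional vectors stand in for real embeddings (for example, CLIP outputs), and the function names `cosine` and `search` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query, database, k=2):
    """Return the indices of the k database vectors most similar to the query.

    This scans every vector, so the cost grows linearly with the database
    size. Approximate methods trade a little accuracy to avoid the full scan.
    """
    scored = [(cosine(query, vec), idx) for idx, vec in enumerate(database)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy "embeddings" (assumed values, for illustration only).
database = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]

top = search(query, database, k=2)
print(top)
```

A full scan like this is practical only for small collections, which is why the choice of index structure becomes the central question at million and billion scale.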

Schedule

Time         Session                                                   Presenter        Link
13:30-13:40  Opening                                                   Yusuke Matsui    Slides
13:40-14:30  Theory and Applications of Graph-based Search             Yusuke Matsui    Slides
14:30-15:20  A Survey on Billion-Scale Approximate Nearest Neighbors   Martin Aumüller  Slides
15:20-15:30  Break
15:30-16:20  Query Language for Neural Search in Practical Applications  Han Xiao      Slides

Organizers

Yusuke Matsui

The University of Tokyo

Martin Aumüller

IT University of Copenhagen

Han Xiao

Jina AI

BibTeX

@misc{cvpr23_tutorial_neural_search,
  author = {Yusuke Matsui and Martin Aum{\"u}ller and Han Xiao},
  title = {CVPR2023 Tutorial on Neural Search in Action},
  howpublished = {\url{https://matsui528.github.io/cvpr2023_tutorial_neural_search/}},
  year = {2023}
}