CVPR 2020 Tutorial on

Image Retrieval in the Wild

Time and venue

Date and time
June 19th (AM), 2020
XXX, The Washington State Convention Center, Seattle, WA, U.S.


Content-based image retrieval is one of the most essential techniques for interacting with visual collections. Although significant progress has been made in the last decade, existing technologies have been evaluated only on standard benchmarks such as the Oxford dataset, which consists mainly of building images. There has not been enough discussion of how to build a practical, large-scale visual search system for real-world applications, such as recommending items in online marketplaces or re-identifying pedestrians in security scenarios.

This tutorial covers several important components of building an image retrieval system for real-world applications.

  • First, we will review state-of-the-art algorithms for approximate nearest neighbor search. The design of the search algorithm is critical for performance. We will provide a practical guide for selecting the best algorithm for a given task.
  • In the second part, we will present an example of how such an algorithm is used in an online C2C marketplace app, which has over one billion listings and over 15 million monthly active users. Specifically, we will show how we productionized a highly scalable and highly available visual search system on Kubernetes for the app.
  • In the third part, we will present a systematic review of heterogeneous person re-identification, where the inter-modality discrepancy is the main challenge. We consider four cross-modality application scenarios: low-resolution (LR), infrared (IR), sketch, and text. We will introduce and organize the available datasets in each category, and summarize and compare the representative approaches.
  • Finally, we will give a live-coding demo that implements an image search engine from scratch. By leveraging a pre-trained deep model, we will show that a web-based search system can be built with only 100 lines of Python code.
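To make the search step concrete, here is a minimal sketch of exact (brute-force) nearest neighbor search over image feature vectors, the baseline that approximate methods trade accuracy against for speed. It is not the tutorial's demo code. The file names, the 4-dimensional features, and the `search` helper are hypothetical and chosen only for illustration; a real system would extract much higher-dimensional features with a pre-trained deep model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def search(query, database, top_k=3):
    """Return the top_k (image_id, cosine similarity) pairs, most similar first.

    This is an exhaustive linear scan: every database vector is compared
    with the query, so the cost grows linearly with the collection size.
    """
    q = l2_normalize(query)
    scored = []
    for image_id, feature in database.items():
        f = l2_normalize(feature)
        score = sum(a * b for a, b in zip(q, f))
        scored.append((image_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy features standing in for the output of a pre-trained model.
database = {
    "shoe_01.jpg": [0.9, 0.1, 0.0, 0.2],
    "shoe_02.jpg": [0.8, 0.2, 0.1, 0.1],
    "bag_01.jpg": [0.1, 0.9, 0.3, 0.0],
    "watch_01.jpg": [0.0, 0.2, 0.9, 0.4],
}
query = [1.0, 0.1, 0.0, 0.1]

results = search(query, database, top_k=2)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

A linear scan like this is workable for small collections. At the scale mentioned above (over one billion listings), it would likely be replaced by an approximate nearest neighbor index.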


Extensive survey of approximate nearest neighbor search - Yusuke Matsui (slides)
A large-scale visual search system in the C2C marketplace app Mercari - Takuma Yamaguchi (slides)
Coffee break
Beyond Intra-modality Discrepancy: A Survey of Heterogeneous Person Re-identification - Zheng Wang (slides)
Live-coding demo to implement an image search engine from scratch - Yusuke Matsui (slides)


Yusuke Matsui

The University of Tokyo

Takuma Yamaguchi

Mercari Inc.

Zheng Wang

National Institute of Informatics
