Skip to yearly menu bar Skip to main content


Poster

Evaluating Transferability in Retrieval Tasks: An Approach Using MMD and Kernel Methods

Mengyu Dai · Amir Hossein Raffiee · Aashish Jain · Joshua Correa

Arch 4A-E Poster #273
[ ] [ Paper PDF ]
[ Poster
Fri 21 Jun 10:30 a.m. PDT — noon PDT

Abstract:

Retrieval tasks play central roles in real-world machine learning systems such as search engines, recommender systems, and retrieval-augmented generation (RAG). Achieving decent performance in these tasks often requires fine-tuning various pre-trained models on specific datasets and selecting the best candidate, a process that can be both time and resource-consuming. To tackle the problem, we introduce a novel and efficient method, called RetMMD, that leverages Maximum Mean Discrepancy (MMD) and kernel methods to assess the transferability of pre-trained models in retrieval tasks. RetMMD is calculated on pre-trained models and target datasets without any fine-tuning involved. Specifically, given some query, we quantify the distribution discrepancy between relevant and irrelevant document embeddings, by estimating the similarities within their mappings in the fine-tuned embedding space through kernel methods. This discrepancy is averaged over multiple queries, taking into account the distribution characteristics of the target dataset. Experiments suggest that the proposed metric calculated on pre-trained models closely aligns with retrieval performance post-fine-tuning. The observation holds across a variety of datasets, including image, text, and multi-modal domains, indicating the potential of using MMD and kernel methods for transfer learning evaluation in retrieval scenarios. In addition, we also design a way of evaluating dataset transferability for retrieval tasks, with experimental results demonstrating the effectiveness of the proposed approach.

Chat is not available.