Exploring the Nexus Between Retrievability and Query Generation Strategies
Published in The European Conference on Information Retrieval (ECIR), 2024
In this paper, we investigate the reproducibility of retrievability analysis in light of the varying query sets used in the literature on retrievability analysis of IR models. We experiment with all the major query generation strategies in the current literature and with a real query log. Theoretically, the ideal query set is a real query log over the corpus used; however, we find very low correlations between retrievability scores estimated through artificial query generation strategies and those estimated through a real query log. We then propose an incremental improvement for generating artificial queries, in which we exploit POS-filtering to simulate more realistic queries of the kind found in query logs. This reproducibility track paper aims to inspire the development of more robust strategies for estimating the retrieval biases of retrieval models and to stimulate future work in the area.
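As a rough illustration of the kind of comparison the abstract describes, the sketch below computes cumulative retrievability scores (the number of queries for which a document appears within a rank cutoff) for two query sets and then correlates them. The function names, the toy rankings, the cutoff of 2, and the choice of Pearson correlation are all assumptions made for illustration; they are not taken from the paper's experimental setup.

```python
from collections import Counter
from math import sqrt


def retrievability(ranked_lists, doc_ids, cutoff=10):
    """Cumulative retrievability: for each document, count the queries
    whose ranked list contains it within the top `cutoff` positions."""
    counts = Counter()
    for ranking in ranked_lists:
        # set() guards against a document appearing twice in one ranking
        counts.update(set(ranking[:cutoff]))
    return [counts[d] for d in doc_ids]


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one ranked list per query, for each query set.
docs = ["d1", "d2", "d3", "d4"]
simulated_runs = [["d1", "d2", "d3"], ["d1", "d3", "d4"], ["d1", "d2", "d4"]]
query_log_runs = [["d4", "d3", "d1"], ["d4", "d2", "d1"], ["d3", "d4", "d2"]]

r_simulated = retrievability(simulated_runs, docs, cutoff=2)
r_query_log = retrievability(query_log_runs, docs, cutoff=2)
correlation = pearson(r_simulated, r_query_log)
```

In this toy case the two query sets favor opposite documents, so the scores are perfectly anti-correlated. With real data, the ranked lists would come from running each query set against the same retrieval model and corpus.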
Recommended citation: Sinha, A., Mall, P.R., Roy, D. (2024). Exploring the Nexus Between Retrievability and Query Generation Strategies. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14611. Springer, Cham. https://doi.org/10.1007/978-3-031-56066-8_16
Download Paper