Publications

Fine-Tuning LLaMA for Multi-Stage Text Retrieval

Xueguang Ma, Liang Wang, Nan Yang, Furu Wei and Jimmy Lin

arXiv 2023.


Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models

Raphael Tang, Xinyu Zhang, Xueguang Ma, Jimmy Lin and Ferhan Ture

arXiv 2023.


Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering

Yubo Wang, Xueguang Ma and Wenhu Chen

arXiv 2023.


Anserini Gets Dense Retrieval: Integration of Lucene’s HNSW Indexes

Xueguang Ma, Tommaso Teofili, Jimmy Lin

CIKM 2023.


TheoremQA: A Theorem-driven Question Answering Dataset

Wenhu Chen, Ming Yin, Max Ku, Elaine Wan, Xueguang Ma, Jianyu Xu, Tony Xia, Xinyi Wang and Pan Lu

EMNLP 2023.


Zero-Shot Listwise Document Reranking with a Large Language Model

Xueguang Ma, Xinyu Zhang, Ronak Pradeep and Jimmy Lin

arXiv 2023.


Precise Zero-Shot Dense Retrieval without Relevance Labels

Luyu Gao*, Xueguang Ma*, Jimmy Lin, and Jamie Callan

ACL 2023.


Few-shot In-context Learning for Knowledge Base Question Answering

Tianle Li, Xueguang Ma, Alex Zhuang, Yu Gu, Yu Su and Wenhu Chen

ACL 2023.


Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

Wenhu Chen*, Xueguang Ma*, Xinyi Wang, and William W. Cohen

TMLR 2023.


SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes

Minghan Li, Sheng-Chieh Lin, Xueguang Ma and Jimmy Lin

SIGIR 2023.


Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval

Luyu Gao*, Xueguang Ma*, Jimmy Lin, and Jamie Callan

SIGIR 2023.


Document Expansions and Learned Sparse Lexical Representations for MS MARCO V1 and V2

Xueguang Ma, Ronak Pradeep, Rodrigo Nogueira, and Jimmy Lin

SIGIR 2022.


To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers

Hang Li, Shuai Wang, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, and Guido Zuccon

SIGIR 2022.


Sparsifying Sparse Representations for Passage Retrieval by Top-k Masking

Jheng-Hong Yang, Xueguang Ma, and Jimmy Lin

arXiv 2021.


A Replication Study of Dense Passage Retriever

Xueguang Ma, Kai Sun, Ronak Pradeep, Minghan Li and Jimmy Lin

ECIR 2022.


Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study

Hang Li, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, and Guido Zuccon

ECIR 2022.


Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval

Xueguang Ma*, Minghan Li*, Kai Sun, Ji Xin, Jimmy Lin

EMNLP 2021.


Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval

Xinyu Zhang, Xueguang Ma, Peng Shi, Jimmy Lin

MRL 2021 (EMNLP).


On the Separation of Logical and Physical Ranking Models for Text Retrieval Applications

Jimmy Lin, Xueguang Ma, Joel Mackenzie, Antonio Mallia, and Michał Siedlaczek

DESIRES 2021.


A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques

Jimmy Lin and Xueguang Ma

arXiv 2021.


Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations

Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep and Rodrigo Nogueira

SIGIR 2021.


Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira and Jimmy Lin

SIGIR 2021.


Scientific Claim Verification with VerT5erini

Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira and Jimmy Lin

LOUHI 2021 (EACL).


H2oloo at TREC 2020: When all you got is a hammer… Deep Learning, Health Misinformation, and Precision Medicine

Ronak Pradeep, Xueguang Ma, Xinyu Zhang, Hang Cui, Ruizhou Xu, Rodrigo Nogueira and Jimmy Lin

TREC 2020.