Publications
Fine-Tuning LLaMA for Multi-Stage Text Retrieval
Xueguang Ma, Liang Wang, Nan Yang, Furu Wei and Jimmy Lin
arXiv 2023.
Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models
Raphael Tang, Xinyu Zhang, Xueguang Ma, Jimmy Lin and Ferhan Ture
arXiv 2023.
Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering
Yubo Wang, Xueguang Ma and Wenhu Chen
arXiv 2023.
Anserini Gets Dense Retrieval: Integration of Lucene’s HNSW Indexes
Xueguang Ma, Tommaso Teofili, Jimmy Lin
CIKM 2023.
TheoremQA: A Theorem-driven Question Answering Dataset
Wenhu Chen, Ming Yin, Max Ku, Elaine Wan, Xueguang Ma, Jianyu Xu, Tony Xia, Xinyi Wang and Pan Lu
EMNLP 2023.
Zero-Shot Listwise Document Reranking with a Large Language Model
Xueguang Ma, Xinyu Zhang, Ronak Pradeep and Jimmy Lin
arXiv 2023.
Precise Zero-Shot Dense Retrieval without Relevance Labels
Luyu Gao*, Xueguang Ma*, Jimmy Lin, and Jamie Callan
ACL 2023.
Few-shot In-context Learning for Knowledge Base Question Answering
Tianle Li, Xueguang Ma, Alex Zhuang, Yu Gu, Yu Su and Wenhu Chen
ACL 2023.
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
Wenhu Chen*, Xueguang Ma*, Xinyi Wang, and William W. Cohen
TMLR 2023.
SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes
Minghan Li, Sheng-Chieh Lin, Xueguang Ma and Jimmy Lin
SIGIR 2023.
Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval
Luyu Gao*, Xueguang Ma*, Jimmy Lin, and Jamie Callan
SIGIR 2023.
Document Expansions and Learned Sparse Lexical Representations for MS MARCO V1 and V2
Xueguang Ma, Ronak Pradeep, Rodrigo Nogueira, and Jimmy Lin
SIGIR 2022.
To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers
Hang Li, Shuai Wang, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, and Guido Zuccon
SIGIR 2022.
Sparsifying Sparse Representations for Passage Retrieval by Top-k Masking
Jheng-Hong Yang, Xueguang Ma, and Jimmy Lin
arXiv 2021.
A Replication Study of Dense Passage Retriever
Xueguang Ma, Kai Sun, Ronak Pradeep, Minghan Li and Jimmy Lin
ECIR 2022.
Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study
Hang Li, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, and Guido Zuccon
ECIR 2022.
Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval
Xueguang Ma*, Minghan Li*, Kai Sun, Ji Xin, Jimmy Lin
EMNLP 2021.
Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval
Xinyu Zhang, Xueguang Ma, Peng Shi, Jimmy Lin
MRL 2021 (EMNLP).
On the Separation of Logical and Physical Ranking Models for Text Retrieval Applications
Jimmy Lin, Xueguang Ma, Joel Mackenzie, Antonio Mallia, and Michał Siedlaczek
DESIRES 2021.
A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques
Jimmy Lin and Xueguang Ma
arXiv 2021.
Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations
Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep and Rodrigo Nogueira
SIGIR 2021.
Vera: Prediction Techniques for Reducing Harmful Misinformation in Consumer Health Search
Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira and Jimmy Lin
SIGIR 2021.
Scientific Claim Verification with VerT5erini
Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira and Jimmy Lin
LOUHI 2021 (EACL).
H2oloo at TREC 2020: When all you got is a hammer… Deep Learning, Health Misinformation, and Precision Medicine
Ronak Pradeep, Xueguang Ma, Xinyu Zhang, Hang Cui, Ruizhou Xu, Rodrigo Nogueira and Jimmy Lin
TREC 2020.