Selected Publications
Unifying Multimodal Retrieval via Document Screenshot Embedding
Xueguang Ma, Sheng-Chieh Lin, Minghan Li, Wenhu Chen and Jimmy Lin
arXiv 2024.
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
Ziyan Jiang, Xueguang Ma and Wenhu Chen
arXiv 2024.
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Wang, Alex Zhuang, Rongqi Fan, Xiang Yue and Wenhu Chen
arXiv 2024.
PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval
Shengyao Zhuang, Xueguang Ma, Bevan Koopman, Jimmy Lin and Guido Zuccon
arXiv 2024.
Fine-Tuning LLaMA for Multi-Stage Text Retrieval
Xueguang Ma, Liang Wang, Nan Yang, Furu Wei and Jimmy Lin
SIGIR 2024.
Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering
Yubo Wang, Xueguang Ma and Wenhu Chen
arXiv 2023.
Anserini Gets Dense Retrieval: Integration of Lucene’s HNSW Indexes
Xueguang Ma, Tommaso Teofili and Jimmy Lin
CIKM 2023.
Zero-Shot Listwise Document Reranking with a Large Language Model
Xueguang Ma, Xinyu Zhang, Ronak Pradeep and Jimmy Lin
arXiv 2023.
Precise Zero-Shot Dense Retrieval without Relevance Labels
Luyu Gao*, Xueguang Ma*, Jimmy Lin, and Jamie Callan
ACL 2023.
Few-shot In-context Learning for Knowledge Base Question Answering
Tianle Li, Xueguang Ma, Alex Zhuang, Yu Gu, Yu Su and Wenhu Chen
ACL 2023.
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
Wenhu Chen*, Xueguang Ma*, Xinyi Wang, and William W. Cohen
TMLR 2023.
Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval
Luyu Gao*, Xueguang Ma*, Jimmy Lin, and Jamie Callan
SIGIR 2023.
Document Expansions and Learned Sparse Lexical Representations for MS MARCO V1 and V2
Xueguang Ma, Ronak Pradeep, Rodrigo Nogueira, and Jimmy Lin
SIGIR 2022.
Sparsifying Sparse Representations for Passage Retrieval by Top-k Masking
Jheng-Hong Yang, Xueguang Ma, and Jimmy Lin
arXiv 2021.
A Replication Study of Dense Passage Retriever
Xueguang Ma, Kai Sun, Ronak Pradeep, Minghan Li and Jimmy Lin
ECIR 2022.
Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval
Xueguang Ma*, Minghan Li*, Kai Sun, Ji Xin and Jimmy Lin
EMNLP 2021.
Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval
Xinyu Zhang, Xueguang Ma, Peng Shi and Jimmy Lin
MRL 2021 (EMNLP).
On the Separation of Logical and Physical Ranking Models for Text Retrieval Applications
Jimmy Lin, Xueguang Ma, Joel Mackenzie, Antonio Mallia, and Michał Siedlaczek
DESIRES 2021.
A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques
Jimmy Lin and Xueguang Ma
arXiv 2021.
Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations
Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep and Rodrigo Nogueira
SIGIR 2021.
Vera: Prediction Techniques for Reducing Harmful Misinformation in Consumer Health Search
Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira and Jimmy Lin
SIGIR 2021.
Scientific Claim Verification with VerT5erini
Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira and Jimmy Lin
LOUHI 2021 (EACL).
H2oloo at TREC 2020: When all you got is a hammer… Deep Learning, Health Misinformation, and Precision Medicine
Ronak Pradeep, Xueguang Ma, Xinyu Zhang, Hang Cui, Ruizhou Xu, Rodrigo Nogueira and Jimmy Lin
TREC 2020.