Tai Nguyen

I am Tai (Đức Tài), a research engineer at Apple. I work on pushing the boundaries of on-device language models and multimodality.

Previously, I earned my MS from the University of Pennsylvania, where I got started on research with Eric Wong and Chris Callison-Burch. I was also fortunate to work with Ben Bogin from Ai2.

Before that, I helped build an analytics tool to support mainframes at IBM Systems. I studied Economics at the wonderful Haverford College and wrote my undergraduate thesis on the impact of Airbnb on welfare.

I grew up in Saigon, Vietnam. 🇻🇳

Email  /  GitHub  /  Google Scholar  /  huggingface  /  Twitter

Photo credit: Grace Pindzola

Research


(*: equal contribution)

DataDecide: How to Predict Best Pretraining Data with Small Experiments

Ian Magnusson*, Nguyen Tai*, Ben Bogin*, David Heineman, Jena D Hwang, Luca Soldaini, Akshita Bhagia, Jiacheng Liu, Dirk Groeneveld, Oyvind Tafjord, Noah A Smith, Pang Wei Koh, Jesse Dodge
In submission 2025
arxiv / code / blog / huggingface

MMTEB: Massive Multilingual Text Embedding Benchmark


Kenneth Enevoldsen, Isaac Chung, ... Nguyen Tai ..., Niklas Muennighoff (82 authors)
ICLR 2025
arxiv / code / website

In-context Example Selection with Influences


Nguyen Tai, Eric Wong
arXiv 2024
arxiv / code / blog

Explanation-based Finetuning Makes Models More Robust to Spurious Cues


Josh Magnus Ludan, Yixuan Meng*, Tai Nguyen*, Saurabh Shah, Qing Lyu, Marianna Apidianaki, Chris Callison-Burch
ACL 2023
arxiv / code

Software Entity Recognition with Noise-robust Learning


Tai Nguyen, Yifeng Di, Joohan Lee, Muhao Chen, Tianyi Zhang
ASE 2023
arxiv / code / huggingface


Projects

Big Data Bowl


Ryan Brill, Joseph Rudoler, Tai Nguyen, Ryan Gross
2023
writeup / video / code / feature article

One of 5 finalists, winning $15,000. We got to meet the Director of Research of the NFL and had a professional video made.

Underthesea


2022
website / code

Contributed a small amount to an open-source Vietnamese toolkit built by the amazing Anh Vu. This helped me get started on NLP.

STEAM For Vietnam


2022
website

During COVID, I volunteered for a non-profit that provides free online education for Vietnamese children. I worked on the data science team.


Miscellanea

I enjoy boxing and tennis in my free time. I am a Seahawks fan and try to travel as much as I can.

My Top 100 list, inspired by Huyền Chip.


Design and source code from Jon Barron's website