Tokenization in Natural Language Processing (NLP)
This is a starter pack for people working on tokenization for Large Language Models (LLMs) and Natural Language Processing (NLP).
Created by @craigschmidt.com
@zouharvi.bsky.social
PhD student @ ETH Zürich | all aspects of NLP but mostly evaluation and MT | go vegan | https://vilda.net
@chriswtanner.bsky.social
Head of R&D @Kensho and lecturer at @MIT, teaching NLP and ML. Enjoys challenging #hikes ⛰ and #woodworking 🌲. Prev: @Harvard, @BrownCSDept, @MITLL
@mcognetta.bsky.social
Language and keyboard stuff @Google + PhD student at Tokyo Institute of Technology. I like computers and Korean and computers-and-Korean and high school CS education. Georgia Tech → 연세대학교 → 東京工業大学. https://theoreticallygoodwithcomputers.com/
@catherinearnett.bsky.social
NLP Researcher at EleutherAI, PhD UC San Diego Linguistics. Previously PleIAs, Edinburgh University. Interested in multilingual NLP, tokenizers, open science. 📍Boston. She/her. https://catherinearnett.github.io/
@annawegmann.bsky.social
PhD candidate in NLP at Utrecht University | Accounting for language variation in ML/NLP | Tokenizers! | Paraphrases | she/her https://annawegmann.github.io/
@craigschmidt.com
Interested in ML, AI, and NLP, particularly tokenization. Lives in the Boston area and works at Kensho Technologies.