Tokenization in Natural Language Processing (NLP)
This is a starter pack for people working on tokenization for Large Language Models (LLMs) and Natural Language Processing (NLP).
Created by @craigschmidt.com
@zouharvi.bsky.social
PhD student @ ETH Zürich | all aspects of NLP but mostly evaluation and MT | go vegan | https://vilda.net
@chriswtanner.bsky.social
Head of R&D @Kensho and lecturer at @MIT, teaching NLP and ML. Enjoys challenging #hikes ⛰ and #woodworking 🌲. Prev: @Harvard, @BrownCSDept, @MITLL
@mcognetta.bsky.social
Language and keyboard stuff @Google + PhD student at Tokyo Institute of Technology. I like computers and Korean and computers-and-Korean and high school CS education. Georgia Tech → 연세대학교 → 東京工業大学. https://theoreticallygoodwithcomputers.com/
@catherinearnett.bsky.social
NLP Researcher at EleutherAI, PhD UC San Diego Linguistics. Previously PleIAs, Edinburgh University. Interested in multilingual NLP, tokenizers, open science. 📍Boston. She/her. https://catherinearnett.github.io/
@annawegmann.bsky.social
PhD candidate in NLP at Utrecht University | Accounting for language variation in ML/NLP | Tokenizers! | Paraphrases | she/her https://annawegmann.github.io/
@craigschmidt.com
Interested in ML, AI, and NLP, particularly tokenization. Lives in the Boston area and works at Kensho Technologies.