AI Evaluation

AI Evaluation is a mess. Let's talk about how to fix it!

Created by

Yotam Perlitz

@yperlitz.bsky.social

Alex Choi

@aschoi.bsky.social

student @georgemasoncs.bsky.social. NLP with GMNLP. optimistic and curious.

Sagnik Mukherjee

@sagnikmukherjee.bsky.social

NLP PhD student @convai_uiuc | Agents, Reasoning, evaluation etc. https://sagnikmukherjee.github.io https://scholar.google.com/citations?user=v4lvWXoAAAAJ&hl=en

Stella Biderman

@stellaathena.bsky.social

I make sure that OpenAI et al. aren't the only people who are able to study large scale AI systems.

Gabi Stanovsky

@gabistanovsky.bsky.social

Assistant professor at the Hebrew University.

Anna Rogers

@annarogers.bsky.social

Associate professor at IT University of Copenhagen: NLP, language models, interpretability, AI & society. Co-editor-in-chief of ACL Rolling Review. #NLProc #NLP

Sam Bowman

@sleepinyourhat.bsky.social

AI safety at Anthropic, on leave from a faculty job at NYU. Views not employers'. I think you should join Giving What We Can. cims.nyu.edu/~sbowman

Clémentine Fourrier 🍊

Evals/Agents @HuggingFace - 🐍💻📚✨ "The future is already here, it’s just not very evenly distributed" (Gibson) Most relevant current works: - Evals guidebook: github.com/huggingface/evaluation-guidebook - Agents bench: arxiv.org/abs/2311.12983

Leshem Choshen

@lchoshen.bsky.social

🥇 #NLProc researcher 🥈 Opinionatedly Summarizing #ML & #NLP papers 🥉 Good science #scientivism

Yotam Perlitz

@yperlitz.bsky.social

Research Scientist at @ibmresearch #NLProc, #RL. Opinions are my own.

Alex Choi

Sagnik Mukherjee

Stella Biderman

Yoav Goldberg

Gabi Stanovsky

Anna Rogers

Sam Bowman

Clémentine Fourrier 🍊

Leshem Choshen

Yotam Perlitz