AI Evaluation
AI Evaluation is a mess. Let's talk about how to fix it!
Created by
@yperlitz.bsky.social
@aschoi.bsky.social
student @georgemasoncs.bsky.social. NLP with GMNLP. optimistic and curious.
@sagnikmukherjee.bsky.social
NLP PhD student @convai_uiuc | Agents, Reasoning, evaluation etc. https://sagnikmukherjee.github.io https://scholar.google.com/citations?user=v4lvWXoAAAAJ&hl=en
@stellaathena.bsky.social
I make sure that OpenAI et al. aren't the only people who are able to study large scale AI systems.
@yoavgo.bsky.social
@annarogers.bsky.social
Associate professor at IT University of Copenhagen: NLP, language models, interpretability, AI & society. Co-editor-in-chief of ACL Rolling Review. #NLProc #NLP
@sleepinyourhat.bsky.social
AI safety at Anthropic, on leave from a faculty job at NYU. Views not employers'. I think you should join Giving What We Can. cims.nyu.edu/~sbowman
@clefourrier.hf.co
Evals/Agents @HuggingFace - 🐍💻📚✨ "The future is already here, it’s just not very evenly distributed" (Gibson) Most relevant current works: - Evals guidebook: github.com/huggingface/evaluation-guidebook - Agents bench: arxiv.org/abs/2311.12983
@lchoshen.bsky.social
🥇 #NLProc researcher 🥈 Opinionatedly Summarizing #ML & #NLP papers 🥉 Good science #scientivism
@yperlitz.bsky.social
Research Scientist at @ibmresearch #NLProc, #RL. Opinions are my own.