Clément Dumas
Master's student at ENS Paris-Saclay / aspiring AI safety researcher / improviser
Prev research intern @ EPFL w/ wendlerc.bsky.social and Robert West
MATS Winter 7.0 Scholar w/ neelnanda.bsky.social
https://butanium.github.io
Starter Packs
Created by Clément Dumas (1)
Mechanistic interpretability
Starter pack with mechanistic interpretability researchers mostly posting about their research
@blackboxnlp.bsky.social
The largest workshop on analysing and interpreting neural networks for NLP. BlackboxNLP will be held at EMNLP 2025 in Suzhou, China blackboxnlp.github.io
@maximemeloux.bsky.social
PhD student @LIG | Causal abstraction, interpretability & LLMs
@davidduvenaud.bsky.social
Machine learning prof at U Toronto. Working on evals and AGI governance.
@antoninpoche.bsky.social
PhD Student doing XAI for NLP at @ANITI_Toulouse, IRIT, and IRT Saint Exupery. 🛠️ Xplique library development team member.
@actinterp.bsky.social
🛠️ Actionable Interpretability 🔎 @icmlconf.bsky.social 2025 | Bridging the gap between insights and actions ✨ https://actionable-interpretability.github.io
@stevebyrnes.bsky.social
Researching Artificial General Intelligence Safety, via thinking about neuroscience and algorithms, at Astera Institute. https://sjbyrnes.com/agi.html
@wattenberg.bsky.social
Human/AI interaction. ML interpretability. Visualization as design, science, art. Professor at Harvard, and part-time at Google DeepMind.
@canrager.bsky.social
@scasper.bsky.social
AI technical governance & risk management research. PhD Candidate at MIT CSAIL. Also at https://x.com/StephenLCasper. https://stephencasper.com/
@sarafish.bsky.social
PhD student at Harvard interested in EconCS and ML / previously Caltech undergrad in math
@hidenori8tanaka.bsky.social
Group Leader, CBS-NTT "Physics of Intelligence" Program at Harvard. Website: https://sites.google.com/view/htanaka/home
@taovacano.bsky.social
Maths addict: Apmep, Maths pour tous, Ires Aix-Marseille, Marseille maths club. Teacher at Lycée Monte-Cristo in Allauch. Diving supervisor in Marseille.
@timhua.bsky.social
Helping people is good, I guess. Trying to do AI interp and control. Used to do economics. timhua.me
@angieboggust.bsky.social
MIT PhD candidate in the VIS group working on interpretability and human-AI alignment
@simonschrodi.bsky.social
🎓 PhD student @cvisionfreiburg.bsky.social @UniFreiburg 💡 interested in mechanistic interpretability, robustness, AutoML & ML for climate science https://simonschrodi.github.io/
@sarahwiegreffe.bsky.social
Research in NLP (mostly LM interpretability & explainability). Incoming assistant prof at UMD CS + CLIP. Current postdoc @ai2.bsky.social & @uwnlp.bsky.social Views my own. sarahwie.github.io
@pkhdipraja.bsky.social
PhD student @ Fraunhofer HHI. Interpretability, incremental NLP, and NLU. https://pkhdipraja.github.io/
@notaphonologist.bsky.social
Faculty fellow (independent postdoc) in Data Science at New York University. NLP, computational linguistics, interpretability, gender. she/her. Please hire me! https://www.notaphonologist.com/
@naturecomputes.bsky.social
Searching for principles of neural representation | Neuro + AI @ enigmaproject.ai | Stanford | sophiasanborn.com
@mlamparth.bsky.social
Postdoc at @Stanford, @StanfordCISAC, Stanford Center for AI Safety, and the SERI program | Focusing on interpretable, safe, and ethical AI decision-making.
@ndif-team.bsky.social
The National Deep Inference Fabric, an NSF-funded computational infrastructure to enable research on large-scale Artificial Intelligence. 🔗 NDIF: https://ndif.us 🧰 NNsight API: https://nnsight.net 😸 GitHub: https://github.com/ndif-team/nnsight
@abosselut.bsky.social
Helping machines make sense of the world. Asst Prof @icepfl.bsky.social; Before: @stanfordnlp.bsky.social @uwnlp.bsky.social AI2 #NLProc #AI Website: https://atcbosselut.github.io/
@yulislavutsky.bsky.social
Stats Postdoc at Columbia, @bleilab.bsky.social Statistical ML, Generalization, Uncertainty, Empirical Bayes https://yulisl.github.io/
@claudiashi.bsky.social
machine learning, causal inference, science of llm, ai safety, phd student @bleilab, keen bean https://www.claudiashi.com/
@sqirllab.bsky.social
We are "squIRreL", the Interpretable Representation Learning Lab based at IDLab - University of Antwerp & imec. Research Areas: #RepresentationLearning, Model #Interpretability, #explainability, #DeepLearning #ML #AI #XAI #mechinterp
@asantilli.bsky.social
PhD student in NLP at Sapienza | Prev: Apple MLR, @colt-upf.bsky.social , HF Bigscience, PiSchool, HumanCentricArt #NLProc www.santilli.xyz