Mechanistic interpretability

Starter pack with mechanistic interpretability researchers mostly posting about their research

Created by

Clément Dumas

@butanium.bsky.social

Antonin Poché

@antoninpoche.bsky.social

PhD Student doing XAI for NLP at @ANITI_Toulouse, IRIT, and IRT Saint Exupery. 🛠️ Xplique library development team member.

Sheridan Feucht

@sfeucht.bsky.social

PhD student doing LLM interpretability with @davidbau.bsky.social and @byron.bsky.social. (they/them) https://sfeucht.github.io

@hadasorgad.bsky.social

Actionable Interpretability Workshop ICML2025

@actinterp.bsky.social

🛠️ Actionable Interpretability🔎 @icmlconf.bsky.social 2025 | Bridging the gap between insights and actions ✨ https://actionable-interpretability.github.io

Martin Wattenberg

@wattenberg.bsky.social

Human/AI interaction. ML interpretability. Visualization as design, science, art. Professor at Harvard, and part-time at Google DeepMind.

Can

Hidenori Tanaka

@hidenori8tanaka.bsky.social

Group Leader, CBS-NTT "Physics of Intelligence" Program at Harvard website: https://sites.google.com/view/htanaka/home

Sophie Hao

@notaphonologist.bsky.social

Faculty fellow (independent postdoc) in Data Science at New York University. NLP, computational linguistics, interpretability, gender. she/her. Please hire me! https://www.notaphonologist.com/

Chandan Singh

@csinva.bsky.social

Senior researcher at Microsoft Research. Seeking good explanations with machine learning https://csinva.io/

NDIF Team

The National Deep Inference Fabric, an NSF-funded computational infrastructure to enable research on large-scale Artificial Intelligence. 🔗 NDIF: https://ndif.us 🧰 NNsight API: https://nnsight.net 😸 GitHub: https://github.com/ndif-team/nnsight

Laura Kopf

@lkopf.bsky.social

PhD student in Interpretable Machine Learning at TU Berlin & BIFOLD

Ruizhe Li

@ruizheli.bsky.social

Assistant Professor at University of Aberdeen | Postdoc at UCL | PhD at University of Sheffield | mechanistic interpretability & multimodal LLMs | https://www.ruizhe.space

claudia shi

@claudiashi.bsky.social

machine learning, causal inference, science of llm, ai safety, phd student @bleilab, keen bean https://www.claudiashi.com/

Yoann Poupart

@xmaster6y.bsky.social

XAI PhD Student & Entrepreneur

Ana Lučić

@a-lucic.bsky.social

Assistant professor at the University of Amsterdam. Previously at Microsoft Research, Partnership on AI.

@joshengels.bsky.social

PhD student at MIT. Working on mechanistic interpretability and AI safety.

Caden

@cadentj.bsky.social

dribnet

@drib.net

creations with code and networks

Kunvar Thaman

@firstuserhere.bsky.social

Gonçalo Paulo

@goncalo-paulo.bsky.social

Interpretability researcher at @eleutherai.bsky.social

Shivam Raval

@sraval.bsky.social

Physics, Visualization and AI PhD @ Harvard | Embedding visualization and LLM interpretability | Love pretty visuals, math, physics and pets | Currently into manifolds Wanna meet and chat? Book a meeting here: https://zcal.co/shivam-raval

Taufeeque

@taufeeque.bsky.social

Research Engineer @ FAR.AI taufeeque9.github.io

Javier Ferrando

@javifer.bsky.social

Interpretability

Jasmijn Bastings

@jasmijn.bastings.me

Senior Research Scientist at Google DeepMind. Interested in (equitable) language technology, gender, interpretability, NLP. Views my own. She/her. 🌐 jasmijn.bastings.me

@neelnanda.bsky.social

@woog0.bsky.social

Koyena Pal

@koyena.bsky.social

CS Ph.D. Candidate @ Northeastern | Interpretability + Data Science | BS/MS @ Brown koyenapal.github.io

Neel Rajani

@neelrajani.bsky.social

PhD student in Responsible NLP at the University of Edinburgh, passionate about MechInterp

Vaidehi Patil

@vaidehipatil.bsky.social

Ph.D. Student at UNC NLP | Prev: Apple, Amazon, Adobe (Intern) vaidehi99.github.io | Undergrad @IITBombay

@kevdududu.bsky.social

David Atkinson

@diatkinson.bsky.social

PhD student at Northeastern, previously at EpochAI. Doing AI interpretability. diatkinson.github.io

Michael Hanna

@michaelwhanna.bsky.social

PhD Student at the ILLC / UvA doing work at the intersection of (mechanistic) interpretability and cognitive science. hannamw.github.io

Bart Bussmann

@bartbussmann.bsky.social

Independent Mechanistic Interpretability Researcher

Thomas Fel

@thomasfel.bsky.social

Explainability, Computer Vision, Neuro-AI.🪴 Kempner Fellow @Harvard. Prev. PhD @Brown, @Google, @GoPro. Crêpe lover. 📍 Boston | 🔗 thomasfel.me

Alessandro Stolfo

@alestolfo.bsky.social

PhD @ ETHZ - LLM Interpretability alestolfo.github.io

Dilyara Bareeva

@dilya.bsky.social

PhD Candidate in Interpretability @FraunhoferHHI | 📍Berlin, Germany dilyabareeva.github.io

@michael-pearce.bsky.social

Tiago Pimentel

@tpimentel.bsky.social

Postdoc at ETH. Formerly, PhD student at the University of Cambridge :)

Carl Allen

@carl-allen.bsky.social

Laplace Junior Chair, Machine Learning ENS Paris. (prev ETH Zurich, Edinburgh, Oxford..) Working on mathematical foundations/probabilistic interpretability of ML (what NNs learn🤷‍♂️, disentanglement🤔, king-man+woman=queen?👌…)

Natalie Shapira

@natalieshapira.bsky.social

Tell me about challenges, the unbelievable, the human mind and artificial intelligence, thoughts, social life, family life, science and philosophy.

Kaiser Sun

@kaiserwholearns.bsky.social

Ph.D. student at @jhuclsp, human LM that hallucinates. Formerly @MetaAI, @uwnlp, and @AWS they/them🏳️‍🌈 #NLProc #NLP Crossposting on X.

Francesco Ortu

@francescortu.bsky.social

NLP & Interpretability | PhD Student @ University of Trieste & Laboratory of Data Engineering of Area Science Park | Prev MPI-IS

Jannik Brinkmann

@jannikbrinkmann.bsky.social

José Oramas

@jaom7.bsky.social

Associate Professor @UAntwerp, sqIRL/IDLab, imec. #RepresentationLearning, #Model #Interpretability & #Explainability A guy who plays with toy bricks, enjoys research and gaming. Opinions are my own idlab.uantwerpen.be/~joramasmogrovejo

Shan Chen

@shan23chen.bsky.social

PhDing @AIM_Harvard @MassGenBrigham｜PhD Fellow @Google | Previously @Bos_CHIP @BrandeisU More robustness and explainabilities 🧐 for Health AI. shanchen.dev

Cristina

@cristinaml.bsky.social

ML/AI researcher @JohnsHopkins

Jonathan Ling

@jonling.bsky.social

Assistant Professor @HopkinsMedicine @JHUPath https://scholar.google.com/citations?user=dGBD72YAAAAJ

vedang

@vedanglad.bsky.social

ai interpretability research and running • thinking about how models think • prev @MIT cs + physics

Chris Wendler

@wendlerc.bsky.social

Postdoc at the interpretable deep learning lab at Northeastern University, deep learning, LLMs, mechanistic interpretability

Aryaman Arora

@aryaman.io

member of technical staff @stanfordnlp.bsky.social

Eric Todd

@ericwtodd.bsky.social

CS PhD Student, Northeastern University - Machine Learning, Interpretability https://ericwtodd.github.io

nishant subramani @ NAACL🌵

@nsubramani23.bsky.social

PhD student @CMU LTI - working on model #interpretability; prev predoc @ai2; intern @MSFT nishantsubramani.github.io

Julian Minder

@jkminder.bsky.social

CS Student at ETH Zürich, currently doing my master's thesis at the DLAB at EPFL. Mainly interested in Language Model Interpretability. Most recent work: https://openreview.net/forum?id=Igm9bbkzHC MATS 7.0 Winter 2025 Scholar w/ Neel Nanda. jkminder.ch

Arthur Conmy

@arthurconmy.bsky.social

Aspiring 10x reverse engineer at Google DeepMind

Kayo Yin

@kayoyin.bsky.social

PhD student at UC Berkeley. NLP for signed languages and LLM interpretability. kayoyin.github.io 🏂🎹🚵‍♀️🥋

Lee Sharkey

@leesharkey.bsky.social

Scruting matrices @ Apollo Research

Chris Olah

@colah.bsky.social

Reverse engineering neural networks at Anthropic. Previously Distill, OpenAI, Google Brain. Personal account.

Pepa Atanasova

@apepa.bsky.social

Assistant Professor, University of Copenhagen; interpretability, xAI, factuality, accountability, xAI diagnostics https://apepa.github.io/

Isabelle Lee

@wordscompute.bsky.social

nlp/ml phding @ usc, interpretability & reasoning & pretraining & emergence 한american, she, iglee.me, likes ??= bookmarks

Martina Vilas

@martinagvilas.bsky.social

Computer Science PhD student | AI interpretability | Vision + Language | Cognitive Science. https://martinagvilas.github.io/

Andrew Lee

@ajyl.bsky.social

Post-doc @ Harvard. PhD UMich. Spent time at FAIR and MSR. ML/NLP/Interpretability

Alex Makelov

@amakelov.bsky.social

Mechanistic interpretability Creator of https://github.com/amakelov/mandala prev. Harvard/MIT machine learning, theoretical computer science, competition math.

Daniel Johnson

@ddjohnson.bsky.social

PhD student at Vector Institute / University of Toronto. Building tools to study neural nets and find out what they know. He/him. www.danieldjohnson.com

Nicolas Beltran-Velez

@velezbeltran.bsky.social

Machine Learning PhD Student @ Blei Lab & Columbia University. Working on probabilistic ML | uncertainty quantification | LLM interpretability. Excited about everything ML, AI and engineering!

Sweta Karlekar

@swetakar.bsky.social

Machine learning PhD student @ Blei Lab in Columbia University Working in mechanistic interpretability, nlp, causal inference, and probabilistic modeling! Previously at Meta for ~3 years on the Bayesian Modeling & Generative AI teams. 🔗 www.sweta.dev

Joe Stacey

@joestacey.bsky.social

NLP PhD student at Imperial College London and Apple AI/ML Scholar. My research is on model robustness and interpretability. #NLP #NLProc

Dashiell

@dashiells.bsky.social

Machine learning haruspex || Norbert Wiener is dead so we should just call it "cybernetics" now

Naomi Saphra | hiring PhD students

@nsaphra.bsky.social

Waiting on a robot body. All opinions are universal and held by both employers and family. Recruiting students to start my lab! ML/NLP/they/she.

Gabriele Sarti

@gsarti.com

PhD Student at @gronlp.bsky.social 🐮, core dev @inseq.org. Interpretability ∩ HCI ∩ #NLProc. gsarti.com

Nina Rimsky

@ninarimsky.bsky.social

AI Safety Research // Software Engineering

Niklas Stoehr

@niklasstoehr.bsky.social

Research Scientist at Google DeepMind and PhD Student at ETH Zurich

Mor Geva

@megamor2.bsky.social

https://mega002.github.io

David Bau

@davidbau.bsky.social

Interpretable Deep Networks. http://baulab.info/ @davidbau

Aaron Mueller

@amuuueller.bsky.social

Postdoc at Northeastern and incoming Asst. Prof. at Boston U. Working on NLP, interpretability, causality. Previously: JHU, Meta, AWS

Clément Dumas

@butanium.bsky.social

Master student at ENS Paris-Saclay / aspiring AI safety researcher / improviser Prev research intern @ EPFL w/ wendlerc.bsky.social and Robert West MATS Winter 7.0 Scholar w/ neelnanda.bsky.social https://butanium.github.io
