Mechanistic interpretability
Starter pack with mechanistic interpretability researchers mostly posting about their research
Created by
@butanium.bsky.social
@antoninpoche.bsky.social
PhD Student doing XAI for NLP at @ANITI_Toulouse, IRIT, and IRT Saint Exupery. 🛠️ Xplique library development team member.
@sfeucht.bsky.social
PhD student doing LLM interpretability with @davidbau.bsky.social and @byron.bsky.social. (they/them) https://sfeucht.github.io
@actinterp.bsky.social
🛠️ Actionable Interpretability🔎 @icmlconf.bsky.social 2025 | Bridging the gap between insights and actions ✨ https://actionable-interpretability.github.io
@wattenberg.bsky.social
Human/AI interaction. ML interpretability. Visualization as design, science, art. Professor at Harvard, and part-time at Google DeepMind.
@canrager.bsky.social
@hidenori8tanaka.bsky.social
Group Leader, CBS-NTT "Physics of Intelligence" Program at Harvard. Website: https://sites.google.com/view/htanaka/home
@notaphonologist.bsky.social
Faculty fellow (independent postdoc) in Data Science at New York University. NLP, computational linguistics, interpretability, gender. she/her. Please hire me! https://www.notaphonologist.com/
@csinva.bsky.social
Senior researcher at Microsoft Research. Seeking good explanations with machine learning https://csinva.io/
@ndif-team.bsky.social
The National Deep Inference Fabric, an NSF-funded computational infrastructure to enable research on large-scale Artificial Intelligence. 🔗 NDIF: https://ndif.us 🧰 NNsight API: https://nnsight.net 😸 GitHub: https://github.com/ndif-team/nnsight
@ruizheli.bsky.social
Assistant Professor at University of Aberdeen | Postdoc at UCL | PhD at University of Sheffield | mechanistic interpretability & multimodal LLMs | https://www.ruizhe.space
@claudiashi.bsky.social
machine learning, causal inference, science of llm, ai safety, phd student @bleilab, keen bean https://www.claudiashi.com/
@a-lucic.bsky.social
Assistant professor at the University of Amsterdam. Previously at Microsoft Research, Partnership on AI.
@cadentj.bsky.social
@firstuserhere.bsky.social
@sraval.bsky.social
Physics, Visualization and AI PhD @ Harvard | Embedding visualization and LLM interpretability | Love pretty visuals, math, physics and pets | Currently into manifolds Wanna meet and chat? Book a meeting here: https://zcal.co/shivam-raval
@jasmijn.bastings.me
Senior Research Scientist at Google DeepMind. Interested in (equitable) language technology, gender, interpretability, NLP. Views my own. She/her. 🌐 jasmijn.bastings.me
@koyena.bsky.social
CS Ph.D. Candidate @ Northeastern | Interpretability + Data Science | BS/MS @ Brown koyenapal.github.io
@neelrajani.bsky.social
PhD student in Responsible NLP at the University of Edinburgh, passionate about MechInterp
@vaidehipatil.bsky.social
Ph.D. Student at UNC NLP | Prev: Apple, Amazon, Adobe (Intern) vaidehi99.github.io | Undergrad @IITBombay
@diatkinson.bsky.social
PhD student at Northeastern, previously at EpochAI. Doing AI interpretability. diatkinson.github.io
@michaelwhanna.bsky.social
PhD Student at the ILLC / UvA doing work at the intersection of (mechanistic) interpretability and cognitive science. hannamw.github.io
@thomasfel.bsky.social
Explainability, Computer Vision, Neuro-AI.🪴 Kempner Fellow @Harvard. Prev. PhD @Brown, @Google, @GoPro. Crêpe lover. 📍 Boston | 🔗 thomasfel.me
@dilya.bsky.social
PhD Candidate in Interpretability @FraunhoferHHI | 📍Berlin, Germany dilyabareeva.github.io
@tpimentel.bsky.social
Postdoc at ETH. Formerly, PhD student at the University of Cambridge :)
@carl-allen.bsky.social
Laplace Junior Chair, Machine Learning ENS Paris. (prev ETH Zurich, Edinburgh, Oxford..) Working on mathematical foundations/probabilistic interpretability of ML (what NNs learn🤷‍♂️, disentanglement🤔, king-man+woman=queen?👌…)
@natalieshapira.bsky.social
Tell me about challenges, the unbelievable, the human mind and artificial intelligence, thoughts, social life, family life, science and philosophy.
@kaiserwholearns.bsky.social
Ph.D. student at @jhuclsp, human LM that hallucinates. Formerly @MetaAI, @uwnlp, and @AWS they/them🏳️‍🌈 #NLProc #NLP Crossposting on X.
@francescortu.bsky.social
NLP & Interpretability | PhD Student @ University of Trieste & Laboratory of Data Engineering of Area Science Park | Prev MPI-IS
@jannikbrinkmann.bsky.social
@jaom7.bsky.social
Associate Professor @UAntwerp, sqIRL/IDLab, imec. #RepresentationLearning, #Model #Interpretability & #Explainability A guy who plays with toy bricks, enjoys research and gaming. Opinions are my own idlab.uantwerpen.be/~joramasmogrovejo
@shan23chen.bsky.social
PhDing @AIM_Harvard @MassGenBrigham|PhD Fellow @Google | Previously @Bos_CHIP @BrandeisU More robustness and explainabilities 🧐 for Health AI. shanchen.dev
@jonling.bsky.social
Assistant Professor @HopkinsMedicine @JHUPath https://scholar.google.com/citations?user=dGBD72YAAAAJ
@vedanglad.bsky.social
ai interpretability research and running • thinking about how models think • prev @MIT cs + physics
@wendlerc.bsky.social
Postdoc at the interpretable deep learning lab at Northeastern University, deep learning, LLMs, mechanistic interpretability
@ericwtodd.bsky.social
CS PhD Student, Northeastern University - Machine Learning, Interpretability https://ericwtodd.github.io
@nsubramani23.bsky.social
PhD student @CMU LTI - working on model #interpretability; prev predoc @ai2; intern @MSFT nishantsubramani.github.io
@jkminder.bsky.social
CS Student at ETH Zürich, currently doing my master's thesis at the DLAB at EPFL. Mainly interested in Language Model Interpretability. Most recent work: https://openreview.net/forum?id=Igm9bbkzHC MATS 7.0 Winter 2025 Scholar w/ Neel Nanda jkminder.ch
@kayoyin.bsky.social
PhD student at UC Berkeley. NLP for signed languages and LLM interpretability. kayoyin.github.io 🏂🎹🚵‍♀️🥋
@colah.bsky.social
Reverse engineering neural networks at Anthropic. Previously Distill, OpenAI, Google Brain. Personal account.
@apepa.bsky.social
Assistant Professor, University of Copenhagen; interpretability, xAI, factuality, accountability, xAI diagnostics https://apepa.github.io/
@wordscompute.bsky.social
nlp/ml phding @ usc, interpretability & reasoning & pretraining & emergence 한american, she, iglee.me, likes ??= bookmarks
@martinagvilas.bsky.social
Computer Science PhD student | AI interpretability | Vision + Language | Cognitive Science. https://martinagvilas.github.io/
@ajyl.bsky.social
Post-doc @ Harvard. PhD UMich. Spent time at FAIR and MSR. ML/NLP/Interpretability
@amakelov.bsky.social
Mechanistic interpretability Creator of https://github.com/amakelov/mandala prev. Harvard/MIT machine learning, theoretical computer science, competition math.
@ddjohnson.bsky.social
PhD student at Vector Institute / University of Toronto. Building tools to study neural nets and find out what they know. He/him. www.danieldjohnson.com
@velezbeltran.bsky.social
Machine Learning PhD Student @ Blei Lab & Columbia University. Working on probabilistic ML | uncertainty quantification | LLM interpretability. Excited about everything ML, AI and engineering!
@swetakar.bsky.social
Machine learning PhD student @ Blei Lab at Columbia University. Working in mechanistic interpretability, NLP, causal inference, and probabilistic modeling! Previously at Meta for ~3 years on the Bayesian Modeling & Generative AI teams. 🔗 www.sweta.dev
@joestacey.bsky.social
NLP PhD student at Imperial College London and Apple AI/ML Scholar. My research is on model robustness and interpretability. #NLP #NLProc
@dashiells.bsky.social
Machine learning haruspex || Norbert Wiener is dead so we should just call it "cybernetics" now
@nsaphra.bsky.social
Waiting on a robot body. All opinions are universal and held by both employers and family. Recruiting students to start my lab! ML/NLP/they/she.
@gsarti.com
PhD Student at @gronlp.bsky.social 🐮, core dev @inseq.org. Interpretability ∩ HCI ∩ #NLProc. gsarti.com
@niklasstoehr.bsky.social
Research Scientist at Google DeepMind and PhD Student at ETH Zurich
@amuuueller.bsky.social
Postdoc at Northeastern and incoming Asst. Prof. at Boston U. Working on NLP, interpretability, causality. Previously: JHU, Meta, AWS
@butanium.bsky.social
Master's student at ENS Paris-Saclay / aspiring AI safety researcher / improviser. Prev research intern @ EPFL w/ wendlerc.bsky.social and Robert West. MATS Winter 7.0 Scholar w/ neelnanda.bsky.social https://butanium.github.io