AI Alignment people
People doing AI alignment research
Created by @kabirkumar.bsky.social
@tom4everitt.bsky.social
AGI safety researcher at Google DeepMind, leading causalincentives.com. Personal website: tomeveritt.se
@sleepinyourhat.bsky.social
AI safety at Anthropic, on leave from a faculty job at NYU. Views not employers'. I think you should join Giving What We Can. cims.nyu.edu/~sbowman
@peterbhase.bsky.social
AI safety researcher. PhD from UNC Chapel Hill (Google PhD Fellow). Previously: Anthropic, AI2, Google, Meta
@samuelalbanie.bsky.social
@turntrout.bsky.social
Research scientist at Google DeepMind. All opinions are my own. https://turntrout.com
@amuuueller.bsky.social
Postdoc at Northeastern and incoming Asst. Prof. at Boston U. Working on NLP, interpretability, causality. Previously: JHU, Meta, AWS
@sebfar.bsky.social
Senior Research Scientist at Google DeepMind. AGI Alignment researcher. Views my dog's.
@bshlgrs.bsky.social
@norabelrose.bsky.social
AI, philosophy, spirituality. Head of interpretability research at EleutherAI, but posts are my own views, not Eleuther's.
@karimabdel.bsky.social
Intern at CHAI, UC Berkeley | Ex-research intern at the Krueger AI Safety Lab, University of Cambridge | Interested in RL, AI Safety, Cooperative AI, TCS | https://karim-abdel.github.io
@ankareuel.bsky.social
Computer Science PhD Student @ Stanford | Geopolitics & Technology Fellow @ Harvard Kennedy School/Belfer | Vice Chair EU AI Code of Practice | Views are my own
@kajsotala.bsky.social
This is a profile. There are many like it, but this one's mine. Blogs: https://kajsotala.fi, https://kajsotala.substack.com/
@gsarti.com
PhD Student at @gronlp.bsky.social 🐮, core dev @inseq.org. Interpretability ∩ HCI ∩ #NLProc. gsarti.com
@wordscompute.bsky.social
nlp/ml phd-ing @ usc: interpretability & reasoning & pretraining & emergence. Korean-american, she. iglee.me. likes ??= bookmarks
@florasalim.bsky.social
Professor, CSE, UNSW Sydney. #AI #ML #UbiComp #LLM #MFM #timeseries #ST #multimodal #sensors #continuallearning #trustworthyAI ❤️ #coffee. Why am I here? Scouting for a new platform to discover and learn about new papers (let's see if it's the one).
@stephaniebrandl.bsky.social
Assistant Professor in NLP (Fairness, Interpretability, and lately Political Science) at the University of Copenhagen ✨ Before: postdoc in NLP at Uni of CPH, PhD student in ML at TU Berlin
@mimansaj.bsky.social
Robustness, Data & Annotations, Evaluation & Interpretability in LLMs http://mimansajaiswal.github.io/
@variint.bsky.social
Lost in translation | Interpretability of modular convnets applied to 👁️ and 🛰️🐝 | she/her 🦒💕 variint.github.io
@mdlhx.bsky.social
NLP assistant prof at KU Leuven, PI @lagom-nlp.bsky.social. I like syntax more than most people. Also multilingual NLP, interpretability, mountains and beer. (She/her)
@christophmolnar.bsky.social
Author of Interpretable Machine Learning and other books. Newsletter: https://mindfulmodeler.substack.com/ Website: https://christophmolnar.com/
@stellaathena.bsky.social
I make sure that OpenAI et al. aren't the only people who are able to study large scale AI systems.
@romapatel.bsky.social
research scientist @deepmind. language & multi-agent rl & interpretability. phd @BrownUniversity '22 under ellie pavlick (she/her) https://roma-patel.github.io
@soniajoseph.bsky.social
AI researcher at Mila, visiting researcher at Meta. Also on X: @soniajoseph_
@matijafranklin.bsky.social
Researching AI Alignment and Manipulation. I conduct CogSci experiments.
@aashiqmuhamed.bsky.social
Machine Learning PhD at Carnegie Mellon @mldcmu | Ex-Applied Scientist at @amazon Search & @AWSAI | MS at @Stanford & @LTIatCMU | President's Gold Medalist at @iitroorkee
@stevebyrnes.bsky.social
Researching Artificial General Intelligence Safety, via thinking about neuroscience and algorithms, at Astera Institute. https://sjbyrnes.com/agi.html
@dhadfieldmenell.bsky.social
Assistant Prof of AI & Decision-Making @MIT EECS. I run the Algorithmic Alignment Group (https://algorithmicalignment.csail.mit.edu/) in CSAIL. I work on value (mis)alignment in AI systems. https://people.csail.mit.edu/dhm/
@stanislavfort.bsky.social
AI + security | Stanford PhD in AI & Cambridge physics | techno-optimism + alignment + progress + growth | 🇺🇸🇨🇿
@kartikchandra.bsky.social
I'm a PhD student at MIT CSAIL. More about me: https://cs.stanford.edu/~kach
@xuanalogue.bsky.social
PhD Student. MIT ProbComp / CoCoSci. Inverting Bayesian models of human reasoning and decision-making. Pronouns: 祂/伊 (Chinese third-person pronouns)
@kabirkumar.bsky.social
I run AI-Plans, an AI Safety lab focused on very precisely evaluating AI Alignment Plans. For several weeks I used a stone for a pillow. I once spent a quarter of my paycheck on cheese. Ping me! DM me! SurpassAI