Audrey Tang

It’s a new technique. Instead of asking people in Kenya to rate replies as toxic or non-toxic, we can just ask the language model to… It’s like a small constitutional convention where people blend their preferences and then simply steer the AI to fit the matrix as a norm. This is called “Alignment Assemblies” – maybe New Zealand would like to run one.
