“It’s a new technique. Instead of asking people in Kenya to rate toxic versus non...”

It’s a new technique. Instead of asking people in Kenya to rate toxic versus non-toxic reply, we can just ask the language model to… It’s like a small constitutional convention that people blend their preferences and then simply steer the AI to fit the matrix as a norm. This is called “Alignment Assemblies” – maybe New Zealand would like to run one.

2023-06-01 Conversation with Graeme Muller

顯示前後文Show context

鍵盤快捷鍵Keyboard shortcuts