Why?
Human feedback on every output doesn’t scale.
What if AI could supervise itself? What framework would you give it?
Maybe something like the principles Anthropic's AI constitution outlines below:
Be helpful: Provide useful, accurate information
Be harmless: Avoid harmful, toxic, or dangerous content
Be honest: Don’t lie or provide false information
Respect autonomy: Support human agency and choice
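A minimal sketch of how such self-supervision might work, in the spirit of Anthropic's Constitutional AI critique-and-revise loop: the model drafts an answer, critiques it against each principle, and rewrites it with no human in the loop. The `generate` stub, the prompt wording, and the loop structure here are illustrative assumptions, not Anthropic's actual implementation.

```python
# Hypothetical sketch of a constitutional self-critique loop.
# `generate` stands in for any text-generation model call; the
# prompts below are illustrative, not Anthropic's training prompts.

CONSTITUTION = [
    "Be helpful: provide useful, accurate information.",
    "Be harmless: avoid harmful, toxic, or dangerous content.",
    "Be honest: don't lie or provide false information.",
    "Respect autonomy: support human agency and choice.",
]

def generate(prompt: str) -> str:
    """Placeholder for a real model call (e.g., an LLM API)."""
    return f"[model output for: {prompt[:60]}...]"

def constitutional_revision(question: str) -> str:
    # 1. Draft an initial answer with no oversight.
    answer = generate(question)
    # 2. For each principle, have the model critique its own
    #    answer and then revise it -- no human feedback needed.
    for principle in CONSTITUTION:
        critique = generate(
            f"Critique this answer against the principle "
            f"'{principle}':\n{answer}"
        )
        answer = generate(
            f"Rewrite the answer to address this critique:\n"
            f"Critique: {critique}\nAnswer: {answer}"
        )
    return answer

print(constitutional_revision("How do I secure my home Wi-Fi?"))
```

The point of the loop is that the constitution, not a human rater, supplies the feedback signal for every output.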