Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge (arxiv.org)
2 points by sssummer 1 year ago | 0 comments