Ask HN: How do I use LLMs to generate test cases for groundedness benchmarks? (devblogs.microsoft.com)
Confirmation bias is one obvious pitfall that comes to mind, but I also wonder how reproducibility is achievable when the input is stochastic.