Recent updates:
- Outputs anchored to uploaded knowledge bases
- ~90% adherence to source material observed
- 1.78× faster dataset creation (1,000+ rows in ~10 mins)
- Edit columns before/during/after runs
- Better diversity beyond 5,000 rows
- SOP uploads converted into structured evaluation scenarios
- One-click synthetic variable generation for prompt testing
For teams evaluating LLM systems under data constraints, this has reduced iteration friction significantly.
Curious how others are validating grounding + diversity at scale.