1 point by nikhilpareek13 93 days ago

Over the past few weeks, we rebuilt synthetic data generation at Future AGI.

Recent updates:

- Outputs anchored to uploaded knowledge bases

- ~90% adherence to source material observed

- 1.78× faster dataset creation (1,000+ rows in ~10 mins)

- Edit columns before/during/after runs

- Better diversity beyond 5,000 rows

- SOP uploads converted into structured evaluation scenarios

- One-click synthetic variable generation for prompt testing

For teams evaluating LLM systems under data constraints, this has significantly reduced iteration friction.

Curious how others are validating grounding + diversity at scale.
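
A minimal baseline for checking both, as a sketch only: token overlap with the source text as a crude grounding proxy, and distinct-n (unique n-grams over total n-grams) as a diversity proxy. The function names `grounding_score` and `distinct_n` are illustrative, the whitespace tokenizer is a simplifying assumption, and none of this describes Future AGI's own method.

```python
from collections import Counter


def tokens(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization is good enough for a baseline.
    return text.lower().split()


def grounding_score(row: str, source: str) -> float:
    """Fraction of the row's tokens that also appear in the source text."""
    row_tokens = tokens(row)
    if not row_tokens:
        return 0.0
    source_vocab = set(tokens(source))
    return sum(t in source_vocab for t in row_tokens) / len(row_tokens)


def distinct_n(rows: list[str], n: int = 2) -> float:
    """Unique n-grams divided by total n-grams across all rows."""
    ngrams = Counter()
    for row in rows:
        toks = tokens(row)
        ngrams.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0
```

Tracking `distinct_n` as the row count grows gives a rough signal of whether new rows are repeating earlier ones. Lexical overlap misses paraphrases, so an embedding-based or entailment-based check would likely be needed for a stronger grounding measure.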