But while structured generation can guarantee that the right format comes back, it cannot guarantee that the returned properties contain the right content. This is where fine-tuning comes in.
With LoRAX, combining the two approaches at inference time is as easy as specifying two parameters: a "schema" and a fine-tuned LoRA "adapter_id". Together, they give you the best of both worlds: the right format and the right content.
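To make that concrete, here is a rough sketch of what such a request might look like. It only builds the JSON payload, so it runs without a server. The field layout (`inputs`, `parameters.adapter_id`, `parameters.response_format`) and the `/generate` route are assumptions about the LoRAX REST API, so check the LoRAX documentation before relying on them. The schema and the adapter name are made-up placeholders.

```python
import json
import urllib.request

# Hypothetical JSON schema, for illustration only.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}


def build_request(prompt, schema, adapter_id):
    """Build a request body that sets both a schema and a LoRA adapter.

    The nesting under "parameters" and "response_format" is an assumption
    about the LoRAX REST API, not something confirmed by this post.
    """
    return {
        "inputs": prompt,
        "parameters": {
            # Fine-tuned LoRA adapter: aims at the right content.
            "adapter_id": adapter_id,
            # Structured generation: constrains output to the schema.
            "response_format": {"type": "json_object", "schema": schema},
        },
    }


def send(endpoint_url, payload):
    """POST the payload to an assumed /generate route and parse the reply."""
    request = urllib.request.Request(
        endpoint_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# "my-org/my-json-adapter" is a placeholder, not a real adapter.
payload = build_request(
    "Extract the person mentioned: Ada Lovelace was 36.",
    PERSON_SCHEMA,
    "my-org/my-json-adapter",
)
```

With a LoRAX server running, something like `send("http://127.0.0.1:8080", payload)` would submit the request. The point of the sketch is that the schema and the adapter travel in the same request, so neither needs its own inference pass.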
If getting reliable JSON output from LLMs is something you're interested in, do check out the blog for more details, including a tutorial, a public LoRA adapter hosted on Hugging Face, and the complete set of benchmarking scripts to reproduce our results.