Was thinking about using GPT-4V for understanding people's images too, which would be cool (can't upload images yet)
long term I'm making it into a social network where you can be/sound like anything you want
which is doing latent consistency with extra quality steps, like SDXL refinement