A few months ago, I started experimenting with a tool that inserts a dictionary word into a Stable Diffusion text prompt and outputs a picture. The objective is to speak about the generated picture using that dictionary word. The speech is recorded and can then be judged on several dimensions (tone, words, style, etc.). Currently, the tool defaults to anonymous peer-to-peer judging, but there is also a private mode in which all the audio stays local to the device.
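
To make that flow concrete, here is a minimal Python sketch of the two bookkeeping steps: turning a dictionary word into a prompt, and collecting judges' scores per dimension. Everything named here is an assumption for illustration, not the tool's actual code: the prompt template, the `build_prompt` helper, the `Recording` class, the 1-to-5 score scale, and the choice to block shared judging in private mode. The call to Stable Diffusion itself is left out.

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical prompt template; the real tool's wording is not known.
PROMPT_TEMPLATE = "an illustration of {word}, detailed, high quality"

# Dimensions named in the post; "etc." suggests the real list is longer.
DIMENSIONS = ("tone", "words", "style")


def build_prompt(word: str) -> str:
    """Insert a dictionary word into the (assumed) text prompt template."""
    cleaned = word.strip().lower()
    if not cleaned:
        raise ValueError("word must not be empty")
    return PROMPT_TEMPLATE.format(word=cleaned)


@dataclass
class Recording:
    """One recorded attempt at speaking about a generated picture."""
    word: str
    audio_path: str
    private: bool = False  # assumed flag for the private, local-only mode
    scores: dict[str, list[int]] = field(
        default_factory=lambda: {d: [] for d in DIMENSIONS}
    )

    def add_judgment(self, dimension: str, score: int) -> None:
        """Store one judge's score (assumed scale: 1 to 5) for a dimension."""
        if self.private:
            # Assumption: private recordings never leave the device,
            # so they cannot receive peer judgments.
            raise PermissionError("private recordings are not shared for judging")
        if dimension not in self.scores:
            raise KeyError(f"unknown dimension: {dimension}")
        if not 1 <= score <= 5:
            raise ValueError("score must be between 1 and 5")
        self.scores[dimension].append(score)

    def averages(self) -> dict[str, float]:
        """Average score per dimension, skipping dimensions with no scores."""
        return {d: mean(v) for d, v in self.scores.items() if v}
```

For example, `build_prompt("Lighthouse")` produces the prompt text that would be handed to the image model, and after a few calls to `add_judgment`, `averages()` gives one number per judged dimension. Storing a list of scores per dimension keeps individual judgments anonymous while still allowing simple aggregation.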