Multimodal Language Models Explained: Visual Instruction Tuning | Dark Hacker News