CLIP and Friends: How Vision-Language Models Evolved | Dark Hacker News