Ask HN: What happens when you paste a screenshot into an LLM and ask questions about it? (news.ycombinator.com)
When conversing with an LLM (Claude, Cursor, ChatGPT), I often paste a screenshot as a reference, to provide context and ask questions. I know that, ultimately, this is just pixels and bits. But how does this work? Does the LLM run some image processing that turns the picture into text, translate that text into word vectors, and then answer the questions, or does it go into a different mode? I find this kind of interaction with the machine mind-blowing.
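The internals of Claude and ChatGPT are not public, but published open multimodal models such as LLaVA work closer to the "different mode" option: the image is never converted into text. A vision encoder (typically a Vision Transformer) cuts the image into small square patches, turns each patch into a vector, and a learned projection maps those vectors into the same embedding space as the word tokens, so the language model reads one mixed sequence of vectors. The Python sketch below illustrates only that input-assembly step, under loudly stated assumptions: the weights are random stand-ins for learned ones, the tiny vocabulary, sizes, and helper names are hypothetical, placing the image before the text is an arbitrary choice, and it skips the transformer layers that real vision encoders run between patching and projection.

```python
import random

random.seed(0)

PATCH = 4        # patch side in pixels (toy value; ViT-style encoders often use 14 or 16)
CHANNELS = 3     # RGB
EMBED_DIM = 8    # toy embedding width; real models use thousands of dimensions

def image_to_patches(image, patch=PATCH):
    """Cut an H x W x C image (nested lists of pixel values) into flattened,
    non-overlapping patch x patch tiles, scanning row by row."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            flat = [value
                    for row in image[top:top + patch]
                    for pixel in row[left:left + patch]
                    for value in pixel]
            patches.append(flat)
    return patches

def random_matrix(rows, cols):
    """Hypothetical stand-in for learned weights; a trained model would load these."""
    return [[random.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

def project(vector, weights):
    """Linear projection: weights has len(vector) columns; output length == number of rows."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

# Hypothetical parameters (random here instead of learned).
patch_projection = random_matrix(EMBED_DIM, PATCH * PATCH * CHANNELS)
vocab = {"what": 0, "is": 1, "in": 2, "this": 3, "screenshot": 4, "?": 5}
token_embeddings = random_matrix(len(vocab), EMBED_DIM)

def build_input_sequence(image, words):
    """Return one sequence of vectors: image patches first, then text tokens.
    The image never becomes words; each patch becomes a vector with the same
    width as a word embedding."""
    image_vectors = [project(p, patch_projection) for p in image_to_patches(image)]
    text_vectors = [token_embeddings[vocab[w]] for w in words]
    return image_vectors + text_vectors

# A fake 8 x 8 RGB "screenshot", which yields four 4 x 4 patches.
screenshot = [[[random.random() for _ in range(CHANNELS)] for _ in range(8)] for _ in range(8)]
sequence = build_input_sequence(screenshot, ["what", "is", "in", "this", "screenshot", "?"])
print(len(sequence), len(sequence[0]))  # 10 8
```

Running it prints `10 8`: four image-patch vectors followed by six word vectors, all of the same width. Because both kinds of input end up as vectors of the same width in a single sequence, the model can relate parts of the screenshot to words in the question directly, without an intermediate text description.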