GPT-4 will be 571 times more powerful than GPT-3.
GPT-4 will be the most valuable tool for:
• Writers
But only if you start using it the right way early:
• Practise prompting
• Automate work
• Build systems
If GPT-4 is multimodal, we can predict with reasonable confidence what it *might* be capable of, given Microsoft’s prior work on Kosmos-1:
– Visual IQ test: yes, the ones that humans take!
– OCR-free reading comprehension: input a screenshot, scanned document, street sign, or any pixels that contain text, and the model reasons about the contents directly, without an explicit OCR step. This is extremely useful for unlocking AI-powered apps on multimedia web pages, or for “text in the wild” from real-world cameras.
– Multimodal chat: have a conversation about a picture. You can even provide “follow-up” images in the middle.
– Broad visual understanding abilities, like captioning, visual question answering, object detection, scene layout, common sense reasoning, etc.
– Audio & speech recognition (??): this wasn’t mentioned in the Kosmos-1 paper, but Whisper is already available through the OpenAI API and should be fairly easy to integrate.
Note: the predictions are based on what Andreas Braun, Microsoft Germany CTO, allegedly said. They may or may not be accurate (that’s why I call them “predictions”). But Kosmos-1 is very real and rock solid. It offers a glimpse of either GPT-4 or whatever AI service Microsoft provides next. I find it difficult to believe Kosmos-1 will stay in the lab and not become a product.
In any case, prepare yourself for multimodal APIs – they’ll happen sooner or later!
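One way to prepare is to start thinking of a conversation as a list of turns that mix text and images, rather than as plain strings. The sketch below is only an illustration of that idea: the class names (`TextPart`, `ImagePart`, `Message`), the `to_payload` helper, and the JSON shape it produces are all hypothetical and do not correspond to any announced GPT-4 or Kosmos-1 API. The image bytes are placeholders, not real images.

```python
import base64
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TextPart:
    """A piece of plain text inside one chat turn."""
    text: str


@dataclass
class ImagePart:
    """Raw image bytes (screenshot, scan, photo of a street sign, ...)."""
    data: bytes
    mime_type: str = "image/png"

    def to_data_url(self) -> str:
        # Base64-encode the bytes so the image can travel inside JSON.
        encoded = base64.b64encode(self.data).decode("ascii")
        return f"data:{self.mime_type};base64,{encoded}"


Part = Union[TextPart, ImagePart]


@dataclass
class Message:
    """One turn in the conversation; parts keep their original order."""
    role: str
    parts: List[Part] = field(default_factory=list)


def to_payload(messages: List[Message]) -> list:
    """Convert the conversation into a JSON-friendly list (hypothetical shape)."""
    payload = []
    for message in messages:
        content = []
        for part in message.parts:
            if isinstance(part, TextPart):
                content.append({"type": "text", "text": part.text})
            else:
                content.append({"type": "image", "url": part.to_data_url()})
        payload.append({"role": message.role, "content": content})
    return payload


# A chat about one picture, with a follow-up image in a later turn.
conversation = [
    Message("user", [TextPart("What does this street sign say?"),
                     ImagePart(b"\x89PNG placeholder bytes")]),
    Message("assistant", [TextPart("(model reply would go here)")]),
    Message("user", [TextPart("And this one?"),
                     ImagePart(b"\x89PNG another placeholder")]),
]
payload = to_payload(conversation)
```

Keeping each turn as an ordered list of parts means a follow-up image slots in the same way as the first one, so code written this way should need little change whatever shape a real multimodal API ends up taking.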