What is KOSMOS-2?
http://youtube.com/watch?v=K3K5BoN2b5k
"KOSMOS-2: Grounding Multimodal Large Language Models to the World" is a new preprint from Microsoft Research that demonstrates multimodal grounding abilities in a large vision-language model.

Timestamps:
00:00 - KOSMOS-2
00:12 - Grounding Multimodal Large Language Models to the World
01:25 - KOSMOS-1
02:01 - KOSMOS-2 overview
03:42 - The Grounded Image-Text Pairs (GrIT) dataset
05:07 - Kosmos-2: Model and training details
07:00 - Testing the model on some tricky images
07:57 - Hallucinations (Flamingo reference)
09:34 - Phrase grounding
09:59 - Referring expression comprehension
10:43 - Flamingo - where art thou?
11:56 - Language tasks
12:20 - The Ethics Statement
13:43 - Closing thoughts

Paper link: https://arxiv.org/abs/2306.14824
Demo: https://d20f0c97b1b2ee15.gradio.app/

Topics: #LLMs #ai #microsoft #KOSMOS-2

For related content:
Twitter: / samuelalbanie
Research lab: https://caml-lab.com/
Personal webpage: https://samuelalbanie.com/
YouTube: / @samuelalbanie1
TikTok: / samuelalbanie
Instagram: / samuelalbanie
LinkedIn: / samuel-albanie
Discord server for filtir: / discord

(Optional) If you'd like to support the channel:
https://www.buymeacoffee.com/samuelal...
/ samuel_albanie

Acknowledgements:
Image credit for the boat race: https://commons.wikimedia.org/wiki/Fi...
