Unified Video Segmentation and Video Object Segmentation | Multimodal Weekly 59

Video: http://youtube.com/watch?v=ALAZi4wRSEI

In the 59th session of Multimodal Weekly, we had two exciting presentations on video segmentation.

✅ Dr. Li Minghan from Harvard Medical School discussed her work UniVS, a novel unified video segmentation architecture that uses prompts as queries.
• Follow Minghan: https://sites.google.com/view/minghan...
• UniVS: https://sites.google.com/view/unified...

✅ Ho Kei (Rex) Cheng from UIUC discussed his work Cutie, a video object segmentation (VOS) network with object-level memory reading, which puts the object representation from memory back into the video object segmentation result.
• Follow Rex: https://hkchengrex.com/
• Cutie: https://hkchengrex.com/Cutie/

Timestamps:
• 00:10 Introduction
• 03:35 Minghan starts
• 04:20 Background: image segmentation tasks
• 05:22 Background: video segmentation tasks
• 07:05 Related works: towards unified framework
• 08:20 Motivation: conflicts between video segmentation tasks
• 09:53 Main contributions in UniVS (Unified Video Segmentation with Prompts as Queries)
• 13:00 UniVS: overall training framework
• 14:58 UniVS: unified streaming inference
• 16:07 UniVS: three training stages
• 16:48 UniVS: ablation study
• 18:28 UniVS: experimental results
• 19:25 Visualized results for category-guided VS and prompt-guided VS
• 20:08 Conclusion
• 20:35 Future work
• 22:18 Q&A with Minghan
• 33:15 Rex starts
• 34:00 Multimodal video segmentation in 2 steps: (1) text-to-image segmentation and (2) mask propagation
• 36:15 Video object segmentation
• 36:41 How to track objects?
• 37:25 Memory formulation
• 37:44 Prior works: pixel matching
• 38:35 Our approach: object-level reasoning
• 39:00 Overview of Cutie
• 40:24 Object transformer block
• 41:28 SOTA performance at real-time
• 42:22 Performance on long videos
• 42:45 Pixel-level plus object-level features
• 43:08 Object-level reasoning reduces noise
• 43:35 Object-level attentions are focused
• 44:15 Multimodal applications
• 46:25 Q&A with Rex
• 58:45 Conclusion

Join the Multimodal Minds community to receive an invite for future webinars: / discord
