
AI Trends 2024: Computer Vision with Naila Murray – 665



Today we kick off our AI Trends 2024 series with a conversation with Naila Murray, director of AI research at Meta. In our conversation, we dig into the latest trends and developments in the realm of computer vision. We explore advancements in the areas of controllable generation, visual programming, 3D Gaussian splatting, and multimodal models, specifically vision plus LLMs. We discuss tools and open-source projects, including Segment Anything – a tool for versatile zero-shot image segmentation using simple text prompts, clicks, and bounding boxes; ControlNet – which adds conditional control to Stable Diffusion models; and DINOv2 – a visual encoding model enabling object recognition, segmentation, and depth estimation, even in data-scarce scenarios. Finally, Naila shares her view on the most exciting opportunities in the field, as well as her predictions for the coming years.

🗣️ CONNECT WITH US!
===============================
Subscribe to the TWIML AI Podcast:
Join our Slack Community:
Subscribe to our newsletter:
Want to get in touch? Send us a message:

📖 CHAPTERS
===============================
00:00 – Background
01:52 – 2023’s computer vision highlights
03:53 – Controllable generation
05:51 – Versatile diffusion
10:10 – Pix2Video
13:16 – ControlNet
16:02 – Visual programming
16:18 – VisProg
19:39 – ViperGPT
23:12 – 3D Gaussian splatting
29:53 – Vision + LLMs
32:25 – Visual Instruction Tuning
36:34 – Top new tools/open-source projects – Segment Anything
40:56 – ControlNet
41:35 – DINOv2
45:30 – Upcoming exciting opportunities in computer vision
53:00 – 2024 predictions

🔗 LINKS & RESOURCES
===============================
Learning Representations for Visual Search with Naila Murray – 190 –
Adding Conditional Control to Text-to-Image Diffusion Models –
@aivanlogic tweet –
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model –
CLIP: Connecting text and images –
DALL·E: Creating images from text –
Dual-Guided Brain Diffusion Model: Natural Image Reconstruction from Human Visual Stimulus fMRI –
Pix2Video: Video Editing using Image Diffusion –
Zero-Shot Spatial Layout Conditioning for Text-to-Image Diffusion Models –
Voyager: An Open-Ended Embodied Agent with Large Language Models –
Visual Programming: Compositional visual reasoning without training –
ViperGPT (codex) –
Visual Instruction Tuning –
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models –
Toolformer: Language Models Can Teach Themselves to Use Tools –
Segment Anything –
DINOv2: A Self-supervised Vision Transformer Model –

