Top Important Computer Vision Papers for the Week from 11/12 to 17/12



ttps://huggingface.co/papers/2312.07409"><b>1.5. DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing</b></a></p><p id="dd64"><a href="https://huggingface.co/papers/2312.07536"><b>1.6. FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition</b></a></p><p id="8053"><a href="https://huggingface.co/papers/2312.07231"><b>1.7. Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation</b></a></p><p id="6d4f"><a href="https://huggingface.co/papers/2312.07509"><b>1.8. PEEKABOO: Interactive Video Generation via Masked-Diffusion</b></a></p><p id="d245"><a href="https://huggingface.co/papers/2312.08128"><b>1.9. Clockwork Diffusion: Efficient Generation With Model-Step Distillation</b></a></p><p id="f582"><a href="https://huggingface.co/papers/2312.09256"><b>1.10. LIME: Localized Image Editing via Attention Regularization in Diffusion Models</b></a></p><p id="e33f"><a href="https://huggingface.co/papers/2312.08754"><b>1.11. UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation</b></a></p><h1 id="28f2">2. Vision Language Models</h1><p id="79ca"><a href="https://huggingface.co/papers/2312.06109"><b>2.1. Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models</b></a></p><p id="bf72"><a href="https://huggingface.co/papers/2312.07533"><b>2.2. VILA: On Pre-training for Visual Language Models</b></a></p><p id="00b5"><a href="https://huggingface.co/papers/2312.06971"><b>2.3. CCM: Adding Conditional Controls to Text-to-Image Consistency Models</b></a></p><p id="da5f"><a href="https://huggingface.co/papers/2312.07424"><b>2.4. How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation</b></a></p><p id="ca1b"><a href="https://huggingface.co/papers/2312.08914"><b>2.5. CogAgent: A Visual Language Model for GUI Agents</b></a></p><p id="ee55"><a href="https://huggingface.co/papers/2312.08578"><b>2.6. 
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions</b></a></p><p id="0292"><a href="https://huggingface.co/papers/2312.09251"><b>2.7. VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation</b></a></p><p id="f465"><a href="https://huggingface.co/papers/2312.09237"><b>2.8. Pixel-Aligned Language Models</b></a></p><p id="88ab"><a href="https://huggingface.co/papers/2312.09187"><b>2.9. Vision-Language Models as a Source of Rewards</b></a></p><h1 id="0a7b">3. Image Generation & Editing</h1><p id="aa41"><a href="https://huggingface.co/papers/2312.04655"><b>3.1. ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations</b></a></p><p id="d26b"><a href="https://huggingface.co/papers/2312.09246"><b>3.2. SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds</b></a></p><p id="47d0"><a href="https://huggingface.co/papers/2312.09222"><b>3.3. Mosaic-SDF for 3D Generative Models</b></a></p><p id="920d"><a href="https://huggingface.co/papers/2312.08889"><b>3.4. SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance</b></a></p><p id="e80d"><a href="https://huggingface.co/papers/2312.09252"><b>3.5. FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection</b></a></p><h1 id="11af">4. Video Generation & Editing</h1><p id="8b3b"><a href="https://huggingface.co/papers/2312.05107"><b>4.1. DreaMoving: A Human Dance Video Generation Framework Based on Diffusion Models</b></a></p><p id="8335"><a href="https://huggingface.co/papers/2312.06662"><b>4.2. Photorealistic Video Generation with Diffusion Models</b></a></p><p id="60b0"><a href="https://huggingface.co/papers/2312.04875"><b>4.3. MVDD: Multi-View Depth Diffusion Models</b></a></p><p id="beb3"><a href="https://huggingface.co/papers/2312.09109"><b>4.4. VideoLCM: Video Latent Consistency Model</b></a></p><h1 id="1cd6">5. Image Segmentation</h1><p id="d256"><a href="https://huggingface.co/papers/2312.07661"><b>5.1. CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor</b></a></p><h1 id="cabb">6. Image Recognition</h1><p id="5c5d"><a href="https://huggingface.co/papers/2312.04837"><b>6.1. 
Localized Symbolic Knowledge Distillation for Visual Commonsense Models</b></a></p><h2 id="f1a9">If you like the article and would like to support me, make sure to:</h2><ul><li><b>👏 Clap for the story (50 claps) to help this article be featured</b></li><li><b>Subscribe to <a href="https://youssefh.substack.com/">To Data & Beyond</a> Newsletter</b></li><li><b>Follow me on <a href="https://youssefraafat57.medium.com/">Medium</a></b></li><li><b>📰 View more content on my <a href="https://medium.com/@youssefraafat57">Medium profile</a></b></li><li><b>🔔 Follow Me: <a href="https://www.linkedin.com/in/youssef-hosni-b2960b135/">LinkedIn</a> | <a href="https://www.youtube.com/@youssefhosni9801/featured">YouTube</a> | <a href="https://github.com/youssefHosni">GitHub</a> | <a href="https://twitter.com/Youssef70125494">Twitter</a></b></li></ul><h2 id="25ac">Subscribe to my newsletter To Data & Beyond to get full and early access to my articles:</h2><div id="3d07" class="link-block"> <a href="https://youssefh.substack.com/"> <div> <div> <h2>To Data & Beyond | Youssef Hosni | Substack</h2> <div><h3>Data Science, Machine Learning, AI, and what is beyond them. Click to read To Data & Beyond, by Youssef Hosni, a…</h3></div> <div><p>youssefh.substack.com</p></div> </div> <div> <div style="background-image: url(https://miro.readmedium.com/v2/resize:fit:320/0*lBX68YIFUHJ67nT3)"></div> </div> </div> </a> </div><h2 id="2717">Are you looking to start a career in data science and AI and do not know how? I offer data science mentoring sessions and long-term career mentoring:</h2><ul><li><b>Mentoring sessions</b>: <a href="https://topmate.io/youssef_hosni">https://lnkd.in/dXeg3KPW</a></li><li><b>Long-term mentoring:</b> <a href="https://lnkd.in/dtdUYBrM">https://lnkd.in/dtdUYBrM</a></li></ul><figure id="e4ec"><img src="https://cdn-images-1.readmedium.com/v2/resize:fit:800/0*SzdN4bIQAQGNiK3c.png"><figcaption></figcaption></figure></article></body>


Stay Updated with Recent Computer Vision Research

Table of Contents:

1. Stable Diffusion

2. Vision Language Models

3. Image Generation & Editing

4. Video Generation & Editing

5. Image Segmentation

6. Image Recognition
