Youssef Hosni

Top Important Computer Vision Papers for the Week from 11/12 to 17/12

Stay Updated with Recent Computer Vision Research

Every week, top-tier academic conferences and journals showcase innovative computer vision research, with breakthroughs in subfields such as image recognition, vision model optimization, generative adversarial networks (GANs), image segmentation, and video analysis.

This article provides a comprehensive overview of the most significant papers published in the third week of December 2023, highlighting the latest research and advancements in computer vision. Whether you’re a researcher, practitioner, or enthusiast, this article will provide valuable insights into the state-of-the-art techniques and tools in computer vision.

Table of Contents:

  1. Stable Diffusion
  2. Vision Language Models
  3. Image Generation & Editing
  4. Video Generation & Editing
  5. Image Segmentation
  6. Image Recognition

Most insights I share on Medium have previously been shared in my weekly newsletter, To Data & Beyond.

If you want to be up-to-date with the frenetic world of AI while also feeling inspired to take action or, at the very least, to be well-prepared for the future ahead of us, this is for you.

🏝Subscribe🏝 to become an AI leader among your peers and receive content not available on any other platform, including Medium.

1. Stable Diffusion

1.1. Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

1.2. Customizing Motion in Text-to-Video Diffusion Models

1.3. Efficient Quantization Strategies for Latent Diffusion Models

1.4. FreeInit: Bridging Initialization Gap in Video Diffusion Models

1.5. DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing

1.6. FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

1.7. Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation

1.8. PEEKABOO: Interactive Video Generation via Masked-Diffusion

1.9. Clockwork Diffusion: Efficient Generation With Model-Step Distillation

1.10. LIME: Localized Image Editing via Attention Regularization in Diffusion Models

1.11. UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation

2. Vision Language Models

2.1. Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

2.2. VILA: On Pre-training for Visual Language Models

2.3. CCM: Adding Conditional Controls to Text-to-Image Consistency Models

2.4. How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation

2.5. CogAgent: A Visual Language Model for GUI Agents

2.6. A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

2.7. VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation

2.8. Pixel-Aligned Language Models

2.9. Vision-Language Models as a Source of Rewards

3. Image Generation & Editing

3.1. ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

3.2. SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds

3.3. Mosaic-SDF for 3D Generative Models

3.4. SEEAvatar: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance

3.5. FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection

4. Video Generation & Editing

4.1. DreaMoving: A Human Dance Video Generation Framework Based on Diffusion Models

4.2. Photorealistic Video Generation with Diffusion Models

4.3. MVDD: Multi-View Depth Diffusion Models

4.4. VideoLCM: Video Latent Consistency Model

5. Image Segmentation

5.1. CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor

6. Image Recognition

6.1. Localized Symbolic Knowledge Distillation for Visual Commonsense Models

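If you like the article and would like to support me, make sure to:

  1. 👏 Clap for the story (50 claps) to help this article be featured
  2. Subscribe to the To Data & Beyond newsletter: https://youssefh.substack.com/
  3. Follow me on Medium: https://youssefraafat57.medium.com/
  4. 📰 View more content on my Medium profile: https://medium.com/@youssefraafat57
  5. 🔔 Follow me on LinkedIn (https://www.linkedin.com/in/youssef-hosni-b2960b135/), YouTube (https://www.youtube.com/@youssefhosni9801/featured), GitHub (https://github.com/youssefHosni), and Twitter (https://twitter.com/Youssef70125494)

Subscribe to my newsletter, To Data & Beyond, to get full and early access to my articles: https://youssefh.substack.com/

Are you looking to start a career in data science and AI but do not know how? I offer data science mentoring sessions and long-term career mentoring:

  1. Mentoring sessions: https://lnkd.in/dXeg3KPW
  2. Long-term mentoring: https://lnkd.in/dtdUYBrM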
