ICCV 2023 Paper Digest (2023.8.16): Several Papers on Diffusion Models and SAM


2023-08-18 21:13








Compiled by: AI算法与图像处理 (AI Algorithms and Image Processing)



Collection of CVPR 2023 papers and code: https://github.com/DWCTOD/CVPR2023-Papers-with-Code-Demo

Follow the WeChat official account AI算法与图像处理 for more practical content.





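Hi everyone. We have been improving our weekly roundup of CVPR papers and are considering grouping them by category, so that readers working in different areas can more easily pick out the papers that interest them.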







Demos of the latest results:








Helping Hands: An Object-Aware Ego-Centric Video Recognition Model



  • Paper: http://arxiv.org/pdf/2308.07918
  • Code: https://github.com/chuhanxx/helping_hand_for_egocentric_videos



Memory-and-Anticipation Transformer for Online Action Understanding



  • Paper: http://arxiv.org/pdf/2308.07893
  • Code: https://github.com/echo0125/memory-and-anticipation-transformer



ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces



  • Paper: http://arxiv.org/pdf/2308.07868
  • Code: https://github.com/qianyiwu/objectsdf_plus



StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models



  • Paper: http://arxiv.org/pdf/2308.07863
  • Code: None



ImbSAM: A Closer Look at Sharpness-Aware Minimization in Class-Imbalanced Recognition



  • Paper: http://arxiv.org/pdf/2308.07815
  • Code: https://github.com/cool-xuan/imbalanced_sam



Learning to Identify Critical States for Reinforcement Learning from Videos



  • Paper: http://arxiv.org/pdf/2308.07795
  • Code: https://github.com/ai-initiative-kaust/videorlcs



Identity-Consistent Aggregation for Video Object Detection



  • Paper: http://arxiv.org/pdf/2308.07737
  • Code: None



UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation



  • Paper: http://arxiv.org/pdf/2308.07732
  • Code: https://github.com/haiyang-w/unitr



DiffGuard: Semantic Mismatch-Guided Out-of-Distribution Detection using Pre-trained Diffusion Models



  • Paper: http://arxiv.org/pdf/2308.07687
  • Code: https://github.com/cure-lab/diffguard



Boosting Multi-modal Model Performance with Adaptive Gradient Modulation



  • Paper: http://arxiv.org/pdf/2308.07686
  • Code: https://github.com/lihong2303/agm_iccv2023



Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval



  • Paper: http://arxiv.org/pdf/2308.07648
  • Code: None



Backpropagation Path Search On Adversarial Transferability



  • Paper: http://arxiv.org/pdf/2308.07625
  • Code: None



Story Visualization by Online Text Augmentation with Context Memory



  • Paper: http://arxiv.org/pdf/2308.07575
  • Code: None



3DHacker: Spectrum-based Decision Boundary Generation for Hard-label 3D Point Cloud Attack



  • Paper: http://arxiv.org/pdf/2308.07546
  • Code: None



Boosting Semi-Supervised Learning by bridging high and low-confidence predictions



  • Paper: http://arxiv.org/pdf/2308.07509
  • Code: None



DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation



  • Paper: http://arxiv.org/pdf/2308.07498
  • Code: https://github.com/hanqingwangai/Dreamwalker



Probabilistic MIMO U-Net: Efficient and Accurate Uncertainty Estimation for Pixel-wise Regression



  • Paper: http://arxiv.org/pdf/2308.07477
  • Code: https://github.com/antonbaumann/mimo-unet



PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects



  • Paper: http://arxiv.org/pdf/2308.07391
  • Code: None



DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding



  • Paper: http://arxiv.org/pdf/2308.07787
  • Code: https://github.com/joannahong/diffv2s










