ICCV 2023 Paper Roundup (2023.8.16): Several Diffusion Model and SAM-Related Studies

Compiled by: AI算法与图像处理

CVPR 2023 papers and code collection: https://github.com/DWCTOD/CVPR2023-Papers-with-Code-Demo

Follow the WeChat official account AI算法与图像处理 for more useful content.

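Hi everyone. We have recently been improving our weekly CVPR paper roundup and are considering grouping the papers by category, so that readers working in different areas can more easily pick out the ones that interest them.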

Latest results and demos:

Helping Hands: An Object-Aware Ego-Centric Video Recognition Model

Paper: http://arxiv.org/pdf/2308.07918

Code: https://github.com/chuhanxx/helping_hand_for_egocentric_videos

Memory-and-Anticipation Transformer for Online Action Understanding

Paper: http://arxiv.org/pdf/2308.07893

Code: https://github.com/echo0125/memory-and-anticipation-transformer

ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces

Paper: http://arxiv.org/pdf/2308.07868

Code: https://github.com/qianyiwu/objectsdf_plus

StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models

Paper: http://arxiv.org/pdf/2308.07863

Code: None

ImbSAM: A Closer Look at Sharpness-Aware Minimization in Class-Imbalanced Recognition

Paper: http://arxiv.org/pdf/2308.07815

Code: https://github.com/cool-xuan/imbalanced_sam

Learning to Identify Critical States for Reinforcement Learning from Videos

Paper: http://arxiv.org/pdf/2308.07795

Code: https://github.com/ai-initiative-kaust/videorlcs

Identity-Consistent Aggregation for Video Object Detection

Paper: http://arxiv.org/pdf/2308.07737

Code: None

UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation

Paper: http://arxiv.org/pdf/2308.07732

Code: https://github.com/haiyang-w/unitr

DiffGuard: Semantic Mismatch-Guided Out-of-Distribution Detection using Pre-trained Diffusion Models

Paper: http://arxiv.org/pdf/2308.07687

Code: https://github.com/cure-lab/diffguard

Boosting Multi-modal Model Performance with Adaptive Gradient Modulation

Paper: http://arxiv.org/pdf/2308.07686

Code: https://github.com/lihong2303/agm_iccv2023

Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval

Paper: http://arxiv.org/pdf/2308.07648

Code: None

Backpropagation Path Search On Adversarial Transferability

Paper: http://arxiv.org/pdf/2308.07625

Code: None

Story Visualization by Online Text Augmentation with Context Memory

Paper: http://arxiv.org/pdf/2308.07575

Code: None

3DHacker: Spectrum-based Decision Boundary Generation for Hard-label 3D Point Cloud Attack

Paper: http://arxiv.org/pdf/2308.07546

Code: None

Boosting Semi-Supervised Learning by bridging high and low-confidence predictions

Paper: http://arxiv.org/pdf/2308.07509

Code: None

DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation

Paper: http://arxiv.org/pdf/2308.07498

Code: https://github.com/hanqingwangai/Dreamwalker

Probabilistic MIMO U-Net: Efficient and Accurate Uncertainty Estimation for Pixel-wise Regression

Paper: http://arxiv.org/pdf/2308.07477

Code: https://github.com/antonbaumann/mimo-unet

PARIS: Part-level Reconstruction and Motion Analysis for Articulated Objects

Paper: http://arxiv.org/pdf/2308.07391

Code: None

DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding

Paper: http://arxiv.org/pdf/2308.07787

Code: https://github.com/joannahong/diffv2s