CVPR 2022 Papers and Open-Source Projects Collection
2022-03-28 05:12
Machine Learning AI Algorithm Engineering (WeChat official account: datayx)
[CVPR 2022 Papers with Code: Contents]
Backbone
CLIP
GAN
NAS
NeRF
Visual Transformer
Vision-Language
Self-supervised Learning
Data Augmentation
Object Detection
Visual Tracking
Semantic Segmentation
Instance Segmentation
Few-Shot Segmentation
Video Understanding
Image Editing
Low-level Vision
Super-Resolution
3D Point Cloud
3D Object Detection
3D Semantic Segmentation
3D Object Tracking
3D Human Pose Estimation
3D Semantic Scene Completion
3D Reconstruction
Camouflaged Object Detection
Depth Estimation
Stereo Matching
Lane Detection
Image Inpainting
Crowd Counting
Medical Image
Scene Graph Generation
Weakly Supervised Object Localization
Hyperspectral Image Reconstruction
Watermarking
Datasets
New Tasks
Others
Backbone
A ConvNet for the 2020s
Paper: https://arxiv.org/abs/2201.03545
Code: https://github.com/facebookresearch/ConvNeXt
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
Paper: https://arxiv.org/abs/2203.06717
Code: https://github.com/megvii-research/RepLKNet
Code2: https://github.com/DingXiaoH/RepLKNet-pytorch
MPViT: Multi-Path Vision Transformer for Dense Prediction
Paper: https://arxiv.org/abs/2112.11010
Code: https://github.com/youngwanLEE/MPViT
CLIP
HairCLIP: Design Your Hair by Text and Reference Image
Paper: https://arxiv.org/abs/2112.05142
Code: https://github.com/wty-ustc/HairCLIP
PointCLIP: Point Cloud Understanding by CLIP
Paper: https://arxiv.org/abs/2112.02413
Code: https://github.com/ZrrSkywalker/PointCLIP
Blended Diffusion for Text-driven Editing of Natural Images
Paper: https://arxiv.org/abs/2111.14818
Code: https://github.com/omriav/blended-diffusion
GAN
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Homepage: https://semanticstylegan.github.io/
Paper: https://arxiv.org/abs/2112.02236
Demo: https://semanticstylegan.github.io/videos/demo.mp4
Style Transformer for Image Inversion and Editing
Paper: https://arxiv.org/abs/2203.07932
Code: https://github.com/sapphire497/style-transformer
NAS
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
Paper: https://arxiv.org/abs/2203.01665
Code: https://github.com/Sunshine-Ye/Beta-DARTS
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
Paper: https://arxiv.org/abs/2111.15362
Code: None
NeRF
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
Homepage: https://jonbarron.info/mipnerf360/
Paper: https://arxiv.org/abs/2111.12077
Demo: https://youtu.be/YStDS2-Ln1s
Point-NeRF: Point-based Neural Radiance Fields
Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/
Paper: https://arxiv.org/abs/2201.08845
Code: https://github.com/Xharlie/point-nerf
NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images
Paper: https://arxiv.org/abs/2111.13679
Homepage: https://bmild.github.io/rawnerf/
Demo: https://www.youtube.com/watch?v=JtBS4KBcKVc
Urban Radiance Fields
Homepage: https://urban-radiance-fields.github.io/
Paper: https://arxiv.org/abs/2111.14643
Demo: https://youtu.be/qGlq5DZT6uc
Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation
Paper: https://arxiv.org/abs/2202.13162
Code: https://github.com/HexagonPrime/Pix2NeRF
HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
Homepage: https://grail.cs.washington.edu/projects/humannerf/
Paper: https://arxiv.org/abs/2201.04127
Demo: https://youtu.be/GM-RoZEymmw
Visual Transformer
Backbone
MPViT: Multi-Path Vision Transformer for Dense Prediction
Paper: https://arxiv.org/abs/2112.11010
Code: https://github.com/youngwanLEE/MPViT
Application
Language-based Video Editing via Multi-Modal Multi-Level Transformer
Paper: https://arxiv.org/abs/2104.01122
Code: None
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
Paper: https://arxiv.org/abs/2203.00859
Code: None
Embracing Single Stride 3D Object Detector with Sparse Transformer
Paper: https://arxiv.org/abs/2112.06375
Code: https://github.com/TuSimple/SST
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Paper: https://arxiv.org/abs/2203.02891
Code: https://github.com/xulianuwa/MCTformer
Spatio-temporal Relation Modeling for Few-shot Action Recognition
Paper: https://arxiv.org/abs/2112.05132
Code: https://github.com/Anirudh257/strm
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
Paper: https://arxiv.org/abs/2111.07910
Code: https://github.com/caiyuanhao1998/MST
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
Homepage: https://point-bert.ivg-research.xyz/
Paper: https://arxiv.org/abs/2111.14819
Code: https://github.com/lulutang0608/Point-BERT
GroupViT: Semantic Segmentation Emerges from Text Supervision
Homepage: https://jerryxu.net/GroupViT/
Paper: https://arxiv.org/abs/2202.11094
Demo: https://youtu.be/DtJsWIUTW-Y
Restormer: Efficient Transformer for High-Resolution Image Restoration
Paper: https://arxiv.org/abs/2111.09881
Code: https://github.com/swz30/Restormer
Splicing ViT Features for Semantic Appearance Transfer
Homepage: https://splice-vit.github.io/
Paper: https://arxiv.org/abs/2201.00424
Code: https://github.com/omerbt/Splice
Self-supervised Video Transformer
Homepage: https://kahnchana.github.io/svt/
Paper: https://arxiv.org/abs/2112.01514
Code: https://github.com/kahnchana/svt
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
Paper: https://arxiv.org/abs/2203.02664
Code: https://github.com/rulixiang/afa
Accelerating DETR Convergence via Semantic-Aligned Matching
Paper: https://arxiv.org/abs/2203.06883
Code: https://github.com/ZhangGongjie/SAM-DETR
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
Paper: https://arxiv.org/abs/2203.01305
Code: https://github.com/FengLi-ust/DN-DETR
Style Transformer for Image Inversion and Editing
Paper: https://arxiv.org/abs/2203.07932
Code: https://github.com/sapphire497/style-transformer
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
Paper: https://arxiv.org/abs/2203.10981
Code: https://github.com/kuanchihhuang/MonoDTR
Mask Transfiner for High-Quality Instance Segmentation
Paper: https://arxiv.org/abs/2111.13673
Code: https://github.com/SysCV/transfiner
Vision-Language
Conditional Prompt Learning for Vision-Language Models
Paper: https://arxiv.org/abs/2203.05557
Code: https://github.com/KaiyangZhou/CoOp
Self-supervised Learning
UniVIP: A Unified Framework for Self-Supervised Visual Pre-training
Paper: https://arxiv.org/abs/2203.06965
Code: None
Crafting Better Contrastive Views for Siamese Representation Learning
Paper: https://arxiv.org/abs/2202.03278
Code: https://github.com/xyupeng/ContrastiveCrop
HCSC: Hierarchical Contrastive Selective Coding
Homepage: https://github.com/gyfastas/HCSC
Paper: https://arxiv.org/abs/2202.00455
Data Augmentation
TeachAugment: Data Augmentation Optimization Using Teacher Knowledge
Paper: https://arxiv.org/abs/2202.12513
Code: https://github.com/DensoITLab/TeachAugment
AlignMix: Improving representation by interpolating aligned features
Paper: https://arxiv.org/abs/2103.15375
Code: None
Object Detection
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
Paper: https://arxiv.org/abs/2203.01305
Code: https://github.com/FengLi-ust/DN-DETR
Accelerating DETR Convergence via Semantic-Aligned Matching
Paper: https://arxiv.org/abs/2203.06883
Code: https://github.com/ZhangGongjie/SAM-DETR
Localization Distillation for Dense Object Detection
Paper: https://arxiv.org/abs/2102.12252
Code: https://github.com/HikariTJU/LD
Focal and Global Knowledge Distillation for Detectors
Paper: https://arxiv.org/abs/2111.11837
Code: https://github.com/yzd-v/FGD
A Dual Weighting Label Assignment Scheme for Object Detection
Paper: https://arxiv.org/abs/2203.09730
Code: https://github.com/strongwolf/DW
Visual Tracking
Correlation-Aware Deep Tracking
Paper: https://arxiv.org/abs/2203.01666
Code: None
TCTrack: Temporal Contexts for Aerial Tracking
Paper: https://arxiv.org/abs/2203.01885
Code: https://github.com/vision4robotics/TCTrack
Semantic Segmentation
Weakly Supervised Semantic Segmentation
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
Paper: https://arxiv.org/abs/2203.00962
Code: https://github.com/zhaozhengChen/ReCAM
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Paper: https://arxiv.org/abs/2203.02891
Code: https://github.com/xulianuwa/MCTformer
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
Paper: https://arxiv.org/abs/2203.02664
Code: https://github.com/rulixiang/afa
Semi-Supervised Semantic Segmentation
ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation
Paper: https://arxiv.org/abs/2106.05095
Code: https://github.com/LiheYoung/ST-PlusPlus
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
Homepage: https://haochen-wang409.github.io/U2PL/
Paper: https://arxiv.org/abs/2203.03884
Code: https://github.com/Haochen-Wang409/U2PL
Unsupervised Semantic Segmentation
GroupViT: Semantic Segmentation Emerges from Text Supervision
Homepage: https://jerryxu.net/GroupViT/
Paper: https://arxiv.org/abs/2202.11094
Demo: https://youtu.be/DtJsWIUTW-Y
Instance Segmentation
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation
Paper: https://arxiv.org/abs/2203.04074
Code: https://github.com/zhang-tao-whu/e2ec
Mask Transfiner for High-Quality Instance Segmentation
Paper: https://arxiv.org/abs/2111.13673
Code: https://github.com/SysCV/transfiner
Self-Supervised Instance Segmentation
FreeSOLO: Learning to Segment Objects without Annotations
Paper: https://arxiv.org/abs/2202.12181
Code: None
Video Instance Segmentation
Efficient Video Instance Segmentation via Tracklet Query and Proposal
Homepage: https://jialianwu.com/projects/EfficientVIS.html
Paper: https://arxiv.org/abs/2203.01853
Demo: https://youtu.be/sSPMzgtMKCE
Few-Shot Segmentation
Learning What Not to Segment: A New Perspective on Few-Shot Segmentation
Paper: https://arxiv.org/abs/2203.07615
Code: https://github.com/chunbolang/BAM
Video Understanding
Self-supervised Video Transformer
Homepage: https://kahnchana.github.io/svt/
Paper: https://arxiv.org/abs/2112.01514
Code: https://github.com/kahnchana/svt
Action Recognition
Spatio-temporal Relation Modeling for Few-shot Action Recognition
Paper: https://arxiv.org/abs/2112.05132
Code: https://github.com/Anirudh257/strm
Action Detection
End-to-End Semi-Supervised Learning for Video Action Detection
Paper: https://arxiv.org/abs/2203.04251
Code: None
Image Editing
Style Transformer for Image Inversion and Editing
Paper: https://arxiv.org/abs/2203.07932
Code: https://github.com/sapphire497/style-transformer
Blended Diffusion for Text-driven Editing of Natural Images
Paper: https://arxiv.org/abs/2111.14818
Code: https://github.com/omriav/blended-diffusion
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Homepage: https://semanticstylegan.github.io/
Paper: https://arxiv.org/abs/2112.02236
Demo: https://semanticstylegan.github.io/videos/demo.mp4
Low-level Vision
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
Paper: https://arxiv.org/abs/2111.15362
Code: None
Restormer: Efficient Transformer for High-Resolution Image Restoration
Paper: https://arxiv.org/abs/2111.09881
Code: https://github.com/swz30/Restormer
Super-Resolution
Image Super-Resolution
Learning the Degradation Distribution for Blind Image Super-Resolution
Paper: https://arxiv.org/abs/2203.04962
Code: https://github.com/greatlog/UnpairedSR
Video Super-Resolution
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
Paper: https://arxiv.org/abs/2104.13371
Code: https://github.com/open-mmlab/mmediting
Code2: https://github.com/ckkelvinchan/BasicVSR_PlusPlus
3D Point Cloud
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
Homepage: https://point-bert.ivg-research.xyz/
Paper: https://arxiv.org/abs/2111.14819
Code: https://github.com/lulutang0608/Point-BERT
A Unified Query-based Paradigm for Point Cloud Understanding
Paper: https://arxiv.org/abs/2203.01252
Code: None
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
Paper: https://arxiv.org/abs/2203.00680
Code: https://github.com/MohamedAfham/CrossPoint
PointCLIP: Point Cloud Understanding by CLIP
Paper: https://arxiv.org/abs/2112.02413
Code: https://github.com/ZrrSkywalker/PointCLIP
3D Object Detection
Embracing Single Stride 3D Object Detector with Sparse Transformer
Paper: https://arxiv.org/abs/2112.06375
Code: https://github.com/TuSimple/SST
Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
Paper: https://arxiv.org/abs/2011.12001
Code: https://github.com/qq456cvb/CanonicalVoting
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
Paper: https://arxiv.org/abs/2203.10981
Code: https://github.com/kuanchihhuang/MonoDTR
3D Semantic Segmentation
Scribble-Supervised LiDAR Semantic Segmentation
Paper: https://arxiv.org/abs/2203.08537
Dataset: https://github.com/ouenal/scribblekitti
3D Object Tracking
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
Paper: https://arxiv.org/abs/2203.01730
Code: https://github.com/Ghostish/Open3DSOT
3D Human Pose Estimation
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
Paper: https://arxiv.org/abs/2111.12707
Code: https://github.com/Vegetebird/MHFormer
Chinese-language explainer: https://zhuanlan.zhihu.com/p/439459426
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
Paper: https://arxiv.org/abs/2203.00859
Code: None
3D Semantic Scene Completion
MonoScene: Monocular 3D Semantic Scene Completion
Paper: https://arxiv.org/abs/2112.00726
Code: https://github.com/cv-rits/MonoScene
3D Reconstruction
BANMo: Building Animatable 3D Neural Models from Many Casual Videos
Homepage: https://banmo-www.github.io/
Paper: https://arxiv.org/abs/2112.12761
Code: https://github.com/facebookresearch/banmo
Camouflaged Object Detection
Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection
Paper: https://arxiv.org/abs/2203.02688
Code: https://github.com/lartpang/ZoomNet
Depth Estimation
Monocular Depth Estimation
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
Paper: https://arxiv.org/abs/2203.01502
Code: None
OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
Paper: https://arxiv.org/abs/2203.00838
Code: None
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
Paper: https://arxiv.org/abs/2112.02306
Code: None
Stereo Matching
ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching
Paper: https://arxiv.org/abs/2203.02146
Code: https://github.com/gangweiX/ACVNet
Lane Detection
Rethinking Efficient Lane Detection via Curve Modeling
Paper: https://arxiv.org/abs/2203.02431
Code: https://github.com/voldemortX/pytorch-auto-drive
Demo: https://user-images.githubusercontent.com/32259501/148680744-a18793cd-f437-461f-8c3a-b909c9931709.mp4
Image Inpainting
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
Paper: https://arxiv.org/abs/2203.00867
Code: https://github.com/DQiaole/ZITS_inpainting
Crowd Counting
Leveraging Self-Supervision for Cross-Domain Crowd Counting
Paper: https://arxiv.org/abs/2103.16291
Code: None
Medical Image
BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation
Paper: https://arxiv.org/abs/2203.02533
Code: None
Scene Graph Generation
SGTR: End-to-end Scene Graph Generation with Transformer
Paper: https://arxiv.org/abs/2112.12970
Code: None
Style Transfer
StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions
Homepage: https://lukashoel.github.io/stylemesh/
Paper: https://arxiv.org/abs/2112.01530
Code: https://github.com/lukasHoel/stylemesh
Demo: https://www.youtube.com/watch?v=ZqgiTLcNcks
Weakly Supervised Object Localization
Weakly Supervised Object Localization as Domain Adaption
Paper: https://arxiv.org/abs/2203.01714
Code: https://github.com/zh460045050/DA-WSOL_CVPR2022
Hyperspectral Image Reconstruction
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
Paper: https://arxiv.org/abs/2111.07910
Code: https://github.com/caiyuanhao1998/MST
Watermarking
Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings
Paper: https://arxiv.org/abs/2104.13450
Code: None
Datasets
It's About Time: Analog Clock Reading in the Wild
Homepage: https://charigyang.github.io/abouttime/
Paper: https://arxiv.org/abs/2111.09162
Code: https://github.com/charigyang/itsabouttime
Demo: https://youtu.be/cbiMACA6dRc
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
Paper: https://arxiv.org/abs/2112.02306
Code: None
Kubric: A scalable dataset generator
Paper: https://arxiv.org/abs/2203.03570
Code: https://github.com/google-research/kubric
Scribble-Supervised LiDAR Semantic Segmentation
Paper: https://arxiv.org/abs/2203.08537
Dataset: https://github.com/ouenal/scribblekitti
New Tasks
Language-based Video Editing via Multi-Modal Multi-Level Transformer
Paper: https://arxiv.org/abs/2104.01122
Code: None
It's About Time: Analog Clock Reading in the Wild
Homepage: https://charigyang.github.io/abouttime/
Paper: https://arxiv.org/abs/2111.09162
Code: https://github.com/charigyang/itsabouttime
Demo: https://youtu.be/cbiMACA6dRc
Splicing ViT Features for Semantic Appearance Transfer
Homepage: https://splice-vit.github.io/
Paper: https://arxiv.org/abs/2201.00424
Code: https://github.com/omerbt/Splice
Others
Kubric: A scalable dataset generator
Paper: https://arxiv.org/abs/2203.03570
Code: https://github.com/google-research/kubric