A Summary of the 10 Must-Read Computer Vision Papers of 2021
Source: DeepHub IMBA
Each paper comes with a clear video explanation and code, and complete references for every paper are listed at the end of this article.
"Science cannot tell us what we ought to do, only what we can do." – Jean-Paul Sartre, Being and Nothingness
DALL·E: Zero-Shot Text-to-Image Generation from OpenAI [1]
Taming Transformers for High-Resolution Image Synthesis [2]
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [3]
Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image [4]
Total Relighting: Learning to Relight Portraits for Background Replacement [5]
Animating Pictures with Eulerian Motion Fields [6]
CVPR 2021 Best Paper Award: GIRAFFE — Controllable Image Generation [7]
TimeLens: Event-based Video Frame Interpolation [8]
CLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis [9]
CityNeRF: Building NeRF at City Scale [10]
References:
[1] Ramesh, A., et al., 2021. "Zero-Shot Text-to-Image Generation." arXiv:2102.12092.
[2] Esser, P., Rombach, R. and Ommer, B., 2020. "Taming Transformers for High-Resolution Image Synthesis." arXiv:2012.09841.
[3] Liu, Z., et al., 2021. "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows." arXiv preprint, https://arxiv.org/abs/2103.14030v1.
[bonus] Yuille, A.L. and Liu, C., 2021. "Deep Nets: What Have They Ever Done for Vision?" International Journal of Computer Vision, 129(3), pp. 781–802, https://arxiv.org/abs/1805.04025.
[4] Liu, A., Tucker, R., Jampani, V., Makadia, A., Snavely, N. and Kanazawa, A., 2020. "Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image." https://arxiv.org/pdf/2012.09855.pdf.
[5] Pandey, et al., 2021. "Total Relighting: Learning to Relight Portraits for Background Replacement." doi:10.1145/3450626.3459872, https://augmentedperception.github.io/total_relighting/total_relighting_paper.pdf.
[6] Holynski, A., et al., 2021. "Animating Pictures with Eulerian Motion Fields." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Niemeyer, M. and Geiger, A., 2021. "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields." CVPR 2021.
[8] Tulyakov, S., Gehrig, D., Georgoulis, S., Erbach, J., Gehrig, M., Li, Y. and Scaramuzza, D., 2021. "TimeLens: Event-based Video Frame Interpolation." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, http://rpg.ifi.uzh.ch/docs/CVPR21_Gehrig.pdf.
[9] a) Frans, K., Soros, L.B. and Witkowski, O., 2021. "CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image Encoders." b) Schaldenbrand, P., Liu, Z. and Oh, J., 2021. "StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis."
[10] Xiangli, Y., Xu, L., Pan, X., Zhao, N., Rao, A., Theobalt, C., Dai, B. and Lin, D., 2021. "CityNeRF: Building NeRF at City Scale."
Author: Louis Bouchard