CVPR 2024 Accepted Papers
Papers are assigned to poster sessions so that topics are spread as widely as possible across sessions (so attendees find interesting papers at every session), while similar posters are grouped within each session to minimize walking distance. We used a 1D t-SNE projection of the SPECTER paper embeddings to make this assignment.
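The exact assignment procedure is not spelled out here. As a minimal illustrative sketch, assuming each paper already has a 1D coordinate (for example from a 1D t-SNE run on its embedding), one simple way to get both properties is to sort papers along that axis and deal them round-robin into sessions. The function name `assign_sessions`, the sample coordinates, and the round-robin rule are assumptions made for illustration, not the organizers' actual method.

```python
from typing import Dict, List, Sequence


def assign_sessions(coords: Sequence[float], num_sessions: int) -> Dict[int, List[int]]:
    """Spread papers over sessions while keeping similar papers adjacent.

    coords[i] is a hypothetical 1D projection value for paper i.
    Returns a mapping from session index to paper indices, with each
    session's papers ordered by their 1D coordinate.
    """
    if num_sessions < 1:
        raise ValueError("num_sessions must be at least 1")
    # Sort paper indices along the 1D topic axis.
    order = sorted(range(len(coords)), key=lambda i: coords[i])
    sessions: Dict[int, List[int]] = {s: [] for s in range(num_sessions)}
    # Deal neighbors on the axis into different sessions (round-robin),
    # so every session samples the whole topic range.
    for rank, paper in enumerate(order):
        sessions[rank % num_sessions].append(paper)
    return sessions


# Hypothetical coordinates for six papers, split over two sessions.
coords = [0.9, 0.1, 0.5, 0.2, 0.8, 0.4]
layout = assign_sessions(coords, num_sessions=2)
```

With six papers and two sessions, each session receives papers from the low, middle, and high end of the axis, and within a session the papers stay ordered by coordinate, so neighboring posters are the most similar ones.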
Training-free Pretrained Model Merging
Zhengqi Xu · Ke Yuan · Huiqiong Wang · Yong Wang · Mingli Song · Jie Song
FedMef: Towards Memory-efficient Federated Dynamic Pruning
Hong Huang · Weiming Zhuang · Chen Chen · Lingjuan Lyu
3D Human Pose Perception from Egocentric Stereo Videos
Hiroyasu Akada · Jian Wang · Vladislav Golyanik · Christian Theobalt
EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams
Christen Millerdurai · Hiroyasu Akada · Jian Wang · Diogo Luvizon · Christian Theobalt · Vladislav Golyanik
Federated Online Adaptation for Deep Stereo
Matteo Poggi · Fabio Tosi
Super-Resolution Reconstruction from Bayer-Pattern Spike Streams
Yanchen Dong · Ruiqin Xiong · Jian Zhang · Zhaofei Yu · Xiaopeng Fan · Shuyuan Zhu · Tiejun Huang
A Vision Check-up for Language Models
Pratyusha Sharma · Tamar Rott Shaham · Manel Baradad · Stephanie Fu · Adrian Rodriguez-Munoz · Shivam Duggal · Phillip Isola · Antonio Torralba
SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation
Kejia Yin · Varshanth Rao · Ruowei Jiang · Xudong Liu · Parham Aarabi · David B. Lindell
RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection
Ximiao Zhang · Min Xu · Xiuzhuang Zhou
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Kai Yang · Jian Tao · Jiafei Lyu · Chunjiang Ge · Jiaxin Chen · Weihan Shen · Xiaolong Zhu · Xiu Li
OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
Bohao Peng · Xiaoyang Wu · Li Jiang · Yukang Chen · Hengshuang Zhao · Zhuotao Tian · Jiaya Jia
Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation
Zihan Wang · Xiangyang Li · Jiahao Yang · Yeqi Liu · Junjie Hu · Ming Jiang · Shuqiang Jiang
Robust Synthetic-to-Real Transfer for Stereo Matching
Jiawei Zhang · Jiahe Li · Lei Huang · Xiaohan Yu · Lin Gu · Jin Zheng · Xiao Bai
UniMODE: Unified Monocular 3D Object Detection
Zhuoling Li · Xiaogang Xu · Ser-Nam Lim · Hengshuang Zhao
A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution
Zhixiong Yang · Jingyuan Xia · Shengxi Li · Xinghua Huang · Shuanghui Zhang · Zhen Liu · Yaowen Fu · Yongxiang Liu
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu · Yikun Liu · Ferenas · Chen Ju · Ya Zhang · Yanfeng Wang
EFHQ: Multi-purpose ExtremePose-Face-HQ dataset
Trung Dao · Duc H Vu · Cuong Pham · Anh Tran
Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration
Yuxi Wei · Zi Wang · Yifan Lu · Chenxin Xu · Changxing Liu · Hao Zhao · Siheng Chen · Yanfeng Wang
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
Kanchana Ranasinghe · Satya Narayan Shukla · Omid Poursaeed · Michael Ryoo · Tsung-Yu Lin
Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain
Qunliang Xing · Mai Xu · Shengxi Li · Xin Deng · Meisong Zheng · Huaida Liu · Ying Chen
Precise Image Editing via Recognition and Generation Tasks
Shelly Sheynin · Adam Polyak · Uriel Singer · Yuval Kirstain · Amit Zohar · Oron Ashual · Devi Parikh · Yaniv Taigman
Spin-UP: Spin Light for Natural Light Uncalibrated Photometric Stereo
Zongrui Li · Zhan Lu · Haojie Yan · Boxin Shi · Gang Pan · Qian Zheng · Xudong Jiang
Leak and Learn: An Attacker's Cookbook to Train Using Leaked Data from Federated Learning
Joshua C. Zhao · Ahaan Dabholkar · Atul Sharma · Saurabh Bagchi
EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation
Chanyoung Kim · Woojung Han · Dayun Ju · Seong Jae Hwang
A Physics-informed Low-rank Deep Neural Network for Blind and Universal Lens Aberration Correction
Jin Gong · Runzhao Yang · Weihang Zhang · Jinli Suo · Qionghai Dai
The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding
Lorenzo Bianchi · Fabio Carrara · Nicola Messina · Claudio Gennaro · Fabrizio Falchi
VRP-SAM: SAM with Visual Reference Prompt
Yanpeng Sun · Jiahui Chen · Shan Zhang · Xinyu Zhang · Qiang Chen · Gang Zhang · Errui Ding · Jingdong Wang · Zechao Li
Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes
Diandian Guo · Deng-Ping Fan · Tongyu Lu · Christos Sakaridis · Luc Van Gool
Data Poisoning based Backdoor Attacks to Contrastive Learning
Jinghuai Zhang · Hongbin Liu · Jinyuan Jia · Neil Zhenqiang Gong
A2XP: Towards Private Domain Generalization
Geunhyeok Yu · Hyoseok Hwang
ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks
Kai Han · Yunhe Wang · Jianyuan Guo · Enhua Wu
An Empirical Study of Scaling Law for Scene Text Recognition
Miao Rang · Zhenni Bi · Chuanjian Liu · Yunhe Wang · Kai Han
FMA-Net: Flow Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring
Geunhyuk Youk · Jihyong Oh · Munchurl Kim
Deep Imbalanced Regression via Hierarchical Classification Adjustment
Haipeng Xiong · Angela Yao
Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting
Haiwei Chen · Yajie Zhao
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
Eric Slyman · Stefan Lee · Scott Cohen · Kushal Kafle
Modular Blind Video Quality Assessment
Wen Wen · Mu Li · Yabin Zhang · Yiting Liao · Junlin Li · Li Zhang · Kede Ma
GlitchBench: Can large multimodal models detect video game glitches?
Mohammad Reza Taesiri · Tianjun Feng · Cor-Paul Bezemer · Anh Nguyen
LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection
Dat Nguyen · Nesryne Mejri · Inder Pal Singh · Polina Kuleshova · Marcella Astrid · Anis Kacem · Enjie Ghorbel · Djamila Aouada
DiVa-360: The Dynamic Visual Dataset for Immersive Neural Fields
Cheng-You Lu · Peisen Zhou · Angela Xing · Chandradeep Pokhariya · Arnab Dey · Ishaan Shah · Rugved Mavidipalli · Dylan Hu · Andrew Comport · Kefan Chen · Srinath Sridhar
DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior
Tianyu Huang · Yihan Zeng · Zhilu Zhang · Wan Xu · Hang Xu · Songcen Xu · Rynson W.H. Lau · Wangmeng Zuo
Open-vocabulary object 6D pose estimation
Jaime Corsetti · Davide Boscaini · Changjae Oh · Andrea Cavallaro · Fabio Poiesi
Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing
Dongyoung Kim · Jinwoo Kim · Junsang Yu · Seon Joo Kim
GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Yiwen Chen · Zilong Chen · Chi Zhang · Feng Wang · Xiaofeng Yang · Yikai Wang · Zhongang Cai · Lei Yang · Huaping Liu · Guosheng Lin
SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes
Yihua Huang · Yangtian Sun · Ziyi Yang · Xiaoyang Lyu · Yan-Pei Cao · Xiaojuan Qi
Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking
Wei Cao · Chang Luo · Biao Zhang · Matthias Nießner · Jiapeng Tang
From Activation to Initialization: Scaling Insights for Optimizing Neural Fields
Hemanth Saratchandran · Sameera Ramasinghe · Simon Lucey
MoST: Multi-modality Scene Tokenization for Motion Prediction
Norman Mu · Jingwei Ji · Zhenpei Yang · Nathan Harada · Haotian Tang · Kan Chen · Charles R. Qi · Runzhou Ge · Kratarth Goel · Zoey Yang · Scott Ettinger · Rami Al-Rfou · Dragomir Anguelov · Yin Zhou
RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses
Bedrettin Cetinkaya · Sinan Kalkan · Emre Akbas
Efficient Model Stealing Defense with Noise Transition Matrix
Dong-Dong Wu · Chilin Fu · Weichang Wu · Wenwen Xia · Xiaolu Zhang · Jun Zhou · Min-Ling Zhang
Streaming Dense Video Captioning
Xingyi Zhou · Anurag Arnab · Shyamal Buch · Shen Yan · Austin Myers · Xuehan Xiong · Arsha Nagrani · Cordelia Schmid
End-to-End Spatio-Temporal Action Localisation with Video Transformers
Alexey Gritsenko · Xuehan Xiong · Josip Djolonga · Mostafa Dehghani · Chen Sun · Mario Lučić · Cordelia Schmid · Anurag Arnab
Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction
Guillaume Jaume · Anurag Vaidya · Richard J. Chen · Drew F. K. Williamson · Paul Pu Liang · Faisal Mahmood
Morphological Prototyping for Unsupervised Slide Representation Learning in Computational Pathology
Andrew Song · Richard J. Chen · Tong Ding · Drew F. K. Williamson · Guillaume Jaume · Faisal Mahmood
DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision
Lu Ling · Yichen Sheng · Zhi Tu · Wentian Zhao · Cheng Xin · Kun Wan · Lantao Yu · Qianyu Guo · Zixun Yu · Yawen Lu · Xuanmao Li · Xingpeng Sun · Rohan Ashok · Aniruddha Mukherjee · Hao Kang · Xiangrui Kong · Gang Hua · Tianyi Zhang · Bedrich Benes · Aniket Bera
LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
Yunsheng Ma · Can Cui · Xu Cao · Wenqian Ye · Peiran Liu · Juanwu Lu · Amr Abdelraouf · Rohit Gupta · Kyungtae Han · Aniket Bera · James Rehg · Ziran Wang
ZONE: Zero-Shot Instruction-Guided Local Editing
Shanglin Li · Bohan Zeng · Yutang Feng · Sicheng Gao · Xuhui Liu · Jiaming Liu · Li Lin · Xu Tang · Yao Hu · Jianzhuang Liu · Baochang Zhang
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving
Chen Min · Dawei Zhao · Liang Xiao · Jian Zhao · Xinli Xu · Zheng Zhu · Lei Jin · Jianshu Li · Yulan Guo · Junliang Xing · Liping Jing · Yiming Nie · Bin Dai
Describing Differences in Image Sets with Natural Language
Lisa Dunlap · Yuhui Zhang · Xiaohan Wang · Ruiqi Zhong · Trevor Darrell · Jacob Steinhardt · Joseph Gonzalez · Serena Yeung
Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now
Ayush Sarkar · Hanlin Mai · Amitabh Mahapatra · David Forsyth · Svetlana Lazebnik · Anand Bhattad
Perturbing Attention Gives You More Bang for the Buck: Subtle Imaging Perturbations That Efficiently Fool Customized Diffusion Models
Jingyao Xu · Yuetong Lu · Yandong Li · Siyang Lu · Dongdong Wang · Xiang Wei
BioCLIP: A Vision Foundation Model for the Tree of Life
Samuel Stevens · Jiaman Wu · Matthew Thompson · Elizabeth Campolongo · Chan Hee Song · David Carlyn · Li Dong · Wasila Dahdul · Charles Stewart · Tanya Berger-Wolf · Wei-Lun Chao · Yu Su
Dual-View Visual Contextualization for Web Navigation
Jihyung Kil · Chan Hee Song · Boyuan Zheng · Xiang Deng · Yu Su · Wei-Lun Chao
Learned representation-guided diffusion models for large-image generation
Alexandros Graikos · Srikar Yellapragada · Minh-Quan Le · Saarthak Kapse · Prateek Prasanna · Joel Saltz · Dimitris Samaras
Desigen: A Pipeline for Controllable Design Template Generation
Haohan Weng · Danqing Huang · Yu Qiao · Hu Zheng · Chin-Yew Lin · Tong Zhang · C. L. Philip Chen
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Evonne Ng · Javier Romero · Timur Bagautdinov · Shaojie Bai · Trevor Darrell · Angjoo Kanazawa · Alexander Richard
No More Ambiguity in 360° Room Layout via Bi-Layout Estimation
Yu-Ju Tsai · Jin-Cheng Jhang · Jingjing Zheng · Wei Wang · Albert Chen · Min Sun · Cheng-Hao Kuo · Ming-Hsuan Yang
Multiview Aerial Visual RECognition (MAVREC) Dataset: Can Multi-view Improve Aerial Visual Perception?
Aritra Dutta · Srijan Das · Jacob Nielsen · Rajatsubhra Chakraborty · Mubarak Shah
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Jack Urbanek · Florian Bordes · Pietro Astolfi · Mary Williamson · Vasu Sharma · Adriana Romero-Soriano
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Kristen Grauman · Andrew Westbury · Lorenzo Torresani · Kris Kitani · Jitendra Malik · Triantafyllos Afouras · Kumar Ashutosh · Vijay Baiyya · Siddhant Bansal · Bikram Boote · Eugene Byrne · Zachary Chavis · Joya Chen · Feng Cheng · Fu-Jen Chu · Sean Crane · Avijit Dasgupta · Jing Dong · Maria Escobar · Cristhian David Forigua Diaz · Abrham Gebreselasie · Sanjay Haresh · Jing Huang · Md Mohaiminul Islam · Suyog Jain · Rawal Khirodkar · Devansh Kukreja · Kevin Liang · Jia-Wei Liu · Sagnik Majumder · Yongsen Mao · Miguel Martin · Effrosyni Mavroudi · Tushar Nagarajan · Francesco Ragusa · Santhosh Kumar Ramakrishnan · Luigi Seminara · Arjun Somayazulu · Yale Song · Shan Su · Zihui Xue · Edward Zhang · Jinxu Zhang · Angela Castillo · Changan Chen · Fu Xinzhu · Ryosuke Furuta · Cristina González · Gupta · Jiabo Hu · Yifei Huang · Yiming Huang · Weslie Khoo · Anush Kumar · Robert Kuo · Sach Lakhavani · Miao Liu · Mi Luo · Zhengyi Luo · Brighid Meredith · Austin Miller · Oluwatumininu Oguntola · Xiaqing Pan · Penny Peng · Shraman Pramanick · Merey Ramazanova · Fiona Ryan · Wei Shan · Kiran Somasundaram · Chenan Song · Audrey Southerland · Masatoshi Tateno · Huiyu Wang · Yuchen Wang · Takuma Yagi · Mingfei Yan · Xitong Yang · Zecheng Yu · Shengxin Zha · Chen Zhao · Ziwei Zhao · Zhifan Zhu · Jeff Zhuo · Pablo Arbelaez · Gedas Bertasius · Dima Damen · Jakob Engel · Giovanni Maria Farinella · Antonino Furnari · Bernard Ghanem · Judy Hoffman · C.V. Jawahar · Richard Newcombe · Hyun Soo Park · James Rehg · Yoichi Sato · Manolis Savva · Jianbo Shi · Mike Zheng Shou · Michael Wray
Gaussian Shell Maps for Efficient 3D Human Generation
Rameen Abdal · Wang Yifan · Zifan Shi · Yinghao Xu · Ryan Po · Zhengfei Kuang · Qifeng Chen · Dit-Yan Yeung · Gordon Wetzstein
4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling
Sherwin Bahmani · Ivan Skorokhodov · Victor Rong · Gordon Wetzstein · Leonidas Guibas · Peter Wonka · Sergey Tulyakov · Jeong Joon Park · Andrea Tagliasacchi · David B. Lindell
Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment
Muhammad Sohail Danish · Muhammad Haris Khan · Muhammad Akhtar Munir · M. Sarfraz · Mohsen Ali
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh · Chih-Wei Wu · Iroro Orife · Kalayeh
MeshPose: Unifying DensePose and 3D Body Mesh reconstruction
Eric-Tuan Le · Antonios Kakolyris · Petros Koutras · Himmy Tam · Efstratios Skordos · George Papandreou · Riza Alp Guler · Iasonas Kokkinos
MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
Petru-Daniel Tudosiu · Yongxin Yang · Shifeng Zhang · Fei Chen · Steven McDonagh · Gerasimos Lampouras · Ignacio Iacobacci · Sarah Parisot
Taming Self-Training for Open-Vocabulary Object Detection
Shiyu Zhao · Samuel Schulter · Long Zhao · Zhixing Zhang · Vijay Kumar BG · Yumin Suh · Manmohan Chandraker · Dimitris N. Metaxas
Generating Enhanced Negatives for Training Language-Based Object Detectors
Shiyu Zhao · Long Zhao · Vijay Kumar BG · Yumin Suh · Dimitris N. Metaxas · Manmohan Chandraker · Samuel Schulter
Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
Ziyang Chen · Israel D. Gebru · Christian Richardt · Anurag Kumar · William Laney · Andrew Owens · Alexander Richard
Practical Measurements of Translucent Materials with Inter-Pixel Translucency Prior
Zhenyu Chen · Jie Guo · Shuichang Lai · Ruoyu Fu · Mengxun Kong · Chen Wang · Hongyu Sun · Zhebin Zhang · Chen Li · Yanwen Guo
Unsupervised Universal Image Segmentation
Xudong Wang · Dantong Niu · Xinyang Han · Long Lian · Roei Herzig · Trevor Darrell
Gated Fields: Learning Scene Reconstruction from Gated Videos
Andrea Ramazzina · Stefanie Walz · Pragyan Dahal · Mario Bijelic · Felix Heide
FADES: Fair Disentanglement with Sensitive Relevance
Taeuk Jang · Xiaoqian Wang
MonoNPHM: Dynamic Head Reconstruction from Monocular Videos
Simon Giebenhain · Tobias Kirschstein · Markos Georgopoulos · Martin Rünz · Lourdes Agapito · Matthias Nießner
Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling
Baoquan Zhang · Huaibin Wang · Luo Chuyao · Xutao Li · Guotao Liang · Yunming Ye · joeq · Yao He
TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding
Yun Liu · Haolin Yang · Xu Si · Ling Liu · Zipeng Li · Yuxiang Zhang · Yebin Liu · Li Yi
Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation guided by the Characteristic Dance Primitives
Ronghui Li · Yuxiang Zhang · Yachao Zhang · Hongwen Zhang · Jie Guo · Yan Zhang · Yebin Liu · Xiu Li
ProxyCap: Real-time Monocular Full-body Capture in World Space via Human-Centric Proxy-to-Motion Learning
Yuxiang Zhang · Hongwen Zhang · Liangxiao Hu · Jiajun Zhang · Hongwei Yi · Shengping Zhang · Yebin Liu
HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
Xin Huang · Ruizhi Shao · Qi Zhang · Hongwen Zhang · Ying Feng · Yebin Liu · Qing Wang
DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets
Harsh Rangwani · Pradipto Mondal · Mayank Mishra · Ashish Asokan · R. Venkatesh Babu
Validating Privacy-Preserving Face Recognition under a Minimum Assumption
Hui Zhang · Xingbo Dong · YenLung Lai · Ying Zhou · Xiaoyan Zhang · Xingguo Lv · Zhe Jin · Xuejun Li
Low-power, Continuous Remote Behavioral Localization with Event Cameras
Friedhelm Hamann · Suman Ghosh · Ignacio Juarez Martinez · Tom Hart · Alex Kacelnik · Guillermo Gallego
Multimodal Sense-Informed Prediction of 3D Human Motions
Zhenyu Lou · Qiongjie Cui · Haofan Wang · Xu Tang · Hong Zhou
MiKASA: Multi-Key-Anchor Scene-Aware Transformer for 3D Visual Grounding
Chun-Peng Chang · Shaoxiang Wang · Alain Pagani · Didier Stricker
LIVE: Online Large Video-Language Model for Streaming Video
Joya Chen · Zhaoyang Lv · Shiwei Wu · Kevin Qinghong Lin · Chenan Song · Difei Gao · Jia-Wei Liu · Ziteng Gao · Dongxing Mao · Mike Zheng Shou
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
Yiran Qin · Enshen Zhou · Qichang Liu · Zhenfei Yin · Lu Sheng · Ruimao Zhang · Yu Qiao · Jing Shao
SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation
Jiaben Chen · Huaizu Jiang
CLIP-KD: An Empirical Study of CLIP Model Distillation
Chuanguang Yang · Zhulin An · Libo Huang · Junyu Bi · XinQiang Yu · Han Yang · Boyu Diao · Yongjun Xu
InNeRF360: Text-Guided 3D-Consistent Object Inpainting on 360-degree Neural Radiance Fields
Dongqing Wang · Tong Zhang · Alaa Abboud · Sabine Süsstrunk
Total-Decom: Decomposed 3D Scene Reconstruction with Minimal Interaction
Xiaoyang Lyu · Chirui Chang · Peng Dai · Yangtian Sun · Xiaojuan Qi
CAPE: CAM as a Probabilistic Ensemble for Enhanced DNN Interpretation
Townim Chowdhury · Kewen Liao · Vu Minh Hieu Phan · Minh-Son To · Yutong Xie · Kevin Hung · David Ross · Anton van den Hengel · Johan Verjans · Zhibin Liao
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos
Tomas Soucek · Dima Damen · Michael Wray · Ivan Laptev · Josef Sivic
CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoor Object Detection from Multi-view Images
Guanlin Shen · Jingwei Huang · Zhihua Hu · Bin Wang
Generative 3D Part Assembly via Part-Whole-Hierarchy Message Passing
Bi'an Du · Xiang Gao · Wei Hu · Renjie Liao
LightIt: Illumination Modeling and Control for Diffusion Models
Peter Kocsis · Kalyan Sunkavalli · Julien Philip · Matthias Nießner · Yannick Hold-Geoffroy
Test-Time Zero-Shot Temporal Action Localization
Benedetta Liberatori · Alessandro Conti · Paolo Rota · Yiming Wang · Elisa Ricci
HIVE: Harnessing Human Feedback for Instructional Visual Editing
Shu Zhang · Xinyi Yang · Yihao Feng · Can Qin · Chia-Chih Chen · Ning Yu · Zeyuan Chen · Huan Wang · Silvio Savarese · Stefano Ermon · Caiming Xiong · Ran Xu
NeRF Director: Revisiting View Selection in Neural Volume Rendering
Wenhui Xiao · Rodrigo Santa Cruz · David Ahmedt-Aristizabal · Olivier Salvado · Clinton Fookes · Leo Lebrat
Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection
Yajing Liu · Shijun Zhou · Xiyao Liu · Chunhui Hao · Baojie Fan · Jiandong Tian
Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles
Rui Song · Chenwei Liang · Hu Cao · Zhiran Yan · Walter Zimmer · Markus Gross · Andreas Festag · Alois Knoll
GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection
Xiaotian Li · Baojie Fan · Jiandong Tian · Huijie Fan
NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis
Nilesh Kulkarni · Davis Rempe · Kyle Genova · Abhijit Kundu · Justin Johnson · David Fouhey · Leonidas Guibas
Prompting Vision Foundation Models for Pathology Image Analysis
Chong Yin · Siqi Liu · Kaiyang Zhou · Vincent Wong · Pong C. Yuen
XFibrosis: Explicit Vessel-Fiber Modeling for Fibrosis Staging from Liver Pathology Images
Chong Yin · Siqi Liu · Fei Lyu · Jiahao Lu · Sune Darkner · Vincent Wong · Pong C. Yuen
The More You See in 2D, the More You Perceive in 3D
Xinyang Han · Zelin Gao · Angjoo Kanazawa · Shubham Goel · Yossi Gandelsman
LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation
Kibum Kim · Kanghoon Yoon · Jaehyeong Jeon · Yeonjun In · Jinyoung Moon · Donghyun Kim · Chanyoung Park
Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
Imad Eddine Toubal · Aditya Avinash · Neil Alldrin · Jan Dlabal · Wenlei Zhou · Enming Luo · Otilia Stretcu · Hao Xiong · Chun-Ta Lu · Howard Zhou · Ranjay Krishna · Ariel Fuxman · Tom Duerig
LEOD: Label-Efficient Object Detection for Event Cameras
Ziyi Wu · Mathias Gehrig · Qing Lyu · Xudong Liu · Igor Gilitschenski
Producing and Leveraging Online Map Uncertainty in Trajectory Prediction
Xunjiang Gu · Guanyu Song · Igor Gilitschenski · Marco Pavone · Boris Ivanovic
SPAD: Spatially Aware Multiview Diffusers
Yash Kant · Aliaksandr Siarohin · Ziyi Wu · Michael Vasilkovsky · Guocheng Qian · Jian Ren · Riza Alp Guler · Bernard Ghanem · Sergey Tulyakov · Igor Gilitschenski
CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning
Hyuck Lee · Heeyoung Kim
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He · Hengduo Li · Young Kyun Jang · Menglin Jia · Xuefei Cao · Ashish Shah · Abhinav Shrivastava · Ser-Nam Lim
Real-World Efficient Blind Motion Deblurring via Blur Pixel Discretization
Insoo Kim · Jae Seok Choi · Geonseok Seo · Kinam Kwon · Jinwoo Shin · Hyong-Euk Lee
Towards More Unified In-context Visual Understanding
Dianmo Sheng · Dongdong Chen · Zhentao Tan · Qiankun Liu · Qi Chu · Jianmin Bao · Tao Gong · Bin Liu · Shengwei Xu · Nenghai Yu
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang · Dongdong Chen · Chong Luo · Bo He · Lu Yuan · Zuxuan Wu · Yu-Gang Jiang
PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF
Yutao Feng · Yintong Shang · Xuan Li · Tianjia Shao · Chenfanfu Jiang · Yin Yang
MuseChat: A Conversational Music Recommendation System for Videos
Zhikang Dong · Bin Chen · Xiulong Liu · Pawel Polak · Peng Zhang
Rotation-Agnostic Image Representation Learning for Digital Pathology
Saghir Alfasly · Abubakr Shafique · Peyman Nejat · Jibran Khan · Areej Alsaafin · Ghazal Alabtah · Hamid Tizhoosh
CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras
Sachin Shah · Matthew Chan · Haoming Cai · Jingxi Chen · Sakshum Kulshrestha · Chahat Deep Singh · Yiannis Aloimonos · Christopher Metzler
The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective
Wenqi Jia · Miao Liu · Hao Jiang · Ishwarya Ananthabhotla · James Rehg · Vamsi Krishna Ithapu · Ruohan Gao
LEDITS++: Limitless Image Editing using Text-to-Image Models
Manuel Brack · Felix Friedrich · Katharina Kornmeier · Linoy Tsaban · Patrick Schramowski · Kristian Kersting · Apolinário Passos
MemFlow: Optical Flow Estimation and Prediction with Memory
Qiaole Dong · Yanwei Fu
SemCity: Semantic Scene Generation with Triplane Diffusion
Jumin Lee · Sebin Lee · Changho Jo · Woobin Im · Ju-hyeong Seon · Sung-Eui Yoon
UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes
David Rozenberszki · Or Litany · Angela Dai
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
Zhongwei Zhang · Fuchen Long · Yingwei Pan · Zhaofan Qiu · Ting Yao · Yang Cao · Tao Mei
Boosting Diffusion Models with Moving Average Sampling in Frequency Domain
Yurui Qian · Qi Cai · Yingwei Pan · Yehao Li · Ting Yao · Qibin Sun · Tao Mei
Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
Zhikai Chen · Fuchen Long · Zhaofan Qiu · Ting Yao · Wengang Zhou · Jiebo Luo · Tao Mei
SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer
Rui Zhu · Yingwei Pan · Yehao Li · Ting Yao · Zhenglong Sun · Tao Mei · Chang-Wen Chen
VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation
Yang Chen · Yingwei Pan · Haibo Yang · Ting Yao · Tao Mei
Riemannian Multinomial Logistics Regression for SPD Neural Networks
Ziheng Chen · Yue Song · Gaowen Liu · Ramana Kompella · Xiaojun Wu · Nicu Sebe
VideoMAC: Video Masked Autoencoders Meet ConvNets
Gensheng Pei · Tao Chen · Xiruo Jiang · Huafeng Liu · Zeren Sun · Yazhou Yao
Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching
Shitong Shao · Zeyuan Yin · Muxin Zhou · Xindong Zhang · Zhiqiang Shen
ICP-Flow: LiDAR Scene Flow Estimation with ICP
Yancong Lin · Holger Caesar
GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding
Hao Li · Dingwen Zhang · Yalun Dai · Nian Liu · Lechao Cheng · Li Jingfeng · Jingdong Wang · Junwei Han
JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups
Simindokht Jahangard · Zhixi Cai · Shiki Wen · Hamid Rezatofighi
Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning
Menghao Zhang · Jingyu Wang · Qi Qi · Haifeng Sun · Zirui Zhuang · Pengfei Ren · Ruilong Ma · Jianxin Liao
MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision
Chenyangguang Zhang · Guanlong Jiao · Yan Di · Gu Wang · Ziqin Huang · Ruida Zhang · Fabian Manhardt · Bowen Fu · Federico Tombari · Xiangyang Ji
Exploring Region-Word Alignment in Built-in Detector for Open-Vocabulary Object Detection
Heng Zhang · Qiuyu Zhao · Linyu Zheng · Hao Zeng · Zhiwei Ge · Tianhao Li · Sulong Xu
CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers
Shahaf Arica · Or Rubin · Sapir Gershov · Shlomi Laufer
MoST: Motion Style Transformer between Diverse Action Contents
Boeun Kim · Jungho Kim · Hyung Jin Chang · Jin Young Choi
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
Junyi Ma · Xieyuanli Chen · Jiawei Huang · Jingyi Xu · Zhen Luo · Jintao Xu · Weihao Gu · Rui Ai · Hesheng Wang
Low-Resource Vision Challenges for Foundation Models
Yunhua Zhang · Hazel Doughty · Cees G. M. Snoek
AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring
Xintian Mao · Xiwen Gao · Yan Wang
KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation
Ruida Zhang · Chenyangguang Zhang · Yan Di · Fabian Manhardt · Xingyu Liu · Federico Tombari · Xiangyang Ji
MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures
Zhangyang Xiong · Chenghong Li · Kenkun Liu · Hongjie Liao · Jianqiao Hu · Junyi Zhu · Shuliang Ning · Lingteng Qiu · Chongjie Wang · Shijie Wang · Shuguang Cui · Xiaoguang Han
DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning
Sikai Bai · Jie Zhang · Song Guo · Shuaicheng Li · Jingcai Guo · Jun Hou · Tao Han · Xiaocheng Lu
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
Yuxuan Zhang · Yiren Song · Jiaming Liu · Rui Wang · Jinpeng Yu · Hao Tang · Huaxia Li · Xu Tang · Yao Hu · Han Pan · Zhongliang Jing
Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld
Yijun Yang · Tianyi Zhou · Kanxue Li · Dapeng Tao · Lusong Li · Li Shen · Xiaodong He · Jing Jiang · Yuhui Shi
POCE: Primal Policy Optimization with Conservative Estimation for Multi-constraint Offline Reinforcement Learning
Jiayi Guan · Li Shen · Ao Zhou · Lusong Li · Han Hu · Xiaodong He · Guang Chen · Changjun Jiang
Bayesian Differentiable Physics for Cloth Digitalization
Deshan Gong · Ningtao Mao · He Wang
Monocular Identity-Conditioned Facial Reflectance Reconstruction
Xingyu Ren · Jiankang Deng · Yuhao Cheng · Jia Guo · Chao Ma · Yichao Yan · Wenhan Zhu · Xiaokang Yang
Ensemble Diversity Facilitates Adversarial Transferability
Bowen Tang · Zheng Wang · Yi Bin · Qi Dou · Yang Yang · Heng Tao Shen
From a Bird’s Eye View to See: Joint Camera and Subject Registration without the Camera Calibration
Zekun Qian · Ruize Han · Wei Feng · Song Wang
Accept the Modality Gap: An Exploration in the Hyperbolic Space
Sameera Ramasinghe · Violetta Shevchenko · Gil Avraham · Thalaiyasingam Ajanthan
Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
Zan Wang · Yixin Chen · Baoxiong Jia · Puhao Li · Jinlu Zhang · Jingze Zhang · Tengyu Liu · Yixin Zhu · Wei Liang · Siyuan Huang
PHYSCENE: Physically Interactable 3D Scene Synthesis for Embodied AI
Yandan Yang · Baoxiong Jia · Peiyuan Zhi · Siyuan Huang
6D-Diff: A Keypoint Diffusion Framework for 6D Object Pose Estimation
Li Xu · Haoxuan Qu · Yujun Cai · Jun Liu
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
Sanghyeok Lee · Joonmyung Choi · Hyunwoo J. Kim
DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models
Nastaran Saadati · Minh Pham · Nasla Saleem · Joshua R. Waite · Aditya Balu · Zhanhong Jiang · Chinmay Hegde · Soumik Sarkar
EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
Zehuan Huang · Hao Wen · Junting Dong · Yaohui Wang · Yangguang Li · Xinyuan Chen · Yan-Pei Cao · Ding Liang · Yu Qiao · Bo Dai · Lu Sheng
A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint
Xiaofeng Cong · Jie Gui · Jing Zhang · Junming Hou · Hao Shen
A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation
Qucheng Peng · Ce Zheng · Chen Chen
OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition
Yuchen Pan · Junjun Jiang · Kui Jiang · Zhihao Wu · Keyuan Yu · Xianming Liu
Enhancing the Power of OOD Detection via Sample-Aware Model Selection
Feng Xue · Zi He · Yuan Zhang · Chuanlong Xie · Zhenguo Li · Falong Tan
SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh Deformation
Yanzhe Liu · Rong Chen · Yushi Li · Yixi Li · Xuehou Tan
|
||
Hearing Anything Anywhere
Mason Wang · Ryosuke Sawata · Samuel Clarke · Ruohan Gao · Shangzhe Wu · Jiajun Wu
|
||
MICap: A Unified Model for Identity-aware Movie Descriptions
Haran Raajesh · Naveen Reddy Desanur · Zeeshan Khan · Makarand Tapaswi
|
||
SD2Event: Self-supervised Learning of Dynamic Detectors and Contextual Descriptors for Event Cameras
Yuan Gao · Yuqing Zhu · Xinjun Li · Yimin Du · Tianzhu Zhang
|
||
Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models
Xingqian Xu · Jiayi Guo · Zhangyang Wang · Gao Huang · Irfan Essa · Humphrey Shi
|
||
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng · Yan Xie · Hao Zhang · Chiyu Chen · Zhengjue Wang · Bo Chen
|
||
Re-thinking Data Availability Attacks Against Deep Neural Networks
Bin Fang · Bo Li · Shuang Wu · Shouhong Ding · Ran Yi · Lizhuang Ma
|
||
Self-Supervised Facial Representation Learning with Facial Region Awareness
Zheng Gao · Ioannis Patras
|
||
Breathing Life Into Sketches Using Text-to-Video Priors
Rinon Gal · Yael Vinker · Yuval Alaluf · Amit H. Bermano · Daniel Cohen-Or · Ariel Shamir · Gal Chechik
|
||
Abductive Ego-View Accident Video Understanding for Safe Driving Perception
Jianwu Fang · Lei-lei Li · Junfei Zhou · Junbin Xiao · Hongkai Yu · Chen Lv · Jianru Xue · Tat-seng Chua
|
||
FedHCA$^2$: Towards Hetero-Client Federated Multi-Task Learning
Yuxiang Lu · Suizhi Huang · Yuwen Yang · Shalayiding Sirejiding · Yue Ding · Hongtao Lu
|
||
Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching
Lennart Bastian · Yizheng Xie · Nassir Navab · Zorah Lähner
|
||
Revamping Federated Learning Security from a Defender's Perspective: A Unified Defense with Homomorphic Encrypted Data Space
Naveen Kumar Kummari · Reshmi Mitra · Krishna Mohan Chalavadi
|
||
MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes
Bor Shiun Wang · Chien-Yi Wang · Wei-Chen Chiu
|
||
3D Feature Tracking via Event Camera
Siqi Li · Zhou Zhikuan · Zhou Xue · Yipeng Li · Shaoyi Du · Yue Gao
|
||
Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling
Jianan Fan · Dongnan Liu · Hang Chang · Heng Huang · Mei Chen · Weidong Cai
|
||
Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens
Zhiwen Chen · Zhiyu Zhu · Yifan Zhang · Junhui Hou · Guangming Shi · Jinjian Wu
|
||
DETRs Beat YOLOs on Real-time Object Detection
Yian Zhao · Wenyu Lv · Shangliang Xu · Jinman Wei · Guanzhong Wang · Qingqing Dang · Yi Liu · Jie Chen
|
||
Neural Clustering based Visual Representation Learning
Guikun Chen · Xia Li · Yi Yang · Wenguan Wang
|
||
I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions
Chengfeng Zhao · Juze Zhang · Jiashen Du · Ziwei Shan · Junye Wang · Jingyi Yu · Jingya Wang · Lan Xu
|
||
AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond
Zixiang Zhou · Yu Wan · Baoyuan Wang
|
||
De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts
Yuzheng Wang · Dingkang Yang · Zhaoyu Chen · Yang Liu · Siao Liu · Wenqiang Zhang · Lihua Zhang · Lizhe Qi
|
||
HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data
Mengqi Zhang · Yang Fu · Zheng Ding · Sifei Liu · Zhuowen Tu · Xiaolong Wang
|
||
Think Twice Before Selection: Federated Evidential Active Learning for Medical Image Analysis with Domain Shifts
Jiayi Chen · Benteng Ma · Hengfei Cui · Kwang-Ting Cheng · Yong Xia
|
||
OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
Xiaozheng Zheng · Chao Wen · Zhuo Su · Zeran Xu · Zhaohu Li · Yang Zhao · Zhou Xue
|
||
Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
Xinyao Li · Yuke Li · Zhekai Du · Fengling Li · Ke Lu · Jingjing Li
|
||
Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches
Qing Yu · Mikihiro Tanaka · Kent Fujiwara
|
||
Structure-Guided Adversarial Training of Diffusion Models
Ling Yang · Haotian Qian · Zhilong Zhang · Jingwei Liu · Bin CUI
|
||
MonoHair: High-Fidelity Hair Modeling from a Monocular Video
Keyu Wu · LINGCHEN YANG · Zhiyi Kuang · Yao Feng · Xutao Han · Yuefan Shen · Hongbo Fu · Kun Zhou · Youyi Zheng
|
||
CMA: A Chromaticity Map Adapter for Robust Detection of Screen-Recapture Document Images
Changsheng Chen · Liangwei Lin · Yongqi Chen · Bin Li · Jishen Zeng · Jiwu Huang
|
||
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Enxin Song · Wenhao Chai · Guanhong Wang · Haoyang Zhou · Feiyang Wu · Yucheng Zhang · Tian Ye · Haozhe Chi · Xun Guo · Yanting Zhang · Yan Lu · Jenq-Neng Hwang · Gaoang Wang
|
||
Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning
Da-Wei Zhou · Hai-Long Sun · Han-Jia Ye · De-Chuan Zhan
|
||
Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation
Mukul Khanna · Yongsen Mao · Hanxiao Jiang · Sanjay Haresh · Brennan Shacklett · Dhruv Batra · Alexander William Clegg · Eric Undersander · Angel Xuan Chang · Manolis Savva
|
||
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation
Xiangyang Zhu · Renrui Zhang · Bowei He · Ziyu Guo · Jiaming Liu · Han Xiao · Chaoyou Fu · Hao Dong · Peng Gao
|
||
Brain Decodes Deep Nets
Huzheng Yang · James Gee · Jianbo Shi
|
||
Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
Yabin Zhang · Wenjie Zhu · Hui Tang · Zhiyuan Ma · Kaiyang Zhou · Lei Zhang
|
||
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models
Yaofang Liu · Xiaodong Cun · Xuebo Liu · Xintao Wang · Yong Zhang · Haoxin Chen · Yang Liu · Tieyong Zeng · Raymond Chan · Ying Shan
|
||
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
Yanhui Wang · Jianmin Bao · Wenming Weng · Ruoyu Feng · Dacheng Yin · Tao Yang · Jingxu Zhang · Qi Dai · Zhiyuan Zhao · Chunyu Wang · Kai Qiu · Yuhui Yuan · Xiaoyan Sun · Chong Luo · Baining Guo
|
||
PH-Net: Semi-Supervised Breast Lesion Segmentation via Patch-wise Hardness
Siyao Jiang · Huisi Wu · Junyang Chen · Qin Zhang · Jing Qin
|
||
Towards Text-guided 3D Scene Composition
Qihang Zhang · Chaoyang Wang · Aliaksandr Siarohin · Peiye Zhuang · Yinghao Xu · Ceyuan Yang · Dahua Lin · Bolei Zhou · Sergey Tulyakov · Hsin-Ying Lee
|
||
MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation
Xiaolong Deng · Huisi Wu · Runhao Zeng · Jing Qin
|
||
Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting
Haipeng Liu · Yang Wang · Biao Qian · Meng Wang · Yong Rui
|
||
SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
Nikhil Keetha · Jay Karhade · Krishna Murthy Jatavallabhula · Gengshan Yang · Sebastian Scherer · Deva Ramanan · Jonathon Luiten
|
||
Density-Guided Semi-Supervised 3D Semantic Segmentation with Dual-Space Hardness Sampling
Jianan Li · Qiulei Dong
|
||
From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation
Javier Tirado-Garín · Javier Civera
|
||
Visual Anagrams: Synthesizing Multi-View Optical Illusions with Diffusion Models
Daniel Geng · Inbum Park · Andrew Owens
|
||
SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology
Saarthak Kapse · Pushpak Pati · Srijan Das · Jingwei Zhang · Chao Chen · Maria Vakalopoulou · Joel Saltz · Dimitris Samaras · Rajarsi Gupta · Prateek Prasanna
|
||
Low-Latency Neural Stereo Streaming
Qiqi Hou · Farzad Farhadzadeh · Amir Said · Guillaume Sautiere · Hoang Le
|
||
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye · Haiyang Xu · Jiabo Ye · Ming Yan · Anwen Hu · Haowei Liu · Qi Qian · Ji Zhang · Fei Huang
|
||
Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
Huan Liu · Zichang Tan · Chuangchuang Tan · Yunchao Wei · Jingdong Wang · Yao Zhao
|
||
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Jianqiang Wan · Sibo Song · Wenwen Yu · Yuliang Liu · Wenqing Cheng · Fei Huang · Xiang Bai · Cong Yao · Zhibo Yang
|
||
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
Jingxuan Xu · Wuyang Chen · Yao Zhao · Yunchao Wei
|
||
LowRankOcc: Tensor Decomposition and Low-Rank Recovery for Vision-based 3D Semantic Occupancy Prediction
Linqing Zhao · Xiuwei Xu · Ziwei Wang · Yunpeng Zhang · Borui Zhang · Wenzhao Zheng · Dalong Du · Jie Zhou · Jiwen Lu
|
||
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Boyuan Chen · Zhuo Xu · Sean Kirmani · brian ichter · Dorsa Sadigh · Leonidas Guibas · Fei Xia
|
||
Taming Mode Collapse in Score Distillation for Text-to-3D Generation
Peihao Wang · Dejia Xu · Zhiwen Fan · Dilin Wang · Sreyas Mohan · Forrest Iandola · Rakesh Ranjan · Yilei Li · Qiang Liu · Zhangyang Wang · Vikas Chandra
|
||
HAVE-FUN: Human Avatar Reconstruction from Few-Shot Unconstrained Images
Xihe Yang · Xingyu Chen · Daiheng Gao · Finn Wong · Xiaoguang Han · Baoyuan Wang
|
||
Sequential Modeling Enables Scalable Learning for Large Vision Models
Yutong Bai · Xinyang Geng · Karttikeya Mangalam · Amir Bar · Alan L. Yuille · Trevor Darrell · Jitendra Malik · Alexei A. Efros
|
||
Distraction is All You Need: Memory-Efficient Image Immunization against Diffusion-Based Image Editing
Ling Lo · Cheng Yeo · Hong-Han Shuai · Wen-Huang Cheng
|
||
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
Yutao Hu · Tianbin · Quanfeng Lu · Wenqi Shao · Junjun He · Yu Qiao · Ping Luo
|
||
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Honghui Yang · Sha Zhang · Di Huang · Xiaoyang Wu · Haoyi Zhu · Tong He · SHIXIANG TANG · Hengshuang Zhao · Qibo Qiu · Binbin Lin · Xiaofei He · Wanli Ouyang
|
||
Semantics, Distortion, and Style Matter: Towards Source-free UDA for Panoramic Segmentation
Xu Zheng · Pengyuan Zhou · ATHANASIOS · Lin Wang
|
||
In2SET: Intra-Inter Similarity Exploiting Transformer for Dual-Camera Compressive Hyperspectral Imaging
Xin Wang · Lizhi Wang · Xiangtian Ma · Maoqing Zhang · Lin Zhu · Hua Huang
|
||
MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling
Xuzhe Zhang · Yuhao Wu · Elsa Angelini · Ang Li · Jia Guo · Jerod Rasmussen · Thomas O'Connor · Pathik Wadhwa · Andrea Jackowski · Hai Li · Jonathan Posner · Andrew Laine · Yun Wang
|
||
Seeing Motion at Nighttime with an Event Camera
Haoyue Liu · Shihan Peng · Lin Zhu · Yi Chang · Hanyu Zhou · Luxin Yan
|
||
Wavelet-based Fourier Information Interaction with Frequency Diffusion Adjustment for Underwater Image Restoration
Chen Zhao · Weiling Cai · Chenyu Dong · Chengwei Hu
|
||
StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On
Jeongho Kim · Gyojung Gu · Minho Park · Sunghyun Park · Jaegul Choo
|
||
EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning
Hongxia Xie · Chu-Jun Peng · Yu-Wen Tseng · Hung-Jen Chen · Chan-Feng Hsu · Hong-Han Shuai · Wen-Huang Cheng
|
||
SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery
Xin Guo · Jiangwei Lao · Bo Dang · Yingying Zhang · Lei Yu · Lixiang Ru · Liheng Zhong · Ziyuan Huang · Kang Wu · Dingxiang Hu · HUIMEI HE · Jian Wang · Jingdong Chen · Ming Yang · Yongjun Zhang · Yansheng Li
|
||
Unifying Top-down and Bottom-up Scanpath Prediction using Transformers
Zhibo Yang · Sounak Mondal · Seoyoung Ahn · Ruoyu Xue · Gregory Zelinsky · Minh Hoai · Dimitris Samaras
|
||
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
Yunhao Ge · Yihe Tang · Jiashu Xu · Cem Gokmen · Chengshu Li · Wensi Ai · Benjamin Martinez · Arman Aydin · Mona Anvari · Ayush Chakravarthy · Hong-Xing Yu · Josiah Wong · Sanjana Srivastava · Sharon Lee · Shengxin Zha · Laurent Itti · Yunzhu Li · Roberto Martín-Martín · Miao Liu · Pengchuan Zhang · Ruohan Zhang · Li Fei-Fei · Jiajun Wu
|
||
Event-based Visible and Infrared Fusion via Multi-task Collaboration
Mengyue Geng · Lin Zhu · Lizhi Wang · Wei Zhang · Ruiqin Xiong · Yonghong Tian
|
||
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Mu Cai · Haotian Liu · Siva Mustikovela · Gregory P. Meyer · Yuning Chai · Dennis Park · Yong Jae Lee
|
||
OpenEQA: Embodied Question Answering in the Era of Foundation Models
Arjun Majumdar · Anurag Ajay · Xiaohan Zhang · Sriram Yenamandra · Mikael Henaff · Alexander Sax · Sneha Silwal · Paul McVay · Oleksandr Maksymets · Sergio Arnaud · Pranav Putta · Karmesh Yadav · Qiyang Li · Benjamin Newman · Mohit Sharma · Vincent-Pierre Berges · Shiqi Zhang · Pulkit Agrawal · Dhruv Batra · Yonatan Bisk · Mrinal Kalakrishnan · Franziska Meier · Chris Paxton · Aravind Rajeswaran
|
||
BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection
Wenjie Wang · Yehao Lu · Guangcong Zheng · Shuigenzhan · Xiaoqing Ye · Zichang Tan · Jingdong Wang · Gaoang Wang · Xi Li
|
||
Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline
Xiao Wang · Shiao Wang · Chuanming Tang · Lin Zhu · Bo Jiang · Yonghong Tian · Jin Tang
|
||
Memory-based Adapters for Online 3D Scene Perception
Xiuwei Xu · Chong Xia · Ziwei Wang · Linqing Zhao · Yueqi Duan · Jie Zhou · Jiwen Lu
|
||
MultiDiff: Consistent Novel View Synthesis from a Single Image
Norman Müller · Katja Schwarz · Barbara Roessle · Lorenzo Porzi · Samuel Rota Bulò · Matthias Nießner · Peter Kontschieder
|
||
VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis
Linshan Wu · Jia-Xin Zhuang · Hao Chen
|
||
PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation
Ruining Deng · Quan Liu · Can Cui · Tianyuan Yao · Jialin Yue · Juming Xiong · Lining yu · Yifei Wu · Mengmeng Yin · Yu Wang · Shilin Zhao · Yucheng Tang · Haichun Yang · Yuankai Huo
|
||
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
Chaoya Jiang · Haiyang Xu · Mengfan Dong · Jiaxing Chen · Wei Ye · Ming Yan · Qinghao Ye · Ji Zhang · Fei Huang · Shikun Zhang
|
||
Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Shijie Zhou · Haoran Chang · Sicheng Jiang · Zhiwen Fan · Zehao Zhu · Dejia Xu · Pradyumna Chari · Suya You · Zhangyang Wang · Achuta Kadambi
|
||
OpenBias: Open-set Bias Detection in Text-to-Image Generative Models
Moreno D'Incà · Elia Peruzzo · Massimiliano Mancini · Dejia Xu · Vidit Goel · Xingqian Xu · Zhangyang Wang · Humphrey Shi · Nicu Sebe
|
||
PAIR Diffusion: A Comprehensive Multimodal Object-Level Image Editor
Vidit Goel · Elia Peruzzo · Yifan Jiang · Dejia Xu · Xingqian Xu · Nicu Sebe · Trevor Darrell · Zhangyang Wang · Humphrey Shi
|
||
SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
Xiaojun Hou · Jiazheng Xing · Yijie Qian · Yaowei Guo · Shuo Xin · Junhao Chen · Kai Tang · Mengmeng Wang · Zhengkai Jiang · Liang Liu · Yong Liu
|
||
SpikeNeRF: Learning Neural Radiance Fields from Continuous Spike Stream
Lin Zhu · Kangmin Jia · Yifan Zhao · Yunshan Qi · Lizhi Wang · Hua Huang
|
||
Edit One for All: Interactive Batch Image Editing
Thao Nguyen · Utkarsh Ojha · Yuheng Li · Haotian Liu · Yong Jae Lee
|
||
H-ViT: A Hierarchical Vision Transformer for Deformable Image Registration
MORTEZA GHAHREMANI · Mohammad Khateri · Bailiang Jian · Benedikt Wiestler · Ehsan Adeli · Christian Wachinger
|
||
RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation
Peng Lu · Tao Jiang · Yining Li · Xiangtai Li · Kai Chen · Wenming Yang
|
||
Resolution Limit of Single-Photon LIDAR
Stanley H. Chan · Hashan K Weerasooriya · Weijian Zhang · Pamela Abshire · Istvan Gyongy · Robert Henderson
|
||
Neural Visibility Field for Uncertainty-Driven Active Mapping
Shangjie Xue · Jesse Dill · Pranay Mathur · Frank Dellaert · Panagiotis Tsiotras · Danfei Xu
|
||
ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification
Jiangbo Shi · Chen Li · Tieliang Gong · Yefeng Zheng · Huazhu Fu
|
||
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
Phuc Nguyen · Tuan Duc Ngo · Evangelos Kalogerakis · Chuang Gan · Anh Tran · Cuong Pham · Khoi Nguyen
|
||
FairCLIP: Harnessing Fairness in Vision-Language Learning
Yan Luo · MIN SHI · Muhammad Osama Khan · Muhammad Muneeb Afzal · Hao Huang · Shuaihang Yuan · Yu Tian · Luo Song · Ava Kouhana · Tobias Elze · Yi Fang · Mengyu Wang
|
||
Class Incremental Learning with Multi-Teacher Distillation
Haitao Wen · Lili Pan · Yu Dai · Heqian Qiu · Lanxiao Wang · Qingbo Wu · Hongliang Li
|
||
Fusing Personal and Environmental Cues for Identification and Segmentation of First-Person Camera Wearers in Third-Person Views
Ziwei Zhao · Yuchen Wang · Chuhua Wang
|
||
CaDeT: a Causal Disentanglement Approach for Robust Trajectory Prediction in Autonomous Driving
Mozhgan Pourkeshavarz · Junrui Zhang · Amir Rasouli
|
||
AlignMiF: Geometry-Aligned Multimodal Implicit Field for Enhanced LiDAR-Camera Joint Synthesis
Tao Tang · Guangrun Wang · Yixing Lao · Peng Chen · Jie Liu · Liang Lin · Kaicheng Yu · Xiaodan Liang
|
||
GOAT-Bench: A Benchmark for Multi-modal Lifelong Navigation
Mukul Khanna · Ram Ramrakhya · Gunjan Chhablani · Sriram Yenamandra · Theo Gervet · Matthew Chang · Zsolt Kira · Devendra Singh Chaplot · Dhruv Batra · Roozbeh Mottaghi
|
||
PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Marina Neseem · Conor McCullough · Randy Hsin · Chas Leichner · Shan Li · In Suk Chong · Andrew Howard · Lukasz Lew · Sherief Reda · Ville-Mikko Rautio · Daniele Moro
|
||
BodyMAP - Jointly Predicting Body Mesh and 3D Applied Pressure Map for People in Bed
Abhishek Tandon · Anujraaj Goyal · Henry M. Clever · Zackory Erickson
|
||
Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models
Weiwei Cao · Jianpeng Zhang · Yingda Xia · Tony C. W. MOK · Zi Li · Xianghua Ye · Le Lu · Jian Zheng · Yuxing Tang · Ling Zhang
|
||
Boosting Neural Representations for Videos with a Conditional Decoder
XINJIE ZHANG · Ren Yang · Dailan He · Xingtong Ge · Tongda Xu · Yan Wang · Hongwei Qin · Jun Zhang
|
||
Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
Junyi Zhang · Charles Herrmann · Junhwa Hur · Eric Chen · Varun Jampani · Deqing Sun · Ming-Hsuan Yang
|
||
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
Minghan Li · Shuai Li · Xindong Zhang · Lei Zhang
|
||
Text-Guided 3D Face Synthesis - From Generation to Editing
Yunjie Wu · Yapeng Meng · Zhipeng Hu · Lincheng Li · Haoqian Wu · Kun Zhou · Weiwei Xu · Xin Yu
|
||
AssistGUI: Task-Oriented PC Graphical User Interface Automation
Difei Gao · Lei Ji · Zechen Bai · Mingyu Ouyang · Peiran Li · Dongxing Mao · Qin WU · Weichen Zhang · Peiyi Wang · Xiangwu Guo · Hengxu Wang · Luowei Zhou · Mike Zheng Shou
|
||
Don’t drop your samples! Coherence-aware training benefits Conditional diffusion
Nicolas Dufour · Victor Besnier · Vicky Kalogeiton · David Picard
|
||
3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions
Weijia Li · Haote Yang · Zhenghao Hu · Juepeng Zheng · Gui-Song Xia · Conghui He
|
||
Improving Physics-Augmented Continuum Neural Radiance Field-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization
Takuhiro Kaneko
|
||
Learning the 3D Fauna of the Web
Zizhang Li · Dor Litvak · Ruining Li · Yunzhi Zhang · Tomas Jakab · Christian Rupprecht · Shangzhe Wu · Andrea Vedaldi · Jiajun Wu
|
||
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
Hongjie Wang · Bhishma Dedhia · Niraj Jha
|
||
Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions
Oindrila Saha · Grant Horn · Subhransu Maji
|
||
Towards Robust 3D Object Detection with LiDAR and 4D Radar Fusion in Various Weather Conditions
Yujeong Chae · Hyeonseong Kim · Kuk-Jin Yoon
|
||
SIGNeRF: Scene Integrated Generation for Neural Radiance Fields
Jan-Niklas Dihlmann · Andreas Engelhardt · Hendrik Lensch
|
||
Synergistic Global-space Camera and Human Reconstruction from Videos
Yizhou Zhao · Tuanfeng Y. Wang · Bhiksha Raj · Min Xu · Jimei Yang · Chun-Hao P. Huang
|
||
PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution
Honghao Chen · Xiangxiang Chu · Renyongjian · Xin Zhao · Kaiqi Huang
|
||
Towards Generalizing to Unseen Domains with Few Labels
Chamuditha Jayanga Galappaththige · Sanoojan Baliah · Malitha Gunawardhana · Muhammad Haris Khan
|
||
Towards Detailed and Robust 3D Clothed Human Reconstruction with High-Frequency and Low-Frequency Information of Parametric Body Models
Yifan Yang · Dong Liu · Shuhai Zhang · Zeshuai Deng · Zixiong Huang · Mingkui Tan
|
||
Snapshot Lidar: Fourier embedding of amplitude and phase for single-image depth reconstruction
Sarah Friday · Yunzi Shi · Yaswanth Kumar Cherivirala · Vishwanath Saragadam · Adithya Pediredla
|
||
Dr.Hair: Reconstructing Scalp-Connected Hair Strands without Pre-training via Differentiable Rendering of Line Segments
Yusuke Takimoto · Hikari Takehara · Hiroyuki Sato · Zihao Zhu · Bo Zheng
|
||
Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework
Ziyao Huang · Fan Tang · Yong Zhang · Xiaodong Cun · Juan Cao · Jintao Li · Tong-yee Lee
|
||
Learning Degradation Independent Representations for Camera ISP Pipelines
Yanhui Guo · Fangzhou Luo · Xiaolin Wu
|
||
GLID: Pre-training a Generalist Encoder-Decoder Vision Model
Jihao Liu · Jinliang Zheng · Yu Liu · Hongsheng Li
|
||
MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors
He Zhang · Shenghao Ren · Haolei Yuan · Jianhui Zhao · Fan Li · Shuangpeng Sun · Zhenghao Liang · Tao Yu · Qiu Shen · Xun Cao
|
||
Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization
Ioanna Ntinou · Enrique Sanchez · Georgios Tzimiropoulos
|
||
Unsigned Orthogonal Distance Fields: An Accurate Neural Implicit Representation for Diverse 3D Shapes
YuJie Lu · Long Wan · Nayu Ding · Yulong Wang · Shuhan Shen · Shen Cai · Lin Gao
|
||
Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion
Sofia Casarin · Cynthia Ugwu · Sergio Escalera · Oswald Lanz
|
||
NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
Yannan He · Garvita Tiwari · Tolga Birdal · Jan Lenssen · Gerard Pons-Moll
|
||
MMA-Diffusion: MultiModal Attack on Diffusion Models
Yijun Yang · Ruiyuan Gao · Xiaosen Wang · Tsung-Yi Ho · Xu Nan · Qiang Xu
|
||
Control4D: Efficient 4D Portrait Editing with Text
Ruizhi Shao · Jingxiang Sun · Cheng Peng · Zerong Zheng · Boyao ZHOU · Hongwen Zhang · Yebin Liu
|
||
ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion
Jiayu Yang · Ziang Cheng · Yunfei Duan · Pan Ji · Hongdong Li
|
||
Adaptive Random Feature Regularization on Fine-tuning Deep Neural Networks
Shin'ya Yamaguchi · Sekitoshi Kanai · Kazuki Adachi · Daiki Chijiwa
|
||
Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
Zhan Li · Zhang Chen · Zhong Li · Yi Xu
|
||
LOTUS: Evasive and Resilient Backdoor Attacks through Sub-Partitioning
Siyuan Cheng · Guanhong Tao · Yingqi Liu · Guangyu Shen · Shengwei An · Shiwei Feng · Xiangzhe Xu · Kaiyuan Zhang · Shiqing Ma · Xiangyu Zhang
|
||
Long-Tailed Anomaly Detection with Learnable Class Names
Chih-Hui Ho · Kuan-Chuan Peng · Nuno Vasconcelos
|
||
Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors
Haoxuanye Ji · Pengpeng Liang · Erkang Cheng
|
||
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
Shiming Chen · Wenjin Hou · Salman Khan · Fahad Shahbaz Khan
|
||
PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization
Zining Chen · Weiqiu Wang · Zhicheng Zhao · Fei Su · Aidong Men · Hongying Meng
|
||
Estimating Noisy Class Posterior with Part-level Labels for Noisy Label Learning
Rui Zhao · Bin Shi · Jianfei Ruan · Tianze Pan · Bo Dong
|
||
Unmixing Diffusion for Self-Supervised Hyperspectral Image Denoising
Haijin Zeng · Jiezhang Cao · Yongyong Chen · Kai Zhang · Hiep Luong · Wilfried Philips
|
||
ManiFPT: Defining and Analyzing Fingerprints of Generative Models
Hae Jin Song · Mahyar Khayatkhoei · Wael AbdAlmageed
|
||
Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation
Jihyun Kim · Changjae Oh · Hoseok Do · Soohyun Kim · Kwanghoon Sohn
|
||
MaxQ: Multi-Axis Query for N:M Sparsity Network
Jingyang Xiang · Siqi Li · Junhao Chen · Zhuangzhi Chen · Tianxin Huang · Linpeng Peng · Yong Liu
|
||
Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
Mingyu Lee · Jongwon Choi
|
||
Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models
Pengze Zhang · Hubery Yin · Chen Li · Xiaohua Xie
|
||
The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes
Myeongseob Ko · Feiyang Kang · Weiyan Shi · Ming Jin · Zhou Yu · Ruoxi Jia
|
||
Deep Equilibrium Diffusion Restoration with Parallel Sampling
Jiezhang Cao · Yue Shi · Kai Zhang · Yulun Zhang · Radu Timofte · Luc Van Gool
|
||
One-Class Face Anti-spoofing via Spoof Cue Map-Guided Feature Learning
Pei-Kai Huang · Cheng-Hsuan Chiang · Tzu-Hsien Chen · Jun-Xiong Chong · Tyng-Luh Liu · Chiou-Ting Hsu
|
||
AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation
Qingping SUN · Yanjun Wang · Ailing Zeng · Wanqi Yin · Chen Wei · Wenjia Wang · Haiy Mei · Chi LEUNG · Ziwei Liu · Lei Yang · Zhongang Cai
|
||
Robust Distillation via Untargeted and Targeted Intermediate Adversarial Samples
Junhao Dong · Piotr Koniusz · Junxi Chen · Z. Wang · Yew-Soon Ong
|
||
Named Entity Driven Zero-Shot Image Manipulation
Zhida Feng · Li Chen · Jing Tian · Jiaxiang Liu · Shikun Feng
|
||
Non-rigid Structure-from-Motion: Temporally-smooth Procrustean Alignment and Spatially-variant Deformation Modeling
Jiawei Shi · Hui Deng · Yuchao Dai
|
||
Multiway Point Cloud Mosaicking with Diffusion and Global Optimization
Shengze Jin · Iro Armeni · Marc Pollefeys · Daniel Barath
|
||
IDGuard: Robust, General, Identity-centric POI Proactive Defense Against Face Editing Abuse
Yunshu Dai · Jianwei Fei · Fangjun Huang
|
||
Single-Model and Any-Modality for Video Object Tracking
Zongwei Wu · Jilai Zheng · Xiangxuan Ren · Florin-Alexandru Vasluianu · Chao Ma · Danda Paudel · Luc Van Gool · Radu Timofte
|
||
iKUN: Speak to Trackers without Retraining
Yunhao Du · Cheng Lei · Zhicheng Zhao · Fei Su
|
||
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
Xingqun Qi · Jiahao Pan · Peng Li · Ruibin Yuan · Xiaowei Chi · Mengfei Li · Wenhan Luo · Wei Xue · Shanghang Zhang · Qifeng Liu · Yike Guo
|
||
Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology
Wenhao Tang · Fengtao ZHOU · Sheng Huang · Xiang Zhu · Yi Zhang · Bo Liu
|
||
SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction
Conghao Wong · Beihao Xia · Ziqian Zou · Yulong Wang · Xinge You
|
||
Pose-Transformed Equivariant Network for 3D Point Trajectory Prediction
Ruixuan Yu · Jian Sun
|
||
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Minkuk Kim · Hyeon Bae Kim · Jinyoung Moon · Jinwoo Choi · Seong Tae Kim
|
||
Object Dynamics Modeling with Hierarchical Point Cloud-based Representations
Chanho Kim · Li Fuxin
|
||
TokenCompose: Text-to-Image Diffusion with Token-level Supervision
Zirui Wang · Zhizhou Sha · Zheng Ding · Yilin Wang · Zhuowen Tu
|
||
Efficient Hyperparameter Optimization with Adaptive Fidelity Identification
Jiantong Jiang · Zeyi Wen · Atif Mansoor · Ajmal Mian
|
||
MESA: Matching Everything by Segmenting Anything
Yesheng Zhang · Xu Zhao
|
||
SIRA: Scalable Inter-frame Relation and Association for Radar Perception
Ryoma Yataka · Pu (Perry) Wang · Petros Boufounos · Ryuhei Takahashi
|
||
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
Yuwen Xiong · Zhiqi Li · Yuntao Chen · Feng Wang · Xizhou Zhu · Jiapeng Luo · Wenhai Wang · Tong Lu · Hongsheng Li · Yu Qiao · Lewei Lu · Jie Zhou · Jifeng Dai
|
||
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen · Nina Shvetsova · Andrew Rouditchenko · Daniel Kondermann · Samuel Thomas · Shih-Fu Chang · Rogerio Feris · James Glass · Hilde Kuehne
|
||
Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features
Youngmin Chung · Ji Hun Ha · Kyeong Chan Im · Joo Sang Lee
|
||
2S-UDF: A Novel Two-stage UDF Learning Method for Robust Non-watertight Model Reconstruction from Multi-view Images
Junkai Deng · Fei Hou · Xuhui Chen · Wencheng Wang · Ying He
|
||
Continuous Optical Zooming: A Benchmark for Arbitrary-Scale Image Super-Resolution in Real World
Huiyuan Fu · Fei Peng · Xianwei Li · Yejun Li · Xin Wang · Huadong Ma
|
||
ViewDiff: 3D-Consistent Image Generation with Text-To-Image Models
Lukas Hoellein · Aljaž Božič · Norman Müller · David Novotny · Hung-Yu Tseng · Christian Richardt · Michael Zollhoefer · Matthias Nießner
|
||
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Tianyu Yu · Yuan Yao · Haoye Zhang · Taiwen He · Yifeng Han · Ganqu Cui · Jinyi Hu · Zhiyuan Liu · Hai-Tao Zheng · Maosong Sun
|
||
General Point Model Pretraining with Autoencoding and Autoregressive
Zhe Li · Zhangyang Gao · Cheng Tan · Bocheng Ren · Laurence Yang · Stan Z. Li
|
||
Learning Instance-Aware Correspondences for Robust Multi-Instance Point Cloud Registration in Cluttered Scenes
Zhiyuan Yu · Zheng Qin · lintao zheng · Kai Xu
|
||
Fine-grained Prototypical Voting with Heterogeneous Mixup for Semi-supervised 2D-3D Cross-modal Retrieval
Fan Zhang · Xian-Sheng Hua · Chong Chen · Xiao Luo
|
||
GaussianAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh
Jing Wen · Xiaoming Zhao · Jason Ren · Alexander G. Schwing · Shenlong Wang
|
||
Person in Place: Generating Associative Skeleton-Guidance Maps for Human-Object Interaction Image Editing
ChangHee Yang · ChanHee Kang · Kyeongbo Kong · Hanni Oh · Suk-Ju Kang
|
||
Correlation-Decoupled Knowledge Distillation for Multimodal Sentiment Analysis with Incomplete Modalities
Mingcheng Li · Dingkang Yang · Xiao Zhao · Shuaibing Wang · Yan Wang · Kun Yang · Mingyang Sun · Dongliang Kou · Qian · Lihua Zhang
|
||
ES$^3$: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
Yuanhang Zhang · Shuang Yang · Shiguang Shan · Xilin Chen
|
||
MSU-4S - The Michigan State University Four Seasons Dataset
Daniel Kent · Mohammed Alyaqoub · Xiaohu Lu · Sayed Khatounabadi · Kookjin Sung · Cole Scheller · Alexander Dalat · Xinwei Guo · Asma Bin Thabit · Roberto Muntaner Whitley · Hayder Radha
|
||
Estimating Extreme 3D Image Rotations using Cascaded Attention
Shay Dekel · Yosi Keller · Martin Čadík
|
||
Taming Stable Diffusion for Text to 360$^{\circ}$ Panorama Image Generation
Cheng Zhang · Qianyi Wu · Camilo Cruz Gambardella · Xiaoshui Huang · Dinh Phung · Wanli Ouyang · Jianfei Cai
|
||
Towards Progressive Multi-Frequency Representation for Image Warping
Jun Xiao · Zihang Lyu · Cong Zhang · Yakun Ju · Changjian Shui · Kin-man Lam
|
||
SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control
Jaskirat Singh · Jianming Zhang · Qing Liu · Cameron Smith · Zhe Lin · Liang Zheng
|
||
Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology
Oren Kraus · Kian Kenyon-Dean · Saber Saberian · Maryam Fallah · Peter McLean · Jess Leung · Vasudev Sharma · Ayla Khan · Jia Balakrishnan · Safiye Celik · Dominique Beaini · Maciej Sypetkowski · Chi Cheng · Kristen Morse · Maureen Makes · Ben Mabey · Berton Earnshaw
|
||
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Fanghua Yu · Jinjin Gu · Zheyuan Li · Jinfan Hu · Xiangtao Kong · Xintao Wang · Jingwen He · Yu Qiao · Chao Dong
|
||
VOODOO 3D: VOlumetric pOrtrait Disentanglement fOr Online 3D head reenactment
Phong Tran · Egor Zakharov · Long Nhat Ho · Anh Tran · Liwen Hu · Hao Li
|
||
Descriptor and Word Soups: Overcoming the Parameter Efficiency Accuracy Tradeoff for Out-of-Distribution Few-shot Learning
Christopher Liao · Theodoros Tsiligkaridis · Brian Kulis
|
||
Masked and Shuffled Blind Spot Denoising for Real-World Images
Hamadi Chihaoui · Paolo Favaro
|
||
NARUTO: Neural Active Reconstruction from Uncertain Target Observations
Ziyue Feng · Huangying Zhan · Zheng Chen · Qingan Yan · Xiangyu Xu · Changjiang Cai · Bing Li · Qilun Zhu · Yi Xu
|
||
DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling
Linqi Zhou · Andy Shih · Chenlin Meng · Stefano Ermon
|
||
ConsistDreamer: 3D-Consistent 2D Diffusion for High-Fidelity Scene Editing
Jun-Kun Chen · Samuel Rota Bulò · Norman Müller · Lorenzo Porzi · Peter Kontschieder · Yu-Xiong Wang
|
||
Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks
Yuhao Liu · Zhanghan Ke · Fang Liu · Nanxuan Zhao · Rynson W.H. Lau
|
||
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
Tong Wu · Guandao Yang · Zhibing Li · Kai Zhang · Ziwei Liu · Leonidas Guibas · Dahua Lin · Gordon Wetzstein
|
||
G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images
Zixiong Huang · Qi Chen · Libo Sun · Yifan Yang · Naizhou Wang · Qi Wu · Mingkui Tan
|
||
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
Xianghui Yang · Gil Avraham · Yan Zuo · Sameera Ramasinghe · Loris Bazzani · Anton van den Hengel
|
||
MoCha-Stereo: Motif Channel Attention Network for Stereo Matching
Ziyang Chen · Wei Long · He Yao · Yongjun Zhang · Bingshu Wang · Yongbin Qin · Jia Wu
|
||
Discovering Syntactic Interaction Clues for Human-Object Interaction Detection
Jinguo Luo · Weihong Ren · Weibo Jiang · Xi'ai Chen · Qiang Wang · Zhi Han · Honghai LIU
|
||
Active Prompt Learning in Vision Language Models
Jihwan Bang · Sumyeong Ahn · Jae-Gil Lee
|
||
FINER: Flexible spectral-bias tuning in Implicit NEural Representation by Variable-periodic Activation Functions
Zhen Liu · Hao Zhu · Qi Zhang · Jingde Fu · Weibing Deng · Zhan Ma · Yanwen Guo · Xun Cao
|
||
Open-World Human-Object Interaction Detection via Multi-modal Prompts
Jie Yang · Bingliang Li · Ailing Zeng · Lei Zhang · Ruimao Zhang
|
||
Quantifying Uncertainty in Motion Prediction with Variational Bayesian Mixture
Juanwu Lu · Can Cui · Yunsheng Ma · Aniket Bera · Ziran Wang
|
||
VidToMe: Video Token Merging for Zero-Shot Video Editing
Xirui Li · Chao Ma · Xiaokang Yang · Ming-Hsuan Yang
|
||
Text-image Alignment for Diffusion-based Perception
Neehar Kondapaneni · Markus Marks · Manuel Knott · Rogério Guimarães · Pietro Perona
|
||
Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
Jimyeong Kim · Jungwon Park · Wonjong Rhee
|
||
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Yazhou Xing · Yingqing He · Zeyue Tian · Xintao Wang · Qifeng Chen
|
||
Generating Handwritten Mathematical Expressions From Symbol Graphs: An End-to-End Pipeline
Yu Chen · Fei Gao · Yanguang Zhang · Maoying Qiao · Nannan Wang
|
||
SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks
Yaxu Xie · Alain Pagani · Didier Stricker
|
||
When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach
Tao Ma · Bing Bai · Haozhe Lin · Heyuan Wang · Yu Wang · Lin Luo · Lu Fang
|
||
Mitigating Motion Blur in Neural Radiance Fields with Events and Frames
Marco Cannici · Davide Scaramuzza
|
||
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun · Ye Fang · Tong Wu · Pan Zhang · Yuhang Zang · Shu Kong · Yuanjun Xiong · Dahua Lin · Jiaqi Wang
|
||
Discriminability-Driven Channel Selection for Out-of-Distribution Detection
Yue Yuan · Rundong He · Yicong Dong · Zhongyi Han · Yilong Yin
|
||
DemoFusion: Democratising High-Resolution Image Generation With No $$$
Ruoyi DU · Dongliang Chang · Timothy Hospedales · Yi-Zhe Song · Zhanyu Ma
|
||
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Seokju Yun · Youngmin Ro
|
||
Makeup Prior Models for 3D Facial Makeup Estimation and Applications
Xingchao Yang · Takafumi Taketomi · Yuki Endo · Yoshihiro Kanamori
|
||
Holoported Characters: Real-time Free-viewpoint Rendering of Humans from Sparse RGB Cameras
Ashwath Shetty · Marc Habermann · Guoxing Sun · Diogo Luvizon · Vladislav Golyanik · Christian Theobalt
|
||
Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
Youtian Lin · Zuozhuo Dai · Siyu Zhu · Yao Yao
|
||
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
Yusheng Dai · Hang Chen · Jun Du · Ruoyu Wang · Shihao Chen · Haotian Wang · Chin-Hui Lee
|
||
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
Changhoon Kim · Kyle Min · Maitreya Patel · Sheng Cheng · 'YZ' Yezhou Yang
|
||
MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection
Jakub Micorek · Horst Possegger · Dominik Narnhofer · Horst Bischof · Mateusz Kozinski
|
||
MuRF: Multi-Baseline Radiance Fields
Haofei Xu · Anpei Chen · Yuedong Chen · Christos Sakaridis · Yulun Zhang · Marc Pollefeys · Andreas Geiger · Fisher Yu
|
||
Resource-Efficient Transformer Pruning for Finetuning of Large Models
Fatih Ilhan · Gong Su · Selim Tekin · Tiansheng Huang · Sihao Hu · Ling Liu
|
||
Referring Image Editing: Object-level Image Editing via Referring Expressions
Chang Liu · Xiangtai Li · Henghui Ding
|
||
Cloud-Device Collaborative Learning for Multimodal Large Language Models
Guanqun Wang · Jiaming Liu · Chenxuan Li · Yuan Zhang · Ma Junpeng · Xinyu Wei · Kevin Zhang · Maurice Chong · Renrui Zhang · Yijiang Liu · Shanghang Zhang
|
||
TeMO: Towards Text-Driven 3D Stylization for Multi-Object Meshes
Xuying Zhang · Bo-Wen Yin · Yuming Chen · Zheng Lin · Yunheng Li · Qibin Hou · Ming-Ming Cheng
|
||
Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model
Runmin Dong · Shuai Yuan · Bin Luo · Mengxuan Chen · Jinxiao Zhang · Lixian Zhang · Weijia Li · Juepeng Zheng · Haohuan Fu
|
||
X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition
Shuofeng Sun · Yongming Rao · Jiwen Lu · Haibin Yan
|
||
Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis
Marianna Ohanyan · Hayk Manukyan · Zhangyang Wang · Shant Navasardyan · Humphrey Shi
|
||
HOIAnimator: Text-Prompt Human-Object Animations Generation with Perceptive Diffusion Models
Wenfeng Song · Xinyu Zhang · Shuai Li · Yang Gao · Aimin Hao · Xia HOU · Chenglizhao Chen · Ning Li · Hong Qin
|
||
BiPer: Binary Neural Networks using a Periodic Function
Edwin Vargas · Claudia Correa · Carlos Hinojosa · Henry Arguello
|
||
Learning Intra-view and Cross-view Geometric Knowledge for Stereo Matching
Rui Gong · Weide Liu · Zaiwang Gu · Xulei Yang · Jun Cheng
|
||
How Far Can We Compress Instant NGP-Based NeRF?
Yihang Chen · Qianyi Wu · Mehrtash Harandi · Jianfei Cai
|
||
Putting the Object Back into Video Object Segmentation
Ho Kei Cheng · Seoung Wug Oh · Brian Price · Joon-Young Lee · Alexander G. Schwing
|
||
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Hao Ouyang · Qiuyu Wang · Yuxi Xiao · Qingyan Bai · Juntao Zhang · Kecheng Zheng · Xiaowei Zhou · Qifeng Chen · Yujun Shen
|
||
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
Ruyang Liu · Chen Li · Yixiao Ge · Thomas H. Li · Ying Shan · Ge Li
|
||
Towards 3D Vision with Low-Cost Single-Photon Cameras
Fangzhou Mu · Carter Sifferman · Sacha Jungerman · Yiquan Li · Zhiyue Han · Michael Gleicher · Mohit Gupta · Yin Li
|
||
Lane2Seq: Towards Unified Lane Detection via Sequence Generation
Kunyang Zhou
|
||
Dual Prior Unfolding for Snapshot Compressive Imaging
Jiancheng Zhang · Haijin Zeng · Jiezhang Cao · Yongyong Chen · Dengxiu Yu · Yinping Zhao
|
||
Predicated Diffusion: Predicate Logic-Based Attention Guidance for Text-to-Image Diffusion Models
Kota Sueyoshi · Takashi Matsubara
|
||
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min · Shyamal Buch · Arsha Nagrani · Minsu Cho · Cordelia Schmid
|
||
Color Shift Estimation-and-Correction for Image Enhancement
Yiyu Li · Ke Xu · Gerhard Hancke · Rynson W.H. Lau
|
||
Dexterous Grasp Transformer
Guo-Hao Xu · Yi-Lin Wei · Dian Zheng · Xiao-Ming Wu · Wei-Shi Zheng
|
||
Posterior Distillation Sampling
Juil Koo · Chanho Park · Minhyuk Sung
|
||
Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction
Inhwan Bae · Junoh Lee · Hae-Gon Jeon
|
||
C3Net: Compound Conditioned ControlNet for Multimodal Content Generation
Juntao Zhang · Yuehuai LIU · Yu-Wing Tai · Chi-Keung Tang
|
||
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
Jiaming Liu · Ran Xu · Senqiao Yang · Renrui Zhang · Qizhe Zhang · Zehui Chen · Yandong Guo · Shanghang Zhang
|
||
TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing
Sherry X Chen · Yaron Vaxman · Elad Ben Baruch · David Asulin · Aviad Moreshet · Kuo-Chin Lien · Misha Sra · Pradeep Sen
|
||
HOI-M$^3$: Capture Multiple Humans and Objects Interaction within Contextual Environment
Juze Zhang · Jingyan Zhang · Zining Song · Zhanhe Shi · Chengfeng Zhao · Ye Shi · Jingyi Yu · Lan Xu · Jingya Wang
|
||
PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation
Jinfeng Xu · Siyuan Yang · Xianzhi Li · Yuan Tang · Yixue Hao · Long Hu · Min Chen
|
||
Higher-order Relational Reasoning for Pedestrian Trajectory Prediction
Sungjune Kim · Hyung-gun Chi · Hyerin Lim · Karthik Ramani · Jinkyu Kim · Sangpil Kim
|
||
DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation
Zeeshan Hayder · Xuming He
|
||
Discover and Mitigate Multiple Biased Subgroups in Image Classifiers
Zeliang Zhang · Mingqian Feng · Zhiheng Li · Chenliang Xu
|
||
Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection
Jongha Kim · Jihwan Park · Jinyoung Park · Jinyoung Kim · Sehyung Kim · Hyunwoo J. Kim
|
||
S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes
Xingyi Li · Zhiguo Cao · Yizheng Wu · Kewei Wang · Ke Xian · Zhe Wang · Guosheng Lin
|
||
Referring Expression Counting
Siyang Dai · Jun Liu · Ngai-Man Cheung
|
||
SeD: Semantic-Aware Discriminator for Image Super-Resolution
Bingchen Li · Xin Li · Hanxin Zhu · Yeying Jin · Ruoyu Feng · Zhizheng Zhang · Zhibo Chen
|
||
Robust Emotion Recognition in Context Debiasing
Dingkang Yang · Kun Yang · Mingcheng Li · Shunli Wang · Shuaibing Wang · Lihua Zhang
|
||
Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans
Romain Loiseau · Elliot Vincent · Mathieu Aubry · Loic Landrieu
|
||
Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds
Tianrui Lou · Xiaojun Jia · Jindong Gu · Li Liu · Siyuan Liang · Bangyan He · Xiaochun Cao
|
||
TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video
Minye Wu · Zehao Wang · Georgios Kouros · Tinne Tuytelaars
|
||
On Scaling up a Multilingual Vision and Language Model
Xi Chen · Josip Djolonga · Piotr Padlewski · Basil Mustafa · Soravit Changpinyo · Jialin Wu · Carlos Riquelme Ruiz · Sebastian Goodman · Xiao Wang · Yi Tay · Siamak Shakeri · Mostafa Dehghani · Daniel Salz · Mario Lučić · Michael Tschannen · Arsha Nagrani · Hexiang Hu · Mandar Joshi · Bo Pang · Ceslee Montgomery · Paulina Pietrzyk · Marvin Ritter · AJ Piergiovanni · Matthias Minderer · Filip Pavetic · Austin Waters · Gang Li · Ibrahim Alabdulmohsin · Lucas Beyer · Julien Amelot · Kenton Lee · Andreas Steiner · Yang Li · Daniel Keysers · Anurag Arnab · Yuanzhong Xu · Keran Rong · Alexander Kolesnikov · Mojtaba Seyedhosseini · Anelia Angelova · Xiaohua Zhai · Neil Houlsby · Radu Soricut
|
||
Understanding Video Transformers via Universal Concept Discovery
Matthew Kowal · Achal Dave · Rares Andrei Ambrus · Adrien Gaidon · Kosta Derpanis · Pavel Tokmakov
|
||
Kandinsky Conformal Prediction: Efficient Calibration of Image Segmentation Algorithms
Joren Brunekreef · Eric Marcus · Ray Sheombarsing · Jan-Jakob Sonke · Jonas Teuwen
|
||
Generative Proxemics: A Prior for 3D Social Interaction from Images
Lea Müller · Vickie Ye · Georgios Pavlakos · Michael J. Black · Angjoo Kanazawa
|
||
3DToonify: Creating Your High-Fidelity 3D Stylized Avatar Easily from 2D Portrait Images
Yifang Men · Hanxi Liu · Yuan Yao · Miaomiao Cui · Xuansong Xie · Zhouhui Lian
|
||
Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding
Alessandro Achille · Greg Ver Steeg · Tian Yu Liu · Matthew Trager · Carson Klingenberg · Stefano Soatto
|
||
DIOD: Self-Distillation Meets Object Discovery
Sandra Kara · Hejer AMMAR · Julien Denize · Florian Chabot · Quoc Cuong PHAM
|
||
Amodal Completion via Progressive Mixed Context Diffusion
Katherine Xu · Lingzhi Zhang · Jianbo Shi
|
||
CAT-Seg: Cost Aggregation for Open-vocabulary Semantic Segmentation
Seokju Cho · Heeseong Shin · Sunghwan Hong · Anurag Arnab · Paul Hongsuck Seo · Seungryong Kim
|
||
FaceLift: Semi-supervised 3D Facial Landmark Localization
David Ferman · Pablo Garrido · Gaurav Bharaj
|
||
Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation
Xiaohan Lei · Min Wang · Wengang Zhou · Li Li · Houqiang Li
|
||
PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis
Zhengyao Lv · Yuxiang Wei · Wangmeng Zuo · Kwan-Yee K. Wong
|
||
OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning
Geng Xinyu · Jiaming Wang · Jiawei Gong · Yuerong Xue · Jun Xu · Fanglin Chen · Xiaolin Huang
|
||
MorpheuS: Neural Dynamic 360$^{\circ}$ Surface Reconstruction from Monocular RGB-D Video
Hengyi Wang · Jingwen Wang · Lourdes Agapito
|
||
FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders
Soumen Basu · Mayuna Gupta · Chetan Madan · Pankaj Gupta · Chetan Arora
|
||
ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles
Jiawei Zhang · Chejian Xu · Bo Li
|
||
WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights
Youngdong Jang · Dong In Lee · MinHyuk Jang · Jong Wook Kim · Feng Yang · Sangpil Kim
|
||
SnAG: Scalable and Accurate Video Grounding
Fangzhou Mu · Sicheng Mo · Yin Li
|
||
PolarRec: Improving Radio Interferometric Data Reconstruction Using Polar Coordinates
Ruoqi Wang · Zhuoyang Chen · Jiayi Zhu · Qiong Luo · Feng Wang
|
||
Learning to Predict Activity Progress by Self-Supervised Video Alignment
Gerard Donahue · Ehsan Elhamifar
|
||
PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation
Zhenyu Li · Shariq Bhat · Peter Wonka
|
||
SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie · Shengye Yu · Qile He · Mengtian Li
|
||
Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
Chang Liu · Haoning Wu · Yujie Zhong · Xiaoyun Zhang · Yanfeng Wang · Weidi Xie
|
||
NeLF-Pro: Neural Light Field Probes for Multi-Scale Novel View Synthesis
Zinuo You · Andreas Geiger · Anpei Chen
|
||
Fourier-basis functions to bridge augmentation gap: Rethinking frequency augmentation in image classification
Puru Vaish · Shunxin Wang · Nicola Strisciuglio
|
||
Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation
Ba Hung Ngo · Nhat-Tuong Do-Tran · Tuan-Ngoc Nguyen · Hae-Gon Jeon · Tae Jong Choi
|
||
Region-Based Representations Revisited
Michal Shlapentokh-Rothman · Ansel Blume · Yao Xiao · Yuqun Wu · Sethuraman T V · Heyi Tao · Jae Yong Lee · Wilfredo Torres-Calderon · Yu-Xiong Wang · Derek Hoiem
|
||
FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures
Lisa Mais · Peter Hirsch · Claire Managan · Ramya Kandarpa · Josef Rumberger · Annika Reinke · Lena Maier-Hein · Gudrun Ihrke · Dagmar Kainmueller
|
||
Can I Trust Your Answer? Visually Grounded Video Question Answering
Junbin Xiao · Angela Yao · Yicong Li · Tat-seng Chua
|
||
SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers
Ioannis Kakogeorgiou · Spyros Gidaris · Konstantinos Karantzalos · Nikos Komodakis
|
||
ToNNO: Tomographic Reconstruction of a Neural Network’s Output for Weakly Supervised Segmentation of 3D Medical Images
Marius Schmidt-Mengin · Alexis Benichoux · Shibeshih Belachew · Nikos Komodakis · Nikos Paragios
|
||
Physics-aware Hand-object Interaction Denoising
Haowen Luo · Yunze Liu · Li Yi
|
||
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding
Trong-Thuan Nguyen · Pha Nguyen · Khoa Luu
|
||
Disentangled Pre-training for Human-Object Interaction Detection
Zhuolong Li · Xingao Li · Changxing Ding · Xiangmin Xu
|
||
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
Hao Wu · Huabin Liu · Yu Qiao · Xiao Sun
|
||
CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification
Haoran Lai · Qingsong Yao · Zihang Jiang · Rongsheng Wang · Zhiyang He · Xiaodong Tao · S Kevin Zhou
|
||
$\textbf{LaRE}^2$: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection
Yunpeng Luo · Junlong Du · Ke Yan · Shouhong Ding
|
||
Deep-TROJ: An Inference Stage Trojan Insertion Algorithm through Efficient Weight Replacement Attack
Sabbir Ahmed · Ranyang Zhou · Shaahin Angizi · Adnan Rakin
|
||
ProMotion: Prototypes As Motion Learners
Yawen Lu · Dongfang Liu · Qifan Wang · Cheng Han · Yiming Cui · Zhiwen Cao · Xueling Zhang · Yingjie Victor Chen · Heng Fan
|
||
Zero-Shot Structure-Preserving Diffusion Model for High Dynamic Range Tone Mapping
Ruoxi Zhu · Shusong Xu · Peiye Liu · Sicheng Li · Yanheng Lu · Dimin Niu · Zihao Liu · Zihao Meng · Li Zhiyong · Xinhua Chen · Yibo Fan
|
||
Mask Grounding for Referring Image Segmentation
Yong Xien Chng · Henry Zheng · Yizeng Han · Xuchong QIU · Gao Huang
|
||
SignGraph: A Sign Sequence is Worth Graphs of Nodes
Shiwei Gan · Yafeng Yin · Zhiwei Jiang · Hongkai Wen · Lei Xie · Sanglu Lu
|
||
HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction
Yi ZHOU · Hui Zhang · Jiaqian Yu · Yifan Yang · Sangil Jung · Seung-In Park · ByungIn Yoo
|
||
$V_kD$: Improving knowledge distillation using orthogonal projections
Roy Miles · Ismail Elezi · Jiankang Deng
|
||
DiLiGenRT: A Photometric Stereo Dataset with Quantified Roughness and Translucency
Heng Guo · Jieji Ren · Feishi Wang · Boxin Shi · Mingjun Ren · Yasuyuki Matsushita
|
||
Decompose-and-Compose: A Compositional Approach to Mitigating Spurious Correlation
Fahimeh Hosseini Noohdani · Parsa Hosseini · Aryan Yazdan Parast · Hamidreza Araghi · Mahdieh Baghshah
|
||
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Haomiao Ni · Bernhard Egger · Suhas Lohit · Anoop Cherian · Ye Wang · Toshiaki Koike-Akino · Sharon X. Huang · Tim Marks
|
||
Coherence As Texture -- Passive Textureless 3D Reconstruction by Self-interference
Wei-Yu Chen · Aswin C. Sankaranarayanan · Anat Levin · Matthew O’Toole
|
||
Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation
Feilong Tang · Zhongxing Xu · Zhaojun QU · Wei Feng · Xingjian Jiang · Zongyuan Ge
|
||
Space-time Diffusion Features for Zero-shot Text-driven Motion Transfer
Rafail Fridman · Danah Yatim · Omer Bar-Tal · Yoni Kasten · Tali Dekel
|
||
GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
Shenhan Qian · Tobias Kirschstein · Liam Schoneveld · Davide Davoli · Simon Giebenhain · Matthias Nießner
|
||
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
Dave Chen · Haoxuan Li · Hsin-Ying Lee · Sergey Tulyakov · Matthias Nießner
|
||
CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing
Ajian Liu · Shuai Xue · Gan Jianwen · Jun Wan · Yanyan Liang · Jiankang Deng · Sergio Escalera · Zhen Lei
|
||
Dynamic Cues-Assisted Transformer for Robust Point Cloud Registration
Hong Chen · Pei Yan · Sihe Xiang · Yihua Tan
|
||
Retrieval-Augmented Open-Vocabulary Object Detection
Jooyeon Kim · Eulrang Cho · Sehyung Kim · Hyunwoo J. Kim
|
||
NTO3D: Neural Target Object 3D Reconstruction with Segment Anything
Xiaobao Wei · Renrui Zhang · Jiarui Wu · Jiaming Liu · Ming Lu · Yandong Guo · Shanghang Zhang
|
||
MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models
Yanting Wang · Hongye Fu · Wei Zou · Jinyuan Jia
|
||
Mudslide: A Universal Nuclear Instance Segmentation Method
Jun Wang
|
||
Long-Tail Class Incremental Learning via Independent Sub-prototype Construction
Xi Wang · Xu Yang · Jie Yin · Kun Wei · Cheng Deng
|
||
Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World
Wen Yin · Jian Lou · Pan Zhou · Yulai Xie · Dan Feng · Yuhua Sun · Tailai Zhang · Lichao Sun
|
||
G3DR: Generative 3D Reconstruction in ImageNet
Pradyumna Reddy · Ismail Elezi · Jiankang Deng
|
||
Evidential Active Recognition: Intelligent and Prudent Open-World Embodied Perception
Lei Fan · Mingfu Liang · Yunxuan Li · Gang Hua · Ying Wu
|
||
OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning
Lingyi Hong · Shilin Yan · Renrui Zhang · Wanyun Li · Xinyu Zhou · Pinxue Guo · Kaixun Jiang · Yiting Cheng · Jinglun Li · Zhaoyu Chen · Wenqiang Zhang
|
||
Diffusion Time-step Curriculum for One Image to 3D Generation
YI Xuanyu · Zike Wu · Qingshan Xu · Pan Zhou · Joo Lim · Hanwang Zhang
|
||
Harnessing Meta-Learning for Improving Full-Frame Video Stabilization
Muhammad Kashif Ali · Eun Woo Im · Dongjin Kim · Tae Hyun Kim
|
||
SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects
Abhinav Kumar · Yuliang Guo · Xinyu Huang · Liu Ren · Xiaoming Liu
|
||
PEM: Prototype-based Efficient MaskFormer for Image Segmentation
Niccolò Cavagnero · Gabriele Rosi · Claudia Cuttano · Francesca Pistilli · Marco Ciccone · Giuseppe Averta · Fabio Cermelli
|
||
Make Pixels Dance: High-Dynamic Video Generation
Yan Zeng · Guoqiang Wei · Jiani Zheng · Jiaxin Zou · Yang Wei · Yuchen Zhang · Hang Li
|
||
Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation
Hyunwoo Ryu · Jiwoo Kim · Hyunseok An · Junwoo Chang · Joohwan Seo · Taehan Kim · Yubin Kim · Chaewon Hwang · Jongeun Choi · Roberto Horowitz
|
||
BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics
Wenqian Zhang · Molin Huang · Yuxuan Zhou · Juze Zhang · Jingyi Yu · Jingya Wang · Lan Xu
|
||
A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models
Julio Silva-Rodríguez · Sina Hajimiri · Ismail Ben Ayed · Jose Dolz
|
||
DriveTrack: A Benchmark for Long-Range Point Tracking in Real-World Videos
Arjun Balasingam · Joseph Chandler · Chenning Li · Zhoutong Zhang · Hari Balakrishnan
|
||
Can Biases in ImageNet Models Explain Generalization?
Paul Gavrikov · Janis Keuper
|
||
Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline
Anas Al-lahham · Muhammad Zaigham Zaheer · Nurbek Tastan · Karthik Nandakumar
|
||
MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection
Boyang Peng · Sanqing Qu · Yong Wu · Tianpei Zou · Lianghua He · Alois Knoll · Guang Chen · Changjun Jiang
|
||
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation
Yangyang Guo · Guangzhi Wang · Mohan Kankanhalli
|
||
Exploring Efficient Asymmetric Blind-Spots for Self-Supervised Denoising in Real-World Scenarios
Shiyan Chen · Jiyuan Zhang · Zhaofei Yu · Tiejun Huang
|
||
Teeth-SEG: An Efficient Instance Segmentation Framework for Orthodontic Treatment based on Anthropic Prior Knowledge
Bo Zou · Shaofeng Wang · Hao Liu · Gaoyue Sun · Yajie Wang · Zuo FeiFei · Chengbin Quan · Youjian Zhao
|
||
Multi-Level Neural Scene Graphs for Dynamic Urban Environments
Tobias Fischer · Lorenzo Porzi · Samuel Rota Bulò · Marc Pollefeys · Peter Kontschieder
|
||
Differentiable Display Photometric Stereo
Seokjun Choi · Seungwoo Yoon · Giljoo Nam · Seungyong Lee · Seung-Hwan Baek
|
||
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
Zhen Li · Mingdeng Cao · Xintao Wang · Zhongang Qi · Ming-Ming Cheng · Ying Shan
|
||
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
Zike Wu · Pan Zhou · YI Xuanyu · Xiaoding Yuan · Hanwang Zhang
|
||
Layout-Agnostic Scene Text Image Synthesis with Diffusion Models
Qilong Zhangli · Jindong Jiang · Di Liu · Licheng Yu · Xiaoliang Dai · Ankit Ramchandani · Guan Pang · Dimitris N. Metaxas · Praveen Krishnan
|
||
Improving Spectral Snapshot Reconstruction with Spectral-Spatial Rectification
Jiancheng Zhang · Haijin Zeng · Yongyong Chen · Dengxiu Yu · Yinping Zhao
|
||
D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection
Dinh Phat Do · Taehoon Kim · Jaemin Na · Jiwon Kim · Keonho LEE · Kyunghwan Cho · Wonjun Hwang
|
||
MAGICK: A Large-scale Captioned Dataset from Matting Generated Images using Chroma Keying
Ryan Burgert · Brian Price · Jason Kuen · Yijun Li · Michael Ryoo
|
||
Self-Supervised Multi-Object Tracking with Path Consistency
Zijia Lu · Bing Shuai · Yanbei Chen · Zhenlin Xu · Davide Modolo
|
||
3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling
Chaokang Jiang · Guangming Wang · Jiuming Liu · Hesheng Wang · Zhuang Ma · Zhenqiang Liu · LIANG · Yi Shan · Dalong Du
|
||
SPECAT: SPatial-spEctral Cumulative-Attention Transformer for High-Resolution Hyperspectral Image Reconstruction
Zhiyang Yao · Shuyang Liu · Xiaoyun Yuan · Lu Fang
|
||
Boosting Image Restoration via Priors from Pre-trained Models
Xiaogang Xu · Shu Kong · Tao Hu · Zhe Liu · Hujun Bao
|
||
Online Task-Free Continual Generative and Discriminative Learning via Dynamic Cluster Memory
Fei Ye · Adrian Bors
|
||
CPR-Coach: Recognizing Composite Error Actions based on Single-class Training
Shunli Wang · Shuaibing Wang · Dingkang Yang · Mingcheng Li · Haopeng Kuang · Xiao Zhao · Liuzhen Su · Peng Zhai · Lihua Zhang
|
||
DiffCast: A Unified Framework via Residual Diffusion for Precipitation Nowcasting
Demin Yu · Xutao Li · Yunming Ye · Baoquan Zhang · Luo Chuyao · Kuai Dai · wangrui · Chenxunlai
|
||
Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under Manhattan World Assumption
Nobuhiko Wakai · Satoshi Sato · Yasunori Ishii · Takayoshi Yamashita
|
||
Rethinking Boundary Discontinuity Problem for Oriented Object Detection
Hang Xu · Xinyuan Liu · Haonan Xu · Yike Ma · Zunjie Zhu · Chenggang Yan · Feng Dai
|
||
Restoration by Generation with Constrained Priors
Zheng Ding · Xuaner Zhang · Zhuowen Tu · Zhihao Xia
|
||
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
Dongjin Kim · Sung Jin Um · Sangmin Lee · Jung Uk Kim
|
||
EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection
Xuanyu Zhang · Runyi Li · Jiwen Yu · Youmin Xu · Weiqi Li · Jian Zhang
|
||
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve · Fan Fei · Jayakrishnan Unnikrishnan · Son Dinh Tran · Benjamin Yao · Belinda Zeng · Mubarak Shah · Trishul Chilimbi
|
||
OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning
Noor Ahmed · Anna Kukleva · Bernt Schiele
|
||
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update
Zhi Gao · Yuntao Du · Xintong Zhang · Xiaojian Ma · Wenjuan Han · Song-Chun Zhu · Qing Li
|
||
NeuRAD: Neural Rendering for Autonomous Driving
Adam Tonderski · Carl Lindström · Georg Hess · William Ljungbergh · Lennart Svensson · Christoffer Petersson
|
||
Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
Dazhong Shen · Guanglu Song · Zeyue Xue · Fu-Yun Wang · Yu Liu
|
||
Poly Kernel Inception Network for Remote Sensing Detection
Xinhao Cai · Qiuxia Lai · Yuwei Wang · Wenguan Wang · Zeren Sun · Yazhou Yao
|
||
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
Mengyao Lyu · Yuhong Yang · Haiwen Hong · Hui Chen · Xuan Jin · Yuan He · Hui Xue · Jungong Han · Guiguang Ding
|
||
Learned Lossless Image Compression based on Bit Plane Slicing
Zhe Zhang · Huairui Wang · Zhenzhong Chen · Shan Liu
|
||
BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning
Hongwei Zheng · Linyuan Zhou · Han Li · Jinming Su · Xiaoming Wei · Xu Xiaoming
|
||
HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces
Haithem Turki · Vasu Agrawal · Samuel Rota Bulò · Lorenzo Porzi · Peter Kontschieder · Deva Ramanan · Michael Zollhoefer · Christian Richardt
|
||
Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement
Ziyu Wang · Yue Xu · Cewu Lu · Yonglu Li
|
||
NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows
Zhenggang Tang · Jason Ren · Xiaoming Zhao · Bowen Wen · Jonathan Tremblay · Stan Birchfield · Alexander G. Schwing
|
||
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
Suraj Patni · Aradhye Agarwal · Chetan Arora
|
||
Prompt-Driven Referring Image Segmentation with Instance Contrasting
Chao Shang · Zichen Song · Heqian Qiu · Lanxiao Wang · Fanman Meng · Hongliang Li
|
||
CosmicMan: A Text-to-Image Foundation Model for Humans
Shikai Li · Jianglin Fu · Kaiyuan Liu · Wentao Wang · Kwan-Yee Lin · Wayne Wu
|
||
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen · Yuyuan Liu · Hu Wang · Fengbei Liu · Chong Wang · Helen Frazer · Gustavo Carneiro
|
||
PairAug: What Can Augmented Image-Text Pairs Do for Radiology?
Yutong Xie · Qi Chen · Sinuo Wang · Minh-Son To · Iris Lee · Ee Win Khoo · Kerolos Hendy · Daniel Koh · Yong Xia · Qi Wu
|
||
Towards Robust Learning to Optimize with Theoretical Guarantees
Qingyu Song · Wei Lin · Juncheng Wang · Hong Xu
|
||
Language-conditioned Detection Transformer
Jang Hyun Cho · Philipp Krähenbühl
|
||
Distilled Datamodel with Reverse Gradient Matching
Jingwen Ye · Ruonan Yu · Songhua Liu · Xinchao Wang
|
||
NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging
Takahiro Shirakawa · Seiichi Uchida
|
||
Digital Life Project: Autonomous 3D Characters with Social Intelligence
Zhongang Cai · Jianping Jiang · Zhongfei Qing · Xinying Guo · Mingyuan Zhang · Zhengyu Lin · Haiy Mei · Chen Wei · Wang Ruisi · Wanqi Yin · Liang Pan · Xiangyu Fan · Han Du · Peng Gao · Zhitao Yang · Yang Gao · Jiaqi Li · Tianxiang Ren · YuKun Wei · Xiaogang Wang · Chen Change Loy · Lei Yang · Ziwei Liu
|
||
Object Recognition as Next Token Prediction
Kaiyu Yue · Bor-Chun Chen · Jonas Geiping · Hengduo Li · Tom Goldstein · Ser-Nam Lim
|
||
Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning
Yiwen Ye · Yutong Xie · Jianpeng Zhang · Ziyang Chen · Qi Wu · Yong Xia
|
||
A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark
Jakub Paplham · Vojtech Franc
|
||
SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation
Zhixuan Liu · Peter Schaldenbrand · Beverley-Claire Okogwu · Wenxuan Peng · Youngsik Yun · Andrew Hundt · Jihie Kim · Jean Oh
|
||
Virtual Immunohistochemistry Staining for Histological Images Assisted by Weakly-supervised Learning
Jiahan Li · Jiuyang Dong · Shenjin Huang · Xi Li · Junjun Jiang · Xiaopeng Fan · Yongbing Zhang
|
||
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation
Razvan Pasca · Alexey Gavryushin · Muhammad Hamza · Yen-Ling Kuo · Kaichun Mo · Luc Van Gool · Otmar Hilliges · Xi Wang
|
||
CoralSCOP: Segment any COral Image on this Planet
Zheng Ziqiang · Liang Haixin · Binh-Son Hua · Tim, Yue Him Wong · Put ANG · Apple CHUI · Sai-Kit Yeung
|
||
Learn from View Correlation: An Anchor Enhancement Strategy for Multi-view Clustering
Suyuan Liu · Ke Liang · Zhibin Dong · Siwei Wang · Xihong Yang · Sihang Zhou · En Zhu · Xinwang Liu
|
||
Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging
Bhargav Ghanekar · Salman Siddique Khan · Pranav Sharma · Shreyas Singh · Vivek Boominathan · Kaushik Mitra · Ashok Veeraraghavan
|
||
Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors
Nicolae Ristea · Florinel Croitoru · Radu Tudor Ionescu · Marius Popescu · Fahad Shahbaz Khan · Mubarak Shah
|
||
You Only Need Less Attention Each Stage in Vision Transformers
Shuoxi Zhang · Hanpeng Liu · Stephen Lin · Kun He
|
||
Self-Calibrating Vicinal Risk Minimisation for Model Calibration
Jiawei Liu · Changkun Ye · Ruikai Cui · Nick Barnes
|
||
Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection
Chuangchuang Tan · Huan Liu · Yao Zhao · Shikui Wei · Guanghua Gu · Ping Liu · Yunchao Wei
|
||
Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception
Haoming Chen · Zhizhong Zhang · Yanyun Qu · Ruixin Zhang · Xin Tan · Yuan Xie
|
||
Learning from One Continuous Video Stream
Joao Carreira · Michael King · Viorica Patraucean · Dilara Gokay · Catalin Ionescu · Yi Yang · Daniel Zoran · Joseph Heyward · Carl Doersch · Yusuf Aytar · Dima Damen · Andrew Zisserman
|
||
CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution
Qingguo Liu · Chenyi Zhuang · Pan Gao · Jie Qin
|
||
GenesisTex: Adapting Image Denoising Diffusion to Texture Space
Chenjian Gao · Boyan Jiang · Xinghui Li · YingPeng Zhang · Qian Yu
|
||
TEA: Test-time Energy Adaptation
Yige Yuan · Bingbing Xu · Liang Hou · Fei Sun · Huawei Shen · Xueqi Cheng
|
||
Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Pingping Zhang · Yuhao Wang · Yang Liu · Zhengzheng Tu · Huchuan Lu
|
||
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
Jieming Cui · Tengyu Liu · Nian Liu · Yaodong Yang · Yixin Zhu · Siyuan Huang
|
||
$M^3$-UDA: A New Benchmark for Unsupervised Domain Adaptive Fetal Cardiac Structure Detection
Bin Pu · Liwen Wang · Jiewen Yang · He Guannan · Xingbo Dong · Shengli Li · Ying Tan · Ming Chen · Zhe Jin · Kenli Li · Xiaomeng Li
|
||
Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)
Tsu-Ching Hsiao · Hao-Wei Chen · Hsuan-Kung Yang · Chun-Yi Lee
|
||
QUADify: Extracting Meshes with Pixel-level Details and Materials from Images
Maximilian Frühauf · Hayko Riemenschneider · Markus Gross · Christopher Schroers
|
||
Improving Unsupervised Hierarchical Representation with Reinforcement Learning
Ruyi An · Yewen Li · Xu He · Pengjie Gu · Mengchen Zhao · Dong Li · Jianye Hao · Bo An · Chaojie Wang · Mingyuan Zhou
|
||
All Rivers Run to the Sea: Private Learning with Asymmetric Flows
Yue Niu · Ramy E. Ali · Saurav Prakash · Salman Avestimehr
|
||
SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering
Tao Hu · Fangzhou Hong · Ziwei Liu
|
||
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
Xiaoqi Li · Mingxu Zhang · Yiran Geng · Haoran Geng · Yuxing Long · Yan Shen · Renrui Zhang · Jiaming Liu · Hao Dong
|
||
LASO: Language-guided Affordance Segmentation on 3D Object
Yicong Li · Na Zhao · Junbin Xiao · Chun Feng · Xiang Wang · Tat-seng Chua
|
||
Dispersed Structured Light for Hyperspectral 3D Imaging
Suhyun Shin · Seokjun Choi · Felix Heide · Seung-Hwan Baek
|
||
DualAD: Disentangling the Dynamic and Static World for End-to-End Driving
Simon Doll · Niklas Hanselmann · Lukas Schneider · Richard Schulz · Marius Cordts · Markus Enzweiler · Hendrik Lensch
|
||
Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models
Takami Sato · Justin Yue · Nanze Chen · Ningfei Wang · Alfred Chen
|
||
FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation
Chris Rockwell · Nilesh Kulkarni · Linyi Jin · Jeong Joon Park · Justin Johnson · David Fouhey
|
||
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
Chenshuang Zhang · Fei Pan · Junmo Kim · In So Kweon · Chengzhi Mao
|
||
BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition
Yuxuan Zhou · Xudong Yan · Zhi-Qi Cheng · Yan Yan · Qi Dai · Xian-Sheng Hua
|
||
Language-only Training of Zero-shot Composed Image Retrieval
Geonmo Gu · Sanghyuk Chun · Wonjae Kim · Yoohoon Kang · Sangdoo Yun
|
||
Any-Shift Prompting for Generalization over Distributions
Zehao Xiao · Jiayi Shen · Mohammad Mahdi Derakhshani · Shengcai Liao · Cees G. M. Snoek
|
||
Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
Kumaranage Ravindu Nagasinghe · Honglu Zhou · Malitha Gunawardhana · Martin Renqiang Min · Daniel Harari · Muhammad Haris Khan
|
||
Time-, Memory- and Parameter-Efficient Visual Adaptation
Otniel-Bogdan Mercea · Alexey Gritsenko · Cordelia Schmid · Anurag Arnab
|
||
Behind the Veil: Enhanced Indoor 3D Scene Reconstruction with Occluded Surfaces Completion
Su Sun · Henry Zhao · Yuliang Guo · Ruoyu Wang · Xinyu Huang · Yingjie Victor Chen · Liu Ren
|
||
Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
Ke Fan · Zechen Bai · Tianjun Xiao · Tong He · Max Horn · Yanwei Fu · Francesco Locatello · Zheng Zhang
|
||
Perceptual-Oriented Video Frame Interpolation Via Asymmetric Synergistic Blending
Guangyang Wu · Xin Tao · Changlin Li · Wenyi Wang · Xiaohong Liu · Qingqing Zheng
|
||
Exact Fusion via Feature Distribution Matching for Few-shot Image Generation
Yingbo Zhou · Yutong Ye · Pengyu Zhang · Xian Wei · Mingsong Chen
|
||
iToF-flow-based High Frame Rate Depth Imaging
Yu Meng · Zhou Xue · Xu Chang · Xuemei Hu · Tao Yue
|
||
Revisiting Counterfactual Problems in Referring Expression Comprehension
Zhihan Yu · Ruifan Li
|
||
Improving Generalized Zero-Shot Learning by Exploring the Diverse Semantics from External Class Names
Yapeng Li · Yong Luo · Zengmao Wang · Bo Du
|
||
Continual Motion Prediction Learning Framework via Meta-Representation Learning and Optimal Memory Buffer Retention Strategy
Dae Jun Kang · Dongsuk Kum · Sanmin Kim
|
||
Detector-Free Structure from Motion
Xingyi He · Jiaming Sun · Yifan Wang · Sida Peng · Qixing Huang · Hujun Bao · Xiaowei Zhou
|
||
DiVAS: Video and Audio Synchronization with Dynamic Frame Rates
Clara Maria Fernandez Labrador · Mertcan Akcay · Eitan Abecassis · Joan Massich · Christopher Schroers
|
||
Material Palette: Extraction of Materials from a Single Image
Ivan Lopes · Fabio Pizzati · Raoul de Charette
|
||
Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos
Chen Liu · Peike Li · Qingtao Yu · Hongwei Sheng · Dadong Wang · Lincheng Li · Xin Yu
|
||
Towards Accurate and Robust Architectures via Neural Architecture Search
Yuwei Ou · Yuqi Feng · Yanan Sun
|
||
C$^2$RV: Cross-Regional and Cross-View Learning for Sparse-View CBCT Reconstruction
Yiqun Lin · Jiewen Yang · Hualiang Wang · Xinpeng Ding · Wei Zhao · Xiaomeng Li
|
||
Open-Vocabulary Video Anomaly Detection
Peng Wu · Xuerong Zhou · Guansong Pang · Yujia Sun · Jing Liu · Peng Wang · Yanning Zhang
|
||
Language Model Guided Interpretable Video Action Reasoning
Ning Wang · Guangming Zhu · Hongsheng Li · Liang Zhang · Syed Afaq Ali Shah · Mohammed Bennamoun
|
||
Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework
Vu Minh Hieu Phan · Yutong Xie · Yuankai Qi · Lingqiao Liu · Liyang Liu · Bowen Zhang · Zhibin Liao · Qi Wu · Minh-Son To · Johan Verjans
|
||
Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields
Leili Goli · Cody Reading · Silvia Sellán · Alec Jacobson · Andrea Tagliasacchi
|
||
LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP
Yunshi HUANG · Fereshteh Shakeri · Jose Dolz · Malik Boudiaf · Houda Bahig · Ismail Ben Ayed
|
||
Building Optimal Neural Architectures using Interpretable Knowledge
Keith Mills · Fred Han · Mohammad Salameh · Shengyao Lu · Chunhua Zhou · Jiao He · Fengyu Sun · Di Niu
|
||
IBD-SLAM: Learning Image-Based Depth Fusion for Generalizable SLAM
Minghao Yin · Shangzhe Wu · Kai Han
|
||
Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping
Alex Costanzino · Pierluigi Zama Ramirez · Giuseppe Lisanti · Luigi Di Stefano
|
||
Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
Yuelang Xu · Benwang Chen · Zhe Li · Hongwen Zhang · Lizhen Wang · Zerong Zheng · Yebin Liu
|
||
CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
Jun Wang · Yuzhe Qin · Kaiming Kuang · Yigit Korkmaz · Akhilan Gurumoorthy · Hao Su · Xiaolong Wang
|
||
Improving Plasticity in Online Continual Learning via Collaborative Learning
Maorong Wang · Nicolas Michel · Ling Xiao · Toshihiko Yamasaki
|
||
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
Xudong Wang · Ishan Misra · Ziyun Zeng · Rohit Girdhar · Trevor Darrell
|
||
HIT: Estimating Internal Human Implicit Tissues from the Body Surface
Marilyn Keller · Vaibhav ARORA · Abdelmouttaleb Dakri · Shivam Chandhok · Jürgen Machann · Andreas Fritsche · Michael J. Black · Sergi Pujades
|
||
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
Jia-Wei Liu · Yan-Pei Cao · Jay Zhangjie Wu · Weijia Mao · Yuchao Gu · Rui Zhao · Jussi Keppo · Ying Shan · Mike Zheng Shou
|
||
Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios
Jie Xu · Yazhou Ren · Xiaolong Wang · Lei Feng · Zheng Zhang · Gang Niu · Xiaofeng Zhu
|
||
Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations
Rui Zhao · Ruiqin Xiong · Jing Zhao · Jian Zhang · Xiaopeng Fan · Zhaofei Yu · Tiejun Huang
|
||
Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture
Fei Wang · Dan Guo · Kun Li · Zhun Zhong · Meng Wang
|
||
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
Yuqing Wen · Yucheng Zhao · Yingfei Liu · Fan Jia · Yanhui Wang · Chong Luo · Chi Zhang · Tiancai Wang · Xiaoyan Sun · Xiangyu Zhang
|
||
What Moves Together Belongs Together
Jenny Seidenschwarz · Aljoša Ošep · Francesco Ferroni · Simon Lucey · Laura Leal-Taixe
|
||
Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection
Jin Yang · Ping Wei · Huan Li · Ziyang Ren
|
||
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi · Skanda Koppula · Shreya Pathak · Justin Chiu · Joseph Heyward · Viorica Patraucean · Jiajun Shen · Antoine Miech · Andrew Zisserman · Aida Nematzadeh
|
||
NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images
Yufei Han · Heng Guo · Koki Fukai · Hiroaki Santo · Boxin Shi · Fumio Okura · Zhanyu Ma · Yunpeng Jia
|
||
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Fei Deng · Qifei Wang · Wei Wei · Tingbo Hou · Matthias Grundmann
|
||
NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation
Minh-Tuan Tran · Trung Le · Xuan-May Le · Mehrtash Harandi · Quan Tran · Dinh Phung
|
||
Domain-Specific Block Selection and Paired-View Pseudo-Labeling for Online Test-Time Adaptation
Yeonguk Yu · Sungho Shin · Seunghyeok Back · Minhwan Ko · Sangjun Noh · Kyoobin Lee
|
||
Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning
Rongjie Li · Yu Wu · Xuming He
|
||
The Neglected Tails of Vision-Language Models
Shubham Parashar · Tian Liu · Zhiqiu Lin · Xiangjue Dong · Yanan Li · James Caverlee · Deva Ramanan · Shu Kong
|
||
Multi-View Attentive Contextualization for Multi-View 3D Object Detection
Xianpeng Liu · Ce Zheng · Ming Qian · Nan Xue · Chen Chen · Zhebin Zhang · Chen Li · Tianfu Wu
|
||
SODA: Bottleneck Diffusion Models for Representation Learning
Drew Hudson · Daniel Zoran · Mateusz Malinowski · Andrew Lampinen · Andrew Jaegle · James McClelland · Loic Matthey · Felix Hill · Alexander Lerchner
|
||
Scaling Up Dynamic 3D Human-Scene Interaction Modelling
Nan Jiang · Zhiyuan Zhang · Hongjie Li · Xiaoxuan Ma · Zan Wang · Yixin Chen · Tengyu Liu · Yixin Zhu · Siyuan Huang
|
||
Theoretically Achieving Continuous Representation of Oriented Bounding Boxes
Zikai Xiao · Guo-Ye Yang · Xue Yang · Tai-Jiang Mu · Junchi Yan · Shi-Min Hu
|
||
MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
Sadeep Jayasumana · Daniel Glasner · Srikumar Ramalingam · Andreas Veit · Ayan Chakrabarti · Sanjiv Kumar
|
||
Data-Free Quantization via Pseudo-label Filtering
Chunxiao Fan · Ziqi Wang · Dan Guo · Meng Wang
|
||
Fitting Flats to Flats
Gabriel Dogadov · Ugo Finnendahl · Marc Alexa
|
||
Bayesian Diffusion Models for 3D Shape Reconstruction
Haiyang Xu · Yu Lei · Zeyuan Chen · Xiang Zhang · Yue Zhao · Yilin Wang · Zhuowen Tu
|
||
HOIST-Former: Hand-held Objects Identification, Segmentation, and Tracking in the Wild
Supreeth Narasimhaswamy · Huy Anh Nguyen · Lihan Huang · Minh Hoai
|
||
Towards General Robustness Verification of MaxPool-based Convolutional Neural Networks via Tightening Linear Approximation
Yuan Xiao · Shiqing Ma · Juan Zhai · Chunrong Fang · Jinyuan Jia · Zhenyu Chen
|
||
Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning
Rashindrie Perera · Saman Halgamuge
|
||
MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding
Xu Cao · Tong Zhou · Yunsheng Ma · Wenqian Ye · Can Cui · Kun Tang · Zhipeng Cao · Kaizhao Liang · Ziran Wang · James Rehg · Chao Zheng
|
||
Generative Multi-modal Models are Good Class Incremental Learners
Xusheng Cao · Haori Lu · Linlan Huang · Xialei Liu · Ming-Ming Cheng
|
||
WaveMo: Learning Wavefront Modulations to See Through Scattering
Mingyang Xie · Haiyun Guo · Brandon Y. Feng · Lingbo Jin · Ashok Veeraraghavan · Christopher Metzler
|
||
DiG-IN: Diffusion Guidance for Investigating Networks - Uncovering Classifier Differences, Neuron Visualisations, and Visual Counterfactual Explanations
Maximilian Augustin · Yannic Neuhaus · Matthias Hein
|
||
Backpropagation-free Network for 3D Test-time Adaptation
Yanshuo Wang · Ali Cheraghian · Zeeshan Hayder · Jie Hong · Sameera Ramasinghe · Shafin Rahman · David Ahmedt-Aristizabal · Xuesong Li · Lars Petersson · Mehrtash Harandi
|
||
Finsler-Laplace-Beltrami Operators with Application to Shape Analysis
Simon Weber · Thomas Dagès · Maolin Gao · Daniel Cremers
|
||
NeRFiller: Completing Scenes via Generative 3D Inpainting
Ethan Weber · Aleksander Holynski · Varun Jampani · Saurabh Saxena · Noah Snavely · Abhishek Kar · Angjoo Kanazawa
|
||
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
Hoon Kim · Minje Jang · Wonjun Yoon · Jisoo Lee · Donghyun Na · Sanghyun Woo
|
||
Sparse Semi-Detr: Sparse Learnable Queries for Semi-Supervised Object Detection
Tahira Shehzadi · Khurram Azeem Hashmi · Didier Stricker · Muhammad Zeshan Afzal
|
||
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng · Zhicheng Guo · Jingwen Wu · Kechen Fang · Peng Li · Huaping Liu · Yang Liu
|
||
ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More
Jiazhou Zhou · Xu Zheng · Yuanhuiyi Lyu · Lin Wang
|
||
4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations
Wenbo Wang · Hsuan-I Ho · Chen Guo · Boxiang Rong · Artur Grigorev · Jie Song · Juan Jose Zarate · Otmar Hilliges
|
||
Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer
Wenqiao Zhang · Zheqi Lv
|
||
Weak-to-Strong 3D Object Detection with X-Ray Distillation
Alexander Gambashidze · Aleksandr Dadukin · Maksim Golyadkin · Maria Razzhivina · Ilya Makarov
|
||
Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications
Karren Yang · Anurag Ranjan · Jen-Hao Rick Chang · Raviteja Vemulapalli · Oncel Tuzel
|
||
YolOOD: Utilizing Object Detection Concepts for Multi-Label Out-of-Distribution Detection
Alon Zolfi · Guy Amit · Amit Baras · Satoru Koda · Ikuya Morikawa · Yuval Elovici · Asaf Shabtai
|
||
An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning
Jianqing Zhang · Yang Liu · Yang Hua · Jian Cao
|
||
HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative
CONG MA · Qiao Lei · Chengkai Zhu · Kai Liu · Zelong Kong · Liqing · Xueqi Zhou · Yuheng KAN · Wei Wu
|
||
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
Song Yiran · Qianyu Zhou · Xiangtai Li · Deng-Ping Fan · Xuequan Lu · Lizhuang Ma
|
||
From Feature to Gaze: A Generalizable Replacement of Linear Layer for Gaze Estimation
Yiwei Bao · Feng Lu
|
||
NC-SDF: Enhancing Indoor Scene Reconstruction Using Neural SDFs with View-Dependent Normal Compensation
Ziyi Chen · Xiaolong Wu · Yu Zhang
|
||
FedSOL: Stabilized Orthogonal Learning with Proximal Restrictions in Federated Learning
Gihun Lee · Minchan Jeong · SangMook Kim · Jaehoon Oh · Se-Young Yun
|
||
Learning Multi-dimensional Human Preference for Text-to-Image Generation
Sixian Zhang · Bohan Wang · Junqiang Wu · Yan Li · Tingting Gao · Di ZHANG · Zhongyuan Wang
|
||
Improved Visual Grounding through Self-Consistent Explanations
Ruozhen He · Paola Cascante-Bonilla · Ziyan Yang · Alex Berg · Vicente Ordonez
|
||
Versatile Medical Image Segmentation Learned from Multi-Source Datasets via Model Self-Disambiguation
Xiaoyang Chen · Hao Zheng · Yuemeng LI · Yuncong Ma · Liang Ma · Hongming Li · Yong Fan
|
||
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
Hao Li · Xue Yang · Zhaokai Wang · Xizhou Zhu · Jie Zhou · Yu Qiao · Xiaogang Wang · Hongsheng Li · Lewei Lu · Jifeng Dai
|
||
OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos
Dongyoung Choi · Hyeonjoong Jang · Min H. Kim
|
||
Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation
Alexander Raistrick · Lingjie Mei · Karhan Kayan · David Yan · Yiming Zuo · Beining Han · Hongyu Wen · Meenal Parakh · Stamatis Alexandropoulos · Lahav Lipson · Zeyu Ma · Jia Deng
|
||
Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA
Zhuowan Li · Bhavan Jasani · Peng Tang · Shabnam Ghadar
|
||
Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models
Xinpeng Ding · Jianhua Han · Hang Xu · Xiaodan Liang · Wei Zhang · Xiaomeng Li
|
||
Reconstructing Hands in 3D with Transformers
Georgios Pavlakos · Dandan Shan · Ilija Radosavovic · Angjoo Kanazawa · David Fouhey · Jitendra Malik
|
||
Systematic comparison of semi-supervised and self-supervised learning for medical image classification
Zhe Huang · Ruijie Jiang · Shuchin Aeron · Michael C. Hughes
|
||
Hierarchical Correlation Clustering and Tree Preserving Embedding
Morteza Haghir Chehreghani · Mostafa Haghir Chehreghani
|
||
One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
Minghua Liu · Ruoxi Shi · Linghao Chen · Zhuoyang Zhang · Chao Xu · Xinyue Wei · Hansheng Chen · Chong Zeng · Jiayuan Gu · Hao Su
|
||
C$^2$KD: Bridging the Modality Gap for Cross-Modal Knowledge Distillation
Fushuo Huo · Wenchao Xu · Jingcai Guo · Haozhao Wang · Song Guo
|
||
Distilling Vision-Language Models on Millions of Videos
Yue Zhao · Long Zhao · Xingyi Zhou · Jialin Wu · Chun-Te Chu · Hui Miao · Florian Schroff · Hartwig Adam · Ting Liu · Boqing Gong · Philipp Krähenbühl · Liangzhe Yuan
|
||
SNI-SLAM: Semantic Neural Implicit SLAM
Siting Zhu · Guangming Wang · Hermann Blum · Jiuming Liu · Liang Song · Marc Pollefeys · Hesheng Wang
|
||
PromptCoT: Align Prompt Distribution via Adapted Chain-of-Thought
Junyi Yao · Yijiang Liu · Zhen Dong · Mingfei Guo · Helan Hu · Kurt Keutzer · Li Du · Daquan Zhou · Shanghang Zhang
|
||
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
Peng Qi · Zehong Yan · Wynne Hsu · Mong Li Lee
|
||
Spatial-Aware Regression for Keypoint Localization
Dongkai Wang · Shiliang Zhang
|
||
Shadow-Enlightened Image Outpainting
Hang Yu · Ruilin Li · Shaorong Xie · Jiayan Qiu
|
||
Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation
Jiapeng Su · Qi Fan · Wenjie Pei · Guangming Lu · Fanglin Chen
|
||
Few-Shot Object Detection with Foundation Models
Guangxing Han · Ser-Nam Lim
|
||
LiDAR-Net: A Real-scanned 3D Point Cloud Dataset for Indoor Scenes
Yanwen Guo · Yuanqi Li · Dayong Ren · Xiaohong Zhang · Jiawei Li · Liang Pu · Changfeng Ma · Xiaoyu Zhan · Jie Guo · Mingqiang Wei · Yan Zhang · Piaopiao Yu · Shuangyu Yang · Donghao Ji · Huisheng Ye · Hao Sun · Yansong Liu · Yinuo Chen · Jiaqi Zhu · Hongyu Liu
|
||
PREGO: online mistake detection in PRocedural EGOcentric videos
Alessandro Flaborea · Guido M. D'Amely di Melendugno · Leonardo Plini · Luca Scofano · Edoardo De Matteis · Antonino Furnari · Giovanni Maria Farinella · Fabio Galasso
|
||
M&M VTO: Multi-Garment Virtual Try-On and Editing
Luyang Zhu · Yingwei Li · Nan Liu · Hao Peng · Dawei Yang · Ira Kemelmacher-Shlizerman
|
||
Cross-view and Cross-pose Completion for 3D Human Understanding
Matthieu Armando · Salma Galaaoui · Fabien Baradel · Thomas Lucas · Vincent Leroy · Romain BRÉGIER · Philippe Weinzaepfel · Grégory Rogez
|
||
Task-Aware Encoder Control for Deep Video Compression
Xingtong Ge · Jixiang Luo · Xinjie Zhang · Tongda Xu · Guo Lu · Dailan He · Jing Geng · Yan Wang · Jun Zhang · Hongwei Qin
|
||
DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
Yibo Wang · Ruiyuan Gao · Kai Chen · Kaiqiang Zhou · Yingjie CAI · Lanqing Hong · Zhenguo Li · Lihui Jiang · Dit-Yan Yeung · Qiang Xu · Kai Zhang
|
||
Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors
Yu Zhang · Songpengcheng Xia · Lei Chu · Jiarui Yang · Qi Wu · Ling Pei
|
||
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
Li Hu
|
||
Effective Video Mirror Detection with Inconsistent Motion Cues
Alex Warren · Ke Xu · Jiaying Lin · Gary Tam · Rynson W.H. Lau
|
||
Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance
Kelvin C.K. Chan · Yang Zhao · Xuhui Jia · Ming-Hsuan Yang · Huisheng Wang
|
||
Spatio-Temporal Turbulence Mitigation: A Translational Perspective
Xingguang Zhang · Nicholas M Chimitt · Yiheng Chi · Zhiyuan Mao · Stanley H. Chan
|
||
Composing Object Relations and Attributes for Image-Text Matching
Khoi Pham · Chuong Huynh · Ser-Nam Lim · Abhinav Shrivastava
|
||
Looking 3D: Anomaly Detection with 2D-3D Alignment
Ankan Kumar Bhunia · Changjian Li · Hakan Bilen
|
||
DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement
Jiuming Liu · Guangming Wang · Weicai Ye · Chaokang Jiang · Jinru Han · Zhe Liu · Guofeng Zhang · Dalong Du · Hesheng Wang
|
||
Point Cloud Pre-training with Diffusion Models
Xiao Zheng · Xiaoshui Huang · Guofeng Mei · Zhaoyang Lyu · Yuenan Hou · Wanli Ouyang · Bo Dai · Yongshun Gong
|
||
Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models
David Stotko · Nils Wandel · Reinhard Klein
|
||
On Train-Test Class Overlap and Detection for Image Retrieval
Chull Hwan Song · Jooyoung Yoon · Taebaek Hwang · Shunghyun Choi · Yeong Hyeon Gu · Yannis Avrithis
|
||
VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection
Zihua Liu · Hiroki Sakuma · Masatoshi Okutomi
|
||
Permutation Equivariance of Transformers and Its Applications
Hengyuan Xu · Liyao Xiang · Hangyu Ye · Dixi Yao · Pengzhi Chu · Baochun Li
|
||
Beyond Seen Primitive Concepts and Attribute-Object Compositional Learning
Nirat Saini · Khoi Pham · Abhinav Shrivastava
|
||
Transductive Zero-Shot & Few-Shot CLIP
Ségolène Martin · Yunshi HUANG · Fereshteh Shakeri · Jean-Christophe Pesquet · Ismail Ben Ayed
|
||
SLICE: Stabilized LIME for Consistent Explanations for Image Classification
Revoti Prasad Bora · Kiran Raja · Philipp Terhörst · Raymond Veldhuis · Raghavendra Ramachandra
|
||
GenZI: Zero-Shot 3D Human-Scene Interaction Generation
Lei Li · Angela Dai
|
||
Contrastive Learning for DeepFake Classification and Localization via Multi-Label Ranking
Cheng-Yao Hong · Yen-Chi Hsu · Tyng-Luh Liu
|
||
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
Yuzhou Huang · Liangbin Xie · Xintao Wang · Ziyang Yuan · Xiaodong Cun · Yixiao Ge · Jiantao Zhou · Chao Dong · Rui Huang · Ruimao Zhang · Ying Shan
|
||
SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds
Minghao Chen · Junyu Xie · Iro Laina · Andrea Vedaldi
|
||
3D Neural Edge Reconstruction
Lei Li · Songyou Peng · Zehao Yu · Shaohui Liu · Rémi Pautrat · Xiaochuan Yin · Marc Pollefeys
|
||
Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions
Namitha Padmanabhan · Matthew A Gwilliam · Pulkit Kumar · Shishira R Maiya · Max Ehrlich · Abhinav Shrivastava
|
||
Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding
Zhiheng Cheng · Qingyue Wei · Hongru Zhu · Yan Wang · Liangqiong Qu · Wei Shao · Yuyin Zhou
|
||
DiffusionAvatars: Deferred Diffusion for High-fidelity 3D Head Avatars
Tobias Kirschstein · Simon Giebenhain · Matthias Nießner
|
||
Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection
Jiaming Li · Jiacheng Zhang · Jichang Li · Ge Li · Si Liu · Liang Lin · Guanbin Li
|
||
Language-driven Grasp Detection
An Dinh Vuong · Minh Nhat VU · Baoru Huang · Nghia Nguyen · Hieu Le · Thieu Vo · Anh Nguyen
|
||
Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation
Zhiwei Yang · Kexue Fu · Minghong Duan · Linhao Qu · Shuo Wang · Zhijian Song
|
||
Each Test Image Deserves A Specific Prompt: Continual Test-Time Adaptation for 2D Medical Image Segmentation
Ziyang Chen · Yongsheng Pan · Yiwen Ye · Mengkang Lu · Yong Xia
|
||
SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting
Zhijing Shao · Wang Zhaolong · Zhuang Li · Duotun Wang · Xiangru Lin · Yu Zhang · Mingming Fan · Zeyu Wang
|
||
LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model
Dongkai Wang · Shiyu Xuan · Shiliang Zhang
|
||
Instance-based Max-margin for Practical Few-shot Recognition
Minghao Fu · Ke Zhu
|
||
GARField: Group Anything with Radiance Fields
Chung Min Kim · Mingxuan Wu · Justin Kerr · Ken Goldberg · Matthew Tancik · Angjoo Kanazawa
|
||
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
Walid Bousselham · Felix Petersen · Vittorio Ferrari · Hilde Kuehne
|
||
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
Liangxiao Hu · Hongwen Zhang · Yuxiang Zhang · Boyao ZHOU · Boning Liu · Shengping Zhang · Liqiang Nie
|
||
ShapeMatcher: Self-Supervised Joint Shape Canonicalization, Segmentation, Retrieval and Deformation
Yan Di · Chenyangguang Zhang · Chaowei Wang · Ruida Zhang · Guangyao Zhai · Yanyan Li · Bowen Fu · Xiangyang Ji · Shan Gao
|
||
Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
Rui Li · Tobias Fischer · Mattia Segu · Marc Pollefeys · Luc Van Gool · Federico Tombari
|
||
SVDTree: Semantic Voxel Diffusion for Single Image Tree Reconstruction
Yuan Li · Zhihao Liu · Bedrich Benes · Xiaopeng Zhang · Jianwei Guo
|
||
Enhancing Intrinsic Features for Debiasing via Investigating Class-Discerning Common Attributes in Bias-Contrastive Pair
Jeonghoon Park · Chaeyeon Chung · Jaegul Choo
|
||
Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer
Yuwen Tan · Qinhao Zhou · Xiang Xiang · Ke Wang · Yuchuan Wu · Yongbin Li
|
||
Error Detection in Egocentric Procedural Task Videos
Shih-Po Lee · Zijia Lu · Zekun Zhang · Minh Hoai · Ehsan Elhamifar
|
||
Robust Self-calibration of Focal Lengths from the Fundamental Matrix
Viktor Kocur · Daniel Kyselica · Zuzana Kukelova
|
||
Learning to Control Camera Exposure via Reinforcement Learning
Kyunghyun Lee · Ukcheol Shin · Byeong-Uk Lee
|
||
Regressor-Segmenter Mutual Prompt Learning for Crowd Counting
Mingyue Guo · Li Yuan · Zhaoyi Yan · Binghui Chen · Yaowei Wang · Qixiang Ye
|
||
Efficient Detection of Long Consistent Cycles and its Application to Distributed Synchronization
Shaohan Li · Yunpeng Shi · Gilad Lerman
|
||
Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation
Dongliang Cao · Marvin Eisenberger · Nafie El Amrani · Daniel Cremers · Florian Bernard
|
||
Efficient Privacy-Preserving Visual Localization Using 3D Ray Clouds
Heejoon Moon · Chunghwan Lee · Je Hyeong Hong
|
||
Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior
Chen Cheng · Xiaofeng Yang · Fan Yang · Chengzeng Feng · Zhoujie Fu · Chuan-Sheng Foo · Guosheng Lin · Fayao Liu
|
||
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
Tony C. W. MOK · Zi Li · Yunhao Bai · Jianpeng Zhang · Wei Liu · Yan-Jie Zhou · Ke Yan · Dakai Jin · Yu Shi · Xiaoli Yin · Le Lu · Ling Zhang
|
||
CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation
Xi Liu · Ying Guo · Cheng Zhen · Tong Li · Yingying Ao · Pengfei Yan
|
||
Fully Convolutional Slice-to-Volume Reconstruction for Single-Stack MRI
Sean I. Young · Yaël Balbastre · Bruce Fischl · Polina Golland · Juan Iglesias
|
||
View From Above: Orthogonal viewpoint aware Cross-view Localization
Shan Wang · Chuong Nguyen · Jiawei Liu · Yanhao Zhang · Sundaram Muthu · Fahira Afzal Maken · Kaihao Zhang · Hongdong Li
|
||
SEAS: ShapE-Aligned Supervision for Person Re-Identification
Haidong Zhu · Pranav Budhwant · Zhaoheng Zheng · Ram Nevatia
|
||
LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes
Shanlin Sun · Bingbing Zhuang · Ziyu Jiang · Buyu Liu · Xiaohui Xie · Manmohan Chandraker
|
||
LoS: Local Structure Guided Stereo Matching
Kunhong Li · Longguang Wang · Ye Zhang · Kaiwen Xue · Shunbo Zhou · Yulan Guo
|
||
AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
Jonas Ricker · Denis Lukovnikov · Asja Fischer
|
||
Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training
Xiaoyang Wu · Zhuotao Tian · Xin Wen · Bohao Peng · Xihui Liu · Kaicheng Yu · Hengshuang Zhao
|
||
UniGS: Unified Representation for Image Generation and Segmentation
Lu Qi · Lehan Yang · Weidong Guo · Yu Xu · Bo Du · Varun Jampani · Ming-Hsuan Yang
|
||
Meta-Point Learning and Refining for Category-Agnostic Pose Estimation
Junjie Chen · Jiebin Yan · Yuming Fang · Li Niu
|
||
A Unified Framework for Human-centric Point Cloud Video Understanding
Yiteng Xu · Kecheng Ye · xiao han · yiming ren · Xinge Zhu · Yuexin Ma
|
||
HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation
Ce Zhang · Simon Stepputtis · Joseph Campbell · Katia Sycara · Yaqi Xie
|
||
GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation, Demonstration, and Imitation
Zifan Wang · Junyu Chen · Ziqing Chen · Pengwei Xie · Rui Chen · Li Yi
|
||
Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth
Zhaoyang Sun · Shengwu Xiong · Yaxiong Chen · Yi Rong
|
||
GenTron: Diffusion Transformers for Image and Video Generation
Shoufa Chen · Mengmeng Xu · Jiawei Ren · Yuren Cong · Sen He · Yanping Xie · Animesh Sinha · Ping Luo · Tao Xiang · Juan-Manuel Pérez-Rúa
|
||
Misalignment-Robust Frequency Distribution Loss for Image Transformation
Zhangkai Ni · Juncheng Wu · Zian Wang · Wenhan Yang · Hanli Wang · Lin Ma
|
||
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Changan Chen · Kumar Ashutosh · Rohit Girdhar · David Harwath · Kristen Grauman
|
||
Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling
Liwen Wu · Sai Bi · Zexiang Xu · Fujun Luan · Kai Zhang · Iliyan Georgiev · Kalyan Sunkavalli · Ravi Ramamoorthi
|
||
360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries
Huajian Huang · Changkun Liu · Yipeng Zhu · Hui Cheng · Tristan Braud · Sai-Kit Yeung
|
||
Physical Property Understanding from Language-Embedded Feature Fields
Albert J. Zhai · Yuan Shen · Emily Y. Chen · Gloria Wang · Xinlei Wang · Sheng Wang · Kaiyu Guan · Shenlong Wang
|
||
Task-Customized Mixture of Adapters for General Image Fusion
Pengfei Zhu · Yang Sun · Bing Cao · Qinghua Hu
|
||
SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples
Phillip Howard · Avinash Madasu · Tiep Le · Gustavo Lujan-Moreno · Anahita Bhiwandiwalla · Vasudev Lal
|
||
Convolutional Prompting meets Language Models for Continual Learning
ANURAG Roy · Riddhiman Moulick · Vinay Verma · Saptarshi Ghosh · Abir Das
|
||
Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering
Vivek Gopalakrishnan · Neel Dey · Polina Golland
|
||
JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients
Woo Kyoung Han · Sunghoon Im · Jaedeok Kim · Kyong Hwan Jin
|
||
Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning
Tung Le · Khai Nguyen · shanlin sun · Nhat Ho · Xiaohui Xie
|
||
Generative Powers of Ten
Xiaojuan Wang · Janne Kontkanen · Brian Curless · Steve Seitz · Ira Kemelmacher-Shlizerman · Ben Mildenhall · Pratul P. Srinivasan · Dor Verbin · Aleksander Holynski
|
||
SuperPrimitive: Scene Reconstruction at a Primitive Level
Kirill Mazur · Gwangbin Bae · Andrew J. Davison
|
||
Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation
Renshuai Liu · Bowen Ma · Wei Zhang · Zhipeng Hu · Changjie Fan · Tangjie Lv · Yu Ding · Xuan Cheng
|
||
Geometrically-informed aggregation for zero-shot point cloud understanding
Guofeng Mei · Luigi Riz · Yiming Wang · Fabio Poiesi
|
||
From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding
Yonglu Li · Xiaoqian Wu · Xinpeng Liu · Zehao Wang · Yiming Dou · Yikun Ji · Junyi Zhang · Yixing Li · Xudong LU · Jingru Tan · Cewu Lu
|
||
Learning Degradation-unaware Representation with Prior-based Latent Transformations for Blind Face Restoration
Lianxin Xie · csbingbing zheng · Wen Xue · Le Jiang · Cheng Liu · Si Wu · Hau San Wong
|
||
ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection
Yichen Bai · Zongbo Han · Bing Cao · Xiaoheng Jiang · Qinghua Hu · Changqing Zhang
|
||
Unsupervised 3D Structure Inference from Category-Specific Image Collections
Weikang Wang · Dongliang Cao · Florian Bernard
|
||
Identifying Important Group of Pixels using Interactions
Kosuke Sumiyasu · Kazuhiko Kawamoto · Hiroshi Kera
|
||
Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields
Joshua Ahn · Haochen Wang · Raymond A. Yeh · Greg Shakhnarovich
|
||
POPDG: Popular 3D Dance Generation with PopDanceSet
ZhenYe Luo · Min Ren · Xuecai Hu · Yongzhen Huang · Li Yao
|
||
Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation
Qiyuan Dai · Sibei Yang
|
||
RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation
Zeyuan Yang · LIU JIAGENG · Peihao Chen · Anoop Cherian · Tim Marks · Jonathan Le Roux · Chuang Gan
|
||
Driving-Video Dehazing with Non-Aligned Regularization for Safety Assistance
Junkai Fan · Jiangwei Weng · Kun Wang · Yijun Yang · Jianjun Qian · Jun Li · Jian Yang
|
||
Sharingan: A Transformer Architecture for Multi-Person Gaze Following
Samy Tafasca · Anshul Gupta · Jean-marc Odobez
|
||
Implicit Motion Function
Yue Gao · Jiahao Li · Lei Chu · Yan Lu
|
||
MRFP: Learning Generalizable Semantic Segmentation from Sim-2-Real with Multi-Resolution Feature Perturbation
Sumanth Udupa · Prajwal Gurunath · Aniruddh Sikdar · Suresh Sundaram
|
||
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
Yining Hong · Zishuo Zheng · Peihao Chen · Yian Wang · Junyan Li · Chuang Gan
|
||
Hierarchical Intra-modal Correlation Learning for Label-free 3D Semantic Segmentation
Xin Kang · Lei Chu · Jiahao Li · Xuejin Chen · Yan Lu
|
||
Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection
Yicheng Xiao · Zhuoyan Luo · Yong Liu · Yue Ma · Hengwei Bian · Yatai Ji · Yujiu Yang · Xiu Li
|
||
ChAda-ViT: Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images
Nicolas Bourriez · Ihab Bendidi · Cohen Ethan · Gabriel Watkinson · Maxime Sanchez · Guillaume Bollot · Auguste Genovesio
|
||
Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation
Ji-Jia Wu · Andy Chia-Hao Chang · Chieh-Yu Chuang · Chun-Pei Chen · Yu-Lun Liu · Min-Hung Chen · Hou-Ning Hu · Yung-Yu Chuang · Yen-Yu Lin
|
||
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
Joonmyung Choi · Sanghyeok Lee · Jaewon Chu · Minhyuk Choi · Hyunwoo J. Kim
|
||
Depth-Aware Concealed Crop Detection in Dense Agricultural Scenes
Liqiong Wang · Jinyu Yang · Yanfu Zhang · Fangyi Wang · Feng Zheng
|
||
Diffusion Models Without Attention
Jing Nathan Yan · Jiatao Gu · Alexander Rush
|
||
DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models
Khawar Islam · Muhammad Zaigham Zaheer · Arif Mahmood · Karthik Nandakumar
|
||
CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu · Quan Sun · Xiaosong Zhang · Yufeng Cui · Fan Zhang · Yue Cao · Xinlong Wang · Jingjing Liu
|
||
ExtraNeRF: Visibility-Aware View Extrapolation of Neural Radiance Fields with Diffusion Models
Meng-Li Shih · Wei-Chiu Ma · Lorenzo Boyice · Aleksander Holynski · Forrester Cole · Brian Curless · Janne Kontkanen
|
||
Real-Time Simulated Avatar from Head-Mounted Sensors
Zhengyi Luo · Jinkun Cao · Rawal Khirodkar · Alexander Winkler · Jing Huang · Kris Kitani · Weipeng Xu
|
||
Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation
Hoang Chuong Nguyen · Tianyu Wang · Jose M. Alvarez · Miaomiao Liu
|
||
AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation
Haonan Wang · Qixiang ZHANG · Yi Li · Xiaomeng Li
|
||
Byzantine-robust Decentralized Federated Learning via Dual-domain Clustering and Trust Bootstrapping
Peng Sun · Xinyang Liu · Zhibo Wang · Bo Liu
|
||
Weakly-Supervised Audio-Visual Video Parsing with Prototype-based Pseudo-Labeling
Kranthi Kumar Rachavarapu · Kalyan Ramakrishnan · A. N. Rajagopalan
|
||
State Space Models for Event Cameras
Nikola Zubic · Mathias Gehrig · Davide Scaramuzza
|
||
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk · Jaesung Huh · Evangelos Kazakos · Andrew Zisserman · Dima Damen
|
||
READ: Retrieval-Enhanced Asymmetric Diffusion for Motion Planning
Takeru Oba · Matthew Walter · Norimichi Ukita
|
||
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
Rongjie Li · Songyang Zhang · Dahua Lin · Kai Chen · Xuming He
|
||
Spectrum AUC Difference (SAUCD): Human Aligned 3D Shape Evaluation
Tianyu Luan · Zhong Li · Lele Chen · Xuan Gong · Lichang Chen · Yi Xu · Junsong Yuan
|
||
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick · Guangxing Han · Rui Hou · Sayan Nag · Ser-Nam Lim · Nicolas Ballas · Qifan Wang · Rama Chellappa · Amjad Almahairi
|
||
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury · Sayan Nag · Joseph K J · Balaji Vasan Srinivasan · Dinesh Manocha
|
||
Multi-Space Alignments Towards Universal LiDAR Segmentation
Youquan Liu · Lingdong Kong · Xiaoyang Wu · Runnan Chen · Xin Li · Liang Pan · Ziwei Liu · Yuexin Ma
|
||
DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions
Yunxiao Shi · Manish Singh · Hong Cai · Fatih Porikli
|
||
Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning
Dipam Goswami · Albin Soutif · Yuyang Liu · Sandesh Kamath · Bartłomiej Twardowski · Joost van de Weijer
|
||
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen · Aliaksandr Siarohin · Willi Menapace · Ekaterina Deyneka · Hsiang-wei Chao · Byung Jeon · Yuwei Fang · Hsin-Ying Lee · Jian Ren · Ming-Hsuan Yang · Sergey Tulyakov
|
||
HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models
Mengcheng Li · Hongwen Zhang · Yuxiang Zhang · Ruizhi Shao · Tao Yu · Yebin Liu
|
||
FSC: Few-point Shape Completion
Xianzu Wu · Xianfeng Wu · Tianyu Luan · Yajing Bai · Zhongyuan Lai · Junsong Yuan
|
||
Holistic Features are almost Sufficient for Text-to-Video Retrieval
Kaibin Tian · Ruixiang Zhao · Zijie Xin · Bangxiang Lan · Xirong Li
|
||
HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting
Yuheng Jiang · Zhehao Shen · Penghao Wang · Zhuo Su · Yu Hong · Yingliang Zhang · Jingyi Yu · Lan Xu
|
||
IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images
Yushuang Wu · Luyue Shi · Junhao Cai · Weihao Yuan · Lingteng Qiu · Zilong Dong · Liefeng Bo · Shuguang Cui · Xiaoguang Han
|
||
Doubly Abductive Counterfactual Inference for Text-based Image Editing
Xue Song · Jiequan Cui · Hanwang Zhang · Jingjing Chen · Richang Hong · Yu-Gang Jiang
|
||
AV-RIR: Audio-Visual Room Impulse Response Estimation
Anton Ratnarajah · Sreyan Ghosh · Sonal Kumar · Purva Chiniya · Dinesh Manocha
|
||
E-GPS: Explainable Geometry Problem Solving via Top-Down Solver and Bottom-Up Generator
Wenjun Wu · Lingling Zhang · Jun Liu · Xi Tang · Yaxian Wang · Shaowei Wang · QianYing Wang
|
||
CAGE: Controllable Articulation GEneration
Jiayi Liu · Hou In Ivan Tam · Ali Mahdavi Amiri · Manolis Savva
|
||
Fine-grained Bipartite Concept Factorization for Clustering
Chong Peng · Pengfei Zhang · Yongyong Chen · zhao kang · Chenglizhao Chen · Qiang Cheng
|
||
StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation
Sidi Wu · Yizi Chen · Loic Landrieu · Nicolas Gonthier · Samuel Mermet · Lorenz Hurni · Konrad Schindler
|
||
GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds
Prashant Kumar · Kshitij Madhav Bhat · Vedang Bhupesh Shenvi Nadkarni · Prem Kalra
|
||
ESCAPE: Encoding Super-keypoints for Category-Agnostic Pose Estimation
Khoi D Nguyen · Chen Li · Gim Hee Lee
|
||
ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models
Fei Kong · Jinhao Duan · Lichao Sun · Hao Cheng · Renjing Xu · Heng Tao Shen · Xiaofeng Zhu · Xiaoshuang Shi · Kaidi Xu
|
||
Unbiased Estimator for Distorted Conic in Camera Calibration
Chaehyeon Song · Jaeho Shin · Myung-Hwan Jeon · Jongwoo Lim · Ayoung Kim
|
||
Accurate Training Data for Occupancy Map Prediction in Automated Driving using Evidence Theory
Jonas Kälble · Sascha Wirges · Maxim Tatarchenko · Eddy Ilg
|
||
FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio
Chao Xu · Yang Liu · Jiazheng Xing · Weida Wang · Mingze Sun · Jun Dan · Tianxin Huang · Siyuan Li · Zhi-Qi Cheng · Ying Tai · Baigui Sun
|
||
Backdoor Defense via Test-Time Detecting and Repairing
Jiyang Guan · Jian Liang · Ran He
|
||
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution
Zhixuan Liang · Yao Mu · Hengbo Ma · Masayoshi Tomizuka · Mingyu Ding · Ping Luo
|
||
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
Chunlong Xia · Xinliang Wang · Feng Lv · Xin Hao · Yifeng Shi
|
||
Clustering for Protein Representation Learning
Ruijie Quan · Wenguan Wang · Fan Ma · Hehe Fan · Yi Yang
|
||
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models
Yufan Chen · Jiaming Zhang · Kunyu Peng · Junwei Zheng · Ruiping Liu · Philip H.S. Torr · Rainer Stiefelhagen
|
||
What Sketch Explainability Really Means for Downstream Tasks?
Hmrishav Bandyopadhyay · Pinaki Nath Chowdhury · Ayan Kumar Bhunia · Aneeshan Sain · Tao Xiang · Yi-Zhe Song
|
||
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Lingmin Ran · Xiaodong Cun · Jia-Wei Liu · Rui Zhao · Song Zijie · Xintao Wang · Jussi Keppo · Mike Zheng Shou
|
||
SURE: SUrvey REcipes for building reliable and robust deep networks
Yuting Li · Yingyi Chen · Xuanlong Yu · Dexiong Chen · Xi Shen
|
||
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving
Mingfu Liang · Jong-Chyi Su · Samuel Schulter · Sparsh Garg · Shiyu Zhao · Ying Wu · Manmohan Chandraker
|
||
Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis
Tianci Bi · Xiaoyi Zhang · Zhizheng Zhang · Wenxuan Xie · Cuiling Lan · Yan Lu · Nanning Zheng
|
||
Label Propagation for Zero-shot Classification with Vision-Language Models
Vladan Stojnić · Yannis Kalantidis · Giorgos Tolias
|
||
Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation
Zhipeng Du · Miaojing Shi · Jiankang Deng
|
||
CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images
Aaron Gokaslan · A. Feder Cooper · Jasmine Collins · Landan Seguin · Austin Jacobson · Mihir Patel · Jonathan Frankle · Cory Stephenson · Volodymyr Kuleshov
|
||
CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
Kangfu Mei · Mauricio Delbracio · Hossein Talebi · Zhengzhong Tu · Vishal M. Patel · Peyman Milanfar
|
||
Unsegment Anything by Simulating Deformation
Jiahao Lu · Xingyi Yang · Xinchao Wang
|
||
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
Lingdong Kong · Youquan Liu · Lai Xing Ng · Benoit Cottereau · Wei Tsang Ooi
|
||
$\mathcal{Z}^*$: Zero-shot $\underline{S}$tyle $\underline{T}$ransfer via $\underline{A}$ttention $\underline{R}$eweighting
Yingying Deng · Xiangyu He · Fan Tang · Weiming Dong
|
||
VAREN: Very Accurate and Realistic Equine Network
Silvia Zuffi · Ylva Mellbin · Ci Li · Markus Höschle · Hedvig Kjellström · Senya Polikovsky · Elin Hernlund · Michael J. Black
|
||
Cross-spectral Gated-RGB Stereo Depth Estimation
Samuel Brucker · Stefanie Walz · Mario Bijelic · Felix Heide
|
||
Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution
Qingping Zheng · Ling Zheng · Yuanfan Guo · Ying Li · Songcen Xu · Jiankang Deng · Hang Xu
|
||
EASE-DETR: Easing the Competition among Object Queries
Yulu Gao · Yifan Sun · Xudong Ding · Chuyang Zhao · Si Liu
|
||
ConCon-Chi: Concept-Context Chimera Benchmark for Personalized Vision-Language Tasks
Andrea Rosasco · Stefano Berti · Giulia Pasquale · Damiano Malafronte · Shogo Sato · Hiroyuki Segawa · Tetsugo Inada · Lorenzo Natale
|
||
Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous and Instruction-guided Driving
Brian Yang · Huangyuan Su · Nikolaos Gkanatsios · Tsung-Wei Ke · Ayush Jain · Jeff Schneider · Katerina Fragkiadaki
|
||
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Andong Wang · Bo Wu · Sunli Chen · Zhenfang Chen · Haotian Guan · Wei-Ning Lee · Li Erran Li · Chuang Gan
|
||
ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis
Xiangjun Gao · Xiaoyu Li · Chaopeng Zhang · Qi Zhang · Yan-Pei Cao · Ying Shan · Long Quan
|
||
Flow-Guided Online Stereo Rectification for Wide Baseline Stereo
Anush Kumar · Fahim Mannan · Omid Hosseini Jafari · Shile Li · Felix Heide
|
||
MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning
Zhe Li · Laurence Yang · Bocheng Ren · Xin Nie · Zhangyang Gao · Cheng Tan · Stan Z. Li
|
||
ProTeCt: Prompt Tuning for Taxonomic Open Set Classification
Tz-Ying Wu · Chih-Hui Ho · Nuno Vasconcelos
|
||
Mosaic-SDF for 3D Generative Models
Lior Yariv · Omri Puny · Oran Gafni · Yaron Lipman
|
||
FreeMan: Towards benchmarking 3D human pose estimation under Real-World Conditions
Jiong WANG · Fengyu Yang · Bingliang Li · Wenbo Gou · Danqi Yan · Ailing Zeng · Yijun Gao · Junle Wang · Yanqing Jing · Ruimao Zhang
|
||
Characteristics Matching Based Hash Codes Generation for Efficient Fine-grained Image Retrieval
Zhen-Duo Chen · Li-Jun Zhao · Zi-Chao Zhang · Xin Luo · Xin-Shun Xu
|
||
Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing
Bingyan Liu · Chengyu Wang · Tingfeng Cao · Kui Jia · Jun Huang
|
||
DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video
Huiqiang Sun · Xingyi Li · Liao Shen · Xinyi Ye · Ke Xian · Zhiguo Cao
|
||
TUMTraf V2X Cooperative Perception Dataset
Walter Zimmer · Gerhard Arya Wardana · Suren Sritharan · Xingcheng Zhou · Rui Song · Alois Knoll
|
||
Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models
Matthew Kowal · Richard P. Wildes · Kosta Derpanis
|
||
A Pedestrian is Worth One Prompt: Towards Language Guidance Person Re-Identification
Zexian Yang · Dayan Wu · Chenming Wu · Zheng Lin · Jingzi GU · Weiping Wang
|
||
HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation
Linglin Jing · Yiming Ding · Yunpeng Gao · Zhigang Wang · Xu Yan · Dong Wang · Gerald Schaefer · Hui Fang · Bin Zhao · Xuelong Li
|
||
Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection
Taeheon Kim · Sebin Shin · Youngjoon Yu · Hak Gu Kim · Yong Man Ro
|
||
Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects
Yijia Weng · Bowen Wen · Jonathan Tremblay · Valts Blukis · Dieter Fox · Leonidas Guibas · Stan Birchfield
|
||
Human Gaussian Splatting: Real-time Rendering of Animatable Avatars
Arthur Moreau · Jifei Song · Helisa Dhamo · Richard Shaw · Yiren Zhou · Eduardo Pérez-Pellitero
|
||
Learning to Remove Wrinkled Transparent Film with Polarized Prior
Jiaqi Tang · RUIZHENG WU · Xiaogang Xu · Sixing Hu · Ying-Cong Chen
|
||
Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training
Yipeng Gao · Zeyu Wang · Wei-Shi Zheng · Cihang Xie · Yuyin Zhou
|
||
KeyPoint Relative Position Encoding for Face Recognition
Minchul Kim · Feng Liu · Yiyang Su · Anil Jain · Xiaoming Liu
|
||
Training Vision Transformers for Semi-Supervised Semantic Segmentation
Xinting Hu · Li Jiang · Bernt Schiele
|
||
Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
Xiang Li · Jinglu Wang · Xiaohao Xu · Xiulian Peng · Rita Singh · Yan Lu · Bhiksha Raj
|
||
Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling
Xinhang Liu · Yu-Wing Tai · Chi-Keung Tang · Pedro Miraldo · Suhas Lohit · Moitreya Chatterjee
|
||
NECA: Neural Customizable Human Avatar
Junjin Xiao · Qing Zhang · Zhan Xu · Wei-Shi Zheng
|
||
VLP: Vision Language Planning for Autonomous Driving
Chenbin Pan · Burhaneddin Yaman · Tommaso Nesti · Abhirup Mallik · Alessandro G Allievi · Senem Velipasalar · Liu Ren
|
||
Adversarial Text to Continuous Image Generation
Kilichbek Haydarov · Aashiq Muhamed · Xiaoqian Shen · Jovana Lazarevic · Ivan Skorokhodov · Chamuditha Jayanga Galappaththige · Mohamed Elhoseiny
|
||
Robust Overfitting Does Matter: Test-Time Adversarial Purification With FGSM
Linyu Tang · Lei Zhang
|
||
Inversion-Free Image Editing with Language-Guided Diffusion Models
Sihan Xu · Yidong Huang · Jiayi Pan · Ziqiao Ma · Joyce Chai
|
||
Uncertainty-aware Action Decoupling Transformer for Action Anticipation
Hongji Guo · Nakul Agarwal · Shao-Yuan Lo · Kwonjoon Lee · Qiang Ji
|
||
Boosting Adversarial Training via Fisher-Rao Norm-based Regularization
Xiangyu Yin · Wenjie Ruan
|
||
InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
Xiefan Guo · Jinlin Liu · Miaomiao Cui · Jiankai Li · Hongyu Yang · Di Huang
|
||
SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation
Chen Sichen · Yingyi Zhang · Siming Huang · Ran Yi · Ke Fan · Ruixin Zhang · Peixian Chen · Jun Wang · Shouhong Ding · Lizhuang Ma
|
||
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
Feng Liang · Bichen Wu · Jialiang Wang · Licheng Yu · Kunpeng Li · Yinan Zhao · Ishan Misra · Jia-Bin Huang · Peizhao Zhang · Peter Vajda · Diana Marculescu
|
||
DiffLoc: Diffusion Model for Outdoor LiDAR Localization
Wen Li · Yuyang Yang · Shangshu Yu · Guosheng Hu · Chenglu Wen · Ming Cheng · Cheng Wang
|
||
PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection
Kuan-Chih Huang · Weijie Lyu · Ming-Hsuan Yang · Yi-Hsuan Tsai
|
||
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis
Bichen Wu · Ching-Yao Chuang · Xiaoyan Wang · Yichen Jia · Kapil Krishnakumar · Tong Xiao · Feng Liang · Licheng Yu · Peter Vajda
|
||
SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion
Hsuan-I Ho · Jie Song · Otmar Hilliges
|
||
Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
ZHIXIANG WEI · Lin Chen · Xiaoxiao Ma · Huaian Chen · Tianle Liu · Pengyang Ling · Jinjin Zheng · Ben Wang · Yi Jin
|
||
OpenStreetView-5M: The Many Roads to Global Visual Geolocation
Guillaume Astruc · Nicolas Dufour · Ioannis Siglidis · Constantin Aronssohn · Nacim Bouia · Stephanie Fu · Romain Loiseau · Van Nguyen Nguyen · Charles Raude · Elliot Vincent · Lintao XU · Hongyu Zhou · Loic Landrieu
|
||
An Asymmetric Augmented Self-Supervised Learning Method for Unsupervised Fine-Grained Image Hashing
Feiran Hu · Chenlin Zhang · Jiangliang GUO · Xiu-Shen Wei · Lin Zhao · Anqi Xu · Lingyan Gao
|
||
MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model
Kaiyu Song · Hanjiang Lai · Yan Pan · Jian Yin
|
||
Action Scene Graphs for Long-Form Understanding of Egocentric Videos
Ivan Rodin · Antonino Furnari · Kyle Min · Subarna Tripathi · Giovanni Maria Farinella
|
||
Multi-Session SLAM using Wide-Baseline Optical Flow
Lahav Lipson · Jia Deng
|
||
Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation
Xiao Ma · Sumit Patidar · Iain Haughton · Stephen James
|
||
Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models
Peifei Zhu · Tsubasa Takahashi · Hirokatsu Kataoka
|
||
UnionFormer: Unified-Learning Transformer with Multi-View Representation for Image Manipulation Detection and Localization
Shuaibo Li · Wei Ma · Jianwei Guo · Shibiao Xu · Benchong Li · Xiaopeng Zhang
|
||
EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models
Jingyuan Yang · Jiawei Feng · Hui Huang
|
||
Decoupled Pseudo-labeling in Semi-Supervised Monocular 3D Object Detection
Jiacheng Zhang · Jiaming Li · Xiangru Lin · Wei Zhang · Xiao Tan · Junyu Han · Errui Ding · Jingdong Wang · Guanbin Li
|
||
Realigning Confidence with Temporal Saliency Information for Point-Level Weakly-Supervised Temporal Action Localization
Ziying Xia · Jian Cheng · Siyu Liu · Yongxiang Hu · Shiguang Wang · Zhang Yijie · Wanli Dang
|
||
Selective, Interpretable and Motion Consistent Privacy Attribute Obfuscation for Action Recognition
Filip Ilic · He Zhao · Thomas Pock · Richard P. Wildes
|
||
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image
Kyle Sargent · Zizhang Li · Tanmay Shah · Charles Herrmann · Hong-Xing Yu · Yunzhi Zhang · Eric Ryan Chan · Dmitry Lagun · Li Fei-Fei · Deqing Sun · Jiajun Wu
|
||
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
Haozhe Xie · Zhaoxi Chen · Fangzhou Hong · Ziwei Liu
|
||
Noisy-Correspondence Learning for Text-to-Image Person Re-identification
Yang Qin · Yingke Chen · Dezhong Peng · Xi Peng · Joey Tianyi Zhou · Peng Hu
|
||
LEMON: Learning 3D Human-Object Interaction Relation from 2D Images
Yuhang Yang · Wei Zhai · Hongchen Luo · Yang Cao · Zheng-Jun Zha
|
||
Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection
Zhanwei Zhang · Minghao Chen · Shuai Xiao · Liang Peng · Hengjia Li · Binbin Lin · Ping Li · Wenxiao Wang · Boxi Wu · Deng Cai
|
||
Brush2Prompt: Contextual Prompt Generator for Object Inpainting
Mang Tik Chiu · Yuqian Zhou · Lingzhi Zhang · Zhe Lin · Connelly Barnes · Sohrab Amirghodsi · Eli Shechtman · Humphrey Shi
|
||
NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild
weining ren · Zihan Zhu · Boyang Sun · Jiaqi Chen · Marc Pollefeys · Songyou Peng
|
||
Step differences in instructional video
Tushar Nagarajan · Lorenzo Torresani
|
||
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tongjia Chen · Hongshan Yu · Zhengeng Yang · Zechuan Li · Wei Sun · Chen Chen
|
||
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
Chaoqin Huang · Aofan Jiang · Jinghao Feng · Ya Zhang · Xinchao Wang · Yanfeng Wang
|
||
SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining
Chull Hwan Song · Taebaek Hwang · Jooyoung Yoon · Shunghyun Choi · Yeong Hyeon Gu
|
||
Total Selfie: Generating Full-Body Selfies
Bowei Chen · Brian Curless · Ira Kemelmacher-Shlizerman · Steve Seitz
|
||
Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences
Minyoung Hwang · Luca Weihs · Chanwoo Park · Kimin Lee · Aniruddha Kembhavi · Kiana Ehsani
|
||
LayoutFormer: Hierarchical Text Detection Towards Scene Text Understanding
Min Liang · Jia-Wei Ma · Xiaobin Zhu · Jingyan Qin · Xu-Cheng Yin
|
||
Accelerating Neural Field Training via Soft Mining
Shakiba Kheradmand · Daniel Rebain · Gopal Sharma · Hossam Isack · Abhishek Kar · Andrea Tagliasacchi · Kwang Moo Yi
|
||
PEGASUS: Personalized Generative 3D Avatars with Composable Attributes
Hyunsoo Cha · Byungjun Kim · Hanbyul Joo
|
||
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo · Yufan Shen · Zhaoqing Zhu · Qi Zheng · Zhi Yu · Cong Yao
|
||
Continuous Pose for Monocular Cameras in Neural Implicit Representation
Qi Ma · Danda Paudel · Ajad Chhatkuli · Luc Van Gool
|
||
Depth Prompting for Sensor-Agnostic Depth Estimation
Jin-Hwi Park · Chanhwi Jeong · Junoh Lee · Hae-Gon Jeon
|
||
Modality-Collaborative Test-Time Adaptation for Action Recognition
Baochen Xiong · Xiaoshan Yang · Yaguang Song · Yaowei Wang · Changsheng Xu
|
||
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning
Siyuan Liang · Mingli Zhu · Aishan Liu · Baoyuan Wu · Xiaochun Cao · Ee-Chien Chang
|
||
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
Yuchao Gu · Xintao Wang · Yixiao Ge · Ying Shan · Mike Zheng Shou
|
||
Weakly Supervised Video Individual Counting
Xinyan Liu · Guorong Li · Yuankai Qi · Ziheng Yan · Zhenjun Han · Anton van den Hengel · Ming-Hsuan Yang · Qingming Huang
|
||
SHINOBI: SHape and Illumination using Neural Object decomposition via BRDF optimization and Inverse rendering from unconstrained Image collections
Andreas Engelhardt · Amit Raj · Mark Boss · Yunzhi Zhang · Abhishek Kar · Yuanzhen Li · Ricardo Martin-Brualla · Jonathan T. Barron · Deqing Sun · Hendrik Lensch · Varun Jampani
|
||
LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking
Jialin Li · Qiang Nie · Weifu Fu · Yuhuan Lin · Guangpin Tao · Yong Liu · Chengjie Wang
|
||
Learning Group Activity Features Through Person Attribute Prediction
Chihiro Nakatani · Hiroaki Kawashima · Norimichi Ukita
|
||
FairRAG: Fair Human Generation via Fair Retrieval Augmentation
Robik Shrestha · Yang Zou · Qiuyu Chen · Zhiheng Li · Yusheng Xie · Siqi Deng
|
||
MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections
mude hui · Zihao Wei · Hongru Zhu · Fei Xia · Yuyin Zhou
|
||
Visual Layout Composer: Image-Vector Dual Diffusion Model for Design Layout Generation
Mohammad Amin Shabani · Zhaowen Wang · Difan Liu · Nanxuan Zhao · Jimei Yang · Yasutaka Furukawa
|
||
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Sicong Leng · Hang Zhang · Guanzheng Chen · Xin Li · Shijian Lu · Chunyan Miao · Lidong Bing
|
||
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
Yixun Liang · Xin Yang · Jiantao Lin · Haodong LI · Xiaogang Xu · Ying-Cong Chen
|
||
AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation
Taeckyung Lee · Sorn Chottananurak · Taesik Gong · Sung-Ju Lee
|
||
A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
Li Maomao · Yu Li · Tianyu Yang · Yunfei Liu · Dongxu Yue · Zhihui Lin · Dong Xu
|
||
A Simple Recipe for Language-guided Domain Generalized Segmentation
Mohammad Fahes · TUAN-HUNG VU · Andrei Bursuc · Patrick Pérez · Raoul de Charette
|
||
Gradient Alignment for Cross-domain Face Anti-Spoofing
MINH BINH LE · Simon Woo
|
||
Multi-Object Tracking in the Dark
Xinzhe Wang · Kang Ma · Qiankun Liu · Yunhao Zou · Ying Fu
|
||
RoMa: Robust Dense Feature Matching
Johan Edstedt · Qiyu Sun · Georg Bökman · Mårten Wadenbäck · Michael Felsberg
|
||
ReconFusion: 3D Reconstruction with Diffusion Priors
Rundi Wu · Ben Mildenhall · Philipp Henzler · Ruiqi Gao · Keunhong Park · Daniel Watson · Pratul P. Srinivasan · Dor Verbin · Jonathan T. Barron · Ben Poole · Aleksander Holynski
|
||
Modality-agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention
Ju-Hyeon Nam · Nur Suriza Syazwany · Su Jung Kim · Sang-Chul Lee
|
||
DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis
Yuming Gu · Hongyi Xu · You Xie · Guoxian Song · Yichun Shi · Di Chang · Jing Yang · Linjie Luo
|
||
Video Prediction by Modeling Videos as Continuous Multi-Dimensional Processes
Gaurav Shrivastava · Abhinav Shrivastava
|
||
Unsupervised Keypoints from Pretrained Diffusion Models
Eric Hedlin · Gopal Sharma · Shweta Mahajan · Xingzhe He · Hossam Isack · Abhishek Kar · Helge Rhodin · Andrea Tagliasacchi · Kwang Moo Yi
|
||
Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving
Jinlong Li · Baolu Li · Zhengzhong Tu · Xinyu Liu · Qing Guo · Felix Juefei Xu · Runsheng Xu · Hongkai Yu
|
||
MaGGIe: Masked Guided Gradual Human Instance Matting
Chuong Huynh · Seoung Wug Oh · Abhinav Shrivastava · Joon-Young Lee
|
||
JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments
Duy Tho Le · Chenhui Gou · Stavya Datta · Hengcan Shi · Ian Reid · Jianfei Cai · Hamid Rezatofighi
|
||
DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
Haonan Lin
|
||
Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer
Yuang Ai · Xiaoqiang Zhou · Huaibo Huang · Lei Zhang · Ran He
|
||
Intensity-Robust Autofocus for Spike Camera
Changqing Su · Zhiyuan Ye · Yongsheng Xiao · You Zhou · Zhen Cheng · Bo Xiong · Zhaofei Yu · Tiejun Huang
|
||
LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset
Haolin Liu · Chongjie Ye · Yinyu Nie · Yingfan He · Xiaoguang Han
|
||
Generative Multimodal Models are In-Context Learners
Quan Sun · Yufeng Cui · Xiaosong Zhang · Fan Zhang · Qiying Yu · Yueze Wang · Yongming Rao · Jingjing Liu · Tiejun Huang · Xinlong Wang
|
||
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Zigang Geng · Binxin Yang · Tiankai Hang · Chen Li · Shuyang Gu · Ting Zhang · Jianmin Bao · Zheng Zhang · Houqiang Li · Han Hu · Dong Chen · Baining Guo
|
||
GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement
Linfang Zheng · Tze Ho Elden Tse · Chen Wang · Yinghan Sun · Hua Chen · Aleš Leonardis · Wei Zhang · Hyung Jin Chang
|
||
When StyleGAN Meets Stable Diffusion: a ${\mathcal{W}_+}$ Adapter for Personalized Image Generation
Xiaoming Li · Xinyu Hou · Chen Change Loy
|
||
Learning from Synthetic Human Group Activities
Che-Jui Chang · Danrui Li · Deep Patel · Parth Goel · Seonghyeon Moon · Samuel Sohn · Honglu Zhou · Sejong Yoon · Vladimir Pavlovic · Mubbasir Kapadia
|
||
Prompt-enhanced Multiple Instance Learning for Weakly Supervised Anomaly Detection
Junxi Chen · Liang Li · Li Su · Zheng-Jun Zha · Qingming Huang
|
||
OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers
Han Liang · Jiacheng Bao · Ruichi Zhang · Sihan Ren · Yuecheng Xu · Sibei Yang · Xin Chen · Jingyi Yu · Lan Xu
|
||
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange
Yanhao Wu · Tong Zhang · Wei Ke · Congpei Qiu · Sabine Süsstrunk · Mathieu Salzmann
|
||
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang · Xiaohan Ding · Kaixiong Gong · Yixiao Ge · Ying Shan · Xiangyu Yue
|
||
Rethinking Multi-view Representation Learning via Distilled Disentangling
Guanzhou Ke · Bo Wang · Xiao-Li Wang · Shengfeng He
|
||
Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations
Lei Fan · Jianxiong Zhou · Xiaoying Xing · Ying Wu
|
||
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
Shuai Yang · Yifan Zhou · Ziwei Liu · Chen Change Loy
|
||
AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection
Trevine Oorloff · Surya Koppisetti · Nicolo Bonettini · Divyaraj Solanki · Ben Colman · Yaser Yacoob · Ali Shahriyari · Gaurav Bharaj
|
||
DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction
Weiyi Lv · Yuhang Huang · Ning Zhang · Ruei-Sung Lin · Mei Han · Dan Zeng
|
||
Condition-Aware Neural Network for Controlled Image Generation
Han Cai · Muyang Li · Qinsheng Zhang · Ming-Yu Liu · Song Han
|
||
Preserving Fairness Generalization in Deepfake Detection
Li Lin · Xinan He · Yan Ju · Xin Wang · Feng Ding · Shu Hu
|
||
On The Vulnerability of Efficient Vision Transformers to Adversarial Computation Attacks
Navaneet K L · Soroush Abbasi Koohpayegani · Essam Sleiman · Hamed Pirsiavash
|
||
Transcriptomics-guided Slide Representation Learning in Computational Pathology
Guillaume Jaume · Lukas Oldenburg · Anurag Vaidya · Richard J. Chen · Drew F. K. Williamson · Thomas Peeters · Andrew Song · Faisal Mahmood
|
||
DART: Implicit Doppler Tomography for Radar Novel View Synthesis
Tianshu Huang · John Miller · Akarsh Prabhakara · Tao Jin · Tarana Laroia · Zico Kolter · Anthony Rowe
|
||
Sparse views, Near light: A practical paradigm for uncalibrated point-light photometric stereo
Mohammed Brahimi · Bjoern Haefner · Zhenzhang Ye · Bastian Goldluecke · Daniel Cremers
|
||
EgoGen: An Egocentric Synthetic Data Generator
Gen Li · Kaifeng Zhao · Siwei Zhang · Xiaozhong Lyu · Mihai Dusmanu · Yan Zhang · Marc Pollefeys · Siyu Tang
|
||
PanoContext-Former: Panoramic Total Scene Understanding with a Transformer
Yuan Dong · Chuan Fang · Liefeng Bo · Zilong Dong · Ping Tan
|
||
EGTR: Extracting Graph from Transformer for Scene Graph Generation
Jinbae Im · JeongYeon Nam · Nokyung Park · Hyungmin Lee · Seunghyun Park
|
||
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations
Sangmin Lee · Bolin Lai · Fiona Ryan · Bikram Boote · James Rehg
|
||
EvDiG: Event-guided Direct and Global Components Separation
Xinyu Zhou · Peiqi Duan · Boyu Li · Chu Zhou · Chao Xu · Boxin Shi
|
||
360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model
Qian Wang · Weiqi Li · Chong Mou · Xinhua Cheng · Jian Zhang
|
||
FreeKD: Knowledge Distillation via Semantic Frequency Prompt
Yuan Zhang · Tao Huang · Jiaming Liu · Tao Jiang · Kuan Cheng · Shanghang Zhang
|
||
Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation
Shanshan Zhong · Zhongzhan Huang · Shanghua Gao · Wushao Wen · Liang Lin · Marinka Zitnik · Pan Zhou
|
||
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis · Zhaoyan Liu · Satya Krishna Gorti · Valentin Villecroze · Jesse C. Cresswell · Guangwei Yu · Gabriel Loaiza-Ganem · Maksims Volkovs
|
||
AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement
Shiwei Jin · Zhen Wang · Lei Wang · Peng Liu · Ning Bi · Truong Nguyen
|
||
Towards HDR and HFR Video from Rolling-Mixed-Bit Spikings
Yakun Chang · Yeliduosi Xiaokaiti · Yujia Liu · Bin Fan · Zhaojun Huang · Tiejun Huang · Boxin Shi
|
||
USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
Xiaoqi Wang · Wenbin He · Xiwei Xuan · Clint Sebastian · Jorge Piazentin Ono · Xin Li · Sima Behpour · Thang Doan · Liang Gou · Shen · Liu Ren
|
||
Solving Masked Jigsaw Puzzles with Diffusion Transformers
Jinyang Liu · Wondmgezahu Teshome · Sandesh Ghimire · Mario Sznaier · Octavia Camps
|
||
RoHM: Robust Human Motion Reconstruction via Diffusion
Siwei Zhang · Bharat Lal Bhatnagar · Yuanlu Xu · Alexander Winkler · Petr Kadlecek · Siyu Tang · Federica Bogo
|
||
TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
Yu-Ying Yeh · Jia-Bin Huang · Changil Kim · Lei Xiao · Thu Nguyen-Phuoc · Numair Khan · Cheng Zhang · Manmohan Chandraker · Carl Marshall · Zhao Dong · Zhengqin Li
|
||
Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Yuechen Zhang · Shengju Qian · Bohao Peng · Shu Liu · Jiaya Jia
|
||
ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation
Jia-Hao Wu · Fu-Jen Tsai · Yan-Tsung Peng · Charles Tsai · Chia-Wen Lin · Yen-Yu Lin
|
||
Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households
Zhihao Cao · ZiDong Wang · Siwen Xie · Anji Liu · Lifeng Fan
|
||
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan · Fuxiao Liu · Xiyang Wu · Ruiqi Xian · Zongxia Li · Xiaoyu Liu · Xijun Wang · Lichang Chen · Furong Huang · Yaser Yacoob · Dinesh Manocha · Tianyi Zhou
|
||
CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs
Haocheng Yuan · Jing Xu · Hao Pan · Adrien Bousseau · Niloy J. Mitra · Changjian Li
|
||
ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention
Jiawei Wang · Changjian Li
|
||
Active Domain Adaptation with False Negative Prediction for Object Detection
Yuzuru Nakamura · Yasunori Ishii · Takayoshi Yamashita
|
||
HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models
Li Pang · Xiangyu Rui · Long Cui · Hongzhong Wang · Deyu Meng · Xiangyong Cao
|
||
Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning
Zihua Zhao · Mengxi Chen · Tianjie Dai · Jiangchao Yao · Bo Han · Ya Zhang · Yanfeng Wang
|
||
Towards Automatic Power Battery Detection: New Challenge, Benchmark Dataset and Baseline
Xiaoqi Zhao · Youwei Pang · Zhenyu Chen · Qian Yu · Lihe Zhang · Hanqi Liu · Jiaming Zuo · Huchuan Lu
|
||
Cinematic Behavior Transfer via NeRF-based Differentiable Filming
Xuekun Jiang · Anyi Rao · Jingbo Wang · Dahua Lin · Bo Dai
|
||
Random Entangled Tokens for Adversarially Robust Vision Transformer
Huihui Gong · Minjing Dong · Siqi Ma · Seyit Camtepe · Surya Nepal · Chang Xu
|
||
$360+x$: A Panoptic Multi-modal Scene Understanding Dataset
Hao Chen · Yuqi Hou · Chenyuan Qu · Irene Testini · Xiaohan Hong · Jianbo Jiao
|
||
Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery
Yuqi Zhang · Guanying Chen · Jiaxing Chen · Shuguang Cui
|
||
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
Yangyi Chen · Karan Sikka · Michael Cogswell · Heng Ji · Ajay Divakaran
|
||
Rich Human Feedback for Text-to-Image Generation
Youwei Liang · Junfeng He · Gang Li · Peizhao Li · Arseniy Klimovskiy · Nicholas Carolan · Jiao Sun · Jordi Pont-Tuset · Sarah Young · Feng Yang · Junjie Ke · Krishnamurthy Dvijotham · Katherine Collins · Yiwen Luo · Yang Li · Kai Kohlhoff · Deepak Ramachandran · Vidhya Navalpakkam
|
||
FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment
Jinglin Xu · Sibo Yin · Guohao Zhao · Zishuo Wang · Yuxin Peng
|
||
Readout Guidance: Learning Control from Diffusion Features
Grace Luo · Trevor Darrell · Oliver Wang · Dan B Goldman · Aleksander Holynski
|
||
MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning
Yixin Liu · Chenrui Fan · Yutong Dai · Xun Chen · Pan Zhou · Lichao Sun
|
||
A theory of volumetric representations for opaque solids
Bailey Miller · Hanyu Chen · Alice Lai · Ioannis Gkioulekas
|
||
DiffusionLight: Light Probes for Free by Painting a Chrome Ball
Pakkapon Phongthawee · Worameth Chinchuthakun · Nontaphat Sinsunthithet · Varun Jampani · Amit Raj · Pramook Khungurn · Supasorn Suwajanakorn
|
||
PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness
Anh-Quan Cao · Angela Dai · Raoul de Charette
|
||
Neural Lineage
Runpeng Yu · Xinchao Wang
|
||
Structure-from-Motion from Pixel-wise Correspondences
Philipp Lindenberger · Paul-Edouard Sarlin · Marc Pollefeys
|
||
RNb-NeuS: Reflectance and Normal-based Multi-View 3D Reconstruction
Baptiste Brument · Robin Bruneau · Yvain Queau · Jean Mélou · Francois Lauze · Jean-Denis Durou · Lilian Calvet
|
||
3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis
Zhicheng Lu · Xiang Guo · Le Hui · Tianrui Chen · Min Yang · Xiao Tang · Feng Zhu · Yuchao Dai
|
||
HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes
Yichen Yao · Zimo Jiang · Yujing Sun · Zhencai Zhu · Xinge Zhu · Runnan Chen · Yuexin Ma
|
||
DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing
Yujun Shi · Chuhui Xue · Jun Hao Liew · Jiachun Pan · Hanshu Yan · Wenqing Zhang · Vincent Y. F. Tan · Song Bai
|
||
Adversarially Robust Few-shot Learning via Parameter Co-distillation of Similarity and Class Concept Learners
Junhao Dong · Piotr Koniusz · Junxi Chen · Xiaohua Xie · Yew-Soon Ong
|
||
Generative Latent Coding for Ultra-Low Bitrate Image Compression
Zhaoyang Jia · Jiahao Li · Bin Li · Houqiang Li · Yan Lu
|
||
Differentiable Point-based Inverse Rendering
Hoon-Gyu Chung · Seokjun Choi · Seung-Hwan Baek
|
||
GS-IR: 3D Gaussian Splatting for Inverse Rendering
Zhihao Liang · Qi Zhang · Ying Feng · Ying Shan · Kui Jia
|
||
Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Hang Du · Sicheng Zhang · Binzhu Xie · Guoshun Nan · Jiayang Zhang · Junrui Xu · Hangyu Liu · Sicong Leng · Jiangming Liu · Hehe Fan · Dajiu Huang · Jing Feng · Linli Chen · Can Zhang · Xuhuan Li · Hao Zhang · Jianhang Chen · Qimei Cui · Xiaofeng Tao
|
||
Rethinking Diffusion Model for Multi-Contrast MRI Super-Resolution
Guangyuan Li · Chen Rao · Juncheng Mo · Zhanjie Zhang · Wei Xing · Lei Zhao
|
||
Learning Equi-angular Representations for Online Continual Learning
Minhyuk Seo · Hyunseo Koh · Wonje Jeung · Minjae Lee · San Kim · Hankook Lee · Sungjun Cho · Sungik Choi · Hyunwoo Kim · Jonghyun Choi
|
||
Improving Bird’s Eye View Semantic Segmentation by Task Decomposition
Tianhao Zhao · Yongcan Chen · Yu Wu · Tianyang Liu · Bo Du · Peilun Xiao · Shi Qiu · Hongda Yang · Guozhen Li · Yi Yang · Yutian Lin
|
||
Neural Video Compression with Feature Modulation
Jiahao Li · Bin Li · Yan Lu
|
||
KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling
Yu Wang · Xin Li · Shengzhao Wen · Gang Zhang · Haixiao Yue · Haocheng Feng · Junyu Han · Errui Ding
|
||
Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network
Sizhe Zheng · Pan Gao · Peng Zhou · Jie Qin
|
||
Efficient Test-Time Adaptation of Vision-Language Models
Adilbek Karmanov · Dayan Guan · Shijian Lu · Abdulmotaleb El Saddik · Eric P. Xing
|
||
Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching
Xianqi Wang · Gangwei Xu · Hao Jia · Xin Yang
|
||
Adversarial Backdoor Attack by Naturalistic Data Poisoning on Trajectory Prediction in Autonomous Driving
Mozhgan Pourkeshavarz · Mohammad Sabokrou · Amir Rasouli
|
||
Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach
Mir Hossain Hossain · Mennatullah Siam · Leonid Sigal · Jim Little
|
||
S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data
Xuyang Li · Danfeng Hong · Jocelyn Chanussot
|
||
Object Pose Estimation via the Aggregation of Diffusion Features
Tianfu Wang · Guosheng Hu · Hongguang Wang
|
||
In Search of a Data Transformation That Accelerates Neural Field Training
Junwon Seo · Sangyoon Lee · Kwang In Kim · Jaeho Lee
|
||
Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior
Wonseok Roh · Hwanhee Jung · Giljoo Nam · Jinseop Yeom · Hyunje Park · Sang Ho Yoon · Sangpil Kim
|
||
In-distribution Public Data Synthesis with Diffusion Models for Differentially Private Image Classification
Jinseong Park · Yujin Choi · Jaewook Lee
|
||
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Yifei Chen · Dapeng Chen · Ruijin Liu · Sai Zhou · Wenyuan Xue · Wei Peng
|
||
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang · Chao Feng · Ziyang Chen · Hyoungseob Park · Daniel Wang · Yiming Dou · Ziyao Zeng · Xien Chen · Suchisrit Gangopadhyay · Andrew Owens · Alex Wong
|
||
EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars
Nikita Drobyshev · Antoni Bigata Casademunt · Konstantinos Vougioukas · Zoe Landgraf · Stavros Petridis · Maja Pantic
|
||
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Qi Yang · Xing Nie · Tong Li · Gaopengfei · Ying Guo · Cheng Zhen · Pengfei Yan · Shiming Xiang
|
||
D3still: Decoupled Differential Distillation for Asymmetric Image Retrieval
Yi Xie · Yihong Lin · Wenjie Cai · Xuemiao Xu · Huaidong Zhang · Yong Du · Shengfeng He
|
||
Delving into the Trajectory Long-tail Distribution for Multi-object Tracking
Sijia Chen · En Yu · Jinyang Li · Wenbing Tao
|
||
Single-View Refractive Index Tomography with Neural Fields
Brandon Zhao · Aviad Levis · Liam Connor · Pratul P. Srinivasan · Katherine Bouman
|
||
MTLoRA: Low-Rank Adaptation Approach for Efficient Multi-Task Learning
Ahmed Agiza · Marina Neseem · Sherief Reda
|
||
Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion
Zuoyue Li · Zhenqiang Li · Zhaopeng Cui · Marc Pollefeys · Martin R. Oswald
|
||
Adaptive Softassign via Hadamard-Equipped Sinkhorn
Binrui Shen · Qiang Niu · Shengxin Zhu
|
||
Analyzing and Improving the Training Dynamics of Diffusion Models
Tero Karras · Miika Aittala · Jaakko Lehtinen · Janne Hellsten · Timo Aila · Samuli Laine
|
||
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han · Kaixiong Gong · Yiyuan Zhang · Jiaqi Wang · Kaipeng Zhang · Dahua Lin · Yu Qiao · Peng Gao · Xiangyu Yue
|
||
See, Say, and Segment: Correcting False Premises with LMMs
Tsung-Han Wu · Giscard Biamby · David Chan · Lisa Dunlap · Ritwik Gupta · Xudong Wang · Trevor Darrell · Joseph Gonzalez
|
||
Learning Discriminative Dynamics with Label Corruption for Noisy Label Detection
Suyeon Kim · Dongha Lee · SeongKu Kang · Sukang Chae · Sanghwan Jang · Hwanjo Yu
|
||
Question Aware Vision Transformer for Multimodal Reasoning
Roy Ganz · Yair Kittenplon · Aviad Aberdam · Elad Ben Avraham · Oren Nuriel · Shai Mazor · Ron Litman
|
||
Rethinking Generalizable Face Anti-spoofing via Hierarchical Prototype-guided Distribution Refinement in Hyperbolic Space
Chengyang Hu · Ke-Yue Zhang · Taiping Yao · Shouhong Ding · Lizhuang Ma
|
||
Towards Efficient Replay in Federated Incremental Learning
Yichen Li · Qunwei Li · Haozhao Wang · Ruixuan Li · Wenliang Zhong · Guannan Zhang
|
||
SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes
Alexandros Delitzas · Ayça Takmaz · Federico Tombari · Robert Sumner · Marc Pollefeys · Francis Engelmann
|
||
ControlRoom3D: Room Generation using Semantic Controls
Jonas Schult · Sam Tsai · Lukas Hoellein · Bichen Wu · Jialiang Wang · Chih-Yao Ma · Kunpeng Li · Xiaofang Wang · Felix Wimbauer · Zijian He · Peizhao Zhang · Bastian Leibe · Peter Vajda · Ji Hou
|
||
LAN: Learning to Adapt Noise for Image Denoising
Changjin Kim · Tae Hyun Kim · Sungyong Baik
|
||
Diff-BGM: A Diffusion Model for Video Background Music Generation
Sizhe Li · Yiming Qin · Minghang Zheng · Xin Jin · Yang Liu
|
||
RMem: Restricted Memory Banks Improve Video Object Segmentation
Junbao Zhou · Ziqi Pang · Yu-Xiong Wang
|
||
DiaLoc: An Iterative Approach to Embodied Dialog Localization
Chao Zhang · Mohan Li · Ignas Budvytis · Stephan Liwicki
|
||
Artist-Friendly Relightable and Animatable Neural Heads
Yingyan Xu · Prashanth Chandran · Sebastian Weiss · Markus Gross · Gaspard Zoss · Derek Bradley
|
||
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
Thuan Nguyen · Anh Tran
|
||
AVID: Any-Length Video Inpainting with Diffusion Model
Zhixing Zhang · Bichen Wu · Xiaoyan Wang · Yaqiao Luo · Luxin Zhang · Yinan Zhao · Peter Vajda · Dimitris N. Metaxas · Licheng Yu
|
||
Circuit Design and Efficient Simulation of Quantum Inner Product and Empirical Studies of Its Effect on Near-Term Hybrid Quantum-Classic Machine Learning
Hao Xiong · Yehui Tang · Xinyu Ye · Junchi Yan
|
||
Neural Implicit Morphing of Face Images
Guilherme Schardong · Tiago Novello · Hallison Paz · Iurii Medvedev · Vinícius Silva · Luiz Velho · Nuno Gonçalves
|
||
GDA: Generalized Diffusion for Robust Test-time Adaptation
Yun-Yun Tsai · Fu-Chen Chen · Albert Chen · Junfeng Yang · Che-Chun Su · Min Sun · Cheng-Hao Kuo
|
||
Density-guided Translator Boosts Synthetic-to-Real Unsupervised Domain Adaptive Segmentation of 3D Point Clouds
Zhimin Yuan · Wankang Zeng · Yanfei Su · Weiquan Liu · Ming Cheng · Yulan Guo · Cheng Wang
|
||
SubT-MRS Datasets: Pushing SLAM Towards All-weather Environments
Shibo Zhao · Yuanjun Gao · Tianhao Wu · Damanpreet Singh · Rushan Jiang · Haoxiang Sun · Mansi Sarawata · Warren Whittaker · Ian Higgins · Shaoshu Su · Yi Du · Can Xu · John Keller · Jay Karhade · Lucas Nogueira · Sourojit Saha · Yuheng Qiu · Ji Zhang · Wenshan Wang · Chen Wang · Sebastian Scherer
|
||
SpecNeRF: Gaussian Directional Encoding for Specular Reflections
Li Ma · Vasu Agrawal · Haithem Turki · Changil Kim · Chen Gao · Pedro V. Sander · Michael Zollhoefer · Christian Richardt
|
||
DemoCaricature: Democratising Caricature Generation with a Rough Sketch
Dar-Yen Chen · Ayan Kumar Bhunia · Subhadeep Koley · Aneeshan Sain · Pinaki Nath Chowdhury · Yi-Zhe Song
|
||
SPIDeRS: Structured Polarization for Invisible Depth and Reflectance Sensing
Tomoki Ichikawa · Shohei Nobuhara · Ko Nishino
|
||
UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion
Junsheng Zhou · Weiqi Zhang · Baorui Ma · Kanle Shi · Yu-Shen Liu · Zhizhong Han
|
||
UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity
Jialong Zuo · Hanyu Zhou · Ying Nie · Feng Zhang · Tianyu Guo · Nong Sang · Yunhe Wang · Changxin Gao
|
||
Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates
Ka Chun Shum · Jaeyeon Kim · Binh-Son Hua · Thanh Nguyen · Sai-Kit Yeung
|
||
VTimeLLM: Empower LLM to Grasp Video Moments
Bin Huang · Xin Wang · Hong Chen · Zihan Song · Wenwu Zhu
|
||
HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation
Yongliang Lin · Yongzhi Su · Praveen Nathan · Sandeep Inuganti · Yan Di · Martin Sundermeyer · Fabian Manhardt · Didier Stricker · Jason Rambach · Yu Zhang
|
||
AnyScene: Customized Image Synthesis with Composited Foreground
Ruidong Chen · Lanjun Wang · Weizhi Nie · Yongdong Zhang · An-An Liu
|
||
TE-TAD: Towards Fully End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression
Ho-Joong Kim · Jung-Ho Hong · Heejo Kong · Seong-Whan Lee
|
||
Taming the Tail in Class-Conditional GANs: Knowledge Sharing via Unconditional Training at Lower Resolutions
Saeed Khorram · Mingqi Jiang · Mohamad Shahbazi · Mohamad Hosein Danesh · Li Fuxin
|
||
TRINS: Towards Multimodal Language Models That Can Read
Ruiyi Zhang · Yanzhe Zhang · Jian Chen · Yufan Zhou · Jiuxiang Gu · Changyou Chen · Tong Sun
|
||
A Unified and Interpretable Emotion Representation and Expression Generation
Reni Paskaleva · Mykyta Holubakha · Andela Ilic · Saman Motamed · Luc Van Gool · Danda Paudel
|
||
One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls
Minghui Hu · Jianbin Zheng · Chuanxia Zheng · Chaoyue Wang · Dacheng Tao · Tat-Jen Cham
|
||
ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models
Xinyu Tian · Shu Zou · Zhaoyuan Yang · Jing Zhang
|
||
LPSNet: End-to-End Human Pose and Shape Estimation with Lensless Imaging
Haoyang Ge · Qiao Feng · Hailong Jia · Xiongzheng Li · Xiangjun Yin · You Zhou · Jingyu Yang · Kun Li
|
||
6-DoF Pose Estimation with MultiScale Residual Correlation
Yuelong Li · Yafei Mao · Raja Bala · Sunil Hadap
|
||
StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN
Jongwoo Choi · Kwanggyoon Seo · Amirsaman Ashtari · Junyong Noh
|
||
CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion
Xiaoyu Wu · Yang Hua · Chumeng Liang · Jiaru Zhang · Hao Wang · Tao Song · Haibing Guan
|
||
Robust Depth Enhancement via Polarization Prompt Fusion Tuning
Kei Ikemura · Yiming Huang · Felix Heide · Zhaoxiang Zhang · Qifeng Chen · Chenyang Lei
|
||
PI3D: Efficient Text-to-3D Generation with Pseudo-Image Diffusion
Ying-Tian Liu · Yuan-Chen Guo · Guan Luo · Heyi Sun · Wei Yin · Song-Hai Zhang
|
||
Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels
Tianming Liang · Chaolei Tan · Beihao Xia · Wei-Shi Zheng · Jian-Fang Hu
|
||
DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
Xiaoyu Zhou · Zhiwei Lin · Xiaojun Shan · Yongtao Wang · Deqing Sun · Ming-Hsuan Yang
|
||
Interactive3D: Create What You Want by Interactive 3D Generation
Shaocong Dong · Lihe Ding · Zhanpeng Huang · Zibin Wang · Tianfan Xue · Dan Xu
|
||
Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning
Leonardo Iurada · Marco Ciccone · Tatiana Tommasi
|
||
CORE-MPI: Consistency Object Removal with Embedding MultiPlane Image
Donggeun Yoon · Donghyeon Cho
|
||
Amodal Ground Truth and Completion in the Wild
Guanqi Zhan · Chuanxia Zheng · Weidi Xie · Andrew Zisserman
|
||
Real-Time Exposure Correction via Collaborative Transformations and Adaptive Sampling
Ziwen Li · Feng Zhang · Meng Cao · Jinpu Zhang · Yuanjie Shao · Yuehuan Wang · Nong Sang
|
||
Gaussian Splatting SLAM
Hidenobu Matsuki · Riku Murai · Paul Kelly · Andrew J. Davison
|
||
A Simple Baseline for Efficient Hand Mesh Reconstruction
Zhishan Zhou · Shihao Zhou · Zhi Lv · Minqiang Zou · Yao Tang · Jiajun Liang
|
||
EFormer: Enhanced Transformer towards Semantic-Contour Features of Foreground for Portraits Matting
Zitao Wang · Qiguang Miao · Yue Xi · Peipei Zhao
|
||
Privacy-preserving Optics for Enhancing Protection in Face De-identification
Jhon Lopez · Carlos Hinojosa · Henry Arguello · Bernard Ghanem
|
||
BilevelPruning: Unified Dynamic and Static Channel Pruning for Convolutional Neural Networks
Shangqian Gao · Yanfu Zhang · Feihu Huang · Heng Huang
|
||
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Jiayi Guo · Xingqian Xu · Yifan Pu · Zanlin Ni · Chaofei Wang · Manushree Vasu · Shiji Song · Gao Huang · Humphrey Shi
|
||
A Simple and Effective Point-based Network for Event Camera 6-DOFs Pose Relocalization
Hongwei Ren · Jiadong Zhu · Yue Zhou · Haotian Fu · Yulong Huang · Bojun Cheng
|
||
ReCoRe: Regularized Contrastive Representation Learning of World Model
Rudra P.K. Poudel · Harit Pandya · Stephan Liwicki · Roberto Cipolla
|
||
Unmixing before Fusion: A Generalized Paradigm for Multi-Source-based Hyperspectral Image Synthesis
Yang Yu · Erting Pan · Xinya Wang · Yuheng Wu · Xiaoguang Mei · Jiayi Ma
|
||
Utility-Fairness Trade-Offs and How to Find Them
Sepehr Dehdashtian · Bashir Sadeghi · Vishnu Naresh Boddeti
|
||
Learning Continuous 3D Words for Text-to-Image Generation
Ta-Ying Cheng · Matheus Gadelha · Thibault Groueix · Matthew Fisher · Radomir Mech · Andrew Markham · Niki Trigoni
|
||
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue · Yuansheng Ni · Kai Zhang · Tianyu Zheng · Ruoqi Liu · Ge Zhang · Samuel Stevens · Dongfu Jiang · Weiming Ren · Yuxuan Sun · Cong Wei · Botao Yu · Ruibin Yuan · Renliang Sun · Ming Yin · Boyuan Zheng · Zhenzhu Yang · Yibo Liu · Wenhao Huang · Huan Sun · Yu Su · Wenhu Chen
|
||
A Conditional Denoising Diffusion Probabilistic Model for Point Cloud Upsampling
Qu Wentao · Yuantian Shao · Lingwu Meng · Xiaoshui Huang · Liang Xiao
|
||
Efficient Solution of Point-Line Absolute Pose
Petr Hruby · Timothy Duff · Marc Pollefeys
|
||
CAMixerSR: Only Details Need More "Attention"
Yan Wang · Yi Liu · Shijie Zhao · Junlin Li · Li Zhang
|
||
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Weizhen He · Yiheng Deng · Shixiang Tang · Qihao Chen · Qingsong Xie · Yizhou Wang · Lei Bai · Feng Zhu · Rui Zhao · Wanli Ouyang · Donglian Qi · Yunfeng Yan
|
||
CoDe: An Explicit Content Decoupling Framework for Image Restoration
Enxuan Gu · Hongwei Ge · Yong Guo
|
||
Inter-X: Towards Versatile Human-Human Interaction Analysis
Liang Xu · Xintao Lv · Yichao Yan · Xin Jin · Wu Shuwen · Congsheng Xu · Yifan Liu · Yizhou Zhou · Fengyun Rao · Xingdong Sheng · Yunhui Liu · Wenjun Zeng · Xiaokang Yang
|
||
One-Shot Open Affordance Learning with Foundation Models
Gen Li · Deqing Sun · Laura Sevilla-Lara · Varun Jampani
|
||
Self-Supervised Dual Contouring
Ramana Sundararaman · Roman Klokov · Maks Ovsjanikov
|
||
Tactile-Augmented Radiance Fields
Yiming Dou · Fengyu Yang · Yi Liu · Antonio Loquercio · Andrew Owens
|
||
Consistent Prompting for Rehearsal-Free Continual Learning
Zhanxin Gao · Jun Cen · Xiaobin Chang
|
||
MedBN: Robust Test-Time Adaptation against Malicious Test Samples
Hyejin Park · Jeongyeon Hwang · Sunung Mun · Sangdon Park · Jungseul Ok
|
||
Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion
Litu Rout · Yujia Chen · Abhishek Kumar · Constantine Caramanis · Sanjay Shakkottai · Wen-Sheng Chu
|
||
Purified and Unified Steganographic Network
GuoBiao Li · Sheng Li · Zicong Luo · Zhenxing Qian · Xinpeng Zhang
|
||
Deformable One-shot Face Stylization via DINO Semantic Guidance
Yang Zhou · Zichong Chen · Hui Huang
|
||
PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation
Ardian Umam · Cheng-Kun Yang · Min-Hung Chen · Jen-Hui Chuang · Yen-Yu Lin
|
||
Improving Training Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architectures
Huijie Zhang · Yifu Lu · Ismail Alkhouri · Saiprasad Ravishankar · Dogyoon Song · Qing Qu
|
||
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
Yang Zhou · Hao Shao · Letian Wang · Steven L. Waslander · Hongsheng Li · Yu Liu
|
||
VecFusion: Vector Font Generation with Diffusion
Vikas Thamizharasan · Difan Liu · Shantanu Agarwal · Matthew Fisher · Michaël Gharbi · Oliver Wang · Alec Jacobson · Evangelos Kalogerakis
|
||
OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising
Haichao Zhang · Yi Xu · Hongsheng Lu · Takayuki Shimizu · Yun Fu
|
||
Bridging Remote Sensors with Multisensor Geospatial Foundation Models
Boran Han · Shuai Zhang · Xingjian Shi · Markus Reichstein
|
||
Bilateral Event Mining and Complementary for Event Stream Super-Resolution
Zhilin Huang · Quanmin Liang · Yijie Yu · Chujun Qin · Xiawu Zheng · Kai Huang · Zikun Zhou · Wenming Yang
|
||
Video Harmonization with Triplet Spatio-Temporal Variation Patterns
Zonghui Guo · XinYu Han · Jie Zhang · Shiguang Shan · Haiyong Zheng
|
||
Semantic Line Combination Detector
Jinwon Ko · Dongkwon Jin · Chang-Su Kim
|
||
GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image
Chong Bao · Yinda Zhang · Yuan Li · Xiyu Zhang · Bangbang Yang · Hujun Bao · Marc Pollefeys · Guofeng Zhang · Zhaopeng Cui
|
||
RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D
Lingteng Qiu · Guanying Chen · Xiaodong Gu · Qi Zuo · Mutian Xu · Yushuang Wu · Weihao Yuan · Zilong Dong · Liefeng Bo · Xiaoguang Han
|
||
Multi-Modal Hallucination Control by Visual Information Grounding
Alessandro Favero · Luca Zancato · Matthew Trager · Siddharth Choudhary · Pramuditha Perera · Alessandro Achille · Ashwin Swaminathan · Stefano Soatto
|
||
Laplacian-guided Entropy Model in Neural Codec with Blur-dissipated Synthesis
Atefeh Khoshkhahtinat · Ali Zafari · Piyush Mehta · Nasser Nasrabadi
|
||
Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection
Chengjie Wang · Wenbing Zhu · Bin-Bin Gao · Zhenye Gan · Jiangning Zhang · Zhihao Gu · Bruce Qian · Mingang Chen · Lizhuang Ma
|
||
ModaVerse: Efficiently Transforming Modalities with LLMs
Xinyu Wang · Bohan Zhuang · Qi Wu
|
||
3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow
Felix Taubner · Prashant Raina · Mathieu Tuli · Eu Wern Teh · Chul Lee · Jinmiao Huang
|
||
HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations
Peng Dai · Yang Zhang · Tao Liu · ZhenFan · Tianyuan Du · Zhuo Su · Xiaozheng Zheng · Zeming Li
|
||
Traffic Scene Parsing through the TSP6K Dataset
Peng-Tao Jiang · Yuqi Yang · Yang Cao · Qibin Hou · Ming-Ming Cheng · Chunhua Shen
|
||
Garment Recovery with Shape and Deformation Priors
Ren Li · Corentin Dumery · Benoît Guillard · Pascal Fua
|
||
KPConvX: Modernizing Kernel Point Convolution with Kernel Attention
Hugues Thomas · Yao-Hung Hubert Tsai · Timothy Barfoot · Jian Zhang
|
||
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Muyang Li · Tianle Cai · Jiaxin Cao · Qinsheng Zhang · Han Cai · Junjie Bai · Yangqing Jia · Kai Li · Song Han
|
||
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
Sadeep Jayasumana · Srikumar Ramalingam · Andreas Veit · Daniel Glasner · Ayan Chakrabarti · Sanjiv Kumar
|
||
Text-Driven Image Editing via Learnable Regions
Yuanze Lin · Yi-Wen Chen · Yi-Hsuan Tsai · Lu Jiang · Ming-Hsuan Yang
|
||
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition
Feng Lu · Xiangyuan Lan · Lijun Zhang · Dongmei Jiang · Yaowei Wang · Chun Yuan
|
||
pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
David Charatan · Sizhe Lester Li · Andrea Tagliasacchi · Vincent Sitzmann
|
||
Rethinking Few-shot 3D Point Cloud Semantic Segmentation
Zhaochong An · Guolei Sun · Yun Liu · Fayao Liu · Zongwei Wu · Dan Wang · Luc Van Gool · Serge Belongie
|
||
Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation
Bingfeng Zhang · Siyue Yu · Yunchao Wei · Yao Zhao · Jimin Xiao
|
||
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Lei Zhu · Fangyun Wei · Yanye Lu
|
||
PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor
Jaewon Jung · Hongsun Jang · Jaeyong Song · Jinho Lee
|
||
Adversarial Distillation Based on Slack Matching and Attribution Region Alignment
Shenglin Yin · Zhen Xiao · Mingxuan Song · Jieyi Long
|
||
Universal Robustness via Median Random Smoothing for Real-World Super-Resolution
Zakariya Chaouai · Mohamed Tamaazousti
|
||
RCBEVDet: Radar-camera Fusion in Bird’s Eye View for 3D Object Detection
Zhiwei Lin · Zhe Liu · Zhongyu Xia · Xinhao Wang · Yongtao Wang · Shengxiang Qi · Yang Dong · Nan Dong · Le Zhang · Ce Zhu
|
||
VBench: Comprehensive Benchmark Suite for Video Generative Models
Ziqi Huang · Yinan He · Jiashuo Yu · Fan Zhang · Chenyang Si · Yuming Jiang · Yuanhan Zhang · Tianxing Wu · Jin Qingyang · Nattapol Chanpaisit · Yaohui Wang · Xinyuan Chen · Limin Wang · Dahua Lin · Yu Qiao · Ziwei Liu
|
||
SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder
Dihan Zheng · Yihang Zou · Xiaowen Zhang · Chenglong Bao
|
||
SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction
Zechuan Zhang · Zongxin Yang · Yi Yang
|
||
Generalized Predictive Model for Autonomous Driving
Jiazhi Yang · Shenyuan Gao · Yihang Qiu · Li Chen · Tianyu Li · Bo Dai · Kashyap Chitta · Penghao Wu · Jia Zeng · Ping Luo · Jun Zhang · Andreas Geiger · Yu Qiao · Hongyang Li
|
||
Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations
Chenyu You · Yifei Min · Weicheng Dai · Jasjeet Sekhon · Lawrence Staib · James Duncan
|
||
Source-Free Domain Adaptation with Frozen Multimodal Foundation Model
Song Tang · Wenxin Su · Mao Ye · Xiatian Zhu
|
||
Grounding and Enhancing Grid-based Models for Neural Fields
Zelin Zhao · Fenglei Fan · Wenlong Liao · Junchi Yan
|
||
Neural Sign Actors: A diffusion model for 3D sign language production from text
Vasileios Baltatzis · Rolandos Alexandros Potamias · Evangelos Ververas · Guanxiong Sun · Jiankang Deng · Stefanos Zafeiriou
|
||
HumMUSS: Human Motion Understanding using State Space Models
Arnab Mondal · Stefano Alletto · Denis Tome
|
||
APISR: Anime Production Inspired Real-World Anime Super-Resolution
Boyang Wang · Fengyu Yang · Xihang Yu · Chao Zhang · Hanbin Zhao
|
||
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition
Sicheng Mo · Fangzhou Mu · Kuan Heng Lin · Yanli Liu · Bochen Guan · Yin Li · Bolei Zhou
|
||
ShapeWalk: Compositional Shape Editing through Language-Guided Chains
Habib Slim · Mohamed Elhoseiny
|
||
LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion
Pancheng Zhao · Peng Xu · Pengda Qin · Deng-Ping Fan · Zhicheng Zhang · Guoli Jia · Bowen Zhou · Jufeng Yang
|
||
WaveFace: Authentic Face Restoration with Efficient Frequency Recovery
Yunqi Miao · Jiankang Deng · Jungong Han
|
||
Hierarchical Histogram Threshold Segmentation – Auto-terminating High-detail Oversegmentation
Thomas Chang · Simon Seibt · Bartosz von Rymon Lipinski
|
||
G-FARS: Gradient-Field-based Auto-Regressive Sampling for 3D Part Grouping
Junfeng Cheng · Tania Stathaki
|
||
Dual-Enhanced Coreset Selection with Class-wise Collaboration for Online Blurry Class Incremental Learning
Yutian Luo · Shiqi Zhao · Haoran Wu · Zhiwu Lu
|
||
TULIP: Multi-camera 3D Precision Assessment of Parkinson's Disease
Kyungdo Kim · Sihan Lyu · Sneha Mantri · Timothy Dunn
|
||
BANF: Band-limited Neural Fields for Levels of Detail Reconstruction
Ahan Shabanov · Shrisudhan Govindarajan · Cody Reading · Leili Goli · Daniel Rebain · Kwang Moo Yi · Andrea Tagliasacchi
|
||
Weakly Supervised Point Cloud Semantic Segmentation via Artificial Oracle
Hyeokjun Kweon · Jihun Kim · Kuk-Jin Yoon
|
||
StraightPCF: Straight Point Cloud Filtering
Dasith de Silva Edirimuni · Xuequan Lu · Gang Li · Lei Wei · Antonio Robles-Kelly · Hongdong Li
|
||
SynFog: A Photo-realistic Synthetic Fog Dataset based on End-to-end Imaging Simulation for Advancing Real-World Defogging in Autonomous Driving
Yiming Xie · Henglu Wei · Zhenyi Liu · Xiaoyu Wang · Xiangyang Ji
|
||
PeVL: Pose-Enhanced Vision-Language Model for Fine-Grained Human Action Recognition
Haosong Zhang · Mei Leong · Liyuan Li · Weisi Lin
|
||
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
Bo Zou · Chao Yang · Yu Qiao · Chengbin Quan · Youjian Zhao
|
||
UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model
Shuai Yuan · Lei Luo · Zhuo Hui · Can Pu · Xiaoyu Xiang · Rakesh Ranjan · Denis Demandolx
|
||
Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting
Taeho Kang · Youngki Lee
|
||
GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors
Yuan Dong · Qi Zuo · Xiaodong Gu · Weihao Yuan · Zhengyi Zhao · Zilong Dong · Liefeng Bo · Qixing Huang
|
||
Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos
Yuhan Shen · Ehsan Elhamifar
|
||
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Haoning Wu · Zicheng Zhang · Erli Zhang · Chaofeng Chen · Liang Liao · Annan Wang · Kaixin Xu · Chunyi Li · Jingwen Hou · Guangtao Zhai · Xue Geng · Wenxiu Sun · Qiong Yan · Weisi Lin
|
||
REACTO: Reconstructing Articulated Objects from a Single Video
Chaoyue Song · Jiacheng Wei · Chuan-Sheng Foo · Guosheng Lin · Fayao Liu
|
||
Soften to Defend: Towards Adversarial Robustness via Self-Guided Label Refinement
Daiwei Yu · Zhuorong Li · Lina Wei · Canghong Jin · Yun Zhang · Sixian Chan
|
||
Structured Gradient-based Interpretations via Norm-Regularized Adversarial Training
Shizhan Gong · Qi Dou · Farzan Farnia
|
||
Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Daichi Horita · Naoto Inoue · Kotaro Kikuchi · Kota Yamaguchi · Kiyoharu Aizawa
|
||
A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning
Siddharth Srivastava · Gaurav Sharma
|
||
Transfer CLIP for Generalizable Image Denoising
Jun Cheng · Dong Liang · Shan Tan
|
||
LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images
Jing Zhang · Irving Fang · Hao Wu · Akshat Kaushik · Alice Rodriguez · Hanwen Zhao · Juexiao Zhang · Zhuo Zheng · Radu Iovita · Chen Feng
|
||
Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration
Mingyuan Meng · Dagan Feng · Lei Bi · Jinman Kim
|
||
PixelLM: Pixel Reasoning with Large Multimodal Model
Zhongwei Ren · Zhicheng Huang · Yunchao Wei · Yao Zhao · Dongmei Fu · Jiashi Feng · Xiaojie Jin
|
||
Cross Initialization for Face Personalization of Text-to-Image Models
Lianyu Pang · Jian Yin · Haoran Xie · Qiping Wang · Qing Li · Xudong Mao
|
||
Neural Fields as Distributions: Signal Processing Beyond Euclidean Space
Daniel Rebain · Soroosh Yazdani · Kwang Moo Yi · Andrea Tagliasacchi
|
||
OmniMotionGPT: Animal Motion Generation with Limited Data
Zhangsihao Yang · Mingyuan Zhou · Mengyi Shan · Bingbing Wen · Ziwei Xuan · Mitch Hill · Junjie Bai · Guo-Jun Qi · Yalin Wang
|
||
Stratified Avatar Generation from Sparse Observations
Han Feng · Wenchao Ma · Quankai Gao · Xianwei Zheng · Nan Xue · Huijuan Xu
|
||
HDRFlow: Real-Time HDR Video Reconstruction with Large Motions
Gangwei Xu · Yujin Wang · Jinwei Gu · Tianfan Xue · Xin Yang
|
||
Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations
Kewei Wang · Yizheng Wu · Jun Cen · Zhiyu Pan · Xingyi Li · Zhe Wang · Zhiguo Cao · Guosheng Lin
|
||
Do Vision and Language Encoders Represent the World Similarly?
Mayug Maniparambil · Raiymbek Akshulakov · Yasser Abdelaziz Dahou Djilali · Mohamed El Amine Seddik · Sanath Narayan · Karttikeya Mangalam · Noel O'Connor
|
||
Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform
Chunghyun Park · Seungwook Kim · Jaesik Park · Minsu Cho
|
||
FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning
Rishub Tamirisa · Chulin Xie · Wenxuan Bao · Andy Zhou · Ron Arel · Aviv Shamsian
|
||
Test-Time Adaptation for Depth Completion
Hyoungseob Park · Anjali W Gupta · Alex Wong
|
||
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li · Yali Wang · Yinan He · Yizhuo Li · Yi Wang · Yi Liu · Zun Wang · Jilan Xu · Guo Chen · Ping Luo · Limin Wang · Yu Qiao
|
||
Dispel Darkness for Better Fusion: A Controllable Visual Enhancer based on Cross-modal Conditional Adversarial Learning
Hao Zhang · Linfeng Tang · Xinyu Xiang · Xuhui Zuo · Jiayi Ma
|
||
Time-Efficient Light-Field Acquisition Using Coded Aperture and Events
Shuji Habuchi · Keita Takahashi · Chihiro Tsutake · Toshiaki Fujii · Hajime Nagahara
|
||
Video-P2P: Video Editing with Cross-attention Control
Shaoteng Liu · Yuechen Zhang · Wenbo Li · Zhe Lin · Jiaya Jia
|
||
GenN2N: Generative NeRF2NeRF Translation
Xiangyue Liu · Han Xue · Kunming Luo · Ping Tan · Li Yi
|
||
Dual-scale Transformer for Large-scale Single-Pixel Imaging
Gang Qu · Ping Wang · Xin Yuan
|
||
Parameter Efficient Self-Supervised Geospatial Domain Adaptation
Linus Scheibenreif · Michael Mommert · Damian Borth
|
||
Multimodal Representation Learning by Alternating Unimodal Adaptation
Xiaohui Zhang · Jaehong Yoon · Mohit Bansal · Huaxiu Yao
|
||
Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps
Octave Mariotti · Oisin Mac Aodha · Hakan Bilen
|
||
SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model
Zhengang Li · Yan Kang · Yuchen Liu · Difan Liu · Tobias Hinz · Feng Liu · Yanzhi Wang
|
||
Compositional Video Understanding with Spatiotemporal Structure-based Transformers
Hoyeoung Yun · Jinwoo Ahn · Minseo Kim · Eun-Sol Kim
|
||
CoDi-2: Interleaved and In-Context Any-to-Any Generation
Zineng Tang · Ziyi Yang · Mahmoud Khademi · Yang Liu · Chenguang Zhu · Mohit Bansal
|
||
SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation
Yamei Chen · Yan Di · Guangyao Zhai · Fabian Manhardt · Chenyangguang Zhang · Ruida Zhang · Federico Tombari · Nassir Navab · Benjamin Busam
|
||
ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation
Dar-Yen Chen · Hamish Tennent · Ching-Wen Hsu
|
||
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
Agneet Chatterjee · Tejas Gokhale · Chitta Baral · 'YZ' Yezhou Yang
|
||
SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model
Inhwan Bae · Young-Jae Park · Hae-Gon Jeon
|
||
TexVocab: Texture Vocabulary-conditioned Human Avatars
Yuxiao Liu · Zhe Li · Yebin Liu · Haoqian Wang
|
||
Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features
Thomas Wimmer · Peter Wonka · Maks Ovsjanikov
|
||
SketchINR: A First Look into Sketches as Implicit Neural Representations
Hmrishav Bandyopadhyay · Ayan Kumar Bhunia · Pinaki Nath Chowdhury · Aneeshan Sain · Tao Xiang · Timothy Hospedales · Yi-Zhe Song
|
||
Masked Spatial Propagation Network for Sparsity-Adaptive Depth Refinement
Jinyoung Jun · Jae-Han Lee · Chang-Su Kim
|
||
Neighbor Relations Matter in Video Scene Detection
Jiawei Tan · Hongxing Wang · Jiaxin Li · Zhilong Ou · Zhangbin Qian
|
||
NOPE: Novel Object Pose Estimation from a Single Image
Van Nguyen Nguyen · Thibault Groueix · Georgy Ponimatkin · Yinlin Hu · Renaud Marlet · Mathieu Salzmann · Vincent Lepetit
|
||
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
Yuqi Wang · Jiawei He · Lue Fan · Hongxin Li · Yuntao Chen · Zhaoxiang Zhang
|
||
Hyper-MD: Mesh Denoising with Customized Parameters Aware of Noise Intensity and Geometric Characteristics
Xingtao Wang · Hongliang Wei · Xiaopeng Fan · Debin Zhao
|
||
Link-Context Learning for Multimodal LLMs
Yan Tai · Weichen Fan · Zhao Zhang · Ziwei Liu
|
||
BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP
Jiawang Bai · Kuofeng Gao · Shaobo Min · Shu-Tao Xia · Zhifeng Li · Wei Liu
|
||
The Manga Whisperer: Automatically Generating Transcriptions for Comics
Ragav Sachdeva · Andrew Zisserman
|
||
Unsupervised Template-assisted Point Cloud Shape Correspondence Network
Jiacheng Deng · Jiahao Lu · Tianzhu Zhang
|
||
InstructVideo: Instructing Video Diffusion Models with Human Feedback
Hangjie Yuan · Shiwei Zhang · Xiang Wang · Yujie Wei · Tao Feng · Yining Pan · Yingya Zhang · Ziwei Liu · Samuel Albanie · Dong Ni
|
||
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal · Yonatan Bitton · Idan Szpektor · Kai-Wei Chang · Aditya Grover
|
||
UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All
Yuanhuiyi Lyu · Xu Zheng · Jiazhou Zhou · Lin Wang
|
||
HEAL-SWIN: A Vision Transformer On The Sphere
Oscar Carlsson · Jan E. Gerken · Hampus Linander · Heiner Spiess · Fredrik Ohlsson · Christoffer Petersson · Daniel Persson
|
||
UniDepth: Universal Monocular Metric Depth Estimation
Luigi Piccinelli · Yung-Hsu Yang · Christos Sakaridis · Mattia Segu · Siyuan Li · Luc Van Gool · Fisher Yu
|
||
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Jitesh Jain · Jianwei Yang · Humphrey Shi
|
||
QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction
Ishak Ayad · Nicolas Larue · Mai K. Nguyen
|
||
Cross-dimension Affinity Distillation for 3D EM Neuron Segmentation
Xiaoyu Liu · Miaomiao Cai · Yinda Chen · Yueyi Zhang · Te Shi · Ruobing Zhang · Xuejin Chen · Zhiwei Xiong
|
||
Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation
Keonhee Han · Dominik Muhle · Felix Wimbauer · Daniel Cremers
|
||
CrossKD: Cross-Head Knowledge Distillation for Dense Object Detection
JiaBao Wang · Yuming Chen · Zhaohui Zheng · Xiang Li · Ming-Ming Cheng · Qibin Hou
|
||
Leveraging Camera Triplets for Efficient and Accurate Structure-from-Motion
Lalit Manam · Venu Madhav Govindu
|
||
Seeing the World through Your Eyes
Hadi Alzayer · Kevin Zhang · Brandon Y. Feng · Christopher Metzler · Jia-Bin Huang
|
||
Equivariant Multi-Modality Image Fusion
Zixiang Zhao · Haowen Bai · Jiangshe Zhang · Yulun Zhang · Kai Zhang · Shuang Xu · Dongdong Chen · Radu Timofte · Luc Van Gool
|
||
Residual Denoising Diffusion Models
Jiawei Liu · Qiang Wang · Huijie Fan · Yinong Wang · Yandong Tang · Liangqiong Qu
|
||
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
Zheng Li · Xiang Li · Xinyi Fu · Xin Zhang · Weiqiang Wang · Shuo Chen · Jian Yang
|
||
CCEdit: Creative and Controllable Video Editing via Diffusion Models
Ruoyu Feng · Wenming Weng · Yanhui Wang · Yuhui Yuan · Jianmin Bao · Chong Luo · Zhibo Chen · Baining Guo
|
||
CORES: Convolutional Response-based Score for Out-of-distribution Detection
Keke Tang · Chao Hou · Weilong Peng · Runnan Chen · Peican Zhu · Wenping Wang · Zhihong Tian
|
||
MoDE: CLIP Data Experts via Clustering
Jiawei Ma · Po-Yao Huang · Saining Xie · Shang-Wen Li · Luke Zettlemoyer · Shih-Fu Chang · Wen-tau Yih · Hu Xu
|
||
SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation
Aysim Toker · Marvin Eisenberger · Daniel Cremers · Laura Leal-Taixe
|
||
Dual-consistency Model Inversion for Non-exemplar Class Incremental Learning
Zihuan Qiu · Yi Xu · Fanman Meng · Hongliang Li · Linfeng Xu · Qingbo Wu
|
||
Class Tokens Infusion for Weakly Supervised Semantic Segmentation
Sung-Hoon Yoon · Hoyong Kwon · Hyeonseong Kim · Kuk-Jin Yoon
|
||
PointOBB: Learning Oriented Object Detection via Single Point Supervision
Junwei Luo · Xue Yang · Yi Yu · Qingyun Li · Junchi Yan · Yansheng Li
|
||
Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision
Mohammad Reza Hosseinzadeh Taher · Michael Gotway · Jianming Liang
|
||
Category-Level Multi-Part Multi-Joint 3D Shape Assembly
Yichen Li · Kaichun Mo · Yueqi Duan · He Wang · Jiequan Zhang · Lin Shao · Wojciech Matusik · Leonidas Guibas
|
||
JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models
Yuncheng Guo · Xiaodong Gu
|
||
A Category Agnostic Model for Visual Rearrangement
Yuyi Liu · Xinhang Song · Weijie Li · Xiaohan Wang · Shuqiang Jiang
|
||
WorDepth: Variational Language Prior for Monocular Depth Estimation
Ziyao Zeng · Hyoungseob Park · Fengyu Yang · Daniel Wang · Stefano Soatto · Dong Lao · Alex Wong
|
||
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Tai Wang · Xiaohan Mao · Chenming Zhu · Runsen Xu · Ruiyuan Lyu · Peisen Li · Xiao Chen · Wenwei Zhang · Kai Chen · Tianfan Xue · Xihui Liu · Cewu Lu · Dahua Lin · Jiangmiao Pang
|
||
FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
Lin Zhao · Tianchen Zhao · Zinan Lin · Xuefei Ning · Guohao Dai · Huazhong Yang · Yu Wang
|
||
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
Hongjie Wang · Difan Liu · Yan Kang · Yijun Li · Zhe Lin · Niraj Jha · Yuchen Liu
|
||
Deep Generative Model based Rate-Distortion for Image Downscaling Assessment
Yuanbang Liang · Bhavesh Garg · Paul L. Rosin · Yipeng Qin
|
||
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners
Chun Feng · Joy Hsu · Weiyu Liu · Jiajun Wu
|
||
Forecasting of 3D Whole-body Human Poses with Grasping Objects
Yan Haitao · Qiongjie Cui · Jiexin Xie · Shijie Guo
|
||
VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models
Xiang Li · Qianli Shen · Kenji Kawaguchi
|
||
Towards Co-Evaluation of Cameras, HDR, and Algorithms for Industrial-Grade 6DoF Pose Estimation
Agastya Kalra · Guy Stoppi · Dmitrii Marin · Vage Taamazyan · Aarrushi Shandilya · Rishav Agarwal · Anton Boykov · Aaron Chong · Michael Stark
|
||
Correcting Diffusion Generation through Resampling
Yujian Liu · Yang Zhang · Tommi Jaakkola · Shiyu Chang
|
||
Partial-to-Partial Shape Matching with Geometric Consistency
Viktoria Ehm · Maolin Gao · Paul Roetzer · Marvin Eisenberger · Daniel Cremers · Florian Bernard
|
||
Text-guided Explorable Image Super-resolution
Kanchana Vaishnavi Gandikota · Paramanand Chandramouli
|
||
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing
Kaiwen Zhang · Yifan Zhou · Xudong Xu · Bo Dai · Xingang Pan
|
||
Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis
Mingyang Zhao · Jiang Jingen · Lei Ma · Shiqing Xin · Gaofeng Meng · Dong-Ming Yan
|
||
LAENeRF: Local Appearance Editing for Neural Radiance Fields
Lukas Radl · Michael Steiner · Andreas Kurz · Markus Steinberger
|
||
Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation
Philipp Schröppel · Christopher Wewer · Jan Lenssen · Eddy Ilg · Thomas Brox
|
||
Exploiting Style Latent Flows for Generalizing Video Deepfake Detection
Jongwook Choi · Taehoon Kim · Yonghyun Jeong · Seungryul Baek · Jongwon Choi
|
||
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing
Zeyinzi Jiang · Chaojie Mao · Yulin Pan · Zhen Han · Jingfeng Zhang
|
||
DREAM: Diffusion Rectification and Estimation-Adaptive Models
Jinxin Zhou · Tianyu Ding · Tianyi Chen · Jiachen Jiang · Ilya Zharkov · Zhihui Zhu · Luming Liang
|
||
Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling
Leon Sick · Dominik Engel · Pedro Hermosilla · Timo Ropinski
|
||
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
Willi Menapace · Aliaksandr Siarohin · Ivan Skorokhodov · Ekaterina Deyneka · Tsai-Shien Chen · Anil Kag · Yuwei Fang · Aleksei Stoliar · Elisa Ricci · Jian Ren · Sergey Tulyakov
|
||
URHand: Universal Relightable Hands
Zhaoxi Chen · Gyeongsik Moon · Kaiwen Guo · Chen Cao · Stanislav Pidhorskyi · Tomas Simon · Rohan Joshi · Yuan Dong · Yichen Xu · Bernardo Pires · He Wen · Lucas Evans · Bo Peng · Julia Buffalini · Autumn Trimble · Kevyn McPhail · Melissa Schoeller · Shoou-I Yu · Javier Romero · Michael Zollhoefer · Yaser Sheikh · Ziwei Liu · Shunsuke Saito
|
||
Enhancing Visual Continual Learning with Language-Guided Supervision
Bolin Ni · Hongbo Zhao · Chenghao Zhang · Ke Hu · Gaofeng Meng · Zhaoxiang Zhang · Shiming Xiang
|
||
Generating Human Motion in 3D Scenes from Text Descriptions
Zhi Cen · Huaijin Pi · Sida Peng · Zehong Shen · Minghui Yang · Shuai Zhu · Hujun Bao · Xiaowei Zhou
|
||
Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge
Haoxiang Ma · Modi Shi · Boyang Gao · Di Huang
|
||
Bridging the Synthetic-to-Authentic Gap: Distortion-Guided Unsupervised Domain Adaptation for Blind Image Quality Assessment
Aobo Li · Jinjian Wu · Yongxu Liu · Leida Li
|
||
Grid Diffusion Models for Text-to-Video Generation
Taegyeong Lee · Soyeong Kwon · Taehwan Kim
|
||
Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction
Devikalyan Das · Christopher Wewer · Raza Yunus · Eddy Ilg · Jan Lenssen
|
||
Personalized Residuals for Concept-Driven Text-to-Image Generation
Cusuh Ham · Matthew Fisher · James Hays · Nicholas Kolkin · Yuchen Liu · Richard Zhang · Tobias Hinz
|
||
Making Vision Transformers Truly Shift-Equivariant
Renan A. Rojas-Gomez · Teck-Yian Lim · Minh Do · Raymond A. Yeh
|
||
TransNeXt: Robust Foveal Visual Perception for Vision Transformers
Dai Shi
|
||
Learning Triangular Distribution in Visual World
Ping Chen · Xingpeng Zhang · Chengtao Zhou · Dichao Fan · Peng Tu · Le Zhang · Yanlin Qian
|
||
Free3D: Consistent Novel View Synthesis without 3D Representation
Chuanxia Zheng · Andrea Vedaldi
|
||
Generalized Event Cameras
Varun Sundar · Matthew Dutson · Andrei Ardelean · Claudio Bruschini · Edoardo Charbon · Mohit Gupta
|
||
Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration
Yuang Ai · Huaibo Huang · Xiaoqiang Zhou · Jiexiang Wang · Ran He
|
||
DIEM: Decomposition-Integration Enhancing Multimodal Insights
Xinyi Jiang · Guoming Wang · Junhao Guo · Juncheng Li · Wenqiao Zhang · Rongxing Lu · Siliang Tang
|
||
Balancing Act: Distribution-Guided Debiasing in Diffusion Models
Rishubh Parihar · Abhijnya Bhat · Abhipsa Basu · Saswat Mallick · Jogendra Kundu · R. Venkatesh Babu
|
||
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
Zhen Zhao · Jingqun Tang · Chunhui Lin · Binghong Wu · Can Huang · Hao Liu · Xin Tan · Zhizhong Zhang · Yuan Xie
|
||
NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation
Vikas Thamizharasan · Difan Liu · Matthew Fisher · Nanxuan Zhao · Evangelos Kalogerakis · Michal Lukáč
|
||
Learning to Count without Annotations
Lukas Knobel · Tengda Han · Yuki Asano
|
||
Towards Backward-Compatible Continual Learning of Image Compression
Zhihao Duan · Ming Lu · Justin Yang · Jiangpeng He · Zhan Ma · Fengqing Zhu
|
||
Learning to Segment Referred Objects from Narrated Egocentric Videos
Yuhan Shen · Huiyu Wang · Xitong Yang · Matt Feiszli · Ehsan Elhamifar · Lorenzo Torresani · Effrosyni Mavroudi
|
||
3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation
Songchun Zhang · Yibo Zhang · Quan Zheng · Rui Ma · Wei Hua · Hujun Bao · Weiwei Xu · Changqing Zou
|
||
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
Wenjin Hou · Shiming Chen · Shuhuang Chen · Ziming Hong · Yan Wang · Xuetao Feng · Salman Khan · Fahad Shahbaz Khan · Xinge You
|
||
MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers
Haoyu Ma · Shahin Mahdizadehaghdam · Bichen Wu · Zhipeng Fan · Yuchao Gu · Wenliang Zhao · Lior Shapira · Xiaohui Xie
|
||
Spike-guided Motion Deblurring with Unknown Modal Spatiotemporal Alignment
Jiyuan Zhang · Shiyan Chen · Yajing Zheng · Zhaofei Yu · Tiejun Huang
|
||
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Yuqi Yang · Peng-Tao Jiang · Qibin Hou · Hao Zhang · Jinwei Chen · Bo Li
|
||
CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model
Jianhao Zeng · Dan Song · Weizhi Nie · Hongshuo Tian · Tongtong Wang · An-An Liu
|
||
Making Visual Sense of Oracle Bones for You and Me
Runqi Qiao · Lan Yang · Kaiyue Pang · Honggang Zhang
|
||
Binarized Low-light Raw Video Enhancement
Gengchen Zhang · Yulun Zhang · Xin Yuan · Ying Fu
|
||
Coherent Temporal Synthesis for Incremental Action Segmentation
Guodong Ding · Hans Golong · Angela Yao
|
||
Omni-Q: Omni-Directional Scene Understanding for Unsupervised Visual Grounding
Sai Wang · Yutian Lin · Yu Wu
|
||
Depth-aware Test-Time Training for Zero-shot Video Object Segmentation
Weihuang Liu · Xi Shen · Haolun Li · Xiuli Bi · Bo Liu · Chi-Man Pun · Xiaodong Cun
|
||
Communication-Efficient Federated Learning with Accelerated Client Gradient
Geeho Kim · Jinkyu Kim · Bohyung Han
|
||
Real-time 3D-aware Portrait Video Relighting
Ziqi Cai · Kaiwen Jiang · Shu-Yu Chen · Yu-Kun Lai · Hongbo Fu · Boxin Shi · Lin Gao
|
||
Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras
Huajian Huang · Longwei Li · Hui Cheng · Sai-Kit Yeung
|
||
Attention Calibration for Disentangled Text-to-Image Personalization
Yanbing Zhang · Mengping Yang · Qin Zhou · Zhe Wang
|
||
HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud
Wencan Cheng · Hao Tang · Luc Van Gool · Jong Hwan Ko
|
||
NeISF: Neural Incident Stokes Field for Geometry and Material Estimation
Chenhao Li · Taishi Ono · Takeshi Uemori · Hajime Mihara · Alexander Gatto · Hajime Nagahara · Yusuke Moriuchi
|
||
MaskPLAN: Masked Generative Layout Planning from Partial Input
Hang Zhang · Anton Savov · Benjamin Dillenburger
|
||
Rapid 3D Model Generation with Intuitive 3D Input
Tianrun Chen · Chaotao Ding · Shangzhan Zhang · Chunan Yu · Ying Zang · Zejian Li · Sida Peng · Lingyun Sun
|
||
MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception
Thien-Minh Nguyen · Shenghai Yuan · Thien Nguyen · Pengyu Yin · Haozhi Cao · Lihua Xie · Maciej Wozniak · Patric Jensfelt · Marko Thiel · Justin Ziegenbein · Noel Blunder
|
||
Splatter Image: Ultra-Fast Single-View 3D Reconstruction
Stanislaw Szymanowicz · Christian Rupprecht · Andrea Vedaldi
|
||
CrowdDiff: Multi-hypothesis Crowd Density Estimation using Diffusion Models
Yasiru Ranasinghe · Nithin Gopalakrishnan Nair · Wele Gedara Chaminda Bandara · Vishal M. Patel
|
||
Text-Enhanced Data-free Approach for Federated Class-Incremental Learning
Minh-Tuan Tran · Trung Le · Xuan-May Le · Mehrtash Harandi · Dinh Phung
|
||
Unsupervised Salient Instance Detection
Xin Tian · Ke Xu · Rynson W.H. Lau
|
||
Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
Kim Youwang · Tae-Hyun Oh · Gerard Pons-Moll
|
||
L-MAGIC: Language Model Assisted Generation of Images with Consistency
Zhipeng Cai · Matthias Mueller · Reiner Birkl · Diana Wofk · Shao-Yen Tseng · JunDa Cheng · Gabriela Ben Melech Stan · Vasudev Lal · Michael Paulitsch
|
||
GLACE: Global Local Accelerated Coordinate Encoding
Fangjinhua Wang · Xudong Jiang · Silvano Galliani · Christoph Vogel · Marc Pollefeys
|
||
HouseCat6D - A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios
HyunJun Jung · Shun-Cheng Wu · Patrick Ruhkamp · Guangyao Zhai · Hannah Schieber · Giulia Rizzoli · Pengyuan Wang · Hongcheng Zhao · Lorenzo Garattoni · Sven Meier · Daniel Roth · Nassir Navab · Benjamin Busam
|
||
Customization Assistant for Text-to-image Generation
Yufan Zhou · Ruiyi Zhang · Jiuxiang Gu · Tong Sun
|
||
UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures
Mingyuan Zhou · Rakib Hyder · Ziwei Xuan · Guo-Jun Qi
|
||
Event-based Structure-from-Orbit
Ethan Elms · Yasir Latif · Tae Ha Park · Tat-Jun Chin
|
||
From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior
Jaeho Moon · Juan Luis Gonzalez Bello · Byeongjun Kwon · Munchurl Kim
|
||
Dynamic LiDAR Re-simulation using Compositional Neural Fields
Hanfeng Wu · Xingxing Zuo · Stefan Leutenegger · Or Litany · Konrad Schindler · Shengyu Huang
|
||
Unsupervised Blind Image Deblurring Based on Self-Enhancement
Lufei Chen · Xiangpeng Tian · Shuhua Xiong · Yinjie Lei · Chao Ren
|
||
SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance Fields
Quentin Herau · Nathan Piasco · Moussab Bennehar · Luis Guillermo Roldao Jimenez · Dzmitry Tsishkou · Cyrille Migniot · Modélisation Information Systèmes · Cedric Demonceaux
|
||
Label-Efficient Group Robustness via Out-of-Distribution Concept Curation
Yiwei Yang · Anthony Liu · Robert Wolfe · Aylin Caliskan · Bill Howe
|
||
StyLitGAN: Image-based Relighting via Latent Control
Anand Bhattad · James Soole · David Forsyth
|
||
Anomaly Score: Evaluating Generative Models and Individual Generated Images based on Complexity and Vulnerability
Jaehui Hwang · Junghyuk Lee · Jong-Seok Lee
|
||
ICON: Incremental CONfidence for Joint Pose and Radiance Field Optimization
Weiyao Wang · Pierre Gleize · Hao Tang · Xingyu Chen · Kevin Liang · Matt Feiszli
|
||
FreeDrag: Feature Dragging for Reliable Point-based Image Editing
Pengyang Ling · Lin Chen · Pan Zhang · Huaian Chen · Yi Jin · Jinjin Zheng
|
||
Instance-Aware Group Quantization for Vision Transformers
Jaehyeon Moon · Dohyung Kim · Jun Yong Cheon · Bumsub Ham
|
||
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
Tianyi Xie · Zeshun Zong · Yuxing Qiu · Xuan Li · Yutao Feng · Yin Yang · Chenfanfu Jiang
|
||
Viewpoint-Aware Visual Grounding in 3D Scenes
Xiangxi Shi · Zhonghua Wu · Stefan Lee
|
||
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
Kang chenkang · Xiangqian Wu
|
||
PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
Kun Yuan · Hongbo Liu · Mading Li · Muyi Sun · Ming Sun · Jiachao Gong · Jinhua Hao · Chao Zhou · Yansong Tang
|
||
Batch Normalization Alleviates the Spectral Bias in Coordinate Networks
Zhicheng Cai · Hao Zhu · Qiu Shen · Xinran Wang · Xun Cao
|
||
Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
Zhe Li · Zerong Zheng · Lizhen Wang · Yebin Liu
|
||
OMG-Seg: Is One Model Good Enough For All Segmentation?
Xiangtai Li · Haobo Yuan · Wei Li · Henghui Ding · Size Wu · Wenwei Zhang · Yining Li · Kai Chen · Chen Change Loy
|
||
Pose Adapted Shape Learning for Large-Pose Face Reenactment
Gee-Sern Hsu · Jie-Ying Zhang · Yu-Hsiang Huang · Wei-Jie Hong
|
||
Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty from Pre-trained Models
Gianni Franchi · Olivier Laurent · Maxence Leguéry · Andrei Bursuc · Andrea Pilzer · Angela Yao
|
||
Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation
Luca Barsellotti · Roberto Amoroso · Marcella Cornia · Lorenzo Baraldi · Rita Cucchiara
|
||
EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
Haiyang Liu · Zihao Zhu · Giorgio Becherini · YICHEN PENG · Mingyang Su · YOU ZHOU · Xuefei Zhe · Naoya Iwamoto · Bo Zheng · Michael J. Black
|
||
Explaining CLIP's performance disparities on data from blind/low vision users
Daniela Massiceti · Camilla Longden · Agnieszka Słowik · Samuel Wills · Martin Grayson · Cecily Morrison
|
||
NB-GTR: Narrow-Band Guided Turbulence Removal
Yifei Xia · Chu Zhou · Chengxuan Zhu · Minggui Teng · Chao Xu · Boxin Shi
|
||
LaneCPP: Continuous 3D Lane Detection using Physical Priors
Maximilian Pittner · Joel Janai · Alexandru Paul Condurache
|
||
Large Language Models are Good Prompt Learners for Low-Shot Image Classification
Zhaoheng Zheng · Jingmin Wei · Xuefeng Hu · Haidong Zhu · Ram Nevatia
|
||
PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
Yufei Zhang · Jeffrey Kephart · Zijun Cui · Qiang Ji
|
||
LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment
Yiming Ren · Xiao Han · Chengfeng Zhao · Jingya Wang · Lan Xu · Jingyi Yu · Yuexin Ma
|
||
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu · Otilia Stretcu · Chun-Ta Lu · Krishnamurthy Viswanathan · Kenji Hata · Enming Luo · Ranjay Krishna · Ariel Fuxman
|
||
FineSports: A Multi-person Hierarchical Sports Video Dataset for Fine-grained Action Understanding
Jinglin Xu · Guohao Zhao · Sibo Yin · Wenhao Zhou · Yuxin Peng
|
||
An Aggregation-Free Federated Learning for Tackling Data Heterogeneity
Yuan Wang · Huazhu Fu · Renuga Kanagavelu · Qingsong Wei · Yong Liu · Rick Goh
|
||
Infrared Adversarial Car Stickers
Xiaopei Zhu · Yuqiu Liu · Zhanhao Hu · Jianmin Li · Xiaolin Hu
|
||
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Matteo Farina · Massimiliano Mancini · Elia Cunegatti · Gaowen Liu · Giovanni Iacca · Elisa Ricci
|
||
CA-Jaccard: Camera-aware Jaccard Distance for Person Re-identification
Yiyu Chen · Zheyi Fan · Zhaoru Chen · Yixuan Zhu
|
||
CURSOR: Scalable Mixed-Order Hypergraph Matching with CUR Decomposition
Qixuan Zheng · Ming Zhang · Hong Yan
|
||
Seg2Reg: Differentiable 2D Segmentation to 1D Regression Rendering for 360 Room Layout Reconstruction
Cheng Sun · Wei-En Tai · Yu-Lin Shih · Kuan-Wei Chen · Yong-Jing Syu · Kent Selwyn The · Yu-Chiang Frank Wang · Hwann-Tzong Chen
|
||
Boosting Adversarial Transferability by Block Shuffle and Rotation
Kunyu Wang · He Xuanran · Wenxuan Wang · Xiaosen Wang
|
||
Advancing Saliency Ranking with Human Fixations: Dataset, Models and Benchmarks
Bowen Deng · Siyang Song · Andrew French · Denis Schluppeck · Michael Pound
|
||
GALA: Generating Animatable Layered Assets from a Single Scan
Taeksoo Kim · Byungjun Kim · Shunsuke Saito · Hanbyul Joo
|
||
Single Mesh Diffusion Models with Field Latents for Texture Generation
Thomas W. Mitchel · Carlos Esteves · Ameesh Makadia
|
||
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Shentong Mo · Pedro Morgado
|
||
Move Anything with Layered Scene Diffusion
Jiawei Ren · Mengmeng Xu · Jui-Chieh Wu · Ziwei Liu · Tao Xiang · Antoine Toisoul
|
||
Learning Diffusion Texture Priors for Image Restoration
Tian Ye · Sixiang Chen · Wenhao Chai · Zhaohu Xing · Jing Qin · Ge Lin · Lei Zhu
|
||
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions
Runhao Zeng · Xiaoyong Chen · Jiaming Liang · Huisi Wu · Guang-Zhong Cao · Yong Guo
|
||
Implicit Event-RGBD Neural SLAM
Delin Qu · Chi Yan · Dong Wang · Jie Yin · Qizhi Chen · Dan Xu · Yiting Zhang · Bin Zhao · Xuelong Li
|
||
Scene-adaptive and Region-aware Multi-modal Prompt for Open Vocabulary Object Detection
Xiaowei Zhao · Xianglong Liu · Duorui Wang · Yajun Gao · Zhide Liu
|
||
RepKPU: Point Cloud Upsampling with Kernel Point Representation and Deformation
Yi Rong · Haoran Zhou · Kang Xia · Cheng Mei · Jiahao Wang · Tong Lu
|
||
InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
Jiun Tian Hoe · Xudong Jiang · Chee Seng Chan · Yap-peng Tan · Weipeng Hu
|
||
Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation
Wenxiao Deng · Wenbin Li · Tianyu Ding · Lei Wang · Hongguang Zhang · Kuihua Huang · Jing Huo · Yang Gao
|
||
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
Junming Chen · Yunfei Liu · Jianan Wang · Ailing Zeng · Yu Li · Qifeng Chen
|
||
One-step Diffusion with Distribution Matching Distillation
Tianwei Yin · Michaël Gharbi · Richard Zhang · Eli Shechtman · Fredo Durand · William Freeman · Taesung Park
|
||
On Exact Inversion of DPM-Solvers
Seongmin Hong · Kyeonghyun Lee · Suh Yoon Jeon · Hyewon Bae · Se Young Chun
|
||
Privacy-Preserving Face Recognition Using Trainable Feature Subtraction
Yuxi Mi · Zhizhou Zhong · Yuge Huang · Jiazhen Ji · Jianqing Xu · Jun Wang · ShaoMing Wang · Shouhong Ding · Shuigeng Zhou
|
||
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang · Ziqiao Ma · Xiaofeng Gao · Suhaila Shakiah · Qiaozi Gao · Joyce Chai
|
||
DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting
Jer Pelhan · Alan Lukezic · Vitjan Zavrtanik · Matej Kristan
|
||
CONFORM: Contrast is All You Need for High-Fidelity Text-to-Image Diffusion Models
Tuna Han Salih Meral · Enis Simsar · Federico Tombari · Pinar Yanardag
|
||
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
Lin Li · Haoyan Guan · Jianing Qiu · Michael Spratling
|
||
Arbitrary Motion Style Transfer with Multi-condition Motion Latent Diffusion Model
Wenfeng Song · Xingliang Jin · Shuai Li · Chenglizhao Chen · Aimin Hao · Xia HOU · Ning Li · Hong Qin
|
||
Video-Based Human Pose Regression via Decoupled Space-Time Aggregation
Jijie He · Wenwu Yang
|
||
Scaling Up Video Summarization Pretraining with Large Language Models
Dawit Argaw Argaw · Seunghyun Yoon · Fabian Caba Heilbron · Hanieh Deilamsalehy · Trung Bui · Zhaowen Wang · Franck Dernoncourt · Joon Chung
|
||
Neural Refinement for Absolute Pose Regression with Feature Synthesis
Shuai Chen · Yash Bhalgat · Xinghui Li · Jia-Wang Bian · Kejie Li · Zirui Wang · Victor Adrian Prisacariu
|
||
Single-View Scene Point Cloud Human Grasp Generation
Yan-Kang Wang · Chengyi Xing · Yi-Lin Wei · Xiao-Ming Wu · Wei-Shi Zheng
|
||
UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement
Yaofeng Xie · Lingwei Kong · Kai Chen · Zheng Ziqiang · Xiao Yu · Zhibin Yu · Bing Zheng
|
||
GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects
Sungphill Moon · Hyeontae Son · Dongcheol Hur · Sangwook Kim
|
||
DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling
Xiaoyun Zheng · Liwei Liao · Xufeng Li · Jianbo Jiao · Rongjie Wang · Feng Gao · Shiqi Wang · Ronggang Wang
|
||
Improving Depth Completion via Depth Feature Upsampling
Yufei Wang · Ge Zhang · Shaoqian Wang · Bo Li · Qi Liu · Le Hui · Yuchao Dai
|
||
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
Maitreya Patel · Changhoon Kim · Sheng Cheng · Chitta Baral · 'YZ' Yezhou Yang
|
||
Nearest Is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks
Boheng Li · Yishuo Cai · Haowei Li · Feng Xue · Zhifeng Li · Yiming Li
|
||
Loose Inertial Poser: Motion Capture with IMU-attached Loose-Wear Jacket
Chengxu Zuo · Yiming Wang · Lishuang Zhan · Shihui Guo · Xinyu Yi · Feng Xu · Yipeng Qin
|
||
Neural Exposure Fusion for High-Dynamic Range Object Detection
Emmanuel Onzon · Maximilian Bömer · Fahim Mannan · Felix Heide
|
||
Discriminative Probing and Tuning for Text-to-Image Generation
Leigang Qu · Wenjie Wang · Yongqi Li · Hanwang Zhang · Liqiang Nie · Tat-seng Chua
|
||
Model Adaptation for Time Constrained Embodied Control
Jaehyun Song · Minjong Yoo · Honguk Woo
|
||
GigaTraj: Predicting Long-term Trajectories of Hundreds of Pedestrians in Gigapixel Complex Scenes
Haozhe Lin · Chunyu Wei · Li He · Yuchen Guo · Yuchy Zhao · Shanglong Li · Lu Fang
|
||
Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata
Dongsu Zhang · Francis Williams · Žan Gojčič · Karsten Kreis · Sanja Fidler · Young Min Kim · Amlan Kar
|
||
Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods
Mingqi Jiang · Saeed Khorram · Li Fuxin
|
||
CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
Sajid Javed · Arif Mahmood · IYYAKUTTI IYAPPAN GANAPATHI · Fayaz Ali · Naoufel Werghi · Mohammed Bennamoun
|
||
TTA-EVF: Test-Time Adaptation for Event-based Video Frame Interpolation via Reliable Pixel and Sample Estimation
Hoonhee Cho · Taewoo Kim · Yuhwan Jeong · Kuk-Jin Yoon
|
||
One-Prompt to Segment All Medical Images
Wu · Min Xu
|
||
Quantifying Task Priority for Multi-Task Optimization
Wooseong Jeong · Kuk-Jin Yoon
|
||
Image Sculpting: Precise Object Editing with 3D Geometry Control
Jiraphon Yenphraphai · Xichen Pan · Sainan Liu · Daniele Panozzo · Saining Xie
|
||
UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence
Ruihai Wu · Haoran Lu · Yiyan Wang · Yubo Wang · Hao Dong
|
||
Distilling Semantic Priors from SAM to Efficient Image Restoration Models
Quan Zhang · Xiaoyu Liu · Wei Li · Hanting Chen · Junchao Liu · Jie Hu · Zhiwei Xiong · Chun Yuan · Yunhe Wang
|
||
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
Haiwen Diao · Bo Wan · Ying Zhang · Xu Jia · Huchuan Lu · Long Chen
|
||
Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities
AJ Piergiovanni · Isaac Noble · Dahun Kim · Michael Ryoo · Victor Gomes · Anelia Angelova
|
||
Evaluating Transferability in Retrieval Tasks: An Approach Using MMD and Kernel Methods
Mengyu Dai · Amir Hossein Raffiee · Aashish Jain · Joshua Correa
|
||
CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention
Mohammad Sadil Khan · Elona Dupont · Sk Aziz Ali · Kseniya Cherenkova · Anis Kacem · Djamila Aouada
|
||
Instance Tracking in 3D Scenes from Egocentric Videos
Yunhan Zhao · Haoyu Ma · Shu Kong · Charless Fowlkes
|
||
GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
Shoukang Hu · Tao Hu · Ziwei Liu
|
||
Cyclic Learning for Binaural Audio Generation and Localization
Zhaojian Li · Bin Zhao · Yuan Yuan
|
||
Frequency-aware Event-based Video Deblurring for Real-World Motion Blur
Taewoo Kim · Hoonhee Cho · Kuk-Jin Yoon
|
||
Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset
Yiming Li · Zhiheng Li · Nuo Chen · Moonjun Gong · Zonglin Lyu · Zehong Wang · Peili Jiang · Chen Feng
|
||
DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data
Chengxiang Fan · Muzhi Zhu · Hao Chen · Yang Liu · Weijia Wu · Huaqi Zhang · Chunhua Shen
|
||
Video Interpolation with Diffusion Models
Siddhant Jain · Daniel Watson · Aleksander Holynski · Eric Tabellion · Ben Poole · Janne Kontkanen
|
||
DeMatch: Deep Decomposition of Motion Field for Two-View Correspondence Learning
Shihua Zhang · Zizhuo Li · Yuan Gao · Jiayi Ma
|
||
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu · Runyu He · Gangshan Wu · Limin Wang
|
||
VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
Ziyang Luo · Nian Liu · Wangbo Zhao · Xuguang Yang · Dingwen Zhang · Deng-Ping Fan · Fahad Shahbaz Khan · Junwei Han
|
||
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
Zhiwu Qing · Shiwei Zhang · Jiayu Wang · Xiang Wang · Yujie Wei · Yingya Zhang · Changxin Gao · Nong Sang
|
||
Self-supervised debiasing using low rank regularization
Geon Yeong Park · Chanyong Jung · Sangmin Lee · Jong Chul Ye · Sang Wan Lee
|
||
Neural Markov Random Field for Stereo Matching
Tongfan Guan · Chen Wang · Yun-Hui Liu
|
||
Ungeneralizable Examples
Jingwen Ye · Xinchao Wang
|
||
Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency
Xu Yingjie · Bangzhen Liu · Hao Tang · Bailin Deng · Shengfeng He
|
||
Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs
Sunghwan Hong · Jaewoo Jung · Heeseong Shin · Jiaolong Yang · Chong Luo · Seungryong Kim
|
||
ERMVP: Communication-Efficient and Collaboration-Robust Multi-Vehicle Perception in Challenging Environments
Jingyu Zhang · Kun Yang · Yilei Wang · Hanqi Wang · Peng Sun · Liang Song
|
||
Tri-Perspective View Decomposition for Geometry-Aware Depth Completion
Zhiqiang Yan · Yuankai Lin · Kun Wang · Yupeng Zheng · Yufei Wang · Zhenyu Zhang · Jun Li · Jian Yang
|
||
Text-to-3D Generation with Bidirectional Diffusion using both 3D and 2D priors
Lihe Ding · Shaocong Dong · Zhanpeng Huang · Zibin Wang · Yiyuan Zhang · Kaixiong Gong · Dan Xu · Tianfan Xue
|
||
Instruct-Imagen: Image Generation with Multi-modal Instruction
Hexiang Hu · Kelvin C.K. Chan · Yu-Chuan Su · Wenhu Chen · Yandong Li · Kihyuk Sohn · Yang Zhao · Xue Ben · William Cohen · Ming-Wei Chang · Xuhui Jia
|
||
Beyond Average: Individualized Visual Scanpath Prediction
Xianyu Chen · Ming Jiang · Qi Zhao
|
||
Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning
Woo-Jin Ahn · Geun-Yeong Yang · Hyunduck Choi · Myo-Taeg Lim
|
||
DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach
Dayi Tan · Hansheng Chen · Wei Tian · Lu Xiong
|
||
Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes
Takashi Otonari · Satoshi Ikehata · Kiyoharu Aizawa
|
||
Test-Time Domain Generalization for Face Anti-Spoofing
Qianyu Zhou · Ke-Yue Zhang · Taiping Yao · Xuequan Lu · Shouhong Ding · Lizhuang Ma
|
||
Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation
Jin Wang · Bingfeng Zhang · Jian Pang · Honglong Chen · Weifeng Liu
|
||
Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos
Leonhard Sommer · Artur Jesslen · Eddy Ilg · Adam Kortylewski
|
||
Fooling Polarization-based Vision using Locally Controllable Polarizing Projection
Zhuoxiao Li · Zhihang Zhong · Shohei Nobuhara · Ko Nishino · Yinqiang Zheng
|
||
Affine Equivariant Networks Based on Differential Invariants
Yikang Li · Yeqing Qiu · Yuxuan Chen · Lingshen He · Zhouchen Lin
|
||
C3: High-performance and low-complexity neural compression from a single image or video
Hyunjik Kim · Matthias Bauer · Lucas Theis · Jonathan Richard Schwarz · Emilien Dupont
|
||
AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing
Fan Yang · Tianyi Chen · XIAOSHENG HE · Zhongang Cai · Lei Yang · Si Wu · Guosheng Lin
|
||
ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining
Ruoxi Shi · Xinyue Wei · Cheng Wang · Hao Su
|
||
3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting
Zhiyin Qian · Shaofei Wang · Marko Mihajlovic · Andreas Geiger · Siyu Tang
|
||
Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation
Haofeng Liu · Chenshu Xu · Yifei Yang · Lihua Zeng · Shengfeng He
|
||
SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement
Tao Wang · Lei Jin · Zheng Wang · Jianshu Li · Liang Li · Fang Zhao · Yu Cheng · Li Yuan · Li ZHOU · Junliang Xing · Jian Zhao
|
||
DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
Gianluca Scarpellini · Stefano Fiorini · Francesco Giuliari · Pietro Morerio · Alessio Del Bue
|
||
MS-DETR: Efficient DETR Training with Mixed Supervision
Chuyang Zhao · Yifan Sun · Wenhao Wang · Qiang Chen · Errui Ding · Yi Yang · Jingdong Wang
|
||
PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization
Xu Peng · Junwei Zhu · Boyuan Jiang · Ying Tai · Donghao Luo · Jiangning Zhang · Wei Lin · Taisong Jin · Chengjie Wang · Rongrong Ji
|
||
HDQMF: Holographic Feature Decomposition Using Quantum Algorithms
Prathyush Poduval · Zhuowen Zou · Mohsen Imani
|
||
Fair-VPT: Fair Visual Prompt Tuning for Image Classification
Sungho Park · Hyeran Byun
|
||
Task-conditioned adaptation of visual features in multi-task policy learning
Pierre Marza · Laetitia Matignon · Olivier Simonin · Christian Wolf
|
||
Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers
Jinxia Xie · Bineng Zhong · Zhiyi Mo · Shengping Zhang · Liangtao Shi · Shuxiang Song · Rongrong Ji
|
||
Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning
Jaewoo Jeong · Daehee Park · Kuk-Jin Yoon
|
||
Revisiting Single Image Reflection Removal In the Wild
Yurui Zhu · Bo Li · Xueyang Fu · Peng-Tao Jiang · Hao Zhang · Qibin Sun · Zheng-Jun Zha · Jinwei Chen
|
||
Facial Identity Anonymization via Intrinsic and Extrinsic Attention Distraction
Zhenzhong Kuang · Xiaochen Yang · Yingjie Shen · Chao Hu · Jun Yu
|
||
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision
Yi Yu · Xue Yang · Qingyun Li · Feipeng Da · Jifeng Dai · Yu Qiao · Junchi Yan
|
||
TutteNet: Injective 3D Deformations by Composition of 2D Mesh Deformations
Bo Sun · Thibault Groueix · Chen Song · Qixing Huang · Noam Aigerman
|
||
Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes
Hmrishav Bandyopadhyay · Subhadeep Koley · Ayan Das · Ayan Kumar Bhunia · Aneeshan Sain · Pinaki Nath Chowdhury · Tao Xiang · Yi-Zhe Song
|
||
Turb-Seg-Res: A Segment-then-Restore Pipeline for Dynamic Videos with Atmospheric Turbulence
Ripon Saha · Dehao Qin · Nianyi Li · Jinwei Ye · Suren Jayasuriya
|
||
IIRP-Net: Iterative Inference Residual Pyramid Network for Enhanced Image Registration
Tai Ma · Suwei Zhang · Jiafeng Li · Ying Wen
|
||
Frequency-Adaptive Dilated Convolution for Semantic Segmentation
Linwei Chen · Lin Gu · Dezhi Zheng · Ying Fu
|
||
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
Tongtian Yue · Jie Cheng · Longteng Guo · Xingyuan Dai · Zijia Zhao · Xingjian He · Gang Xiong · Yisheng Lv · Jing Liu
|
||
Style Aligned Image Generation via Shared Attention
Amir Hertz · Andrey Voynov · Shlomi Fruchter · Daniel Cohen-Or
|
||
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Yuchao Gu · Yipin Zhou · Bichen Wu · Licheng Yu · Jia-Wei Liu · Rui Zhao · Jay Zhangjie Wu · David Junhao Zhang · Mike Zheng Shou · Kevin Tang
|
||
Steganographic Passport: An Owner and User Verifiable Credential for Deep Model IP Protection Without Retraining
Qi Cui · Ruohan Meng · Chaohui Xu · Chip Hong Chang
|
||
SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation
Jiehong Lin · Lihua Liu · Dekun Lu · Kui Jia
|
||
MatSynth: A Modern PBR Materials Dataset
Giuseppe Vecchio · Valentin Deschaintre
|
||
MonoDiff: Monocular 3D Object Detection and Pose Estimation with Diffusion Models
Yasiru Ranasinghe · Deepti Hegde · Vishal M. Patel
|
||
Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm Regularization
Yujia Liu · Chenxi Yang · Dingquan Li · Jianhao Ding · Tingting Jiang
|
||
BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
Fengyuan Shi · Jiaxi Gu · Hang Xu · Songcen Xu · Wei Zhang · Limin Wang
|
||
Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring
Huicong Zhang · Haozhe Xie · Hongxun Yao
|
||
Bi-Causal: Group Activity Recognition via Bidirectional Causality
Youliang Zhang · Wenxuan Liu · Danni Xu · Zhuo Zhou · Zheng Wang
|
||
PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees
Chulin Xie · De-An Huang · Wenda Chu · Daguang Xu · Chaowei Xiao · Bo Li · Anima Anandkumar
|
||
How to Train Neural Field Representations: A Comprehensive Study and Benchmark
Samuele Papa · Riccardo Valperga · David Knigge · Miltiadis Kofinas · Phillip Lippe · Jan-Jakob Sonke · Efstratios Gavves
|
||
Universal Semi-Supervised Domain Adaptation by Mitigating Common-Class Bias
Wenyu Zhang · Qingmu Liu · Felix Ong · Mohamed Ragab · Chuan-Sheng Foo
|
||
Semantic-Aware Multi-Label Adversarial Attacks
Hassan Mahmood · Ehsan Elhamifar
|
||
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan · Jinke Ren · Chun-Mei Feng · Hengshuang Zhao · Shuguang Cui · Zhen Li
|
||
Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring
Chengxu Liu · Xuan Wang · Xiangyu Xu · Ruhao Tian · Shuai Li · Xueming Qian · Ming-Hsuan Yang
|
||
The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement
Gabriele Trivigno · Carlo Masone · Barbara Caputo · Torsten Sattler
|
||
PointInfinity: Resolution-Invariant Point Diffusion Models
Zixuan Huang · Justin Johnson · Shoubhik Debnath · James Rehg · Chao-Yuan Wu
|
||
F$^3$Loc: Fusion and Filtering for Floorplan Localization
Changan Chen · Rui Wang · Christoph Vogel · Marc Pollefeys
|
||
ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association
Shuxiao Ding · Lukas Schneider · Marius Cordts · Jürgen Gall
|
||
Construct to Associate: Cooperative Context Learning for Domain Adaptive Point Cloud Segmentation
Guangrui Li
|
||
EarthLoc: Astronaut Photography Localization by Indexing Earth from Space
Gabriele Berton · Alex Stoken · Barbara Caputo · Carlo Masone
|
||
Relation Rectification in Diffusion Model
Yinwei Wu · Xingyi Yang · Xinchao Wang
|
||
Close Imitation of Expert Retouching for Black-and-White Photography
Seunghyun Shin · Jisu Shin · Jihwan Bae · Inwook Shim · Hae-Gon Jeon
|
||
AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture Search
Junghyup Lee · Bumsub Ham
|
||
Transferable Structural Sparse Adversarial Attack Via Exact Group Sparsity Training
Di Ming · Peng Ren · Yunlong Wang · Xin Feng
|
||
Accelerating Diffusion Sampling with Optimized Time Steps
Shuchen Xue · Zhaoqiang Liu · Fei Chen · Shifeng Zhang · Tianyang Hu · Enze Xie · Zhenguo Li
|
||
Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
Hancheng Ye · Chong Yu · Peng Ye · Renqiu Xia · Bo Zhang · Yansong Tang · Jiwen Lu · Tao Chen
|
||
OneFormer3D: One Transformer for Unified Point Cloud Segmentation
Maksim Kolodiazhnyi · Anna Vorontsova · Anton Konushin · Danila Rukhovich
|
||
IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
Junbo Yin · Wenguan Wang · Runnan Chen · Wei Li · Ruigang Yang · Pascal Frossard · Jianbing Shen
|
||
NC-TTT: A Noise Contrastive Approach for Test-Time Training
David OSOWIECHI · Gustavo Vargas Hakim · Mehrdad Noori · Milad Cheraghalikhani · Ali Bahri · Moslem Yazdanpanah · Ismail Ben Ayed · Christian Desrosiers
|
||
One-Shot Structure-Aware Stylized Image Synthesis
Hansam Cho · Jonghyun Lee · Seunggyu Chang · Yonghyun Jeong
|
||
ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering
Haokai Pang · Heming Zhu · Adam Kortylewski · Christian Theobalt · Marc Habermann
|
||
Enhancing Video Super-Resolution via Implicit Resampling-based Alignment
Kai Xu · Ziwei Yu · Xin Wang · Michael Bi Mi · Angela Yao
|
||
Grounded Text-to-Image Synthesis with Attention Refocusing
Quynh Phung · Songwei Ge · Jia-Bin Huang
|
||
TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis
Pavlo Melnyk · Andreas Robinson · Michael Felsberg · Mårten Wadenbäck
|
||
RTracker: Recoverable Tracking via PN Tree Structured Memory
Yuqing Huang · Xin Li · Zikun Zhou · Yaowei Wang · Zhenyu He · Ming-Hsuan Yang
|
||
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Shuhuai Ren · Linli Yao · Shicheng Li · Xu Sun · Lu Hou
|
||
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen · Jiannan Wu · Wenhai Wang · Weijie Su · Guo Chen · Sen Xing · Zhong Muyan · Qing-Long Zhang · Xizhou Zhu · Lewei Lu · Bin Li · Ping Luo · Tong Lu · Yu Qiao · Jifeng Dai
|
||
Tyche: Stochastic in Context Learning for Medical Image Segmentation
Marianne Rakic · Hallee Wong · Jose Javier Gonzalez Ortiz · Beth Cimini · John Guttag · Adrian V. Dalca
|
||
Unexplored Faces of Robustness and Out-of-Distribution: Covariate Shifts in Environment and Sensor Domains
Eunsu Baek · Keondo Park · Ji-yoon Kim · Hyung-Sin Kim
|
||
CLOAF: CoLlisiOn-Aware Human Flow
Andrey Davydov · Martin Engilberge · Mathieu Salzmann · Pascal Fua
|
||
What, How, and When Should Object Detectors Update in Continually Changing Test Domains?
Jayeon Yoo · Dongkwan Lee · Inseop Chung · Donghyun Kim · Nojun Kwak
|
||
Learning Correlation Structures for Vision Transformers
Manjin Kim · Paul Hongsuck Seo · Cordelia Schmid · Minsu Cho
|
||
CLIP-Driven Open-Vocabulary 3D Scene Graph Generation via Cross-Modality Contrastive Learning
Lianggangxu Chen · Xuejiao Wang · Jiale Lu · Shaohui Lin · Changbo Wang · Gaoqi He
|
||
Equivariant plug-and-play image reconstruction
Matthieu Terris · Thomas Moreau · Nelly Pustelnik · Julián Tachella
|
||
Visual Objectification in Films: Towards a New AI Task for Video Interpretation
Julie Tores · Lucile Sassatelli · Hui-Yin Wu · Clement Bergman · Léa Andolfi · Victor Ecrement · Frederic Precioso · Thierry Devars · Magali GUARESI · Virginie Julliard · Sarah Lécossais
|
||
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu · Kun Yin · Haoyu Cao · Xinghua Jiang · Xin Li · Yinsong Liu · Deqiang Jiang · Xing Sun · Linli Xu
|
||
Ink Dot-Oriented Differentiable Optimization for Neural Image Halftoning
Hao Jiang · Bingfeng Zhou · Yadong Mu
|
||
Bring Event into RGB and LiDAR: Hierarchical Visual-Motion Fusion for Scene Flow
Hanyu Zhou · Yi Chang · Zhiwei Shi
|
||
FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation
Zijia Lu · Ehsan Elhamifar
|
||
Patch2Self2: Self-supervised Denoising on Coresets via Matrix Sketching
Shreyas Fadnavis · Agniva Chowdhury · Joshua Batson · Petros Drineas · Eleftherios Garyfallidis
|
||
Optimizing Diffusion Noise Can Serve As Universal Motion Priors
Korrawe Karunratanakul · Konpat Preechakul · Emre Aksan · Thabo Beeler · Supasorn Suwajanakorn · Siyu Tang
|
||
Efficient Vision-Language Pre-training by Cluster Masking
Zihao Wei · Zixuan Pan · Andrew Owens
|
||
Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction
Hao Li · Ying Chen · Yifei Chen · Rongshan Yu · Wenxian Yang · Liansheng Wang · Bowen Ding · Yuchen Han
|
||
Generative Unlearning for Any Identity
Juwon Seo · Sung-Hoon Lee · Tae-Young Lee · SeungJun Moon · Gyeong-Moon Park
|
||
Enhancing Multimodal Cooperation via Sample-level Modality Valuation
Yake Wei · Ruoxuan Feng · Zihe Wang · Di Hu
|
||
OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation
Xiongwei Wu · Sicheng Yu · Ee-Peng Lim · Chong Wah Ngo
|
||
CaKDP: Category-aware Knowledge Distillation and Pruning Framework for Lightweight 3D Object Detection
Haonan Zhang · Longjun Liu · Yuqi Huang · Yang Zhao · Xinyu Lei · Bihan Wen
|
||
SfmCAD: Unsupervised CAD Reconstruction by Learning Sketch-based Feature Modeling Operations
Pu Li · Jianwei Guo · HUIBIN LI · Bedrich Benes · Dong-Ming Yan
|
||
Neural Redshift: Random Networks are not Random Functions
Damien Teney · Armand Nicolicioiu · Valentin Hartmann · Ehsan Abbasnejad
|
||
SVDinsTN: A Tensor Network Paradigm for Efficient Structure Search from Regularized Modeling Perspective
Yu-Bang Zheng · Xile Zhao · Junhua Zeng · Chao Li · Qibin Zhao · Heng-Chao Li · Ting-Zhu Huang
|
||
Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection
Ting Lei · Shaofeng Yin · Yang Liu
|
||
Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration
Shihao Zhou · Duosheng Chen · Jinshan Pan · Jinglei Shi · Jufeng Yang
|
||
DiffAvatar: Simulation-Ready Garment Optimization with Differentiable Simulation
Yifei Li · Hsiaoyu Chen · Egor Larionov · Nikolaos Sarafianos · Wojciech Matusik · Tuur Stuyck
|
||
OED: Towards One-stage End-to-End Dynamic Scene Graph Generation
Guan Wang · Zhimin Li · Qingchao Chen · Yang Liu
|
||
On the Estimation of Image-matching Uncertainty in Visual Place Recognition
Mubariz Zaffar · Liangliang Nan · Julian F. P. Kooij
|
||
Learning to Transform Dynamically for Better Adversarial Transferability
Rongyi Zhu · Zeliang Zhang · Susan Liang · Zhuo Liu · Chenliang Xu
|
||
3DFIRES: Few Image 3D REconstruction for Scenes with Hidden Surfaces
Linyi Jin · Nilesh Kulkarni · David Fouhey
|
||
A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning
Yuelin Zhang · Pengyu Zheng · Wanquan Yan · Chengyu Fang · Shing Shin Cheng
|
||
Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships
Rangel Daroya · Aaron Sun · Subhransu Maji
|
||
Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts
Cansu Korkmaz · Ahmet Murat Tekalp · Zafer Dogan
|
||
SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection
Gang Zhang · Chen Junnan · Guohuan Gao · Jianmin Li · Si Liu · Xiaolin Hu
|
||
RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception
Ruiyang Hao · Siqi Fan · Yingru Dai · Zhenlin Zhang · Chenxi Li · Yuntian Wang · Haibao Yu · Wenxian Yang · Jirui Yuan · Zaiqing Nie
|
||
ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding
Le Xue · Ning Yu · Shu Zhang · Artemis Panagopoulou · Junnan Li · Roberto Martín-Martín · Jiajun Wu · Caiming Xiong · Ran Xu · Juan Carlos Niebles · Silvio Savarese
|
||
CFAT: Unleashing Triangular Windows for Image Super-resolution
Abhisek Ray · Gaurav Kumar · Maheshkumar Kolekar
|
||
Depth Information Assisted Collaborative Mutual Promotion Network for Single Image Dehazing
Yafei Zhang · Shen Zhou · Huafeng Li
|
||
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
Zhongcong Xu · Jianfeng Zhang · Jun Hao Liew · Hanshu Yan · Jia-Wei Liu · Chenxu Zhang · Jiashi Feng · Mike Zheng Shou
|
||
Relaxed Contrastive Learning for Federated Learning
Seonguk Seo · Jinkyu Kim · Geeho Kim · Bohyung Han
|
||
LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry
Weirong Chen · Le Chen · Rui Wang · Marc Pollefeys
|
||
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
Tianhao Qi · Shancheng Fang · Yanze Wu · Hongtao Xie · Jiawei Liu · Lang Chen · Qian HE · Yongdong Zhang
|
||
CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data
Wei Fang · Yuxing Tang · Heng Guo · Mingze Yuan · Tony C. W. MOK · Ke Yan · Jiawen Yao · Xin Chen · Zaiyi Liu · Le Lu · Ling Zhang · Minfeng Xu
|
||
Beyond Textual Constraints: Learning Novel Diffusion Conditions with Fewer Examples
Yuyang Yu · Bangzhen Liu · Chenxi Zheng · Xuemiao Xu · Huaidong Zhang · Shengfeng He
|
||
Can’t make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models
Himangi Mittal · Nakul Agarwal · Shao-Yuan Lo · Kwonjoon Lee
|
||
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video
Hongchi Xia · Chih-Hao Lin · Wei-Chiu Ma · Shenlong Wang
|
||
Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering
Jiawei Yao · Qi Qian · Juhua Hu
|
||
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
Junyan Wang · Zhenhong Sun · Stewart Tan · Xuanbai Chen · Weihua Chen · li · Cheng Zhang · Yang Song
|
||
Adaptive Bidirectional Displacement for Semi-Supervised Medical Image Segmentation
Hanyang Chi · Jian Pang · Bingfeng Zhang · Weifeng Liu
|
||
Targeted Representation Alignment for Open-World Semi-Supervised Learning
Ruixuan Xiao · Lei Feng · Kai Tang · Junbo Zhao · Yixuan Li · Gang Chen · Haobo Wang
|
||
Contrasting intra-modal and ranking cross-modal hard negatives to enhance visio-linguistic compositional understanding
Le Zhang · Rabiul Awal · Aishwarya Agrawal
|
||
Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal
Yijun Yang · Hongtao Wu · Angelica I. Aviles-Rivero · Yulun Zhang · Jing Qin · Lei Zhu
|
||
FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding
Jun Xiang · Xuan Gao · Yudong Guo · Juyong Zhang
|
||
DAP: A Dynamic Adversarial Patch for Evading Person Detectors
Amira Guesmi · Ruitian Ding · Muhammad Abdullah Hanif · Ihsen Alouani · Muhammad Shafique
|
||
T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory
Daehee Park · Jaeseok Jeong · Sung-Hoon Yoon · Jaewoo Jeong · Kuk-Jin Yoon
|
||
Dynamic Support Information Mining for Category-Agnostic Pose Estimation
Pengfei Ren · Yuanyuan Gao · Haifeng Sun · Qi Qi · Jingyu Wang · Jianxin Liao
|
||
Coupled Laplacian Eigenmaps for Locally-Aware 3D Rigid Point Cloud Matching
Matteo Bastico · Etienne Decencière · Laurent Corté · Yannick Tillier · David Ryckelynck
|
||
Orthogonal Adaptation for Modular Customization of Diffusion Models
Ryan Po · Guandao Yang · Kfir Aberman · Gordon Wetzstein
|
||
MART: Masked Affective RepresenTation Learning via Masked Temporal Distribution Distillation
Zhicheng Zhang · Pancheng Zhao · Eunil Park · Jufeng Yang
|
||
Context-based and Diversity-driven Specificity in Compositional Zero-Shot Learning
Yun Li · Zhe Liu · Hang Chen · Lina Yao
|
||
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
Jeong-gi Kwak · Erqun Dong · Yuhe Jin · Hanseok Ko · Shweta Mahajan · Kwang Moo Yi
|
||
Map-Relative Pose Regression for Visual Re-Localization
Shuai Chen · Tommaso Cavallari · Victor Adrian Prisacariu · Eric Brachmann
|
||
MANUS: Markerless Grasp Capture using Articulated 3D Gaussians
Chandradeep Pokhariya · Ishaan Shah · Angela Xing · Zekun Li · Kefan Chen · Avinash Sharma · Srinath Sridhar
|
||
Towards Generalizable Tumor Synthesis
Qi Chen · Xiaoxi Chen · Haorui Song · Alan L. Yuille · Zhiwei Xiong · Chen Wei · Zongwei Zhou
|
||
Diversified and Personalized Multi-rater Medical Image Segmentation
Yicheng Wu · Xiangde Luo · Zhe Xu · Xiaoqing Guo · Lie Ju · Zongyuan Ge · Wenjun Liao · Jianfei Cai
|
||
ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D image
Marco Pesavento · Yuanlu Xu · Nikolaos Sarafianos · Robert Maier · Ziyan Wang · Chun-Han Yao · Marco Volino · Edmond Boyer · Adrian Hilton · Tony Tung
|
||
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya · Anurag Arnab · Arsha Nagrani · Michael Ryoo
|
||
Point Transformer V3: Simpler, Faster, Stronger
Xiaoyang Wu · Li Jiang · Peng-Shuai Wang · Zhijian Liu · Xihui Liu · Yu Qiao · Wanli Ouyang · Tong He · Hengshuang Zhao
|
||
Bi-SSC: Geometric-Semantic Bidirectional Fusion for Camera-based 3D Semantic Scene Completion
Yujie Xue · Ruihui Li · Fan Wu · Zhuo Tang · Kenli Li · Duan Mingxing
|
||
FC-GNN: Recovering Reliable and Accurate Correspondences from Interferences
Haobo Xu · Jun Zhou · Hua Yang · Renjie Pan · Cunyan Li
|
||
Gradient-based Parameter Selection for Efficient Fine-Tuning
Zhi Zhang · Qizhe Zhang · Zijun Gao · Renrui Zhang · Ekaterina Shutova · Shiji Zhou · Shanghang Zhang
|
||
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Xiaohan Ding · Yiyuan Zhang · Yixiao Ge · Sijie Zhao · Lin Song · Xiangyu Yue · Ying Shan
|
||
Stationary Representations: Optimally Approximating Compatibility and Implications for Improved Model Replacements
Niccolò Biondi · Federico Pernici · Simone Ricci · Alberto Del Bimbo
|
||
A General and Efficient Training for Transformer via Token Expansion
Wenxuan Huang · Yunhang Shen · Jiao Xie · Baochang Zhang · Gaoqi He · Ke Li · Xing Sun · Shaohui Lin
|
||
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
Xin Li · Yunfei Wu · Xinghua Jiang · ZhiHao Guo · Mingming Gong · Haoyu Cao · Yinsong Liu · Deqiang Jiang · Xing Sun
|
||
Language-Driven Anchors for Zero-Shot Adversarial Robustness
Xiao Li · Wei Zhang · Yining Liu · Zhanhao Hu · Bo Zhang · Xiaolin Hu
|
||
Learning Vision from Models Rivals Learning Vision from Data
Yonglong Tian · Lijie Fan · Kaifeng Chen · Dina Katabi · Dilip Krishnan · Phillip Isola
|
||
MotionEditor: Editing Video Motion via Content-Aware Diffusion
Shuyuan Tu · Qi Dai · Zhi-Qi Cheng · Han Hu · Xintong Han · Zuxuan Wu · Yu-Gang Jiang
|
||
EVS-assisted joint Deblurring, Rolling-Shutter Correction and Video Frame Interpolation through Sensor Inverse Modeling
Rui Jiang · Fangwen Tu · Yixuan Long · Aabhaas Vaish · Bowen Zhou · Qinyi Wang · Wei Zhang · Yuntan Fang · Luis Eduardo García Capel · Bo Mu · Tiejun Dai · Andreas Suess
|
||
Open-World Semantic Segmentation Including Class Similarity
Matteo Sodano · Federico Magistri · Lucas Nunes · Jens Behley · Cyrill Stachniss
|
||
MindBridge: A Cross-Subject Brain Decoding Framework
Shizun Wang · Songhua Liu · Zhenxiong Tan · Xinchao Wang
|
||
Towards Calibrated Multi-label Deep Neural Networks
Jiacheng Cheng · Nuno Vasconcelos
|
||
Collaborating Foundation models for Domain Generalized Semantic Segmentation
Yasser Benigmim · Subhankar Roy · Slim Essid · Vicky Kalogeiton · Stéphane Lathuilière
|
||
Attribute-Guided Pedestrian Retrieval: Bridging Person Re-ID with Internal Attribute Variability
Yan Huang · Zhang Zhang · Qiang Wu · Yi Zhong · Liang Wang
|
||
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
Xiang Wang · Shiwei Zhang · Hangjie Yuan · Zhiwu Qing · Biao Gong · Yingya Zhang · Yujun Shen · Changxin Gao · Nong Sang
|
||
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Yujie Wei · Shiwei Zhang · Zhiwu Qing · Hangjie Yuan · Zhiheng Liu · Yu Liu · Yingya Zhang · Jingren Zhou · Hongming Shan
|
||
MPOD123: One Image to 3D Content Generation Using Mask-enhanced Progressive Outline-to-Detail Optimization
Jimin Xu · Tianbao Wang · Tao Jin · Shengyu Zhang · Dongjie Fu · Zhe Wang · Jiangjing Lyu · Chengfei Lv · Chaoyue Niu · Zhou Yu · Zhou Zhao · Fei Wu
|
||
3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation
Zidu Wang · Xiangyu Zhu · Tianshuo Zhang · Baiqin Wang · Zhen Lei
|
||
HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
Supreeth Narasimhaswamy · Uttaran Bhattacharya · Xiang Chen · Ishita Dasgupta · Saayan Mitra · Minh Hoai
|
||
OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation
Jisoo Jeong · Hong Cai · Risheek Garrepalli · Jamie Lin · Munawar Hayat · Fatih Porikli
|
||
Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models
Pablo Marcos-Manchón · Roberto Alcover-Couso · Juan SanMiguel · Jose M. Martinez
|
||
BoQ: A Place is Worth a Bag of Learnable Queries
Amar Ali-bey · Brahim Chaib-draa · Philippe Giguère
|
||
Generalizable Face Landmarking Guided by Conditional Face Warping
Jiayi Liang · Haotian Liu · Hongteng Xu · Dixin Luo
|
||
Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses
Inhee Lee · Byungjun Kim · Hanbyul Joo
|
||
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
Ganggui Ding · Canyu Zhao · Wen Wang · Zhen Yang · Zide Liu · Hao Chen · Chunhua Shen
|
||
Semantic Human Mesh Reconstruction with Textures
Xiaoyu Zhan · Jianxin Yang · Yuanqi Li · Jie Guo · Yanwen Guo · Wenping Wang
|
||
ExtDM: Distribution Extrapolation Diffusion Model for Video Prediction
Zhicheng Zhang · Junyao Hu · Wentao Cheng · Danda Paudel · Jufeng Yang
|
||
TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation
Xiaopei Wu · Yuenan Hou · Xiaoshui Huang · Binbin Lin · Tong He · Xinge Zhu · Yuexin Ma · Boxi Wu · Haifeng Liu · Deng Cai · Wanli Ouyang
|
||
Probabilistic Sampling of Balanced K-Means using Adiabatic Quantum Computing
Jan-Nico Zaech · Martin Danelljan · Tolga Birdal · Luc Van Gool
|
||
Robust Image Denoising through Adversarial Frequency Mixup
Donghun Ryou · Inju Ha · Hyewon Yoo · Dongwan Kim · Bohyung Han
|
||
Learning Occupancy for Monocular 3D Object Detection
Liang Peng · Junkai Xu · Haoran Cheng · Zheng Yang · Xiaopei Wu · Wei Qian · Wenxiao Wang · Boxi Wu · Deng Cai
|
||
SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution
Rongyuan Wu · Tao Yang · Lingchen Sun · Zhengqiang Zhang · Shuai Li · Lei Zhang
|
||
DUDF: Differentiable Unsigned Distance Fields with Hyperbolic Scaling
Miguel Fainstein · Viviana Siless · Emmanuel Iarussi
|
||
Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision Transformers
Sheng Yang · Jiawang Bai · Kuofeng Gao · Yong Yang · Yiming Li · Shu-Tao Xia
|
||
DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model
Zhenghao Pan · Haijin Zeng · Jiezhang Cao · Kai Zhang · Yongyong Chen
|
||
Relightable Gaussian Codec Avatars
Shunsuke Saito · Gabriel Schwartz · Tomas Simon · Junxuan Li · Giljoo Nam
|
||
WildlifeMapper: Aerial Image Analysis for Multi-Species Detection and Identification
Satish Kumar · Bowen Zhang · Chandrakanth Gudavalli · Connor Levenson · Lacey Hughey · Jared Stabach · Irene Amoke · Gordon Ojwang · Joseph Mukeka · Howard Frederick · Stephen Mwiu · Joseph Ochieng Ogutu · B S Manjunath
|
||
Pre-training Vision Models with Mandelbulb Variations
Benjamin N. Chiche · Yuto Horikawa · Ryo Fujita
|
||
Plug-and-Play, Dense-Label-Free Extraction of Open-Vocabulary Semantic Segmentation from Vision-Language Models
Luo Jiayun · Siddhesh Khandelwal · Leonid Sigal · Boyang Li
|
||
SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation
Keqi Chen · Vinkle Srivastav · Nicolas Padoy
|
||
Context-Aware Integration of Language and Visual References for Natural Language Tracking
Yanyan Shao · Shuting He · Qi Ye · Yuchao Feng · Wenhan Luo · Jiming Chen
|
||
OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
Xinyu Zhan · Lixin Yang · Yifei Zhao · Kangrui Mao · Hanlin Xu · Zenan Lin · Kailin Li · Cewu Lu
|
||
LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example
Soyeon Yoon · Kwan Yun · Kwanggyoon Seo · Sihun Cha · Jung Eun Yoo · Junyong Noh
|
||
ReGenNet: Towards Human Action-Reaction Synthesis
Liang Xu · Yizhou Zhou · Yichao Yan · Xin Jin · Wenhan Zhu · Fengyun Rao · Xiaokang Yang · Wenjun Zeng
|
||
GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding
Zi-Ting Chou · Sheng-Yu Huang · I-Jieh Liu · Yu-Chiang Frank Wang
|
||
LTA-PCS: Learnable Task-Agnostic Point Cloud Sampling
Jiaheng Liu · Jianhao Li · Kaisiyuan Wang · Hongcheng Guo · Jian Yang · Junran Peng · Ke Xu · Xianglong Liu · Jinyang Guo
|
||
DiffForensics: Leveraging Diffusion Prior to Image Forgery Detection and Localization
Zeqin Yu · Jiangqun Ni · Yuzhen Lin · Haoyi Deng · Bin Li
|
||
Initialization Matters for Adversarial Transfer Learning
Andong Hua · Jindong Gu · Zhiyu Xue · Nicholas Carlini · Eric Wong · Yao Qin
|
||
Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
Hang Li · Chengzhi Shen · Philip H.S. Torr · Volker Tresp · Jindong Gu
|
||
Universal Segmentation at Arbitrary Granularity with Language Instruction
Yong Liu · Cairong Zhang · Yitong Wang · Jiahao Wang · Yujiu Yang · Yansong Tang
|
||
On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm
Peng Sun · Bei Shi · Daiwei Yu · Tao Lin
|
||
Dr. Bokeh: DiffeRentiable Occlusion-aware Bokeh Rendering
Yichen Sheng · Zixun Yu · Lu Ling · Zhiwen Cao · Xuaner Zhang · Xin Lu · Ke Xian · Haiting Lin · Bedrich Benes
|
||
BrainWash: A Poisoning Attack to Forget in Continual Learning
Ali Abbasi · Parsa Nooralinejad · Hamed Pirsiavash · Soheil Kolouri
|
||
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
Mehmet Saygin Seyfioglu · Wisdom Ikezogwo · Fatemeh Ghezloo · Ranjay Krishna · Linda Shapiro
|
||
Segment and Caption Anything
Xiaoke Huang · Jianfeng Wang · Yansong Tang · Zheng Zhang · Han Hu · Jiwen Lu · Lijuan Wang · Zicheng Liu
|
||
Selective nonlinearities removal from digital signals
Krzysztof Maliszewski · Magdalena Urbanska · Varvara Vetrova · Sylwia Kolenderska
|
||
CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation
Lingjun Zhao · Jingyu Song · Katherine Skinner
|
||
Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D
Mukund Varma T · Peihao Wang · Zhiwen Fan · Zhangyang Wang · Hao Su · Ravi Ramamoorthi
|
||
Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
JunDa Cheng · Wei Yin · Kaixuan Wang · Xiaozhi Chen · Shijie Wang · Xin Yang
|
||
Self-correcting LLM-controlled Diffusion
Tsung-Han Wu · Long Lian · Joseph Gonzalez · Boyi Li · Trevor Darrell
|
||
Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion
Fan Zhang · Shaodi You · Yu Li · Ying Fu
|
||
Atom-Level Optical Chemical Structure Recognition with Limited Supervision
Martijn Oldenhof · Edward De Brouwer · Adam Arany · Yves Moreau
|
||
Scalable 3D Registration via Truncated Entry-wise Absolute Residuals
Tianyu Huang · Liangzu Peng · Rene Vidal · Yun-Hui Liu
|
||
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
Zhiyu Zhao · Bingkun Huang · Sen Xing · Gangshan Wu · Yu Qiao · Limin Wang
|
||
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Youngjoon Jang · Jihoon Kim · Junseok Ahn · Doyeop Kwak · Hongsun Yang · Yooncheol Ju · Ilhwan Kim · Byeong-Yeol Kim · Joon Chung
|
||
Generative Image Dynamics
Zhengqi Li · Richard Tucker · Noah Snavely · Aleksander Holynski
|
||
Continual Forgetting for Pre-trained Vision Models
Hongbo Zhao · Bolin Ni · Junsong Fan · Yuxi Wang · Yuntao Chen · Gaofeng Meng · Zhaoxiang Zhang
|
||
Distributionally Generative Augmentation for Fair Facial Attribute Classification
Fengda Zhang · Qianpei He · Kun Kuang · Jiashuo Liu · Long Chen · Chao Wu · Jun Xiao · Hanwang Zhang
|
||
CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs
Yingji Zhong · Lanqing Hong · Zhenguo Li · Dan Xu
|
||
Learning Adaptive Spatial Coherent Correlations for Speech-Preserving Facial Expression Manipulation
Tianshui Chen · Jianman Lin · Zhijing Yang · Chunmei Qing · Liang Lin
|
||
Bootstrapping SparseFormers from Vision Foundation Models
Ziteng Gao · Zhan Tong · Kevin Qinghong Lin · Joya Chen · Mike Zheng Shou
|
||
THRONE: A Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Prannay Kaul · Zhizhong Li · Hao Yang · Yonatan Dukler · Ashwin Swaminathan · CJ Taylor · Stefano Soatto
|
||
Clockwork Diffusion: Efficient Generation With Model-Step Distillation
Amirhossein Habibian · Amir Ghodrati · Noor Fathima · Guillaume Sautiere · Risheek Garrepalli · Fatih Porikli · Jens Petersen
|
||
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
Huan Ling · Seung Wook Kim · Antonio Torralba · Sanja Fidler · Karsten Kreis
|
||
Inlier Confidence Calibration for Point Cloud Registration
Yongzhe Yuan · Yue Wu · Xiaolong Fan · Maoguo Gong · Qiguang Miao · Wenping Ma
|
||
Memory-Scalable and Simplified Functional Map Learning
Robin Magnet · Maks Ovsjanikov
|
||
ADFactory: An Effective Framework for Generalizing Optical Flow with NeRF
Han Ling · Quansen Sun · Yinghui Sun · Xian Xu · Xingfeng Li
|
||
IReNe: Instant Recoloring of Neural Radiance Fields
Alessio Mazzucchelli · Adrian Garcia-Garcia · Elena Garces · Fernando Rivas-Manzaneque · Francesc Moreno-Noguer · Adrian Penate-Sanchez
|
||
HardMo: A Large-Scale Hardcase Dataset for Motion Capture
Jiaqi Liao · Chuanchen Luo · Yinuo Du · Yuxi Wang · Xu-Cheng Yin · Man Zhang · Zhaoxiang Zhang · Junran Peng
|
||
HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions
Hao Xu · Li Haipeng · Yinqiao Wang · Shuaicheng Liu · Chi-Wing Fu
|
||
An Empirical Study of the Generalization Ability of Lidar 3D Object Detectors to Unseen Domains
George Eskandar
|
||
Constrained Layout Generation with Factor Graphs
Mohammed Haroon Dupty · Yanfei Dong · Sicong Leng · Guoji Fu · Yong Liang Goh · Wei Lu · Wee Sun Lee
|
||
FastMAC: Stochastic Spectral Sampling of Correspondence Graph
Yifei Zhang · Hao Zhao · Hongyang Li · Siheng Chen
|
||
Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID
Wentao Tan · Changxing Ding · Jiayu Jiang · Fei Wang · Yibing Zhan · Dapeng Tao
|
||
Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation
guo · Tianwei Lin
|
||
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu · Xia Hu · Yaqing Wang · Bo Pang · Radu Soricut
|
||
Distilling CLIP with Dual Guidance for Learning Discriminative Human Body Shape Representation
Feng Liu · Minchul Kim · Zhiyuan Ren · Xiaoming Liu
|
||
Observation-Guided Diffusion Probabilistic Models
Junoh Kang · Jinyoung Choi · Sungik Choi · Bohyung Han
|
||
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Sihan Liu · Yiwei Ma · Xiaoqing Zhang · Haowei Wang · Jiayi Ji · Xiaoshuai Sun · Rongrong Ji
|
||
GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding
Chengyao Wang · Li Jiang · Xiaoyang Wu · Zhuotao Tian · Bohao Peng · Hengshuang Zhao · Jiaya Jia
|
||
Fully Exploiting Every Real Sample: Super-Pixel Sample Gradient Model Stealing
Yunlong Zhao · Xiaoheng Deng · Yijing Liu · Xinjun Pei · Jiazhi Xia · Wei Chen
|
||
LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising
Yuxing Duan
|
||
MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant
Chenlu Zhan · Gaoang Wang · Yu Lin · Hongwei Wang · Jian Wu
|
||
DePT: Decoupled Prompt Tuning
Ji Zhang · Shihan Wu · Lianli Gao · Heng Tao Shen · Jingkuan Song
|
||
A Subspace-Constrained Tyler's Estimator and its Applications to Structure from Motion
Feng Yu · Teng Zhang · Gilad Lerman
|
||
Bi-level Learning of Task-Specific Decoders for Joint Registration and One-Shot Medical Image Segmentation
Xin Fan · Xiaolin Wang · Jiaxin Gao · Jia Wang · Zhongxuan Luo · Risheng Liu
|
||
Osprey: Pixel Understanding with Visual Instruction Tuning
Yuqian Yuan · Wentong Li · Jian Liu · Dongqi Tang · Xinjie Luo · Chi Qin · Lei Zhang · Jianke Zhu
|
||
NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation
Sicheng Li · Hao Li · Yiyi Liao · Lu Yu
|
||
Diffusion Reflectance Map: Single-Image Stochastic Inverse Rendering of Illumination and Reflectance
Yuto Enyo · Ko Nishino
|
||
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing
Hyelin Nam · Gihyun Kwon · Geon Yeong Park · Jong Chul Ye
|
||
Domain Prompt Learning with Quaternion Networks
Qinglong Cao · Zhengqin Xu · Yuntian Chen · Chao Ma · Xiaokang Yang
|
||
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
Lewei Yao · Renjie Pi · Jianhua Han · Xiaodan Liang · Hang Xu · Wei Zhang · Zhenguo Li · Dan Xu
|
||
Uncertainty-Guided Never-Ending Learning to Drive
Lei Lai · Eshed Ohn-Bar · Sanjay Arora · John Yi
|
||
PlatoNeRF: 3D Reconstruction in Plato’s Cave via Single-View Two-Bounce Lidar
Tzofi Klinghoffer · Xiaoyu Xiang · Siddharth Somasundaram · Yuchen Fan · Christian Richardt · Ramesh Raskar · Rakesh Ranjan
|
||
Koala: Key frame-conditioned long video-LLM
Reuben Tan · Ximeng Sun · Ping Hu · Jui-Hsien Wang · Hanieh Deilamsalehy · Bryan A. Plummer · Bryan Russell · Kate Saenko
|
||
DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors
Biwen Lei · Kai Yu · Mengyang Feng · Miaomiao Cui · Xuansong Xie
|
||
ZeroShape: Regression-based Zero-shot Shape Reconstruction
Zixuan Huang · Stefan Stojanov · Anh Thai · Varun Jampani · James Rehg
|
||
Your Transferability Barrier is Fragile: Free-Lunch for Transferring the Non-Transferable Learning
Ziming Hong · Li Shen · Tongliang Liu
|
||
ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe
Yifan Bai · Zeyang Zhao · Yihong Gong · Xing Wei
|
||
DPHMs: Diffusion Parametric Head Models for Depth-based Tracking
Jiapeng Tang · Angela Dai · Yinyu Nie · Lev Markhasin · Justus Thies · Matthias Nießner
|
||
CNC-Net: Self-Supervised Learning for CNC Machining Operations
Mohsen Yavartanoo · Sangmin Hong · Reyhaneh Neshatavar · Kyoung Mu Lee
|
||
High-Quality Facial Geometry and Appearance Capture at Home
Yuxuan Han · Junfeng Lyu · Feng Xu
|
||
Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data
Yu Deng · Duomin Wang · Xiaohang Ren · Xingyu Chen · Baoyuan Wang
|
||
Efficient Scene Recovery Using Luminous Flux Prior
ZhongYu Li · Lei Zhang
|
||
Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding
Hoang-Quan Nguyen · Thanh-Dat Truong · Xuan-Bac Nguyen · Ashley Dowling · Xin Li · Khoa Luu
|
||
IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation
Yizhi Song · Zhifei Zhang · Zhe Lin · Scott Cohen · Brian Price · Jianming Zhang · Soo Ye Kim · He Zhang · Wei Xiong · Daniel Aliaga
|
||
Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels
Zhuohong Li · Wei He · Jiepan Li · Fangxiao Lu · Hongyan Zhang
|
||
Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning
Siteng Huang · Biao Gong · Yutong Feng · Zhang Min · Yiliang Lv · Donglin Wang
|
||
Hyperbolic Anomaly Detection
Huimin Li · Zhentao Chen · Yunhao Xu · Junlin Hu
|
||
Multiple View Geometry Transformers for 3D Human Pose Estimation
Ziwei Liao · Jialiang Zhu · Chunyu Wang · Han Hu · Steven L. Waslander
|
||
Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data
Xinting Liao · Weiming Liu · Chaochao Chen · Pengyang Zhou · Fengyuan Yu · Huabin Zhu · Binhui Yao · Tao Wang · Xiaolin Zheng · Yanchao Tan
|
||
SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes
Soubhik Sanyal · Partha Ghosh · Jinlong Yang · Michael J. Black · Justus Thies · Timo Bolkart
|
||
Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment
Ziyu Shan · Yujie Zhang · Qi Yang · Haichen Yang · Yiling Xu · Jenq-Neng Hwang · Xiaozhong Xu · Shan Liu
|
||
Anatomically Constrained Implicit Face Models
Prashanth Chandran · Gaspard Zoss
|
||
Revisiting Global Translation Estimation with Feature Tracks
Peilin Tao · Hainan Cui · Mengqi Rong · Shuhan Shen
|
||
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang · Feng Cheng · Gedas Bertasius
|
||
WinSyn: A High Resolution Testbed for Synthetic Data
Tom Kelly · John Femiani · Peter Wonka
|
||
Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning
Yixiong Zou · Yicong Liu · Yiman Hu · Yuhua Li · Ruixuan Li
|
||
Neural Super-Resolution for Real-time Rendering with Radiance Demodulation
Jia Li · Ziling Chen · Xiaolong Wu · Lu Wang · Beibei Wang · Lei Zhang
|
||
Noisy One-point Homographies are Surprisingly Good
Yaqing Ding · Jonathan Astermark · Magnus Oskarsson · Viktor Larsson
|
||
Alchemist: Parametric Control of Material Properties with Diffusion Models
Prafull Sharma · Varun Jampani · Yuanzhen Li · Xuhui Jia · Dmitry Lagun · Fredo Durand · William Freeman · Mark Matthews
|
||
DisCo: Disentangled Control for Realistic Human Dance Generation
Tan Wang · Linjie Li · Kevin Lin · Yuanhao Zhai · Chung-Ching Lin · Zhengyuan Yang · Hanwang Zhang · Zicheng Liu · Lijuan Wang
|
||
PaReNeRF: Toward Fast Large-scale Dynamic NeRF with Patch-based Reference
Xiao Tang · Min Yang · Penghui Sun · Hui Li · Yuchao Dai · Feng Zhu · Hojae Lee
|
||
FLHetBench: Benchmarking Device and State Heterogeneity in Federated Learning
Junyuan Zhang · Shuang Zeng · Miao Zhang · Runxi Wang · Feifei Wang · Yuyin Zhou · Paul Pu Liang · Liangqiong Qu
|
||
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Jiamian Wang · Guohao Sun · Pichao Wang · Dongfang Liu · Sohail Dianat · Majid Rabbani · Raghuveer Rao · Zhiqiang Tao
|
||
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin · Ryuichi Takanobu · Cai Zhang · Xiaochun Cao · Li Yuan
|
||
Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing
Xun Lin · Shuai Wang · Rizhao Cai · Yizhong Liu · Ying Fu · Wenzhong Tang · Zitong Yu · Alex C. Kot
|
||
Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation
Qinghe Ma · Jian Zhang · Lei Qi · Qian Yu · Yinghuan Shi · Yang Gao
|
||
Universal Novelty Detection through Adaptive Contrastive Learning
Hossein Mirzaei · Mojtaba Nafez · Mohammad Jafari · Mohammad Soltani · Mohammad Azizmalayeri · Jafar Habibi · Mohammad Sabokrou · Mohammad Rohban
|
||
LAMP: Learn A Motion Pattern for Few-Shot Video Generation
Rui-Qi Wu · Liangyu Chen · Tong Yang · Chun-Le Guo · Chongyi Li · Xiangyu Zhang
|
||
CLiC: Concept Learning in Context
Mehdi Safaee · Aryan Mikaeili · Or Patashnik · Daniel Cohen-Or · Ali Mahdavi Amiri
|
||
Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis
Jiawen Li · Yuxuan Chen · Hongbo Chu · Sun Qiehe · Tian Guan · Anjia Han · Yonghong He
|
||
Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
Hao Fei · Shengqiong Wu · Wei Ji · Hanwang Zhang · Tat-Seng Chua
|
||
LEAD: Exploring Logit Space Evolution for Model Selection
Zixuan Hu · Xiaotong Li · Shixiang Tang · Jun Liu · Yichun Hu · Ling-Yu Duan
|
||
Towards CLIP-driven Language-free 3D Visual Grounding via 2D-3D Relational Enhancement and Consistency
Yuqi Zhang · Han Luo · Yinjie Lei
|
||
MR-VNet: Media Restoration using Volterra Networks
Siddharth Roheda · Amit Unde · Loay Rashid
|
||
WonderJourney: Going from Anywhere to Everywhere
Hong-Xing Yu · Haoyi Duan · Junhwa Hur · Kyle Sargent · Michael Rubinstein · William Freeman · Forrester Cole · Deqing Sun · Noah Snavely · Jiajun Wu · Charles Herrmann
|
||
UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and Unfavorable Sets
Youngju Na · Woo Jae Kim · Kyu Han · Suhyeon Ha · Sung-Eui Yoon
|
||
Few-shot Learner Parameterization by Diffusion Time-steps
Zhongqi Yue · Pan Zhou · Richang Hong · Hanwang Zhang · Qianru Sun
|
||
Global and Hierarchical Geometry Consistency Priors for Few-shot NeRFs in Indoor Scenes
Xiaotian Sun · Qingshan Xu · Xinjie Yang · Yu Zang · Cheng Wang
|
||
Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis
Simon Niedermayr · Josef Stumpfegger · Rüdiger Westermann
|
||
The STVchrono Dataset: Towards Continuous Change Recognition in Time
Yanjun Sun · Yue Qiu · Mariia Khan · Fumiya Matsuzawa · Kenji Iwata
|
||
SPIN: Simultaneous Perception, Interaction and Navigation
Shagun Uppal · Ananye Agarwal · Haoyu Xiong · Kenneth Shaw · Deepak Pathak
|
||
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
Bin Xie · Jiale Cao · Jin Xie · Fahad Shahbaz Khan · Yanwei Pang
|
||
Unleashing Channel Potential: Space-Frequency Selection Convolution for SAR Object Detection
Ke Li · Di Wang · Zhangyuan Hu · Wenxuan Zhu · Shaofeng Li · Quan Wang
|
||
Motion Blur Decomposition with Cross-shutter Guidance
Xiang Ji · Haiyang Jiang · Yinqiang Zheng
|
||
Real-time Acquisition and Reconstruction of Dynamic Volumes with Neural Structured Illumination
Yixin Zeng · Zoubin Bi · Yin Mingrui · Xiang Feng · Kun Zhou · Hongzhi Wu
|
||
MV-Adapter: Exploring Parameter Efficient Learning for Video Text Retrieval
Bowen Zhang · Xiaojie Jin · Weibo Gong · Kai Xu · Xueqing Deng · Peng Wang · Zhao Zhang · Xiaohui Shen · Jiashi Feng
|
||
Mind marginal non-crack regions: Clustering-inspired representation learning for crack segmentation
Zhuangzhuang Chen · Zhuonan Lai · Jie Chen · Jianqiang Li
|
||
SpatialTracker: Tracking Any 2D Pixels in 3D Space
Yuxi Xiao · Qianqian Wang · Shangzhan Zhang · Nan Xue · Sida Peng · Yujun Shen · Xiaowei Zhou
|
||
FreePoint: Unsupervised Point Cloud Instance Segmentation
Zhikai Zhang · Jian Ding · Li Jiang · Dengxin Dai · Gui-Song Xia
|
||
Perceptual Assessment and Optimization of HDR Image Rendering
Peibei Cao · Rafal Mantiuk · Kede Ma
|
||
Programmable Motion Generation for Open-set Motion Control Tasks
Hanchao Liu · Xiaohang Zhan · Shaoli Huang · Tai-Jiang Mu · Ying Shan
|
||
Projecting Trackable Thermal Patterns for Dynamic Computer Vision
Mark Sheinin · Aswin C. Sankaranarayanan · Srinivasa G. Narasimhan
|
||
Overcoming Generic Knowledge Loss with Selective Parameter Update
Wenxuan Zhang · Paul Janson · Rahaf Aljundi · Mohamed Elhoseiny
|
||
EventPS: Real-Time Photometric Stereo Using an Event Camera
Bohan Yu · Jieji Ren · Jin Han · Feishi Wang · Jinxiu Liang · Boxin Shi
|
||
Kernel Adaptive Convolution for Scene Text Detection via Distance Map Prediction
Jinzhi Zheng · Heng Fan · Libo Zhang
|
||
Open-Vocabulary 3D Semantic Segmentation with Foundation Models
Li Jiang · Shaoshuai Shi · Bernt Schiele
|
||
Pick-or-Mix: Dynamic Channel Sampling for ConvNets
Ashish Kumar · Daneul Kim · Jaesik Park · Laxmidhar Behera
|
||
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
Zeyu Han · Fangrui Zhu · Qianru Lao · Huaizu Jiang
|
||
Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers
Zi-Xin Zou · Zhipeng Yu · Yuan-Chen Guo · Yangguang Li · Yan-Pei Cao · Ding Liang · Song-Hai Zhang
|
||
CAMEL: CAusal Motion Enhancement tailored for Lifting Text-driven Video Editing
Guiwei Zhang · Tianyu Zhang · Guanglin Niu · Zichang Tan · Yalong Bai · Qing Yang
|
||
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
Mubashir Noman · Muzammal Naseer · Hisham Cholakkal · Rao Anwer · Salman Khan · Fahad Shahbaz Khan
|
||
Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network
Yong Shu · Liquan Shen · Xiangyu Hu · Mengyao Li · Zihao Zhou
|
||
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen · Qihang Yu · Xiaohui Shen · Alan L. Yuille · Liang-Chieh Chen
|
||
GraCo: Granularity-Controllable Interactive Segmentation
Yian Zhao · Kehan Li · Zesen Cheng · Pengchong Qiao · Xiawu Zheng · Rongrong Ji · Chang Liu · Li Yuan · Jie Chen
|
||
Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera
Jiye Lee · Hanbyul Joo
|
||
DuPL: Dual Student with Trustworthy Progressive Learning for Robust Weakly Supervised Semantic Segmentation
Yuanchen Wu · Xichen Ye · Kequan Yang · Jide Li · Xiaoqiang Li
|
||
Image Neural Field Diffusion Models
Yinbo Chen · Oliver Wang · Richard Zhang · Eli Shechtman · Xiaolong Wang · Michaël Gharbi
|
||
Segment Every Out-of-Distribution Object
Wenjie Zhao · Jia Li · Xin Dong · Yu Xiang · Yunhui Guo
|
||
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
Shangchen Zhou · Peiqing Yang · Jianyi Wang · Yihang Luo · Chen Change Loy
|
||
DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data
Qihao Liu · Yi Zhang · Song Bai · Adam Kortylewski · Alan L. Yuille
|
||
An Interactive Navigation Method with Effect-oriented Affordance
Xiaohan Wang · Yuehu Liu · Xinhang Song · Yuyi Liu · Sixian Zhang · Shuqiang Jiang
|
||
NAPGuard: Towards Detecting Naturalistic Adversarial Patches
Siyang Wu · Jiakai Wang · Jiejie Zhao · Yazhe Wang · Xianglong Liu
|
||
A Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack against Split Learning
Xiaoyang Xu · Mengda Yang · Wenzhe Yi · Ziang Li · Juan Wang · Hongxin Hu · Yong Zhuang · Yaxin Liu
|
||
Flattening the Parent Bias: Hierarchical Semantic Segmentation in the Poincaré Ball
Simon Weber · Barış Zöngür · Nikita Araslanov · Daniel Cremers
|
||
Generative Region-Language Pretraining for Open-Ended Object Detection
Chuang Lin · Yi Jiang · Lizhen Qu · Zehuan Yuan · Jianfei Cai
|
||
Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity
Ruijie Quan · Wenguan Wang · Zhibo Tian · Fan Ma · Yi Yang
|
||
Rethinking Multi-domain Generalization with A General Learning Objective
Zhaorui Tan · Xi Yang · Kaizhu Huang
|
||
A Theory of Joint Light and Heat Transport for Lambertian Scenes
Mani Ramanagopal · Sriram Narayanan · Aswin C. Sankaranarayanan · Srinivasa G. Narasimhan
|
||
Spectral and Polarization Vision: Spectro-polarimetric Real-world Dataset
Yujin Jeon · Eunsue Choi · Youngchan Kim · Yunseong Moon · Khalid Omer · Felix Heide · Seung-Hwan Baek
|
||
Efficient Stitchable Task Adaptation
Haoyu He · Zizheng Pan · Jing Liu · Jianfei Cai · Bohan Zhuang
|
||
MuGE: Multiple Granularity Edge Detection
Caixia Zhou · Yaping Huang · Mengyang Pu · Qingji Guan · Ruoxi Deng · Haibin Ling
|
||
Efficient Multitask Dense Predictor via Binarization
Yuzhang Shang · Dan Xu · Gaowen Liu · Ramana Kompella · Yan Yan
|
||
Novel View Synthesis with View-Dependent Effects from a Single Image
Juan Luis Gonzalez Bello · Munchurl Kim
|
||
Wired Perspectives: Multi-View Wire Art Embraces Generative AI
Zhiyu Qu · Lan Yang · Honggang Zhang · Tao Xiang · Kaiyue Pang · Yi-Zhe Song
|
||
Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation
Hongwei Yan · Liyuan Wang · Kaisheng Ma · Yi Zhong
|
||
Small Scale Data-Free Knowledge Distillation
He Liu · Yikai Wang · Huaping Liu · Fuchun Sun · Anbang Yao
|
||
FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features
Andre Rochow · Max Schwarz · Sven Behnke
|
||
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
Yuqi Wang · Yuntao Chen · Xingyu Liao · Lue Fan · Zhaoxiang Zhang
|
||
AdaBM: On-the-Fly Adaptive Bit Mapping for Image Super-Resolution
Cheeun Hong · Kyoung Mu Lee
|
||
Domain Separation Graph Neural Networks for Saliency Object Ranking
Zijian Wu · Jun Lu · Jing Han · Lianfa Bai · Yi Zhang · Zhuang Zhao · Siyang Song
|
||
Solving the Catastrophic Forgetting Problem in Generalized Category Discovery
Xinzi Cao · Xiawu Zheng · Guanhong Wang · Weijiang Yu · Yunhang Shen · Ke Li · Yutong Lu · Yonghong Tian
|
||
Improving Image Restoration through Removing Degradations in Textual Representations
Jingbo Lin · Zhilu Zhang · Yuxiang Wei · Dongwei Ren · Dongsheng Jiang · Qi Tian · Wangmeng Zuo
|
||
Activity-Biometrics: Person Identification from Daily Activities
Shehreen Azad · Yogesh S. Rawat
|
||
Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation
Ming Xu · Stephen Gould
|
||
Instance-level Expert Knowledge and Aggregate Discriminative Attention for Radiology Report Generation
Shenshen Bu · Taiji Li · Zhiming Dai · Yuedong Yang
|
||
HyperSDFusion: Bridging Hierarchical Structures in Language and Geometry for Enhanced 3D Text2Shape Generation
Zhiying Leng · Tolga Birdal · Xiaohui Liang · Federico Tombari
|
||
MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images
Junwen Huang · Hao Yu · Kuan-Ting Yu · Nassir Navab · Slobodan Ilic · Benjamin Busam
|
||
Towards Variable and Coordinated Holistic Co-Speech Motion Generation
Yifei Liu · Qiong Cao · Yandong Wen · Huaiguang Jiang · Changxing Ding
|
||
Fast ODE-based Sampling for Diffusion Models in Around 5 Steps
Zhenyu Zhou · Defang Chen · Can Wang · Chun Chen
|
||
Going Beyond Multi-Task Dense Prediction with Synergy Embedding Models
Huimin Huang · Yawen Huang · Lanfen Lin · Ruofeng Tong · Yen-Wei Chen · Hao Zheng · Yuexiang Li · Yefeng Zheng
|
||
WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concept
Yong Hyun Ahn · Hyeon Bae Kim · Seong Tae Kim
|
||
ToonerGAN: Reinforcing GANs for Obfuscating Automated Facial Indexing
Kartik Thakral · Shashikant Prasad · Stuti Aswani · Mayank Vatsa · Richa Singh
|
||
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation
Wenxuan Wang · Tongtian Yue · Yisi Zhang · Longteng Guo · Xingjian He · Xinlong Wang · Jing Liu
|
||
Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction
Junuk Cha · Jihyeon Kim · Jae Shin Yoon · Seungryul Baek
|
||
Video Frame Interpolation via Direct Synthesis with the Event-based Reference
Yuhan Liu · Yongjian Deng · Hao Chen · Zhen Yang
|
||
Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation
Xiao Lin · Wenfei Yang · Yuan Gao · Tianzhu Zhang
|
||
CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation
Bo-Yuan Sun · Yuqi Yang · Le Zhang · Ming-Ming Cheng · Qibin Hou
|
||
MCNet: Rethinking the Core Ingredients for Accurate and Efficient Homography Estimation
Haokai Zhu · Si-Yuan Cao · Jianxin Hu · Sitong Zuo · Beinan Yu · Jiacheng Ying · Junwei Li · Hui-Liang Shen
|
||
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Jielin Qiu · Jiacheng Zhu · William Han · Aditesh Kumar · Karthik Mittal · Claire Jin · Zhengyuan Yang · Linjie Li · Jianfeng Wang · Ding Zhao · Bo Li · Lijuan Wang
|
||
Open-Set Domain Adaptation for Semantic Segmentation
Seun-An Choe · Ah-Hyung Shin · Keon Hee Park · Jinwoo Choi · Gyeong-Moon Park
|
||
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
Gongwei Chen · Leyang Shen · Rui Shao · Xiang Deng · Liqiang Nie
|
||
Pixel Aligned Language Models
Jiarui Xu · Xingyi Zhou · Shen Yan · Xiuye Gu · Anurag Arnab · Chen Sun · Xiaolong Wang · Cordelia Schmid
|
||
Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection
Zhiyuan Yan · Yuhao Luo · Siwei Lyu · Qingshan Liu · Baoyuan Wu
|
||
Rethinking the Evaluation Protocol of Domain Generalization
Han Yu · Xingxuan Zhang · Renzhe Xu · Jiashuo Liu · Yue He · Peng Cui
|
||
PFStorer: Personalized Face Restoration and Super-Resolution
Tuomas Varanka · Tapani Toivonen · Soumya Tripathy · Guoying Zhao · Erman Acar
|
||
Adapters Strike Back
Jan-Martin Steitz · Stefan Roth
|
||
Eclipse: Disambiguating Illumination and Materials using Unintended Shadows
Dor Verbin · Ben Mildenhall · Peter Hedman · Jonathan T. Barron · Todd Zickler · Pratul P. Srinivasan
|
||
ASAM: Boosting Segment Anything Model with Adversarial Tuning
Bo Li · Haoke Xiao · Lv Tang
|
||
ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis
Muhammad Hamza Mughal · Rishabh Dabral · Ikhsanul Habibie · Lucia Donatelli · Marc Habermann · Christian Theobalt
|
||
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
Bowen Wen · Wei Yang · Jan Kautz · Stan Birchfield
|
||
Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach
Beichen Zhang · Xiaoxing Wang · Xiaohan Qin · Junchi Yan
|
||
Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On
Xu Yang · Changxing Ding · Zhibin Hong · Junhao Huang · Jin Tao · Xiangmin Xu
|
||
ScanFormer: Referring Expression Comprehension by Iteratively Scanning
Wei Su · Peihan Miao · Huanzhang Dou · Xi Li
|
||
Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text
Junshu Tang · Yanhong Zeng · Ke Fan · Xuheng Wang · Bo Dai · Kai Chen · Lizhuang Ma
|
||
Exploiting Diffusion Prior for Generalizable Dense Prediction
Hsin-Ying Lee · Hung-Yu Tseng · Hsin-Ying Lee · Ming-Hsuan Yang
|
||
GSVA: Generalized Segmentation via Multimodal Large Language Models
Zhuofan Xia · Dongchen Han · Yizeng Han · Xuran Pan · Shiji Song · Gao Huang
|
||
ElasticDiffusion: Training-free Arbitrary Size Image Generation
Moayed Haji Ali · Guha Balakrishnan · Vicente Ordonez
|
||
Uncertainty Visualization via Low-Dimensional Posterior Projections
Omer Yair · Tomer Michaeli · Elias Nehme
|
||
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
Young Kyun Jang · Donghyun Kim · Zihang Meng · Dat Huynh · Ser-Nam Lim
|
||
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu · Saining Xie
|
||
Real-Time Neural BRDF with Spherically Distributed Primitives
Yishun Dou · Zhong Zheng · Qiaoqiao Jin · Bingbing Ni · Yugang Chen · Junxiang Ke
|
||
RCL: Reliable Continual Learning for Unified Failure Detection
Fei Zhu · Zhen Cheng · Xu-Yao Zhang · Cheng-Lin Liu · Zhaoxiang Zhang
|
||
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
Ozgur Kara · Bariscan Kurtkaya · Hidir Yesiltepe · James Rehg · Pinar Yanardag
|
||
Geometry Transfer for Stylizing Radiance Fields
Hyunyoung Jung · Seonghyeon Nam · Nikolaos Sarafianos · Sungjoo Yoo · Alexander Sorkine-Hornung · Rakesh Ranjan
|
||
Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace · Meihua Dang · Rafael Rafailov · Linqi Zhou · Aaron Lou · Senthil Purushwalkam · Stefano Ermon · Caiming Xiong · Shafiq Joty · Nikhil Naik
|
||
CSTA: CNN-based Spatiotemporal Attention for Video Summarization
Jaewon Son · Jaehun Park · Kwangsu Kim
|
||
Sieve: Multimodal Dataset Pruning using Image-Captioning Models
Anas Mahmoud · Mostafa Elhoushi · Amro Abbas · Yu Yang · Newsha Ardalani · Hugh Leather · Ari Morcos
|
||
AMU-Tuning: Learning Effective Bias for CLIP-based Few-shot Classification
Yuwei Tang · ZhenYi Lin · Qilong Wang · Pengfei Zhu · Qinghua Hu
|
||
Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation
Song Wang · Jiawei Yu · Wentong Li · Wenyu Liu · Xiaolu Liu · Junbo Chen · Jianke Zhu
|
||
Towards Fairness-Aware Adversarial Learning
Yanghao Zhang · Tianle Zhang · Ronghui Mu · Xiaowei Huang · Wenjie Ruan
|
||
Retrieval-Augmented Egocentric Video Captioning
Jilan Xu · Yifei Huang · Junlin Hou · Guo Chen · Yuejie Zhang · Rui Feng · Weidi Xie
|
||
Low-Rank Knowledge Decomposition for Medical Foundation Models
Yuhang Zhou · Haolin Li · Siyuan Du · Jiangchao Yao · Ya Zhang · Yanfeng Wang
|
||
FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
Shivangi Aneja · Justus Thies · Angela Dai · Matthias Nießner
|
||
Pixel-level Semantic Correspondence through Layout-aware Representation Learning and Multi-scale Matching Integration
Yixuan Sun · Zhangyue Yin · Haibo Wang · Yan Wang · Xipeng Qiu · Weifeng Ge · Wenqiang Zhang
|
||
CPR: Retrieval Augmented Generation for Copyright Protection
Aditya Golatkar · Alessandro Achille · Luca Zancato · Yu-Xiang Wang · Ashwin Swaminathan · Stefano Soatto
|
||
Event-assisted Low-Light Video Object Segmentation
Li Hebei · Jin Wang · Jiahui Yuan · Yue Li · Wenming Weng · Yansong Peng · Yueyi Zhang · Zhiwei Xiong · Xiaoyan Sun
|
||
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
Wujian Peng · Sicheng Xie · Zuyao You · Shiyi Lan · Zuxuan Wu
|
||
Animating General Image with Large Visual Motion Model
Dengsheng Chen · Xiaoming Wei · Xiaolin Wei
|
||
DeIl: Direct and Inverse CLIP for Open-World Few-Shot Learning
Shuai Shao · Yu Bai · Yan Wang · Bao-di Liu · Yicong Zhou
|
||
FedAS: Bridging Inconsistency in Personalized Federated Learning
Xiyuan Yang · Wenke Huang · Mang Ye
|
||
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi · Ye Fang · Zeyi Sun · Xiaoyang Wu · Tong Wu · Jiaqi Wang · Dahua Lin · Hengshuang Zhao
|
||
Scene Adaptive Sparse Transformer for Event-based Object Detection
Yansong Peng · Li Hebei · Yueyi Zhang · Xiaoyan Sun · Feng Wu
|
||
Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary
Leheng Zhang · Yawei Li · Xingyu Zhou · Xiaorui Zhao · Shuhang Gu
|
||
Rendering Every Pixel for High-Fidelity Geometry in 3D GANs
Alex Trevithick · Matthew Chan · Towaki Takikawa · Umar Iqbal · Shalini De Mello · Manmohan Chandraker · Ravi Ramamoorthi · Koki Nagano
|
||
Residual Learning in Diffusion Models
Junyu Zhang · Daochang Liu · Eunbyung Park · Shichao Zhang · Chang Xu
|
||
Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains
Bang-Dang Pham · Phong Tran · Anh Tran · Cuong Pham · Rang Nguyen · Minh Hoai
|
||
FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization
Jiahui Zhang · Fangneng Zhan · Muyu Xu · Shijian Lu · Eric P. Xing
|
||
PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns
Shuliang Ning · Duomin Wang · Yipeng Qin · Zirong Jin · Baoyuan Wang · Xiaoguang Han
|
||
Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation
Dong Zhao · Shuang Wang · Qi Zang · Licheng Jiao · Nicu Sebe · Zhun Zhong
|
||
Revisiting Sampson Approximations for Geometric Estimation Problems
Felix Rydell · Angelica Torres · Viktor Larsson
|
||
Neural 3D Strokes: Creating Stylized 3D Scenes with Vectorized 3D Strokes
Haobin Duan · Miao Wang · Yanxun Li · Yong-Liang Yang
|
||
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception
Junwen He · Yifan Wang · Lijun Wang · Huchuan Lu · Bin Luo · Jun-Yan He · Jin-Peng Lan · Xuansong Xie
|
||
Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers
Subhadeep Koley · Ayan Kumar Bhunia · Aneeshan Sain · Pinaki Nath Chowdhury · Tao Xiang · Yi-Zhe Song
|
||
Flexible Depth Completion for Sparse and Varying Point Densities
Jinhyung Park · Yu-Jhe Li · Kris Kitani
|
||
Sparse Global Matching for Video Frame Interpolation with Large Motion
Chunxu Liu · Guozhen Zhang · Rui Zhao · Limin Wang
|
||
PIGEON: Predicting Image Geolocations
Lukas Haas · Michal Skreta · Silas Alberti · Chelsea Finn
|
||
Improving Generalization via Meta-Learning on Hard Samples
Nishant Jain · Arun Suggala · Pradeep Shenoy
|
||
Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes
Chi-Hsi Kung · 書緯 呂 · Yi-Hsuan Tsai · Yi-Ting Chen
|
||
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Shuyang Sun · Runjia Li · Philip H.S. Torr · Xiuye Gu · Siyang Li
|
||
LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition
Zhonglin Sun · Chen Feng · Ioannis Patras · Georgios Tzimiropoulos
|
||
SinSR: Diffusion-Based Image Super-Resolution in a Single Step
Yufei Wang · Wenhan Yang · Xinyuan Chen · Yaohui Wang · Lanqing Guo · Lap-Pui Chau · Ziwei Liu · Yu Qiao · Alex C. Kot · Bihan Wen
|
||
Tuning Stable Rank Shrinkage: Aiming at the Overlooked Structural Risk in Fine-tuning
Sicong Shen · Yang Zhou · Bingzheng Wei · Eric Chang · Yan Xu
|
||
DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF
Jie Long Lee · Chen Li · Gim Hee Lee
|
||
Relightable and Animatable Neural Avatar from Sparse-View Video
Zhen Xu · Sida Peng · Chen Geng · Linzhan Mou · Zihan Yan · Jiaming Sun · Hujun Bao · Xiaowei Zhou
|
||
DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses
Chen Zhao · Tong Zhang · Zheng Dang · Mathieu Salzmann
|
||
PostureHMR: Posture Transformation for 3D Human Mesh Recovery
Yu-Pei Song · Xiao Wu · Zhaoquan Yuan · Jian-Jun Qiao · Qiang Peng
|
||
VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction
Jiaqi Lin · Zhihao Li · Xiao Tang · Jianzhuang Liu · Shiyong Liu · Jiayue Liu · Yangdi Lu · Xiaofei Wu · Songcen Xu · Youliang Yan · Wenming Yang
|
||
WANDR: Intention-guided Human Motion Generation
Markos Diomataris · Nikos Athanasiou · Omid Taheri · Xi Wang · Otmar Hilliges · Michael J. Black
|
||
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
Tao Lu · Mulin Yu · Linning Xu · Yuanbo Xiangli · Limin Wang · Dahua Lin · Bo Dai
|
||
SimDA: Simple Diffusion Adapter for Efficient Video Generation
Zhen Xing · Qi Dai · Han Hu · Zuxuan Wu · Yu-Gang Jiang
|
||
GART: Gaussian Articulated Template Models
Jiahui Lei · Yufu Wang · Georgios Pavlakos · Lingjie Liu · Kostas Daniilidis
|
||
Learning from Observer Gaze: Zero-shot Attention Prediction Oriented by Human-Object Interaction Recognition
Yuchen Zhou · Linkai Liu · Chao Gou
|
||
Anchor-based Robust Finetuning of Vision-Language Models
Jinwei Han · Zhiwen Lin · Zhongyisun Sun · Yingguo Gao · Ke Yan · Shouhong Ding · Yuan Gao · Gui-Song Xia
|
||
Denoising Point Clouds in Latent Space via Graph Convolution and Invertible Neural Network
Aihua Mao · Biao Yan · Zijing Ma · Ying He
|
||
Diffusion Handles: Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D
Karran Pandey · Paul Guerrero · Matheus Gadelha · Yannick Hold-Geoffroy · Karan Singh · Niloy J. Mitra
|
||
PSDPM: Prototype-based Secondary Discriminative Pixels Mining for Weakly Supervised Semantic Segmentation
Xinqiao Zhao · Ziqian Yang · Tianhong Dai · Bingfeng Zhang · Jimin Xiao
|
||
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
Jisu Nam · Heesu Kim · DongJae Lee · Siyoon Jin · Seungryong Kim · Seunggyu Chang
|
||
COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction
Qihang Ma · Xin Tan · Yanyun Qu · Lizhuang Ma · Zhizhong Zhang · Yuan Xie
|
||
Generalizable Novel-View Synthesis using a Stereo Camera
Haechan Lee · Wonjoon Jin · Seung-Hwan Baek · Sunghyun Cho
|
||
Prompt3D: Random Prompt Assisted Weakly-Supervised 3D Object Detection
Xiaohong Zhang · Huisheng Ye · Jingwen Li · Qinyu Tang · Yuanqi Li · Yanwen Guo · Jie Guo
|
||
Language-driven All-in-one Adverse Weather Removal
Hao Yang · Liyuan Pan · Yan Yang · Wei Liang
|
||
Efficient Meshflow and Optical Flow Estimation from Event Cameras
Xinglong Luo · Ao Luo · Zhengning Wang · Chunyu Lin · Bing Zeng · Shuaicheng Liu
|
||
Volumetric Environment Representation for Vision-Language Navigation
Liu · Wenguan Wang · Yi Yang
|
||
LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis
Zehan Zheng · Fan Lu · Weiyi Xue · Guang Chen · Changjun Jiang
|
||
LEAD: Learning Decomposition for Source-free Universal Domain Adaptation
Sanqing Qu · Tianpei Zou · Lianghua He · Florian Röhrbein · Alois Knoll · Guang Chen · Changjun Jiang
|
||
CG-HOI: Contact-Guided 3D Human-Object Interaction Generation
Christian Diller · Angela Dai
|
||
Contrastive Mean-Shift Learning for Generalized Category Discovery
Sua Choi · Dahyun Kang · Minsu Cho
|
||
Federated Generalized Category Discovery
Nan Pu · Wenjing Li · Xinyuan Ji · Yalan Qin · Nicu Sebe · Zhun Zhong
|
||
Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?
Hanxin Zhu · Tianyu He · Xin Li · Bingchen Li · Zhibo Chen
|
||
Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters
Jiazuo Yu · Yunzhi Zhuge · Lu Zhang · Ping Hu · Dong Wang · Huchuan Lu · You He
|
||
How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?
Subhadeep Koley · Ayan Kumar Bhunia · Aneeshan Sain · Pinaki Nath Chowdhury · Tao Xiang · Yi-Zhe Song
|
||
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing
Chong Mou · Xintao Wang · Jiechong Song · Ying Shan · Jian Zhang
|
||
Iterated Learning Improves Compositionality in Large Vision-Language Models
Chenhao Zheng · Jieyu Zhang · Aniruddha Kembhavi · Ranjay Krishna
|
||
Detours for Navigating Instructional Videos
Kumar Ashutosh · Zihui Xue · Tushar Nagarajan · Kristen Grauman
|
||
Domain Gap Embeddings for Generative Dataset Augmentation
Yinong Wang · Younjoon Chung · Chen Henry Wu · Fernando De la Torre
|
||
Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation
Zhekai Du · Xinyao Li · Fengling Li · Ke Lu · Lei Zhu · Jingjing Li
|
||
TransLoc4D: Transformer-based 4D Radar Place Recognition
Guohao Peng · Heshan Li · Yangyang Zhao · Jun Zhang · Zhenyu Wu · Pengyu Zheng · Danwei Wang
|
||
Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation
Jingyun Wang · Guoliang Kang
|
||
Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification
Sravanti Addepalli · Ashish Asokan · Lakshay Sharma · R. Venkatesh Babu
|
||
Towards Learning a Generalist Model for Embodied Navigation
Duo Zheng · Shijia Huang · Lin Zhao · Yiwu Zhong · Liwei Wang
|
||
Small Steps and Level Sets: Fitting Neural Surface Models with Point Guidance
Chamin Hewa Koneputugodage · Yizhak Ben-Shabat · Dylan Campbell · Stephen Gould
|
||
Absolute Pose from One or Two Scaled and Oriented Features
Jonathan Ventura · Zuzana Kukelova · Torsten Sattler · Daniel Barath
|
||
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
Mengqi Huang · Zhendong Mao · Mingcong Liu · Qian He · Yongdong Zhang
|
||
Driving Everywhere with Large Language Model Policy Adaptation
Boyi Li · Yue Wang · Jiageng Mao · Boris Ivanovic · Sushant Veer · Karen Leung · Marco Pavone
|
||
SANeRF-HQ: Segment Anything for NeRF in High Quality
Yichen Liu · Benran Hu · Chi-Keung Tang · Yu-Wing Tai
|
||
APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation
Weizhao He · Yang Zhang · Wei Zhuo · Linlin Shen · Jiaqi Yang · Songhe Deng · Liang Sun
|
||
ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning
Beomyoung Kim · Joonsang Yu · Sung Ju Hwang
|
||
InstanceDiffusion: Instance-level Control for Image Generation
Xudong Wang · Trevor Darrell · Sai Saketh Rambhatla · Rohit Girdhar · Ishan Misra
|
||
Shadow Generation for Composite Image Using Diffusion Model
Qingyang Liu · Junqi You · Jian-Ting Wang · Xinhao Tao · Bo Zhang · Li Niu
|
||
DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes
Hao Yan · Zhihui Ke · Xiaobo Zhou · Tie Qiu · Xidong Shi · DaDong Jiang
|
||
OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation
Ganlong Zhao · Guanbin Li · Weikai Chen · Yizhou Yu
|
||
Rolling Shutter Correction with Intermediate Distortion Flow Estimation
Mingdeng Cao · Sidi Yang · Yujiu Yang · Yinqiang Zheng
|
||
Towards Transferable Targeted 3D Adversarial Attack in the Physical World
Yao Huang · Yinpeng Dong · Shouwei Ruan · Xiao Yang · Hang Su · Xingxing Wei
|
||
AnyDoor: Zero-shot Object-level Image Customization
Xi Chen · Lianghua Huang · Yu Liu · Yujun Shen · Deli Zhao · Hengshuang Zhao
|
||
GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs
Gege Gao · Weiyang Liu · Anpei Chen · Andreas Geiger · Bernhard Schölkopf
|
||
Revisiting Spatial-Frequency Information Integration from a Hierarchical Perspective for Panchromatic and Multi-Spectral Image Fusion
Jiangtong Tan · Jie Huang · Naishan Zheng · Man Zhou · Keyu Yan · Danfeng Hong · Feng Zhao
|
||
3D Facial Expressions through Analysis-by-Neural-Synthesis
George Retsinas · Panagiotis Filntisis · Radek Danecek · Victoria Abrevaya · Anastasios Roussos · Timo Bolkart · Petros Maragos
|
||
Exploring the Transferability of Visual Prompting for Multimodal Large Language Models
Yichi Zhang · Yinpeng Dong · Siyuan Zhang · Tianzan Min · Hang Su · Jun Zhu
|
||
Unified Language-driven Zero-shot Domain Adaptation
Senqiao Yang · Zhuotao Tian · Li Jiang · Jiaya Jia
|
||
Aligning Logits Generatively for Principled Black-Box Knowledge Distillation
Jing Ma · Xiang Xiang · Ke Wang · Yuchuan Wu · Yongbin Li
|
||
HomoFormer: Homogenized Transformer for Image Shadow Removal
Jie Xiao · Xueyang Fu · Yurui Zhu · Dong Li · Jie Huang · Kai Zhu · Zheng-Jun Zha
|
||
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
Narges Norouzi · Svetlana Orlova · Daan de Geus · Gijs Dubbelman
|
||
Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
Yifan Wang · Xingyi He · Sida Peng · Dongli Tan · Xiaowei Zhou
|
||
Language-guided Image Reflection Separation
Haofeng Zhong · Yuchen Hong · Shuchen Weng · Jinxiu Liang · Boxin Shi
|
||
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Yiming Zhang · Zhening Xing · Yanhong Zeng · Youqing Fang · Kai Chen
|
||
Motion Diversification Networks
Hee Jae Kim · Eshed Ohn-Bar
|
||
On the Scalability of Diffusion-based Text-to-Image Generation
Hao Li · Yang Zou · Ying Wang · Orchid Majumder · Yusheng Xie · R. Manmatha · Ashwin Swaminathan · Zhuowen Tu · Stefano Ermon · Stefano Soatto
|
||
BSNet: Box-Supervised Simulation-assisted Mean Teacher for 3D Instance Segmentation
Jiahao Lu · Jiacheng Deng · Tianzhu Zhang
|
||
Unlocking Pretrained Image Backbones for Semantic Image Synthesis
Tariq Berrada · Jakob Verbeek · Camille Couprie · Karteek Alahari
|
||
HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D
Sangmin Woo · Byeongjun Park · Hyojun Go · Jin-Young Kim · Changick Kim
|
||
Infer from What You Have Seen Before: Temporally-dependent Classifier for Semi-supervised Video Semantic Segmentation
Jiafan Zhuang · Zilei Wang · Yixin Zhang · Zhun Fan
|
||
Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation
Jonas Herzog
|
||
FreeU: Free Lunch in Diffusion U-Net
Chenyang Si · Ziqi Huang · Yuming Jiang · Ziwei Liu
|
||
From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers
Swaminathan Gurumurthy · Karnik Ram · Bingqing Chen · Zachary Manchester · Zico Kolter
|
||
Image Restoration by Denoising Diffusion Models With Iteratively Preconditioned Guidance
Tomer Garber · Tom Tirer
|
||
Mean-Shift Feature Transformer
Takumi Kobayashi
|
||
SFOD: Spiking Fusion Object Detector
Yimeng Fan · Wei Zhang · Changsong Liu · Mingyang Li · Wenrui Lu
|
||
RegionGPT: Towards Region Understanding Vision Language Model
Qiushan Guo · Shalini De Mello · Danny Yin · Wonmin Byeon · Ka Chun Cheung · Yizhou Yu · Ping Luo · Sifei Liu
|
||
Unlocking the Potential of Pre-trained Vision Transformers for Few-Shot Semantic Segmentation through Relationship Descriptors
Ziqin Zhou · Hai-Ming Xu · Yangyang Shu · Lingqiao Liu
|
||
Relational Matching for Weakly Semi-Supervised Oriented Object Detection
Wenhao Wu · Hau San Wong · Si Wu · Tianyou Zhang
|
||
JointSQ: Joint Sparsification-Quantization for Distributed Learning
Weiying Xie · Haowei Li · Ma Jitao · Yunsong Li · Jie Lei · Donglai Liu · Leyuan Fang
|
||
Endow SAM with Keen Eyes: Temporal-spatial Prompt Learning for Video Camouflaged Object Detection
Wenjun Hui · Zhenfeng Zhu · Shuai Zheng · Yao Zhao
|
||
NICE: Neurogenesis Inspired Contextual Encoding for Replay-free Class Incremental Learning
Mustafa B Gurbuz · Jean Moorman · Constantine Dovrolis
|
||
Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences
Axel Barroso-Laguna · Sowmya Munukutla · Victor Adrian Prisacariu · Eric Brachmann
|
||
Learning for Transductive Threshold Calibration in Open-World Recognition
Qin Zhang · Dongsheng An · Tianjun Xiao · Tong He · Qingming Tang · Ying Nian Wu · Joseph Tighe · Yifan Xing
|
||
LightOctree: Lightweight 3D Spatially-Coherent Indoor Lighting Estimation
Xuecan Wang · Shibang Xiao · Xiaohui Liang
|
||
pix2gestalt: Amodal Segmentation by Synthesizing Wholes
Ege Ozguroglu · Ruoshi Liu · Dídac Surís · Dian Chen · Achal Dave · Pavel Tokmakov · Carl Vondrick
|
||
Navigate Beyond Shortcuts: Debiased Learning through the Lens of Neural Collapse
Yining Wang · Junjie Sun · Chenyue Wang · Mi Zhang · Min Yang
|
||
SD4Match: Learning to Prompt Stable Diffusion Model for Semantic Matching
Xinghui Li · Jingyi Lu · Kai Han · Victor Adrian Prisacariu
|
||
3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos
Jiakai Sun · Han Jiao · Guangyuan Li · Zhanjie Zhang · Lei Zhao · Wei Xing
|
||
TextCraftor: Your Text Encoder Can be Image Quality Controller
Yanyu Li · Xian Liu · Anil Kag · Ju Hu · Yerlan Idelbayev · Dhritiman Sagar · Yanzhi Wang · Sergey Tulyakov · Jian Ren
|
||
AAMDM: Accelerated Auto-regressive Motion Diffusion Model
Tianyu Li · Calvin Zhuhan Qiao · Ren Guanqiao · KangKang Yin · Sehoon Ha
|
||
TexOct: Generating Textures of 3D Models with Octree-based Diffusion
Jialun Liu · Chenming Wu · Xinqi Liu · Xing Liu · Jinbo Wu · Haotian Peng · Chen Zhao · Haocheng Feng · Jingtuo Liu · Errui Ding
|
||
OTE: Exploring Accurate Scene Text Recognition Using One Token
Jianjun Xu · Yuxin Wang · Hongtao Xie · Yongdong Zhang
|
||
Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
Biao Gong · Siteng Huang · Yutong Feng · Shiwei Zhang · Yuyuan Li · Yu Liu
|
||
LQMFormer: Language-aware Query Mask Transformer for Referring Image Segmentation
Nisarg Shah · Vibashan VS · Vishal M. Patel
|
||
Latent Modulated Function for Computational Optimal Continuous Image Representation
Zongyao He · Zhi Jin
|
||
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Bingxin Ke · Anton Obukhov · Shengyu Huang · Nando Metzger · Rodrigo Caye Daudt · Konrad Schindler
|
||
LiDAR-based Person Re-identification
Wenxuan Guo · Zhiyu Pan · Yingping Liang · Ziheng Xi · Zhi Chen Zhong · Jianjiang Feng · Jie Zhou
|
||
Shallow-Deep Collaborative Learning for Unsupervised Visible-Infrared Person Re-Identification
Bin Yang · Jun Chen · Mang Ye
|
||
Spherical Mask: Coarse-to-Fine 3D Point Cloud Instance Segmentation with Spherical Representation
Sangyun Shin · Kaichen Zhou · Madhu Vankadari · Andrew Markham · Niki Trigoni
|
||
Neural Spline Fields for Burst Image Fusion and Layer Separation
Ilya Chugunov · David Shustin · Ruyu Yan · Chenyang Lei · Felix Heide
|
||
L2B: Learning to Bootstrap Robust Models for Combating Label Noise
Yuyin Zhou · Xianhang Li · Fengze Liu · Qingyue Wei · Xuxi Chen · Lequan Yu · Cihang Xie · Matthew P. Lungren · Lei Xing
|
||
Deep Video Inverse Tone Mapping Based on Temporal Clues
Yuyao Ye · Ning Zhang · Yang Zhao · Hongbin Cao · Ronggang Wang
|
||
SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis
Ziqiao Peng · Wentao Hu · Yue Shi · Xiangyu Zhu · Xiaomei Zhang · Hao Zhao · Jun He · Hongyan Liu · Zhaoxin Fan
|
||
Attack To Defend: Exploiting Adversarial Attacks for Detecting Poisoned Models
Samar Fares · Karthik Nandakumar
|
||
Non-autoregressive Sequence-to-Sequence Vision-Language Models
Kunyu Shi · Qi Dong · Luis Goncalves · Zhuowen Tu · Stefano Soatto
|
||
Seeing the Unseen: Visual Common Sense for Semantic Placement
Ram Ramrakhya · Aniruddha Kembhavi · Dhruv Batra · Zsolt Kira · Kuo-Hao Zeng · Luca Weihs
|
||
Inverse Rendering of Glossy Objects via the Neural Plenoptic Function and Radiance Fields
Haoyuan Wang · Wenbo Hu · Lei Zhu · Rynson W.H. Lau
|
||
3D LiDAR Mapping in Dynamic Environments using a 4D Implicit Neural Representation
Xingguang Zhong · Yue Pan · Cyrill Stachniss · Jens Behley
|
||
SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
Yuanhui Huang · Wenzhao Zheng · Borui Zhang · Jie Zhou · Jiwen Lu
|
||
SUGAR: Pre-training 3D Visual Representation for Robotics
Shizhe Chen · Ricardo Garcia Pinel · Ivan Laptev · Cordelia Schmid
|
||
GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
Taoran Yi · Jiemin Fang · Junjie Wang · Guanjun Wu · Lingxi Xie · Xiaopeng Zhang · Wenyu Liu · Qi Tian · Xinggang Wang
|
||
Active Generalized Category Discovery
Shijie Ma · Fei Zhu · Zhun Zhong · Xu-Yao Zhang · Cheng-Lin Liu
|
||
CoG-DQA: Chain-of-Guiding Learning with Large Language Models for Diagram Question Answering
Shaowei Wang · Lingling Zhang · Longji Zhu · Tao Qin · Kim-Hui Yap · Xinyu Zhang · Jun Liu
|
||
A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives
Simone Peirone · Francesca Pistilli · Antonio Alliegro · Giuseppe Averta
|
||
Compact 3D Gaussian Representation for Radiance Field
Joo Chan Lee · Daniel Rho · Xiangyu Sun · Jong Hwan Ko · Eunbyung Park
|
||
FutureHuman3D: Forecasting Complex Long-Term 3D Human Behavior from Video Observations
Christian Diller · Thomas Funkhouser · Angela Dai
|
||
FlowIE: Efficient Image Enhancement via Rectified Flow
Yixuan Zhu · Wenliang Zhao · Ao Li · Yansong Tang · Jie Zhou · Jiwen Lu
|
||
Combining Frame and GOP Embeddings for Neural Video Representation
Jens Eirik Saethre · Roberto Azevedo · Christopher Schroers
|
||
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Qidong Huang · Xiaoyi Dong · Pan Zhang · Bin Wang · Conghui He · Jiaqi Wang · Dahua Lin · Weiming Zhang · Nenghai Yu
|
||
TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models
Yushi Huang · Ruihao Gong · Jing Liu · Tianlong Chen · Xianglong Liu
|
||
Not All Classes Stand on Same Embeddings: Calibrating a Semantic Distance with Metric Tensor
Jae Hyeon Park · Gyoomin Lee · Seunggi Park · Sung In Cho
|
||
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
Shengqu Cai · Duygu Ceylan · Matheus Gadelha · Chun-Hao P. Huang · Tuanfeng Y. Wang · Gordon Wetzstein
|
||
Improving Out-of-Distribution Generalization in Graphs via Hierarchical Semantic Environments
Yinhua Piao · Sangseon Lee · Yijingxiu Lu · Sun Kim
|
||
Towards Understanding and Improving Adversarial Robustness of Vision Transformers
Samyak Jain · Tanima Dutta
|
||
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval
Fang Kaipeng · Jingkuan Song · Lianli Gao · Pengpeng Zeng · Zhi-Qi Cheng · Xiyao Li · Heng Tao Shen
|
||
ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting
Yankai Jiang · Zhongzhen Huang · Rongzhao Zhang · Xiaofan Zhang · Shaoting Zhang
|
||
Improved Self-Training for Test-Time Adaptation
Jing Ma
|
||
Structure-Aware Sparse-View X-ray 3D Reconstruction
Yuanhao Cai · Jiahao Wang · Alan L. Yuille · Zongwei Zhou · Angtian Wang
|
||
LangSplat: 3D Language Gaussian Splatting
Minghan Qin · Wanhua Li · Jiawei Zhou · Haoqian Wang · Hanspeter Pfister
|
||
Retrieval-Augmented Embodied Agents
Yichen Zhu · Zhicai Ou · Xiaofeng Mou · Jian Tang
|
||
Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining
Xiang Chen · Jinshan Pan · Jiangxin Dong
|
||
Positive-Unlabeled Learning by Latent Group-Aware Meta Disambiguation
Lin Long · Haobo Wang · Zhijie Jiang · Lei Feng · Chang Yao · Gang Chen · Junbo Zhao
|
||
Contextrast: Contextual Contrastive Learning for Semantic Segmentation
Changki Sung · Wanhee Kim · Jungho An · WooJu Lee · Hyungtae Lim · Hyun Myung
|
||
DiffusionMTL: Learning Multi-Task Denoising Diffusion Model from Partially Annotated Data
Hanrong Ye · Dan Xu
|
||
Text-conditional Attribute Alignment across Latent Spaces for 3D Controllable Face Image Synthesis
FeiFan Xu · Rui Li · Si Wu · Yong Xu · Hau San Wong
|
||
MonoCD: Monocular 3D Object Detection with Complementary Depths
Longfei Yan · Pei Yan · Shengzhou Xiong · Xuanyu Xiang · Yihua Tan
|
||
JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation
Yu Zeng · Vishal M. Patel · Haochen Wang · Xun Huang · Ting-Chun Wang · Ming-Yu Liu · Yogesh Balaji
|
||
A Linear N-Point Solver for Line and Motion Estimation with Event Cameras
Ling Gao · Daniel Gehrig · Hang Su · Davide Scaramuzza · Laurent Kneip
|
||
Training on Synthetic Data Beats Real Data in Multimodal Relation Extraction
Zilin Du · Haoxin Li · Xu Guo · Boyang Li
|
||
4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
Guanjun Wu · Taoran Yi · Jiemin Fang · Lingxi Xie · Xiaopeng Zhang · Wei Wei · Wenyu Liu · Qi Tian · Xinggang Wang
|
||
Differentiable Information Bottleneck for Deterministic Multi-view Clustering
Xiaoqiang Yan · Zhixiang Jin · Fengshou Han · Yangdong Ye
|
||
SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
Antoine Guédon · Vincent Lepetit
|
||
R-Cyclic Diffuser: Reductive and Cyclic Latent Diffusion for 3D Clothed Human Digitalization
Kennard Chan · Fayao Liu · Guosheng Lin · Chuan-Sheng Foo · Weisi Lin
|
||
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Zhang Li · Biao Yang · Qiang Liu · Zhiyin Ma · Shuo Zhang · Jingxu Yang · Yabo Sun · Yuliang Liu · Xiang Bai
|
||
Zero-Reference Low-Light Enhancement via Physical Quadruple Priors
Wenjing Wang · Huan Yang · Jianlong Fu · Jiaying Liu
|
||
Hybrid Proposal Refiner: Revisiting DETR Series from the Faster R-CNN Perspective
Jinjing Zhao · Fangyun Wei · Chang Xu
|
||
DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion
Tom Van Wouwe · Seunghwan Lee · Antoine Falisse · Scott Delp · Karen Liu
|
||
HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion
Jingbo Zhang · Xiaoyu Li · Qi Zhang · Yan-Pei Cao · Ying Shan · Jing Liao
|
||
CurveCloudNet: Processing Point Clouds with 1D Structure
Colton Stearns · Alex Fu · Jiateng Liu · Jeong Joon Park · Davis Rempe · Despoina Paschalidou · Leonidas Guibas
|
||
Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach
Guoqiang Liang · Kanghao Chen · Hangyu Li · Yunfan Lu · Lin Wang
|
||
Learning Visual Prompt for Gait Recognition
Kang Ma · Ying Fu · Chunshui Cao · Saihui Hou · Yongzhen Huang · Dezhi Zheng
|
||
FCS: Feature Calibration and Separation for Non-Exemplar Class Incremental Learning
Qiwei Li · Yuxin Peng · Jiahuan Zhou
|
||
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving
Junhao Zheng · Chenhao Lin · Jiahao Sun · Zhengyu Zhao · Qian Li · Chao Shen
|
||
Discovering and Mitigating Visual Biases through Keyword Explanation
Younghyun Kim · Sangwoo Mo · Minkyu Kim · Kyungmin Lee · Jaeho Lee · Jinwoo Shin
|
||
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang · Kevin Lin · Zhengyuan Yang · Jianfeng Wang · Linjie Li · Chung-Ching Lin · Zicheng Liu · Lijuan Wang
|
||
GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs
Mustafa Munir · William Avery · Md Mostafijur Rahman · Radu Marculescu
|
||
MoML: Online Meta Adaptation for 3D Human Motion Prediction
Xiaoning Sun · Huaijiang Sun · Bin Li · Dong Wei · Weiqing Li · Jianfeng Lu
|
||
Exploring Regional Clues in CLIP for Zero-Shot Semantic Segmentation
Yi Zhang · Meng-Hao Guo · Miao Wang · Shi-Min Hu
|
||
Improving Graph Contrastive Learning via Adaptive Positive Sampling
Jiaming Zhuo · Feiyang Qin · Can Cui · Kun Fu · Bingxin Niu · Mengzhu Wang · Yuanfang Guo · Chuan Wang · Zhen Wang · Xiaochun Cao · Liang Yang
|
||
VILA: On Pre-training for Visual Language Models
Ji Lin · Danny Yin · Wei Ping · Pavlo Molchanov · Mohammad Shoeybi · Song Han
|
||
Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
Chen Zhao · Shuming Liu · Karttikeya Mangalam · Guocheng Qian · Fatimah Zohra · Abdulmohsen Alghannam · Jitendra Malik · Bernard Ghanem
|
||
Vision-and-Language Navigation via Causal Learning
Liuyi Wang · Zongtao He · Ronghao Dang · Mengjiao Shen · Chengju Liu · Qijun Chen
|
||
A noisy elephant in the room: Is your out-of-distribution detector robust to label noise?
Galadrielle Humblot-Renaux · Sergio Escalera · Thomas B. Moeslund
|
||
Learning with Structural Labels for Learning with Noisy Labels
Noo-ri Kim · Jin-Seop Lee · Jee-Hyong Lee
|
||
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
Letian Zhang · Xiaotong Zhai · Zhongkai Zhao · Yongshuo Zong · Xin Wen · Bingchen Zhao
|
||
Bayesian Exploration of Pre-trained Models for Low-shot Image Classification
Yibo Miao · Yu Lei · Feng Zhou · Zhijie Deng
|
||
PLGSLAM: Progressive Neural Scene Representation with Local to Global Bundle Adjustment
Tianchen Deng · Guole Shen · Tong Qin · Jianyu Wang · Wentao Zhao · Jingchuan Wang · Danwei Wang · Weidong Chen
|
||
RecDiffusion: Rectangling for Image Stitching with Diffusion Models
Tianhao Zhou · Li Haipeng · Ziyi Wang · Ao Luo · Chenlin Zhang · Jiajun Li · Bing Zeng · Shuaicheng Liu
|
||
Incremental Nuclei Segmentation from Histopathological Images via Future-class Awareness and Compatibility-inspired Distillation
Huyong Wang · Huisi Wu · Jing Qin
|
||
Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
Jinseok Kim · Tae-Kyun Kim
|
||
HashPoint: Accelerated Point Searching and Sampling for Neural Rendering
Jiahao Ma · Miaomiao Liu · David Ahmedt-Aristizabal · Chuong Nguyen
|
||
Three Pillars improving Vision Foundation Model Distillation for Lidar
Gilles Puy · Spyros Gidaris · Alexandre Boulch · Oriane Siméoni · Corentin Sautier · Patrick Pérez · Andrei Bursuc · Renaud Marlet
|
||
Retraining-free Model Quantization via One-Shot Weight-Coupling Learning
Chen Tang · Yuan Meng · Jiacheng Jiang · Shuzhao Xie · Rongwei Lu · Xinzhu Ma · Zhi Wang · Wenwu Zhu
|
||
Model Inversion Robustness: Can Transfer Learning Help?
Sy-Tuyen Ho · Koh Jun Hao · Keshigeyan Chandrasegaran · Ngoc-Bao Nguyen · Ngai-Man Cheung
|
||
Seamless Human Motion Composition with Blended Positional Encodings
German Barquero · Sergio Escalera · Cristina Palmero
|
||
Single Domain Generalization for Crowd Counting
Zhuoxuan Peng · S.-H. Gary Chan
|
||
SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation
Junyan Ye · Qiyan Luo · Jinhua Yu · Huaping Zhong · Zhimeng Zheng · Conghui He · Weijia Li
|
||
GLaMM: Pixel Grounding Large Multimodal Model
Hanoona Rasheed · Muhammad Maaz · Sahal Shaji Mullappilly · Abdelrahman Shaker · Salman Khan · Hisham Cholakkal · Rao Anwer · Eric P. Xing · Ming-Hsuan Yang · Fahad Shahbaz Khan
|
||
Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners
Keon Hee Park · Kyungwoo Song · Gyeong-Moon Park
|
||
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection
Mingxuan Liu · Tyler Hayes · Elisa Ricci · Gabriela Csurka · Riccardo Volpi
|
||
Learning Large-Factor EM Image Super-Resolution with Generative Priors
Jiateng Shou · Zeyu Xiao · Shiyu Deng · Wei Huang · Shi Peiyao · Ruobing Zhang · Zhiwei Xiong · Feng Wu
|
||
Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval
Yucheng Suo · Fan Ma · Linchao Zhu · Yi Yang
|
||
Functional Diffusion
Biao Zhang · Peter Wonka
|
||
VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams
Liao Wang · Kaixin Yao · Chengcheng Guo · Zhirui Zhang · Qiang Hu · Jingyi Yu · Lan Xu · Minye Wu
|
||
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Nataniel Ruiz · Yuanzhen Li · Varun Jampani · Wei Wei · Tingbo Hou · Yael Pritch · Neal Wadhwa · Michael Rubinstein · Kfir Aberman
|
||
Novel Class Discovery for Ultra-Fine-Grained Visual Categorization
Yu Liu · Yaqi Cai · Qi Jia · Binglin Qiu · Weimin Wang · Nan Pu
|
||
Clustering Propagation for Universal Medical Image Segmentation
Yuhang Ding · Liulei Li · Wenguan Wang · Yi Yang
|
||
Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening
Yule Duan · Xiao Wu · Haoyu Deng · Liang-Jian Deng
|
||
A Versatile Framework for Continual Test-Time Domain Adaptation: Balancing Discriminability and Generalizability
Xu Yang · Xuan Chen · Moqi Li · Kun Wei · Cheng Deng
|
||
Gradient Reweighting: Towards Imbalanced Class-Incremental Learning
Jiangpeng He
|
||
Device-Wise Federated Network Pruning
Shangqian Gao · Junyi Li · Zeyu Zhang · Yanfu Zhang · Weidong Cai · Heng Huang
|
||
D$^4$M: Dataset Distillation via Disentangled Diffusion Model
Duo Su · Junjie Hou · Weizhi Gao · Yingjie Tian · Bowen Tang
|
||
Face2Diffusion for Fast and Editable Face Personalization
Kaede Shiohara · Toshihiko Yamasaki
|
||
Logarithmic Lenses: Exploring Log RGB Data for Image Classification
Bruce Maxwell · Sumegha Singhania · Avnish Patel · Rahul Kumar · Heather Fryling · Sihan Li · Haonan Sun · Ping He · Zewen Li
|
||
Score-Guided Diffusion for 3D Human Recovery
Anastasis Stathopoulos · Ligong Han · Dimitris N. Metaxas
|
||
Draw Step by Step: Reconstructing CAD Construction Sequences from Point Clouds via Multimodal Diffusion
Weijian Ma · Shuaiqi Chen · Yunzhong Lou · Xueyang Li · Xiangdong Zhou
|
||
StreamingFlow: Streaming Occupancy Forecasting with Asynchronous Multi-modal Data Streams via Neural Ordinary Differential Equation
Yining Shi · Kun Jiang · Ke Wang · Jiusi Li · Yunlong Wang · Mengmeng Yang · Diange Yang
|
||
Specularity Factorization for Low Light Enhancement
Saurabh Saini · P. J. Narayanan
|
||
Enhance Image Classification Via Inter-Class Image Mixup With Diffusion Model
Zhicai Wang · Longhui Wei · Tan Wang · Heyu Chen · Yanbin Hao · Xiang Wang · Xiangnan He · Qi Tian
|
||
CroSel: Cross Selection of Confident Pseudo Labels for Partial-Label Learning
Shiyu Tian · Hongxin Wei · Yiqun Wang · Lei Feng
|
||
Just Add $\pi$! Pose Induced Video Transformers for Understanding Activities of Daily Living
Dominick Reilly · Srijan Das
|
||
3DInAction: Understanding Human Actions in 3D Point Clouds
Yizhak Ben-Shabat · Oren Shrout · Stephen Gould
|
||
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou · Chao Yang · Yu Qiao · Chengbin Quan · Youjian Zhao
|
||
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation
Shuting He · Henghui Ding
|
||
Embracing Unimodal Aleatoric Uncertainty for Robust Multimodal Fusion
Zixian Gao · Xun Jiang · Xing Xu · Fumin Shen · Yujie Li · Heng Tao Shen
|
||
DGC-GNN: Leveraging Geometry and Color Cues for Visual Descriptor-Free 2D-3D Matching
Shuzhe Wang · Juho Kannala · Daniel Barath
|
||
Multiplane Prior Guided Few-Shot Aerial Scene Rendering
Zihan Gao · Licheng Jiao · Lingling Li · Xu Liu · Fang Liu · Puhua Chen · Yuwei Guo
|
||
4K4D: Real-Time 4D View Synthesis at 4K Resolution
Zhen Xu · Sida Peng · Haotong Lin · Guangzhao He · Jiaming Sun · Yujun Shen · Hujun Bao · Xiaowei Zhou
|
||
Context-Guided Spatio-Temporal Video Grounding
Xin Gu · Heng Fan · Yan Huang · Tiejian Luo · Libo Zhang
|
||
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis
Xin Zhou · Dingkang Liang · Wei Xu · Xingkui Zhu · Yihan Xu · Zhikang Zou · Xiang Bai
|
||
Reconstruction-free Cascaded Adaptive Compressive Sensing
Chenxi Qiu · Tao Yue · Xuemei Hu
|
||
Progressive Divide-and-Conquer via Subsampling Decomposition for Accelerated MRI
Chong Wang · Lanqing Guo · Yufei Wang · Hao Cheng · Yi Yu · Bihan Wen
|
||
A Unified Approach for Text- and Image-guided 4D Scene Generation
Yufeng Zheng · Xueting Li · Koki Nagano · Sifei Liu · Otmar Hilliges · Shalini De Mello
|
||
Intrinsic Image Diffusion for Indoor Single-view Material Estimation
Peter Kocsis · Vincent Sitzmann · Matthias Nießner
|
||
Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?
Zhengyue Zhao · Jinhao Duan · Kaidi Xu · Chenan Wang · Rui Zhang · Zidong Du · Qi Guo · Xing Hu
|
||
NetTrack: Tracking Highly Dynamic Objects with a Net
Guangze Zheng · Shijie Lin · Haobo Zuo · Changhong Fu · Jia Pan
|
||
CLIPtone: Unsupervised Learning for Text-based Image Tone Adjustment
Hyeongmin Lee · Kyoungkook Kang · Jungseul Ok · Sunghyun Cho
|
||
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Jianyuan Wang · Nikita Karaev · Christian Rupprecht · David Novotny
|
||
CPP-Net: Embracing Multi-Scale Feature Fusion into Deep Unfolding CP-PPA Network for Compressive Sensing
Zhen Guo · Hongping Gan
|
||
Video Recognition in Portrait Mode
Mingfei Han · Linjie Yang · Xiaojie Jin · Jiashi Feng · Xiaojun Chang · Heng Wang
|
||
Versatile Navigation under Partial Observability via Value-Guided Diffusion Policy
Gengyu Zhang · Hao Tang · Yan Yan
|
||
Point, Segment and Count: A Generalized Framework for Object Counting
Zhizhong Huang · Mingliang Dai · Yi Zhang · Junping Zhang · Hongming Shan
|
||
A Generative Approach for Wikipedia-Scale Visual Entity Recognition
Mathilde Caron · Ahmet Iscen · Alireza Fathi · Cordelia Schmid
|
||
Normalizing Flows on the Product Space of SO(3) Manifolds for Probabilistic Human Pose Modeling
Olaf Dünkel · Tim Salzmann · Florian Pfaff
|
||
GEARS: Local Geometry-aware Hand-object Interaction Synthesis
Keyang Zhou · Bharat Lal Bhatnagar · Jan Lenssen · Gerard Pons-Moll
|
||
GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions
Junjie Wang · Jiemin Fang · Xiaopeng Zhang · Lingxi Xie · Qi Tian
|
||
MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers
Yawar Siddiqui · Antonio Alliegro · Alexey Artemov · Tatiana Tommasi · Daniele Sirigatti · Vladislav Rosov · Angela Dai · Matthias Nießner
|
||
3D-Aware Face Editing via Warping-Guided Latent Direction Learning
Yuhao Cheng · Zhuo Chen · Xingyu Ren · Wenhan Zhu · Zhengqin Xu · Di Xu · Yang Changpeng · Yichao Yan
|
||
RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features
Geonho Bang · Kwangjin Choi · Jisong Kim · Dongsuk Kum · Jun Won Choi
|
||
MatFuse: Controllable Material Generation with Diffusion Models
Giuseppe Vecchio · Renato Sortino · Simone Palazzo · Concetto Spampinato
|
||
Global Latent Neural Rendering
Thomas Tanay · Matteo Maggioni
|
||
Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
Junyi Wu · Bin Duan · Weitai Kang · Hao Tang · Yan Yan
|
||
Cache Me if You Can: Accelerating Diffusion Models through Block Caching
Felix Wimbauer · Bichen Wu · Edgar Schoenfeld · Xiaoliang Dai · Ji Hou · Zijian He · Artsiom Sanakoyeu · Peizhao Zhang · Sam Tsai · Jonas Kohler · Christian Rupprecht · Daniel Cremers · Peter Vajda · Jialiang Wang
|
||
It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
Subhadeep Koley · Ayan Kumar Bhunia · Deeptanshu Sekhri · Aneeshan Sain · Pinaki Nath Chowdhury · Tao Xiang · Yi-Zhe Song
|
||
ESR-NeRF: Emissive Source Reconstruction Using LDR Multi-view Images
Jinseo Jeong · Junseo Koo · Qimeng Zhang · Gunhee Kim
|
||
Epistemic Uncertainty Quantification For Pre-trained Neural Networks
Hanjing Wang · Qiang Ji
|
||
OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning
Haiyang Ying · Yixuan Yin · Jinzhi Zhang · Fan Wang · Tao Yu · Ruqi Huang · Lu Fang
|
||
MRFS: Mutually Reinforcing Image Fusion and Segmentation
Hao Zhang · Xuhui Zuo · Jie Jiang · Chunchao Guo · Jiayi Ma
|
||
3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation
Dale Decatur · Itai Lang · Kfir Aberman · Rana Hanocka
|
||
Design2Cloth: 3D Cloth Generation from 2D Masks
Jiali Zheng · Rolandos Alexandros Potamias · Stefanos Zafeiriou
|
||
3D-LFM: Lifting Foundation Model
Mosam Dabhi · László A. Jeni · Simon Lucey
|
||
Localization Is All You Evaluate: Data Leakage in Online Mapping Datasets and How to Fix It
Adam Lilja · Junsheng Fu · Erik Stenborg · Lars Hammarstrand
|
||
Masked AutoDecoder is Effective Multi-Task Vision Generalist
Han Qiu · Jiaxing Huang · Peng Gao · Lewei Lu · Xiaoqin Zhang · Shijian Lu
|
||
UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather
Haimei Zhao · Jing Zhang · Zhuo Chen · Shanshan Zhao · Dacheng Tao
|
||
Dynamic Policy-Driven Adaptive Multi-Instance Learning for Whole Slide Image Classification
Tingting Zheng · Kui Jiang · Hongxun Yao
|
||
PerceptionGPT: Effectively Fusing Visual Perception into LLM
Renjie Pi · Lewei Yao · Jiahui Gao · Jipeng Zhang · Tong Zhang
|
||
Probing the 3D Awareness of Visual Foundation Models
Mohamed El Banani · Amit Raj · Kevis-kokitsi Maninis · Abhishek Kar · Yuanzhen Li · Michael Rubinstein · Deqing Sun · Leonidas Guibas · Justin Johnson · Varun Jampani
|
||
View-Category Interactive Sharing Transformer for Incomplete Multi-View Multi-Label Learning
Shilong Ou · Zhe Xue · Yawen Li · Meiyu Liang · Yuanqiang Cai · Junjiang Wu
|
||
PAD: Patch-Agnostic Defense against Adversarial Patch Attacks
Lihua Jing · Rui Wang · Wenqi Ren · Xin Dong · Cong Zou
|
||
Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization
Guopeng Li · Ming Qian · Gui-Song Xia
|
||
EasyDrag: Efficient Point-based Manipulation on Diffusion Models
Xingzhong Hou · Boxiao Liu · Yi Zhang · Jihao Liu · Yu Liu · Haihang You
|
||
Generating Illustrated Instructions
Sachit Menon · Ishan Misra · Rohit Girdhar
|
||
LASIL: Learner-Aware Supervised Imitation Learning For Long-term Microscopic Traffic Simulation
Ke Guo · Zhenwei Miao · Wei Jing · Weiwei Liu · Weizi Li · Dayang Hao · Jia Pan
|
||
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
Ye Yuan · Xueting Li · Yangyi Huang · Shalini De Mello · Koki Nagano · Jan Kautz · Umar Iqbal
|
||
TexTile: A Differentiable Metric for Texture Tileability
Carlos Rodriguez-Pardo · Dan Casas · Elena Garces · Jorge Lopez-Moreno
|
||
Image Processing GNN: Breaking Rigidity in Super-Resolution
Yuchuan Tian · Hanting Chen · Chao Xu · Yunhe Wang
|
||
RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
Jihan Yang · Runyu Ding · Weipeng Deng · Zhe Wang · Xiaojuan Qi
|
||
LDP: Language-driven Dual-Pixel Image Defocus Deblurring Network
Hao Yang · Liyuan Pan · Yan Yang · Richard Hartley · Miaomiao Liu
|
||
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva · Fadime Sener · Edoardo Remelli · Bugra Tekin · Eric Sauser · Bernt Schiele · Shugao Ma
|
||
Morphable Diffusion: 3D-Consistent Diffusion for Single-image Avatar Creation
Xiyi Chen · Marko Mihajlovic · Shaofei Wang · Sergey Prokudin · Siyu Tang
|
||
MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark
Sanghyun Woo · Kwanyong Park · Inkyu Shin · Myungchul Kim · In So Kweon
|
||
LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content
Qihao Zhao · Yalun Dai · Hao Li · Wei Hu · Fan Zhang · Jun Liu
|
||
Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer
Hyeongjin Nam · Daniel Jung · Gyeongsik Moon · Kyoung Mu Lee
|
||
Learned Scanpaths Aid Blind Panoramic Video Quality Assessment
Kanglong Fan · Wen Wen · Mu Li · Yifan Peng · Kede Ma
|
||
S$^2$MVTC: a Simple yet Efficient Scalable Multi-View Tensor Clustering
Zhen Long · Qiyuan Wang · Yazhou Ren · Yipeng Liu · Ce Zhu
|
||
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
Min Yang · gaohuan · Ping Guo · Limin Wang
|
||
MAFA: Managing False Negatives for Vision-Language Pre-training
Jaeseok Byun · Dohoon Kim · Taesup Moon
|
||
Efficiently Assemble Normalization Layers and Regularization for Federated Domain Generalization
Khiem Le · Tuan Long Ho · Cuong Do · Danh Le-Phuoc · Kok Seng Wong
|
||
Unsupervised Gaze Representation Learning from Multi-view Face Images
Yiwei Bao · Feng Lu
|
||
PEEKABOO: Interactive Video Generation via Masked-Diffusion
Yash Jain · Anshul Nasery · Vibhav Vineet · Harkirat Behl
|
||
Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering
Zhaohe Liao · Jiangtong Li · Li Niu · Liqing Zhang
|
||
MAS: Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion
Roy Kapon · Guy Tevet · Daniel Cohen-Or · Amit H. Bermano
|
||
Reg-PTQ: Regression-specialized Post-training Quantization for Fully Quantized Object Detector
Yifu Ding · Weilun Feng · Chuyan Chen · Jinyang Guo · Xianglong Liu
|
||
From Coarse to Fine-Grained Open-Set Recognition
Nico Lang · Vésteinn Snæbjarnarson · Elijah Cole · Oisin Mac Aodha · Christian Igel · Serge Belongie
|
||
DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer
Wei-Ting Chen · Gurunandan Krishnan · Qiang Gao · Sy-Yen Kuo · Sizhuo Ma · Jian Wang
|
||
Discriminative Pattern Calibration Mechanism for Source-Free Domain Adaptation
Haifeng Xia · Siyu Xia · Zhengming Ding
|
||
RAM-Avatar: Real-time Photo-Realistic Avatar from Monocular Videos with Full-body Control
Xiang Deng · Zerong Zheng · Yuxiang Zhang · Jingxiang Sun · Chao Xu · Xiaodong Yang · Lizhen Wang · Yebin Liu
|
||
Towards Generalizable Multi-Object Tracking
Zheng Qin · Le Wang · Sanping Zhou · Panpan Fu · Gang Hua · Wei Tang
|
||
EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation
Md Mostafijur Rahman · Mustafa Munir · Radu Marculescu
|
||
Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction
Ziyi Yang · Xinyu Gao · Wen Zhou · Shaohui Jiao · Yuqing Zhang · Xiaogang Jin
|
||
TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation
Sai Kumar Dwivedi · Yu Sun · Priyanka Patel · Yao Feng · Michael J. Black
|
||
Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models
Bin Fu · Fanghua Yu · Anran Liu · Zixuan Wang · Jie Wen · Junjun He · Yu Qiao
|
||
A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals
Jiangnan Tang · Jingya Wang · Kaiyang Ji · Lan Xu · Jingyi Yu · Ye Shi
|
||
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges
Tongtong Yuan · Xuange Zhang · Kun Liu · Bo Liu · Chen Chen · Jian Jin · Zhenzhen Jiao
|
||
How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?
Yuxin Chen · Zongyang Ma · Ziqi Zhang · Zhongang Qi · Chunfeng Yuan · Bing Li · Junfu Pu · Ying Shan · Xiaojuan Qi · Weiming Hu
|
||
Locally Adaptive Neural 3D Morphable Models
Michail Tarasiou · Rolandos Alexandros Potamias · Eimear O'Sullivan · Stylianos Ploumpis · Stefanos Zafeiriou
|
||
Revisiting Adversarial Training at Scale
Zeyu Wang · Xianhang Li · Hongru Zhu · Cihang Xie
|
||
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing
Zijin Yin · Kongming Liang · Bing Li · Zhanyu Ma · Jun Guo
|
||
MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning
Mohamed Abdelfattah · Mariam Hassan · Alex Alahi
|
||
Logit Standardization in Knowledge Distillation
Shangquan Sun · Wenqi Ren · Jingzhi Li · Rui Wang · Xiaochun Cao
|
||
HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video
Zicong Fan · Maria Parelli · Maria Kadoglou · Xu Chen · Muhammed Kocabas · Michael J. Black · Otmar Hilliges
|
||
Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity
Yuhang Chen · Wenke Huang · Mang Ye
|
||
Visual In-Context Prompting
Feng Li · Qing Jiang · Hao Zhang · Shilong Liu · Huaizhe Xu · Xueyan Zou · Tianhe Ren · Hongyang Li · Lei Zhang · Chunyuan Li · Jianwei Yang · Jianfeng Gao
|
||
Overload: Latency Attacks on Object Detection for Edge Devices
Erh-Chung Chen · Pin-Yu Chen · I-Hsin Chung · Che-Rung Lee
|
||
Dual DETRs for Multi-Label Temporal Action Detection
Yuhan Zhu · Guozhen Zhang · Jing Tan · Gangshan Wu · Limin Wang
|
||
UFC-Net: Unrolling Fixed-point Continuous Network for Deep Compressive Sensing
Xiaoyang Wang · Hongping Gan
|
||
Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
Haoyi Jiang · Tianheng Cheng · Naiyu Gao · Haoyang Zhang · Tianwei Lin · Wenyu Liu · Xinggang Wang
|
||
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu · Chenlin Zhang · Chen Zhao · Bernard Ghanem
|
||
AlignSAM: Aligning Segment Anything Model to Open Context via Reinforcement Learning
Duojun Huang · Xinyu Xiong · Jie Ma · Jichang Li · Zequn Jie · Lin Ma · Guanbin Li
|
||
Adaptive VIO: Deep Visual-Inertial Odometry with Online Continual Learning
Youqi Pan · Wugen Zhou · Yingdian Cao · Hongbin Zha
|
||
Tri-Modal Motion Retrieval by Learning a Joint Embedding Space
Kangning Yin · Shihao Zou · Yuxuan Ge · Zheng Tian
|
||
Continual Segmentation with Disentangled Objectness Learning and Class Recognition
Yizheng Gong · Siyue Yu · Xiaoyang Wang · Jimin Xiao
|
||
Supervised Anomaly Detection for Complex Industrial Images
Aimira Baitieva · David Hurych · Victor Besnier · Olivier Bernard
|
||
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing
Denis Bobkov · Vadim Titov · Aibek Alanov · Dmitry Vetrov
|
||
Interactive Continual Learning: Fast and Slow Thinking
Biqing Qi · Xinquan Chen · Junqi Gao · Dong Li · Jianxing Liu · Ligang Wu · Bowen Zhou
|
||
Weakly Misalignment-free Adaptive Feature Alignment for UAVs-based Multimodal Object Detection
Chen Chen · Jiahao Qi · Xingyue Liu · Kangcheng Bin · Ruigang Fu · Xikun Hu · Ping Zhong
|
||
Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion
Lucas Nunes · Rodrigo Marcuzzi · Benedikt Mersch · Jens Behley · Cyrill Stachniss
|
||
Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization
Deng Li · Aming Wu · Yaowei Wang · Yahong Han
|
||
In-Context Matting
He Guo · Zixuan Ye · Zhiguo Cao · Hao Lu
|
||
Vlogger: Make Your Dream A Vlog
Shaobin Zhuang · Kunchang Li · Xinyuan Chen · Yaohui Wang · Ziwei Liu · Yu Qiao · Yali Wang
|
||
EscherNet: A Generative Model for Scalable View Synthesis
Xin Kong · Shikun Liu · Xiaoyang Lyu · Marwan Taher · Xiaojuan Qi · Andrew J. Davison
|
||
FlowTrack: Revisiting Optical Flow for Long-Range Dense Tracking
Seokju Cho · Gabriel Huang · Seungryong Kim · Joon-Young Lee
|
||
MVCPS-NeuS: Multi-view Constrained Photometric Stereo for Neural Surface Reconstruction
Hiroaki Santo · Fumio Okura · Yasuyuki Matsushita
|
||
LLaFS: When Large Language Models Meet Few-Shot Segmentation
Lanyun Zhu · Tianrun Chen · Deyi Ji · Jieping Ye · Jun Liu
|
||
Towards Memorization-Free Diffusion Models
Chen Chen · Daochang Liu · Chang Xu
|
||
RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
Ming Yan · Yan Zhang · Shuqiang Cai · Shuqi Fan · Xincheng Lin · Yudi Dai · Siqi Shen · Chenglu Wen · Lan Xu · Yuexin Ma · Cheng Wang
|
||
Guided Slot Attention for Unsupervised Video Object Segmentation
Minhyeok Lee · Suhwan Cho · Dogyoon Lee · Chaewon Park · Jungho Lee · Sangyoun Lee
|
||
Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields
Zhiyuan Min · Yawei Luo · Wei Yang · Yuesong Wang · Yi Yang
|
||
Unified Entropy Optimization for Open-Set Test-Time Adaptation
Zhengqing Gao · Xu-Yao Zhang · Cheng-Lin Liu
|
||
SEED-Bench: Benchmarking Multimodal Large Language Models
Bohao Li · Yuying Ge · Yixiao Ge · Guangzhi Wang · Rui Wang · Ruimao Zhang · Ying Shan
|
||
LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels
Tuo Feng · Wenguan Wang · Fan Ma · Yi Yang
|
||
Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model
Zelin Peng · Zhengqin Xu · Zhilin Zeng · Lingxi Xie · Qi Tian · Wei Shen
|
||
MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction
Xiaolu Liu · Song Wang · Wentong Li · Ruizi Yang · Junbo Chen · Jianke Zhu
|
||
ViT-Lens: Towards Omni-modal Representations
Stan Weixian Lei · Yixiao Ge · Kun Yi · Jianfeng Zhang · Difei Gao · Dylan Sun · Yuying Ge · Ying Shan · Mike Zheng Shou
|
||
Rewrite the Stars
Xu Ma · Xiyang Dai · Yue Bai · Yizhou Wang · Yun Fu
|
||
MultiPhys: Multi-Person Physics-aware 3D Motion Estimation
Nicolás Ugrinovic · Boxiao Pan · Georgios Pavlakos · Despoina Paschalidou · Bokui Shen · Jordi Sanchez-Riera · Francesc Moreno-Noguer · Leonidas Guibas
|
||
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
Hao Shao · Yuxuan Hu · Letian Wang · Guanglu Song · Steven L. Waslander · Yu Liu · Hongsheng Li
|
||
A-Teacher: Asymmetric Network for 3D Semi-Supervised Object Detection
Hanshi Wang · Zhipeng Zhang · Jin Gao · Weiming Hu
|
||
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao · Haiping Wu · Weijian Xu · Xiyang Dai · Houdong Hu · Yumao Lu · Michael Zeng · Ce Liu · Lu Yuan
|
||
Adversarial Score Distillation: When score distillation meets GAN
Min Wei · Jingkai Zhou · Junyao Sun · Xuesong Zhang
|
||
BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection
Zhenxin Li · Shiyi Lan · Jose M. Alvarez · Zuxuan Wu
|
||
HumanNeRF-SE: A Simple yet Effective Approach to Animate HumanNeRF with Diverse Poses
Caoyuan Ma · Yu-Lun Liu · Zhixiang Wang · Wu Liu · Xinchen Liu · Zheng Wang
|
||
DMR: Decomposed Multi-Modality Representations for Frames and Events Fusion in Visual Reinforcement Learning
Haoran Xu · Peixi Peng · Guang Tan · Yuan Li · Xinhai Xu · Yonghong Tian
|
||
Communication-Efficient Collaborative Perception via Information Filling with Codebook
Yue Hu · Juntong Peng · Sifei Liu · Junhao Ge · Si Liu · Siheng Chen
|
||
EventDance: Unsupervised Cross-modal Source-free Adaptation for Event-based Object Recognition
Xu Zheng · Lin Wang
|
||
Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion
Xunpeng Yi · Han Xu · Hao Zhang · Linfeng Tang · Jiayi Ma
|
||
Semantics-aware Motion Retargeting with Vision-Language Models
Haodong Zhang · ZhiKe Chen · Haocheng Xu · Lei Hao · Xiaofei Wu · Songcen Xu · Zhensong Zhang · Yue Wang · Rong Xiong
|
||
LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model
Chenjie Cao · Yunuo Cai · Qiaole Dong · Yikai Wang · Yanwei Fu
|
||
MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints
Pengfei Xie · Wenqiang Xu · Tutian Tang · Zhenjun Yu · Cewu Lu
|
||
PanoPose: Self-supervised Relative Pose Estimation for Panoramic Images
Diantao Tu · Hainan Cui · Xianwei Zheng · Shuhan Shen
|
||
Enhancing Post-training Quantization Calibration through Contrastive Learning
Yuzhang Shang · Gaowen Liu · Ramana Kompella · Yan Yan
|
||
DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
Jiahe Li · Jiawei Zhang · Xiao Bai · Jin Zheng · Xin Ning · Jun Zhou · Lin Gu
|
||
DiffHuman: Probabilistic Photorealistic 3D Reconstruction of Humans
Akash Sengupta · Thiemo Alldieck · Nikos Kolotouros · Enric Corona · Andrei Zanfir · Cristian Sminchisescu
|
||
Mask4Align: Aligned Entity Prompting with Color Masks for Multi-Entity Localization Problem
Haoquan Zhang · Ronggang Huang · Yi Xie · Huaidong Zhang
|
||
Global and Local Prompts Cooperation via Optimal Transport for Federated Learning
Hongxia Li · Wei Huang · Jingya Wang · Ye Shi
|
||
Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
Jiequan Cui · Beier Zhu · Xin Wen · Xiaojuan Qi · Bei Yu · Hanwang Zhang
|
||
Dense Optical Tracking: Connecting the Dots
Guillaume Le Moing · Jean Ponce · Cordelia Schmid
|
||
Multi-agent Collaborative Perception via Motion-aware Robust Communication Network
Shixin Hong · Yu Liu · Zhi Li · Shaohui Li · You He
|
||
Fourier Priors-Guided Diffusion for Zero-Shot Joint Low-Light Enhancement and Deblurring
Xiaoqian Lv · Shengping Zhang · Chenyang Wang · Yichen Zheng · Bineng Zhong · Chongyi Li · Liqiang Nie
|
||
Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training
Qian Li · Yuxiao Hu · Yinpeng Dong · Dongxiao Zhang · Yuntian Chen
|
||
ColorPCR: Color Point Cloud Registration with Multi-Stage Geometric-Color Fusion
Juncheng Mu · Lin Bie · Shaoyi Du · Yue Gao
|
||
Person-in-WiFi 3D: End-to-End Multi-Person 3D Pose Estimation with Wi-Fi
Kangwei Yan · Fei Wang · Bo Qian · Han Ding · Jinsong Han · Xing Wei
|
||
FREE: Faster and Better Data-Free Meta-Learning
Yongxian Wei · Zixuan Hu · Zhenyi Wang · Li Shen · Chun Yuan · Dacheng Tao
|
||
Open Vocabulary Semantic Scene Sketch Understanding
Ahmed Bourouis · Judith Fan · Yulia Gryaditskaya
|
||
Unsupervised Feature Learning with Emergent Data-Driven Prototypicality
Yunhui Guo · Youren Zhang · Yubei Chen · Stella X. Yu
|
||
Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning
Haoyu Chen · Wenbo Li · Jinjin Gu · Jingjing Ren · Haoze Sun · Xueyi Zou · Youliang Yan · Zhensong Zhang · Lei Zhu
|
||
Distilling ODE Solvers of Diffusion Models into Smaller Steps
Sanghwan Kim · Hao Tang · Fisher Yu
|
||
3DiffTection: 3D Object Detection with Geometry-aware Diffusion Features
Chenfeng Xu · Huan Ling · Sanja Fidler · Or Litany
|
||
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
Ivan Skorokhodov · Willi Menapace · Aliaksandr Siarohin · Sergey Tulyakov
|
||
XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies
Xuanchi Ren · Jiahui Huang · Xiaohui Zeng · Ken Museth · Sanja Fidler · Francis Williams
|
||
Probabilistic Human Mesh Estimation with Hypothesis Scoring
Yuan Xu · Xiaoxuan Ma · Jiajun Su · Wentao Zhu · Yu Qiao · Yizhou Wang
|
||
Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models
Shitian Zhao · Zhuowan Li · Yadong Lu · Alan L. Yuille · Yan Wang
|
||
GRAM: Global Reasoning for Multi-Page VQA
Itshak Blau · Sharon Fogel · Roi Ronen · Alona Golts · Shahar Tsiper · Elad Ben Avraham · Aviad Aberdam · Roy Ganz · Ron Litman
|
||
Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning
Wenlong Deng · Christos Thrampoulidis · Xiaoxiao Li
|
||
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
Qifan Yu · Juncheng Li · Longhui Wei · Liang Pang · Wentao Ye · Bosheng Qin · Siliang Tang · Qi Tian · Yueting Zhuang
|
||
VMINer: Versatile Multi-view Inverse Rendering with Near- and Far-field Light Sources
Fan Fei · Jiajun Tang · Ping Tan · Boxin Shi
|
||
On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?
Maxime Zanella · Ismail Ben Ayed
|
||
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Pengchong Qiao · Lei Shang · Chang Liu · Baigui Sun · Xiangyang Ji · Jie Chen
|
||
SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks
Xinyu Shi · Zecheng Hao · Zhaofei Yu
|
||
SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction
Pin Tang · Zhongdao Wang · Guoqing Wang · Jilai Zheng · Xiangxuan Ren · Bailan Feng · Chao Ma
|
||
Towards High-fidelity Artistic Image Vectorization via Texture-Encapsulated Shape Parameterization
Ye Chen · Bingbing Ni · Jinfan Liu · Xiaoyang Huang · Xuanhong Chen
|
||
OmniSDF: Scene Reconstruction using Omnidirectional Signed Distance Functions and Adaptive Binoctrees
Hakyeong Kim · Andreas Meuleman · Hyeonjoong Jang · James Tompkin · Min H. Kim
|
||
Extreme Point Supervised Instance Segmentation
Hyeonjun Lee · Sehyun Hwang · Suha Kwak
|
||
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
Xu He · Qiaochu Huang · Zhensong Zhang · Zhiwei Lin · Zhiyong Wu · Sicheng Yang · Minglei Li · Zhiyi Chen · Songcen Xu · Xiaofei Wu
|
||
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
Yunhan Yang · Yukun Huang · Xiaoyang Wu · Yuan-Chen Guo · Song-Hai Zhang · Hengshuang Zhao · Tong He · Xihui Liu
|
||
Degree-of-Freedom Matters: Inferring Dynamics from Point Trajectories
Yan Zhang · Sergey Prokudin · Marko Mihajlovic · Qianli Ma · Siyu Tang
|
||
ActiveDC: Distribution Calibration for Active Finetuning
Wenshuai Xu · Zhenghui Hu · Yu Lu · Jinzhou Meng · Qingjie Liu · Yunhong Wang
|
||
KVQ: Kwai Video Quality Assessment for Short-form Videos
Yiting Lu · Xin Li · Yajing Pei · Kun Yuan · Qizhi Xie · Yunpeng Qu · Ming Sun · Chao Zhou · Zhibo Chen
|
||
Bidirectional Autoregressive Diffusion Model for Dance Generation
Canyu Zhang · Youbao Tang · Ning Zhang · Ruei-Sung Lin · Mei Han · Jing Xiao · Song Wang
|
||
CoSeR: Bridging Image and Language for Cognitive Super-Resolution
Haoze Sun · Wenbo Li · Jianzhuang Liu · Haoyu Chen · Renjing Pei · Xueyi Zou · Youliang Yan · Yujiu Yang
|
||
You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval
Subhadeep Koley · Ayan Kumar Bhunia · Aneeshan Sain · Pinaki Nath Chowdhury · Tao Xiang · Yi-Zhe Song
|
||
NeRF Analogies - Example-Based Visual Attribute Transfer for NeRFs
Michael Fischer · Zhengqin Li · Thu Nguyen-Phuoc · Aljaž Božič · Zhao Dong · Carl Marshall · Tobias Ritschel
|
||
InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning
Yan-Shuo Liang · Wu-Jun Li
|
||
Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering
Zhiwen Yan · Weng Fei Low · Yu Chen · Gim Hee Lee
|
||
Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer
Jiwoo Chung · Sangeek Hyun · Jae-Pil Heo
|
||
Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model
Dian Zheng · Xiao-Ming Wu · Shuzhou Yang · Jian Zhang · Jian-Fang Hu · Wei-Shi Zheng
|
||
Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning
Xinshun Wang · Zhongbin Fang · Xia Li · Xiangtai Li · Chen Chen · Mengyuan Liu
|
||
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
Arun Reddy · William Paul · Corban Rivera · Ketul Shah · Celso M. de Melo · Rama Chellappa
|
||
Fast Adaptation for Human Pose Estimation via Meta-Optimization
Shengxiang Hu · Huaijiang Sun · Bin Li · Dong Wei · Weiqing Li · Jianfeng Lu
|
||
"Previously on ..." From Recaps to Story Summarization
Aditya Kumar Singh · Dhruv Srivastava · Makarand Tapaswi
|
||
Generating Non-Stationary Textures using Self-Rectification
Yang Zhou · Rongjun Xiao · Dani Lischinski · Daniel Cohen-Or · Hui Huang
|
||
SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection
Junsu Kim · Hoseong Cho · Jihyeon Kim · Yihalem Tiruneh · Seungryul Baek
|
||
Frozen Feature Augmentation for Few-Shot Image Classification
Andreas Bär · Neil Houlsby · Mostafa Dehghani · Manoj Kumar
|
||
1-Lipschitz Layers Compared: Memory, Speed, and Certifiable Robustness
Bernd Prach · Fabio Brau · Giorgio Buttazzo · Christoph Lampert
|
||
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
Hyeonho Jeong · Geon Yeong Park · Jong Chul Ye
|
||
Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection
Jiawen Zhu · Choubo Ding · Yu Tian · Guansong Pang
|
||
L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream
Jingtao Sun · Yaonan Wang · Mingtao Feng · Yulan Guo · Ajmal Mian · Mike Zheng Shou
|
||
BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation
Qihang Zhang · Yinghao Xu · Yujun Shen · Bo Dai · Bolei Zhou · Ceyuan Yang
|
||
GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces
Yingwenqi Jiang · Jiadong Tu · Yuan Liu · Xifeng Gao · Xiaoxiao Long · Wenping Wang · Yuexin Ma
|
||
Scaling Laws for Data Filtering: Data Curation cannot be Compute Agnostic
Sachin Goyal · Pratyush Maini · Zachary Lipton · Aditi Raghunathan · Zico Kolter
|
||
PoNQ: a Neural QEM-based Mesh Representation
Nissim Maruani · Maks Ovsjanikov · Pierre Alliez · Mathieu Desbrun
|
||
Representing Signs as Language: A New Method for Sign Language Translation from Videos
Jia Gong · Lin Geng Foo · Yixuan He · Hossein Rahmani · Jun Liu
|
||
HIPTrack: Visual Tracking with Historical Prompts
Wenrui Cai · Qingjie Liu · Yunhong Wang
|
||
CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective
Shunsuke Yasuki · Masato Taki
|
||
Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation
Siteng Huang · Biao Gong · Yutong Feng · Xi Chen · Yuqian Fu · Yu Liu · Donglin Wang
|
||
Regularized Parameter Uncertainty for Improving Generalization in Reinforcement Learning
Pehuen Moure · Longbiao Cheng · Joachim Ott · Zuowen Wang · Shih-Chii Liu
|
||
Robust Noisy Correspondence Learning with Equivariant Similarity Consistency
Yuchen Yang · Erkun Yang · Likai Wang · Cheng Deng
|
||
PanoRecon: Real-Time Panoptic 3D Reconstruction from Monocular Video
Dong Wu · Zike Yan · Hongbin Zha
|
||
Boosting Flow-based Generative Super-Resolution Models via Learned Prior
Li-Yuan Tsao · Yi-Chen Lo · Chia-Che Chang · Hao-Wei Chen · Roy Tseng · Chien Feng · Chun-Yi Lee
|
||
Situational Awareness Matters in 3D Vision Language Reasoning
Yunze Man · Liang-Yan Gui · Yu-Xiong Wang
|
||
Directed Decentralized Collaboration for Personalized Federated Learning
Yingqi Liu · Yifan Shi · Qinglun Li · Baoyuan Wu · Xueqian Wang · Li Shen
|
||
Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image
Yiqun Mei · Yu Zeng · He Zhang · Zhixin Shu · Xuaner Zhang · Sai Bi · Jianming Zhang · HyunJoon Jung · Vishal M. Patel
|
||
Learning to Rank Patches for Unbiased Image Redundancy Reduction
Yang Luo · Zhineng Chen · Peng Zhou · Zuxuan Wu · Xieping Gao · Yu-Gang Jiang
|
||
Task-Driven Wavelets using Constrained Empirical Risk Minimization
Eric Marcus · Ray Sheombarsing · Jan-Jakob Sonke · Jonas Teuwen
|
||
Molecular Data Programming: Towards Molecule Pseudo-labeling with Systematic Weak Supervision
Xin Juan · Kaixiong Zhou · Ninghao Liu · Tianlong Chen · Xin Wang
|
||
AHIVE: Anatomy-aware Hierarchical Vision Encoding for Interactive Radiology Report Retrieval
Sixing Yan · William K. Cheung · Ivor Tsang · Wan Hang Keith Chiu · Tong Terence · Ka Chun Cheung · Simon See
|
||
Text-to-3D using Gaussian Splatting
Zilong Chen · Feng Wang · Yikai Wang · Huaping Liu
|
||
Probing Synergistic High-Order Interaction in Infrared and Visible Image Fusion
Naishan Zheng · Man Zhou · Jie Huang · Junming Hou · Haoying Li · Yuan Xu · Feng Zhao
|
||
InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion
Jihyun Lee · Shunsuke Saito · Giljoo Nam · Minhyuk Sung · Tae-Kyun Kim
|
||
Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan · Kaifeng Chen · Dilip Krishnan · Dina Katabi · Phillip Isola · Yonglong Tian
|
||
Egocentric Full Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement
Jian Wang · Zhe Cao · Diogo Luvizon · Lingjie Liu · Kripasindhu Sarkar · Danhang Tang · Thabo Beeler · Christian Theobalt
|
||
MMA: Multi-Modal Adapter for Vision-Language Models
Lingxiao Yang · Ru-Yuan Zhang · Yanchen Wang · Xiaohua Xie
|
||
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment
Zheren Fu · Lei Zhang · Hou Xia · Zhendong Mao
|
||
Blind Image Quality Assessment Based on Geometric Order Learning
Nyeong-Ho Shin · Seon-Ho Lee · Chang-Su Kim
|
||
Unsupervised Deep Unrolling Networks for Phase Unwrapping
Zhile Chen · Yuhui Quan · Hui Ji
|
||
Would Deep Generative Models Amplify Bias in Future Models?
Tianwei Chen · Yusuke Hirota · Mayu Otani · Noa Garcia · Yuta Nakashima
|
||
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Kiana Ehsani · Tanmay Gupta · Rose Hendrix · Jordi Salvador · Luca Weihs · Kuo-Hao Zeng · Kunal Pratap Singh · Yejin Kim · Winson Han · Alvaro Herrasti · Ranjay Krishna · Dustin Schwenk · Eli VanderBilt · Aniruddha Kembhavi
|
||
What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation
Yihua Cheng · Yaning Zhu · Zongji Wang · Hongquan Hao · Liu Wei · Shiqing Cheng · Xi Wang · Hyung Jin Chang
|
||
HUGS: Human Gaussian Splatting
Muhammed Kocabas · Jen-Hao Rick Chang · James Gabriel · Oncel Tuzel · Anurag Ranjan
|
||
GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis
Shunyuan Zheng · Boyao Zhou · Ruizhi Shao · Boning Liu · Shengping Zhang · Liqiang Nie · Yebin Liu
|
||
Commonsense Prototype for Outdoor Unsupervised 3D Object Detection
Hai Wu · Shijia Zhao · Xun Huang · Chenglu Wen · Xin Li · Cheng Wang
|
||
Rapid Motor Adaptation for Robotic Manipulator Arms
Yichao Liang · Kevin Ellis · João F. Henriques
|
||
DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction
Junwen Xiong · Peng Zhang · Tao You · Chuanyue Li · Wei Huang · Yufei Zha
|
||
TurboSL: Dense, Accurate and Fast 3D by Neural Inverse Structured Light
Parsa Mirdehghan · Maxx Wu · Wenzheng Chen · David B. Lindell · Kiriakos Kutulakos
|
||
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Haokun Lin · Haoli Bai · Zhili Liu · Lu Hou · Muyi Sun · Linqi Song · Ying Wei · Zhenan Sun
|
||
Diffeomorphic Template Registration for Atmospheric Turbulence Mitigation
Dong Lao · Congli Wang · Alex Wong · Stefano Soatto
|
||
Adapting to Length Shift: FlexiLength Network for Trajectory Prediction
Yi Xu · Yun Fu
|
||
Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis
Zanlin Ni · Yulin Wang · Renping Zhou · Jiayi Guo · Jinyi Hu · Zhiyuan Liu · Shiji Song · Yuan Yao · Gao Huang
|
||
Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices
Huancheng Chen · Haris Vikalo
|
||
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
Dewei Zhou · You Li · Fan Ma · Xiaoting Zhang · Yi Yang
|
||
CausalPC: Improving the Robustness of Point Cloud Classification by Causal Effect Identification
Yuanmin Huang · Mi Zhang · Daizong Ding · Erling Jiang · Zhaoxiang Wang · Min Yang
|
||
DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation
Chenyang Wang · Zerong Zheng · Tao Yu · Xiaoqian Lv · Bineng Zhong · Shengping Zhang · Liqiang Nie
|
||
LiSA: LiDAR Localization with Semantic Awareness
Bochun Yang · Zijun Li · Wen Li · Zhipeng Cai · Chenglu Wen · Yu Zang · Matthias Mueller · Cheng Wang
|
||
Unknown Prompt, the only Lacuna: Unveiling CLIP's Potential for Open Domain Generalization
Mainak Singha · Ankit Jha · Shirsha Bose · Ashwin Nair · Moloud Abdar · Biplab Banerjee
|
||
Tumor Micro-environment Interactions Guided Graph Learning for Survival Analysis of Human Cancers from Whole-slide Pathological Images
Wei Shao · YangYang Shi · Daoqiang Zhang · Junjie Zhou · Peng Wan
|
||
Diffusion-based Blind Text Image Super-Resolution
Yuzhe Zhang · Jiawei Zhang · Hao Li · Zhouxia Wang · Luwei Hou · Dongqing Zou · Liheng Bian
|
||
Learning Coupled Dictionaries from Unpaired Data for Image Super-Resolution
Longguang Wang · Juncheng Li · Yingqian Wang · Qingyong Hu · Yulan Guo
|
||
FlowDiffuser: Advancing Optical Flow Estimation with Diffusion Models
Ao Luo · Xin Li · Fan Yang · Jiangyu Liu · Haoqiang Fan · Shuaicheng Liu
|
||
Rethinking Human Motion Prediction with Symplectic Integral
Haipeng Chen · Kedi Lyu · Zhenguang Liu · Yifang Yin · Xun Yang · Yingda Lyu
|
||
Holodeck: Language Guided Generation of 3D Embodied AI Environments
Yue Yang · Fan-Yun Sun · Luca Weihs · Eli VanderBilt · Alvaro Herrasti · Winson Han · Jiajun Wu · Nick Haber · Ranjay Krishna · Lingjie Liu · Chris Callison-Burch · Mark Yatskar · Aniruddha Kembhavi · Christopher Clark
|
||
Unleashing Network Potentials for Semantic Scene Completion
Fengyun Wang · Qianru Sun · Dong Zhang · Jinhui Tang
|
||
Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
Xingyu Zhou · Leheng Zhang · Xiaorui Zhao · Keze Wang · Leida Li · Shuhang Gu
|
||
Fully Geometric Panoramic Localization
Junho Kim · Jiwon Jeong · Young Min Kim
|
||
BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image
Minje Kim · Tae-Kyun Kim
|
||
Towards Robust 3D Pose Transfer with Adversarial Learning
Haoyu Chen · Hao Tang · Ehsan Adeli · Guoying Zhao
|
||
Building Vision-Language Models on Solid Foundations with Masked Distillation
Sepehr Sameni · Kushal Kafle · Hao Tan · Simon Jenni
|
||
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
Jeongsoo Choi · Se Jin Park · Minsu Kim · Yong Man Ro
|
||
CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong · Weihan Wang · Qingsong Lv · Jiazheng Xu · Wenmeng Yu · Junhui Ji · Yan Wang · Zihan Wang · Yuxiao Dong · Ming Ding · Jie Tang
|
||
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang · Bingyi Kang · Zilong Huang · Xiaogang Xu · Jiashi Feng · Hengshuang Zhao
|
||
Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network
Wenqiao Li · Xiaohao Xu · Yao Gu · BoZhong Zheng · Shenghua Gao · Yingna Wu
|
||
Discontinuity-preserving Normal Integration with Auxiliary Edges
Hyomin Kim · Yucheol Jung · Seungyong Lee
|
||
Learning to navigate efficiently and precisely in real environments
Guillaume Bono · Hervé Poirier · Leonid Antsfeld · Gianluca Monaci · Boris Chidlovskii · Christian Wolf
|
||
PAPR in Motion: Seamless Point-level 3D Scene Interpolation
Shichong Peng · Yanshu Zhang · Ke Li
|
||
Towards Modern Image Manipulation Localization: A Large-Scale Dataset and Novel Methods
Chenfan Qu · Yiwu Zhong · Chongyu Liu · Guitao Xu · Dezhi Peng · Fengjun Guo · Lianwen Jin
|
||
Dense Vision Transformer Compression with Few Samples
Hanxiao Zhang · Yifan Zhou · Guo-Hua Wang
|
||
Weakly Supervised Monocular 3D Detection with a Single-View Image
Xueying Jiang · Sheng Jin · Lewei Lu · Xiaoqin Zhang · Shijian Lu
|
||
AM-RADIO: Agglomerative Models - Reduce All Domains Into One
Mike Ranzinger · Greg Heinrich · Jan Kautz · Pavlo Molchanov
|
||
Tune-An-Ellipse: CLIP Has Potential to Find What You Want
Jinheng Xie · Songhe Deng · Bing Li · Haozhe Liu · Yawen Huang · Yefeng Zheng · Jürgen Schmidhuber · Bernard Ghanem · Linlin Shen · Mike Zheng Shou
|
||
LISA: Reasoning Segmentation via Large Language Model
Xin Lai · Zhuotao Tian · Yukang Chen · Yanwei Li · Yuhui Yuan · Shu Liu · Jiaya Jia
|
||
Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM
Pingping Zhang · Tianyu Yan · Yang Liu · Huchuan Lu
|
||
IntrinsicAvatar: Physically Based Inverse Rendering of Dynamic Humans from Monocular Videos via Explicit Ray Tracing
Shaofei Wang · Bozidar Antic · Andreas Geiger · Siyu Tang
|
||
Exploring Pose-Aware Human-Object Interaction via Hybrid Learning
Eastman Z Y Wu · Yali Li · Yuan Wang · Shengjin Wang
|
||
Multi-modal learning for geospatial vegetation forecasting
Vitus Benson · Claire Robin · Christian Requena-Mesa · Lazaro Alonso Silva · Mélanie Weynants · Nora Linscheid · Jose Cortes · Zhihan Gao · Nuno Carvalhais · Markus Reichstein
|
||
All in One Framework for Multimodal Re-identification in the Wild
He Li · Mang Ye · Ming Zhang · Bo Du
|
||
Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification
Chao Yi · Lu Ren · De-Chuan Zhan · Han-Jia Ye
|
||
Bilateral Adaptation for Human-Object Interaction Detection with Occlusion-Robustness
Guangzhi Wang · Yangyang Guo · Ziwei Xu · Mohan Kankanhalli
|
||
PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs
Michael Dorkenwald · Nimrod Barazani · Cees G. M. Snoek · Yuki Asano
|
||
MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior
Honghua Chen · Chen Change Loy · Xingang Pan
|
||
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
Omkar Thawakar · Muzammal Naseer · Rao Anwer · Salman Khan · Michael Felsberg · Mubarak Shah · Fahad Shahbaz Khan
|
||
TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model
Hantao Yao · Rui Zhang · Changsheng Xu
|
||
RMT: Retentive Networks Meet Vision Transformers
Qihang Fan · Huaibo Huang · Mingrui Chen · Hongmin Liu · Ran He
|
||
Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs
Lin Song · Yukang Chen · Shuai Yang · Xiaohan Ding · Yixiao Ge · Ying-Cong Chen · Ying Shan
|
||
Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis
Zicheng Zhang · Ruobing Zheng · Bonan Li · Congying Han · Tianqi Li · Meng Wang · Tiande Guo · Jingdong Chen · Ziwen Liu · Ming Yang
|
||
PairDETR: Joint Detection and Association of Human Bodies and Faces
Ammar Ali · Georgii Gaikov · Denis Rybalchenko · Alexander Chigorin · Ivan Laptev · Sergey Zagoruyko
|
||
Language Models as Black-Box Optimizers for Vision-Language Models
Shihong Liu · Samuel Yu · Zhiqiu Lin · Deepak Pathak · Deva Ramanan
|
||
GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering
Abdullah J Hamdi · Luke Melas-Kyriazi · Jinjie Mai · Guocheng Qian · Ruoshi Liu · Carl Vondrick · Bernard Ghanem · Andrea Vedaldi
|
||
Steerers: A framework for rotation equivariant keypoint descriptors
Georg Bökman · Johan Edstedt · Michael Felsberg · Fredrik Kahl
|
||
On the Faithfulness of Vision Transformer Explanations
Junyi Wu · Weitai Kang · Hao Tang · Yuan Hong · Yan Yan
|
||
Learning Transferable Negative Prompts for Out-of-Distribution Detection
Tianqi Li · Guansong Pang · Wenjun Miao · Xiao Bai · Jin Zheng
|
||
3D Multi-frame Fusion for Video Stabilization
Zhan Peng · Xinyi Ye · Weiyue Zhao · Tianqi Liu · Huiqiang Sun · Baopu Li · Zhiguo Cao
|
||
Fun with Flags: Robust Principal Directions via Flag Manifolds
Tolga Birdal · Nathan Mankovich
|
||
Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss
Jaeha Kim · Junghun Oh · Kyoung Mu Lee
|
||
Boosting Image Quality Assessment through Efficient Transformer Adaptation with Local Feature Enhancement
Kangmin Xu · Liang Liao · Jing Xiao · Chaofeng Chen · Haoning Wu · Qiong Yan · Weisi Lin
|
||
COLMAP-Free 3D Gaussian Splatting
Yang Fu · Sifei Liu · Amey Kulkarni · Jan Kautz · Alexei A. Efros · Xiaolong Wang
|
||
Towards Realistic Scene Generation with LiDAR Diffusion Models
Haoxi Ran · Vitor Guizilini · Yue Wang
|
||
Point-VOS: Pointing Up Video Object Segmentation
Sabarinath Mahadevan · Idil Esen Zulfikar · Paul Voigtlaender · Bastian Leibe
|
||
Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion
Linzhan Mou · Jun-Kun Chen · Yu-Xiong Wang
|
||
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning
Desai Xie · Jiahao Li · Hao Tan · Xin Sun · Zhixin Shu · Yi Zhou · Sai Bi · Soeren Pirk · Arie Kaufman
|
||
Exploring Orthogonality in Open World Object Detection
Zhicheng Sun · Jinghan Li · Yadong Mu
|
||
Compositional Chain-of-Thought Prompting for Large Multimodal Models
Chancharik Mitra · Brandon Huang · Trevor Darrell · Roei Herzig
|
||
As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors
Seungwoo Yoo · Kunho Kim · Vladimir G. Kim · Minhyuk Sung
|
||
Unifying Automatic and Interactive Matting with Pretrained ViTs
Zixuan Ye · Wenze Liu · He Guo · Yujia Liang · Chaoyi Hong · Hao Lu · Zhiguo Cao
|
||
Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts
Qin Liu · Jaemin Cho · Mohit Bansal · Marc Niethammer
|
||
NViST: In the Wild New View Synthesis from a Single Image with Transformers
Wonbong Jang · Lourdes Agapito
|
||
Authentic Hand Avatar from a Phone Scan via Universal Hand Model
Gyeongsik Moon · Weipeng Xu · Rohan Joshi · Chenglei Wu · Takaaki Shiratori
|
||
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Yunhao Ge · Xiaohui Zeng · Jacob Huffman · Tsung-Yi Lin · Ming-Yu Liu · Yin Cui
|
||
Latency Correction for Event-guided Deblurring and Frame Interpolation
Yixin Yang · Jinxiu Liang · Bohan Yu · Yan Chen · Jimmy S. Ren · Boxin Shi
|
||
ZERO-IG: Zero-Shot Illumination-Guided Joint Denoising and Adaptive Enhancement for Low-Light Images
Yiqi Shi · Duo Liu · Liguo Zhang · Ye Tian · Xuezhi Xia · Xiaojing Fu
|
||
HINTED: Hard Instance Enhanced Detector with Mixed-Density Feature Fusion for Sparsely-Supervised 3D Object Detection
Qiming Xia · Wei Ye · Hai Wu · Shijia Zhao · Leyuan Xing · Xun Huang · Jinhao Deng · Xin Li · Chenglu Wen · Cheng Wang
|
||
Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping
Hyeongjun Kwon · Jinhyun Jang · Jin Kim · Kwonyoung Kim · Kwanghoon Sohn
|
||
Self-supervised Representation Learning from Arbitrary Scenarios
Zhaowen Li · Yousong Zhu · Zhiyang Chen · Zongxin Gao · Rui Zhao · Chaoyang Zhao · Ming Tang · Jinqiao Wang
|
||
NEAT: Distilling 3D Wireframes from Neural Attraction Fields
Nan Xue · Bin Tan · Yuxi Xiao · Liang Dong · Gui-Song Xia · Tianfu Wu · Yujun Shen
|
||
FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization
Shuai Tan · Bin Ji · Ye Pan
|
||
Generating Content for HDR Deghosting from Frequency View
Tao Hu · Qingsen Yan · Yuankai Qi · Yanning Zhang
|
||
Querying as Prompt: Parameter-Efficient Learning for Multimodal Language Model
Tian Liang · Jing Huang · Ming Kong · Luyuan Chen · Qiang Zhu
|
||
Dual Prototype Attention for Unsupervised Video Object Segmentation
Suhwan Cho · Minhyeok Lee · Seunghoon Lee · Dogyoon Lee · Heeseung Choi · Ig-Jae Kim · Sangyoun Lee
|
||
GeoChat: Grounded Large Vision-Language Model for Remote Sensing
Kartik Kuckreja · Muhammad Sohail Danish · Muzammal Naseer · Abhijit Das · Salman Khan · Fahad Shahbaz Khan
|
||
Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection
Zhiwei Yang · Jing Liu · Peng Wu
|
||
AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings
Jamie Watson · Filippo Aleotti · Mohamed Sayed · Zawar Qureshi · Oisin Mac Aodha · Gabriel J. Brostow · Michael Firman · Sara Vicente
|
||
Prompt Learning via Meta-Regularization
Jinyoung Park · Juyeon Ko · Hyunwoo J. Kim
|
||
Addressing Background Context Bias in Few-Shot Segmentation through Iterative Modulation
Lanyun Zhu · Tianrun Chen · Jianxiong Yin · Simon See · Jun Liu
|
||
Rethinking the Region Classification in Open-Vocabulary Semantic Segmentation: An Image-to-Image View
Yuan Wang · Rui Sun · Naisong Luo · Yuwen Pan · Tianzhu Zhang
|
||
Local-consistent Transformation Learning for Rotation-invariant Point Cloud Analysis
Yiyang Chen · Lunhao Duan · Shanshan Zhao · Changxing Ding · Dacheng Tao
|
||
KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree Rotation
Fengyuan Yang · Kerui Gu · Angela Yao
|
||
SuperNormal: Neural Surface Reconstruction via Multi-View Normal Integration
Xu Cao · Takafumi Taketomi
|
||
Navigating Beyond Dropout: An Intriguing Solution towards Generalizable Image Super-Resolution
Hongjun Wang · Jiyuan Chen · Yinqiang Zheng · Tieyong Zeng
|
||
Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation
Xiaoyang Wang · Huihui Bai · Limin Yu · Yao Zhao · Jimin Xiao
|
||
General Object Foundation Model for Images and Videos at Scale
Junfeng Wu · Yi Jiang · Qihao Liu · Zehuan Yuan · Xiang Bai · Song Bai
|
||
Friendly Sharpness-Aware Minimization
Tao Li · Pan Zhou · Zhengbao He · Xinwen Cheng · Xiaolin Huang
|
||
Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch
Xidong Wu · Shangqian Gao · Zeyu Zhang · Zhenzhen Li · Runxue Bao · Yanfu Zhang · Xiaoqian Wang · Heng Huang
|
||
SCINeRF: Neural Radiance Fields from a Snapshot Compressive Image
Yunhao Li · Xiaodong Wang · Ping Wang · Xin Yuan · Peidong Liu
|
||
Complementing Event Streams and RGB Frames for Hand Mesh Reconstruction
Jianping Jiang · Xinyu Zhou · Bingxuan Wang · Xiaoming Deng · Chao Xu · Boxin Shi
|
||
Emotional Speech-Driven 3D Body Animation via Disentangled Latent Diffusion
Kiran Chhatre · Radek Danecek · Nikos Athanasiou · Giorgio Becherini · Christopher Peters · Michael J. Black · Timo Bolkart
|
||
Deciphering ‘What’ and ‘Where’ Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations
Xiao Zhang · David Yunis · Michael Maire
|
||
Distribution-aware Knowledge Prototyping for Non-exemplar Lifelong Person Re-identification
Kunlun Xu · Xu Zou · Yuxin Peng · Jiahuan Zhou
|
||
KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation
Jihua Peng · Yanghong Zhou · Tracy P Y Mok
|
||
Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering
Zaid Khan · Yun Fu
|
||
Optimal Transport Aggregation for Visual Place Recognition
Sergio Izquierdo · Javier Civera
|
||
HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
Hongyu Zhou · Jiahao Shao · Lu Xu · Dongfeng Bai · Weichao Qiu · Bingbing Liu · Yue Wang · Andreas Geiger · Yiyi Liao
|
||
Human Motion Prediction under Unexpected Perturbation
Jiangbei Yue · Baiyi Li · Julien Pettré · Armin Seyfried · He Wang
|
||
LLM-AR: When Large Language Model Meets Skeleton-Based Action Recognition
Haoxuan Qu · Yujun Cai · Jun Liu
|
||
MFP: Making Full Use of Probability Maps for Interactive Image Segmentation
Chaewon Lee · Seon-Ho Lee · Chang-Su Kim
|
||
Instantaneous Perception of Moving Objects in 3D
Di Liu · Bingbing Zhuang · Dimitris N. Metaxas · Manmohan Chandraker
|
||
Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining
Jiahao Nie · Yun Xing · Gongjie Zhang · Pei Yan · Aoran Xiao · Yap-peng Tan · Alex C. Kot · Shijian Lu
|
||
Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning
Zhengwei Fang · Rui Wang · Tao Huang · Liping Jing
|
||
Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion
Junjiao Tian · Lavisha Aggarwal · Andrea Colaco · Zsolt Kira · Mar Gonzalez-Franco
|
||
Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training
Runze He · Shaofei Huang · Xuecheng Nie · Tianrui Hui · Luoqi Liu · Jiao Dai · Jizhong Han · Guanbin Li · Si Liu
|
||
Slice3D: Multi-Slice, Occlusion-Revealing, Single View 3D Reconstruction
Yizhi Wang · Wallace Lira · Wenqi Wang · Ali Mahdavi Amiri · Hao Zhang
|
||
Learning to Produce Semi-dense Correspondences for Visual Localization
Khang Truong Giang · Soohwan Song · Sungho Jo
|
||
Differentiable Neural Surface Refinement for Transparent Objects
Weijian Deng · Dylan Campbell · Chunyi Sun · Shubham Kanitkar · Matthew Shaffer · Stephen Gould
|
||
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
Yunyang Xiong · Balakrishnan Varadarajan · Lemeng Wu · Xiaoyu Xiang · Fanyi Xiao · Chenchen Zhu · Xiaoliang Dai · Dilin Wang · Fei Sun · Forrest Iandola · Raghuraman Krishnamoorthi · Vikas Chandra
|
||
Look-Up Table Compression for Efficient Image Restoration
Yinglong Li · Jiacheng Li · Zhiwei Xiong
|
||
Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
Wenhao Li · Mengyuan Liu · Hong Liu · Pichao Wang · Jialun Cai · Nicu Sebe
|
||
RepAn: Enhanced Annealing through Re-parameterization
Xiang Fei · Xiawu Zheng · Yan Wang · Fei Chao · Chenglin Wu · Liujuan Cao
|
||
Vector Graphics Generation via Mutually Impulsed Dual-domain Diffusion
Zhongyin Zhao · Ye Chen · Zhangli Hu · Xuanhong Chen · Bingbing Ni
|
||
FocSAM: Delving Deeply into Focused Objects in Segmenting Anything
You Huang · Zongyu Lan · Liujuan Cao · Xianming Lin · Shengchuan Zhang · Guannan Jiang · Rongrong Ji
|
||
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Yanwu Xu · Yang Zhao · Zhisheng Xiao · Tingbo Hou
|
||
Understanding and Improving Source-free Domain Adaptation from a Theoretical Perspective
Yu Mitsuzumi · Akisato Kimura · Hisashi Kashima
|
||
ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
Chen Duan · Pei Fu · Shan Guo · Qianyi Jiang · Xiaoming Wei
|
||
CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow
Chenbin Pan · Burhaneddin Yaman · Senem Velipasalar · Liu Ren
|
||
Hyperspherical Classification with Dynamic Label-to-Prototype Assignment
Mohammad Saadabadi Saadabadi · Ali Dabouei · Sahar Rahimi Malakshan · Nasser Nasrabadi
|
||
Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts
Jiawen Zhu · Guansong Pang
|
||
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
Syed Talal Wasim · Muzammal Naseer · Salman Khan · Ming-Hsuan Yang · Fahad Shahbaz Khan
|
||
ODIN: A Single Model for 2D and 3D Segmentation
Ayush Jain · Pushkal Katara · Nikolaos Gkanatsios · Adam Harley · Gabriel Sarch · Kriti Aggarwal · Vishrav Chaudhary · Katerina Fragkiadaki
|
||
Prompt Augmentation for Self-supervised Text-guided Image Manipulation
Rumeysa Bodur · Binod Bhattarai · Tae-Kyun Kim
|
||
Open-Vocabulary Segmentation with Semantic-Assisted Calibration
Yong Liu · Sule Bai · Guanbin Li · Yitong Wang · Yansong Tang
|
||
FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models
Jinglin Xu · Yijie Guo · Yuxin Peng
|
||
MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation
Mi Yan · Jiazhao Zhang · Yan Zhu · He Wang
|
||
MemoNav: Working Memory Model for Visual Navigation
Hongxin Li · Zeyu Wang · Xu Yang · Yuran Yang · Shuqi Mei · Zhaoxiang Zhang
|
||
PointBeV: A Sparse Approach for BeV Predictions
Loick Chambon · Éloi Zablocki · Mickaël Chen · Florent Bartoccioni · Patrick Pérez · Matthieu Cord
|
||
Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification
Kaijie Ren · Lei Zhang
|
||
On the Content Bias in Fréchet Video Distance
Songwei Ge · Aniruddha Mahapatra · Gaurav Parmar · Jun-Yan Zhu · Jia-Bin Huang
|
||
Sheared Backpropagation for Finetuning Foundation Models
Zhiyuan Yu · Li Shen · Liang Ding · Xinmei Tian · Yixin Chen · Dacheng Tao
|
||
Hyperbolic Learning with Synthetic Captions for Open-World Detection
Fanjie Kong · Yanbei Chen · Jiarui Cai · Davide Modolo
|
||
NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation
Jiahao Chen · Yipeng Qin · Lingjie Liu · Jiangbo Lu · Guanbin Li
|
||
In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing
Yiran Xu · Zhixin Shu · Cameron Smith · Seoung Wug Oh · Jia-Bin Huang
|
||
Towards Language-Driven Video Inpainting via Multimodal Large Language Models
Jianzong Wu · Xiangtai Li · Chenyang Si · Shangchen Zhou · Jingkang Yang · Jiangning Zhang · Yining Li · Kai Chen · Yunhai Tong · Ziwei Liu · Chen Change Loy
|
||
High Fidelity Person-centric Subject-to-Image Synthesis
Yibin Wang · Weizhong Zhang · Jianwei Zheng · Cheng Jin
|
||
Fixed Point Diffusion Models
Luke Melas-Kyriazi · Xingjian Bai
|
||
Contextual Augmented Global Contrast for Multimodal Intent Recognition
Kaili Sun · Zhiwen Xie · Mang Ye · Huyin Zhang
|
||
SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation System
Yunfei Fan · Tianyu Zhao · Guidong Wang
|
||
MACE: Mass Concept Erasure in Diffusion Models
Shilin Lu · Zilan Wang · Leyang Li · Yanzhu Liu · Adams Wai-Kin Kong
|
||
XFeat: Accelerated Features for Lightweight Image Matching
Guilherme Potje · Felipe Cadar · André Araujo · Renato Martins · Erickson R. Nascimento
|
||
GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation
Weiming Zhang · Yexin Liu · Xu Zheng · Lin Wang
|
||
VideoBooth: Diffusion-based Video Generation with Image Prompts
Yuming Jiang · Tianxing Wu · Shuai Yang · Chenyang Si · Dahua Lin · Yu Qiao · Chen Change Loy · Ziwei Liu
|
||
CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization
Yao Ni · Piotr Koniusz
|
||
Day-Night Cross-domain Vehicle Re-identification
Hongchao Li · Jingong Chen · Aihua Zheng · Yong Wu · YongLong Luo
|
||
DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis
Jiapeng Tang · Yinyu Nie · Lev Markhasin · Angela Dai · Justus Thies · Matthias Nießner
|
||
SaCo Loss: Sample-wise Affinity Consistency for Vision-Language Pre-training
WU Sitong · Haoru Tan · Zhuotao Tian · Yukang Chen · Xiaojuan Qi · Jiaya Jia
|
||
StrokeFaceNeRF: Stroke-based Facial Appearance Editing in Neural Radiance Field
Xiao-juan Li · Dingxi Zhang · Shu-Yu Chen · Feng-Lin Liu
|
||
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Pavan Kumar Anasosalu Vasu · Hadi Pouransari · Fartash Faghri · Raviteja Vemulapalli · Oncel Tuzel
|
||
Neural Modes: Self-supervised Learning of Nonlinear Modal Subspaces
Jiahong Wang · Yinwei Du · Stelian Coros · Bernhard Thomaszewski
|
||
WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
Soyong Shin · Juyong Kim · Eni Halilaj · Michael J. Black
|
||
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong · Zhuang Liu · Yuexiang Zhai · Yi Ma · Yann LeCun · Saining Xie
|
||
YOLO-World: Real-Time Open-Vocabulary Object Detection
Tianheng Cheng · Lin Song · Yixiao Ge · Wenyu Liu · Xinggang Wang · Ying Shan
|
||
Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment
Alireza Ganjdanesh · Shangqian Gao · Heng Huang
|
||
Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion
Yuanxun Lu · Jingyang Zhang · Shiwei Li · Tian Fang · David McKinnon · Yanghai Tsin · Long Quan · Xun Cao · Yao Yao
|
||
Bézier Everywhere All at Once: Learning Drivable Lanes as Bézier Graphs
Hugh Blayney · Hanlin Tian · Hamish Scott · Nils Goldbeck · Chess Stetson · Panagiotis Angeloudis
|
||
FedUV: Uniformity and Variance for Heterogeneous Federated Learning
Ha Min Son · Moon-Hyun Kim · Tai-Myoung Chung · Chao Huang · Xin Liu
|
||
Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes
Ziqian Bai · Feitong Tan · Sean Fanello · Rohit Pandey · Mingsong Dou · Shichen Liu · Ping Tan · Yinda Zhang
|
||
FaceCom: Towards High-fidelity 3D Facial Shape Completion via Optimization and Inpainting Guidance
Yinglong Li · Hongyu Wu · Wang · Qingzhao Qin · Yijiao Zhao · Yong Wang · Aimin Hao
|
||
RankMatch: Exploring the Better Consistency Regularization for Semi-supervised Semantic Segmentation
Huayu Mai · Rui Sun · Tianzhu Zhang · Feng Wu
|
||
Revisiting Adversarial Training under Long-Tailed Distributions
Xinli Yue · Ningping Mou · Qian Wang · Lingchen Zhao
|
||
From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation
Hyeokjun Kweon · Kuk-Jin Yoon
|
||
VINECS: Video-based Neural Character Skinning
Zhouyingcheng Liao · Vladislav Golyanik · Marc Habermann · Christian Theobalt
|
||
Plug and Play Active Learning for Object Detection
Chenhongyi Yang · Lichao Huang · Elliot Crowley
|
||
Learning Structure-from-Motion with Graph Attention Networks
Lucas Brynte · José Pedro Iglesias · Carl Olsson · Fredrik Kahl
|
||
Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation
Ruicong Liu · Takehiko Ohkawa · Mingfang Zhang · Yoichi Sato
|
||
Insights from the Use of Previously Unseen Neural Architecture Search Datasets
Rob Geada · David Towers · Matthew Forshaw · Amir Atapour-Abarghouei · Stephen McGough
|
||
Joint-Task Regularization for Partially Labeled Multi-Task Learning
Kento Nishi · Junsik Kim · Wanhua Li · Hanspeter Pfister
|
||
Mind Artist: Creating Artistic Snapshots with Human Thought
Jiaxuan Chen · Yu Qi · Yueming Wang · Gang Pan
|
||
$L_0$-Sampler: An $L_0$ Model Guided Volume Sampling for NeRF
Liangchen Li · Juyong Zhang
|
||
SAI3D: Segment Any Instance in 3D Scenes
Yingda Yin · Yuzheng Liu · Yang Xiao · Daniel Cohen-Or · Jingwei Huang · Baoquan Chen
|
||
EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Priors
Zhipeng Hu · Minda Zhao · Chaoyi Zhao · Xinyue Liang · Lincheng Li · Zeng Zhao · Changjie Fan · Xiaowei Zhou · Xin Yu
|
||
Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features
Niladri Shekhar Dutt · Sanjeev Muralikrishnan · Niloy J. Mitra
|
||
SGC-Occ: Semantic-Geometry Consistent 3D Occupancy Prediction for Autonomous Driving
Zhiwen Yang · Xiangteng He · Yuxin Peng
|
||
Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness
Sibo Wang · Jie Zhang · Zheng Yuan · Shiguang Shan
|
||
De-Diffusion Makes Text a Strong Cross-Modal Interface
Chen Wei · Chenxi Liu · Siyuan Qiao · Zhishuai Zhang · Alan L. Yuille · Jiahui Yu
|
||
Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
Yanzuo Lu · Manlin Zhang · Jinhua Ma · Xiaohua Xie · Jianhuang Lai
|
||
Unsupervised Occupancy Learning from Sparse Point Cloud
Amine Ouasfi · Adnane Boukhayma
|
||
Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM
Tongyan Hua · Lin Wang
|
||
GLOW: Global Layout Aware Attacks on Object Detection
Jun Bao · Buyu Liu · Kui Ren · Jun Yu
|
||
DeepCache: Accelerating Diffusion Models for Free
Xinyin Ma · Gongfan Fang · Xinchao Wang
|
||
HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention
Xiaolong Tang · Meina Kan · Shiguang Shan · Zhilong Ji · Jinfeng Bai · Xilin Chen
|
||
CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection
Mikhail Kennerley · Jian-Gang Wang · Bharadwaj Veeravalli · Robby T. Tan
|
||
Neural Underwater Scene Representation
Yunkai Tang · Chengxuan Zhu · Renjie Wan · Chao Xu · Boxin Shi
|
||
Scale Decoupled Distillation
Shicai Wei · Chunbo Luo · Yang Luo
|
||
T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
Tanvir Mahmud · Yapeng Tian · Diana Marculescu
|
||
PolarMatte: Fully Computational Ground-Truth-Quality Alpha Matte Extraction for Images and Video using Polarized Screen Matting
Kenji Enomoto · TJ Rhodes · Brian Price · Gavin Miller
|
||
Traceable Federated Continual Learning
Qiang Wang · Bingyan Liu · Yawen Li
|
||
CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection
Jiayi Zhu · Qing Guo · Felix Juefei Xu · Yihao Huang · Yang Liu · Geguang Pu
|
||
CrossMAE: Cross Modality Masked Autoencoders For Region-Aware Audio-Visual Pre-Training
Yuxin Guo · Siyang Sun · Shuailei Ma · Kecheng Zheng · Xiaoyi Bao · Shijie Ma · Wei Zou · Yun Zheng
|
||
Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models
Gihyun Kwon · Simon Jenni · Ding Li · Joon-Young Lee · Jong Chul Ye · Fabian Caba Heilbron
|
||
CapHuman: Capture Your Moments in Parallel Universes
Chao Liang · Fan Ma · Linchao Zhu · Yingying Deng · Yi Yang
|
||
Vista-LLaMA: Reliable Video Teller via Equal Distance to Visual Tokens
Fan Ma · Xiaojie Jin · Heng Wang · Yuchen Xian · Jiashi Feng · Yi Yang
|
||
Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach
Wei Dong · Xing Zhang · Bihui Chen · Dawei Yan · Zhijun Lin · Qingsen Yan · Peng Wang · Yang Yang
|
||
Real-World Mobile Image Denoising Dataset with Efficient Baselines
Roman Flepp · Andrey Ignatov · Radu Timofte · Luc Van Gool
|
||
PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving
Xinshuo Weng · Boris Ivanovic · Yan Wang · Yue Wang · Marco Pavone
|
||
SRTube: Video-Language Pre-Training with Action-Centric Video Tube Features and Semantic Role Labeling
Juhee Lee · Jewon Kang
|
||
Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles
Vanessa Skliarova · Egor Zakharov · Otmar Hilliges · Michael J. Black · Justus Thies
|
||
MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading
Abdallah Dib · Luiz Gustavo Hafemann · Emeline Got · Trevor Anderson · Amin Fadaeinejad · Rafael M. O. Cruz · Marc-André Carbonneau
|
||
Defense without Forgetting: Continual Adversarial Defense with Anisotropic & Isotropic Pseudo Replay
Yuhang Zhou · Zhongyun Hua
|
||
Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension
Quan Liu · Hongzi Zhu · Zhenxi Wang · Yunsong Zhou · Shan Chang · Minyi Guo
|
||
PoseIRM: Enhance 3D Human Pose Estimation on Unseen Camera Settings via Invariant Risk Minimization
Yanlu Cai · Weizhong Zhang · Yuan Wu · Cheng Jin
|
||
UniHuman: A Unified Model For Editing Human Images in the Wild
Nannan Li · Qing Liu · Krishna Kumar Singh · Yilin Wang · Jianming Zhang · Bryan A. Plummer · Zhe Lin
|
||
Learning to Select Views for Efficient Multi-View Understanding
Yunzhong Hou · Stephen Gould · Liang Zheng
|
||
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
Shiyu Xuan · Qingpei Guo · Ming Yang · Shiliang Zhang
|
||
Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields
Tianqi Liu · Xinyi Ye · Min Shi · Zihao Huang · Zhiyu Pan · Zhan Peng · Zhiguo Cao
|
||
Point2CAD: Reverse Engineering CAD Models from 3D Point Clouds
Yujia Liu · Anton Obukhov · Jan D. Wegner · Konrad Schindler
|
||
Active Object Detection with Knowledge Aggregation and Distillation from Large Models
Dejie Yang · Yang Liu
|
||
ExMap: Leveraging Explainability Heatmaps for Unsupervised Group Robustness to Spurious Correlations
Rwiddhi Chakraborty · Adrian de Sena Sletten · Michael C. Kampffmeyer
|
||
RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation
Oded Bialer · Yuval Haitman
|
||
FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer
Dongyeong Hwang · Hyunju Kim · Sunwoo Kim · Kijung Shin
|
||
Mip-Splatting: Alias-free 3D Gaussian Splatting
Zehao Yu · Anpei Chen · Binbin Huang · Torsten Sattler · Andreas Geiger
|
||
Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation
Guangyang Wu · Xiaohong Liu · Jun Jia · Xuehao Cui · Guangtao Zhai
|
||
UV-IDM: Identity-Conditioned Latent Diffusion Model for Face UV-Texture Generation
Hong Li · Yutang Feng · Song Xue · Xuhui Liu · Boyu Liu · Bohan Zeng · Shanglin Li · Jianzhuang Liu · Shumin Han · Baochang Zhang
|
||
UniPTS: A Unified Framework for Proficient Post-Training Sparsity
JingJing Xie · Yuxin Zhang · Mingbao Lin · ZhiHang Lin · Liujuan Cao · Rongrong Ji
|
||
PBWR: Parametric Building Wireframe Reconstruction from Aerial LiDAR Point Clouds
Shangfeng Huang · Ruisheng Wang · Bo Guo · Hongxin Yang
|
||
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Sijin Chen · Xin Chen · Chi Zhang · Mingsheng Li · Gang Yu · Hao Fei · Hongyuan Zhu · Jiayuan Fan · Tao Chen
|
||
ProMark: Proactive Diffusion Watermarking for Causal Attribution
Vishal Asnani · John Collomosse · Tu Bui · Xiaoming Liu · Shruti Agarwal
|
||
MMM: Generative Masked Motion Model
Ekkasit Pinyoanuntapong · Pu Wang · Minwoo Lee · Chen Chen
|
||
Bridging the Gap Between End-to-End and Two-Step Text Spotting
Mingxin Huang · Hongliang Li · Yuliang Liu · Xiang Bai · Lianwen Jin
|
||
GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction
Xiao Chen · Quanyi Li · Tai Wang · Tianfan Xue · Jiangmiao Pang
|
||
Adaptive Hyper-graph Aggregation for Modality-Agnostic Federated Learning
Fan Qi · Shuai Li
|
||
VS: Reconstructing Clothed 3D Human from Single Image via Vertex Shift
Leyuan Liu · Yuhan Li · Yunqi Gao · Changxin Gao · Yuanyuan Liu · Jingying Chen
|
||
En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data
Yifang Men · Biwen Lei · Yuan Yao · Miaomiao Cui · Zhouhui Lian · Xuansong Xie
|
||
Wonder3D: Single Image to 3D using Cross-Domain Diffusion
Xiaoxiao Long · Yuan-Chen Guo · Cheng Lin · Yuan Liu · Zhiyang Dou · Lingjie Liu · Yuexin Ma · Song-Hai Zhang · Marc Habermann · Christian Theobalt · Wenping Wang
|
||
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha · Woo-Young Kang · Jonghwan Mun · Byungseok Roh
|
||
Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?
Zhiqi Li · Zhiding Yu · Shiyi Lan · Jiahan Li · Jan Kautz · Tong Lu · Jose M. Alvarez
|
||
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
Zaid Khan · Vijay Kumar BG · Samuel Schulter · Yun Fu · Manmohan Chandraker
|
||
MoMask: Generative Masked Modeling of 3D Human Motions
Chuan Guo · Yuxuan Mu · Muhammad Gohar Javed · Sen Wang · Li Cheng
|
||
Text2Loc: 3D Point Cloud Localization from Natural Language
Yan Xia · Letian Shi · Zifeng Ding · João F. Henriques · Daniel Cremers
|
||
Gaussian Shadow Casting for Neural Characters
Luis Bolanos · Shih-Yang Su · Helge Rhodin
|
||
SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers
Jonathan F. Carter · Joao Jorge · Oliver Gibson · Lionel Tarassenko
|
||
Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences
Seungwook Kim · Kejie Li · Xueqing Deng · Yichun Shi · Minsu Cho · Peng Wang
|
||
BigGait: Learning Gait Representation You Want by Large Vision Models
Dingqiang Ye · Chao Fan · Jingzhe Ma · Xiaoming Liu · Shiqi Yu
|
||
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
Haochen Han · Qinghua Zheng · Guang Dai · Minnan Luo · Jingdong Wang
|
||
Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments
Liyuan Zhu · Shengyu Huang · Konrad Schindler · Iro Armeni
|
||
Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition
Kyle Buettner · Sina Malakouti · Xiang Li · Adriana Kovashka
|
||
DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection
Yuhao Sun · Lingyun Yu · Hongtao Xie · Jiaming Li · Yongdong Zhang
|
||
Loopy-SLAM: Dense Neural SLAM with Loop Closures
Lorenzo Liso · Erik Sandström · Vladimir Yugay · Luc Van Gool · Martin R. Oswald
|
||
DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation
Xiaoliang Ju · Zhaoyang Huang · Yijin Li · Guofeng Zhang · Yu Qiao · Hongsheng Li
|
||
Feedback-Guided Autonomous Driving
Jimuyang Zhang · Zanming Huang · Arijit Ray · Eshed Ohn-Bar
|
||
Empowering Resampling Operation for Ultra-High-Definition Image Enhancement with Model-Aware Guidance
Yu · Jie Huang · Li · Kaiwen Zheng · Qi Zhu · Man Zhou · Feng Zhao
|
||
LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-time Rendering
Jaehoon Choi · Rajvi Shah · Qinbo Li · Yipeng Wang · Ayush Saraf · Changil Kim · Jia-Bin Huang · Dinesh Manocha · Suhib Alsisan · Johannes Kopf
|
||
Test-Time Linear Out-of-Distribution Detection
Ke Fan · Tong Liu · Xingyu Qiu · Yikai Wang · Lian Huai · Zeyu Shangguan · Shuang Gou · Fengjian Liu · Yuqian Fu · Yanwei Fu · Xingqun Jiang
|
||
Matching Anything by Segmenting Anything
Siyuan Li · Lei Ke · Martin Danelljan · Luigi Piccinelli · Mattia Segu · Luc Van Gool · Fisher Yu
|
||
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
Chengjian Feng · Yujie Zhong · Zequn Jie · Weidi Xie · Lin Ma
|
||
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
Shiyi Zhang · Sule Bai · Guangyi Chen · Lei Chen · Jiwen Lu · Junle Wang · Yansong Tang
|
||
Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition
Zihan Wang · Siyang Song · Cheng Luo · Songhe Deng · Weicheng Xie · Linlin Shen
|
||
Tailored Visions: Enhancing Text-to-Image Generation with Personalized Prompt Rewriting
Zijie Chen · Lichao Zhang · Fangsheng Weng · Lili Pan · Zhenzhong Lan
|
||
Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation
Sixian Zhang · Xinyao Yu · Xinhang Song · Xiaohan Wang · Shuqiang Jiang
|
||
Multi-view Aggregation Network for Dichotomous Image Segmentation
Qian Yu · Xiaoqi Zhao · Youwei Pang · Lihe Zhang · Huchuan Lu
|
||
EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Jiaxuan Li · Duc Minh Vo · Akihiro Sugimoto · Hideki Nakayama
|
||
Plug-and-Play Diffusion Distillation
Yi-Ting Hsiao · Siavash Khodadadeh · Kevin Duarte · Wei-An Lin · Hui Qu · Mingi Kwon · Ratheesh Kalarot
|
||
CLIB-FIQA: Face Image Quality Assessment with Confidence Calibration
Fu-Zhao Ou · Chongyi Li · Shiqi Wang · Sam Kwong
|
||
TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding
Zhihao Zhang · Shengcao Cao · Yu-Xiong Wang
|
||
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada · Kanta Kaneda · Daichi Saito · Komei Sugiura
|
||
XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold
Guangyu Wang · Jinzhi Zhang · Fan Wang · Ruqi Huang · Lu Fang
|
||
FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models
Adrian Bulat · Yassine Ouali · Georgios Tzimiropoulos
|
||
Differentiable Micro-Mesh Construction
Yishun Dou · Zhong Zheng · Qiaoqiao Jin · Rui Shi · Yuhan Li · Bingbing Ni
|
||
CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement
Qiang Zhu · Jinhua Hao · Yukang Ding · Yu Liu · Qiao Mo · Ming Sun · Chao Zhou · Shuyuan Zhu
|
||
Enhancing Vision-Language Pretraining with Rich Supervisions
Yuan Gao · Kunyu Shi · Pengkai Zhu · Edouard Belval · Oren Nuriel · Srikar Appalaraju · Shabnam Ghadar · Zhuowen Tu · Vijay Mahadevan · Stefano Soatto
|
||
HOISDF: Constraining 3D Hand Object Pose Estimation with Global Signed Distance Fields
Haozhe Qi · Chen Zhao · Mathieu Salzmann · Alexander Mathis
|
||
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
Xuanming Cui · Alejandro Aparcedo · Young Kyun Jang · Ser-Nam Lim
|
||
Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations
Daan de Geus · Gijs Dubbelman
|
||
Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning
Wei Zhang · Chaoqun Wan · Tongliang Liu · Xinmei Tian · Xu Shen · Jieping Ye
|
||
Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring
Xin Gao · Tianheng Qiu · Xinyu Zhang · Hanlin Bai · Kang Liu · Xuan Huang · Hu Wei · Guoying Zhang · Huaping Liu
|
||
Countering Personalized Text-to-Image Generation with Influence Watermarks
Hanwen Liu · Zhicheng Sun · Yadong Mu
|
||
GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields
Yunsong Wang · Hanlin Chen · Gim Hee Lee
|
||
SNIDA: Unlocking Few-Shot Object Detection with Non-linear Semantic Decoupling Augmentation
Yanjie Wang · Xu Zou · Luxin Yan · Sheng Zhong · Jiahuan Zhou
|
||
Automatic Controllable Colorization via Imagination
Xiaoyan Cong · Yue Wu · Qifeng Chen · Chenyang Lei
|
||
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
Yukang Cao · Yan-Pei Cao · Kai Han · Ying Shan · Kwan-Yee K. Wong
|
||
Are Conventional SNNs Really Efficient? A Perspective from Network Quantization
Guobin Shen · Dongcheng Zhao · Tenglong Li · Jindong Li · Yi Zeng
|
||
Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching
Peng Xu · Zhiyu Xiang · Chengyu Qiao · Jingyun Fu · Tianyu Pu
|
||
Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models
Shweta Mahajan · Tanzila Rahman · Kwang Moo Yi · Leonid Sigal
|
||
Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships
Sebastian Koch · Narunas Vaskevicius · Mirco Colosi · Pedro Hermosilla · Timo Ropinski
|
||
DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking
Cheng Huang · Shoudong Han · Mengyu He · Wenbo Zheng · Yuhao Wei
|
||
PoseGPT: Chatting about 3D Human Pose
Yao Feng · Jing Lin · Sai Kumar Dwivedi · Yu Sun · Priyanka Patel · Michael J. Black
|
||
Improved Baselines with Visual Instruction Tuning
Haotian Liu · Chunyuan Li · Yuheng Li · Yong Jae Lee
|
||
DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks
Jiaxin Zhang · Dezhi Peng · Chongyu Liu · Peirong Zhang · Lianwen Jin
|
||
Bilateral Propagation Network for Depth Completion
Jie Tang · Fei-Peng Tian · Boshi An · Jian Li · Ping Tan
|
||
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning
Zichen Miao · Jiang Wang · Ze Wang · Zhengyuan Yang · Lijuan Wang · Qiang Qiu · Zicheng Liu
|
||
Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation
Lior Talker · Aviad Cohen · Erez Yosef · Alexandra Dana · Michael Dinerstein
|
||
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
Zetong Yang · Li Chen · Yanan Sun · Hongyang Li
|
||
On the Road to Portability: Compressing End-to-End Motion Planner for Autonomous Driving
Kaituo Feng · Changsheng Li · Dongchun Ren · Ye Yuan · Guoren Wang
|
||
NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models
Yusuf Dalva · Pinar Yanardag
|
||
Elite360D: Towards Efficient 360 Depth Estimation via Semantic- and Distance-Aware Bi-Projection Fusion
Hao Ai · Lin Wang
|
||
Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
Nikita Starodubcev · Dmitry Baranchuk · Artem Fedorov · Artem Babenko
|
||
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior
Fangfu Liu · Diankun Wu · Yi Wei · Yongming Rao · Yueqi Duan
|
||
Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts
Fei Ni · Jianye Hao · Shiguang Wu · Longxin Kou · Jiashun Liu · Yan Zheng · Bin Wang · Yuzheng Zhuang
|
||
Improving Distant 3D Object Detection Using 2D Box Supervision
Zetong Yang · Zhiding Yu · Christopher Choy · Renhao Wang · Anima Anandkumar · Jose M. Alvarez
|
||
Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment
Angchi Xu · Wei-Shi Zheng
|
||
Infrared Small Target Detection with Scale and Location Sensitivity
Qiankun Liu · Rui Liu · Bolun Zheng · Hongkui Wang · Ying Fu
|
||
Minimal Perspective Autocalibration
Andrea Porfiri Dal Cin · Timothy Duff · Luca Magri · Tomas Pajdla
|
||
SVGDreamer: Text Guided SVG Generation with Diffusion Model
XiMing Xing · Chuang Wang · Haitao Zhou · Jing Zhang · Dong Xu · Qian Yu
|
||
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
Xin Zhang · Jiawei Du · Weiying Xie · Yunsong Li · Joey Tianyi Zhou
|
||
GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo
Jiang Wu · Rui Li · Haofei Xu · Wenxun Zhao · Yu Zhu · Jinqiu Sun · Yanning Zhang
|
||
Paint3D: Paint Anything 3D with Lighting-less Texture Diffusion Models
Xianfang Zeng · Xin Chen · Zhongqi Qi · Wen Liu · Zibo Zhao · Zhibin Wang · Bin Fu · Yong Liu · Gang Yu
|
||
Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption
Buzhen Huang · Chen Li · Chongyang Xu · Liang Pan · Yangang Wang · Gim Hee Lee
|
||
VRetouchEr: Learning Cross-frame Feature Interdependence with Imperfection Flow for Face Retouching in Videos
Wen Xue · Le Jiang · Lianxin Xie · Si Wu · Yong Xu · Hau San Wong
|
||
A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
Ruichen Ma · Guanchao Qiao · Yian Liu · Liwei Meng · Ning Ning · Yang Liu · Shaogang Hu
|
||
Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing
Boqiang Zhang · Hongtao Xie · Zuan Gao · Yuxin Wang
|
||
Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation
Xianghui Xie · Bharat Lal Bhatnagar · Jan Lenssen · Gerard Pons-Moll
|
||
Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
Yutong Feng · Biao Gong · Di Chen · Yujun Shen · Yu Liu · Jingren Zhou
|
||
GeoAuxNet: Towards Universal 3D Representation Learning for Multi-sensor Point Clouds
Shengjun Zhang · Xin Fei · Yueqi Duan
|
||
SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis
Teng Hu · Ran Yi · Baihong Qian · Jiangning Zhang · Paul L. Rosin · Yu-Kun Lai
|
||
Video ReCap: Recursive Captioning of Hour-Long Videos
Md Mohaiminul Islam · Vu Bao Ngan Ho · Xitong Yang · Tushar Nagarajan · Lorenzo Torresani · Gedas Bertasius
|
||
Flexible Biometrics Recognition: Bridging the Multimodality Gap through Attention, Alignment and Prompt Tuning
Leslie Ching Ow Tiong · Dick Sigmund · Chen-Hui Chan · Andrew Beng Jin Teoh
|
||
G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis
Yufei Ye · Abhinav Gupta · Kris Kitani · Shubham Tulsiani
|
||
MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation
Hanzhe Hu · Zhizhuo Zhou · Varun Jampani · Shubham Tulsiani
|
||
IQ-VFI: Implicit Quadratic Motion Estimation for Video Frame Interpolation
Mengshun Hu · Kui Jiang · Zhihang Zhong · Zheng Wang · Yinqiang Zheng
|
||
Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition
Anqi Zhu · Qiuhong Ke · Mingming Gong · James Bailey
|
||
Semantic-aware SAM for Point-Prompted Instance Segmentation
Zhaoyang Wei · Pengfei Chen · Xuehui Yu · Guorong Li · Jianbin Jiao · Zhenjun Han
|
||
CoGS: Controllable Gaussian Splatting
Heng Yu · Joel Julin · Zoltán Á. Milacski · Koichiro Niinuma · László A. Jeni
|
||
A Bayesian Approach to OOD Robustness in Image Classification
Prakhar Kaushik · Adam Kortylewski · Alan L. Yuille
|
||
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu · Christopher Clark · Sangho Lee · Zichen Zhang · Savya Khosla · Ryan Marten · Derek Hoiem · Aniruddha Kembhavi
|
||
PTQ4SAM: Post-Training Quantization for Segment Anything
Chengtao Lv · Hong Chen · Jinyang Guo · Yifu Ding · Xianglong Liu
|
||
Leveraging Predicate and Triplet Learning for Scene Graph Generation
Jiankai Li · Yunhong Wang · Xiefan Guo · Ruijie Yang · Weixin Li
|
||
Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment
Alvi Md Ishmam · Chris Thomas
|
||
PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors
Haley So · Laurie Bose · Piotr Dudek · Gordon Wetzstein
|
||
Task-Adaptive Saliency Guidance for Exemplar-free Class Incremental Learning
Xialei Liu · Jiang-Tian Zhai · Andrew Bagdanov · Ke Li · Ming-Ming Cheng
|
||
Action Detection via an Image Diffusion Process
Lin Geng Foo · Tianjiao Li · Hossein Rahmani · Jun Liu
|
||
Disentangled Prompt Representation for Domain Generalization
De Cheng · Zhipeng Xu · Xinyang Jiang · Nannan Wang · Dongsheng Li · Xinbo Gao
|
||
PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios
Jingbo Wang · Zhengyi Luo · Ye Yuan · Yixuan Li · Bo Dai
|
||
SAOR: Single-View Articulated Object Reconstruction
Mehmet Aygun · Oisin Mac Aodha
|
||
TULIP: Transformer for Upsampling of LiDAR Point Cloud
Bin Yang · Patrick Pfreundschuh · Roland Siegwart · Marco Hutter · Peyman Moghadam · Vaishakh Patil
|
||
Incremental Residual Concept Bottleneck Models
Chenming Shang · Shiji Zhou · Hengyuan Zhang · Xinzhe Ni · Yujiu Yang · Yuwang Wang
|
||
Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement
Han Wu · Guanyan Ou · Weibin Wu · Zibin Zheng
|
||
Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding
Jin-Chuan Shi · Miao Wang · Haobin Duan · Shaohua Guan
|
||
Efficient Dataset Distillation via Minimax Diffusion
Jianyang Gu · Saeed Vahidian · Vyacheslav Kungurtsev · Haonan Wang · Wei Jiang · Yang You · Yiran Chen
|
||
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
Lirui Zhao · Yue Yang · Kaipeng Zhang · Wenqi Shao · Yuxin Zhang · Yu Qiao · Ping Luo · Rongrong Ji
|
||
Density-Adaptive Model Based on Motif Matrix for Multi-Agent Trajectory Prediction
Di Wen · Haoran Xu · Zhaocheng He · Zhe Wu · Guang Tan · Peixi Peng
|
||
Towards Accurate Post-training Quantization for Diffusion Models
Changyuan Wang · Ziwei Wang · Xiuwei Xu · Yansong Tang · Jie Zhou · Jiwen Lu
|
||
GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
Chi Yan · Delin Qu · Dong Wang · Dan Xu · Zhigang Wang · Bin Zhao · Xuelong Li
|
||
Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
Xiangheng Shan · Dongyue Wu · Guilin Zhu · Yuanjie Shao · Nong Sang · Changxin Gao
|
||
View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network
Quan Zhang · Lei Wang · Vishal M. Patel · Xiaohua Xie · Jianhuang Lai
|
||
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Yifei Huang · Guo Chen · Jilan Xu · Mingfang Zhang · Lijin Yang · Baoqi Pei · Hongjie Zhang · Lu Dong · Yali Wang · Limin Wang · Yu Qiao
|
||
DUSt3R: Geometric 3D Vision Made Easy
Shuzhe Wang · Vincent Leroy · Yohann Cabon · Boris Chidlovskii · Jerome Revaud
|
||
InceptionNeXt: When Inception Meets ConvNeXt
Weihao Yu · Pan Zhou · Shuicheng Yan · Xinchao Wang
|
||
MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild
Zeren Jiang · Chen Guo · Manuel Kaufmann · Tianjian Jiang · Julien Valentin · Otmar Hilliges · Jie Song
|
||
Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval
Rohan Sarkar · Avinash Kak
|
||
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
Chaolei Tan · Jianhuang Lai · Wei-Shi Zheng · Jian-Fang Hu
|
||
Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation
Haojie Zhang · Yongyi Su · Xun Xu · Kui Jia
|
||
TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process
Zhiyuan Ren · Minchul Kim · Feng Liu · Xiaoming Liu
|
||
MLP Can Be A Good Transformer Learner
Sihao Lin · Pumeng Lyu · Dongrui Liu · Tao Tang · Xiaodan Liang · Andy Song · Xiaojun Chang
|
||
Learning Continual Compatible Representation for Re-indexing Free Lifelong Person Re-identification
Zhenyu Cui · Jiahuan Zhou · Xun Wang · Manyu Zhu · Yuxin Peng
|
||
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
Jing Shi · Wei Xiong · Zhe Lin · HyunJoon Jung
|
||
Towards a Perceptual Evaluation Framework for Lighting Estimation
Justine Giroux · Mohammad Reza Karimi Dastjerdi · Yannick Hold-Geoffroy · Javier Vazquez-Corral · Jean-François Lalonde
|
||
RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos
Hongchi Xia · Yang Fu · Sifei Liu · Xiaolong Wang
|
||
Aligning and Prompting Everything All at Once for Universal Visual Perception
Yunhang Shen · Chaoyou Fu · Peixian Chen · Mengdan Zhang · Ke Li · Xing Sun · Yunsheng Wu · Shaohui Lin · Rongrong Ji
|
||
DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance
Zixuan Wang · Jia Jia · Shikun Sun · Haozhe Wu · Rong Han · Zhenyu Li · Di Tang · Jiaqing Zhou · Jiebo Luo
|
||
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
Hanwen Jiang · Arjun Karpur · Bingyi Cao · Qixing Huang · André Araujo
|
||
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
Linfeng Yuan · Miaojing Shi · Zijie Yue · Qijun Chen
|
||
Diffusion-FOF: Single-view Clothed Human Reconstruction via Diffusion-based Fourier Occupancy Field
Yuanzhen Li · Fei Luo · Chunxia Xiao
|
||
Leveraging Frame Affinity for sRGB-to-RAW Video De-rendering
Chen Zhang · Wencheng Han · Yang Zhou · Jianbing Shen · Cheng-Zhong Xu · Wentao Liu
|
||
Joint2Human: High-quality 3D Human Generation via Compact Spherical Embedding of 3D Joints
Muxin Zhang · Qiao Feng · Zhuo Su · Chao Wen · Zhou Xue · Kun Li
|
||
Investigating Compositional Challenges in Vision-Language Models for Visual Grounding
Yunan Zeng · Yan Huang · Jinjin Zhang · Zequn Jie · Zhenhua Chai · Liang Wang
|
||
Relightful Harmonization: Lighting-aware Portrait Background Replacement
Mengwei Ren · Wei Xiong · Jae Shin Yoon · Zhixin Shu · Jianming Zhang · HyunJoon Jung · Guido Gerig · He Zhang
|
||
eTraM: Event-based Traffic Monitoring Dataset
Aayush Atul Verma · Bharatesh Chakravarthi · Arpitsinh Vaghela · Hua Wei · 'YZ' Yezhou Yang
|
||
Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
Mark Hamilton · Andrew Zisserman · John Hershey · William Freeman
|
||
FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion
George Cazenavette · Avneesh Sud · Thomas Leung · Ben Usman
|
||
Overcoming Data Limitations for High-Quality Video Diffusion Models
Haoxin Chen · Yong Zhang · Xiaodong Cun · Menghan Xia · Xintao Wang · Chao Weng · Ying Shan
|
||
TextNeRF: A Novel Scene-Text Image Synthesis Method based on Neural Radiance Fields
Jialei Cui · Jianwei Du · Wenzhuo Liu · Zhouhui Lian
|
||
MirageRoom: 3D Scene Segmentation with 2D Pre-trained Models by Mirage Projection
Haowen Sun · Yueqi Duan · Juncheng Yan · Yifan Liu · Jiwen Lu
|
||
GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence
Van Nguyen Nguyen · Thibault Groueix · Mathieu Salzmann · Vincent Lepetit
|
||
Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement
Xiuquan Hou · Meiqin Liu · Senlin Zhang · Ping Wei · Badong Chen
|
||
Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts
Dominik Scheuble · Chenyang Lei · Mario Bijelic · Seung-Hwan Baek · Felix Heide
|
||
Multi-Attribute Interactions Matter for 3D Visual Grounding
Can Xu · Yuehui Han · Rui Xu · Le Hui · Jin Xie · Jian Yang
|
||
Bootstrapping Autonomous Driving Radars with Self-Supervised Learning
Yiduo Hao · Sohrab Madani · Junfeng Guan · Mo Alloulah · Saurabh Gupta · Haitham Al Hassanieh
|
||
CAD: Photorealistic 3D Generation via Adversarial Distillation
Ziyu Wan · Despoina Paschalidou · Ian Huang · Hongyu Liu · Bokui Shen · Xiaoyu Xiang · Jing Liao · Leonidas Guibas
|
||
DiffusionTrack: Point Set Diffusion Model for Visual Object Tracking
Fei Xie · Zhongdao Wang · Chao Ma
|
||
SpiderMatch: 3D Shape Matching with Global Optimality and Geometric Consistency
Paul Roetzer · Florian Bernard
|
||
Towards Better Vision-Inspired Vision-Language Models
Yun-Hao Cao · Kaixiang Ji · Ziyuan Huang · Chuanyang Zheng · Jiajia Liu · Jian Wang · Jingdong Chen · Ming Yang
|
||
Generative Quanta Color Imaging
Vishal Purohit · Junjie Luo · Yiheng Chi · Qi Guo · Stanley H. Chan · Qiang Qiu
|
||
Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models
Zijin Yang · Kai Zeng · Kejiang Chen · Han Fang · Weiming Zhang · Nenghai Yu
|
||
Training Like a Medical Resident: Context-Prior Learning Toward Universal Medical Image Segmentation
Yunhe Gao
|
||
ParamISP: Learned Forward and Inverse ISPs using Camera Parameters
Woohyeok Kim · Geonu Kim · Junyong Lee · Seungyong Lee · Seung-Hwan Baek · Sunghyun Cho
|
||
Structured Model Probing: Empowering Efficient Transfer Learning by Structured Regularization
Zhi-Fan Wu · Chaojie Mao · Xue Wang · Jianwen Jiang · Yiliang Lv · Rong Jin
|
||
Instance-aware Contrastive Learning for Occluded Human Mesh Reconstruction
Mi-Gyeong Gwon · Gi-Mun Um · Won-Sik Cheong · Wonjun Kim
|
||
SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field
Lizhe Liu · Bohua Wang · Hongwei Xie · Daqi Liu · Li Liu · Kuiyuan Yang · Bing Wang · Zhiqiang Tian
|
||
WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects under Occlusion
Khiem Vuong · N. Dinesh Reddy · Robert Tamburo · Srinivasa G. Narasimhan
|
||
Data Valuation and Detections in Federated Learning
Wenqian Li · Shuran Fu · Fengrui Zhang · Yan Pang
|
||
UnO: Unsupervised Occupancy Fields for Perception and Forecasting
Ben Agro · Quinlan Sykora · Sergio Casas · Thomas Gilles · Raquel Urtasun
|
||
DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction
Jaehyeok Shim · Kyungdon Joo
|
||
Unveiling the Unknown: Unleashing the Power of Unknown to Known in Open-Set Source-Free Domain Adaptation
Fuli Wan · Han Zhao · Xu Yang · Cheng Deng
|
||
AutoAD III: The Prequel -- Back to the Pixels
Tengda Han · Max Bain · Arsha Nagrani · Gül Varol · Weidi Xie · Andrew Zisserman
|
||
Towards More Accurate Diffusion Model Acceleration with A Timestep Aligner
Mengfei Xia · Yujun Shen · Changsong Lei · Yu Zhou · Deli Zhao · Ran Yi · Wenping Wang · Yong-Jin Liu
|
||
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder · Ziad Al-Halah · Kristen Grauman
|
||
Diversity-aware Channel Pruning for StyleGAN Compression
Jiwoo Chung · Sangeek Hyun · Sang-Heon Shim · Jae-Pil Heo
|
||
SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models
Feifei Wang · Zhentao Tan · Tianyi Wei · Yue Wu · Qidong Huang
|
||
RobustSAM: Segment Anything Robustly on Degraded Images
Wei-Ting Chen · Yu Jiet Vong · Sy-Yen Kuo · Sizhuo Ma · Jian Wang
|
||
Learned Trajectory Embedding for Subspace Clustering
Yaroslava Lochman · Christopher Zach · Carl Olsson
|
||
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
Xian Liu · Xiaohang Zhan · Jiaxiang Tang · Ying Shan · Gang Zeng · Dahua Lin · Xihui Liu · Ziwei Liu
|
||
Rethinking Inductive Biases for Surface Normal Estimation
Gwangbin Bae · Andrew J. Davison
|
||
Dynamic Prompt Optimizing for Text-to-Image Generation
Wenyi Mo · Tianyu Zhang · Yalong Bai · Bing Su · Ji-Rong Wen · Qing Yang
|
||
Grounded Question-Answering in Long Egocentric Videos
Shangzhe Di · Weidi Xie
|
||
Learning Inclusion Matching for Animation Paint Bucket Colorization
Yuekun Dai · Shangchen Zhou · Blake Li · Chongyi Li · Chen Change Loy
|
||
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
Yixuan Zhu · Ao Li · Yansong Tang · Wenliang Zhao · Jie Zhou · Jiwen Lu
|
||
Pose-Guided Self-Training with Two-Stage Clustering for Unsupervised Landmark Discovery
Siddharth Tourani · Ahmed Alwheibi · Arif Mahmood · Muhammad Haris Khan
|
||
PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection
Xiaofan Li · Zhizhong Zhang · Xin Tan · Yanyun Qu · Chengwei Chen · Yuan Xie · Lizhuang Ma
|
||
RepViT: Revisiting Mobile CNN From ViT Perspective
Ao Wang · Hui Chen · Zijia Lin · Jungong Han · Guiguang Ding
|
||
Simple Semantic-Aided Few-Shot Learning
Hai Zhang · Junzhe Xu · Shanlin Jiang · Zhenan He
|
||
OVMR: Open-Vocabulary Recognition with Multi-Modal References
Zehong Ma · Shiliang Zhang · Longhui Wei · Qi Tian
|
||
An Edit Friendly DDPM Noise Space: Inversion and Manipulations
Inbar Huberman-Spiegelglas · Vladimir Kulikov · Tomer Michaeli
|
||
AdaShift: Learning Discriminative Self-Gated Neural Feature Activation With an Adaptive Shift Factor
Sudong Cai
|
||
Improved Implicit Neural Representation with Fourier Reparameterized Training
Kexuan Shi · Xingyu Zhou · Shuhang Gu
|
||
U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation
You Wu · Kean Liu · Xiaoyue Mi · Fan Tang · Juan Cao · Jintao Li
|
||
DaReNeRF: Direction-aware Representation for Dynamic Scenes
Ange Lou · Benjamin Planche · Zhongpai Gao · Yamin Li · Tianyu Luan · Hao Ding · Terrence Chen · Jack Noble · Ziyan Wu
|
||
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
Jianjian Cao · Peng Ye · Shengze Li · Chong Yu · Yansong Tang · Jiwen Lu · Tao Chen
|
||
COCONut: Modernizing COCO Segmentation
Xueqing Deng · Qihang Yu · Peng Wang · Xiaohui Shen · Liang-Chieh Chen
|
||
Towards Automated Movie Trailer Generation
Dawit Argaw Argaw · Mattia Soldan · Alejandro Pardo · Chen Zhao · Fabian Caba Heilbron · Joon Chung · Bernard Ghanem
|
||
How to Configure Good In-Context Sequence for Visual Question Answering
Li Li · Jiawei Peng · Huiyi Chen · Chongyang Gao · Xu Yang
|
||
Capturing Closely Interacted Two-Person Motions with Reaction Priors
Qi Fang · Yinghui Fan · Yanjun Li · Junting Dong · Dingwei Wu · Weidong Zhang · Kang Chen
|
||
PredToken: Predicting Unknown Tokens and Beyond with Coarse-to-Fine Iterative Decoding
Xuesong Nie · Haoyuan Jin · Yunfeng Yan · Xi Chen · Zhihang Zhu · Donglian Qi
|
||
Learning Object State Changes in Videos: An Open-World Perspective
Zihui Xue · Kumar Ashutosh · Kristen Grauman
|
||
Data-Efficient Unsupervised Interpolation Without Any Intermediate Frame for 4D Medical Images
JungEun Kim · Hangyul Yoon · Geondo Park · Kyungsu Kim · Eunho Yang
|
||
PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos
Qi Zhao · M. Salman Asif · Zhan Ma
|
||
G$^3$-LQ: Marrying Hyperbolic Alignment with Explicit Semantic-Geometric Modeling for 3D Visual Grounding
Yuan Wang · Yali Li · Shengjin Wang
|
||
NightCC: Nighttime Color Constancy via Adaptive Channel Masking
Shuwei Li · Robby T. Tan
|
||
DYSON: Dynamic Feature Space Self-Organization for Online Task-Free Class Incremental Learning
Yuhang He · YingJie Chen · Yuhan Jin · Songlin Dong · Xing Wei · Yihong Gong
|
||
Harnessing Large Language Models for Training-free Video Anomaly Detection
Luca Zanella · Willi Menapace · Massimiliano Mancini · Yiming Wang · Elisa Ricci
|
||
ODCR: Orthogonal Decoupling Contrastive Regularization for Unpaired Image Dehazing
Zhongze Wang · Haitao Zhao · Jingchao Peng · Lujian Yao · Kaijie Zhao
|