

Timezone: America/Los_Angeles

MON 17 JUN
7 a.m.
Break:
(ends 9:00 AM)
8:30 a.m.
Workshop:
(ends 1:00 PM)
Workshop:
(ends 5:30 PM)
Workshop:
(ends 5:30 PM)
Workshop:
(ends 5:30 PM)
Workshop:
(ends 12:00 PM)
Workshop:
(ends 5:30 PM)
Workshop:
(ends 6:00 PM)
Workshop:
(ends 5:30 PM)
Workshop:
(ends 5:30 PM)
Workshop:
(ends 12:30 PM)
8:45 a.m.
Workshop:
(ends 12:45 PM)
10 a.m.
Break:
(ends 11:00 AM)
noon
Break:
(ends 1:45 PM)
12:45 p.m.
Workshop:
(ends 6:05 PM)
3 p.m.
Break:
(ends 4:00 PM)

TUE 18 JUN
7 a.m.
Break:
(ends 9:00 AM)
7:50 a.m.
8:20 a.m.
Workshop:
(ends 5:40 PM)
8:30 a.m.
Workshop:
(ends 5:45 PM)
Workshop:
(ends 1:00 PM)
Workshop:
(ends 12:00 PM)
Workshop:
(ends 5:30 PM)
Workshop:
(ends 1:30 PM)
Workshop:
(ends 5:30 PM)
Workshop:
(ends 5:30 PM)
Tutorial:
(ends 12:00 PM)
9:30 a.m.
Workshop:
(ends 5:30 PM)
10 a.m.
Break:
(ends 11:00 AM)
noon
Break:
(ends 1:45 PM)
3 p.m.
Break:
(ends 4:00 PM)

WED 19 JUN
7 a.m.
Break:
(ends 9:00 AM)
8:30 a.m.
Remarks:
(ends 9:00 AM)
9 a.m.
Orals 9:00-10:30
[9:00] Specularity Factorization for Low-Light Enhancement
[9:18] FlowIE: Efficient Image Enhancement via Rectified Flow
[9:36] Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach
[9:54] Bilateral Event Mining and Complementary for Event Stream Super-Resolution
[10:12] FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring
(ends 10:30 AM)
Orals 9:00-10:30
[9:00] GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors
[9:18] Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
[9:36] Eclipse: Disambiguating Illumination and Materials using Unintended Shadows
[9:54] Objects as Volumes: A Stochastic Geometry View of Opaque Solids
[10:12] DiffusionLight: Light Probes for Free by Painting a Chrome Ball
(ends 10:30 AM)
Orals 9:00-10:30
[9:00] MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild
[9:18] URHand: Universal Relightable Hands
[9:36] Relightable Gaussian Codec Avatars
[9:54] Semantic Human Mesh Reconstruction with Textures
[10:12] Stratified Avatar Generation from Sparse Observations
(ends 10:30 AM)
10:30 a.m.
Demonstration:
(ends 6:45 PM)
Posters 10:30-12:00
(ends 12:00 PM)
noon
Break:
(ends 2:00 PM)
1 p.m.
Orals 1:00-2:30
[1:00] FreeU: Free Lunch in Diffusion U-Net
[1:18] Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
[1:36] Instruct-Imagen: Image Generation with Multi-modal Instruction
[1:54] Attention Calibration for Disentangled Text-to-Image Personalization
[2:12] Style Aligned Image Generation via Shared Attention
(ends 2:30 PM)
Orals 1:00-2:30
[1:00] Neural Redshift: Random Networks are not Random Functions
[1:18] Neural Lineage
[1:36] Learning Structure-from-Motion with Graph Attention Networks
[1:54] Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
[2:12] In Search of a Data Transformation That Accelerates Neural Field Training
(ends 2:30 PM)
Orals 1:00-2:30
[1:00] Point Transformer V3: Simpler Faster Stronger
[1:18] Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences
[1:36] Seeing the World through Your Eyes
[1:54] Tri-Perspective View Decomposition for Geometry-Aware Depth Completion
[2:12] Steerers: A Framework for Rotation Equivariant Keypoint Descriptors
(ends 2:30 PM)
1:15 p.m.
Expo Track Keynote:
Swami Sivasubramanian
(ends 2:15 PM)
2:30 p.m.
Break:
(ends 2:45 PM)
2:45 p.m.
Keynote:
Joshua Bongard
(ends 3:45 PM)
3:45 p.m.
Break:
(ends 4:00 PM)
4 p.m.
Panel:
Fei-Fei Li · Matt McIlwain · Hadi Partovi · Oren Etzioni · Peter Lee
(ends 5:00 PM)
5 p.m.
Posters 5:00-6:30
(ends 6:30 PM)

THU 20 JUN
7:30 a.m.
Break:
(ends 9:00 AM)
9 a.m.
Orals 9:00-10:30
[9:00] Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
[9:18] EscherNet: A Generative Model for Scalable View Synthesis
[9:36] WALT3D: Generating Realistic Training Data from Time-Lapse Imagery for Reconstructing Dynamic Objects Under Occlusion
[9:54] Diffusion-FOF: Single-View Clothed Human Reconstruction via Diffusion-Based Fourier Occupancy Field
[10:12] Rethinking Inductive Biases for Surface Normal Estimation
(ends 10:30 AM)
Orals 9:00-10:30
[9:00] Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods
[9:18] MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
[9:36] Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
[9:54] LISA: Reasoning Segmentation via Large Language Model
[10:12] Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
(ends 10:30 AM)
Orals 9:00-10:30
[9:00] EventPS: Real-Time Photometric Stereo Using an Event Camera
[9:18] EvDiG: Event-guided Direct and Global Components Separation
[9:36] MemSAM: Taming Segment Anything Model for Echocardiography Video Segmentation
[9:54] Transcriptomics-guided Slide Representation Learning in Computational Pathology
[10:12] Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration
(ends 10:30 AM)
10:30 a.m.
Demonstration:
(ends 6:45 PM)
Posters 10:30-12:00
(ends 12:00 PM)
11:30 a.m.
Talk:
(ends 1:30 PM)
noon
Break:
(ends 2:00 PM)
1 p.m.
Orals 1:00-2:30
[1:00] SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection
[1:18] UnO: Unsupervised Occupancy Fields for Perception and Forecasting
[1:36] EgoGen: An Egocentric Synthetic Data Generator
[1:54] Learning to Segment Referred Objects from Narrated Egocentric Videos
[2:12] Producing and Leveraging Online Map Uncertainty in Trajectory Prediction
(ends 2:30 PM)
Orals 1:00-2:30
[1:00] SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes
[1:18] SpiderMatch: 3D Shape Matching with Global Optimality and Geometric Consistency
[1:36] PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness
[1:54] PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar
[2:12] A Subspace-Constrained Tyler's Estimator and its Applications to Structure from Motion
(ends 2:30 PM)
Orals 1:00-2:30
[1:00] Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations
[1:18] An N-Point Linear Solver for Line and Motion Estimation with Event Cameras
[1:36] RoHM: Robust Human Motion Reconstruction via Diffusion
[1:54] Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation
[2:12] FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment
(ends 2:30 PM)
2:30 p.m.
Break:
(ends 2:45 PM)
2:45 p.m.
Keynote:
David Baker
(ends 3:45 PM)
3:45 p.m.
Break:
(ends 4:00 PM)
4 p.m.
Meeting:
(ends 5:00 PM)
5 p.m.
Posters 5:00-6:30
(ends 6:30 PM)
7 p.m.
Reception:
(ends 9:00 PM)

FRI 21 JUN
8 a.m.
Break:
(ends 9:30 AM)
9 a.m.
Expo Track Keynote:
Ece Kamar
(ends 10:00 AM)
Orals 9:00-10:30
[9:00] Deep Generative Model based Rate-Distortion for Image Downscaling Assessment
[9:18] 360+x: A Panoptic Multi-modal Scene Understanding Dataset
[9:36] Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
[9:54] Rich Human Feedback for Text-to-Image Generation
[10:12] BioCLIP: A Vision Foundation Model for the Tree of Life
(ends 10:30 AM)
Orals 9:00-10:30
[9:00] Grounding and Enhancing Grid-based Models for Neural Fields
[9:18] NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation
[9:36] Mip-Splatting: Alias-free 3D Gaussian Splatting
[9:54] pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
[10:12] Learning to Produce Semi-dense Correspondences for Visual Localization
(ends 10:30 AM)
Orals 9:00-10:30
[9:00] CroSel: Cross Selection of Confident Pseudo Labels for Partial-Label Learning
[9:18] MLP Can Be A Good Transformer Learner
[9:36] From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation
[9:54] LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content
[10:12] Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps
(ends 10:30 AM)
10:30 a.m.
Posters 10:30-12:00
(ends 12:00 PM)
Demonstration:
(ends 6:45 PM)
noon
Break:
(ends 2:00 PM)
1 p.m.
Orals 1:00-2:30
[1:00] LDP: Language-driven Dual-Pixel Image Defocus Deblurring Network
[1:18] S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data
[1:36] Task-Driven Wavelets using Constrained Empirical Risk Minimization
[1:54] Image Processing GNN: Breaking Rigidity in Super-Resolution
[2:12] DART: Implicit Doppler Tomography for Radar Novel View Synthesis
(ends 2:30 PM)
Orals 1:00-2:30
[1:00] Alchemist: Parametric Control of Material Properties with Diffusion Models
[1:18] Generative Image Dynamics
[1:36] Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models
[1:54] MonoHair: High-Fidelity Hair Modeling from a Monocular Video
[2:12] Analyzing and Improving the Training Dynamics of Diffusion Models
(ends 2:30 PM)
Orals 1:00-2:30
[1:00] InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
[1:18] Describing Differences in Image Sets with Natural Language
[1:36] NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models
[1:54] MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning
[2:12] EGTR: Extracting Graph from Transformer for Scene Graph Generation
(ends 2:30 PM)
2:30 p.m.
Break:
(ends 2:45 PM)
2:45 p.m.
Keynote:
Sofia Crespo
(ends 3:45 PM)
3:45 p.m.
Break:
(ends 4:00 PM)
4 p.m.
Panel:
Dima Damen · Cordelia Schmid · Ranjay Krishna
(ends 5:00 PM)
5 p.m.
Posters 5:00-6:30
(ends 6:30 PM)