CVPR 2022 Paper List

Cascade Transformers for End-to-End Person Search
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning
Long-Tailed Recognition via Weight Balancing
InfoGCN: Representation Learning for Human Skeleton-based Action Recognition
Interactive Geometry Editing of Neural Radiance Fields
MLSLT: Towards Multilingual Sign Language Translation
360MonoDepth: High-Resolution 360° Monocular Depth Estimation
Generating Diverse and Natural 3D Human Motions from Text
Masked-attention Mask Transformer for Universal Image Segmentation
Pointly-Supervised Instance Segmentation
A Closer Look at Few-shot Image Generation
Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation
Neural 3D Scene Reconstruction with the Manhattan-world Assumption
Masked Autoencoders Are Scalable Vision Learners
De-rendering 3D Objects in the Wild
Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction
Finding Badly Drawn Bunnies
GradViT: Gradient Inversion of Vision Transformers
On the Importance of Asymmetry for Siamese Representation Learning
Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation
Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks
Rethinking Efficient Lane Detection via Curve Modeling
StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis
Learning Fair Classifiers with Partially Annotated Group Labels
Demystifying the Neural Tangent Kernel from a Practical Perspective: Can it be trusted for Neural Architecture Search without training?
Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis
A ConvNet for the 2020s
Consistent 3D Scene Stylization as Stylized NeRF via 2D-3D Mutual Learning
Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast
Connecting the Complementary-view Videos: Joint Camera Identification and Subject Association
Decoupled Knowledge Distillation
Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation
Compound Domain Generalization via Meta-Knowledge Encoding
Bilateral Video Magnification Filter
EDTER: Edge Detection with Transformer
Structure-Aware Motion Transfer with Deformable Anchor Model
Attentive Fine-Grained Structured Sparsity for Image Restoration
Sign Language Video Retrieval with Free-Form Textual Queries
SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems
Neural Mean Discrepancy for Efficient Out-of-Distribution Detection
LAKe-Net: Topology-Aware Point Cloud Completion by Localizing Aligned Keypoints
Focal and Global Knowledge Distillation for Detectors
Enhancing Adversarial Robustness for Deep Metric Learning
Novel Class Discovery in Semantic Segmentation
IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment
WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation
Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection
HyperDet3D: Learning a Scene-Conditioned 3D Object Detector
Deep Decomposition for Stochastic Normal-Abnormal Transport
Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production
Self-supervised Video Transformers
HLRTF: Hierarchical Low-Rank Tensor Factorization for Inverse Problems in Multi-Dimensional Imaging
φ-SfT: Shape-from-Template with a Physics-based Deformation Model
Boosting View Synthesis with Residual Transfer
DINE: Domain Adaptation from Single and Multiple Black-box Predictors
Occluded Human Mesh Recovery
Understanding Uncertainty Maps in Vision with Statistical Testing
Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets
Learning from Pixel-Level Label Noise: A New Perspective for Light Field Salient Object Detection
Self-Supervised Global-Local Structure Modeling for Point Cloud Domain Adaptation with Reliable Voted Pseudo Labels
Towards An End-to-End Framework for Flow-Guided Video Inpainting
E-CIR: Event-Enhanced Continuous Intensity Recovery
Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization using Satellite Image
Unsupervised Hierarchical Semantic Segmentation with Multiview Cosegmentation and Clustering Transformers
Forward Propagation, Backward Regression and Pose Association for Hand Tracking in the Wild
FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos
Efficient Neural Radiance Fields
Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements
HumanNeRF: Efficiently Generated Human Radiance Field from Sparse Inputs
Attributable Visual Similarity Learning
Efficient Multi-view Stereo by Iterative Dynamic Cost Volume
Replacing Labeled Real-image Datasets with Auto-generated Contours
SOMSI: Spherical Novel View Synthesis with Soft Occlusion Multi-Sphere Images
AutoSDF: Shape Priors for 3D Completion, Reconstruction, and Generation
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition
DST: Dynamic Substitute Training for Data-free Black-box Attack
HCSC: Hierarchical Contrastive Selective Coding
Towards Diverse and Natural Scene-aware 3D Human Motion Synthesis
Inertia-Guided Flow Completion and Style Fusion for Video Inpainting
PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo
Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields
Interactiveness Field of Human-Object Interactions
Learning Memory-Augmented Unidirectional Metrics for Cross-modality Person Re-identification
Event-based Video Reconstruction via Potential-assisted Spiking Neural Network
SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection
Surface Reconstruction from Point Clouds by Learning Predictive Context Priors
Active Teacher for Semi-Supervised Object Detection
Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning
RCL: Recurrent Continuous Localization for Temporal Action Detection
GroupNet: Multiscale Hypergraph Neural Networks for Trajectory Prediction with Relational Reasoning
SPAMs: Structured Implicit Parametric Models
A Keypoint-based Global Association Network for Lane Detection
Weakly Supervised Semantic Segmentation using Out-of-Distribution Data
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
Investigating Tradeoffs in Real-World Video Super-Resolution
OakInk: A Large-scale Knowledge Repository for Understanding Hand-Object Interaction
Bending Graphs: Hierarchical Shape Matching using Gated Optimal Transport
The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization
SimT: Handling Open-set Noise for Domain Adaptive Semantic Segmentation
Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation
Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification
Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion
Unbiased Subclass Regularization for Semi-Supervised Semantic Segmentation
Stratified Transformer for 3D Point Cloud Segmentation
Cloning Outfits from Real-World Images to 3D Characters for Generalizable Person Re-Identification
ImplicitAtlas: Learning Deformable Shape Templates in Medical Imaging
Sparse Instance Activation for Real-Time Instance Segmentation
Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
Unsupervised Image-to-Image Translation with Generative Prior
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
Versatile Multi-Modal Pre-Training for Human-Centric Perception
Instance-wise Occlusion and Depth Orders in Natural Scenes
Degradation-agnostic Correspondence from Resolution-asymmetric Stereo
No Pain, Big Gain: Classify Dynamic Point Cloud Sequences with Static Models by Fitting Feature-level Space-time Surfaces
Multi-Dimensional with Intensity: A Crowd-sourced Method for Measuring the Perception of Facial Expression
Class-Incremental Learning with Strong Pretrained Models
A Patch-centric Error Analysis of Image Super-Resolution
IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
3D-aware Image Synthesis via Learning Structural and Textural Representations
DeeCap: Dynamic Early Exiting for Efficient Image Captioning
GAN-Supervised Dense Visual Alignment
Multilayer GAN Inversion and Editing
On Aliased Resizing and Surprising Subtleties in GAN Evaluation
Learning Pixel Trajectories with Multiscale Contrastive Random Walks
Comparing Correspondences: Video Prediction with Correspondences-wise Losses
Mix and Localize: Localizing Sound Sources from Mixtures
AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception
Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-time
Point Cloud Pre-training with Natural 3D Structures
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding
Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation
Mr.BiQ: Post-Training Non-Uniform Quantization based on Minimizing the Reconstruction Error
Drop the GAN: In Defense of Patches Nearest Neighbors as Single Image Generative Models
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection
Reversible Vision Transformers
RigNeRF: Fully Controllable Neural 3D Portraits
Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation
Integrative Few-Shot Learning for Classification and Segmentation
Learning Affordance Grounding from Exocentric Images
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection
Exploring Geometry Consistency for monocular 3D object detection
Visual Abductive Reasoning
Putting People in their Place: Monocular Regression of 3D People in Depth
Exploiting Explainable Metrics for Augmented SGD
Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation
A Hybrid Quantum-Classical Algorithm for Robust Fitting
Dataset Distillation by Matching Training Trajectories
DiLiGenT10^2: A Photometric Stereo Benchmark Dataset with Controlled Shape and Material Variation
Scene Representation Transformer
ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes
Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion
Injecting Visual Concepts into End-to-End Image Captioning
Learning Neural Light Fields with Ray-Space Embedding Networks
What’s in your hands? 3D Reconstruction of Generic Objects in Hands
Virtual Correspondences: Human as a Cue for Extreme-View Geometry
Unsupervised Activity Segmentation by Joint Representation Learning and Online Clustering
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition
SketchEdit: Mask-Free Local Image Manipulation with Partial Sketches
GroupViT: Zero-Shot Transfer to Semantic Segmentation with Text Supervision
LSVC: A Learning-based Stereo Video Compression Framework
BEHAVE: Dataset and Method for Tracking Human Object Interactions
Learning to Align Sequential Actions in the Wild
Motion-from-Blur: 3D Shape and Motion Estimation of Motion-blurred Objects in Videos
Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction
Simulated Adversarial Testing of Face Recognition Models
GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping
Ensembling Off-the-shelf Models for GAN Training
Global Tracking Transformers
Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
Joint Global and Local Hierarchical Priors for Learned Image Compression
D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions
Human-Aware Object Placement for Visual Environment Reconstruction
Dual-path Image Inpainting with Auxiliary GAN Inversion
Accurate 3D Body Shape Regression using Metric and Semantic Attributes
BARC: Learning to Regress 3D Dog Shape from Images by Exploiting Breed Information
Capturing and Inferring Dense Full-Body Human-Scene Contact
Not All Labels Are Equal: Rationalizing The Labeling Costs for Training Object Detection
Background Activation Suppression for Weakly Supervised Object Localization
Attribute Group Editing for Reliable Few-shot Image Generation
Negative-aware Attention for Image-Text Matching
Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects
TransWeather: Transformer-based Restoration of Images Degraded by Adverse Weather Conditions
HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening
gDNA: Towards Generative Detailed Neural Avatars
CaDeX: Learning Canonical Deformation Coordinate Space for Dynamic Surface Representation via Neural Homeomorphism
BACON: Band-limited Coordinate Networks for Multiscale Scene Representation
Revisiting Near/Remote Sensing with Geospatial Attention
Simple multi-dataset detection
Generalizable Cross-modality Medical Image Segmentation via Style Augmentation and Dual Normalization
Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation
Online Convolutional Re-parameterization
Neural Inertial Localization
MNSRNet: Multimodal Transformer Network for 3D Surface Super-Resolution
Unsupervised Pre-training for Temporal Action Localization Tasks
Augmented Geometric Distillation for Data-Free Incremental Person ReID
HEAT: Holistic Edge Attention Transformer for Structured Reconstruction
NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition
ContrastMask: Contrastive Learning to Segment Every Thing
Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression
CoordGAN: Self-Supervised Dense Correspondences Emerge from GANs
MAT: Mask-Aware Transformer for Large Hole Image Inpainting
A Comprehensive Study of End-to-End Temporal Action Detection
Rethinking Image Cropping: Exploring Diverse Compositions from Global Views
OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
Asynchronous Event-based Graph-Neural Networks
RAMA: A Rapid Multicut Algorithm on GPU
EvUnroll: Neuromorphic Events based Rolling Shutter Image Correction
Cycle-Consistent Counterfactuals by Latent Transformations
Understanding 3D Object Articulation in Internet Videos
Synthetic Generation of Face Videos with Plethysmograph Physiology
MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection
Neural Architecture Search with Representation Mutual Information
Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning
Blind2Unblind: Self-Supervised Image Denoising with Visible Blind Spots
Semi-Supervised Object Detection via Multi-instance Alignment with Global Class Prototypes
Fine-Grained Predicates Learning for Scene Graph Generation
Meta Distribution Alignment for Generalizable Person Re-Identification
Align Representations with Base: A New Approach to Self-Supervised Learning
Style-Based Global Appearance Flow for Virtual Try-On
Learning Semantic Associations for Mirror Detection
Task Decoupled Framework for Reference-based Super-Resolution
Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation via Semantic Knowledge Transfer and Self-Refinement
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Cameras
Fast and Unsupervised Action Boundary Detection for Action Segmentation
Neural MoCon: Neural Motion Control for Physically Plausible Human Motion Capture
Unified Transformer Tracker for Object Tracking
NeuralHOFusion: Neural Volumetric Rendering under Human-object Interactions
H$^2$FA R-CNN: Holistic and Hierarchical Feature Alignment for Cross-domain Weakly Supervised Object Detection
ICON: Implicit Clothed humans Obtained from Normals
Semantic-Aware Domain Generalized Segmentation
ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation
Detecting Deepfakes with Self-Blended Images
Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization
FreeSOLO: Learning to Segment Objects without Annotations
Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage
Differentially Private Federated Learning with Local Regularization and Sparsification
Modeling 3D Layout For Group Re-Identification
DASO: Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced Semi-Supervised Learning
Structured Local Radiance Fields for Human Avatar Modeling
Contrastive Regression for Domain Adaptation on Gaze Estimation
Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition
Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification
Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation
Learning Second Order Local Anomaly for General Face Forgery Detection
LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network
Audio-Adaptive Activity Recognition Across Video Domains
Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective
Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos
Omnivore: A Single Model for Many Visual Modalities
Multi-Frame Self-Supervised Depth with Transformers
Voice-Face Homogeneity Tells Deepfake
Representation Compensation Networks for Continual Semantic Segmentation
Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
FLAVA: A Foundational Language And Vision Alignment Model
Vision Prompt Tuning
Vehicle trajectory prediction works, but not everywhere
Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification
ReSTR: Convolution-free Referring Image Segmentation Using Transformers
DATA: Domain-Aware and Task-Aware Self-supervised Learning
Sketching without Worrying: Noise-Tolerant Sketch-Based Image Retrieval
Balanced MSE for Imbalanced Visual Regression
The Devil Is in the Details: Window-based Attention for Image Compression
DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
Video Frame Interpolation Transformer
Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling
LASER: LAtent SpacE Rendering for 2D Visual Localization
LaTr: Layout-Aware Transformer for Scene-Text VQA
Universal Photometric Stereo Network using Global Lighting Contexts
Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training
Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models
Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory
Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis
AdaViT: Adaptive Tokens for Efficient Vision Transformer
Neural Template: Topology-aware Reconstruction and Disentangled Generation of 3D Meshes
CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow
Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition
Cross-Modal Transferable Adversarial Attacks from Images to Videos
PTTR: Relational 3D Point Cloud Object Tracking with Transformer
Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds
Lifelong Unsupervised Domain Adaptive Person Re-identification with Coordinated Anti-forgetting and Adaptation
Object Localization under Single Coarse Point Supervision
Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation
TubeDETR: Spatio-Temporal Video Grounding with Transformers
Reinforced Structured State-Evolution for Vision-Language Navigation
Learning to Anticipate Future with Dynamic Context Removal
Learning Program Representations for Food Images and Cooking Recipes
Transferability Estimation using Bhattacharyya Class Separability
LiDAR Snowfall Simulation for Robust 3D Object Detection
Masked Feature Prediction for Vision Self-Supervised Pre-Training
Unbiased Teacher v2: Semi-supervised Object Detection for Anchor-free and Anchor-based Detectors
Shape from Polarization for Complex Scenes in the Wild
PhotoScene: Physically-Based Material and Lighting Transfer for Indoor Scenes
Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization
Selective-Supervised Contrastive Learning with Noisy Labels
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation
TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing
Leveraging Self-Supervision for Cross-Domain Crowd Counting
Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
TimeReplayer: Unlocking the Potential of Event Cameras for Video Interpolation
Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation
Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation
Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences
DIFNet: Boosting Visual Information Flow for Image Captioning
ScaleNet: A Shallow Architecture for Scale Estimation
HODOR: High-level Object Descriptors for Object Re-segmentation in Video Learned from Static Images
Density-preserving Deep Point Cloud Compression
Exploring Dual-task Correlation for Pose Guided Person Image Generation
Exploring Endogenous Shift for Cross-domain Detection: A Large-scale Benchmark and Perturbation Suppression Network
Transferability metrics for selecting Source Model Ensembles
The Auto Arborist Dataset: A Large-Scale Benchmark for Multimodal Urban Forest Monitoring Under Domain Shift
EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation
Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection
Learning from Temporal Gradient for Semi-supervised Action Recognition
JoinABLe: Learning Bottom-up Assembly of Parametric CAD Joints
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
Defensive Patches for Robust Recognition in the Physical World
UniCoRN: A Unified Conditional Image Repainting Network
APES: Articulated Part Extraction from Sprite Sheets
Learning Deep Implicit Functions for 3D Shapes with Dynamic Code Clouds
Neural Rays for Occlusion-aware Image-based Rendering
DisARM: Displacement Aware Relation Module for 3D Detection
A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration
RIM-Net: Recursive Implicit Fields for Unsupervised Learning of Hierarchical Shape Structures
Weakly Supervised Object Localization as Domain Adaption
Reflash Dropout in Image Super-Resolution
Semantic Segmentation by Early Region Proxy
EyePAD++: A Distillation-based approach for joint Eye Authentication and Presentation Attack Detection using Periocular Images
Online Learning of Reusable Abstract Models for Object Goal Navigation
Time Microscope: Event-based Frame Interpolation with Parametric Non-linear Flow and Multi-scale Fusion
OSOP: A Multi-Stage One Shot Object Pose Estimation Framework
Localization Distillation for Dense Object Detection
RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
Cross-Image Relational Knowledge Distillation for Semantic Segmentation
Trustworthy Long-tailed Classification
Episodic Memory Question Answering
REX: Reasoning-aware and Grounded Explanation
Query and Attention Augmentation for Knowledge-Based Explainable Reasoning
LOLNeRF: Learn from One Look
Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions
CoNeRF: Controllable Neural Radiance Fields
Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space
UnweaveNet: Unweaving Activity Stories
MeMOT: Multi-Object Tracking with MemoryMeMOT：带内存的多对象跟踪
VisualHow: Multimodal Problem SolvingVisualHow：多模式问题解决
Affine Medical Image Registration with Coarse-to-Fine Vision Transformer使用粗到精视觉变压器的仿射医学图像配准
Unpaired Deep Image Deraining Using Dual Contrastive Learning使用双重对比学习的非配对深度图像去雨
DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image AnalysisDiRA：用于自我监督医学图像分析的判别性、恢复性和对抗性学习
Mask Transfiner for High-Quality Instance Segmentation用于高质量实例分割的 Mask Transfiner
GLASS: Geometric Latent Augmentation for Shape SpacesGLASS：形状空间的几何潜在增强
Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot LearningMAML 的全局收敛和受理论启发的神经架构搜索以进行 Few-Shot 学习
Multi-modal Extreme Classification多模态极端分类
CodedVTR: Codebook-Based Sparse Voxel Transformer in Geometric RegionsCodedVTR：几何区域中基于码本的稀疏体素变换器
Frequency-driven Imperceptible Adversarial Attack on Semantic Similarity对语义相似性的频率驱动的不可察觉的对抗性攻击
Learning to Refactor Action and Co-occurrence Features for Temporal Action Localization学习重构动作和共现特征以进行时间动作定位
Self-augmented Unpaired Image Dehazing via Density and Depth Decomposition通过密度和深度分解的自增强非配对图像去雾
QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object DetectionQueryDet：用于加速高分辨率小目标检测的级联稀疏查询
Cross-modal Representation Learning for Zero-shot Action Recognition零样本动作识别的跨模态表示学习
Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation非均匀到均匀量化：通过广义直通估计实现精确量化
AUV-Net: Learning Aligned UV Maps for Texture Transfer and SynthesisAUV-Net：学习用于纹理转移和合成的对齐 UV 贴图
Bijective Mapping Network for Shadow Removal阴影去除的双射映射网络
ObjectFormer for Image Manipulation Detection and Localization用于图像处理检测和定位的 ObjectFormer
GraFormer: Graph-oriented Transformer for 3D Pose EstimationGraFormer：用于 3D 姿势估计的面向图的 Transformer
Multi-Granularity Alignment Domain Adaptation for Object Detection用于目标检测的多粒度对齐域自适应
Adaptive Hierarchical Representation Learning for Long-Tailed Object Detection用于长尾目标检测的自适应分层表示学习
Physical Inertial Poser (PIP): Physics-aware Real-time Human Motion Tracking from Sparse Inertial SensorsPhysical Inertial Poser (PIP)：来自稀疏惯性传感器的物理感知实时人体运动跟踪
3D Scene Painting via Semantic Image Synthesis通过语义图像合成进行 3D 场景绘画
MViTv2: Improved Multiscale Vision Transformers for Classification and DetectionMViTv2：用于分类和检测的改进的多尺度视觉转换器
One-bit Active Query with Contrastive Pairs具有对比对的一位主动查询
HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object InteractionHOI4D：用于类别级人-物交互的 4D 以自我为中心的数据集
Leveraging Object-Level Rotation Equivariance for 3D Object Detection利用对象级旋转等变性进行 3D 对象检测
DenseCLIP: Language-Guided Dense Prediction with Context-Aware PromptingDenseCLIP：具有上下文感知提示的语言引导密集预测
JIFF: Jointly-aligned Implicit Face Function for High Fidelity Single View Clothed Human ReconstructionJIFF：用于高保真单视图着装人体重建的联合对齐隐式人脸函数
Prompt Distribution Learning提示分布学习
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped WindowsCSWin Transformer：具有十字形窗口的通用视觉变压器主干
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense CaptioningX-Trans2Cap：使用 Transformer 进行 3D 密集字幕的跨模式知识转移
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds超越 3D 连体跟踪：点云中 3D 单对象跟踪的以运动为中心的范式
Noisy Boundaries: Lemon or Lemonade for Semi-supervised Instance Segmentation?嘈杂的边界：半监督实例分割的柠檬还是柠檬水？
Interactive Image Synthesis with Panoptic Layout Generation具有全景布局生成的交互式图像合成
Learning to Find Good Models in RANSAC学习在 RANSAC 中寻找好的模型
Meta-attention for ViT-backed Continual LearningViT 支持的持续学习的元注意力
Deep Anomaly Discovery from Unlabeled Videos via Normality Advantage and Self-Paced Refinement通过常态优势和自定进度细化从未标记视频中发现深度异常
Improving neural implicit surfaces geometry with patch warping使用补丁变形改进神经隐式曲面几何
Rope3D: Take A New Look from the 3D Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection TaskRope3D：从用于自动驾驶和单目 3D 目标检测任务的 3D 路边感知数据集中重新审视
AME: Attention and Memory Enhancement in Hyper-Parameter OptimizationAME：超参数优化中的注意力和记忆增强
TopFormer: Token Pyramid Transformer for Mobile Semantic SegmentationTopFormer：用于移动语义分割的令牌金字塔转换器
Automated Progressive Learning for Efficient Training of Vision Transformers用于高效训练视觉转换器的自动渐进式学习
Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions重新审视 3D 对象姿势估计的模板：对新对象的泛化和对遮挡的鲁棒性
Towards Implicit Text-Guided 3D Shape Generation迈向隐式文本引导的 3D 形状生成
Spatial-Temporal Parallel Transformer for Arm-Hand Dynamic Estimation手臂手动态估计的时空并行变压器
Revisiting skeleton-based action recognition重新审视基于骨架的动作识别
Mutual Quantization for Cross-Modal Search with Noisy Labels带有噪声标签的跨模态搜索的相互量化
Revisiting Temporal Alignment for Video Restoration重新审视视频恢复的时间对齐
Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation在野外学习多视图聚合以进行大规模 3D 语义分割
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural ActivitiesAssembly101：用于理解程序活动的大规模多视图视频数据集
Video Frame Interpolation with Transformer使用 Transformer 进行视频帧插值
Autofocus for Event Cameras事件相机的自动对焦
Event-based Direct Sparse Odometry基于事件的直接稀疏里程计
OpenTAL: Towards Open Set Temporal Action LocalizationOpenTAL：走向开放集时间动作定位
Programmatic Concept Learning for Human Motion Description and Synthesis用于人体运动描述和合成的程序化概念学习
MAXIM: Multi-Axis MLP for Image ProcessingMAXIM：用于图像处理的多轴 MLP
Temporal Alignment Networks for Long-term Video长期视频的时间对齐网络
Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches自己涂鸦：通过绘制一些草图进行类增量学习
Registering Explicit to Implicit: Towards High-Fidelity Garment mesh Reconstruction from Single Images将显式注册到隐式：从单个图像实现高保真服装网格重建
Progressive End-to-End Object Detection in Crowded Scenes拥挤场景中的渐进式端到端对象检测
Object-aware Video-language Pre-training for Retrieval用于检索的对象感知视频语言预训练
Multi-Source Uncertainty Mining for Deep Unsupervised Saliency Detection用于深度无监督显着性检测的多源不确定性挖掘
Surface Representation for Point Clouds点云的表面表示
Context-Aware Video Reconstruction for Rolling Shutter Cameras滚动快门相机的上下文感知视频重建
MonoScene: Monocular 3D Semantic Scene CompletionMonoScene：单目 3D 语义场景完成
Weakly But Deeply Supervised Occlusion-Reasoned Parametric Road Layouts弱但深度监督的遮挡推理参数化道路布局
Point Cloud Color Constancy点云颜色恒常性
HDNet: High-resolution Dual-domain Learning for Spectral Compressive ImagingHDNet：光谱压缩成像的高分辨率双域学习
iPLAN: Interactive and Procedural Layout PlanningiPLAN：交互式和程序化布局规划
End-to-End Multi-Person Pose Estimation with Transformers使用 Transformer 的端到端多人姿势估计
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation在鸡尾酒会上阅读聆听：多模态语音分离
Adversarial Eigen Attack on Black-Box Models对黑盒模型的对抗性特征攻击
Domain-Aware Representation Learning for Unsupervised Domain Generalization无监督域泛化的域感知表示学习
Sub-word Level Lip Reading With Visual Attention带有视觉注意的子词级唇读
Efficient Video Instance Segmentation via Tracklet Query and Proposal通过 Tracklet Query 和 Proposal 进行高效的视频实例分割
Towards cross-modal pose localization from text-based position descriptions从基于文本的位置描述迈向跨模态姿势定位
Opening up Open World Tracking开放开放世界追踪
Dynamic Clustering Mask Transformers for Panoptic Segmentation用于全景分割的动态聚类掩码转换器
Compressive Single-Photon 3D Cameras压缩单光子 3D 相机
Style-ERD: Responsive and Coherent Online Motion Style TransferStyle-ERD：响应式和连贯的在线运动风格转移
MixFormer: Mixing Features across Windows and DimensionsMixFormer：跨窗口和维度混合特征
Robust Image Forgery Detection over Online Social Network Shared Images基于在线社交网络共享图像的鲁棒图像伪造检测
Semantic-aligned Fusion Transformer for One-shot Object Detection用于一次性目标检测的语义对齐融合转换器
Long-term Video Frame Interpolation Via Feature Propagation通过特征传播的长期视频帧插值
Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation使用分层视觉语言知识蒸馏的开放词汇单阶段检测
GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI DetectionGEN-VLKT：简化关联并增强对 HOI 检测的交互理解
ETHSeg: An Amodel Instance Segmentation Network and a Real-world Dataset for X-Ray Waste InspectionETHSeg：用于 X 射线废物检测的 Amodel 实例分割网络和真实数据集
SEEG: Semantic Energized Co-speech Gesture GenerationSEEG：语义激励的协同语音手势生成
Instance-Dependent Label-Noise Learning With Manifold-Regularized Transition Matrix Estimation使用流形正则化转移矩阵估计的实例相关标签噪声学习
Acquiring a Dynamic Light Field through a Single-Shot Coded Image通过单次编码图像获取动态光场
How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting多少个观察就足够了？轨迹预测的知识蒸馏
FaceVerse: a Fine-grained and Detail-changeable 3D Neural Face Model from a Hybrid DatasetFaceVerse：来自混合数据集的细粒度和可更改细节的 3D 神经人脸模型
Learning Where to Learn in Cross-View Self-Supervised Learning在跨视图自监督学习中学习在哪里学习
Automatic Relation-aware Graph Network Proliferation自动关系感知图网络扩散
CoSSL: Co-Learning of Representation and Classifier for Imbalanced Semi-Supervised LearningCoSSL：不平衡半监督学习的表示和分类器的共同学习
P3Depth: Monocular Depth Estimation with a Piecewise Planarity PriorP3Depth：具有分段平面先验的单目深度估计
Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability知识蒸馏作为高效的预训练：更快的收敛、更高的数据效率和更好的可迁移性
En-Compactness: Self-Distillation Embedding & Contrastive Generation for Generalized Zero-Shot LearningEn-Compactness：用于广义零样本学习的自蒸馏嵌入和对比生成
Unsupervised Learning of Accurate Siamese Tracking准确连体跟踪的无监督学习
Accelerating DETR Convergence via Semantic-Aligned Matching通过语义对齐匹配加速 DETR 收敛
Co-advise: Cross Inductive Bias DistillationCo-advise：跨归纳偏置蒸馏
Medial Spectral Coordinates for 3D Shape Analysis用于 3D 形状分析的内侧光谱坐标
Coupled Iterative Refinement for 6D Multi-Object Pose Estimation用于 6D 多目标姿态估计的耦合迭代细化
DeepCurrents: Learning Implicit Representations of Shapes with BoundariesDeepCurrents：学习带边界形状的隐式表示
Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image向外看：从单个图像合成一致的长期 3D 场景视频
Zero Experience Required: Plug & Play Modular Transfer Learning for Semantic Visual Navigation零经验要求：用于语义视觉导航的即插即用模块化迁移学习
Day-to-Night Image Synthesis for Training Nighttime Neural ISPs用于训练夜间神经 ISP 的日夜图像合成
Playable Environments: Video Manipulation in Space and Time可播放环境：空间和时间中的视频操作
Unified Contrastive Learning in Image-Text-Label Space图文标签空间中的统一对比学习
Many-to-many Splatting for Efficient Video Frame Interpolation用于高效视频帧插值的多对多 Splatting
Uncertainty-Aware Deep Multi-View Photometric Stereo不确定性感知深度多视图光度立体
Multi-Robot Active Mapping via Neural Bipartite Graph Matching基于神经二分图匹配的多机器人主动映射
Location-free Human Pose Estimation无位置人体姿态估计
Multiview Transformers for Video Recognition用于视频识别的多视图转换器
RIO: Rotation-equivariance supervised learning of robust inertial odometryRIO：稳健惯性里程计的旋转等变性监督学习
Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment基于松弛空间结构对齐的 Few Shot 生成模型自适应
MiniViT: Compressing Vision Transformers with Weight MultiplexingMiniViT：使用权重复用压缩视觉变压器
Pop-Out Motion: 3D-Aware Image Deformation via Learning Shape Laplacian弹出运动：通过学习形状拉普拉斯算子实现 3D 感知图像变形
On the Road to Online Adaptation for Semantic Image Segmentation语义图像分割的在线适应之路
Generalized Binary Search Network for Highly-Efficient Multi-View Stereo用于高效多视图立体的广义二元搜索网络
Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation视觉语言导航中指令跟踪和生成的反事实循环一致学习
MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger TokensMSG-Transformer：通过操作 Messenger 令牌交换本地空间信息
Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-learning用于提高元学习中的泛化和内存效率的动态内核选择
Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation弱监督语义分割的区域语义对比和聚合
DLFormer: Discrete Latent Transformer for Video InpaintingDLFormer：用于视频修复的离散潜在变压器
Continuous Scene Representations for Embodied AI具身 AI 的连续场景表示
vCLIMB: A Novel Video Class Incremental Learning BenchmarkvCLIMB：一种新颖的视频类增量学习基准
NODEO: A Neural Ordinary Differential Equation Based Optimization Framework for Deformable Image RegistrationNODEO：基于神经常微分方程的可变形图像配准优化框架
ONCE-3DLanes: Building Monocular 3D Lane DetectionONCE-3DLanes：构建单目 3D 车道检测
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real TransferObjectFolder 2.0：用于 Sim2Real 传输的多感官对象数据集
HairMapper: Removing Hair from Portraits Using GANsHairMapper：使用 GAN 从肖像中去除头发
Dist-PU: Positive-Unlabeled Learning from a Label Distribution PerspectiveDist-PU：从标签分布的角度进行正无标签学习
Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection多样性很重要：充分利用深度线索进行可靠的单目 3D 对象检测
Interactive Multi-Class Tiny-Object Detection交互式多类微小物体检测
Generalizable Human Pose Triangulation可概括的人体姿势三角测量
Towards Discriminative Representation: Multi-view Trajectory Contrastive Learning for Online Multi-object Tracking迈向判别性表示：用于在线多目标跟踪的多视图轨迹对比学习
A Simple Episodic Linear Probe Improves Visual Recognition in the Wild一个简单的情节线性探针提高了野外的视觉识别
Learning to Learn by Jointly Optimizing Neural Architecture and Weights通过联合优化神经架构和权重来学习学习
Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot LearningTransformers 中的属性替代学习和频谱令牌池化，用于少样本学习
Learning Soft Estimator of Keypoint Scale and Orientation with Probabilistic Covariant Loss学习具有概率协变损失的关键点尺度和方向的软估计器
Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin具有自适应置信度的半监督深度面部表情识别
Cross Domain Object Detection by Target-Perceived Dual Branch Distillation目标感知双分支蒸馏的跨域目标检测
Depth-Aware Generative Adversarial Network for Talking Head Video Generation用于说话头视频生成的深度感知生成对抗网络
OccAM’s Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR DataOccAM 的激光：基于遮挡的 3D 物体检测器在 LiDAR 数据上的属性图
Improving Adversarially Robust Few-shot Image Classification with Generalizable Representations使用可泛化的表示改进对抗性鲁棒的 Few-shot 图像分类
DyTox: Transformers for Continual Learning with DYnamic TOken eXpansionDyTox：使用动态令牌扩展进行持续学习的 Transformer
Stable Long-Term Recurrent Video Super-Resolution稳定的长期循环视频超分辨率
Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization避免简单性偏差：训练一组不同的模型发现具有卓越 OOD 泛化的解决方案
SelfD: Self-Learning Large-Scale Driving Policies From the WebSelfD：从网络上自学大规模驾驶策略
InstaFormer: Instance-Aware Image-to-Image Translation with TransformerInstaFormer：使用 Transformer 的实例感知图像到图像转换
AutoGPart: Intermediate Supervision Search for Generalizable 3D Part SegmentationAutoGPart：通用 3D 零件分割的中间监督搜索
GASP, a generalized framework for agglomerative clustering of signed graphs and its application to Instance SegmentationGASP，一种用于符号图凝聚聚类的通用框架及其在实例分割中的应用
Exploring and Evaluating Image Restoration Potential in Dynamic Scenes探索和评估动态场景中的图像恢复潜力
Multi-level Feature Learning for Contrastive Multi-view Clustering用于对比多视图聚类的多级特征学习
Commonality in Natural Images Rescues GANs: Pretraining GANs with Generic and Privacy-free Synthetic Data自然图像的共性拯救了 GAN：使用通用且无隐私的合成数据预训练 GAN
Threshold Matters in WSSS: Manipulating the Activation for the Robust and Accurate Segmentation Model Against ThresholdsWSSS 中的阈值问题：针对阈值操作稳健且准确的分割模型的激活
StyleSwin: Transformer-based GAN for High-resolution Image GenerationStyleSwin：用于生成高分辨率图像的基于 Transformer 的 GAN
Semi-Supervised Learning of Semantic Correspondence with Pseudo-Labels带有伪标签的语义对应的半监督学习
Divide and Conquer: Compositional Experts for Generalized Novel Class Discovery分而治之：广义新类发现的组合专家
Splicing ViT Features for Semantic Appearance Transfer为语义外观转移拼接 ViT 特征
Optimizing Video Prediction via Video Frame Interpolation通过视频帧插值优化视频预测
Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects迭代对应几何：融合区域和深度以实现无纹理对象的高效 3D 跟踪
HARA: A Hierarchical Approach for Robust Rotation AveragingHARA：稳健旋转平均的分层方法
Revisiting Weakly Supervised Pre-Training of Visual Perception Models重新审视视觉感知模型的弱监督预训练
Safe-Student for Safe Deep Semi-Supervised Learning with Unseen-Class Unlabeled DataSafe-Student：面向含未见类别无标签数据的安全深度半监督学习
PatchFormer: An Efficient Point Transformer with Patch AttentionPatchFormer：具有补丁注意的高效点变压器
Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning用于自我监督对应学习的局部感知视频间和视频内重建
Neural Global Shutter: Learn to Restore Video from a Rolling Shutter Camera with Global Reset Feature神经全局快门：学习从具有全局重置功能的滚动快门相机中恢复视频
Conditional Prompt Learning for Vision-Language Models视觉语言模型的条件提示学习
Stability-driven Contact Reconstruction From Monocular Color Images基于单目彩色图像的稳定性驱动的接触重建
SharpContour: A Contour-based Boundary Refinement Approach for Efficient and Accurate Instance SegmentationSharpContour：一种基于轮廓的边界细化方法，用于高效准确的实例分割
MSDN: Mutually Semantic Distillation Network for Zero-Shot LearningMSDN：用于零样本学习的相互语义蒸馏网络
GeneralDepth: Unsupervised Learning of Single-Image Depth Estimation in General ScenesGeneralDepth：一般场景中单图像深度估计的无监督学习
Revisiting AP Loss for Dense Object Detection: Adaptive Ranking Pair Selection重温用于密集对象检测的 AP 损失：自适应排名对选择
No-Reference Point Cloud Quality Assessment via Domain Adaptation通过域适应进行无参考点云质量评估
DArch: Dental Arch Prior-assisted 3D Tooth Instance Segmentation with Weak AnnotationsDArch：具有弱注释的牙弓先验辅助 3D 牙齿实例分割
Self-Supervised Keypoint Discovery in Behavioral Videos行为视频中的自我监督关键点发现
Toward Practical Self-Supervised Monocular Indoor Depth Estimation迈向实用的自监督单目室内深度估计
Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?跨模态感知者：可以从声音中收集面部几何形状吗？
DPGEN: Differentially Private Generative Energy-Guided Network for Natural Image SynthesisDPGEN：用于自然图像合成的差分隐私生成能量引导网络
Learning the Degradation Distribution for Blind Image Super-Resolution学习盲图像超分辨率的退化分布
ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action LocalizationASM-Loc：弱监督时间动作定位的动作感知分段建模
Exploiting Rigidity Constraints for LiDAR Scene Flow Estimation利用刚性约束进行 LiDAR 场景流估计
Democracy Does Matter: Comprehensive Feature Mining for Co-Salient Object Detection民主很重要：共同显着目标检测的综合特征挖掘
Unsupervised Domain Adaptation for Nighttime Aerial Tracking夜间空中跟踪的无监督域自适应
UDA-COPE: Unsupervised Domain Adaptation for Category-level Object Pose EstimationUDA-COPE：类别级对象姿态估计的无监督域自适应
3D Shape Reconstruction from 2D Images with Disentangled Attribute Flow使用分离属性流从 2D 图像重建 3D 形状
Multimodal Dynamics: Dynamical Fusion for Trustworthy Multimodal Classification多模态动力学：可信赖多模态分类的动态融合
Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer使用多任务转换器实现弱监督文本定位
StyTr2: Image Style Transfer with TransformersStyTr2：使用 Transformer 进行图像风格迁移
BokehMe: When Neural Rendering Meets Classical RenderingBokehMe：当神经渲染遇到经典渲染
Memory-augmented Deep Conditional Unfolding Network for Pan-sharpening用于全色锐化的内存增强深度条件展开网络
Learning Object Context for Novel-view Scene Layout Generation新视图场景布局生成的学习对象上下文
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality AssessmentFineDiving：用于程序感知动作质量评估的细粒度数据集
TCTrack: Temporal Contexts for Aerial TrackingTCTrack：空中跟踪的时间上下文
RBGNet: Ray-based Grouping for 3D Object DetectionRBGNet：用于 3D 对象检测的基于射线的分组
3PSDF: Three-Pole Signed Distance Function for Learning Surfaces with Arbitrary Topologies3PSDF：用于学习具有任意拓扑结构的曲面的三极符号距离函数
PanopticNeRF: A Semantic Object-Aware Neural Scene RepresentationPanopticNeRF：语义对象感知神经场景表示
Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation弯曲现实：适应全景语义分割的失真感知变压器
Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer使用新型一元成对变换器对人-物体交互进行高效两阶段检测
Reconstructing Surfaces for Sparse Point Clouds with On-Surface Priors使用表面先验重建稀疏点云的表面
Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships无监督视觉语言解析：通过依赖关系将视觉场景图与语言结构无缝连接
Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution细节或人工制品：真实图像超分辨率的局部判别学习方法
Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera从单个相机学习动态人体高保真渲染的运动相关外观
A Voxel Graph CNN for Object Classification with Event Cameras使用事件相机进行对象分类的体素图 CNN
How Good Is Aesthetic Ability of a Fashion Model?时尚模型的审美能力有多好？
Recurrent Dynamic Embedding for Video Object Segmentation视频对象分割的循环动态嵌入
Self-Distillation from the Last Mini-Batch for Consistency Regularization用于一致性正则化的最后一个小批量的自蒸馏
Group Contextualization for Video Recognition用于视频识别的组语境化
Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos桥梁提示：在教学视频中对顺序动作的理解
Dual Adversarial Adaptation for Cross-Device Real-World Image Super-Resolution跨设备真实世界图像超分辨率的双重对抗适应
Urban Radiance Fields城市辐射场
Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack通过自适应自动攻击对对抗鲁棒性的实际评估
PINA: Learning a Personalized Implicit Neural Avatar from a Single RGB-D Video SequencePINA：从单个 RGB-D 视频序列中学习个性化的隐式神经化身
Disentangled3D: Learning a 3D Generative Model with Disentangled Geometry and Appearance from Monocular ImagesDisentangled3D：从单目图像中学习具有分离几何和外观的 3D 生成模型
Global Sensing and Measurements Reuse for Image Compressed Sensing图像压缩传感的全局传感和测量重用
AKB-48: A Real-World Articulated Object Knowledge BaseAKB-48：真实世界的铰接对象知识库
Structured Sparse R-CNN for Direct Scene Graph Generation用于直接场景图生成的结构化稀疏 R-CNN
Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing逼真的单目 3D 重建人类穿着服装
Spectral Unsupervised Domain Adaptation for Visual Recognition用于视觉识别的光谱无监督域自适应
SimMatch: Semi-supervised Learning with Similarity MatchingSimMatch：具有相似性匹配的半监督学习
Multi-grained Spatio-Temporal Features Perceived Network for Event-based Lip-Reading基于事件的唇读的多粒度时空特征感知网络
POCO: Point Convolution for Surface ReconstructionPOCO：用于表面重建的点卷积
HerosNet: Hyperspectral Explicable Reconstruction and Optimal Sampling Deep Network for Snapshot Compressive ImagingHerosNet：用于快照压缩成像的高光谱可解释重建和最优采样深度网络
Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond迈向抵御对抗性攻击的鲁棒去雨：综合基准分析及其他
FedDC: Federated Learning with Non-IID Data via Local Drift Decoupling and CorrectionFedDC：通过局部漂移解耦和校正使用非 IID 数据进行联邦学习
Open-set Text Recognition via Character-Context Decoupling基于字符上下文解耦的开集文本识别
Generalized Few-shot Semantic Segmentation广义小样本语义分割
Causal Transportability for Neural Representations神经表示的因果可迁移性
Uncertainty-Guided Probabilistic Transformer for Complex Action Recognition用于复杂动作识别的不确定性引导概率变换器
Matching Feature Sets for Few-Shot Image Classification少样本图像分类的匹配特征集
Interactron: Embodied Adaptive Object DetectionInteractron：体现的自适应对象检测
It’s About Time: Analog Clock Reading in the Wild时间到了：野外模拟时钟读数
A Graph Matching Perspective with Transformers on Video Instance Segmentation视频实例分割中带有 Transformers 的图匹配视角
GIF: Neural Implicit Function for General Shape RepresentationGIF：一般形状表示的神经隐式函数
AdaViT: Adaptive Vision Transformers for Efficient Image RecognitionAdaViT：用于高效图像识别的自适应视觉转换器
Language as Queries for Referring Video Object Segmentation语言作为引用视频对象分割的查询
Federated Class-Incremental Learning联邦类增量学习
Human Hands as Probes for Interactive Object Understanding人手作为交互式对象理解的探针
STIF: Learning Continuous Video Representation for Space-Time Super-ResolutionSTIF：学习时空超分辨率的连续视频表示
Bridging Video-text Retrieval with Multiple Choice Questions桥接视频文本检索与多项选择题
FoggyStereo: Stereo Matching with Fog Volume RepresentationFoggyStereo：立体匹配与雾体积表示
MonoGround: Detecting Monocular 3D Objects from the GroundMonoGround：从地面检测单目 3D 物体
CLIMS: Cross Language Image Matching for Weakly Supervised Semantic SegmentationCLIMS：用于弱监督语义分割的跨语言图像匹配
ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive CodingELIC：具有不均匀分组的空间通道上下文自适应编码的高效学习图像压缩
Local Texture Estimator for Implicit Representation Function隐式表示函数的局部纹理估计器
Neural Recognition of Dashed Curves with Gestalt Law of Continuity具有格式塔连续性定律的虚线曲线的神经识别
Voxel Field Fusion for 3D Object Detection用于 3D 对象检测的体素场融合
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with TransformersPanoptic SegFormer：使用 Transformers 深入研究全景分割
Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding风格和雾很重要：语义雾场景理解的累积域适应
SCS-Co: Self-Consistent Style Contrastive Learning for Image HarmonizationSCS-Co：图像协调的自洽风格对比学习
H4D: Human 4D Modeling by Learning Neural Compositional RepresentationH4D：通过学习神经组合表示进行人体 4D 建模
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference TransformerPhysFormer：使用时差变换器的基于面部视频的生理测量
A Unified Query-based Paradigm for Point Cloud Understanding一种基于统一查询的点云理解范式
AdaInt: Learning Adaptive Intervals for 3D Lookup Tables on Real-time Image EnhancementAdaInt：学习实时图像增强中 3D 查找表的自适应间隔
FS6D: Few-Shot 6D Pose Estimation of Novel ObjectsFS6D：新物体的 Few-Shot 6D 姿态估计
CLIP-Event: Connecting Text and Images with Event StructuresCLIP-Event：用事件结构连接文本和图像
Category Contrast for Unsupervised Domain Adaptation in Visual Tasks视觉任务中无监督域适应的类别对比
GateHUB: Gated History Unit with Background Suppression for Online Action DetectionGateHUB：用于在线动作检测的具有背景抑制的门控历史单元
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in VideoMixSTE：用于视频中 3D 人体姿势估计的 Seq2seq 混合时空编码器
Learning 3D Object Shape and Layout without 3D Supervision在没有 3D 监督的情况下学习 3D 对象形状和布局
Discrete Cosine Transform Network for Guided Depth Super-Resolution用于引导深度超分辨率的离散余弦变换网络
DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image ClassificationDTFD-MIL：用于组织病理学全切片图像分类的双层特征蒸馏多实例学习
Recurrent Glimpse-based Decoder for Detection with Transformer基于递归 Glimpse 的解码器，用于使用 Transformer 进行检测
HSC4D: Human-centered 4D Scene Capture in Large-scale Indoor-outdoor Space Using Wearable IMUs and LiDARHSC4D：使用可穿戴 IMU 和 LiDAR 在大规模室内外空间中以人为中心的 4D 场景捕获
Multi-Object Tracking Meets Moving UAV多目标跟踪遇到移动无人机
Estimating Fine-Grained Noise Model via Contrastive Learning通过对比学习估计细粒度噪声模型
ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP CuesProposalCLIP：通过利用 CLIP 线索生成无监督的开放类别对象建议
Task-specific Inconsistency Alignment for Domain Adaptive Object Detection用于域自适应对象检测的特定于任务的不一致对齐
Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization用于弱监督时间动作定位的细粒度时间对比学习
Global-Aware Registration of Less-Overlap RGB-D Scans少重叠 RGB-D 扫描的全局感知配准
XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font GenerationXMP-Font：用于 Few-Shot 字体生成的自监督跨模态预训练
A Simple Data Mixing Prior for Improving Self-Supervised Vision Transformer改进自监督视觉转换器的简单数据混合先验
Dense Learning based Semi-Supervised Object Detection基于密集学习的半监督目标检测
RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose OptimizationRNNPose：具有鲁棒对应场估计和姿态优化的递归 6-DoF 对象姿态细化
Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation用于图像生成的矢量量化建模中具有离散扩散的全局上下文
Collaborative Learning for Hand and Object Reconstruction with Attention-guided Graph Convolution使用注意力引导的图卷积进行手部和对象重建的协作学习
End-to-end Generative Pretraining for Multimodal Video Captioning多模态视频字幕的端到端生成预训练
Exposure Normalization and Compensation for Multiple Exposure Correction多重曝光校正的曝光归一化和补偿
Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks神经网络中可解释的部分-整体层次结构和概念语义关系
Multi-label Classification with Partial Annotations using Class-aware Selective Loss使用类感知选择性损失的带有部分注释的多标签分类
Fire Together Wire Together: A Dynamic Pruning Approach with Self-Supervised Mask PredictionFire Together Wire Together：一种具有自我监督掩码预测的动态修剪方法
IterMVS: Iterative Probability Estimation for Efficient Multi-View StereoIterMVS：高效多视图立体的迭代概率估计
Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation全局思考，局部行动：用于视觉和语言导航的双尺度图形转换器
Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction用于有效降维的分层最近邻图嵌入
Decoupling Makes Weakly Supervised Local Feature Better解耦使弱监督的局部特征更好
Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds并非所有点都是平等的：学习用于 3D LiDAR 点云的高效基于点的检测器
Expanding Large Pre-trained Unimodal Models with Multimodal Information Injection for Image-Text Multimodal Classification使用多模态信息注入扩展大型预训练单模态模型以进行图像-文本多模态分类
Semi-Weakly-Supervised Learning of Complex Actions from Instructional Videos教学视频中复杂动作的半弱监督学习
Set-Supervised Action Learning in Procedural Videos via Pairwise Order Consistency通过成对顺序一致性在程序视频中进行集合监督动作学习
SHIFT: A Synthetic Driving Dataset for Continuous Multi-Task Domain AdaptationSHIFT：用于连续多任务域适应的合成驾驶数据集
BANMo: Building Animatable 3D Neural Models from Many Casual VideosBANMo：从许多休闲视频中构建可动画的 3D 神经模型
HD-CSE: Learning Dense Correspondence of Clothed Humans with Vision TransformersHD-CSE：使用视觉变形器学习穿衣人的密集对应
Efficient Geometry-aware 3D Generative Adversarial Networks高效的几何感知 3D 生成对抗网络
CAPRI-Net: Learning Compact CAD Shapes with Adaptive Primitive AssemblyCAPRI-Net：使用自适应基元装配学习紧凑的 CAD 形状
HL-Net: Heterophily Learning Network for Scene Graph GenerationHL-Net：用于场景图生成的异质学习网络
Towards Efficient Data Free Black-box Adversarial Attack迈向高效的无数据黑盒对抗攻击
Neural Collaborative Graph Machines for Table Structure Recognition用于表结构识别的神经协同图机
Dimension Embeddings for Monocular 3D Object Detection用于单目 3D 对象检测的维度嵌入
Nested Collaborative Learning for Long-Tailed Visual Recognition用于长尾视觉识别的嵌套协作学习
Scalable Penalized Regression for Noise Detection in Learning with Noisy Labels带有噪声标签的学习中噪声检测的可扩展惩罚回归
Calibrating Deep Neural Networks by Pairwise Constraints通过成对约束校准深度神经网络
HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation via Hybrid Contrastive RegularizationHybridCR：通过混合对比正则化的弱监督 3D 点云语义分割
Few-Shot Font Generation by Learning Fine-Grained Local Styles通过学习细粒度的局部样式生成 Few-Shot 字体
Point-NeRF: Point-based Neural Radiance FieldsPoint-NeRF：基于点的神经辐射场
Spatial-Temporal Space Hand-in-Hand: Spatial-Temporal Video Super-Resolution via Cycle-Projected Mutual Learning时空空间手牵手：通过周期投影互学习的时空视频超分辨率
Learning from All Vehicles向所有车辆学习
Gait Recognition in the Wild with Dense 3D Representations and A Benchmark具有密集 3D 表示和基准的野外步态识别
DETReg: Unsupervised Pretraining with Region Priors for Object DetectionDETReg：使用区域先验进行目标检测的无监督预训练
Rethinking Semantic Segmentation: A Prototype View重新思考语义分割：原型视图
Distillation Using Oracle Queries for Transformer-based Human-Object Interaction Detection使用 Oracle 查询进行基于 Transformer 的人-物交互检测的蒸馏
MobRecon: Mobile-Friendly Hand Mesh Reconstruction from Monocular ImageMobRecon：从单目图像重建移动友好的手部网格
Spatio-temporal Relation Modeling for Few-shot Action Recognition少样本动作识别的时空关系建模
RestoreFormer: High-Quality Blind Face Restoration from Undegraded Key-Value PairsRestoreFormer：从未降级的键值对中进行高质量的盲人脸恢复
DF-GAN: A Simple and Effective Baseline for Text-to-Image SynthesisDF-GAN：文本到图像合成的简单有效基线
Domain-Agnostic Prior for Unsupervised Transfer Segmentation无监督转移分割的领域不可知先验
Unimodal-Concentrated Loss: Fully Adaptive Label Distribution Learning for Ordinal Regression单峰集中损失：序数回归的完全自适应标签分布学习
Pyramid Grafting Network for One-Stage High Resolution Saliency Detection用于单阶段高分辨率显着性检测的金字塔嫁接网络
Pseudo-Q: Generating Pseudo Language Queries for Visual GroundingPseudo-Q：为视觉基础生成伪语言查询
Keypoint Transformer: Solving Joint Identification in Challenging Hands and Object Interactions for Accurate 3D Pose EstimationKeypoint Transformer：解决具有挑战性的手和物体交互中的联合识别，以实现准确的 3D 姿势估计
Towards Discovering the Effectiveness of Moderately Confident Samples for Semi-Supervised Learning探索中等自信样本在半监督学习中的有效性
Semi-Supervised Video Semantic Segmentation with Inter-Frame Feature Reconstruction具有帧间特征重构的半监督视频语义分割
Revisiting the “Video” in Video-Language Understanding重温视频语言理解中的“视频”
SNUG: Self-Supervised Neural Dynamic GarmentsSNUG：自我监督的神经动态服装
FocalClick: Towards Practical Interactive Image SegmentationFocalClick：迈向实用的交互式图像分割
DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic SegmentationDAFormer：改进域自适应语义分割的网络架构和训练策略
GRAM: Generative Radiance Manifolds for 3D-Aware Image GenerationGRAM：用于 3D 感知图像生成的生成辐射流形
Temporally Efficient Vision Transformer for Video Instance Segmentation用于视频实例分割的时间高效视觉转换器
C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical ImageC-CAM：用于医学图像弱监督语义分割的因果 CAM
Adversarial Texture for Fooling Person Detectors in the Physical World物理世界中愚弄人探测器的对抗性纹理
Automatic Color Image Stitching Using Quaternion Rank-1 Alignment使用四元数 Rank-1 对齐的自动彩色图像拼接
TemporalUV: Capturing Loose Clothing with Temporally Coherent UV CoordinatesTemporalUV：使用时间相干的 UV 坐标捕捉宽松的衣服
Kernelized Few-shot Object Detection by Integral Aggregation使用积分聚合的核化少样本目标检测
Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data用于自动驾驶数据的图像到激光雷达的自监督蒸馏
Amodal Segmentation through Out-of-Task and Out-of-Distribution Generalization with a Bayesian Model使用贝叶斯模型通过任务外和分布外泛化进行 Amodal 分割
FocusCut: Diving into a Focus View in Interactive SegmentationFocusCut：深入了解交互式分割中的焦点视图
Mutual Information-driven Pan-sharpening互信息驱动的全色锐化
Gradient-SDF: A Semi-Implicit Surface Representation for 3D ReconstructionGradient-SDF：用于 3D 重建的半隐式表面表示
Neural Head Avatars from Monocular RGB Videos来自单目 RGB 视频的神经头部头像
Point-Level Region Contrast for Object Detection Pre-Training目标检测预训练的点级区域对比
HODEC: Towards Efficient High-Order DEcomposed Convolutional Neural NetworksHODEC：迈向高效的高阶分解卷积神经网络
Bridging Global Context Interactions for High-Fidelity Image Completion桥接全局上下文交互以完成高保真图像
CDGNet: Class Distribution Guided Network for Human ParsingCDGNet：用于人类解析的类分布引导网络
Primitive3D: Learning from 3D Objects Assembled with Random PrimitivesPrimitive3D：从随机基元组装的 3D 对象中学习
HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular VideoHumanNeRF：来自单目视频的移动人物的自由视点渲染
TransMix: Attend to Mix for Vision TransformersTransMix：面向视觉 Transformer 的注意力引导混合
JRDB-Act: A Large-scale Dataset for Spatio-temporal Action, Social Group and Activity DetectionJRDB-Act：用于时空行为、社会群体和活动检测的大规模数据集
Few-shot Head Swapping in the Wild野外场景下的少样本换头
Neural Texture Extraction and Distribution for Controllable Person Image Synthesis用于可控人物图像合成的神经纹理提取和分布
Embracing Single Stride 3D Object Detector with Sparse Transformer使用 Sparse Transformer 拥抱单步 3D 对象检测器
Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning告诉我什么并告诉我如何：通过多模式调节进行视频合成
Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data利用 3D 合成数据去除人像眼镜和阴影
Expanding Low-Density Latent Regions for Open-Set Object Detection为开放集目标检测扩展低密度潜在区域
GMFlow: Learning Optical Flow via Global MatchingGMFlow：通过全局匹配学习光流
Source-Free Domain Adaptation via Distribution Estimation通过分布估计进行无源域自适应
Aesthetic Text Logo Synthesis via Content-aware Layout Inferring通过内容感知布局推断的审美文本标志合成
An Image Patch is a Wave: Phase-Aware Vision MLP图像补丁是波浪：相位感知视觉 MLP
FisherMatch: Semi-Supervised Rotation Regression via Entropy-based FilteringFisherMatch：基于熵的过滤的半监督旋转回归
BE-STI: Spatial-Temporal Integrated Network for Class-agnostic Motion Prediction with Bidirectional EnhancementBE-STI：用于具有双向增强的类别不可知运动预测的时空集成网络
DC-SSL: Addressing Mismatched Class Distribution in Semi-supervised LearningDC-SSL：解决半监督学习中不匹配的类分布
Deterministic Point Cloud Registration via Novel Transformation Decomposition通过新颖的变换分解进行确定性点云配准
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos寻找变化：从未修剪的网络视频中学习对象状态和状态修改操作
Deep Visual Geo-localization Benchmark深度视觉地理定位基准
LC-FDNet: Learned Lossless Image Compression with Frequency Decomposition NetworkLC-FDNet：用频率分解网络学习无损图像压缩
Towards Robust Vision Transformer迈向鲁棒的视觉 Transformer
Volumetric Bundle Adjustment for Photorealistic Real-time Reconstruction用于真实感实时重建的体积束调整
Continual Test-Time Domain Adaptation持续测试时域适应
Scribble-Supervised LiDAR Semantic SegmentationScribble-Supervised LiDAR 语义分割
TableFormer: Table Structure Understanding with TransformersTableFormer：使用 Transformer 理解表结构
Focal Sparse Convolutional Networks for 3D Object Detection用于 3D 对象检测的焦点稀疏卷积网络
CLRNet: Cross Layer Refinement Network for Lane DetectionCLRNet：用于车道检测的跨层细化网络
Transformer Based Line Segment Classifier with Image Context for Real-Time Vanishing Point Detection in Manhattan World基于 Transformer 并结合图像上下文的线段分类器，用于曼哈顿世界中的实时消失点检测
NeRFReN: Neural Radiance Fields with ReflectionsNeRFReN：带反射的神经辐射场
HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image EditingHyperStyle：使用 HyperNetworks 进行 StyleGAN 反演，用于真实图像编辑
Ditto: Building Digital Twins of Articulated Objects from InteractionDitto：从交互中构建铰接对象的数字孪生
CroMo: Cross-Modal Learning for Monocular Depth EstimationCroMo：单目深度估计的跨模态学习
Mobile-Former: Bridging MobileNet and TransformerMobile-Former：连接 MobileNet 和 Transformer
MetaFormer is Actually What You Need for VisionMetaFormer 才是视觉真正需要的
RU-Net: Regularized Unrolling Network for Scene Graph GenerationRU-Net：用于场景图生成的正则化展开网络
Dreaming to Prune Image Deraining Networks梦想修剪图像去雨网络
Salvage of Supervision in Weakly Supervised Object Detection弱监督目标检测中的监督补救
Lagrange Motion Analysis and View Embeddings for Improved Gait Recognition用于改进步态识别的拉格朗日运动分析和视图嵌入
Lite Pose: Efficient Architecture Design for 2D Human Pose EstimationLite Pose：用于 2D 人体姿势估计的高效架构设计
SwinBERT: End-to-End Transformers with Sparse Attention for Video CaptioningSwinBERT：用于视频字幕的具有稀疏注意力的端到端 Transformer
FMCNet: Feature-Level Modality Compensation for Visible-Infrared Person Re-IdentificationFMCNet：可见红外人员重新识别的特征级模态补偿
Generalizing Gaze Estimation with Rotation Consistency用旋转一致性概括注视估计
SIOD: Single Instance Annotated Per Category Per Image for Object DetectionSIOD：用于对象检测的每个类别每个图像的单个实例注释
Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification用于图像到视频的人员重新识别的时间互补引导强化学习
A Differentiable Two-stage Alignment Scheme for Burst Image Reconstruction with Large Shift大位移突发图像重建的可微分两阶段对齐方案
Manifold Learning Benefits GANs流形学习有利于 GAN
Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing通过 Shuffled Style Assembly 进行域泛化以进行人脸反欺骗
OW-DETR: Open-world Detection TransformerOW-DETR：开放世界检测 Transformer
Learning Optimal K-space Acquisition and Reconstruction using Physics-Informed Neural Networks使用基于物理的神经网络学习最优 K 空间采集和重建
Global Tracking via Ensemble of Local Trackers通过局部跟踪器集成进行全局跟踪
Robust Region Feature Synthesizer for Zero-Shot Object Detection用于零样本目标检测的鲁棒区域特征合成器
Confidence Propagation Cluster: Unleash Full Potential of Object Detectors置信度传播聚类：释放目标检测器的全部潜力
PartGlot: Learning Shape Part Segmentation from Language Reference GamesPartGlot：从语言参考游戏中学习形状部分分割
Self-Taught Metric Learning without Labels没有标签的自学度量学习
GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise VotingGPV-Pose：通过几何引导的逐点投票进行类别级对象姿态估计
OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware FusionOmniFusion：通过几何感知融合进行 360 度单目深度估计
3D Common Corruptions and Data Augmentation3D 常见损坏和数据增强
DIVeR: Real-time and Accurate Neural Radiance Fields with Deterministic Integration for Volume RenderingDIVeR：采用确定性积分进行体渲染的实时且准确的神经辐射场
Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation通过上下文组装和强大的数据增强提高图像抠图的鲁棒性
Cross-modal Clinical Graph Transformer For Ophthalmic Report Generation用于生成眼科报告的跨模态临床图形转换器
Correlation-Aware Deep Tracking相关感知深度跟踪
Learning to Imagine: Diversify Memory for Incremental Learning using Unlabeled Data学习想象：使用未标记数据为增量学习多样化记忆
Block-NeRF: Scalable Large Scene Neural View SynthesisBlock-NeRF：可扩展的大场景神经视图合成
Vector Quantized Diffusion Model for Text-to-Image Synthesis用于文本到图像合成的矢量量化扩散模型
Boosting Crowd Counting via Multifaceted Attention通过多方面注意提高人群计数
Physically-guided Disentangled Implicit Rendering for 3D Face Modeling用于 3D 人脸建模的物理引导解开隐式渲染
IFRNet: Intermediate Feature Refine Network for Efficient Frame InterpolationIFRNet：用于高效帧插值的中间特征细化网络
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with TransformersTransFusion：使用 Transformers 进行 3D 对象检测的稳健 LiDAR-Camera Fusion
Back to Reality: Weakly-supervised 3D Detection with Shape-guided Label Enhancement回到现实：带有形状引导标签增强的弱监督 3D 检测
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding带掩蔽位置编码的增量 Transformer 结构增强图像修复
Blind Image Super-resolution with Elaborate Degradation Modeling on Noise and Kernel对噪声和核进行精细退化建模的盲图像超分辨率
Reduce Information Loss in Transformers for Pluralistic Image Inpainting减少 Transformer 中的信息损失以进行多元图像修复
OCSampler: Compressing Videos to One Clip with Single-step SamplingOCSampler：使用单步采样将视频压缩为一个剪辑
Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network掩盖对抗性损害：为稳健和稀疏网络寻找对抗性显着性
SemAffiNet: Semantic-Affine Transformation for Point Cloud SegmentationSemAffiNet：点云分割的语义仿射变换
High-resolution Face Swapping via Latent Semantics Disentanglement通过潜在语义解缠结实现高分辨率人脸交换
Deep Rectangling for Image Stitching: A Learning Baseline图像拼接的深度矩形：学习基线
Detector-Free Weakly Supervised Group Activity Recognition无检测器弱监督群体活动识别
Unsupervised Domain Generalization by learning a Bridge Across Domains通过学习跨域的桥梁进行无监督域泛化
RSCFed: Random Sampling Consensus Federated Semi-supervised LearningRSCFed：随机抽样共识联邦半监督学习
IntraQ: Learning Synthetic Images with Intra-Class Heterogeneity for Zero-Shot Network QuantizationIntraQ：学习具有类内异质性的合成图像以进行零样本网络量化
A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution一种用于空间变形鲁棒场景文本图像超分辨率的文本注意网络
Learned Queries for Efficient Local Attention有效局部注意力的学习查询
Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling来回回顾：具有显式时间差异建模的视频超分辨率
HVH: Learning a Hybrid Neural Volumetric Representation for Dynamic Hair Performance CaptureHVH：学习用于动态头发性能捕获的混合神经体积表示
Robust Contrastive Learning against Noisy Views针对嘈杂视图的鲁棒对比学习
Discovering Objects that Can Move发现可以移动的物体
TubeFormer-DeepLab: Video Mask TransformerTubeFormer-DeepLab：视频掩码转换器
Sparse and Complete Latent Organization for Geospatial Semantic Segmentation地理空间语义分割的稀疏和完整潜在组织
ITSA: An Information Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching NetworksITSA：立体匹配网络中自动避免捷径和域泛化的信息论方法
Few-shot Backdoor Defense Using Shapley Estimation使用 Shapley 估计的 Few-shot 后门防御
Exploring Domain-Invariant Parameters for Source Free Domain Adaptation探索无源域自适应的域不变参数
Ev-TTA: Test-Time Adaptation for Event-Based Object RecognitionEv-TTA：基于事件的对象识别的测试时间适应
Likert Scoring with Grade Decoupling for Long-term Action Assessment长期行动评估的李克特评分与成绩解耦
Unpaired Cartoon Image Synthesis via Gated Cycle Mapping通过门控循环映射合成未配对卡通图像
Contextual Instance Decoupling for Robust Multi-Person Pose Estimation用于鲁棒多人姿势估计的上下文实例解耦
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes自监督预测学习：视觉场景中声源定位的无负样本方法
Modulated Contrast for Versatile Image Translation用于多功能图像翻译的调制对比度
Oriented RepPoints for Aerial Object Detection面向空中目标检测的 RepPoints
INS-Conv: Incremental Sparse Convolution for Online 3D SegmentationINS-Conv：用于在线 3D 分割的增量稀疏卷积
PanopticDepth: Instance-Decoupled Depth Estimation for Unified Depth-Aware Panoptic SegmentationPanopticDepth：用于统一深度感知全景分割的实例解耦深度估计
Point-BERT: Pre-Training 3D Point Cloud Transformers with Masked Point ModelingPoint-BERT：使用掩蔽点建模预训练 3D 点云 Transformer
Implicit Sample Extension for Unsupervised Person Re-Identification无监督人员重新识别的隐式样本扩展
Incorporating Semi-Supervised and Positive-Unlabeled learning for Boosting Full Reference Image Quality Assessment结合半监督和正无标记学习来提升全参考图像质量评估
HairCLIP: Design Your Hair by Text and Reference ImageHairCLIP：通过文本和参考图像设计你的头发
C2AM Loss: Chasing a Better Decision Boundary for Long-Tail Object DetectionC2AM 损失：为长尾目标检测寻找更好的决策边界
MogFace: Towards a Deeper Appreciation on Face DetectionMogFace：对人脸检测进行更深入的了解
RegionCLIP: Region-based Language-Image PretrainingRegionCLIP：基于区域的语言图像预训练
HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule NetworkHP-Capsule：分层解析胶囊网络的无监督面部部分发现
Structure-Aware Flow Generation for Human Body Reshaping用于人体重塑的结构感知流生成
Revisiting Document Image Dewarping by Grid Regularization通过网格正则化重新审视文档图像去扭曲
GANSeg: Learning to Segment by Unsupervised Hierarchical Image GenerationGANSeg：通过无监督分层图像生成学习分割
Align and Prompt: Video-and-Language Pre-training with Entity Prompts对齐和提示：使用实体提示进行视频和语言预训练
Bridging the Gap between Classification and Localization for Weakly Supervised Object Localization为弱监督目标定位弥合分类和定位之间的差距
Shunted Self-Attention via Multi-Scale Token Aggregation通过多尺度令牌聚合分流自注意力
VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial AttentionVISTA：通过 Dual Cross-VIew SpaTial Attention 提升 3D 对象检测
MonoDTR: Monocular 3D Object Detection with Depth-Aware TransformerMonoDTR：使用深度感知 Transformer 的单目 3D 对象检测
YouMVOS: An Actor-centric Multi-shot Video Object Segmentation DatasetYouMVOS：一个以演员为中心的多镜头视频对象分割数据集
Single-Stage is Enough: Multi-Person Absolute 3D Pose Estimation单阶段就足够了：多人绝对 3D 姿势估计
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight DetectionUMT：用于联合视频时刻检索和高光检测的统一多模态转换器
DiSparse: Disentangled Sparsification for Multitask Model CompressionDiSparse：多任务模型压缩的解耦稀疏化
Coarse-to-fine Deep Video Coding with Hyperprior-guided Mode Prediction具有超先验引导模式预测的粗到细深度视频编码
Weakly Supervised High-Fidelity Clothing Model Generation弱监督高保真服装模型生成
Deep Generalized Unfolding Networks for Image Restoration用于图像恢复的深度广义展开网络
Panoptic-PHNet: Towards Real-Time and High-Precision LiDAR Panoptic Segmentation via Clustering Pseudo HeatmapPanoptic-PHNet：通过聚类伪热图实现实时和高精度 LiDAR 全景分割
ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression FrameworkES6D：计算效率高、对称感知的 6D 姿势回归框架
Iterative Deep Homography Estimation迭代深度单应性估计
Homography Loss for Monocular 3D Object Detection单目 3D 目标检测的单应性损失
Infrared Invisible Clothing: Hiding from Infrared Detectors at Multiple Angles in Real World红外隐形服装：在现实世界中从多个角度躲避红外探测器
Deep Stereo Image Compression via Bi-directional Coding通过双向编码进行深度立体图像压缩
Degree-of-linear-polarization-based Color Constancy基于线性偏振度的颜色恒常性
Unleashing Potential of Unsupervised Pre-Training with Intra-Identity Regularization for Person Re-Identification通过身份内正则化释放无监督预训练的潜力以进行人员重新识别
Aladdin: Joint Atlas Building and Diffeomorphic Registration Learning with Pairwise AlignmentAladdin：基于成对对齐的联合图谱构建与微分同胚配准学习
Learning Transferable Human-Object Interaction Detector with Natural Language Supervision通过自然语言监督学习可迁移的人-物交互检测器
PNP: Robust Learning from Noisy Labels by Probabilistic Noise PredictionPNP：通过概率噪声预测从噪声标签中鲁棒学习
RayMVSNet: Learning Ray-based 1D Implicit Fields for Accurate Multi-View StereoRayMVSNet：学习基于光线的 1D 隐式场以实现准确的多视图立体
Shapley-NAS: Discovering Operation Contribution for Neural Architecture SearchShapley-NAS：发现对神经架构搜索的操作贡献
Few-shot Keypoint Detection with Uncertainty Learning for Unseen Species对未见物种进行不确定性学习的小样本关键点检测
Reusing the Task-specific Classifier as a Discriminator: Discriminator-free Adversarial Domain Adaptation重用特定于任务的分类器作为鉴别器：无鉴别器的对抗域适应
"The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping“灯柱旁边的行人”：用于更好瞬时建图的自适应对象图
Point2Seq: Detecting 3D Objects as SequencesPoint2Seq：将 3D 对象检测为序列
Towards Noiseless Object Contours for Weakly Supervised Semantic Segmentation面向弱监督语义分割的无噪声对象轮廓
Syntax-Aware Network for Handwritten Mathematical Expression Recognition用于手写数学表达式识别的语法感知网络
RAGO: Recurrent Graph Optimizer For Multiple Rotation AveragingRAGO：用于多次旋转平均的循环图优化器
A Brand New Dance Partner: Music-Conditioned Pluralistic Dancing Controlled by Multiple Dance Genres全新舞伴：多舞种控制的音乐条件多元舞
BNVF: Dense 3D Reconstruction using Bi-level Neural Volume FusionBNVF：使用双层神经体积融合的密集 3D 重建
AutoLoss-Zero: Searching Loss Functions from Scratch for Generic TasksAutoLoss-Zero：从头开始搜索通用任务的损失函数
Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework通过统一梯度框架探索连体自监督学习的等价性
Cross-domain Few-shot Learning with Task-specific Adapters使用特定任务适配器的跨域小样本学习
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot TasksUni-Perceiver：零样本和少样本任务通用感知的预训练统一架构
Geometric and Textural Augmentation for Domain Gap Reduction用于减少域间隙的几何和纹理增强
Geometric Transformer for Fast and Robust Point Cloud Registration用于快速和稳健点云配准的几何 Transformer
Group R-CNN for Point-based Weakly Semi-supervised Object DetectionGroup R-CNN 用于基于点的弱半监督目标检测
Wnet: Audio-Guided Video Semantic Segmentation via Wavelet-Based Cross-Modal Denoising NetworksWnet：基于小波的跨模态去噪网络的音频引导视频语义分割
3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds3DJCG：在 3D 点云上进行联合密集字幕和视觉定位的统一框架
ELSR: Efficient Line Segment Reconstruction with Planes and Points GuidanceELSR：使用平面和点引导的高效线段重建
A Proposal-based Paradigm for Self-supervised Sound Source Localization in Videos基于提案的视频自监督声源定位范式
Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer基于多尺度 Transformer 的半监督广角人像校正
End-to-End Referring Video Object Segmentation with Multimodal Transformers使用多模态转换器的端到端参考视频对象分割
Neural fields as learnable kernels for 3D reconstruction神经域作为 3D 重建的可学习内核
IDR: Self-Supervised Image Denoising via Iterative Data RefinementIDR：通过迭代数据细化的自我监督图像去噪
TransMVSNet: Global Context-aware Multi-view Stereo Network with TransformersTransMVSNet：具有 Transformer 的全局上下文感知多视图立体网络
SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware NormalizationSimAN：通过相似性感知归一化探索场景文本的自监督表示学习
Deep vanishing point detection: Geometric priors make dataset variations vanish深度消失点检测：几何先验使数据集变化消失
On Adversarial Robustness of Trajectory Prediction for Autonomous Vehicles自动驾驶汽车轨迹预测的对抗鲁棒性
Learning Multiple Dense Prediction Tasks from Partially Annotated Data从部分注释数据中学习多个密集预测任务
Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free隔离：稀疏性可以免费发现木马攻击触发器
Video Demoireing with Relation-based Temporal Consistency具有基于关系的时间一致性的视频去摩尔纹
FLAG: Flow-based 3D Avatar Generation from Sparse ObservationsFLAG：从稀疏观察中生成基于流的 3D 头像
Learning an Optimal Linear Program for Multi-Target Tracking学习多目标跟踪的最优线性规划
IRON: Inverse Rendering by Optimizing Neural SDFs and Materials from Photometric ImagesIRON：通过优化来自光度图像的神经 SDF 和材料进行反向渲染
Stereoscopic Universal Perturbations across Different Architectures and Datasets跨不同架构和数据集的立体普遍扰动
The Flag Median and FlagIRLS旗中位数和 FlagIRLS
NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images黑暗中的 NeRF：来自嘈杂原始图像的高动态范围视图合成
BoxeR: Box-Attention for 2D and 3D TransformersBoxeR：用于 2D 和 3D Transformer 的 Box-Attention
DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change SegmentationDynamicEarthNet：用于语义变化分割的每日多光谱卫星数据集
UBnormal: New Benchmark for Supervised Open-Set Video Anomaly DetectionUBnormal：监督开放集视频异常检测的新基准
Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection用于异常检测的自监督预测卷积注意力块
CADTransformer: Panoptic Symbol Spotting Transformer for CAD DrawingsCADTransformer：用于 CAD 绘图的全景符号识别 Transformer
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy多样性原则：训练更强的视觉 Transformer 需要减少所有级别的冗余
Learning To Recognize Procedural Activities with Distant Supervision通过远程监督学习识别程序活动
Audio-driven Neural Gesture Reenactment with Video Motion Graphs使用视频运动图进行音频驱动的神经手势重演
Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence迈向双向任意图像缩放：联合优化和循环幂等
Hire-MLP: Vision MLP via Hierarchical RearrangementHire-MLP：通过层次重排的视觉 MLP
Escaping Data Scarcity for High-Resolution Heterogeneous Face Hallucination摆脱高分辨率异构面部幻觉的数据稀缺性
DeepDPM: Deep Clustering With an Unknown Number of ClustersDeepDPM：具有未知数量集群的深度聚类
ZeroWaste Dataset: Towards Deformable Object Segmentation in Cluttered ScenesZeroWaste 数据集：在杂乱场景中实现可变形对象分割
Context-Aware Sequence Alignment using 4D Skeletal Augmentation使用 4D 骨骼增强的上下文感知序列比对
COAP: Compositional Articulated Occupancy of PeopleCOAP：人的组合铰接占用
Sound and Visual Representation Learning with Multiple Pretraining Tasks具有多个预训练任务的声音和视觉表示学习
The Wanderings of Odysseus in 3D Scenes3D 场景中的奥德修斯流浪
Deblurring via Stochastic Refinement通过随机细化去模糊
SMPL-A: Modeling Person-Specific Deformable AnatomySMPL-A：建模特定于人的可变形解剖结构
Neural Point Light Fields神经点光场
FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated LearningFedCor：用于异构联邦学习的基于相关性的主动客户端选择策略
ADeLA: Automatic Dense Labeling with Attention for Viewpoint Shift in Semantic SegmentationADeLA：关注语义分割中视点偏移的自动密集标签
Adversarial Parametric Pose Prior对抗性参数姿势先验
Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior通过学习交通先验生成有用的事故多发驾驶场景
Pre-Training meets Self-Training for Supersizing 3D Reconstruction预训练与超大 3D 重建的自我训练相遇
Safe Self-Refinement for Transformer-based Domain Adaptation基于 Transformer 的域自适应的安全自我改进
ElePose: Unsupervised 3D Human Pose Estimation by Predicting Camera Elevation and Learning Normalizing Flows on 2D PosesElePose：通过预测相机高度和学习 2D 姿势的归一化流来进行无监督的 3D 人体姿势估计
Towards Multimodal Depth Estimation from Light Fields基于光场的多模态深度估计
Deformable Sprites for Unsupervised Video Decomposition用于无监督视频分解的可变形精灵
Can You Spot the Chameleon? Adversarially Camouflaging Images from Co-Salient Object Detection你能发现变色龙吗？来自共显着目标检测的对抗性伪装图像
MISF: Multi-level Interactive Siamese Filtering for High-Fidelity Image InpaintingMISF：用于高保真图像修复的多级交互式连体过滤
Aug-NeRF: Training Stronger Neural Radiance Fields with Triple-Level Physically-Grounded AugmentationsAug-NeRF：通过三级基于物理的增强训练更强的神经辐射场
Semi-supervised Semantic Segmentation with Error Localization Network带有错误定位网络的半监督语义分割
Quantization-aware Deep Optics for Snapshot Hyperspectral Imaging用于快照高光谱成像的量化感知深度光学
Gravitationally Lensed Black Hole Emission Tomography引力透镜黑洞发射断层扫描
Improving Video Model Transfer with Dynamic Representation Learning通过动态表示学习改进视频模型迁移
FWD: Real-time Novel View Synthesis with Forward Warping and DepthFWD：具有前向扭曲和深度的实时新视图合成
Enhancing Adversarial Training with Second-Order Statistics of Weights使用权重的二阶统计加强对抗训练
Patch Slimming for Efficient Vision Transformers高效视觉 Transformer 的补丁瘦身
3DAC: Learning Attribute Compression for Point Clouds3DAC：点云的学习属性压缩
SNR-Aware Low-light Image EnhancementSNR 感知低光图像增强
Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation基于视频的人体姿态估计的时间特征对齐和互信息最大化
Motion-modulated Temporal Fragment Alignment Network For Few-Shot Action Recognition用于少样本动作识别的运动调制时间片段对齐网络
Self-Supervised Bulk Motion Artifact Removal in Optical Coherence Tomography Angiography光学相干断层扫描血管造影中的自监督体运动伪影去除
Salient-to-Broad Transition for Video Person Re-identification视频人物重新识别的显着到广泛的过渡
Which images to label for few-shot medical landmark detection?标记哪些图像以进行少样本医学地标检测？
Hybrid Relation Guided Set Matching for Few-shot Action Recognition用于小样本动作识别的混合关系引导集匹配
Progressively Generating Better Initial Guesses Towards Next Stages for High-Quality Human Motion Prediction逐步为高质量人体运动预测的下一阶段生成更好的初始猜测
Bringing Old Films Back to Life让老电影起死回生
Face Relighting with Geometrically Consistent Shadows具有几何一致阴影的面部重新照明
Learning Cloth-Irrelevant Features for Cloth-Changing Person Re-identification学习与布料无关的特征以进行换衣人重新识别
DPICT: Deep Progressive Image Compression Using Trit-PlanesDPICT：使用 Trit-Planes 进行深度渐进式图像压缩
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering从表示到推理：面向视频问答的证据推理和常识推理
Simple but Effective: CLIP Embeddings for Embodied AI简单但有效：用于具身 AI 的 CLIP 嵌入
Scene Consistency Representation Learning for Video Scene Segmentation视频场景分割的场景一致性表示学习
Neural Data-Dependent Transform for Learned Image Compression用于学习图像压缩的神经数据相关变换
CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow EstimationCamLiFlow：用于联合光流和场景流估计的双向相机-LiDAR 融合
Global Matching with Overlapping Attention for Optical Flow Estimation具有重叠注意力的全局匹配光流估计
Meta Agent Teaming Active Learning for Pose Estimation用于姿势估计的元代理组合主动学习
Robust Combination of Distributed Gradients Under Adversarial Perturbations对抗性扰动下分布式梯度的稳健组合
Toward Fast, Flexible, and Robust Low-Light Image Enhancement实现快速、灵活和稳健的低光图像增强
Motion-aware Contrastive Video Representation Learning via Foreground-background Merging通过前景-背景合并的运动感知对比视频表示学习
ViSTA: Vision and Scene Text Aggregation for Cross-Modal RetrievalViSTA：跨模态检索的视觉和场景文本聚合
L-Verse: Bidirectional Generation Between Image and TextL-Verse：图像和文本之间的双向生成
GANORCON: Are Generative Models Useful for Few-shot Segmentation?GANORCON：生成模型对小样本分割有用吗？
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation基于文本的视频分割的多模态特征建模运动
Towards Robust Adaptive Object Detection under Noisy Annotations噪声注释下的鲁棒自适应目标检测
Point2Cyl: Reverse Engineering 3D Objects – from Point Clouds to Extrusion CylindersPoint2Cyl：逆向工程 3D 对象——从点云到挤压圆柱体
MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic SegmentationMM-TTA：用于 3D 语义分割的多模态测试时间自适应
Subspace Adversarial Training子空间对抗训练
Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation语义分割的结构和统计纹理知识蒸馏
UniVIP: A Unified Framework for Self-Supervised Visual Pre-trainingUniVIP：自我监督视觉预训练的统一框架
MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object DetectionMUM：混合图像块和 UnMix 特征块用于半监督目标检测
SS3D: Sparsely-Supervised 3D Object Detection from Point CloudSS3D：来自点云的稀疏监督 3D 对象检测
On the Integration of Self-Attention and Convolution关于自注意力和卷积的整合
Single-Domain Generalized Object Detection in Urban Scene via Cyclic-Disentangled Self-Distillation基于循环解纠缠自蒸馏的城市场景单域广义目标检测
Human Instance Matting via Mutual Guidance and Multi-Instance Refinement通过相互指导和多实例细化的人体实例抠图
Delving Deep into the Generalization of Vision Transformers under Distribution Shifts深入研究分布变化下视觉 Transformer 的泛化
Causality Inspired Representation Learning for Domain Generalization因果关系启发的领域泛化表示学习
Learning Local Displacements for Point Cloud Completion学习点云补全的局部位移
Remember Intentions: Retrospective-Memory-based Trajectory Prediction记住意图：基于回顾性记忆的轨迹预测
Contextual Similarity Distillation for Asymmetric Image Retrieval非对称图像检索的上下文相似性蒸馏
Self-Supervised Models are Continual Learners自监督模型是持续学习者
High-Fidelity Human Avatars from a Single RGB Camera来自单个 RGB 相机的高保真人体头像
Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation并非所有关系都是平等的：为场景图生成挖掘信息标签
TWIST: Two-Way Inter-label Self-Training for Semi-supervised 3D Instance SegmentationTWIST：用于半监督 3D 实例分割的双向标签间自训练
Focal length and object pose estimation via render and compare通过渲染和比较进行焦距和物体姿态估计
Kubric: A scalable dataset generatorKubric：可扩展的数据集生成器
VRDFormer: End-to-End Video Visual Relation Detection with TransformersVRDFormer：使用 Transformers 进行端到端视频视觉关系检测
A Large-scale Comprehensive Dataset and Copy-overlap Aware Evaluation Protocol for Segment-level Video Copy Detection用于分段级视频复制检测的大规模综合数据集和复制重叠感知评估协议
Brain-inspired Multilayer Perceptron with Spiking Neurons具有脉冲神经元的类脑多层感知器
Few Could Be Better Than All: Feature Sampling and Grouping for Scene Text Detection少量可能胜过全部：场景文本检测的特征采样和分组
High Quality Segmentation for Ultra High-resolution Images超高分辨率图像的高质量分割
Physically Disentangled Intra- and Inter-domain Adaptation for Varicolored Haze Removal去除杂色雾霾的物理解耦域内和域间自适应
HandOccNet: Occlusion-Robust 3D Hand Mesh Estimation NetworkHandOccNet：遮挡鲁棒的 3D 手部网格估计网络
Future Transformer for Long-term Action Anticipation用于长期动作预测的 Future Transformer
Decoupling Zero-Shot Semantic Segmentation解耦零样本语义分割
Long-tail Recognition via Compositional Knowledge Transfer基于组合知识迁移的长尾识别
Open Challenges in Deep Stereo: the Booster Dataset深度立体中的公开挑战：Booster 数据集
BigDatasetGAN: Synthesizing ImageNet with Pixel-wise AnnotationsBigDatasetGAN：使用逐像素注释合成 ImageNet
Recall@k Surrogate Loss with Large Batches and Similarity Mixup大批量和相似性混合的 Recall@k 代理损失
PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervisionPoseTriplet：在自我监督下共同进化的 3D 人体姿势估计、模仿和幻觉
Dynamic Dual-Output Diffusion Models动态双输出扩散模型
End-to-End Human-Gaze-Target Detection with Transformers使用 Transformer 进行端到端的人类注视目标检测
EMOCA: Emotion Driven Monocular Face Capture and AnimationEMOCA：情绪驱动的单目人脸捕捉和动画
R(Det)$^2$: Randomized Decision Routing for Object DetectionR(Det)$^2$：用于对象检测的随机决策路由
Diffusion Autoencoders: Toward a Meaningful and Decodable Representation扩散自动编码器：迈向有意义且可解码的表示
PatchNet: A Simple Face Anti-Spoofing Framework via Fine-Grained Patch RecognitionPatchNet：通过细粒度补丁识别的简单人脸反欺骗框架
NeurMiPs: Neural Mixture of Planar Experts for View SynthesisNeurMiPs：用于视图合成的平面专家的神经混合
Learning to generate line drawings that convey geometry and semantics学习生成传达几何和语义的线条图
AlignQ: Alignment Quantization with ADMM-based Correlation PreservationAlignQ：使用基于 ADMM 的相关性保留的对齐量化
Learning Embodied Object-Search Strategies from 50k Human Demonstrations从 5 万次人类演示中学习具身对象搜索策略
Longitudinal Self-Supervision for Learning 2D Amodal Representation用于学习 2D Amodal 表示的纵向自我监督
Controllable Dynamic Multi-Task Architectures可控动态多任务架构
Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation用于语义分割的多尺度高分辨率视觉转换器
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning超越预训练的对象检测器：用于图像字幕的跨模式文本和视觉上下文
Depth-supervised NeRF: Fewer Views and Faster Training for Free深度监督的 NeRF：更少的视图和更快的免费训练
Learning to Detect Mobile Objects from LiDAR Scans Without Labels学习从没有标签的 LiDAR 扫描中检测移动物体
Revisiting Random Channel Pruning for Neural Network Compression重新审视用于神经网络压缩的随机通道修剪
ActiveZero: Mixed Domain Learning for Active Stereovision with Zero AnnotationActiveZero：具有零注释的主动立体视觉的混合域学习
Learning sRGB-to-Raw De-rendering with Content-Aware Metadata使用内容感知元数据学习 sRGB-to-Raw 反渲染
SimVQA: Exploring Simulated Environments for Visual Question AnsweringSimVQA：探索视觉问答的模拟环境
Cross-Domain Adaptive Teacher for Object Detection用于目标检测的跨域自适应教师
Modality-Agnostic Learning for Radar-Lidar Fusion in Vehicle Detection车辆检测中雷达-激光雷达融合的模态无关学习
A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering千言万语胜过图片：以自然语言为中心的外知识视觉问答
Towards General Purpose Vision Systems: An End-to-End Task-Agnostic Vision-Language Architecture迈向通用视觉系统：与任务无关的端到端视觉语言架构
Holocurtains: Programming Light Curtains via Binary HolographyHolocurtains：通过二进制全息术对光幕进行编程
Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy利用局部和全局表示：一种新的自监督学习策略
3D human tongue reconstruction from single “in-the-wild” images从单个“野外”图像重建 3D 人类舌头
Pushing the Performance Limit of Scene Text Recognizer without Human Annotation在没有人工注释的情况下推动场景文本识别器的性能极限
SAR-Net: Shape Alignment and Recovery Network for Category-level 6D Object Pose and Size EstimationSAR-Net：用于类别级 6D 对象姿势和大小估计的形状对齐和恢复网络
Improving Subgraph Recognition with Variational Graph Information Bottleneck利用变分图信息瓶颈改进子图识别
Towards Multi-domain Single Image Dehazing via Test-time Training通过测试时训练实现多域单图像去雾
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding MatchingEMScore：通过粗粒度和细粒度嵌入匹配评估视频字幕
CHEX: CHannel EXploration for CNN Model CompressionCHEX：CNN模型压缩的通道探索
ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural RepresentationsImFace：具有隐式神经表示的非线性 3D 可变形人脸模型
Deblur-NeRF: Neural Radiance Fields from Blurry imagesDeblur-NeRF：来自模糊图像的神经辐射场
An MIL-Derived Transformer for Weakly Supervised Point Cloud Segmentation用于弱监督点云分割的 MIL 衍生 Transformer
Distribution Consistent Neural Architecture Search分布一致的神经架构搜索
Training Object Detectors from Scratch: An Empirical Study in the Era of Vision Transformer从零开始训练目标检测器：视觉转换器时代的实证研究
Glass Segmentation using Intensity and Spectral Polarization Cues使用强度和光谱偏振线索进行玻璃分割
GAT-CADNet: Graph Attention Network for Panoptic Symbol Spotting in CAD DrawingsGAT-CADNet：用于 CAD 绘图中全景符号定位的图形注意网络
Unsupervised Deraining: Where Contrastive Learning Meets Self-similarity无监督去雨：当对比学习遇到自相似性
Delving into the Estimation Shift of Batch Normalization in a Network深入研究网络中批量标准化的估计偏移
Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light结合双目立体和单目结构光的深度估计
Full-Range Virtual Try-On with Recurrent Tri-Level Transformation具有反复三级转换的全方位虚拟试穿
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation弱监督语义分割的类重新激活图
Generalizing Interactive Backpropagating Refinement for Dense Prediction Networks为密集预测网络推广交互式反向传播细化
Protecting Celebrities from DeepFake with Identity Consistency Transformer使用身份一致性转换器保护名人免受 DeepFake 的影响
SVIP: Sequence VerIfication for Procedures in VideosSVIP：视频中过程的序列验证
Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints to Better Classify Objects in Videos只见树木不见森林：聚合多个视点以更好地分类视频中的对象
Deep Saliency Prior for Reducing Visual Distraction减少视觉干扰的深度显着性先验
ClothFormer: Taming Video Virtual Try-on in All ModuleClothFormer：在所有模块中驯服视频虚拟试穿
FLARF: Fast LArge-scale Radiance Field ReconstructionFLARF：快速大规模辐射场重建
Estimating Structural Disparities in Face Models估计人脸模型中的结构差异
Faithful Extreme Rescaling via Generative Prior Reciprocated Invertible Representations通过生成的先验互易可逆表示进行忠实的极端重新缩放
Animal Kingdom: A Large and Diverse Dataset for Animal Behavior Understanding动物王国：用于理解动物行为的庞大而多样的数据集
Uniform Subdivision of Omnidirectional Camera Space for Efficient Spherical Stereo Matching用于高效球面立体匹配的全向相机空间的均匀细分
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal RetrievalCOTS：用于跨模态检索的协作双流视觉语言预训练模型
Scene Graph Expansion for Semantics-Guided Image Outpainting语义引导图像外画的场景图扩展
Deep Constrained Least Squares for Blind Image Super-Resolution用于盲图像超分辨率的深度约束最小二乘
MaskGIT: Masked Generative Image TransformerMaskGIT：掩码生成式图像 Transformer
CMT: Convolutional Neural Networks Meet Vision TransformersCMT：卷积神经网络遇见视觉转换器
GraftNet: Towards Domain Generalized Stereo Matching with a Broad-Spectrum and Task-Oriented FeatureGraftNet：面向具有广谱和面向任务的特征的域广义立体匹配
SoftGroup for 3D Instance Segmentation on Point CloudsSoftGroup 用于点云上的 3D 实例分割
Partial Class Activation Attention for Semantic Segmentation语义分割的部分类激活注意
AnyFace: Free-style Text-to-Face Synthesis and ManipulationAnyFace：自由风格的文本到人脸合成和操作
PoseKernelLifter: Metric Lifting of 3D Human Pose using SoundPoseKernelLifter：使用声音的 3D 人体姿势的度量提升
LIFT: Learning 4D LiDAR Image Fusion Transformer for 3D Object DetectionLIFT：学习用于 3D 对象检测的 4D LiDAR 图像融合 Transformer
Make It Move: Controllable Image-to-Video Generation with Text Descriptions让它动起来：带有文本描述的可控图像到视频生成
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels使用不可靠伪标签的半监督语义分割
Learning What Not to Segment: A New Perspective on Few-Shot Segmentation学习不分割的内容：关于少样本分割的新视角
TT-VSR: Learning Trajectory-Aware Transformer for Video Super-ResolutionTT-VSR：用于视频超分辨率的学习轨迹感知变压器
Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes规范投票：在 3D 场景中实现稳健的定向边界框检测
DyRep: Bootstrapping Training with Dynamic Re-parameterizationDyRep：使用动态重新参数化的引导训练
VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot LearningVGSE：用于零样本学习的基于视觉的语义嵌入
GreedyNASv2: Greedier Search with a Greedy Path FilterGreedyNASv2：使用贪心路径过滤器的贪心搜索
HDR-NeRF: High Dynamic Range Neural Radiance FieldsHDR-NeRF：高动态范围神经辐射场
Novel-View Object Selection in Neural Volumetric Representations神经体积表示中的新视图对象选择
Relieving Long-tailed Instance Segmentation via Pairwise Class Balance通过 Pairwise Class Balance 减轻长尾实例分割
Complex Video Action Reasoning via Learnable Markov Logic Network基于可学习马尔可夫逻辑网络的复杂视频动作推理
PCL: Proxy-based Contrastive Learning for Domain GeneralizationPCL：基于代理的领域泛化对比学习
Unifying Motion Deblurring and Frame Interpolation with Events将运动去模糊和帧插值与事件统一起来
Shape-invariant 3D Adversarial Point Clouds形状不变的 3D 对抗点云
Learning Pixel-Level Distinctions for Video Highlight Detection学习视频高光检测的像素级区别
Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation小波知识蒸馏：迈向高效的图像到图像转换
ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic SegmentationADAS：多目标域自适应语义分割的直接适应策略
PSTR: End-to-End One-Step Person Search With TransformersPSTR：使用 Transformers 进行端到端的一站式人员搜索
Towards real-world navigation with deep differentiable planners通过深度可微规划器实现现实世界导航
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation用于弱监督语义分割的多类令牌转换器
Fourier Document Restoration for Robust Document Dewarping and Recognition用于鲁棒文档去扭曲和识别的傅里叶文档恢复
Neural RGB-D Surface Reconstruction神经 RGB-D 表面重建
LMGP: Lifted Multicut Meets Geometry Projections for Multi-Camera Multi-Object TrackingLMGP: Lifted Multicut 满足多摄像机多目标跟踪的几何投影
ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and GenerationManiTrans：通过 Token-wise 语义对齐和生成的实体级文本引导图像操作
Spatio-Temporal Gating-Adjacency GCN for Human Motion Prediction用于人体运动预测的时空门控邻接 GCN
What Matters For Meta-Learning Vision Regression Tasks?元学习视觉回归任务的重要性是什么？
Self-supervised Learning of Adversarial Examples: Towards Good Generalizations for Deepfake Detection对抗样本的自监督学习：迈向 Deepfake 检测的良好泛化
Ray Priors through Reprojection: Improving Neural Radiance Fields for Novel View Extrapolation通过重投影的射线先验：改进新视图外推的神经辐射场
Perception Prioritized Training of Diffusion Models扩散模型的感知优先训练
Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving自动驾驶中用于单目 3D 目标检测的伪立体
Human Trajectory Prediction with Momentary Observation基于瞬时观测的人体轨迹预测
General Facial Representation Learning in a Visual-Linguistic Manner以视觉语言方式学习一般面部表征
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions通过大规模视频转录推进高分辨率视频语言表示
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model预测、预防和评估：由预训练的视觉语言模型支持的解耦的文本驱动图像处理
Contextual Outpainting with Object-level Contrastive Learning使用对象级对比学习的上下文外绘
Optical Flow Estimation for Spiking Camera尖峰相机的光流估计
PointCLIP: Point Cloud Understanding by CLIPPointCLIP：通过 CLIP 理解点云
Large scale pre-training for person re-identification with noisy labels带有噪声标签的人员重新识别的大规模预训练
Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection放大和缩小：用于伪装目标检测的混合尺度三元组网络
Blended Diffusion for Text-driven Editing of Natural Images用于自然图像文本驱动编辑的混合扩散
CREAM: Weakly Supervised Object Localization via Class RE-Activation MappingCREAM：通过类重新激活映射进行弱监督对象定位
Finding Fallen Objects Via Asynchronous Audio-Visual Integration通过异步视听集成寻找坠落的物体
HeadNeRF: A Real-time NeRF-Based Parametric Head ModelHeadNeRF：基于实时 NeRF 的参数化头部模型
Interacting Attention Graph for Single Image Two-Hand Reconstruction单幅图像双手重建的交互注意力图
Learning based Multi-modality Image and Video Compression基于学习的多模态图像和视频压缩
DR.VIC: Decomposition and Reasoning for Video Individual CountingDR.VIC：视频个体计数的分解与推理
End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection通用事件边界检测的端到端压缩视频表示学习
BaLeNAS: Differentiable Architecture Search via Bayesian Learning RuleBaLeNAS：通过贝叶斯学习规则进行可微架构搜索
Task Adaptive Parameter Sharing for Multi-Task Learning多任务学习的任务自适应参数共享
ViM: Out-Of-Distribution with Virtual-logit MatchingViM：具有虚拟 logit 匹配的分布外
Pyramid Adversarial Training Improves ViT Performance金字塔对抗训练提高 ViT 表现
Depth-Guided Sparse Structure-from-Motion for Movies and TV Shows电影和电视节目的深度引导稀疏运动恢复结构
Part-based Pseudo Label Refinement for Unsupervised Person Re-identification用于无监督人员重新识别的基于部分的伪标签细化
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment通过基于检索的多粒度对齐进行无监督视觉和语言预训练
MVS2D: Efficient Multi-view Stereo via Attention-Driven 2D ConvolutionsMVS2D：通过注意力驱动的 2D 卷积实现高效的多视图立体
Consistent Explanations by Contrastive Learning对比学习的一致解释
FvOR: Robust Joint Shape and Pose Optimization for Few-view Object ReconstructionFvOR：用于少视图对象重建的稳健形状和姿势联合优化
Contextualized Spatio-Temporal Contrastive Learning with Self-Supervision具有自我监督的情境化时空对比学习
Frame Averaging for Equivariant Shape Space Learning等变形状空间学习的帧平均
iFS-RCNN: An Incremental Few-shot Instance SegmenteriFS-RCNN：增量少样本实例分割器
Bring Evanescent Representations to Life in Lifelong Class Incremental Learning在终身类增量学习中让转瞬即逝的表示焕发生机
Text to Image Generation with Semantic-Spatial Aware GAN使用语义空间感知 GAN 生成文本到图像
Real-Time Light-Weight Near-Field Photometric Stereo实时轻量近场光度立体
DESTR: Object Detection with Split TransformerDESTR：使用拆分 Transformer 进行对象检测
Backdoor Attacks on Self-Supervised Learning对自我监督学习的后门攻击
Diverse Image Outpainting via GAN Inversion通过 GAN Inversion 进行多样化的图像外绘
High-Resolution Image Synthesis with Latent Diffusion Models具有潜在扩散模型的高分辨率图像合成
NFormer: Robust Person Re-identification with Neighbor TransformerNFormer：使用 Neighbor Transformer 进行强大的人员重新识别
Winoground: Probing Vision and Language Models for Visio-Linguistic CompositionalityWinoground：探索视觉语言组合的视觉和语言模型
CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic DataCrossLoc：由多模态合成数据辅助的可扩展空中定位
SceneSqueezer: Learning to Compress Scene for Camera RelocalizationSceneSqueezer：学习压缩场景以进行相机重定位
Dancing under the stars: video denoising in starlight在星空下跳舞：星光下的视频去噪
Tracking People by Predicting 3D Appearance, Location and Pose通过预测 3D 外观、位置和姿势来跟踪人员
BCOT: A Markerless High-Precision 3D Object Tracking BenchmarkBCOT：无标记的高精度 3D 对象跟踪基准
Continual Stereo Matching of Continuous Driving Scenes with Growing Architecture使用可增长架构对连续驾驶场景进行持续立体匹配
CVF-SID: Cyclic multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise from ImageCVF-SID：通过从图像中分离噪声来进行自监督图像去噪的循环多变量函数
Unknown-Aware Object Detection: Learning What You Don’t Know from Videos in the Wild未知感知对象检测：从野外视频中学习你不知道的东西
BodyGAN: General-purpose Controllable Neural Human Body GenerationBodyGAN：通用可控神经人体生成
Training-free Transformer Architecture Search免训练的 Transformer 架构搜索
Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification学习附属：少数样本分类的相互集中学习
Single-Photon Structured Light单光子结构光
Towards Practical Certifiable Patch Defense with Vision Transformer使用 Vision Transformer 实现实用的可认证补丁防御
On Generalizing Beyond Domains in Cross-Domain Continual Learning关于跨域持续学习中的域外泛化
Practical Learned Lossless JPEG Recompression with Multi-Level Cross-Channel Entropy Model in the DCT DomainDCT域中多级跨通道熵模型的实用学习无损JPEG再压缩
GazeOnce: Real-Time Multi-Person Gaze EstimationGazeOnce：实时多人注视估计
RendNet: Unified 2D/3D Recognizer with Latent Space RenderingRendNet：具有潜在空间渲染的统一 2D/3D 识别器
Identifying Ambiguous Similarity Conditions via Semantic Matching通过语义匹配识别模糊相似性条件
Learn from Others and Be Yourself in Heterogeneous Federated Learning向他人学习，在异构联邦学习中做自己
Enhancing Face Recognition with Self-Supervised 3D Reconstruction通过自我监督 3D 重建增强人脸识别
Visual Vibration Tomography: Estimating Interior Material Properties from Monocular Video视觉振动断层扫描：从单目视频估计内部材料特性
ACPL: Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image ClassificationACPL：用于半监督医学图像分类的反课程伪标签
The Two Dimensions of Worst-case Training and the Integrated Effect for Out-of-domain Generalization最坏情况训练的两个维度和域外泛化的综合效果
Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation半监督语义分割的扰动和严格均值教师
Directional Self-supervised Learning for Heavy Image Augmentations用于重图像增强的定向自监督学习
CPPF: Towards Robust Category-Level 9D Pose Estimation in the WildCPPF：在野外实现稳健的类别级 9D 姿势估计
Cross-patch Dense Contrastive Learning for Semi-supervised Segmentation of Cellular Nuclei in Histopathologic Images跨补丁密集对比学习用于组织病理学图像中细胞核的半监督分割
Dual-AI: Dual-path Actor Interaction Learning for Group Activity RecognitionDual-AI：用于群体活动识别的双路径 Actor 交互学习
UCC: Uncertainty guided Cross-head Co-training for Semi-Supervised Semantic SegmentationUCC：用于半监督语义分割的不确定性引导交叉头协同训练
Few-Shot Object Detection with Fully Cross-Transformer使用完全交叉变换器的少样本目标检测
Exploiting Temporal Relations on Radar Perception for Autonomous Driving利用自动驾驶雷达感知的时间关系
Unsupervised Visual Representation Learning by Online Constrained K-Means通过在线约束 K-Means 进行无监督视觉表示学习
Contextual Debiasing for Visual Recognition with Causal Mechanisms具有因果机制的视觉识别的上下文去偏
Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes从野外拥挤的场景中学习估计稳健的 3D 人体网格
Towards Accurate Facial Landmark Detection via Cascaded Transformers通过级联 Transformer 实现准确的面部地标检测
DIP: Deep Inverse Patchmatch for High-Resolution Optical FlowDIP：高分辨率光流的深度逆补丁匹配
Critical Regularizations for Neural Surface Reconstruction in the Wild野外神经表面重建的关键正则化
Per-Clip Video Object Segmentation每剪辑视频对象分割
CAFE: Learning to Condense Dataset by Aligning FeaturesCAFE：通过对齐特征学习压缩数据集
ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via Online Exploration and SynthesisArtiBoost：通过在线探索和合成提升关节式 3D 手物体姿势估计
SphereSR: 360° Image Super-Resolution with Arbitrary Projection via Continuous Spherical Image RepresentationSphereSR：通过连续球面图像表示进行任意投影的 360° 图像超分辨率
Learning to Restore 3D Face from In-the-Wild Degraded Images学习从野外退化图像中恢复 3D 人脸
BEVT: BERT Pretraining of Video TransformersBEVT：视频转换器的 BERT 预训练
A Hybrid Egocentric Activity Anticipation Framework via Memory-Augmented Recurrent and One-shot Representation Forecasting通过记忆增强循环和一次性表示预测的混合以自我为中心的活动预期框架
Sparse Fuse Dense: Towards High Quality 3D Detection with Depth CompletionSparse Fuse Dense：迈向具有深度完成的高质量 3D 检测
MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction DetectionMSTR：用于端到端人与对象交互检测的多尺度转换器
Synthetic Aperture Imaging with Events and Frames具有事件和帧的合成孔径成像
AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot NetworkAP-BSN：通过非对称 PD 和盲点网络对真实世界图像进行自我监督去噪
Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information利用地理和时间信息进行细粒度图像分类的动态 MLP
Lepard: Learning partial point cloud matching in rigid and deformable scenesLepard：在刚性和可变形场景中学习部分点云匹配
Neural Compression-Based Feature Learning for Video Restoration用于视频恢复的基于神经压缩的特征学习
Learning to Collaborate in Decentralized Learning of Personalized Models在个性化模型的分散学习中学习协作
Rethinking Parsing Branch for Human Densepose Estimation重新思考人类密集估计的解析分支
Collaborative Transformers for Grounded Situation Recognition用于接地情况识别的协作 Transformer
ISNet: Shape Matters for Infrared Small Target DetectionISNet：形状对红外小目标检测很重要
Bi-level Doubly Variational Learning for Energy-based Latent Variable Models基于能量的潜变量模型的双层双变分学习
PSMNet: Position-aware Stereo Merging Network for Room Layout EstimationPSMNet：用于房间布局估计的位置感知立体合并网络
Bi-level Alignment for Cross-Domain Crowd Counting跨域人群计数的双层对齐
Unsupervised Homography Estimation with Coplanarity-Aware GAN具有 Coplanarity-Aware GAN 的无监督单应性估计
Real-time Object Detection for Streaming Perception用于流感知的实时对象检测
Neural Window Fully-connected CRFs for Monocular Depth Estimation用于单目深度估计的神经窗口全连接 CRF
Deep Hyperspectral-Depth Reconstruction Using Single Color-Dot Projection使用单色点投影的深高光谱深度重建
Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing用于人脸解析的具有循环自调节的解耦多任务学习
Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon阴影可能很危险：自然现象的隐秘而有效的物理世界对抗性攻击
Towards Understanding Adversarial Robustness of Optical Flow Networks了解光流网络的对抗鲁棒性
Class-Incremental Learning by Knowledge Distillation with Adaptive Feature Consolidation基于自适应特征整合的知识蒸馏的类增量学习
A Continuous Video Generator with the Price, Quality and Perks of StyleGAN2具有 StyleGAN2 的价格、质量和优势的连续视频生成器
Self-Supervised Learning of Object Parts for Semantic Segmentation用于语义分割的对象部分的自监督学习
High-Resolution Image Harmonization via Collaborative Dual Transformations通过协作双变换实现高分辨率图像协调
Slot-VPS: Object-centric Representation Learning for Video Panoptic SegmentationSlot-VPS：用于视频全景分割的以对象为中心的表示学习
FIFO: Learning Fog-invariant Features for Foggy Scene SegmentationFIFO：学习雾景分割的雾不变特征
Forecasting Characteristic 3D Poses of Human Actions预测人类行为的特征 3D 姿势
Equalized Focal Loss for Dense Long-tailed Object Detection用于密集长尾目标检测的均衡焦点损失
Style Neophile: Constantly Seeking Novel Styles for Domain Generalization风格Neophile：不断寻求新的领域泛化风格
Mining Multi-View Information: A Strong Self-Supervised Framework for Depth-based 3D Hand Pose and Mesh Estimation挖掘多视图信息：基于深度的 3D 手部姿势和网格估计的强大自监督框架
The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation魔鬼在标签中：用于鲁棒场景图生成的嘈杂标签校正
Correlation Verification for Image Retrieval图像检索的相关性验证
Exploring Denoised Cross-video Contrast for Weakly-supervised Temporal Action Localization探索用于弱监督时间动作定位的去噪跨视频对比度
UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary DetectionUBoCo：通用事件边界检测的无监督边界对比学习
Multi-View Mesh Reconstruction with Neural Deferred Shading使用神经延迟着色的多视图网格重建
SoftCollage: A Differentiable Probabilistic Tree Generator for Image CollageSoftCollage：用于图像拼贴的可微分概率树生成器
OVE6D: Object Viewpoint Encoding For Depth-based 6D Object Pose EstimationOVE6D：基于深度的 6D 对象姿态估计的对象视点编码
Smooth-Swap: A Simple Enhancement for Face-Swapping with SmoothnessSmooth-Swap：平滑换脸的简单增强
3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection3D-SPS：通过参考点渐进选择的单级 3D 视觉接地
Image Disentanglement Autoencoder for Steganography without Embedding无嵌入隐写术的图像解缠结自动编码器
Gated2Gated: Self-Supervised Depth Estimation from Gated ImagesGated2Gated：门控图像的自我监督深度估计
Interact before Align: Leveraging Cross-Modal Knowledge for Domain Adaptive Action Recognition对齐前交互：利用跨模态知识进行域自适应动作识别
DN-DETR: Accelerate DETR Training by Introducing Query DeNoisingDN-DETR：通过引入查询去噪加速 DETR 训练
The Probabilistic Normal Epipolar Constraint for Frame-To-Frame Rotation Optimization under Uncertain Feature Positions不确定特征位置下帧到帧旋转优化的概率法线对极约束
A Scalable Combinatorial Solver for Elastic Geometrically Consistent 3D Shape Matching用于弹性几何一致 3D 形状匹配的可扩展组合求解器
Enhancing Classifier Conservativeness and Robustness by Polynomiality通过多项式增强分类器的保守性和鲁棒性
Raw High-Definition Radar for Multi-Task Learning用于多任务学习的原始高清雷达
Self-Supervised Image Representation Learning with Geometric Set Consistency具有几何集一致性的自监督图像表示学习
Multi-View Transformer for 3D Visual Grounding用于 3D 视觉接地的多视图 Transformer
Semiconductor Defect Detection by Hybrid Classical-Quantum Deep Learning混合经典-量子深度学习的半导体缺陷检测
Attention Reveals Occlusions注意力揭示遮挡
Revisiting Domain Generalized Stereo Matching Networks from a Feature Consistency Perspective从特征一致性的角度重新审视域广义立体匹配网络
Chi-transformer: Towards Reliable Stereo From CuesChi-transformer：从线索走向可靠的立体视觉
NinjaDesc: Content-Concealing Visual Descriptors via Adversarial LearningNinjaDesc：通过对抗学习隐藏内容的视觉描述符
SwapMix: Diagnosing and Regularizing the Over-reliance on Visual Context in Visual Question AnsweringSwapMix：诊断和规范视觉问答中对视觉上下文的过度依赖
Learning Part Segmentation through Unsupervised Domain Adaptation from Synthetic Vehicles通过合成车辆的无监督域适应学习零件分割
CellTypeGraph: A New Geometric Computer Vision BenchmarkCellTypeGraph：一种新的几何计算机视觉基准
Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning组合零样本学习的连体对比嵌入网络
Reference-based Video Super-Resolution Using Multi-Camera Video Triplets使用多摄像机视频三元组的基于参考的视频超分辨率
End-to-End Semi-Supervised Learning for Video Action Detection视频动作检测的端到端半监督学习
Parameter-free Online Test-time Adaptation无参数在线测试时间自适应
3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces通过小批量特征交换身体和面部的 3D 形状变分自动编码器潜在解缠结
Dual-Key Multimodal Backdoors for Visual Question Answering用于视觉问答的双键多模式后门
Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent from the Decision Boundary Perspective神经网络可以两次学习相同的模型吗？从决策边界的角度研究可重复性和双重下降
RePaint: Inpainting using Denoising Diffusion Probabilistic ModelsRePaint：使用去噪扩散概率模型进行修复
Improving GAN Equilibrium by Raising Spatial Awareness通过提高空间意识来改善 GAN 平衡
Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning超越监督与无监督：图像表示学习的代表性基准测试和分析
A variational Bayesian method for similarity learning in non-rigid image registration非刚性图像配准中相似性学习的变分贝叶斯方法
Task2Sim: Towards Effective Pre-training and Transfer from Synthetic DataTask2Sim：从合成数据实现有效的预训练和迁移
Adaptive Trajectory Prediction via Transferable GNN基于可迁移 GNN 的自适应轨迹预测
Learning to Learn across Diverse Data Biases in Deep Face Recognition学习在深度人脸识别中跨多种数据偏差进行学习
RIDDLE: Lidar Data Compression with Range Image Deep Delta EncodingRIDDLE：激光雷达数据压缩与距离图像深度增量编码
Total Variation Optimization Layers for Computer Vision计算机视觉的总变异优化层
Transforming Model Prediction for Tracking转换模型预测以进行跟踪
Human Mesh Recovery from Multiple Shots从多个镜头中恢复人体网格
FastDOG: Fast Discrete Optimization on GPUFastDOG：GPU 上的快速离散优化
Estimating Example Difficulty using Variance of Gradients使用梯度方差估计示例难度
Closing the Generalization Gap of Cross-silo Federated Medical Image Segmentation缩小跨孤岛联邦医学图像分割的泛化差距
Scale-Equivalent Distillation for Semi-Supervised Object Detection用于半监督目标检测的尺度等效蒸馏
Long-term Visual Map Sparsification with Heterogeneous GNN使用异构 GNN 的长期视觉地图稀疏化
ResSFL: A Resistance Transfer Framework for Defending Model Inversion Attack in Split Federated LearningResSFL：在拆分联邦学习中防御模型反转攻击的阻力转移框架
Fast Point Transformer快速点 Transformer
Sketch3T: Test-time Training for Zero-Shot SBIRSketch3T：零样本 SBIR 的测试时间训练
Generative Flows with Invertible Attentions具有可逆注意力的生成流
ABO: Dataset and Benchmarks for Real-World 3D Object UnderstandingABO：真实世界 3D 对象理解的数据集和基准
A Dual Weighting Label Assignment Scheme for Object Detection一种用于目标检测的双重加权标签分配方案
ADAPT: Vision-Language Navigation with Modality-Aligned Action PromptsADAPT：使用模态对齐动作提示的视觉语言导航
Explore the Spatio-temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline探索非实体对象检测的时空聚合：基准数据集和基线
A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information深入探讨深度时空网络编码的内容：量化静态与动态信息
DGECN: A Depth-Guided Edge Convolutional Network For End-to-End 6D Pose EstimationDGECN：用于端到端 6D 姿态估计的深度引导边缘卷积网络
BNUDC: A Two-Branched Deep Neural Network for Restoring Images from Under-Display CamerasBNUDC：用于从屏下摄像头恢复图像的两分支深度神经网络
Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation朝向更少的注释：通过区域不纯度和预测不确定性进行域自适应语义分割的主动学习
Hallucinated Neural Radiance Fields in the Wild野外幻觉神经辐射场
The Devil is in the Margin: Margin-based Label Smoothing for Network Calibration魔鬼在边缘：用于网络校准的基于边缘的标签平滑
Deep Depth from Focus with Differential Focus Volume基于差分焦点体的深度学习对焦深度估计
Towards Layer-wise Image Vectorization迈向逐层图像矢量化
Robust Federated Learning with Noisy and Heterogeneous Clients具有嘈杂和异构客户端的鲁棒联邦学习
Retrieval-based Spatially Adaptive Normalization for Semantic Image Synthesis用于语义图像合成的基于检索的空间自适应归一化
Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation用于少样本语义分割的动态原型卷积网络
Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training通过时空插值一致性训练进行视频阴影检测
It’s All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher一切尽在老师身上：零样本量化更贴近老师
VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance SegmentationVISOLO：用于高效在线视频实例分割的基于网格的时空聚合
Rethinking Spatial Invariance of Convolutional Networks for Object Counting重新思考用于对象计数的卷积网络的空间不变性
Self-supervised Correlation Mining Network for Person Image Generation用于人物图像生成的自监督相关挖掘网络
ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-high Resolution SegmentationISDNet：集成浅层和深层网络以实现高效的超高分辨率分割
Exploring Effective Data for Surrogate Training Towards Black-box Attack探索针对黑盒攻击的代理训练的有效数据
Contrastive Learning for Space-Time Correspondence via Self-cycle Consistency基于自循环一致性的时空对应对比学习
Accelerating Video Object Segmentation with Compressed Video使用压缩视频加速视频对象分割
Weakly Paired Associative Learning for Sound and Image Representations via Bimodal Associative Memory通过双峰联想记忆进行声音和图像表示的弱配对联想学习
Incremental Cross-view Mutual Distillation for Self-supervised Medical CT Synthesis用于自监督医学 CT 合成的增量交叉视图相互蒸馏
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer并非所有令牌都是平等的：通过令牌聚类转换器进行以人为中心的视觉分析
Non-parametric Depth Distribution Modelling based Depth Inference for Multi-view Stereo基于非参数深度分布建模的多视图立体深度推断
LISA: Learning Implicit Shape and Appearance of HandsLISA：学习手的隐式形状和外观
GIQE: Generic Image Quality Enhancement via $N^{th}$ Order Iterative DegradationGIQE：通过 $N^{th}$ 阶迭代降级的通用图像质量增强
Continual Learning for Visual Search with Backward Consistent Feature Embedding具有向后一致特征嵌入的视觉搜索的持续学习
STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded ScenesSTCrowd：拥挤场景中行人感知的多模态数据集
Differentiable Stereopsis: Meshes from multiple views using differentiable rendering可微立体：使用可微渲染的多个视图的网格
ST++: Make Self-training Work Better for Semi-supervised Semantic SegmentationST++：让自我训练更好地用于半监督语义分割
Arbitrary-Scale Image Synthesis任意尺度图像合成
CRIS: CLIP-Driven Referring Image SegmentationCRIS：CLIP 驱动的参考图像分割
ShapeFormer: Transformer-based Shape Completion via Sparse RepresentationShapeFormer：通过稀疏表示的基于 Transformer 的形状完成
Quantifying Societal Bias Amplification in Image Captioning量化图像字幕中的社会偏见放大
Omni-DETR: Omni-Supervised Object Detection with TransformersOmni-DETR：使用 Transformers 进行全方位监督的目标检测
XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document UnderstandingXYLayoutLM：迈向布局感知多模式网络，以实现视觉丰富的文档理解
Cross-Architecture Self-supervised Video Representation Learning跨架构自监督视频表示学习
Feature Erasing and Diffusion Network for Occluded Person Re-Identification用于遮挡人员重新识别的特征擦除和扩散网络
Styleformer: Transformer based Generative Adversarial Networks with Style VectorStyleformer：基于 Transformer 的具有样式向量的生成对抗网络
A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty基于实例难度的类不平衡分类的再平衡策略
360-Attack: Distortion-Aware Perturbations from Perspective-Views360 度攻击：透视图的失真感知扰动
CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance FieldsCLIP-NeRF：文本和图像驱动的神经辐射场操作
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos模态特定注释视频上多模态动作识别的可学习不相关模态丢失
Cerberus Transformer: Joint Semantic, Affordance and Attribute ParsingCerberus Transformer：联合语义、功能和属性解析
NICE-SLAM: Neural Implicit Scalable Encoding for SLAMNICE-SLAM：SLAM 的神经隐式可扩展编码
FIBA: Frequency-Injection based Backdoor Attack in Medical Image AnalysisFIBA：医学图像分析中基于频率注入的后门攻击
Learning Modal-Invariant and Temporal-Memory for Video-based Visible-Infrared Person Re-Identification学习基于视频的可见红外人员重新识别的模态不变和时间记忆
Continual Predictive Learning from Videos从视频中持续预测学习
BatchFormer: Learning to Explore Sample Relationships for Robust Representation LearningBatchFormer：学习探索样本关系以进行鲁棒表示学习
Learning to Zoom Inside Camera Imaging Pipeline学习放大相机成像管道
TeachAugment: Data Augmentation Optimization Using Teacher KnowledgeTeachAugment：使用教师知识进行数据增强优化
PhyIR: Physics-based Inverse Rendering for Panoramic Indoor ImagesPhyIR：全景室内图像的基于物理的逆向渲染
Finding Good Configurations of Planar Primitives in Unorganized Point Clouds在无组织点云中寻找良好的平面基元配置
Towards Better Understanding Attribution Methods更好地理解归因方法
B-cos Networks: Alignment is All We Need for InterpretabilityB-cos 网络：对齐是我们所需要的可解释性
TO-FLOW: Efficient Continuous Normalizing Flows with Temporal Optimization adjoint with Moving SpeedTO-FLOW：具有时间优化和移动速度的高效连续归一化流
Learning Invisible Markers for Hidden Codes in Offline-to-online Photography学习离线到在线摄影中隐藏代码的隐形标记
Learning Distinctive Margin toward Active Domain Adaptation向主动领域适应学习独特的边际
Adiabatic Quantum Computing for Multi Object Tracking用于多目标跟踪的绝热量子计算
Learnable Lookup Table for Neural Network Quantization神经网络量化的可学习查找表
Artistic Style Discovery With Independent Components独立组件的艺术风格发现
Occlusion-Aware Cost Constructor for Light Field Depth Estimation光场深度估计的遮挡感知成本构造函数
Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning通过序列对比学习的长视频逐帧动作表示
Which Model to Transfer? Finding the Needle in the Growing Haystack要转移哪个模型？在不断增长的干草堆中寻找针
Using 3D Topological Connectivity for Ghost Particle Reduction in Flow Reconstruction在流重构中使用 3D 拓扑连通性减少鬼粒子
Neural Points: Point Cloud Representation with Neural Fields神经点：具有神经场的点云表示
C$^2$AM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic SegmentationC$^2$AM：用于弱监督对象定位和语义分割的类不可知激活图的对比学习
RCP: Recurrent Closest Point for Point CloudRCP：点云的循环最近点
Label, Verify, Correct: A Simple Few-Shot Object Detection Method标注、验证、纠正：一种简单的 Few-Shot 对象检测方法
Weakly-supervised Action Transition Learning for Stochastic Human Motion Prediction用于随机人体运动预测的弱监督动作转换学习
Dual-Generator Face Reenactment双生成器人脸再现
BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active AnnotationBoostMIS：使用自适应伪标签和信息性主动注释促进医学图像半监督学习
InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume RenderingInfoNeRF：少样本神经体渲染的射线熵最小化
Balanced Contrastive Learning for Long-Tailed Visual Recognition长尾视觉识别的平衡对比学习
The Devil is in the Pose: Ambiguity-free 3D Rotation-invariant Learning via Pose-aware Convolution魔鬼在姿态中：通过姿态感知卷积实现无歧义的 3D 旋转不变学习
Partially Does It: Towards Scene-Level FG-SBIR with Partial Input部分做到了：走向带有部分输入的场景级 FG-SBIR
Source-Free Object Detection by Learning to Overlook Domain Style通过学习忽略领域风格进行无源目标检测
Region-Aware Face Swapping区域感知人脸交换
COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked VehiclesCOOPERNAUT：具有协作感知的联网车辆端到端驾驶
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language TasksNLX-GPT：视觉和视觉语言任务中的自然语言解释模型
SkinningNet: Two-Stream Graph Convolutional Neural Network for Skinning Prediction of Synthetic CharactersSkinningNet：用于合成角色蒙皮预测的双流图卷积神经网络
Efficient Large-scale Localization by Global Instance Recognition基于全局实例识别的高效大规模定位
All-photon Polarimetric Time-of-Flight Imaging全光子偏振飞行时间成像
Parametric Scattering Networks参数散射网络
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question AnsweringMuKEA：基于知识的视觉问答的多模态知识提取和积累
Coarse-to-Fine Feature Mining for Video Semantic Segmentation视频语义分割的粗到细特征挖掘
Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation基于自适应相关的级联循环网络的实用立体匹配
Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual Reality用于虚拟现实的强大的以自我为中心的逼真面部表情转移
Rethinking Visual Geo-localization for Large-Scale Applications重新思考大规模应用程序的视觉地理定位
Polymorphic-GAN: Generating Aligned Samples across Multiple Domains with Learned Morph Maps多态 GAN：使用学习的变形图生成跨多个域的对齐样本
Balanced and Hierarchical Relation Learning for One-shot Object Detection用于一次性目标检测的平衡和分层关系学习
High-Fidelity GAN Inversion for Image Attribute Editing用于图像属性编辑的高保真 GAN 反演
Killing Two Birds with One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC一石二鸟：Partial FC 对人脸识别 CNN 的高效稳健训练
I M Avatar: Implicit Morphable Head Avatars from VideosIM Avatar：来自视频的隐式可变形头像
Proactive Image Manipulation Detection主动图像处理检测
Text Spotting Transformers文本识别 Transformer
Learning a Structured Latent Space for Unsupervised Point Cloud Completion学习用于无监督点云完成的结构化潜在空间
PCA-Based Knowledge Distillation Towards Lightweight and Content-Style Balanced Photorealistic Style Transfer Models面向轻量级和内容风格平衡的真实感风格转移模型的基于 PCA 的知识蒸馏
Grounding Answers for Visual Questions Asked by Visually Impaired People视障人士提出的视觉问题的基本答案
Efficient Classification of Very Large Images with Tiny Objects对具有微小对象的超大图像进行有效分类
Leveraging Adversarial Examples to Quantify Membership Information Leakage利用对抗性示例量化成员信息泄露
Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks迈向深度神经网络的实际部署阶段后门攻击
When to Prune? A Policy towards Early Structural Pruning什么时候修剪？早期结构修剪政策
Robust Optimization as Data Augmentation for Large-scale Graphs鲁棒优化作为大规模图的数据增强
Sylph: A Hypernetwork Framework for Incremental Few-shot Object DetectionSylph：用于增量少样本目标检测的超网络框架
Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis视听语音编解码器：通过重新合成重新思考视听语音增强
Harmony: A Generic Unsupervised Approach for Disentangling Semantic Content from Parameterized TransformationsHarmony：一种从参数化转换中分离语义内容的通用无监督方法
The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement良好握手的隐含价值：手持式多帧神经深度细化
Noise2NoiseFlow: Realistic Camera Noise Modeling without Clean ImagesNoise2NoiseFlow：没有干净图像的逼真的相机噪声建模
MetaPose: Fast 3D Pose from Multiple Views without 3D SupervisionMetaPose：无需 3D 监督即可从多个视图快速生成 3D 姿势
Virtual Elastic Objects虚拟弹性对象
StyleSDF: High-Resolution 3D-Consistent Image and Geometry GenerationStyleSDF：高分辨率 3D 一致图像和几何生成
Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning重新思考解决联邦学习中数据异构性的架构设计
Self-supervised Neural Articulated Shape and Appearance Models自监督神经关节形状和外观模型
A Self-Supervised Descriptor for Image Copy Detection用于图像复制检测的自监督描述符
Rethinking Deep Face Restoration重新思考深层面部修复
Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes用于室内场景中单图像逆向渲染的密集视觉转换器
Rethinking Controllable Variational Autoencoders重新思考可控变分自编码器
Convolutions for Spatial Interaction Modeling空间交互建模的卷积
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization以自我为中心的深度多通道视听主动说话人定位
AdaFace: Quality Adaptive Margin for Face RecognitionAdaFace：人脸识别的质量自适应余量
Towards End-to-End Unified Scene Text Detection and Layout Analysis走向端到端统一场景文本检测和布局分析
Active Learning by Feature Mixing通过特征混合进行主动学习
Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs分类然后接地：将视频场景图重新格式化为时间二分图
Towards Better Plasticity-Stability Trade-off in Incremental Learning: A Simple Linear Connector在增量学习中实现更好的可塑性-稳定性权衡：一个简单的线性连接器
Cloth-Changing Person Re-identification from A Single Image with Gait Prediction and Regularization基于步态预测和正则化的单幅图像换装行人重识别
SpaceEdit: Learning a Unified Editing Space for Open-Domain Image EditingSpaceEdit：学习开放域图像编辑的统一编辑空间
Learning to Answer Questions in Dynamic Audio-Visual Scenarios学习在动态视听场景中回答问题
Non-generative Generalized Zero-shot Learning via Task-correlated Disentanglement and Controllable Samples Synthesis通过任务相关解开和可控样本合成的非生成广义零样本学习
Knowledge-Driven Self-Supervised Representation Learning for Facial Action Unit Recognition用于面部动作单元识别的知识驱动的自监督表示学习
Coupling Vision and Proprioception for Navigation of Legged Robots耦合视觉和本体感知的腿式机器人导航
URetinex-Net: Retinex-based Deep Unfolding Network for Low-light Image EnhancementURetinex-Net：用于弱光图像增强的基于 Retinex 的深度展开网络
Modeling Image Composition for Complex Scene Generation为复杂场景生成建模图像合成
Think Twice Before Detecting GAN-generated Fake Images from their Spectral Domain Imprints在从光谱域印记中检测 GAN 生成的假图像之前三思而后行
Undoing the Damage of Label Shift for Cross-domain Semantic Segmentation消除标签移位对跨域语义分割的损害
Implicit Motion Handling for Video Camouflaged Object Detection视频伪装对象检测的隐式运动处理
Contrastive Conditional Neural Processes对比条件神经过程
Exploring Set Similarity for Dense Self-supervised Representation Learning探索密集自监督表示学习的集合相似性
E2V-SDE: From Asynchronous Events to Fast and Continuous Video Reconstruction via Neural Stochastic Differential EquationsE2V-SDE：通过神经随机微分方程从异步事件到快速连续视频重建
Catching Both Gray and Black Swans: Open-set Supervised Anomaly Detection捕捉灰天鹅和黑天鹅：开放集监督异常检测
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal PretrainingM5Product：电子商务多模态预训练的自协调对比学习
CycleMix: A Holistic Strategy for Medical Image Segmentation from Scribble SupervisionCycleMix：Scribble Supervision 医学图像分割的整体策略
Mixed Multimodal Tokens for Vision Transformers视觉转换器的混合多模式令牌
Rethinking the Augmentation Module in Contrastive Learning: Learning Hierarchical Augmentation Invariance with Expanded Views重新思考对比学习中的增强模块：使用扩展视图学习分层增强不变性
AirObject: A Temporally Evolving Graph Embedding for Object IdentificationAirObject：用于对象识别的时间演化图嵌入
Balanced Multimodal Learning via On-the-fly Gradient Modulation通过动态梯度调制平衡多模态学习
Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localizationRay3D：用于单目绝对 3D 定位的基于射线的 3D 人体姿态估计
Computing Wasserstein-$p$ Distance Between Images with Linear Cost使用线性成本计算图像之间的 Wasserstein-$p$ 距离
Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video捕捉运动中的人类：来自单目视频的时间注意 3D 人体姿势和形状估计
Feature Statistics Mixing Regularization for Generative Adversarial Networks生成对抗网络的特征统计混合正则化
Expressive Talking Head Generation with Granular Audio-Visual Control具有精细视听控制的富有表现力的说话头生成器
Geometric Anchor Correspondence Mining with Uncertainty Modelling for Universal Domain Adaptation具有不确定性建模的几何锚点对应挖掘用于通用域自适应
OSSO: Obtaining Skeletal Shape from OutsideOSSO：从外部获取骨骼形状
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs你怎么做呢？使用伪副词进行细粒度的动作理解
GIRAFFE HD: A High-Resolution 3D-aware Generative ModelGIRAFFE HD：高分辨率 3D 感知生成模型
Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism通过原型任务相关引导门控机制进行连续目标检测
Pixel screening based intermediate correction for blind deblurring基于像素筛选的盲去模糊中间校正
LAS-AT: Adversarial Training with Learnable Attack StrategyLAS-AT：具有可学习攻击策略的对抗性训练
Eigenlanes: Data-Driven Lane Descriptors for Structurally Diverse LanesEigenlanes：结构多样化车道的数据驱动车道描述符
Moving Window Regression: A Novel Approach to Ordinal Regression移动窗口回归：序数回归的一种新方法
SC^2-PCR: A Second Order Spatial Compatibility for Efficient and Robust Point Cloud RegistrationSC^2-PCR：高效和稳健点云配准的二阶空间兼容性
APRIL: Finding the Achilles’ Heel on Privacy Leakage for Vision TransformersAPRIL：寻找视觉 Transformer 隐私泄露的致命弱点
Eigencontours: Novel Contour Descriptors Based on Low-Rank Approximation特征轮廓：基于低秩近似的新型轮廓描述符
Cross-modal Background Suppression for Audio-Visual Event Localization用于视听事件定位的跨模态背景抑制
WebQA: Multihop and Multimodal QAWebQA：多跳和多模式 QA
Fairness-aware Adversarial Perturbation Towards Bias Mitigation for Deployed Deep Models对已部署深度模型的偏差缓解的公平感知对抗性扰动
Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation用于多人 3D 姿势估计的分布感知单阶段模型
Active Learning for Open-set Annotation开放集注释的主动学习
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance SegmentationE2EC：一种基于端到端轮廓的高质量高速实例分割方法
Self-Supervised Arbitrary-Scale Point Clouds Upsampling via Implicit Neural Representation通过隐式神经表示的自监督任意尺度点云上采样
Relative Pose from a Calibrated and an Uncalibrated Smartphone Image校准和未校准智能手机图像的相对姿势
Learning Optical Flow with Kernel Patch Attention使用核块注意力学习光流
Contrastive Learning for Unsupervised Video Highlight Detection用于无监督视频亮点检测的对比学习
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image PriorISNAS-DIP：用于深度图像先验的图像特定神经架构搜索
MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Videos Similarity EvaluationMVSE：用于多模态视频相似度评估的大规模基准数据集
Discrete time convolution for fast event-based stereo用于快速基于事件的立体匹配的离散时间卷积
Proper Reuse of Image Classification Features Improves Object Detection正确重用图像分类特征可提高目标检测
Object-Region Video Transformers对象区域视频转换器
Vision-Language Pre-Training for Boosting Scene Text Detectors增强场景文本检测器的视觉语言预训练
Bandits for Structure Perturbation-based Black-box Attacks to Graph Neural Networks with Theoretical Guarantees具有理论保证的、针对图神经网络的基于结构扰动黑盒攻击的多臂老虎机方法
Revisiting Large Kernel Design in Convolutional Neural Networks重新审视卷积神经网络中的大卷积核设计
Generating High Fidelity Data from Low-density Regions using Diffusion Models使用扩散模型从低密度区域生成高保真数据
Colar: Effective and Efficient Online Action Detection by Consulting ExemplarsColar：通过咨询示例进行有效且高效的在线动作检测
Learning Visual-Semantic Explanations of Deep Visual Latent Representations学习深度视觉潜在表示的视觉语义解释
StyleMesh: Style Transfer for Indoor 3D Scene ReconstructionsStyleMesh：室内 3D 场景重建的风格转移
Probing Representation Forgetting in Supervised and Unsupervised Continual Learning探索有监督和无监督持续学习中的表征遗忘
Light Field Neural Rendering光场神经渲染
ROCA: Robust CAD Model Retrieval and Alignment from a Single ImageROCA：基于单张图像的鲁棒 CAD 模型检索与对齐
Pix2NeRF: Unsupervised Conditional pi-GAN for Single Image to Neural Radiance Fields TranslationPix2NeRF：用于单图像到神经辐射场转换的无监督条件 pi-GAN
Non-Iterative Recovery from Nonlinear Observations using Generative Models使用生成模型从非线性观测中进行非迭代恢复
Forecasting from LiDAR via Future Object Detection通过未来目标检测从 LiDAR 进行预测
Towards Total Recall in Industrial Anomaly Detection工业异常检测中的全面召回
Low-Resource Adaptation for Personalized Co-Speech Gesture Generation用于个性化共同语音手势生成的低资源适应
Integrating Language Guidance into Vision-based Deep Metric Learning将语言指导集成到基于视觉的深度度量学习中
Non-isotropy Regularization for Proxy-based Deep Metric Learning基于代理的深度度量学习的非各向同性正则化
Estimating Egocentric 3D Human Pose in the Wild with External Weak Supervision在外部弱监督下估计以自我为中心的 3D 人体姿势
Less is More: Generating Grounded Navigation Instructions from Landmarks少即是多：从地标生成有依据的导航指令
Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis自动合成多种弱监督源进行行为分析
Performance-Aware Mutual Knowledge Distillation for Improving Neural Architecture Search改进神经架构搜索的性能感知互知识蒸馏
End-to-End Reconstruction-Classification Learning for Face Forgery Detection人脸伪造检测的端到端重建分类学习
UKPGAN: A General Self-Supervised Keypoint DetectorUKPGAN：一种通用的自我监督关键点检测器
C2SLR: Consistency-enhanced Continuous Sign Language RecognitionC2SLR：一致性增强的连续手语识别
Boosting Black-Box Attack with Partially Transferred Conditional Adversarial Distribution使用部分转移的条件对抗分布增强黑盒攻击
Style Transformer for Image Inversion and Editing用于图像反转和编辑的样式转换器
Uformer: A General U-Shaped Transformer for Image RestorationUformer：用于图像恢复的通用 U 形变压器
Speech Driven Tongue Animation语音驱动的舌头动画
DO-GAN: A Double Oracle Framework for Generative Adversarial NetworksDO-GAN：用于生成对抗网络的双 Oracle 框架
IntentVizor: Towards Generic Query Guided Interactive Video SummarizationIntentVizor：迈向通用查询引导的交互式视频摘要
Self-supervised Deep Image Restoration via Adaptive Stochastic Gradient Langevin Dynamics基于自适应随机梯度朗之万动力学的自监督深度图像恢复
Sound-Guided Semantic Image Manipulation声音引导的语义图像编辑
Adaptive Gating for Single-Photon 3D Imaging单光子 3D 成像的自适应选通
Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection目标感知双对抗学习和多场景多模态基准融合红外和可见光进行目标检测
GaTector: A Unified Framework for Gaze Object PredictionGaTector：凝视对象预测的统一框架
Multi-Level Representation Learning with Semantic Alignment for Referring Video Object Segmentation具有语义对齐的多级表示学习用于参考视频对象分割
Anomaly Detection via Reverse Distillation from One-Class Embedding通过一类嵌入的反向蒸馏进行异常检测
Dynamic 3D Gaze from Afar: Deep Gaze Estimation from Temporal Eye-Head-Body Coordination远方动态 3D 凝视：时间眼-头-身协调的深度凝视估计
Maximum Consensus by Weighted Influences of Monotone Boolean Functions单调布尔函数加权影响的最大共识
Beyond Fixation: Dynamic Window Visual Transformer超越固定：动态窗口视觉转换器
Dressing in the Wild by Watching Dance Videos通过观看舞蹈视频实现真实场景中的换装
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers从注意力中学习亲和力：使用 Transformers 的端到端弱监督语义分割
Contrastive Boundary Learning for Point Cloud Segmentation点云分割的对比边界学习
Proto2Proto: Can you recognize the car, the way I do?Proto2Proto：你能像我一样认出这辆车吗？
Bridged Transformer for Vision and Point Cloud 3D Object Detection用于视觉和点云 3D 对象检测的桥接变压器
V2C: Visual Voice CloningV2C：视觉语音克隆
An Efficient Training Approach for Very Large Scale Face Recognition一种有效的超大规模人脸识别训练方法
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and EditingSemanticStyleGAN：学习用于可控图像合成和编辑的组合生成先验
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text RecognitionSwinTextSpotter：通过文本检测和文本识别之间更好的协同作用进行场景文本定位
Task Discrepancy Maximization for Fine-grained Few-Shot Classification细粒度小样本分类的任务差异最大化
Reflection and Rotation Symmetry Detection via Equivariant Learning基于等变学习的反射和旋转对称检测
Self-Supervised Equivariant Learning for Oriented Keypoint Detection面向关键点检测的自监督等变学习
Improving the Transferability of Targeted Adversarial Examples through Object-Based Diverse Input通过基于对象的多样化输入提高目标对抗样本的可迁移性
3DeformRS: Certifying Spatial Deformations on Point Clouds3DeformRS：证明点云上的空间变形
DiGS: Divergence guided shape implicit neural representation for unoriented point cloudsDiGS：无向点云的散度引导形状隐式神经表示
UNICON: Combating Label Noise Through Uniform Selection and Contrastive LearningUNICON：通过统一选择和对比学习来对抗标签噪声
Vision Transformer with Deformable Attention具有可变形注意力的视觉转换器
Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation用于高效 3DCG 背景创建的多样化合理 360 度图像外绘
Industrial Style Transfer with Large-scale Geometric Warping and Content Preservation具有大规模几何变形和内容保留的工业风格转移
Hierarchical Modular Network for Video Captioning用于视频字幕的分层模块化网络
Optimal LED Spectral Multiplexing for NIR2RGB Translation用于 NIR2RGB 转换的最佳 LED 光谱复用
Exploring Frequency Adversarial Attacks for Face Forgery Detection探索用于面部伪造检测的频率对抗攻击
LAR-SR: A Local Autoregressive Model for Image Super ResolutionLAR-SR：图像超分辨率的局部自回归模型
What do navigation agents learn about their environment?导航智能体对其环境学到了什么？
HOP: History-and-Order Aware Pre-training for Vision-and-Language NavigationHOP：视觉和语言导航的历史和顺序感知预训练
Entropy-based Active Learning for Object Detection with Progressive Diversity Constraint基于熵的渐进多样性约束目标检测主动学习
Class Similarity Weighted Knowledge Distillation for Continual Semantic Segmentation用于连续语义分割的类相似度加权知识蒸馏
Swin Transformer V2: Scaling Up Capacity and ResolutionSwin Transformer V2：扩大容量和分辨率
Knowledge Distillation via the Target-aware Transformer通过目标感知转换器进行知识蒸馏
Sparse Object-level Supervision for Instance Segmentation with Pixel Embeddings具有像素嵌入的实例分割的稀疏对象级监督
Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources通过在线资源对上下文外图像进行开放域、基于内容、多模式的事实检查
Exemplar-based Pattern Synthesis with Implicit Periodic Field Network具有隐式周期性场网络的基于示例的模式合成
RigidFlow: Self-Supervised Scene Flow Learning on Point Clouds by Local Rigidity PriorRigidFlow：通过局部刚性先验在点云上进行自我监督的场景流学习
Weakly Supervised Segmentation on Outdoor 4D point clouds with Temporal Matching and Spatial Graph Propagation具有时间匹配和空间图传播的室外 4D 点云的弱监督分割
E^2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action RecognitionE^2(GO)MOTION：用于以自我为中心的动作识别的运动增强事件流
Ego4D: Around the World in 3,000 Hours of Egocentric VideoEgo4D：3000 小时以自我为中心的视频环游世界
Spiking Transformers for Event-based Single Object Tracking用于基于事件的单对象跟踪的脉冲 Transformer
Few-Shot Incremental Learning for Label-to-Image Translation用于标签到图像翻译的少量增量学习
CD^2-pFed: Cyclic Distillation-guided Channel Decoupling for Model Personalization in Federated LearningCD^2-pFed：循环蒸馏引导的联合学习中模型个性化的通道解耦
OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution GeneralizationOoD-Bench：量化和理解分布外泛化的两个维度
Speed up Object Detection on Gigapixel-level Image with Patch Arrangement使用补丁排列加速千兆像素级图像的目标检测
Learning Adaptive Warping for Real-World Rolling Shutter Correction学习自适应翘曲以进行真实世界的卷帘快门校正
Robust and Accurate Superquadric Recovery: a Probabilistic Approach稳健且准确的超二次曲面恢复：一种概率方法
SimVP: Simpler yet Better Video PredictionSimVP：更简单但更好的视频预测
Hyperspherical Consistency Regularization超球面一致性正则化
Dense Depth Priors for Neural Radiance Fields from Sparse Input Views来自稀疏输入视图的神经辐射场的密集深度先验
HyperInverter: Improving StyleGAN Inversion via HypernetworkHyperInverter：通过超网络改进 StyleGAN 反转
Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection多源域自适应目标检测的目标相关知识保存
Whose Hands are These? Hand Detection and Hand-Body Association in the Wild这些是谁的手？野外手部检测与手体关联
Blind Face Restoration via Integrating Face Shape and Generative Priors通过整合人脸形状和生成先验的盲人脸恢复
Multimodal Material Segmentation多模态材料分割
Do explanation methods explain? Model knows best解释方法能解释吗？模型最清楚
Deep Hybrid Models for Out-of-Distribution Detection分布外检测的深度混合模型
Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetics用于视觉语义算术的零样本图像到文本生成
Detecting Camouflaged Object in Frequency Domain在频域中检测伪装对象
Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection在人-物交互检测的交互建议中探索结构感知 Transformer
Appearance and Structure Aware Robust Deep Visual Graph Matching: Attack, Defense and Beyond外观和结构感知鲁棒的深度视觉图匹配：攻击、防御及超越
PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation with Photometrically Challenging ObjectsPhoCaL：用于具有光度挑战性物体的类别级物体姿态估计的多模态数据集
HINT: Hierarchical Neuron Concept ExplainerHINT：分层神经元概念解释器
Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces from 3D MRI Scans with Geometric Deep Neural NetworksVox2Cortex：使用几何深度神经网络从 3D MRI 扫描中快速显式重建皮质表面
Generative Cooperative Learning for Unsupervised Video Anomaly Detection用于无监督视频异常检测的生成式协作学习
Panoptic, Instance and Semantic Relations: A Relational Context Encoder to Enhance Panoptic Segmentation全景、实例和语义关系：增强全景分割的关系上下文编码器
Object-Relation Reasoning Graph for Action Recognition动作识别的对象关系推理图
Lifelong Graph Learning终身图学习
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation用于手语翻译的简单多模态迁移学习基线
Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture SearchArch-Graph：用于任务可转移神经架构搜索的非循环架构关系预测器
Rethinking Minimal Sufficient Representation in Contrastive Learning重新思考对比学习中的最小充分表示
Physical Simulation Layer for Accurate 3D Modeling用于精确 3D 建模的物理模拟层
Image Animation with Perturbed Masks带有扰动蒙版的图像动画
Sparse to Dense Dynamic 3D Facial Expression Generation稀疏到密集的动态 3D 面部表情生成
AIM: an Auto-Augmenter for Images and MeshesAIM：图像和网格的自动增强器
PlanarRecon: Real-time 3D Plane Detection and Reconstruction from Posed Monocular VideosPlanarRecon：实时 3D 平面检测和从姿势单目视频重建
Modular Action Concept Grounding in Semantic Video Prediction语义视频预测中的模块化动作概念定位
Generating Representative Samples for Few-Shot Classification为 Few-Shot 分类生成代表性样本
SurfEmb: Dense and Continuous Correspondence Distributions for Object Pose Estimation with Learnt Surface EmbeddingsSurfEmb：具有学习表面嵌入的对象姿态估计的密集和连续对应分布
Sequential Voting with Relational Box Fields for Active Object Detection用于主动对象检测的带有关系框字段的顺序投票
Are Multimodal Transformers Robust to Missing Modality?多模态变压器对缺失模态具有鲁棒性吗？
Debiased Learning from Naturally Imbalanced Pseudo-Labels从自然不平衡的伪标签中进行去偏学习
Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos多视图教学视频中的弱监督在线动作分割
Learning to deblur using light field generated and real defocus images学习使用生成的光场和真实的散焦图像去模糊
TOAD: Topologically-Aware Deformation Fields for Single-view 3D ReconstructionTOAD：用于单视图 3D 重建的拓扑感知变形场
An Empirical Study of Training End-to-End Vision-and-Language Transformers训练端到端视觉-语言 Transformer 的实证研究
PLAD: Learning to Infer Shape Programs with Pseudo-Labels and Approximate DistributionsPLAD：学习用伪标签和近似分布推断形状程序
The Neurally-Guided Shape Parser: Grammar-based Labeling of 3D Shape Regions with Approximate Inference神经引导的形状解析器：具有近似推理的 3D 形状区域的基于语法的标记
Imposing Consistency for Optical Flow Estimation为光流估计施加一致性
Generating Diverse 3D Reconstructions from a Single Occluded Face Image从单个被遮挡的人脸图像生成不同的 3D 重建
RecDis-SNN: Rectifying Membrane Potential Distribution for Directly Training Spiking Neural NetworksRecDis-SNN：用于直接训练脉冲神经网络的膜电位分布校正
3D Moments from Near-Duplicate Photos来自近乎重复的照片的 3D 时刻
CLIP-Forge: Towards Zero-Shot Text-to-Shape GenerationCLIP-Forge：迈向零样本文本到形状生成
MatteFormer: Transformer-Based Image Matting via Prior-TokensMatteFormer：通过 Prior-Tokens 进行基于 Transformer 的图像抠图
Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable PrototypesDeformable ProtoPNet：使用可变形原型的可解释图像分类器
Learning Bayesian Sparse Networks with Full Experience Replay for Continual Learning学习具有完整体验重放的贝叶斯稀疏网络以进行持续学习
Category-Aware Transformer Network for Better Human-Object Interaction Detection类别感知变压器网络，用于更好的人-物交互检测
Segment, Magnify and Reiterate: Detecting Camouflaged Objects the Hard Way分割、放大和重复：以艰难的方式检测伪装的物体
UNIST: Unpaired Neural Implicit Shape-to-Shape TranslationUNIST：非配对神经隐式形状到形状转换
REGTR: End-to-end Point Cloud Correspondences with TransformersREGTR：基于 Transformer 的端到端点云对应
Show, Deconfound and Tell: Image Captioning with Causal Inference展示、解惑和讲述：带有因果推理的图像字幕
DeepFake Disrupter: The Detector of DeepFake Is My FriendDeepFake Disrupter：DeepFake 的检测器是我的朋友
Lite Vision Transformer with Enhanced Self-Attention增强自注意力的 Lite Vision Transformer
Bi-directional Object-context Prioritization Learning for Saliency Ranking显著性排名的双向对象上下文优先级学习
OSKDet: Orientation-sensitive Keypoint Localization for Rotated Object DetectionOSKDet：用于旋转目标检测的方向敏感关键点定位
Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification用于细粒度视觉分类和对象重新识别的双重交叉注意学习
Invariant Grounding for Video Question Answering视频问答的不变定位
Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning通过非 IID 联邦学习的无数据知识蒸馏微调全局模型
Learning Robust Image-Based Rendering on Sparse Scene Geometry via Depth Completion通过深度补全学习基于稀疏场景几何的鲁棒图像渲染
FENeRF: Face Editing in Neural Radiance FieldsFENeRF：神经辐射场中的人脸编辑
A Probabilistic Graphical Model Based on Neural-symbolic Reasoning for Visual Relationship Detection基于神经符号推理的视觉关系检测概率图形模型
CVNet: Contour Vibration Network for Building ExtractionCVNet：用于建筑物提取的轮廓振动网络
What to Look at and Where: Semantic and Spatial Refined Transformer for Detecting Human-Object Interactions看什么和在哪里看：用于检测人与物体交互的语义和空间精炼变压器
Nested Hyperbolic Spaces for Dimensionality Reduction and Hyperbolic NN Design用于降维和双曲 NN 设计的嵌套双曲空间
ABPN: Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution PhotoABPN：用于超高分辨率照片实时局部修饰的自适应混合金字塔网络
Does Robustness on ImageNet Transfer to Downstream Tasks?ImageNet 的鲁棒性是否会转移到下游任务？
Crowd Counting in the Frequency Domain频域中的人群计数
SimMIM: A Simple Framework for Masked Image ModelingSimMIM：蒙版图像建模的简单框架
GrainSpace: A Large-scale Dataset for Fine-grained and Domain-adaptive Recognition of Cereal GrainsGrainSpace：用于细粒度和域自适应识别谷物的大规模数据集
End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps基于占用网格图的端到端轨迹分布预测
MPViT: Multi-Path Vision Transformer for Dense PredictionMPViT：用于密集预测的多路径视觉转换器
Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer记住差异：通过元记忆迁移的跨域小样本语义分割
ARCS: Accurate Rotation and Correspondence SearchARCS：准确的旋转和对应搜索
Ranking Distance Calibration for Cross-Domain Few-Shot Learning跨域小样本学习的排名距离校准
MetaFSCIL: A Meta-Learning Approach for Few-Shot Class Incremental LearningMetaFSCIL：少样本增量学习的元学习方法
Fisher Information Guidance for Learned Time-of-Flight Imaging用于学习型飞行时间成像的 Fisher 信息引导
Joint Video Summarization and Moment Localization by Cross-Task Sample Transfer跨任务样本转移的联合视频摘要和时刻定位
MotionAug: Augmentation with Physical Correction for Human Motion PredictionMotionAug：用于人体运动预测的物理校正增强
Deep Color Consistent Network for Low-Light Image Enhancement用于低光图像增强的深度颜色一致性网络
Non-Probability Sampling Network for Stochastic Human Trajectory Prediction用于随机人体轨迹预测的非概率采样网络
GCFSR: a Generative and Controllable Face Super Resolution Method Without Facial and GAN PriorsGCFSR：一种没有面部和 GAN 先验的生成且可控的面部超分辨率方法
Improving Adversarial Transferability via Neuron Attribution-Based Attacks通过基于神经元归因的攻击提高对抗性可转移性
HiVT: Hierarchical Vector Transformer for Multi-Agent Motion PredictionHiVT：用于多智能体运动预测的分层矢量变换器
Pooling Revisited: Your Receptive Field is Sub-optimal重新审视池化：你的感受野不是最佳的
Compressing Models with Few Samples: Mimicking then Replacing使用少量样本压缩模型：模仿然后替换
Shape from Thermal Radiation: Passive Ranging Using Multi-spectral LWIR Measurements从热辐射恢复形状：使用多光谱 LWIR 测量的被动测距
Layered Depth Refinement with Mask Guidance使用掩模引导进行分层深度细化
Highly-efficient Incomplete Large-scale Multi-view Clustering with Consensus Bipartite Graph基于共识二分图的高效不完全大规模多视图聚类
Scaling Up Vision-Language Pretraining for Image Captioning扩大图像字幕的视觉语言预训练
Optimal Correction Cost for Object Detection Evaluation目标检测评估的最佳校正成本
Deformable Video Transformer可变形视频变压器
High-fidelity Monocular Human Reconstruction by Combining Implicit and Explicit Representations结合隐式和显式表示的高保真单目人体重建
Nonlocal Sparse CRF非局部稀疏 CRF
Long-Short Temporal Contrastive Learning of Video Transformers视频变压器的长短时间对比学习
QS-Attn: Query-Selected Attention for Contrastive Learning in I2I TranslationQS-Attn：I2I 翻译中对比学习的查询选择注意
All-In-One Image Restoration for Unknown Corruption针对未知损坏的多合一图像恢复
Learning to Detect Scene Landmarks for Camera Localization学习检测场景地标以进行相机定位
WildNet: Learning Domain Generalized Semantic Segmentation from the WildWildNet：从野外学习领域广义语义分割
Pushing the Envelope of Gradient Boosting Forests via Globally-Optimized Oblique Trees通过全局优化的斜树推动梯度提升森林的范围
Egocentric Scene Understanding via Multimodal Spatial Rectifier通过多模态空间整流器理解以自我为中心的场景
OSSGAN: Open-Set Semi-Supervised Image GenerationOSSGAN：开放集半监督图像生成
Large-scale Video Panoptic Segmentation in the Wild: A Benchmark野外大规模视频全景分割：基准
Unsupervised Representation Learning for Binary Networks by Joint Classifier Learning联合分类器学习的二元网络无监督表示学习
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Searchβ-DARTS：可微架构搜索的 Beta-Decay 正则化
Stereo Depth from Events Cameras: Concentrate and Focus on the Future事件摄像机的立体深度：专注于未来
Transferable Sparse Adversarial Attack可转移稀疏对抗攻击
FAM: Visual Explanations for the Feature Representations from Deep Convolutional NetworksFAM：来自深度卷积网络的特征表示的视觉解释
Noise-Aware NeRFs for Burst-Denoising用于突发去噪的噪声感知 NeRF
Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point CloudsVoxel Set Transformer：从点云进行 3D 对象检测的 Set-to-Set 方法
Bayesian Invariant Risk Minimization贝叶斯不变风险最小化
Extracting Triangular 3D Models, Materials, and Lighting From Images从图像中提取三角形 3D 模型、材质和照明
RelTransformer: A Transformer-Based Long-Tail Visual Relationship RecognitionRelTransformer：一种基于 Transformer 的长尾视觉关系识别
Transformer-empowered Multi-scale Contextual Matching and Aggregation for Multi-contrast MRI Super-resolution变压器赋能的多尺度上下文匹配和聚合用于多对比度 MRI 超分辨率
SphericGAN: Semi-supervised Hyper-spherical Generative Adversarial Networks for Fine-grained Image SynthesisSphericGAN：用于细粒度图像合成的半监督超球面生成对抗网络
LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture RecognitionLD-ConGR：用于长距离连续手势识别的大型 RGB-D 视频数据集
Unifying Panoptic Segmentation for Autonomous Driving统一自动驾驶全景分割
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image CaptioningVisualGPT：用于图像字幕的预训练语言模型的数据高效适应
Interspace Pruning: Using Adaptive Filter Representations to Improve Training of Sparse CNNs空间修剪：使用自适应滤波器表示来改进稀疏 CNN 的训练
NightLab: A Dual-level Architecture with Hardness Detection for Segmentation at NightNightLab：用于夜间分割的具有硬度检测的双级架构
Learning to Memorize Feature Hallucination for One-Shot Image Generation学习记忆特征幻觉以进行单样本图像生成
FedCorr: Multi-Stage Federated Learning for Label Noise CorrectionFedCorr：标签噪声校正的多阶段联合学习
GeoNeRF: Generalizing NeRF with Geometry PriorsGeoNeRF：使用几何先验概括 NeRF
Neural 3D Video Synthesis神经 3D 视频合成
TransforMatcher: Match-to-Match Attention for Semantic CorrespondenceTransforMatcher：语义对应的匹配注意
Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting表示、比较和学习：用于类不可知计数的相似性感知框架
AxIoU: An Axiomatically Justified Measure for Video Moment RetrievalAxIoU：一种公理合理的视频时刻检索度量
Deep Safe Multi-view Clustering: Reducing the Risk of Clustering Performance Degradation Caused by View Increase.深度安全的多视图聚类：降低视图增加导致的聚类性能下降的风险。
Burst Image Restoration and Enhancement突发图像恢复和增强
Modeling Indirect Illumination for Inverse Rendering为反向渲染建模间接照明
Knowledge Mining with Scene Text for Fine-Grained Recognition用于细粒度识别的场景文本知识挖掘
FlexIT: Towards Flexible Semantic Image TranslationFlexIT：迈向灵活的语义图像翻译
Surpassing the Human Accuracy: Detecting Gallbladder Cancer from USG Images with Curriculum Learning超越人类准确度：通过课程学习从 USG 图像中检测胆囊癌
More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech不仅仅是文字：用于文本到语音的野外视觉驱动韵律
Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning模仿 Oracle：类增量学习的初始阶段去相关方法
Multi-Person Extreme Motion Prediction多人极限运动预测
Does text attract attention on e-commerce images: A novel saliency prediction dataset and method文本是否会在电子商务图像上吸引注意力：一种新颖的显著性预测数据集和方法
Instance-Aware Dynamic Neural Network Quantization实例感知动态神经网络量化
Energy-based Latent Aligner for Incremental Learning用于增量学习的基于能量的潜在校准器
Semi-supervised Video Paragraph Grounding with Contrastive Encoder具有对比编码器的半监督视频段落定位
Personalized Image Aesthetics Assessment with Rich Attributes具有丰富属性的个性化图像美学评估
Attention Concatenation Volume for Accurate and Efficient Stereo Matching用于精确和高效立体匹配的注意力连接体积
Split Hierarchal Variational Compression拆分分层变分压缩
MS2DG-Net: Progressive Correspondence Learning via Multi Sparse Semantic Dynamic GraphMS2DG-Net：基于多稀疏语义动态图的渐进式对应学习
Large Loss Matters in Weakly Supervised Multi-Label Classification弱监督多标签分类中的大损失很重要
Recurring the Transformer for Video Action Recognition循环使用 Transformer 进行视频动作识别
Look Closer to Supervise Better: One-Shot Font Generation via Component-Based Discriminator仔细观察以更好地监督：通过基于组件的鉴别器生成 One-Shot 字体
KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot LearningKG-SP：开放世界组合零样本学习的知识引导简单原语
Hyperbolic Vision Transformers: Combining Improvements in Metric Learning双曲视觉 Transformer：结合度量学习的改进
Camera Pose Estimation using Implicit Distortion Models使用隐式失真模型的相机姿态估计
A Structured Dictionary Perspective on Implicit Neural Representations隐式神经表示的结构化字典视角
ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame InterpolationST-MFNet：用于帧插值的时空多流网络
Geometric Structure Preserving Warp for Natural Image Stitching几何结构保持翘曲的自然图像拼接
Slimmable Domain Adaptation可精简的域适应
Meta Convolutional Neural Networks for Single Domain Generalization用于单域泛化的元卷积神经网络
Label Matching Semi-Supervised Object Detection标签匹配半监督目标检测
Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning用于鲁棒人脸对齐和地标固有关系学习的稀疏局部补丁转换器
Abandoning the Bayer-Filter to See in the Dark放弃拜耳滤镜以在黑暗中看清
Deep Hierarchical Semantic Segmentation深度层次语义分割
MixFormer: End-to-End Tracking with Iterative Mixed AttentionMixFormer：具有迭代混合注意力的端到端跟踪
ContIG: Self-supervised Multimodal Contrastive Learning for Medical Imaging with GeneticsContIG：具有遗传学的医学成像的自我监督多模态对比学习
Occlusion-robust Face Alignment using A Viewpoint-invariant Hierarchical Network Architecture使用视点不变分层网络架构的遮挡鲁棒人脸对齐
Segment-Fusion: Hierarchical Context Fusion for Robust 3D Semantic SegmentationSegment-Fusion：用于稳健 3D 语义分割的分层上下文融合
STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution Video PredictionSTRPM：高分辨率视频预测的时空残差预测模型
Boosting 3D Object Detection by Simulating Multimodality on Point Clouds通过在点云上模拟多模态来提升 3D 对象检测
RADU: Ray-Aligned Depth Update Convolutions for ToF Data DenoisingRADU：用于 ToF 数据去噪的射线对齐深度更新卷积
Auto-Encoder is All You Need自动编码器就是您所需要的
Whose Track Is It Anyway? Improving Robustness to Tracking Errors with Affinity-Based Prediction到底是谁的轨道？使用基于亲和的预测提高跟踪错误的鲁棒性
Multi-marginal Contrastive Learning for Multi-label Subcellular Protein Localization多标签亚细胞蛋白定位的多边缘对比学习
Stand-Alone Inter-Frame Attention in Video Models视频模型中的独立帧间注意
Hyperbolic Image Segmentation双曲图像分割
RepMLPNet: Hierarchical Vision MLP with Re-parameterized LocalityRepMLPNet：具有重新参数化局部性的分层视觉 MLP
Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous DrivingTime3D：用于自动驾驶的端到端联合单目 3D 对象检测和跟踪
SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-MaximizationSWEM：通过顺序加权期望最大化实现实时视频对象分割
ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial RotationART-Point：通过对抗旋转提高点云分类器的旋转鲁棒性
Super-Fibonacci Spirals: Fast, Low-Discrepancy Sampling of SO(3)超斐波那契螺旋：SO(3) 的快速、低差异采样
Learning to Learn and Remember Super Long Multi-Domain Task Sequence学习学习和记忆超长多领域任务序列
Noise Is Also Useful: Negative Correlation-Steered Latent Contrastive Learning噪声也有用：负相关引导的潜在对比学习
FLOAT: Factorized Learning of Object Attributes for Improved Multi-object Multi-part Scene ParsingFLOAT：用于改进多对象多部分场景解析的对象属性分解学习
Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis用于可控 3D 人体合成的表面对齐神经辐射场
Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model使用视觉语言模型学习提示开放词汇对象检测
Real World Self-Supervised Multi-Image Super-Resolution for Multi-Exposure Push-Frame Satellites多曝光推帧卫星的真实世界自监督多图像超分辨率
Knowledge Distillation with the Reused Teacher Classifier重用教师分类器的知识蒸馏
Geometry-Aware Guided Loss for Deep Crack Recognition用于深度裂缝识别的几何感知引导损失
AdaMixer: A Simple and Accurate Query-based Object DetectorAdaMixer：一个简单而准确的基于查询的对象检测器
Learning Structured Gaussians to Approximate Deep Ensembles学习结构化高斯函数以逼近深度集成
Input-level Inductive Biases for 3D Reconstruction用于 3D 重建的输入级归纳偏差
BTS: A Bi-lingual Benchmark for Text Segmentation in the WildBTS：野外文本分割的双语基准
Stereo Magnification with Multi-Layer Images具有多层图像的立体放大
Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection分割与补全：通过鲁棒的补丁检测保护目标检测器免受对抗性补丁攻击
Coherent Point Drift Revisited for Non-rigid Shape Matching and Registration重新审视非刚性形状匹配和配准的相干点漂移
Alleviating Semantics Distortion in Unsupervised Low-Level Image-to-Image Translation via Structure Consistency Constraint通过结构一致性约束减轻无监督低级图像到图像转换中的语义失真
CNN Filter DB: An Empirical Investigation of Trained Convolutional FiltersCNN 滤波器数据库：训练卷积滤波器的实证研究
Text2Mesh: Text-Driven Neural Stylization for MeshesText2Mesh：网格的文本驱动神经样式化
RFNet: Unsupervised Network for Mutually Reinforcing Multi-modal Image Registration and FusionRFNet：用于相互加强多模态图像配准和融合的无监督网络
Image Dehazing Transformer with Transmission-Aware 3D Position Embedding具有传输感知 3D 位置嵌入的图像去雾转换器
Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification用于分层多粒度分类的标签关系图增强分层残差网络
RGB-Multispectral Matching: Dataset, Learning Methodology, EvaluationRGB-多光谱匹配：数据集、学习方法、评估
Maintaining Reasoning Consistency in Compositional Visual Question Answering在组合视觉问答中保持推理一致性
PolyWorld: Polygonal Building Extraction with Graph Neural Networks in Satellite ImagesPolyWorld：卫星图像中使用图神经网络的多边形建筑物提取
Fast Algorithm for Low-rank Tensor Completion in Delay-embedded Space延迟嵌入空间中低秩张量补全的快速算法
Dynamic Sparse R-CNN动态稀疏 R-CNN
Improving Robustness Against Stealthy Weight Bit-Flip Attacks by Output Code Matching通过输出代码匹配提高对隐匿权重位翻转攻击的鲁棒性
NPBG++: Accelerating Neural Point-Based GraphicsNPBG++：加速基于神经点的图形
Forward Compatible Few-Shot Class-Incremental Learning前向兼容 Few-Shot Class-Incremental Learning
Weakly-supervised Metric Learning with Cross-Module Communications for the Classification of Anterior Chamber Angle Images用于前房角度图像分类的跨模块通信的弱监督度量学习
Learning Canonical F-Correlation Projection for Compact Multiview Representation学习用于紧凑多视图表示的典型 F 相关投影
Learning Non-target Knowledge for Few-shot Semantic Segmentation学习非目标知识以进行小样本语义分割
Towards Low-Cost and Efficient Malaria Detection迈向低成本和高效的疟疾检测
PoseTrack21: A Dataset for Person Search, Multi-Object Tracking and Multi-Person Pose TrackingPoseTrack21：用于人员搜索、多对象跟踪和多人姿势跟踪的数据集
NeuralHDHair: Automatic High-fidelity Hair Modeling from a Single Image Using Implicit Neural RepresentationsNeuralHDHair：使用隐式神经表示从单个图像自动进行高保真头发建模
ClusterGNN: Cluster-based Coarse-to-fine Graph Neural Network for Efficient Feature MatchingClusterGNN：用于高效特征匹配的基于集群的粗到精图神经网络
An Iterative Quantum Approach for Transformation Estimation from Point Sets从点集估计变换的迭代量子方法
ATPFL: Automatic Trajectory Prediction Model Design under Federated Learning FrameworkATPFL：联邦学习框架下的自动轨迹预测模型设计
Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training了解 Frank-Wolfe 对抗训练并提高效率
Targeted Supervised Contrastive Learning for Long-Tailed Recognition用于长尾识别的有针对性的监督对比学习
Optimizing Elimination Templates by Greedy Parameter Search通过贪心参数搜索优化消除模板
M3T: three-dimensional Medical image classifier using Multi-plane and Multi-slice TransformerM3T：使用 Multi-plane 和 Multi-slice Transformer 的三维医学图像分类器
Projective Manifold Gradient Layer for Deep Rotation Regression用于深度旋转回归的投影流形梯度层
PUMP: Pyramidal and Uniqueness Matching Priors for Unsupervised Learning of Local DescriptorsPUMP：用于局部描述符无监督学习的金字塔和唯一性匹配先验
Deep orientation-aware functional maps: Tackling symmetry issues in Shape Matching深度方向感知函数映射：解决形状匹配中的对称问题
A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation在全景分割的指导下，用于基于 LiDAR 的 3D 对象检测的多功能多视图框架
Lite-MDETR: A Lightweight Multi-Modal DetectorLite-MDETR：轻量级多模态检测器
Cross Modal Retrieval with Querybank Normalisation使用 Querybank 规范化的跨模态检索
On Learning Contrastive Representations for Learning with Noisy Labels关于学习带有噪声标签的学习对比表示
Cross-view transformers for real-time map-view semantic segmentation用于实时地图视图语义分割的跨视图转换器
Towards Data-Free Model Stealing in a Hard Label Setting在硬标签设置中实现无数据模型窃取
The DEVIL is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting魔鬼在细节中：视频修复的诊断评估基准
Unseen Classes at a Later Time? No Problem之后才出现未见类别？没问题
Channel Balancing for Accurate Quantization of Winograd Convolutions用于精确量化 Winograd 卷积的通道平衡
Instance masks are what you need: Segmentation parity from object boundaries实例掩码是您所需要的：基于对象边界实现同等的分割效果
TVConv: Efficient Translation Variant Convolution for Layout-aware Visual ProcessingTVConv：用于布局感知视觉处理的高效平移可变卷积
Scanline Homographies for Rolling-Shutter Plane Absolute Pose卷帘快门平面绝对位姿的扫描线单应性
Dual-Shutter Optical Vibration Sensing双快门光学振动传感
DoubleField: Bridging the Neural Surface and Radiance Fields for High-fidelity Human Reconstruction and RenderingDoubleField：桥接神经表面和辐射场以进行高保真人体重建和渲染
Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks with Implicit Gradients3D 点云的稳健结构化声明性分类器：使用隐式梯度防御对抗性攻击
TubeR: Tubelet Transformer for Video Action DetectionTubeR：用于视频动作检测的 Tubelet 变压器
Data-Free Network Compression via Parametric Non-uniform Mixed Precision Quantization通过参数非均匀混合精度量化的无数据网络压缩
Contour-Hugging Heatmaps for Landmark Detection用于地标检测的轮廓拥抱热图
Local Attention Pyramid for Scene Image Generation用于场景图像生成的局部注意力金字塔
Implicit Feature Decoupling with Depthwise Quantization使用深度量化的隐式特征解耦
InsetGAN for Full-Body Image Generation用于全身图像生成的 InsetGAN
Recurrent Variational Network: A Deep Learning Inverse Problem Solver applied to the task of Accelerated MRI Reconstruction循环变分网络：应用于加速 MRI 重建任务的深度学习逆问题求解器
Robust Invertible Image Steganography强大的可逆图像隐写术
Disentangling visual and written concepts in CLIP在 CLIP 中解开视觉和书面概念
Causal CLIP Fine-tuning for Fashion Product Retrieval用于时尚产品检索的因果 CLIP 微调
Accelerating Neural Network Optimization Through an Automated Control Theory Lens通过自动控制理论镜头加速神经网络优化
Comprehending and Ordering Semantics for Image Captioning理解和排序图像字幕的语义
Grounded Language-Image Pre-training扎根语言-图像预训练
Hierarchical Self-supervised Representation Learning for Movie Understanding用于电影理解的分层自监督表示学习
RSTT: Real-time Spatial Temporal Transformer for Space-Time Video Super-ResolutionRSTT：用于时空视频超分辨率的实时时空变换器
DirecFormer: A Directed Attention in Transformer Approach to Robust Action RecognitionDirecFormer：稳健动作识别的 Transformer 方法中的定向注意
Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes用于部分可观察场景的一致性驱动的顺序 Transformers 注意力模型
Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-AttentionParamixer：参数化稀疏因子中的混合链接比点积自我注意效果更好
How Well Do Sparse ImageNet Models Transfer?稀疏 ImageNet 模型的迁移效果如何？
Towards Principled Disentanglement for Domain Generalization面向领域泛化的原则解耦
Task-Adaptive Negative Class Envision for Few-Shot Open-Set Recognition用于 Few-Shot Open-Set 识别的任务自适应负类设想
Path-CNN: Topology-Aware Centerline Segmentation Using Sparse AnnotationPath-CNN：使用稀疏注释的拓扑感知中心线分割
Image Based Reconstruction of Liquids from 2D Surface Detections基于二维表面检测的液体图像重建
Neural Convolutional Surfaces神经卷积表面
Graph-context Attention Networks for Size-varied Deep Graph Matching用于大小变化的深度图匹配的图上下文注意网络
Learning to Solve Hard Minimal Problems学习解决最小的难题
Neural Mesh Simplification神经网格简化
SPAct: Self-supervised Privacy Preservation for Action RecognitionSPAct：用于动作识别的自我监督隐私保护
Towards Language-free Training for Text-to-Image Generation面向文本到图像生成的无语言培训
Rep-Net: Efficient On-Device Learning via Feature ReprogrammingRep-Net：通过特征重编程实现高效的设备学习
3D-VField: Learning to Adversarially Deform Point Clouds for Robust 3D Object Detection3D-VField：学习对抗变形点云以进行稳健的 3D 对象检测
TrackFormer: Multi-Object Tracking with TransformersTrackFormer：使用 Transformer 进行多对象跟踪
Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings深度 3D 到 2D 水印：在 3D 网格中嵌入消息并从 2D 渲染中提取它们
A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes图像分类模型对前景、背景和视觉属性敏感性的综合研究
EnvEdit: Environment Editing for Vision-and-Language NavigationEnvEdit：视觉和语言导航的环境编辑
DeepFace-EMD: Re-ranking using Patch-wise Earth Mover’s Distance Improves Out-of-Distribution Face IdentificationDeepFace-EMD：使用 Patch-wise Earth Mover 的距离重新排序改进了分布外人脸识别
Mega-NERF: Scalable Construction of Large-Scale NeRFs for Virtual Fly-ThroughsMega-NERF：用于虚拟穿越的大型 NeRF 的可扩展构建
MulT: An End-to-End Multitask Learning TransformerMulT：端到端的多任务学习转换器
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance FieldsMip-NeRF 360：无界抗锯齿神经辐射场
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection通过自我监督利用真实会说话的面孔进行稳健的伪造检测
Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework使用所有标签：分层多标签对比学习框架
Plenoxels: Radiance Fields without Neural NetworksPlenoxels：没有神经网络的辐射场
Pushing the Limits of Simple Pipelines for Practical Few-Shot Learning突破简单流水线的极限以进行实用的小样本学习
PONI: Potential Functions for ObjectGoal Navigation with Interaction-free LearningPONI：基于无交互学习的 ObjectGoal 导航势函数
CO-SNE: Dimensionality Reduction and Visualization for Hyperbolic DataCO-SNE：双曲数据的降维和可视化
EASE: Unsupervised Discriminant Subspace Learning for Transductive Few-Shot LearningEASE：无监督判别子空间学习，用于转导小样本学习
3D Photo Stylization: Learning to Generate Stylized Novel Views from a Single Image3D 照片风格化：学习从单个图像生成风格化的新颖视图
SIMBAR: Single Image-Based Scene Relighting For Effective Data Augmentation For Automated Driving Vision TasksSIMBAR：基于单一图像的场景重新照明，用于自动驾驶视觉任务的有效数据增强
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language TasksVL-Adapter：视觉和语言任务的参数高效迁移学习
VALHALLA: Visual Hallucination for Machine TranslationVALHALLA：机器翻译的视觉幻觉
Learning Pairwise Affinity for Open-World Instance Segmentation学习开放世界实例分割的成对亲和力
CAD: Co-Adapting Discriminative Features for Improved Few-Shot ClassificationCAD：为改进的 Few-Shot 分类共同适应判别特征
Investigating the Impact of Multi-LiDAR Placement on Object Detection for Autonomous Driving研究多激光雷达放置对自动驾驶目标检测的影响
Hypergraph-Induced Semantic Tuplet Loss for Deep Metric Learning用于深度度量学习的超图诱导语义元组损失
Generalized Category Discovery广义类别发现
Deep Image-based Illumination Harmonization基于深度图像的照明协调
Mixed Differential Privacy in Computer Vision计算机视觉中的混合差分隐私
MUSE-VAE: Multi-Scale VAE for Environment-Aware Long Term Trajectory PredictionMUSE-VAE：用于环境感知长期轨迹预测的多尺度 VAE
UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual DialogUTC：用于视觉对话的具有任务间对比学习的统一转换器
Weakly Supervised Rotation-Invariant Aerial Object Detection Network弱监督旋转不变航空目标检测网络
Evaluation-oriented Knowledge Distillation for Deep Face Recognition面向评价的深度人脸识别知识蒸馏
Robust Cross-Modal Representation Learning with Progressive Self-Distillation具有渐进式自蒸馏的鲁棒跨模态表示学习
Transformer Tracking with Cyclic Shifting Window Attention具有循环移位窗口注意的变压器跟踪
LTP: Lane-based Trajectory Prediction for Autonomous DrivingLTP：基于车道的自动驾驶轨迹预测
Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction to Treat Diabetic Foot Ulcers使用伤口分割和重建生成 3D 生物可打印贴片以治疗糖尿病足溃疡
Multi-instance Point Cloud Registration by Efficient Correspondence Clustering通过高效的对应聚类进行多实例点云配准
AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video RecognitionAdaFocus V2：用于视频识别的空间动态网络端到端训练
AutoLoss-GMS: Searching Generalized Margin-based Softmax Loss Function for Person Re-identificationAutoLoss-GMS：搜索基于广义边距的 Softmax 损失函数进行人员重新识别
Convolution of Convolution: Let Kernels Spatially Collaborate卷积的卷积：让内核在空间上协作
DiffPoseNet: Direct Differentiable Camera Pose EstimationDiffPoseNet：直接可微分相机姿态估计
Modeling sRGB Camera Noise with Normalizing Flows使用归一化流对 sRGB 相机噪声进行建模
Semantic-shape Adaptive Feature Modulation for Semantic Image Synthesis语义图像合成的语义形状自适应特征调制
Federated Learning with Position-Aware Neurons具有位置感知神经元的联邦学习
Symmetry and Uncertainty-Aware Object SLAM for 6DoF Object Pose Estimation用于 6DoF 对象姿态估计的对称性和不确定性感知对象 SLAM
Point Density-Aware Voxels for LiDAR 3D Object Detection用于 LiDAR 3D 对象检测的点密度感知体素
A Conservative Approach for Unbiased Learning on Unknown Biases一种关于未知偏差的无偏学习的保守方法
The Majority Can Help the Minority: Context-rich Minority Oversampling for Long-tailed Classification多数可以帮助少数：用于长尾分类的上下文丰富的少数过采样
Symmetry-aware Neural Architecture for Embodied Visual Exploration用于体现视觉探索的对称感知神经架构
DearKD: Data-Efficient Early Knowledge Distillation for Vision TransformersDearKD：面向视觉 Transformer 的数据高效早期知识蒸馏
Egocentric Prediction of Action Target in 3D以自我为中心的 3D 行动目标预测
What makes transfer learning work for medical images: feature reuse & other factors是什么让迁移学习适用于医学图像：特征重用和其他因素
Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification零样本视频分类的对齐-均匀性感知表示学习
Unsupervised Learning of De-biased Representation with Pseudo-bias Attribute具有伪偏属性的去偏表示的无监督学习
DECORE: Deep Compression with Reinforcement LearningDECORE：使用强化学习进行深度压缩
RGB-Depth Fusion GAN for Indoor Depth Completion用于室内深度完成的 RGB 深度融合 GAN
MERLOT Reserve: Neural Script Knowledge through Vision and Language and SoundMERLOT Reserve：通过视觉、语言和声音的神经脚本知识
Class-Aware Contrastive Semi-Supervised Learning类感知对比半监督学习
Learning to Prompt for Continual Learning学习提示持续学习
DEFEAT: Deep Hidden Feature Backdoor Attacks by Imperceptible Perturbation and Latent Representation ConstraintsDEFEAT：通过不可察觉的扰动和潜在表示约束的深度隐藏特征后门攻击
Self-Supervised Dense Consistency Regularization for Image-to-Image Translation图像到图像转换的自监督密集一致性正则化
Forward Compatible Training for Large-Scale Embedding Retrieval Systems大规模嵌入检索系统的前向兼容训练
Joint Forecasting of Panoptic Segmentations with Difference Attention具有差异注意的全景分割联合预测
Revisiting the Transferability of Supervised Pretraining: an MLP Perspective重新审视监督预训练的可迁移性：MLP 视角
Disentangling Visual Embeddings for Attributes and Objects解开属性和对象的视觉嵌入
SeeThroughNet: Resurrection of Auxiliary Loss by Preserving Class Probability InformationSeeThroughNet：通过保留类概率信息来复活辅助损失
Neural Reflectance for Shape Recovery with Shadow Handling使用阴影处理进行形状恢复的神经反射
Topology-Preserving Shape Reconstruction and Registration via Neural Diffeomorphic Flow基于神经微分流的拓扑保持形状重建和配准
XYDeblur: Divide and Conquer for Single Image DeblurringXYDeblur：单幅图像去模糊的分而治之
ScePT: Scene-consistent, Policy-based Trajectory Predictions for PlanningScePT：用于规划的场景一致、基于策略的轨迹预测
Visual Acoustic Matching视声匹配
Fair Contrastive Learning for Facial Attribute Classification面部属性分类的公平对比学习
Neural Prior for Trajectory Estimation用于轨迹估计的神经先验
AutoMine: An Unmanned Mine DatasetAutoMine：无人矿山数据集
SMARTADAPT: Multi-branch Object Detection Framework for Videos on MobilesSMARTADAPT：用于手机视频的多分支对象检测框架
Neural Face Identification in a 2D Wireframe Projection of a Manifold Object流形对象二维线框投影中的神经面识别
AlignMixup: Improving Representations By Interpolating Aligned FeaturesAlignMixup：通过插入对齐的特征来改进表示
Memory-Augmented Non-Local Attention for Video Super-Resolution用于视频超分辨率的内存增强非局部注意
ESCNet: Gaze Target Detection with the Understanding of 3D ScenesESCNet：基于 3D 场景的凝视目标检测
AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by Learnable Motion GenerationAdaptPose：通过可学习运动生成进行 3D 人体姿势估计的跨数据集自适应
Distinguishing Unseen from Seen for Generalized Zero-shot Learning区分 Unseen 和 Seen 以进行广义零样本学习
When Does Contrastive Visual Representation Learning Work?对比视觉表征学习何时起作用？
Privacy-preserving Online AutoML for Domain-Specific Face Detection用于特定领域人脸检测的隐私保护在线 AutoML
Robust outlier detection by de-biasing VAE likelihoods通过去偏 VAE 可能性进行稳健的异常值检测
GridShift: A Faster Mode-seeking Algorithm for Image Segmentation and Object TrackingGridShift：用于图像分割和对象跟踪的更快模式搜索算法
Continual Learning with Lifelong Vision Transformer使用 Lifelong Vision Transformer 持续学习
M2I: From Factored Marginal Trajectory Prediction to Interactive PredictionM2I：从因子边际轨迹预测到交互式预测
Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability随机方差减少集成对抗攻击以提高对抗可转移性
Representing 3D Shapes with Probabilistic Directed Distance Fields用概率定向距离场表示 3D 形状
Restormer: Efficient Transformer for High-Resolution Image RestorationRestormer：用于高分辨率图像恢复的高效变压器
Learning with Twin Noisy Labels for Visible-Infrared Person Re-Identification使用双噪声标签学习可见红外人员重新识别
Few-shot Learning with Noisy Labels带有噪声标签的小样本学习
Co-Domain Symmetry for Complex-Valued Deep Learning复值深度学习的共域对称
Pyramid Architecture for Multi-Scale Processing in Point Cloud Segmentation点云分割中多尺度处理的金字塔结构
GCR: Gradient Coreset based Replay Buffer Selection for Continual LearningGCR：用于持续学习的基于梯度核心集的重放缓冲区选择
Domain Adaptation on Point Clouds via Geometry-Aware Implicits通过几何感知隐式对点云进行域自适应
Ranking-Based Siamese Visual Tracking基于排名的连体视觉跟踪
Coarse-to-Fine Disentangling Transformer for Human-Object Interaction Detection用于人-物体交互检测的粗到细解缠结变压器
MDAN: Multi-level Dependent Attention Network for Visual Emotion AnalysisMDAN：用于视觉情感分析的多级依赖注意网络
AdaSTE: An Adaptive Straight-Through Estimator to Train Binary Neural NetworksAdaSTE：一种用于训练二元神经网络的自适应直通估计器
DiffusionCLIP: Text-Guided Diffusion Models for Robust Image ManipulationDiffusionCLIP：用于鲁棒图像处理的文本引导扩散模型
DTA: Physical Camouflage Attacks using Differentiable Transformation NetworkDTA：使用可微变换网络的物理伪装攻击
Layer-wised Model Aggregation for Personalized Federated Learning用于个性化联邦学习的分层模型聚合
Video Swin Transformer视频 Swin Transformer
Online Continual Learning on a Contaminated Data Stream with Blurry Task Boundaries任务边界模糊的受污染数据流的在线持续学习
General Incremental Learning with Domain-aware Categorical Representations具有领域感知分类表示的一般增量学习
Crafting Better Contrastive Views for Siamese Representation Learning为连体表示学习制作更好的对比视图
A Style-aware Discriminator for Controllable Image Translation一种用于可控图像翻译的风格感知鉴别器
BoosterNet: Improving Domain Generalization of Deep Neural Nets using Culpability-Ranked FeaturesBoosterNet：使用 Culpability-Ranked 特征改进深度神经网络的域泛化
A Unified Framework for Implicit Sinkhorn Differentiation隐式 Sinkhorn 微分的统一框架
Brain-Supervised Image Editing脑监督图像编辑
Neural Shape Mating: Self-Supervised Object Assembly with Adversarial Shape Priors神经形状匹配：具有对抗形状先验的自我监督对象组装
Multimodal Colored Point Cloud to Image Alignment多模式彩色点云到图像对齐
Graph-based Spatial Transformer with Memory Replay for Multi-future Pedestrian Trajectory Prediction用于多未来行人轨迹预测的具有记忆重放的基于图的空间变换器
Multi-Objective Diverse Human Motion Prediction with Knowledge Distillation基于知识蒸馏的多目标多样化人体运动预测
Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart两个耦合的拒绝指标可以区分对抗性示例
Autoregressive Image Generation using Residual Quantization使用残差量化的自回归图像生成
SGTR: End-to-end Scene Graph Generation with TransformerSGTR：使用 Transformer 生成端到端场景图
Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-robust Makeup Transfer保护面部隐私：通过风格稳健的化妆转移生成对抗性身份面具
PPDL: Predicate Probability Distribution based Loss for Unbiased Scene Graph GenerationPPDL：基于谓词概率分布的无偏场景图生成损失
Localized Adversarial Domain Generalization局部对抗域泛化
Patch-level Representation Learning for Self-supervised Vision Transformers自监督视觉转换器的补丁级表示学习
KNN Local Attention for Image Restoration用于图像恢复的 KNN 局部注意力
Overcoming Catastrophic Forgetting in Incremental Object Detection via Elastic Response Distillation通过弹性响应蒸馏克服增量目标检测中的灾难性遗忘
PILC: Practical Image Lossless Compression with an End-to-end GPU Oriented Neural FrameworkPILC：具有端到端面向 GPU 的神经框架的实用图像无损压缩
DAD-3DHeads: A Large-scale Dense, Accurate and Diverse Dataset for 3D Dense Head Alignment from a Single ImageDAD-3DHeads：用于从单个图像进行 3D 密集头部对齐的大规模密集、准确和多样化的数据集
Is Mapping Necessary for Realistic PointGoal Navigation?现实点目标导航是否需要映射？
Cross-Domain Correlation Distillation for Unsupervised Domain Adaptation in Nighttime Semantic Segmentation夜间语义分割中无监督域自适应的跨域相关蒸馏
LiT: Zero-Shot Transfer with Locked-image text TuningLiT：带锁定图像文本调整的零样本传输
Scaling Vision Transformers扩展视觉 Transformer
Spatial Commonsense Graph for Object Localisation in Partial Scenes局部场景中对象定位的空间常识图
Trajectory Optimization for Physics-Based Reconstruction of 3d Human Pose from Monocular Video基于物理的单目视频 3d 人体姿态重建的轨迹优化
3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos3MASSIV：社交媒体短视频的多语言、多模态和多方面数据集
Upright-Net: Learning Upright Orientation for 3D Point CloudUpright-Net：学习 3D 点云的垂直方向
D*-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object DetectionD*-V2X：用于车辆-基础设施协同 3D 目标检测的大规模数据集
Differentiable Dynamics for Articulated 3d Human Motion Reconstruction关节式 3d 人体运动重建的微分动力学
Clean Implicit 3D Structure from Noisy 2D STEM Images从含噪 2D STEM 图像中获得干净的隐式 3D 结构
MPC: Multi-view Probabilistic ClusteringMPC：多视图概率聚类
Node-aligned Graph Convolutional Network for Whole-slide Image Representation and Classification用于全幻灯片图像表示和分类的节点对齐图卷积网络
Multidimensional Belief Quantification for Label-Efficient Meta-Learning标签高效元学习的多维信念量化
Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection用于鲁棒异常检测的贝叶斯非参数子模视频分区
Uni6D: A Unified CNN Framework without Projection Breakdown in 6D Pose EstimationUni6D：在 6D 姿态估计中没有投影分解的统一 CNN 框架
Exploring Patch-wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks探索图像到图像翻译任务中对比学习的补丁语义关系
Enabling Equivariance for Arbitrary Lie Groups为任意李群实现等变性
Multi-Scale Memory-Based Video Deblurring基于内存的多尺度视频去模糊
Privacy Preserving Partial Localization隐私保护的部分定位
Towards Robust and Reproducible Active Learning using Neural Networks使用神经网络实现稳健和可重复的主动学习
Marginal Contrastive Correspondence for Exemplar-based Image Translation基于样本的图像翻译的边际对比对应
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repeated Action CountingTransRAC：使用 Transformer 编码多尺度时间相关性以进行重复动作计数
Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation通过尖峰表示的微分训练高性能低延迟尖峰神经网络
FaceFormer: Speech-Driven 3D Facial Animation with TransformersFaceFormer：基于 Transformer 的语音驱动 3D 面部动画
LARGE: Latent-Based Regression Through GAN SemanticsLARGE：通过 GAN 语义进行基于潜变量的回归
TransVPR: Transformer-Based Place Recognition with Multi-Level Attention AggregationTransVPR：具有多级注意力聚合的基于 Transformer 的位置识别
AR-NeRF: Unsupervised Learning of Depth and Defocus Effects from Natural Images with Aperture Rendering Neural Radiance FieldsAR-NeRF：具有孔径渲染神经辐射场的自然图像的深度和散焦效果的无监督学习
CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object DetectionCAT-Det：用于多模态 3D 对象检测的对比增强变压器
SASIC: Stereo Image Compression with Latent Shifts and Stereo AttentionSASIC：具有潜在移位和立体注意的立体图像压缩
Controllable Animation of Fluid Elements in Still Images静止图像中流体元素的可控动画
Revisiting BatchNorm’s Learnable Affines in Few-Shot Transfer Learning在 Few-Shot 迁移学习中重新审视 BatchNorm 的可学习仿射
Learning Graph Regularisation for Guided Super-Resolution引导超分辨率的学习图正则化
Topology Preserving Local Road Network Estimation from Single Onboard Camera Image从单个车载摄像头图像中保留局部路网估计的拓扑
Video-Text Representation Learning via Differentiable Weak Temporal Alignment通过可微弱时间对齐学习视频-文本表示
BppAttack: Stealthy and Efficient Trojan Attacks against Deep Neural Networks via Image Quantization and Contrastive Adversarial LearningBppAttack：通过图像量化和对比对抗学习对深度神经网络进行隐秘而高效的特洛伊木马攻击
Face2Exp: Combating Data Biases for Facial Expression RecognitionFace2Exp：消除面部表情识别的数据偏差
Leveraging Equivariant Features for Absolute Pose Regression利用等变特征进行绝对姿势回归
Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut使用归一化切割的无监督对象发现的自监督变压器
Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry通过融合单视图深度概率与多视图几何进行多视图深度估计
ZZ-Net: A Universal Rotation Equivariant Architecture for 2D Point CloudsZZ-Net：二维点云的通用旋转等变架构
Interactive Disentanglement: Learning Concepts by Interacting with their Prototype Representations交互式解开：通过与原型表示交互来学习概念
Incremental Learning in Semantic Segmentation from Image Labels从图像标签进行语义分割的增量学习
Complex Backdoor Detection by Symmetric Feature Differencing基于对称特征差分的复杂后门检测
Constrained Few-shot Class-incremental Learning约束少样本类增量学习
HyperSegNAS: Bridging One-Shot Neural Architecture Search with 3D Medical Image Segmentation using HyperNetHyperSegNAS：使用 HyperNet 将 One-Shot 神经架构搜索与 3D 医学图像分割相结合
Amodal Panoptic SegmentationAmodal全景分割
Not Just Selection, but Exploration: Online Class-Incremental Continual Learning via Dual View Consistency不仅仅是选择，更是探索：通过双视图一致性的在线类增量持续学习
Coarse-to-Fine Q-attention: Efficient Learning for Visual Robotic Manipulation via DiscretisationCoarse-to-Fine Q-attention：通过离散化实现视觉机器人操作的高效学习
Learning ABCs: Approximate Bijective Correspondence for isolating factors of variation学习 ABC：用于隔离变异因素的近似双射对应
Pin the Memory: Learning to Generalize Semantic Segmentation固定记忆：学习泛化语义分割
Long-tailed Visual Recognition via Gaussian Clouded Logit Adjustment基于高斯云 Logit 调整的长尾视觉识别
Knowledge distillation: A good teacher is patient and consistent知识蒸馏：好老师有耐心且始终如一
Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language具有跨模态注意力和语言的视听广义零样本学习
Searching the Deployable Convolution Neural Networks for GPUs搜索面向 GPU 的可部署卷积神经网络
MLP-3D: A MLP-like 3D Architecture with Grouped Time MixingMLP-3D：具有分组时间混合的类似 MLP 的 3D 架构
Condensing CNNs with Partial Differential Equations用偏微分方程压缩 CNN
Adaptive Early-Learning Correction for Segmentation from Noisy Annotations从噪声注释中分割的自适应早期学习校正
Bounded Adversarial Attack on Deep Content Features对深度内容特征的有界对抗攻击
Towards Driving-Oriented Metric for Lane Detection Models迈向车道检测模型的驾驶导向指标
Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness请注意：点积注意力被认为对对抗性补丁的鲁棒性有害
Better Trigger Inversion Optimization in Backdoor Scanning后门扫描中更好的触发反转优化
Leveling Down in Computer Vision: Pareto Inefficiencies in Fair Deep Classifiers计算机视觉水平下降：公平深度分类器中的帕累托效率低下
Towards Understanding and Simplifying MoCo: Dual Temperature Helps Contrastive Learning without Many Negative Samples走向理解和简化 MoCo：双温度有助于在没有很多负样本的情况下进行对比学习
Smooth Maximum Unit: Smooth Activation Function for Deep Networks using Smoothing Maximum Technique平滑最大单元：使用平滑最大技术的深度网络平滑激活函数
Text-to-Image Synthesis based on Object-Guided Joint-Decoding Transformer基于对象引导联合解码转换器的文本到图像合成
Image Segmentation Using Text and Image Prompts使用文本和图像提示进行图像分割
Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation自监督 3D 人体姿态估计的不确定性感知适应
Vision-Language Pre-Training with Triple Contrastive Learning三重对比学习的视觉语言预训练
Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations时间上下文很重要：使用疾病进展表示增强单图像预测
Globetrotter: Connecting Languages by Connecting ImagesGlobetrotter：通过连接图像连接语言
Single-Stage 3D Geometry-Preserving Depth Estimation Model Training on Dataset Mixtures with Uncalibrated Stereo Data使用未校准立体数据的数据集混合进行单阶段 3D 几何保留深度估计模型训练
It’s Time for Artistic Correspondence in Music and Video是时候在音乐和视频之间建立艺术对应了
Equivariant Point Set Analysis via Learning Orientations for Message Passing通过消息传递的学习方向进行等变点集分析
KeyTr: Keypoint Transporter for 3D Reconstruction of Deformable Objects in VideosKeyTr：用于 3D 重建视频中可变形对象的关键点传输器
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak SupervisionP3IV：来自监督薄弱的教学视频的概率程序规划
GlideNet: Global, Local and Intrinsic based Dense Embedding NETwork for Multi-category Attributes PredictionGlideNet：用于多类别属性预测的基于全局、局部和内在的密集嵌入网络
MatchFAME: Fast, Accurate and Memory-Efficient Multi-Object MatchingMatchFAME：快速、准确且内存高效的多对象匹配
Neural Emotion Director: Speech-preserving semantic control of facial expressions in “in-the-wild” videosNeural Emotion Director：“in-the-wild”视频中面部表情的语音保留语义控制
Id-Free Person Similarity Learning无身份的人相似性学习
Alleviating Emotional bias in Affective Image Captioning by Contrastive Data Collection通过对比数据收集减轻情感图像字幕中的情绪偏见
A study on the distribution of social biases in self-supervised learning visual models自监督学习视觉模型中社会偏见分布的研究
Motron: Multimodal Probabilistic Human Motion ForecastingMotron：多模式概率人体运动预测
Gaussian Process Modeling of Approximate Inference Errors for Variational Autoencoders变分自动编码器近似推理误差的高斯过程建模
Real-time hyperspectral imaging in hardware via trained metasurface encoders通过训练有素的超表面编码器在硬件中进行实时高光谱成像
SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and SynthesisSmartPortraits：用于状态估计、重建和合成的人像深度供电手持式智能手机数据集
Improving Segmentation of the Inferior Alveolar Nerve through Deep Label Propagation通过深度标签传播改进下牙槽神经的分割
SLIC: Self-Supervised Learning with Iterative Clustering for Human Action VideosSLIC：具有迭代聚类的人类动作视频的自我监督学习
Self-supervised Spatial Reasoning on Multi-View Line Drawings多视图线图的自监督空间推理
Contrastive Test-Time Adaptation对比测试时间适应
Why Discard if You can Recycle?: A Recycling Max Pooling Module for 3D Point Cloud Analysis如果可以回收，为什么要丢弃？：用于 3D 点云分析的回收最大池化模块
Do learned representations respect causal relationships?学习表示尊重因果关系吗？
Zero-Query Transfer Attacks on Context-Aware Object Detectors对上下文感知对象检测器的零查询传输攻击
Training Quantised Neural Networks with STE Variants: the Additive Noise Annealing Algorithm使用 STE 变体训练量化神经网络：加性噪声退火算法
Contrastive Dual Gating: Learning Sparse Features With Contrastive Learning对比双门控：通过对比学习学习稀疏特征
Efficient Maximal Coding Rate Reduction by Variational Forms变分形式的有效最大编码率降低
Everything at Once - Multi-modal Fusion Transformer for Video Retrieval一切尽在 - 用于视频检索的多模态融合转换器
Towards Efficient and Scalable Sharpness-Aware Minimization迈向高效和可扩展的锐度感知最小化
X-Pool: Cross-Modal Language-Video Attention for Text-Video RetrievalX-Pool：用于文本视频检索的跨模态语言视频注意
Merry Go Round: Rotate a Frame and Fool a DNNMerry Go Round：旋转一帧即可欺骗 DNN
Label-Only Model Inversion Attacks via Boundary Repulsion通过边界排斥的仅标签模型反转攻击
Style-Structure Disentangled Features and Normalizing Flows for Diverse Icon Colorization风格结构分离特征和标准化流程以实现不同的图标着色
How Much More Data Do I Need? Estimating Requirements For Downstream Tasks我还需要多少数据？估算下游任务的需求
A sampling-based approach for efficient clustering in large datasets一种在大型数据集中进行有效聚类的基于抽样的方法
Deep Equilibrium Optical Flow Estimation深度平衡光流估计
Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values极性采样：通过奇异值对预训练生成网络的质量和多样性控制
Multi-label Iterated Learning for Image Classification with Label Ambiguity具有标签模糊度的图像分类的多标签迭代学习
Cross-modal Map Learning for Vision and Language Navigation用于视觉和语言导航的跨模式地图学习
Learning with Neighbor Consistency for Noisy Labels噪声标签的邻居一致性学习
Measuring Compositional Consistency for Video Question Answering测量视频问答的组成一致性
Failure Modes of Domain Generalization Algorithms领域泛化算法的失效模式
AutoRF: Learning 3D Object Radiance Fields from Single View ObservationsAutoRF：从单视图观察中学习 3D 对象辐射场
A Unified Model for Line Projections in Catadioptric Cameras折反射相机中线投影的统一模型
OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural NetworksOrphicX：用于解释图神经网络的因果关系启发的潜在变量模型
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning通过视觉语言验证和迭代推理提高视觉基础
Cluster-guided Image Synthesis with Unconditional Models无条件模型的聚类引导图像合成
Self-supervised object detection from audio-visual correspondence基于视听对应的自我监督目标检测
Clipped Hyperbolic Classifiers Are Super-Hyperbolic Classifiers剪裁双曲分类器是超双曲分类器
Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning本地学习很重要：重新思考联邦学习中的数据异构性
Weakly-Supervised Generation and Grounding of Visual Descriptions with Conditional Generative Models带有条件生成模型的视觉描述的弱监督生成和基础
How much does input data type impact final face model accuracy?输入数据类型对最终人脸模型的准确性有多大影响？
Certified Patch Robustness via Smoothed Vision Transformers通过平滑视觉变压器认证的补丁鲁棒性
PubTables-1M: Towards comprehensive table extraction from unstructured documentsPubTables-1M：从非结构化文档中进行全面的表格提取
Fine-tuning Image Transformers using Learnable Memory使用可学习内存微调图像转换器
GuideFormer: Transformers for Image Guided Depth CompletionGuideFormer：用于图像引导深度补全的 Transformer
Motion-Adjustable Neural Implicit Video Representation运动可调神经隐式视频表示
LiDARCap: Long-range Marker-less 3D Human Motion Capture with LiDAR Point CloudsLiDARCap：使用 LiDAR 点云进行远程无标记 3D 人体运动捕捉
Multi-modal Alignment using Representation Codebook使用表示码本的多模式对齐
NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External KnowledgeNOC-REK：从外部知识中检索词汇的新颖对象字幕
Investigating Top-$k$ White-Box and Transferable Black-box Attack研究 Top-$k$ 白盒和可迁移黑盒攻击
GPU-Based Homotopy Continuation for Minimal Problems in Computer Vision计算机视觉中最小问题的基于 GPU 的同伦延续
On the Instability of Relative Pose Estimation and RANSAC’s Role关于相对姿态估计的不稳定性和 RANSAC 的作用
Dual Task Learning by Leveraging Both Dense Correspondence and Mis-Correspondence for Robust Change Detection With Imperfect Matches通过利用密集对应和错误对应进行双重任务学习，以实现具有不完美匹配的稳健变化检测
M3L: Language-based Video Editing via Multi-Modal Multi-Level TransformersM3L：通过多模式多级转换器进行基于语言的视频编辑
Dynamic Scene Graph Generation via Anticipatory Pre-training通过预期预训练生成动态场景图
ScanQA: 3D Question Answering for Spatial Scene UnderstandingScanQA：用于空间场景理解的 3D 问答
PixMix: Dreamlike Pictures Comprehensively Improve Safety MeasuresPixMix：梦幻般的画面全面提升安全措施
Large Images as Long Documents: Hierarchical ViTs with Self-Supervised Pretraining in Gigapixel Image Pyramids大图像作为长文档：在千兆像素图像金字塔中具有自我监督预训练的分层 ViT
Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection人-物交互检测中 Transformer 的解码路径增强一致性学习
On Guiding Visual Attention with Language Specification用语言规范引导视觉注意
OnePose: One-Shot Object Pose Estimation without CAD ModelsOnePose：没有 CAD 模型的 One-Shot 对象姿态估计
Thin-Plate Spline Motion Model for Image Animation用于图像动画的薄板样条运动模型
PokeBNN: A Binary Pursuit of Lightweight AccuracyPokeBNN：对轻量级准确性的二元追求
Semi-Supervised Few-shot Learning via Multi-Factor Clustering基于多因素聚类的半监督小样本学习
FashionVLP: Vision Language Transformer for Fashion Retrieval with FeedbackFashionVLP：带有反馈的时尚检索视觉语言转换器
CLIPstyler: Image Style Transfer with a Single Text ConditionCLIPstyler：具有单一文本条件的图像风格转移
Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather ConditionsIthaca365：重复和具有挑战性的天气条件下的数据集和驾驶感知
Out-of-distribution Generalization with Causal Invariant Transformations具有因果不变变换的分布外泛化
Zero-Shot Text-Guided Object Generation with Dream Fields具有梦想场的零样本文本引导对象生成
Noise Distribution Adaptive Self-Supervised Image Denoising using Tweedie Distribution and Score Matching使用 Tweedie 分布和分数匹配的噪声分布自适应自监督图像去噪
TransGeo: Transformer Is All You Need for Cross-view Image Geo-localizationTransGeo：Transformer 是跨视图图像地理定位所需的全部
NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation ModelsNICGSlowDown：评估神经图像字幕生成模型的效率鲁棒性
Deep Unlearning via Randomized Conditionally Independent Hessians通过随机条件独立 Hessian 进行深度遗忘学习
Multi-Modal Dynamic Graph Transformer for Visual Grounding用于视觉接地的多模态动态图变换器
Propagation Regularizer for Semi-supervised Learning with Extremely Scarce Labeled Samples带有极少标记样本的半监督学习的传播正则化器
Discrete Wasserstein Distributional Matching for Quantization in Image Hashing图像散列中量化的离散 Wasserstein 分布匹配
Robust fine-tuning of zero-shot models零样本模型的稳健微调
Probabilistic Representations for Video Contrastive Learning视频对比学习的概率表示
Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic ContractionCome-Closer-Diffuse-Faster：通过随机收缩加速逆问题的条件扩散模型
Fine-Grained Object Classification via Self-Supervised Pose Alignment通过自监督姿势对齐进行细粒度对象分类
One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones一步一步：里程碑式的远景视觉和语言导航
A Framework for Learning Ante-hoc Explainable Models via Concepts通过概念学习事前可解释模型的框架
Retrieval Augmented Classification for Long Tail Visual Recognition长尾视觉识别的检索增强分类
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization深谱方法：无监督语义分割和定位的惊人强基线
Learning Video Representations of Human Motion from Synthetic Data从合成数据中学习人体运动的视频表示
Exploiting Pseudo Labels in a Self-Supervised Learning Framework for Improved Monocular Depth Estimation在自我监督学习框架中利用伪标签改进单目深度估计
Efficient Deep Embedded Subspace Clustering高效的深度嵌入子空间聚类
Local-Adaptive Face Recognition via Graph-based Meta-Clustering and Regularized Adaptation基于图元聚类和正则化自适应的局部自适应人脸识别
GenDR: A Generalized Differentiable RendererGenDR：广义可微渲染器
Fingerprinting Deep Neural Networks Globally via Universal Adversarial Perturbations通过通用对抗性扰动对深度神经网络进行全局指纹识别
Learning Multiple Adverse Weather Removal via Two-stage Knowledge Learning and Multi-contrastive Regularization: Toward a Unified Model通过两阶段知识学习和多对比正则化学习多个不利天气去除：迈向统一模型
