Computer Vision Papers - 2021-07-27


This column is a running collection of computer vision papers. Date: July 27, 2021. Source: Paper Digest.

Welcome to follow the original WeChat official account 【计算机视觉联盟】 (Computer Vision League); reply 【西瓜书手推笔记】 ("Watermelon Book" handwritten notes) to receive my fully handwritten machine learning notes!

Direct link to the notes: handwritten machine learning notes (GitHub).

1, TITLE: 3D AGSE-VNet: An Automatic Brain Tumor MRI Data Segmentation Framework
AUTHORS: XI GUAN et. al.
CATEGORY: cs.AI [cs.AI, cs.CV, cs.LG]
HIGHLIGHT: Methods: To meet the above challenges, we propose an automatic brain tumor MRI data segmentation framework which is called AGSE-VNet.

2, TITLE: Benign Adversarial Attack: Tricking Algorithm for Goodness
AUTHORS: Xian Zhao ; Jiaming Zhang ; Zhiyu Lin ; Jitao Sang
CATEGORY: cs.AI [cs.AI, cs.CV]
HIGHLIGHT: Inspired by this, we present a brave new idea called benign adversarial attack to exploit adversarial examples for goodness in three directions: (1) adversarial Turing test, (2) rejecting malicious algorithms, and (3) adversarial data augmentation.

3, TITLE: Temporal-wise Attention Spiking Neural Networks for Event Streams Classification
AUTHORS: MAN YAO et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose a temporal-wise attention SNN (TA-SNN) model to learn frame-based representation for processing event streams.

4, TITLE: Multi-Label Image Classification with Contrastive Learning
AUTHORS: Son D. Dao ; Ethan Zhao ; Dinh Phung ; Jianfei Cai
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we show that a direct application of contrastive learning can hardly improve in multi-label cases.

5, TITLE: Hand Image Understanding Via Deep Multi-Task Learning
AUTHORS: ZHANG XIONG et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To further improve the performance of these tasks, we propose a novel Hand Image Understanding (HIU) framework to extract comprehensive information of the hand object from a single RGB image, by jointly considering the relationships between these tasks.

6, TITLE: An Uncertainty-Aware Deep Learning Framework for Defect Detection in Casting Products
AUTHORS: MARYAM HABIBPOUR et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Extracted features are then processed by various machine learning algorithms to perform the classification task.

7, TITLE: Deep Machine Learning Based Egyptian Vehicle License Plate Recognition Systems
AUTHORS: Mohamed Shehata ; Mohamed Taha Abou-Kreisha ; Hany Elnashar
CATEGORY: cs.CV [cs.CV, cs.LG, eess.IV]
HIGHLIGHT: In this paper, four smart systems are developed to recognize Egyptian vehicles license plates.

8, TITLE: Rank & Sort Loss for Object Detection and Instance Segmentation
AUTHORS: Kemal Oksuz ; Baris Can Cam ; Emre Akbas ; Sinan Kalkan
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose Rank & Sort (RS) Loss, as a ranking-based loss function to train deep object detection and instance segmentation methods (i.e. visual detectors).

9, TITLE: Semantic-guided Pixel Sampling for Cloth-Changing Person Re-identification
AUTHORS: Xiujun Shu ; Ge Li ; Xiao Wang ; Weijian Ruan ; Qi Tian
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: This paper proposes a semantic-guided pixel sampling approach for the cloth-changing person re-ID task.

10, TITLE: X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering
AUTHORS: Jingjing Jiang ; Ziyi Liu ; Yifan Liu ; Zhixiong Nan ; Nanning Zheng
CATEGORY: cs.CV [cs.CV, cs.MM]
HIGHLIGHT: In this paper, we formulate OOD generalization in VQA as a compositional generalization problem and propose a graph generative modeling-based training scheme (X-GGM) to handle the problem implicitly.

11, TITLE: Going Deeper Into Semi-supervised Person Re-identification
AUTHORS: Olga Moskvyak ; Frederic Maire ; Feras Dayoub ; Mahsa Baktashmotlagh
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To overcome these limitations, we propose to employ part-based features from a single CNN without requiring the knowledge of the label space (i.e., the number of identities).

12, TITLE: Multi-Echo LiDAR for 3D Object Detection
AUTHORS: Yunze Man ; Xinshuo Weng ; Prasanna Kumar Sivakumar ; Matthew O'Toole ; Kris Kitani
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present a 3D object detection model which leverages the full spectrum of measurement signals provided by LiDAR.

13, TITLE: Two Headed Dragons: Multimodal Fusion and Cross Modal Transactions
AUTHORS: Rupak Bose ; Shivam Pande ; Biplab Banerjee
CATEGORY: cs.CV [cs.CV, cs.LG, eess.IV]
HIGHLIGHT: To this end, we propose a novel transformer based fusion method for HSI and LiDAR modalities.

14, TITLE: TinyAction Challenge: Recognizing Real-world Low-resolution Activities in Videos
AUTHORS: PRAVEEN TIRUPATTUR et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose a benchmark dataset, TinyVIRAT-v2, which is comprised of naturally occurring low-resolution actions. We use current state-of-the-art action recognition methods on the dataset as a benchmark, and propose the TinyAction Challenge.

15, TITLE: NeLF: Neural Light-transport Field for Portrait View Synthesis and Relighting
AUTHORS: Tiancheng Sun ; Kai-En Lin ; Sai Bi ; Zexiang Xu ; Ravi Ramamoorthi
CATEGORY: cs.CV [cs.CV, cs.GR]
HIGHLIGHT: To this end, we present a system for portrait view synthesis and relighting: given multiple portraits, we use a neural network to predict the light-transport field in 3D space, and from the predicted Neural Light-transport Field (NeLF) produce a portrait from a new camera view under a new environmental lighting.

16, TITLE: Facetron: Multi-speaker Face-to-Speech Model Based on Cross-modal Latent Representations
AUTHORS: SE-YUN UM et. al.
CATEGORY: cs.CV [cs.CV, cs.LG, cs.SD, eess.AS]
HIGHLIGHT: In this paper, we propose an effective method to synthesize speaker-specific speech waveforms by conditioning on videos of an individual's face.

17, TITLE: Parametric Contrastive Learning
AUTHORS: Jiequan Cui ; Zhisheng Zhong ; Shu Liu ; Bei Yu ; Jiaya Jia
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose Parametric Contrastive Learning (PaCo) to tackle long-tailed recognition. We introduce a set of parametric class-wise learnable centers to rebalance from an optimization perspective.
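The PaCo highlight above mentions parametric class-wise learnable centers added to a contrastive objective. The sketch below is a minimal, hypothetical PyTorch illustration of a supervised contrastive loss whose contrast set is augmented with one learnable center per class; the class name, parameter choices, and masking scheme are my own assumptions for illustration and are not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CenterAugmentedSupConLoss(nn.Module):
    """Supervised contrastive loss whose contrast set also contains one
    learnable center per class (generic illustration, not PaCo itself)."""

    def __init__(self, num_classes: int, feat_dim: int, temperature: float = 0.07):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.temperature = temperature

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # features: (B, D) embeddings, labels: (B,) integer class ids
        b = features.size(0)
        feats = F.normalize(features, dim=1)
        centers = F.normalize(self.centers, dim=1)

        # Similarities against the other batch samples and the class centers.
        logits = torch.cat([feats @ feats.t(), feats @ centers.t()], dim=1) / self.temperature

        # Exclude self-similarity from the contrast set.
        self_mask = torch.eye(b, dtype=torch.bool, device=feats.device)
        pad = torch.zeros(b, centers.size(0), dtype=torch.bool, device=feats.device)
        logits = logits.masked_fill(torch.cat([self_mask, pad], dim=1), float("-inf"))

        # Positives: same-label batch samples plus the sample's own class center.
        pos_mask = torch.cat(
            [labels.unsqueeze(1).eq(labels.unsqueeze(0)) & ~self_mask,
             F.one_hot(labels, num_classes=centers.size(0)).bool()],
            dim=1,
        )

        log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
        # Every sample has at least one positive: its own class center.
        loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_mask.sum(dim=1)
        return loss.mean()


# Usage sketch: features come from any encoder, labels are the (long-tailed) targets.
criterion = CenterAugmentedSupConLoss(num_classes=100, feat_dim=128)
loss = criterion(torch.randn(32, 128), torch.randint(0, 100, (32,)))
```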

18, TITLE: Continental-Scale Building Detection from High Resolution Satellite Imagery
AUTHORS: WOJCIECH SIRKO et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we describe a model training pipeline for detecting buildings across the entire continent of Africa, using 50 cm satellite imagery.

19, TITLE: Synthetic Periocular Iris PAI from A Small Set of Near-Infrared-Images
AUTHORS: Jose Maureira ; Juan Tapia ; Claudia Arellano ; Christoph Busch
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper proposes a novel PAI synthetically created (SPI-PAI) using four state-of-the-art GAN algorithms (cGAN, WGAN, WGAN-GP, and StyleGAN2) and a small set of periocular NIR images.

20, TITLE: Efficient Large Scale Inlier Voting for Geometric Vision Problems
AUTHORS: Dror Aiger ; Simon Lynen ; Jan Hosang ; Bernhard Zeisl
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To approach the problem we present an efficient and general algorithm for outlier rejection based on "intersecting" $k$-dimensional surfaces in $R^d$.

21, TITLE: Language Models As Zero-shot Visual Semantic Learners
AUTHORS: Yue Jiao ; Jonathon Hare ; Adam Prügel-Bennett
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose a Visual Semantic Embedding Probe (VSEP) designed to probe the semantic information of contextualized word embeddings in visual semantic understanding tasks.

22, TITLE: Adaptive Recursive Circle Framework for Fine-grained Action Recognition
AUTHORS: Hanxi Lin ; Xinxiao Wu ; Jiebo Luo
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose an Adaptive Recursive Circle (ARC) framework, a fine-grained decorator for pure feedforward layers.

23, TITLE: Improve Unsupervised Pretraining for Few-label Transfer
AUTHORS: SUICHAN LI et. al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: But in this paper, we find this conclusion may not hold when the target dataset has very few labeled samples for finetuning, i.e., few-label transfer.

24, TITLE: Bangla Sign Language Recognition Using Concatenated BdSL Network
AUTHORS: Thasin Abedin ; Khondokar S. S. Prottoy ; Ayana Moshruba ; Safayat Bin Hakim
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, a novel architecture "Concatenated BdSL Network" is proposed which consists of a CNN based image network and a pose estimation network.

25, TITLE: Comprehensive Studies for Arbitrary-shape Scene Text Detection
AUTHORS: Pengwen Dai ; Xiaochun Cao
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we carefully examine and analyze the inconsistent settings, and propose a unified framework for the bottom-up based scene text detection methods.

26, TITLE: Denoising and Segmentation of Epigraphical Scripts
AUTHORS: P Preethi ; Hrishikesh Viswanath
CATEGORY: cs.CV [cs.CV, cs.LG, eess.IV]
HIGHLIGHT: This paper explores the process of segmentation using Neural Networks.

27, TITLE: ICDAR 2021 Competition on Scene Video Text Spotting
AUTHORS: Zhanzhan Cheng ; Jing Lu ; Baorui Zou ; Shuigeng Zhou ; Fei Wu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper includes dataset descriptions, task definitions, evaluation protocols and results summaries of the ICDAR 2021 SVTS competition.

28, TITLE: Distributional Shifts in Automated Diabetic Retinopathy Screening
AUTHORS: Jay Nandy ; Wynne Hsu ; Mong Li Lee
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: Our paper presents a Dirichlet Prior Network-based framework to address this issue.

29, TITLE: Transcript to Video: Efficient Clip Sequencing from Texts
AUTHORS: Yu Xiong ; Fabian Caba Heilbron ; Dahua Lin
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To meet the demands for non-experts, we present Transcript-to-Video -- a weakly-supervised framework that uses texts as input to automatically create video sequences from an extensive collection of shots.

30, TITLE: Will Multi-modal Data Improves Few-shot Learning?
AUTHORS: Zilun Zhang ; Shihao Ma ; Yichun Zhang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To achieve this goal, we propose four types of fusion method to combine the image feature and text feature.

31, TITLE: On-Device Content Moderation
AUTHORS: ANCHAL PANDEY et. al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper we present a novel on-device solution for detecting NSFW images.

32, TITLE: Spatio-Temporal Representation Factorization for Video-based Person Re-Identification
AUTHORS: ABHISHEK AICH et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To alleviate these problems, we propose Spatio-Temporal Representation Factorization module (STRF), a flexible new computational unit that can be used in conjunction with most existing 3D convolutional neural network architectures for re-ID.

33, TITLE: Perceptually Validated Precise Local Editing for Facial Action Units with StyleGAN
AUTHORS: Alara Zindancıoğlu ; T. Metin Sezgin
CATEGORY: cs.CV [cs.CV, cs.AI, cs.HC]
HIGHLIGHT: We build a solution based on StyleGAN, which has been used extensively for semantic manipulation of faces.

34, TITLE: AA3DNet: Attention Augmented Real Time 3D Object Detection
AUTHORS: Abhinav Sagar
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this work, we address the problem of 3D object detection from point cloud data in real time.

35, TITLE: An Efficient Insect Pest Classification Using Multiple Convolutional Neural Network Based Models
AUTHORS: Hieu T. Ung ; Huy Q. Ung ; Binh T. Nguyen
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: We present different convolutional neural network-based models in this work, including attention, feature pyramid, and fine-grained models.

36, TITLE: Efficient Video Object Segmentation with Compressed Video
AUTHORS: Kai Xu ; Angela Yao
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose an efficient inference framework for semi-supervised video object segmentation by exploiting the temporal redundancy of the video.

37, TITLE: Using Synthetic Corruptions to Measure Robustness to Natural Distribution Shifts
AUTHORS: Alfred Laugros ; Alice Caplier ; Matthieu Ospici
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a methodology to build synthetic corruption benchmarks that make robustness estimations more correlated with robustness to real-world distribution shifts. Applying the proposed methodology, we build a new benchmark called ImageNet-Syn2Nat to predict image classifier robustness.
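To make the notion of a synthetic corruption benchmark concrete, here is a small, generic NumPy sketch of an ImageNet-C-style Gaussian-noise corruption applied at several severity levels. The sigma values and function names are illustrative assumptions, not the construction used for ImageNet-Syn2Nat.

```python
import numpy as np


def gaussian_noise(image: np.ndarray, severity: int = 1) -> np.ndarray:
    """Apply a simple Gaussian-noise corruption to an HxWxC uint8 image.
    `severity` ranges from 1 to 5; the sigmas below are illustrative only."""
    sigmas = [0.04, 0.06, 0.08, 0.09, 0.10]
    sigma = sigmas[severity - 1]
    x = image.astype(np.float32) / 255.0
    noisy = x + np.random.normal(scale=sigma, size=x.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255).astype(np.uint8)


def corrupt_dataset(images, corruption=gaussian_noise, severities=(1, 2, 3, 4, 5)):
    """Yield (severity, corrupted_image) pairs for a robustness benchmark."""
    for severity in severities:
        for img in images:
            yield severity, corruption(img, severity)
```

Evaluating a classifier on such corrupted copies and correlating the resulting accuracy drop with its accuracy under real distribution shifts is the general kind of measurement the highlight refers to.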

38, TITLE: HANet: Hierarchical Alignment Networks for Video-Text Retrieval
AUTHORS: Peng Wu ; Xiangteng He ; Mingqian Tang ; Yiliang Lv ; Jing Liu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To address the above limitations, we propose a Hierarchical Alignment Network (HANet) to align different level representations for video-text matching.

39, TITLE: Towards Unbiased Visual Emotion Recognition Via Causal Intervention
AUTHORS: Yuedong Chen ; Xu Yang ; Tat-Jen Cham ; Jianfei Cai
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we scrutinize this problem from the perspective of causal inference, where such dataset characteristic is termed as a confounder which misleads the system to learn the spurious correlation.

40, TITLE: CP-loss: Connectivity-preserving Loss for Road Curb Detection in Autonomous Driving with Aerial Images
AUTHORS: Zhenhua Xu ; Yuxiang Sun ; Lujia Wang ; Ming Liu
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: To alleviate this issue, we detect road curbs offline using high-resolution aerial images in this paper.

41, TITLE: A Multiple-Instance Learning Approach for The Assessment of Gallbladder Vascularity from Laparoscopic Images
AUTHORS: C. Loukas ; A. Gazis ; D. Schizas
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper we propose a multiple-instance learning (MIL) technique for assessment of the GB wall vascularity via computer-vision analysis of images from LC operations.

42, TITLE: Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition
AUTHORS: AYAN KUMAR BHUNIA et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we argue that semantic information offers a complementary role in addition to visual only.

43, TITLE: Learning to Adversarially Blur Visual Object Tracking
AUTHORS: QING GUO et. al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this work, we explore the robustness of visual object trackers against motion blur from a new angle, i.e., adversarial blur attack (ABA).

44, TITLE: Text Is Text, No Matter What: Unifying Text Recognition Using Knowledge Distillation
AUTHORS: Ayan Kumar Bhunia ; Aneeshan Sain ; Pinaki Nath Chowdhury ; Yi-Zhe Song
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, for the first time, we argue for their unification -- we aim for a single model that can compete favourably with two separate state-of-the-art STR and HTR models.

45, TITLE: Towards The Unseen: Iterative Text Recognition By Distilling from Errors
AUTHORS: Ayan Kumar Bhunia ; Pinaki Nath Chowdhury ; Aneeshan Sain ; Yi-Zhe Song
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we put forward a novel framework to specifically tackle this "unseen" problem.

46, TITLE: LAConv: Local Adaptive Convolution for Image Fusion
AUTHORS: Zi-Rong Jin ; Liang-Jian Deng ; Tai-Xiang Jiang ; Tian-Jing Zhang
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: In this paper, we propose a local adaptive convolution (LAConv), which is dynamically adjusted to different spatial locations.

47, TITLE: Log-Polar Space Convolution for Convolutional Neural Networks
AUTHORS: Bing Su ; Ji-Rong Wen
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper proposes a novel log-polar space convolution (LPSC) method, where the convolution kernel is elliptical and adaptively divides its local receptive field into different regions according to the relative directions and logarithmic distances.
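The LPSC highlight describes partitioning a local receptive field by relative direction and logarithmic distance. Below is a generic NumPy sketch that assigns each offset of a square neighborhood to a (ring, sector) region under such a log-polar scheme; it illustrates the partitioning idea only and is not the paper's convolution operator.

```python
import numpy as np


def log_polar_bins(window_size: int, num_angles: int = 8, num_rings: int = 3) -> np.ndarray:
    """Assign each offset of a (window_size x window_size) neighborhood to a
    (ring, sector) region: rings spaced logarithmically in distance, sectors
    spaced uniformly in direction. Illustration only, not the LPSC kernel."""
    r = window_size // 2
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    dist = np.hypot(xs, ys)
    angle = np.mod(np.arctan2(ys, xs), 2 * np.pi)

    # Ring boundaries grow geometrically from 1 to the window radius.
    edges = np.geomspace(1.0, r + 1.0, num_rings)
    ring = np.clip(np.digitize(dist, edges) - 1, 0, num_rings - 1)
    sector = (angle // (2 * np.pi / num_angles)).astype(int)

    # Bin 0 is the center pixel; every other offset maps to ring * num_angles + sector + 1.
    return np.where(dist == 0, 0, ring * num_angles + sector + 1)


# A 7x7 window partitioned into 1 + 3*8 = 25 regions instead of 49 free positions.
print(log_polar_bins(7))
```

Sharing one weight per region rather than one per pixel is the general effect such a partition has on a convolution's parameterization.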

48, TITLE: Transductive Maximum Margin Classifier for Few-Shot Learning
AUTHORS: Fei Pan ; Chunlei Xu ; Jie Guo ; Yanwen Guo
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We introduce Transductive Maximum Margin Classifier (TMMC) for few-shot learning.

49, TITLE: Meta-FDMixup: Cross-Domain Few-Shot Learning Guided By Labeled Target Data
AUTHORS: Yuqian Fu ; Yanwei Fu ; Yu-Gang Jiang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we realize that the labeled target data in CD-FSL has not been leveraged in any way to help the learning process.

50, TITLE: Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph
AUTHORS: Wentian Zhao ; Yao Hu ; Heda Wang ; Xinxiao Wu ; Jiebo Luo
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To tackle these challenges, we propose a novel approach that constructs a multi-modal knowledge graph to associate the visual objects with named entities and capture the relationship between entities simultaneously with the help of external knowledge collected from the web.

51, TITLE: Self-Conditioned Probabilistic Learning of Video Rescaling
AUTHORS: YUAN TIAN et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a self-conditioned probabilistic framework for video rescaling to learn the paired downscaling and upscaling procedures simultaneously.

52, TITLE: Clustering By Maximizing Mutual Information Across Views
AUTHORS: Kien Do ; Truyen Tran ; Svetha Venkatesh
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: We propose a novel framework for image clustering that incorporates joint representation learning and clustering.
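For readers unfamiliar with the objective named in the highlight, here is a generic, IIC-style mutual-information clustering loss between the soft cluster assignments of two views, written in PyTorch. It illustrates "clustering by maximizing mutual information across views" in its simplest form and is not claimed to match the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def mutual_information_loss(logits_a: torch.Tensor, logits_b: torch.Tensor,
                            eps: float = 1e-8) -> torch.Tensor:
    """Negative mutual information between the cluster assignments of two
    augmented views of the same batch (generic illustration)."""
    p_a = F.softmax(logits_a, dim=1)            # (B, K) soft assignments, view 1
    p_b = F.softmax(logits_b, dim=1)            # (B, K) soft assignments, view 2

    joint = p_a.t() @ p_b / p_a.size(0)         # (K, K) joint assignment distribution
    joint = (joint + joint.t()) / 2             # symmetrize
    joint = joint.clamp_min(eps)

    marg_a = joint.sum(dim=1, keepdim=True)     # (K, 1) marginal of view 1
    marg_b = joint.sum(dim=0, keepdim=True)     # (1, K) marginal of view 2

    mi = (joint * (joint.log() - marg_a.log() - marg_b.log())).sum()
    return -mi                                  # minimize negative MI
```

Minimizing this quantity pushes the two views toward consistent, high-entropy cluster assignments, which is the core mechanism the highlight alludes to.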

53, TITLE: Temporal Alignment Prediction for Few-Shot Video Classification
AUTHORS: Fei Pan ; Chunlei Xu ; Jie Guo ; Yanwen Guo
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose Temporal Alignment Prediction (TAP) based on sequence similarity learning for few-shot video classification.

54, TITLE: Crosslink-Net: Double-branch Encoder Segmentation Network Via Fusing Vertical and Horizontal Convolutions
AUTHORS: QIAN YU et. al.
CATEGORY: cs.CV [cs.CV, cs.AI, 68T07, I.4.6]
HIGHLIGHT: To further cope with these challenges, we present a novel double-branch encoder architecture.

55, TITLE: ASOD60K: Audio-Induced Salient Object Detection in Panoramic Videos
AUTHORS: YI ZHANG et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: With this goal in mind, we propose PV-SOD, a new task that aims to segment salient objects from panoramic videos.

56, TITLE: Augmentation Pathways Network for Visual Recognition
AUTHORS: YALONG BAI et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper introduces a novel network design, noted as Augmentation Pathways (AP), to systematically stabilize training on a much wider range of augmentation policies.

57, TITLE: HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration
AUTHORS: FAN LU et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose an efficient hierarchical network named HRegNet for large-scale outdoor LiDAR point cloud registration.

58, TITLE: What Remains of Visual Semantic Embeddings
AUTHORS: Yue Jiao ; Jonathon Hare ; Adam Prügel-Bennett
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we introduce the split of tiered-ImageNet to the ZSL task, in order to avoid the structural flaws in the standard ImageNet benchmark.

59, TITLE: Cycled Compositional Learning Between Images and Text
AUTHORS: Jongseok Kim ; Youngjae Yu ; Seunghwan Lee ; Gunhee Kim
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: We present an approach named the Cycled Composition Network that can measure the semantic distance of the composition of image-text embedding.

60, TITLE: Spatial-Temporal Transformer for Dynamic Scene Graph Generation
AUTHORS: Yuren Cong ; Wentong Liao ; Hanno Ackermann ; Michael Ying Yang ; Bodo Rosenhahn
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose Spatial-temporal Transformer (STTran), a neural network that consists of two core modules: (1) a spatial encoder that takes an input frame to extract spatial context and reason about the visual relationships within a frame, and (2) a temporal decoder which takes the output of the spatial encoder as input in order to capture the temporal dependencies between frames and infer the dynamic relationships.

61, TITLE: Alleviate Representation Overlapping in Class Incremental Learning By Contrastive Class Concentration
AUTHORS: Zixuan Ni ; Haizhou Shi ; Siliang Tang ; Yueting Zhuang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, to alleviate the phenomenon of representation overlapping for both memory-based and memory-free methods, we propose a new CIL framework, Contrastive Class Concentration for CIL (C4IL).

62, TITLE: Boosting Video Captioning with Dynamic Loss Network
AUTHORS: Nasibullah ; Partha Pratim Mohanta
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: This paper addresses the drawback by introducing a dynamic loss network (DLN), which provides an additional feedback signal that directly reflects the evaluation metrics.

63, TITLE: Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition
AUTHORS: YUXIN CHEN et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose a novel Channel-wise Topology Refinement Graph Convolution (CTR-GC) to dynamically learn different topologies and effectively aggregate joint features in different channels for skeleton-based action recognition.

64, TITLE: Deep Learning Based Cardiac MRI Segmentation: Do We Need Experts?
AUTHORS: Youssef Skandarani ; Pierre-Marc Jodoin ; Alain Lalande
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we set out to explore whether expert knowledge is a strict requirement for the creation of annotated datasets that machine learning can successfully train on.

65, TITLE: Cross-Sentence Temporal and Semantic Relations in Video Activity Localisation
AUTHORS: Jiabo Huang ; Yang Liu ; Shaogang Gong ; Hailin Jin
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we develop a more accurate weakly-supervised solution by introducing Cross-Sentence Relations Mining (CRM) in video moment proposal generation and matching when only a paragraph description of activities without per-sentence temporal annotation is available.

66, TITLE: PoseFace: Pose-Invariant Features and Pose-Adaptive Loss for Face Recognition
AUTHORS: QIANG MENG et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose an efficient PoseFace framework which utilizes the facial landmarks to disentangle the pose-invariant features and exploits a pose-adaptive loss to handle the imbalance issue adaptively.

67, TITLE: Can Action Be Imitated? Learn to Reconstruct and Transfer Human Dynamics from Videos
AUTHORS: Yuqian Fu ; Yanwei Fu ; Yu-Gang Jiang
CATEGORY: cs.CV [cs.CV, cs.MM]
HIGHLIGHT: In this paper, we introduce a novel task, dubbed mesh-based action imitation.

68, TITLE: Semantic Attention and Scale Complementary Network for Instance Segmentation in Remote Sensing Images
AUTHORS: TIANYANG ZHANG et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we focus on the challenging multicategory instance segmentation problem in remote sensing images (RSIs), which aims at predicting the categories of all instances and localizing them with pixel-level masks.

69, TITLE: Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference
AUTHORS: JUNCHENG LI et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we study how to address three critical challenges for this task: judging the global correctness of the statement involved multiple semantic meanings, joint reasoning over video and subtitles, and modeling long-range relationships and complex social interactions.

70, TITLE: ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation
AUTHORS: TSUNG-HAN WU et. al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: To reduce the huge annotation burden, we propose a Region-based and Diversity-aware Active Learning (ReDAL), a general framework for many deep learning approaches, aiming to automatically select only informative and diverse sub-scene regions for label acquisition.

71, TITLE: Character Spotting Using Machine Learning Techniques
AUTHORS: P Preethi ; Hrishikesh Viswanath
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: This work presents a comparison of machine learning algorithms that are implemented to segment the characters of text presented as an image.

72, TITLE: Contextual Transformer Networks for Visual Recognition
AUTHORS: Yehao Li ; Ting Yao ; Yingwei Pan ; Tao Mei
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, cs.MM]
HIGHLIGHT: In this work, we design a novel Transformer-style module, i.e., Contextual Transformer (CoT) block, for visual recognition.

73, TITLE: Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation
AUTHORS: LIAN XU et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Motivated by the significant inter-task correlation, we propose a novel weakly supervised multi-task framework termed as AuxSegNet, to leverage saliency detection and multi-label image classification as auxiliary tasks to improve the primary task of semantic segmentation using only image-level ground-truth labels.

74, TITLE: Image-Based Parking Space Occupancy Classification: Dataset and Baseline
AUTHORS: Martin Marek
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We introduce a new dataset for image-based parking space occupancy classification: ACPDS.

75, TITLE: Multimodal Fusion Using Deep Learning Applied to Driver's Referencing of Outside-Vehicle Objects
AUTHORS: Abdul Rafey Aftab ; Michael von der Beeck ; Steven Rohrhirsch ; Benoit Diotte ; Michael Feld
CATEGORY: cs.HC [cs.HC, cs.CV, cs.LG]
HIGHLIGHT: In this paper, we utilize deep learning for a multimodal fusion network for referencing objects outside the vehicle.

76, TITLE: Adversarial Training May Be A Double-edged Sword
AUTHORS: Ali Rahmati ; Seyed-Mohsen Moosavi-Dezfooli ; Huaiyu Dai
CATEGORY: cs.LG [cs.LG, cs.CR, cs.CV]
HIGHLIGHT: In this work, we demonstrate that some geometric consequences of adversarial training on the decision boundary of deep networks give an edge to certain types of black-box attacks.

77, TITLE: Using A Cross-Task Grid of Linear Probes to Interpret CNN Model Predictions On Retinal Images
AUTHORS: Katy Blumer ; Subhashini Venugopalan ; Michael P. Brenner ; Jon Kleinberg
CATEGORY: cs.LG [cs.LG, cs.CV, eess.IV]
HIGHLIGHT: We use this method across all possible pairings of 93 tasks in the UK Biobank dataset of retinal images, leading to ~164k different models.

78, TITLE: Go Wider Instead of Deeper
AUTHORS: Fuzhao Xue ; Ziji Shi ; Yuxuan Lou ; Yong Liu ; Yang You
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV]
HIGHLIGHT: In this paper, to achieve better performance with fewer trainable parameters, we propose a framework to deploy trainable parameters efficiently, by going wider instead of deeper.

79, TITLE: Robust Explainability: A Tutorial on Gradient-Based Attribution Methods for Deep Neural Networks
AUTHORS: Ian E. Nielsen ; Ghulam Rasool ; Dimah Dera ; Nidhal Bouaynaya ; Ravi P. Ramachandran
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV]
HIGHLIGHT: In this tutorial paper, we start by presenting gradient-based interpretability methods.

80, TITLE: Thought Flow Nets: From Single Predictions to Trains of Model Thought
AUTHORS: Hendrik Schuff ; Heike Adel ; Ngoc Thang Vu
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CL, cs.CV]
HIGHLIGHT: We take inspiration from Hegel's dialectics and propose a method that turns an existing classifier's class prediction (such as the image class forest) into a sequence of predictions (such as forest $\rightarrow$ tree $\rightarrow$ mushroom).

81, TITLE: In Defense of The Learning Without Forgetting for Task Incremental Learning
AUTHORS: Guy Oren ; Lior Wolf
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In Defense of The Learning Without Forgetting for Task Incremental Learning

82, TITLE: Compressing Neural Networks: Towards Determining The Optimal Layer-wise Decomposition
AUTHORS: Lucas Liebenwein ; Alaa Maalouf ; Oren Gal ; Dan Feldman ; Daniela Rus
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV]
HIGHLIGHT: We present a novel global compression framework for deep neural networks that automatically analyzes each layer to identify the optimal per-layer compression ratio, while simultaneously achieving the desired overall compression.

83, TITLE: Improving Variational Autoencoder Based Out-of-Distribution Detection for Embedded Real-time Applications
AUTHORS: Yeli Feng ; Daniel Jun Xian Ng ; Arvind Easwaran
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: Generative learning models are widely adopted for the task, namely out-of-distribution (OoD) detection.

84, TITLE: Free Hyperbolic Neural Networks with Limited Radii
AUTHORS: Yunhui Guo ; Xudong Wang ; Yubei Chen ; Stella X. Yu
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: Our thorough experiments show that the proposed method can successfully avoid the vanishing gradient problem when training HNNs with backpropagation.
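The highlight refers to limiting radii to stabilize hyperbolic neural network training. The following is a minimal PyTorch sketch of that general recipe, clipping Euclidean feature norms before the exponential map onto the Poincaré ball; the clipping threshold and function names are illustrative assumptions, not the paper's settings.

```python
import torch


def clip_feature_norm(x: torch.Tensor, max_norm: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    """Rescale feature vectors so their Euclidean norm does not exceed max_norm
    (generic radius-limiting sketch; the threshold is illustrative)."""
    norm = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    scale = torch.clamp(max_norm / norm, max=1.0)
    return x * scale


def expmap0(v: torch.Tensor, c: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    """Exponential map at the origin of the Poincare ball with curvature -c."""
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)


# Usage: limit the radius of Euclidean features before mapping them to the ball.
features = torch.randn(32, 64)
ball_points = expmap0(clip_feature_norm(features, max_norm=1.0))
```

Keeping points away from the ball's boundary keeps the gradients of hyperbolic operations well scaled, which is the failure mode (vanishing gradients) the highlight says the method avoids.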

85, TITLE: Improving Robot Localisation By Ignoring Visual Distraction
AUTHORS: Oscar Mendez ; Matthew Vowels ; Richard Bowden
CATEGORY: cs.RO [cs.RO, cs.CV]
HIGHLIGHT: In this work, we introduce Neural Blindness, which gives an agent the ability to completely ignore objects or classes that are deemed distractors.

86, TITLE: Dual-Attention Enhanced BDense-UNet for Liver Lesion Segmentation
AUTHORS: WENMING CAO et. al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this work, we propose a new segmentation network by integrating DenseUNet and bidirectional LSTM together with attention mechanism, termed as DA-BDense-UNet.

87, TITLE: A Real Use Case of Semi-Supervised Learning for Mammogram Classification in A Local Clinic of Costa Rica
AUTHORS: SAUL CALDERON-RAMIREZ et. al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this work, a real world scenario is evaluated where a novel target dataset sampled from a private Costa Rican clinic is used, with few labels and heavily imbalanced data.

88, TITLE: Reconstructing Images of Two Adjacent Objects Through Scattering Medium Using Generative Adversarial Network
AUTHORS: Xuetian Lai ; Qiongyao Li ; Ziyang Chen ; Xiaopeng Shao ; Jixiong Pu
CATEGORY: eess.IV [eess.IV, cs.CV, physics.optics]
HIGHLIGHT: In this paper, we demonstrate an approach by using generative adversarial network (GAN) to reconstruct images of two adjacent objects through scattering media.

89, TITLE: Weakly Supervised Attention Model for RV Strain Classification from Volumetric CTPA Scans
AUTHORS: NOA CAHAN et. al.
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG, 92C50, 68T07 (Primary), I.4.9; J.6; I.2.1]
HIGHLIGHT: We developed a weakly supervised deep learning algorithm, with an emphasis on a novel attention mechanism, to automatically classify RV strain on CTPA.

90, TITLE: Structure-Preserving Multi-Domain Stain Color Augmentation Using Style-Transfer with Disentangled Representations
AUTHORS: SOPHIA J. WAGNER et. al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: We propose a novel color augmentation technique, HistAuGAN, that can simulate a wide variety of realistic histology stain colors, thus making neural networks stain-invariant when applied during training.

91, TITLE: Towards Generative Video Compression
AUTHORS: FABIAN MENTZER et. al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: We present a neural video compression method based on generative adversarial networks (GANs) that outperforms previous neural video compression methods and is comparable to HEVC in a user study.

92, TITLE: A Unified Hyper-GAN Model for Unpaired Multi-contrast MR Image Translation
AUTHORS: Heran Yang ; Jian Sun ; Liwei Yang ; Zongben Xu
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this work, we propose a unified Hyper-GAN model for effectively and efficiently translating between different contrast pairs.

93, TITLE: Accelerating Atmospheric Turbulence Simulation Via Learned Phase-to-Space Transform
AUTHORS: Zhiyuan Mao ; Nicholas Chimitt ; Stanley H. Chan
CATEGORY: eess.IV [eess.IV, cs.CV, physics.flu-dyn]
HIGHLIGHT: Recognizing the limitations of previous approaches, we introduce a new concept known as the phase-to-space (P2S) transform to significantly speed up the simulation.

94, TITLE: Lung Cancer Risk Estimation with Incomplete Data: A Joint Missing Imputation Perspective
AUTHORS: RIQIANG GAO et. al.
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: In this paper, we address imputation of missing data by modeling the joint distribution of multi-modal data.

95, TITLE: MAG-Net: Mutli-task Attention Guided Network for Brain Tumor Segmentation and Classification
AUTHORS: Sachin Gupta ; Narinder Singh Punn ; Sanjay Kumar Sonbhadra ; Sonali Agarwal
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: Motivated by the deep learning based computer-aided-diagnosis systems, this paper proposes multi-task attention guided encoder-decoder network (MAG-Net) to classify and segment the brain tumor regions using MRI images.

96, TITLE: B-line Detection in Lung Ultrasound Videos: Cartesian Vs Polar Representation
AUTHORS: HAMIDEH KERDEGARI et. al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: This paper presents an attention-based Convolutional+LSTM model to automatically detect B-lines in LUS videos, comparing performance when image data is taken in Cartesian and polar representations.

97, TITLE: Deep Learning-based Frozen Section to FFPE Translation
AUTHORS: KUTSEV BENGISU OZYORUK et. al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper, we propose an artificial intelligence (AI) method that improves FS image quality by computationally transforming frozen-sectioned whole-slide images (FS-WSIs) into whole-slide FFPE-style images in minutes.

