Abstract:
Data mixing augmentation has proven effective for improving the generalization ability of deep neural networks. While early methods mix samples using hand-crafted policies (e.g., linear interpolation), recent methods utilize saliency information to match the mixed samples and labels via complex offline optimization. However, this creates a trade-off between precise mixing policies and optimization complexity. To address this challenge, we propose a novel automatic mixup (AutoMix) framework, in which the mixup policy is parameterized and serves the ultimate classification goal directly. Specifically, AutoMix reformulates mixup classification into two sub-tasks (i.e., mixed sample generation and mixup classification) with corresponding sub-networks and solves them in a bi-level optimization framework. For generation, a learnable lightweight mixup generator, Mix Block, is designed to produce mixed samples by modeling patch-wise relationships under the direct supervision of the corresponding mixed labels. To prevent the degradation and instability of bi-level optimization, we further introduce a momentum pipeline to train AutoMix in an end-to-end manner. Extensive experiments on nine image benchmarks demonstrate the superiority of AutoMix over state-of-the-art methods in various classification scenarios and downstream tasks.
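The hand-crafted linear-interpolation policy mentioned above (the original mixup) can be sketched as follows. This is a minimal illustration, not AutoMix's learned Mix Block; the Beta-distributed mixing ratio and one-hot labels are standard mixup conventions rather than details taken from this abstract:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Hand-crafted mixup: linear interpolation of two samples and their labels."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)              # mixing ratio from Beta(alpha, alpha)
    x_mixed = lam * x1 + (1.0 - lam) * x2     # pixel-wise interpolation
    y_mixed = lam * y1 + (1.0 - lam) * y2     # same ratio applied to one-hot labels
    return x_mixed, y_mixed, lam

# Mix two toy 32x32 RGB images with one-hot labels over 10 classes.
x1, x2 = np.zeros((32, 32, 3)), np.ones((32, 32, 3))
y1, y2 = np.eye(10)[3], np.eye(10)[7]
x_mixed, y_mixed, lam = mixup(x1, y1, x2, y2)
```

AutoMix replaces this fixed interpolation with a learned, saliency-aware generator, but the mixed-label construction remains the same convex combination.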
Abstract:
Current multi-category Multiple Object Tracking (MOT) metrics use class labels to group tracking results for per-class evaluation. Similarly, MOT methods typically only associate objects with the same class predictions. These two prevalent strategies in MOT implicitly assume that classification performance is near-perfect. However, this is far from the case in recent large-scale MOT datasets, which contain large numbers of classes with many rare or semantically similar categories. The resulting inaccurate classification therefore leads to sub-optimal tracking and inadequate benchmarking of trackers. We address these issues by disentangling classification from tracking. We introduce a new metric, Track Every Thing Accuracy (TETA), which breaks tracking measurement into three sub-factors: localization, association, and classification, allowing comprehensive benchmarking of tracking performance even under inaccurate classification. TETA also deals with the challenging incomplete-annotation problem in large-scale tracking datasets. We further introduce a Track Every Thing tracker (TETer), which performs association using Class Exemplar Matching (CEM). Our experiments show that TETA evaluates trackers more comprehensively, and TETer achieves significant improvements on the challenging large-scale datasets BDD100K and TAO compared to the state-of-the-art.
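The three-way decomposition can be sketched as a simple combination of sub-factor scores. The equal-weight average below is an assumption made for illustration based on the abstract's description, not the paper's exact formula, and the sub-factor values are hypothetical:

```python
def teta_score(loc_a: float, assoc_a: float, cls_a: float) -> float:
    """Combine TETA's three sub-factors (localization, association,
    classification) into one score. The equal-weight average here is an
    illustrative assumption; see the paper for the exact definition of
    each sub-factor."""
    for s in (loc_a, assoc_a, cls_a):
        assert 0.0 <= s <= 1.0, "each sub-factor is assumed to be a score in [0, 1]"
    return (loc_a + assoc_a + cls_a) / 3.0

# A tracker that localizes and associates well still gets credit for those
# sub-factors even when classification is poor:
good_loc_bad_cls = teta_score(0.90, 0.85, 0.20)
```

The point of the decomposition is visible here: under a classification-grouped metric, the poor classification score would mask the strong localization and association performance.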
Abstract:
Manifold learning (ML) aims to seek low-dimensional embeddings of high-dimensional data. The problem is challenging on real-world datasets, especially with under-sampled data, and we find that previous methods perform poorly in this case. Generally, ML methods first transform input data into a low-dimensional embedding space to maintain the data's geometric structure and subsequently perform downstream tasks therein. The poor local connectivity of under-sampled data in the former step and inappropriate optimization objectives in the latter step lead to two problems: structural distortion and underconstrained embedding. This paper proposes a novel ML framework named Deep Local-flatness Manifold Embedding (DLME) to solve these problems. The proposed DLME constructs semantic manifolds by data augmentation and overcomes the structural distortion problem using a smoothness constraint based on a local flatness assumption about the manifold. To overcome the underconstrained embedding problem, we design a loss and theoretically demonstrate that it leads to a more suitable embedding based on local flatness. Experiments on three types of datasets (toy, biological, and image) for various downstream tasks (classification, clustering, and visualization) show that our proposed DLME outperforms state-of-the-art ML and contrastive learning methods.
Abstract:
Dimension reduction (DR) aims to learn low-dimensional representations of high-dimensional data while preserving essential information. In the context of manifold learning, we formally define information-lossless DR as preserving the topological and geometric properties of data manifolds, and propose a novel two-stage DR method, called invertible manifold learning (inv-ML), to bridge the gap between theoretically information-lossless and practical DR. The first stage includes a homeomorphic sparse coordinate transformation to learn low-dimensional representations without destroying topology and a local isometry constraint to preserve local geometry. In the second stage, a linear compression is implemented to trade off the target dimension against the incurred information loss in excessive DR scenarios. Experiments are conducted on seven datasets with a neural network implementation of inv-ML, called i-ML-Enc. Empirically, i-ML-Enc achieves invertible DR in comparison with typical existing methods and reveals the characteristics of the learned manifolds. Through latent space interpolation on real-world datasets, we find that the reliability of the tangent space approximated by the local neighborhood is key to the success of manifold-based DR algorithms.
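The second-stage linear compression can be illustrated with a PCA-style truncated projection. This sketch is an assumption for illustration (inv-ML learns its linear layer jointly with the network rather than via SVD), with the discarded variance standing in for the incurred information loss:

```python
import numpy as np

def linear_compress(z, d_target):
    """Illustrative stage-two linear compression: project centered data onto
    its top d_target principal directions and report the discarded variance
    as a proxy for the incurred information loss."""
    zc = z - z.mean(axis=0)
    _, s, vt = np.linalg.svd(zc, full_matrices=False)
    z_low = zc @ vt[:d_target].T                   # compressed representation
    info_loss = float(np.sum(s[d_target:] ** 2))   # variance thrown away
    return z_low, info_loss

# Data that actually lies in a 2-D subspace of R^5 compresses losslessly:
rng = np.random.default_rng(0)
z = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 5))
z_low, info_loss = linear_compress(z, 2)
```

When the target dimension drops below the data's intrinsic dimension ("excessive DR"), `info_loss` becomes strictly positive, which is exactly the trade-off the second stage manages.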
Abstract:
Image inpainting refers to the process of reconstructing damaged areas of an image. Many existing methods can generate reasonably good inpainting results, but they either make the results look unrealistic or rely on complex structures with large numbers of parameters. To solve these problems, this paper designs a simple encoder-decoder network and introduces the region normalization technique. At the same time, a new separable gate convolution is proposed. The simple network architecture and separable gate convolution significantly reduce the number of network parameters. Moreover, the separable gate convolution can learn the mask (which represents the missing area) from the feature map and update it automatically. After the mask update, weights are applied to each pixel of the feature map to alleviate the impact of invalid mask information on the completed result and improve the inpainting quality. Our method reduces the parameter count by 0.58M. Moreover, it improves PSNR on CelebA and Paris Street View by 0.7-1.4 dB and 0.7-1.0 dB, respectively, for 10% to 60% damage cases. The corresponding SSIM increases by 1.6% to 2.7% and 0.9% to 2.3%.
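A gate convolution's per-pixel soft gating, and the parameter saving from making a convolution separable, can be sketched as follows. The 1x1 toy convolution and the specific channel counts are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_conv_1x1(feat, w_feature, w_gate):
    """Toy 1x1 gated convolution on an (H, W, C_in) feature map: the gate
    branch produces a per-pixel soft validity value in (0, 1), acting like
    an automatically updated soft mask that suppresses invalid regions."""
    features = feat @ w_feature    # candidate features, (H, W, C_out)
    gate = sigmoid(feat @ w_gate)  # soft mask, same shape, values in (0, 1)
    return features * gate

def params_standard(k, c_in, c_out):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out

def params_separable(k, c_in, c_out):
    """Weights in a depthwise (k x k per channel) + pointwise (1x1) pair."""
    return k * k * c_in + c_in * c_out

# e.g. a 3x3 layer with 64 -> 64 channels:
standard = params_standard(3, 64, 64)      # 36864
separable = params_separable(3, 64, 64)    # 4672
```

The separable factorization is where most of the parameter reduction comes from: the depthwise and pointwise pair replaces the full k*k*C_in*C_out weight tensor.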
Abstract:
The human visual system relies on both binocular stereo cues and monocular focusness cues to gain effective 3D perception. In computer vision, the two problems are traditionally solved in separate tracks. In this paper, we present a unified learning-based technique that simultaneously uses both types of cues for depth inference. Specifically, we use a pair of focal stacks as input to emulate human perception. We first construct a comprehensive focal stack training dataset synthesized by depth-guided light field rendering. We then construct three individual networks: a Focus-Net to extract depth from a single focal stack, an EDoF-Net to obtain the extended depth of field (EDoF) image from the focal stack, and a Stereo-Net to conduct stereo matching. We show how to integrate them into a unified BDfF-Net to obtain high-quality depth maps. Comprehensive experiments show that our approach outperforms the state-of-the-art in both accuracy and speed and effectively emulates the human visual system.
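The monocular focus cue that Focus-Net learns to exploit can be illustrated with a classical hand-crafted baseline: depth-from-focus via a Laplacian sharpness measure. This is a simplified stand-in for the learned network, and the toy stack below is an assumption for illustration:

```python
import numpy as np

def laplacian_focus_measure(img):
    """Classical sharpness cue: magnitude of a discrete Laplacian response
    (in-focus regions carry strong high-frequency content)."""
    lap = (-4.0 * img
           + np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
           + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1))
    return np.abs(lap)

def depth_from_focus(stack):
    """stack: (num_slices, H, W) focal stack; returns the per-pixel index of
    the sharpest slice as a crude depth proxy."""
    measures = np.stack([laplacian_focus_measure(s) for s in stack])
    return np.argmax(measures, axis=0)

# Toy stack: only the middle slice contains sharp detail, so every pixel's
# best-focus index should point at slice 1.
blurred = np.zeros((8, 8))
sharp = np.zeros((8, 8))
sharp[::2] = 1.0   # alternating rows -> high-frequency pattern
depth_index = depth_from_focus(np.stack([blurred, sharp, blurred]))
```

A learned Focus-Net replaces this fixed sharpness measure with features trained end-to-end, which is what allows it to handle textureless regions where the Laplacian cue fails.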
Abstract:
Formal specification can be an error-prone process for complex systems, and how to efficiently write correct specifications remains a challenge for practitioners in industry. This paper presents a software tool to support the scenario-based formal specification approach developed in the SOFL formal engineering method. Using the tool, the current version of the formal specification under construction can be automatically checked to ensure internal consistency, and some further contents of the specification can be automatically predicted to help the user complete the specification. To improve the readability of the formal specification, the tool can also automatically translate the textual format of the specification into a comprehensible tabular format. All three functions help prevent errors during the construction of the specification. We discuss each function by first presenting its principle and then illustrating it with examples. We present a case study showing how the tool supports the scenario-based specification approach. Finally, we conclude the paper and suggest topics for future research.