Abstract:
This paper proposes a new measure for ensemble pruning via directed hill climbing, dubbed Uncertainty Weighted Accuracy (UWA), which takes into account the uncertainty of the decision of the current ensemble. Empirical results on 30 data sets show that using the proposed measure to prune a heterogeneous ensemble leads to significantly better accuracy results compared to state-of-the-art measures and other baseline methods, while keeping only a small fraction of the original models. Besides the evaluation measure, the paper also studies two other parameters of directed hill climbing ensemble pruning methods, the search direction and the evaluation dataset, with interesting conclusions on appropriate values.
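The abstract does not specify the UWA formula, so the sketch below illustrates only the directed hill-climbing pruning loop it builds on, with plain validation accuracy standing in for the evaluation measure; all function names are illustrative.

```python
# Minimal sketch of directed hill-climbing ensemble pruning via greedy
# forward selection. Plain validation accuracy stands in for the paper's
# UWA measure, whose definition is not given in the abstract.
import numpy as np

def majority_vote(preds):
    """preds: (n_models, n_samples) integer class labels -> fused labels."""
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)

def prune_forward(all_preds, y_val, max_size=3):
    """Greedily add the model whose inclusion most improves accuracy."""
    selected, remaining, best_acc = [], list(range(len(all_preds))), -1.0
    while remaining and len(selected) < max_size:
        acc, m = max(
            (np.mean(majority_vote(np.asarray([all_preds[i]
                                               for i in selected + [j]])) == y_val), j)
            for j in remaining
        )
        if acc <= best_acc:   # hill climbing halts at a local optimum
            break
        best_acc, selected = acc, selected + [m]
        remaining.remove(m)
    return selected, best_acc
```

Swapping the accuracy line for an uncertainty-weighted score would recover the paper's setup; the search direction (forward vs. backward) is one of the parameters the paper studies.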
Abstract:
Classifier diversity and fusion architecture are two critical characteristics stressed in homogeneous and heterogeneous ensemble learning methods and they are equally important for building a successful multi-classifier system. In this study, we introduced a two-level framework, namely hierarchical fusion of homogeneous and heterogeneous multi-classifiers (HF2HM), to integrate the diversified classification models produced by feeding heterogeneous classifiers with homogeneous random-projected training datasets. The proposed hierarchical fusion scheme was comprehensively validated using fifteen public UCI datasets and three clinical datasets. The experimental results demonstrated the superiority of the proposed HF2HM framework over the base classifiers and the state-of-the-art benchmark ensemble methods, verifying it as a potential tool to assist in medical decision making in practical clinical settings. (C) 2020 Elsevier B.V. All rights reserved.
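A hedged sketch of the data-diversification step HF2HM is described as using: each homogeneous training view comes from a Gaussian random projection of the original features. The function name and the variance-preserving scaling are illustrative assumptions, not taken from the paper.

```python
# Sketch of generating diverse "homogeneous random-projected" training
# views, as in the first level of HF2HM. Name and scaling are illustrative.
import numpy as np

def random_projection(X, k, seed=0):
    """Project (n_samples, d) data onto k Gaussian random directions."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)  # variance-preserving scale
    return X @ R

# Distinct seeds give distinct views; each view could feed a different
# heterogeneous base classifier before the hierarchical fusion stage.
views = [random_projection(np.eye(8), k=4, seed=s) for s in range(3)]
```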
Abstract:
Climate model ensembles are used to estimate uncertainty in future projections, typically by interpreting the ensemble distribution for a particular variable probabilistically. There are, however, different ways to produce climate model ensembles that yield different results, and therefore different probabilities for a future change in a variable. Perhaps equally importantly, there are different approaches to interpreting the ensemble distribution that lead to different conclusions. Here we use a reduced-resolution climate system model to compare three common ways to generate ensembles: initial conditions perturbation, physical parameter perturbation, and structural changes. Despite these three approaches conceptually representing very different categories of uncertainty within a modelling system, when comparing simulations to observations of surface air temperature they can be very difficult to separate. Using the twentieth century CMIP5 ensemble for comparison, we show that initial conditions ensembles, in theory representing internal variability, significantly underestimate observed variance. Structural ensembles, perhaps less surprisingly, exhibit over-dispersion in simulated variance. We argue that future climate model ensembles may need to include parameter or structural perturbation members in addition to perturbed initial conditions members to ensure that they sample uncertainty due to internal variability more completely. We note that where ensembles are over- or under-dispersive, such as for the CMIP5 ensemble, estimates of uncertainty need to be treated with care.
Abstract:
For the unitary ensembles of N × N Hermitian matrices associated with a weight function w there is a kernel, expressible in terms of the polynomials orthogonal with respect to the weight function, which plays an important role. For the orthogonal and symplectic ensembles of Hermitian matrices there are 2 × 2 matrix kernels, usually constructed using skew-orthogonal polynomials, which play an analogous role. These matrix kernels are determined by their upper left-hand entries. We derive formulas expressing these entries in terms of the scalar kernel for the corresponding unitary ensembles. We also show that whenever w'/w is a rational function the entries are equal to the scalar kernel plus some extra terms whose number equals the order of w'/w. General formulas are obtained for these extra terms. We do not use skew-orthogonal polynomials in the derivations.
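For context, the scalar kernel referred to here has the standard form (a textbook definition, not taken from this paper): with $p_0, p_1, \dots$ the polynomials orthonormal with respect to the weight $w$,

$$K_N(x, y) = \sqrt{w(x)\,w(y)}\;\sum_{k=0}^{N-1} p_k(x)\, p_k(y),$$

which by the Christoffel–Darboux formula can be rewritten in terms of $p_{N-1}$ and $p_N$ alone. The 2 × 2 matrix kernels of the orthogonal and symplectic ensembles are then expressed through this $K_N$.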
Abstract:
This paper investigates the problem of integrating multiple structures, extracted from different sets of data points, into a single unified structure. We first propose a new generalized concept called the structure ensemble for the fusion of multiple structures. Unlike traditional cluster ensemble approaches, whose main objective is to align individual labels obtained from different clustering solutions, the structure ensemble approach focuses on how to unify the structures obtained from different data sources. Based on this framework, a new structure ensemble approach called the probabilistic bagging based structure ensemble approach (BSEA) is designed, which integrates the bagging technique, the force based self-organizing map (FBSOM) and the normalized cut algorithm into the proposed framework. BSEA views the structures obtained from the different datasets generated by bagging as nodes in a graph, and adopts graph theory to find the most representative structure. In addition, the force based self-organizing map (FBSOM), a generalized form of SOM, is proposed to serve as the basic clustering algorithm in the structure ensemble framework. Finally, a new external index called the correlation index (CI), which considers the correlation of both the similarity and dissimilarity between the predicted solution and the true solution, is proposed to evaluate the performance of BSEA. The experiments show that (i) BSEA outperforms most state-of-the-art clustering approaches, and (ii) BSEA performs well on datasets from the UCI repository and on real cancer gene expression profiles.
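The bagging step that BSEA builds on can be sketched as follows; FBSOM and the graph-based fusion are not reproduced, and the helper name is illustrative. Each bootstrap replicate plays the role of one "data source" from which a structure would be extracted.

```python
# Generate the bootstrap replicates that serve as BSEA's multiple data
# sources. Each replicate samples n points with replacement from X.
import numpy as np

def bootstrap_replicates(X, n_replicates, seed=0):
    """Return n_replicates datasets resampled with replacement from X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    return [X[rng.integers(0, n, size=n)] for _ in range(n_replicates)]
```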
Abstract:
Despite significant successes in knowledge discovery, traditional machine learning methods may fail to achieve satisfactory performance when dealing with complex data, such as imbalanced, high-dimensional, or noisy data. The reason is that it is difficult for these methods to capture multiple characteristics and the underlying structure of the data. In this context, how to construct an efficient knowledge discovery and mining model has become an important topic in the data mining field. Ensemble learning, one such research hotspot, aims to integrate data fusion, data modeling, and data mining into a unified framework. Specifically, ensemble learning first extracts a set of features with a variety of transformations. Based on these learned features, multiple learning algorithms are used to produce weak predictive results. Finally, ensemble learning fuses the informative knowledge from these results, via adaptive voting schemes, to achieve knowledge discovery and better predictive performance. In this paper, we review the research progress of the mainstream approaches to ensemble learning and classify them by their characteristics. In addition, we present challenges and possible research directions for each mainstream approach, and we also introduce the combination of ensemble learning with other machine learning hotspots such as deep learning and reinforcement learning.
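The fusion stage described above, reduced to its simplest form: several weak learners vote and the majority label wins. The threshold "stumps" are toy stand-ins for arbitrary base learners, not from the survey.

```python
# Majority-vote fusion of heterogeneous base learners for one input.
from collections import Counter

def ensemble_predict(models, x):
    """Fuse the models' outputs on x by majority vote."""
    votes = [m(x) for m in models]
    return Counter(votes).most_common(1)[0][0]

# Toy base learners: decision stumps on a scalar feature.
stump_a = lambda x: int(x > 0.3)
stump_b = lambda x: int(x > 0.5)
stump_c = lambda x: int(x > 0.7)
stumps = [stump_a, stump_b, stump_c]
```

Weighted or adaptive voting, as the survey discusses, replaces the flat count with per-model weights learned from validation performance.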
Abstract:
Broad classes of statistical classification algorithms have been developed and applied successfully to a wide range of real-world domains. In general, ensuring that the particular classification algorithm matches the properties of the data is crucial in providing results that meet the needs of the particular application domain. One way in which the impact of this algorithm/application match can be alleviated is by using ensembles of classifiers, where a variety of classifiers (either different types of classifiers or different instantiations of the same classifier) are pooled before a final classification decision is made. Intuitively, classifier ensembles allow the different needs of a difficult problem to be handled by classifiers suited to those particular needs. Mathematically, classifier ensembles provide an extra degree of freedom in the classical bias/variance tradeoff, allowing solutions that would be difficult (if not impossible) to reach with only a single classifier. Because of these advantages, classifier ensembles have been applied to many difficult real-world problems. In this paper, we survey select applications of ensemble methods to problems that have historically been most representative of the difficulties in classification. In particular, we survey applications of ensemble methods to remote sensing, person recognition, one vs. all recognition, and medicine.
Abstract:
The iterative ensemble Kalman filter (IEnKF) was recently proposed in order to improve the performance of ensemble Kalman filtering with strongly nonlinear geophysical models. The IEnKF can be used as a lag-one smoother and extended to a fixed-lag smoother: the iterative ensemble Kalman smoother (IEnKS). The IEnKS is an ensemble variational method. It does not require the use of the tangent linear of the evolution and observation models, nor the adjoint of these models: the required sensitivities (gradient and Hessian) are obtained from the ensemble. Looking for optimal performance, out of the many possible extensions we consider a quasi-static algorithm. The IEnKS is explored for the Lorenz '95 model and for a two-dimensional turbulence model. As the logical extension of the IEnKF, the IEnKS significantly outperforms standard Kalman filters and smoothers in strongly nonlinear regimes. In mildly nonlinear regimes (typically synoptic-scale meteorology), its filtering performance is marginally but clearly better than the standard ensemble Kalman filter and it keeps improving as the length of the temporal data assimilation window is increased. For long windows, its smoothing performance outranks the standard smoothers very significantly, a result that is believed to stem from the variational but flow-dependent nature of the algorithm. For very long windows, the use of a multiple data assimilation variant of the scheme, where observations are assimilated several times, is advocated. This paves the way for finer reanalysis, freed from the static prior assumption of 4D-Var but also partially freed from the Gaussian assumptions that usually impede standard ensemble Kalman filtering and smoothing.
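The iterative scheme itself is beyond a short sketch, but its non-iterative building block, a single stochastic EnKF analysis step, can be shown. As the abstract notes, no tangent-linear or adjoint model is needed: the gain is estimated from ensemble anomalies. The linear observation operator H and scalar observation-error standard deviation r are simplifying assumptions made here.

```python
# One stochastic EnKF analysis step with a perturbed-observation update.
import numpy as np

def enkf_analysis(X, y, H, r, rng):
    """X: (n_state, N) ensemble; y: (n_obs,) observation; H: (n_obs, n_state);
    r: obs-error std. Returns the analysis ensemble."""
    N = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)        # state anomalies
    HX = H @ X
    HA = HX - HX.mean(axis=1, keepdims=True)     # observation-space anomalies
    # Kalman gain from sample covariances; the 1/(N-1) factors cancel
    # against the (N-1) multiplying the observation-error covariance.
    K = (A @ HA.T) @ np.linalg.inv(HA @ HA.T + (N - 1) * r**2 * np.eye(len(y)))
    Y = y[:, None] + r * rng.standard_normal((len(y), N))  # perturbed observations
    return X + K @ (Y - HX)
```

The IEnKF repeats a step of this kind iteratively within the assimilation window, re-estimating sensitivities from the ensemble, and the IEnKS extends it over a fixed lag of observations.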
Abstract:
Characterizing surface deformation throughout a full earthquake cycle is a challenge due to the lack of high-resolution geodetic observations of duration comparable to that of characteristic earthquake recurrence intervals (250-10,000 years). Here we approach this problem by comparing long-term geologic slip rates with geodetically derived fault slip rates, sampling only a short fraction (0.001%-0.1%) of a complete earthquake cycle along 15 continental strike-slip faults. Geodetic observations provide snapshots of surface deformation from different times through the earthquake cycle. The timing of the last earthquake on many of these faults is poorly known, and may vary greatly from fault to fault. Assuming that the underlying mechanics of the seismic cycle are similar for all faults, geodetic observations from different faults may be interpreted as samples over a significantly larger fraction of the earthquake cycle than could be obtained from the geodetic record along any one fault alone. As an ensemble, we find that geologically and geodetically inferred slip rates agree well, with a linear relation of 0.94 ± 0.09. To simultaneously explain both the ensemble agreement between geologic and geodetic slip-rate estimates and observations of rapid postseismic deformation, we consider the predictions of simple two-layer earthquake-cycle models with both Maxwell and Burgers viscoelastic rheologies. We find that a two-layer Burgers model, with two relaxation timescales, is consistent with observations of deformation throughout the earthquake cycle, whereas the widely used two-layer Maxwell model, with a single relaxation timescale, is not. This suggests that the earthquake cycle is effectively characterized by a largely stress-recoverable rapid postseismic stage and a much more slowly varying interseismic stage.
Abstract:
In this paper, we propose a meta-evolutionary approach to improve on the performance of individual classifiers. In the proposed system, individual classifiers evolve, competing to correctly classify test points, and are given extra rewards for getting difficult points right. Ensembles consisting of multiple classifiers also compete for member classifiers, and are rewarded based on their predictive performance. In this way we aim to build small-sized optimal ensembles rather than form large-sized ensembles of individually-optimized classifiers. Experimental results on 15 data sets suggest that our algorithms can generate ensembles that are more effective than single classifiers and traditional ensemble methods.