Abstract:
Consider the following social choice problem. A group of individuals seek to classify the elements of X as belonging in one of two sets. The individuals may disagree as to how the elements of X should be classified, and so an aggregation rule is applied to determine a compromise outcome. We require that the social classification should not be imposed, nor should it be manipulable. We prove that the only aggregation rules satisfying these properties are dictatorships.
Abstract:
Organic-mineral complexes can be isolated from bulk soil by physical disaggregation followed by density fractionation, allowing further examination of the patchy nature of aggregates distributed in soil. Phaeozem, a regional soil in China with high organic matter content, was selected for fractionation into particle-size aggregates: clay, silt, fine sand and coarse sand. The partitioning and characterization of adsorbed Cd in the bulk soil and the various aggregates of Phaeozem were investigated by Fourier-transform infrared (FTIR) spectrometry, X-ray diffraction (XRD) and sequential extraction. The results indicated that Cd was partitioned differently among the particle-size aggregates, and that FTIR and XRD characterization gave consistent results. Clay had the highest adsorption capacity, owing to its relatively high content of montmorillonite, kaolinite and chlorite, as well as its organic matter and cation exchange capacity. The fine fraction had the greatest potential availability and mobility of Cd because Cd was primarily adsorbed on the surface of kaolinite. The texture of Phaeozem likely contributed to the fate and behavior of the metal in the soil environment, as did other properties such as organic matter, cation exchange capacity and surface area.
Abstract:
Applications in which data are naturally generated in a distributed fashion are increasingly common, especially since the emergence of technologies like the Internet of Things (IoT). In sensor networks, collaborative health or genomic projects, and credit risk analysis, among other domains, distinct features are collected from multiple sources, including social media and mobile applications, and may not be shared among sites because of privacy concerns or communication costs. This scenario of vertical data partitioning poses challenges to traditional machine learning (ML) approaches, as classical algorithms are designed to learn from the complete set of features. A common strategy is to combine predictions from local models trained at each site into a global model, and several aggregation methods have been proposed for this purpose. In this work we tackle a gap in the related literature by performing a comparative evaluation of elementary and meta-learning-based aggregation methods to reveal their strengths and weaknesses on 46 datasets with varied characteristics. We show that no method outperforms its counterparts in all domains, emphasizing the need for experimental comparison to ensure a good choice in the domain of interest. Moreover, our experiments provide the first insights into the relations between datasets' properties and aggregators' performance. We show that for low class imbalance and a good instance-to-feature ratio, almost all aggregation methods tend to perform well. The silhouette coefficient (reflecting class separability) and the class imbalance coefficient are the properties with the most influence on aggregators' performance, so we recommend analyzing them in the first step of the methodological design. We found that arithmetic-based methods are not suitable for datasets with poor class separability and a large number of classes, whereas meta-learning approaches are less sensitive on datasets with a silhouette coefficient close to 0.
Our analyses were summarized as classification and regression trees, which have the potential to serve as practical tools for future research. Taken together, our findings give rise to interesting applications in the domain of intelligent systems, especially regarding their potential to reduce the burden of vast experimental comparisons when training ML models with feature-partitioned data. (C) 2020 Elsevier Ltd. All rights reserved.
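The strategy of combining local predictions can be illustrated with a minimal sketch. The abstract does not name the specific aggregators it evaluates, so the two shown here (majority voting over labels and averaging of class probabilities) are assumed examples of simple arithmetic-style aggregation, and the function names and toy inputs are hypothetical.

```python
from collections import Counter


def majority_vote(local_predictions):
    """Return the label predicted by the largest number of sites."""
    return Counter(local_predictions).most_common(1)[0][0]


def mean_probability(local_probabilities):
    """Average per-site class-probability vectors.

    Returns the index of the class with the highest mean probability,
    together with the averaged vector.
    """
    n_sites = len(local_probabilities)
    n_classes = len(local_probabilities[0])
    avg = [
        sum(p[c] for p in local_probabilities) / n_sites
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=avg.__getitem__)
    return best, avg


# Hypothetical outputs of three local models, each trained on a
# different subset of features (vertical partitioning).
site_labels = ["spam", "ham", "spam"]
site_probs = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]

vote = majority_vote(site_labels)
best_class, avg_probs = mean_probability(site_probs)
```

In this toy input the two aggregators look at different signals (hard labels versus probabilities), which is one reason a per-domain comparison of aggregators can matter.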
Abstract:
Similarity join on high-dimensional data is a primitive operation. It finds all data pairs in a given data set whose distance, under a specific distance measure, is no more than 𝜖. As the scale and dimensionality of the data set increase, the computation cost increases vastly. Hadoop and Spark have become popular platforms for big-data analysis. Because Spark has native advantages in iterative computation, we adopted it as our platform to perform similarity joins on high-dimensional data sets. To resolve problems in existing works such as data imbalance, data duplication, and redundant computation, we propose a new algorithm based on symbolic aggregation and vertical decomposition. We first conduct dimension reduction using the symbolic aggregation method. Then we apply a vertical partition operation to the processed data. The join operations are performed on each vertical partition in parallel, and the proposed new filters are used to prune false positives at an early stage. Finally, the partial results generated from each partition are aggregated and verified to obtain the final results. Our proposed algorithm can significantly improve the efficiency of similarity joins on high-dimensional data. To verify the efficiency and scalability of our methods, we implemented them using MapReduce and Spark. We compared our methods with existing works on public data sets, and the experimental results showed that the new methods were more efficient and scalable under different running environments.
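The partition, filter, aggregate and verify pipeline described above can be sketched on a single machine. This is only an illustration under stated assumptions: it uses Euclidean distance, skips the symbolic aggregation step, and replaces the paper's unspecified filters with a simple partial-distance test (a pair whose distance on any subset of dimensions exceeds 𝜖 cannot be within 𝜖 overall). All function names are hypothetical.

```python
from itertools import combinations
from math import dist


def vertical_partitions(n_dims, n_parts):
    """Split dimension indices 0..n_dims-1 into contiguous slices."""
    size = -(-n_dims // n_parts)  # ceiling division
    return [range(i, min(i + size, n_dims)) for i in range(0, n_dims, size)]


def partition_candidates(points, dims, eps):
    """Pairs whose distance restricted to `dims` is at most eps.

    Assumed filter: the partial Euclidean distance never exceeds the
    full distance, so pairs failing here are safely pruned.
    """
    return {
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if dist([points[i][d] for d in dims],
                [points[j][d] for d in dims]) <= eps
    }


def similarity_join(points, eps, n_parts=2):
    """Join per vertical partition, intersect, then verify in full."""
    parts = vertical_partitions(len(points[0]), n_parts)
    candidate_sets = [partition_candidates(points, dims, eps) for dims in parts]
    candidates = set.intersection(*candidate_sets)
    return sorted(
        (i, j) for i, j in candidates if dist(points[i], points[j]) <= eps
    )


points = [
    (0.0, 0.0, 0.0, 0.0),
    (0.5, 0.5, 0.5, 0.5),
    (3.0, 0.0, 0.0, 0.0),
    (0.0, 0.9, 0.9, 0.0),
]
result = similarity_join(points, eps=1.2)
```

The pair (0, 3) passes the filter on both partitions but fails the final check, which is why the verification step on the aggregated partial results is still needed.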
Abstract:
The T-spherical fuzzy set (T-SFS) has emerged as one of the effective tools for dealing with uncertainty in the decision-making process. Power aggregation operators, in turn, help normalize the impact of extreme values and capture the interconnectedness of the arguments. Meanwhile, one of the most prominent factors in multi-attribute decision-making (MADM) problems is the lack of awareness of bias. Neutral operations highlight the fair and unbiased character of decision makers. Thus, aiming at these advantages and at the heterogeneity of arguments, a hybrid form of operators, the weighted power partitioned neutral average operator and the weighted power partitioned neutral geometric operator, is developed in the T-SFS environment for the first time. Besides these, the power weighted neutral average, power ordered weighted neutral average, and power hybrid neutral average operators, and their dual forms, are also introduced. A new modified score function for T-SFS is formulated. Based on the developed operators and score function, an MADM algorithm is constructed and used to solve a hypothetical case study problem on hydrogen (H₂) refuelling station site selection. Finally, a comparative study of the developed operators with other operators is carried out to explore the applicability and superiority of the designed MADM algorithm.
Abstract:
Understanding the determinants of species' distributions is a fundamental aim in ecology and a prerequisite for conservation but is particularly challenging in the marine environment. Advances in bio-logging technology have resulted in a rapid increase in studies of seabird movement and distribution in recent years. Multi-colony studies examining the effects of intra- and inter-colony competition on distribution have found that several species exhibit inter-colony segregation of foraging areas, rather than overlapping distributions. These findings are timely given the increasing rate of human exploitation of marine resources and the need to make robust assessments of likely impacts of proposed marine developments on biodiversity. Here we review the occurrence of foraging area segregation reported by published tracking studies in relation to the density-dependent hinterland (DDH) model, which predicts that segregation occurs in response to inter-colony competition, itself a function of colony size, distance from the colony and prey distribution. We found that inter-colony foraging area segregation occurred in 79% of 39 studies. The frequency of occurrence was similar across the four seabird orders for which data were available, and included species with both smaller (10-100 km) and larger (100-1000 km) foraging ranges. Many predictions of the DDH model were confirmed, with examples of segregation in response to high levels of inter-colony competition related to colony size and proximity, and enclosed landform restricting the extent of available habitat. Moreover, as predicted by the DDH model, inter-colony overlap tended to occur where birds aggregated in highly productive areas, often remote from all colonies. The apparent prevalence of inter-colony foraging segregation has important implications for assessment of impacts of marine development on protected seabird colonies. 
If a development area is accessible from multiple colonies, it may impact those colonies much more asymmetrically than previously supposed. Current impact assessment approaches that do not consider spatial inter-colony segregation will therefore be subject to error. We recommend the collection of tracking data from multiple colonies and modelling of inter-colony interactions to predict colony-specific distributions.
Abstract:
Datacenter networking is currently dominated by two major trends. One aims toward lossless, flat layer-2 fabrics based on Converged Enhanced Ethernet or InfiniBand, with benefits in efficiency and performance. The other targets flexibility based on Software Defined Networking, which enables Overlay Virtual Networking. Although clearly complementary, these trends also exhibit some conflicts: in contrast to physical fabrics, which avoid packet drops by means of flow control, practically all current virtual networks are lossy. We quantify these losses for several common combinations of hypervisors and virtual switches, and show their detrimental effect on application performance. Moreover, we propose a zero-loss Overlay Virtual Network (zOVN) designed to reduce the query and flow completion time of latency-sensitive datacenter applications. We describe its architecture and detail the design of its key component, the zVALE lossless virtual switch. As proof of concept, we implemented a zOVN prototype and benchmarked it with Partition-Aggregate in two testbeds, achieving an up to 15-fold reduction of the mean completion time with three widespread TCP versions. For larger-scale validation and deeper introspection into zOVN, we developed an OMNeT++ model for accurate cross-layer simulations of a virtualized datacenter, which confirm the validity of our results.
Abstract:
This paper investigates lexical mass-to-count and count-to-mass operators in Slavic languages, primarily Russian and Ukrainian, by exploring the distribution and semantic contribution of the suffix -in/-yn. The focus is on two uses of the suffix: the singulative turns mass nouns like gorox 'pea' into count, denoting sets of natural units (e.g., gorosina 'a pea'), and the massifier applies to count nouns, such as kon' 'horse', and turns them into mass (e.g., konina 'horsemeat'). It is proposed that each use of -in/-yn contributes a partition operator which triggers a new division into units of the original material part. It is further argued that the singulative and the massifier should be unified, given their (i) phonological identity, (ii) shared grammatical properties, and (iii) common semantic core. Under the proposed analysis, there is a single suffix that functions as an underspecified lexical partition shifter.
Abstract:
Three models for aggregated stochastic processes based on an underlying continuous-time Markov repairable system are developed, in which a two-part partition of states is used. Several availability measures, such as interval availability, instantaneous availability and steady-state availability, are presented. Some of these availabilities are derived by using Laplace transforms, which are more compact and concise. Other reliability distributions for these three models are given as well.
Abstract:
Range-aggregate queries apply a certain aggregate function to all tuples within given query ranges. Existing approaches to range-aggregate queries are insufficient to quickly provide accurate results in big data environments. In this paper, we propose FastRAQ, a fast approach to range-aggregate queries in big data environments. FastRAQ first divides big data into independent partitions with a balanced partitioning algorithm, and then generates a local estimation sketch for each partition. When a range-aggregate query request arrives, FastRAQ obtains the result directly by summarizing local estimates from all partitions. FastRAQ has time complexity for data updates and time complexity for range-aggregate queries, where is the number of distinct tuples for all dimensions, is the partition number, and is the bucket number in the histogram. We implement the FastRAQ approach on the Linux platform, and evaluate its performance with about 10 billion data records. Experimental results demonstrate that FastRAQ provides range-aggregate query results within a time period two orders of magnitude lower than that of Hive, while the relative error is less than 3 percent within the given confidence interval.
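The partition, sketch and summarize flow can be illustrated with a small sketch. It is not FastRAQ's actual algorithm: the round-robin split, the equi-width count histogram used as the local estimation sketch, and the assumption that values are spread uniformly inside each bucket are all stand-ins chosen for illustration, and every name below is hypothetical.

```python
def build_sketch(values, lo, hi, n_buckets):
    """Equi-width histogram of counts over [lo, hi) for one partition."""
    width = (hi - lo) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        idx = min(int((v - lo) / width), n_buckets - 1)
        counts[idx] += 1
    return counts


def estimate_count(counts, lo, hi, q_lo, q_hi):
    """Estimate how many values fall in [q_lo, q_hi).

    Assumes values are spread uniformly inside each bucket, so a
    partially covered bucket contributes a proportional share.
    """
    width = (hi - lo) / len(counts)
    total = 0.0
    for i, c in enumerate(counts):
        b_lo = lo + i * width
        b_hi = b_lo + width
        overlap = max(0.0, min(b_hi, q_hi) - max(b_lo, q_lo))
        total += c * overlap / width
    return total


def range_count(sketches, lo, hi, q_lo, q_hi):
    """Answer a range-COUNT query by summing the local estimates."""
    return sum(estimate_count(s, lo, hi, q_lo, q_hi) for s in sketches)


data = list(range(100))
n_partitions = 4
# Round-robin split as a simple stand-in for a balanced partitioner.
partitions = [data[i::n_partitions] for i in range(n_partitions)]
sketches = [build_sketch(p, 0, 100, 10) for p in partitions]

aligned = range_count(sketches, 0, 100, 20, 50)
unaligned = range_count(sketches, 0, 100, 25, 50)
```

The query never touches the raw records, only the per-partition histograms, so its cost depends on the number of partitions and buckets rather than on the data volume.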