Abstract:
We aim to determine the effect of diversity on combiner performance, and to identify which diversity measure is most closely related to it. Three diversity measures are used to calculate the diversity of the combined classifiers. Three combiner types are used: Bagging; a conventional three-classifier system built from backpropagation neural network, Bayesian, and k-nearest neighbor classifiers; and a feature-based combiner system proposed by Alkoot [13,14]. Results obtained on real data indicate that the diversity measure is higher for the system with the higher classification rate when that system outperforms the others by a large margin. Otherwise, when the performances of the compared systems are close, the diversity measure may not be higher for the best system. On many occasions the diversity measures were not good indicators of system performance, and in some instances the more diverse system did not yield better performance.
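The abstract does not name the three diversity measures used; as a hedged illustration of how such a measure is computed, the sketch below implements the pairwise disagreement measure, one common choice. The predictions and classifier labels are hypothetical, not the paper's data.

```python
import numpy as np

def disagreement(pred_a, pred_b, y_true):
    """Pairwise disagreement: the fraction of samples on which exactly
    one of the two classifiers is correct."""
    correct_a = pred_a == y_true
    correct_b = pred_b == y_true
    return np.mean(correct_a != correct_b)

def ensemble_diversity(predictions, y_true):
    """Average pairwise disagreement over all classifier pairs."""
    k = len(predictions)
    scores = [disagreement(predictions[i], predictions[j], y_true)
              for i in range(k) for j in range(i + 1, k)]
    return float(np.mean(scores))

# Hypothetical predictions from three classifiers on five test samples.
y_true = np.array([0, 1, 1, 0, 1])
preds = [np.array([0, 1, 0, 0, 1]),   # e.g. backpropagation network
         np.array([0, 1, 1, 1, 1]),   # e.g. Bayesian classifier
         np.array([1, 1, 1, 0, 0])]   # e.g. k-nearest neighbor
print(ensemble_diversity(preds, y_true))  # higher => more diverse
```

The same skeleton accepts any pairwise statistic (e.g. the Q-statistic or double-fault measure) in place of `disagreement`.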
Abstract:
The data warehousing approach aims to exploit very large volumes of data to support relevant decision making. In this paper, we deal with object-oriented data warehouse design. More precisely, we present an object-oriented data warehouse model that integrates temporal and archive data, and we provide functions allowing the administrator to specify a data warehouse from a global source schema.
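The abstract gives no concrete schema; purely as a hypothetical sketch of a warehouse object that keeps a current state alongside detailed past states (temporal data) and older, summarized states (archive data), one might write:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class State:
    """One snapshot of a warehouse object's attribute values."""
    values: dict                      # attribute name -> value
    valid_from: date
    valid_to: Optional[date] = None   # None means still current

@dataclass
class WarehouseObject:
    """Hypothetical warehouse object with temporal and archive states."""
    oid: str
    current: State
    temporal: list = field(default_factory=list)   # detailed history
    archive: list = field(default_factory=list)    # summarized old states

    def update(self, new_values: dict, when: date) -> None:
        """Close the current state, push it into the temporal history,
        and open a new current state."""
        self.current.valid_to = when
        self.temporal.append(self.current)
        self.current = State(new_values, valid_from=when)
```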
Abstract:
Measures of the quantity of information have been studied extensively for more than fifty years. The seminal work on information theory is by Shannon. This work, based on probability theory, can be used in a logical setting when the worlds are the possible events. It is also the basis of Lozinskii's work for defining the quantity of information of a formula (or knowledgebase) in propositional logic. But this definition is not suitable when the knowledgebase is inconsistent: in that case it has no classical model, so there is no "event" to count. This is a shortcoming, since in practical applications (e.g. databases) it often happens that the knowledgebase is not consistent, and it is certainly not true that all inconsistent knowledgebases contain the same (null) amount of information, as classical information theory would suggest. As explored for several years in the paraconsistent logic community, two inconsistent knowledgebases can lead to very different conclusions, showing that they do not convey the same information. There has been some recent interest in this issue, with some interesting proposals, but a general approach to information theory for (possibly inconsistent) logical knowledgebases is still missing. Another related measure is the measure of contradiction. It is usual in classical logic to use a binary measure of contradiction: a knowledgebase is either consistent or inconsistent. This dichotomy is obvious when the only deductive tool is classical inference, since inconsistent knowledgebases are of no use. But a number of logics have now been developed to draw non-trivial conclusions from an inconsistent knowledgebase, so the dichotomy is no longer sufficient: one needs more fine-grained measures of the amount of contradiction in a knowledgebase. Some interesting proposals have been made for this. The main aim of this paper is to review measures of information and contradiction and to study some potential practical applications. These measures have significant potential for developing intelligent systems that can tolerate inconsistency when reasoning with real-world knowledge.
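For concreteness, here is a minimal Python sketch of a Lozinskii-style model-counting measure on a toy propositional language (the encoding of formulas as Python predicates is our own illustration): for a language with n atoms, a consistent knowledgebase with m models carries n - log2(m) bits, and the measure breaks down exactly as described above when there are no models.

```python
import math
from itertools import product

def models(kb, atoms):
    """Enumerate the truth assignments (worlds) satisfying every formula
    in kb; formulas are encoded as predicates over an assignment dict."""
    worlds = [dict(zip(atoms, vals))
              for vals in product([False, True], repeat=len(atoms))]
    return [w for w in worlds if all(f(w) for f in kb)]

def information(kb, atoms):
    """Lozinskii-style measure: n - log2(#models). A tautology (all 2^n
    worlds are models) carries 0 bits; a complete description (exactly
    one model) carries n bits."""
    m = len(models(kb, atoms))
    if m == 0:
        # The shortcoming discussed above: no classical model, no "event".
        raise ValueError("inconsistent knowledgebase: no classical models")
    return len(atoms) - math.log2(m)

atoms = ["p", "q"]
print(information([lambda w: w["p"] or w["q"]], atoms))          # p v q: ~0.415 bits
print(information([lambda w: w["p"], lambda w: w["q"]], atoms))  # p & q: 2.0 bits
# information([lambda w: w["p"], lambda w: not w["p"]], atoms) would raise.
```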
Abstract:
Data analysts often need to transform an existing dataset, such as with filtering, into a new dataset for downstream analysis. Even the most trivial of mistakes in this phase can introduce bias and lead to invalid conclusions. For example, consider a researcher identifying subjects for trials of a new statin drug. She might identify patients with a high dietary cholesterol intake as a population likely to benefit from the drug; however, selecting these individuals could bias the test population toward those with a generally unhealthy lifestyle, thereby compromising the analysis. Reducing the potential for bias during dataset transformation can minimize the need to later engage in the tedious, time-consuming work of eliminating bias while preserving the target dataset. We propose a novel interaction model for explain-and-repair data transformation systems, in which users interactively define constraints on the transformation code and the resultant data. The system satisfies these constraints as far as possible and provides an explanation for any problems encountered. We present an algorithm that yields filter-based transformation code satisfying user constraints. We implemented and evaluated a prototype of this architecture, EMERIL, using both synthetic and real-world datasets. Our approach finds solutions 34% more often and 77% more quickly than the previous state-of-the-art solution.
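The abstract does not spell out the algorithm; the sketch below is only a hypothetical illustration of the constrain-then-explain idea (the data, predicates, and labels are ours, not EMERIL's): candidate filter predicates are tried, those whose output violates a user constraint are rejected with an explanation, and the rest are returned as solutions.

```python
import pandas as pd

def find_filters(df, candidates, constraints):
    """Return candidate filters whose output satisfies every user
    constraint, plus explanations for the rejected ones.
    candidates: list of (label, row-predicate) pairs.
    constraints: list of (label, dataframe-predicate) pairs."""
    solutions, explanations = [], []
    for label, pred in candidates:
        out = df[df.apply(pred, axis=1)]
        failed = [c_label for c_label, check in constraints if not check(out)]
        if failed:
            explanations.append((label, failed))  # why it was rejected
        else:
            solutions.append(label)
    return solutions, explanations

patients = pd.DataFrame({
    "cholesterol": [220, 310, 180, 290, 260],
    "smoker":      [False, True, False, False, True],
})
candidates = [
    ("chol>250", lambda r: r["cholesterol"] > 250),
    ("chol>200", lambda r: r["cholesterol"] > 200),
]
constraints = [
    ("at least 3 subjects", lambda d: len(d) >= 3),
    ("<=50% smokers",       lambda d: d["smoker"].mean() <= 0.5),  # lifestyle-bias guard
]
print(find_filters(patients, candidates, constraints))
```

Here the stricter filter is rejected with the explanation that it over-selects smokers, mirroring the statin example above.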
Abstract:
In this paper, we address the ambitious task of formulating a general framework for data mining. We discuss the requirements that such a framework should fulfill: It should elegantly handle different types of data, different data mining tasks, and different types of patterns/models. We also discuss data mining languages and what they should support: this includes the design and implementation of data mining algorithms, as well as their composition into nontrivial multistep knowledge discovery scenarios relevant for practical application. We proceed by laying out some basic concepts, starting with (structured) data and generalizations (e.g., patterns and models) and continuing with data mining tasks and basic components of data mining algorithms (i.e., refinement operators, distances, features and kernels). We next discuss how to use these concepts to formulate constraint-based data mining tasks and design generic data mining algorithms. We finally discuss how these components would fit in the overall framework and in particular into a language for data mining and knowledge discovery.
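To make the named components concrete, here is a hedged Python sketch (our own illustration, not the paper's formalism) of how a refinement operator and an anti-monotonic constraint compose into a generic levelwise mining algorithm, instantiated for frequent itemsets:

```python
from typing import Callable, Iterable, TypeVar

P = TypeVar("P")  # a pattern in some pattern language

def generic_mine(start: P,
                 refine: Callable[[P], Iterable[P]],
                 satisfies: Callable[[P], bool]) -> list:
    """Generic levelwise search: a refinement operator generates more
    specific patterns; an anti-monotonic constraint prunes the search
    (once a pattern fails, all its refinements fail too)."""
    frontier, results = [start], []
    while frontier:
        pattern = frontier.pop()
        if satisfies(pattern):
            results.append(pattern)
            frontier.extend(refine(pattern))
    return results

# Toy instantiation: patterns are itemsets, refinement adds one item,
# the constraint is a minimum-frequency threshold over transactions.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
items = ["a", "b", "c"]

def refine(itemset: frozenset) -> list:
    last = max((items.index(i) for i in itemset), default=-1)
    return [itemset | {i} for i in items[last + 1:]]

def frequent(itemset: frozenset, minsup: int = 2) -> bool:
    return sum(itemset <= t for t in transactions) >= minsup

print(generic_mine(frozenset(), refine, frequent))
```

Swapping in a different pattern language only requires a new `refine`/`satisfies` pair, which is the kind of genericity the framework argues for.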
Abstract:
When data sources are virtually integrated, there is no common, centralized mechanism for maintaining global consistency. In consequence, it is likely that inconsistencies with respect to certain global integrity constraints (ICs) will occur. In this chapter we consider the problem of defining and computing those answers that are consistent with respect to the global ICs when global queries are posed to virtual data integration systems whose sources are specified following the local-as-view approach. The solution is based on a specification, as logic programs under the stable model semantics, of the minimal legal instances of the integration system. Apart from being useful for computing consistent answers, the specification can be used to compute the certain answers to monotone queries and the minimal answers to non-monotone queries.
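As a toy, hypothetical illustration of the semantics (not the logic-program specification itself): a certain answer to a monotone query is one returned by every minimal legal instance, and the simplified consistent-answer check below additionally discards instances violating the ICs (a proper repair-based treatment would repair such instances rather than discard them).

```python
def certain_answers(minimal_instances, query):
    """Certain answers to a monotone query: tuples returned in every
    minimal legal instance of the integration system."""
    return set.intersection(*(set(query(inst)) for inst in minimal_instances))

def consistent_answers(minimal_instances, query, ics):
    """Simplified consistent answers: restrict to IC-satisfying instances.
    (A repair-based treatment would repair violating instances instead.)"""
    legal = [i for i in minimal_instances if all(ic(i) for ic in ics)]
    return certain_answers(legal, query)

# Toy global relation R(name, dept); each set below is one hypothetical
# minimal legal instance induced by the (open) local-as-view sources.
inst1 = {("ana", "cs"), ("bob", "math")}
inst2 = {("ana", "cs"), ("bob", "math"), ("bob", "cs")}

key_ic = lambda inst: len({n for (n, _) in inst}) == len(inst)  # name is a key
in_cs  = lambda inst: [n for (n, d) in inst if d == "cs"]       # monotone query

print(certain_answers([inst1, inst2], in_cs))               # {'ana'}
print(consistent_answers([inst1, inst2], in_cs, [key_ic]))  # {'ana'}
```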