Abstract:
Trillions of bytes of data are generated every day in different forms, and data mining is the study of extracting useful information from that massive amount of data. Sequential pattern mining is a major branch of data mining that deals with mining frequent sequential patterns from sequence databases. Because items have different importance in real-life scenarios, they cannot be treated uniformly. With today's datasets, the use of weights in sequential pattern mining is much more feasible. In most cases, as in real-life datasets, pushing weights into the mining process gives a better understanding of the dataset, because it also measures the importance of an item inside a pattern rather than treating all items equally. Many techniques have been introduced to mine weighted sequential patterns, but these algorithms typically generate a massive number of candidate patterns and take a long time to execute. This work introduces a new pruning technique and a complete framework that takes much less time and generates a small number of candidate sequences without compromising completeness. Performance evaluation on real-life datasets shows that our proposed approach can mine weighted patterns substantially faster than existing approaches.
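To make the idea of pushing weights into sequential pattern mining concrete, the following is a minimal sketch, not the framework described in the abstract. It assumes each sequence is a plain list of single items, and it assumes one common definition of weighted support: the fraction of sequences containing the pattern multiplied by the average weight of the pattern's items. The function names and the toy data are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """Return True if the pattern's items appear in the sequence in order."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def weighted_support(pattern, database, weights):
    """Assumed definition: support times the average item weight of the pattern."""
    support = sum(is_subsequence(pattern, s) for s in database) / len(database)
    average_weight = sum(weights[item] for item in pattern) / len(pattern)
    return support * average_weight


# Toy sequence database and item weights (illustrative values only).
database = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
weights = {"a": 0.5, "b": 1.0, "c": 0.5}

print(weighted_support(["a", "c"], database, weights))
print(weighted_support(["a", "b"], database, weights))
```

Both patterns occur in half of the sequences, but the pattern containing the heavier item `b` scores higher, which is the effect the weights are meant to capture.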
Abstract:
Sequences are one of the most important types of data. Recently, the mining and analysis of sequence data have been studied in several fields. DNA sequences may contain characters that do not exist in the alphabet. This is related to a function of the DNA that has been preserved in the evolutionary process of an organism. Discovering DNA sequences that contain gaps in a dataset is a very hard task for algorithms. We present an algorithm that discovers these DNA sequences in datasets. Our algorithm uses a sequential pattern mining method for this problem.
Abstract:
Frequent pattern mining has become very useful and interesting to researchers due to its high applicability. Different real-life databases (e.g., sensor network, medical diagnosis data) are uncertain in their nature. Many algorithms have been developed to mine frequent uncertain patterns based on expected support values. Nonetheless, those algorithms are limited to finding frequent patterns by using some filtering constraints. Moreover, it is challenging to find the actual interesting patterns, as different patterns carry different importance. In this work, a new framework is proposed to mine sequences in uncertain databases satisfying both weight and support constraints. Subsequently, an efficient algorithm (uWSequence) is developed to discover the uncertain weighted sequences. In addition, the pruning measures iMaxPr and expSupport(top) play a vital role in making uWSequence remarkably time-efficient by filtering out unfavorable patterns in early stages. The applicability of the proposed framework is shown by solving various problems (e.g., weather forecasting, sensor-based event findings). To our knowledge, ours is the first work on weighted sequences in uncertain databases. Extensive performance analysis confirms the efficiency of the proposed algorithm as well as its superiority over the existing algorithms. (C) 2018 Elsevier Inc. All rights reserved.
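The notion of expected support in an uncertain sequence database can be illustrated with a small sketch. This is not uWSequence and does not implement iMaxPr or expSupport(top). It assumes each event is an `(item, probability)` pair, that the probability of a pattern in one sequence is the maximum, over all in-order occurrences, of the product of the matched event probabilities, and that expected support is the sum of these values over the database. All names and data are hypothetical.

```python
def sequence_probability(pattern, sequence):
    """Best probability of matching the pattern in order within one uncertain sequence."""
    # best[k] = highest probability of matching the first k pattern items so far.
    best = [1.0] + [0.0] * len(pattern)
    for item, prob in sequence:
        # Go backwards so that one event is never matched to two pattern items.
        for k in range(len(pattern), 0, -1):
            if pattern[k - 1] == item:
                best[k] = max(best[k], best[k - 1] * prob)
    return best[-1]


def expected_support(pattern, database):
    """Assumed definition: sum of per-sequence best match probabilities."""
    return sum(sequence_probability(pattern, s) for s in database)


# Toy uncertain database (illustrative values only).
database = [
    [("a", 0.8), ("b", 0.5), ("a", 0.9), ("b", 0.4)],
    [("b", 0.7), ("a", 0.6)],
]

print(expected_support(["a", "b"], database))
```

In the first sequence the best occurrence pairs the first `a` with the first `b`; the second sequence contributes nothing because `b` never follows `a`.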
Abstract:
Sequential pattern mining in a data stream environment is an interesting data mining problem. The problem of finding sequential patterns in static databases has been studied extensively in past years; however, mining sequential patterns in data streams is still an active field of research. In this research, a new greedy sequence pattern mining algorithm for data streams is introduced; it is used to find the strongly supported sequences. The proposed algorithm is based on the sequence tree, which is used to find sequential patterns in static databases. The proposed algorithm divides the stream into patches or windows, and each patch updates the sequence tree built from the previous windows. An example is introduced to explain how this algorithm works. We also show the efficiency and the effectiveness of the proposed algorithm on a synthetic dataset and show how it is suited to a data stream environment. We show experimentally that the proposed algorithm is more efficient than the PrefixSpan algorithm in CPU time for patterns with any support less than 30%, and in memory usage for patterns with any support less than 60%.
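The windowed update described above can be sketched roughly as follows. This is only an illustration of the general idea, not the proposed greedy algorithm: it assumes the stream yields whole sequences, cuts it into fixed-size windows, and lets each window update a shared tree in which every node counts how many sequences started with that prefix (a simplification that counts prefixes rather than general subsequences). The names `batches` and `update_tree` are hypothetical.

```python
from itertools import islice


def batches(stream, size):
    """Cut a (possibly unbounded) stream into consecutive windows of `size` sequences."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def update_tree(tree, batch):
    """Update the prefix counts in `tree` with the sequences of one window."""
    for sequence in batch:
        node = tree
        for item in sequence:
            child = node.setdefault(item, {"count": 0, "children": {}})
            child["count"] += 1
            node = child["children"]


# Toy stream of sequences (illustrative values only).
stream = [["a", "b"], ["a", "c"], ["a", "b"], ["b"]]

tree = {}
for window in batches(stream, 2):
    update_tree(tree, window)  # the tree carries over from previous windows

print(tree["a"]["count"], tree["a"]["children"]["b"]["count"])
```

Because the tree persists between windows, each window only adds its own counts and the earlier part of the stream never has to be rescanned.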
Abstract:
A large event sequence can generate episode rules, which are patterns that help to identify the possible dependencies existing among event types. Frequent episodes occurring in a simple sequence of events are commonly used for mining the episodes from a sequential database. Mining serial positioning episode rules (MSPER) using a fixed-gap episode occurrence suffers from unsatisfactory scalability when testing whether an episode occurs in a complex sequence. A large number of redundant nodes is generated in the MSPER-trie-based data structure. In this paper, a forward and backward search algorithm (FBSA) is proposed to detect minimal occurrences of frequent peak episodes. An extensive correlation of parameter settings and the generating procedure of fixed-gap episodes is carried out. To generate a fixed-gap episode and estimate the variance that decides the parameter selection in event sequences, Spearman's correlation coefficient is used for verifying the sequence of occurrences of the episodes. MFSPER with FBSA is developed to eliminate the frequent sequence scans and redundant event sets. The MFSPER-FBSA stores the minimal occurrences of frequent peak episodes from the event sequences. The experimental evaluation on benchmark datasets shows that the proposed technique outperforms the existing methods with respect to memory, execution time, recall and precision.
Abstract:
A method is proposed for selecting a rational mining sequence with internal dumping for flat stratified deposits, using new principles of the open-pit process-space formation and development. The main criteria for substantiating the mining sequence are geometrical form and development direction of the open-pit space, structure of the working wall and transportation network, internal dumping capacities and mining earthworks volumes.
Abstract:
Spatiotemporal event sequences (STESs) are the ordered series of event types whose instances frequently follow each other in time and are located close-by. An STES is a spatiotemporal frequent pattern type, which is discovered from moving region objects whose polygon-based locations continuously evolve over time. Previous studies on STES mining require significance and prevalence thresholds for the discovery, which are usually unknown to domain experts. The quality of the discovered sequences is of great importance to the domain experts who use these algorithms. We introduce a novel algorithm to find the most relevant STESs without threshold values. We tested the relevance and performance of our threshold-free algorithm with a case study on solar event metadata, and compared the results with the previous STES mining algorithms.
Abstract:
Rare events analysis is an area that includes methods for the detection and prediction of events, e.g. a network intrusion or an engine failure, that occur infrequently and have some impact on the system. There are various methods from the areas of statistics and data mining for that purpose. In this article we propose PREVENT, an algorithm which uses inter-transactional patterns for the prediction of rare events in transaction databases. PREVENT is a general-purpose inter-transaction association rules mining algorithm that optimally fits the demands of rare event prediction. It requires only 1 scan of the original database and 2 scans of the transformed database, which is considerably smaller, and it is complete, as it does not miss any patterns. We provide the mathematical formulation of the problem and experimental results that show PREVENT's efficiency in terms of run time and effectiveness in terms of sensitivity and specificity.
Abstract:
Intraday traders buy and sell financial instruments in the short term, typically within the same trading day. Stocks are notable examples of financial instruments. However, since hundreds of stocks are listed on the stock exchange, selecting the most tradeable stocks on each trading day is a challenging task, which is commonly addressed through manual inspection of historical stock prices and technical indicators. This paper aims at discovering tradeable stocks on a given trading day by analyzing the historical prices assumed by the same stocks or by other ones on the preceding days by means of regression and weighted sequence mining techniques. The use of regression and weighted sequence mining techniques allows traders to automatically consider a potentially large number of candidate stocks and to effectively analyze their price variations across consecutive days. The experimental results, which were achieved on data acquired from different markets and under different market conditions, show that sequence mining algorithms yield profits higher than both regression techniques and naive strategies. (C) 2017 Elsevier Inc. All rights reserved.
Abstract:
Sequence mining is one of the fundamental data mining tasks. In this paper we present a novel approach for mining frequent sequences, called Prism. It utilizes a vertical approach for enumeration and support counting, based on the novel notion of primal block encoding, which in turn is based on prime factorization theory. Via an extensive evaluation on both synthetic and real datasets, we show that Prism outperforms popular sequence mining methods like SPADE [M.J. Zaki, SPADE: An efficient algorithm for mining frequent sequences, Mach. Learn. J. 42 (1/2) (Jan/Feb 2001) 31-60], PrefixSpan [J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, M.-C. Hsu, PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth, in: Int'l Conf. Data Engineering, April 2001] and SPAM [J. Ayres, J.E. Gehrke, T. Yiu, J. Flannick, Sequential pattern mining using bitmaps, in: SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, July 2002], by an order of magnitude or more.
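The prime factorization idea behind an encoding of this kind can be sketched in a few lines. This is a simplified illustration and not Prism's actual data structure: it assumes a block of eight positions, assigns one small prime to each position, stores a set of positions as the product of its primes, and intersects two sets with a greatest common divisor. The names `PRIMES`, `encode`, `decode` and `intersect` are hypothetical.

```python
from math import gcd

# One prime per position in a block of eight positions (assumed block size).
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]


def encode(positions):
    """Encode a set of positions as the product of their primes (empty set -> 1)."""
    value = 1
    for position in positions:
        value *= PRIMES[position]
    return value


def decode(value):
    """Recover the set of positions whose prime divides the encoded value."""
    return {i for i, prime in enumerate(PRIMES) if value % prime == 0}


def intersect(a, b):
    """Intersect two encoded sets: the GCD keeps exactly the shared primes."""
    return gcd(a, b)


left = encode({0, 2, 3})   # 2 * 5 * 7
right = encode({2, 3, 5})  # 5 * 7 * 13
print(decode(intersect(left, right)))
```

Since each encoded value is a product of distinct primes, the GCD of two values is the product of the primes they share, so set intersection reduces to a single integer operation.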