Abstract:
Reverse k-Nearest Neighbor (RkNN) queries have received considerable attention in recent years. Most state-of-the-art methods use two-step (filter-refinement) RkNN processing. However, for a large k, the amount of computation becomes very heavy, especially in the filter step, which is not acceptable for most mobile devices. A new filter strategy called BRC is proposed to handle the filter step of RkNN queries. BRC uses two pruning heuristics. Experiments show that the processing time of BRC remains acceptable for most mobile devices even when k is large. We also extend BRC to continuous RkNN queries.
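The abstract does not describe the BRC heuristics themselves, but the RkNN query it builds on is easy to pin down. Below is a minimal brute-force reference in Python, with quadratic cost — exactly the cost a filter step is meant to avoid; the point data and function name are illustrative, not from the paper:

```python
import math

def rknn_bruteforce(points, q, k):
    """O(n^2) reference RkNN: return every point whose k nearest
    neighbors, taken over the data set plus the query, include q."""
    universe = points + [q]
    result = []
    for p in points:
        # all other objects, ranked by distance from p
        others = [x for x in universe if x is not p]
        others.sort(key=lambda x: math.dist(p, x))
        if q in others[:k]:
            result.append(p)
    return result
```

A filter-refinement method would prune most of `points` before ever ranking neighbors; this sketch is only the correctness baseline such methods are checked against.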
Abstract:
The outlier detection algorithm based on reverse k-nearest neighbors can detect isolated points. The time complexity of finding the k-nearest neighbors is O(kN²), which is not suitable for large data sets, and the choice of the parameter k has a great impact on the outliers found in a large data set. This paper uses an adaptive method to determine the parameter k and proposes an efficient pruning method based on the triangle inequality, which reduces the computation required to detect outliers. Theoretical analysis and experimental results demonstrate the feasibility and efficiency of the algorithm.
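The abstract names the triangle inequality as the pruning tool but gives no details. One standard way to use it is pivot-based lower bounds: precomputable distances to a pivot bound the true distance from below, so most exact distance computations can be skipped. The pivot choice and the per-query formulation below are assumptions for illustration, not the paper's method:

```python
import heapq
import math

def knn_with_pivot_pruning(points, query, k, pivot):
    """kNN search that skips exact query distances via the triangle
    inequality: |d(pivot, query) - d(pivot, x)| <= d(query, x), so x
    can be pruned once k points closer than that bound are known.
    (In practice d(pivot, x) would be precomputed once per data set.)"""
    d_pq = math.dist(pivot, query)
    heap = []      # max-heap of the k best so far, stored as (-dist, point)
    pruned = 0     # exact distance computations avoided
    for x in points:
        lower_bound = abs(d_pq - math.dist(pivot, x))
        if len(heap) == k and lower_bound >= -heap[0][0]:
            pruned += 1
            continue
        d = math.dist(query, x)
        if len(heap) < k:
            heapq.heappush(heap, (-d, x))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, x))
    neighbors = [x for _, x in sorted((-nd, x) for nd, x in heap)]
    return neighbors, pruned
```

The further a candidate lies from the query relative to the pivot, the larger its lower bound, so distant clusters are rejected without computing their exact distances.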
Abstract:
In recent years, the classification of imbalanced datasets has attracted increasing attention in the field of machine learning. SMOTE (Synthetic Minority Oversampling Technique) is a traditional approach to this problem. The main drawback of SMOTE is overfitting, as it randomly synthesizes minority data samples while taking no notice of the significance of the majority class. To solve this problem, this paper proposes a new algorithm named Reverse-Synthetic Minority Oversampling Technique (R-SMOTE), based on SMOTE and Reverse Nearest Neighbors (R-NN). The proposed R-SMOTE extracts a significant set of data points from the minority class and uses that set to synthesize new samples from their reverse nearest neighbors. The proposed algorithm is compared with four standard oversampling techniques. Empirical analysis shows that R-SMOTE produces much better results than the existing oversampling methods.
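The abstract does not state how the "significant" minority points are selected. The sketch below guesses a plausible rule — keep minority points whose reverse-k-NN count is at least average — and then interpolates SMOTE-style between them; both the rule and all names are assumptions, not the paper's algorithm:

```python
import math
import random

def rnn_counts(points, k):
    """For each point, count how many other points have it among
    their k nearest neighbors (its reverse k-NN count)."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        for j in order[:k]:
            counts[j] += 1
    return counts

def r_smote_sketch(minority, k, n_new, seed=0):
    """SMOTE-style interpolation seeded from minority points with
    above-average reverse k-NN counts (an assumed selection rule)."""
    rng = random.Random(seed)
    counts = rnn_counts(minority, k)
    avg = sum(counts) / len(counts)
    significant = [p for p, c in zip(minority, counts) if c >= avg]
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(significant, 2)
        t = rng.random()
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic
```

Because interpolation is restricted to well-supported points, the synthetic samples stay inside dense minority regions instead of spreading along noisy boundary points — the intuition the abstract attributes to using reverse nearest neighbors.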
Abstract:
For a given query object, Reverse k-Nearest Neighbor (RkNN) queries retrieve those objects that have the query object among their k-nearest neighbors. However, computing the k-nearest neighbor sets for all points in a database is computationally expensive. Therefore, specific index structures have been invented to apply pruning heuristics that aim at reducing the search space. At present, the state-of-the-art index structure for fast RkNN query processing in general metric spaces is the MRkNNCoP-Tree, which uses linear functions to approximate lower and upper bounds on the k-distances to prune the search space. Storing those linear functions incurs additional storage costs in O(n), which may be infeasible in situations where storage space is limited, e.g., on mobile devices. In this work, we present a novel index based on the MRkNNCoP-Tree as well as recent developments in the field of neural indexing. By learning a single neural network model that approximates the k-nearest neighbor distance bounds for all points in a database, the storage complexity of the proposed index structure is reduced to O(1) while the index is still able to guarantee exact query results. As shown in our experimental evaluations on synthetic and real-world data sets, our approach can significantly reduce the required storage space at the cost of some growth in the refinement sets when relying on exact query processing. We provide our code at www.github.com/mberr/k-distance-prediction.
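The key idea — one shared function replacing O(n) stored per-point bounds — can be illustrated independently of the neural architecture. In this sketch `kdist_upper` is a plain Python function standing in for the learned model, and the slack of 0.5 in the usage example is invented; any true upper bound keeps the filter exact, and looser bounds merely enlarge the refinement set:

```python
import math

def kdist(points, p, k):
    """Exact k-distance of p: distance to its k-th nearest neighbor."""
    ds = sorted(math.dist(p, x) for x in points if x is not p)
    return ds[k - 1]

def rknn_filter(points, q, kdist_upper):
    """Filter step of two-step RkNN processing: p can only have q among
    its k nearest neighbors if dist(q, p) <= p's k-distance, so any
    upper bound on that k-distance yields a safe candidate set.
    kdist_upper stands in for the single learned model (O(1) storage
    instead of O(n) stored per-point bounds)."""
    return [p for p in points if math.dist(q, p) <= kdist_upper(p)]
```

The candidate list is then verified against exact k-distances in the refinement step; the guarantee of exact results holds because filtering with an upper bound can produce false positives but never false negatives.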
Abstract:
Geographic objects associated with descriptive texts are becoming prevalent. This gives prominence to spatial keyword queries that take into account both the locations and textual descriptions of content. Specifically, the relevance of an object to a query is measured by a spatial-textual similarity based on both spatial proximity and textual similarity. In this paper, we define the Reverse Spatial Textual k Nearest Neighbor (RSTkNN) query, i.e., finding objects that take the query object as one of their k most spatial-textually similar objects. Existing work on reverse kNN queries focuses solely on spatial locations and ignores text relevance. To answer RSTkNN queries efficiently, we propose a hybrid index tree called the IUR-tree (Intersection-Union R-Tree) that effectively combines location proximity with textual similarity. Based on the IUR-tree, we design a branch-and-bound search algorithm. To further accelerate query processing, we propose an enhanced variant of the IUR-tree called the clustered IUR-tree and two corresponding optimization algorithms. Empirical studies show that the proposed algorithms scale well and perform excellently.
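The abstract does not give the similarity formula. The sketch below assumes a common choice — a linear combination of normalized spatial proximity and Jaccard keyword similarity — and a brute-force RSTkNN check on top of it; the weighting, the distance cap, and the data are all illustrative assumptions, not the paper's definitions:

```python
import math

def spatial_textual_sim(a, b, alpha=0.5, d_max=100.0):
    """Similarity of two (location, keyword-set) objects: a weighted sum
    of normalized spatial proximity and Jaccard textual similarity."""
    (la, ta), (lb, tb) = a, b
    spatial = 1.0 - min(math.dist(la, lb), d_max) / d_max
    textual = len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0
    return alpha * spatial + (1 - alpha) * textual

def rstknn(objects, q, k, **kw):
    """Brute-force RSTkNN: objects that rank q among their k most
    spatial-textually similar objects (the IUR-tree exists to avoid
    exactly this all-pairs computation)."""
    result = []
    for o in objects:
        sims = sorted((spatial_textual_sim(o, x, **kw)
                       for x in objects + [q] if x is not o), reverse=True)
        if spatial_textual_sim(o, q, **kw) >= sims[k - 1]:
            result.append(o)
    return result
```

Note how a nearby object with no shared keywords can lose to a slightly farther object with overlapping text — the effect that purely spatial reverse kNN methods miss.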
Abstract:
A new data stream outlier detection algorithm, SODRNN, based on reverse nearest neighbors is proposed. This paper studies data stream outlier detection based on reverse k-nearest neighbors. Analyzing the known algorithms, we find that they cannot deal with the concept drift problem and require multiple scans of the dataset. Therefore, this paper introduces the SODRNN algorithm, which needs only one pass over the current sliding window. The empirical study verifies the feasibility and effectiveness of the X*tree index structure, which supports kNN search, and of the SODRNN algorithm.
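The abstract gives no pseudocode, so the sketch below only captures the general shape of such methods: a single pass over the stream, a sliding window, and flagging points whose reverse k-NN count in the current window falls below a threshold. The threshold, window handling, and names are assumptions, and no index is used (SODRNN's index is the point of the paper):

```python
import math
from collections import deque

def rnn_count(window, p, k):
    """Number of window points that have p among their k nearest
    neighbors within the window (p's reverse k-NN count)."""
    count = 0
    for x in window:
        if x is p:
            continue
        order = sorted((y for y in window if y is not x),
                       key=lambda y: math.dist(x, y))
        if any(y is p for y in order[:k]):
            count += 1
    return count

def stream_outliers(stream, window_size, k, min_rnn=1):
    """One-pass sliding-window sketch in the spirit of SODRNN: after
    each arrival, once the window is full, report window points with
    fewer than min_rnn reverse k-NN. Expiring old points via the
    bounded window is what lets the detector follow concept drift."""
    window = deque(maxlen=window_size)
    reports = []
    for p in stream:
        window.append(p)
        if len(window) == window_size:
            reports.append([x for x in window
                            if rnn_count(window, x, k) < min_rnn])
    return reports
```

A point in a dense region is the nearest neighbor of its neighbors and so accumulates reverse k-NN; an isolated point is nobody's near neighbor and its count stays at zero.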