Abstract:
We introduce a partition of the web pages particularly suited to PageRank problems in which the web link graph has a nested block structure. Based on this partition of the web pages into dangling nodes, common nodes, and general nodes, the hyperlink matrix can be reordered into a simpler block structure. Then, based on the parallel computation method, we propose an algorithm for the PageRank problem. In this algorithm, the dimension of the linear system becomes smaller, and the vector for the general nodes in each block can be calculated separately in every iteration. Numerical experiments show that this approach speeds up the computation of PageRank.
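The benefit of separating out dangling nodes (pages with no out-links) can be sketched in a few lines: the iteration only stores link structure for non-dangling pages and handles the dangling mass in closed form. This is a toy illustration of the idea, not the paper's block algorithm; the graph and node sets are made up.

```python
# PageRank power iteration that treats dangling nodes separately:
# only non-dangling pages keep an adjacency list, and the rank mass
# sitting on dangling pages is redistributed uniformly each step.
# Toy 5-node graph; node 4 is dangling. Hypothetical example.

def pagerank(links, n, alpha=0.85, tol=1e-10):
    """links[i] = out-neighbors of page i (pages absent from links dangle)."""
    x = [1.0 / n] * n
    while True:
        # Mass held by dangling nodes is spread uniformly over all pages.
        dangling = sum(x[i] for i in range(n) if i not in links)
        nxt = [(1 - alpha) / n + alpha * dangling / n] * n
        for i, outs in links.items():
            share = alpha * x[i] / len(outs)
            for j in outs:
                nxt[j] += share
        if sum(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt

links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}  # node 4 has no out-links
pr = pagerank(links, 5)
print([round(v, 4) for v in pr])
```

Because the dangling contribution is a single scalar, the per-iteration work depends only on the edges of the non-dangling part, which is the intuition behind solving a smaller system.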
Abstract:
As the information on the World Wide Web grows every day, users searching for information can easily get lost in the hyperlinked structure of the web. The main goal of a search engine is to return information relevant to the user's query. In this paper we describe the PageRank algorithm and simulate it in a PageRank simulator, then show how text search affects PageRank results, and finally, with the help of a graph, PageRank and final PageRank values are compared.
Abstract:
The World Wide Web has emerged as the biggest and most popular means of communication and information dissemination. The Web is expanding every day, and people generally rely on search engines to explore it. Because of its rapid and chaotic growth, the resulting network of information lacks organization and structure. It is a challenge for service providers to deliver proper, relevant, high-quality information to internet users by using web page contents and the hyperlinks between web pages. This paper analyses and compares web page ranking algorithms based on various parameters, to identify their advantages and limitations for ranking web pages and to indicate the further scope of research in web page ranking algorithms. Six important algorithms are presented and their performance discussed: PageRank, Query Dependent-PageRank, HITS, SALSA, Simultaneous Terms Query Dependent-PageRank (SQD-PageRank), and Onto-SQD-PageRank.
Abstract:
Web crawlers are essential to many Web applications, such as Web search engines, Web archives, and Web directories, which maintain Web pages in their local repositories. In this paper, we study the problem of crawl scheduling that biases crawl ordering toward important pages. We propose a set of crawling algorithms for effective and efficient crawl ordering by prioritizing important pages with the well-known PageRank as the importance metric. In order to score URLs, the proposed algorithms utilize various features, including partial link structure, inter-host links, page titles, and topic relevance. We conduct a large-scale experiment using publicly available data sets to examine the effect of each feature on crawl ordering and evaluate the performance of many algorithms. The experimental results verify the efficacy of our schemes. In particular, compared with the representative RankMass crawler, the FPR-title-host algorithm reduces running time by a factor of up to three while improving effectiveness by 5% in cumulative PageRank.
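The general shape of importance-biased crawl ordering can be sketched with a priority queue keyed on a cheap importance proxy. The scorer below (in-links discovered so far) and the tiny web graph are hypothetical stand-ins, not the paper's FPR-title-host scorer:

```python
# Importance-biased crawl ordering sketch: fetch URLs in decreasing
# order of a cheap score (here, in-links seen so far). Stale heap
# entries are tolerated and skipped via the `seen` set.
import heapq

web = {  # url -> out-links, discovered only when the page is fetched
    "a": ["b", "c"], "b": ["c"], "c": ["a", "d"], "d": [], "e": ["c"],
}

def crawl(seed):
    score = {seed: 0}
    heap = [(0, seed)]          # min-heap on -score
    seen, order = set(), []
    while heap:
        _, url = heapq.heappop(heap)
        if url in seen:
            continue            # stale entry with an outdated score
        seen.add(url)
        order.append(url)
        for out in web.get(url, []):
            score[out] = score.get(out, 0) + 1
            if out not in seen:
                heapq.heappush(heap, (-score[out], out))
    return order

# "e" is never crawled: no fetched page links to it, so it is never discovered.
print(crawl("a"))
```

Note how "c", once it has accumulated two in-links, jumps ahead of the earlier-discovered frontier, which is exactly the bias toward important pages that crawl scheduling aims for.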
Abstract:
Google's success derives in large part from its PageRank algorithm, which ranks the importance of web pages according to an eigenvector of a weighted link matrix. Analysis of the PageRank formula provides a wonderful applied topic for a linear algebra course. Instructors may assign this article as a project to more advanced students or spend one or two lectures presenting the material with assigned homework from the exercises. This material also complements the discussion of Markov chains in matrix algebra. Maple and Mathematica files supporting this material can be found at www.rose-hulman.edu/~bryan.
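The eigenvector formulation mentioned above can be checked numerically on a toy web: after power iteration, applying the Google matrix G = αP + (1−α)(1/n)J leaves the rank vector fixed, i.e. it is an eigenvector with eigenvalue 1. A pure-Python sketch (not the article's Maple/Mathematica files; the 3-page graph is made up):

```python
# PageRank as the dominant eigenvector of the Google matrix,
# column-stochastic convention: page 1 -> {2,3}, page 2 -> {3}, page 3 -> {1}.
alpha, n = 0.85, 3
P = [[0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0]]

def google_apply(x):
    """Return G @ x for G = alpha*P + ((1 - alpha)/n) * ones."""
    s = sum(x)
    return [alpha * sum(P[i][j] * x[j] for j in range(n))
            + (1 - alpha) * s / n for i in range(n)]

x = [1.0 / n] * n
for _ in range(100):          # power method
    x = google_apply(x)

gx = google_apply(x)
residual = max(abs(gx[i] - x[i]) for i in range(n))
print([round(v, 4) for v in x], residual)   # residual ~ 0: G x = x
```

The power method converges because the subdominant eigenvalue of G has modulus at most α, a point that fits naturally into the Markov-chain discussion the abstract mentions.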
The uniqueness of multilinear PageRank vectors is discussed, and a new uniqueness condition is given. The new results improve on those given in the work of Gleich et al. published in SIAM J Matrix Anal Appl. 2015;36:1409–1465. Numerical examples are given to demonstrate the new theoretical results.
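For context, the multilinear PageRank vector in the order-3 case solves x = αR(x ⊗ x) + (1 − α)v, and it can be computed by a fixed-point iteration when α is small enough, which is precisely the regime the uniqueness conditions describe. A toy sketch (the 2-node tensor and α below are made up for illustration):

```python
# Fixed-point iteration for multilinear PageRank, order-3 case:
#   x = alpha * R(x ⊗ x) + (1 - alpha) * v
# R[i][j][k] is the probability of moving to state i given the two
# previous states j, k; each column (j, k) sums to 1 over i.
n, alpha = 2, 0.4
v = [0.5, 0.5]
R = [[[1.0, 0.3], [0.2, 0.6]],
     [[0.0, 0.7], [0.8, 0.4]]]

x = [0.5, 0.5]
for _ in range(200):
    x = [alpha * sum(R[i][j][k] * x[j] * x[k]
                     for j in range(n) for k in range(n))
         + (1 - alpha) * v[i] for i in range(n)]

print([round(t, 6) for t in x], "sum =", round(sum(x), 6))
```

The update preserves the total mass (the columns of R are stochastic and sum(v) = 1), so the iterates stay on the probability simplex; for larger α the fixed point need not be unique, which is what makes the sharper conditions interesting.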
Abstract:
A new algorithm for attributed multiplex networks is proposed and analysed, with the main objective of computing the centrality of the nodes based on the original PageRank model used to establish a ranking in the network of Web pages. Taking as a basis the Adapted PageRank Algorithm for monoplex networks with data and the two-layer PageRank approach, an algorithm for biplex networks is designed with two main characteristics. First, it solves the drawback of the existence of isolated nodes in any of the layers. Second, the algorithm allows us to choose the value of the parameter α controlling the importance assigned to the network topology and to the data associated with the nodes in the Adapted PageRank Algorithm, respectively. The proposed algorithm inherits this ability to determine the importance of node attribute data in the calculation of the centrality; going further, it allows choosing different α values for each of the two layers. The biplex algorithm is then generalised to the case of multiple layers, that is, to multiplex networks. Its possibilities and characteristics are demonstrated using a dataset of aggregate origin-destination flows of private cars in Rome. This dataset is augmented with attribute data describing city locations. In particular, a biplex network is constructed by taking the data about car mobility as layer 1. Layer 2 is generated from data describing the local bus transport system. The algorithm establishes the most central locations in the city when these layers are intertwined with the location attributes in the biplex network. Four cases are evaluated and compared for different values of the parameter that modulates the importance of data in the network. (c) 2020 Elsevier Inc. All rights reserved.
Abstract:
Using combinatorial and analytic techniques, we give conditioning bounds for the stationary vector π^T of a stochastic matrix of the form cA + (1 − c)B, where c ∈ (0, 1) is a scalar, and A and B are stochastic matrices, the latter being rank one. Such matrices and their stationary vectors arise as a key component in Google's PageRank algorithm. The conditioning bounds considered include normwise, absolute componentwise, and relative componentwise, and the bounds depend on c and on quantities such as the number of dangling nodes (which correspond to rows of A having all entries equal) or the lengths of certain cycles in the directed graph associated with A. It is shown that if vertex j is on only long cycles in that directed graph, then the corresponding entry in π^T exhibits better conditioning properties, and that for dangling nodes, the sensitivity of the corresponding entries in π^T decreases as the number of dangling nodes increases. Conditions are given that are sufficient to ensure that an iterate of the power method accurately reflects the relative ordering of two entries in π^T. (c) 2006 Elsevier Inc. All rights reserved.
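The dependence of the conditioning on c is easy to observe numerically: perturb A slightly and compare how much π^T moves for a small and a large c. The 3×3 matrices below are a hypothetical probe, not the paper's bounds:

```python
# Power-method sketch for the stationary vector pi^T of M = c*A + (1-c)*B,
# with B rank one (every row equal to v), as in the PageRank setting.
# Then a crude sensitivity probe: perturb one row of A and compare the
# change in pi for two values of c.

def stationary(A, v, c, iters=2000):
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(iters):
        # pi^T M = c * pi^T A + (1 - c) * v^T, since sum(pi) stays 1
        pi = [c * sum(pi[i] * A[i][j] for i in range(n))
              + (1 - c) * v[j] for j in range(n)]
    return pi

A = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [1.0, 0.0, 0.0]]
A_pert = [[0.1, 0.9, 0.0],   # slightly perturbed first row
          [0.5, 0.0, 0.5],
          [1.0, 0.0, 0.0]]
v = [1.0 / 3] * 3

diffs = {}
for c in (0.5, 0.99):
    d = sum(abs(a - b) for a, b in zip(stationary(A, v, c),
                                       stationary(A_pert, v, c)))
    diffs[c] = d
    print(f"c={c}: ||pi - pi_pert||_1 = {d:.4f}")
```

As c approaches 1 the stationary vector depends almost entirely on A, so the same perturbation moves π^T further, matching the intuition that conditioning degrades with large c.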
Abstract:
The Google matrix is a Web hyperlink matrix given by P(α) = αP + (1 − α)E, where P is a row-stochastic matrix, E is a row-stochastic rank-one matrix, and 0 < α < 1. In this paper we explore the analytic expression of the Jordan canonical form and point out that a theorem due to Serra-Capizzano (cf. Theorem 2.3 in [SIAM J. Matrix Anal. Appl., 27 (2005), pp. 305-312]) can be used for estimating the condition number of the PageRank vector as a function of α, now viewed in the complex field. Furthermore, we give insight into a more efficient scaling matrix in order to minimize the condition number.
Abstract:
Non-backtracking centrality was introduced to correct what may be seen as a deficiency of eigenvector centrality: in a network, the eigenvector centrality of high-degree nodes (hubs) can be artificially inflated, since a hub is central because its neighbors are central, while those neighbors, in turn, are central merely because they neighbor a hub. We define the non-backtracking PageRank as a new measure that modifies the well-known classic PageRank so that the random walker cannot return to the node it has just visited (a non-backtracking walk). But, as we show, this measure presents a gap, a remarkable difference between the limit of "no penalty for return trips" and the direct calculation of the non-backtracking PageRank. Also, as shown in the applications presented, in certain cases this new measure produces notable variations with respect to the rankings obtained by the classic PageRank. (C) 2019 Elsevier Ltd. All rights reserved.
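One standard way to realise a non-backtracking walk is to lift the chain to directed edges: the state is the edge (u, v) last traversed, and the next step (v, w) must satisfy w ≠ u. The sketch below follows that construction on a made-up 4-node graph; it is an illustration of the mechanism, not the paper's exact model:

```python
# Non-backtracking PageRank sketch on edge states. Teleportation
# restarts uniformly over directed edges; a walker stuck at a dead end
# (its only continuation would backtrack) also restarts.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
edges = [(u, v) for u in adj for v in adj[u]]
alpha = 0.85
m = len(edges)

x = {e: 1.0 / m for e in edges}
for _ in range(200):
    nxt = {e: (1 - alpha) / m for e in edges}
    for (u, v), mass in x.items():
        outs = [w for w in adj[v] if w != u]       # forbid backtracking
        if not outs:                               # dead end: restart
            for e in edges:
                nxt[e] += alpha * mass / m
        else:
            for w in outs:
                nxt[(v, w)] += alpha * mass / len(outs)
    x = nxt

# Node centrality: total stationary mass on edges pointing into the node.
score = {node: sum(p for (u, v), p in x.items() if v == node) for node in adj}
print({node: round(s, 4) for node, s in score.items()})
```

The edge (0, 3) is a dead end here because node 3's only neighbor is 0, which illustrates why pendant vertices need special treatment in non-backtracking formulations.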