Abstract:
Text summarization condenses a single input document or a set of documents (single-document or multi-document summarization, respectively) into a shorter textual summary that preserves the main content of the input. Automatic Document Summarization (ADS, hereafter) is primarily a text compression mechanism that produces a shorter document, giving quick access to the important goals and main features of the input document. ADS is gaining researchers' attention as the volume of text documents around us increases. With the advent of the 5G era, the text data generated by different sources, including news, comments, and literature, is growing explosively. As a result, finding the information we need takes a lot of time, and extracting effective summaries from this massive body of text is crucial. Text summarization methods can be classified as abstractive or extractive. Abstractive Text Summarization (ATS) produces a freely worded text that describes the content of the source document. Extractive Text Summarization (ETS) selects the most important units (normally sentences) from the original text, and the selection should be as close as possible to what a human would choose. Of these two approaches, ETS has attracted more attention from the research community, as its summaries are closer to human-generated ones. However, ETS faces several challenges: (i) generic formulations for text extraction, which lead to erroneous summarization; (ii) existing methods generate domain-specific document summaries; and (iii) most existing approaches are one-dimensional (i.e., static, fixed, or biased). The ETS literature proposes several significant automatic approaches, but few of them focus on generating a better result rather than on making assumptions about what humans use when producing a summary.
In this thesis, a novel approach to single-document summarization using ETS is proposed. The approach is based on a particle swarm intelligence algorithm combined with a clustering mechanism. Its most promising aspect is that it extracts text dynamically using an efficient fitness function. The proposed algorithm works in three main phases. In the first phase, the input document is preprocessed to make it ready for clustering and particle swarm intelligence processing. In the next phase, the preprocessed document is clustered with the k-means clustering algorithm, using Google Normalized Distance as the distance measure between sentences. Once the sentence clusters are formed, the algorithm computes the values of different characteristics for each sentence in the clusters. These clusters are then fed into the particle swarm intelligence algorithm, which returns the significant sentences from the different clusters as the summary of the document. Furthermore, the proposed approach sorts the whole input document so that each sentence in the final output is similar to its neighboring sentences, which results in clusters of similar sentences. The importance of a sentence depends on the density of its cluster: the denser the cluster, the more important its sentences. The quality of the computed summary is evaluated using different measures, including ROUGE, F1-Score, Precision, and Recall. The results show the superiority of the proposed approach. The proposed approach is also compared with state-of-the-art ETS techniques, and the results show that it is more efficient than the others.
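The distance measure named above can be illustrated with a minimal sketch. The abstract does not say how the thesis obtains its frequency counts, so the code below assumes that the number of sentences in the local document containing a term stands in for search-engine hit counts. The function name `ngd` and the whitespace tokenization are hypothetical choices made for this illustration, and the sketch covers only the term-level distance, not how the thesis lifts it to sentences.

```python
import math


def ngd(term_x, term_y, sentences):
    """Normalized Google Distance between two terms.

    Assumption: counts come from the sentences of the local document,
    not from a search engine, and tokens are split on whitespace.
    """
    n = len(sentences)
    token_sets = [set(s.lower().split()) for s in sentences]

    # f(x), f(y): sentences containing each term; f(x, y): containing both.
    fx = sum(term_x in tokens for tokens in token_sets)
    fy = sum(term_y in tokens for tokens in token_sets)
    fxy = sum(term_x in tokens and term_y in tokens for tokens in token_sets)

    if fx == 0 or fy == 0 or fxy == 0:
        return float("inf")  # the terms never co-occur

    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(log_fx, log_fy)
    if denominator == 0:
        return 0.0  # both terms occur in every sentence
    return (max(log_fx, log_fy) - log_fxy) / denominator


sentences = ["the cat sat", "the cat ran", "a dog ran", "the dog sat"]
print(ngd("the", "cat", sentences))
print(ngd("cat", "dog", sentences))
```

Smaller values mean the two terms tend to appear in the same sentences. A sentence-level distance for k-means would have to aggregate such term-level values in some way, which this sketch leaves open.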