Abstract:
This paper explores the application of structured learning methods (SLMs) to word sense disambiguation (WSD). On one hand, the semantic dependencies between polysemous words in a sentence can be encoded in SLMs. On the other hand, SLMs have achieved significant results in natural language processing, so it is natural to apply them to WSD. However, the characteristics of WSD raise many theoretical and practical problems when SLMs are applied to it. Beginning with a method based on the hidden Markov model, this paper proposes for the first time a comprehensive and unified solution for WSD based on the maximum entropy Markov model, the conditional random field, and the tree-structured conditional random field, and reduces the time complexity and running time of the proposed methods to a reasonable level by beam search, approximate training, and parallel training. Each model update brings a performance improvement: the introduction of one-step dependency improves performance by 1 to 5 percent, the adoption of non-independent features improves performance by 2 to 3 percent, and the extension of the underlying structure to the dependency parse tree improves performance by about 1 percent. On the English all-words WSD dataset of Senseval-2004, the method based on the tree-structured conditional random field significantly outperforms the best participating system. Nevertheless, almost all machine learning methods suffer from data sparseness due to the scarcity of sense-tagged data, and SLMs are no exception. Besides improving structured learning methods according to the characteristics of WSD, another way to improve disambiguation performance is to mine disambiguation knowledge from a variety of sources, such as Wikipedia and parallel corpora, to alleviate the knowledge acquisition bottleneck of WSD.
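To make the role of beam search concrete, the following is a minimal sketch of beam-search decoding over a sequence of polysemous words with a one-step (first-order) dependency between adjacent senses. The function name `beam_search`, the scoring callbacks `emit` and `trans`, and the toy probabilities are all assumptions made for illustration; the abstract does not specify the models' features or scores.

```python
import math

def beam_search(candidates, emit, trans, beam_width=3):
    """Approximate best sense sequence under a first-order model.

    candidates: one list of candidate senses per word.
    emit(i, s): log-score of sense s for word i (hypothetical callback).
    trans(prev, s): log-score of sense s following sense prev
                    (prev is None for the first word).
    Returns (log_score, sense_path) of the best hypothesis kept.
    """
    beam = [(0.0, [])]
    for i, senses in enumerate(candidates):
        expanded = []
        for score, path in beam:
            prev = path[-1] if path else None
            for s in senses:
                expanded.append((score + emit(i, s) + trans(prev, s), path + [s]))
        # Keep only the top-scoring partial hypotheses.
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_width]
    return beam[0]

# Toy scores (invented for this example, not taken from the paper).
EMIT = {
    (0, "bank/finance"): 0.5, (0, "bank/river"): 0.5,
    (1, "interest/money"): 0.6, (1, "interest/curiosity"): 0.4,
}
TRANS = {
    ("bank/finance", "interest/money"): 0.9,
    ("bank/finance", "interest/curiosity"): 0.1,
    ("bank/river", "interest/money"): 0.2,
    ("bank/river", "interest/curiosity"): 0.8,
}

def emit(i, s):
    return math.log(EMIT[(i, s)])

def trans(prev, s):
    return 0.0 if prev is None else math.log(TRANS[(prev, s)])

candidates = [["bank/finance", "bank/river"],
              ["interest/money", "interest/curiosity"]]
best_score, best_path = beam_search(candidates, emit, trans, beam_width=2)
```

With a beam of width k, each step scores at most k times the number of candidate senses for the current word, instead of every pair of adjacent senses as in exact Viterbi decoding. The price is that the result is no longer guaranteed to be optimal.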
Abstract:
Ubiquitous spam messages cause a serious waste of time and resources. This paper addresses the practical spam filtering problem and proposes a universal approach to fighting various kinds of spam messages. The proposed active multi-field learning approach rests on two observations: 1) obtaining a label is costly for a real-world spam filter, which suggests active learning; and 2) different messages often share a similar multi-field text structure, which suggests multi-field learning. The multi-field learning framework combines the results predicted by the field classifiers using a novel compound weight, and each field classifier calculates the arithmetic mean of the conditional probabilities predicted from feature strings according to a string-frequency index data structure. By comparing the current variance of the field classification results with the historical variance, the active learner evaluates the classification confidence and treats the more uncertain message as the more informative sample for which to request a label. The experimental results show that the proposed approach achieves state-of-the-art performance with greatly reduced label requirements in both email spam filtering and short-text spam filtering. Measured by the standard (1-ROCA)% metric, our active multi-field learning performance even exceeds the full-feedback performance of some advanced individual classification algorithms.
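The pipeline described above can be sketched roughly as follows. This is an illustration under several assumptions, not the paper's implementation. The string-frequency index is modeled as a plain dictionary of (spam count, ham count) pairs. The "compound weight" is not defined in the abstract, so uniform weights stand in for it. The rule "request a label when the current variance exceeds the mean historical variance" is one assumed reading of the variance comparison. All function names are hypothetical.

```python
from statistics import mean, pvariance

def field_score(features, index, prior=0.5):
    """Arithmetic mean of P(spam | feature string) over a field's features.

    index maps a feature string to (spam_count, ham_count); unseen
    features fall back to `prior` (an assumption of this sketch).
    """
    probs = []
    for f in features:
        spam, ham = index.get(f, (0, 0))
        total = spam + ham
        probs.append(spam / total if total else prior)
    return mean(probs) if probs else prior

def combine(scores, weights=None):
    """Weighted average of field scores; uniform weights are a placeholder
    for the paper's compound weight, which the abstract does not define."""
    weights = weights or [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def should_request_label(scores, historical_variances):
    """Treat a message as uncertain when its field scores disagree more
    than they did on average in the past (assumed decision rule)."""
    if not historical_variances:
        return True
    return pvariance(scores) > mean(historical_variances)

# Toy index and message (invented values).
index = {"free": (9, 1), "meeting": (1, 9)}
subject_score = field_score(["free"], index)
body_score = field_score(["meeting"], index)
overall = combine([subject_score, body_score])
ask = should_request_label([subject_score, body_score], [0.01])
```

Here the subject and body classifiers disagree strongly, so the variance of their scores is high and the sketch would request a label for this message.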
Abstract:
In contrast to existing heterogeneous data integration systems, which must be fully integrated before use, a Dataspace Support Platform is a self-sustained system that automatically provides the user with best-effort results regardless of how integrated its sources are. A Dataspace Support Platform therefore needs to support uncertainty in the mediated schema and in schema mappings. This paper proposes a novel approach to automatically providing reliable mediated schemas and reliable semantic mappings in Dataspace Support Platforms. Our aim is to improve the system's best-effort results by leading it to consider as much as possible of the information available in any connected source. We first extract graph representations from the source schemas. Then, we introduce algorithms that automatically extract a set of mediated schemas from the graph representations, and a set of semantic mappings between a source and a target mediated schema. Finally, we assign reliability degrees to the generated mediated schemas and to the semantic mappings. The higher the reliability degree of a given mediated schema or semantic mapping, the more consistent it is with the source. Experimental results show that our system is faster than existing systems and that, although completely automatic, it produces reliable mediated schemas and reliable semantic mappings that are as accurate as those produced by semi-automatic systems.
Abstract:
Cadmium (Cd) contamination is the most serious and widespread form of heavy metal pollution in China. It is necessary to explore the mechanism of Cd poisoning and the mechanism of resistance to Cd stress in crops. In this study, Illumina sequencing technology was used for high-throughput transcriptome sequencing. The molecular mechanism of Cd stress in muskmelon was investigated using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis.
Abstract:
The flexible geometry configuration of bistatic synthetic aperture radar (SAR) has many advantages. However, it causes serious measurement errors in the bistatic SAR system, which degrade the quality of the SAR images and the precision of the digital elevation model (DEM) obtained using stereoscopic bistatic SAR. In this paper, the influence of the scene height estimation error, trigger delay, transmitter position measurement error, receiver position measurement error, and transmission line length measurement error is analyzed. These analyses are very useful in bistatic SAR system design. The scene height estimation error, trigger delay, transmitter position measurement error, and synchronization receiver position measurement error only slightly affect both the quality of the images and the precision of the DEM obtained by stereoscopic bistatic SAR. The echo receiver position measurement error and transmission line length measurement error affect the imaging quality only slightly, but seriously affect the precision of the DEM obtained by stereoscopic bistatic SAR. Fortunately, both quantities can be measured with satisfactory precision. Simulations and real bistatic experimental results verify the proposed theoretical analysis.
Abstract:
A metal-free catalytic method for the synthesis of 2-deoxy glycosides and disaccharides has been developed using stable 2-deoxy glucosyl and galactosyl acetate donors. The donors react with a variety of acceptors in the presence of a catalytic amount of TMSOTf at 0 °C to form glycosides, glycoconjugates, and disaccharides with excellent α-selectivity (> 19:1) and yields (up to 99%) in a short time (0.5 h). With this expedient method, several new compounds were obtained and tested in vitro for antitumor activity against human K562 and SMMC7721 cell lines.
Abstract:
Information services on the Internet are an important tool for helping users discover information of interest to them. Many approaches have been proposed for discovering information on the Internet, of which search engines are the most common. However, most current approaches can discover only keyword-matching information and cannot automatically recommend the most recent and relevant information to users. Sometimes users can give only a fuzzy keyword instead of an accurate one, so some desired information is missed by the search engines. Moreover, current search engines cannot discover latent but logically relevant information or services for users. This paper measures the semantic similarity and link similarity between keywords. Based on these measures, it introduces the concept of web page similarity and presents a method for information recommendation. Finally, an experimental evaluation and comparisons with existing studies are presented.
Abstract:
Recent investigations into proactive network defense have not produced a systematic methodology and structure; in addition, issues including multi-source information fusion and attack behavior analysis have not been resolved. Borrowing the ideas of danger sensing and immune response from danger theory, a proactive network defense model based on danger theory is proposed. This paper defines the signals and antigens in the network environment as well as an attack behavior analysis algorithm, providing evidence for future proactive defense strategy selection. The results of preliminary simulations demonstrate that this model can sense the onset of varied network attacks and the corresponding danger intensities, which helps in understanding the attack methods of hackers and assessing the security situation of the current network, so that a better proactive defense strategy can be deployed. Moreover, the model shows good robustness and accuracy.
Abstract:
A half elliptical hole with an edge crack in a thermopiezoelectric material is studied using the complex variable method. First, the mapping function that maps the outside of the elliptical hole and the crack in the right half plane onto the outside of a circular hole in a full plane is obtained by conformal mapping. Then, the complex potential functions and the field intensity factors (FIF) are derived from the boundary conditions. Numerical analysis yields some useful results: 1) the influence of the heat flux on the FIF depends on the model of the crack; 2) the shape and size of the hole have a significant effect on the field distribution at the crack tip.
Abstract:
In mobile ad hoc networks (MANETs), nodes move freely and the distribution of access requests changes dynamically. Replica allocation in such a dynamic environment is a significant challenge. The communication cost has become a prominent factor influencing the performance of replica allocation in the MANET environment. In this paper, a dynamic adaptive replica allocation algorithm that can adapt to node motion is proposed to minimize the communication cost of object access. When changes occur in the access requests for the object or in the network topology, each replica node collects access requests from its neighbors and decides locally whether to expand the replica to neighbors or to relinquish it. This algorithm dynamically adjusts the replica allocation scheme towards a locally optimal one. To reduce the oscillation of replica allocation, a statistical method based on history information is used to choose stable neighbors, so that the replica is expanded only to relatively stable nodes. Simulation results show that the algorithm efficiently reduces the communication cost of object access in the MANET environment.
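The local expand-or-relinquish decision could look roughly like the sketch below. The abstract does not give the actual decision criteria, so the read/write counting rule, the link-history stability ratio, the 0.8 threshold, and every name in the code are assumptions chosen only to illustrate the idea of a purely local decision restricted to stable neighbors.

```python
def decide(reads_via, writes_via, local_reads, link_history, stability=0.8):
    """Local decision at one replica node (illustrative rule only).

    reads_via / writes_via: requests received through each neighbor
        during the last observation period.
    local_reads: reads issued by this node itself.
    link_history: per neighbor, a list of 0/1 flags recording whether
        the link was up in each recent period (assumed stability statistic).
    Returns (neighbors to expand the replica to, whether to relinquish).
    """
    total_writes = sum(writes_via.values())
    expand_to = []
    for neighbor, reads in reads_via.items():
        history = link_history.get(neighbor, [])
        stable = bool(history) and sum(history) / len(history) >= stability
        # Writes that would have to be forwarded to the new replica.
        writes_from_elsewhere = total_writes - writes_via.get(neighbor, 0)
        if stable and reads > writes_from_elsewhere:
            expand_to.append(neighbor)
    total_reads = local_reads + sum(reads_via.values())
    # Give up the replica when keeping it updated costs more than it saves.
    relinquish = total_writes > total_reads
    return sorted(expand_to), relinquish

decision = decide(
    reads_via={"a": 10, "b": 10},
    writes_via={"a": 1, "b": 2},
    local_reads=0,
    link_history={"a": [1, 1, 1, 1, 1], "b": [1, 0, 0, 1, 0]},
)
```

In this toy run both neighbors read heavily, but only neighbor `a` has a stable link history, so the replica is expanded to `a` alone. Filtering on history in this way is what keeps a short-lived link from triggering an expansion that would soon have to be undone.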