Abstract:
Heterogeneous mobile, sensor, IoT, smart environment, and social networking applications have recently started to produce unbounded, fast, and massive-scale streams of data that have to be processed "on the fly". Systems that process such data have to be enhanced with detection for operational exceptions and with triggers for both automated and manual operator actions. In this paper, we illustrate how tracing in distributed data processing systems can be applied to detecting changes in data and operational environment to maintain the efficiency of heterogeneous data stream processing systems under potentially changing data quality and distribution. By tracing individual input records, we can (1) identify outliers in a web crawling and document processing system and use the insights to define URL filtering rules; (2) identify heavy keys, such as NULL, that should be filtered before processing; (3) give hints to improve the key-based partitioning mechanisms; and (4) measure the limits of overpartitioning if heavy thread-unsafe libraries are imported. Using Apache Spark as illustration, we show how various data stream processing efficiency issues can be mitigated or optimized by our distributed tracing engine. We describe and qualitatively compare two different designs, one based on reporting to a distributed database and another based on trace piggybacking. Our prototype implementation consists of wrappers suitable for JVM environments in general, with minimal impact on the source code of the core system. Our tracing framework is the first to solve tracing in multiple systems across boundaries and to provide detailed performance measurements suitable for automated optimization, not just debugging. (C) 2018 Elsevier B.V. All rights reserved.
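A minimal sketch of idea (2) above, detecting heavy keys (such as NULL) from a sample of the stream and filtering them before key-based partitioning. The function name, sample size, and threshold are illustrative choices, not taken from the paper:

```python
from collections import Counter

def heavy_keys(records, key_fn, sample=10000, threshold=0.05):
    """Estimate heavy keys from a sample of the stream.

    A key is 'heavy' if it accounts for more than `threshold` of the
    sampled records; such keys (e.g. NULL) skew key-based partitioning.
    `sample` and `threshold` are illustrative parameters.
    """
    counts = Counter(key_fn(r) for r in records[:sample])
    total = sum(counts.values())
    return {k for k, c in counts.items() if c / total > threshold}

# Toy stream: half the records carry a NULL key.
stream = [{"url": None}] * 500 + [{"url": f"u{i}"} for i in range(500)]
hot = heavy_keys(stream, key_fn=lambda r: r["url"])
filtered = [r for r in stream if r["url"] not in hot]
```

In a Spark job the same filter would run before a `keyBy`/`groupByKey`, so no single partition receives the skewed key's traffic.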

Abstract:
In this paper, we investigate real-world scenarios in which the MapReduce programming model, and specifically the Hadoop framework, could be used for processing large-scale, geographically scattered datasets. We propose an Adaptive Reduce Task Scheduling (ARTS) algorithm and evaluate it both on a distributed Hadoop cluster spanning multiple datacenters and on a shared Hadoop cluster. The evaluation demonstrates that the ARTS algorithm outperforms the default Reduce-phase scheduling algorithm in the Hadoop framework.
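The abstract does not specify how ARTS places reduce tasks, but the general idea of locality-aware reduce scheduling for geo-distributed data can be sketched as follows; the heuristic here (place each reduce partition at the datacenter already holding most of its intermediate data) is purely illustrative:

```python
def place_reducers(map_output_sizes):
    """Assign each reduce partition to the datacenter that already holds
    the largest share of its map output, minimizing cross-datacenter
    shuffle traffic. `map_output_sizes[partition][dc]` is the number of
    bytes of map output for `partition` stored at datacenter `dc`.
    (Illustrative heuristic only; the actual ARTS algorithm is not
    described in the abstract.)
    """
    return {p: max(sizes, key=sizes.get) for p, sizes in map_output_sizes.items()}

sizes = {
    "p0": {"dc1": 900, "dc2": 100},
    "p1": {"dc1": 50,  "dc2": 800},
}
placement = place_reducers(sizes)
```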

Abstract:
The total dataset produced by the BaBar experiment at the Stanford Linear Accelerator Center (SLAC) currently comprises roughly $3\times 10^9$ data events and an equal amount of simulated events, corresponding to 23 Tbytes of real data and 51 Tbytes of simulated events. Since individual analyses typically select a very small fraction of all events, it would be extremely inefficient if each analysis had to process the full dataset. A first, centrally managed analysis step is therefore a common pre-selection ('skimming') of all data according to very loose, inclusive criteria to facilitate data access for later analysis. Usually, there are common selection criteria for several analyses. However, they may change over time, e.g., when new analyses are developed. Currently, $\mathcal{O}$…
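The skimming step described above amounts to keeping every event that passes at least one loose, inclusive criterion, so downstream analyses read only the skim. A minimal sketch, with criteria and field names that are invented for illustration (not BaBar's actual selection):

```python
def skim(events, criteria):
    """Centrally managed pre-selection ('skimming'): keep any event that
    passes at least ONE loose, inclusive criterion, so later analyses
    read only the skim instead of the full dataset.
    Criterion names and thresholds below are illustrative only.
    """
    return [e for e in events if any(c(e) for c in criteria)]

events = [{"n_tracks": 2, "e_tot": 1.0},
          {"n_tracks": 8, "e_tot": 0.2},
          {"n_tracks": 1, "e_tot": 0.1}]
criteria = [lambda e: e["n_tracks"] >= 4,   # hypothetical multiplicity cut
            lambda e: e["e_tot"] >= 0.5]    # hypothetical energy cut
selected = skim(events, criteria)
```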

Abstract:
The trend toward distributed processing has significantly increased the awareness of data as a key corporate resource and underscored the importance of its management. In spite of this, there is a lack of empirical investigation of issues related to data resource management (DRM) in distributed processing environments. Being perhaps the first empirical attempt, this exploratory study identifies four information systems (IS) variables related to DRM in a distributed environment. It also seeks to examine the notion of gestalt fit to describe the nature of the relationships among these variables.

Abstract:
The upgraded VHF digital interferometer (VHF DITF) system is introduced, which can continuously sample the radiation associated with lightning. A new processing technique was implemented which uses the distribution of slopes of the phase difference versus frequency to locate the radiation source. By using this technique, frequency components which are not due to lightning can be excluded, and both low- and high-amplitude sources are located. As a result, both positive breakdown and negative breakdown are located, and negative recoil leaders (recoil leaders) are visualized in great detail. The recoil leaders which continue into the positive charge region are seen to slow their propagation and dim their radiation as they cross the flash initiation region. Analysis of the relative received power of the different breakdown types, negative leaders, recoil leaders, and positive leaders, can also be made. In both intracloud and cloud-to-ground flashes, the modes of the distributions of received power for negative leaders, recoil leaders, and positive leaders were approximately the same. The brightest emissions seen from the positive leader were substantially lower than the brightest emission seen from the negative leader. The results also indicate that positive leaders and lower-elevation negative leaders emit more low-frequency radiation than recoil leaders and high-elevation negative leaders. By continuously sampling the VHF waveform, the upgraded VHF DITF locates many weak sources which the previous system was not capable of locating.
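The underlying relation exploited by the phase-slope technique is that, for a broadband source arriving with inter-antenna time delay $\tau$, the phase difference grows linearly with frequency, $\Delta\phi(f) = 2\pi f \tau$, so the slope of $\Delta\phi$ versus $f$ yields $\tau$. The following sketch shows only this basic relation on synthetic data; the actual DITF processing (using the *distribution* of slopes to reject non-lightning components) is more elaborate:

```python
import math

def delay_from_phase_slope(freqs_hz, phase_diffs_rad):
    """Estimate the inter-antenna time delay from the slope of phase
    difference versus frequency: dphi(f) = 2*pi*f*tau, so
    tau = slope / (2*pi). Ordinary least-squares slope fit.
    """
    n = len(freqs_hz)
    mf = sum(freqs_hz) / n
    mp = sum(phase_diffs_rad) / n
    num = sum((f - mf) * (p - mp) for f, p in zip(freqs_hz, phase_diffs_rad))
    den = sum((f - mf) ** 2 for f in freqs_hz)
    return num / den / (2 * math.pi)

# Synthetic example: a 5 ns delay observed across 40-80 MHz.
tau = 5e-9
freqs = [f * 1e6 for f in range(40, 81, 5)]
phases = [2 * math.pi * f * tau for f in freqs]
est = delay_from_phase_slope(freqs, phases)
```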

Abstract:
The CMS experiment is currently developing a computing system capable of serving, processing and archiving the large number of events that will be generated when the CMS detector starts taking data. During 2004 CMS undertook a large scale data challenge to demonstrate the ability of the CMS computing system to cope with a sustained data-taking rate equivalent to 25% of startup rate. Its goals were: to run CMS event reconstruction at CERN for a sustained period at 25 Hz input rate; to distribute the data to several regional centers; and to enable data access at those centers for analysis. Grid middleware was utilized to help complete all aspects of the challenge. To continue to provide scalable access from anywhere in the world to the data, CMS is developing a layer of software that uses Grid tools to gain access to data and resources, and that aims to provide physicists with a user-friendly interface for submitting their analysis jobs. This paper describes the data challenge experience with Grid infrastructure and the current development of the CMS analysis system.

Abstract:
Bandwidth-efficient execution of online big data analytics in telecommunication networks demands tailored solutions. Existing streaming analytics systems are designed to operate in large data centers, assuming unlimited bandwidth between data center nodes. Applying these solutions unmodified to distributed telecommunication clouds overlooks the fact that available bandwidth is a scarce and costly resource that makes the telecommunication network valuable to end-users. This article presents Continuous Hive (CHive), a streaming analytics platform tailored for distributed telecommunication clouds. The fundamental contribution of CHive is that it optimizes query plans to minimize their overall bandwidth consumption when deployed in a distributed telecommunication cloud. Additionally, these optimized query plans have a high degree of parallelism built in, benefiting speed of execution. Early experiments on data from a large mobile operator indicate that CHive can yield bandwidth reductions upwards of 99 percent.
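A standard query-plan rewrite behind this kind of bandwidth saving is to push partial aggregation to each edge site and ship only per-site partial results across the WAN, instead of raw records. The sketch below illustrates that rewrite for a simple count query; CHive's actual optimizer is not detailed in the abstract:

```python
from collections import Counter

def local_then_global_count(site_streams):
    """Aggregate locally at each (hypothetical) edge site, then ship only
    the per-site partial counts to the central site for the final merge.
    Returns the merged counts and the number of rows that crossed the WAN.
    """
    partials = [Counter(s) for s in site_streams]   # computed at each site
    shipped = sum(len(p) for p in partials)         # rows crossing the WAN
    total = sum(partials, Counter())                # merged at the central site
    return total, shipped

# Two sites, 2200 raw records in total; only 3 partial rows are shipped.
sites = [["a"] * 1000 + ["b"] * 500, ["a"] * 700]
total, shipped = local_then_global_count(sites)
```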

Abstract:
Given the sustained growth that we are experiencing in the number of SPARQL endpoints available, the need to be able to send federated SPARQL queries across these has also grown. To address this use case, the W3C SPARQL working group is defining a federation extension for SPARQL 1.1 which allows for combining graph patterns that can be evaluated over several endpoints within a single query. In this paper, we describe the syntax of that extension and formalize its semantics. Additionally, we describe how a query evaluation system can be implemented for that federation extension, describing some static optimization techniques and reusing a query engine used for data-intensive science, so as to deal with large amounts of intermediate and final results. Finally, we carry out a series of experiments that show that our optimizations speed up the federated query evaluation process.
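The core of federated evaluation is joining the solution mappings returned by different endpoints: two mappings are compatible if they agree on every shared variable. A naive nested-loop sketch of that join (real engines such as the one described here add static optimizations like pattern reordering); the endpoint data below is invented:

```python
def join_bindings(left, right):
    """Join solution mappings from two endpoints, SPARQL-style: two
    mappings are compatible if they agree on every variable they share,
    and compatible pairs are merged into one mapping.
    """
    out = []
    for l in left:
        for r in right:
            if all(l[v] == r[v] for v in l.keys() & r.keys()):
                out.append({**l, **r})
    return out

# Bindings as if returned by two remote SPARQL endpoints:
ep1 = [{"?drug": "d1", "?name": "aspirin"}, {"?drug": "d2", "?name": "x"}]
ep2 = [{"?drug": "d1", "?target": "COX-1"}]
res = join_bindings(ep1, ep2)
```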

Abstract:
With the explosive use of GPS-enabled devices, increasingly massive volumes of trajectory data capturing the movements of people and vehicles are becoming available, which is useful in many application areas, such as transportation, traffic management, and location-based services. As a result, many trajectory data management and analytic systems have emerged that target either offline or online settings. However, some applications call for both offline and online analyses. For example, in traffic management scenarios, offline analyses of historical trajectory data can be used for traffic planning purposes, while online analyses of streaming trajectories can be adopted for congestion monitoring purposes. Existing trajectory-based systems tend to perform offline and online trajectory analysis separately, which is inefficient. In this paper, we propose a hybrid and efficient framework, called Dragoon, based on Spark, to support both offline and online big trajectory management and analytics. The framework features a mutable resilient distributed dataset model, including RDD Share, RDD Update, and RDD Mirror, which enables hybrid storage of historical and streaming trajectories. It also contains a real-time partitioner capable of efficiently distributing trajectory data and supporting both offline and online analyses. Therefore, Dragoon provides a hybrid analysis pipeline. Support for several typical trajectory queries and mining tasks demonstrates the flexibility of Dragoon. 
An extensive experimental study using both real and synthetic trajectory datasets shows that Dragoon (1) offers offline trajectory query performance similar to that of the state-of-the-art system UlTraMan; (2) reduces storage overhead by up to a factor of two compared with UlTraMan during trajectory editing; (3) achieves at least 40% better scalability than popular stream processing frameworks (i.e., Flink and Spark Streaming); and (4) offers, on average, a twofold performance improvement for online trajectory data analytics.
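One simple way a partitioner can serve both offline and online trajectory analysis is to make partition assignment a pure function of a point's spatial cell, so historical batches and streaming points route identically. The grid scheme below is only an illustrative stand-in; Dragoon's real-time partitioner is more sophisticated than a static grid:

```python
def grid_partition(point, grid_size, num_partitions):
    """Map a trajectory point (lon, lat) to a partition via a uniform
    spatial grid, so nearby points land in the same partition whether
    they arrive in an offline batch or an online stream.
    (Illustrative only; not Dragoon's actual partitioner.)
    """
    lon, lat = point
    cell = (int(lon // grid_size), int(lat // grid_size))
    return hash(cell) % num_partitions

# Two nearby points fall into the same 1-degree cell, hence the same partition.
p1 = grid_partition((116.3, 39.9), grid_size=1.0, num_partitions=8)
p2 = grid_partition((116.7, 39.2), grid_size=1.0, num_partitions=8)
```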

Abstract:
With the rapid growth of emerging applications like social network, semantic web, sensor network and LBS (Location Based Service) applications, a variety of data to be processed continues to witness a quick increase. Effective management and processing of large-scale data poses an interesting but critical challenge. Recently, big data has attracted a lot of attention from academia, industry, and government. This paper introduces several big data processing techniques from system and application aspects. First, from the view of cloud data management and big data processing mechanisms, we present the key issues of big data processing, including the definition of big data, big data management platforms, big data service models, distributed file systems, data storage, data virtualization platforms and distributed applications. Following the MapReduce parallel processing framework, we introduce some MapReduce optimization strategies reported in the literature. Finally, we discuss the open issues and challenges and explore future research directions for big data processing in cloud computing environments.
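For reference, the MapReduce model discussed above can be captured in a few lines: map each record to key/value pairs, shuffle by key, then reduce each key's values. This sequential sketch only mirrors the programming model; real frameworks such as Hadoop execute each phase in parallel across a cluster:

```python
from collections import defaultdict

def mapreduce(records, mapper, reducer):
    """Minimal MapReduce skeleton: map each record to (key, value)
    pairs, group values by key (the shuffle), then reduce each group.
    """
    groups = defaultdict(list)
    for rec in records:                 # map phase
        for k, v in mapper(rec):
            groups[k].append(v)         # shuffle: group by key
    return {k: reducer(k, vs) for k, vs in groups.items()}  # reduce phase

# Classic word count expressed in this model:
docs = ["big data processing", "big data"]
counts = mapreduce(docs,
                   mapper=lambda line: [(w, 1) for w in line.split()],
                   reducer=lambda w, vs: sum(vs))
```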