Data, Semantics and Cloud Computing Series

Book 485

Data Intensive Computing for Biodiversity

by Sarinder K. Dhillon and Amandeep S. Sidhu

Published 25 June 2013

This book focuses on the development of a data integration framework for retrieving biodiversity information from heterogeneous and distributed data sources. The data integration system proposed in this book links remote databases in a networked environment, supports heterogeneous databases and data formats, links databases hosted on multiple platforms, and provides data security for database owners by allowing them to keep and maintain their own data and to choose which information is shared and linked. The book is a useful guide for researchers, practitioners, and graduate-level students interested in learning about state-of-the-art developments in data integration for biodiversity.

Book 759

Optimized Cloud Based Scheduling

by Rong Kun Jason Tan, John A. Leong, and Amandeep S. Sidhu

Published 5 March 2018

This book presents an improved design for service provisioning and allocation models, validated by running genome sequence assembly tasks in a hybrid cloud environment. It proposes approaches for addressing scheduling and performance issues in big data analytics and showcases new algorithms for hybrid cloud scheduling. Scientific sectors such as bioinformatics, astronomy, high-energy physics, and Earth science are generating a tremendous flow of data, commonly known as big data. In the context of growing demand for big data analytics, cloud computing offers an ideal platform for processing big data tasks due to its flexible scalability and adaptability. However, current service provisioning and allocation models suffer from numerous problems, such as inefficient scheduling algorithms, excessive memory overheads, excessive node delays, and improper error handling of tasks, all of which need to be addressed to enhance the performance of big data analytics.

Book 806

Large Scale Data Analytics

by Chung Yik Cho, Rong Kun Jason Tan, John A. Leong, and Amandeep S. Sidhu

Published 25 January 2019

This book presents a language integrated query framework for big data. The continuous, rapid growth of data to volumes of terabytes (1,024 gigabytes) or petabytes (1,048,576 gigabytes) makes the need for a system to manage and query information from large scale data sources increasingly urgent. Currently available frameworks and methodologies are limited in efficiency and in querying compatibility between data sources because of differences in information storage structures. For this research, the authors designed and programmed a framework based on the fundamentals of language integrated query to query existing data sources without restructuring the data. A web portal for the framework was also built to enable users to query protein data from the Protein Data Bank (PDB), and it was implemented on Microsoft Azure, a cloud computing environment known for its reliability, vast computing resources, and cost-effectiveness.