This timely text/reference describes the development and implementation of large-scale distributed processing systems using open-source tools and technologies. Comprehensive in scope, the book presents state-of-the-art material on building high-performance distributed computing systems, providing practical guidance and best practices as well as describing theoretical software frameworks. Features: describes the fundamentals of building scalable software systems for large-scale data processing in the new paradigm of high-performance distributed computing; presents an overview of the Hadoop ecosystem, followed by step-by-step instruction on its installation, programming and execution; reviews the basics of Spark, including resilient distributed datasets, and examines Hadoop streaming and working with Scalding; provides detailed case studies on approaches to clustering, data classification and regression analysis; explains the process of creating a working recommender system using Scalding and Spark.

To carry out data analytics, we need powerful and flexible computing software. However, the software available for data analytics is often proprietary and can be expensive. This book reviews Apache tools, which are open source and easy to use. After providing an overview of the background of data analytics, covering the types of analysis and the basics of using Hadoop as a tool, it focuses on Hadoop ecosystem tools such as Apache Flume, Apache Spark, Apache Storm and Apache Hive, along with R and Python, each of which suits particular types of analysis. It then examines the machine learning techniques that are useful for data analytics, and shows how to visualize data with a variety of graphs and charts. Presenting data analytics from a practice-oriented viewpoint, the book discusses useful tools and approaches, supported by concrete code examples. The book is a valuable reference resource for graduate students and professionals in related fields, and is also of interest to general readers with an understanding of data analytics.

This book focuses on social network analysis from a computational perspective, introducing readers to the fundamental aspects of network theory by discussing the various metrics used to measure social networks. It covers different forms of graphs and their analysis using techniques such as filtering, clustering and rule mining, as well as important theories such as the small-world phenomenon. It also presents methods for identifying influential nodes in a network, along with models of information dissemination. Further, it uses examples to explain the tools for visualising large-scale networks, and explores emerging topics such as big data and deep learning in the context of social network analysis.

With the Internet becoming part of our everyday lives, social networking tools are now used as a primary means of communication. As the volume and speed of the data they generate are increasing rapidly, there is a need to apply computational techniques to interpret and understand it. Moreover, relationships within molecular structures, among co-authors in scientific journals, and among developers in a software community can also be understood better by visualising them as networks.

This book brings together the theory and practice of social network analysis and includes mathematical concepts, computational techniques and examples from the real world to offer readers an overview of this domain.