Set up your Hadoop clusters and integrate them with processing tools such as Pig, Hive, and Spark
About This Book
* Deploy a robust system for massive and parallelized data processing and storage
* Build applications by implementing algorithms that process big data
* Explore key tools for managing Hadoop big data ecosystems effectively
Who This Book Is For
This book is for you if you're a data scientist or programmer with a background in business intelligence, data warehousing, data modeling, or machine learning and want to learn about the world of distributed data processing.
What You Will Learn
* Insert data into Hadoop Distributed File System (HDFS)
* Compute algorithms with MapReduce
* Delete and transport data to test Erasure and Balancer
* Build a YARN application for the MapReduce workflow
* Create Resilient Distributed Dataset (RDD) analytics for Twitter tags and visualize your data using Python
* Configure multiple permission cases to see how Sqoop to Hive secured access works
In Detail
Apache Hadoop is an open source distributed processing framework that processes, manages, and stores big data for applications.
Hadoop Fundamentals begins by covering how distributed file systems in Hadoop work and how they are managed with YARN, a Hadoop management layer. You'll understand the MapReduce paradigm, which is the basic paradigm of data processing and analytics in parallelized systems. You'll then delve into Apache Spark, a super-fast cluster computing technology that extends the Hadoop MapReduce functionality to efficiently perform a variety of computations. As you advance, you'll explore data resources in the Hadoop ecosystem made by the big data community and enterprise users to find out what lies beyond MapReduce computations. This Hadoop book also takes you through frameworks such as Flume, Sqoop, Hive, and HBase for ingesting and warehousing different types of data. Finally, you'll learn all about the solutions that Hadoop systems implement to solve security issues.
By the end of this book, you'll understand what big data is and have the skills necessary to work with Hadoop systems.
- ISBN13 9781838827618
- Publish Date 30 August 2019
- Publish Status Out of Print
- Out of Print 9 February 2021
- Publish Country GB
- Imprint Packt Publishing Limited
- Format Paperback
- Pages 423
- Language English