DESCRIPTION
Hadoop queries in Pig or Hive can be too slow for real-time data analysis. Impala, an ultra-speedy query engine from Cloudera, supercharges Hadoop by avoiding the typical MapReduce overhead and parallelizing queries so that they run across multiple nodes. This is a big deal for big data, because with Impala, querying Hadoop takes seconds rather than minutes. Impala's dialect is close to standard SQL, and Impala seamlessly accesses HBase and HDFS (Hadoop Distributed File System), allowing considerable freedom in choice of data formats.
Impala in Action is a hands-on guide to querying Hadoop using Impala. It starts by comparing Impala to traditional databases and database services on Hadoop. Then it explains Impala's SQL dialect and the basics of data access. Next, it tackles data visualization tasks and provides techniques for securing Impala with Apache Sentry. The book also shows how to embed Impala queries in a Java client and how to connect JDBC and ODBC clients to Impala. Advanced readers will appreciate the deep dive into Impala's architecture and the practical insights into the issues that complicated configurations and complex queries can cause.
RETAIL SELLING POINTS
Design an accessible, state-of-the-art analytical platform
Dramatically improve the way data is analyzed
Learn how to make truly data-driven decisions
AUDIENCE
No prior experience with Impala is required. Knowledge of SQL and Hadoop basics is expected.
ABOUT THE TECHNOLOGY
A rapidly growing technology, Impala is an open source, scalable, distributed SQL query engine capable of analyzing terabytes to petabytes of data. Impala's core architecture allows it to scale linearly across hundreds to thousands of commodity machines.
- ISBN10 1617291986
- ISBN13 9781617291982
- Publish Date 28 June 2015
- Publish Status Out of Print
- Out of Print 5 March 2021
- Publish Country GB
- Imprint Pearson Education Limited
- Format Paperback
- Pages 250
- Language English