Spark: The Definitive Guide: Big data processing made simple

  • Length: 450 pages
  • Edition: 1
  • Publisher: O’Reilly Media
  • Publication Date: 2017-10-25
  • ISBN-10: 1491912219
  • ISBN-13: 9781491912218
  • Sales Rank: #66614

Description

Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of this open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals.

You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine learning library.

  • Get a gentle overview of big data and Spark
  • Learn about DataFrames, SQL, and Datasets—Spark’s core APIs—through worked examples
  • Dive into Spark’s low-level APIs, RDDs, and execution of SQL and DataFrames
  • Understand how Spark runs on a cluster
  • Debug, monitor, and tune Spark clusters and applications
  • Learn the power of Spark’s Structured Streaming and MLlib for machine learning tasks
  • Explore the wider Spark ecosystem, including SparkR and Graph Analysis
  • Examine Spark deployment, including coverage of Spark in the Cloud

Table of Contents

Part I. Gentle Overview Of Big Data And Spark
Chapter 1. What Is Apache Spark?
Chapter 2. A Gentle Introduction To Spark
Chapter 3. A Tour Of Spark’s Toolset

Part II. Structured APIs—DataFrames, SQL, And Datasets
Chapter 4. Structured API Overview
Chapter 5. Basic Structured Operations
Chapter 6. Working With Different Types Of Data
Chapter 7. Aggregations
Chapter 8. Joins
Chapter 9. Data Sources
Chapter 10. Spark SQL
Chapter 11. Datasets

Part III. Low-Level APIs
Chapter 12. Resilient Distributed Datasets (RDDs)
Chapter 13. Advanced RDDs
Chapter 14. Distributed Shared Variables

Part IV. Production Applications
Chapter 15. How Spark Runs On A Cluster
Chapter 16. Developing Spark Applications
Chapter 17. Deploying Spark
Chapter 18. Monitoring And Debugging
Chapter 19. Performance Tuning

Part V. Streaming
Chapter 20. Stream Processing Fundamentals
Chapter 21. Structured Streaming Basics
Chapter 22. Event-Time And Stateful Processing
Chapter 23. Structured Streaming In Production

Part VI. Advanced Analytics And Machine Learning
Chapter 24. Advanced Analytics And Machine Learning Overview
Chapter 25. Preprocessing And Feature Engineering
Chapter 26. Classification
Chapter 27. Regression
Chapter 28. Recommendation
Chapter 29. Unsupervised Learning
Chapter 30. Graph Analytics
Chapter 31. Deep Learning

Part VII. Ecosystem
Chapter 32. Language Specifics: Python (PySpark) And R (SparkR And sparklyr)
Chapter 33. Ecosystem And Community
