Spark for Data Science Cookbook
- Length: 358 pages
- Edition: 1
- Language: English
- Publisher: Packt Publishing
- Publication Date: 2017-01-05
- ISBN-10: 1785880101
- ISBN-13: 9781785880100
- Sales Rank: #3048899
Key Features
- Optimize your workflow with Spark in data science, and get solutions to all your big data problems
- Large-scale data science made easy with Spark
- Get recipes to make the most of Spark’s power and speed in predictive analytics
Book Description
Spark has emerged as the big data platform of choice for data scientists. The real power and value of Apache Spark lie in its single platform for executing data science tasks. Spark is distinctive in combining ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualization, which allows data scientists to tackle the complexities that come with raw, unstructured data sets.
This hands-on, practical resource will help you dive in and become comfortable and confident working with Spark for data science. We walk you through various techniques for handling both simple and complex data science tasks with Spark, and we offer solutions to difficult concepts in data science using Spark’s data science libraries. The book will help you extract meaningful information at every step through simple yet efficient recipes that show you not only how to implement algorithms, but also how to optimize your work.
What you will learn
- Explore the topics of data mining, text mining, NLP, information retrieval, and machine learning
- Solve real-world analytical problems with large data sets
- Get a flavor of the challenges in data science and address them with a variety of analytical tools on a distributed system such as Spark, which offers in-memory processing (well suited to iterative algorithms) and more flexibility for data analysis at scale
Table of Contents
Chapter 1. Big Data Analytics with Spark
Chapter 2. Tricky Statistics with Spark
Chapter 3. Data Analysis with Spark
Chapter 4. Clustering, Classification, and Regression
Chapter 5. Working with Spark MLlib
Chapter 6. NLP with Spark
Chapter 7. Working with Sparkling Water – H2O
Chapter 8. Data Visualization with Spark
Chapter 9. Deep Learning on Spark
Chapter 10. Working with SparkR