Parallel Computing for Data Science: With Examples in R, C++ and CUDA
- Length: 328 pages
- Edition: 1
- Language: English
- Publisher: Chapman and Hall/CRC
- Publication Date: 2015-06-23
- ISBN-10: 1466587016
- ISBN-13: 9781466587014
- Sales Rank: #1263798
Parallel Computing for Data Science: With Examples in R, C++ and CUDA is one of the first parallel computing books to concentrate exclusively on parallel data structures, algorithms, software tools, and applications in data science. It includes examples not only from the classic “n observations, p variables” matrix format but also from time series, network graph models, and numerous other structures common in data science. The examples illustrate the range of issues encountered in parallel programming.
With the main focus on computation, the book shows how to compute on three types of platforms: multicore systems, clusters, and graphics processing units (GPUs). It also discusses software packages that span more than one type of hardware and can be used from more than one type of programming language. Readers will find that the foundation established in this book will generalize well to other languages, such as Python and Julia.
Table of Contents
Chapter 1: Introduction to Parallel Processing in R
Chapter 2: “Why Is My Program So Slow?”: Obstacles to Speed
Chapter 3: Principles of Parallel Loop Scheduling
Chapter 4: The Shared-Memory Paradigm: A Gentle Introduction via R
Chapter 5: The Shared-Memory Paradigm: C Level
Chapter 6: The Shared-Memory Paradigm: GPUs
Chapter 7: Thrust and Rth
Chapter 8: The Message Passing Paradigm
Chapter 9: MapReduce Computation
Chapter 10: Parallel Sorting and Merging
Chapter 11: Parallel Prefix Scan
Chapter 12: Parallel Matrix Operations
Chapter 13: Inherently Statistical Approaches: Subset Methods
Appendix A: Review of Matrix Algebra
Appendix B: R Quick Start
Appendix C: Introduction to C for R Programmers