Data Algorithms: Recipes for Scaling Up with Hadoop and Spark

Length: 778 pages
Edition: 1
Language: English
Publisher: O'Reilly Media
Publication Date: 2015-05-07
ISBN-10: 1491906189
ISBN-13: 9781491906187
Sales Rank: #765207

Description

Learn the algorithms and tools you need to build MapReduce applications with Hadoop and Spark for processing gigabyte-, terabyte-, or petabyte-sized datasets on clusters of commodity hardware. With this practical book, author Mahmoud Parsian, head of the big data team at Illumina, takes you step-by-step through the design of machine-learning algorithms, such as Naive Bayes and Markov Chain, and shows you how to apply them to clinical and biological datasets, using MapReduce design patterns.

Apply MapReduce algorithms to clinical and biological data, such as DNA-Seq and RNA-Seq
Use the most relevant regression/analytical algorithms for different biological data types
Apply t-test, joins, top-10, and correlation algorithms using MapReduce/Hadoop and Spark

Chapter 1 Secondary Sort: Introduction
Chapter 2 Secondary Sorting: Detailed Example
Chapter 3 Top 10 List
Chapter 4 Left Outer Join in MapReduce
Chapter 5 Order Inversion Pattern
Chapter 6 Moving Average
Chapter 7 Market Basket Analysis
Chapter 8 Common Friends
Chapter 9 Recommendation Engines using MapReduce
Chapter 10 Content-Based Recommendation: Movies
Chapter 11 Smarter Email Marketing with Markov Model
Chapter 12 K-Means Clustering
Chapter 13 kNN: k-Nearest-Neighbors
Chapter 14 Naive Bayes
Chapter 15 Sentiment Analysis
Chapter 16 Finding, Counting and Listing all Triangles in Large Graphs
Chapter 17 K-mer Counting
Chapter 18 DNA-Sequencing
Chapter 19 Cox Regression
Chapter 20 Cochran-Armitage Test for Trend
Chapter 21 Allelic Frequency
Chapter 22 The T-Test
Chapter 23 Computing Pearson Correlation
Chapter 24 DNA Base Count
Chapter 25 RNA-Sequencing
Chapter 26 Gene Aggregation
Chapter 27 Linear Regression
Chapter 28 MapReduce and Monoids
Chapter 29 The Small Files Problem
Chapter 30 Huge Cache for MapReduce
Chapter 31 Bloom Filter