Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining
- Length: 480 pages
- Edition: 1
- Language: English
- Publisher: Wiley
- Publication Date: 2015-01-20
- ISBN-10: 111883481X
- ISBN-13: 9781118834817
- Sales Rank: #1196774
A hands-on guide to web scraping and text mining for both beginners and experienced users of R
- Introduces the fundamental concepts of web architecture and databases, covering HTTP, HTML, XML, JSON, and SQL.
- Provides basic techniques to query web documents and data sets (XPath and regular expressions).
- An extensive set of exercises is presented to guide the reader through each technique.
- Explores both supervised and unsupervised techniques as well as advanced techniques such as data scraping and text management.
- Case studies are featured throughout along with examples for each technique presented.
- R code and solutions to exercises featured in the book are provided on a supporting website.
Table of Contents
Chapter 1: Introduction
Part One: A Primer on Web and Data Technologies
Chapter 2: HTML
Chapter 3: XML and JSON
Chapter 4: XPath
Chapter 5: HTTP
Chapter 6: AJAX
Chapter 7: SQL and relational databases
Chapter 8: Regular expressions and essential string functions
Part Two: A Practical Toolbox for Web Scraping and Text Mining
Chapter 9: Scraping the Web
Chapter 10: Statistical text processing
Chapter 11: Managing data projects
Part Three: A Bag of Case Studies
Chapter 12: Collaboration networks in the US Senate
Chapter 13: Parsing information from semistructured documents
Chapter 14: Predicting the 2014 Academy Awards using Twitter
Chapter 15: Mapping the geographic distribution of names
Chapter 16: Gathering data on mobile phones
Chapter 17: Analyzing sentiments of product reviews