Apache Flume: Distributed Log Collection for Hadoop, 2nd Edition
- Length: 175 pages
- Edition: 2
- Language: English
- Publisher: Packt Publishing
- Publication Date: 2015-02-27
- ISBN-10: 1784392170
- ISBN-13: 9781784392178
Design and implement a series of Flume agents to send streamed data into Hadoop
About This Book
- Construct a series of Flume agents using the Apache Flume service to efficiently collect, aggregate, and move large amounts of event data
- Configure failover paths and load balancing to remove single points of failure
- Use this step-by-step guide to stream logs from application servers to Hadoop’s HDFS
Who This Book Is For
If you are a Hadoop programmer who wants to learn how to use Flume to move datasets into Hadoop in a timely and replicable manner, then this book is ideal for you. No prior knowledge of Apache Flume is necessary, but a basic knowledge of Hadoop and the Hadoop Distributed File System (HDFS) is assumed.
In Detail
Apache Flume is a distributed, reliable, and available service used to efficiently collect, aggregate, and move large amounts of log data. It is used to stream logs from application servers to HDFS for ad hoc analysis.
This book starts with an architectural overview of Flume and its logical components. It explores channels, sinks, and sink processors, followed by sources and channel selectors. By the end of this book, you will be fully equipped to construct a series of Flume agents to dynamically transport your streaming data and logs from your systems into Hadoop.
This step-by-step book guides you through the architecture and components of Flume, covering different approaches that are then pulled together into a real-world, end-to-end use case, progressing gradually from the simplest to the most advanced features.
Table of Contents
Chapter 1. Overview and Architecture
Chapter 2. A Quick Start Guide to Flume
Chapter 3. Channels
Chapter 4. Sinks and Sink Processors
Chapter 5. Sources and Channel Selectors
Chapter 6. Interceptors, ETL, and Routing
Chapter 7. Putting It All Together
Chapter 8. Monitoring Flume
Chapter 9. There Is No Spoon – the Realities of Real-time Distributed Data Collection