Machine Learning for Imbalanced Data: Tackle imbalanced datasets using machine learning and deep learning techniques
- Length: 344 pages
- Edition: 1
- Language: English
- Publisher: Packt Publishing
- Publication Date: 2023-11-30
- ISBN-10: 1801070830
- ISBN-13: 9781801070836
Raise your machine learning game and deal with imbalanced data using libraries such as imbalanced-learn, PyTorch, scikit-learn, pandas, and NumPy, and squeeze better performance from your machine learning models with this essential guide.
Key Features
- The book is packed with detailed explanations, illustrations, and code samples using modern machine learning frameworks
- Learn cutting-edge deep learning techniques to overcome data imbalance
- The book offers comprehensive coverage of methods for dealing with skewed data in ML and DL applications
Book Description
As machine learning practitioners, we often encounter imbalanced datasets in which one class has considerably fewer instances than the other. Many machine learning algorithms assume a roughly even balance between majority and minority classes, which leads to suboptimal performance on imbalanced data. Addressing class imbalance is crucial for significantly improving model performance.
Machine Learning for Imbalanced Data begins by introducing the challenges posed by imbalanced datasets and the importance of addressing these issues. It then guides you through techniques that enhance performance on imbalanced data when using classical machine learning models, including various sampling and cost-sensitive learning methods.
As you progress, the book delves into similar and more advanced techniques for deep learning models, employing PyTorch as the primary framework. Throughout the book, hands-on examples provide working, reproducible code that demonstrates the practical implementation of each technique.
By the end of this book, you will be adept at identifying and addressing class imbalance, and confident in applying techniques such as sampling, cost-sensitive learning, and threshold adjustment when using traditional machine learning or deep learning models.
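To give a flavor of one of the techniques named above, here is a minimal sketch of threshold adjustment in plain Python. It is not taken from the book: the labels, scores, candidate cutoffs, and the helper names `f1_at_threshold` and `best_threshold` are all hypothetical, chosen only to illustrate the idea that the default 0.5 cutoff is often a poor choice when the positive class is rare.

```python
# Illustrative only: hypothetical labels and scores, not taken from the book.

def f1_at_threshold(y_true, scores, threshold):
    """Return the F1 score of the positive (minority) class at a given cutoff."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, scores, candidates):
    """Pick the candidate cutoff with the highest F1 (ties go to the earliest candidate)."""
    return max(candidates, key=lambda t: f1_at_threshold(y_true, scores, t))


# Eight majority-class (0) examples and two minority-class (1) examples.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.10, 0.15, 0.20, 0.25, 0.28, 0.45, 0.35, 0.60]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5]

chosen = best_threshold(y_true, scores, candidates)
print("F1 at default 0.5 cutoff:", f1_at_threshold(y_true, scores, 0.5))
print("Chosen cutoff:", chosen, "with F1:", f1_at_threshold(y_true, scores, chosen))
```

In this toy data, one of the two minority examples scores below 0.5, so the default cutoff misses it. In practice the cutoff would be selected on a held-out validation set rather than on the data used for evaluation.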
What you will learn
- Effectively use imbalanced data in your ML models
- Explore the metrics used when classes are imbalanced
- Understand how and when to apply various sampling methods such as over-sampling and under-sampling
- Apply data-based, algorithm-based, and hybrid approaches for dealing with class imbalance
- Combine and choose from various options for data balancing while avoiding the common pitfalls
- Understand the concepts of model calibration and threshold adjustment in the context of dealing with imbalanced datasets
Who This Book Is For
This book is for machine learning practitioners who want to effectively address the challenges of imbalanced datasets in their projects. Data scientists, machine learning engineers/scientists, research scientists/engineers, and data engineers will find this book helpful. Though complete beginners are welcome to read this book, some familiarity with core ML concepts will help readers get the most out of it.
Table of Contents
- Introduction
- Oversampling Methods
- Under-sampling
- Ensembling Methods
- Cost-Sensitive Learning
- Deep Learning
- Data-Level Deep Learning Methods
- Algorithm-Level Deep Learning Methods
- Hybrid Deep Learning Methods
- Imbalanced Learning in Production