Python Feature Engineering Cookbook : Over 70 recipes for creating, engineering, and transforming features to build machine learning models (Paperback)

Galli, Soledad

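  • Publisher: Packt Publishing
  • Publication date: 2022-10-31
  • Price: $1,650
  • VIP price: $1,568 (5% off)
  • Language: English
  • Pages: 386
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 1804611301
  • ISBN-13: 9781804611302
  • Related categories: Python programming language, Machine Learning
  • Availability: ships immediately (1 in stock)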

Product Description

Create end-to-end, reproducible feature engineering pipelines that can be deployed into production using open-source Python libraries

Key Features

  • Learn and implement feature engineering best practices
  • Reinforce your learning with the help of multiple hands-on recipes
  • Build end-to-end feature engineering pipelines that are performant and reproducible

Book Description

Feature engineering, the process of transforming variables and creating features, is time-consuming but essential for building machine learning models that perform well. This second edition of Python Feature Engineering Cookbook will take the struggle out of feature engineering by showing you how to use open-source Python libraries to accelerate the process through a wide range of practical, hands-on recipes.

This updated edition begins by addressing fundamental data challenges, such as missing data and categorical variables, before moving on to strategies for dealing with skewed distributions and outliers. The concluding chapters show you how to create new features from various types of data, including text, time series, and relational databases. With the help of numerous open-source Python libraries, you'll learn how to implement each feature engineering method in a performant, reproducible, and elegant manner.
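As a rough illustration of the first challenges mentioned above (missing data, categorical values, skewed distributions, and outliers), the following sketch uses plain pandas and NumPy on an invented toy DataFrame. It is not taken from the book, which also relies on libraries such as Feature-engine; the column names, median imputation, IQR capping rule, and log transform are choices made here for illustration only. For brevity, every statistic is computed on the full dataset, whereas in a real project it should be learned from the training set alone.

```python
import numpy as np
import pandas as pd

# Invented toy data: a skewed numeric column with a missing value and an
# extreme value, plus a categorical column with a missing value.
df = pd.DataFrame({
    "income": [32_000, 41_000, np.nan, 38_000, 1_000_000, 45_000],
    "city": ["Paris", "London", "Paris", np.nan, "Madrid", "Paris"],
})

# 1. Missing data: median imputation for numbers, a "Missing" label for categories.
df["income"] = df["income"].fillna(df["income"].median())
df["city"] = df["city"].fillna("Missing")

# 2. Outliers: cap values outside the IQR proximity rule [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df["income"] = df["income"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# 3. Skewed distribution: log(1 + x) compresses the long right tail.
df["log_income"] = np.log1p(df["income"])

# 4. Categorical values: count encoding, then one-hot encoding.
df["city_count"] = df["city"].map(df["city"].value_counts())
df = pd.get_dummies(df, columns=["city"], prefix="city", dtype=int)

print(df)
```

With this data, the IQR rule caps the 1,000,000 income at the upper fence of 51,875 before the log transform is applied.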

By the end of this Python book, you will have the tools and expertise needed to confidently build end-to-end and reproducible feature engineering pipelines that can be deployed into production.
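The idea behind a reproducible pipeline can be sketched as an object that learns its parameters once from training data and then applies them unchanged to any new data, including in production. The `SimplePipeline` class below is hypothetical and written only for this example; Feature-engine offers its own transformers with `fit` and `transform` methods for this purpose. Here the learned parameters (a median for imputation, plus a minimum and maximum for min-max scaling) are stored as JSON so that a separate service could reload them.

```python
import json

import pandas as pd


class SimplePipeline:
    """Hypothetical minimal pipeline: median imputation, then min-max scaling.

    Parameters are learned from the training data in fit() and reused
    unchanged in transform(), so the same input always gives the same output.
    """

    def __init__(self, columns):
        self.columns = list(columns)
        self.params_ = {}

    def fit(self, df):
        for col in self.columns:
            median = float(df[col].median())
            filled = df[col].fillna(median)
            self.params_[col] = {
                "median": median,
                "min": float(filled.min()),
                "max": float(filled.max()),
            }
        return self

    def transform(self, df):
        out = df.copy()
        for col, p in self.params_.items():
            filled = out[col].fillna(p["median"])
            span = p["max"] - p["min"]
            # A constant training column has zero span; map it to 0.0 instead of dividing by zero.
            out[col] = (filled - p["min"]) / span if span else 0.0
        return out

    def to_json(self):
        # Persist only plain numbers so the parameters can be reloaded anywhere.
        return json.dumps({"columns": self.columns, "params": self.params_})

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        pipe = cls(data["columns"])
        pipe.params_ = data["params"]
        return pipe


# Usage: fit on training data, save, reload, and transform new data.
train = pd.DataFrame({"age": [20.0, 30.0, None, 40.0]})
test = pd.DataFrame({"age": [None, 50.0]})

pipe = SimplePipeline(["age"]).fit(train)
restored = SimplePipeline.from_json(pipe.to_json())
scaled = restored.transform(test)
print(scaled)
```

Because the scaling bounds come from the training data, unseen values can fall outside the [0, 1] range; here the test value 50.0 maps to 1.5.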

What you will learn

  • Impute missing data using various univariate and multivariate methods
  • Encode categorical variables with one-hot, ordinal, and count encoding
  • Handle categorical variables with high cardinality
  • Transform, discretize, and scale your variables
  • Create variables from date and time with pandas and Feature-engine
  • Combine variables into new features
  • Extract features from transactional data with Featuretools, and from text
  • Create features from time series data with tsfresh
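Several items in the list above (creating variables from date and time, combining variables, and discretization) can be illustrated with a few lines of pandas. This is a minimal sketch on invented data, not the book's code; the book also covers Feature-engine for these tasks. The ratio feature and the two equal-width bins are arbitrary choices made for the example.

```python
import pandas as pd

# Invented example data.
df = pd.DataFrame({
    "signup": pd.to_datetime(["2022-01-15 08:30", "2022-06-03 22:10", "2022-10-31 13:45"]),
    "debt": [500.0, 12_000.0, 3_000.0],
    "income": [2_000.0, 4_000.0, 3_000.0],
})

# Date and time: extract components with the .dt accessor.
df["signup_month"] = df["signup"].dt.month
df["signup_dayofweek"] = df["signup"].dt.dayofweek  # Monday=0 ... Sunday=6
df["signup_hour"] = df["signup"].dt.hour
df["is_weekend"] = (df["signup_dayofweek"] >= 5).astype(int)

# Combining variables: a ratio of two existing columns.
df["debt_to_income"] = df["debt"] / df["income"]

# Discretization: two equal-width bins over the observed income range.
df["income_bin"] = pd.cut(df["income"], bins=2, labels=["low", "high"])

print(df)
```

Equal-width bins are right-closed by default in `pd.cut`, so the income of 3,000, which sits exactly on the middle edge, falls into the "low" bin.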

Who this book is for

This book is for machine learning and data science students and professionals, as well as software engineers working on machine learning model deployment, who want to learn more about how to transform their data and create new features to train better machine learning models.

Table of Contents

  1. Imputing Missing Data
  2. Encoding Categorical Variables
  3. Transforming Numerical Variables
  4. Performing Variable Discretization
  5. Working with Outliers
  6. Extracting Features from Date and Time
  7. Performing Feature Scaling
  8. Creating New Features
  9. Extracting Features from Relational Data with Featuretools
  10. Creating Features from Time Series with tsfresh
  11. Extracting Features from Text Variables
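Chapters 9 and 10 rely on Featuretools and tsfresh, which generate many features automatically. To show the kinds of features involved without depending on either library's API, the sketch below builds a few of them by hand with pandas: per-customer aggregates from a transactions table (relational data), lag and rolling-window features (time series), and simple length and word counts (text). The data, column names, and chosen aggregations are invented for illustration and are not how either library is called.

```python
import pandas as pd

# Invented transactional data: several transactions per customer.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "timestamp": pd.to_datetime(
        ["2022-01-01", "2022-01-05", "2022-01-02", "2022-01-03", "2022-01-09"]
    ),
    "amount": [10.0, 30.0, 5.0, 5.0, 20.0],
})

# Relational data: aggregate the child table (transactions) up to the parent
# entity (customers), similar in spirit to what Featuretools automates.
customer_features = transactions.groupby("customer_id")["amount"].agg(
    ["count", "sum", "mean", "max"]
)

# Time series: lag and rolling-window features per customer, ordered by time.
transactions = transactions.sort_values(["customer_id", "timestamp"])
grouped = transactions.groupby("customer_id")["amount"]
transactions["amount_lag1"] = grouped.shift(1)
transactions["amount_roll2_mean"] = grouped.transform(
    lambda s: s.rolling(window=2, min_periods=1).mean()
)

# Text: simple count-based features from a free-text column.
reviews = pd.Series(["Great product, fast delivery!", "Bad.", "Good value for money"])
text_features = pd.DataFrame({
    "n_characters": reviews.str.len(),
    "n_words": reviews.str.split().str.len(),
})

print(customer_features)
print(transactions)
print(text_features)
```

Grouping by `customer_id` before shifting or rolling keeps one customer's history from leaking into another customer's features.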
