Python Feature Engineering Cookbook

Soledad Galli

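  • Publisher: Packt Publishing
  • Publication date: 2020-01-22
  • List price: $1,380
  • Member price: $1,311 (5% off)
  • Language: English
  • Pages: 372
  • Binding: Quality paper (also called trade paper)
  • ISBN: 1789806313
  • ISBN-13: 9781789806311
  • Related category: Python programming language
  • Ships immediately (in stock: 1)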

Product description

Key Features

  • Discover solutions for feature generation, feature extraction, and feature selection
  • Uncover the end-to-end feature engineering process across continuous, discrete, and unstructured datasets
  • Implement modern feature extraction techniques using Python's pandas, scikit-learn, SciPy and NumPy libraries

Book Description

Feature engineering is invaluable for developing and enriching your machine learning models. In this cookbook, you will work with the best tools to streamline your feature engineering pipelines and techniques, and to simplify your code and improve its quality.

Using Python libraries such as pandas, scikit-learn, Featuretools, and Feature-engine, you'll learn how to work with both continuous and discrete datasets and how to transform features from unstructured datasets. You will develop the skills necessary to select the best features as well as the most suitable extraction techniques. The book covers Python recipes that help you automate feature engineering and simplify complex processes. You'll also get to grips with different feature engineering strategies, such as the Box-Cox transform, power transform, and log transform, across machine learning, reinforcement learning, and natural language processing (NLP) domains.
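As a rough illustration of the three transforms named above, here is a minimal, standard-library-only sketch; it is not taken from the book. It assumes "power transform" means raising a variable to a fixed exponent (such as a square or cube root), which is one common usage of the term. The Box-Cox parameter is picked by a coarse grid search over the standard profile log-likelihood. In practice, library counterparts such as SciPy's `scipy.stats.boxcox` or scikit-learn's `PowerTransformer` would usually be used. The function names here are hypothetical.

```python
import math
from typing import List, Sequence


def log_transform(values: Sequence[float]) -> List[float]:
    """Natural-log transform; defined only for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]


def power_transform(values: Sequence[float], exponent: float) -> List[float]:
    """Raise each value to a fixed exponent (0.5 = square root, 1/3 = cube root).

    Assumption: input is non-negative so fractional exponents stay real-valued.
    """
    if any(v < 0 for v in values):
        raise ValueError("this sketch only handles non-negative values")
    return [v ** exponent for v in values]


def box_cox(values: Sequence[float], lmbda: float) -> List[float]:
    """Box-Cox for a given lambda: (x**lmbda - 1) / lmbda, or log(x) when lmbda == 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lmbda == 0:
        return [math.log(v) for v in values]
    return [(v ** lmbda - 1) / lmbda for v in values]


def box_cox_log_likelihood(values: Sequence[float], lmbda: float) -> float:
    """Profile log-likelihood of lambda, assuming the transformed data is normal."""
    n = len(values)
    y = box_cox(values, lmbda)
    mean = sum(y) / n
    variance = sum((t - mean) ** 2 for t in y) / n  # maximum-likelihood variance (divide by n)
    if variance == 0:
        raise ValueError("need at least two distinct values")
    return (lmbda - 1) * sum(math.log(v) for v in values) - n / 2 * math.log(variance)


def best_box_cox_lambda(values: Sequence[float], grid: Sequence[float]) -> float:
    """Return the lambda in `grid` with the highest log-likelihood (coarse grid search)."""
    return max(grid, key=lambda lmbda: box_cox_log_likelihood(values, lmbda))


if __name__ == "__main__":
    skewed = [1.0, 2.0, 4.0, 8.0, 100.0]
    lam = best_box_cox_lambda(skewed, [-1.0, -0.5, 0.0, 0.5, 1.0])
    print(lam, box_cox(skewed, lam))
```

A real implementation would optimize lambda continuously rather than over a handful of grid points. The grid keeps the likelihood criterion visible.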

By the end of this book, you'll have discovered tips and practical solutions to all of your feature engineering problems.

What you will learn

  • Simplify your feature engineering pipelines with powerful Python packages
  • Get to grips with imputing missing values
  • Encode categorical variables with a wide set of techniques
  • Extract insights from text quickly and effortlessly
  • Develop features from transactional data and time series data
  • Derive new features by combining existing variables
  • Understand how to transform, discretize, and scale your variables
  • Create informative variables from date and time
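Several of the tasks listed above are short enough to sketch directly. The following library-free sketch covers median imputation, one-hot encoding, min-max scaling, equal-width discretisation, and date-part extraction. It is illustrative only and not the book's code. The helper names are hypothetical. Missing values are assumed to be represented as `None`. In pandas and scikit-learn workflows, the usual counterparts are `SimpleImputer(strategy="median")`, `OneHotEncoder`, `MinMaxScaler`, `KBinsDiscretizer(strategy="uniform")`, and the pandas `.dt` accessor.

```python
import statistics
from datetime import date
from typing import Dict, List, Optional, Sequence


def impute_median(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries (assumed to be None) with the median of observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def one_hot(categories: Sequence[str]) -> List[Dict[str, int]]:
    """One-hot encode a categorical column: one 0/1 indicator per observed level."""
    levels = sorted(set(categories))
    return [{f"is_{level}": int(c == level) for level in levels} for c in categories]


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling needs at least two distinct values")
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Discretise into n_bins equal-width intervals; returns bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("discretisation needs at least two distinct values")
    width = (hi - lo) / n_bins
    # The maximum value would fall into bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def date_parts(d: date) -> Dict[str, int]:
    """Derive simple calendar features from a date."""
    weekday = d.weekday()  # Monday == 0 ... Sunday == 6
    return {
        "year": d.year,
        "month": d.month,
        "day_of_week": weekday,
        "is_weekend": int(weekday >= 5),
    }
```

In a real pipeline, the median, the category levels, and the minimum and maximum would be learned from the training set only and then reused on new data, to avoid leaking information from the test set.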

Who this book is for

This book is for machine learning professionals, AI engineers, data scientists, and NLP and reinforcement learning engineers who want to optimize and enrich their machine learning models with the best features. Knowledge of machine learning and Python programming will help you understand the concepts covered in this book.

About the author

Soledad Galli is a lead data scientist with more than 10 years of experience in world-class academic institutions and renowned businesses. She has researched, developed, and put into production machine learning models for insurance claims, credit risk assessment, and fraud prevention. Soledad received a Data Science Leaders' award in 2018 and was named one of LinkedIn's voices in data science and analytics in 2019. She is passionate about enabling people to step into and excel in data science, which is why she mentors data scientists and speaks at data science meetings regularly. She also teaches online courses on machine learning on a prestigious Massive Open Online Course (MOOC) platform; these courses have reached more than 10,000 students worldwide.

Table of contents

  1. Foreseeing Variable Problems When Building ML Models
  2. Imputing Missing Data
  3. Encoding Categorical Variables
  4. Transforming Numerical Variables
  5. Performing Variable Discretisation
  6. Working with Outliers
  7. Deriving Features from Dates and Time Variables
  8. Performing Feature Scaling
  9. Applying Mathematical Computations to Features
  10. Creating Features with Transactional and Time Series Data
  11. Extracting Features from Text Variables

目錄大綱(中文翻譯)

預見建立機器學習模型時可能遇到的變數問題
填補缺失的資料
編碼類別變數
轉換數值變數
進行變數離散化
處理離群值
從日期和時間變數中提取特徵
進行特徵縮放
對特徵應用數學計算
使用交易和時間序列資料創建特徵
從文字變數中提取特徵