Building Machine Learning Powered Applications: Going from Idea to Product

Ameisen, Emmanuel

Product Description

Learn the skills necessary to design, build, and deploy applications powered by machine learning. Through the course of this hands-on book, you'll build an example ML-driven application from initial idea to deployed product. Data scientists, software engineers, and product managers with little or no ML experience will learn the tools, best practices, and challenges involved in building a real-world ML application, step by step.

Author Emmanuel Ameisen, who worked as a data scientist at Zipcar and led Insight Data Science's AI program, demonstrates key ML concepts with code snippets, illustrations, and screenshots from the book's example application.

The first part of this guide shows you how to plan an ML application and measure success. Part II explains how to build a working ML model, and Part III shows how to improve the model until it fulfills your original vision. Part IV covers deployment and monitoring strategies.

This book will help you:

  • Determine your product goal and set up a machine learning problem
  • Build your first end-to-end pipeline quickly and acquire an initial dataset
  • Train and evaluate your ML model and address performance bottlenecks
  • Deploy and monitor models in a production environment
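
To give a concrete flavor of the end-to-end workflow these points describe, below is a minimal sketch of a text-classification pipeline in the spirit of the book's ML Editor example (which, per the table of contents, uses a classifier to drive writing recommendations). This is not code from the book: the tiny inline dataset, the clear/unclear labels, and the scikit-learn model choice are illustrative assumptions only.

```python
# Illustrative sketch only (not from the book): a minimal end-to-end
# text-classification pipeline: acquire a small dataset, train and
# evaluate a simple model, then reuse the same object for serving.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy data: questions labeled 1 (clear) or 0 (unclear).
texts = [
    "How do I split a dataset into training and validation sets?",
    "help code broken why",
    "Which metric should I use to evaluate a binary classifier?",
    "it doesnt work please fix",
    "How can I reduce overfitting on a small text dataset?",
    "question about stuff",
]
labels = [1, 0, 1, 0, 1, 0]

# Hold out data to estimate generalization before any deployment.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.33, random_state=0, stratify=labels
)

# One Pipeline object bundles feature extraction and the model,
# so the same artifact can be versioned, evaluated, and served.
model = Pipeline([
    ("features", TfidfVectorizer()),
    ("classifier", LogisticRegression()),
])
model.fit(X_train, y_train)

# Evaluate on held-out data (the book covers looking beyond accuracy).
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))

# At serving time, the same pipeline scores new input, e.g. behind an
# HTTP endpoint or in a batch job.
print(model.predict(["How should I frame this as an ML problem?"]))
```

Keeping feature extraction and the classifier in a single pipeline object reflects the book's emphasis on starting with a simple model and a simple pipeline that is easy to evaluate and deploy.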

About the Author

Emmanuel Ameisen has worked for years as a data scientist. He implemented and deployed predictive analytics and machine learning solutions for Local Motion and Zipcar. Recently, Emmanuel led Insight Data Science's AI program, where he oversaw more than a hundred machine learning projects. Emmanuel holds graduate degrees in artificial intelligence, computer engineering, and management from three of France's top schools.

Table of Contents

How to Contact Us
Acknowledgments
I. Find the Correct ML Approach
1. From Product Goal to ML Framing
Estimate What Is Possible
Models
Data
Framing the ML Editor
Trying to Do It All with ML: An End-to-End Framework
The Simplest Approach: Being the Algorithm
Middle Ground: Learning from Our Experience
Monica Rogati: How to Choose and Prioritize ML Projects
Conclusion

2. Create a Plan
Measuring Success
Business Performance
Model Performance
Freshness and Distribution Shift
Speed
Estimate Scope and Challenges
Leverage Domain Expertise
Stand on the Shoulders of Giants
ML Editor Planning
Initial Plan for an Editor
Always Start with a Simple Model
To Make Regular Progress: Start Simple
Start with a Simple Pipeline
Pipeline for the ML Editor
Conclusion

II. Build a Working Pipeline
3. Build Your First End-to-End Pipeline
The Simplest Scaffolding
Prototype of an ML Editor
Parse and Clean Data
Tokenizing Text
Generating Features
Test Your Workflow
User Experience
Modeling Results
ML Editor Prototype Evaluation
Model
User Experience
Conclusion

4. Acquire an Initial Dataset
Iterate on Datasets
Do Data Science
Explore Your First Dataset
Be Efficient, Start Small
Insights Versus Products
A Data Quality Rubric
Label to Find Data Trends
Summary Statistics
Explore and Label Efficiently
Be the Algorithm
Data Trends
Let Data Inform Features and Models
Build Features Out of Patterns
ML Editor Features
Robert Munro: How Do You Find, Label, and Leverage Data?
Conclusion

III. Iterate on Models
5. Train and Evaluate Your Model
The Simplest Appropriate Model
Simple Models
From Patterns to Models
Split Your Dataset
ML Editor Data Split
Judge Performance
Evaluate Your Model: Look Beyond Accuracy
Contrast Data and Predictions
Confusion Matrix
ROC Curve
Calibration Curve
Dimensionality Reduction for Errors
The Top-k Method
Other Models
Evaluate Feature Importance
Directly from a Classifier
Black-Box Explainers
Conclusion

6. Debug Your ML Problems
Software Best Practices
ML-Specific Best Practices
Debug Wiring: Visualizing and Testing
Start with One Example
Test Your ML Code
Debug Training: Make Your Model Learn
Task Difficulty
Optimization Problems
Debug Generalization: Make Your Model Useful
Data Leakage
Overfitting
Consider the Task at Hand
Conclusion

7. Using Classifiers for Writing Recommendations
Extracting Recommendations from Models
What Can We Achieve Without a Model?
Extracting Global Feature Importance
Using a Model’s Score
Extracting Local Feature Importance
Comparing Models
Version 1: The Report Card
Version 2: More Powerful, More Unclear
Version 3: Understandable Recommendations
Generating Editing Recommendations
Conclusion

IV. Deploy and Monitor
8. Considerations When Deploying Models
Data Concerns
Data Ownership
Data Bias
Systemic Bias
Modeling Concerns
Feedback Loops
Inclusive Model Performance
Considering Context
Adversaries
Abuse Concerns and Dual-Use
Chris Harland: Shipping Experiments
Conclusion

9. Choose Your Deployment Option
Server-Side Deployment
Streaming Application or API
Batch Predictions
Client-Side Deployment
On Device
Browser Side
Federated Learning: A Hybrid Approach
Conclusion

10. Build Safeguards for Models
Engineer Around Failures
Input and Output Checks
Model Failure Fallbacks
Engineer for Performance
Scale to Multiple Users
Model and Data Life Cycle Management
Data Processing and DAGs
Ask for Feedback
Chris Moody: Empowering Data Scientists to Deploy Models
Conclusion

11. Monitor and Update Models
Monitoring Saves Lives
Monitoring to Inform Refresh Rate
Monitor to Detect Abuse
Choose What to Monitor
Performance Metrics
Business Metrics
CI/CD for ML
A/B Testing and Experimentation
Other Approaches
Conclusion
Index
