Parallel Python with Dask: Perform distributed computing, concurrent programming and manage large dataset
Peters, Tim
Product Description
Dask has revolutionized parallel computing for Python, empowering data scientists to accelerate their workflows. This comprehensive guide unravels the intricacies of Dask to help you harness its capabilities for machine learning and data analysis.
Across 10 chapters, you'll master Dask's fundamentals, architecture, and integration with Python's scientific computing ecosystem. Step-by-step tutorials demonstrate parallel mapping, task scheduling, and leveraging Dask arrays for NumPy workloads. You'll discover how Dask seamlessly scales Pandas, Scikit-Learn, PyTorch, and other libraries for large datasets.
Dedicated chapters cover scaling regression, classification, hyperparameter tuning, feature engineering, and more, each with clear examples. You'll also learn to tap into the power of GPUs with Dask, RAPIDS, and Google JAX for orders-of-magnitude speedups.
This book places special emphasis on practical use cases related to scalability and distributed computing. You'll learn Dask patterns for cluster computing, efficient resource management, and robust data pipelines. The advanced chapters on Dask-ML and deep learning showcase how to build scalable models with PyTorch and TensorFlow.
With this book, you'll gain practical skills to:
- Accelerate Python workloads with parallel mapping and task scheduling
- Speed up NumPy, Pandas, Scikit-Learn, PyTorch, and other libraries
- Build scalable machine learning pipelines for large datasets
- Leverage GPUs efficiently via Dask, RAPIDS, and JAX
- Manage Dask clusters and workflows for distributed computing
- Streamline deep learning models with Dask-ML and deep learning frameworks
Packed with hands-on examples and expert insights, this book provides the complete toolkit to harness Dask's capabilities. It will empower Python programmers, data scientists, and machine learning engineers to achieve faster workflows and operationalize parallel computing.
Table of Contents:
1. Introduction to Dask
2. Dask Fundamentals
3. Batch Data Parallel Processing with Dask
4. Distributed Systems and Dask
5. Advanced Dask: APIs and Building Blocks
6. Dask with Pandas
7. Dask with Scikit-learn
8. Dask and PyTorch
9. Dask with GPUs
10. Scaling Machine Learning Projects with Dask
Unlock the Power of Parallel Python: The Ideal Dask Learning Guide for Aspiring Data Scientists