Scaling Python with Dask: From Data Science to Machine Learning

Holden Karau, Mika Kimmins

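  • Publisher: O'Reilly
  • Publication date: 2023-08-22
  • List price: $2,800
  • Sale price: $2,520 (10% off)
  • Language: English
  • Pages: 202
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 1098119878
  • ISBN-13: 9781098119874
  • Related categories: Python programming language, Machine Learning, Data Science
  • Ships immediately (fewer than 4 in stock)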

Product Description

Modern systems contain multi-core CPUs and GPUs that have the potential for parallel computing. But many scientific Python tools were not designed to leverage this parallelism. With this short but thorough resource, data scientists and Python programmers will learn how the Dask open source library for parallel computing provides APIs that make it easy to parallelize PyData libraries including NumPy, pandas, and scikit-learn.

Authors Holden Karau and Mika Kimmins show you how to use Dask computations in local systems and then scale to the cloud for heavier workloads. This practical book explains why Dask is popular among industry experts and academics and is used by organizations that include Walmart, Capital One, Harvard Medical School, and NASA.

With this book, you'll learn:

  • What Dask is, where you can use it, and how it compares with other tools
  • How to use Dask for batch data parallel processing
  • Key distributed system concepts for working with Dask
  • Methods for using Dask with higher-level APIs and building blocks
  • How to work with integrated libraries such as scikit-learn, pandas, and PyTorch
  • How to use Dask with GPUs

About the Authors

Holden Karau is a queer transgender Canadian, Apache Spark committer, Apache Software Foundation member, and an active open source contributor. As a software engineer, she's worked on a variety of distributed computing, search, and classification problems at Apple, Google, IBM, Alpine, Databricks, Foursquare, and Amazon. She graduated from the University of Waterloo with a bachelor of mathematics in computer science. Outside of software, she enjoys playing with fire, welding, riding scooters, eating poutine, and dancing.

Mika Kimmins is a data engineer, distributed systems researcher, and ML consultant. She has worked on a variety of NLP, language modeling, reinforcement learning, and large-scale ML pipeline projects as a Siri data engineer at Apple, as an academic, and in not-for-profit engineering roles. She is currently earning an MS in Engineering Science and an MBA from Harvard, and holds a BS in Computer Science and Mathematics from the University of Toronto. As a Korean-Canadian-American trans woman, Mika is active in data-driven advocacy for queer healthcare access, advises undergraduate Computer Science students, and attempts to keep her volunteer EMT courses current. Her hobbies include figure skating, aerial arts, and sewing.
