Scaling Python with Dask: From Data Science to Machine Learning (Paperback)

Holden Karau, Mika Kimmins

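  • Publisher: O'Reilly
  • Publication date: 2023-08-22
  • List price: $2,800
  • Sale price: $2,464 (12% off)
  • Language: English
  • Pages: 202
  • Binding: Quality Paper - also called trade paper
  • ISBN: 1098119878
  • ISBN-13: 9781098119874
  • Related categories: Python programming language, Machine Learning, Data Science
  • Ships immediately (fewer than 3 in stock)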

Product Description

Modern systems contain multi-core CPUs and GPUs that have the potential for parallel computing. But many scientific Python tools were not designed to leverage this parallelism. With this short but thorough resource, data scientists and Python programmers will learn how the Dask open source library for parallel computing provides APIs that make it easy to parallelize PyData libraries including NumPy, pandas, and scikit-learn.

Authors Holden Karau and Mika Kimmins show you how to use Dask computations in local systems and then scale to the cloud for heavier workloads. This practical book explains why Dask is popular among industry experts and academics and is used by organizations that include Walmart, Capital One, Harvard Medical School, and NASA.

With this book, you'll learn:

  • What Dask is, where you can use it, and how it compares with other tools
  • How to use Dask for batch data parallel processing
  • Key distributed system concepts for working with Dask
  • Methods for using Dask with higher-level APIs and building blocks
  • How to work with integrated libraries such as scikit-learn, pandas, and PyTorch
  • How to use Dask with GPUs

About the Authors

Holden Karau is a queer transgender Canadian, Apache Spark committer, Apache Software Foundation member, and an active open source contributor. As a software engineer, she's worked on a variety of distributed computing, search, and classification problems at Apple, Google, IBM, Alpine, Databricks, Foursquare, and Amazon. She graduated from the University of Waterloo with a bachelor of mathematics in computer science. Outside of software, she enjoys playing with fire, welding, riding scooters, eating poutine, and dancing.

Mika Kimmins is a data engineer, distributed systems researcher, and ML consultant. She has worked on a variety of NLP, language modeling, reinforcement learning, and large-scale ML pipelining problems as a Siri Data Engineer at Apple, as an academic, and in not-for-profit engineering roles. She is currently earning an MS in Engineering Science and an MBA from Harvard, and holds a BS in Computer Science and Mathematics from the University of Toronto. As a Korean-Canadian-American trans woman, Mika is active in data-driven advocacy for queer healthcare access, advises undergraduate Computer Science students, and attempts to keep her volunteer EMT courses current. Her hobbies include figure skating, aerial arts, and sewing.
