Data Science at the Command Line: Obtain, Scrub, Explore, and Model Data with Unix Power Tools

Janssens, Jeroen

  • Publisher: O'Reilly
  • Publication date: 2021-09-21
  • List price: $2,200
  • Sale price: $1,760 (20% off)
  • Language: English
  • Pages: 250
  • Binding: Quality paper (also called trade paper)
  • ISBN: 1492087912
  • ISBN-13: 9781492087915
  • Related categories: Command Line, Data Science
  • Ships immediately (in stock: 1)

Product Description

This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools, which is useful whether you work with Windows, macOS, or Linux.

You'll quickly discover why the command line is an agile, scalable, and extensible technology. Even if you're comfortable processing data with Python or R, you'll learn how to greatly improve your data science workflow by leveraging the command line's power. This book is ideal for data scientists, analysts, engineers, system administrators, and researchers.

  • Obtain data from websites, APIs, databases, and spreadsheets
  • Perform scrub operations on text, CSV, HTML, XML, and JSON files
  • Explore data, compute descriptive statistics, and create visualizations
  • Manage your data science workflow
  • Create your own tools from one-liners and existing Python or R code
  • Parallelize and distribute data-intensive pipelines
  • Model data with dimensionality reduction, regression, and classification algorithms
  • Leverage the command line from Python, Jupyter, R, RStudio, and Apache Spark

About the Author

Jeroen Janssens teaches data science: often through training and coaching, occasionally through speaking, and infrequently through writing. His interests include visualizing data, building machine learning models, and automating things using Python, R, or Bash. He is the author of Data Science at the Command Line, published by O'Reilly Media. Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University. Previously, he was an assistant professor at Jheronimus Academy of Data Science and a data scientist at Elsevier in Amsterdam and at various startups in New York City. Currently, Jeroen is the CEO of Data Science Workshops, which organises open enrollment workshops, in-company courses, inspiration sessions, hackathons, and meetups, all related to data science, of course. He lives with his wife and two kids in Rotterdam, the Netherlands.