Data Science at the Command Line: Facing the Future with Time-Tested Tools (Paperback)

Jeroen Janssens

Product Description

This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.

To get you started—whether you’re on Windows, OS X, or Linux—author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools.

Discover why the command line is an agile, scalable, and extensible technology. Even if you’re already comfortable processing data with, say, Python or R, you’ll greatly improve your data science workflow by also leveraging the power of the command line.

  • Obtain data from websites, APIs, databases, and spreadsheets
  • Perform scrub operations on plain text, CSV, HTML/XML, and JSON
  • Explore data, compute descriptive statistics, and create visualizations
  • Manage your data science workflow using Drake
  • Create reusable tools from one-liners and existing Python or R code
  • Parallelize and distribute data-intensive pipelines using GNU Parallel
  • Model data with dimensionality reduction, clustering, regression, and classification algorithms
