Agile Data Science 2.0: Building Full-Stack Data Analytics Applications with Spark

Russell Jurney

Product Description

Agile Data Science 2.0 covers the theory and practice of applying agile methods to data science, the discipline of applied analytics research. The book takes the stance that data products are the preferred output format for data science teams to effect change in an organization. Accordingly, we show how to "get meta": to gain agility by building applications that describe the applied research process itself. Then we show how to use big data tools to iteratively build, deploy, and refine analytics applications. Tracking data-product development through the five stages of the "data value pyramid", we show you how to build applications from conception through development and deployment to iterative improvement. Application development is a fundamental skill for a data scientist, and we show you how publishing your data science work as a web application lets you effect maximal change within your organization.

Technologies covered include Python, Apache Spark (Spark MLlib, Spark Streaming), Apache Kafka, MongoDB, Elasticsearch, D3.js, scikit-learn, and Apache Airflow. More important than any one technology, though, is the ability to compose these tools into a data platform, and we show you how to do so in a way that makes you a productive application developer.
