Data Science on the Google Cloud Platform: Implementing End-to-End Real-Time Data Pipelines: From Ingest to Machine Learning

Valliappa Lakshmanan

Product Description

Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build on top of the Google Cloud Platform (GCP). This hands-on guide shows developers entering the data science field how to implement an end-to-end data pipeline, using statistical and machine learning methods and tools on GCP. Over the course of the book, you'll work through a sample business decision by employing a variety of data science approaches.

Follow along by implementing these statistical and machine learning solutions in your own project on GCP, and discover how this platform provides a transformative and more collaborative way of doing data science.

You’ll learn how to:

  • Automate and schedule data ingest using an App Engine application
  • Create and populate a dashboard in Google Data Studio
  • Build a real-time analysis pipeline to carry out streaming analytics
  • Conduct interactive data exploration with Google BigQuery
  • Create a Bayesian model on a Cloud Dataproc cluster
  • Build a logistic regression machine-learning model with Spark
  • Compute time-aggregate features with a Cloud Dataflow pipeline
  • Create a high-performing prediction model with TensorFlow
  • Use your deployed model as a microservice you can access from both batch and real-time pipelines
