Algorithms for Data Science

Brian Steele, John Chandler, Swarna Reddy

  • Publisher: Springer
  • Publication date: 2016-12-27
  • List price: $3,980
  • VIP price: $3,781 (5% off)
  • Language: English
  • Pages: 430
  • Binding: Hardcover
  • ISBN: 3319457950
  • ISBN-13: 9783319457956
  • Related categories: Algorithms-data-structures, Data Science
  • Ships immediately (in stock: 1)

Product description

This textbook on practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. But practical data analytics requires more than just the foundations. Problems and data are enormously variable, and only the most elementary of algorithms can be used without modification. Programming fluency and experience with real and challenging data are indispensable, so the reader is immersed in Python, R, and real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analyses.
 
This book has three parts:
(a) Data Reduction: Begins with the concepts of data reduction, data maps, and information extraction. The second chapter introduces associative statistics, the mathematical foundation of scalable algorithms and distributed computing. Practical aspects of distributed computing are the subject of the Hadoop and MapReduce chapter.
(b) Extracting Information from Data: Linear regression and data visualization are the principal topics of Part II. The authors dedicate a chapter to the critical domain of Healthcare Analytics for an extended example of practical data analytics. The algorithms and analytics will be of particular interest to practitioners who want to use the large and unwieldy data sets of the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System.
(c) Predictive Analytics: Two foundational and widely used algorithms, k-nearest neighbors and naive Bayes, are developed in detail. A chapter is dedicated to forecasting. The last chapter focuses on streaming data and uses publicly accessible data streams originating from the Twitter API and the NASDAQ stock market in the tutorials.
This book is intended for a one- or two-semester course in data analytics for upper-division undergraduate and graduate students in mathematics, statistics, and computer science. The prerequisites are kept low, and students with one or two courses in probability or statistics, an exposure to vectors and matrices, and a programming course will have no difficulty. The core material of every chapter is accessible to all with these prerequisites. The chapters often expand at the close with innovations of interest to practitioners of data science. Each chapter includes exercises of varying levels of difficulty. The text is eminently suitable for self-study and an exceptional resource for practitioners.
 
 
