R for Data Science Cookbook
Tentative Chinese title: R 數據科學食譜
Yu-Wei, Chiu (David Chiu)
- Publisher: Packt Publishing
- Publication date: 2016-07-29
- Price: $2,000
- VIP price: 5% off, $1,900
- Language: English
- Pages: 452
- Binding: Paperback
- ISBN: 178439081X
- ISBN-13: 9781784390815
- Related categories: R language, Data Science
- Related translations: 數據科學:R語言實現 (Simplified Chinese edition)
Product Description
Key Features
- Gain insight into how data scientists collect, process, analyze, and visualize data using some of the most popular R packages
- Understand how to apply useful data analysis techniques in R for real-world applications
- An easy-to-follow guide that makes a data scientist's life easier by addressing the problems faced during data analysis
Book Description
This cookbook offers a range of data analysis samples in simple and straightforward R code, providing step-by-step resources and time-saving methods to help you solve data problems efficiently.
The first section deals with how to create R functions to avoid the unnecessary duplication of code. You will learn how to prepare, process, and perform sophisticated ETL for heterogeneous data sources with R packages. An example of data manipulation is provided, illustrating how to use the “dplyr” and “data.table” packages to efficiently process larger data structures. We also focus on “ggplot2” and show you how to create advanced figures for data exploration.
In addition, you will learn how to build an interactive report using the “ggvis” package. Later chapters offer insight into time series analysis on financial data and cover machine learning in detail, including data classification, regression, clustering, association rule mining, and dimension reduction.
By the end of this book, you will know how to resolve the issues that arise during data analysis and be comfortable offering solutions to them.
What you will learn
- Get to know the functional characteristics of the R language
- Extract, transform, and load data from heterogeneous sources
- Understand how easily R handles probability and statistics problems
- Get simple R instructions to quickly organize and manipulate large datasets
- Create professional data visualizations and interactive reports
- Predict user purchase behavior by adopting a classification approach
- Implement data mining techniques to discover items that are frequently purchased together
- Group similar text documents by using various clustering methods
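To make the "items that are frequently purchased together" idea concrete, here is a minimal sketch of pair support counting, the first step behind association rule mining. It is not taken from the book, which works in R; it is written in Python purely for illustration, and the function name `frequent_pairs`, the `min_support` threshold, and the toy baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (share of baskets containing
    both items) is at least min_support."""
    counts = Counter()
    for basket in transactions:
        # De-duplicate and sort so each pair is counted once per basket
        # and always appears in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical toy baskets, invented for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "milk", "eggs"],
    ["butter", "eggs"],
]

pairs = frequent_pairs(baskets, min_support=0.5)
print(pairs)  # {('bread', 'milk'): 0.75}
```

Only bread and milk appear together in at least half of the baskets here. Full association rule mining goes further, extending frequent pairs to larger itemsets and deriving rules with confidence and lift.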
About the Author
Yu-Wei, Chiu (David Chiu) is the founder of LargitData (www.LargitData.com), a startup company that mainly focuses on providing big data and machine learning products. He previously worked for Trend Micro as a software engineer, where he was responsible for building big data platforms for business intelligence and customer relationship management systems. In addition to being a startup entrepreneur and data scientist, he specializes in using Spark and Hadoop to process big data and in applying data mining techniques to data analysis. Yu-Wei is also a professional lecturer who has delivered lectures on big data and machine learning in R and Python and given tech talks at a variety of conferences.
In 2015, Yu-Wei wrote Machine Learning with R Cookbook, Packt Publishing. In 2013, Yu-Wei reviewed Bioinformatics with R Cookbook, Packt Publishing. For more information, visit his personal website at www.ywchiu.com.
Table of Contents
- Functions in R
- Data Extracting, Transforming, and Loading
- Data Preprocessing and Preparation
- Data Manipulation
- Visualizing Data with ggplot2
- Making Interactive Reports
- Simulation from Probability Distributions
- Statistical Inference in R
- Rule and Pattern Mining with R
- Time Series Mining with R
- Supervised Machine Learning
- Unsupervised Machine Learning