Data Wrangling with Python: Simplify your ETL processes with these hands-on data sanitation tips, tricks and best practices
Tirthajyoti Sarkar, Shubhadeep Roychowdhury
- Publisher: Packt Publishing
- Publication date: 2019-02-28
- Price: $1,810
- VIP price: 5% off, $1,720
- Language: English
- Pages: 460
- Binding: Paperback
- ISBN: 1789800110
- ISBN-13: 9781789800111
- Related categories: Python, Programming Languages
- Related translations: Python數據整理 (Simplified Chinese edition)
Product Description
Data is the new oil, but like oil it arrives crude. To do anything meaningful with it - modeling, visualization, machine learning, or predictive analysis - you first need to wrestle and wrangle with the data. This book teaches the essentials of data wrangling using Python.
Key Features
- Focuses on the essentials of wrangling to get you up and running with analysis in no time
- Teaches the tricks and know-how for solving data wrangling problems
- Covers bonus topics: random data generation and data integrity checks
Book Description
To practice high-quality science with data, you first need to make sure the data is properly sourced, cleaned, formatted, and pre-processed. This book teaches the essentials of that invaluable component of the data science pipeline: data wrangling.
What you will learn
- Manipulate simple and complex data structures using Python and its built-in functions
- Use the fundamental and advanced features of Pandas DataFrames and numpy.array
- Manipulate DataFrames and arrays at run time
- Extract and format data from various textual formats: plain text files, SQL, CSV, Excel, JSON, and XML
- Perform web scraping using Python libraries such as BeautifulSoup4 and html5lib
- Perform advanced string search and manipulation using Python and regular expressions (regex)
- Handle outliers, apply advanced programming tricks, and perform data imputation using Pandas
- Use basic descriptive statistics and plotting techniques in Python to examine data quickly
- Practice data wrangling and modeling using random data generation techniques
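To give a flavor of the kind of task listed above, here is a minimal sketch of three wrangling steps: regex-based string cleanup, imputation of a missing value, and outlier flagging. It is not taken from the book. The sample data, column names, and helper functions are all hypothetical, and it uses only the Python standard library (rather than Pandas) so that it runs without extra installs. The median fill and the 1.5 x IQR rule are common conventions chosen for illustration, not methods the book is known to prescribe.

```python
import csv
import io
import re
import statistics

# Hypothetical raw data: inconsistent phone formats, one missing reading,
# and one extreme reading.
RAW = """name,phone,reading
Ann,(555) 010-1234,10.0
Bob,555.010.9999,11.0
Cy,555 010 0000,
Dee,5550104321,12.0
Eve,555-010-7777,13.0
Fay,555-010-8888,500.0
"""


def normalize_phone(text):
    """Strip every non-digit character from a phone string."""
    return re.sub(r"\D", "", text)


def load_rows(raw):
    """Parse CSV text into dicts; empty readings become None."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw)):
        cell = rec["reading"].strip()
        rows.append({
            "name": rec["name"],
            "phone": normalize_phone(rec["phone"]),
            "reading": float(cell) if cell else None,
        })
    return rows


def impute_median(rows):
    """Fill missing readings with the median of the known ones."""
    known = [r["reading"] for r in rows if r["reading"] is not None]
    median = statistics.median(known)
    for r in rows:
        if r["reading"] is None:
            r["reading"] = median
    return median


def flag_outliers(rows, k=1.5):
    """Return names whose reading lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [r["reading"] for r in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r["name"] for r in rows if not low <= r["reading"] <= high]


rows = load_rows(RAW)
median = impute_median(rows)
outliers = flag_outliers(rows)

print("imputed median:", median)
print("outliers:", outliers)
```

With this sample, the missing reading is filled with the median of the five known values, and only the extreme reading is flagged. On real data the thresholds and the choice of fill value would need to be justified case by case.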
Who This Book Is For
Software professionals, web developers, database engineers, and business analysts who want to move towards a career as a full-fledged data scientist or analytics expert, or anyone who wants to use data analytics or machine learning to enrich their current personal or professional projects. Prior experience with Python is not an absolute requirement; however, knowledge of at least one object-oriented programming language (e.g. C/C++/Java/JavaScript) and high-school-level math is highly preferred. It is a bonus if you have a rudimentary idea of relational databases and SQL. Even seasoned Python app/web developers can benefit from this book, as it focuses on data engineering aspects.