Hands-On Web Scraping with Python - Second Edition: Extract quality data from the web using effective Python techniques

Chapagain, Anish

  • Publisher: Packt Publishing
  • Publication date: 2023-10-06
  • List price: $1,580
  • VIP price: $1,501 (5% off)
  • Language: English
  • Pages: 324
  • Binding: Quality Paper - also called trade paper
  • ISBN: 1837636214
  • ISBN-13: 9781837636211
  • Related categories: Python programming language, Web crawler
  • Ships immediately (in stock: 1)

Product Description

Work through practical examples to unlock the full potential of web scraping with Python and gain valuable insights from high-quality data

Key Features

  • Build an initial portfolio of web scraping projects with detailed explanations
  • Grasp Python programming fundamentals related to web scraping and data extraction
  • Acquire skills to code web scrapers, store data in desired formats, and employ the data professionally
  • Purchase of the print or Kindle book includes a free PDF eBook

Book Description

Web scraping is a powerful tool for extracting data from the web, but it can be daunting for those without a technical background. Designed for novices, this book will help you grasp the fundamentals of web scraping and Python programming, even if you have no prior experience.

Adopting a practical, hands-on approach, this updated edition of Hands-On Web Scraping with Python uses real-world examples and exercises to explain key concepts. Starting with an introduction to web scraping fundamentals and Python programming, you’ll cover a range of scraping techniques, including requests, lxml, pyquery, Scrapy, and Beautiful Soup. You’ll also get to grips with advanced topics such as secure web handling, web APIs, Selenium for web scraping, PDF extraction, regex, data analysis, EDA reports, visualization, and machine learning.

This book emphasizes the importance of learning by doing. Each chapter integrates examples that demonstrate practical techniques and related skills. By the end of this book, you’ll be equipped with the skills to extract data from websites, a solid understanding of web scraping and Python programming, and the confidence to use these skills in your projects for analysis, visualization, and information discovery.

What you will learn

  • Master web scraping techniques to extract data from real-world websites
  • Implement popular web scraping libraries such as requests, lxml, Scrapy, and pyquery
  • Develop advanced skills in web scraping, APIs, PDF extraction, regex, and machine learning
  • Analyze and visualize data with Pandas and Plotly
  • Develop a practical portfolio to demonstrate your web scraping skills
  • Understand best practices and ethical concerns in web scraping and data extraction

Who this book is for

This book is for beginners who want to learn web scraping and data extraction using Python. No prior programming knowledge is required, but a basic understanding of web-related concepts such as websites, browsers, and HTML is assumed. If you enjoy learning by doing and want to build a portfolio of web scraping projects and delve into data-related studies and application, then this book is tailored for your needs.

Table of Contents

  1. Web Scraping Fundamentals
  2. Python Programming for Data and Web
  3. Searching and Processing Web Documents
  4. Scraping Using PyQuery, a jQuery-Like Library for Python
  5. Scraping the Web with Scrapy and Beautiful Soup
  6. Working with the Secure Web
  7. Data Extraction Using Web APIs
  8. Using Selenium to Scrape the Web
  9. Using Regular Expressions and PDFs
  10. Data Mining, Analysis, and Visualization
  11. Machine Learning and Web Scraping
  12. After Scraping – Next Steps and Data Analysis
