Web Scraping with Python
Provisional Chinese title: 使用 Python 進行網頁爬蟲

Name: Web Scraping with Python
Price: 1102 TWD
Availability: OnlineOnly
Author: Richard Lawson
ISBN: 1782164367


Publisher: Packt Publishing
Publication date: 2015-10-29
List price: $1,160
VIP price: 5% off, $1,102
Language: English
Pages: 174
Binding: Paperback
ISBN: 1782164367
ISBN-13: 9781782164364
Related categories: Web crawler, Python
Related translation: 用 Python 寫網絡爬蟲 (Web Scraping with Python) (Simplified Chinese edition)

Overseas special-order title (requires separate checkout)


Product description

Successfully scrape data from any website with the power of Python

About This Book

A hands-on guide to web scraping with real-life problems and solutions
Techniques to download and extract data from complex websites
Create a number of different web scrapers to extract information

Who This Book Is For

This book is aimed at developers who want to use web scraping for legitimate purposes. Prior programming experience with Python would be useful but not essential. Anyone with general knowledge of programming languages should be able to pick up the book and understand the principles involved.

What You Will Learn

Extract data from web pages with simple Python programming
Build a threaded crawler to process web pages in parallel
Follow links to crawl a website
Cache downloads to reduce bandwidth
Use multiple threads and processes to scrape faster
Learn how to parse JavaScript-dependent websites
Interact with forms and sessions
Solve CAPTCHAs on protected web pages
Discover how to track the state of a crawl
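
As a rough illustration of the "follow links" item above, the sketch below collects the links on a page using only the Python standard library. It is not taken from the book: the names `LinkParser`, `extract_links`, and `SAMPLE_HTML` are hypothetical, and the HTML is an in-memory sample rather than a downloaded page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in html, in document order."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical sample page; a real crawler would download this HTML first.
SAMPLE_HTML = (
    '<p><a href="/page/2">Next</a> '
    '<a href="https://example.org/about">About</a></p>'
)

print(extract_links(SAMPLE_HTML, "https://example.com/page/1"))
```

A crawler would typically push the returned URLs onto a queue, skip those it has already seen, and repeat the process for each newly downloaded page.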

In Detail

The Internet contains the most useful set of data ever assembled, largely publicly accessible for free. However, this data is not easily reusable. It is embedded within the structure and style of websites and needs to be carefully extracted to be useful. Web scraping is becoming increasingly useful as a means to easily gather and make sense of the plethora of information available online. Using a simple language like Python, you can crawl the information out of complex websites using simple programming.

This book is the ultimate guide to using Python to scrape data from websites. The early chapters cover how to extract data from static web pages and how to use caching to manage the load on servers. After the basics, we'll get our hands dirty building a more sophisticated crawler with threads and exploring more advanced topics. Learn step by step how to use Ajax URLs, employ the Firebug extension for monitoring, and indirectly scrape data. Discover further details of scraping, such as using the browser renderer, managing cookies, and submitting forms to extract data from complex websites protected by CAPTCHA. The book wraps up with how to create high-level scrapers with the Scrapy library and apply what has been learned to real websites.
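
The caching idea mentioned in the description can be sketched as a thin wrapper around a download function, so that a URL already fetched is served from memory instead of hitting the server again. This is a minimal sketch under stated assumptions, not the book's implementation: `CachedDownloader` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real network call such as one built on `urllib.request.urlopen`.

```python
class CachedDownloader:
    """Wrap a fetch function and remember the result for each URL."""

    def __init__(self, fetch):
        self.fetch = fetch      # callable taking a URL and returning its content
        self.cache = {}         # URL -> previously downloaded content
        self.fetch_count = 0    # number of real downloads performed

    def __call__(self, url):
        if url not in self.cache:
            # Cache miss: perform the download and store the result.
            self.fetch_count += 1
            self.cache[url] = self.fetch(url)
        return self.cache[url]


def fake_fetch(url):
    """Hypothetical stand-in for a network download."""
    return "<html>content of %s</html>" % url


download = CachedDownloader(fake_fetch)
first = download("https://example.com/")
second = download("https://example.com/")  # served from the cache
print(download.fetch_count)  # prints 1
```

An in-memory dictionary is lost when the program exits and is not safe to share between threads without a lock; a longer-running crawler would more likely persist the cache to disk or a database.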

Style and approach

This book is a hands-on guide with real-life examples and solutions, starting simple and progressively becoming more complex. Each chapter introduces a problem and then provides one or more possible solutions.
