Text Mining Application Programming (Paperback)

Manu Konchady

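  • Publisher: Charles River Media
  • Publication date: 2006-05-04
  • List price: $2,100
  • VIP price: $1,995 (5% off)
  • Language: English
  • Pages: 432
  • Binding: Paperback
  • ISBN: 1584504609
  • ISBN-13: 9781584504603
  • Related categories: Text-mining
  • Ships immediately (limited quantity) (stock = 1)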

Description

Text Mining Application Programming teaches software developers how to mine the vast amounts of information available on the Web, internal networks, and desktop files and turn it into usable data. The book helps developers understand the problems associated with managing unstructured text and explains how to build their own mining tools using standard statistical methods from Information Theory, Artificial Intelligence, and Operations Research. Each topic is thoroughly explained and followed by a practical implementation.

The book begins with a brief overview of text data, where it can be found, and the search engines and tools typically used to search and gather it. It details how to build tools for extracting and using text, and covers the mathematics behind many of the algorithms used in these tools. From there you’ll learn how to build tokens from text, construct indexes, and detect patterns in text. You’ll also find methods for extracting the names of people, places, and organizations from an email, a news article, or a web page. The next portion of the book covers how to find information on the Web, how the Web is structured, and how to build spiders to crawl it. Text categorization is described in the context of managing email. The final part of the book covers information monitoring, summarization, and a simple question-answering (Q&A) system.

The code in the book is written in Perl, but knowledge of Perl is not necessary to run the software. Developers with intermediate Perl experience can customize the software. Although the book is about programming, methods are explained with English-like pseudocode, and the source code is provided on the CD-ROM.

After reading this book you’ll be ready to tap into the wealth of information available online in ways you never thought possible.
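
To give a rough sense of two steps the description mentions, building tokens from text and constructing an index, here is a minimal sketch. It is not taken from the book, whose code is in Perl; it is written in Python, and the names `tokenize`, `build_index`, and the sample documents are assumptions made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical sample documents, keyed by an arbitrary integer id.
docs = {
    1: "Spiders crawl the Web.",
    2: "An index maps tokens to documents on the Web.",
}
index = build_index(docs)
print(index["web"])      # [1, 2]
print(index["spiders"])  # [1]
```

A real tokenizer would likely also handle punctuation inside words, stop words, and stemming; the sketch leaves those out to keep the token-to-document mapping easy to see.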

 

Features

  • Teaches developers how to build text mining applications to manage vast amounts of text and turn it into useful data
  • Covers key topics such as information extraction, clustering, building spiders, text categorization, summarization, and natural language query systems
  • Shows step-by-step techniques for implementing text mining solutions, and provides customizable solutions

 
