Text Data Management and Analysis: A Practical Introduction to Information Retrieval and Text Mining
Provisional Chinese title: 文本數據管理與分析:信息檢索與文本挖掘的實用入門

Zhai, Chengxiang; Massung, Sean

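  • Publisher: Morgan & Claypool
  • Publication date: 2016-06-30
  • List price: $4,140
  • VIP price: $3,933 (5% off)
  • Language: English
  • Pages: 530
  • Binding: Hardcover - also called cloth, retail trade, or trade
  • ISBN: 1970001194
  • ISBN-13: 9781970001198
  • Related category: Text-mining
  • Overseas special-order title (must be checked out separately)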

Product description

Recent years have seen dramatic growth in natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. This has led to an increasing demand for powerful software tools to help people analyze and manage vast amounts of text data effectively and efficiently. Unlike data generated by a computer system or sensors, text data are usually generated directly by humans and carry semantically rich content. As such, text data are especially valuable for discovering knowledge about human opinions and preferences, in addition to many other kinds of knowledge that we encode in text. In contrast to structured data, which conform to well-defined schemas (and are thus relatively easy for computers to handle), text has less explicit structure, so computers must process it to understand the content it encodes. Current natural language processing technology has not yet reached the point of enabling a computer to understand natural language text precisely, but a wide range of statistical and heuristic approaches to the analysis and management of text data have been developed over the past few decades. They are usually very robust and can be applied to analyze and manage text data in any natural language and on any topic.

This book provides a systematic introduction to all these approaches, with an emphasis on covering the most useful knowledge and skills required to build a variety of practically useful text information systems. The focus is on text mining applications that can help users analyze patterns in text data to extract and reveal useful knowledge. Information retrieval systems, including search engines and recommender systems, are also covered as supporting technology for text mining applications. The book covers the major concepts, techniques, and ideas in text data mining and information retrieval from a practical viewpoint, and includes many hands-on exercises designed with a companion software toolkit (i.e., MeTA) to help readers learn how to apply text mining and information retrieval techniques to real-world text data and how to experiment with and improve some of the algorithms for interesting application tasks. The book can be used as a textbook for an undergraduate computer science course or as a reference book for practitioners working on relevant problems in analyzing and managing text data.
