Semantic Similarity from Natural Language and Ontology Analysis

Sebastien Harispe, Sylvie Ranwez, Stefan Janaqi, Jacky Montmain

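  • Publisher: Morgan & Claypool
  • Publication date: 2015-05-01
  • List price: $2,630
  • VIP price: $2,499 (5% off)
  • Language: English
  • Pages: 254
  • Binding: Paperback
  • ISBN: 1627054464
  • ISBN-13: 9781627054461
  • Overseas special-order title (must be checked out separately)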


Product Description

Artificial Intelligence federates numerous scientific fields with the aim of developing machines able to assist human operators performing complex treatments, most of which demand high cognitive skills (e.g. learning or decision processes). Central to this quest is giving machines the ability to estimate the likeness or similarity between things in the way human beings estimate the similarity between stimuli. In this context, this book focuses on semantic measures: approaches designed for comparing semantic entities such as units of language (e.g. words, sentences) or concepts and instances defined in knowledge bases. The aim of these measures is to assess the similarity or relatedness of such semantic entities by taking into account their semantics, i.e. their meaning; intuitively, the words tea and coffee, which both refer to stimulating beverages, will be estimated to be more semantically similar than the words toffee (confection) and coffee, even though the latter pair has a higher syntactic similarity. The two state-of-the-art approaches for estimating and quantifying the semantic similarity/relatedness of semantic entities are presented in detail: the first relies on corpora analysis and is based on Natural Language Processing techniques and semantic models, while the second is based on more or less formal, computer-readable and workable forms of knowledge such as semantic networks, thesauri or ontologies. Semantic measures are widely used today to compare units of language, concepts, instances or even resources indexed by them (e.g., documents, genes). They are central elements of a large variety of Natural Language Processing applications and knowledge-based treatments, and have therefore naturally been subject to intensive and interdisciplinary research efforts during the last decades.
Beyond a simple inventory and categorization of existing measures, the aim of this monograph is to guide novices as well as researchers of these domains toward a better understanding of semantic similarity estimation and, more generally, semantic measures. To this end, we propose an in-depth characterization of existing proposals by discussing their features, the assumptions on which they are based and empirical results regarding their performance in particular applications. By answering these questions and by providing a detailed discussion on the foundations of semantic measures, our aim is to give the reader the key knowledge required to: (i) select the most relevant methods according to a particular usage context, (ii) understand the challenges facing this field of study, (iii) identify room for improvement in state-of-the-art approaches and (iv) stimulate creativity toward the development of new approaches. To this end, several definitions, theoretical and practical details, as well as concrete applications are presented.
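
The tea/coffee versus toffee/coffee contrast in the description can be illustrated with a small Python sketch. It is not taken from the book: the string-level score is a normalized Levenshtein edit distance standing in for what the description calls syntactic similarity, and the knowledge-based score is a simple path-based measure over a tiny, hypothetical is-a taxonomy invented here for illustration. All function names and the taxonomy contents are assumptions.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical spelling."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical toy taxonomy (child -> parent), made up for this example.
PARENT: Dict[str, str] = {
    "tea": "stimulating beverage",
    "coffee": "stimulating beverage",
    "stimulating beverage": "beverage",
    "beverage": "food",
    "toffee": "confection",
    "confection": "food",
}


def ancestors(concept: str) -> List[str]:
    """The concept followed by its ancestors, nearest first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def path_similarity(a: str, b: str) -> float:
    """1 / (1 + number of is-a edges between a and b via their closest common ancestor)."""
    steps_from_b = {c: d for d, c in enumerate(ancestors(b))}
    for steps_from_a, concept in enumerate(ancestors(a)):
        if concept in steps_from_b:
            return 1.0 / (1.0 + steps_from_a + steps_from_b[concept])
    return 0.0  # no common ancestor in the toy taxonomy


if __name__ == "__main__":
    for pair in [("tea", "coffee"), ("toffee", "coffee")]:
        print(pair,
              "string:", round(string_similarity(*pair), 3),
              "taxonomy:", round(path_similarity(*pair), 3))
```

Under these assumptions the string-level score ranks toffee/coffee above tea/coffee, while the taxonomy-based score ranks tea/coffee higher, which is the ordering the description attributes to semantic measures. The result depends entirely on how the toy taxonomy is drawn; knowledge-based measures in practice rely on curated resources such as thesauri or ontologies.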
