Machine Learning Methods for Stylometry: Authorship Attribution and Author Profiling
Savoy, Jacques
- Publisher: Springer
- Publication date: 2020-09-29
- Price: $6,780
- VIP price: 5% off, $6,441
- Language: English
- Pages: 286
- Binding: Hardcover (also called cloth, retail trade, or trade)
- ISBN: 303053359X
- ISBN-13: 9783030533595
Related categories:
Machine Learning
Overseas special-order books (separate checkout required)
Product description
This book presents methods and approaches used to identify the true author of a disputed document or text excerpt. It provides a broad introduction to text categorization problems grounded in stylistic features, such as authorship attribution, inferring psychological traits of the author, and detecting fake news. Specifically, it presents in detail machine learning models as valuable tools for verifying hypotheses or revealing significant patterns hidden in datasets. Stylometry is a multi-disciplinary field combining linguistics with both statistics and computer science.
The content is divided into three parts. The first, which consists of the first three chapters, offers a general introduction to stylometry, its potential applications, and its limitations. It also introduces the running example used to illustrate the concepts discussed throughout the remainder of the book. The four chapters of the second part are devoted more to computer science, with a focus on machine learning models. Their main aim is to explain how machine learning models can solve stylometric problems. Several general strategies used to identify, extract, select, and represent stylistic markers are explained. As deep learning is an active field of research, the book covers neural network models and word embeddings applied to stylometry, along with a general introduction to the deep learning approach to stylometric questions. The third part illustrates the application of the previously discussed approaches to real cases. The first is an authorship attribution problem that seeks to discover the secret hand behind the nom de plume Elena Ferrante, an Italian writer known worldwide for her My Brilliant Friend saga. The second is an author profiling problem: identifying whether a set of tweets was generated by a bot or a human being and, in the latter case, whether the author is a man or a woman. The third is an exploration of stylistic variation over time using US political speeches covering a period of ca. 230 years.
A solutions-based approach is adopted throughout the book, and explanations are supported by examples written in R. To complement the main content and discussions on stylometric models and techniques, examples and datasets are freely available at the author's GitHub website.
About the author
Jacques Savoy is a Full Professor of Computer Science at the University of Neuchâtel (Switzerland). His research interests mainly include natural language processing, particularly information retrieval for languages other than English (European, Asian, and Indian), as well as multilingual and cross-lingual information retrieval. For many years he has participated in various evaluation campaigns (TREC, CLEF, NTCIR, FIRE) dealing with these questions. His current research focuses on the statistical modeling and evaluation of natural language processing tasks such as text clustering and categorization, as well as authorship attribution.