The Handbook of NLP with Gensim: Leverage topic modeling to uncover hidden patterns, themes, and valuable insights within textual data

Kuo, Chris

  • Publisher: Packt Publishing
  • Publication date: 2023-10-27
  • List price: $1,800
  • VIP price: $1,710 (5% off)
  • Language: English
  • Pages: 310
  • Binding: Quality Paper - also called trade paper
  • ISBN: 1803244941
  • ISBN-13: 9781803244945
  • Related categories: Text-mining
  • Ships immediately (stock: 1)

Product description

Navigating NLP research and applying it in practice can be a formidable task, and The Handbook of NLP with Gensim aims to make it easier. This book demystifies NLP and equips you with hands-on strategies spanning healthcare, e-commerce, finance, and more, so that you can apply Gensim in real-world scenarios. You'll begin by exploring the motivations and techniques for extracting information from text, such as bag-of-words, TF-IDF, and word embeddings. The book then guides you through topic modeling with Latent Semantic Analysis (LSA) for dimensionality reduction and discovering latent semantic relationships in text data, Latent Dirichlet Allocation (LDA) for probabilistic topic modeling, and Ensemble LDA for improving the stability and accuracy of topic models. Next, you'll learn text summarization techniques with Word2Vec and Doc2Vec, build the modeling pipeline, and optimize models by tuning hyperparameters. As you get acquainted with practical applications in various industries, this book will inspire you to design innovative projects. Alongside topic modeling, you'll also explore named entity handling and NER tools, modeling procedures, and tools for effective topic modeling applications. By the end of this book, you'll have mastered the techniques essential for creating applications with Gensim and integrating NLP into your business processes.
