Text Analytics with Python: A Practitioner's Guide to Natural Language Processing, 2/e

Sarkar, Dipanjan


Product Description

Leverage Natural Language Processing (NLP) in Python and learn how to set up your own robust environment for performing text analytics. This second edition has gone through a major revamp and introduces several significant changes and new topics based on recent trends in NLP.

You'll see how to use the latest state-of-the-art NLP frameworks, coupled with machine learning and deep learning models for supervised sentiment analysis, to solve real-world case studies in Python. Start by reviewing Python fundamentals for NLP, working with strings and text data, then move on to feature engineering and representation methods for text data, including both traditional statistical models and newer deep learning-based embedding models. Improved techniques and new methods for parsing and processing text are discussed as well.

Text summarization and topic models have been overhauled, and the book now showcases how to build, tune, and interpret topic models in the context of an interesting dataset of NIPS conference papers. Additionally, the book covers text similarity techniques with a real-world example of movie recommenders, along with sentiment analysis using supervised and unsupervised techniques.

There is also a chapter dedicated to semantic analysis, where you'll see how to build your own named entity recognition (NER) system from scratch. While the overall structure of the book remains the same, the entire code base, modules, and chapters have been updated to the latest Python 3.x release.


What You'll Learn

- Understand NLP and text syntax, semantics, and structure
- Discover text cleaning and feature engineering
- Review text classification and text clustering
- Assess text summarization and topic models
- Study deep learning for NLP

Who This Book Is For

IT professionals, data analysts, developers, linguistic experts, data scientists, engineers, and anyone with a keen interest in linguistics, analytics, and generating insights from textual data.

About the Author

Dipanjan Sarkar is a Data Scientist at Intel, the world's largest silicon company, which is on a mission to make the world more connected and productive. He primarily works on analytics, business intelligence, application development, and building large-scale intelligent systems. He received his master's degree in Information Technology from the International Institute of Information Technology, Bangalore, with a focus on data science and software engineering. He is also an avid supporter of self-learning, especially Massive Open Online Courses, and holds a Data Science Specialisation from Johns Hopkins University on Coursera.

He has been an analytics practitioner for over six years, specializing in statistical, predictive, and text analytics. He has also authored books on R and machine learning, occasionally reviews technical books, and acts as a course beta tester for Coursera. Dipanjan's interests include learning about new technology, financial markets, disruptive start-ups, data science, and, more recently, artificial intelligence and deep learning. In his spare time he loves reading, gaming, and watching popular sitcoms and football.
