Blueprints for Text Analytics Using Python: Machine Learning-Based Solutions for Common Real World (NLP) Applications

Albrecht, Jens; Ramachandran, Sidharth; Winkler, Christian



Turning text into valuable information is essential for many businesses looking to gain a competitive advantage. Natural language processing has improved considerably, and practitioners now have many options when tackling a problem. However, it's not always clear which NLP tools or libraries would work for a business's needs, or which techniques you should use and in what order.

This practical book provides theoretical background and real-world case studies with detailed code examples to help developers and data scientists obtain insight from text online. Authors Jens Albrecht, Sidharth Ramachandran, and Christian Winkler use blueprints for text-related problems that apply state-of-the-art machine learning methods in Python.

If you have a fundamental understanding of statistics and machine learning along with basic programming experience in Python, you're ready to get started. You'll learn how to:

  • Crawl and clean, then explore and visualize, textual data in different formats
  • Preprocess and vectorize text for machine learning
  • Apply methods for classification, topic analysis, summarization, and knowledge extraction
  • Use semantic word embeddings and deep learning approaches for complex problems
  • Work with Python NLP libraries like spaCy, NLTK, and Gensim in combination with scikit-learn, Pandas, and PyTorch


Jens Albrecht is a full-time professor in the Computer Science Department at the Nuremberg Institute of Technology. His work focuses on data management and analytics, particularly for text. He holds a doctorate in computer science. Before he rejoined academia in 2012, he worked for over a decade in industry as a consultant and data architect. He is the author of several articles on Big Data management and analysis.

Sidharth Ramachandran currently leads a team of data scientists at GfK, helping to build data products for the consumer goods industry. He has over 10 years of experience in software engineering and data science across the telecom, banking, and marketing industries. Sidharth also co-founded WACAO, a smart personal assistant on WhatsApp that was featured on TechCrunch. He holds an undergraduate engineering degree from IIT Roorkee and an MBA from IIM Kozhikode. Sidharth is passionate about solving real problems through technology and loves to hack on personal projects in his free time.

Christian Winkler is a data scientist and machine learning architect. He holds a PhD in theoretical physics and has worked with large data volumes and artificial intelligence for 20 years, with a particular focus on scalable systems and intelligent algorithms for mass text processing. He is the founder of datanizing GmbH, a conference speaker, and the author of articles on machine learning and text analytics.