Natural Language Processing with Java

By: Richard M. Reese

Overview of this book

Natural Language Processing (NLP) allows you to take any sentence and identify patterns, special names, company names, and more. The second edition of Natural Language Processing with Java teaches you how to perform language analysis with the help of Java libraries, while constantly gaining insights from the outcomes. You’ll start by understanding how NLP and its various concepts work. Having got to grips with the basics, you’ll explore important tools and libraries in Java for NLP, such as CoreNLP, OpenNLP, Neuroph, and Mallet. You’ll then start performing NLP on different inputs and tasks, such as tokenization, model training, parts-of-speech and parsing trees. You’ll learn about statistical machine translation, summarization, dialog systems, complex searches, supervised and unsupervised NLP, and more. By the end of this book, you’ll have learned more about NLP, neural networks, and various other trained models in Java for enhancing the performance of NLP applications.

Techniques for name recognition

Several NER techniques are available. Some use regular expressions, and others rely on a predefined dictionary. Regular expressions are expressive enough to isolate entities that follow a recognizable pattern. With a dictionary, each token of the text is compared against a list of known entity names to find matches.
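The two rule-based techniques above can be sketched in plain Java. This is a minimal illustration, not the book's code: the class name `SimpleNer`, the deliberately loose email pattern, and the sample dictionary are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, introduced only for illustration.
public class SimpleNer {

    // Assumed pattern: a loose email matcher, not a complete validator.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Regular-expression approach: return every substring the pattern matches.
    static List<String> findByRegex(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = EMAIL.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    // Dictionary approach: split the text into tokens and keep those that
    // appear in the dictionary. Only single-token names are handled here;
    // multi-word names would need a different matching strategy.
    static List<String> findByDictionary(String text, Set<String> dictionary) {
        List<String> found = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (dictionary.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Contact Joe at joe@example.com before he flies to Boston.";
        System.out.println(findByRegex(text));
        System.out.println(findByDictionary(text, Set.of("Joe", "Boston")));
    }
}
```

The dictionary lookup is case-sensitive, so "joe" inside the email address is not reported as a name. Whether that is desirable depends on the text being processed.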

Another common NER approach uses trained models to detect entities. These models depend on the type of entity we are looking for and on the target language. A model that works well for one domain, such as web pages, may not work well for a different domain, such as medical journals.

A model is trained on an annotated block of text that identifies the entities of interest. To measure how well a model has been trained, several measures are used:

Precision: The percentage of entities found that exactly match the spans found in the evaluation...
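As a rough sketch of the precision measure, the following Java code counts how many predicted entity spans exactly match an annotated span and expresses that as a percentage of all predicted spans. The names `NerPrecision` and `Span` are hypothetical. The code also assumes that the evaluation spans are the annotated ones, that a match requires the same start offset, end offset, and entity type, and that precision is 0 when nothing is predicted.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical evaluation helper, introduced only for illustration.
public class NerPrecision {

    // An entity span: character offsets plus an entity type label.
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Span)) {
                return false;
            }
            Span s = (Span) other;
            return start == s.start && end == s.end && type.equals(s.type);
        }

        @Override
        public int hashCode() {
            return Objects.hash(start, end, type);
        }
    }

    // Percentage of predicted spans that exactly match an annotated span.
    static double precision(Set<Span> predicted, Set<Span> annotated) {
        if (predicted.isEmpty()) {
            return 0.0; // convention assumed for this sketch
        }
        int correct = 0;
        for (Span span : predicted) {
            if (annotated.contains(span)) {
                correct++;
            }
        }
        return 100.0 * correct / predicted.size();
    }

    public static void main(String[] args) {
        Set<Span> annotated = new HashSet<>();
        annotated.add(new Span(0, 3, "PERSON"));
        annotated.add(new Span(20, 26, "LOCATION"));

        Set<Span> predicted = new HashSet<>();
        predicted.add(new Span(0, 3, "PERSON"));     // exact match
        predicted.add(new Span(20, 25, "LOCATION")); // boundary off by one

        System.out.println(precision(predicted, annotated)); // prints 50.0
    }
}
```

Because the match must be exact, a span whose boundary is off by a single character counts as wrong, which is why the second prediction does not contribute to the score.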
