Deep Learning for Natural Language Processing

By: Karthiek Reddy Bokka, Shubhangi Hora, Tanuj Jain, Monicah Wambugu

Overview of this book

Applying deep learning approaches to various NLP tasks can take your computational algorithms to a completely new level in terms of speed and accuracy. Deep Learning for Natural Language Processing starts by highlighting the basic building blocks of the natural language processing domain. The book goes on to introduce the problems that you can solve using state-of-the-art neural network models. After this, delving into the various neural network architectures and their specific areas of application will help you to understand how to select the best model to suit your needs. As you advance through this deep learning book, you'll study convolutional, recurrent, and recursive neural networks, in addition to covering long short-term memory networks (LSTMs). Understanding these networks will help you to implement their models using Keras. In later chapters, you will develop a trigger word detection application using NLP techniques such as attention models and beam search. By the end of this book, you will not only have sound knowledge of natural language processing, but will also be able to select the best text preprocessing and neural network models to solve a number of NLP problems.

Chapter 8: State of the art in Natural Language Processing

Activity 11: Build a Text Summarization Model

Solution:

  1. Import the necessary Python packages and classes.

    import os
    import re
    import pdb
    import string
    import numpy as np
    import pandas as pd
    from keras.utils import to_categorical
    import matplotlib.pyplot as plt
    %matplotlib inline
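
    The model-building and training steps later in this solution also use Keras layers, the Keras backend, and the Adam optimizer, which are not imported in the snippet above. A minimal set of additional imports, assuming the standalone Keras 2.x API used throughout this chapter, would be:

    # Additional imports assumed by the later steps (standalone Keras 2.x)
    from keras import backend as K
    from keras.layers import Input, Dense, Activation, Dot, Concatenate, RepeatVector, LSTM, Bidirectional
    from keras.models import Model
    from keras.optimizers import Adam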

  2. Load the dataset and read the file.

    path_data = "news_summary_small.csv"
    df_text_file = pd.read_csv(path_data)
    df_text_file.headlines = df_text_file.headlines.str.lower()
    df_text_file.text = df_text_file.text.str.lower()
    lengths_text = df_text_file.text.apply(len)
    dataset = list(zip(df_text_file.text.values, df_text_file.headlines.values))
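
    The CSV is expected to contain at least the two columns used above, headlines and text; an optional quick inspection of the loaded frame confirms this:

    # Optional: inspect the columns used in the following steps
    print(df_text_file[['headlines', 'text']].head(2))
    print("number of articles:", len(df_text_file))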

  3. Make the vocab dictionaries.

    input_texts = []
    target_texts = []
    input_chars = set()
    target_chars = set()

    for line in dataset:
        input_text, target_text = list(line[0]), list(line[1])
        target_text = ['BEGIN_'] + target_text + ['_END']
        input_texts.append(input_text)
        target_texts.append(target_text)
        for character in input_text:
            if character not in input_chars:
                input_chars.add(character)
        for character in target_text:
            if character not in target_chars:
                target_chars.add(character)

    input_chars.add("<unk>")
    input_chars.add("<pad>")
    target_chars.add("<pad>")
    input_chars = sorted(input_chars)
    target_chars = sorted(target_chars)
    human_vocab = dict(zip(input_chars, range(len(input_chars))))
    machine_vocab = dict(zip(target_chars, range(len(target_chars))))
    inv_machine_vocab = dict(enumerate(sorted(machine_vocab)))

    def string_to_int(string_in, length, vocab):
        """
        Converts the input string into a list of integers representing the positions of its
        characters in "vocab".

        Arguments:
        string_in -- input string
        length -- the number of time steps you'd like; determines whether the output is padded or cut
        vocab -- vocabulary, a dictionary used to index every character of "string_in"

        Returns:
        rep -- list of integers of size "length", representing the position of each character of
               the string in the vocabulary (unknown characters map to the index of '<unk>')
        """

  4. Convert the string to lowercase to standardize it, then map its characters to vocabulary indices, padding or truncating to the required length. This code continues the body of string_to_int.

        string_in = string_in.lower()
        string_in = string_in.replace(',', '')
        if len(string_in) > length:
            string_in = string_in[:length]
        # Map unknown characters to the index of '<unk>'
        rep = list(map(lambda x: vocab.get(x, vocab['<unk>']), string_in))
        if len(string_in) < length:
            rep += [vocab['<pad>']] * (length - len(string_in))
        return rep

    def preprocess_data(dataset, human_vocab, machine_vocab, Tx, Ty):
        X, Y = zip(*dataset)
        X = np.array([string_to_int(i, Tx, human_vocab) for i in X])
        Y = [string_to_int(t, Ty, machine_vocab) for t in Y]
        print("X shape from preprocess: {}".format(X.shape))
        Xoh = np.array(list(map(lambda x: to_categorical(x, num_classes=len(human_vocab)), X)))
        Yoh = np.array(list(map(lambda x: to_categorical(x, num_classes=len(machine_vocab)), Y)))
        return X, np.array(Y), Xoh, Yoh

    def softmax(x, axis=1):
        """Softmax activation function.

        # Arguments
            x: Tensor.
            axis: Integer, axis along which the softmax normalization is applied.

        # Returns
            Tensor, output of softmax transformation.

        # Raises
            ValueError: In case `dim(x) == 1`.
        """
        ndim = K.ndim(x)
        if ndim == 2:
            return K.softmax(x)
        elif ndim > 2:
            e = K.exp(x - K.max(x, axis=axis, keepdims=True))
            s = K.sum(e, axis=axis, keepdims=True)
            return e / s
        else:
            raise ValueError('Cannot apply softmax to a tensor that is 1D')
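
    As a quick illustration of how string_to_int behaves, short strings are padded and long strings are truncated. The toy vocabulary below is invented purely for this example; the real human_vocab is built from the dataset.

    # Toy vocabulary, for illustration only
    toy_vocab = {'<pad>': 0, '<unk>': 1, 'a': 2, 'c': 3, 't': 4}
    print(string_to_int("cat", 5, toy_vocab))   # [3, 2, 4, 0, 0] -- padded to length 5
    print(string_to_int("cats", 3, toy_vocab))  # [3, 2, 4] -- truncated to length 3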

  5. Run the previous code snippets to load the data, build the vocab dictionaries, and define the utility functions used later. Then define the lengths of the input and output character sequences.

    Tx = 460
    Ty = 75
    X, Y, Xoh, Yoh = preprocess_data(dataset, human_vocab, machine_vocab, Tx, Ty)

    Define the shared model layers (repeator, concatenator, densors, activator, and dotor):

    # Define shared layers as global variables
    repeator = RepeatVector(Tx)
    concatenator = Concatenate(axis=-1)
    densor1 = Dense(10, activation="tanh")
    densor2 = Dense(1, activation="relu")
    activator = Activation(softmax, name='attention_weights')
    dotor = Dot(axes=1)

    Define the one_step_attention function:

    def one_step_attention(h, s_prev):
        """
        Performs one step of attention: outputs a context vector computed as a dot product of the
        attention weights "alphas" and the hidden states "h" of the Bi-LSTM.

        Arguments:
        h -- hidden state output of the Bi-LSTM, numpy-array of shape (m, Tx, 2*n_h)
        s_prev -- previous hidden state of the (post-attention) LSTM, numpy-array of shape (m, n_s)

        Returns:
        context -- context vector, input of the next (post-attention) LSTM cell
        """

  6. Use repeator to repeat s_prev so that it has shape (m, Tx, n_s) and can be concatenated with all the hidden states "h". This and the following steps form the body of one_step_attention.

        s_prev = repeator(s_prev)

  7. Use concatenator to concatenate h and s_prev on the last axis.

        concat = concatenator([h, s_prev])

  8. Use densor1 to propagate concat through a small fully connected neural network to compute the "intermediate energies" variable e.

        e = densor1(concat)

  9. Use densor2 to propagate e through a small fully connected neural network to compute the "energies" variable energies.

        energies = densor2(e)

  10. Use activator on energies to compute the attention weights "alphas".

        alphas = activator(energies)

  11. Use dotor together with "alphas" and "h" to compute the context vector to be passed to the next (post-attention) LSTM cell, and return it.

        context = dotor([alphas, h])
        return context
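
    Assembled from steps 6 to 11, the complete function reads as follows (this repeats the fragments above verbatim, shown together only for reference):

    def one_step_attention(h, s_prev):
        # Repeat s_prev to (m, Tx, n_s) and concatenate it with the Bi-LSTM hidden states
        s_prev = repeator(s_prev)
        concat = concatenator([h, s_prev])
        # Two small dense layers compute the energies; the softmax activator turns them into attention weights
        e = densor1(concat)
        energies = densor2(e)
        alphas = activator(energies)
        # The weighted sum of the hidden states is the context vector for the post-attention LSTM
        context = dotor([alphas, h])
        return context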

    Define the number of hidden states for the encoder and the decoder.

    n_h = 32
    n_s = 64
    post_activation_LSTM_cell = LSTM(n_s, return_state=True)
    output_layer = Dense(len(machine_vocab), activation=softmax)

    Define the model architecture and run it to obtain a model.

    def model(Tx, Ty, n_h, n_s, human_vocab_size, machine_vocab_size):
        """
        Arguments:
        Tx -- length of the input sequence
        Ty -- length of the output sequence
        n_h -- hidden state size of the Bi-LSTM
        n_s -- hidden state size of the post-attention LSTM
        human_vocab_size -- size of the Python dictionary "human_vocab"
        machine_vocab_size -- size of the Python dictionary "machine_vocab"

        Returns:
        model -- Keras model instance
        """

  12. Define the input of your model with shape (Tx, human_vocab_size). This and the following steps form the body of the model function.
  13. Define s0 and c0, the initial hidden state and cell state for the decoder LSTM, each of shape (n_s,).

        X = Input(shape=(Tx, human_vocab_size), name="input_first")
        s0 = Input(shape=(n_s,), name='s0')
        c0 = Input(shape=(n_s,), name='c0')
        s = s0
        c = c0

  14. Initialize an empty list of outputs.

        outputs = []

  15. Define your pre-attention Bi-LSTM; remember to use return_sequences=True. Its output is the "h" passed to one_step_attention at every decoder step.

        h = Bidirectional(LSTM(n_h, return_sequences=True))(X)

        # Iterate for Ty steps
        for t in range(Ty):
            # Perform one step of the attention mechanism to get the context vector at step t
            context = one_step_attention(h, s)

  16. Apply the post-attention LSTM cell to the "context" vector (still inside the loop).

            # Pass: initial_state = [hidden state, cell state]
            s, _, c = post_activation_LSTM_cell(context, initial_state=[s, c])

  17. Apply the Dense output layer to the hidden state output of the post-attention LSTM.

            out = output_layer(s)

  18. Append "out" to the "outputs" list.

            outputs.append(out)

  19. After the loop, create the model instance taking the three inputs and returning the list of outputs.

        model = Model(inputs=[X, s0, c0], outputs=outputs)
        return model

    model = model(Tx, Ty, n_h, n_s, len(human_vocab), len(machine_vocab))

    # Define the model loss function and other hyperparameters.
    # Also initialize the decoder state vectors.
    opt = Adam(lr=0.005, beta_1=0.9, beta_2=0.999, decay=0.01)
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
    s0 = np.zeros((10000, n_s))
    c0 = np.zeros((10000, n_s))
    outputs = list(Yoh.swapaxes(0, 1))
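
    The model returns one output per decoder time step, so the targets are supplied as a list of Ty arrays rather than as a single tensor; this optional check (using the variables defined above) makes the shapes explicit:

    # Optional sanity check: Yoh has shape (m, Ty, len(machine_vocab));
    # after swapping axes, the targets are a list of Ty arrays of shape (m, len(machine_vocab))
    print(Yoh.shape)
    print(len(outputs), outputs[0].shape)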

    Fit the model to our data:

    model.fit([Xoh, s0, c0], outputs, epochs=1, batch_size=100)

    Run the inference step on new text:

    EXAMPLES = ["Last night a meteorite was seen flying near the earth's moon."]
    for example in EXAMPLES:
        source = string_to_int(example, Tx, human_vocab)
        source = np.array(list(map(lambda x: to_categorical(x, num_classes=len(human_vocab)), source)))
        source = source[np.newaxis, :]
        # Use zero initial decoder states with batch size 1 to match the single example
        s0_single = np.zeros((1, n_s))
        c0_single = np.zeros((1, n_s))
        prediction = model.predict([source, s0_single, c0_single])
        prediction = np.argmax(prediction, axis=-1)
        output = [inv_machine_vocab[int(i)] for i in prediction]
        print("source:", example)
        print("output:", ''.join(output))

    The output is as follows:

Figure 8.18: Text summarization model output
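
Because the target sequences are padded to the fixed length Ty, the decoded output may contain '<pad>' tokens; if so, they can be stripped before display with a small optional post-processing step, using the padding token defined earlier:

    # Optional: drop everything from the first padding token onwards
    summary = ''.join(output).split('<pad>')[0]
    print("cleaned output:", summary)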
