Appendix | Deep Learning for Natural Language Processing

Chapter 4: Introduction to convolutional networks

Activity 5: Sentiment Analysis on a real-life dataset

Solution:

Import the necessary classes.
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras import layers
from keras.preprocessing.sequence import pad_sequences
import numpy as np
import pandas as pd
Define your variables and parameters.
epochs = 20
maxlen = 100
embedding_dim = 50
num_filters = 64
kernel_size = 5
batch_size = 32
Import the data.
data = pd.read_csv('data/sentiment labelled sentences/yelp_labelled.txt', names=['sentence', 'label'], sep='\t')
data.head()
Running this in a Jupyter notebook should display the first few rows:
Figure 4.27: Labelled dataset
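The file read above stores one sentence and its label per line, separated by a tab. As a minimal sketch of that layout, the following parses two made-up rows with the standard library instead of pandas. The sample rows and the helper name read_labelled are assumptions for illustration, not part of the dataset or the original solution.

```python
import csv
import io

# Hypothetical two-line sample in the same "sentence<TAB>label" layout.
sample = "The food was great.\t1\nService was slow.\t0\n"

def read_labelled(handle):
    """Return (sentences, labels) from a tab-separated file object."""
    sentences, labels = [], []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for sentence, label in reader:
        sentences.append(sentence)
        labels.append(int(label))  # labels are stored as 0 or 1
    return sentences, labels

sentences, labels = read_labelled(io.StringIO(sample))
print(sentences)
print(labels)
```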
Select the 'sentence' and 'label' columns.
sentences = data['sentence'].values
labels = data['label'].values
Split your data into training and test sets.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    sentences, labels, test_size=0.30, random_state=1000)
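To make the effect of test_size and random_state concrete, here is a rough pure-Python sketch of a seeded 70/30 split. It is not scikit-learn's implementation and will not reproduce its exact shuffle. The function name split_train_test and the toy data are assumptions.

```python
import random

def split_train_test(items, labels, test_size=0.30, seed=1000):
    """Shuffle indices with a fixed seed and hold out the first test_size share."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(items) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_items = [items[i] for i in train_idx]
    test_items = [items[i] for i in test_idx]
    train_labels = [labels[i] for i in train_idx]
    test_labels = [labels[i] for i in test_idx]
    return train_items, test_items, train_labels, test_labels

# Toy data: ten placeholder sentences with their positions as labels.
toy_items = list("abcdefghij")
toy_labels = list(range(10))
train_x, test_x, train_y, test_y = split_train_test(toy_items, toy_labels)
print(len(train_x), len(test_x))
```

Fixing the seed makes the split repeatable, which is the role random_state plays in the call above.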
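Tokenize the sentences. The tokenizer is fitted on the training set only, so words that appear only in the test set are dropped when the test sentences are converted to sequences.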
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(X_train)
X_train = tokenizer.texts_to_sequences(X_train)
X_test = tokenizer.texts_to_sequences(X_test)
# The vocabulary size has an additional 1 because index 0 is reserved.
vocab_size = len(tokenizer.word_index) + 1
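The tokenization step can be pictured with a simplified pure-Python sketch: words are ranked by frequency starting at 1, index 0 stays free for padding, and unseen words are skipped. This is an approximation of the Keras Tokenizer's behaviour, not its actual code. The regular expression, function names, and sample sentences are assumptions.

```python
import re
from collections import Counter

WORD_PATTERN = re.compile(r"[a-z0-9']+")  # simplified word splitter (assumption)

def fit_word_index(texts):
    """Map each word to an integer rank, most frequent first, starting at 1."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_PATTERN.findall(text.lower()))
    # Index 0 is never assigned to a word; it is left for padding.
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def texts_to_sequences(texts, word_index, num_words=None):
    """Replace known words by their index; skip unknown or too-rare words."""
    sequences = []
    for text in texts:
        ids = [word_index[w] for w in WORD_PATTERN.findall(text.lower())
               if w in word_index]
        if num_words is not None:
            ids = [i for i in ids if i < num_words]
        sequences.append(ids)
    return sequences

train_texts = ["The food was good", "The service was bad"]
word_index = fit_word_index(train_texts)
vocab_size = len(word_index) + 1  # +1 for the reserved index 0
print(word_index)
print(texts_to_sequences(["The food was terrible"], word_index))
```

In this toy run "terrible" never appeared during fitting, so it is simply left out of the resulting sequence.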
Pad the sequences so that they all have the same length.
X_train = pad_sequences(X_train, padding='post', maxlen=maxlen)
X_test = pad_sequences(X_test, padding='post', maxlen=maxlen)
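As an illustration of what post-padding does, the sketch below appends zeros to short sequences and, for sequences longer than maxlen, keeps the last maxlen ids (mirroring the default truncating='pre' of pad_sequences). The function name pad_post is an assumption, and it returns plain lists rather than a NumPy array.

```python
def pad_post(sequences, maxlen, value=0):
    """Pad each sequence at the end with `value` up to `maxlen` items."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # too long: drop ids from the front
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded

toy_sequences = [[1, 2, 3], [1, 2, 3, 4, 5, 6, 7]]
print(pad_post(toy_sequences, maxlen=5))
```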
Create the model. Note that we use a sigmoid activation function on the last layer and binary cross-entropy as the loss, because this is a binary classification task.
model = Sequential()
model.add(layers.Embedding(vocab_size, embedding_dim, input_length=maxlen))
model.add(layers.Conv1D(num_filters, kernel_size, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()
The above code should yield the following summary:
Figure 4.28: Model summary
The model can also be visualized as follows:
Figure 4.29: Model visualization
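The figures in the model summary can be checked by hand. The sketch below computes the sequence length after the convolution (assuming the Conv1D defaults of no padding and stride 1) and the number of trainable parameters per layer for the hyperparameters defined earlier. The vocab_size of 2000 is a made-up placeholder, since the real value depends on the training data.

```python
def conv1d_output_length(input_length, kernel_size):
    """Sequence length after an unpadded convolution with stride 1."""
    return input_length - kernel_size + 1

def parameter_counts(vocab_size, embedding_dim=50, num_filters=64,
                     kernel_size=5, dense_units=10):
    """Trainable parameters per layer (weights plus biases)."""
    return {
        "embedding": vocab_size * embedding_dim,
        "conv1d": kernel_size * embedding_dim * num_filters + num_filters,
        "dense_hidden": num_filters * dense_units + dense_units,
        "dense_output": dense_units + 1,
    }

print(conv1d_output_length(100, 5))
counts = parameter_counts(vocab_size=2000)  # hypothetical vocabulary size
print(counts)
```

Global max pooling then keeps one value per filter, so the dense layers see a vector of length num_filters regardless of the sequence length.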
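Train and test the model. Note that the test set is also passed as the validation data here, so the validation metrics reported during training are computed on the same data as the final test accuracy.
model.fit(X_train, y_train,
          epochs=epochs,
          verbose=False,
          validation_data=(X_test, y_test),
          batch_size=batch_size)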
loss, accuracy = model.evaluate(X_train, y_train, verbose=False)
print("Training Accuracy: {:.4f}".format(accuracy))
loss, accuracy = model.evaluate(X_test, y_test, verbose=False)
print("Testing Accuracy: {:.4f}".format(accuracy))
The accuracy output should be as follows:

Deep Learning for Natural Language Processing

By: Karthiek Reddy Bokka, Shubhangi Hora, Tanuj Jain, Monicah Wambugu

Figure 4.30: Accuracy score
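To show how the sigmoid output, the binary cross-entropy loss, and the reported accuracy relate, here is a small pure-Python sketch. The raw scores and labels are made-up values, and the 0.5 decision threshold is the usual default for binary accuracy; none of this is taken from the actual training run.

```python
import math

def sigmoid(z):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of examples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

raw_scores = [2.0, -1.0, 0.5, -3.0]   # hypothetical last-layer inputs
true_labels = [1, 0, 0, 0]            # hypothetical labels
probabilities = [sigmoid(z) for z in raw_scores]
print(binary_crossentropy(true_labels, probabilities))
print(accuracy(true_labels, probabilities))
```

The third example is scored above 0.5 although its label is 0, so it counts as a miss for accuracy and contributes the largest term to the loss.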
