CS 224n: Assignment #3
Due date: 2/27, 11:59 PM PST. (You are allowed to use a maximum of 3 late days for this assignment.) These questions require thought, but do not require long answers. Please be as concise as possible.
We ask that you abide by the university Honor Code and that of the Computer Science department, and make sure that all of your submitted work is your own.
Please review any additional instructions posted on the assignment page at http://cs224n.stanford.edu/assignment3/index.html. When you are ready to submit, please follow the instructions on the course website.
Note: This assignment involves running an experiment that takes an estimated 3-4 hours. Do not start this assignment at the last minute!
Note: In this assignment, the inputs to neural network layers will be row vectors because this is standard practice for TensorFlow (some built-in TensorFlow functions assume the inputs are row vectors). This means the weight matrix of a hidden layer will right-multiply its input instead of left-multiplying it (i.e., xW + b instead of Wx + b).
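As a quick NumPy sketch of this row-vector convention (the sizes below are illustrative, not from the assignment):

```python
import numpy as np

# Hypothetical sizes: a batch of 2 examples, input dim 3, hidden dim 4.
x = np.random.randn(2, 3)   # inputs as row vectors (one example per row)
W = np.random.randn(3, 4)   # weight matrix right-multiplies the input
b = np.random.randn(4)      # bias broadcasts across the batch

h = x @ W + b               # xW + b, one hidden row vector per example
print(h.shape)              # (2, 4)
```

Note that with column vectors the same layer would be written Wx + b with W of shape (4, 3); the row-vector form simply transposes that convention.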
A primer on named entity recognition
In this assignment, we will build several different models for named entity recognition (NER). NER is a subtask of information extraction that seeks to locate and classify named entities in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. In the assignment, given a word in context, we want to predict whether it represents one of four categories:
• Person (PER): e.g. “Martha Stewart”, “Obama”, “Tim Wagner”, etc. Pronouns like “he” or “she” are not considered named entities.
• Organization (ORG): e.g. “American Airlines”, “Goldman Sachs”, “Department of Defense”.
• Location (LOC): e.g. “Germany”, “Panama Strait”, “Brussels”, but not unnamed locations like “the bar” or “the farm”.
• Miscellaneous (MISC): e.g. “Japanese”, “USD”, “1,000”, “Englishmen”.
We formulate this as a 5-class classification problem, using the four above classes and a null-class (O) for words that do not represent a named entity (most words fall into this category). For an entity that spans multiple words (“Department of Defense”), each word is separately tagged, and every contiguous sequence of non-null tags is considered to be an entity.
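This span-recovery convention can be sketched as follows. The helper name is hypothetical, and grouping by identical labels (so adjacent tags of different classes form separate entities) is an assumption, since the handout does not spell out that case:

```python
def extract_entities(tags):
    """Collapse per-token tags into (start, end, label) entity spans.

    Every maximal contiguous run of identical non-null ("O") tags is
    treated as one entity; `end` is exclusive.
    """
    entities = []
    start = None
    for i, tag in enumerate(tags + ["O"]):  # "O" sentinel flushes the last run
        if start is not None and (tag == "O" or tag != tags[start]):
            entities.append((start, i, tags[start]))
            start = None
        if start is None and tag != "O":
            start = i
    return entities

# A multi-word entity such as "Department of Defense" is tagged word by word,
# and each contiguous run of non-null tags is recovered as one entity:
print(extract_entities(["ORG", "ORG", "ORG"]))
# [(0, 3, 'ORG')]
```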
Here is a sample sentence (x(t)) with the named entities tagged above each token (y(t)), as well as hypothetical predictions produced by a system (ŷ(t)):

y(t):  ORG       ORG        O  O     O   ORG  ORG     ...  O          PER  PER     O
ŷ(t):  MISC      O          O  O     O   ORG  O       ...  O          PER  PER     O
x(t):  American  Airlines,  a  unit  of  AMR  Corp.,  ...  spokesman  Tim  Wagner  said.
In the above example, the system mistakenly predicted “American” to be of the MISC class and failed to tag “Airlines” and “Corp.” at all. Altogether, it predicted 3 entities: “American”, “AMR”, and “Tim Wagner”.
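A minimal sketch of entity-level scoring on this example, assuming exact span-and-label matching (the helper name and token indices are illustrative, not part of the handout):

```python
def prf1(gold, pred):
    """Entity-level precision, recall, and F1 with exact matching.

    `gold` and `pred` are sets of (start, end, label) triples; an entity
    counts as correct only if both its span and its label match exactly.
    """
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Entities from the sentence above as (start, end, label) token spans:
# gold: "American Airlines" (ORG), "AMR Corp." (ORG), "Tim Wagner" (PER)
# pred: "American" (MISC),         "AMR" (ORG),       "Tim Wagner" (PER)
gold = {(0, 2, "ORG"), (5, 7, "ORG"), (9, 11, "PER")}
pred = {(0, 1, "MISC"), (5, 6, "ORG"), (9, 11, "PER")}

print(prf1(gold, pred))  # only "Tim Wagner" matches exactly: P = R = F1 = 1/3
```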
To evaluate the quality of an NER system’s output, we look at precision, recall, and the F1 measure. For background, see https://en.wikipedia.org/wiki/Precision_and_recall and https://en.wikipedia.org/wiki/Confusion_matrix.

https://arxiv.org/pdf/1511.07916.pdf provides a good introduction to GRUs. http://colah.github.io/posts/2015-08-Understanding-LSTMs/ provides a more colorful picture of LSTMs and, to an extent, GRUs.
[5] Yes, several hours is a long time, but you are learning to become a Deep Learning researcher – so you need to be able to manage several-hour experiments!