
CS 224n: Assignment #3

Due date: 2/27 11:59 PM PST. (You are allowed to use at most 3 late days for this assignment.) These questions require thought, but do not require long answers; please be as concise as possible.

We ask that you abide by the university Honor Code and that of the Computer Science department, and make sure that all of your submitted work is done by yourself.

Please review any additional instructions posted on the assignment page at http://cs224n.stanford.edu/assignment3/index.html. When you are ready to submit, please follow the instructions on the course website.

Note: This assignment involves running an experiment that takes an estimated 3-4 hours. Do not start this assignment at the last minute!

Note: In this assignment, the inputs to neural network layers will be row vectors because this is standard practice for TensorFlow (some built-in TensorFlow functions assume the inputs are row vectors). This means the weight matrix of a hidden layer will right-multiply instead of left-multiply its input (i.e., xW+b instead of Wx + b).
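The row-vector convention can be sketched with NumPy (a minimal illustration, not part of the assignment's starter code; the shapes below are made up for the example):

```python
import numpy as np

# Hypothetical sizes: a batch of 2 inputs of dimension 4, hidden size 3.
x = np.ones((2, 4))        # inputs stacked as ROWS: shape (batch, d_in)
W = np.full((4, 3), 0.5)   # weight matrix right-multiplies: shape (d_in, d_hidden)
b = np.zeros(3)            # bias broadcasts across the batch dimension
h = x @ W + b              # xW + b, shape (batch, d_hidden)
print(h.shape)             # (2, 3)
```

With the column-vector convention (Wx + b) the weight matrix would instead have shape (d_hidden, d_in); the row convention lets a whole batch be processed as one matrix product.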

A primer on named entity recognition

In this assignment, we will build several different models for named entity recognition (NER). NER is a subtask of information extraction that seeks to locate and classify named entities in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. In the assignment, for a given word in a context, we want to predict whether it represents one of four categories:

•   Person (PER): e.g. “Martha Stewart”, “Obama”, “Tim Wagner”, etc. Pronouns like “he” or “she” are not considered named entities.

•   Organization (ORG): e.g. “American Airlines”, “Goldman Sachs”, “Department of Defense”.

•   Location (LOC): e.g. “Germany”, “Panama Strait”, “Brussels”, but not unnamed locations like “the bar” or “the farm”.

•   Miscellaneous (MISC): e.g. “Japanese”, “USD”, “1,000”, “Englishmen”.

We formulate this as a 5-class classification problem, using the four above classes and a null-class (O) for words that do not represent a named entity (most words fall into this category). For an entity that spans multiple words (“Department of Defense”), each word is separately tagged, and every contiguous sequence of non-null tags is considered to be an entity.
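The grouping of per-token tags into entity spans can be sketched as follows. This is a simplified illustration, not the assignment's starter code; it assumes a run is additionally split whenever the label changes, and returns (label, start, end) tuples with an exclusive end index:

```python
def extract_entities(tags):
    """Group each maximal run of identical non-null ("O") tags into one entity.

    Returns a list of (label, start, end) spans, end exclusive.
    """
    entities, start = [], None
    for i, tag in enumerate(tags + ["O"]):  # trailing sentinel flushes the last run
        if start is not None and (tag == "O" or tag != tags[start]):
            entities.append((tags[start], start, i))
            start = None
        if start is None and tag != "O":
            start = i
    return entities

tags = ["ORG", "ORG", "O", "O", "O", "ORG", "ORG", "O", "PER", "PER", "O"]
print(extract_entities(tags))  # [('ORG', 0, 2), ('ORG', 5, 7), ('PER', 8, 10)]
```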

Here is a sample sentence (x(t)) with the named entities tagged above each token (y(t)) as well as hypothetical predictions produced by a system (yˆ(t)):

y(t):   ORG       ORG        O   O     O    ORG   ORG     ...   O           PER   PER      O
ŷ(t):   MISC      O          O   O     O    ORG   O       ...   O           PER   PER      O
x(t):   American  Airlines,  a   unit  of   AMR   Corp.,  ...   spokesman   Tim   Wagner   said.


In the above example, the system mistakenly predicted “American” to be of the MISC class and failed to tag “Airlines” and “Corp.”. Altogether, it predicts 3 entities: “American”, “AMR”, and “Tim Wagner”.

To evaluate the quality of an NER system’s output, we look at precision, recall, and the F1 measure (see https://en.wikipedia.org/wiki/Precision_and_recall and https://en.wikipedia.org/wiki/Confusion_matrix for background).
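These metrics can be illustrated at the token level with a short sketch (the assignment also evaluates at the entity level; this simplified version counts a token as a true positive when the gold and predicted tags agree and are non-null):

```python
def prf1(gold, pred):
    """Token-level precision, recall, and F1 over non-null ("O") tags."""
    tp = sum(g == p != "O" for g, p in zip(gold, pred))   # correct non-null tags
    n_pred = sum(p != "O" for p in pred)                  # predicted non-null tags
    n_gold = sum(g != "O" for g in gold)                  # gold non-null tags
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Tags from the "American Airlines" example above.
gold = ["ORG", "ORG", "O", "O", "O", "ORG", "ORG", "O", "PER", "PER", "O"]
pred = ["MISC", "O", "O", "O", "O", "ORG", "O", "O", "PER", "PER", "O"]
print(prf1(gold, pred))  # (0.75, 0.5, 0.6)
```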

https://arxiv.org/pdf/1511.07916.pdf provides a good introduction to GRUs. http://colah.github.io/posts/2015-08-Understanding-LSTMs/ provides a more colorful picture of LSTMs and, to an extent, GRUs.

Yes, several hours is a long time, but you are learning to become a Deep Learning researcher, so you need to be able to manage several-hour experiments!



