
LSTM CRF | 18 Applying LSTM to Named Entity Recognition



Transcript:

Hello again. In the previous video we discussed the concept of recurrent neural networks. In this video we are going to apply recurrent networks to a practical case: named entity recognition. We are going to build a traditional bidirectional LSTM model with a conditional random field classifier using TensorFlow.

First we import the libraries needed for loading the data, pre-processing, and building the model. Next we load the data itself. The data comes as a chunk of text; here is how a line in the dataset looks. It looks like a paragraph, with each token, its POS tag and its NER tag. First we split on the space character; here is our first token. Next we split on the pipe character; here is an array containing the token, the POS and the NER. Next we put them into a data frame. If you have limited resources you don't have to use the full dataset; a subset of it is enough. Next we drop the POS column, since we are not going to use POS in our model. Here is how it looks: "The Oxford Companion to Philosophy", whose NER tag is the miscellaneous class, and "Daniel Green", a person. Here is a dictionary I've made to correct the classes that are incorrect in the text; we apply the dictionary to the NER column in our data frame.

Here is a histogram of our classes. As you can see, the vast majority of tokens belong to the O class, which means there is no entity. That is logical, since most text contains no entities. If we ignore the O class, we can see the distribution of the remaining classes: they are roughly similar, with the majority in the person class. After that we need to encode our classes. Our classes are strings and we need to turn them into numbers, so we use the LabelEncoder from scikit-learn.

Next we define a function that turns our data from one very long array of word vectors and class labels into sequences. In other words, we cut our very long array of tokens into sentences, assuming that every ten consecutive tokens form one sentence. Here is how one example looks: the input has shape (sequence length, word-vector dimension), and the class labels are a vector of length sequence length.

Next we split our dataset into train, test and validation sets. To do that we use the train_test_split helper from scikit-learn. We split our dataset into 60% training and 40% test, and then split the training data into 70% training and 30% validation. In the end our training set is eight thousand four hundred examples, our validation set three thousand six hundred, and our test set eight thousand.

We are going to use the TensorFlow Dataset API. Here is how we use it: we define a dataset with the zip function, which zips two tensor slices, or two arrays, the X array and the y array, for each of our three datasets: train, test and validation. Then we shuffle the dataset, batch the examples in batches of 64, and prefetch two batches. Next we define the initialization operations of our iterators. We have three iterators: a train iterator, a test iterator and a validation iterator. The output of these iterators is a sequence and its labels.
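To make the pipeline concrete, here is a minimal sketch of the preprocessing and tf.data setup described above, assuming TensorFlow 1.x and scikit-learn. The names (df, word_vectors, SEQ_LEN, EMB_DIM), the 300-dimensional embeddings and the shuffle buffer size are illustrative assumptions, not the video's exact notebook code.

import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

SEQ_LEN = 10    # every 10 consecutive tokens are treated as one sentence
EMB_DIM = 300   # assumed word-vector dimensionality
BATCH = 64

# Encode the string NER labels as integer class ids.
encoder = LabelEncoder()
labels = encoder.fit_transform(df["ner"].values)               # df: token data frame (assumed)
vectors = np.stack([word_vectors[t] for t in df["token"]])     # word_vectors: assumed token -> vector lookup

# Cut the long token stream into fixed-length sequences.
n_seq = len(vectors) // SEQ_LEN
X = vectors[:n_seq * SEQ_LEN].reshape(n_seq, SEQ_LEN, EMB_DIM).astype(np.float32)
y = labels[:n_seq * SEQ_LEN].reshape(n_seq, SEQ_LEN).astype(np.int32)

# 60% train / 40% test, then 70% / 30% of the training part for validation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.3)

def make_dataset(x, t):
    # Zip the example array and the label array, then shuffle, batch and prefetch.
    ds = tf.data.Dataset.zip((tf.data.Dataset.from_tensor_slices(x),
                              tf.data.Dataset.from_tensor_slices(t)))
    return ds.shuffle(1000).batch(BATCH).prefetch(2)

train_iter = make_dataset(X_train, y_train).make_initializable_iterator()
val_iter = make_dataset(X_val, y_val).make_initializable_iterator()
test_iter = make_dataset(X_test, y_test).make_initializable_iterator()
seq_batch, label_batch = train_iter.get_next()   # one batch of sequences and labels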
Our first layer of the network is going to be an LSTM layer. This layer takes the word embeddings as input, reads them from left to right and from right to left, and then concatenates the two outputs. The LSTM in this layer acts as a feature extractor: it looks for features in the word vectors and in the temporal relationship between the tokens, either left to right or right to left, and tries to find the best features to help the classifier in the following layer. In the next layer we have a projection layer, which projects the output of the LSTM to a fixed-size vector. After the projection we need to feed this fixed-size vector to a classifier. We could use a simple softmax classifier; however, in NER we assume the classes are not independent. In other words, when an entity starts, it is common for one or more entity tokens to follow it, and the conditional random field classifier exploits this fact. We define the loss function of the CRF to be the negative log-likelihood, and we optimize it using the Adam optimizer with a starting learning rate of 0.001.

In the next cell we define a function to live-test our model. This function is used to monitor the output of our model on an example of our own. It takes text as input, tokenizes it, uses our word-vector model to turn the tokens into word vectors, feeds those vectors into our model, and decodes the output from class labels into string classes that we can read and understand.

In the final cell we start by defining a saver, a TensorFlow utility to save our session and thus save our model. We start our session, run the global variables initializer, and then loop over the epochs; in this example we run for 50 epochs. In each epoch we run the train iterator's initializer, which initializes the iterator, and then for every batch the iterator returns we compute the loss and accumulate it, repeating until the train iterator is exhausted. Only then do we compute the average loss, print it, and print the output of the live-test function on a dummy example. Finally we save the session.

Here is the output from the model during training. You can see that the loss starts at 5.4 and the output is quite random. The loss keeps decreasing as we progress through the epochs, going down through 2.8, 2.5 and 2.4 to finally reach 2.3, and the output looks correct in the end: John Doe is detected as a person, which is the correct entity. That means our model has learned to detect entities and succeeded at finding them, at least in this dummy example. That's it for now. In the next video we are going to try text classification using LSTM. Stay tuned.
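For reference, here is a minimal sketch of the BiLSTM-CRF model and training loop walked through in this video, continuing from the data-pipeline sketch above and assuming TensorFlow 1.x (tf.contrib.crf). The hidden size, checkpoint path and decoding step are illustrative assumptions rather than the video's exact code.

import tensorflow as tf

NUM_CLASSES = len(encoder.classes_)   # number of NER classes from the LabelEncoder above
HIDDEN = 128                          # assumed LSTM size

x = seq_batch                         # (batch, SEQ_LEN, EMB_DIM) word vectors from the iterator
y_true = label_batch                  # (batch, SEQ_LEN) integer class labels
seq_lens = tf.fill(tf.shape(x)[:1], SEQ_LEN)

# Bidirectional LSTM feature extractor: read the sequence left-to-right and
# right-to-left, then concatenate both outputs.
fw = tf.nn.rnn_cell.LSTMCell(HIDDEN)
bw = tf.nn.rnn_cell.LSTMCell(HIDDEN)
(out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
    fw, bw, x, sequence_length=seq_lens, dtype=tf.float32)
features = tf.concat([out_fw, out_bw], axis=-1)

# Projection layer: map each timestep's features to one score per class.
scores = tf.layers.dense(features, NUM_CLASSES)

# CRF layer: the loss is the negative log-likelihood of the true tag sequence.
log_likelihood, transitions = tf.contrib.crf.crf_log_likelihood(scores, y_true, seq_lens)
loss = tf.reduce_mean(-log_likelihood)
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)

# Viterbi decoding gives the predicted tag sequence (used by the live-test function).
pred_tags, _ = tf.contrib.crf.crf_decode(scores, transitions, seq_lens)

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(50):
        sess.run(train_iter.initializer)
        total_loss, batches = 0.0, 0
        while True:
            try:
                _, batch_loss = sess.run([train_op, loss])
                total_loss += batch_loss
                batches += 1
            except tf.errors.OutOfRangeError:
                break                 # iterator exhausted: this epoch is done
        print("epoch", epoch, "average loss", total_loss / max(batches, 1))
    saver.save(sess, "./ner_bilstm_crf.ckpt")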
