Transcript:
What is going on, everybody? Welcome to Part Four of the NLTK with Python for natural language processing tutorial series. In this video we're actually talking about something I think is really cool and exciting, and that's part-of-speech tagging. So up to this point we've talked about pre-processing and such, and part-of-speech tagging is pre-processing too, but the other stuff was kind of, yeah. This is cool, at least to me; I hope it is to you too. So we're going to be doing part-of-speech tagging. It means just what it sounds like: it's labeling the part of speech for every single word in whatever you're feeding in, so usually that's a sentence, but you could feed in whole paragraphs if you want. So first of all, we're importing nltk, then we're going to go from nltk.corpus, the lovely corpus module, and import state_union. These are State of the Union addresses by various presidents over the last sixty years or so, actually, now that's probably more like seventy years. Anyways, NLTK has it, yes, good. And then from nltk.tokenize we'll import a different tokenizer, to throw you guys for a loop: we're going to use the PunktSentenceTokenizer. The PunktSentenceTokenizer is sort of special, not sort of, it is: it's an unsupervised machine-learning sentence tokenizer. You can train it if you want. It comes pre-trained, but you can retrain it on other stuff if you so choose. We could just use the default one, but I'll show you guys how to train it, and that's good enough for now. Now we're going to need to pull in, actually, we'll just keep nltk, because we'll do nltk.pos_tag. So now, what we want to do is get the sample text, and this will be state_union.raw, so we're going to grab the raw text for the text we want, the 2006 one from G. W. Bush, "2006-GWBush.txt". That will be fun. Now we're going to make the custom sentence tokenizer, so this is how you would go about making your very, very own Punkt sentence tokenizer. That will be equal to PunktSentenceTokenizer, and we're going to do something we don't normally always do, but this will be fine. I could train the PunktSentenceTokenizer on the sample text itself, but you can train it on a different text, so let's see, the State of the Union should occur every year, so there should be a 2005 address for the training text. There we go, we'll be legit about this. So custom_sent_tokenizer equals PunktSentenceTokenizer on train_text. Now we're going to say tokenized equals custom_sent_tokenizer.tokenize of sample_text. Okay, now let's go ahead and write a function, process_content. Okay, so we're going to try, except Exception as e here, and for now we'll just print out str(e). What we're going to try is: for i in tokenized, and this is for each element in tokenized, we're going to say words equals nltk.word_tokenize of i, so those will be the words, and then tagged is the following: nltk.pos_tag, and that will be on words. Okay, so then let's go ahead and print tagged. That should do it. Hold on everybody, settle down!
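Putting that all together, the script described above looks roughly like this. It's a sketch of what's being typed in the video, and the file names assume NLTK's standard state_union corpus naming:

```python
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer

# Raw State of the Union text (file names assume NLTK's standard
# state_union corpus naming): train on the 2005 address, tag the 2006 one.
train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt")

# Train an unsupervised Punkt sentence tokenizer on the 2005 text,
# then use it to split the 2006 address into sentences.
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
tokenized = custom_sent_tokenizer.tokenize(sample_text)

def process_content():
    try:
        for i in tokenized:
            words = nltk.word_tokenize(i)   # split the sentence into words
            tagged = nltk.pos_tag(words)    # list of (word, POS tag) tuples
            print(tagged)
    except Exception as e:
        print(str(e))

process_content()
```

If you haven't already, you may need to grab the state_union corpus and the tokenizer/tagger models through nltk.download() before this will run.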
This is the whole State of the Union, so we might not want to print every tagged sentence before looping over tokenized, each element of which will be a sentence, but let's just do it, full steam ahead. This might be way too big. Yeah, that's okay. Oh, we're not processing the content, we need to call the function first. Fine, now full steam ahead, everybody. Oh, how long did George W. Bush go on for that State of the Union? There we go, okay, we made it all the way to the bottom, awesome, applause. Okay, so let's just, I don't know, you can go wherever you want, but let's go to the top so everybody can follow along. So "President George W. Bush" gets tagged, and these are nouns, and before this gets to be too much, "courageous" is JJ, so actually, this would be a good time to talk about what these little meanings are. Okay, so I'm looking at them like, yeah, that makes sense, and y'all are probably looking at these tags like, what is going on. So when you do this part-of-speech tagging, it will take and create tuples of the word and then the part of speech, and instead of writing them all out, I'm just going to copy and paste the tag list for you guys over here. Make some space and paste, okay. These are your part-of-speech tags, so CC is a coordinating conjunction, and then NNP is a proper noun. And then you've got verbs, so VB is a verb and VBD is a past-tense verb. The only one that's weird, I think, is RB: RB is adverbs, so when you see that you might not connect it in your head. And when I see PRP, I always want to say that's a preposition, but it's actually a personal pronoun; IN is the preposition. So it's a good idea to have this reference around, because you can quickly confuse yourself. So that's it, and we can come back over to our output here. So now you can see, okay, "fellow citizens", "today", "courageous", blah, blah, blah. So what's "courageous", for example? JJ, adjective. Okay, so that's a success. Good, "served", let's see, verb, good. "America", good, got that. So anyway, you can go through and look at all of them, but for the most part this part-of-speech tagging is nearly perfect. It is nice. It's going to mess up from time to time. Let me think of some good examples of when it messes up: definitely on some nouns. A lot of times it's not going to recognize nouns, especially if you're reading something like Twitter. So eventually we may or may not use Twitter as a dataset; I'm contemplating using streams from Twitter, and you'll find that a lot of people do not capitalize names. Okay, so a lot of times they'll have a person's name, but it's in lowercase, and NLTK is like, what is this? So that can be kind of a problem sometimes, but for the most part it's pretty darn good. So that's part-of-speech tagging, pretty simple. Again, most of what we did before we even got to part-of-speech tagging was just getting some data and getting some example sentences, plus I wanted to show you the PunktSentenceTokenizer, look at me saying it, and also loading in corpora, like this State of the Union one from George W. Bush. Again, we'll be talking more about corpora very soon, because this is probably one of the biggest things: I think most people that have used NLTK don't actually go and look at what they have in their corpora.
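The tag list pasted on screen isn't reproduced in the transcript, but NLTK can print the same Penn Treebank reference itself via nltk.help.upenn_tagset(); the calls below are a stand-in for that paste, with the tags called out in the video summarized in the comments:

```python
import nltk

# Print the full Penn Treebank tag reference NLTK uses
# (may require downloading the "tagsets" resource the first time).
nltk.help.upenn_tagset()

# Or look up individual tags, e.g. the confusing ones mentioned above:
nltk.help.upenn_tagset("RB")    # adverb
nltk.help.upenn_tagset("PRP")   # personal pronoun (not a preposition)
nltk.help.upenn_tagset("IN")    # preposition / subordinating conjunction

# Tags mentioned in this video:
#   CC   coordinating conjunction
#   JJ   adjective              ("courageous")
#   NN   noun, singular
#   NNP  proper noun, singular  ("George", "Bush")
#   VB   verb, base form
#   VBD  verb, past tense
#   RB   adverb
#   PRP  personal pronoun
#   IN   preposition / subordinating conjunction
```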
There's actually a lot of really good stuff in there, a lot of golden nuggets, so to speak. So anyways, that's it for part-of-speech tagging; it's really quite simple. It really boils down to: after you've split the text by word, you can pass the words through nltk.pos_tag. Now, I won't show it to you yet, but when we get to chunking, I'll show you how you can display everything so it's really pretty. So anyways, that's it for part-of-speech tagging. If you have any questions, comments, concerns, whatever, please feel free to leave them below. Otherwise, as always, thanks for watching, thanks for all the support and subscriptions, and until next time.