Transcript:
[music] Well, human beings are the most advanced species on Earth; there’s no doubt about that, and our success as human beings comes from our ability to communicate and share information. That’s where the concept of developing a language comes in. When we talk about human language, it is one of the most diverse and complex parts of us, considering that a total of about six thousand five hundred languages exist. Coming to the 21st century, according to industry estimates, only twenty-one percent of the available data is present in structured form. Data is being generated as I speak, as we tweet, send messages on WhatsApp, or post in the various groups on Facebook, and the majority of this data exists in textual form, which is highly unstructured in nature. In order to produce significant and actionable insights from this data, it is important to get acquainted with the techniques of text analysis and natural language processing. So let’s understand what text mining and natural language processing are. Text mining, or text analytics, is the process of deriving meaningful information from natural language text. It usually involves structuring the input text, deriving patterns within the structured data, and finally evaluating and interpreting the output. Natural language processing, on the other hand, refers to the artificial intelligence method of communicating with an intelligent system using natural language. As text mining refers to the process of deriving high-quality information from text, the overall goal here is essentially to turn text into data for analysis via the application of natural language processing. So text mining and NLP go hand in hand. Let’s look at some of the applications of text mining and natural language processing. One of the first and most important applications of natural language processing is sentiment analysis.
Be it Twitter sentiment analysis or Facebook sentiment analysis, it’s being used heavily. Next, we have the implementation of chatbots. You might have used the customer chat services provided by various companies, and the process behind all of that is NLP. Then we have speech recognition, and here we are also talking about voice assistants like Siri, Google Assistant, and Cortana; the process behind all of these is natural language processing. Machine translation is another use case of natural language processing, and the most common example is Google Translate, which uses NLP to translate text from one language to another, and that too in real time. Other applications of NLP include spell checking, keyword search, and extracting information from any document or website. Finally, one of the coolest applications of natural language processing is advertisement matching: basically, recommending ads based on your history. NLP is divided into two major components: natural language understanding and natural language generation. Understanding generally refers to mapping the given input in natural language into a useful representation and analyzing those aspects of the language, whereas generation is the process of producing meaningful phrases and sentences in the form of natural language from some internal representation. Natural language understanding is usually harder than natural language generation, because it takes a lot of time and a lot of context to understand a particular language, especially if you are not a human being. There are various steps involved in natural language processing: tokenization, stemming, lemmatization, POS tagging, named entity recognition, and chunking. Starting with tokenization: tokenization is the process of breaking strings into tokens, which in turn are small structures or units that can be used in the later processing steps.
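To make the definition of tokenization concrete, here is a minimal sketch in Python using only the standard library. The regular expression and the `tokenize` helper are hypothetical, chosen for illustration; NLTK’s real `nltk.word_tokenize` handles many more cases (for example, it splits "don't" into "do" and "n't") and needs its tokenizer models downloaded first.

```python
import re
from typing import List

# Hypothetical pattern: a word (optionally with an apostrophe suffix such as
# "Don't"), or any single character that is neither a word char nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Split text into word tokens and standalone punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Tokenization breaks strings into tokens."))
    # ['Tokenization', 'breaks', 'strings', 'into', 'tokens', '.']
```

Whitespace is discarded and punctuation becomes its own token, which is the shape later steps such as POS tagging usually expect.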
If we look at the example sentence here, it can be divided into seven tokens. This is very useful in natural language processing. The second process in natural language processing is stemming. Stemming refers to normalizing words into their base or root form. If we look at the words here, we have affectations, affects, affections, affected, affection, and affecting; all of these words originate from a single root word, and as you might have guessed, it is affect. A stemming algorithm works by cutting off the end or the beginning of the word, taking into account a list of common prefixes and suffixes that can be found in an inflected word. This indiscriminate cutting can be successful on some occasions, but not always. So let’s understand the concept of lemmatization. Lemmatization, on the other hand, takes into consideration the morphological analysis of the word. To do so, it is necessary to have a detailed dictionary which the algorithm can look through to link the form back to its original word or root word, which is also known as the lemma. What lemmatization does is group together the different inflected forms of a word under its lemma. It is somewhat similar to stemming, as it maps several words onto one common root, but the major difference between stemming and lemmatization is that the output of lemmatization is a proper word. For example, a lemmatizer should map the words gone, going, and went to go. That will not be the output of stemming. Now, once we have the tokens, and once we have reduced them to their root form, next come the POS tags. Generally speaking, the grammatical type of a word, be it a verb, noun, adjective, adverb, article, and many more, is referred to as its POS tag, or part of speech. It indicates how a word functions in meaning as well as grammatically.
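The contrast between stemming and lemmatization can be sketched in a few lines of Python. Both functions below are hypothetical toys: `crude_stem` chops suffixes from a small hand-picked list (a much simpler cousin of NLTK’s `PorterStemmer`), and `lemmatize` looks words up in a tiny hand-written dictionary, standing in for the full lexicon a real lemmatizer such as NLTK’s `WordNetLemmatizer` consults.

```python
from typing import Dict

# Hypothetical suffix list for a crude stemmer, checked longest first.
SUFFIXES = ("ations", "ation", "ions", "ion", "ing", "ed", "s")
MIN_STEM_LEN = 3  # assumption: never cut a word down below three characters


def crude_stem(word: str) -> str:
    """Chop off the first matching suffix; no dictionary is consulted."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM_LEN:
            return w[: -len(suffix)]
    return w


# Hypothetical miniature lemma dictionary; a real lemmatizer uses a full lexicon.
LEMMAS: Dict[str, str] = {
    "gone": "go",
    "going": "go",
    "went": "go",
    "goes": "go",
}


def lemmatize(word: str) -> str:
    """Return the dictionary form of a word, or the word itself if unknown."""
    w = word.lower()
    return LEMMAS.get(w, w)


if __name__ == "__main__":
    family = ["affectations", "affects", "affections", "affected", "affection", "affecting"]
    print([crude_stem(w) for w in family])  # every entry becomes 'affect'
    print([crude_stem(w) for w in ["gone", "going", "went"]])  # ['gone', 'going', 'went']
    print([lemmatize(w) for w in ["gone", "going", "went"]])  # ['go', 'go', 'go']
    print(crude_stem("news"))  # 'new': indiscriminate cutting changes the meaning
```

The stemmer collapses the "affect" family but cannot relate went to go, and it turns news into new; the dictionary lookup handles the irregular forms precisely because it does not rely on string cutting.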
Within a sentence, a word can have more than one part of speech, based on the context in which it is used. For example, let’s take the sentence "I googled something on the Internet." Here, Google is used as a verb, although it’s a proper noun. These are some of the limitations, or I should say the problems, that occur while processing natural language. To help with these challenges, we have named entity recognition, also known as NER. It is the process of detecting named entities such as person names, company names, quantities, or locations. It has three steps: noun phrase identification, phrase classification, and entity disambiguation. If you look at this particular example, "Google CEO Sundar Pichai introduced the new Pixel 3 at New York Central Mall," Google is identified as an organization, Sundar Pichai as a person, New York as a location, and Central Mall as an organization. Once we have divided the sentences into tokens, done the stemming and lemmatization, added the POS tags, and performed named entity recognition, it’s time for us to group everything back together and make sense of it. For that, we have chunking. Chunking basically means picking up individual pieces of information and grouping them into bigger pieces, which are known as chunks. In the context of NLP, chunking means grouping words or tokens into chunks. As you can see here, we have pink as an adjective, panther as a noun, and "the" as a determiner, and all of these are chunked together into a noun phrase. This helps in getting insights and meaningful information from the given text. Now you might be wondering where one runs all of these functions on a given text file. For that, there is NLTK, a library for Python.
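The named entity recognition example about Google and Sundar Pichai can be imitated with a gazetteer (a fixed list of known names), sketched below in Python. This is a deliberate simplification, not how NLTK’s `nltk.ne_chunk` works: it collapses the three steps into a dictionary lookup, trying every token span as a candidate phrase, taking the label from the dictionary, and preferring the longest match instead of doing any real disambiguation. The `GAZETTEER` entries are hypothetical and built only from the example sentence.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer built from the example sentence. Real NER systems
# classify phrases with trained models instead of a fixed list.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Google",): "ORGANIZATION",
    ("Sundar", "Pichai"): "PERSON",
    ("New", "York"): "LOCATION",
    ("Central", "Mall"): "ORGANIZATION",
}
MAX_SPAN = max(len(key) for key in GAZETTEER)


def find_entities(tokens: List[str]) -> List[Tuple[str, str]]:
    """Scan left to right, preferring the longest gazetteer match at each position."""
    entities: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        for size in range(min(MAX_SPAN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i : i + size])
            label = GAZETTEER.get(span)
            if label is not None:
                entities.append((" ".join(span), label))
                i += size
                break
        else:  # no entry starts at this token; move on to the next one
            i += 1
    return entities


if __name__ == "__main__":
    sentence = "Google CEO Sundar Pichai introduced the new Pixel 3 at New York Central Mall"
    print(find_entities(sentence.split()))
```

Because the lookup is longest-match-first, "New York Central Mall" is split into the two known entries New York and Central Mall rather than being treated as one name.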
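POS tagging and the "the pink panther" noun-phrase chunk can be sketched together in Python. The tagger here is a hypothetical lookup table using Penn Treebank tags, so, unlike a trained tagger such as `nltk.pos_tag`, it ignores context and would mis-tag a word like "googled" used as a verb, which is the limitation described above. The chunker hand-codes the pattern "optional determiner, any adjectives, one or more nouns", the same rule NLTK expresses with `nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")`.

```python
from typing import List, Tuple, Union

Tagged = Tuple[str, str]  # (word, Penn Treebank tag)
Chunk = Tuple[str, List[Tagged]]  # ("NP", [tagged words...])

# Hypothetical toy lexicon; a fixed table cannot use context the way trained taggers do.
LEXICON = {"the": "DT", "a": "DT", "pink": "JJ", "panther": "NN", "sleeps": "VBZ"}


def pos_tag(tokens: List[str]) -> List[Tagged]:
    """Look each token up; unknown words default to NN (an assumption)."""
    return [(t, LEXICON.get(t.lower(), "NN")) for t in tokens]


def chunk_noun_phrases(tagged: List[Tagged]) -> List[Union[Tagged, Chunk]]:
    """Group runs matching DT? JJ* NN+ into ("NP", [...]) chunks."""
    result: List[Union[Tagged, Chunk]] = []
    i = 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":  # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":  # any number of adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1].startswith("NN"):  # one or more nouns
            k += 1
        if k > j:  # at least one noun: emit a noun-phrase chunk
            result.append(("NP", tagged[i:k]))
            i = k
        else:  # this token does not start a noun phrase
            result.append(tagged[i])
            i += 1
    return result


if __name__ == "__main__":
    tagged = pos_tag(["the", "pink", "panther", "sleeps"])
    print(tagged)
    print(chunk_noun_phrases(tagged))
```

Tokens outside any noun phrase are passed through as plain `(word, tag)` pairs, so the output keeps the original word order.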
NLTK is the Natural Language Toolkit library, which is heavily used for natural language processing and text analysis. So, guys, if you want to know the details of how to execute each of these steps, like tokenization, stemming, and lemmatization, through NLTK, you can refer to our NLP tutorial; the link to it is given in the description box below. I hope you have enjoyed listening to this video. Please be kind enough to like it, and you can comment with any of your doubts and queries; we will reply to them at the earliest. Do look out for more videos in our playlist, and subscribe to the Edureka channel to learn more. Thank you, and happy learning.