Transcript:
So, have you ever faced a dilemma about whether you have the skills it takes to participate in Kaggle competitions? We're going to break it down so that you won't be facing this question again. Kaggle is much more than a platform for predictive modeling, data science, or machine learning competitions. For those of you who aren't familiar with what Kaggle is all about, let's dig deep. It's a global platform that hosts competitions, datasets, kernels, and discussions on data science and machine learning for people in many different roles, such as developers, engineers, and researchers, and it also has a job board. It's a platform for doing and sharing data science, and a great place to learn machine learning from an amazing community. Earlier, the people actively using Kaggle were mostly those already proficient in fields like data science and machine learning, but Kaggle has done a lot to make it accessible to people who don't have such a strong background, and to the public in general. If you do really well, companies can hire you, and what's more, you can also be awarded a handsome prize. The prize money varies with the difficulty of the problem. There's a steep learning curve in the whole process, from downloading the dataset, to building a model, to submitting your predictions, but it's super fun. So, first of all, create an account on Kaggle. This step is required to participate. There are two ways to sign up. I would recommend signing up with an email address, since if you sign up through Facebook or Google+, you may have some issues using the Kaggle command-line tool, which lets you download datasets and submit predictions from the command line; Kaggle recently released an API for this. After that, go to Datasets. There are thousands of high-quality datasets available on Kaggle. When you click on any one of them, you're going to see a screen like this one.
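For the command-line route mentioned above, here is a minimal Python sketch that only builds the argument lists for the official `kaggle` CLI and prints them. It assumes the `kaggle` package is installed (`pip install kaggle`) and that an API token has been saved to `~/.kaggle/kaggle.json`; the helper names (`dataset_download_cmd`, `competition_submit_cmd`, `run`) and any slugs you pass in are hypothetical placeholders, not part of Kaggle's API.

```python
import shlex
import subprocess

def dataset_download_cmd(dataset_slug, dest="data"):
    """Argument list for `kaggle datasets download`; dataset_slug looks like 'owner/dataset-name'."""
    # -p sets the download folder, --unzip extracts the archive after downloading
    return ["kaggle", "datasets", "download", "-d", dataset_slug, "-p", dest, "--unzip"]

def competition_submit_cmd(competition, csv_path, message):
    """Argument list for `kaggle competitions submit` with a submission CSV and a short message."""
    return ["kaggle", "competitions", "submit",
            "-c", competition, "-f", csv_path, "-m", message]

def run(cmd, dry_run=True):
    """Print the shell-quoted command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        # Needs the kaggle CLI on PATH and a valid API token
        subprocess.run(cmd, check=True)
```

For example, `run(dataset_download_cmd("zalando-research/fashionmnist"))` would print the download command for what is assumed here to be the slug of the Fashion-MNIST dataset; check the slug on the dataset page. Defaulting to a dry run is a deliberate choice: a real submission counts against the competition's daily submission limit.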
There are many sections available, namely Data, Kernels, Discussion, Activity, New Kernel, and, of course, Download. In the Overview section, you're going to see the top contributors to this dataset and its tags, and when you scroll down, you will see the kernels with the most votes. We'll dive into kernels afterwards; for now, just consider them as scripts. Scripts are short snippets of code. A kernel is a combination of environment, input, code, and output, all stored together. I'd rather encourage you to become active in discussions, because they are a really nice way to collaborate and learn together. You get to learn about new perspectives, and you may pick up some new tricks that help you carry out the task at hand. Now, when you click on Data, you gather an insight into what the dataset is all about and what each file contains. Generally, it is divided into training and test sets. Some may contain a validation set as well, but if one isn't included, it's highly recommended to create one before proceeding; very often, you will have to do this yourself. The training set is used to build and train your machine learning models. For the training set, the outcome, which is also known as the ground truth, is provided. Your model will be based on features, and you can also use feature engineering to create new features. The test set should be used to see how well your model performs on unseen data, that is, data your model hasn't encountered before. For the test set, the labels or ground truth are not provided; your model needs to predict those outcomes. There's also a sample submission CSV file. This is the file you must be concerned with after dealing with the model, since you will be submitting your predictions in this format; it gives you a sample of what your submission should look like. Normally, it's tabular data, so I use pandas to load it into a DataFrame. It's super convenient. Discussions, as you know, are like forums: if you're stuck on a problem, you can use them to get your queries answered. Finally, let's talk about rankings.
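The advice to create your own validation set, and to match the sample submission's format, can be sketched with pandas as below. This is a minimal illustration, not Kaggle's method: the 20% hold-out fraction, the seed, and the helper names are assumptions, and `build_submission` assumes the sample file's first column is the ID and its second column is the target.

```python
import pandas as pd

def make_validation_split(train_df, val_fraction=0.2, seed=42):
    """Hold out a random fraction of the labelled training rows as a validation set."""
    val_df = train_df.sample(frac=val_fraction, random_state=seed)
    fit_df = train_df.drop(index=val_df.index)  # every remaining row is used for fitting
    return fit_df, val_df

def build_submission(ids, predictions, sample_submission):
    """Put predictions into the same column layout as the competition's sample submission."""
    # Assumption: column 0 is the ID column, column 1 is the target column
    id_col, target_col = sample_submission.columns[:2]
    return pd.DataFrame({id_col: list(ids), target_col: list(predictions)})
```

In a kernel you would typically read the files with `pd.read_csv(...)` (file names vary per competition), train on `fit_df`, check the score on `val_df`, and finally write the result with `submission.to_csv("submission.csv", index=False)`; `index=False` stops pandas from adding its row index as an extra column that the grader would not expect.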
There's a leaderboard. It's divided into two sections, public and private. The public leaderboard is computed on a portion of the test set; the private leaderboard is computed on the remainder of the test set, not the whole dataset. When someone talks about overfitting the leaderboard, they mean tuning their models to perform well specifically on the public leaderboard. You have to make sure that you are not overfitting in any case. The private leaderboard remains secret until the end of the competition, and it determines the final competition winners. The purpose of this division is to prevent people from winning by overfitting the public leaderboard, so participants are motivated to make sure their models generalize well to the private leaderboard. Now let's come to the Competitions section. There are many types of competitions, like Playground and Research, and there are active ones and archived ones. If you are playing with the dataset of an archived competition, you will be able to submit your predictions, but you won't be ranked, and there is no prize money for it. After the competition is over, it's just kept for educational purposes, I would say. Active competitions award you the prize money, and, as we have talked about, the private leaderboard is kept secret so that we don't overfit on the public leaderboard. You can get initial help from kernels that many participants have made available. If you click on a competition that interests you, you can see similar things to what we have discussed. I didn't talk about Evaluation: it gives you an idea of how your predictions are being evaluated. Sometimes they use log loss, sometimes mean average precision, sometimes intersection over union, and sometimes other metrics. You can check that before submitting. Just to make sure, they also give you the format they expect in the submission file.
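To make the public/private idea and two of the metrics named above more concrete, here is a small standard-library sketch. Kaggle's actual split fraction and metric implementations vary per competition; the 30% public fraction, the clipping constant `eps`, and the helper names are assumptions for illustration only.

```python
import math
import random

def public_private_split(n_rows, public_fraction=0.3, seed=0):
    """Randomly assign test-row indices to a 'public' portion and a 'private' remainder."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = int(n_rows * public_fraction)
    return sorted(idx[:cut]), sorted(idx[cut:])

def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped so log(0) never occurs."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Scoring the same predictions separately on the public and private index lists shows why a model tuned only against the public subset can rank lower once the private remainder is revealed.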
Normally, you create that file in CSV (comma-separated values) format, and you can use pandas to manage the data. You have to be really careful when submitting: everything has to be in the same format they asked for, otherwise your submission won't be scored. Now, let's talk about Kaggle Kernels. They are essentially Jupyter notebooks running in your browser, or just snippets of code, as in a Jupyter notebook. It's a free platform for running Jupyter notebooks in your browser, which means you don't have to worry about the hassle of setting up a local environment, or an environment on a cloud instance. When you run a kernel, the processing power comes from servers in the cloud, so you can practice and learn machine learning without heating up your laptop. You can either use an already available kernel or create a new one. The dataset is already loaded into the kernel's environment, so there is no need to upload the data to a cloud instance yourself, and you can still add any other files you may require to that instance. To demonstrate, we'll work with a fashion dataset, Fashion-MNIST. It contains 10 categories of clothing and accessory items, like pants, bags, shirts, and so on, so it's an example of multiclass classification. There are 60,000 training examples and 10,000 evaluation samples. Let's explore the dataset in a Kaggle kernel. Looking at the dataset, it's provided on Kaggle in the form of CSV files. The original data was 28-by-28-pixel grayscale images, which have been flattened to become 784 distinct columns in the CSV file. Since the dataset is already in the environment, we can make use of pandas, which is already included in the session. So let's load the CSV files into pandas DataFrames. Now we have loaded the data into DataFrames.
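Turning the 784 flattened columns back into 28 x 28 images can be sketched as follows. This assumes the CSV layout used by the Fashion-MNIST files on Kaggle, a `label` column followed by 784 pixel columns with values 0 to 255; the helper name `to_images` is hypothetical, and the input paths in the comments are assumptions you should check on the dataset's Data tab.

```python
import numpy as np
import pandas as pd

IMAGE_SIDE = 28  # Fashion-MNIST images are 28 x 28 grayscale

def to_images(df, label_col="label"):
    """Split a flattened-pixel DataFrame into integer labels and (n, 28, 28) float images in [0, 1]."""
    labels = df[label_col].to_numpy()
    pixels = df.drop(columns=[label_col]).to_numpy(dtype=np.float32)
    if pixels.shape[1] != IMAGE_SIDE * IMAGE_SIDE:
        raise ValueError(f"expected {IMAGE_SIDE * IMAGE_SIDE} pixel columns, got {pixels.shape[1]}")
    images = pixels.reshape(-1, IMAGE_SIDE, IMAGE_SIDE) / 255.0  # rescale 0-255 to 0-1
    return images, labels

# Typical use inside a kernel (file path is an assumption):
# train_df = pd.read_csv("../input/fashion-mnist_train.csv")
# train_df.head()
# images, labels = to_images(train_df)
```

Row-major reshaping is what restores the picture here, assuming the pixel columns are stored row by row, which is how the 784 columns were produced from the 28 x 28 images.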
You can make use of all the features this amazing library brings with it. We'll display the first few rows with head(), and you can learn more about the structure of the data from it. Additionally, it would be really nice to visualize the dataset so that it has much more meaning to us than just rows upon rows of numbers. To visualize it, we can make use of matplotlib, whose pyplot module is normally imported as plt, to see what some of these images look like. You can use it to display the arrays of pixel values as images. You can see that these images, while fuzzy, are indeed still recognizable as the clothing and accessory items they claim to be. Kaggle Kernels let you visualize the data in addition to just processing it, so you can work in a fully interactive notebook in the browser with little or no setup. I really want to emphasize that we didn't have to launch any cloud instance or manage any environment configuration, which is really awesome. Be sure to subscribe to the channel to catch future episodes as they come out. Now, what are you waiting for? Head on over to kaggle.com, sign up for an account, and play with its kernels. Participate in discussions and competitions. See you in the next video.
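A minimal matplotlib sketch of the visualization step follows. It expects an array of shape (n, 28, 28) and integer labels, for example obtained by reshaping the 784 pixel columns of each row. The class-name list follows the standard Fashion-MNIST label order; the helper name `show_images` is hypothetical. The `matplotlib.use("Agg")` line is only there so the sketch also runs outside a notebook; drop it in a kernel so figures display inline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Fashion-MNIST label index -> class name
CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

def show_images(images, labels, n=5):
    """Draw the first n images side by side in grayscale, titled with their class names."""
    n = min(n, len(images))
    # squeeze=False keeps axes 2-D even when n == 1, so indexing stays uniform
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2.4), squeeze=False)
    for ax, img, label in zip(axes[0], images[:n], labels[:n]):
        ax.imshow(img, cmap="gray")
        ax.set_title(CLASS_NAMES[int(label)])
        ax.axis("off")
    fig.tight_layout()
    return fig
```

Returning the figure instead of calling `plt.show()` keeps the helper usable both in a notebook, where the figure renders automatically, and in a script, where you could call `fig.savefig("samples.png")`.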