Transcript:

Hey guys, you're watching Python tutorials on my YouTube channel, Python for Microscopists. In the previous tutorial, I talked about malarial cell classification using convolutional neural networks. That was a classification problem, where you classify your images into one of several classes; in that example, either a parasitized malarial cell or a healthy one. So it was a classic classification problem. In this tutorial, I'm going to talk about a segmentation problem using deep learning: image segmentation using an architecture called U-Net. This tutorial explains what U-Net is, and the rest of this series covers how to understand U-Net, how to code it in Python, and how to apply it to a real-life example. As I just mentioned, U-Net is a special type of architecture for image segmentation. When I say architecture, I mean an arrangement of the deep learning tools we are already familiar with, like convolutional layers and max pooling layers, arranged in such a way that the result is an image segmentation. I'm not going to explain what a convolutional layer is, because I've already done that in one of my previous videos, and of course you can search YouTube, where you will find excellent videos on this. So I'll just mention certain terms; please dig into them at your convenience. Just a quick reminder, then: here you see a special arrangement of various convolutional layers and max pooling layers put together to achieve a certain task. I don't think this particular one has a special name, but there are various architectures where people put these layers together into a special arrangement, and some of them have become quite famous.
Looking at this example, the first layer is just the input layer. Its depth, the number of channels in this direction, is three, which typically means a color image with RGB channels. In this example you have 224 pixels in X and 224 pixels in Y, so this is a 224 × 224 × 3 image. That is the input layer: what goes into the convolutional neural network in this example. In the next layer, the depth changes from 3 to 96 because 96 digital filters are applied to this image, so we now have 96 convolutional responses of the input image, and those make up the next layer. Again, a convolution is nothing but sliding a small filter over the image and computing multiply-and-sum operations (which can be written as a matrix multiplication), and the spatial dimensions may change, for example from 224 to something smaller, depending on the kernel size, the stride, and how much padding you add to the image. The same sliding idea applies to max pooling. You take a window, in this example 5 × 5, and run it along the image. If you move this 5 × 5 window by one pixel at a time, the stride equals one; you can also move it by two, three, four, or five pixels, so the stride can be any number, and it also defines the dimensions of the output layer. So the dimensions keep increasing through these layers and then decrease over there, and finally you have the dense layers and the output layer. That is a quick overview of a convolutional neural network. This next image is very busy and confusing, and I put it together because it shows some of the famous architectures that are out there.
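How kernel size, stride, and padding set the output dimensions can be checked with the standard output-size formula, out = floor((n + 2p - k) / s) + 1, which applies to convolution and max pooling alike. Below is a minimal plain-Python sketch; the kernel sizes, strides, and padding values are illustrative assumptions, not the exact values on the slide, and `output_size` and `max_pool2d` are illustrative helpers, not library functions.

```python
def output_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial dimension after sliding a window of size `kernel`.

    The same formula applies to convolution and to max pooling:
        out = floor((n + 2 * padding - kernel) / stride) + 1
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return (n + 2 * padding - kernel) // stride + 1


def max_pool2d(image, size=2, stride=2):
    """Max pooling (no padding) on a 2D list of numbers."""
    out_h = output_size(len(image), size, stride)
    out_w = output_size(len(image[0]), size, stride)
    return [
        [
            max(image[i * stride + di][j * stride + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


if __name__ == "__main__":
    print(output_size(224, kernel=3, stride=1, padding=1))  # 224: 3x3 conv with 1-pixel padding keeps the size
    print(output_size(224, kernel=5, stride=1))             # 220
    print(output_size(224, kernel=5, stride=2))             # 110: a larger stride shrinks the output
    print(output_size(224, kernel=2, stride=2))             # 112: 2x2 pooling with stride 2 halves it
    print(max_pool2d([[1, 3, 2, 0],
                      [4, 2, 1, 5],
                      [0, 1, 7, 2],
                      [3, 8, 1, 1]]))                       # [[4, 5], [8, 7]]
```

Each output value of the pooling step is simply the largest number in its 2 × 2 block, which is why the spatial size halves while the strongest responses survive.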
You have probably heard of AlexNet and VGG. Before those, LeNet was quite famous, and then GoogLeNet is a bit more complicated, as you can see, and the more recent Inception versions came after that. I believe these recent ones are both by Google. As you can see, there are different architectures, and we can put together our own, because no single architecture is best for all types of problems. Depending on the type of problem we are working on, we can design our own neural network architecture. Now, the question is: how do we design our own? Of course, there are people getting PhDs working on these topics. But if you just want to use this as a tool to segment your images, I think we can still do that; we don't need to be the architects of neural networks. I picked VGG because it is relatively famous and a lot of people use it. If you look at how many papers are out there on these architectures, VGG probably has the most; in fact, I saw a bubble chart that reflects that. Now, here is VGG19, again just as a quick example. Initially you have your convolutional layers, then a max pool layer, then another couple of convolutional layers, max pool again, convolutional, max pool, and so on, and finally the dense layers and the output layer. As I explained in the previous tutorial, a dense layer is called dense because every neuron in it is connected to every neuron in the previous layer. To code this, I believe there is a library you can call for VGG19 (Keras, for example, ships a ready-made VGG19 model), but if you want to code it line by line, it is pretty straightforward. You start by defining convolutional layer 1 and its parameters.
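The VGG19 layout just described can be written down compactly as a configuration list, which also shows where the 19 comes from: 16 convolutional layers plus 3 dense layers. This plain-Python sketch assumes a 224 × 224 × 3 input and the commonly published VGG19 configuration (3 × 3 convolutions with 'same' padding, 2 × 2 max pooling with stride 2, dense layers of 4096, 4096, and 1000 units); it only traces shapes and builds no model. `trace_vgg` is an illustrative helper.

```python
# Commonly published VGG19 configuration: numbers are conv filter counts, "M" is 2x2 max pooling.
VGG19_CONV = [64, 64, "M",
              128, 128, "M",
              256, 256, 256, 256, "M",
              512, 512, 512, 512, "M",
              512, 512, 512, 512, "M"]
VGG19_DENSE = [4096, 4096, 1000]  # two hidden dense layers + 1000-class output


def trace_vgg(size=224, channels=3):
    """Return (layer name, output shape) pairs for the VGG19 layout."""
    shapes = []
    conv_idx = 0
    for item in VGG19_CONV:
        if item == "M":
            size //= 2  # 2x2 max pooling, stride 2, halves height and width
            shapes.append(("maxpool", (size, size, channels)))
        else:
            conv_idx += 1
            channels = item  # 3x3 conv with 'same' padding keeps height and width
            shapes.append((f"conv{conv_idx}", (size, size, channels)))
    shapes.append(("flatten", (size * size * channels,)))
    for i, units in enumerate(VGG19_DENSE, start=1):
        shapes.append((f"dense{i}", (units,)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_vgg():
        print(name, shape)
    n_conv = sum(1 for x in VGG19_CONV if x != "M")
    print("weight layers:", n_conv + len(VGG19_DENSE))
```

After five poolings the 224 × 224 input is down to 7 × 7 × 512, which is flattened before the dense layers.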
Then we define the max pool layer right here, and the max pool layer goes as the input to the next convolutional layer, and so on. As you can see, after convolutional block 5 we enter the dense layers, and this is where the dense layers are defined. There are multiple ways of writing this code; I'm just showing you example code that I literally copied from a Google search result. So this is VGG19. The point I'm trying to make is that there are various architectures, VGG19 is one of them, and you can put together such an architecture yourself if you know what its structure looks like. You can modify it, and you should, to make sure you are putting together the best network for the problem you are trying to solve. Now, the U-Net architecture is designed for semantic segmentation. What is semantic segmentation? When you search for U-Net, you will probably run into this term, so here is a quick introduction. Let's say we have an image like this. If you draw a bounding box around each individual in this example, that is typically object detection. Likewise, if you have a bunch of cells in an image and you put a bounding box around each cell, or around each particle if you are looking at particles, that is object detection. Now, if you paint the pixels corresponding to humans in this image, that is semantic segmentation, where every pixel represents either a human or the background, in other words a person or a non-person. Every pixel is painted. That is semantic segmentation, and U-Net is designed to do exactly this task.
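To make the difference concrete: a semantic segmentation label is a per-pixel mask (1 for person or cell, 0 for background), while object detection keeps only a box around each object. This plain-Python sketch uses a made-up 5 × 6 mask purely for illustration, and `bounding_box` is an illustrative helper; deriving the box from the mask shows how much per-pixel information a box throws away.

```python
# Toy semantic-segmentation mask: 1 = object (e.g. a person or a cell), 0 = background.
MASK = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]


def bounding_box(mask):
    """Smallest box (row_min, col_min, row_max, col_max) containing every foreground pixel.

    Assumes the mask contains at least one foreground pixel.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


if __name__ == "__main__":
    r0, c0, r1, c1 = bounding_box(MASK)
    box_area = (r1 - r0 + 1) * (c1 - c0 + 1)
    mask_area = sum(map(sum, MASK))
    print("box:", (r0, c0, r1, c1))     # (1, 1, 3, 3)
    print("box pixels:", box_area)      # 9
    print("object pixels:", mask_area)  # 6
```

The box covers 9 pixels while the object occupies only 6; a segmentation network such as U-Net predicts the 6 exactly, pixel by pixel.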
If you can also separate each individual, as person 1, 2, 3, 4, 5, that is called instance segmentation, which is, in a way, an extension of semantic segmentation. But semantic segmentation is the task we are trying to achieve with U-Net. So what does U-Net look like? Don't worry if this slide looks confusing or too busy. Here is the link, and of course you can always search for U-Net and find the original paper by Olaf Ronneberger, Philipp Fischer, and Thomas Brox. The original reason they came up with this architecture was biomedical image segmentation. It is called U-Net, as you can tell, because it looks like a U. The architecture contains two paths. The path on the left is called the contracting path, also called the encoder path, and the path on the right is the expanding path, or the decoder path. In between, as you can see, the feature maps from this side, for example, are concatenated with the ones on the other side. This concatenation of feature maps is what gives us localized information, and that is what makes semantic segmentation possible with U-Net. You can read more about this if you are curious; my intention is to make sure you understand at a high level what it is and how to use it for your image segmentation. In the example in the paper, they used an input image of 572 × 572, as you can see, then 64 features in the first layer, and so on. But we can modify this and write our own code.
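Why does the paper's 572 × 572 input not give a 572 × 572 output? The original U-Net uses unpadded ('valid') 3 × 3 convolutions, each trimming 2 pixels, and crops the encoder feature maps before concatenating them. The plain-Python sketch below traces only the spatial size under the layout from the paper (two 3 × 3 convolutions per level, 2 × 2 max pooling, 2 × 2 up-convolution, four levels down and four up); `trace_valid_unet` is an illustrative helper, not code from the paper.

```python
def trace_valid_unet(size=572, depth=4):
    """Trace the spatial size through a U-Net that uses unpadded ('valid') 3x3 convolutions."""
    skips = []
    # Contracting (encoder) path: two valid 3x3 convs, then 2x2 max pooling.
    for _ in range(depth):
        size -= 4  # two 3x3 'valid' convs, each removes 2 pixels
        skips.append(size)
        if size % 2:
            raise ValueError("feature map must have an even size before pooling")
        size //= 2
    size -= 4  # two convolutions at the bottom of the "U"
    crops = []
    # Expanding (decoder) path: 2x2 up-conv doubles the size, then two valid convs.
    for skip in reversed(skips):
        size *= 2
        crops.append((skip, size))  # encoder map must be center-cropped from `skip` to `size`
        size -= 4
    return size, crops


if __name__ == "__main__":
    out, crops = trace_valid_unet(572)
    print("output:", out)  # 388
    for skip, target in crops:
        print(f"crop {skip} -> {target}")
```

The 388 × 388 output and the 568 → 392, 280 → 200, 136 → 104, 64 → 56 crops match the paper's figure. Using 'same' padding instead, as in the rest of this series, removes the need for cropping.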
In fact, I wanted to modify it, because the network architecture as it is is fine, but the parameters used in the paper did not work very well for the cell segmentation I was trying to do, so I experimented with it. I also did a lot of Google searching to see how others did it. There is nothing wrong with copying what others did if it saves you a lot of time, although proper acknowledgement of those people is definitely necessary. After all of that, I arrived at what works for the cell segmentation example I'm going to show you in the next few parts of this series. So let's look at the top left, because this is where the input layer is. Let's resize all of our images to 128 × 128; these are color images, so 128 × 128 × 3 is the dimension of every input image, and they go into the input layer. Then let's use a feature space of 16, so the output of this layer, which we are calling C1, is 128 × 128 × 16. These are convolutional operations, and for them we use a 3 × 3 matrix, in other words a kernel size of 3 × 3, and we code it with padding equals 'same'. What that means is: add extra pixels at the edges. When you run a 3 × 3 convolution, what happens at the edge pixels? We need to do something there, so padding adds extra pixels around the edges, and (with a stride of 1) the output image has the same dimensions as the input image. That's exactly what padding equals 'same' means: make my output image the same dimensions as the input image. Otherwise, depending on your kernel size, you may get a smaller output image. There is nothing wrong with that; people use various strategies, but this is what I chose to do here.
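Here is a minimal plain-Python sketch of what padding='same' does for a 3 × 3 kernel with stride 1. The image is surrounded with one ring of zeros so that every pixel, including the edge pixels, gets a full 3 × 3 neighborhood, and the output keeps the input size. Zero padding is assumed, which is what Keras uses for 'same'; like Keras, the sketch computes a cross-correlation (the kernel is not flipped). `conv2d_same` is an illustrative helper, not a library function.

```python
def conv2d_same(image, kernel):
    """2D convolution (cross-correlation, stride 1) with zero 'same' padding.

    Works for any odd kernel size; the output has the same height and width as the input.
    """
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    # Surround the image with `pad` rings of zeros so edge pixels have full neighborhoods.
    padded = [[0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for i in range(h):
        for j in range(w):
            padded[i + pad][j + pad] = image[i][j]
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = 0
            for di in range(k):
                for dj in range(k):
                    acc += kernel[di][dj] * padded[i + di][j + dj]
            row.append(acc)
        out.append(row)
    return out


if __name__ == "__main__":
    image = [[1] * 4 for _ in range(4)]  # 4x4 image of ones
    box = [[1] * 3 for _ in range(3)]    # 3x3 kernel of ones
    result = conv2d_same(image, box)
    print(len(result), len(result[0]))   # 4 4: same size as the input
    for row in result:
        print(row)
```

The output stays 4 × 4, and the corner and edge values (4 and 6) are smaller than the interior values (9) because the padded zeros contribute nothing; that is the trade-off of 'same' padding.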
The next step is max pooling, using 2 × 2 with a stride of 2. As I explained in one of the previous videos, max pooling is nothing but taking a 2 × 2 window, selecting the maximum value within it, and replacing that 2 × 2 window with that maximum value. That is the operation that gets us down to this stage, and the dimensions here are half of 128, because we are using a 2 × 2 window with stride 2, so this becomes 64 × 64 × 16. Then come two convolutional operations with a feature space of 32, so it becomes 64 × 64 × 32. We repeat this process all the way down. As you can see, down here the dimensions are 8 × 8 × 128 after pooling, and 8 × 8 × 256 after the convolutions at the bottom. Now we are going to upsample along a path that is exactly symmetric to the way down, except we are going up. The upsampling uses very similar parameters, again 2 × 2. The result of this upsampling is this box right here; I forgot to label it, but it would be 16 × 16 × 128, because we are going from 8 to 16 with a 2 × 2 upsampling. To that, we concatenate C4, so this block is nothing but U6 plus C4. Hopefully I'm not confusing you: these two individual rectangles each represent 16 × 16 × 128, and when you concatenate them, it becomes 16 × 16 × 256. Then it goes through a couple of convolutional steps, so the result here is 16 × 16 × 128. Now we upsample again with 2 × 2, and we have two boxes, each 32 × 32 × 64; concatenating them gives 32 × 32 × 128. So those are the steps: upsample, then concatenate, so U7 plus C3 over there, and the same thing all the way up.
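The shape bookkeeping just walked through can be reproduced in a few lines of plain Python. This sketch assumes the layout described in this tutorial: a 128 × 128 × 3 input, 'same' padding everywhere, 16, 32, 64, 128 features on the way down and 256 at the bottom, 2 × 2 max pooling with stride 2, 2 × 2 upsampling to the feature count of the matching level, and concatenation with the matching encoder block (C4, C3, C2, C1). Names such as c1 and u6 follow the slide; `trace_unet` itself is only an illustration.

```python
def trace_unet(size=128, in_channels=3, filters=(16, 32, 64, 128), bottom=256):
    """Return a dict of tensor shapes (H, W, C) for the U-Net described in this tutorial."""
    shapes = {"input": (size, size, in_channels)}
    skips = []
    # Contracting path: c1..c4 (3x3 'same' convs), each followed by 2x2 max pooling.
    for level, f in enumerate(filters, start=1):
        shapes[f"c{level}"] = (size, size, f)
        skips.append((level, size, f))
        size //= 2
        shapes[f"p{level}"] = (size, size, f)
    shapes["c5"] = (size, size, bottom)  # bottom of the "U"
    # Expanding path: upsample (u6..u9), concatenate with the matching c-block, convolve.
    for step, (level, skip_size, f) in enumerate(reversed(skips), start=6):
        size *= 2
        if size != skip_size:
            raise ValueError("encoder and decoder sizes must match to concatenate")
        shapes[f"u{step}"] = (size, size, f)
        shapes[f"u{step}+c{level}"] = (size, size, 2 * f)  # concatenation doubles the channels
        shapes[f"c{step}"] = (size, size, f)
    shapes["output"] = (size, size, 1)  # final convolution with a single filter
    return shapes


if __name__ == "__main__":
    for name, shape in trace_unet().items():
        print(f"{name:8s} {shape}")
```

Printing the dictionary lists every block from the 128 × 128 × 3 input down to 8 × 8 × 256 and back up to the 128 × 128 × 1 output, which is a quick sanity check before writing the real model.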
Finally, the output is a convolutional layer with 1 filter, so 128 × 128 × 1. In between these steps, especially between the convolutional steps, we are going to use dropout, which means randomly selecting a fraction of the values, for example 10% if the dropout is 0.1, and dropping them from the calculation. Why would we want to do that? It is typically used to prevent overfitting, so let's include it. And how do we code this? It is very similar to what you saw earlier for VGG. I put some code here; I'm not going to go through each line, but basically there is C1 and P1. If I go back, C1 is right there, and it is nothing but two convolutional operations. C1 is a convolutional operation with 16 features and a 3 × 3 kernel, we are using ReLU as our activation in this example, and padding equals 'same'. That's C1. Then P1 is defined as 2 × 2 max pooling applied to the incoming data, which comes from C1. Then we do two convolutions and one max pool, two convolutions and one max pool, and so on, and we may add some dropout, like 20% here. Finally, when we get to the output layer, we have the optimizer. I explained this in my previous videos, and I'll explain it again when we start coding: there are a few choices for the optimizer, and Adam seems to be the best one here. The loss function is what the optimizer is trying to minimize, and the metric we keep track of as the algorithm progresses is accuracy. So that's it: repeat these steps for each and every layer.
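Dropout as described above can be sketched in plain Python. This assumes "inverted" dropout, which is what the Keras `Dropout` layer does during training: each value is zeroed with probability `rate`, and the surviving values are scaled by 1 / (1 - rate) so the expected sum stays the same; at inference time nothing is dropped. The `dropout` function and the fixed random seed are illustrative assumptions, not library code.

```python
import random


def dropout(values, rate=0.1, training=True, rng=None):
    """Inverted dropout on a flat list of activations."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)  # at inference time the layer passes values through unchanged
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - rate)  # keep the expected sum of the activations unchanged
    return [0.0 if rng.random() < rate else v * scale for v in values]


if __name__ == "__main__":
    acts = [1.0] * 10_000
    dropped = dropout(acts, rate=0.1, rng=random.Random(0))
    zero_frac = sum(1 for v in dropped if v == 0.0) / len(dropped)
    print(f"fraction dropped: {zero_frac:.3f}")  # close to 0.1
    print(f"mean before: {sum(acts) / len(acts):.3f}, after: {sum(dropped) / len(dropped):.3f}")
```

Because a different random subset is dropped on every training step, the network cannot rely on any single activation, which is why dropout helps against overfitting.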
And there you go, we have our U-Net. In the next tutorial, let's start defining our U-Net in Python using the Keras API, followed by understanding the data we are going to work with, and then implement it right away. Thank you very much for your attention, and I hope you liked this tutorial. If so, please subscribe to my channel, and let's meet in the next tutorial.