Transcript:

Okay, let's start the next session. The next topic is ShuffleNet, but before going through ShuffleNet, let me introduce the idea of group convolution. The idea of group convolution is to divide the input tensor into multiple groups, perform the convolution on each group separately, and then combine the results again. Depthwise convolution is a special case of group convolution: in depthwise convolution, you divide the input tensor into as many groups as there are channels. In group convolution in general, you can divide your input tensor into any number of groups that evenly divides the channels. In this figure, you can see the input tensor here, and it is divided into two groups along the channel dimension. Then you perform the convolution operations: for example, this filter is convolved over this part, and this filter is convolved over that one. After convolving them separately, you just concatenate the output tensors.

This idea was basically proposed in AlexNet, and that was because of memory constraints. In 2012, as you know, we didn't have enough GPU memory for this kind of deep learning computation. Since they didn't have enough GPU memory, they wanted to train AlexNet on two GPUs, so they split the convolution filters into two groups, and in this way they could train it. So this is the AlexNet network structure, and this is the input image.
They divided the convolutional filters into two groups because of memory constraints: they did all the feed-forward operations for one group on one GPU and for the other group on another GPU, and they combined the feature maps at the last layers. Of course, they weren't thinking of it as group convolution while doing this; it was just good engineering. After that, researchers developed group convolution as a technique in its own right. By grouping the convolution you reduce the number of operations: for example, if you have two groups, you cut the number of operations roughly in half. As you may remember, in depthwise convolution we separated each channel into its own tensor and decreased the number of operations a lot; in the same way, with two groups you decrease the operations by a factor of two. Other than that, there is another advantage, which is that it can learn better representations. For example, these convolution layers would learn something related to this group, not that one, and this also means the filter relations are sparse.

But at the same time it also has a disadvantage: the features here are only computed from these feature maps, from these convolutional filters. You cannot propagate information from the other group here; you can only propagate information within this group of convolutions. This is the main disadvantage, and ShuffleNet is about addressing this problem. So this is how group convolutions look, and in ShuffleNet they proposed to shuffle the channels of the grouped convolutions after each layer. Since right now we don't have those memory constraints, we can do a group convolution on a single GPU, shuffle, and go on to the next layers. Yeah, so that was the main side effect.
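To make the shuffling step concrete, here is a minimal sketch in plain Python. It assumes the common reshape, transpose, flatten recipe for channel shuffle and works on a flat list of channel labels instead of a real tensor; the function names `channel_shuffle` and `split_into_groups` and the labels are made up for illustration and are not taken from the lecture.

```python
def channel_shuffle(channels, groups):
    """Reorder a flat list of channels so the next grouped layer sees every group.

    View the channels as a (groups, channels_per_group) grid, swap the two
    axes, and read the grid back out as a flat list.
    """
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = len(channels) // groups
    # Row g of the grid holds the channels produced by group g.
    grid = [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]
    # Reading column by column interleaves the groups.
    return [grid[g][i] for i in range(per_group) for g in range(groups)]


def split_into_groups(channels, groups):
    """Split a flat channel list into the contiguous groups a grouped layer would use."""
    per_group = len(channels) // groups
    return [channels[g * per_group:(g + 1) * per_group] for g in range(groups)]


# Hypothetical example: 2 groups with 3 channels each, labelled by source group.
labels = ["a0", "a1", "a2", "b0", "b1", "b2"]
shuffled = channel_shuffle(labels, groups=2)
print(shuffled)                        # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
print(split_into_groups(shuffled, 2))  # [['a0', 'b0', 'a1'], ['b1', 'a2', 'b2']]
```

After the shuffle, each group that the next grouped convolution reads contains channels from both source groups, so information is no longer confined to one group. Because the shuffle is a fixed permutation of channels, the backward pass only has to send each gradient back to the position it came from.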
To restate the side effect: the output from a certain channel is only derived from a small fraction of the input channels, and the solution is shuffling after each layer. As you can guess, in order to train a neural network, everything you use in the network should be differentiable, because you want to differentiate your loss and propagate the gradients backward through the network. Channel shuffling is also differentiable, so you can train your network: you can store the shuffling permutation, and in this manner you can route the gradients back. So this is the main idea of ShuffleNet.

As in most implementations, most networks have units, or modules, or small blocks, and ShuffleNet also has a block. This one is the bottleneck unit. This is the input tensor: they apply a 1 by 1 convolution first, then a depthwise 3 by 3 convolution, then a 1 by 1 convolution again, and they just add a skip connection here, and this is the output tensor. And this is another version, with pointwise group convolution. Here they apply group convolution on the 1 by 1 convolution layer, so they have a 1 by 1 convolution and they split it into groups; you can use two or three or whatever, anything is okay. Then they shuffle the channels, do the depthwise convolution, and then 1 by 1 again. This is the main idea of ShuffleNet.

Let me recap. They use pointwise convolution, and they also apply group convolution in the pointwise convolutions. In this manner they take advantage of group convolution, and by shuffling the channels they overcome its main side effect. So this is the main idea of ShuffleNet.
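As a rough check on what grouping the 1 by 1 convolutions buys, the sketch below counts multiply-adds for the two blocks described above. It is a back-of-the-envelope estimate under stated assumptions: stride 1, the same spatial size throughout, and no bias, normalization, activation, or skip-addition costs. The function names and the channel sizes in the example are hypothetical and chosen only for illustration.

```python
def conv_madds(c_in, c_out, k, h, w, groups=1):
    """Multiply-adds of a k x k convolution producing an h x w output map.

    With `groups` groups, each output channel only reads c_in / groups inputs.
    """
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("channel counts must be divisible by groups")
    return h * w * k * k * (c_in // groups) * c_out


def bottleneck_unit_madds(c, m, h, w):
    """1x1 conv (c -> m), 3x3 depthwise conv (m), 1x1 conv (m -> c)."""
    return (conv_madds(c, m, 1, h, w)
            + conv_madds(m, m, 3, h, w, groups=m)  # depthwise: one group per channel
            + conv_madds(m, c, 1, h, w))


def shuffle_unit_madds(c, m, h, w, groups):
    """1x1 group conv, channel shuffle, 3x3 depthwise conv, 1x1 group conv."""
    return (conv_madds(c, m, 1, h, w, groups=groups)
            # The channel shuffle is a pure reordering: no multiply-adds.
            + conv_madds(m, m, 3, h, w, groups=m)
            + conv_madds(m, c, 1, h, w, groups=groups))


# Hypothetical sizes: 240 input channels, 60 bottleneck channels, 28 x 28 maps.
plain = bottleneck_unit_madds(c=240, m=60, h=28, w=28)
grouped = shuffle_unit_madds(c=240, m=60, h=28, w=28, groups=3)
print(plain, grouped, round(plain / grouped, 2))
```

Under these assumptions the depthwise 3 by 3 term is the same in both blocks; only the two 1 by 1 terms shrink, each by a factor equal to the number of groups.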