Transcript:

Okay, so with that, we have seen the overall idea behind GANs. You have the generator and you have the discriminator, and both of them are neural networks. You have this minimax loss function, where one part is with respect to theta and the other part is with respect to phi, and you alternate between these two objectives for theta and phi. Okay. Now, what I have not told you is this: I just said that the generator is a complex transformation, which is a neural network. But what is this neural network in practice? Do you use a feed-forward neural network or a convolutional neural network? That's what we look at in this module: what's the architecture used for the generator, and what's the architecture used for the discriminator? Again, there are various choices here. I'm going to talk about the most popular one, which works very well: something known as deep convolutional GANs (DCGANs).
The good thing is that for the discriminator you can use any CNN architecture that you want. Pick whatever is your favorite architecture, VGG, ResNet, whatever, and just use it as the discriminator. What would be the output layer of the discriminator? How many neurons will it have? One. There is no softmax, as we would need for ImageNet; there will just be one neuron that gives you a value between 0 and 1. Is that fine? Okay. So the discriminator will just be any convolutional neural network that you like, any popular architecture, with one output which tells you whether the input is real or fake.
But for the generator, things are not so straightforward. People experimented a lot with various architectures, and this paper, which is cited here, came up with a set of heuristics for what works well for the generator. This is the network which they tried, so let's look at it. You have this noise vector, which is 100-dimensional. From here, you need to go to, say, a 64 × 64 × 3 output, right?
This is an image of 64 × 64 with 3 channels. So this is your input and this is your output. How do you go from the input to the output? You use a series of operations, so let's understand each of these. The first thing that you do, see what is said here, is project and reshape. What you're going to do is take this 100-dimensional vector and multiply it by an appropriate W so that you get an output of size 1024 × 4 × 4. That's the project part. So what's the size of W going to be? You have this as the input and this as the output: this is R^d and this is R^n. What's the dimension of W? d × n, right? Here d is 100 and n is 1024 × 4 × 4. Is that fine? But that will just give you an n-dimensional vector again. So what you do is take that vector and reshape it into 4 × 4 × 1024. Is that fine? Everyone gets this?
Okay. Now what you do is apply something known as transpose convolution. Remember, we have to go from 4 × 4 × 1024 to 64 × 64 × 3, so we actually have to increase the size, whereas all the convolutional neural networks that we have seen start with 64 × 64 and come down to a smaller size. So we have to do the reverse operation now, and for that you use something known as the transpose convolution. So what's the transpose convolution? Or, in other words, tell me: how will you do this operation that takes you from 4 × 4 to 8 × 8? I'll give you some hints. The operation is called transpose convolution. You know how to take the transpose of a matrix. You also know how to treat a convolutional neural network as a feed-forward neural network where a lot of entries in the weight matrix are 0. Recall the first figure we saw for convolutional neural networks, where we took this image, converted it into 16 inputs (4 × 4), showed the second hidden layer as a complete feed-forward neural network, and then removed some edges from it, right?
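The project-and-reshape step described above can be sketched in a few lines of NumPy. This is only an illustration of the shapes involved: the names `z`, `W`, `projected`, and `volume` are made up for the example, and `W` is filled with random numbers here, whereas in a real generator it would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 100              # size of the noise vector
n = 1024 * 4 * 4     # size after projection (16384)

z = rng.standard_normal(d)                # noise vector in R^d
W = rng.standard_normal((d, n)) * 0.02    # projection matrix of size d x n (random stand-in for learned weights)

projected = z @ W                         # "project": still a flat vector, now of length n
volume = projected.reshape(4, 4, 1024)    # "reshape": a 4 x 4 grid with 1024 channels

print(W.shape, projected.shape, volume.shape)
# (100, 16384) (16384,) (4, 4, 1024)
```

The reshape does not change any values; it only reinterprets the 16384 numbers as a small 4 × 4 feature volume that the transpose convolutions can then grow.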
Do you remember that figure vaguely? Of course you do. Okay, so in the interest of time, I'll just tell you what I mean by this. You can always think of a convolutional neural network as a feed-forward neural network with sparse connections. So this guy would only be connected to these two, for example, this guy would be connected to these two, and so on. But the other way of saying this is that all of these are actually connected; it just happens that these dotted lines, which I am showing, have zero weights. Is that fine? Then I can write this as a normal weight matrix, which goes from an n-dimensional input to a d-dimensional output. Is that fine?
Now, what do I actually want to do? I want to go from a d-dimensional input to an n-dimensional output. What kind of transformation can I use? W transpose. I take the d-dimensional input, apply W transpose to it, and I get an n-dimensional output. Does that make sense? Everyone gets that? So that's the inefficient but intuitive way of understanding it. In practice, you will have APIs in TensorFlow and so on which implement this transpose convolution much more efficiently. If we do it the way I described, we are going to do a lot of multiplications by zero, which does not make sense: you know that the result is going to be zero, but you are still doing those multiplications. So it will not actually be implemented like that. But the way to understand a transpose convolution is that you treat the convolution as a matrix multiplication, a very sparse matrix multiplication, and just take the transpose of that matrix to go from d to n. Everyone gets this piece? Raise your hands if you get this. Okay, good.
So that's what these inverse, sorry, transpose convolution operations are, and that's how you grow from 4 × 4 to 64 × 64. But how do you reduce the depth from 1024 to 3? How will you do that?
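The matrix view of transpose convolution described above can be made concrete with a small NumPy sketch. It assumes a 4 × 4 input (16 values, the n of the lecture), a 3 × 3 kernel, stride 1, and no padding, so the convolution output is 2 × 2 (4 values, the d of the lecture). The helper `conv_matrix` is a hypothetical name written for this example; it builds the explicit sparse matrix exactly in the inefficient way the lecture warns about, purely for understanding.

```python
import numpy as np

def conv_matrix(kernel, in_size):
    """Build the matrix C such that C @ x.ravel() equals a stride-1,
    no-padding 2D convolution (cross-correlation, as deep learning
    libraries compute it) of an in_size x in_size input with the kernel."""
    k = kernel.shape[0]
    out_size = in_size - k + 1
    C = np.zeros((out_size * out_size, in_size * in_size))
    for i in range(out_size):
        for j in range(out_size):
            row = i * out_size + j                  # one row per output pixel
            for a in range(k):
                for b in range(k):
                    col = (i + a) * in_size + (j + b)
                    C[row, col] = kernel[a, b]      # entries outside the window stay 0
    return C

kernel = np.arange(1.0, 10.0).reshape(3, 3)
x = np.arange(16.0).reshape(4, 4)

C = conv_matrix(kernel, in_size=4)            # shape (4, 16), with structural zeros
y = (C @ x.ravel()).reshape(2, 2)             # convolution: 16 inputs -> 4 outputs

# Check the matrix form against a direct sliding-window computation.
direct = np.array([[np.sum(x[i:i + 3, j:j + 3] * kernel) for j in range(2)]
                   for i in range(2)])
assert np.allclose(y, direct)

up = (C.T @ y.ravel()).reshape(4, 4)          # transpose convolution: 4 inputs -> 16 outputs
print(C.shape, y.shape, up.shape)
# (4, 16) (2, 2) (4, 4)
```

Note that `C.T` only restores the shape of the input, not its values: the transpose is not an inverse, so `up` is generally different from `x`. In a generator the entries of the kernel are learned, so this is not a limitation.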
How did you increase the depth from 3 to 1024 in a normal convolutional neural network? By increasing the number of feature maps, that is, the number of filters. So now I will keep decreasing the number of filters. Originally you had 1024; in the next layer you'll just have 512 filters, and so on, till you get to three. Does that make sense? Okay. So this is what the generator looks like. This is a standard architecture for DCGANs, and it works very well in practice. The discriminator, as I said, can be any convolutional neural network.
And here are some straightforward guidelines for having stable deep convolutional GANs. If you go online and read a lot of the literature about GANs, one constant complaint is that it's very hard to train them. The training is not very stable. You start getting NaNs, the discriminator learns very fast but the generator lags behind, and all sorts of things happen. That's why there's a lot of emphasis on coming up with stable architectures which learn well. DCGAN was one such interesting and important contribution, where they came up with an architecture which works very well, and here are the standard guidelines that they gave.
For the discriminator, you do not use any pooling layers, and instead you use strided convolutions. What does that mean? What does a pooling layer do? Compression. What does striding do? Again, compression. Because when you apply a stride, you are not going over every pixel, so you're going to get a smaller output. So instead of doing max pooling, you do strided convolution. And for the discriminator, sorry, for the generator, use fractionally strided convolutions. What does that mean? That's the transpose convolution operation that we just spoke about. For the generator, you will have to use this transpose convolution operation. Okay. Use batch norm.
That is for both the generator and the discriminator. Remove fully connected layers from deeper architectures. Use ReLU as the activation function in the generator, except for the output, which is going to be tanh, and use leaky ReLU for the discriminator. It all looks very much like black magic: use this, use that, and so on. But this has come out of a lot of experimentation, and these are the configurations, or the choices, which led to the most stable training. So you could just take this off the shelf and try to implement it, and it should work well. Okay.
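To see how these guidelines fit together with the generator described earlier, here is a small shape walk-through in plain Python. Several things in it are assumptions and not from the lecture: the 4 × 4 kernels with stride 2 and padding 1, the intermediate widths 256 and 128 (which simply continue the halving from 1024 and 512), and the choice to mirror the generator for the discriminator, which the lecture says can be any CNN. The two helper functions use the standard output-size formulas for strided and transpose convolutions.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride, padding):
    """Spatial output size of a transpose (fractionally strided) convolution."""
    return (size - 1) * stride - 2 * padding + kernel

# Assumed hyperparameters for illustration: 4x4 kernel, stride 2, padding 1.
K, S, P = 4, 2, 1

# Generator: start from the reshaped 4 x 4 x 1024 volume and grow to 64 x 64 x 3,
# halving the number of filters at each transpose convolution.
gen_channels = [1024, 512, 256, 128, 3]
size = 4
gen_shapes = [(size, size, gen_channels[0])]
for ch in gen_channels[1:]:
    size = conv_transpose_out(size, K, S, P)
    gen_shapes.append((size, size, ch))

# Discriminator (assumed mirror image): strided convolutions instead of pooling.
disc_channels = [3, 128, 256, 512, 1024]
size = 64
disc_shapes = [(size, size, disc_channels[0])]
for ch in disc_channels[1:]:
    size = conv_out(size, K, S, P)
    disc_shapes.append((size, size, ch))

# Activations follow the guidelines: ReLU in the generator except tanh at the
# output, leaky ReLU in the discriminator.
for i, shape in enumerate(gen_shapes[1:], start=1):
    act = "tanh" if i == len(gen_shapes) - 1 else "ReLU"
    print("G transpose-conv", i, shape, act)
for i, shape in enumerate(disc_shapes[1:], start=1):
    print("D strided conv", i, shape, "leaky ReLU")
```

With these settings each transpose convolution doubles the spatial size (4, 8, 16, 32, 64) and each strided convolution halves it, so no pooling layer is needed. A real discriminator would end with the single real-or-fake output discussed earlier, which is left out of this sketch.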