Transcript:

I’m from Cornell University. The paper I’m presenting is Densely Connected Convolutional Networks, and this is joint work with Zhuang Liu, Laurens van der Maaten, and Kilian Weinberger. We were fortunate to win the Best Paper Award this year.

The topic is an architectural design for convolutional networks. ConvNets were proposed about 30 years ago, and many changes have been made to the architecture since then, making it more and more powerful, but until two years ago the basic connectivity pattern remained almost unchanged, like in this picture: each layer receives input from its previous layer and generates features for the layer right after it. Two years ago, the ResNet architecture was proposed. It introduces identity connections which bypass signals across layers, and this greatly promotes gradient propagation in the network; ResNet has achieved unprecedented results on many challenging computer vision tasks.

In this paper, we propose a new connectivity pattern, which we call dense connectivity. The idea is quite simple: in this architecture, we connect every two layers in the network, so each layer receives signals from all its preceding layers. The inputs are combined by channel-wise concatenation, which I will talk about in detail later. In this network, since each layer has direct access to its preceding layers, there is little information bottleneck, so we can actually make each layer much thinner and obtain a much more compact model. This gives high computational efficiency and parameter efficiency. Each layer generates k feature maps, and we call k the growth rate here; k is generally quite small.

Now, let’s take a look at how the features are generated and used in a DenseNet. At the beginning, we have the original input
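As an aside, the dense connectivity pattern just described can be sketched in a few lines of plain Python. This is my own illustration, not code from the paper: feature maps are stood in for by lists of named channels, and each toy "layer" simply records how wide its concatenated input was and emits k new channels.

```python
# Dense connectivity: every layer receives the channel-wise concatenation
# of the original input and all preceding layers' outputs.
def dense_forward(x0, layers):
    features = [x0]                                       # outputs produced so far
    for layer in layers:
        concat = [ch for f in features for ch in f]       # channel-wise concat
        features.append(layer(concat))
    return features

# Toy setup: 3 input channels, growth rate k = 2, and stand-in "layers"
# that record their input width and emit k new named channels.
k = 2
input_widths = []

def make_layer(index):
    def layer(concat):
        input_widths.append(len(concat))
        return [f"layer{index}_ch{j}" for j in range(k)]
    return layer

x0 = ["in0", "in1", "in2"]
feats = dense_forward(x0, [make_layer(i) for i in range(1, 4)])
# Layer l sees 3 + (l - 1) * k channels, i.e. widths 3, 5, 7 here.
```

The point of the sketch is the growth pattern: each layer adds only k channels, so layers stay thin even though every layer sees everything before it.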
x0, and we pass it through the first layer, which corresponds to a nonlinear transformation H1, and we obtain the features x1. Now we concatenate x0 and x1 and use the second layer to generate the output of layer 2, then we concatenate all the features obtained so far and use the third layer to obtain new features, and we keep doing this until the end of the network. Each of these nonlinear transformations corresponds to a batch normalization layer, followed by a rectified linear unit layer and a convolution layer with kernel size 3 by 3.

As you may have noticed, since we keep concatenating features in the network, the input to deeper layers will become very wide, and this may introduce too much computation for deeper layers. To address this problem, we found that using a relatively cheap convolution with kernel size 1 by 1 to first reduce the number of channels to 4k can greatly improve the parameter efficiency and also the computational efficiency.

Here is the full picture of a DenseNet. As in a normal ConvNet, we have pooling layers or strided convolutions which downsample the feature maps, and this makes the concatenation operation infeasible. To address this problem, we simply split the network into multiple dense blocks, and within each block the feature maps have the same size, so they can be easily concatenated.

Now, you may wonder: we already have the amazing ResNet architecture, so why bother with a new network connectivity? Here we summarize several prominent advantages of DenseNet. The first advantage is that in a DenseNet, the error signal can be propagated to earlier layers more directly, so this is a kind of implicit deep supervision, as earlier layers can get more direct supervision from the final classification layer. The second advantage is that DenseNets
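The channel bookkeeping inside one dense block can be made concrete with a small sketch. The numbers below (k0 = 24 input channels, growth rate k = 12, 6 layers) are illustrative choices of mine, not a configuration quoted from the paper: layer l’s input has k0 + (l − 1)·k channels, the optional 1-by-1 bottleneck first reduces that to 4k channels, and the 3-by-3 convolution then emits k new feature maps.

```python
# Channel widths through one dense block (illustrative arithmetic only).
def block_channel_widths(k0, k, num_layers, bottleneck=True):
    rows = []                 # (input_width, width_after_1x1, new_channels)
    width = k0
    for _ in range(num_layers):
        mid = 4 * k if bottleneck else width   # 1x1 bottleneck target
        rows.append((width, mid, k))
        width += k            # concatenate the k new feature maps
    return rows

rows = block_channel_widths(k0=24, k=12, num_layers=6)
# First layer: (24, 48, 12); last layer's input width: 24 + 5 * 12 = 84.
```

This shows why the bottleneck pays off: without it, the 3-by-3 convolutions in deep layers would operate on an ever-widening input, while with it they always see a fixed 4k channels.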
tend to have higher parameter and computational efficiency. For example, in each convolutional layer of a normal ConvNet, the number of parameters is proportional to C times C, where C is the layer width, or the number of channels produced at each layer. In a DenseNet, however, the number of parameters is proportional to l times k times k, where l is the layer index and k is the growth rate. Usually we have k much smaller than C, so the number of parameters in each layer of a DenseNet is really much smaller than in a normal ConvNet. Another advantage is that in a DenseNet, the features fed to each layer are a concatenation of features from all preceding layers, and these tend to be more diversified and to have richer patterns. The third advantage is that a DenseNet maintains low-complexity features across the network. In a standard ConvNet, the final classifier builds on top of the last convolutional layer, which produces the most high-level features and also the most complex features, because it composes many nonlinear transformations. In a DenseNet, the classifier depends on features from all complexity levels, and it uses both complex features and also simple features. This intuitively tends to give smoother decision boundaries, and this really gives high generalization performance. This probably explains why DenseNet works especially well when the training data is insufficient.

Now, let’s take a look at how DenseNet performs in practice. We first ran DenseNet on the CIFAR-10 dataset, which is a classification dataset with 10 classes. Here we first show the results of ResNet: the 110-layer ResNet gets a 6.41% error on this dataset, and the 1001-layer ResNet gets a 4.62% error. Then, with a small DenseNet with 100 layers and only 0.8 million parameters, we were able to get comparable performance to the 1001-layer ResNet, but using about one tenth of its number of parameters, and a larger DenseNet gives a significantly lower test error than the previous state of the art. The best result shown here was the previous state of the art at the time.

If we train the same models on the same dataset but without using data augmentation, we can see that ResNet overfits to the training data severely, so both ResNet models get higher than 10 percent error; DenseNet, however, is able to get 5.9 and 5.2 percent test error without using any data augmentation. On the CIFAR-100 dataset the trend is consistent: the small DenseNet is able to get comparable performance to the much larger ResNet, and the larger DenseNet gets state-of-the-art performance. On the larger-scale ImageNet classification dataset, DenseNet is able to get similar performance to ResNet, but using less than half the number of parameters and about half the amount of computation. We recently trained a 264-layer DenseNet, which gets a 20.27% top-1 error on ImageNet.

Finally, let me briefly talk about a recent work built on top of DenseNet, which is called the Multi-Scale DenseNet, designed for faster inference at test time. The network learns multi-scale features at each layer, and it uses dense connections at each scale. Most importantly, it introduces multiple classifiers attached to intermediate features. During inference, we first pass a test image to the first classifier, and we check the confidence level, which is the maximum of the softmax prediction.
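The C·C versus l·k·k comparison above can be made concrete with back-of-the-envelope numbers. The widths below (C = 256, k = 12, layer index l = 10) are made-up illustrative values of mine, not the paper’s exact configurations; only the 3-by-3 convolution weights are counted, ignoring biases and batch-norm parameters.

```python
# Weight count of a single 3x3 convolution (biases and BN ignored).
def conv3x3_params(c_in, c_out):
    return 3 * 3 * c_in * c_out

C = 256   # width of a plain ConvNet layer (illustrative)
k = 12    # DenseNet growth rate, much smaller than C

plain_layer = conv3x3_params(C, C)            # ~ C * C weights
dense_layer_10 = conv3x3_params(10 * k, k)    # layer l = 10: ~ l * k * k weights

# plain_layer = 589824, dense_layer_10 = 12960: roughly 45x fewer weights.
```

Because k stays small while C is typically in the hundreds, each DenseNet layer is dramatically cheaper, which is what allows the network to be both deep and compact.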
If the confidence is less than a pre-given threshold, we evaluate the second classifier, and we keep evaluating until we get a confidence larger than the predefined threshold; for the rest of the features and the rest of the classifiers, we can skip the computation. So during inference, easier images can exit from earlier classifiers, and only hard examples are evaluated by the expensive classifiers. This gives us about 3.6 times faster inference than ResNet, and about 1.3 times faster than DenseNet. If you are interested in using DenseNets in your project, we have released all the code and models on GitHub, and there are many third-party implementations. We also recently wrote a technical report on how to implement DenseNet in a more memory-efficient way, and the technical report will be on arXiv, maybe later today. You are welcome to come to our poster. Thank you very much.
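The anytime-inference rule just described can be sketched in plain Python. The classifiers and probability values below are stand-ins of mine, not the Multi-Scale DenseNet code: evaluate the classifiers in order and stop as soon as the maximum predicted probability clears the threshold.

```python
# Early-exit cascade: run classifiers in order; exit as soon as the top
# class probability reaches the confidence threshold.
def early_exit_predict(x, classifiers, threshold):
    probs = None
    for i, clf in enumerate(classifiers):
        probs = clf(x)                          # list of class probabilities
        if max(probs) >= threshold:             # confident enough: exit here
            return probs.index(max(probs)), i
    # Never confident: fall back to the last classifier's prediction.
    return probs.index(max(probs)), len(classifiers) - 1

# Toy cascade: the cheap first classifier is unsure, the second is confident.
cheap = lambda x: [0.40, 0.35, 0.25]
expensive = lambda x: [0.90, 0.05, 0.05]
pred, exit_idx = early_exit_predict("image", [cheap, expensive], threshold=0.8)
# pred == 0, exit_idx == 1 (this input needed the second classifier)
```

With an easier input, the first classifier would already clear the threshold and the later, more expensive stages would be skipped entirely, which is where the inference speedup comes from.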