WHT

Wasserstein Distance | Introduction To The Wasserstein Distance

Applied Algebraic Topology Network

Subscribe Here

Likes

206

Views

4,348

Introduction To The Wasserstein Distance

Transcript:

Hello, everybody! I wanted to give an introduction to the wasserstein distance. The vasserstein distance is used in applied topology for a variety of things distances between persistence diagrams or to define metric, some pressure thickenings. But today I just want to give an overview of the fossil sign distance completely separate from its its uses in applied topology in particular, so there are many names for the vaser sign distance. It’s also called the Kanterovich Rubinstein or the optimal transport or the Earth mover’s distance. And let me give you just a intuitive introduction to begin. So pretend I have three functions. F G and H. These are real valued functions defined on the real line. Now, one type of function distance might be the supernor’m distance, so I could consider the Supernorm distance between F and G all right, and to figure out the supernorm distance. You find the position in the domain where the values of F and G differ by as much as possible, and you see how much they differ by there at that spot. So maybe right here is a spot, You know where the values of F and G differ by as much as possible, you know, about the height of this black vertical line, so you can see that in the supernorm distance F and G differ by about that much and F and H differ by about the same amount, the height of that peak, so using the supernorm distances, the distance between F and G is about the same as the distance between F and H We’d like to have, however, some notion of in what sense is G closer to F than H is to f, right. If you if you sort of allow yourself to translate the function horizontally, right, you can get from F to G much more quickly than you can get from F to H. And that’s that’s what the wazer sign distance will rigorously encode for us. So, by contrast, using the waserstein distance, we’ll talk mostly about the one waserstein distance in this video, the waserstein or one waserstein distance between F and G is actually much smaller than the one was or sign distance between F and H okay and again. Intuitively you can see that. If I wanted to move f to get G, I wouldn’t have to move it as much as if I wanted to move F to get H another intuitive thing that I wanted to talk about, which is really important for why people care about Wasserstein distance is what do Geodesics look like in the space of functions under say the supernor’m metric or under this, the wazerstein distance, so a geodesic is a shortest path, so let’s consider the the geodesic or shortest path from this function F to this function H using using the soup, norm. What does a geodesic look like you start out with all of f and then you just take pointwise linear combinations of F and H to get all the way from F to H. So you start out just at F and then at, you know, um, one quarter of the way there where you are is three quarters of F plus one quarter of H and then once you’re halfway there, you have this function, which is half of F plus half of H and then once you’re three quarters of the way there you have this function, Which is, you know one. Quarter of F, plus three quarters of H and finally at the event. Very end. You get two, just all of h so sure we’ve translated from F to H, but we’ve done so in this way, where the function has sort of shrunk on one side and increased on the other and the shape of the function has really been distorted. By contrast in the waserstein distance was the geodesic between F and H. Okay, you just started F as you must, and then one quarter of the way there you’re about at G and then one half of the way there you’re at this function and then three quarters of the way there you’re at this function and finally you get to h so as you can tell G does six and the waserstein space can really preserve the shape of a function much more faithfully. Okay, so I want to build up to the definition of the waserstein distance between two functions. I’m remaining quite intuitive and the Western sign distance is really a distance between probability measures not functions so so really, you should think of these functions here as probability density functions for measures, All right, so I was being a little bit colloquial when talking about functions, really. The waser sign distance is a distance between measures. Let’s start not with measures with infinite support, but with measures with finite support. So I have two measures here. There are measures defined on the plane they both, uh, have finite support, so the blue measure has support. Which is these three points? The red measure has support, which is these four points and then the label on the point tells you how much mass there is at each point. So the waserstein distance. The one waserstein distance between these two measures is going to be determined by an optimal transport plan. I have these blue piles of dirt and the numbers. Tell me how much mass I have in each pile, and I have these red piles of dirt and my optimal transport plan is gonna is gonna move the dirt from the blue piles to the red piles in a way that requires as little work as possible so to actually move the blue to the red. I’ll move all 0.3 units from this blue point to this red point. I have an extra, Um, let’s see and then I’ll move. Um, 0.1 units here to to complete this red pile labeled 0.4 All right, at this pile. I’ll move another 0.1 units from blue to red and then remaining in this blue pile. I’ve still have just 0.2 units left, so I’ll move that much to this red pile, and then finally I’ll take this blue pile and split it 0.1 and 0.2 All right, so as you can see, I’ve, I’ve taken the blue piles and I’ve split them up and I’ve transported ported them to nearby. Red piles. Let’s make this a little bit more rigorous, So I’ve given names to my points. My red points are x1 x2 x3 and x4 and my blue points are y1 y2 and y3 and lets. Um, lets. Do write down this transport plan as I do it. So from Point X one, I move, I wanna move point three units to Y one, and I wanna move the remaining 0.1 unit to y2 So that’s why I put 0.3 and 0.1 and these entries in my matrix. That’s how much mass I’m using. I’m moving from x1 to y1 and from x1 to y2 And then I’m not moving any mass from x1 to y3 x2 I let’s see all right, x2 We decided to move. Um, all 0.1 units to y2 so I have zeros above and below and then x3 I’m going to move 0.2 units to y2 and 0.1 units to y3 and no units to y1 You see that, so that’s why I put a 0 here and then Lastly x4 I’m going to move all 0.2 units to y3 So this matrix that I’ve drawn in black. This is what’s called a transport plan. I’m telling you! How much mass are you moving from each X point to each each y point? Um, because all of the mass needs to be accounted for right that the sum of this column needs to be 0.4 and the sum of this column needs to be 0.1 and the sum of this column needs to be 0.3 and the sum of this column needs to be 0.2 as it is and because all of the blue mass needs to be counted for the sum of this row needs to be 0.3 The sum of this row needs to be 0.4 and the sum of this Rho needs to be 0.3 Okay, so a transport plan indeed has to move all of the red mass to the Blue Mass. This is just one possible transport plan. You could write down other transport plans or maybe I do move some mass, you know, from x3 to y1 you look over all possible transport plans when you’re trying to find the water sign distance and you, you choose the transport plan of the lowest possible cost, so the cost of a transport plan is as follows, so I’m looking at this red measure. It’s this this weighted sum of direct Delta masses. I’m looking at this blue measure, also a weighted sum of direct delta masses and the one waserstein distance between these two measures is the minimum cost of all possible transport plans. The pi ijs are just the entries of this matrix. Okay, so as I vary, I and J right. The pi ijs are read off by the labels in this matrix. The cost of the transport plan is given right here, so for every amount of dirt that’s being transported. The price you have to pay is the distance between the points that you’re moving. So the price that I have to pay to move. This dirt is 0.3 because that’s. How much dirt I’m I’m moving times. The distance that that dirt is moving and then the the amount of the price I have to pay here is 0.2 the amount of dirt I’m moving times this distance in green. So it’s like the concept of work coming from physics. You know, it’s force times distance is how much you have to pay. This right here is just saying that I indeed have a transport plan, so every entry in my transport plan is not negative because I can’t. I’m not allowed to transport a negative amount of dirt and the marginals in the, um, in one direction have to recover the blue measure, so that’s? What I already talked about when you take these row sums, you have to get the blue measure back and this formula right here. Is that’s saying that when you take the column sums you have to get The red measure back. So a transport plan is really a joint probability distribution, whose marginals are given by your two measures. I think that’s a nice way to think about it. In summary, the one was or signed distance between two measures is the minimum overall transport plans of the cost of the transport plan. Here’s the cost and a transport plan. What is it? It’s a joint probability distribution whose marginals are given by your two measures? Okay, so that’s how you should think about the water sun distance between Finitely supported measures. I’ve written down here. The formula for arbitrary measures, potentially of infinite support, so Mu and NU are two measures defined on a metric space capital M. And I’m defining the waserstein distance between these two measures, it’s the infimum over all possible transport plans, and then you, you integrate the the distance between points that are being matched times the amount of mass that’s moving from one point to another, so Pi is again A joint probability measure, right, you can see, it’s a joint probability measure. It’s it’s, um, it’s defined on M Cross M. Not just on M. But its marginals. When you integrate out in one factor or another, give you you your two input measures Mu and NU that you’re trying to find the distance between. So, for this particular example, you know, what would your transport plan? Look like this mass. You would match here, you know, and and maybe this mask gets mapped here and this mass might get mapped there and this mass gets mapped here. You can see that, you know. Mass on the left of Mu is going to get matched with mass on the right of sorry, Mass on the left side of of Mu is going to be matched with mass on the left side of NU and mass on the right side of Mu is going to be matched with mass on the right side of new and the the total cost or the water sign distance is going to be as follows. You look at this optimal transport plan and you multiply. How much mass was I moving times? How far did I move it? And you integrate that over all of the matched mass. Thanks so much. That was just a brief introduction to the Wasserstein distance. Here’s the definition for, uh, the distance between measures with arbitrary support, perhaps infinite or finite support, and here’s a more hands-on definition for the wazer sign distance between two measures of finite support, you should think of the waserstein distance as matching mass and and moving mass from one configuration of piles to another and is quite useful in applications. You can get a nicer notion of distance where here the probability density function G is closer to f than the probability density function. H is, and you can get nicer. G D six and the was or sign distance. When you transform f into H, you sort of just slide F over as opposed to when the, uh, with a super norm distance when you transpose after h, you really distort the shape of the function. Thanks so much and I hope you enjoyed my introduction to the weiserstein distance you.

Wide Resnet | Wide Resnet Explained!

Transcript: [MUSIC] This video will explain the wide residual networks otherwise known as wide ResNet and then frequently abbreviated in papers as wrn - the number of layers - the widening factor, so the headline from this paper is that a simple 16 layer wide. Resnet...

read more