Transcript:

Hello, welcome to the AI Coffee Break! Today, Ms. Coffee Bean has prepared the definitive introduction to Graph Neural Networks! Or GNNs in short. But Ms. Coffee Bean, are you not a little over-confident to say “definitive”? Phsssht! I just wonder if… psshht! Ok, ehm, here we go: I always struggled with Graph Neural Networks in machine learning classes. But one day, I could not get around them anymore and had to learn them! They are such a huge topic in machine learning for a very obvious reason: graphs are everywhere! And not just here artistically represented in this stock image. Let’s take for example air travel. Flights from origin to destination with perhaps a stop in the middle, can be represented as a graph, where airlines try to maximize the number of passengers transported by trying to minimize fuel, time and other costs. Navigation systems on smaller scales, on street maps, also do graph optimization for getting you to your destination. Other examples of graphs are personal connections or graphs that social media apps live of. Or even text is also a graph if we think of it not as a sequence, but if we analyze it linguistically in terms of the dependency parse, for example. Images are graphs too in multiple senses: We make one kind of graph interpretation of images when we represent each pixel as a graph node and each pixel is connected with an edge to its neighbors. This is a very regular kind of graph, a grid. Another, more semantic graph interpretation of an image, can be in terms of the scene graph, where objects and entities are nodes, connected by their relation in the image. One more example and we get finally to Graph Neural Networks: In NLP people are interested in the knowledge graphs comprising facts about the world, like these true facts here. Well, I hope you are now convinced now how ubiquitous graphs are, so of course we need a deep neural model to handle these structures, right? Well, graph neural networks are the deep and neural solution to this. They come in many shapes and flavors, but we want here to stay simple and explain only the Graph Convolutional Neural Networks. If you understand this one, you will also understand the basics of the others architectures, because they work based on the same principles. And Graph Convolutional Neural Nets are not hard, you just have to apply this formula iteratively and you are done. Wait what? Well if that’s it, let’s break it down to every detail: First, let’s talk about the nodes. They are denoted here with h, where h is a vector representation of the nodes. These vectors are chosen depending on the application: They can be word vectors when working with knowledge bases, they can be pixel values when working with images. They can be a combination of image features and word vectors if working with scene graphs! In Graph Neural Networks, the idea is that we have initial vector representations for our problem. But we can do even better, if we take the neighbors and relations into account. So in principle, we update our initial vectors using information in the graph structure and get vectors that represent the problem better. So the key is not in the initial representation of the nodes, but in how one aggregates the representations of the neighbors in order to get an even better representation than the original. This is why we apply this formula recursively, to get another, better vector h at each time step. Only the question now, is how do we exactly get this better representation? First, when computing the new representation, we want to keep something from the old one. For example when working with a knowledge base that truthfully says that Ms. Coffee Bean is a star, when computing a new vector for her, now knowing that she is a star, we still want to keep the knowledge of her identity. This is why, we take the old representation and multiply it by a weight matrix, deciding how this previous knowledge should be transformed. But how do we know this weight matrix? Well, we don’t! We let a neural network decide for it! We train it on our given data of node labels, edge labels or other predictions from the graph structure. When training with these objectives, the neural network learns to make right predictions given the new representations, so it better should learn the weight matrix that helps to get this new representation. This is how we decide on what and how to keep from our original representation. Now, we also want to aggregate the information from the neighbors. So we take their representation and also let a neural network decide how to transform it and what to keep from it for being able to solve the given task, for example node label prediction. And here is where the magic of the graph convolutional neural network happens: In the aggregation function. By this sum we say that we sum over all transformed neighbor representations. By c_ij we mean that we normalize the vectors differently for each neighbor. But this is not really important, because crucial here, is that the normalizing sum here is a permutation-invariant aggregation function. A whaat now? Well, this means that however we aggregate the information from the neighbors, the way we do this has to be insensitive to order. Because there is no intrinsic ordering to a graph, how would we ever decide if we take node 4 or node 2 first? So we don’t decide, because we take a permutation-invariant function, delivering the same result given any order. Another example for a permutation-invariant function would be the product or the maximum function, because they do not depend on argument order. And because we aggregate the information from the nodes, similarly to how we do in CNNs (convolutional neural networks), this is then called the convolutional graph neural network. Of course, the name “convolution” is misleading, because the convolution in images is not permutation-invariant (you can check it), but still, it is a valid operation if we regard images as graphs, because images are a special type of graphs, where the order is given. So yes, the name convolution is misleading, but it is how it is! And here we are: If we apply this formula iteratively, at each timestep we learn a better graph representation comprising node information also from neighbors and their neighbors. These new informed and refined representations should be then suitable to be used for the new task of, for example link prediction, node classification or even entire graph classification and many other tasks. GNNs have been successfully applied in the medical domain for example, to predict side-effects due to drug interactions. Or in numerous graph optimization problems, like the travelling salesman problem. Or for various applications in knowledge graph prediction. They can be even used for predicting object trajectories from video. These applications are just a few from a very big list. Where there is a graph (so everywhere), there can be a graph neural network solving the problem. So this was it, we hope that now you have a better understanding of Graph Neural Networks and are curious enough to go out and explore the topic of GNNs by yourself! Hey, do not forget to like and subscribe! See you!