Transcript:

We have introduced RNNs as architectures to learn features of time-varying processes. We now define graph recurrent neural networks (GRNNs) as particular cases in which the signals at each point in time are supported on a graph. To be more precise, consider a time-varying process x_t in which each of the signals observed at each point in time is supported on a common graph S. In the figure, we show three instances of graph signals observed at times t-2, t-1, and t. The figures are variation diagrams where the edges represent changes in signal values. The graph that supports the signals is the same at all times. A graph recurrent neural network combines a graph neural network, because the signals x_t are supported on a graph, and a recurrent neural network, because x_t is a sequence.
To define a GRNN, we begin by recalling the definition of an RNN. The component of an RNN that is different from usual neural networks is a hidden state z_t that is updated according to the perceptron z_t = σ(A x_t + B z_{t-1}). We present here a block diagram that is more modular than the one we introduced earlier. In this diagram, the observable state x_t is fed to a linear block, where it is multiplied by the matrix A. The hidden state z_{t-1} is fed to a separate linear block, where it is multiplied by the matrix B. The outputs of these two blocks are summed and processed with a pointwise nonlinearity σ to produce the state update z_t. This update is fed back as an input to the linear block B, where it will be processed in the next iteration to compute the updated hidden state z_{t+1}.
The RNN involves a second perceptron, this one processing the hidden state z_t to produce the output estimate ŷ_t. This perceptron composes the multiplication of the hidden state z_t by a matrix C with a pointwise nonlinearity σ.
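The two perceptrons recalled above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `rnn_step`, the dimensions, the random matrices, and the choice of tanh for σ are not from the lecture.

```python
import numpy as np

def rnn_step(A, B, C, x_t, z_prev, sigma=np.tanh):
    """One RNN iteration: hidden state update, then output estimate."""
    # Hidden state perceptron: z_t = sigma(A x_t + B z_{t-1})
    z_t = sigma(A @ x_t + B @ z_prev)
    # Output perceptron: y_hat_t = sigma(C z_t)
    y_hat_t = sigma(C @ z_t)
    return z_t, y_hat_t

# Hypothetical sizes: observed state of dimension 4, hidden state of dimension 3.
rng = np.random.default_rng(0)
n, h = 4, 3
A = rng.standard_normal((h, n))
B = rng.standard_normal((h, h))
C = rng.standard_normal((n, h))

z = np.zeros(h)              # initial hidden state (assumed to be zero)
outputs = []
for x_t in rng.standard_normal((5, n)):   # a sequence of 5 observations
    z, y_hat = rnn_step(A, B, C, x_t, z)  # z is fed back in the next iteration
    outputs.append(y_hat)
```

The loop makes the feedback in the block diagram explicit: the hidden state returned by one call is the `z_prev` argument of the next.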
In this video, we are interested in situations where the observed state x_t and the output y_t that we are trying to estimate are graph signals supported on a common shift operator S. We are therefore going to require that the hidden state z_t also be a graph signal supported on the same graph shift operator S. This requirement is not necessary but, as is easy to foresee, requiring the hidden state z_t to be a graph signal allows for the use of graph filters. This is likely to lead to architectures that are permutation equivariant and retain the stability and transferability properties of graph filters and conventional GNNs. To complete the definition of a GRNN, we therefore require that the linear operations defined by the matrices A, B, and C be graph filters.
We start by specifying the update of the hidden state as one in which the hidden state and the observed state are propagated through graph filters. The matrix A is then parametrically defined in terms of the shift operator S and is, furthermore, given by the familiar polynomial form. The coefficients of the filter are denoted as a_k. This is the filter that we use to process the current observed state x_t. The matrix B is defined analogously. It is parametric on the shift operator. More concretely, it is a polynomial in the shift operator, the coefficients of which are denoted as b_k. The outputs of these two filters are added up and the result is processed with a pointwise nonlinearity σ. This produces the hidden state update z_t. The updated hidden state is fed back to become an input to the graph filter with coefficients b_k in the next iteration. Observe that in this architecture the blocks are all the same blocks that appear in the corresponding part of an RNN. The only difference is the use of graph filters in the blocks where a general RNN utilizes generic linear transformations.
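A minimal sketch of this hidden state update, assuming NumPy, a 4-node cycle graph as the shift operator, tanh as σ, and arbitrary filter coefficients (the names `graph_filter` and `grnn_state_update` and all numerical values are illustrative, not from the lecture):

```python
import numpy as np

def graph_filter(S, coeffs, x):
    """Apply the polynomial filter sum_k coeffs[k] * S^k to the graph signal x."""
    out = np.zeros_like(x, dtype=float)
    shifted = x.astype(float)        # S^0 x
    for c in coeffs:
        out += c * shifted           # add the term c_k S^k x
        shifted = S @ shifted        # advance to S^(k+1) x
    return out

def grnn_state_update(S, a, b, x_t, z_prev, sigma=np.tanh):
    """Hidden state update: sigma(A(S) x_t + B(S) z_{t-1})."""
    return sigma(graph_filter(S, a, x_t) + graph_filter(S, b, z_prev))

# Adjacency matrix of a 4-node cycle, used here as the shift operator S.
S = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
a = [0.5, 0.3, 0.1]                  # coefficients a_k (made up)
b = [0.2, 0.4, 0.1]                  # coefficients b_k (made up)
x_t = np.array([1.0, -1.0, 2.0, 0.0])
z_prev = np.array([0.1, 0.0, -0.2, 0.3])
z_t = grnn_state_update(S, a, b, x_t, z_prev)

# Relabel the nodes with a permutation matrix P and repeat the update.
P = np.eye(4)[[2, 0, 3, 1]]
z_perm = grnn_state_update(P @ S @ P.T, a, b, P @ x_t, P @ z_prev)
```

In this sketch, `z_perm` matches `P @ z_t` up to floating-point error, which illustrates the permutation equivariance that motivates using graph filters for A and B: relabeling the nodes relabels the hidden state in the same way.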
For future reference, we write the state update as the composition of a pointwise nonlinearity σ with the addition of the graph filter A(S) applied to the observed state x_t and the filter B(S) applied to the previous hidden state z_{t-1}.
To estimate the output y_t, the hidden state z_t is propagated through a graph filter as well. That is, the generic linear transformation C is required to be a graph filter. This is a familiar polynomial on the shift operator S, modulated with coefficients that we denote as c_k. Thus, to estimate the output y_t, we multiply the hidden state z_t by a graph filter and process the output with a pointwise nonlinearity. The output is our estimate ŷ_t. For future reference, we write the output prediction as the composition of a pointwise nonlinearity σ with the graph filter C(S) applied to the hidden state z_t. This is a graph perceptron with coefficients c_k applied to the hidden state z_t. A GRNN is then made up of a hidden state perceptron along with an output prediction perceptron.
In our definitions, we write all of the graph signals as single-feature signals, which are processed with single-input single-output filters. Each of the filters that make up the GRNN can be replaced by a MIMO filter. Doing so yields a GRNN with multiple features. This means that we end up with a hidden state update in which the matrix graph signal Z_t is the result of applying the pointwise nonlinearity σ to the sum of two MIMO graph filters. One of these MIMO graph filters processes the observable state X_t. This is a matrix graph signal. The other MIMO graph filter processes the hidden state Z_{t-1}. This is also a matrix graph signal. The respective filter coefficients are the matrices A_k and B_k. Observe that in this architecture the matrices B_k have to be square. We also end up with a prediction in which we estimate a matrix graph signal as an output.
This is the result of applying a pointwise nonlinearity to the output of a MIMO graph filter whose input is the hidden state Z_t and whose coefficients are the matrices C_k. The main advantage that follows from the use of MIMO graph filters is that the hidden state Z_t can have larger dimensionality than the observed state. In the language of graph signals, the hidden state Z_t can have more features at each node than the observed state X_t.
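The multiple-feature GRNN can be sketched as follows. This is a hedged illustration, assuming NumPy, tanh as σ, a random symmetric shift operator, and the common convention that a MIMO graph filter computes the sum over k of S^k X H_k; the names `mimo_graph_filter` and `grnn_step` and all sizes are hypothetical.

```python
import numpy as np

def mimo_graph_filter(S, coeffs, X):
    """Compute sum_k S^k X coeffs[k].

    X is (nodes, F_in); each coeffs[k] is an (F_in, F_out) matrix.
    """
    out = np.zeros((X.shape[0], coeffs[0].shape[1]))
    shifted = X.astype(float)          # S^0 X
    for Hk in coeffs:
        out += shifted @ Hk            # mix features with the k-th coefficient matrix
        shifted = S @ shifted          # advance to S^(k+1) X
    return out

def grnn_step(S, A, B, C, X_t, Z_prev, sigma=np.tanh):
    """One GRNN iteration with MIMO graph filters A_k, B_k, C_k."""
    Z_t = sigma(mimo_graph_filter(S, A, X_t) + mimo_graph_filter(S, B, Z_prev))
    Y_hat_t = sigma(mimo_graph_filter(S, C, Z_t))
    return Z_t, Y_hat_t

# Hypothetical sizes: 5 nodes, 1 observed feature, 4 hidden features,
# 1 output feature, 3 filter taps.
rng = np.random.default_rng(0)
n, F, H, G, K = 5, 1, 4, 1, 3
S = rng.random((n, n))
S = (S + S.T) / 2                      # symmetric shift operator
np.fill_diagonal(S, 0.0)

A = [rng.standard_normal((F, H)) for _ in range(K)]   # observed -> hidden
B = [rng.standard_normal((H, H)) for _ in range(K)]   # hidden -> hidden (square)
C = [rng.standard_normal((H, G)) for _ in range(K)]   # hidden -> output

Z = np.zeros((n, H))                   # initial hidden state (assumed to be zero)
for X_t in rng.standard_normal((6, n, F)):            # 6 time steps
    Z, Y_hat = grnn_step(S, A, B, C, X_t, Z)
```

Here the hidden state carries 4 features per node while the observed state carries only 1. The matrices B_k must be square because they map the hidden features at time t-1 to the hidden features at time t.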