GNN: Notes on Graph Network Fundamentals

Please do not reprint this article without permission.

Introduction

Thanks to their performance on non-Euclidean data, graph neural network (GNN) methods are attracting growing attention from researchers. Traditional deep neural networks take Euclidean-structured data as input, which is one of the reasons for their excellent performance in computer vision and other fields. In real life, however, much data is non-Euclidean, such as social networks, retail networks, and biological networks. In brain neuroinformatics, the author's field, the most commonly used method of brain image analysis is voxel-based morphometry. Different regions of the human brain, however, are usually correlated and interact with one another, and a brain network built on these relationships can reveal higher-level mechanisms of brain activity. Like other network-topology data, a brain network is usually represented as a connection matrix, which cannot be vectorized in any straightforward way and fed into a machine learning model. Graph network analysis methods break this deadlock.

Theory

What Is a Graph?

A graph is a common data structure used to represent objects and the relationships among them. Objects are represented by nodes (also called vertices), and their relationships are described by edges. Mathematically, a graph is written as $G=(V,E,A,X)$, where $V=\{v_1,v_2,\dots,v_n\}$ is the set of nodes, $E=\{e_{ij}\}$ is the set of edges, and $A$ is the adjacency matrix of size $|V|\times|V|$ that describes the connections between nodes: if $e_{ij}\in E$, then $A_{ij}=1$, and $A_{ij}=0$ otherwise. $X$ is the feature matrix of size $|V|\times d$, whose $i$-th row $X_i$ holds the attributes of the $i$-th node, and $d$ is the dimension of the attributes.
Reference: 从数据结构到算法:图网络方法初探 (From Data Structures to Algorithms: A First Look at Graph Network Methods)
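
As a minimal sketch of this definition, the snippet below builds the adjacency matrix A and the feature matrix X for a small undirected graph with NumPy. The toy edge list, the feature dimension, and the random features are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical toy graph: 4 nodes, undirected edges given as (i, j) index pairs.
num_nodes = 4
edges = [(0, 1), (0, 2), (2, 3)]

# Adjacency matrix A of size |V| x |V|: A[i, j] = 1 if e_ij is in E, else 0.
A = np.zeros((num_nodes, num_nodes), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected graph, so A is symmetric

# Feature matrix X of size |V| x d: row i holds the d attributes of node i.
# Random values stand in for real node attributes here.
d = 3
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(num_nodes, d))

# Row sums of A give the degree (number of neighbors) of each node.
degree = A.sum(axis=1)

print(A)
print(X.shape)  # (4, 3)
print(degree)   # [2 1 2 1]
```

For a directed graph, the symmetric assignment `A[j, i] = 1` would simply be dropped.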

Simply put, a graph is an abstract, irregular data structure that can describe and model complex systems. Unlike Euclidean data, real-world graphs usually have complex topology and very large size. Traditional graph analysis methods struggle to reach the level of performance that machine learning has achieved in applications such as computer vision, while existing machine learning algorithms cannot be applied directly to graph data. How to combine machine learning with graph analysis, capture the dependencies between the nodes of a graph, and mine the information they contain has therefore become a hot topic in machine learning.

Notations of GNN

Notations of GNN

Theory of GCN

Generally, before data is fed into a machine learning model, it is processed to extract valuable features. This improves not only the quality of the input data but also the reliability and performance of the model. The process is called feature engineering. Because the quality of feature engineering directly determines model performance, data mining research has focused on hand-designing valuable features for specific kinds of data. Neuroimaging data, for example, often contains a lot of noise and has very high resolution, which makes it unsuitable as direct input to a machine learning model. The data is therefore preprocessed, and the corresponding feature vectors are computed before being fed into the analysis model.
Deep learning is essentially a kind of feature engineering, more often called feature learning. Its general idea is to transform raw data into higher-level features through a neural network, which acts as a nonlinear transformation model; these features usually take the form of a vector that can serve as the input to a classifier. The graph convolutional network (GCN) discussed in this section represents the nodes and edges of a graph as feature vectors that can serve as input to a high-performance machine learning model. Embedding graph nodes into a low-dimensional Euclidean space in this way is also called graph embedding.

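The generalization of convolution to irregular data is usually described as neighborhood aggregation or message passing. The process can be pictured in three steps:
Step 1: Send. Each node transforms its own feature information and sends it to its neighbors. This step extracts and transforms the node's feature information.

Step 1 SEND

Step 2: Receive. Each node gathers the feature information sent by its neighbors. This step fuses the node's local structural information.
Step 2 RECEIVE

Step 3: Transform. The gathered information is passed through a nonlinear transformation, which increases the expressive power of the model.
Step 3 TRANSFORM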

Reference: 浅析图卷积神经网络 (A Brief Analysis of Graph Convolutional Neural Networks)
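
The three steps can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `message_passing_layer` is hypothetical, a single weight matrix is shared by all nodes, neighbor messages are averaged (with a self-loop so each node keeps its own message), and ReLU is used as the nonlinearity. Other aggregation and activation choices are equally possible.

```python
import numpy as np

def message_passing_layer(A, H, W):
    """One send / receive / transform round on a graph.

    A: (n, n) adjacency matrix
    H: (n, d_in) node feature matrix
    W: (d_in, d_out) weight matrix shared by all nodes (an assumption)
    """
    # Step 1 (send): every node transforms its own features into a message.
    messages = H @ W
    # Step 2 (receive): every node averages the messages of its neighbors.
    # Self-loops are added so a node also receives its own message.
    A_hat = A + np.eye(A.shape[0])
    degree = A_hat.sum(axis=1, keepdims=True)
    aggregated = (A_hat @ messages) / degree
    # Step 3 (transform): apply a nonlinearity (ReLU here).
    return np.maximum(aggregated, 0.0)

# Toy graph: node 0 is connected to nodes 1 and 2.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.eye(2)  # identity weights keep the arithmetic easy to follow

H_next = message_passing_layer(A, H, W)
print(H_next)
```

With identity weights, node 0 ends up with the average of all three feature rows, while nodes 1 and 2 each average their own features with those of node 0.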

Graph Convolution Operator (Source: //www.greatytc.com/p/89fbed65cd04?winzoom=1)

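The formula above defines the graph convolution operator. Let $i$ be the center node: $h^{l}_{i}$ is the feature representation of node $i$ at layer $l$; $c_{ij}$ is a normalization factor, for example the reciprocal of the node degree; $N_{i}$ is the set of neighbors of node $i$, including $i$ itself; $R_{i}$ is the type of node $i$; $W^{l}_{R_j}$ is the transformation weight for nodes of type $R_j$; and $\sigma$ is the activation function.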
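
Written out in LaTeX, an operator consistent with these symbol definitions would take the following form. This is a reconstruction from the description only: it assumes that $c_{ij}$ enters as a multiplicative factor (as it would if it is the reciprocal of the node degree), that features are column vectors, and that the result is the representation at layer $l+1$.

$$
h^{l+1}_{i} = \sigma\left( \sum_{j \in N_{i}} c_{ij} \, W^{l}_{R_j} \, h^{l}_{j} \right)
$$

Read from the inside out, the product $W^{l}_{R_j} h^{l}_{j}$ corresponds to the send step, the normalized sum over $N_{i}$ to the receive step, and $\sigma$ to the transform step.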

Applications and Open-Source Implementations

In neuroinformatics, the main application of graph neural networks is graph classification. After a functional brain network is constructed and features are attached to its nodes, a GCN learns high-level features of the network; a fully connected layer then extracts vectorized features, or global average pooling (GAP) is used directly to output class confidences. The class can be sex (e.g. Graph Saliency Maps through Spectral Convolutional Networks: Application to Sex Classification with Brain Connectivity) or disease group. Most open-source graph deep learning frameworks currently focus on link classification and node classification, and support for graph classification is comparatively limited. pytorch_geometric is a deep learning framework that supports a variety of GNN applications; in particular, its support for the paper An End-to-End Deep Learning Architecture for Graph Classification lets it perform graph convolution and output feature vectors ready for subsequent learning and classification.

  • Graph Saliency Maps provides the implementation of an activation-based visual attribution method for irregular graphs, which works integrated with graph convolutional neural networks (GCNs). The method has been validated via a sex classification task using functional brain connectivity networks (paper);
  • SGCN is a Siamese Graph Convolution Network for learning multi-view brain network embedding;
  • pytorch_geometric is a geometric deep learning extension library for PyTorch. It consists of various methods for deep learning on graphs and other irregular structures, also known as geometric deep learning, from a variety of published papers. In addition, it provides an easy-to-use mini-batch loader for many small graphs and for single giant graphs, multi-GPU support, a large number of common benchmark datasets (with simple interfaces for creating your own), and helpful transforms, both for learning on arbitrary graphs and on 3D meshes or point clouds.
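
To make the graph classification pipeline described in this section concrete, here is a minimal NumPy sketch: two graph convolution layers, global average pooling over the nodes, and a fully connected output layer with softmax. It does not use the pytorch_geometric API. The function names, the layer sizes, the random untrained weights, and the randomly generated "brain network" are all hypothetical and serve only to show the data flow.

```python
import numpy as np

def gcn_layer(A, H, W):
    """Mean-aggregating graph convolution with self-loops and ReLU."""
    A_hat = A + np.eye(A.shape[0])
    degree = A_hat.sum(axis=1, keepdims=True)
    return np.maximum((A_hat @ H @ W) / degree, 0.0)

def classify_graph(A, X, W1, W2, W_out):
    """Return class confidences for one graph (e.g. one brain network)."""
    H = gcn_layer(A, X, W1)            # node-level features, layer 1
    H = gcn_layer(A, H, W2)            # node-level features, layer 2
    graph_vector = H.mean(axis=0)      # global average pooling over nodes
    logits = graph_vector @ W_out      # fully connected output layer
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()             # softmax gives class confidences

rng = np.random.default_rng(seed=0)
n_regions, n_features, n_hidden, n_classes = 6, 4, 8, 2

# Hypothetical brain network: a random symmetric connectivity matrix,
# thresholded into a binary adjacency matrix without self-connections.
conn = rng.random((n_regions, n_regions))
conn = (conn + conn.T) / 2
A = (conn > 0.5).astype(float)
np.fill_diagonal(A, 0.0)

X = rng.normal(size=(n_regions, n_features))   # per-region features
W1 = rng.normal(size=(n_features, n_hidden))   # untrained weights
W2 = rng.normal(size=(n_hidden, n_hidden))
W_out = rng.normal(size=(n_hidden, n_classes))

probs = classify_graph(A, X, W1, W2, W_out)
print(probs)  # one confidence per class, summing to 1
```

Because the pooling step averages over nodes, the output does not depend on the order in which brain regions are listed, which is the property a graph-level classifier needs. In practice the weights would be learned by backpropagation in a framework such as pytorch_geometric.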
