
[TPAMI 2022] HGNN+: General Hypergraph Neural Networks

Paper link: HGNN+: General Hypergraph Neural Networks | IEEE Journals & Magazine | IEEE Xplore

The English is typed entirely by hand! It is my own summarizing and paraphrasing of the original paper, so some spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome. This post is written as reading notes, so take it with a grain of salt.

目录

1. Takeaways

2. Section-by-Section Reading

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.3.1. Graph Neural Networks

2.3.2. Hypergraph Learning

2.4. Preliminaries of Hypergraphs

2.5. The Framework of Hypergraph Neural Network HGNN+

2.5.1. Hypergraph Modeling

2.5.2. Hypergraph Convolution

2.6. Discussions

2.6.1. Hypergraph vs. Graph

2.6.2. HGNN/HGNN+ vs. GNN

2.6.3. HGNN vs. HGNN+

2.7. Experiments and Discussions

2.7.1. Experimental Settings

2.7.2. Vertex Classification on the Data With Graph Structure

2.7.3. Vertex Classification on the Data Without Graph Structure

2.7.4. Vertex Classification on the Data With Hypergraph Structure

2.7.5. Visualization

2.8. THU-DeepHypergraph: An Open Toolbox of the HGNN+ Framework

2.9. Conclusion

3. Reference

1. Takeaways

(1) A classic that is still worth reading

2. Section-by-Section Reading

2.1. Abstract

        ①Limitation of GNNs: feature extraction on multi-modal/multi-type data

        ②Limitation of existing HGNNs: the weights are shared across modalities/types

        ③So, they propose HGNN+ and the THU-DeepHypergraph toolbox

2.2. Introduction

        ①High-order relationships:

        ②Framework of HGNN+:

        ③The concepts of graph and hypergraph:

        ④Hyperedge groups: pairwise edges, k-hop neighbors, attributes, and feature-based neighbors

        ⑤Advantages of HGNN+: not only adaptive hyperedge-group aggregation, but also an extension to directed hypergraphs

2.3. Related Work

2.3.1. Graph Neural Networks

        ①Lists spectral-based and spatial-based GNNs

2.3.2. Hypergraph Learning

        ①Lists the development of hypergraph learning and hypergraph neural networks

2.4. Preliminaries of Hypergraphs

        ①Notations:

for a hypergraph \mathcal{G}=(\mathcal{V},\mathcal{E},\mathbf{W}), the |\mathcal{V}|\times|\mathcal{E}| incidence matrix \mathbf{H} can be written as:

\mathbf{H}(v,e)=\left\{ \begin{array} {lr}1, & \mathrm{if~}v\in e \\ 0, & \mathrm{otherwise} \end{array}\right.

and the initial node features are \mathbf{X}^0=\{x_1^0,x_2^0,\cdots,x_N^0\},x_i^0\in\mathbb{R}^{C_0}, where C_0 denotes the feature dimension
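As a small illustration, here is a numpy sketch (the toy hyperedge list is my own) of building such an incidence matrix:

```python
import numpy as np

# Toy data: 5 vertices, 3 hyperedges, each hyperedge a set of vertex indices.
hyperedges = [{0, 1, 2}, {1, 3}, {2, 3, 4}]
num_vertices = 5

# H(v, e) = 1 if v belongs to hyperedge e, else 0.
H = np.zeros((num_vertices, len(hyperedges)))
for e, vertices in enumerate(hyperedges):
    for v in vertices:
        H[v, e] = 1.0

print(H)  # 5 x 3 incidence matrix
```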

        ②Optimization goal of the node classification task:

\arg\min_f\left\{\mathcal{R}_{emp}(f)+\Omega(f)\right\}

where \Omega(f) is a regularizer on the hypergraph, \mathcal{R}_{emp}(f) denotes the supervised empirical loss, and f\left ( \cdot \right ) is the classification function

        ③Regularizer \Omega(f) can be represented by:

\Omega(f)=\frac{1}{2}\sum_{e\in\mathcal{E}}\sum_{\{u,v\}\in\mathcal{V}}\frac{w(e)\mathbf{H}(u,e)\mathbf{H}(v,e)}{\delta(e)}\left(\frac{f(u)}{\sqrt{d(u)}}-\frac{f(v)}{\sqrt{d(v)}}\right)^{2}
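This double sum can be checked against the equivalent quadratic form f^\top(\mathbf{I}-\mathbf{\Theta})f with \mathbf{\Theta}=\mathbf{D}_v^{-1/2}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^\top\mathbf{D}_v^{-1/2} (Zhou et al.'s normalized hypergraph Laplacian is \mathbf{\Delta}=\mathbf{I}-\mathbf{\Theta}); a minimal numpy sketch with toy data of my own:

```python
import numpy as np

H = np.array([[1., 0.], [1., 1.], [0., 1.], [1., 1.]])  # 4 vertices, 2 hyperedges
w = np.ones(H.shape[1])              # hyperedge weights w(e)
d = H @ w                            # vertex degrees d(v)
delta = H.sum(axis=0)                # hyperedge degrees delta(e)
f = np.array([0.3, -1.2, 0.7, 2.0])  # a signal on the vertices

# Direct evaluation of the double sum Omega(f).
omega = 0.5 * sum(
    w[e] * H[u, e] * H[v, e] / delta[e]
    * (f[u] / np.sqrt(d[u]) - f[v] / np.sqrt(d[v])) ** 2
    for e in range(H.shape[1])
    for u in range(H.shape[0])
    for v in range(H.shape[0])
)

# Equivalent quadratic form f' (I - Theta) f.
Theta = (np.diag(d ** -0.5) @ H @ np.diag(w) @ np.diag(1 / delta)
         @ H.T @ np.diag(d ** -0.5))
assert np.isclose(omega, f @ (np.eye(len(f)) - Theta) @ f)
```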

2.5. The Framework of Hypergraph Neural Network HGNN+

2.5.1. Hypergraph Modeling

(1)Hyperedge Group Generation

        ①When the data correlation comes with a graph structure:

(a top) pairwise edge

Similar to traditional graph:

\mathcal{E}_{\mathrm{pair}}=\{\{v_i,v_j\}\mid(v_i,v_j)\in\mathcal{E}_s\}

(a bottom) k-hop neighbor

The k-hop neighbor set is:

N_{\mathrm{hop}_k}(v)=\{u\mid\mathbf{A}_{uv}^k\neq0,\ u\in\mathcal{V}_s\}, where k\in\left[2,n_v\right]. The hyperedge group can then be represented by:

\mathcal{E}_{\mathrm{hop}_k}=\{N_{\mathrm{hop}_k}(v)\mid v\in\mathcal{V}\}
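A minimal numpy sketch (the helper names are my own) of generating both graph-structure hyperedge groups from an adjacency matrix:

```python
import numpy as np

def pairwise_hyperedges(A):
    """E_pair: one 2-vertex hyperedge per edge of the simple graph."""
    rows, cols = np.nonzero(np.triu(A, k=1))
    return [{int(u), int(v)} for u, v in zip(rows, cols)]

def k_hop_hyperedges(A, k):
    """E_hop_k: for each vertex v, one hyperedge {u | (A^k)_{uv} != 0}."""
    Ak = np.linalg.matrix_power(A, k)
    return [set(np.nonzero(Ak[:, v])[0].tolist()) for v in range(A.shape[0])]

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(pairwise_hyperedges(A))  # [{0, 1}, {1, 2}, {2, 3}]
print(k_hop_hyperedges(A, 2))  # 2-hop reachability hyperedges
```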

        ②When the data correlation has no graph structure:

(b top) attributes

Hyperedges are constructed from shared attributes (geo-locations, time, and other specific information). For example, if each vertex is a paper, all survey papers share one hyperedge, all research papers share another, all workshop papers share another, and so on. The hyperedge group is constructed as:

\mathcal{E}_\mathrm{attribute}=\{N_\mathrm{att}(a)\mid a\in\mathcal{A}\}

(b bottom) feature

Hyperedges from the k nearest neighbors or a distance threshold d in feature space:

\left\{ \begin{array} {l}\mathcal{E}_{\mathrm{feature}}^{\mathrm{KNN}_k}=\{N_{\mathrm{KNN}_k}(v)\mid v\in\mathcal{V}\} \\ \mathcal{E}_{\mathrm{feature}}^{\mathrm{distance}_d}=\{N_{\mathrm{dis}_d}(v)\mid v\in\mathcal{V}\} \end{array}\right.
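A minimal numpy sketch (toy data; brute-force distances) of the attribute-based and feature-based groups:

```python
import numpy as np

def attribute_hyperedges(attributes):
    """E_attribute: one hyperedge per attribute value, joining all vertices
    that share it (e.g. all survey papers end up in one hyperedge)."""
    groups = {}
    for v, a in enumerate(attributes):
        groups.setdefault(a, set()).add(v)
    return list(groups.values())

def knn_hyperedges(X, k):
    """E_feature^{KNN_k}: each vertex plus its k nearest feature-space
    neighbors forms one hyperedge (brute-force Euclidean distances)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return [set(np.argsort(dist[v])[:k + 1].tolist()) for v in range(len(X))]

attrs = ["survey", "research", "survey", "workshop"]
X = np.random.rand(4, 8)
print(attribute_hyperedges(attrs))  # [{0, 2}, {1}, {3}]
print(knn_hyperedges(X, k=2))
```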

(2)Combination of Hyperedge Groups

        ①Coequal fusion of each hyperedge group:

\mathbf{H}=\mathbf{H}_1\|\mathbf{H}_2\|\cdots\|\mathbf{H}_K

        ②Adaptive fusion:

\left\{ \begin{array} {l}\mathbf{w}_{k}=\operatorname{copy}(\operatorname{sigmoid}(w_{k}),M_{k}) \\ \mathbf{W}=\operatorname{diag}(\mathbf{w}_{1}^{1},\cdots,\mathbf{w}_{1}^{M_{1}},\cdots,\mathbf{w}_{K}^{1},\cdots,\mathbf{w}_{K}^{M_{K}}) \\ \mathbf{H}=\mathbf{H}_{1}\|\mathbf{H}_{2}\|\cdots\|\mathbf{H}_{K} \end{array}\right.

where w_{k}\in\mathbb{R} denotes a trainable parameter shared by all hyperedges inside the specified hyperedge group k, \operatorname{copy}\left(a,b\right) returns a vector of size b filled with the value a, and \mathbf{W}\in\mathbb{R}^{M\times M} denotes the diagonal hyperedge weight matrix
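A minimal PyTorch sketch of this adaptive fusion (the module name is my own): each hyperedge group owns one trainable scalar, squashed by a sigmoid and copied to all of its M_k hyperedges, while the incidence matrices are concatenated column-wise:

```python
import torch

class AdaptiveFusion(torch.nn.Module):
    def __init__(self, num_groups):
        super().__init__()
        self.w = torch.nn.Parameter(torch.zeros(num_groups))  # one scalar per group

    def forward(self, incidences):  # list of |V| x M_k incidence matrices
        sizes = [Hk.shape[1] for Hk in incidences]
        weights = torch.cat([torch.sigmoid(wk).repeat(Mk)  # copy(sigmoid(w_k), M_k)
                             for wk, Mk in zip(self.w, sizes)])
        H = torch.cat(incidences, dim=1)                   # H = H_1 || H_2 || ... || H_K
        return H, torch.diag(weights)                      # W = diag(w_1^1, ..., w_K^{M_K})

H1 = torch.randint(0, 2, (5, 3)).float()
H2 = torch.randint(0, 2, (5, 4)).float()
H, W = AdaptiveFusion(num_groups=2)([H1, H2])
print(H.shape, W.shape)  # torch.Size([5, 7]) torch.Size([7, 7])
```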

2.5.2. Hypergraph Convolution

(1)Spectral Convolution on Hypergraph

        ①The spectral convolution of signal \mathbf{x} and filter \mathbf{g} can be denoted as:

\mathbf{g}\star\mathbf{x}=\mathbf{\Phi}((\mathbf{\Phi}^\top\mathbf{g})\odot(\mathbf{\Phi}^\top\mathbf{x}))=\mathbf{\Phi}g(\mathbf{\Lambda})\mathbf{\Phi}^\top\mathbf{x}

where \odot denotes the element-wise Hadamard product and g(\boldsymbol{\Lambda})=\mathrm{diag}(g(\lambda_1),\ldots,g(\lambda_n))

        ②Further simplify it with Chebyshev polynomials:

\mathbf{g}\star\mathbf{x}\approx\sum_{k=0}^K\theta_kT_k(\tilde{\mathbf{\Delta}})\mathbf{x}

        ③Truncate the Chebyshev expansion at K=1:

\mathbf{g}\star\mathbf{x}\approx\theta_0\mathbf{x}-\theta_1\mathbf{D}_v^{-1/2}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^\top\mathbf{D}_v^{-1/2}\mathbf{x}

        ④Reduce the parameters to avoid overfitting and simplify the convolution:

\left\{ \begin{array} {l}\theta_1=-\frac{1}{2}\theta \\ \theta_0=\frac{1}{2}\theta\mathbf{D}_v^{-1/2}\mathbf{H}\mathbf{D}_e^{-1}\mathbf{H}^\top\mathbf{D}_v^{-1/2} \end{array}\right.

\begin{gathered} \mathbf{g}\star\mathbf{x}\approx\frac{1}{2}\theta\mathbf{D}_v^{-1/2}\mathbf{H}(\mathbf{W}+\mathbf{I})\mathbf{D}_e^{-1}\mathbf{H}^\top\mathbf{D}_v^{-1/2}\mathbf{x} \\ \approx\theta\mathbf{D}_v^{-1/2}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^\top\mathbf{D}_v^{-1/2}\mathbf{x}, \end{gathered}
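A minimal PyTorch sketch of this final spectral layer as a single function (toy data of my own; \sigma taken as ReLU):

```python
import torch

def hgnn_conv(X, H, w, Theta):
    """One HGNN layer: sigma(Dv^{-1/2} H W De^{-1} H' Dv^{-1/2} X Theta)."""
    W = torch.diag(w)
    Dv_inv_sqrt = torch.diag((H @ w) ** -0.5)  # vertex degrees d(v) = sum_e w(e)H(v,e)
    De_inv = torch.diag(1.0 / H.sum(dim=0))    # hyperedge degrees delta(e)
    return torch.relu(Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt @ X @ Theta)

H = torch.tensor([[1., 0., 1.],
                  [1., 1., 0.],
                  [0., 1., 1.],
                  [1., 1., 1.],
                  [0., 0., 1.]])  # 5 vertices, 3 hyperedges
X = torch.rand(5, 16)             # 16-dim vertex features
out = hgnn_conv(X, H, torch.ones(3), Theta=torch.rand(16, 8))
print(out.shape)                  # torch.Size([5, 8])
```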

(2)General Spatial Convolution on Hypergraph

        ①Inter-neighbor relation:

N=\{(v,e)\mid\mathbf{H}(v,e)=1,\ v\in\mathcal{V}\ \mathrm{and}\ e\in\mathcal{E}\}

and the vertex inter-neighbor set of a hyperedge e is defined as:

\mathcal{N}_v(e)=\{v\mid vNe,\ v\in\mathcal{V}\ \mathrm{and}\ e\in\mathcal{E}\}

the hyperedge inter-neighbor set of a vertex v is defined as:

\mathcal{N}_e(v)=\{e\mid vNe,\ v\in\mathcal{V}\ \mathrm{and}\ e\in\mathcal{E}\}

        ②Spatial hypergraph convolution:

\begin{cases} m_\beta^t=M_v^t\left(\left\{x_\alpha^t\mid\alpha\in\mathcal{N}_v(\beta)\right\}\right) \\ y_\beta^{t+1}=U_e^t\left(w_\beta,m_\beta^t\right) \\ m_\alpha^{t+1}=M_e^t\left(x_\alpha^t,\left\{y_\beta^{t+1}\mid\beta\in\mathcal{N}_e(\alpha)\right\}\right) \\ x_\alpha^{t+1}=U_v^t\left(x_\alpha^t,m_\alpha^{t+1}\right) \end{cases}

where m_\beta^t is the message of hyperedge \beta\in\mathcal{E}, y_\beta^t is the feature of hyperedge \beta, an element of the hyperedge feature set Y^t=\{y_1^t,y_2^t,\cdots,y_M^t\},y_i^t\in\mathbb{R}^{C_t}, in layer t, and M_v^t(\cdot),U_e^t(\cdot),M_e^t(\cdot),U_v^t(\cdot) are the vertex message function, hyperedge update function, hyperedge message function and vertex update function of the t-th layer

(3)HGNN+ Convolution Layer Configurations

        ①Functions in HGNN+:

\begin{cases} M_v^{t}(x_{\alpha}^{t})=\frac{x_{\alpha}^{t}}{|\mathcal{N}_v(\beta)|} \\ U_e^{t}(w_{\beta},m_{\beta}^{t})=w_{\beta}\cdot m_{\beta}^{t} \\ M_e^{t}(x_{\alpha}^{t},y_{\beta}^{t+1})=\frac{y_{\beta}^{t+1}}{|\mathcal{N}_e(\alpha)|} \\ U_v^{t}(x_{\alpha}^{t},m_{\alpha}^{t+1})=\sigma(m_{\alpha}^{t+1}\cdot\mathbf{\Theta}^t) \end{cases}

        ②Hypergraph convolution in matrix form:

\mathbf{X}^{t+1}=\sigma(\mathbf{D}_v^{-1}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^\top\mathbf{X}^t\mathbf{\Theta}^t)
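To see that this matrix form really is the two-stage message passing of ①, here is a toy PyTorch check (degrees taken as the unweighted inter-neighbor counts; \sigma and \Theta omitted for clarity):

```python
import torch

H = torch.tensor([[1., 0., 1.],
                  [1., 1., 0.],
                  [0., 1., 1.],
                  [1., 1., 1.]])   # 4 vertices, 3 hyperedges
X = torch.rand(4, 8)
w = torch.tensor([0.5, 1.0, 2.0])  # hyperedge weights w(beta)

# Stage 1 (vertex -> hyperedge): each hyperedge averages its vertices,
# scaled by its weight: y_beta = w_beta * mean_{alpha in N_v(beta)} x_alpha.
Y = torch.stack([w[e] * X[H[:, e] > 0].mean(dim=0) for e in range(H.shape[1])])

# Stage 2 (hyperedge -> vertex): each vertex averages its incident hyperedges.
X_new = torch.stack([Y[H[v] > 0].mean(dim=0) for v in range(H.shape[0])])

Dv_inv = torch.diag(1.0 / H.sum(dim=1))  # 1/|N_e(v)|
De_inv = torch.diag(1.0 / H.sum(dim=0))  # 1/|N_v(e)|
assert torch.allclose(X_new, Dv_inv @ H @ torch.diag(w) @ De_inv @ H.T @ X)
```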

2.6. Discussions

2.6.1. Hypergraph vs. Graph

        ①"from the random walks’ aspect, a hypergraph with edge-independent vertex weights is equivalent to a weighted graph, and a hypergraph with edge-dependent vertex weights cannot be reduced to a weighted graph."

        ②For a hypergraph with edge-independent vertex weights, \mathbf{H}\in\{0,1\}^{|\mathcal{V}|\times|\mathcal{E}|} and \mathcal{G}_{\mathrm{in}}=\{\mathcal{V},\mathcal{E},\mathbf{W}\}.

        ③For a hypergraph with edge-dependent vertex weights, \mathbf{R}\in\mathbb{R}^{|\mathcal{V}|\times|\mathcal{E}|} stores the vertex weights \gamma_e(v) and \mathcal{G}_{\mathrm{de}}=\{\mathcal{V},\mathcal{E},\mathbf{W},\gamma\}. The vertex degree d\left ( v \right ) and hyperedge degree \delta \left ( e \right ) can be represented by:

\left\{ \begin{array} {ll}d(v) & =\sum_{\beta\in\mathcal{N}_e(v)}w(\beta) \\ \delta(e) & =\sum_{\alpha\in\mathcal{N}_v(e)}\gamma_e(\alpha) \end{array}\right.

        ④The paper then proves claim ① via random walks and Markov chains; the derivation is not transcribed here.

2.6.2. HGNN/HGNN+ vs. GNN

        ①For a simple hypergraph (a 2-uniform hypergraph, i.e., a traditional graph):

\begin{cases} \mathbf{HH}^\top =\mathbf{A}+\mathbf{D} \\ \mathbf{D}_e^{-1} =1/2\mathbf{I} \\ \mathbf{W} =\mathbf{I} \end{cases}

the HGNN convolution can be rewritten as:

\begin{aligned} \mathbf{X}^{t+1} & =\sigma(\mathbf{D}_v^{-1/2}\mathbf{H}\mathbf{W}\mathbf{D}_e^{-1}\mathbf{H}^\top\mathbf{D}_v^{-1/2}\mathbf{X}^t\boldsymbol{\Theta}^t) \\ & =\sigma\left(\mathbf{D}_v^{-1/2}\mathbf{H}\left(\frac{1}{2}\mathbf{I}\right)\mathbf{H}^\top\mathbf{D}_v^{-1/2}\mathbf{X}^t\boldsymbol{\Theta}^t\right) \\ & =\sigma\left(\frac{1}{2}\mathbf{D}^{-1/2}(\mathbf{A}+\mathbf{D})\mathbf{D}^{-1/2}\mathbf{X}^t\boldsymbol{\Theta}^t\right) \\ & =\sigma\left(\frac{1}{2}(\mathbf{I}+\mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2})\mathbf{X}^t\boldsymbol{\Theta}^t\right) \\ & =\sigma(\mathbf{D}^{-1/2}\hat{\mathbf{A}}\mathbf{D}^{-1/2}\mathbf{X}^t\hat{\boldsymbol{\Theta}}^t) \end{aligned}
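A quick numeric check of this reduction on a toy 2-uniform hypergraph (one 2-vertex hyperedge per graph edge, \mathbf{W}=\mathbf{I}; the toy graph is my own):

```python
import torch

A = torch.tensor([[0., 1., 1.],
                  [1., 0., 1.],
                  [1., 1., 0.]])  # triangle graph
edges = [(0, 1), (0, 2), (1, 2)]
H = torch.zeros(3, len(edges))
for e, (u, v) in enumerate(edges):
    H[u, e] = H[v, e] = 1.0       # 2-uniform hypergraph: H H' = A + D

D_inv_sqrt = torch.diag(A.sum(dim=1) ** -0.5)
Dv_inv_sqrt = torch.diag(H.sum(dim=1) ** -0.5)  # equals D_inv_sqrt here
De_inv = torch.diag(1.0 / H.sum(dim=0))         # De = 2I

hgnn = Dv_inv_sqrt @ H @ De_inv @ H.T @ Dv_inv_sqrt          # W = I
gcn_like = 0.5 * (torch.eye(3) + D_inv_sqrt @ A @ D_inv_sqrt)
assert torch.allclose(hgnn, gcn_like)
```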

        ②Uses rooted subtrees to compare the aggregation patterns of GNN and HGNN on a 2-uniform hypergraph:

2.6.3. HGNN vs. HGNN+

        ①Difference between the two convolutions:

\begin{cases} \mathbf{X^{t+1}}\xleftarrow{\mathrm{HGNNConv}}\sigma(\underline{\mathbf{D_v^{-1/2}HWD_e^{-1}H^\top D_v^{-1/2}}}\mathbf{X^t}\mathbf{\Theta^t}) \\ \mathbf{X^{t+1}}\xleftarrow{\mathrm{HGNNConv^+}}\sigma(\underline{\mathbf{D_v^{-1}HWD_e^{-1}H^\top}}\mathbf{X^t}\mathbf{\Theta^t}) & \end{cases}

The former (HGNNConv) is symmetric message passing on an undirected hypergraph, while the latter (HGNNConv+) is asymmetric message passing, still on an undirected hypergraph

        ②Definition of the incidence matrix of a directed hypergraph:

\hat{\mathbf{H}}(v,e)= \begin{cases} -1 & \mathrm{if~}v\in T(e) \\ 1 & \mathrm{if~}v\in S(e) \\ 0 & \mathrm{otherwise} \end{cases}

where T\left ( e \right ) and S\left ( e \right ) are the set of target and source vertices for hyperedge e

        ③Specific message passing of directed hyper graph:

\mathbf{X}^{t+1}=\sigma(\mathbf{D}_t^{-1}\hat{\mathbf{H}}_t\mathbf{W}\mathbf{D}_s^{-1}\hat{\mathbf{H}}_s^\top\mathbf{X}^t\mathbf{\Theta}^t)

where 

\begin{cases} \mathbf{D}_s=\operatorname{diag}(\operatorname{col\_sum}(\hat{\mathbf{H}}_s)) \\ \mathbf{D}_t=\operatorname{diag}(\operatorname{col\_sum}(\hat{\mathbf{H}}_t)) & \end{cases}
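A minimal sketch of this directed propagation (toy data; the degree conventions below are my own choice so that the dimensions work out, and \Theta is omitted; the paper's col_sum notation may differ):

```python
import torch

num_v, num_e = 4, 2
Hs = torch.zeros(num_v, num_e)  # Hs(v,e) = 1 iff v in S(e)
Ht = torch.zeros(num_v, num_e)  # Ht(v,e) = 1 iff v in T(e)
Hs[0, 0] = Hs[1, 0] = 1.0; Ht[2, 0] = 1.0  # hyperedge e0: {0,1} -> {2}
Hs[2, 1] = 1.0; Ht[3, 1] = 1.0             # hyperedge e1: {2} -> {3}

X = torch.rand(num_v, 8)
W = torch.eye(num_e)                                     # hyperedge weights
Ds_inv = torch.diag(1.0 / Hs.sum(dim=0))                 # source count per hyperedge
Dt_inv = torch.diag(1.0 / Ht.sum(dim=1).clamp(min=1.0))  # incoming count per vertex

# Messages flow from source vertices to hyperedges, then on to target vertices.
X_new = torch.relu(Dt_inv @ Ht @ W @ Ds_inv @ Hs.T @ X)
print(X_new.shape)  # torch.Size([4, 8])
```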

2.7. Experiments and Discussions

2.7.1. Experimental Settings

        ①Learning rate: 0.01 for the 3 citation network datasets and 0.001 for the 2 social media network datasets

        ②Dropout: 0.5

        ③Optimizer: Adam with 0.0005 weight decay

        ④Loss: cross-entropy (CE):

\mathcal{L}=-\sum_{v\in V}\sum_{c=1}^CY_{vc}\log(O_{vc})

2.7.2. Vertex Classification on the Data With Graph Structure

        ①Statistics of the datasets:

        ②Performance on the 5 datasets when 5 samples per category were used for training:

        ③Performance on the 5 datasets when 10 samples per category were used for training:

        ④Comparison on the effectiveness of different hyperedge groups with HGNN+:

        ⑤Comparison of HGNN and HGNN+:

        ⑥Convolution strategy comparison:

2.7.3. Vertex Classification on the Data Without Graph Structure

        ①3D object datasets: ModelNet40 and NTU

        ②Performance:

2.7.4. Vertex Classification on the Data With Hypergraph Structure

        ①Datasets: Cooking-200 and MovieLens2k-v2

        ②Performance:

2.7.5. Visualization

        ①t-SNE visualization on Cooking-200:

2.8. THU-DeepHypergraph: An Open Toolbox of the HGNN+ Framework

        ~

2.9. Conclusion

        ~

3. Reference

@ARTICLE{9795251,
  author={Gao, Yue and Feng, Yifan and Ji, Shuyi and Ji, Rongrong},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
  title={HGNN+: General Hypergraph Neural Networks}, 
  year={2023},
  volume={45},
  number={3},
  pages={3181-3199},
  keywords={Correlation;Convolution;Data models;Task analysis;Social networking (online);Mathematical models;Representation learning;Hypergraph;classification;hypergraph convolution;representation learning},
  doi={10.1109/TPAMI.2022.3182052}}

