Feature propagation bears similarity to label propagation (LP) [9]. The key difference is that LP is feature-agnostic: it predicts a class for each node directly by propagating the known labels through the graph. FP, on the other hand, first reconstructs the missing node features, which are then fed into a downstream GNN. This allows FP to leverage the observed features and to outperform LP on all benchmarks we experimented with. Moreover, in practice the set of nodes with known labels and the set of nodes with known features do not necessarily coincide, so the two approaches are not always directly comparable.
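To make the reconstruction step concrete, here is a minimal sketch of the diffusion-and-reset scheme underlying FP. It is an illustration under stated assumptions, not the reference implementation: we assume missing entries are zero-filled, `edge_index` contains both directions of each undirected edge, and the function and argument names (`feature_propagation`, `known_mask`, `num_iters`) are our own.

```python
import torch

def feature_propagation(x, edge_index, known_mask, num_iters=40):
    """Sketch of FP: diffuse features over the graph, repeatedly
    resetting the observed entries to their original values.

    x          -- [num_nodes, num_channels] features, missing entries zero-filled
    edge_index -- [2, num_edges] long tensor, both directions of each edge
    known_mask -- [num_nodes, num_channels] boolean mask of observed entries
    """
    num_nodes = x.size(0)
    row, col = edge_index

    # Symmetrically normalised adjacency weights: D^{-1/2} A D^{-1/2}
    deg = torch.zeros(num_nodes).scatter_add_(0, row, torch.ones(row.size(0)))
    deg_inv_sqrt = deg.clamp(min=1).pow(-0.5)  # clamp guards isolated nodes
    weight = deg_inv_sqrt[row] * deg_inv_sqrt[col]

    x_known = x[known_mask]  # remember the observed values
    for _ in range(num_iters):
        # One diffusion step: aggregate weighted features from neighbours
        out = torch.zeros_like(x)
        out.index_add_(0, col, weight.unsqueeze(-1) * x[row])
        x = out
        # Reset the observed entries so only missing ones keep evolving
        x[known_mask] = x_known
    return x
```

The reconstructed features can then be fed into any downstream GNN exactly as if they had been fully observed.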
We conducted an extensive experimental validation of FP on seven standard node-classification benchmarks, in which we randomly removed a variable fraction of the node features (independently for every channel). FP followed by a 2-layer GCN on the reconstructed features significantly outperformed both simple baselines and recent state-of-the-art methods [2-3]. Fig. 4 shows the plot for the Cora dataset (plots for all other datasets can be found in our paper [4]).
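For reference, the per-channel masking used in this setup can be reproduced with a small helper of the following kind; `drop_features` and its arguments are hypothetical names for illustration, not the experiment code itself:

```python
import torch

def drop_features(x, missing_rate, seed=0):
    """Hypothetical helper: drop each feature entry independently
    with probability `missing_rate`, uniformly across nodes and channels."""
    torch.manual_seed(seed)
    known_mask = torch.rand_like(x) >= missing_rate  # True = observed entry
    x_masked = x.clone()
    x_masked[~known_mask] = 0.0  # zero-fill missing entries before running FP
    return x_masked, known_mask

# Example: remove 99% of the features, then reconstruct them with FP
# x_masked, mask = drop_features(x, missing_rate=0.99)
# x_reconstructed = feature_propagation(x_masked, edge_index, mask)
```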
FP particularly shone in regimes with high rates of missing features (>90%), where all other methods tend to suffer. For example, even with 99% of the features missing, FP loses only around 4% relative accuracy on average compared to the same model with all features present.