HugeGraph-ML

HugeGraph-ML integrates HugeGraph with popular graph learning libraries, enabling end-to-end machine learning workflows directly on graph data.

Overview

hugegraph-ml provides a unified interface for applying graph neural networks and machine learning algorithms to data stored in HugeGraph. It removes the need for separate data export/import pipelines by converting HugeGraph data directly into DGL graph objects that the bundled models and tasks consume.

Key Features

  • Direct HugeGraph Integration: Query graph data directly from HugeGraph without manual exports
  • 21 Implemented Algorithms: Comprehensive coverage of node classification, graph classification, embedding, and link prediction
  • DGL Backend: Leverages Deep Graph Library (DGL) for efficient training
  • End-to-End Workflows: From data loading to model training and evaluation
  • Modular Tasks: Reusable task abstractions for common ML scenarios

Prerequisites

  • Python: 3.9+ (standalone module)
  • HugeGraph Server: 1.0+ (recommended: 1.5+)
  • UV Package Manager: 0.7+ (for dependency management)

Installation

1. Start HugeGraph Server

# Option 1: Docker (recommended)
docker run -itd --name=hugegraph -p 8080:8080 hugegraph/hugegraph

# Option 2: Binary packages
# See https://hugegraph.apache.org/docs/download/download/

2. Clone and Setup

git clone https://github.com/apache/incubator-hugegraph-ai.git
cd incubator-hugegraph-ai/hugegraph-ml

3. Install Dependencies

# uv sync automatically creates .venv and installs all dependencies
uv sync

# Activate virtual environment
source .venv/bin/activate

4. Navigate to Source Directory

cd ./src

Note: All examples assume you’re in the activated virtual environment.
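A quick way to confirm the environment before running the examples is a small standard-library check like the one below. This is only a sketch: the helper names `in_virtualenv` and `python_is_supported` are made up for illustration and are not part of hugegraph-ml. The minimum version is taken from the Prerequisites list.

```python
import sys

MIN_PYTHON = (3, 9)  # minimum version listed under Prerequisites


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def python_is_supported(version=None) -> bool:
    """Return True when the (major, minor) version meets MIN_PYTHON."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("Python version supported:", python_is_supported())
```

If the first line prints `False`, re-run `source .venv/bin/activate` from the hugegraph-ml directory.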

Implemented Algorithms

HugeGraph-ML currently implements 21 graph machine learning algorithms across multiple categories:

Node Classification (11 algorithms)

Predict labels for graph nodes based on network structure and features.

| Algorithm | Paper | Description |
| --- | --- | --- |
| GCN | Kipf & Welling, 2017 | Graph Convolutional Networks |
| GAT | Veličković et al., 2018 | Graph Attention Networks |
| GraphSAGE | Hamilton et al., 2017 | Inductive representation learning |
| APPNP | Klicpera et al., 2019 | Personalized PageRank propagation |
| AGNN | Thekumparampil et al., 2018 | Attention-based GNN |
| ARMA | Bianchi et al., 2019 | Autoregressive moving average filters |
| DAGNN | Liu et al., 2020 | Deep adaptive graph neural networks |
| DeeperGCN | Li et al., 2020 | Very deep GCN architectures |
| GRAND | Feng et al., 2020 | Graph random neural networks |
| JKNet | Xu et al., 2018 | Jumping knowledge networks |
| Cluster-GCN | Chiang et al., 2019 | Scalable GCN training via clustering |

Graph Classification (2 algorithms)

Classify entire graphs based on their structure and node features.

| Algorithm | Paper | Description |
| --- | --- | --- |
| DiffPool | Ying et al., 2018 | Differentiable graph pooling |
| GIN | Xu et al., 2019 | Graph isomorphism networks |

Graph Embedding (3 algorithms)

Learn unsupervised node representations for downstream tasks.

| Algorithm | Paper | Description |
| --- | --- | --- |
| DGI | Veličković et al., 2019 | Deep graph infomax (contrastive learning) |
| BGRL | Thakoor et al., 2021 | Bootstrapped graph representation learning |
| GRACE | Zhu et al., 2020 | Graph contrastive learning |

Link Prediction (3 algorithms)

Predict missing or future connections in graphs.

| Algorithm | Paper | Description |
| --- | --- | --- |
| SEAL | Zhang & Chen, 2018 | Subgraph extraction and labeling |
| P-GNN | You et al., 2019 | Position-aware GNN |
| GATNE | Cen et al., 2019 | Attributed multiplex heterogeneous network embedding |

Fraud Detection (2 algorithms)

Detect anomalous nodes in graphs (e.g., fraudulent accounts).

| Algorithm | Paper | Description |
| --- | --- | --- |
| CARE-GNN | Dou et al., 2020 | Camouflage-resistant GNN |
| BGNN | Zheng et al., 2021 | Bipartite graph neural network |

Post-Processing (1 algorithm)

Improve predictions via label propagation.

| Algorithm | Paper | Description |
| --- | --- | --- |
| C&S | Huang et al., 2020 | Correct & Smooth (prediction refinement) |
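To give a feel for what label propagation does, here is a minimal pure-Python sketch. It is not the C&S implementation shipped with hugegraph-ml: it uses a simple neighbor average instead of the normalization used in the paper, and the function name `propagate_labels` and the toy graph are assumptions for illustration only.

```python
def propagate_labels(adjacency, initial, alpha=0.8, n_iters=50):
    """Smooth per-class scores over a graph.

    adjacency: dict mapping node -> list of neighbor nodes (undirected).
    initial:   dict mapping node -> list of per-class scores.
    alpha:     weight given to the neighbors' average at each step.
    """
    scores = {node: list(vec) for node, vec in initial.items()}
    n_classes = len(next(iter(initial.values())))
    for _ in range(n_iters):
        updated = {}
        for node, neighbors in adjacency.items():
            # Average the neighbors' current scores (zeros if isolated).
            avg = [0.0] * n_classes
            for nb in neighbors:
                for c in range(n_classes):
                    avg[c] += scores[nb][c] / len(neighbors)
            # Blend the neighborhood average with the node's initial scores.
            updated[node] = [
                alpha * avg[c] + (1 - alpha) * initial[node][c]
                for c in range(n_classes)
            ]
        scores = updated
    return scores


# Toy path graph a - b - c: "a" and "c" are confidently class 0,
# while the base model leaned towards class 1 for "b".
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
initial = {"a": [1.0, 0.0], "b": [0.4, 0.6], "c": [1.0, 0.0]}
smoothed = propagate_labels(adjacency, initial)
print(smoothed["b"])  # the class-0 score for "b" now exceeds its class-1 score
```

The higher `alpha` is, the more each node's prediction is pulled towards its neighborhood.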

Usage Examples

Example 1: Node Embedding with DGI

Perform unsupervised node embedding on the Cora dataset using Deep Graph Infomax (DGI).

Step 1: Import Dataset (if needed)

from hugegraph_ml.utils.dgl2hugegraph_utils import import_graph_from_dgl

# Import Cora dataset from DGL to HugeGraph
import_graph_from_dgl("cora")

Step 2: Convert Graph Data

from hugegraph_ml.data.hugegraph2dgl import HugeGraph2DGL

# Convert HugeGraph data to DGL format
hg2d = HugeGraph2DGL()
graph = hg2d.convert_graph(vertex_label="CORA_vertex", edge_label="CORA_edge")

Step 3: Initialize Model

from hugegraph_ml.models.dgi import DGI

# Create DGI model
model = DGI(n_in_feats=graph.ndata["feat"].shape[1])

Step 4: Train and Generate Embeddings

from hugegraph_ml.tasks.node_embed import NodeEmbed

# Train model and generate node embeddings
node_embed_task = NodeEmbed(graph=graph, model=model)
embedded_graph = node_embed_task.train_and_embed(
    add_self_loop=True,
    n_epochs=300,
    patience=30
)

Step 5: Downstream Task (Node Classification)

from hugegraph_ml.models.mlp import MLPClassifier
from hugegraph_ml.tasks.node_classify import NodeClassify

# Use embeddings for node classification
model = MLPClassifier(
    n_in_feat=embedded_graph.ndata["feat"].shape[1],
    n_out_feat=embedded_graph.ndata["label"].unique().shape[0]
)
node_clf_task = NodeClassify(graph=embedded_graph, model=model)
node_clf_task.train(lr=1e-3, n_epochs=400, patience=40)
print(node_clf_task.evaluate())

Expected Output (exact values may vary between runs):

{'accuracy': 0.82, 'loss': 0.5714246034622192}

Full Example: See dgi_example.py

Example 2: Node Classification with GRAND

Directly classify nodes using the GRAND model (no separate embedding step needed).

from hugegraph_ml.data.hugegraph2dgl import HugeGraph2DGL
from hugegraph_ml.models.grand import GRAND
from hugegraph_ml.tasks.node_classify import NodeClassify

# Load graph
hg2d = HugeGraph2DGL()
graph = hg2d.convert_graph(vertex_label="CORA_vertex", edge_label="CORA_edge")

# Initialize GRAND model
model = GRAND(
    n_in_feats=graph.ndata["feat"].shape[1],
    n_out_feats=graph.ndata["label"].unique().shape[0]
)

# Train and evaluate
node_clf_task = NodeClassify(graph=graph, model=model)
node_clf_task.train(lr=1e-2, n_epochs=1500, patience=100)
print(node_clf_task.evaluate())

Full Example: See grand_example.py
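When a model does not converge with one learning rate, a small sweep over several values can help. The helper below is a hypothetical sketch, not part of hugegraph-ml: `sweep_learning_rates` and `train_and_evaluate` are made-up names, and it assumes the evaluation result is a dict with an `accuracy` key, as in the expected output shown for Example 1.

```python
def sweep_learning_rates(train_and_evaluate, learning_rates=(1e-2, 1e-3, 1e-4)):
    """Run one training per learning rate and return (best_lr, results).

    train_and_evaluate: caller-supplied function that takes a learning rate
    and returns a metrics dict containing an "accuracy" key.
    """
    results = {}
    for lr in learning_rates:
        results[lr] = train_and_evaluate(lr)
    # Pick the learning rate whose run reported the highest accuracy.
    best_lr = max(results, key=lambda lr: results[lr]["accuracy"])
    return best_lr, results


# With the objects from the GRAND example, the callable might look like this
# (a fresh model per run avoids reusing already trained weights):
#
# def train_and_evaluate(lr):
#     model = GRAND(
#         n_in_feats=graph.ndata["feat"].shape[1],
#         n_out_feats=graph.ndata["label"].unique().shape[0],
#     )
#     task = NodeClassify(graph=graph, model=model)
#     task.train(lr=lr, n_epochs=1500, patience=100)
#     return task.evaluate()
#
# best_lr, results = sweep_learning_rates(train_and_evaluate)
```

Selecting on evaluation accuracy is a simplification; for a fair comparison the choice would normally be made on a validation split.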

Core Components

HugeGraph2DGL Converter

Seamlessly converts HugeGraph data to DGL graph format:

from hugegraph_ml.data.hugegraph2dgl import HugeGraph2DGL

hg2d = HugeGraph2DGL()
graph = hg2d.convert_graph(
    vertex_label="person",      # Vertex label to extract
    edge_label="knows",         # Edge label to extract
    directed=False              # Graph directionality
)
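Conceptually, a conversion like this has to map vertex IDs to contiguous integer indices and turn edges into source/destination index lists. The sketch below illustrates that idea in plain Python. It is not the actual `HugeGraph2DGL` implementation; `build_edge_index` and the sample vertex IDs are assumptions made for this example.

```python
def build_edge_index(edges, directed=False):
    """Map arbitrary vertex IDs to 0..n-1 and return (src, dst, id_map).

    edges: iterable of (source_id, target_id) pairs with hashable IDs.
    """
    id_map = {}
    src, dst = [], []
    for source, target in edges:
        # Assign the next free integer index the first time a vertex is seen.
        s = id_map.setdefault(source, len(id_map))
        t = id_map.setdefault(target, len(id_map))
        src.append(s)
        dst.append(t)
        if not directed:
            # An undirected edge is stored as two opposite directed edges.
            src.append(t)
            dst.append(s)
    return src, dst, id_map


# Made-up "knows" edges between "person" vertices.
src, dst, id_map = build_edge_index([("alice", "bob"), ("bob", "carol")])
print(id_map)  # {'alice': 0, 'bob': 1, 'carol': 2}
print(src)     # [0, 1, 1, 2]
print(dst)     # [1, 0, 2, 1]
```

Index lists of this shape are the kind of input DGL's graph constructor works with, so setting `directed=False` roughly corresponds to doubling each edge in both directions.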

Task Abstractions

Reusable task objects for common ML workflows:

| Task | Class | Purpose |
| --- | --- | --- |
| Node Embedding | NodeEmbed | Generate unsupervised node embeddings |
| Node Classification | NodeClassify | Predict node labels |
| Graph Classification | GraphClassify | Predict graph-level labels |
| Link Prediction | LinkPredict | Predict missing edges |

Best Practices

  1. Start with Small Datasets: Test your pipeline on small graphs (e.g., Cora, Citeseer) before scaling
  2. Use Early Stopping: Set the patience parameter so training stops once the model stops improving, which helps avoid overfitting
  3. Tune Hyperparameters: Adjust learning rate, hidden dimensions, and epochs based on dataset size
  4. Monitor GPU Memory: Large graphs may require batch training (e.g., Cluster-GCN)
  5. Validate Schema: Ensure vertex/edge labels match your HugeGraph schema
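The effect of a patience setting can be illustrated with a generic early-stopping helper. How hugegraph-ml tracks patience internally is not shown here, so treat this as a sketch of the general technique; `EarlyStopper` and `epochs_run` are hypothetical names, and the loss values are invented for the demonstration.

```python
class EarlyStopper:
    """Stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss:
            # New best: remember it and reset the counter.
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses, patience):
    """Return how many epochs run before early stopping triggers."""
    stopper = EarlyStopper(patience)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.should_stop(loss):
            return epoch
    return len(val_losses)


# Invented validation losses: the best value (0.8) occurs at epoch 2.
losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]
print(epochs_run(losses, patience=3))   # 5: three epochs without improvement
print(epochs_run(losses, patience=10))  # 6: patience never exhausted
```

Note that with `patience=3` training stops before the later improvement at epoch 6, which is the usual trade-off between a short and a long patience window.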

Troubleshooting

| Issue | Solution |
| --- | --- |
| “Connection refused” to HugeGraph | Verify server is running on port 8080 |
| CUDA out of memory | Reduce batch size or use CPU-only mode |
| Model convergence issues | Try different learning rates (1e-2, 1e-3, 1e-4) |
| ImportError for DGL | Run uv sync to reinstall dependencies |
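For the "Connection refused" case, a plain TCP check can tell whether anything is listening on the server port before you debug further. This sketch uses only the standard library; `is_port_open` is a made-up helper name, and the default host and port assume the Docker command from the installation steps.

```python
import socket


def is_port_open(host="localhost", port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on localhost:8080")
    else:
        print("Nothing reachable on localhost:8080; is HugeGraph Server running?")
```

A successful connection only shows that the port is open; it does not confirm that the listener is a healthy HugeGraph Server.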

Contributing

To add a new algorithm:

  1. Create a model file at src/hugegraph_ml/models/your_model.py
  2. Inherit from the base model class and implement the forward() method
  3. Add an example script in src/hugegraph_ml/examples/
  4. Update this documentation with the algorithm's details

See Also