
HugeGraph Computing (OLAP)

DeepWiki provides real-time updated project documentation with more comprehensive and accurate content, suitable for quickly understanding the latest project information.

📖 https://deepwiki.com/apache/hugegraph-computer

GitHub Access: https://github.com/apache/hugegraph-computer

1 - HugeGraph-Vermeer Quick Start

1. Overview of Vermeer

1.1 Architecture

Vermeer is a high-performance, memory-first graph computing framework written in Go (start once, execute any task), supporting ultra-fast computation of 15+ OLAP graph algorithms (most tasks complete in seconds to minutes), with master and worker roles. Currently, there is only one master (HA can be added), and there can be multiple workers.

The master is responsible for communication, forwarding, and aggregation, with minimal computation and resource usage. Workers are computation nodes used to store graph data and run computation tasks, consuming a large amount of memory and CPU. The gRPC and REST modules handle internal communication and external calls, respectively.

The framework’s runtime configuration can be passed via command-line parameters or specified in configuration files located in the config/ directory. The --env parameter can specify which configuration file to use, e.g., --env=master specifies using master.ini. Note that the master needs to specify the listening port, and the worker needs to specify the listening port and the master’s ip:port.

1.2 Running Method

  1. Option 1: Docker Compose (Recommended)

Please ensure that docker-compose.yaml exists in your project root directory. If it doesn’t, here is an example:

services:
  vermeer-master:
    image: hugegraph/vermeer
    container_name: vermeer-master
    volumes:
      - ~/:/go/bin/config # Change here to your actual config path
    command: --env=master
    networks:
      vermeer_network:
        ipv4_address: 172.20.0.10 # Assign a static IP for the master

  vermeer-worker:
    image: hugegraph/vermeer
    container_name: vermeer-worker
    volumes:
      - ~/:/go/bin/config # Change here to your actual config path
    command: --env=worker
    networks:
      vermeer_network:
        ipv4_address: 172.20.0.11 # Assign a static IP for the worker

networks:
  vermeer_network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/24 # Define the subnet for your network

Modify docker-compose.yaml

  • Volume: For example, change both instances of ~/:/go/bin/config to /home/user/config:/go/bin/config (or your own configuration directory).
  • Subnet: Modify the subnet IP based on your actual situation. Note that the ports each container needs to access are specified in the config file. Please refer to the contents of the project’s config folder for details.
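
After editing, it can help to let Docker Compose validate and render the final configuration before starting anything; this is just a quick sanity check, not a required step:

# Validate docker-compose.yaml and print the resolved configuration
docker compose config

# Confirm the config directory you mounted actually exists and is readable
ls -l /home/user/config   # replace with your actual config path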

Build the Image and Start in the Project Directory (or docker build first, then docker-compose up)

# Build the image (in the project root vermeer directory)
docker build -t hugegraph/vermeer .

# Start the services (in the vermeer root directory)
docker-compose up -d
# Or use the new CLI:
# docker compose up -d

View Logs / Stop / Remove

docker-compose logs -f
docker-compose down
  2. Option 2: Start individually via docker run (Manually create network and assign static IP)

Ensure the CONFIG_DIR has proper read/execute permissions for the Docker process.

Build the image:

docker build -t hugegraph/vermeer .

Create a custom bridge network (one-time operation):

docker network create --driver bridge \
  --subnet 172.20.0.0/24 \
  vermeer_network

Run master (adjust CONFIG_DIR to your absolute configuration path, and you can adjust the IP as needed based on your actual situation).

CONFIG_DIR=/home/user/config

docker run -d \
  --name vermeer-master \
  --network vermeer_network --ip 172.20.0.10 \
  -v ${CONFIG_DIR}:/go/bin/config \
  hugegraph/vermeer \
  --env=master

Run worker:

docker run -d \
  --name vermeer-worker \
  --network vermeer_network --ip 172.20.0.11 \
  -v ${CONFIG_DIR}:/go/bin/config \
  hugegraph/vermeer \
  --env=worker

View logs / Stop / Remove:

docker logs -f vermeer-master
docker logs -f vermeer-worker

docker stop vermeer-master vermeer-worker
docker rm vermeer-master vermeer-worker

# Remove the custom network (if needed)
docker network rm vermeer_network
  3. Option 3: Build from Source

Build the binary. You can refer to the Vermeer README.

go build

Enter the directory and run ./vermeer --env=master or ./vermeer --env=worker01.
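
For a quick local test you can run one master and several workers on the same machine. The sketch below assumes that each --env value corresponds to a matching config/<name>.ini file (e.g., worker01.ini, worker02.ini) pointing at the master's ip:port:

# Start the master in the background
nohup ./vermeer --env=master > master.log 2>&1 &

# Start two workers, each reading its own config file
nohup ./vermeer --env=worker01 > worker01.log 2>&1 &
nohup ./vermeer --env=worker02 > worker02.log 2>&1 &

# Follow the master log to confirm the workers registered
tail -f master.log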

2. Task Creation REST API

2.1 Introduction

This REST API provides all task creation functions, including loading graph data and running the various computation tasks, with both asynchronous and synchronous interfaces. The response contains information about the created task.

The overall workflow is to first create a task that loads the graph data and, once the graph has been loaded, create computation tasks that run on it. The graph is not deleted automatically, so multiple computation tasks can be run on one graph without reloading it; if deletion is needed, use the delete-graph interface.

Task statuses fall into graph-loading task statuses and computation task statuses. Generally, the client only needs to know four statuses: created, in progress, completed, and error. The graph status determines whether the graph is available: a graph that is still loading or is in an error state cannot be used to create computation tasks. The delete-graph interface is only available when the graph is in the loaded or error status and has no running computation tasks.

Available URLs are as follows:

  • Asynchronous return interface: POST http://master_ip:port/tasks/create returns only whether the task creation is successful, and the task status needs to be actively queried to determine completion.
  • Synchronous return interface: POST http://master_ip:port/tasks/create/sync returns after the task is completed.
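
For example, a load task can be submitted to the synchronous endpoint with curl; the request body is the same as for the asynchronous endpoint, only the URL differs (the master address/port and graph name below are placeholders taken from the examples in this document, and the full parameter sets are shown in the following sections):

curl -X POST http://localhost:8688/tasks/create/sync \
  -H "Content-Type: application/json" \
  -d '{
        "task_type": "load",
        "graph": "testdb",
        "params": {
          "load.parallel": "50",
          "load.type": "local",
          "load.vertex_files": "{\"localhost\":\"data/twitter-2010.v_[0,99]\"}",
          "load.edge_files": "{\"localhost\":\"data/twitter-2010.e_[0,99]\"}"
        }
      }'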

2.2 Loading Graph Data

Refer to the Vermeer parameter list document for specific parameters.

Vermeer provides three ways to load data:

  1. Load from Local Files

You can obtain a dataset in advance, such as the Twitter-2010 dataset, from https://snap.stanford.edu/data/twitter-2010.html; the first archive (twitter-2010.txt.gz) is sufficient.
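
If you want to match the file layout used in the request below, one option is to split the downloaded edge list into 100 parts. This is only a sketch: it assumes the archive unpacks to a plain whitespace-separated "source target" edge list, and vertex files (twitter-2010.v_*) would have to be produced similarly from the distinct vertex IDs; check the Vermeer parameter list document for the exact file format expected.

# Unpack the dataset and split the edge list into data/twitter-2010.e_0 ... e_99
gunzip twitter-2010.txt.gz
mkdir -p data
total=$(wc -l < twitter-2010.txt)
lines=$(( (total + 99) / 100 ))
split -l "$lines" twitter-2010.txt part_
i=0
for f in part_*; do
  mv "$f" "data/twitter-2010.e_$i"
  i=$((i + 1))
done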

Request Example:

POST http://localhost:8688/tasks/create
{
 "task_type": "load",
 "graph": "testdb",
 "params": {
  "load.parallel": "50",
  "load.type": "local",
  "load.vertex_files": "{\"localhost\":\"data/twitter-2010.v_[0,99]\"}",
  "load.edge_files": "{\"localhost\":\"data/twitter-2010.e_[0,99]\"}",
  "load.use_out_degree": "1",
  "load.use_outedge": "1"
 }
}
  2. Load from HugeGraph

Request Example:

⚠️ Security Warning: Never store real passwords in configuration files or code. Use environment variables or a secure credential management system instead.

POST http://localhost:8688/tasks/create
{
  "task_type": "load",
  "graph": "testdb",
  "params": {
    "load.parallel": "50",
    "load.type": "hugegraph",
    "load.hg_pd_peers": "[\"<your-hugegraph-ip>:8686\"]",
    "load.hugegraph_name": "DEFAULT/hugegraph2/g",
    "load.hugegraph_username": "admin",
    "load.hugegraph_password": "<your-password-here>",
    "load.use_out_degree": "1",
    "load.use_outedge": "1"
  }
}
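
Following the warning above, one way to keep the password out of the request file is to read it from an environment variable at call time; a minimal sketch (endpoint, PD peers, and graph name are placeholders):

# Export the credential once in the shell (or inject it from your secret manager)
export HG_PASSWORD='your-real-password'

# Substitute it into the request body when submitting the task
curl -X POST http://localhost:8688/tasks/create \
  -H "Content-Type: application/json" \
  -d "{
        \"task_type\": \"load\",
        \"graph\": \"testdb\",
        \"params\": {
          \"load.type\": \"hugegraph\",
          \"load.hg_pd_peers\": \"[\\\"<your-hugegraph-ip>:8686\\\"]\",
          \"load.hugegraph_name\": \"DEFAULT/hugegraph2/g\",
          \"load.hugegraph_username\": \"admin\",
          \"load.hugegraph_password\": \"${HG_PASSWORD}\"
        }
      }"
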
  3. Load from HDFS

Request Example:

POST http://localhost:8688/tasks/create
{
  "task_type": "load",
  "graph": "testdb",
  "params": {
    "load.parallel": "50",
    "load.type": "hdfs",
    "load.hdfs_namenode": "name_node1:9000",
    "load.hdfs_conf_path": "/path/to/conf",
    "load.krb_realm": "EXAMPLE.COM",
    "load.krb_name": "user@EXAMPLE.COM",
    "load.krb_keytab_path": "/path/to/keytab",
    "load.krb_conf_path": "/path/to/krb5.conf",
    "load.hdfs_use_krb": "1",
    "load.vertex_files": "/data/graph/vertices",
    "load.edge_files": "/data/graph/edges",
    "load.use_out_degree": "1",
    "load.use_outedge": "1"
  }
}

2.3 Output Computation Results

All Vermeer computation tasks support multiple result output methods, which can be customized: local, hdfs, afs, or hugegraph. Add the corresponding parameters under params in the request for them to take effect. When output.need_statistics is set to 1, statistical information about the computation results is also produced and written into the task information returned by the interface. The statistics operators currently supported are "count" and "modularity"; the latter applies only to community detection algorithms.

Refer to the Vermeer parameter list document for specific parameters.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "pagerank",
 "compute.parallel": "10",
 "compute.max_step": "10",
 "output.type": "local",
 "output.parallel": "1",
 "output.file_path": "result/pagerank"
  }
}

3. Supported Algorithms

3.1 PageRank

The PageRank algorithm, also known as the web ranking algorithm, is a technique used by search engines to calculate the relevance and importance of web pages (nodes) based on their mutual hyperlinks.

  • If a web page is linked to by many other web pages, it indicates that the web page is relatively important, and its PageRank value will be relatively high.
  • If a web page with a high PageRank value links to other web pages, the PageRank value of the linked web pages will also increase accordingly.

The PageRank algorithm is suitable for scenarios such as web page ranking and identifying key figures in social networks.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "pagerank",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/pagerank",
 "compute.max_step":"10"
 }
}

3.2 WCC (Weakly Connected Components)

The weakly connected components algorithm calculates all connected subgraphs in an undirected graph and outputs the weakly connected subgraph ID to which each vertex belongs, indicating the connectivity between points and distinguishing different connected communities.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "wcc",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/wcc",
 "compute.max_step":"10"
 }
}

3.3 LPA (Label Propagation Algorithm)

The label propagation algorithm is a graph clustering algorithm commonly used in social networks to discover potential communities.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "lpa",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/lpa",
 "compute.max_step":"10"
 }
}

3.4 Degree Centrality

The degree centrality algorithm calculates the degree centrality value of each node in the graph, supporting both undirected and directed graphs. Degree centrality is an important indicator of node importance: the more edges a node has to other nodes, the higher its degree centrality value and the more important the node is in the graph. In an undirected graph, degree centrality counts how many times a node appears in the edge list, yielding the node's degree centrality value. In a directed graph, the count is filtered by edge direction (in-edges or out-edges), yielding the node's in-degree or out-degree value. It indicates the importance of each vertex, with more important vertices having higher degrees.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "degree",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/degree",
 "degree.direction":"both"
 }
}

3.5 Closeness Centrality

Closeness centrality is used to calculate the inverse of the shortest distance from a node to all other reachable nodes, accumulating and normalizing the value. Closeness centrality can be used to measure the time it takes for information to be transmitted from the node to other nodes. The larger the closeness centrality of a node, the closer its position in the graph is to the center, suitable for scenarios such as identifying key nodes in social networks.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "closeness_centrality",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/closeness_centrality",
 "closeness_centrality.sample_rate":"0.01"
 }
}

3.6 Betweenness Centrality

The betweenness centrality algorithm determines the value of a node as a “bridge” node; the larger the value, the more likely it is to be a necessary path between two points in the graph. Typical examples include mutual followers in social networks. It is suitable for measuring the degree of aggregation around a node in a community.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "betweenness_centrality",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/betweenness_centrality",
 "betweenness_centrality.sample_rate":"0.01"
 }
}

3.7 Triangle Count

The triangle count algorithm calculates the number of triangles passing through each vertex, suitable for calculating the relationships between users and whether the associations form triangles. The more triangles, the higher the degree of association between nodes in the graph, and the tighter the organizational relationship. In social networks, triangles indicate cohesive communities, and identifying triangles helps understand clustering and interconnections among individuals or groups in the network. In financial or transaction networks, the presence of triangles may indicate suspicious or fraudulent activities, and triangle counting can help identify transaction patterns that may require further investigation.

The output result is the Triangle Count corresponding to each vertex, i.e., the number of triangles the vertex is part of.

Note: This algorithm is for undirected graphs and ignores edge directions.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "triangle_count",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/triangle_count"
 }
}

3.8 K-Core

The K-Core algorithm marks all vertices belonging to the K-core, i.e., the maximal subgraph in which every vertex has degree at least K. It is suitable for graph pruning and finding the core part of the graph.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "kcore",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/kcore",
 "kcore.degree_k":"5"
 }
}

3.9 SSSP (Single Source Shortest Path)

The single-source shortest path algorithm calculates the shortest distance from one vertex to all other vertices.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "sssp",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/degree",
 "sssp.source":"tom"
 }
}

3.10 KOUT

Starting from a given vertex, KOUT obtains the vertices within k hops of that vertex.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "kout",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/kout",
 "kout.source":"tom",
 "compute.max_step":"6"
 }
}

3.11 Louvain

The Louvain algorithm is a community detection algorithm based on modularity. The basic idea is that nodes in the network try to traverse all neighbor community labels and choose the community label that maximizes the modularity increment. After maximizing modularity, each community is regarded as a new node, and the process is repeated until the modularity no longer increases.

The distributed Louvain algorithm implemented on Vermeer is affected by factors such as node order and parallel computation. Due to the random traversal order of the Louvain algorithm, community compression also has a certain randomness, leading to different results in multiple executions. However, the overall trend will not change significantly.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "louvain",
 "compute.parallel":"10",
 "compute.max_step":"1000",
 "louvain.threshold":"0.0000001",
 "louvain.resolution":"1.0",
 "louvain.step":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/louvain"
  }
 }

3.12 Jaccard Similarity Coefficient

The Jaccard index, also known as the Jaccard similarity coefficient, is used to compare the similarity and diversity between finite sample sets. The larger the Jaccard coefficient value, the higher the similarity of the samples. It is used to calculate the Jaccard similarity coefficient between a given source point and all other points in the graph.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "jaccard",
 "compute.parallel":"10",
 "compute.max_step":"2",
 "jaccard.source":"123",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/jaccard"
 }
}

3.13 Personalized PageRank

The goal of personalized PageRank is to calculate the relevance of all nodes relative to user u. Starting from the node corresponding to user u, at each node, there is a probability of 1-d to stop walking and start again from u, or a probability of d to continue walking, randomly selecting a node from the nodes pointed to by the current node to walk down. It is used to calculate the personalized PageRank score starting from a given starting point, suitable for scenarios such as social recommendations.

Since the calculation requires using out-degree, load.use_out_degree needs to be set to 1 when reading the graph.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "ppr",
 "compute.parallel":"100",
 "compute.max_step":"10",
 "ppr.source":"123",
 "ppr.damping":"0.85",
 "ppr.diff_threshold":"0.00001",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/ppr"
 }
}

3.14 Global Kout

Calculate the k-degree neighbors of all nodes in the graph (excluding themselves and 1~k-1 degree neighbors). Due to the severe memory expansion of the global kout algorithm, k is currently limited to 1 and 2. Additionally, the global kout algorithm supports filtering functions (parameters such as “compute.filter”:“risk_level==1”), and the filtering condition is judged when calculating the k-degree. The final result set includes those that meet the filtering condition. The algorithm’s final output is the number of neighbors that meet the condition.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "kout_all",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"10",
 "output.file_path":"result/kout",
 "compute.max_step":"2",
 "compute.filter":"risk_level==1"
 }
}

3.15 Clustering Coefficient

The clustering coefficient represents the degree to which nodes in a graph cluster together. In real-world networks, nodes tend to form tightly knit groups with relatively dense connections. The clustering coefficient algorithm (Cluster Coefficient) is used to calculate the clustering degree of nodes in the graph. This implementation computes the local clustering coefficient, which measures the clustering degree around each node in the graph.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "clustering_coefficient",
 "compute.parallel":"100",
 "compute.max_step":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/cc"
 }
}

3.16 SCC (Strongly Connected Components)

In the mathematical theory of directed graphs, a graph is said to be strongly connected if every vertex is reachable from every other vertex. The maximal strongly connected subgraphs of a directed graph are called its strongly connected components. The result indicates the connectivity between vertices and distinguishes different connected communities.

Request example:

POST http://localhost:8688/tasks/create
{
 "task_type": "compute",
 "graph": "testdb",
 "params": {
 "compute.algorithm": "scc",
 "compute.parallel":"10",
 "output.type":"local",
 "output.parallel":"1",
 "output.file_path":"result/scc",
 "compute.max_step":"200"
 }
}

🚧 This document is still being updated and improved; suggestions and feedback are welcome.

2 - HugeGraph-Computer Quick Start

1 HugeGraph-Computer Overview

The HugeGraph-Computer is a distributed graph processing system for HugeGraph (OLAP). It is an implementation of Pregel and runs on a Kubernetes (K8s) framework. It focuses on supporting graph data volumes of hundreds of billions to trillions, using disk for sorting and acceleration, which is one of the biggest differences from Vermeer.

Features

  • Supports distributed MPP graph computing and integrates with HugeGraph as graph input/output storage.
  • Based on the BSP (Bulk Synchronous Parallel) model, an algorithm performs computing through multiple parallel iterations; every iteration is a superstep.
  • Auto memory management. The framework will never be OOM (Out of Memory) since it will split some data to disk if it doesn't have enough memory to hold all the data.
  • Only part of the edges or messages of a super vertex needs to be kept in memory, so they will never be lost.
  • You can load the data from HDFS or HugeGraph, or any other system.
  • You can output the results to HDFS or HugeGraph, or any other system.
  • Easy to develop a new algorithm. You only need to focus on vertex processing, just as on a single server, without worrying about message transfer and memory/storage management.

2 Dependency for Building/Running

2.1 Install Java 11 (JDK 11)

Java 11 or later must be used to run Computer; install and configure it yourself.

Be sure to execute the java -version command to check the JDK version before proceeding.

3 Get Started

3.1 Run PageRank algorithm locally

To run the algorithm with HugeGraph-Computer, you need to install Java 11 or later versions.

You also need to deploy HugeGraph-Server and Etcd.

There are two ways to get HugeGraph-Computer:

  • Download the compiled tarball
  • Clone source code then compile and package

3.1.1 Download the compiled archive

Download the latest version of the HugeGraph-Computer release package:

wget https://downloads.apache.org/incubator/hugegraph/${version}/apache-hugegraph-computer-incubating-${version}.tar.gz
tar zxvf apache-hugegraph-computer-incubating-${version}.tar.gz -C hugegraph-computer

3.1.2 Clone source code to compile and package

Clone the latest version of HugeGraph-Computer source package:

$ git clone https://github.com/apache/hugegraph-computer.git

Compile and generate tar package:

cd hugegraph-computer
mvn clean package -DskipTests

3.1.3 Configure computer.properties

Edit conf/computer.properties to configure the connection to HugeGraph-Server and etcd:

# Job configuration
job.id=local_pagerank_001
job.partitions_count=4

# HugeGraph connection (✅ Correct configuration keys)
hugegraph.url=http://localhost:8080
hugegraph.name=hugegraph
# If authentication is enabled on HugeGraph-Server
hugegraph.username=
hugegraph.password=

# BSP coordination (✅ Correct key: bsp.etcd_endpoints)
bsp.etcd_endpoints=http://localhost:2379
bsp.max_super_step=10

# Algorithm parameters (⚠️ Required)
algorithm.params_class=org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams

Important Configuration Notes:

  • Use bsp.etcd_endpoints (NOT bsp.etcd.url) for etcd connection
  • algorithm.params_class is required for all algorithms
  • For multiple etcd endpoints, use comma-separated list: http://host1:2379,http://host2:2379
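
If you do not already have an etcd instance for local testing, a single-node etcd can be started like this (a sketch assuming the etcd binary is on your PATH; any recent etcd 3.x should work):

# Start a throwaway single-node etcd for local testing
etcd --data-dir ./etcd-data \
  --listen-client-urls http://0.0.0.0:2379 \
  --advertise-client-urls http://localhost:2379 &

# Verify it is reachable
curl http://localhost:2379/version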

3.1.4 Start master node

You can use the -c parameter to specify the configuration file. For more computer config options, please see: Computer Config Options

cd hugegraph-computer
bin/start-computer.sh -d local -r master

3.1.5 Start worker node

bin/start-computer.sh -d local -r worker

3.1.6 Query algorithm results

3.1.6.1 Enable OLAP index query for server

If the OLAP index is not enabled, it needs to be enabled first. For more details, refer to: modify-graphs-read-mode

PUT http://localhost:8080/graphs/hugegraph/graph_read_mode

"ALL"

3.1.6.2 Query page_rank property value:

curl "http://localhost:8080/graphs/hugegraph/graph/vertices?page&limit=3" | gunzip

3.2 Run PageRank algorithm in Kubernetes

To run an algorithm with HugeGraph-Computer, you need to deploy HugeGraph-Server first

3.2.1 Install HugeGraph-Computer CRD

# Kubernetes version >= v1.16
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1.yaml

# Kubernetes version < v1.16
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1beta1.yaml

3.2.2 Show CRD

kubectl get crd

NAME                                        CREATED AT
hugegraphcomputerjobs.hugegraph.apache.org   2021-09-16T08:01:08Z

3.2.3 Install hugegraph-computer-operator&etcd-server

kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-operator.yaml

3.2.4 Wait for hugegraph-computer-operator&etcd-server deployment to complete

kubectl get pod -n hugegraph-computer-operator-system

NAME                                                              READY   STATUS    RESTARTS   AGE
hugegraph-computer-operator-controller-manager-58c5545949-jqvzl   1/1     Running   0          15h
hugegraph-computer-operator-etcd-28lm67jxk5                       1/1     Running   0          15h

3.2.5 Submit a job

More computer crd please see: Computer CRD

More computer config please see: Computer Config Options

Basic Example:

cat <<EOF | kubectl apply --filename -
apiVersion: hugegraph.apache.org/v1
kind: HugeGraphComputerJob
metadata:
  namespace: hugegraph-computer-operator-system
  name: &jobName pagerank-sample
spec:
  jobId: *jobName
  algorithmName: page_rank  # ✅ Correct: use underscore format (matches algorithm implementation)
  image: hugegraph/hugegraph-computer:latest
  jarFile: /hugegraph/hugegraph-computer/algorithm/builtin-algorithm.jar
  pullPolicy: Always
  workerCpu: "4"
  workerMemory: "4Gi"
  workerInstances: 5
  computerConf:
    job.partitions_count: "20"
    algorithm.params_class: org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams
    hugegraph.url: http://${hugegraph-server-host}:${hugegraph-server-port}
    hugegraph.name: hugegraph
EOF

Complete Example with Advanced Features:

cat <<EOF | kubectl apply --filename -
apiVersion: hugegraph.apache.org/v1
kind: HugeGraphComputerJob
metadata:
  namespace: hugegraph-computer-operator-system
  name: &jobName pagerank-advanced
spec:
  jobId: *jobName
  algorithmName: page_rank  # ✅ Correct: underscore format
  image: hugegraph/hugegraph-computer:latest
  jarFile: /hugegraph/hugegraph-computer/algorithm/builtin-algorithm.jar
  pullPolicy: Always

  # Resource limits
  masterCpu: "2"
  masterMemory: "2Gi"
  workerCpu: "4"
  workerMemory: "4Gi"
  workerInstances: 5

  # JVM options
  jvmOptions: "-Xmx3g -Xms3g -XX:+UseG1GC"

  # Environment variables (optional)
  envVars:
    - name: REMOTE_JAR_URI
      value: "http://example.com/custom-algorithm.jar"  # Download custom algorithm JAR
    - name: LOG_LEVEL
      value: "INFO"

  # Computer configuration
  computerConf:
    # Job settings
    job.partitions_count: "20"

    # Algorithm parameters (⚠️ Required)
    algorithm.params_class: org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams
    page_rank.alpha: "0.85"  # PageRank damping factor

    # HugeGraph connection
    hugegraph.url: http://hugegraph-server:8080
    hugegraph.name: hugegraph
    hugegraph.username: ""  # Fill if authentication is enabled
    hugegraph.password: ""

    # BSP configuration (⚠️ System-managed in K8s, do not override)
    # bsp.etcd_endpoints is automatically set by operator
    bsp.max_super_step: "20"
    bsp.log_interval: "30000"

    # Snapshot configuration (optional)
    snapshot.write: "true"       # Enable snapshot writing
    snapshot.load: "false"       # Do not load from snapshot this time
    snapshot.name: "pagerank-snapshot-v1"
    snapshot.minio_endpoint: "http://minio:9000"
    snapshot.minio_access_key: "minioadmin"
    snapshot.minio_secret_key: "minioadmin"
    snapshot.minio_bucket_name: "hugegraph-snapshots"

    # Output configuration
    output.result_name: "page_rank"
    output.batch_size: "500"
    output.with_adjacent_edges: "false"
EOF

Configuration Notes:

Configuration Key | ⚠️ Important Notes
algorithmName | Must use page_rank (underscore format), matches the algorithm's name() method return value
bsp.etcd_endpoints | System-managed in K8s - automatically set by operator, do not override in computerConf
algorithm.params_class | Required - must specify for all algorithms
REMOTE_JAR_URI | Optional environment variable to download custom algorithm JAR from remote URL
snapshot.* | Optional - enable snapshots for checkpoint recovery or repeated computations

3.2.6 Show job

kubectl get hcjob/pagerank-sample -n hugegraph-computer-operator-system

NAME               JOBID              JOBSTATUS
pagerank-sample    pagerank-sample    RUNNING

3.2.7 Show log of nodes

# Show the master log
kubectl logs -l component=pagerank-sample-master -n hugegraph-computer-operator-system

# Show the worker log
kubectl logs -l component=pagerank-sample-worker -n hugegraph-computer-operator-system

# Show diagnostic log of a job
# NOTE: diagnostic log exist only when the job fails, and it will only be saved for one hour.
kubectl get event --field-selector reason=ComputerJobFailed --field-selector involvedObject.name=pagerank-sample -n hugegraph-computer-operator-system

3.2.8 Show success event of a job

NOTE: it will only be saved for one hour

kubectl get event --field-selector reason=ComputerJobSucceed --field-selector involvedObject.name=pagerank-sample -n hugegraph-computer-operator-system

3.2.9 Query algorithm results

If the results are output to HugeGraph-Server, query them in the same way as in local mode; if they are output to HDFS, check the result files under the /hugegraph-computer/results/{jobId} directory.


3.3 Local Mode vs Kubernetes Mode

Understanding the differences helps you choose the right deployment mode for your use case.

Feature | Local Mode | Kubernetes Mode
Configuration | conf/computer.properties file | CRD YAML computerConf field
Etcd Management | Manual deployment of external etcd | Operator auto-deploys etcd StatefulSet
Worker Scaling | Manual start of multiple processes | CRD workerInstances field auto-scales
Resource Isolation | Shared host resources | Pod-level CPU/Memory limits
Remote JAR | JAR_FILE_PATH environment variable | CRD remoteJarUri or envVars.REMOTE_JAR_URI
Log Viewing | Local logs/ directory | kubectl logs command
Fault Recovery | Manual process restart | K8s auto-restarts failed pods
Use Cases | Development, testing, small datasets | Production, large-scale data

Local Mode Prerequisites:

  • Java 11+
  • HugeGraph-Server running on localhost:8080
  • Etcd running on localhost:2379
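
A quick way to confirm the local-mode prerequisites before starting the master (the HugeGraph version endpoint below is an assumption; adjust hosts and ports to your setup):

# Java 11+ available?
java -version

# HugeGraph-Server reachable?
curl -s http://localhost:8080/apis/version

# etcd reachable?
curl -s http://localhost:2379/version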

K8s Mode Prerequisites:

  • Kubernetes cluster (version 1.16+)
  • HugeGraph-Server accessible from cluster
  • HugeGraph-Computer Operator installed

Configuration Key Differences:

# Local Mode (computer.properties)
bsp.etcd_endpoints=http://localhost:2379  # ✅ User-configured
job.workers_count=4                        # User-configured
# K8s Mode (CRD)
spec:
  workerInstances: 5  # Overrides job.workers_count
  computerConf:
    # bsp.etcd_endpoints is auto-set by operator, do NOT configure
    job.partitions_count: "20"

3.4 Common Troubleshooting

3.4.1 Configuration Errors

Error: “Failed to connect to etcd”

Symptoms: Master or Worker cannot connect to etcd

Local Mode Solutions:

# Check configuration key name (common mistake)
grep "bsp.etcd_endpoints" conf/computer.properties
# Should output: bsp.etcd_endpoints=http://localhost:2379

# ❌ WRONG: bsp.etcd.url (old/incorrect key)
# ✅ CORRECT: bsp.etcd_endpoints

# Test etcd connectivity
curl http://localhost:2379/version

K8s Mode Solutions:

# Check Operator etcd service
kubectl get svc hugegraph-computer-operator-etcd -n hugegraph-computer-operator-system

# Verify etcd pod is running
kubectl get pods -n hugegraph-computer-operator-system -l app=hugegraph-computer-operator-etcd
# Should show: Running status

# Test connectivity from worker pod
kubectl exec -it pagerank-sample-worker-0 -n hugegraph-computer-operator-system -- \
  curl http://hugegraph-computer-operator-etcd:2379/version

Error: “Algorithm class not found”

Symptoms: Cannot find algorithm implementation class

Cause: Incorrect algorithmName format

# ❌ WRONG formats:
algorithmName: pageRank   # Camel case
algorithmName: PageRank   # Title case

# ✅ CORRECT format (matches PageRank.name() return value):
algorithmName: page_rank  # Underscore lowercase

Verification:

# Check algorithm implementation in source code
# File: computer-algorithm/.../PageRank.java
# Method: public String name() { return "page_rank"; }

Error: “Required option ‘algorithm.params_class’ is missing”

Solution:

computerConf:
  algorithm.params_class: org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams  # ⚠️ Required

3.4.2 K8s Deployment Issues

Issue: REMOTE_JAR_URI not working

Solution:

spec:
  envVars:
    - name: REMOTE_JAR_URI
      value: "http://example.com/my-algorithm.jar"

Issue: Etcd connection timeout in K8s

Check Operator etcd:

# Verify etcd is running
kubectl get pods -n hugegraph-computer-operator-system -l app=hugegraph-computer-operator-etcd
# Should show: Running

# From worker pod, test etcd connectivity
kubectl exec -it pagerank-sample-worker-0 -n hugegraph-computer-operator-system -- \
  curl http://hugegraph-computer-operator-etcd:2379/version

Issue: Snapshot/MinIO configuration problems

Verify MinIO service:

# Test MinIO reachability
kubectl run -it --rm debug --image=alpine --restart=Never -- sh
wget -O- http://minio:9000/minio/health/live

# Test bucket permissions (requires MinIO client)
mc config host add myminio http://minio:9000 minioadmin minioadmin
mc ls myminio/hugegraph-snapshots

3.4.3 Job Status Checks

Check job overall status:

kubectl get hcjob pagerank-sample -n hugegraph-computer-operator-system
# Output example:
# NAME              JOBSTATUS   SUPERSTEP   MAXSUPERSTEP   SUPERSTEPSTAT
# pagerank-sample   Running     5           20             COMPUTING

Check detailed events:

kubectl describe hcjob pagerank-sample -n hugegraph-computer-operator-system

Check failure reasons:

kubectl get events --field-selector reason=ComputerJobFailed \
  --field-selector involvedObject.name=pagerank-sample \
  -n hugegraph-computer-operator-system

Real-time master logs:

kubectl logs -f -l component=pagerank-sample-master -n hugegraph-computer-operator-system

All worker logs:

kubectl logs -l component=pagerank-sample-worker -n hugegraph-computer-operator-system --all-containers=true

4. Built-In algorithms document

4.1 Supported algorithms list:

Centrality Algorithm:
  • PageRank
  • BetweennessCentrality
  • ClosenessCentrality
  • DegreeCentrality
Community Algorithm:
  • ClusteringCoefficient
  • Kcore
  • Lpa
  • TriangleCount
  • Wcc
Path Algorithm:
  • RingsDetection
  • RingsDetectionWithFilter

More algorithms please see: Built-In algorithms

4.2 Algorithm describe

TODO

5 Algorithm development guide

TODO

6 Note

  • If some classes under computer-k8s cannot be found, you need to execute mvn compile in advance to generate corresponding classes.

3 - HugeGraph-Computer Configuration Reference

Computer Config Options

Default Value Notes:

  • Configuration items listed below show the code default values (defined in ComputerOptions.java)
  • When the packaged configuration file (conf/computer.properties in the distribution) specifies a different value, it’s noted as: value (packaged: value)
  • Example: 300000 (packaged: 100000) means the code default is 300000, but the distributed package defaults to 100000
  • For production deployments, the packaged defaults take precedence unless you explicitly override them

1. Basic Configuration

Core job settings for HugeGraph-Computer.

config option | default value | description
hugegraph.url | http://127.0.0.1:8080 | The HugeGraph server URL to load data and write results back.
hugegraph.name | hugegraph | The graph name to load data and write results back.
hugegraph.username | "" (empty) | The username for HugeGraph authentication (leave empty if authentication is disabled).
hugegraph.password | "" (empty) | The password for HugeGraph authentication (leave empty if authentication is disabled).
job.id | local_0001 (packaged: local_001) | The job identifier on YARN cluster or K8s cluster.
job.namespace | "" (empty) | The job namespace that can separate different data sources. 🔒 Managed by system - do not modify manually.
job.workers_count | 1 | The number of workers for computing one graph algorithm job. 🔒 Managed by system - do not modify manually in K8s.
job.partitions_count | 1 | The number of partitions for computing one graph algorithm job.
job.partitions_thread_nums | 4 | The number of threads for partition parallel compute.
job.partitions_thread_nums4The number of threads for partition parallel compute.

2. Algorithm Configuration

Algorithm-specific configuration for computation logic.

config option | default value | description
algorithm.params_class | org.apache.hugegraph.computer.core.config.Null | ⚠️ REQUIRED The class used to transfer algorithm parameters before the algorithm is run.
algorithm.result_class | org.apache.hugegraph.computer.core.config.Null | The class of vertex's value, used to store the computation result for the vertex.
algorithm.message_class | org.apache.hugegraph.computer.core.config.Null | The class of message passed when computing a vertex.

3. Input Configuration

Configuration for loading input data from HugeGraph or other sources.

3.1 Input Source

config option | default value | description
input.source_type | hugegraph-server | The source type to load input data, allowed values: ['hugegraph-server', 'hugegraph-loader']. The 'hugegraph-loader' means use hugegraph-loader to load data from HDFS or file. If using 'hugegraph-loader', please configure 'input.loader_struct_path' and 'input.loader_schema_path'.
input.loader_struct_path | "" (empty) | The struct path of loader input, only takes effect when input.source_type=loader is enabled.
input.loader_schema_path | "" (empty) | The schema path of loader input, only takes effect when input.source_type=loader is enabled.

3.2 Input Splits

config option | default value | description
input.split_size | 1048576 (1 MB) | The input split size in bytes.
input.split_max_splits | 10000000 | The maximum number of input splits.
input.split_page_size | 500 | The page size for streamed load input split data.
input.split_fetch_timeout | 300 | The timeout in seconds to fetch input splits.

3.3 Input Processing

config option | default value | description
input.filter_class | org.apache.hugegraph.computer.core.input.filter.DefaultInputFilter | The class to create input-filter object. Input-filter is used to filter vertex edges according to user needs.
input.edge_direction | OUT | The direction of edges to load, allowed values: [OUT, IN, BOTH]. When the value is BOTH, edges in both OUT and IN directions will be loaded.
input.edge_freq | MULTIPLE | The frequency of edges that can exist between a pair of vertices, allowed values: [SINGLE, SINGLE_PER_LABEL, MULTIPLE]. SINGLE means only one edge can exist between a pair of vertices (identified by sourceId + targetId); SINGLE_PER_LABEL means each edge label can have one edge between a pair of vertices (identified by sourceId + edgeLabel + targetId); MULTIPLE means many edges can exist between a pair of vertices (identified by sourceId + edgeLabel + sortValues + targetId).
input.max_edges_in_one_vertex | 200 | The maximum number of adjacent edges allowed to be attached to a vertex. The adjacent edges will be stored and transferred together as a batch unit.

3.4 Input Performance

config option | default value | description
input.send_thread_nums | 4 | The number of threads for parallel sending of vertices or edges.

4. Snapshot & Storage Configuration

HugeGraph-Computer supports snapshot functionality to save vertex/edge partitions to local storage or MinIO object storage, enabling checkpoint recovery or accelerating repeated computations.

4.1 Basic Snapshot Configuration

config option | default value | description
snapshot.write | false | Whether to write snapshots of input vertex/edge partitions.
snapshot.load | false | Whether to load from snapshots of vertex/edge partitions.
snapshot.name | "" (empty) | User-defined snapshot name to distinguish different snapshots.

4.2 MinIO Integration (Optional)

MinIO can be used as a distributed object storage backend for snapshots in K8s deployments.

config option | default value | description
snapshot.minio_endpoint | "" (empty) | MinIO service endpoint (e.g., http://minio:9000). Required when using MinIO.
snapshot.minio_access_key | minioadmin | MinIO access key for authentication.
snapshot.minio_secret_key | minioadmin | MinIO secret key for authentication.
snapshot.minio_bucket_name | "" (empty) | MinIO bucket name for storing snapshot data.

Usage Scenarios:

  • Checkpoint Recovery: Resume from snapshots after job failures, avoiding data reloading
  • Repeated Computations: Load data from snapshots when running the same algorithm multiple times
  • A/B Testing: Save multiple snapshot versions of the same dataset to test different algorithm parameters

Example: Local Snapshot (in computer.properties):

snapshot.write=true
snapshot.name=pagerank-snapshot-20260201

Example: MinIO Snapshot (in K8s CRD computerConf):

computerConf:
  snapshot.write: "true"
  snapshot.name: "pagerank-snapshot-v1"
  snapshot.minio_endpoint: "http://minio:9000"
  snapshot.minio_access_key: "my-access-key"
  snapshot.minio_secret_key: "my-secret-key"
  snapshot.minio_bucket_name: "hugegraph-snapshots"

5. Worker & Master Configuration

Configuration for worker and master computation logic.

5.1 Master Configuration

config option | default value | description
master.computation_class | org.apache.hugegraph.computer.core.master.DefaultMasterComputation | Master-computation is computation that can determine whether to continue to the next superstep. It runs at the end of each superstep on the master.

5.2 Worker Computation

config option | default value | description
worker.computation_class | org.apache.hugegraph.computer.core.config.Null | The class to create worker-computation object. Worker-computation is used to compute each vertex in each superstep.
worker.combiner_class | org.apache.hugegraph.computer.core.config.Null | Combiner can combine messages into one value for a vertex. For example, PageRank algorithm can combine messages of a vertex to a sum value.
worker.partitioner | org.apache.hugegraph.computer.core.graph.partition.HashPartitioner | The partitioner that decides which partition a vertex should be in, and which worker a partition should be in.

5.3 Worker Combiners

config option | default value | description
worker.vertex_properties_combiner_class | org.apache.hugegraph.computer.core.combiner.OverwritePropertiesCombiner | The combiner can combine several properties of the same vertex into one properties at input step.
worker.edge_properties_combiner_class | org.apache.hugegraph.computer.core.combiner.OverwritePropertiesCombiner | The combiner can combine several properties of the same edge into one properties at input step.

5.4 Worker Buffers

config option | default value | description
worker.received_buffers_bytes_limit | 104857600 (100 MB) | The limit bytes of buffers of received data. The total size of all buffers can't exceed this limit. If received buffers reach this limit, they will be merged into a file (spill to disk).
worker.write_buffer_capacity | 52428800 (50 MB) | The initial size of write buffer that used to store vertex or message.
worker.write_buffer_threshold | 52428800 (50 MB) | The threshold of write buffer. Exceeding it will trigger sorting. The write buffer is used to store vertex or message.

5.5 Worker Data & Timeouts

config option | default value | description
worker.data_dirs | [jobs] | The directories separated by ',' that received vertices and messages can persist into.
worker.wait_sort_timeout | 600000 (10 minutes) | The max timeout (in ms) for message-handler to wait for sort-thread to sort one batch of buffers.
worker.wait_finish_messages_timeout | 86400000 (24 hours) | The max timeout (in ms) for message-handler to wait for finish-message of all workers.

6. I/O & Output Configuration

Configuration for output computation results.

6.1 Output Class & Result

config option | default value | description
output.output_class | org.apache.hugegraph.computer.core.output.LogOutput | The class to output the computation result of each vertex. Called after iteration computation.
output.result_name | value | The value is assigned dynamically by #name() of instance created by WORKER_COMPUTATION_CLASS.
output.result_write_type | OLAP_COMMON | The result write-type to output to HugeGraph, allowed values: [OLAP_COMMON, OLAP_SECONDARY, OLAP_RANGE].

6.2 Output Behavior

config option | default value | description
output.with_adjacent_edges | false | Whether to output the adjacent edges of the vertex.
output.with_vertex_properties | false | Whether to output the properties of the vertex.
output.with_edge_properties | false | Whether to output the properties of the edge.

6.3 Batch Output

config option | default value | description
output.batch_size | 500 | The batch size of output.
output.batch_threads | 1 | The number of threads used for batch output.
output.single_threads | 1 | The number of threads used for single output.

6.4 HDFS Output

config option | default value | description
output.hdfs_url | hdfs://127.0.0.1:9000 | The HDFS URL for output.
output.hdfs_user | hadoop | The HDFS user for output.
output.hdfs_path_prefix | /hugegraph-computer/results | The directory of HDFS output results.
output.hdfs_delimiter | , (comma) | The delimiter of HDFS output.
output.hdfs_merge_partitions | true | Whether to merge output files of multiple partitions.
output.hdfs_replication | 3 | The replication number of HDFS.
output.hdfs_core_site_path | "" (empty) | The HDFS core site path.
output.hdfs_site_path | "" (empty) | The HDFS site path.
output.hdfs_kerberos_enable | false | Whether Kerberos authentication is enabled for HDFS.
output.hdfs_kerberos_principal | "" (empty) | The HDFS principal for Kerberos authentication.
output.hdfs_kerberos_keytab | "" (empty) | The HDFS keytab file for Kerberos authentication.
output.hdfs_krb5_conf | /etc/krb5.conf | Kerberos configuration file path.
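
As a sketch, HDFS output can be wired up in conf/computer.properties with the options above (hostnames are placeholders); note that output.output_class must also be switched from the default LogOutput to an HDFS-backed output class, whose exact class name is not shown here (see the built-in output implementations):

cat >> conf/computer.properties <<'EOF'
# Write results to HDFS instead of the default log output
output.hdfs_url=hdfs://namenode:9000
output.hdfs_user=hadoop
output.hdfs_path_prefix=/hugegraph-computer/results
output.hdfs_merge_partitions=true
EOF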

6.5 Retry & Timeout

config option | default value | description
output.retry_times | 3 | The retry times when output fails.
output.retry_interval | 10 | The retry interval (in seconds) when output fails.
output.thread_pool_shutdown_timeout | 60 | The timeout (in seconds) of output thread pool shutdown.

7. Network & Transport Configuration

Configuration for network communication between workers and master.

7.1 Server Configuration

config option | default value | description
transport.server_host | 127.0.0.1 | 🔒 Managed by system The server hostname or IP to listen on to transfer data. Do not modify manually.
transport.server_port | 0 | 🔒 Managed by system The server port to listen on to transfer data. The system will assign a random port if set to 0. Do not modify manually.
transport.server_threads | 4 | The number of transport threads for server.

7.2 Client Configuration

config option | default value | description
transport.client_threads | 4 | The number of transport threads for client.
transport.client_connect_timeout | 3000 | The timeout (in ms) of client connect to server.

7.3 Protocol Configuration

config option | default value | description
transport.provider_class | org.apache.hugegraph.computer.core.network.netty.NettyTransportProvider | The transport provider, currently only supports Netty.
transport.io_mode | AUTO | The network IO mode, allowed values: [NIO, EPOLL, AUTO]. AUTO means selecting the appropriate mode automatically.
transport.tcp_keep_alive | true | Whether to enable TCP keep-alive.
transport.transport_epoll_lt | false | Whether to enable EPOLL level-trigger (only effective when io_mode=EPOLL).

7.4 Buffer Configuration

config option | default value | description
transport.send_buffer_size | 0 | The size of socket send-buffer in bytes. 0 means using system default value.
transport.receive_buffer_size | 0 | The size of socket receive-buffer in bytes. 0 means using system default value.
transport.write_buffer_high_mark | 67108864 (64 MB) | The high water mark for write buffer in bytes. It will trigger sending unavailable if the number of queued bytes > write_buffer_high_mark.
transport.write_buffer_low_mark | 33554432 (32 MB) | The low water mark for write buffer in bytes. It will trigger sending available if the number of queued bytes < write_buffer_low_mark.

7.5 Flow Control

config option | default value | description
transport.max_pending_requests | 8 | The max number of client unreceived ACKs. It will trigger sending unavailable if the number of unreceived ACKs >= max_pending_requests.
transport.min_pending_requests | 6 | The minimum number of client unreceived ACKs. It will trigger sending available if the number of unreceived ACKs < min_pending_requests.
transport.min_ack_interval | 200 | The minimum interval (in ms) of server reply ACK.

7.6 Timeouts

config option | default value | description
transport.close_timeout | 10000 | The timeout (in ms) of close server or close client.
transport.sync_request_timeout | 10000 | The timeout (in ms) to wait for response after sending sync-request.
transport.finish_session_timeout | 0 | The timeout (in ms) to finish session. 0 means using (transport.sync_request_timeout × transport.max_pending_requests).
transport.write_socket_timeout | 3000 | The timeout (in ms) to write data to socket buffer.
transport.server_idle_timeout | 360000 (6 minutes) | The max timeout (in ms) of server idle.

7.7 Heartbeat

config option | default value | description
transport.heartbeat_interval | 20000 (20 seconds) | The minimum interval (in ms) between heartbeats on client side.
transport.max_timeout_heartbeat_count | 120 | The maximum times of timeout heartbeat on client side. If the number of timeouts waiting for heartbeat response continuously > max_timeout_heartbeat_count, the channel will be closed from client side.

7.8 Advanced Network Settings

config option | default value | description
transport.max_syn_backlog | 511 | The capacity of SYN queue on server side. 0 means using system default value.
transport.recv_file_mode | true | Whether to enable receive buffer-file mode. It will receive buffer and write to file from socket using zero-copy if enabled. Note: Requires OS support for zero-copy (e.g., Linux sendfile/splice).
transport.network_retries | 3 | The number of retry attempts for network communication if network is unstable.

8. Storage & Persistence Configuration

Configuration for HGKV (HugeGraph Key-Value) storage engine and value files.

8.1 HGKV Configuration

config option | default value | description
hgkv.max_file_size | 2147483648 (2 GB) | The max number of bytes in each HGKV file.
hgkv.max_data_block_size | 65536 (64 KB) | The max byte size of HGKV file data block.
hgkv.max_merge_files | 10 | The max number of files to merge at one time.
hgkv.temp_file_dir | /tmp/hgkv | This folder is used to store temporary files during the file merging process.

8.2 Value File Configuration

config option | default value | description
valuefile.max_segment_size | 1073741824 (1 GB) | The max number of bytes in each segment of value-file.

9. BSP & Coordination Configuration

Configuration for Bulk Synchronous Parallel (BSP) protocol and etcd coordination.

config option | default value | description
bsp.etcd_endpoints | http://localhost:2379 | 🔒 Managed by system in K8s The endpoints to access etcd. For multiple endpoints, use comma-separated list: http://host1:port1,http://host2:port2. Do not modify manually in K8s deployments.
bsp.max_super_step | 10 (packaged: 2) | The max super step of the algorithm.
bsp.register_timeout | 300000 (packaged: 100000) | The max timeout (in ms) to wait for master and workers to register.
bsp.wait_workers_timeout | 86400000 (24 hours) | The max timeout (in ms) to wait for workers BSP event.
bsp.wait_master_timeout | 86400000 (24 hours) | The max timeout (in ms) to wait for master BSP event.
bsp.log_interval | 30000 (30 seconds) | The log interval (in ms) to print the log while waiting for BSP event.
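
Because bsp.etcd_endpoints accepts a comma-separated list, a quick reachability check over every configured endpoint can look like this (a sketch; etcd's /version endpoint is standard):

ENDPOINTS="http://host1:2379,http://host2:2379"

# Probe each configured etcd endpoint
for ep in ${ENDPOINTS//,/ }; do
  echo "checking ${ep}"
  curl -s "${ep}/version" && echo
done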

10. Performance Tuning Configuration

Configuration for performance optimization.

config option | default value | description
allocator.max_vertices_per_thread | 10000 | Maximum number of vertices per thread processed in each memory allocator.
sort.thread_nums | 4 | The number of threads performing internal sorting.

11. System Administration Configuration

⚠️ Configuration items managed by the system - users are prohibited from modifying these manually.

The following configuration items are automatically managed by the K8s Operator, Driver, or runtime system. Manual modification will cause cluster communication failures or job scheduling errors.

config option | managed by | description
bsp.etcd_endpoints | K8s Operator | Automatically set to operator's etcd service address
transport.server_host | Runtime | Automatically set to pod/container hostname
transport.server_port | Runtime | Automatically assigned random port
job.namespace | K8s Operator | Automatically set to job namespace
job.id | K8s Operator | Automatically set to job ID from CRD
job.workers_count | K8s Operator | Automatically set from CRD workerInstances
rpc.server_host | Runtime | RPC server hostname (system-managed)
rpc.server_port | Runtime | RPC server port (system-managed)
rpc.remote_url | Runtime | RPC remote URL (system-managed)

Why These Are Forbidden:

  • BSP/RPC Configuration: Must match the actual deployed etcd/RPC services. Manual overrides break coordination.
  • Job Configuration: Must match K8s CRD specifications. Mismatches cause worker count errors.
  • Transport Configuration: Must use actual pod hostnames/ports. Manual values prevent inter-worker communication.

K8s Operator Config Options

NOTE: Option needs to be converted through environment variable settings, e.g. k8s.internal_etcd_url => INTERNAL_ETCD_URL
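
For example, to point the operator at an external etcd, the k8s.internal_etcd_url option would be supplied as the INTERNAL_ETCD_URL environment variable on the operator deployment (the deployment name below is inferred from the pod listing earlier in this document; treat it as an assumption):

kubectl set env deployment/hugegraph-computer-operator-controller-manager \
  INTERNAL_ETCD_URL=http://my-etcd:2379 \
  -n hugegraph-computer-operator-system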

config option | default value | description
k8s.auto_destroy_pod | true | Whether to automatically destroy all pods when the job is completed or failed.
k8s.close_reconciler_timeout | 120 | The max timeout (in ms) to close reconciler.
k8s.internal_etcd_url | http://127.0.0.1:2379 | The internal etcd URL for operator system.
k8s.max_reconcile_retry | 3 | The max retry times of reconcile.
k8s.probe_backlog | 50 | The maximum backlog for serving health probes.
k8s.probe_port | 9892 | The port that the controller binds to for serving health probes.
k8s.ready_check_internal | 1000 | The time interval (ms) of check ready.
k8s.ready_timeout | 30000 | The max timeout (in ms) of check ready.
k8s.reconciler_count | 10 | The max number of reconciler threads.
k8s.resync_period | 600000 | The minimum frequency at which watched resources are reconciled.
k8s.timezone | Asia/Shanghai | The timezone of computer job and operator.
k8s.watch_namespace | hugegraph-computer-system | The namespace to watch custom resources in. Use '*' to watch all namespaces.

HugeGraph-Computer CRD

CRD: https://github.com/apache/hugegraph-computer/blob/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1.yaml

spec | default value | description | required
algorithmName |  | The name of algorithm. | true
jobId |  | The job id. | true
image |  | The image of algorithm. | true
computerConf |  | The map of computer config options. | true
workerInstances |  | The number of worker instances, it will override the 'job.workers_count' option. | true
pullPolicy | Always | The pull-policy of image, detail please refer to: https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy | false
pullSecrets |  | The pull-secrets of Image, detail please refer to: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod | false
masterCpu |  | The cpu limit of master, the unit can be 'm' or without unit, detail please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu | false
workerCpu |  | The cpu limit of worker, the unit can be 'm' or without unit, detail please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu | false
masterMemory |  | The memory limit of master, the unit can be one of Ei, Pi, Ti, Gi, Mi, Ki, detail please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory | false
workerMemory |  | The memory limit of worker, the unit can be one of Ei, Pi, Ti, Gi, Mi, Ki, detail please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory | false
log4jXml |  | The content of log4j.xml for computer job. | false
jarFile |  | The jar path of computer algorithm. | false
remoteJarUri |  | The remote jar uri of computer algorithm, it will overlay algorithm image. | false
jvmOptions |  | The java startup parameters of computer job. | false
envVars |  | please refer to: https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/ | false
envFrom |  | please refer to: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/ | false
masterCommand | bin/start-computer.sh | The run command of master, equivalent to 'Entrypoint' field of Docker. | false
masterArgs | ["-r master", "-d k8s"] | The run args of master, equivalent to 'Cmd' field of Docker. | false
workerCommand | bin/start-computer.sh | The run command of worker, equivalent to 'Entrypoint' field of Docker. | false
workerArgs | ["-r worker", "-d k8s"] | The run args of worker, equivalent to 'Cmd' field of Docker. | false
volumes |  | Please refer to: https://kubernetes.io/docs/concepts/storage/volumes/ | false
volumeMounts |  | Please refer to: https://kubernetes.io/docs/concepts/storage/volumes/ | false
secretPaths |  | The map of k8s-secret name and mount path. | false
configMapPaths |  | The map of k8s-configmap name and mount path. | false
podTemplateSpec |  | Please refer to: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-template-v1/#PodTemplateSpec | false
securityContext |  | Please refer to: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ | false

KubeDriver Config Options

config option | default value | description
k8s.build_image_bash_path |  | The path of command used to build image.
k8s.enable_internal_algorithm | true | Whether enable internal algorithm.
k8s.framework_image_url | hugegraph/hugegraph-computer:latest | The image url of computer framework.
k8s.image_repository_password |  | The password for login image repository.
k8s.image_repository_registry |  | The address for login image repository.
k8s.image_repository_url | hugegraph/hugegraph-computer | The url of image repository.
k8s.image_repository_username |  | The username for login image repository.
k8s.internal_algorithm | [pageRank] | The name list of all internal algorithm. Note: Algorithm names use camelCase here (e.g., pageRank), but algorithm implementations return underscore_case (e.g., page_rank).
k8s.internal_algorithm_image_url | hugegraph/hugegraph-computer:latest | The image url of internal algorithm.
k8s.jar_file_dir | /cache/jars/ | The directory where the algorithm jar will be uploaded.
k8s.kube_config | ~/.kube/config | The path of k8s config file.
k8s.log4j_xml_path |  | The log4j.xml path for computer job.
k8s.namespace | hugegraph-computer-system | The namespace of hugegraph-computer system.
k8s.pull_secret_names | [] | The names of pull-secret for pulling image.