DeepWiki provides real-time updated project documentation with more comprehensive and accurate content, suitable for quickly understanding the latest project information.
GitHub Access: https://github.com/apache/hugegraph-computer
Vermeer is a high-performance, memory-first graph computing framework written in Go (start once, execute any task), supporting ultra-fast computation of 15+ OLAP graph algorithms (most tasks complete in seconds to minutes), with master and worker roles. Currently, there is only one master (HA can be added), and there can be multiple workers.
The master is responsible for communication, forwarding, and aggregation, with minimal computation and resource usage. Workers are computation nodes used to store graph data and run computation tasks, consuming a large amount of memory and CPU. The grpc and rest modules handle internal communication and external calls, respectively.
The framework’s runtime configuration can be passed via command-line parameters or specified in configuration files located in the config/ directory. The --env parameter can specify which configuration file to use, e.g., --env=master specifies using master.ini. Note that the master needs to specify the listening port, and the worker needs to specify the listening port and the master’s ip:port.
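For example, a minimal sketch of how --env maps to files under the config/ directory (the worker01 file name follows the example command used later in this section):

```bash
# config/ sits next to the vermeer binary; --env=<name> loads config/<name>.ini
./vermeer --env=master     # reads config/master.ini  (must set the master's listening port)
./vermeer --env=worker01   # reads config/worker01.ini (must set the worker's listening port
                           #  and the master's ip:port)
```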
Please ensure that docker-compose.yaml exists in your project root directory. If it doesn’t, here is an example:
services:
  vermeer-master:
    image: hugegraph/vermeer
    container_name: vermeer-master
    volumes:
      - ~/.config:/go/bin/config # Change here to your actual config path
    command: --env=master
    networks:
      vermeer_network:
        ipv4_address: 172.20.0.10 # Assign a static IP for the master
  vermeer-worker:
    image: hugegraph/vermeer
    container_name: vermeer-worker
    volumes:
      - ~/:/go/bin/config # Change here to your actual config path
    command: --env=worker
    networks:
      vermeer_network:
        ipv4_address: 172.20.0.11 # Assign a static IP for the worker
networks:
  vermeer_network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/24 # Define the subnet for your network
Modify docker-compose.yaml: change the volume mapping ~/:/go/bin/config to /home/user/config:/go/bin/config (or your own configuration directory); see the config folder for details.
Build the image and start the services in the project directory (or run docker build first, then docker-compose up):
# Build the image (in the project root vermeer directory)
docker build -t hugegraph/vermeer .
# Start the services (in the vermeer root directory)
docker-compose up -d
# Or use the new CLI:
# docker compose up -d
View Logs / Stop / Remove
docker-compose logs -f
docker-compose down
docker run (manually create the network and assign static IPs). Ensure that CONFIG_DIR has proper read/execute permissions for the Docker process.
Build the image:
docker build -t hugegraph/vermeer .
Create a custom bridge network (one-time operation):
docker network create --driver bridge \
--subnet 172.20.0.0/24 \
vermeer_network
Run the master (set CONFIG_DIR to your absolute configuration path; adjust the IP as needed for your environment):
CONFIG_DIR=/home/user/config
docker run -d \
--name vermeer-master \
--network vermeer_network --ip 172.20.0.10 \
-v ${CONFIG_DIR}:/go/bin/config \
hugegraph/vermeer \
--env=master
Run worker:
docker run -d \
--name vermeer-worker \
--network vermeer_network --ip 172.20.0.11 \
-v ${CONFIG_DIR}:/go/bin/config \
hugegraph/vermeer \
--env=worker
View logs / Stop / Remove:
docker logs -f vermeer-master
docker logs -f vermeer-worker
docker stop vermeer-master vermeer-worker
docker rm vermeer-master vermeer-worker
# Remove the custom network (if needed)
docker network rm vermeer_network
Build from source. You can refer to the Vermeer README.
go build
Enter the directory and run ./vermeer --env=master or ./vermeer --env=worker01.
This REST API provides all task-creation functions, including reading graph data and the various computation functions, offering both asynchronous and synchronous return interfaces. The returned content includes information about the created tasks.

The overall process of using Vermeer is to first create a task to read the graph data and, after the graph has been read, create a computation task to execute the computation. The graph is not deleted automatically; multiple computation tasks can be run on one graph without re-reading it. If deletion is needed, the delete-graph interface can be used.

Task statuses can be divided into graph-reading task statuses and computation task statuses. Generally, the client only needs to know four statuses: created, in progress, completed, and error. The graph status is the basis for determining whether the graph is available. If the graph is being read or its status is in error, it cannot be used to create computation tasks. The delete-graph interface is only available when the graph is in the loaded or error status and has no computation tasks.
Available URLs are as follows:
Refer to the Vermeer parameter list document for specific parameters.
Vermeer provides three ways to load data:
You can obtain the dataset in advance, such as the Twitter-2010 dataset (https://snap.stanford.edu/data/twitter-2010.html); the first file, Twitter-2010.text.gz, is sufficient.
Request Example:
POST http://localhost:8688/tasks/create
{
"task_type": "load",
"graph": "testdb",
"params": {
"load.parallel": "50",
"load.type": "local",
"load.vertex_files": "{\"localhost\":\"data/twitter-2010.v_[0,99]\"}",
"load.edge_files": "{\"localhost\":\"data/twitter-2010.e_[0,99]\"}",
"load.use_out_degree": "1",
"load.use_outedge": "1"
}
}
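For instance, the request above can be submitted with curl as shown below (the JSON body is exactly the payload shown; sending it with a Content-Type: application/json header is a standard assumption):

```bash
curl -X POST http://localhost:8688/tasks/create \
  -H "Content-Type: application/json" \
  -d '{
    "task_type": "load",
    "graph": "testdb",
    "params": {
      "load.parallel": "50",
      "load.type": "local",
      "load.vertex_files": "{\"localhost\":\"data/twitter-2010.v_[0,99]\"}",
      "load.edge_files": "{\"localhost\":\"data/twitter-2010.e_[0,99]\"}",
      "load.use_out_degree": "1",
      "load.use_outedge": "1"
    }
  }'
```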
Request Example:
⚠️ Security Warning: Never store real passwords in configuration files or code. Use environment variables or a secure credential management system instead.
POST http://localhost:8688/tasks/create
{
"task_type": "load",
"graph": "testdb",
"params": {
"load.parallel": "50",
"load.type": "hugegraph",
"load.hg_pd_peers": "[\"<your-hugegraph-ip>:8686\"]",
"load.hugegraph_name": "DEFAULT/hugegraph2/g",
"load.hugegraph_username": "admin",
"load.hugegraph_password": "<your-password-here>",
"load.use_out_degree": "1",
"load.use_outedge": "1"
}
}
Request Example:
POST http://localhost:8688/tasks/create
{
"task_type": "load",
"graph": "testdb",
"params": {
"load.parallel": "50",
"load.type": "hdfs",
"load.hdfs_namenode": "name_node1:9000",
"load.hdfs_conf_path": "/path/to/conf",
"load.krb_realm": "EXAMPLE.COM",
"load.krb_name": "user@EXAMPLE.COM",
"load.krb_keytab_path": "/path/to/keytab",
"load.krb_conf_path": "/path/to/krb5.conf",
"load.hdfs_use_krb": "1",
"load.vertex_files": "/data/graph/vertices",
"load.edge_files": "/data/graph/edges",
"load.use_out_degree": "1",
"load.use_outedge": "1"
}
}
All Vermeer computation tasks support multiple result output methods, which can be customized: local, hdfs, afs, or hugegraph. Add the corresponding parameters under params when sending the request for them to take effect. When output.need_statistics is set to 1, statistical information about the computation results is output and written into the task information returned by the interface. The statistics operators currently supported are “count” and “modularity”; the “modularity” operator applies only to community detection algorithms.
Refer to the Vermeer parameter list document for specific parameters.
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "pagerank",
"compute.parallel": "10",
"compute.max_step": "10",
"output.type": "local",
"output.parallel": "1",
"output.file_path": "result/pagerank"
}
}
The PageRank algorithm, also known as the web ranking algorithm, is a technique used by search engines to calculate the relevance and importance of web pages (nodes) based on their mutual hyperlinks.
The PageRank algorithm is suitable for scenarios such as web page ranking and identifying key figures in social networks.
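For reference, the standard PageRank update is shown below (the well-known textbook formulation, not Vermeer-specific code): d is the damping factor, N the number of vertices, In(v) the in-neighbors of v, and outdeg(u) the out-degree of u.

```latex
PR(v) = \frac{1 - d}{N} + d \sum_{u \in \mathrm{In}(v)} \frac{PR(u)}{\mathrm{outdeg}(u)}
```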
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "pagerank",
"compute.parallel":"10",
"output.type":"local",
"output.parallel":"1",
"output.file_path":"result/pagerank",
"compute.max_step":"10"
}
}
The weakly connected components algorithm calculates all connected subgraphs in an undirected graph and outputs the weakly connected subgraph ID to which each vertex belongs, indicating the connectivity between points and distinguishing different connected communities.
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "wcc",
"compute.parallel":"10",
"output.type":"local",
"output.parallel":"1",
"output.file_path":"result/wcc",
"compute.max_step":"10"
}
}
The label propagation algorithm is a graph clustering algorithm commonly used in social networks to discover potential communities.
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "lpa",
"compute.parallel":"10",
"output.type":"local",
"output.parallel":"1",
"output.file_path":"result/lpa",
"compute.max_step":"10"
}
}
The degree centrality algorithm calculates the degree centrality value of each node in the graph, supporting both undirected and directed graphs. Degree centrality is an important indicator of node importance: the more edges a node shares with other nodes, the higher its degree centrality value and the more important the node is in the graph. In an undirected graph, degree centrality is computed by counting how many times a node appears in the edge information. In a directed graph, the edges are filtered by direction (incoming or outgoing) before counting, yielding the node's in-degree or out-degree value. More important nodes therefore have higher degrees.
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "degree",
"compute.parallel":"10",
"output.type":"local",
"output.parallel":"1",
"output.file_path":"result/degree",
"degree.direction":"both"
}
}
Closeness centrality is used to calculate the inverse of the shortest distance from a node to all other reachable nodes, accumulating and normalizing the value. Closeness centrality can be used to measure the time it takes for information to be transmitted from the node to other nodes. The larger the closeness centrality of a node, the closer its position in the graph is to the center, suitable for scenarios such as identifying key nodes in social networks.
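One common formulation consistent with the description above (accumulating the inverse of the shortest distances to reachable vertices and then normalizing) is sketched below; R(v) is the set of vertices reachable from v and d(v, u) the shortest-path distance. The exact normalization Vermeer applies, and the effect of closeness_centrality.sample_rate, are not spelled out here.

```latex
C(v) \propto \sum_{u \in R(v),\, u \neq v} \frac{1}{d(v, u)}
```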
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "closeness_centrality",
"compute.parallel":"10",
"output.type":"local",
"output.parallel":"1",
"output.file_path":"result/closeness_centrality",
"closeness_centrality.sample_rate":"0.01"
}
}
The betweenness centrality algorithm determines the value of a node as a “bridge” node; the larger the value, the more likely it is to be a necessary path between two points in the graph. Typical examples include mutual followers in social networks. It is suitable for measuring the degree of aggregation around a node in a community.
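For reference, the standard (exact) betweenness centrality is defined below, where \sigma_{st} is the number of shortest paths from s to t and \sigma_{st}(v) the number of those paths passing through v; Vermeer approximates this via sampling (betweenness_centrality.sample_rate), so results are estimates rather than exact values.

```latex
C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}
```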
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "betweenness_centrality",
"compute.parallel":"10",
"output.type":"local",
"output.parallel":"1",
"output.file_path":"result/betweenness_centrality",
"betweenness_centrality.sample_rate":"0.01"
}
}
The triangle count algorithm calculates the number of triangles passing through each vertex, suitable for calculating the relationships between users and whether the associations form triangles. The more triangles, the higher the degree of association between nodes in the graph, and the tighter the organizational relationship. In social networks, triangles indicate cohesive communities, and identifying triangles helps understand clustering and interconnections among individuals or groups in the network. In financial or transaction networks, the presence of triangles may indicate suspicious or fraudulent activities, and triangle counting can help identify transaction patterns that may require further investigation.
The output result is the Triangle Count corresponding to each vertex, i.e., the number of triangles the vertex is part of.
Note: This algorithm is for undirected graphs and ignores edge directions.
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "triangle_count",
"compute.parallel":"10",
"output.type":"local",
"output.parallel":"1",
"output.file_path":"result/triangle_count"
}
}
The K-Core algorithm marks all vertices with a degree of at least K, suitable for graph pruning and finding the core part of the graph.
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "kcore",
"compute.parallel":"10",
"output.type":"local",
"output.parallel":"1",
"output.file_path":"result/kcore",
"kcore.degree_k":"5"
}
}
The single-source shortest path algorithm calculates the shortest distance from one vertex to all other vertices.
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "sssp",
"compute.parallel":"10",
"output.type":"local",
"output.parallel":"1",
"output.file_path":"result/degree",
"sssp.source":"tom"
}
}
Starting from a given vertex, the k-out algorithm obtains the vertices in the k-th layer (i.e., k hops away) from that vertex.
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "kout",
"compute.parallel":"10",
"output.type":"local",
"output.parallel":"1",
"output.file_path":"result/kout",
"kout.source":"tom",
"compute.max_step":"6"
}
}
The Louvain algorithm is a community detection algorithm based on modularity. The basic idea is that nodes in the network try to traverse all neighbor community labels and choose the community label that maximizes the modularity increment. After maximizing modularity, each community is regarded as a new node, and the process is repeated until the modularity no longer increases.
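For reference, the modularity being maximized is the standard definition below, where A_{ij} is the adjacency matrix, k_i the degree of node i, m the total number of edges, c_i the community of node i, and \delta the Kronecker delta (1 when both nodes share a community, 0 otherwise):

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
```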
The distributed Louvain algorithm implemented on Vermeer is affected by factors such as node order and parallel computation. Due to the random traversal order of the Louvain algorithm, community compression also has a certain randomness, leading to different results in multiple executions. However, the overall trend will not change significantly.
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "louvain",
"compute.parallel":"10",
"compute.max_step":"1000",
"louvain.threshold":"0.0000001",
"louvain.resolution":"1.0",
"louvain.step":"10",
"output.type":"local",
"output.parallel":"1",
"output.file_path":"result/louvain"
}
}
The Jaccard index, also known as the Jaccard similarity coefficient, is used to compare the similarity and diversity between finite sample sets. The larger the Jaccard coefficient value, the higher the similarity of the samples. It is used to calculate the Jaccard similarity coefficient between a given source point and all other points in the graph.
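For reference, the Jaccard similarity coefficient between two sets A and B is defined as follows; for vertex similarity, A and B are typically taken to be the neighbor sets of the source vertex and the compared vertex (an assumption about this implementation's exact definition):

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
```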
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "jaccard",
"compute.parallel":"10",
"compute.max_step":"2",
"jaccard.source":"123",
"output.type":"local",
"output.parallel":"1",
"output.file_path":"result/jaccard"
}
}
The goal of personalized PageRank is to calculate the relevance of all nodes relative to user u. Starting from the node corresponding to user u, at each node, there is a probability of 1-d to stop walking and start again from u, or a probability of d to continue walking, randomly selecting a node from the nodes pointed to by the current node to walk down. It is used to calculate the personalized PageRank score starting from a given starting point, suitable for scenarios such as social recommendations.
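A common per-vertex update consistent with the description above is sketched below; d corresponds to ppr.damping, u to ppr.source, and the indicator term restarts the walk at the source. The exact form Vermeer uses may differ slightly (e.g., in how dangling vertices are handled).

```latex
PPR(v) = (1 - d)\,\mathbb{1}[v = u] + d \sum_{w \in \mathrm{In}(v)} \frac{PPR(w)}{\mathrm{outdeg}(w)}
```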
Since the calculation requires using out-degree, load.use_out_degree needs to be set to 1 when reading the graph.
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "ppr",
"compute.parallel":"100",
"compute.max_step":"10",
"ppr.source":"123",
"ppr.damping":"0.85",
"ppr.diff_threshold":"0.00001",
"output.type":"local",
"output.parallel":"1",
"output.file_path":"result/ppr"
}
}
Calculate the k-degree neighbors of all nodes in the graph (excluding themselves and 1~k-1 degree neighbors). Due to the severe memory expansion of the global kout algorithm, k is currently limited to 1 and 2. Additionally, the global kout algorithm supports filtering functions (parameters such as “compute.filter”:“risk_level==1”), and the filtering condition is judged when calculating the k-degree. The final result set includes those that meet the filtering condition. The algorithm’s final output is the number of neighbors that meet the condition.
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "kout_all",
"compute.parallel":"10",
"output.type":"local",
"output.parallel":"10",
"output.file_path":"result/kout",
"compute.max_step":"2",
"compute.filter":"risk_level==1"
}
}
The clustering coefficient measures how strongly nodes in a graph cluster together. In real networks, nodes tend to form tightly knit groups with relatively dense connections. The clustering coefficient algorithm (Cluster Coefficient) calculates the clustering degree of nodes in the graph. This implementation computes the local clustering coefficient, which measures the clustering degree around each individual node.
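For reference, the standard local clustering coefficient of a vertex v in an undirected graph is shown below, where T(v) is the number of triangles through v and deg(v) its degree (this is the textbook definition; how the implementation handles vertices of degree 0 or 1 is not specified here):

```latex
C(v) = \frac{2\,T(v)}{\deg(v)\,\bigl(\deg(v) - 1\bigr)}
```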
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "clustering_coefficient",
"compute.parallel":"100",
"compute.max_step":"10",
"output.type":"local",
"output.parallel":"1",
"output.file_path":"result/cc"
}
}
In the mathematical theory of directed graphs, a graph is said to be strongly connected if every vertex can be reached from every other vertex. The maximal strongly connected subgraphs of a directed graph are called strongly connected components. The algorithm indicates the connectivity between vertices and distinguishes different connected communities.
Request example:
POST http://localhost:8688/tasks/create
{
"task_type": "compute",
"graph": "testdb",
"params": {
"compute.algorithm": "scc",
"compute.parallel":"10",
"output.type":"local",
"output.parallel":"1",
"output.file_path":"result/scc",
"compute.max_step":"200"
}
}
🚧 This section is still under construction; further updates and improvements will be made. Suggestions and feedback are welcome.
HugeGraph-Computer is a distributed graph processing system (OLAP) for HugeGraph. It is an implementation of Pregel and runs on a Kubernetes (K8s) framework. It focuses on supporting graph data volumes of hundreds of billions to trillions, using disk for sorting and acceleration, which is one of its biggest differences from Vermeer.
You must use Java 11 or later to run Computer, and configure the environment yourself.
Be sure to execute the java -version command to check the JDK version before proceeding.
To run the algorithm with HugeGraph-Computer, you need to install Java 11 or later versions.
You also need to deploy HugeGraph-Server and Etcd.
There are two ways to get HugeGraph-Computer:
Download the latest version of the HugeGraph-Computer release package:
wget https://downloads.apache.org/incubator/hugegraph/${version}/apache-hugegraph-computer-incubating-${version}.tar.gz
tar zxvf apache-hugegraph-computer-incubating-${version}.tar.gz -C hugegraph-computer
Clone the latest version of HugeGraph-Computer source package:
$ git clone https://github.com/apache/hugegraph-computer.git
Compile and generate tar package:
cd hugegraph-computer
mvn clean package -DskipTests
Edit conf/computer.properties to configure the connection to HugeGraph-Server and etcd:
# Job configuration
job.id=local_pagerank_001
job.partitions_count=4
# HugeGraph connection (✅ Correct configuration keys)
hugegraph.url=http://localhost:8080
hugegraph.name=hugegraph
# If authentication is enabled on HugeGraph-Server
hugegraph.username=
hugegraph.password=
# BSP coordination (✅ Correct key: bsp.etcd_endpoints)
bsp.etcd_endpoints=http://localhost:2379
bsp.max_super_step=10
# Algorithm parameters (⚠️ Required)
algorithm.params_class=org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams
Important Configuration Notes:
- Use bsp.etcd_endpoints (NOT bsp.etcd.url) for the etcd connection
- algorithm.params_class is required for all algorithms
- For multiple etcd endpoints, use a comma-separated list: http://host1:2379,http://host2:2379

You can use the -c parameter to specify the configuration file; for more computer config options, please see: Computer Config Options
cd hugegraph-computer
bin/start-computer.sh -d local -r master
bin/start-computer.sh -d local -r worker
3.1.6.1 Enable OLAP index query for server
If the OLAP index is not enabled, it needs to be enabled. More reference: modify-graphs-read-mode
PUT http://localhost:8080/graphs/hugegraph/graph_read_mode
"ALL"
3.1.6.2 Query page_rank property value:
curl "http://localhost:8080/graphs/hugegraph/graph/vertices?page&limit=3" | gunzip
To run an algorithm with HugeGraph-Computer, you need to deploy HugeGraph-Server first
# Kubernetes version >= v1.16
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1.yaml
# Kubernetes version < v1.16
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1beta1.yaml
kubectl get crd
NAME CREATED AT
hugegraphcomputerjobs.hugegraph.apache.org 2021-09-16T08:01:08Z
kubectl apply -f https://raw.githubusercontent.com/apache/hugegraph-computer/master/computer-k8s-operator/manifest/hugegraph-computer-operator.yaml
kubectl get pod -n hugegraph-computer-operator-system
NAME READY STATUS RESTARTS AGE
hugegraph-computer-operator-controller-manager-58c5545949-jqvzl 1/1 Running 0 15h
hugegraph-computer-operator-etcd-28lm67jxk5 1/1 Running 0 15h
More computer crd please see: Computer CRD
More computer config please see: Computer Config Options
Basic Example:
cat <<EOF | kubectl apply --filename -
apiVersion: hugegraph.apache.org/v1
kind: HugeGraphComputerJob
metadata:
namespace: hugegraph-computer-operator-system
name: &jobName pagerank-sample
spec:
jobId: *jobName
algorithmName: page_rank # ✅ Correct: use underscore format (matches algorithm implementation)
image: hugegraph/hugegraph-computer:latest
jarFile: /hugegraph/hugegraph-computer/algorithm/builtin-algorithm.jar
pullPolicy: Always
workerCpu: "4"
workerMemory: "4Gi"
workerInstances: 5
computerConf:
job.partitions_count: "20"
algorithm.params_class: org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams
hugegraph.url: http://${hugegraph-server-host}:${hugegraph-server-port}
hugegraph.name: hugegraph
EOF
Complete Example with Advanced Features:
cat <<EOF | kubectl apply --filename -
apiVersion: hugegraph.apache.org/v1
kind: HugeGraphComputerJob
metadata:
namespace: hugegraph-computer-operator-system
name: &jobName pagerank-advanced
spec:
jobId: *jobName
algorithmName: page_rank # ✅ Correct: underscore format
image: hugegraph/hugegraph-computer:latest
jarFile: /hugegraph/hugegraph-computer/algorithm/builtin-algorithm.jar
pullPolicy: Always
# Resource limits
masterCpu: "2"
masterMemory: "2Gi"
workerCpu: "4"
workerMemory: "4Gi"
workerInstances: 5
# JVM options
jvmOptions: "-Xmx3g -Xms3g -XX:+UseG1GC"
# Environment variables (optional)
envVars:
- name: REMOTE_JAR_URI
value: "http://example.com/custom-algorithm.jar" # Download custom algorithm JAR
- name: LOG_LEVEL
value: "INFO"
# Computer configuration
computerConf:
# Job settings
job.partitions_count: "20"
# Algorithm parameters (⚠️ Required)
algorithm.params_class: org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams
page_rank.alpha: "0.85" # PageRank damping factor
# HugeGraph connection
hugegraph.url: http://hugegraph-server:8080
hugegraph.name: hugegraph
hugegraph.username: "" # Fill if authentication is enabled
hugegraph.password: ""
# BSP configuration (⚠️ System-managed in K8s, do not override)
# bsp.etcd_endpoints is automatically set by operator
bsp.max_super_step: "20"
bsp.log_interval: "30000"
# Snapshot configuration (optional)
snapshot.write: "true" # Enable snapshot writing
snapshot.load: "false" # Do not load from snapshot this time
snapshot.name: "pagerank-snapshot-v1"
snapshot.minio_endpoint: "http://minio:9000"
snapshot.minio_access_key: "minioadmin"
snapshot.minio_secret_key: "minioadmin"
snapshot.minio_bucket_name: "hugegraph-snapshots"
# Output configuration
output.result_name: "page_rank"
output.batch_size: "500"
output.with_adjacent_edges: "false"
EOF
Configuration Notes:
| Configuration Key | ⚠️ Important Notes |
|---|---|
| algorithmName | Must use page_rank (underscore format), matches the algorithm’s name() method return value |
| bsp.etcd_endpoints | System-managed in K8s - automatically set by operator, do not override in computerConf |
| algorithm.params_class | Required - must specify for all algorithms |
| REMOTE_JAR_URI | Optional environment variable to download custom algorithm JAR from remote URL |
| snapshot.* | Optional - enable snapshots for checkpoint recovery or repeated computations |
kubectl get hcjob/pagerank-sample -n hugegraph-computer-operator-system
NAME JOBID JOBSTATUS
pagerank-sample pagerank-sample RUNNING
# Show the master log
kubectl logs -l component=pagerank-sample-master -n hugegraph-computer-operator-system
# Show the worker log
kubectl logs -l component=pagerank-sample-worker -n hugegraph-computer-operator-system
# Show diagnostic log of a job
# NOTE: the diagnostic log exists only when the job fails, and it will only be saved for one hour.
kubectl get event --field-selector reason=ComputerJobFailed --field-selector involvedObject.name=pagerank-sample -n hugegraph-computer-operator-system
NOTE: it will only be saved for one hour
kubectl get event --field-selector reason=ComputerJobSucceed --field-selector involvedObject.name=pagerank-sample -n hugegraph-computer-operator-system
If the results are output to HugeGraph-Server, query them in the same way as in local mode; if they are output to HDFS, check the result files under the /hugegraph-computer/results/{jobId} directory.
Understanding the differences helps you choose the right deployment mode for your use case.
| Feature | Local Mode | Kubernetes Mode |
|---|---|---|
| Configuration | conf/computer.properties file | CRD YAML computerConf field |
| Etcd Management | Manual deployment of external etcd | Operator auto-deploys etcd StatefulSet |
| Worker Scaling | Manual start of multiple processes | CRD workerInstances field auto-scales |
| Resource Isolation | Shared host resources | Pod-level CPU/Memory limits |
| Remote JAR | JAR_FILE_PATH environment variable | CRD remoteJarUri or envVars.REMOTE_JAR_URI |
| Log Viewing | Local logs/ directory | kubectl logs command |
| Fault Recovery | Manual process restart | K8s auto-restarts failed pods |
| Use Cases | Development, testing, small datasets | Production, large-scale data |
Local Mode Prerequisites:
K8s Mode Prerequisites:
Configuration Key Differences:
# Local Mode (computer.properties)
bsp.etcd_endpoints=http://localhost:2379 # ✅ User-configured
job.workers_count=4 # User-configured
# K8s Mode (CRD)
spec:
workerInstances: 5 # Overrides job.workers_count
computerConf:
# bsp.etcd_endpoints is auto-set by operator, do NOT configure
job.partitions_count: "20"
Error: “Failed to connect to etcd”
Symptoms: Master or Worker cannot connect to etcd
Local Mode Solutions:
# Check configuration key name (common mistake)
grep "bsp.etcd_endpoints" conf/computer.properties
# Should output: bsp.etcd_endpoints=http://localhost:2379
# ❌ WRONG: bsp.etcd.url (old/incorrect key)
# ✅ CORRECT: bsp.etcd_endpoints
# Test etcd connectivity
curl http://localhost:2379/version
K8s Mode Solutions:
# Check Operator etcd service
kubectl get svc hugegraph-computer-operator-etcd -n hugegraph-computer-operator-system
# Verify etcd pod is running
kubectl get pods -n hugegraph-computer-operator-system -l app=hugegraph-computer-operator-etcd
# Should show: Running status
# Test connectivity from worker pod
kubectl exec -it pagerank-sample-worker-0 -n hugegraph-computer-operator-system -- \
curl http://hugegraph-computer-operator-etcd:2379/version
Error: “Algorithm class not found”
Symptoms: Cannot find algorithm implementation class
Cause: Incorrect algorithmName format
# ❌ WRONG formats:
algorithmName: pageRank # Camel case
algorithmName: PageRank # Title case
# ✅ CORRECT format (matches PageRank.name() return value):
algorithmName: page_rank # Underscore lowercase
Verification:
# Check algorithm implementation in source code
# File: computer-algorithm/.../PageRank.java
# Method: public String name() { return "page_rank"; }
Error: “Required option ‘algorithm.params_class’ is missing”
Solution:
computerConf:
algorithm.params_class: org.apache.hugegraph.computer.algorithm.centrality.pagerank.PageRankParams # ⚠️ Required
Issue: REMOTE_JAR_URI not working
Solution:
spec:
envVars:
- name: REMOTE_JAR_URI
value: "http://example.com/my-algorithm.jar"
Issue: Etcd connection timeout in K8s
Check Operator etcd:
# Verify etcd is running
kubectl get pods -n hugegraph-computer-operator-system -l app=hugegraph-computer-operator-etcd
# Should show: Running
# From worker pod, test etcd connectivity
kubectl exec -it pagerank-sample-worker-0 -n hugegraph-computer-operator-system -- \
curl http://hugegraph-computer-operator-etcd:2379/version
Issue: Snapshot/MinIO configuration problems
Verify MinIO service:
# Test MinIO reachability
kubectl run -it --rm debug --image=alpine --restart=Never -- sh
wget -O- http://minio:9000/minio/health/live
# Test bucket permissions (requires MinIO client)
mc config host add myminio http://minio:9000 minioadmin minioadmin
mc ls myminio/hugegraph-snapshots
Check job overall status:
kubectl get hcjob pagerank-sample -n hugegraph-computer-operator-system
# Output example:
# NAME JOBSTATUS SUPERSTEP MAXSUPERSTEP SUPERSTEPSTAT
# pagerank-sample Running 5 20 COMPUTING
Check detailed events:
kubectl describe hcjob pagerank-sample -n hugegraph-computer-operator-system
Check failure reasons:
kubectl get events --field-selector reason=ComputerJobFailed \
--field-selector involvedObject.name=pagerank-sample \
-n hugegraph-computer-operator-system
Real-time master logs:
kubectl logs -f -l component=pagerank-sample-master -n hugegraph-computer-operator-system
All worker logs:
kubectl logs -l component=pagerank-sample-worker -n hugegraph-computer-operator-system --all-containers=true
More algorithms please see: Built-In algorithms
TODO
TODO
Note: run mvn compile in advance to generate the corresponding classes.
Default Value Notes:
- Configuration items listed below show the code default values (defined in ComputerOptions.java)
- When the packaged configuration file (conf/computer.properties in the distribution) specifies a different value, it is noted as: value (packaged: value)
- Example: 300000 (packaged: 100000) means the code default is 300000, but the distributed package defaults to 100000
- For production deployments, the packaged defaults take precedence unless you explicitly override them
Core job settings for HugeGraph-Computer.
| config option | default value | description |
|---|---|---|
| hugegraph.url | http://127.0.0.1:8080 | The HugeGraph server URL to load data and write results back. |
| hugegraph.name | hugegraph | The graph name to load data and write results back. |
| hugegraph.username | "" (empty) | The username for HugeGraph authentication (leave empty if authentication is disabled). |
| hugegraph.password | "" (empty) | The password for HugeGraph authentication (leave empty if authentication is disabled). |
| job.id | local_0001 (packaged: local_001) | The job identifier on YARN cluster or K8s cluster. |
| job.namespace | "" (empty) | The job namespace that can separate different data sources. 🔒 Managed by system - do not modify manually. |
| job.workers_count | 1 | The number of workers for computing one graph algorithm job. 🔒 Managed by system - do not modify manually in K8s. |
| job.partitions_count | 1 | The number of partitions for computing one graph algorithm job. |
| job.partitions_thread_nums | 4 | The number of threads for partition parallel compute. |
Algorithm-specific configuration for computation logic.
| config option | default value | description |
|---|---|---|
| algorithm.params_class | org.apache.hugegraph.computer.core.config.Null | ⚠️ REQUIRED The class used to transfer algorithm parameters before the algorithm is run. |
| algorithm.result_class | org.apache.hugegraph.computer.core.config.Null | The class of vertex’s value, used to store the computation result for the vertex. |
| algorithm.message_class | org.apache.hugegraph.computer.core.config.Null | The class of message passed when computing a vertex. |
Configuration for loading input data from HugeGraph or other sources.
| config option | default value | description |
|---|---|---|
| input.source_type | hugegraph-server | The source type to load input data, allowed values: [‘hugegraph-server’, ‘hugegraph-loader’]. The ‘hugegraph-loader’ means use hugegraph-loader to load data from HDFS or file. If using ‘hugegraph-loader’, please configure ‘input.loader_struct_path’ and ‘input.loader_schema_path’. |
| input.loader_struct_path | "" (empty) | The struct path of loader input, only takes effect when input.source_type=loader is enabled. |
| input.loader_schema_path | "" (empty) | The schema path of loader input, only takes effect when input.source_type=loader is enabled. |
| config option | default value | description |
|---|---|---|
| input.split_size | 1048576 (1 MB) | The input split size in bytes. |
| input.split_max_splits | 10000000 | The maximum number of input splits. |
| input.split_page_size | 500 | The page size for streamed load input split data. |
| input.split_fetch_timeout | 300 | The timeout in seconds to fetch input splits. |
| config option | default value | description |
|---|---|---|
| input.filter_class | org.apache.hugegraph.computer.core.input.filter.DefaultInputFilter | The class to create input-filter object. Input-filter is used to filter vertex edges according to user needs. |
| input.edge_direction | OUT | The direction of edges to load, allowed values: [OUT, IN, BOTH]. When the value is BOTH, edges in both OUT and IN directions will be loaded. |
| input.edge_freq | MULTIPLE | The frequency of edges that can exist between a pair of vertices, allowed values: [SINGLE, SINGLE_PER_LABEL, MULTIPLE]. SINGLE means only one edge can exist between a pair of vertices (identified by sourceId + targetId); SINGLE_PER_LABEL means each edge label can have one edge between a pair of vertices (identified by sourceId + edgeLabel + targetId); MULTIPLE means many edges can exist between a pair of vertices (identified by sourceId + edgeLabel + sortValues + targetId). |
| input.max_edges_in_one_vertex | 200 | The maximum number of adjacent edges allowed to be attached to a vertex. The adjacent edges will be stored and transferred together as a batch unit. |
| config option | default value | description |
|---|---|---|
| input.send_thread_nums | 4 | The number of threads for parallel sending of vertices or edges. |
HugeGraph-Computer supports snapshot functionality to save vertex/edge partitions to local storage or MinIO object storage, enabling checkpoint recovery or accelerating repeated computations.
| config option | default value | description |
|---|---|---|
| snapshot.write | false | Whether to write snapshots of input vertex/edge partitions. |
| snapshot.load | false | Whether to load from snapshots of vertex/edge partitions. |
| snapshot.name | "" (empty) | User-defined snapshot name to distinguish different snapshots. |
MinIO can be used as a distributed object storage backend for snapshots in K8s deployments.
| config option | default value | description |
|---|---|---|
| snapshot.minio_endpoint | "" (empty) | MinIO service endpoint (e.g., http://minio:9000). Required when using MinIO. |
| snapshot.minio_access_key | minioadmin | MinIO access key for authentication. |
| snapshot.minio_secret_key | minioadmin | MinIO secret key for authentication. |
| snapshot.minio_bucket_name | "" (empty) | MinIO bucket name for storing snapshot data. |
Usage Scenarios:
Example: Local Snapshot (in computer.properties):
snapshot.write=true
snapshot.name=pagerank-snapshot-20260201
Example: MinIO Snapshot (in K8s CRD computerConf):
computerConf:
snapshot.write: "true"
snapshot.name: "pagerank-snapshot-v1"
snapshot.minio_endpoint: "http://minio:9000"
snapshot.minio_access_key: "my-access-key"
snapshot.minio_secret_key: "my-secret-key"
snapshot.minio_bucket_name: "hugegraph-snapshots"
Configuration for worker and master computation logic.
| config option | default value | description |
|---|---|---|
| master.computation_class | org.apache.hugegraph.computer.core.master.DefaultMasterComputation | Master-computation is computation that can determine whether to continue to the next superstep. It runs at the end of each superstep on the master. |
| config option | default value | description |
|---|---|---|
| worker.computation_class | org.apache.hugegraph.computer.core.config.Null | The class to create worker-computation object. Worker-computation is used to compute each vertex in each superstep. |
| worker.combiner_class | org.apache.hugegraph.computer.core.config.Null | Combiner can combine messages into one value for a vertex. For example, PageRank algorithm can combine messages of a vertex to a sum value. |
| worker.partitioner | org.apache.hugegraph.computer.core.graph.partition.HashPartitioner | The partitioner that decides which partition a vertex should be in, and which worker a partition should be in. |
| config option | default value | description |
|---|---|---|
| worker.vertex_properties_combiner_class | org.apache.hugegraph.computer.core.combiner.OverwritePropertiesCombiner | The combiner can combine several properties of the same vertex into one properties at input step. |
| worker.edge_properties_combiner_class | org.apache.hugegraph.computer.core.combiner.OverwritePropertiesCombiner | The combiner can combine several properties of the same edge into one properties at input step. |
| config option | default value | description |
|---|---|---|
| worker.received_buffers_bytes_limit | 104857600 (100 MB) | The limit bytes of buffers of received data. The total size of all buffers can’t exceed this limit. If received buffers reach this limit, they will be merged into a file (spill to disk). |
| worker.write_buffer_capacity | 52428800 (50 MB) | The initial size of write buffer that used to store vertex or message. |
| worker.write_buffer_threshold | 52428800 (50 MB) | The threshold of write buffer. Exceeding it will trigger sorting. The write buffer is used to store vertex or message. |
| config option | default value | description |
|---|---|---|
| worker.data_dirs | [jobs] | The directories separated by ‘,’ that received vertices and messages can persist into. |
| worker.wait_sort_timeout | 600000 (10 minutes) | The max timeout (in ms) for message-handler to wait for sort-thread to sort one batch of buffers. |
| worker.wait_finish_messages_timeout | 86400000 (24 hours) | The max timeout (in ms) for message-handler to wait for finish-message of all workers. |
Configuration for output computation results.
| config option | default value | description |
|---|---|---|
| output.output_class | org.apache.hugegraph.computer.core.output.LogOutput | The class to output the computation result of each vertex. Called after iteration computation. |
| output.result_name | value | The value is assigned dynamically by #name() of instance created by WORKER_COMPUTATION_CLASS. |
| output.result_write_type | OLAP_COMMON | The result write-type to output to HugeGraph, allowed values: [OLAP_COMMON, OLAP_SECONDARY, OLAP_RANGE]. |
| config option | default value | description |
|---|---|---|
| output.with_adjacent_edges | false | Whether to output the adjacent edges of the vertex. |
| output.with_vertex_properties | false | Whether to output the properties of the vertex. |
| output.with_edge_properties | false | Whether to output the properties of the edge. |
| config option | default value | description |
|---|---|---|
| output.batch_size | 500 | The batch size of output. |
| output.batch_threads | 1 | The number of threads used for batch output. |
| output.single_threads | 1 | The number of threads used for single output. |
| config option | default value | description |
|---|---|---|
| output.hdfs_url | hdfs://127.0.0.1:9000 | The HDFS URL for output. |
| output.hdfs_user | hadoop | The HDFS user for output. |
| output.hdfs_path_prefix | /hugegraph-computer/results | The directory of HDFS output results. |
| output.hdfs_delimiter | , (comma) | The delimiter of HDFS output. |
| output.hdfs_merge_partitions | true | Whether to merge output files of multiple partitions. |
| output.hdfs_replication | 3 | The replication number of HDFS. |
| output.hdfs_core_site_path | "" (empty) | The HDFS core site path. |
| output.hdfs_site_path | "" (empty) | The HDFS site path. |
| output.hdfs_kerberos_enable | false | Whether Kerberos authentication is enabled for HDFS. |
| output.hdfs_kerberos_principal | "" (empty) | The HDFS principal for Kerberos authentication. |
| output.hdfs_kerberos_keytab | "" (empty) | The HDFS keytab file for Kerberos authentication. |
| output.hdfs_krb5_conf | /etc/krb5.conf | Kerberos configuration file path. |
| config option | default value | description |
|---|---|---|
| output.retry_times | 3 | The retry times when output fails. |
| output.retry_interval | 10 | The retry interval (in seconds) when output fails. |
| output.thread_pool_shutdown_timeout | 60 | The timeout (in seconds) of output thread pool shutdown. |
Configuration for network communication between workers and master.
| config option | default value | description |
|---|---|---|
| transport.server_host | 127.0.0.1 | 🔒 Managed by system The server hostname or IP to listen on to transfer data. Do not modify manually. |
| transport.server_port | 0 | 🔒 Managed by system The server port to listen on to transfer data. The system will assign a random port if set to 0. Do not modify manually. |
| transport.server_threads | 4 | The number of transport threads for server. |
| config option | default value | description |
|---|---|---|
| transport.client_threads | 4 | The number of transport threads for client. |
| transport.client_connect_timeout | 3000 | The timeout (in ms) of client connect to server. |
| config option | default value | description |
|---|---|---|
| transport.provider_class | org.apache.hugegraph.computer.core.network.netty.NettyTransportProvider | The transport provider, currently only supports Netty. |
| transport.io_mode | AUTO | The network IO mode, allowed values: [NIO, EPOLL, AUTO]. AUTO means selecting the appropriate mode automatically. |
| transport.tcp_keep_alive | true | Whether to enable TCP keep-alive. |
| transport.transport_epoll_lt | false | Whether to enable EPOLL level-trigger (only effective when io_mode=EPOLL). |
| config option | default value | description |
|---|---|---|
| transport.send_buffer_size | 0 | The size of socket send-buffer in bytes. 0 means using system default value. |
| transport.receive_buffer_size | 0 | The size of socket receive-buffer in bytes. 0 means using system default value. |
| transport.write_buffer_high_mark | 67108864 (64 MB) | The high water mark for write buffer in bytes. It will trigger sending unavailable if the number of queued bytes > write_buffer_high_mark. |
| transport.write_buffer_low_mark | 33554432 (32 MB) | The low water mark for write buffer in bytes. It will trigger sending available if the number of queued bytes < write_buffer_low_mark. |
| config option | default value | description |
|---|---|---|
| transport.max_pending_requests | 8 | The max number of client unreceived ACKs. It will trigger sending unavailable if the number of unreceived ACKs >= max_pending_requests. |
| transport.min_pending_requests | 6 | The minimum number of client unreceived ACKs. It will trigger sending available if the number of unreceived ACKs < min_pending_requests. |
| transport.min_ack_interval | 200 | The minimum interval (in ms) of server reply ACK. |
| config option | default value | description |
|---|---|---|
| transport.close_timeout | 10000 | The timeout (in ms) of close server or close client. |
| transport.sync_request_timeout | 10000 | The timeout (in ms) to wait for response after sending sync-request. |
| transport.finish_session_timeout | 0 | The timeout (in ms) to finish session. 0 means using (transport.sync_request_timeout × transport.max_pending_requests). |
| transport.write_socket_timeout | 3000 | The timeout (in ms) to write data to socket buffer. |
| transport.server_idle_timeout | 360000 (6 minutes) | The max timeout (in ms) of server idle. |
| config option | default value | description |
|---|---|---|
| transport.heartbeat_interval | 20000 (20 seconds) | The minimum interval (in ms) between heartbeats on client side. |
| transport.max_timeout_heartbeat_count | 120 | The maximum times of timeout heartbeat on client side. If the number of timeouts waiting for heartbeat response continuously > max_timeout_heartbeat_count, the channel will be closed from client side. |
| config option | default value | description |
|---|---|---|
| transport.max_syn_backlog | 511 | The capacity of SYN queue on server side. 0 means using system default value. |
| transport.recv_file_mode | true | Whether to enable receive buffer-file mode. It will receive buffer and write to file from socket using zero-copy if enabled. Note: Requires OS support for zero-copy (e.g., Linux sendfile/splice). |
| transport.network_retries | 3 | The number of retry attempts for network communication if network is unstable. |
Configuration for HGKV (HugeGraph Key-Value) storage engine and value files.
| config option | default value | description |
|---|---|---|
| hgkv.max_file_size | 2147483648 (2 GB) | The max number of bytes in each HGKV file. |
| hgkv.max_data_block_size | 65536 (64 KB) | The max byte size of HGKV file data block. |
| hgkv.max_merge_files | 10 | The max number of files to merge at one time. |
| hgkv.temp_file_dir | /tmp/hgkv | This folder is used to store temporary files during the file merging process. |
| config option | default value | description |
|---|---|---|
| valuefile.max_segment_size | 1073741824 (1 GB) | The max number of bytes in each segment of value-file. |
Configuration for Bulk Synchronous Parallel (BSP) protocol and etcd coordination.
| config option | default value | description |
|---|---|---|
| bsp.etcd_endpoints | http://localhost:2379 | 🔒 Managed by system in K8s The endpoints to access etcd. For multiple endpoints, use comma-separated list: http://host1:port1,http://host2:port2. Do not modify manually in K8s deployments. |
| bsp.max_super_step | 10 (packaged: 2) | The max super step of the algorithm. |
| bsp.register_timeout | 300000 (packaged: 100000) | The max timeout (in ms) to wait for master and workers to register. |
| bsp.wait_workers_timeout | 86400000 (24 hours) | The max timeout (in ms) to wait for workers BSP event. |
| bsp.wait_master_timeout | 86400000 (24 hours) | The max timeout (in ms) to wait for master BSP event. |
| bsp.log_interval | 30000 (30 seconds) | The log interval (in ms) to print the log while waiting for BSP event. |
Configuration for performance optimization.
| config option | default value | description |
|---|---|---|
| allocator.max_vertices_per_thread | 10000 | Maximum number of vertices per thread processed in each memory allocator. |
| sort.thread_nums | 4 | The number of threads performing internal sorting. |
⚠️ Configuration items managed by the system - users are prohibited from modifying these manually.
The following configuration items are automatically managed by the K8s Operator, Driver, or runtime system. Manual modification will cause cluster communication failures or job scheduling errors.
| config option | managed by | description |
|---|---|---|
| bsp.etcd_endpoints | K8s Operator | Automatically set to operator’s etcd service address |
| transport.server_host | Runtime | Automatically set to pod/container hostname |
| transport.server_port | Runtime | Automatically assigned random port |
| job.namespace | K8s Operator | Automatically set to job namespace |
| job.id | K8s Operator | Automatically set to job ID from CRD |
| job.workers_count | K8s Operator | Automatically set from CRD workerInstances |
| rpc.server_host | Runtime | RPC server hostname (system-managed) |
| rpc.server_port | Runtime | RPC server port (system-managed) |
| rpc.remote_url | Runtime | RPC remote URL (system-managed) |
Why These Are Forbidden:
NOTE: These options need to be set via environment variables, e.g. k8s.internal_etcd_url => INTERNAL_ETCD_URL
| config option | default value | description |
|---|---|---|
| k8s.auto_destroy_pod | true | Whether to automatically destroy all pods when the job is completed or failed. |
| k8s.close_reconciler_timeout | 120 | The max timeout (in ms) to close reconciler. |
| k8s.internal_etcd_url | http://127.0.0.1:2379 | The internal etcd URL for operator system. |
| k8s.max_reconcile_retry | 3 | The max retry times of reconcile. |
| k8s.probe_backlog | 50 | The maximum backlog for serving health probes. |
| k8s.probe_port | 9892 | The port that the controller binds to for serving health probes. |
| k8s.ready_check_internal | 1000 | The time interval (ms) of check ready. |
| k8s.ready_timeout | 30000 | The max timeout (in ms) of check ready. |
| k8s.reconciler_count | 10 | The max number of reconciler threads. |
| k8s.resync_period | 600000 | The minimum frequency at which watched resources are reconciled. |
| k8s.timezone | Asia/Shanghai | The timezone of computer job and operator. |
| k8s.watch_namespace | hugegraph-computer-system | The namespace to watch custom resources in. Use ‘*’ to watch all namespaces. |
| spec | default value | description | required |
|---|---|---|---|
| algorithmName | | The name of algorithm. | true |
| jobId | | The job id. | true |
| image | | The image of algorithm. | true |
| computerConf | | The map of computer config options. | true |
| workerInstances | | The number of worker instances, it will override the ‘job.workers_count’ option. | true |
| pullPolicy | Always | The pull-policy of image, detail please refer to: https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy | false |
| pullSecrets | | The pull-secrets of Image, detail please refer to: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod | false |
| masterCpu | | The cpu limit of master, the unit can be 'm' or without unit, detail please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu | false |
| workerCpu | | The cpu limit of worker, the unit can be 'm' or without unit, detail please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu | false |
| masterMemory | | The memory limit of master, the unit can be one of Ei, Pi, Ti, Gi, Mi, Ki, detail please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory | false |
| workerMemory | | The memory limit of worker, the unit can be one of Ei, Pi, Ti, Gi, Mi, Ki, detail please refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory | false |
| log4jXml | | The content of log4j.xml for computer job. | false |
| jarFile | | The jar path of computer algorithm. | false |
| remoteJarUri | | The remote jar uri of computer algorithm, it will overlay algorithm image. | false |
| jvmOptions | | The java startup parameters of computer job. | false |
| envVars | | Please refer to: https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/ | false |
| envFrom | | Please refer to: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/ | false |
| masterCommand | bin/start-computer.sh | The run command of master, equivalent to ‘Entrypoint’ field of Docker. | false |
| masterArgs | ["-r master", "-d k8s"] | The run args of master, equivalent to ‘Cmd’ field of Docker. | false |
| workerCommand | bin/start-computer.sh | The run command of worker, equivalent to ‘Entrypoint’ field of Docker. | false |
| workerArgs | ["-r worker", "-d k8s"] | The run args of worker, equivalent to ‘Cmd’ field of Docker. | false |
| volumes | | Please refer to: https://kubernetes.io/docs/concepts/storage/volumes/ | false |
| volumeMounts | | Please refer to: https://kubernetes.io/docs/concepts/storage/volumes/ | false |
| secretPaths | | The map of k8s-secret name and mount path. | false |
| configMapPaths | | The map of k8s-configmap name and mount path. | false |
| podTemplateSpec | | Please refer to: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-template-v1/#PodTemplateSpec | false |
| securityContext | | Please refer to: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ | false |
| config option | default value | description |
|---|---|---|
| k8s.build_image_bash_path | | The path of command used to build image. |
| k8s.enable_internal_algorithm | true | Whether enable internal algorithm. |
| k8s.framework_image_url | hugegraph/hugegraph-computer:latest | The image url of computer framework. |
| k8s.image_repository_password | | The password for login image repository. |
| k8s.image_repository_registry | | The address for login image repository. |
| k8s.image_repository_url | hugegraph/hugegraph-computer | The url of image repository. |
| k8s.image_repository_username | | The username for login image repository. |
| k8s.internal_algorithm | [pageRank] | The name list of all internal algorithm. Note: Algorithm names use camelCase here (e.g., pageRank), but algorithm implementations return underscore_case (e.g., page_rank). |
| k8s.internal_algorithm_image_url | hugegraph/hugegraph-computer:latest | The image url of internal algorithm. |
| k8s.jar_file_dir | /cache/jars/ | The directory where the algorithm jar will be uploaded. |
| k8s.kube_config | ~/.kube/config | The path of k8s config file. |
| k8s.log4j_xml_path | | The log4j.xml path for computer job. |
| k8s.namespace | hugegraph-computer-system | The namespace of hugegraph-computer system. |
| k8s.pull_secret_names | [] | The names of pull-secret for pulling image. |