Config

1: HugeGraph configuration
2: HugeGraph Config Options
3: Built-in User Authentication and Authorization Configuration and Usage in HugeGraph
4: Configuring HugeGraphServer to Use HTTPS Protocol
5: HugeGraph-Computer Config

1 - HugeGraph configuration

1. Overview

The directory for the configuration files is hugegraph-release/conf, and all the configurations related to the service and the graph itself are located in this directory.

The main configuration files include gremlin-server.yaml, rest-server.properties, and hugegraph.properties.

The HugeGraphServer integrates the GremlinServer and RestServer internally, and gremlin-server.yaml and rest-server.properties are used to configure these two servers.

GremlinServer: GremlinServer accepts Gremlin statements from users, parses them, and then invokes the Core code.
RestServer: It provides a RESTful API that, based on different HTTP requests, calls the corresponding Core API. If the user’s request body is a Gremlin statement, it will be forwarded to GremlinServer to perform operations on the graph data.

Now let’s introduce these three configuration files one by one.

2. gremlin-server.yaml

The default content of the gremlin-server.yaml file is as follows:

# host and port of gremlin server, need to be consistent with host and port in rest-server.properties
#host: 127.0.0.1
#port: 8182

# timeout in ms of gremlin query
evaluationTimeout: 30000

channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
# don't set graph at here, this happens after support for dynamically adding graph
graphs: {
}
scriptEngines: {
  gremlin-groovy: {
    staticImports: [
      org.opencypher.gremlin.process.traversal.CustomPredicates.*,
      org.opencypher.gremlin.traversal.CustomFunctions.*
    ],
    plugins: {
      org.apache.hugegraph.plugin.HugeGraphGremlinPlugin: {},
      org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
      org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {
        classImports: [
          java.lang.Math,
          org.apache.hugegraph.backend.id.IdGenerator,
          org.apache.hugegraph.type.define.Directions,
          org.apache.hugegraph.type.define.NodeRole,
          org.apache.hugegraph.traversal.algorithm.CollectionPathsTraverser,
          org.apache.hugegraph.traversal.algorithm.CountTraverser,
          org.apache.hugegraph.traversal.algorithm.CustomizedCrosspointsTraverser,
          org.apache.hugegraph.traversal.algorithm.CustomizePathsTraverser,
          org.apache.hugegraph.traversal.algorithm.FusiformSimilarityTraverser,
          org.apache.hugegraph.traversal.algorithm.HugeTraverser,
          org.apache.hugegraph.traversal.algorithm.JaccardSimilarTraverser,
          org.apache.hugegraph.traversal.algorithm.KneighborTraverser,
          org.apache.hugegraph.traversal.algorithm.KoutTraverser,
          org.apache.hugegraph.traversal.algorithm.MultiNodeShortestPathTraverser,
          org.apache.hugegraph.traversal.algorithm.NeighborRankTraverser,
          org.apache.hugegraph.traversal.algorithm.PathsTraverser,
          org.apache.hugegraph.traversal.algorithm.PersonalRankTraverser,
          org.apache.hugegraph.traversal.algorithm.SameNeighborTraverser,
          org.apache.hugegraph.traversal.algorithm.ShortestPathTraverser,
          org.apache.hugegraph.traversal.algorithm.SingleSourceShortestPathTraverser,
          org.apache.hugegraph.traversal.algorithm.SubGraphTraverser,
          org.apache.hugegraph.traversal.algorithm.TemplatePathsTraverser,
          org.apache.hugegraph.traversal.algorithm.steps.EdgeStep,
          org.apache.hugegraph.traversal.algorithm.steps.RepeatEdgeStep,
          org.apache.hugegraph.traversal.algorithm.steps.WeightedEdgeStep,
          org.apache.hugegraph.traversal.optimize.ConditionP,
          org.apache.hugegraph.traversal.optimize.Text,
          org.apache.hugegraph.traversal.optimize.TraversalUtil,
          org.apache.hugegraph.util.DateUtil,
          org.opencypher.gremlin.traversal.CustomFunctions,
          org.opencypher.gremlin.traversal.CustomPredicate
        ],
        methodImports: [
          java.lang.Math#*,
          org.opencypher.gremlin.traversal.CustomPredicate#*,
          org.opencypher.gremlin.traversal.CustomFunctions#*
        ]
      },
      org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {
        files: [scripts/empty-sample.groovy]
      }
    }
  }
}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1,
      config: {
        serializeResultToString: false,
        ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
      }
  }
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0,
      config: {
        serializeResultToString: false,
        ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
      }
  }
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV2d0,
      config: {
        serializeResultToString: false,
        ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
      }
  }
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0,
      config: {
        serializeResultToString: false,
        ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
      }
  }
metrics: {
  consoleReporter: {enabled: false, interval: 180000},
  csvReporter: {enabled: false, interval: 180000, fileName: ./metrics/gremlin-server-metrics.csv},
  jmxReporter: {enabled: false},
  slf4jReporter: {enabled: false, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}
}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false
}

There are many configuration options mentioned above, but for now, let’s focus on the following options: channelizer and graphs.

graphs: This option specifies the graphs that need to be opened when the GremlinServer starts. It is a map structure where the key is the name of the graph and the value is the configuration file path for that graph.
channelizer: The GremlinServer supports two communication modes with clients: WebSocket and HTTP (default). If WebSocket is chosen, users can quickly experience the features of HugeGraph using Gremlin-Console, but it does not support importing large-scale data. It is recommended to use HTTP for communication, as all peripheral components of HugeGraph are implemented based on HTTP.

By default, the GremlinServer serves at localhost:8182. If you need to modify it, configure the host and port settings.

host: The hostname or IP address of the machine where the GremlinServer is deployed. Currently, HugeGraphServer does not support distributed deployment, and GremlinServer is not directly exposed to users.
port: The port number of the machine where the GremlinServer is deployed.

Additionally, you need to add the corresponding configuration gremlinserver.url=http://host:port in rest-server.properties.
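
The consistency requirement between the two files can be checked mechanically. The following is a minimal sketch and not part of HugeGraph: the function names are hypothetical, the YAML is read with a plain line scan that only understands top-level `host:` and `port:` keys as they appear in the default file above, and the fallback values 127.0.0.1 and 8182 are taken from the commented-out defaults.

```python
import re
from urllib.parse import urlparse


def gremlin_endpoint(yaml_text, default_host="127.0.0.1", default_port=8182):
    """Read top-level `host:` and `port:` keys from gremlin-server.yaml text.

    Commented-out or missing keys fall back to the given defaults.
    """
    host, port = default_host, default_port
    for line in yaml_text.splitlines():
        match = re.match(r"^(host|port):\s*(\S+)", line)
        if not match:
            continue
        key, value = match.groups()
        if key == "host":
            host = value
        else:
            port = int(value)
    return host, port


def rest_gremlin_url(properties_text, default="http://127.0.0.1:8182"):
    """Read `gremlinserver.url` from rest-server.properties text."""
    for line in properties_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "gremlinserver.url":
            return value.strip()
    return default


def is_consistent(yaml_text, properties_text):
    """True when gremlinserver.url points at the configured host and port."""
    host, port = gremlin_endpoint(yaml_text)
    url = urlparse(rest_gremlin_url(properties_text))
    return (url.hostname, url.port) == (host, port)
```

The comparison is textual, so `localhost` and `127.0.0.1` are treated as different hosts; use the same spelling in both files if you rely on a check like this.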

3. rest-server.properties

The default content of the rest-server.properties file is as follows:

# bind url
# could use '0.0.0.0' or specified (real)IP to expose external network access
restserver.url=http://127.0.0.1:8080
#restserver.enable_graphspaces_filter=false
# gremlin server url, need to be consistent with host and port in gremlin-server.yaml
#gremlinserver.url=http://127.0.0.1:8182

graphs=./conf/graphs

# The maximum thread ratio for batch writing, only take effect if the batch.max_write_threads is 0
batch.max_write_ratio=80
batch.max_write_threads=0

# configuration of arthas
arthas.telnet_port=8562
arthas.http_port=8561
arthas.ip=127.0.0.1
arthas.disabled_commands=jad

# authentication configs
# choose 'org.apache.hugegraph.auth.StandardAuthenticator' or
# 'org.apache.hugegraph.auth.ConfigAuthenticator'
#auth.authenticator=

# for StandardAuthenticator mode
#auth.graph_store=hugegraph
# auth client config
#auth.remote_url=127.0.0.1:8899,127.0.0.1:8898,127.0.0.1:8897

# for ConfigAuthenticator mode
#auth.admin_token=
#auth.user_tokens=[]

# TODO: Deprecated & removed later (useless from version 1.5.0)
# rpc server configs for multi graph-servers or raft-servers
#rpc.server_host=127.0.0.1
#rpc.server_port=8091
#rpc.server_timeout=30

# rpc client configs (like enable to keep cache consistency)
#rpc.remote_url=127.0.0.1:8091,127.0.0.1:8092,127.0.0.1:8093
#rpc.client_connect_timeout=20
#rpc.client_reconnect_period=10
#rpc.client_read_timeout=40
#rpc.client_retries=3
#rpc.client_load_balancer=consistentHash

# raft group initial peers
#raft.group_peers=127.0.0.1:8091,127.0.0.1:8092,127.0.0.1:8093

# lightweight load balancing (beta)
server.id=server-1
server.role=master

# slow query log
log.slow_query_threshold=1000

# jvm(in-heap) memory usage monitor, set 1 to disable it
memory_monitor.threshold=0.85
memory_monitor.period=2000

restserver.url: The URL at which the RestServer provides its services. Modify it according to the actual environment. If you cannot connect to the server from another IP address, set it to the machine’s specific IP address, or set it to http://0.0.0.0 to listen on all network interfaces. The latter is convenient, but take care over which networks can then reach the server.
graphs: The RestServer also needs to open graphs when it starts. This option is a map structure where the key is the name of the graph and the value is the configuration file path for that graph.

Note: Both gremlin-server.yaml and rest-server.properties contain the graphs configuration option, and the init-store command initializes based on the graphs specified in the graphs section of gremlin-server.yaml.

The gremlinserver.url configuration option is the URL at which the GremlinServer provides services to the RestServer. By default, it is set to http://localhost:8182. If you need to modify it, it should match the host and port settings in gremlin-server.yaml.

4. hugegraph.properties

hugegraph.properties is a per-graph configuration file: if the system has multiple graphs, there will be multiple files of this kind. It configures parameters related to graph storage and querying. The default content of the file is as follows:

# gremlin entrance to create graph
gremlin.graph=org.apache.hugegraph.HugeFactory

# cache config
#schema.cache_capacity=100000
# vertex-cache default is 1000w, 10min expired
#vertex.cache_capacity=10000000
#vertex.cache_expire=600
# edge-cache default is 100w, 10min expired
#edge.cache_capacity=1000000
#edge.cache_expire=600

# schema illegal name template
#schema.illegal_name_regex=\s+|~.*

#vertex.default_label=vertex

backend=rocksdb
serializer=binary

store=hugegraph

raft.mode=false
raft.safe_read=false
raft.use_snapshot=false
raft.endpoint=127.0.0.1:8281
raft.group_peers=127.0.0.1:8281,127.0.0.1:8282,127.0.0.1:8283
raft.path=./raft-log
raft.use_replicator_pipeline=true
raft.election_timeout=10000
raft.snapshot_interval=3600
raft.backend_threads=48
raft.read_index_threads=8
raft.queue_size=16384
raft.queue_publish_timeout=60
raft.apply_batch=1
raft.rpc_threads=80
raft.rpc_connect_timeout=5000
raft.rpc_timeout=60000

# if use 'ikanalyzer', need download jar from 'https://github.com/apache/hugegraph-doc/raw/ik_binary/dist/server/ikanalyzer-2012_u6.jar' to lib directory
search.text_analyzer=jieba
search.text_analyzer_mode=INDEX

# rocksdb backend config
#rocksdb.data_path=/path/to/disk
#rocksdb.wal_path=/path/to/disk

# cassandra backend config
cassandra.host=localhost
cassandra.port=9042
cassandra.username=
cassandra.password=
#cassandra.connect_timeout=5
#cassandra.read_timeout=20
#cassandra.keyspace.strategy=SimpleStrategy
#cassandra.keyspace.replication=3

# hbase backend config
#hbase.hosts=localhost
#hbase.port=2181
#hbase.znode_parent=/hbase
#hbase.threads_max=64

# mysql backend config
#jdbc.driver=com.mysql.jdbc.Driver
#jdbc.url=jdbc:mysql://127.0.0.1:3306
#jdbc.username=root
#jdbc.password=
#jdbc.reconnect_max_times=3
#jdbc.reconnect_interval=3
#jdbc.ssl_mode=false

# postgresql & cockroachdb backend config
#jdbc.driver=org.postgresql.Driver
#jdbc.url=jdbc:postgresql://localhost:5432/
#jdbc.username=postgres
#jdbc.password=

# palo backend config
#palo.host=127.0.0.1
#palo.poll_interval=10
#palo.temp_dir=./palo-data
#palo.file_limit_size=32

Pay attention to the following uncommented items:

gremlin.graph: The entry point for GremlinServer startup. Users should not modify this item.
backend: The backend storage used, with options including memory, cassandra, scylladb, mysql, hbase, postgresql, and rocksdb.
serializer: Mainly for internal use, used to serialize schema, vertices, and edges to the backend. The corresponding options are text, cassandra, scylladb, and binary (Note: The rocksdb backend should have a value of binary, while for other backends, the values of backend and serializer should remain consistent. For example, for the hbase backend, the value should be hbase).
store: The name of the database used for storing the graph in the backend. In Cassandra and ScyllaDB, it corresponds to the keyspace name. The value of this item is unrelated to the graph name in GremlinServer and RestServer, but for clarity, it is recommended to use the same name.
cassandra.host: This item is only meaningful when the backend is set to cassandra or scylladb. It specifies the seeds of the Cassandra/ScyllaDB cluster.
cassandra.port: This item is only meaningful when the backend is set to cassandra or scylladb. It specifies the native port of the Cassandra/ScyllaDB cluster.
rocksdb.data_path: This item is only meaningful when the backend is set to rocksdb. It specifies the data directory for RocksDB.
rocksdb.wal_path: This item is only meaningful when the backend is set to rocksdb. It specifies the log directory for RocksDB.
admin.token: A token used to retrieve server configuration information. For example: http://localhost:8080/graphs/hugegraph/conf?token=162f7848-0b6d-4faf-b557-3a0797869c55
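
The pairing rule for backend and serializer described above can be written down as a small check. This is a sketch under stated assumptions, not HugeGraph code: the function names are hypothetical, and the `memory` backend is deliberately left out because the note gives no serializer rule for it.

```python
# Backends for which the note above gives a rule. "memory" is listed as a
# backend but has no serializer rule in the text, so it is left out here.
COVERED_BACKENDS = {"cassandra", "scylladb", "mysql", "hbase", "postgresql", "rocksdb"}


def expected_serializer(backend):
    """Return the serializer value the note implies for a backend."""
    if backend not in COVERED_BACKENDS:
        raise ValueError(f"no serializer rule stated for backend {backend!r}")
    # rocksdb is the one exception: it pairs with the binary serializer.
    return "binary" if backend == "rocksdb" else backend


def check_backend_config(props):
    """Return a list of problems found in a dict of hugegraph.properties values."""
    backend = props.get("backend")
    serializer = props.get("serializer")
    if backend is None or serializer is None:
        return ["both 'backend' and 'serializer' must be set"]
    expected = expected_serializer(backend)
    if serializer != expected:
        return [f"backend={backend} expects serializer={expected}, got {serializer}"]
    return []
```

For the default file, `check_backend_config({"backend": "rocksdb", "serializer": "binary"})` returns an empty list.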

5. Multi-Graph Configuration

Our system can have multiple graphs, and the backend of each graph can be different, such as hugegraph_rocksdb and hugegraph_mysql, where hugegraph_rocksdb uses RocksDB as the backend, and hugegraph_mysql uses MySQL as a backend.

The configuration method is simple:

[Optional]: Modify rest-server.properties

You can change the directory that holds the graph configuration files through the graphs option of rest-server.properties. The default is graphs=./conf/graphs; to use another directory, adjust the option, e.g. graphs=/etc/hugegraph/graphs. The default setting looks like this:

graphs=./conf/graphs

Under the conf/graphs path, create hugegraph_mysql_backend.properties and hugegraph_rocksdb_backend.properties based on hugegraph.properties, then modify them.

The modified part of hugegraph_mysql_backend.properties is as follows:

backend=mysql
serializer=mysql

store=hugegraph_mysql

# mysql backend config
jdbc.driver=com.mysql.cj.jdbc.Driver
jdbc.url=jdbc:mysql://127.0.0.1:3306
jdbc.username=root
jdbc.password=123456
jdbc.reconnect_max_times=3
jdbc.reconnect_interval=3
jdbc.ssl_mode=false

The modified part of hugegraph_rocksdb_backend.properties is as follows:

backend=rocksdb
serializer=binary

store=hugegraph_rocksdb

Stop the server, execute init-store.sh (to create a new database for the new graph), and restart the server.

$ ./bin/stop-hugegraph.sh

$ ./bin/init-store.sh

Initializing HugeGraph Store...
2023-06-11 14:16:14 [main] [INFO] o.a.h.u.ConfigUtil - Scanning option 'graphs' directory './conf/graphs'
2023-06-11 14:16:14 [main] [INFO] o.a.h.c.InitStore - Init graph with config file: ./conf/graphs/hugegraph_rocksdb_backend.properties
...
2023-06-11 14:16:15 [main] [INFO] o.a.h.StandardHugeGraph - Graph 'hugegraph_rocksdb' has been initialized
2023-06-11 14:16:15 [main] [INFO] o.a.h.c.InitStore - Init graph with config file: ./conf/graphs/hugegraph_mysql_backend.properties
...
2023-06-11 14:16:16 [main] [INFO] o.a.h.StandardHugeGraph - Graph 'hugegraph_mysql' has been initialized
2023-06-11 14:16:16 [main] [INFO] o.a.h.StandardHugeGraph - Close graph standardhugegraph[hugegraph_rocksdb]
...
2023-06-11 14:16:16 [main] [INFO] o.a.h.HugeFactory - HugeFactory shutdown
2023-06-11 14:16:16 [hugegraph-shutdown] [INFO] o.a.h.HugeFactory - HugeGraph is shutting down
Initialization finished.

$ ./bin/start-hugegraph.sh

Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)...OK
Started [pid 21614]

Check out created graphs:

curl http://127.0.0.1:8080/graphs/

{"graphs":["hugegraph_rocksdb","hugegraph_mysql"]}

Get details of the graph

curl http://127.0.0.1:8080/graphs/hugegraph_mysql_backend

{"name":"hugegraph_mysql","backend":"mysql"}

curl http://127.0.0.1:8080/graphs/hugegraph_rocksdb_backend

{"name":"hugegraph_rocksdb","backend":"rocksdb"}
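
Before running init-store.sh it can help to see which backend and store each file in the graphs directory declares. The sketch below is only an illustration with hypothetical function names; it reads simple key=value lines and does not reproduce how HugeGraph itself loads these files.

```python
from pathlib import Path


def load_properties(path):
    """Parse a simple key=value properties file, skipping comments and blanks."""
    props = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def summarize_graphs(graphs_dir):
    """Map each *.properties file name to its (backend, store) pair."""
    summary = {}
    for path in sorted(Path(graphs_dir).glob("*.properties")):
        props = load_properties(path)
        summary[path.name] = (props.get("backend"), props.get("store"))
    return summary


if __name__ == "__main__":
    # ./conf/graphs is the default value of the graphs option shown earlier.
    for name, (backend, store) in summarize_graphs("./conf/graphs").items():
        print(f"{name}: backend={backend}, store={store}")
```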

2 - HugeGraph Config Options

Gremlin Server Config Options

Corresponding configuration file gremlin-server.yaml

config option	default value	description
host	127.0.0.1	The host or ip of Gremlin Server.
port	8182	The listening port of Gremlin Server.
graphs	hugegraph: conf/hugegraph.properties	The map of graphs with name and config file path.
scriptEvaluationTimeout	30000	The timeout for gremlin script execution(millisecond).
channelizer	org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer	Indicates the protocol which the Gremlin Server provides service.
authentication	authenticator: org.apache.hugegraph.auth.StandardAuthenticator, config: {tokens: conf/rest-server.properties}	The authenticator and config(contains tokens path) of authentication mechanism.

Rest Server & API Config Options

Corresponding configuration file rest-server.properties

config option	default value	description
graphs	[hugegraph:conf/hugegraph.properties]	The map of graphs’ name and config file.
server.id	server-1	The id of rest server, used for license verification.
server.role	master	The role of nodes in the cluster, available types are [master, worker, computer]
restserver.url	http://127.0.0.1:8080	The url for listening of rest server.
ssl.keystore_file	server.keystore	The path of server keystore file used when https protocol is enabled.
ssl.keystore_password		The password of the server keystore file used when the https protocol is enabled.
restserver.max_worker_threads	2 * CPUs	The maximum worker threads of rest server.
restserver.min_free_memory	64	The minimum free memory(MB) of rest server, requests will be rejected when the available memory of system is lower than this value.
restserver.request_timeout	30	The time in seconds within which a request must complete, -1 means no timeout.
restserver.connection_idle_timeout	30	The time in seconds to keep an inactive connection alive, -1 means no timeout.
restserver.connection_max_requests	256	The max number of HTTP requests allowed to be processed on one keep-alive connection, -1 means unlimited.
gremlinserver.url	http://127.0.0.1:8182	The url of gremlin server.
gremlinserver.max_route	8	The max route number for gremlin server.
gremlinserver.timeout	30	The timeout in seconds of waiting for gremlin server.
batch.max_edges_per_batch	500	The maximum number of edges submitted per batch.
batch.max_vertices_per_batch	500	The maximum number of vertices submitted per batch.
batch.max_write_ratio	50	The maximum thread ratio for batch writing, only takes effect if batch.max_write_threads is 0.
batch.max_write_threads	0	The maximum threads for batch writing, if the value is 0, the actual value will be set to batch.max_write_ratio * restserver.max_worker_threads.
auth.authenticator		The class path of authenticator implementation. e.g., org.apache.hugegraph.auth.StandardAuthenticator, or org.apache.hugegraph.auth.ConfigAuthenticator.
auth.admin_token	162f7848-0b6d-4faf-b557-3a0797869c55	Token for administrator operations, only for org.apache.hugegraph.auth.ConfigAuthenticator.
auth.graph_store	hugegraph	The name of graph used to store authentication information, like users, only for org.apache.hugegraph.auth.StandardAuthenticator.
auth.user_tokens	[hugegraph:9fd95c9c-711b-415b-b85f-d4df46ba5c31]	The map of user tokens with name and password, only for org.apache.hugegraph.auth.ConfigAuthenticator.
auth.audit_log_rate	1000.0	The max rate of audit log output per user, default value is 1000 records per second.
auth.cache_capacity	10240	The max cache capacity of each auth cache item.
auth.cache_expire	600	The expiration time in seconds of each auth cache item.
auth.remote_url		If the address is empty, this server provides the auth service itself; otherwise it acts as an auth client and also provides the auth service through rpc forwarding. The remote url can be set to multiple addresses, separated by ‘,’.
auth.token_expire	86400	The expiration time in seconds after the token is created.
auth.token_secret	FXQXbJtbCLxODc6tGci732pkH1cyf8Qg	Secret key of HS256 algorithm.
exception.allow_trace	false	Whether to allow exception trace stack.
memory_monitor.threshold	0.85	The threshold of JVM(in-heap) memory usage monitoring, 1 means disabling this function.
memory_monitor.period	2000	The period in ms of JVM(in-heap) memory usage monitoring.
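
As a worked example of how batch.max_write_threads is derived when it is left at 0, the sketch below applies the formula from the table. Two points are assumptions rather than statements from this page: that batch.max_write_ratio is a percentage (so it is divided by 100), and that the result is rounded down. The function name is hypothetical.

```python
def effective_write_threads(max_write_threads, max_write_ratio, max_worker_threads):
    """Effective number of batch-writing threads.

    Assumes max_write_ratio is a percentage and that the product is
    rounded down to a whole number of threads.
    """
    if max_write_threads != 0:
        # An explicit thread count takes precedence; the ratio is ignored.
        return max_write_threads
    return max_worker_threads * max_write_ratio // 100


if __name__ == "__main__":
    # 8 CPUs -> restserver.max_worker_threads defaults to 2 * 8 = 16.
    print(effective_write_threads(0, 50, 16))  # ratio 50 from the table
    print(effective_write_threads(0, 80, 16))  # ratio 80 from the sample file
```

Under these assumptions a machine with 8 CPUs gets 8 write threads at ratio 50 and 12 at ratio 80.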

Basic Config Options

Basic Config Options and Backend Config Options correspond to the configuration file {graph-name}.properties, such as hugegraph.properties.

config option	default value	description
gremlin.graph	org.apache.hugegraph.HugeFactory	Gremlin entrance to create graph.
backend	rocksdb	The data store type, available values are [memory, rocksdb, cassandra, scylladb, hbase, mysql].
serializer	binary	The serializer for backend store, available values are [text, binary, cassandra, hbase, mysql].
store	hugegraph	The database name like Cassandra Keyspace.
store.connection_detect_interval	600	The interval in seconds for detecting connections, if the idle time of a connection exceeds this value, detect it and reconnect if needed before using, value 0 means detecting every time.
store.graph	g	The graph table name, which stores vertices, edges and properties.
store.schema	m	The schema table name, which stores metadata.
store.system	s	The system table name, which stores system data.
schema.illegal_name_regex	.*\s+$|~.*	The regex that specifies the illegal format for schema names.
schema.cache_capacity	10000	The max cache size(items) of schema cache.
vertex.cache_type	l2	The type of vertex cache, allowed values are [l1, l2].
vertex.cache_capacity	10000000	The max cache size(items) of vertex cache.
vertex.cache_expire	600	The expire time in seconds of vertex cache.
vertex.check_customized_id_exist	false	Whether to check the vertices exist for those using customized id strategy.
vertex.default_label	vertex	The default vertex label.
vertex.tx_capacity	10000	The max size(items) of vertices(uncommitted) in transaction.
vertex.check_adjacent_vertex_exist	false	Whether to check the adjacent vertices of edges exist.
vertex.lazy_load_adjacent_vertex	true	Whether to lazy load adjacent vertices of edges.
vertex.part_edge_commit_size	5000	Whether to enable the mode to commit part of edges of vertex, enabled if commit size > 0, 0 means disabled.
vertex.encode_primary_key_number	true	Whether to encode number value of primary key in vertex id.
vertex.remove_left_index_at_overwrite	false	Whether to remove left index at overwrite.
edge.cache_type	l2	The type of edge cache, allowed values are [l1, l2].
edge.cache_capacity	1000000	The max cache size(items) of edge cache.
edge.cache_expire	600	The expiration time in seconds of edge cache.
edge.tx_capacity	10000	The max size(items) of edges(uncommitted) in transaction.
query.page_size	500	The size of each page when querying by paging.
query.batch_size	1000	The size of each batch when querying by batch.
query.ignore_invalid_data	true	Whether to ignore invalid data of vertex or edge.
query.index_intersect_threshold	1000	The maximum number of intermediate results to intersect indexes when querying by multiple single index properties.
query.ramtable_edges_capacity	20000000	The maximum number of edges in ramtable, include OUT and IN edges.
query.ramtable_enable	false	Whether to enable ramtable for query of adjacent edges.
query.ramtable_vertices_capacity	10000000	The maximum number of vertices in ramtable, generally the largest vertex id is used as capacity.
query.optimize_aggregate_by_index	false	Whether to optimize aggregate query(like count) by index.
oltp.concurrent_depth	10	The min depth to enable concurrent oltp algorithm.
oltp.concurrent_threads	10	Thread number to concurrently execute oltp algorithm.
oltp.collection_type	EC	The implementation type of collections used in oltp algorithm.
rate_limit.read	0	The max rate(times/s) to execute query of vertices/edges.
rate_limit.write	0	The max rate(items/s) to add/update/delete vertices/edges.
task.wait_timeout	10	Timeout in seconds for waiting for the task to complete, such as when truncating or clearing the backend.
task.input_size_limit	16777216	The job input size limit in bytes.
task.result_size_limit	16777216	The job result size limit in bytes.
task.sync_deletion	false	Whether to delete schema or expired data synchronously.
task.ttl_delete_batch	1	The batch size used to delete expired data.
computer.config	/conf/computer.yaml	The config file path of computer job.
search.text_analyzer	ikanalyzer	Choose a text analyzer for searching the vertex/edge properties, available types are [word, ansj, hanlp, smartcn, jieba, jcseg, mmseg4j, ikanalyzer]. If you use ‘ikanalyzer’, you need to download the jar from ‘https://github.com/apache/hugegraph-doc/raw/ik_binary/dist/server/ikanalyzer-2012_u6.jar’ to the lib directory.
search.text_analyzer_mode	smart	Specify the mode for the text analyzer, the available modes of each analyzer are {word: [MaximumMatching, ReverseMaximumMatching, MinimumMatching, ReverseMinimumMatching, BidirectionalMaximumMatching, BidirectionalMinimumMatching, BidirectionalMaximumMinimumMatching, FullSegmentation, MinimalWordCount, MaxNgramScore, PureEnglish], ansj: [BaseAnalysis, IndexAnalysis, ToAnalysis, NlpAnalysis], hanlp: [standard, nlp, index, nShort, shortest, speed], smartcn: [], jieba: [SEARCH, INDEX], jcseg: [Simple, Complex], mmseg4j: [Simple, Complex, MaxWord], ikanalyzer: [smart, max_word]}.
snowflake.datacenter_id	0	The datacenter id of snowflake id generator.
snowflake.force_string	false	Whether to force the snowflake long id to be a string.
snowflake.worker_id	0	The worker id of snowflake id generator.
raft.mode	false	Whether the backend storage works in raft mode.
raft.safe_read	false	Whether to use linearly consistent read.
raft.use_snapshot	false	Whether to use snapshot.
raft.endpoint	127.0.0.1:8281	The peerid of current raft node.
raft.group_peers	127.0.0.1:8281,127.0.0.1:8282,127.0.0.1:8283	The peers of current raft group.
raft.path	./raft-log	The log path of current raft node.
raft.use_replicator_pipeline	true	Whether to use the replicator pipeline. When it is turned on, multiple logs can be sent in parallel, and the next log can be sent without waiting for the ack message of the current log.
raft.election_timeout	10000	Timeout in milliseconds to launch a round of election.
raft.snapshot_interval	3600	The interval in seconds to trigger snapshot save.
raft.backend_threads	current CPU v-cores	The thread number used to apply task to backend.
raft.read_index_threads	8	The thread number used to execute reading index.
raft.apply_batch	1	The apply batch size to trigger disruptor event handler.
raft.queue_size	16384	The disruptor buffers size for jraft RaftNode, StateMachine and LogManager.
raft.queue_publish_timeout	60	The timeout in seconds when publishing an event into the disruptor.
raft.rpc_threads	80	The rpc threads for jraft RPC layer.
raft.rpc_connect_timeout	5000	The rpc connect timeout for jraft rpc.
raft.rpc_timeout	60000	The rpc timeout for jraft rpc.
raft.rpc_buf_low_water_mark	10485760	The low water mark of netty’s ChannelOutboundBuffer. When the buffer size falls below this value, ChannelOutboundBuffer.isWritable() returns true, which indicates low downstream pressure or a good network.
raft.rpc_buf_high_water_mark	20971520	The high water mark of netty’s ChannelOutboundBuffer. Only when the buffer size exceeds this value does ChannelOutboundBuffer.isWritable() return false, which indicates that the downstream pressure is too great to process requests or that the network is heavily congested; the upstream needs to limit its rate at this time.
raft.read_strategy	ReadOnlyLeaseBased	The linearizability of read strategy.
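
The analyzer and mode combinations listed for search.text_analyzer_mode can be turned into a lookup that catches mismatched pairs before a server restart. This is a sketch only: the dictionary is copied from the table row above, the function name is hypothetical, and whether HugeGraph compares mode names case-sensitively is not stated here, so the check below is an exact string match.

```python
# Analyzer -> listed modes, copied from the search.text_analyzer_mode row.
ANALYZER_MODES = {
    "word": [
        "MaximumMatching", "ReverseMaximumMatching", "MinimumMatching",
        "ReverseMinimumMatching", "BidirectionalMaximumMatching",
        "BidirectionalMinimumMatching", "BidirectionalMaximumMinimumMatching",
        "FullSegmentation", "MinimalWordCount", "MaxNgramScore", "PureEnglish",
    ],
    "ansj": ["BaseAnalysis", "IndexAnalysis", "ToAnalysis", "NlpAnalysis"],
    "hanlp": ["standard", "nlp", "index", "nShort", "shortest", "speed"],
    "smartcn": [],  # the table lists no modes for smartcn
    "jieba": ["SEARCH", "INDEX"],
    "jcseg": ["Simple", "Complex"],
    "mmseg4j": ["Simple", "Complex", "MaxWord"],
    "ikanalyzer": ["smart", "max_word"],
}


def is_listed_mode(analyzer, mode):
    """True when `mode` appears in the table's list for `analyzer`."""
    return mode in ANALYZER_MODES.get(analyzer, [])
```

Both the table defaults (ikanalyzer with smart) and the values in the sample hugegraph.properties (jieba with INDEX) pass this check, while a mixed pair such as jieba with smart does not.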

RPC server Config Options

config option	default value	description
rpc.client_connect_timeout	20	The timeout(in seconds) of rpc client connect to rpc server.
rpc.client_load_balancer	consistentHash	The rpc client uses a load-balancing algorithm to access multiple rpc servers in one cluster. Default value is ‘consistentHash’, means forwarding by request parameters.
rpc.client_read_timeout	40	The timeout(in seconds) of rpc client read from rpc server.
rpc.client_reconnect_period	10	The period(in seconds) of rpc client reconnect to rpc server.
rpc.client_retries	3	Failed retry number of rpc client calls to rpc server.
rpc.config_order	999	Sofa rpc configuration file loading order, the larger the value, the later the file is loaded.
rpc.logger_impl	com.alipay.sofa.rpc.log.SLF4JLoggerImpl	Sofa rpc log implementation class.
rpc.protocol	bolt	Rpc communication protocol, client and server must be set to the same value.
rpc.remote_url		The remote urls of rpc peers, it can be set to multiple addresses, separated by ‘,’, empty value means not enabled.
rpc.server_adaptive_port	false	Whether the bound port is adaptive, if it’s enabled, when the port is in use, automatically +1 to detect the next available port. Note that this process is not atomic, so there may still be port conflicts.
rpc.server_host		The hosts/ips bound by rpc server to provide services, empty value means not enabled.
rpc.server_port	8090	The port bound by rpc server to provide services.
rpc.server_timeout	30	The timeout(in seconds) of rpc server execution.
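
The behaviour described for rpc.server_adaptive_port (try the configured port, then +1 until one is free) and the reason it is not atomic can be illustrated with plain sockets. This is a generic sketch, not the Sofa rpc implementation; the function name and the `max_tries` limit are hypothetical.

```python
import socket


def next_free_port(start_port, host="127.0.0.1", max_tries=100):
    """Probe start_port, start_port + 1, ... and return the first port that binds."""
    last_port = min(start_port + max_tries, 65536)
    for port in range(start_port, last_port):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is in use or not permitted; try the next one
            # The probe socket is closed when the `with` block exits, so
            # another process may grab this port before the caller binds it.
            return port
    raise RuntimeError(f"no free port found starting at {start_port}")
```

The gap between the probe closing and the real server binding is exactly the window in which a port conflict can still occur.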

Cassandra Backend Config Options

config option	default value	description
backend		Must be set to `cassandra`.
serializer		Must be set to `cassandra`.
cassandra.host	localhost	The seeds hostname or ip address of cassandra cluster.
cassandra.port	9042	The seeds port address of cassandra cluster.
cassandra.connect_timeout	5	The cassandra driver connect server timeout(seconds).
cassandra.read_timeout	20	The cassandra driver read from server timeout(seconds).
cassandra.keyspace.strategy	SimpleStrategy	The replication strategy of keyspace, valid value is SimpleStrategy or NetworkTopologyStrategy.
cassandra.keyspace.replication	[3]	The keyspace replication factor of SimpleStrategy, like ‘[3]’, or the replicas in each datacenter of NetworkTopologyStrategy, like ‘[dc1:2,dc2:1]’.
cassandra.username		The username to use to login to cassandra cluster.
cassandra.password		The password corresponding to cassandra.username.
cassandra.compression_type	none	The compression algorithm of cassandra transport: none/snappy/lz4.
cassandra.jmx_port	7199	The port of JMX API service for cassandra.
cassandra.aggregation_timeout	43200	The timeout in seconds of waiting for aggregation.
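
The two value shapes accepted by cassandra.keyspace.replication can be made concrete with a small parser. This is a sketch with a hypothetical function name; it does not show how HugeGraph parses the option. The returned dictionary uses the key names of a CQL replication map (`class`, `replication_factor`, or one entry per datacenter) purely for illustration.

```python
def parse_replication(strategy, value):
    """Parse a value such as '[3]' or '[dc1:2,dc2:1]' for the given strategy."""
    inner = value.strip().strip("[]")
    items = [item.strip() for item in inner.split(",") if item.strip()]
    if strategy == "SimpleStrategy":
        # A single replication factor, e.g. '[3]'.
        if len(items) != 1 or not items[0].isdigit():
            raise ValueError(f"SimpleStrategy expects one number, got {value!r}")
        return {"class": strategy, "replication_factor": int(items[0])}
    if strategy == "NetworkTopologyStrategy":
        # One 'datacenter:replicas' pair per entry, e.g. '[dc1:2,dc2:1]'.
        if not items:
            raise ValueError("NetworkTopologyStrategy expects at least one datacenter")
        result = {"class": strategy}
        for item in items:
            datacenter, sep, replicas = item.partition(":")
            if not sep or not replicas.strip().isdigit():
                raise ValueError(f"expected 'datacenter:replicas', got {item!r}")
            result[datacenter.strip()] = int(replicas.strip())
        return result
    raise ValueError(f"unknown strategy {strategy!r}")
```

For example, `parse_replication("NetworkTopologyStrategy", "[dc1:2,dc2:1]")` yields two replicas in dc1 and one in dc2.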

ScyllaDB Backend Config Options

config option	default value	description
backend		Must be set to `scylladb`.
serializer		Must be set to `scylladb`.

Other options are consistent with the Cassandra backend.

RocksDB Backend Config Options

config option	default value	description
backend		Must be set to `rocksdb`.
serializer		Must be set to `binary`.
rocksdb.data_disks	[]	The optimized disks for storing data of RocksDB. The format of each element: `STORE/TABLE: /path/disk`. Allowed keys are [g/vertex, g/edge_out, g/edge_in, g/vertex_label_index, g/edge_label_index, g/range_int_index, g/range_float_index, g/range_long_index, g/range_double_index, g/secondary_index, g/search_index, g/shard_index, g/unique_index, g/olap].
rocksdb.data_path	rocksdb-data/data	The path for storing data of RocksDB.
rocksdb.wal_path	rocksdb-data/wal	The path for storing WAL of RocksDB.
rocksdb.allow_mmap_reads	false	Allow the OS to mmap file for reading sst tables.
rocksdb.allow_mmap_writes	false	Allow the OS to mmap file for writing.
rocksdb.block_cache_capacity	8388608	The amount of block cache in bytes that will be used by RocksDB, 0 means no block cache.
rocksdb.bloom_filter_bits_per_key	-1	The bits per key in bloom filter, a good value is 10, which yields a filter with ~ 1% false positive rate, -1 means no bloom filter.
rocksdb.bloom_filter_block_based_mode	false	Use block based filter rather than full filter.
rocksdb.bloom_filter_whole_key_filtering	true	True if place whole keys in the bloom filter, else place the prefix of keys.
rocksdb.bottommost_compression	NO_COMPRESSION	The compression algorithm for the bottommost level of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd.
rocksdb.bulkload_mode	false	Switch to the mode to bulk load data into RocksDB.
rocksdb.cache_index_and_filter_blocks	false	Indicating if we’d put index/filter blocks to the block cache.
rocksdb.compaction_style	LEVEL	Set compaction style for RocksDB: LEVEL/UNIVERSAL/FIFO.
rocksdb.compression	SNAPPY_COMPRESSION	The compression algorithm for compressing blocks of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd.
rocksdb.compression_per_level	[NO_COMPRESSION, NO_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION]	The compression algorithms for different levels of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd.
rocksdb.delayed_write_rate	16777216	The rate limit in bytes/s of user write requests when need to slow down if the compaction gets behind.
rocksdb.log_level	INFO	The info log level of RocksDB.
rocksdb.max_background_jobs	8	Maximum number of concurrent background jobs, including flushes and compactions.
rocksdb.level_compaction_dynamic_level_bytes	false	Whether to enable level_compaction_dynamic_level_bytes. If enabled, max_bytes_for_level_multiplier takes priority over max_bytes_for_level_base and the size of the base level is dynamic, which gives a more predictable LSM tree and helps limit worst-case space amplification. Turning this feature on or off for an existing DB can cause an unexpected LSM tree structure, so it’s not recommended.
rocksdb.max_bytes_for_level_base	536870912	The upper-bound of the total size of level-1 files in bytes.
rocksdb.max_bytes_for_level_multiplier	10.0	The ratio between the total size of level (L+1) files and the total size of level L files for all L.
rocksdb.max_open_files	-1	The maximum number of open files that can be cached by RocksDB, -1 means no limit.
rocksdb.max_subcompactions	4	The value represents the maximum number of threads per compaction job.
rocksdb.max_write_buffer_number	6	The maximum number of write buffers that are built up in memory.
rocksdb.max_write_buffer_number_to_maintain	0	The total maximum number of write buffers to maintain in memory.
rocksdb.min_write_buffer_number_to_merge	2	The minimum number of write buffers that will be merged together.
rocksdb.num_levels	7	Set the number of levels for this database.
rocksdb.optimize_filters_for_hits	false	This flag allows us to not store filters for the last level.
rocksdb.optimize_mode	true	Optimize for heavy workloads and big datasets.
rocksdb.pin_l0_filter_and_index_blocks_in_cache	false	Whether to pin the index/filter blocks of level-0 files in the block cache.
rocksdb.sst_path		The path for ingesting SST file into RocksDB.
rocksdb.target_file_size_base	67108864	The target file size for compaction in bytes.
rocksdb.target_file_size_multiplier	1	The size ratio between a level L file and a level (L+1) file.
rocksdb.use_direct_io_for_flush_and_compaction	false	Enable the OS to use direct read/writes in flush and compaction.
rocksdb.use_direct_reads	false	Enable the OS to use direct I/O for reading sst tables.
rocksdb.write_buffer_size	134217728	Amount of data in bytes to build up in memory.
rocksdb.max_manifest_file_size	104857600	The max size of manifest file in bytes.
rocksdb.skip_stats_update_on_db_open	false	Whether to skip statistics update when opening the database, setting this flag true allows us to not update statistics.
rocksdb.max_file_opening_threads	16	The max number of threads used to open files.
rocksdb.max_total_wal_size	0	Total size of WAL files in bytes. Once the WALs exceed this size, RocksDB starts forcing a flush of the related column families; 0 means no limit.
rocksdb.db_write_buffer_size	0	Total size of write buffers in bytes across all column families, 0 means no limit.
rocksdb.delete_obsolete_files_period	21600	The periodicity in seconds when obsolete files get deleted, 0 means always do full purge.
rocksdb.hard_pending_compaction_bytes_limit	274877906944	The hard limit to impose on pending compaction in bytes.
rocksdb.level0_file_num_compaction_trigger	2	Number of files to trigger level-0 compaction.
rocksdb.level0_slowdown_writes_trigger	20	Soft limit on number of level-0 files for slowing down writes.
rocksdb.level0_stop_writes_trigger	36	Hard limit on number of level-0 files for stopping writes.
rocksdb.soft_pending_compaction_bytes_limit	68719476736	The soft limit to impose on pending compaction in bytes.

HBase Backend Config Options

config option	default value	description
backend		Must be set to `hbase`.
serializer		Must be set to `hbase`.
hbase.hosts	localhost	The hostnames or ip addresses of HBase zookeeper, separated with commas.
hbase.port	2181	The port address of HBase zookeeper.
hbase.threads_max	64	The max threads num of hbase connections.
hbase.znode_parent	/hbase	The znode parent path of HBase zookeeper.
hbase.zk_retry	3	The recovery retry times of HBase zookeeper.
hbase.aggregation_timeout	43200	The timeout in seconds of waiting for aggregation.
hbase.kerberos_enable	false	Is Kerberos authentication enabled for HBase.
hbase.kerberos_keytab		The HBase’s key tab file for kerberos authentication.
hbase.kerberos_principal		The HBase’s principal for kerberos authentication.
hbase.krb5_conf	etc/krb5.conf	Kerberos configuration file, including KDC IP, default realm, etc.
hbase.hbase_site	/etc/hbase/conf/hbase-site.xml	The HBase’s configuration file
hbase.enable_partition	true	Is pre-split partitions enabled for HBase.
hbase.vertex_partitions	10	The number of partitions of the HBase vertex table.
hbase.edge_partitions	30	The number of partitions of the HBase edge table.

MySQL & PostgreSQL Backend Config Options

config option	default value	description
backend		Must be set to `mysql`.
serializer		Must be set to `mysql`.
jdbc.driver	com.mysql.jdbc.Driver	The JDBC driver class to connect database.
jdbc.url	jdbc:mysql://127.0.0.1:3306	The url of database in JDBC format.
jdbc.username	root	The username to login database.
jdbc.password	******	The password corresponding to jdbc.username.
jdbc.ssl_mode	false	The SSL mode of connections with database.
jdbc.reconnect_interval	3	The interval(seconds) between reconnections when the database connection fails.
jdbc.reconnect_max_times	3	The reconnect times when the database connection fails.
jdbc.storage_engine	InnoDB	The storage engine of backend store database, like InnoDB/MyISAM/RocksDB for MySQL.
jdbc.postgresql.connect_database	template1	The database used to connect when init store, drop store or check store exist.

PostgreSQL Backend Config Options

config option	default value	description
backend		Must be set to `postgresql`.
serializer		Must be set to `postgresql`.

Other options are consistent with the MySQL backend.

The driver and url of the PostgreSQL backend should be set to:
jdbc.driver=org.postgresql.Driver
jdbc.url=jdbc:postgresql://localhost:5432/

3 - Built-in User Authentication and Authorization Configuration and Usage in HugeGraph

Overview

To facilitate authentication in different user scenarios, HugeGraph currently provides a built-in StandardAuthenticator mode, which supports multi-user authentication and fine-grained access control. It adopts a 4-layer design based on “User-UserGroup-Operation-Resource” to flexibly control user roles and permissions (supports multiple GraphServers).

Some key designs of the StandardAuthenticator mode include:

During initialization, a super administrator (admin) user is created. Subsequently, other users can be created by the super administrator. Once newly created users are assigned sufficient permissions, they can create or manage more users.
It supports dynamic creation of users, user groups, and resources, as well as dynamic allocation or revocation of permissions.
Users can belong to one or multiple user groups. Each user group can have permissions to operate on any number of resources. The types of operations include read, write, delete, execute, and others.
“Resource” describes the data in the graph database, such as vertices that meet certain criteria. Each resource consists of three elements: type, label, and properties. There are 18 types in total, with the ability to combine any label and properties. The internal condition of a resource is an AND relationship, while the condition between multiple resources is an OR relationship.

Here is an example to illustrate:

// Scenario: A user only has data read permission for the Beijing area
user(name=xx) -belong-> group(name=xx) -access(read)-> target(graph=graph1, resource={label: person, city: Beijing})
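
The AND/OR rule described above can be illustrated with a small sketch. This is not HugeGraph's actual permission code: the class ResourceMatchSketch, the nested Resource class, the allowed method, and the type name "VERTEX" are all hypothetical names chosen for illustration. It only shows that every condition inside one resource must hold, while matching any one granted resource is enough.

```java
import java.util.List;
import java.util.Map;

public class ResourceMatchSketch {

    // Hypothetical stand-in for a resource: type, label and property conditions.
    static final class Resource {
        final String type;
        final String label;
        final Map<String, Object> properties;

        Resource(String type, String label, Map<String, Object> properties) {
            this.type = type;
            this.label = label;
            this.properties = properties;
        }

        // AND: type, label and every listed property must all match.
        boolean matches(String elemType, String elemLabel,
                        Map<String, Object> elemProps) {
            if (!type.equals(elemType) || !label.equals(elemLabel)) {
                return false;
            }
            for (Map.Entry<String, Object> e : properties.entrySet()) {
                if (!e.getValue().equals(elemProps.get(e.getKey()))) {
                    return false;
                }
            }
            return true;
        }
    }

    // OR: access is granted if any one of the granted resources matches.
    static boolean allowed(List<Resource> granted, String elemType,
                           String elemLabel, Map<String, Object> elemProps) {
        for (Resource resource : granted) {
            if (resource.matches(elemType, elemLabel, elemProps)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Resource> granted = List.of(
            new Resource("VERTEX", "person", Map.of("city", "Beijing")),
            new Resource("VERTEX", "software", Map.of()));

        // person in Beijing: matches the first resource
        System.out.println(allowed(granted, "VERTEX", "person",
                                   Map.of("city", "Beijing")));   // true
        // person in Shanghai: matches neither resource
        System.out.println(allowed(granted, "VERTEX", "person",
                                   Map.of("city", "Shanghai")));  // false
        // any software vertex: matches the second resource
        System.out.println(allowed(granted, "VERTEX", "software",
                                   Map.of("lang", "java")));      // true
    }
}
```

In the Beijing scenario above, the user's group would hold a single resource with label person and property city=Beijing, so only the first check would succeed.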

Configure User Authentication

By default, HugeGraph does not enable user authentication; it must be enabled by modifying the configuration files. (Note: If used in a production environment or over the internet, please use a Java 11 version and enable auth-system to avoid security risks.)

HugeGraph provides a built-in authentication mode, StandardAuthenticator, which supports multi-user authentication and fine-grained permission control. Additionally, developers can implement the HugeAuthenticator interface themselves to integrate with their existing authentication systems.

HugeGraph authentication modes adopt HTTP Basic Authentication. In simple terms, when sending an HTTP request, you need to set the Authorization header to use the Basic scheme and provide the corresponding username and password. The corresponding HTTP plaintext format is as follows (on the wire, a standard HTTP client sends the credentials as the Base64 encoding of username:password):

GET http://localhost:8080/graphs/hugegraph/schema/vertexlabels
Authorization: Basic admin xxxx
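
For clients that build the header by hand, the encoded value can be computed as in the sketch below. It follows the standard HTTP Basic scheme (Base64 of username:password) and uses only the JDK; the class name BasicAuthHeader and the method basicAuth are hypothetical helpers, not part of any HugeGraph API, and admin/xxxx are the placeholder credentials from the example above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {

    // Builds the value of the Authorization header for the HTTP Basic scheme:
    // "Basic " followed by Base64("username:password").
    static String basicAuth(String username, String password) {
        String credentials = username + ":" + password;
        String encoded = Base64.getEncoder().encodeToString(
                credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // prints: Authorization: Basic YWRtaW46eHh4eA==
        System.out.println("Authorization: " + basicAuth("admin", "xxxx"));
    }
}
```

Most HTTP clients and tools can produce this header themselves when given a username and password, so the helper is mainly useful for debugging or for minimal clients.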

Warning: Versions of HugeGraph-Server prior to 1.5.0 have a JWT-related security vulnerability in the Auth mode. Users are advised to update to a newer version or manually set the JWT token’s secretKey. It can be set in the rest-server.properties file by setting the auth.token_secret information:

# should be a 32-character string consisting of A-Z, a-z and 0-9
auth.token_secret=XXXX

You can also generate it with the following command:

RANDOM_STRING=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 32)
echo "auth.token_secret=${RANDOM_STRING}" >> rest-server.properties
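
Where a shell is not available, a secret of the same shape can be generated in Java. This is a minimal sketch that assumes only the constraint stated above (32 characters drawn from A-Z, a-z and 0-9); the class name TokenSecret and the method generate are hypothetical and not part of HugeGraph.

```java
import java.security.SecureRandom;

public class TokenSecret {

    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
            "abcdefghijklmnopqrstuvwxyz" +
            "0123456789";

    // Returns a random string of the given length using a
    // cryptographically strong random number generator.
    static String generate(int length) {
        SecureRandom random = new SecureRandom();
        StringBuilder secret = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            secret.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
        }
        return secret.toString();
    }

    public static void main(String[] args) {
        // Print a line that can be pasted into rest-server.properties
        System.out.println("auth.token_secret=" + generate(32));
    }
}
```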

StandardAuthenticator Mode

The StandardAuthenticator mode supports user authentication and permission control by storing user information in the database backend. This implementation authenticates users based on their names and passwords (encrypted) stored in the database and controls user permissions based on their roles. Below is the specific configuration process (requires service restart):

Configure the authenticator and its rest-server file path in the gremlin-server.yaml configuration file:

authentication: {
  authenticator: org.apache.hugegraph.auth.StandardAuthenticator,
  authenticationHandler: org.apache.hugegraph.auth.WsAndHttpBasicAuthHandler,
  config: {tokens: conf/rest-server.properties}
}

Configure the authenticator and graph_store information in the rest-server.properties configuration file:

auth.authenticator=org.apache.hugegraph.auth.StandardAuthenticator
auth.graph_store=hugegraph

# Auth Client Config
# If GraphServer and AuthServer are deployed separately, you also need to specify the following configuration. Fill in the IP:RPC port of AuthServer.
# auth.remote_url=127.0.0.1:8899,127.0.0.1:8898,127.0.0.1:8897

In the above configuration, the graph_store option specifies which graph to use for storing user information. If there are multiple graphs, you can choose any of them.

In the hugegraph{n}.properties configuration file, configure the gremlin.graph information:

gremlin.graph=org.apache.hugegraph.auth.HugeFactoryAuthProxy

For detailed API calls and explanations regarding permissions, please refer to the Authentication-API documentation.

Custom User Authentication System

If you need to support a more flexible user system, you can customize the authenticator for extension. Simply implement the org.apache.hugegraph.auth.HugeAuthenticator interface with your custom authenticator, and then modify the authenticator configuration item in the configuration file to point to your implementation.

Switching authentication mode

After the authentication configuration is completed, enter the admin password on the command line when executing init-store.sh for the first time (for non-Docker mode).

If HugeGraph is deployed from a Docker image, or has already been initialized and needs to be converted to authentication mode, the relevant graph data needs to be deleted and HugeGraph needs to be restarted. If there is already business data in the graph, it is temporarily not possible to convert to authentication mode directly (version <= 1.2.0).

Improvements for this feature have been included in the latest release (available in the latest docker image), please refer to PR 2411. Seamless switching is now available.

# stop HugeGraph first
bin/stop-hugegraph.sh

# delete the store data (here we use the default path for rocksdb)
# there is no need to delete in the latest version (fixed in https://github.com/apache/incubator-hugegraph/pull/2411)
rm -rf rocksdb-data/

# init store again
bin/init-store.sh

# start hugeGraph again
bin/start-hugegraph.sh

Use docker to enable authentication mode

For versions of the hugegraph/hugegraph image equal to or greater than 1.2.0, you can enable authentication mode while starting the Docker image.

The steps are as follows:

1. Use docker run

To enable authentication mode, add the environment variable PASSWORD=123456 (you can freely set the password) in the docker run command:

docker run -itd -e PASSWORD=123456 --name=server -p 8080:8080 hugegraph/hugegraph:1.5.0

2. Use docker-compose

Use docker-compose and set the environment variable PASSWORD=123456:

version: '3'
services:
  server:
    image: hugegraph/hugegraph:1.5.0
    container_name: server
    ports:
      - 8080:8080
    environment:
      - PASSWORD=123456

3. Enter the container to enable authentication mode

Enter the container first:

docker exec -it server bash
# Modify the config quickly; the modified files are saved in the conf-bak folder
bin/enable-auth.sh

Then follow the steps in Switching authentication mode.

4 - Configuring HugeGraphServer to Use HTTPS Protocol

Overview

By default, HugeGraphServer uses the HTTP protocol. However, if you have security requirements for your requests, you can configure it to use HTTPS.

Server Configuration

Modify the conf/rest-server.properties configuration file and change the scheme part of restserver.url to https.

# Set the protocol to HTTPS
restserver.url=https://127.0.0.1:8080
# Server keystore file path. This default value is automatically effective when using HTTPS, and you can modify it as needed.
ssl.keystore_file=conf/hugegraph-server.keystore
# Server keystore file password. This default value is automatically effective when using HTTPS, and you can modify it as needed.
ssl.keystore_password=******

The server’s conf directory already includes a keystore file named hugegraph-server.keystore, and the password for this file is hugegraph. These are the default values when enabling the HTTPS protocol. Users can generate their own keystore file and password, and then modify the values of ssl.keystore_file and ssl.keystore_password.

Client Configuration

Using HTTPS in HugeGraph-Client

When constructing a HugeClient, pass the HTTPS-related configurations. Here’s an example in Java:

import org.apache.hugegraph.driver.HugeClient;
import org.apache.hugegraph.driver.HugeClientBuilder;

String url = "https://localhost:8080";
String graphName = "hugegraph";
HugeClientBuilder builder = HugeClient.builder(url, graphName);
// Client truststore file path
String trustStoreFilePath = "hugegraph.truststore";
// Client truststore password
String trustStorePassword = "******";
builder.configSSL(trustStoreFilePath, trustStorePassword);
HugeClient hugeClient = builder.build();

Note: Before version 1.9.0, HugeGraph-Client was created directly using the new keyword and did not support the HTTPS protocol. Starting from version 1.9.0, it changed to use the builder pattern and supports configuring the HTTPS protocol.

Using HTTPS in HugeGraph-Loader

When starting an import task, add the following options in the command line:

# HTTPS
--protocol https
# Client certificate file path. When specifying --protocol as https, the default value conf/hugegraph.truststore is automatically used, and you can modify it as needed.
--trust-store-file {file}
# Client certificate file password. When specifying --protocol as https, the default value hugegraph is automatically used, and you can modify it as needed.
--trust-store-password {password}

Under the conf directory of hugegraph-loader, there is already a default client certificate file named hugegraph.truststore, and its password is hugegraph.

Using HTTPS in HugeGraph-Tools

When executing commands, add the following options in the command line:

# Client certificate file path. When using the HTTPS protocol in the URL, the default value conf/hugegraph.truststore is automatically used, and you can modify it as needed.
--trust-store-file {file}
# Client certificate file password. When using the HTTPS protocol in the URL, the default value hugegraph is automatically used, and you can modify it as needed.
--trust-store-password {password}
# When executing migration commands and using the --target-url with the HTTPS protocol, the default value conf/hugegraph.truststore is automatically used, and you can modify it as needed.
--target-trust-store-file {target-file}
# When executing migration commands and using the --target-url with the HTTPS protocol, the default value hugegraph is automatically used, and you can modify it as needed.
--target-trust-store-password {target-password}

Under the conf directory of hugegraph-tools, there is already a default client certificate file named hugegraph.truststore, and its password is hugegraph.

How to Generate Certificate Files

This section provides an example of generating certificates. If the default certificate is sufficient or if you already know how to generate certificates, you can skip this section.

Server

Generate the server’s private key and import it into the server’s keystore file. The server.keystore is for the server’s use and contains its private key.

keytool -genkey -alias serverkey -keyalg RSA -keystore server.keystore

During the process, fill in the description information according to your requirements. The description information for the default certificate is as follows:

First and Last Name: hugegraph
Organizational Unit Name: hugegraph
Organization Name: hugegraph
City or Locality Name: BJ
State or Province Name: BJ
Country Code: CN

Export the server certificate based on the server’s private key.

keytool -export -alias serverkey -keystore server.keystore -file server.crt

server.crt is the server’s certificate.

Client

keytool -import -alias serverkey -file server.crt -keystore client.truststore

client.truststore is for the client’s use and contains the trusted certificate.

5 - HugeGraph-Computer Config

Computer Config Options

config option	default value	description
algorithm.message_class	org.apache.hugegraph.computer.core.config.Null	The class of message passed when compute vertex.
algorithm.params_class	org.apache.hugegraph.computer.core.config.Null	The class used to transfer the algorithm’s parameters before the algorithm is run.
algorithm.result_class	org.apache.hugegraph.computer.core.config.Null	The class of vertex’s value, the instance is used to store computation result for the vertex.
allocator.max_vertices_per_thread	10000	Maximum number of vertices per thread processed in each memory allocator
bsp.etcd_endpoints	http://localhost:2379	The end points to access etcd.
bsp.log_interval	30000	The log interval(in ms) to print the log while waiting bsp event.
bsp.max_super_step	10	The max super step of the algorithm.
bsp.register_timeout	300000	The max timeout to wait for master and workers to register.
bsp.wait_master_timeout	86400000	The max timeout(in ms) to wait for master bsp event.
bsp.wait_workers_timeout	86400000	The max timeout to wait for workers bsp event.
hgkv.max_data_block_size	65536	The max byte size of hgkv-file data block.
hgkv.max_file_size	2147483648	The max number of bytes in each hgkv-file.
hgkv.max_merge_files	10	The max number of files to merge at one time.
hgkv.temp_file_dir	/tmp/hgkv	This folder is used to store temporary files, temporary files will be generated during the file merging process.
hugegraph.name	hugegraph	The graph name to load data and write results back.
hugegraph.url	http://127.0.0.1:8080	The hugegraph url to load data and write results back.
input.edge_direction	OUT	The data of the edge in which direction is loaded, when the value is BOTH, the edges in both OUT and IN direction will be loaded.
input.edge_freq	MULTIPLE	The number of edges that can exist between a pair of vertices, allowed values: [SINGLE, SINGLE_PER_LABEL, MULTIPLE]. SINGLE means that only one edge can exist between a pair of vertices, use sourceId + targetId to identify it; SINGLE_PER_LABEL means that one edge per edge label can exist between a pair of vertices, use sourceId + edgelabel + targetId to identify it; MULTIPLE means that many edges can exist between a pair of vertices, use sourceId + edgelabel + sortValues + targetId to identify it.
input.filter_class	org.apache.hugegraph.computer.core.input.filter.DefaultInputFilter	The class to create input-filter object, input-filter is used to filter vertices and edges according to user needs.
input.loader_schema_path		The schema path of loader input, only takes effect when the input.source_type=loader is enabled
input.loader_struct_path		The struct path of loader input, only takes effect when the input.source_type=loader is enabled
input.max_edges_in_one_vertex	200	The maximum number of adjacent edges allowed to be attached to a vertex, the adjacent edges will be stored and transferred together as a batch unit.
input.source_type	hugegraph-server	The source type to load input data, allowed values: [‘hugegraph-server’, ‘hugegraph-loader’], the ‘hugegraph-loader’ means use hugegraph-loader load data from HDFS or file, if use ‘hugegraph-loader’ load data then please config ‘input.loader_struct_path’ and ‘input.loader_schema_path’.
input.split_fetch_timeout	300	The timeout in seconds to fetch input splits
input.split_max_splits	10000000	The maximum number of input splits
input.split_page_size	500	The page size for streamed load input split data
input.split_size	1048576	The input split size in bytes
job.id	local_0001	The job id on Yarn cluster or K8s cluster.
job.partitions_count	1	The partitions count for computing one graph algorithm job.
job.partitions_thread_nums	4	The number of threads for partition parallel compute.
job.workers_count	1	The workers count for computing one graph algorithm job.
master.computation_class	org.apache.hugegraph.computer.core.master.DefaultMasterComputation	Master-computation is computation that can determine whether to continue next superstep. It runs at the end of each superstep on master.
output.batch_size	500	The batch size of output
output.batch_threads	1	The threads number used to batch output
output.hdfs_core_site_path		The hdfs core site path.
output.hdfs_delimiter	,	The delimiter of hdfs output.
output.hdfs_kerberos_enable	false	Is Kerberos authentication enabled for Hdfs.
output.hdfs_kerberos_keytab		The Hdfs’s key tab file for kerberos authentication.
output.hdfs_kerberos_principal		The Hdfs’s principal for kerberos authentication.
output.hdfs_krb5_conf	/etc/krb5.conf	Kerberos configuration file.
output.hdfs_merge_partitions	true	Whether merge output files of multiple partitions.
output.hdfs_path_prefix	/hugegraph-computer/results	The directory of hdfs output result.
output.hdfs_replication	3	The replication number of hdfs.
output.hdfs_site_path		The hdfs site path.
output.hdfs_url	hdfs://127.0.0.1:9000	The hdfs url of output.
output.hdfs_user	hadoop	The hdfs user of output.
output.output_class	org.apache.hugegraph.computer.core.output.LogOutput	The class to output the computation result of each vertex. Be called after iteration computation.
output.result_name	value	The value is assigned dynamically by #name() of instance created by WORKER_COMPUTATION_CLASS.
output.result_write_type	OLAP_COMMON	The result write-type to output to hugegraph, allowed values are: [OLAP_COMMON, OLAP_SECONDARY, OLAP_RANGE].
output.retry_interval	10	The retry interval when output failed
output.retry_times	3	The retry times when output failed
output.single_threads	1	The threads number used to single output
output.thread_pool_shutdown_timeout	60	The timeout seconds of output threads pool shutdown
output.with_adjacent_edges	false	Output the adjacent edges of the vertex or not
output.with_edge_properties	false	Output the properties of the edge or not
output.with_vertex_properties	false	Output the properties of the vertex or not
sort.thread_nums	4	The number of threads performing internal sorting.
transport.client_connect_timeout	3000	The timeout(in ms) of client connect to server.
transport.client_threads	4	The number of transport threads for client.
transport.close_timeout	10000	The timeout(in ms) of close server or close client.
transport.finish_session_timeout	0	The timeout(in ms) to finish session, 0 means using (transport.sync_request_timeout * transport.max_pending_requests).
transport.heartbeat_interval	20000	The minimum interval(in ms) between heartbeats on client side.
transport.io_mode	AUTO	The network IO Mode, either ‘NIO’, ‘EPOLL’, ‘AUTO’, the ‘AUTO’ means selecting the proper mode automatically.
transport.max_pending_requests	8	The max number of client unreceived ack, it will trigger the sending unavailable if the number of unreceived ack >= max_pending_requests.
transport.max_syn_backlog	511	The capacity of SYN queue on server side, 0 means using system default value.
transport.max_timeout_heartbeat_count	120	The maximum number of consecutive heartbeat timeouts on client side; if the number of consecutive timeouts waiting for a heartbeat response exceeds max_timeout_heartbeat_count, the channel will be closed from client side.
transport.min_ack_interval	200	The minimum interval(in ms) of server reply ack.
transport.min_pending_requests	6	The minimum number of client unreceived ack, it will trigger the sending available if the number of unreceived ack < min_pending_requests.
transport.network_retries	3	The number of retry attempts for network communication if the network is unstable.
transport.provider_class	org.apache.hugegraph.computer.core.network.netty.NettyTransportProvider	The transport provider, currently only supports Netty.
transport.receive_buffer_size	0	The size of socket receive-buffer in bytes, 0 means using system default value.
transport.recv_file_mode	true	Whether to enable receive buffer-file mode; if enabled, buffers received from the socket are written to file by zero-copy.
transport.send_buffer_size	0	The size of socket send-buffer in bytes, 0 means using system default value.
transport.server_host	127.0.0.1	The server hostname or ip to listen on to transfer data.
transport.server_idle_timeout	360000	The max timeout(in ms) of server idle.
transport.server_port	0	The server port to listen on to transfer data. The system will assign a random port if it’s set to 0.
transport.server_threads	4	The number of transport threads for server.
transport.sync_request_timeout	10000	The timeout(in ms) to wait response after sending sync-request.
transport.tcp_keep_alive	true	Whether enable TCP keep-alive.
transport.transport_epoll_lt	false	Whether enable EPOLL level-trigger.
transport.write_buffer_high_mark	67108864	The high water mark for write buffer in bytes, it will trigger the sending unavailable if the number of queued bytes > write_buffer_high_mark.
transport.write_buffer_low_mark	33554432	The low water mark for write buffer in bytes, it will trigger the sending available if the number of queued bytes < write_buffer_low_mark.
transport.write_socket_timeout	3000	The timeout(in ms) to write data to socket buffer.
valuefile.max_segment_size	1073741824	The max number of bytes in each segment of value-file.
worker.combiner_class	org.apache.hugegraph.computer.core.config.Null	Combiner can combine messages into one value for a vertex, for example page-rank algorithm can combine messages of a vertex to a sum value.
worker.computation_class	org.apache.hugegraph.computer.core.config.Null	The class to create worker-computation object, worker-computation is used to compute each vertex in each superstep.
worker.data_dirs	[jobs]	The directories separated by ‘,’ that received vertices and messages can persist into.
worker.edge_properties_combiner_class	org.apache.hugegraph.computer.core.combiner.OverwritePropertiesCombiner	The combiner can combine several properties of the same edge into one properties at inputstep.
worker.partitioner	org.apache.hugegraph.computer.core.graph.partition.HashPartitioner	The partitioner that decides which partition a vertex should be in, and which worker a partition should be in.
worker.received_buffers_bytes_limit	104857600	The limit in bytes of buffers of received data; the total size of all buffers can’t exceed this limit. If received buffers reach this limit, they will be merged into a file.
worker.vertex_properties_combiner_class	org.apache.hugegraph.computer.core.combiner.OverwritePropertiesCombiner	The combiner can combine several properties of the same vertex into one properties at inputstep.
worker.wait_finish_messages_timeout	86400000	The max timeout(in ms) message-handler wait for finish-message of all workers.
worker.wait_sort_timeout	600000	The max timeout(in ms) message-handler wait for sort-thread to sort one batch of buffers.
worker.write_buffer_capacity	52428800	The initial size of write buffer that used to store vertex or message.
worker.write_buffer_threshold	52428800	The threshold of write buffer, exceeding it will trigger sorting, the write buffer is used to store vertex or message.

K8s Operator Config Options

NOTE: Option needs to be converted through environment variable settings, e.g. k8s.internal_etcd_url => INTERNAL_ETCD_URL
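
The conversion in the note can be sketched as below, assuming the rule generalizes from the single example given (drop the k8s. prefix and upper-case the rest). The class name K8sOptionEnvName and the method toEnvName are hypothetical and not part of HugeGraph-Computer.

```java
import java.util.Locale;

public class K8sOptionEnvName {

    private static final String PREFIX = "k8s.";

    // Assumed rule, inferred from the example in the note:
    // "k8s.internal_etcd_url" => "INTERNAL_ETCD_URL"
    static String toEnvName(String option) {
        if (!option.startsWith(PREFIX)) {
            throw new IllegalArgumentException(
                    "Expected an option starting with '" + PREFIX + "': " + option);
        }
        return option.substring(PREFIX.length()).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // prints: INTERNAL_ETCD_URL
        System.out.println(toEnvName("k8s.internal_etcd_url"));
    }
}
```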

config option	default value	description
k8s.auto_destroy_pod	true	Whether to automatically destroy all pods when the job is completed or failed.
k8s.close_reconciler_timeout	120	The max timeout(in ms) to close reconciler.
k8s.internal_etcd_url	http://127.0.0.1:2379	The internal etcd url for operator system.
k8s.max_reconcile_retry	3	The max retry times of reconcile.
k8s.probe_backlog	50	The maximum backlog for serving health probes.
k8s.probe_port	9892	The value is the port that the controller bind to for serving health probes.
k8s.ready_check_internal	1000	The time interval(ms) of check ready.
k8s.ready_timeout	30000	The max timeout(in ms) of check ready.
k8s.reconciler_count	10	The max number of reconciler thread.
k8s.resync_period	600000	The minimum frequency at which watched resources are reconciled.
k8s.timezone	Asia/Shanghai	The timezone of computer job and operator.
k8s.watch_namespace	hugegraph-computer-system	The namespace in which custom resources are watched; other namespaces are ignored. ‘*’ means that all namespaces are watched.
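
The NOTE above gives a single example of the option-to-environment-variable conversion. The following minimal sketch generalizes from that one example, assuming the rule is "drop the leading k8s. prefix and upper-case the rest". The function name is hypothetical, and the rule should be checked against the operator before relying on it for other options.

```python
# Hypothetical helper; the conversion rule is inferred from the single example
# in the NOTE (k8s.internal_etcd_url => INTERNAL_ETCD_URL) and is an assumption.

def option_to_env_name(option: str, prefix: str = "k8s.") -> str:
    """Map an operator option name to an environment variable name."""
    if not option.startswith(prefix):
        raise ValueError(f"expected an option starting with {prefix!r}: {option!r}")
    # Drop the prefix and upper-case the remainder.
    return option[len(prefix):].upper()


# Build an environment mapping for a few options from the table above.
overrides = {
    "k8s.internal_etcd_url": "http://127.0.0.1:2379",
    "k8s.watch_namespace": "*",
    "k8s.reconciler_count": "10",
}
env = {option_to_env_name(name): value for name, value in overrides.items()}

print(env["INTERNAL_ETCD_URL"])  # http://127.0.0.1:2379
print(sorted(env))               # ['INTERNAL_ETCD_URL', 'RECONCILER_COUNT', 'WATCH_NAMESPACE']
```

The resulting mapping could then be supplied as the operator container's environment.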

HugeGraph-Computer CRD

CRD: https://github.com/apache/hugegraph-computer/blob/master/computer-k8s-operator/manifest/hugegraph-computer-crd.v1.yaml

spec	default value	description	required
algorithmName		The name of algorithm.	true
jobId		The job id.	true
image		The image of algorithm.	true
computerConf		The map of computer config options.	true
workerInstances		The number of worker instances; it overrides the ‘job.workers_count’ option.	true
pullPolicy	Always	The pull policy of the image. For details, refer to: https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy	false
pullSecrets		The pull secrets of the image. For details, refer to: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod	false
masterCpu		The CPU limit of the master; the unit can be ‘m’ or omitted. For details, refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu	false
workerCpu		The CPU limit of the worker; the unit can be ‘m’ or omitted. For details, refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu	false
masterMemory		The memory limit of the master; the unit can be one of Ei, Pi, Ti, Gi, Mi, Ki. For details, refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory	false
workerMemory		The memory limit of the worker; the unit can be one of Ei, Pi, Ti, Gi, Mi, Ki. For details, refer to: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory	false
log4jXml		The content of log4j.xml for computer job.	false
jarFile		The jar path of computer algorithm.	false
remoteJarUri		The remote jar URI of the computer algorithm; it overrides the algorithm image.	false
jvmOptions		The java startup parameters of computer job.	false
envVars		Please refer to: https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/	false
envFrom		Please refer to: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/	false
masterCommand	bin/start-computer.sh	The run command of master, equivalent to ‘Entrypoint’ field of Docker.	false
masterArgs	["-r master", "-d k8s"]	The run args of master, equivalent to ‘Cmd’ field of Docker.	false
workerCommand	bin/start-computer.sh	The run command of worker, equivalent to ‘Entrypoint’ field of Docker.	false
workerArgs	["-r worker", "-d k8s"]	The run args of worker, equivalent to ‘Cmd’ field of Docker.	false
volumes		Please refer to: https://kubernetes.io/docs/concepts/storage/volumes/	false
volumeMounts		Please refer to: https://kubernetes.io/docs/concepts/storage/volumes/	false
secretPaths		The map of k8s-secret name and mount path.	false
configMapPaths		The map of k8s-configmap name and mount path.	false
podTemplateSpec		Please refer to: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-template-v1/#PodTemplateSpec	false
securityContext		Please refer to: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/	false
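
The "required" column above can be turned into a simple pre-submission check. The following minimal sketch validates a job spec held as a plain dictionary; the function name and all sample values (algorithm name, job id, resource sizes, and the computerConf entry) are illustrative placeholders, not values prescribed by the CRD.

```python
# Hypothetical pre-submission check based on the "required" column of the table above.

REQUIRED_SPEC_FIELDS = (
    "algorithmName",
    "jobId",
    "image",
    "computerConf",
    "workerInstances",
)


def missing_required_fields(spec: dict) -> list:
    """Return the required spec fields that are absent, in table order."""
    return [name for name in REQUIRED_SPEC_FIELDS if name not in spec]


# Placeholder values for illustration only.
spec = {
    "algorithmName": "page_rank",
    "jobId": "pagerank-sample",
    "image": "hugegraph/hugegraph-computer:latest",
    "workerInstances": 5,
    "computerConf": {"job.partitions_count": "20"},  # assumed example option
    "workerCpu": "4",        # optional
    "workerMemory": "4Gi",   # optional
}

print(missing_required_fields(spec))            # []
print(missing_required_fields({"jobId": "x"}))  # ['algorithmName', 'image', 'computerConf', 'workerInstances']
```

A check like this only confirms that the required keys are present; the CRD schema linked above remains the authority on field types and allowed values.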

KubeDriver Config Options

config option	default value	description
k8s.build_image_bash_path		The path of the command used to build the image.
k8s.enable_internal_algorithm	true	Whether to enable internal algorithms.
k8s.framework_image_url	hugegraph/hugegraph-computer:latest	The image url of computer framework.
k8s.image_repository_password		The password for logging in to the image repository.
k8s.image_repository_registry		The address for logging in to the image repository.
k8s.image_repository_url	hugegraph/hugegraph-computer	The url of image repository.
k8s.image_repository_username		The username for logging in to the image repository.
k8s.internal_algorithm	[pageRank]	The list of names of all internal algorithms.
k8s.internal_algorithm_image_url	hugegraph/hugegraph-computer:latest	The image url of internal algorithm.
k8s.jar_file_dir	/cache/jars/	The directory to which the algorithm jar is uploaded.
k8s.kube_config	~/.kube/config	The path of k8s config file.
k8s.log4j_xml_path		The log4j.xml path for computer job.
k8s.namespace	hugegraph-computer-system	The namespace of hugegraph-computer system.
k8s.pull_secret_names	[]	The names of the pull secrets used for pulling the image.