This is the multi-page printable view of this section. Click here to print.
Config
1 - HugeGraph 配置
1 概述
配置文件的目录为 hugegraph-release/conf,所有关于服务和图本身的配置都在此目录下。
主要的配置文件包括:gremlin-server.yaml、rest-server.properties 和 hugegraph.properties
HugeGraphServer 内部集成了 GremlinServer 和 RestServer,而 gremlin-server.yaml 和 rest-server.properties 就是用来配置这两个 Server 的。
- GremlinServer:GremlinServer 接受用户的 gremlin 语句,解析后转而调用 Core 的代码。
- RestServer:提供 RESTful API,根据不同的 HTTP 请求,调用对应的 Core API,如果用户请求体是 gremlin 语句,则会转发给 GremlinServer,实现对图数据的操作。
下面对这三个配置文件逐一介绍。
2 gremlin-server.yaml
gremlin-server.yaml 文件默认的内容如下:
# host and port of gremlin server, need to be consistent with host and port in rest-server.properties
#host: 127.0.0.1
#port: 8182
# Gremlin 查询中的超时时间(以毫秒为单位)
evaluationTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
# 不要在此处设置图形,此功能将在支持动态添加图形后再进行处理
graphs: {
}
scriptEngines: {
gremlin-groovy: {
staticImports: [
org.opencypher.gremlin.process.traversal.CustomPredicates.*',
org.opencypher.gremlin.traversal.CustomFunctions.*
],
plugins: {
org.apache.hugegraph.plugin.HugeGraphGremlinPlugin: {},
org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {
classImports: [
java.lang.Math,
org.apache.hugegraph.backend.id.IdGenerator,
org.apache.hugegraph.type.define.Directions,
org.apache.hugegraph.type.define.NodeRole,
org.apache.hugegraph.traversal.algorithm.CollectionPathsTraverser,
org.apache.hugegraph.traversal.algorithm.CountTraverser,
org.apache.hugegraph.traversal.algorithm.CustomizedCrosspointsTraverser,
org.apache.hugegraph.traversal.algorithm.CustomizePathsTraverser,
org.apache.hugegraph.traversal.algorithm.FusiformSimilarityTraverser,
org.apache.hugegraph.traversal.algorithm.HugeTraverser,
org.apache.hugegraph.traversal.algorithm.JaccardSimilarTraverser,
org.apache.hugegraph.traversal.algorithm.KneighborTraverser,
org.apache.hugegraph.traversal.algorithm.KoutTraverser,
org.apache.hugegraph.traversal.algorithm.MultiNodeShortestPathTraverser,
org.apache.hugegraph.traversal.algorithm.NeighborRankTraverser,
org.apache.hugegraph.traversal.algorithm.PathsTraverser,
org.apache.hugegraph.traversal.algorithm.PersonalRankTraverser,
org.apache.hugegraph.traversal.algorithm.SameNeighborTraverser,
org.apache.hugegraph.traversal.algorithm.ShortestPathTraverser,
org.apache.hugegraph.traversal.algorithm.SingleSourceShortestPathTraverser,
org.apache.hugegraph.traversal.algorithm.SubGraphTraverser,
org.apache.hugegraph.traversal.algorithm.TemplatePathsTraverser,
org.apache.hugegraph.traversal.algorithm.steps.EdgeStep,
org.apache.hugegraph.traversal.algorithm.steps.RepeatEdgeStep,
org.apache.hugegraph.traversal.algorithm.steps.WeightedEdgeStep,
org.apache.hugegraph.traversal.optimize.ConditionP,
org.apache.hugegraph.traversal.optimize.Text,
org.apache.hugegraph.traversal.optimize.TraversalUtil,
org.apache.hugegraph.util.DateUtil,
org.opencypher.gremlin.traversal.CustomFunctions,
org.opencypher.gremlin.traversal.CustomPredicate
],
methodImports: [
java.lang.Math#*,
org.opencypher.gremlin.traversal.CustomPredicate#*,
org.opencypher.gremlin.traversal.CustomFunctions#*
]
},
org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {
files: [scripts/empty-sample.groovy]
}
}
}
}
serializers:
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphBinaryMessageSerializerV1,
config: {
serializeResultToString: false,
ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
}
}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0,
config: {
serializeResultToString: false,
ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
}
}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV2d0,
config: {
serializeResultToString: false,
ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
}
}
- { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0,
config: {
serializeResultToString: false,
ioRegistries: [org.apache.hugegraph.io.HugeGraphIoRegistry]
}
}
metrics: {
consoleReporter: {enabled: false, interval: 180000},
csvReporter: {enabled: false, interval: 180000, fileName: ./metrics/gremlin-server-metrics.csv},
jmxReporter: {enabled: false},
slf4jReporter: {enabled: false, interval: 180000},
gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
graphiteReporter: {enabled: false, interval: 180000}
}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
enabled: false
}
上面的配置项很多,但目前只需要关注如下几个配置项:channelizer 和 graphs。
- graphs:GremlinServer 启动时需要打开的图,该项是一个 map 结构,key 是图的名字,value 是该图的配置文件路径;
- channelizer:GremlinServer 与客户端有两种通信方式,分别是 WebSocket 和 HTTP(默认)。如果选择 WebSocket, 用户可以通过 Gremlin-Console 快速体验 HugeGraph 的特性,但是不支持大规模数据导入, 推荐使用 HTTP 的通信方式,HugeGraph 的外围组件都是基于 HTTP 实现的;
默认 GremlinServer 是服务在 localhost:8182,如果需要修改,配置 host、port 即可
- host:部署 GremlinServer 机器的机器名或 IP,目前 HugeGraphServer 不支持分布式部署,且 GremlinServer 不直接暴露给用户;
- port:部署 GremlinServer 机器的端口;
同时需要在 rest-server.properties 中增加对应的配置项 gremlinserver.url=http://host:port
3 rest-server.properties
rest-server.properties 文件的默认内容如下:
# bind url
# could use '0.0.0.0' or specified (real)IP to expose external network access
restserver.url=http://127.0.0.1:8080
#restserver.enable_graphspaces_filter=false
# gremlin server url, need to be consistent with host and port in gremlin-server.yaml
#gremlinserver.url=http://127.0.0.1:8182
graphs=./conf/graphs
# The maximum thread ratio for batch writing, only take effect if the batch.max_write_threads is 0
batch.max_write_ratio=80
batch.max_write_threads=0
# configuration of arthas
arthas.telnet_port=8562
arthas.http_port=8561
arthas.ip=127.0.0.1
arthas.disabled_commands=jad
# authentication configs
# choose 'org.apache.hugegraph.auth.StandardAuthenticator' or
# 'org.apache.hugegraph.auth.ConfigAuthenticator'
#auth.authenticator=
# for StandardAuthenticator mode
#auth.graph_store=hugegraph
# auth client config
#auth.remote_url=127.0.0.1:8899,127.0.0.1:8898,127.0.0.1:8897
# for ConfigAuthenticator mode
#auth.admin_token=
#auth.user_tokens=[]
# TODO: Deprecated & removed later (useless from version 1.5.0)
# rpc server configs for multi graph-servers or raft-servers
#rpc.server_host=127.0.0.1
#rpc.server_port=8091
#rpc.server_timeout=30
# rpc client configs (like enable to keep cache consistency)
#rpc.remote_url=127.0.0.1:8091,127.0.0.1:8092,127.0.0.1:8093
#rpc.client_connect_timeout=20
#rpc.client_reconnect_period=10
#rpc.client_read_timeout=40
#rpc.client_retries=3
#rpc.client_load_balancer=consistentHash
# raft group initial peers
#raft.group_peers=127.0.0.1:8091,127.0.0.1:8092,127.0.0.1:8093
# lightweight load balancing (beta)
server.id=server-1
server.role=master
# slow query log
log.slow_query_threshold=1000
# jvm(in-heap) memory usage monitor, set 1 to disable it
memory_monitor.threshold=0.85
memory_monitor.period=2000
- restserver.url:RestServer 提供服务的 url,根据实际环境修改。如果其他 IP 地址无法访问,可以尝试修改为特定的地址;或修改为
http://0.0.0.0
来监听来自任何 IP 地址的请求,这种方案较为便捷,但需要留意服务可被访问的网络范围; - graphs:RestServer 启动时也需要打开图,该项为 map 结构,key 是图的名字,value 是该图的配置文件路径;
注意:gremlin-server.yaml 和 rest-server.properties 都包含 graphs 配置项,而
init-store
命令是根据 gremlin-server.yaml 的 graphs 下的图进行初始化的。
配置项 gremlinserver.url 是 GremlinServer 为 RestServer 提供服务的 url,该配置项默认为 http://localhost:8182,如需修改,需要和 gremlin-server.yaml 中的 host 和 port 相匹配;
4 hugegraph.properties
hugegraph.properties 是一类文件,因为如果系统存在多个图,则会有多个相似的文件。该文件用来配置与图存储和查询相关的参数,文件的默认内容如下:
# gremlin entrence to create graph
gremlin.graph=org.apache.hugegraph.HugeFactory
# cache config
#schema.cache_capacity=100000
# vertex-cache default is 1000w, 10min expired
#vertex.cache_capacity=10000000
#vertex.cache_expire=600
# edge-cache default is 100w, 10min expired
#edge.cache_capacity=1000000
#edge.cache_expire=600
# schema illegal name template
#schema.illegal_name_regex=\s+|~.*
#vertex.default_label=vertex
backend=rocksdb
serializer=binary
store=hugegraph
raft.mode=false
raft.safe_read=false
raft.use_snapshot=false
raft.endpoint=127.0.0.1:8281
raft.group_peers=127.0.0.1:8281,127.0.0.1:8282,127.0.0.1:8283
raft.path=./raft-log
raft.use_replicator_pipeline=true
raft.election_timeout=10000
raft.snapshot_interval=3600
raft.backend_threads=48
raft.read_index_threads=8
raft.queue_size=16384
raft.queue_publish_timeout=60
raft.apply_batch=1
raft.rpc_threads=80
raft.rpc_connect_timeout=5000
raft.rpc_timeout=60000
# if use 'ikanalyzer', need download jar from 'https://github.com/apache/hugegraph-doc/raw/ik_binary/dist/server/ikanalyzer-2012_u6.jar' to lib directory
search.text_analyzer=jieba
search.text_analyzer_mode=INDEX
# rocksdb backend config
#rocksdb.data_path=/path/to/disk
#rocksdb.wal_path=/path/to/disk
# cassandra backend config
cassandra.host=localhost
cassandra.port=9042
cassandra.username=
cassandra.password=
#cassandra.connect_timeout=5
#cassandra.read_timeout=20
#cassandra.keyspace.strategy=SimpleStrategy
#cassandra.keyspace.replication=3
# hbase backend config
#hbase.hosts=localhost
#hbase.port=2181
#hbase.znode_parent=/hbase
#hbase.threads_max=64
# mysql backend config
#jdbc.driver=com.mysql.jdbc.Driver
#jdbc.url=jdbc:mysql://127.0.0.1:3306
#jdbc.username=root
#jdbc.password=
#jdbc.reconnect_max_times=3
#jdbc.reconnect_interval=3
#jdbc.ssl_mode=false
# postgresql & cockroachdb backend config
#jdbc.driver=org.postgresql.Driver
#jdbc.url=jdbc:postgresql://localhost:5432/
#jdbc.username=postgres
#jdbc.password=
# palo backend config
#palo.host=127.0.0.1
#palo.poll_interval=10
#palo.temp_dir=./palo-data
#palo.file_limit_size=32
重点关注未注释的几项:
- gremlin.graph:GremlinServer 的启动入口,用户不要修改此项;
- backend:使用的后端存储,可选值有 memory、cassandra、scylladb、mysql、hbase、postgresql 和 rocksdb;
- serializer:主要为内部使用,用于将 schema、vertex 和 edge 序列化到后端,对应的可选值为 text、cassandra、scylladb 和 binary;(注:rocksdb 后端值需是 binary,其他后端 backend 与 serializer 值需保持一致,如 hbase 后端该值为 hbase)
- store:图存储到后端使用的数据库名,在 cassandra 和 scylladb 中就是 keyspace 名,此项的值与 GremlinServer 和 RestServer 中的图名并无关系,但是出于直观考虑,建议仍然使用相同的名字;
- cassandra.host:backend 为 cassandra 或 scylladb 时此项才有意义,cassandra/scylladb 集群的 seeds;
- cassandra.port:backend 为 cassandra 或 scylladb 时此项才有意义,cassandra/scylladb 集群的 native port;
- rocksdb.data_path:backend 为 rocksdb 时此项才有意义,rocksdb 的数据目录
- rocksdb.wal_path:backend 为 rocksdb 时此项才有意义,rocksdb 的日志目录
- admin.token: 通过一个 token 来获取服务器的配置信息,例如:http://localhost:8080/graphs/hugegraph/conf?token=162f7848-0b6d-4faf-b557-3a0797869c55
5 多图配置
我们的系统是可以存在多个图的,并且各个图的后端可以不一样,比如图 hugegraph_rocksdb
和 hugegraph_mysql
,其中 hugegraph_rocksdb
以 RocksDB
作为后端,hugegraph_mysql
以 MySQL
作为后端。
配置方法也很简单:
[可选]:修改 rest-server.properties
通过修改 rest-server.properties
中的 graphs
配置项来设置图的配置文件目录。默认配置为 graphs=./conf/graphs
,如果想要修改为其它目录则调整 graphs
配置项,比如调整为 graphs=/etc/hugegraph/graphs
,示例如下:
graphs=./conf/graphs
在 conf/graphs
路径下基于 hugegraph.properties
修改得到 hugegraph_mysql_backend.properties
和 hugegraph_rocksdb_backend.properties
hugegraph_mysql_backend.properties
修改的部分如下:
backend=mysql
serializer=mysql
store=hugegraph_mysql
# mysql backend config
jdbc.driver=com.mysql.cj.jdbc.Driver
jdbc.url=jdbc:mysql://127.0.0.1:3306
jdbc.username=root
jdbc.password=123456
jdbc.reconnect_max_times=3
jdbc.reconnect_interval=3
jdbc.ssl_mode=false
hugegraph_rocksdb_backend.properties
修改的部分如下:
backend=rocksdb
serializer=binary
store=hugegraph_rocksdb
停止 Server,初始化执行 init-store.sh(为新的图创建数据库),重新启动 Server
$ ./bin/stop-hugegraph.sh
$ ./bin/init-store.sh
Initializing HugeGraph Store...
2023-06-11 14:16:14 [main] [INFO] o.a.h.u.ConfigUtil - Scanning option 'graphs' directory './conf/graphs'
2023-06-11 14:16:14 [main] [INFO] o.a.h.c.InitStore - Init graph with config file: ./conf/graphs/hugegraph_rocksdb_backend.properties
...
2023-06-11 14:16:15 [main] [INFO] o.a.h.StandardHugeGraph - Graph 'hugegraph_rocksdb' has been initialized
2023-06-11 14:16:15 [main] [INFO] o.a.h.c.InitStore - Init graph with config file: ./conf/graphs/hugegraph_mysql_backend.properties
...
2023-06-11 14:16:16 [main] [INFO] o.a.h.StandardHugeGraph - Graph 'hugegraph_mysql' has been initialized
2023-06-11 14:16:16 [main] [INFO] o.a.h.StandardHugeGraph - Close graph standardhugegraph[hugegraph_rocksdb]
...
2023-06-11 14:16:16 [main] [INFO] o.a.h.HugeFactory - HugeFactory shutdown
2023-06-11 14:16:16 [hugegraph-shutdown] [INFO] o.a.h.HugeFactory - HugeGraph is shutting down
Initialization finished.
$ ./bin/start-hugegraph.sh
Starting HugeGraphServer...
Connecting to HugeGraphServer (http://127.0.0.1:8080/graphs)...OK
Started [pid 21614]
查看创建的图:
curl http://127.0.0.1:8080/graphs/
{"graphs":["hugegraph_rocksdb","hugegraph_mysql"]}
查看某个图的信息:
curl http://127.0.0.1:8080/graphs/hugegraph_mysql_backend
{"name":"hugegraph_mysql","backend":"mysql"}
curl http://127.0.0.1:8080/graphs/hugegraph_rocksdb_backend
{"name":"hugegraph_rocksdb","backend":"rocksdb"}
2 - HugeGraph 配置项
Gremlin Server 配置项
对应配置文件gremlin-server.yaml
config option | default value | description |
---|---|---|
host | 127.0.0.1 | The host or ip of Gremlin Server. |
port | 8182 | The listening port of Gremlin Server. |
graphs | hugegraph: conf/hugegraph.properties | The map of graphs with name and config file path. |
scriptEvaluationTimeout | 30000 | The timeout for gremlin script execution(millisecond). |
channelizer | org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer | Indicates the protocol which the Gremlin Server provides service. |
authentication | authenticator: org.apache.hugegraph.auth.StandardAuthenticator, config: {tokens: conf/rest-server.properties} | The authenticator and config(contains tokens path) of authentication mechanism. |
Rest Server & API 配置项
对应配置文件rest-server.properties
config option | default value | description |
---|---|---|
graphs | [hugegraph:conf/hugegraph.properties] | The map of graphs’ name and config file. |
server.id | server-1 | The id of rest server, used for license verification. |
server.role | master | The role of nodes in the cluster, available types are [master, worker, computer] |
restserver.url | http://127.0.0.1:8080 | The url for listening of rest server. |
ssl.keystore_file | server.keystore | The path of server keystore file used when https protocol is enabled. |
ssl.keystore_password | The password of the path of the server keystore file used when the https protocol is enabled. | |
restserver.max_worker_threads | 2 * CPUs | The maximum worker threads of rest server. |
restserver.min_free_memory | 64 | The minimum free memory(MB) of rest server, requests will be rejected when the available memory of system is lower than this value. |
restserver.request_timeout | 30 | The time in seconds within which a request must complete, -1 means no timeout. |
restserver.connection_idle_timeout | 30 | The time in seconds to keep an inactive connection alive, -1 means no timeout. |
restserver.connection_max_requests | 256 | The max number of HTTP requests allowed to be processed on one keep-alive connection, -1 means unlimited. |
gremlinserver.url | http://127.0.0.1:8182 | The url of gremlin server. |
gremlinserver.max_route | 8 | The max route number for gremlin server. |
gremlinserver.timeout | 30 | The timeout in seconds of waiting for gremlin server. |
batch.max_edges_per_batch | 500 | The maximum number of edges submitted per batch. |
batch.max_vertices_per_batch | 500 | The maximum number of vertices submitted per batch. |
batch.max_write_ratio | 50 | The maximum thread ratio for batch writing, only take effect if the batch.max_write_threads is 0. |
batch.max_write_threads | 0 | The maximum threads for batch writing, if the value is 0, the actual value will be set to batch.max_write_ratio * restserver.max_worker_threads. |
auth.authenticator | The class path of authenticator implementation. e.g., org.apache.hugegraph.auth.StandardAuthenticator, or org.apache.hugegraph.auth.ConfigAuthenticator. | |
auth.admin_token | 162f7848-0b6d-4faf-b557-3a0797869c55 | Token for administrator operations, only for org.apache.hugegraph.auth.ConfigAuthenticator. |
auth.graph_store | hugegraph | The name of graph used to store authentication information, like users, only for org.apache.hugegraph.auth.StandardAuthenticator. |
auth.user_tokens | [hugegraph:9fd95c9c-711b-415b-b85f-d4df46ba5c31] | The map of user tokens with name and password, only for org.apache.hugegraph.auth.ConfigAuthenticator. |
auth.audit_log_rate | 1000.0 | The max rate of audit log output per user, default value is 1000 records per second. |
auth.cache_capacity | 10240 | The max cache capacity of each auth cache item. |
auth.cache_expire | 600 | The expiration time in seconds of vertex cache. |
auth.remote_url | If the address is empty, it provide auth service, otherwise it is auth client and also provide auth service through rpc forwarding. The remote url can be set to multiple addresses, which are concat by ‘,’. | |
auth.token_expire | 86400 | The expiration time in seconds after token created |
auth.token_secret | FXQXbJtbCLxODc6tGci732pkH1cyf8Qg | Secret key of HS256 algorithm. |
exception.allow_trace | false | Whether to allow exception trace stack. |
memory_monitor.threshold | 0.85 | The threshold of JVM(in-heap) memory usage monitoring , 1 means disabling this function. |
memory_monitor.period | 2000 | The period in ms of JVM(in-heap) memory usage monitoring. |
基本配置项
基本配置项及后端配置项对应配置文件:{graph-name}.properties,如hugegraph.properties
config option | default value | description |
---|---|---|
gremlin.graph | org.apache.hugegraph.HugeFactory | Gremlin entrance to create graph. |
backend | rocksdb | The data store type, available values are [memory, rocksdb, cassandra, scylladb, hbase, mysql]. |
serializer | binary | The serializer for backend store, available values are [text, binary, cassandra, hbase, mysql]. |
store | hugegraph | The database name like Cassandra Keyspace. |
store.connection_detect_interval | 600 | The interval in seconds for detecting connections, if the idle time of a connection exceeds this value, detect it and reconnect if needed before using, value 0 means detecting every time. |
store.graph | g | The graph table name, which store vertex, edge and property. |
store.schema | m | The schema table name, which store meta data. |
store.system | s | The system table name, which store system data. |
schema.illegal_name_regex | .\s+$|~. | The regex specified the illegal format for schema name. |
schema.cache_capacity | 10000 | The max cache size(items) of schema cache. |
vertex.cache_type | l2 | The type of vertex cache, allowed values are [l1, l2]. |
vertex.cache_capacity | 10000000 | The max cache size(items) of vertex cache. |
vertex.cache_expire | 600 | The expire time in seconds of vertex cache. |
vertex.check_customized_id_exist | false | Whether to check the vertices exist for those using customized id strategy. |
vertex.default_label | vertex | The default vertex label. |
vertex.tx_capacity | 10000 | The max size(items) of vertices(uncommitted) in transaction. |
vertex.check_adjacent_vertex_exist | false | Whether to check the adjacent vertices of edges exist. |
vertex.lazy_load_adjacent_vertex | true | Whether to lazy load adjacent vertices of edges. |
vertex.part_edge_commit_size | 5000 | Whether to enable the mode to commit part of edges of vertex, enabled if commit size > 0, 0 means disabled. |
vertex.encode_primary_key_number | true | Whether to encode number value of primary key in vertex id. |
vertex.remove_left_index_at_overwrite | false | Whether remove left index at overwrite. |
edge.cache_type | l2 | The type of edge cache, allowed values are [l1, l2]. |
edge.cache_capacity | 1000000 | The max cache size(items) of edge cache. |
edge.cache_expire | 600 | The expiration time in seconds of edge cache. |
edge.tx_capacity | 10000 | The max size(items) of edges(uncommitted) in transaction. |
query.page_size | 500 | The size of each page when querying by paging. |
query.batch_size | 1000 | The size of each batch when querying by batch. |
query.ignore_invalid_data | true | Whether to ignore invalid data of vertex or edge. |
query.index_intersect_threshold | 1000 | The maximum number of intermediate results to intersect indexes when querying by multiple single index properties. |
query.ramtable_edges_capacity | 20000000 | The maximum number of edges in ramtable, include OUT and IN edges. |
query.ramtable_enable | false | Whether to enable ramtable for query of adjacent edges. |
query.ramtable_vertices_capacity | 10000000 | The maximum number of vertices in ramtable, generally the largest vertex id is used as capacity. |
query.optimize_aggregate_by_index | false | Whether to optimize aggregate query(like count) by index. |
oltp.concurrent_depth | 10 | The min depth to enable concurrent oltp algorithm. |
oltp.concurrent_threads | 10 | Thread number to concurrently execute oltp algorithm. |
oltp.collection_type | EC | The implementation type of collections used in oltp algorithm. |
rate_limit.read | 0 | The max rate(times/s) to execute query of vertices/edges. |
rate_limit.write | 0 | The max rate(items/s) to add/update/delete vertices/edges. |
task.wait_timeout | 10 | Timeout in seconds for waiting for the task to complete,such as when truncating or clearing the backend. |
task.input_size_limit | 16777216 | The job input size limit in bytes. |
task.result_size_limit | 16777216 | The job result size limit in bytes. |
task.sync_deletion | false | Whether to delete schema or expired data synchronously. |
task.ttl_delete_batch | 1 | The batch size used to delete expired data. |
computer.config | /conf/computer.yaml | The config file path of computer job. |
search.text_analyzer | ikanalyzer | Choose a text analyzer for searching the vertex/edge properties, available type are [word, ansj, hanlp, smartcn, jieba, jcseg, mmseg4j, ikanalyzer]. # if use ‘ikanalyzer’, need download jar from ‘https://github.com/apache/hugegraph-doc/raw/ik_binary/dist/server/ikanalyzer-2012_u6.jar' to lib directory |
search.text_analyzer_mode | smart | Specify the mode for the text analyzer, the available mode of analyzer are {word: [MaximumMatching, ReverseMaximumMatching, MinimumMatching, ReverseMinimumMatching, BidirectionalMaximumMatching, BidirectionalMinimumMatching, BidirectionalMaximumMinimumMatching, FullSegmentation, MinimalWordCount, MaxNgramScore, PureEnglish], ansj: [BaseAnalysis, IndexAnalysis, ToAnalysis, NlpAnalysis], hanlp: [standard, nlp, index, nShort, shortest, speed], smartcn: [], jieba: [SEARCH, INDEX], jcseg: [Simple, Complex], mmseg4j: [Simple, Complex, MaxWord], ikanalyzer: [smart, max_word]}. |
snowflake.datecenter_id | 0 | The datacenter id of snowflake id generator. |
snowflake.force_string | false | Whether to force the snowflake long id to be a string. |
snowflake.worker_id | 0 | The worker id of snowflake id generator. |
raft.mode | false | Whether the backend storage works in raft mode. |
raft.safe_read | false | Whether to use linearly consistent read. |
raft.use_snapshot | false | Whether to use snapshot. |
raft.endpoint | 127.0.0.1:8281 | The peerid of current raft node. |
raft.group_peers | 127.0.0.1:8281,127.0.0.1:8282,127.0.0.1:8283 | The peers of current raft group. |
raft.path | ./raft-log | The log path of current raft node. |
raft.use_replicator_pipeline | true | Whether to use replicator line, when turned on it multiple logs can be sent in parallel, and the next log doesn’t have to wait for the ack message of the current log to be sent. |
raft.election_timeout | 10000 | Timeout in milliseconds to launch a round of election. |
raft.snapshot_interval | 3600 | The interval in seconds to trigger snapshot save. |
raft.backend_threads | current CPU v-cores | The thread number used to apply task to backend. |
raft.read_index_threads | 8 | The thread number used to execute reading index. |
raft.apply_batch | 1 | The apply batch size to trigger disruptor event handler. |
raft.queue_size | 16384 | The disruptor buffers size for jraft RaftNode, StateMachine and LogManager. |
raft.queue_publish_timeout | 60 | The timeout in second when publish event into disruptor. |
raft.rpc_threads | 80 | The rpc threads for jraft RPC layer. |
raft.rpc_connect_timeout | 5000 | The rpc connect timeout for jraft rpc. |
raft.rpc_timeout | 60000 | The rpc timeout for jraft rpc. |
raft.rpc_buf_low_water_mark | 10485760 | The ChannelOutboundBuffer’s low water mark of netty, when buffer size less than this size, the method ChannelOutboundBuffer.isWritable() will return true, it means that low downstream pressure or good network. |
raft.rpc_buf_high_water_mark | 20971520 | The ChannelOutboundBuffer’s high water mark of netty, only when buffer size exceed this size, the method ChannelOutboundBuffer.isWritable() will return false, it means that the downstream pressure is too great to process the request or network is very congestion, upstream needs to limit rate at this time. |
raft.read_strategy | ReadOnlyLeaseBased | The linearizability of read strategy. |
RPC server 配置
config option | default value | description |
---|---|---|
rpc.client_connect_timeout | 20 | The timeout(in seconds) of rpc client connect to rpc server. |
rpc.client_load_balancer | consistentHash | The rpc client uses a load-balancing algorithm to access multiple rpc servers in one cluster. Default value is ‘consistentHash’, means forwarding by request parameters. |
rpc.client_read_timeout | 40 | The timeout(in seconds) of rpc client read from rpc server. |
rpc.client_reconnect_period | 10 | The period(in seconds) of rpc client reconnect to rpc server. |
rpc.client_retries | 3 | Failed retry number of rpc client calls to rpc server. |
rpc.config_order | 999 | Sofa rpc configuration file loading order, the larger the more later loading. |
rpc.logger_impl | com.alipay.sofa.rpc.log.SLF4JLoggerImpl | Sofa rpc log implementation class. |
rpc.protocol | bolt | Rpc communication protocol, client and server need to be specified the same value. |
rpc.remote_url | The remote urls of rpc peers, it can be set to multiple addresses, which are concat by ‘,’, empty value means not enabled. | |
rpc.server_adaptive_port | false | Whether the bound port is adaptive, if it’s enabled, when the port is in use, automatically +1 to detect the next available port. Note that this process is not atomic, so there may still be port conflicts. |
rpc.server_host | The hosts/ips bound by rpc server to provide services, empty value means not enabled. | |
rpc.server_port | 8090 | The port bound by rpc server to provide services. |
rpc.server_timeout | 30 | The timeout(in seconds) of rpc server execution. |
Cassandra 后端配置项
config option | default value | description |
---|---|---|
backend | Must be set to cassandra . | |
serializer | Must be set to cassandra . | |
cassandra.host | localhost | The seeds hostname or ip address of cassandra cluster. |
cassandra.port | 9042 | The seeds port address of cassandra cluster. |
cassandra.connect_timeout | 5 | The cassandra driver connect server timeout(seconds). |
cassandra.read_timeout | 20 | The cassandra driver read from server timeout(seconds). |
cassandra.keyspace.strategy | SimpleStrategy | The replication strategy of keyspace, valid value is SimpleStrategy or NetworkTopologyStrategy. |
cassandra.keyspace.replication | [3] | The keyspace replication factor of SimpleStrategy, like ‘[3]’.Or replicas in each datacenter of NetworkTopologyStrategy, like ‘[dc1:2,dc2:1]’. |
cassandra.username | The username to use to login to cassandra cluster. | |
cassandra.password | The password corresponding to cassandra.username. | |
cassandra.compression_type | none | The compression algorithm of cassandra transport: none/snappy/lz4. |
cassandra.jmx_port=7199 | 7199 | The port of JMX API service for cassandra. |
cassandra.aggregation_timeout | 43200 | The timeout in seconds of waiting for aggregation. |
ScyllaDB 后端配置项
config option | default value | description |
---|---|---|
backend | Must be set to scylladb . | |
serializer | Must be set to scylladb . |
其它与 Cassandra 后端一致。
RocksDB 后端配置项
config option | default value | description |
---|---|---|
backend | Must be set to rocksdb . | |
serializer | Must be set to binary . | |
rocksdb.data_disks | [] | The optimized disks for storing data of RocksDB. The format of each element: STORE/TABLE: /path/disk .Allowed keys are [g/vertex, g/edge_out, g/edge_in, g/vertex_label_index, g/edge_label_index, g/range_int_index, g/range_float_index, g/range_long_index, g/range_double_index, g/secondary_index, g/search_index, g/shard_index, g/unique_index, g/olap] |
rocksdb.data_path | rocksdb-data/data | The path for storing data of RocksDB. |
rocksdb.wal_path | rocksdb-data/wal | The path for storing WAL of RocksDB. |
rocksdb.allow_mmap_reads | false | Allow the OS to mmap file for reading sst tables. |
rocksdb.allow_mmap_writes | false | Allow the OS to mmap file for writing. |
rocksdb.block_cache_capacity | 8388608 | The amount of block cache in bytes that will be used by RocksDB, 0 means no block cache. |
rocksdb.bloom_filter_bits_per_key | -1 | The bits per key in bloom filter, a good value is 10, which yields a filter with ~ 1% false positive rate, -1 means no bloom filter. |
rocksdb.bloom_filter_block_based_mode | false | Use block based filter rather than full filter. |
rocksdb.bloom_filter_whole_key_filtering | true | True if place whole keys in the bloom filter, else place the prefix of keys. |
rocksdb.bottommost_compression | NO_COMPRESSION | The compression algorithm for the bottommost level of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd. |
rocksdb.bulkload_mode | false | Switch to the mode to bulk load data into RocksDB. |
rocksdb.cache_index_and_filter_blocks | false | Indicating if we’d put index/filter blocks to the block cache. |
rocksdb.compaction_style | LEVEL | Set compaction style for RocksDB: LEVEL/UNIVERSAL/FIFO. |
rocksdb.compression | SNAPPY_COMPRESSION | The compression algorithm for compressing blocks of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd. |
rocksdb.compression_per_level | [NO_COMPRESSION, NO_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION, SNAPPY_COMPRESSION] | The compression algorithms for different levels of RocksDB, allowed values are none/snappy/z/bzip2/lz4/lz4hc/xpress/zstd. |
rocksdb.delayed_write_rate | 16777216 | The rate limit in bytes/s of user write requests when need to slow down if the compaction gets behind. |
rocksdb.log_level | INFO | The info log level of RocksDB. |
rocksdb.max_background_jobs | 8 | Maximum number of concurrent background jobs, including flushes and compactions. |
rocksdb.level_compaction_dynamic_level_bytes | false | Whether to enable level_compaction_dynamic_level_bytes, if it’s enabled we give max_bytes_for_level_multiplier a priority against max_bytes_for_level_base, the bytes of base level is dynamic for a more predictable LSM tree, it is useful to limit worse case space amplification. Turning this feature on/off for an existing DB can cause unexpected LSM tree structure so it’s not recommended. |
rocksdb.max_bytes_for_level_base | 536870912 | The upper-bound of the total size of level-1 files in bytes. |
rocksdb.max_bytes_for_level_multiplier | 10.0 | The ratio between the total size of level (L+1) files and the total size of level L files for all L. |
rocksdb.max_open_files | -1 | The maximum number of open files that can be cached by RocksDB, -1 means no limit. |
rocksdb.max_subcompactions | 4 | The value represents the maximum number of threads per compaction job. |
rocksdb.max_write_buffer_number | 6 | The maximum number of write buffers that are built up in memory. |
rocksdb.max_write_buffer_number_to_maintain | 0 | The total maximum number of write buffers to maintain in memory. |
rocksdb.min_write_buffer_number_to_merge | 2 | The minimum number of write buffers that will be merged together. |
rocksdb.num_levels | 7 | Set the number of levels for this database. |
rocksdb.optimize_filters_for_hits | false | This flag allows us to not store filters for the last level. |
rocksdb.optimize_mode | true | Optimize for heavy workloads and big datasets. |
rocksdb.pin_l0_filter_and_index_blocks_in_cache | false | Indicating if we’d put index/filter blocks to the block cache. |
rocksdb.sst_path | The path for ingesting SST file into RocksDB. | |
rocksdb.target_file_size_base | 67108864 | The target file size for compaction in bytes. |
rocksdb.target_file_size_multiplier | 1 | The size ratio between a level L file and a level (L+1) file. |
rocksdb.use_direct_io_for_flush_and_compaction | false | Enable the OS to use direct read/writes in flush and compaction. |
rocksdb.use_direct_reads | false | Enable the OS to use direct I/O for reading sst tables. |
rocksdb.write_buffer_size | 134217728 | Amount of data in bytes to build up in memory. |
rocksdb.max_manifest_file_size | 104857600 | The max size of manifest file in bytes. |
rocksdb.skip_stats_update_on_db_open | false | Whether to skip statistics update when opening the database, setting this flag true allows us to not update statistics. |
rocksdb.max_file_opening_threads | 16 | The max number of threads used to open files. |
rocksdb.max_total_wal_size | 0 | Total size of WAL files in bytes. Once WALs exceed this size, we will start forcing the flush of column families related, 0 means no limit. |
rocksdb.db_write_buffer_size | 0 | Total size of write buffers in bytes across all column families, 0 means no limit. |
rocksdb.delete_obsolete_files_period | 21600 | The periodicity in seconds when obsolete files get deleted, 0 means always do full purge. |
rocksdb.hard_pending_compaction_bytes_limit | 274877906944 | The hard limit to impose on pending compaction in bytes. |
rocksdb.level0_file_num_compaction_trigger | 2 | Number of files to trigger level-0 compaction. |
rocksdb.level0_slowdown_writes_trigger | 20 | Soft limit on number of level-0 files for slowing down writes. |
rocksdb.level0_stop_writes_trigger | 36 | Hard limit on number of level-0 files for stopping writes. |
rocksdb.soft_pending_compaction_bytes_limit | 68719476736 | The soft limit to impose on pending compaction in bytes. |
HBase 后端配置项
config option | default value | description |
---|---|---|
backend | Must be set to hbase . | |
serializer | Must be set to hbase . | |
hbase.hosts | localhost | The hostnames or ip addresses of HBase zookeeper, separated with commas. |
hbase.port | 2181 | The port address of HBase zookeeper. |
hbase.threads_max | 64 | The max threads num of hbase connections. |
hbase.znode_parent | /hbase | The znode parent path of HBase zookeeper. |
hbase.zk_retry | 3 | The recovery retry times of HBase zookeeper. |
hbase.aggregation_timeout | 43200 | The timeout in seconds of waiting for aggregation. |
hbase.kerberos_enable | false | Is Kerberos authentication enabled for HBase. |
hbase.kerberos_keytab | The HBase’s key tab file for kerberos authentication. | |
hbase.kerberos_principal | The HBase’s principal for kerberos authentication. | |
hbase.krb5_conf | etc/krb5.conf | Kerberos configuration file, including KDC IP, default realm, etc. |
hbase.hbase_site | /etc/hbase/conf/hbase-site.xml | The HBase’s configuration file |
hbase.enable_partition | true | Is pre-split partitions enabled for HBase. |
hbase.vertex_partitions | 10 | The number of partitions of the HBase vertex table. |
hbase.edge_partitions | 30 | The number of partitions of the HBase edge table. |
MySQL & PostgreSQL 后端配置项
config option | default value | description |
---|---|---|
backend | Must be set to mysql . | |
serializer | Must be set to mysql . | |
jdbc.driver | com.mysql.jdbc.Driver | The JDBC driver class to connect database. |
jdbc.url | jdbc:mysql://127.0.0.1:3306 | The url of database in JDBC format. |
jdbc.username | root | The username to login database. |
jdbc.password | ****** | The password corresponding to jdbc.username. |
jdbc.ssl_mode | false | The SSL mode of connections with database. |
jdbc.reconnect_interval | 3 | The interval(seconds) between reconnections when the database connection fails. |
jdbc.reconnect_max_times | 3 | The reconnect times when the database connection fails. |
jdbc.storage_engine | InnoDB | The storage engine of backend store database, like InnoDB/MyISAM/RocksDB for MySQL. |
jdbc.postgresql.connect_database | template1 | The database used to connect when init store, drop store or check store exist. |
PostgreSQL 后端配置项
config option | default value | description |
---|---|---|
backend | Must be set to postgresql . | |
serializer | Must be set to postgresql . |
其它与 MySQL 后端一致。
PostgreSQL 后端的 driver 和 url 应该设置为:
jdbc.driver=org.postgresql.Driver
jdbc.url=jdbc:postgresql://localhost:5432/
3 - HugeGraph 内置用户权限与扩展权限配置及使用
概述
HugeGraph 为了方便不同用户场景下的鉴权使用,目前内置了完备的StandardAuthenticator
权限模式,支持多用户认证、
以及细粒度的权限访问控制,采用基于“用户 - 用户组 - 操作 - 资源”的 4 层设计,灵活控制用户角色与权限 (支持多 GraphServer)
StandardAuthenticator
模式的几个核心设计:
- 初始化时创建超级管理员 (
admin
) 用户,后续通过超级管理员创建其它用户,新创建的用户被分配足够权限后,可以创建或管理更多的用户 - 支持动态创建用户、用户组、资源,支持动态分配或取消权限
- 用户可以属于一个或多个用户组,每个用户组可以拥有对任意个资源的操作权限,操作类型包括:读、写、删除、执行等种类
- “资源” 描述了图数据库中的数据,比如符合某一类条件的顶点,每一个资源包括
type
、label
、properties
三个要素,共有 18 种类型、任意 label、任意 properties 可组合形成的资源,一个资源的内部条件是且关系,多个资源之间的条件是或关系
举例说明:
// 场景:某用户只有北京地区的数据读取权限
user(name=xx) -belong-> group(name=xx) -access(read)-> target(graph=graph1, resource={label: person, city: Beijing})
配置用户认证
HugeGraph 目前默认未启用用户认证功能,需通过修改配置文件来启用该功能。(Note: 如果在生产环境/外网使用, 请使用 Java11 版本 + 开启权限避免安全相关隐患)
目前已内置实现了StandardAuthenticator
模式,该模式支持多用户认证与细粒度权限控制。此外,开发者可以自定义实现HugeAuthenticator
接口来对接自身的权限系统。
用户认证方式均采用 HTTP Basic Authentication ,简单说就是在发送 HTTP 请求时在 Authentication
设置选择 Basic
然后输入对应的用户名和密码,对应 HTTP 明文如下所示 :
GET http://localhost:8080/graphs/hugegraph/schema/vertexlabels
Authorization: Basic admin xxxx
警告:在 1.5.0 之前版本的 HugeGraph-Server 在鉴权模式下存在 JWT 相关的安全隐患,请务必使用新版本或自行修改 JWT token 的 secretKey。
修改方式为在配置文件rest-server.properties
中重写auth.token_secret
信息:(1.5.0 后会默认生成随机值则无需配置)
auth.token_secret=XXXX #这里为 32 位 String,由 a-z,A-Z 和 0-9 组成
也可以通过下面的命令实现:
RANDOM_STRING=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 32)
echo "auth.token_secret=${RANDOM_STRING}" >> rest-server.properties
StandardAuthenticator 模式
StandardAuthenticator
模式是通过在数据库后端存储用户信息来支持用户认证和权限控制,该实现基于数据库存储的用户的名称与密码进行认证(密码已被加密),基于用户的角色来细粒度控制用户权限。下面是具体的配置流程(重启服务生效):
在配置文件gremlin-server.yaml
中配置authenticator
及其rest-server
文件路径:
authentication: {
authenticator: org.apache.hugegraph.auth.StandardAuthenticator,
authenticationHandler: org.apache.hugegraph.auth.WsAndHttpBasicAuthHandler,
config: {tokens: conf/rest-server.properties}
}
在配置文件rest-server.properties
中配置authenticator
及其graph_store
信息:
auth.authenticator=org.apache.hugegraph.auth.StandardAuthenticator
auth.graph_store=hugegraph
# auth client config
# 如果是分开部署 GraphServer 和 AuthServer,还需要指定下面的配置,地址填写 AuthServer 的 IP:RPC 端口
#auth.remote_url=127.0.0.1:8899,127.0.0.1:8898,127.0.0.1:8897
其中,graph_store
配置项是指使用哪一个图来存储用户信息,如果存在多个图的话,选取任意一个均可。
在配置文件hugegraph{n}.properties
中配置gremlin.graph
信息:
gremlin.graph=org.apache.hugegraph.auth.HugeFactoryAuthProxy
然后详细的权限 API 调用和说明请参考 Authentication-API 文档。
自定义用户认证系统
如果需要支持更加灵活的用户系统,可自定义 authenticator 进行扩展,自定义 authenticator 实现接口org.apache.hugegraph.auth.HugeAuthenticator
即可,然后修改配置文件中authenticator
配置项指向该实现。
基于鉴权模式启动
在鉴权配置完成后,需在首次执行 init-store.sh
时命令行中输入 admin
密码 (非 docker 部署模式下)
如果基于 docker 镜像部署或者已经初始化 HugeGraph 并需要转换为鉴权模式,需要删除相关图数据并重新启动 HugeGraph, 若图已有业务数据,暂时无法直接转换鉴权模式 (hugegraph 版本 <= 1.2.0)
对于该功能的改进已经在最新版本发布 (Docker latest 可用),可参考 PR 2411, 此时可无缝切换。
# stop the hugeGraph firstly
bin/stop-hugegraph.sh
# delete the store data (here we use the default path for rocksdb)
# Note: no need to delete data in the latest code (fixed in https://github.com/apache/incubator-hugegraph/pull/2411)
rm -rf rocksdb-data/
# init store again
bin/init-store.sh
# start hugeGraph again
bin/start-hugegraph.sh
使用 Docker 时开启鉴权模式
对于镜像 hugegraph/hugegraph
大于等于 1.2.0
的版本,我们可以在启动 docker
镜像的同时开启鉴权模式
具体做法如下:
1. 采用 docker run
在 docker run
中添加环境变量 PASSWORD=123456
(密码可以自由设置)即可开启鉴权模式::
docker run -itd -e PASSWORD=123456 --name=server -p 8080:8080 hugegraph/hugegraph:1.5.0
2. 采用 docker-compose
使用 docker-compose
在环境变量中设置 PASSWORD=123456
即可
version: '3'
services:
server:
image: hugegraph/hugegraph:1.5.0
container_name: server
ports:
- 8080:8080
environment:
- PASSWORD=123456
3. 进入容器后重新开启鉴权模式
首先进入容器:
docker exec -it server bash
# 用于快速修改配置, 修改前的文件被保存在conf-bak文件夹下
bin/enable-auth.sh
之后参照 基于鉴权模式启动 即可
4 - 配置 HugeGraphServer 使用 https 协议
概述
HugeGraphServer 默认使用的是 http 协议,如果用户对请求的安全性有要求,可以配置成 https。
服务端配置
修改 conf/rest-server.properties 配置文件,将 restserver.url 的 schema 部分改为 https。
# 将协议设置为 https
restserver.url=https://127.0.0.1:8080
# 服务端 keystore 文件路径,当协议为 https 时该默认值自动生效,可按需修改此项
ssl.keystore_file=conf/hugegraph-server.keystore
# 服务端 keystore 文件密码,当协议为 https 时该默认值自动生效,可按需修改此项
ssl.keystore_password=******
服务端的 conf 目录下已经给出了一个 keystore 文件hugegraph-server.keystore
,该文件的密码为hugegraph
,
这两项都是在开启了 https 协议时的默认值,用户可以生成自己的 keystore 文件及密码,然后修改ssl.keystore_file
和ssl.keystore_password
的值。
客户端配置
在 HugeGraph-Client 中使用 https
在构造 HugeClient 时传入 https 相关的配置,代码示例:
String url = "https://localhost:8080";
String graphName = "hugegraph";
HugeClientBuilder builder = HugeClient.builder(url, graphName);
// 客户端 keystore 文件路径
String trustStoreFilePath = "hugegraph.truststore";
// 客户端 keystore 密码
String trustStorePassword = "******";
builder.configSSL(trustStoreFilePath, trustStorePassword);
HugeClient hugeClient = builder.build();
注意:HugeGraph-Client 在 1.9.0 版本以前是直接以 new 的方式创建,并且不支持 https 协议,在 1.9.0 版本以后改成以 builder 的方式创建,并支持配置 https 协议。
在 HugeGraph-Loader 中使用 https
启动导入任务时,在命令行中添加如下选项:
# https
--protocol https
# 客户端证书文件路径,当指定 --protocol 为 https 时,默认值 conf/hugegraph.truststore 自动生效,可按需修改
--trust-store-file {file}
# 客户端证书文件密码,当指定 --protocol 为 https 时,默认值 hugegraph 自动生效,可按需修改
--trust-store-password {password}
hugegraph-loader 的 conf 目录下已经放了一个默认的客户端证书文件 hugegraph.truststore,其密码是 hugegraph。
在 HugeGraph-Tools 中使用 https
执行命令时,在命令行中添加如下选项:
# 客户端证书文件路径,当 url 中使用 https 协议时,默认值 conf/hugegraph.truststore 自动生效,可按需修改
--trust-store-file {file}
# 客户端证书文件密码,当 url 中使用 https 协议时,默认值 hugegraph 自动生效,可按需修改
--trust-store-password {password}
# 执行迁移命令时,当 --target-url 中使用 https 协议时,默认值 conf/hugegraph.truststore 自动生效,可按需修改
--target-trust-store-file {target-file}
# 执行迁移命令时,当 --target-url 中使用 https 协议时,默认值 hugegraph 自动生效,可按需修改
--target-trust-store-password {target-password}
hugegraph-tools 的 conf 目录下已经放了一个默认的客户端证书文件 hugegraph.truststore,其密码是 hugegraph。
如何生成证书文件
本部分给出生成证书的示例,如果默认的证书已经够用,或者已经知晓如何生成,可跳过。
服务端
- ⽣成服务端私钥,并且导⼊到服务端 keystore ⽂件中,server.keystore 是给服务端⽤的,其中保存着⾃⼰的私钥
keytool -genkey -alias serverkey -keyalg RSA -keystore server.keystore
过程中根据需求填写描述信息,默认证书的描述信息如下:
名字和姓⽒:hugegraph
组织单位名称:hugegraph
组织名称:hugegraph
城市或区域名称:BJ
州或省份名称:BJ
国家代码:CN
- 根据服务端私钥,导出服务端证书
keytool -export -alias serverkey -keystore server.keystore -file server.crt
server.crt 就是服务端的证书
客户端
keytool -import -alias serverkey -file server.crt -keystore client.truststore
client.truststore 是给客户端⽤的,其中保存着受信任的证书
5 - HugeGraph-Computer 配置
Computer Config Options
config option | default value | description |
---|---|---|
algorithm.message_class | org.apache.hugegraph.computer.core.config.Null | The class of message passed when compute vertex. |
algorithm.params_class | org.apache.hugegraph.computer.core.config.Null | The class used to transfer algorithms’ parameters before algorithm been run. |
algorithm.result_class | org.apache.hugegraph.computer.core.config.Null | The class of vertex’s value, the instance is used to store computation result for the vertex. |
allocator.max_vertices_per_thread | 10000 | Maximum number of vertices per thread processed in each memory allocator |
bsp.etcd_endpoints | http://localhost:2379 | The end points to access etcd. |
bsp.log_interval | 30000 | The log interval(in ms) to print the log while waiting bsp event. |
bsp.max_super_step | 10 | The max super step of the algorithm. |
bsp.register_timeout | 300000 | The max timeout to wait for master and works to register. |
bsp.wait_master_timeout | 86400000 | The max timeout(in ms) to wait for master bsp event. |
bsp.wait_workers_timeout | 86400000 | The max timeout to wait for workers bsp event. |
hgkv.max_data_block_size | 65536 | The max byte size of hgkv-file data block. |
hgkv.max_file_size | 2147483648 | The max number of bytes in each hgkv-file. |
hgkv.max_merge_files | 10 | The max number of files to merge at one time. |
hgkv.temp_file_dir | /tmp/hgkv | This folder is used to store temporary files, temporary files will be generated during the file merging process. |
hugegraph.name | hugegraph | The graph name to load data and write results back. |
hugegraph.url | http://127.0.0.1:8080 | The hugegraph url to load data and write results back. |
input.edge_direction | OUT | The data of the edge in which direction is loaded, when the value is BOTH, the edges in both OUT and IN direction will be loaded. |
input.edge_freq | MULTIPLE | The frequency of edges can exist between a pair of vertices, allowed values: [SINGLE, SINGLE_PER_LABEL, MULTIPLE]. SINGLE means that only one edge can exist between a pair of vertices, use sourceId + targetId to identify it; SINGLE_PER_LABEL means that each edge label can exist one edge between a pair of vertices, use sourceId + edgelabel + targetId to identify it; MULTIPLE means that many edge can exist between a pair of vertices, use sourceId + edgelabel + sortValues + targetId to identify it. |
input.filter_class | org.apache.hugegraph.computer.core.input.filter.DefaultInputFilter | The class to create input-filter object, input-filter is used to Filter vertex edges according to user needs. |
input.loader_schema_path | The schema path of loader input, only takes effect when the input.source_type=loader is enabled | |
input.loader_struct_path | The struct path of loader input, only takes effect when the input.source_type=loader is enabled | |
input.max_edges_in_one_vertex | 200 | The maximum number of adjacent edges allowed to be attached to a vertex, the adjacent edges will be stored and transferred together as a batch unit. |
input.source_type | hugegraph-server | The source type to load input data, allowed values: [‘hugegraph-server’, ‘hugegraph-loader’], the ‘hugegraph-loader’ means use hugegraph-loader load data from HDFS or file, if use ‘hugegraph-loader’ load data then please config ‘input.loader_struct_path’ and ‘input.loader_schema_path’. |
input.split_fetch_timeout | 300 | The timeout in seconds to fetch input splits |
input.split_max_splits | 10000000 | The maximum number of input splits |
input.split_page_size | 500 | The page size for streamed load input split data |
input.split_size | 1048576 | The input split size in bytes |
job.id | local_0001 | The job id on Yarn cluster or K8s cluster. |
job.partitions_count | 1 | The partitions count for computing one graph algorithm job. |
job.partitions_thread_nums | 4 | The number of threads for partition parallel compute. |
job.workers_count | 1 | The workers count for computing one graph algorithm job. |
master.computation_class | org.apache.hugegraph.computer.core.master.DefaultMasterComputation | Master-computation is computation that can determine whether to continue next superstep. It runs at the end of each superstep on master. |
output.batch_size | 500 | The batch size of output |
output.batch_threads | 1 | The threads number used to batch output |
output.hdfs_core_site_path | The hdfs core site path. | |
output.hdfs_delimiter | , | The delimiter of hdfs output. |
output.hdfs_kerberos_enable | false | Is Kerberos authentication enabled for Hdfs. |
output.hdfs_kerberos_keytab | The Hdfs’s key tab file for kerberos authentication. | |
output.hdfs_kerberos_principal | The Hdfs’s principal for kerberos authentication. | |
output.hdfs_krb5_conf | /etc/krb5.conf | Kerberos configuration file. |
output.hdfs_merge_partitions | true | Whether merge output files of multiple partitions. |
output.hdfs_path_prefix | /hugegraph-computer/results | The directory of hdfs output result. |
output.hdfs_replication | 3 | The replication number of hdfs. |
output.hdfs_site_path | The hdfs site path. | |
output.hdfs_url | hdfs://127.0.0.1:9000 | The hdfs url of output. |
output.hdfs_user | hadoop | The hdfs user of output. |
output.output_class | org.apache.hugegraph.computer.core.output.LogOutput | The class to output the computation result of each vertex. Be called after iteration computation. |
output.result_name | value | The value is assigned dynamically by #name() of instance created by WORKER_COMPUTATION_CLASS. |
output.result_write_type | OLAP_COMMON | The result write-type to output to hugegraph, allowed values are: [OLAP_COMMON, OLAP_SECONDARY, OLAP_RANGE]. |
output.retry_interval | 10 | The retry interval when output failed |
output.retry_times | 3 | The retry times when output failed |
output.single_threads | 1 | The threads number used to single output |
output.thread_pool_shutdown_timeout | 60 | The timeout seconds of output threads pool shutdown |
output.with_adjacent_edges | false | Output the adjacent edges of the vertex or not |
output.with_edge_properties | false | Output the properties of the edge or not |
output.with_vertex_properties | false | Output the properties of the vertex or not |
sort.thread_nums | 4 | The number of threads performing internal sorting. |
transport.client_connect_timeout | 3000 | The timeout(in ms) of client connect to server. |
transport.client_threads | 4 | The number of transport threads for client. |
transport.close_timeout | 10000 | The timeout(in ms) of close server or close client. |
transport.finish_session_timeout | 0 | The timeout(in ms) to finish session, 0 means using (transport.sync_request_timeout * transport.max_pending_requests). |
transport.heartbeat_interval | 20000 | The minimum interval(in ms) between heartbeats on client side. |
transport.io_mode | AUTO | The network IO Mode, either ‘NIO’, ‘EPOLL’, ‘AUTO’, the ‘AUTO’ means selecting the property mode automatically. |
transport.max_pending_requests | 8 | The max number of client unreceived ack, it will trigger the sending unavailable if the number of unreceived ack >= max_pending_requests. |
transport.max_syn_backlog | 511 | The capacity of SYN queue on server side, 0 means using system default value. |
transport.max_timeout_heartbeat_count | 120 | The maximum times of timeout heartbeat on client side, if the number of timeouts waiting for heartbeat response continuously > max_heartbeat_timeouts the channel will be closed from client side. |
transport.min_ack_interval | 200 | The minimum interval(in ms) of server reply ack. |
transport.min_pending_requests | 6 | The minimum number of client unreceived ack, it will trigger the sending available if the number of unreceived ack < min_pending_requests. |
transport.network_retries | 3 | The number of retry attempts for network communication,if network unstable. |
transport.provider_class | org.apache.hugegraph.computer.core.network.netty.NettyTransportProvider | The transport provider, currently only supports Netty. |
transport.receive_buffer_size | 0 | The size of socket receive-buffer in bytes, 0 means using system default value. |
transport.recv_file_mode | true | Whether enable receive buffer-file mode, it will receive buffer write file from socket by zero-copy if enable. |
transport.send_buffer_size | 0 | The size of socket send-buffer in bytes, 0 means using system default value. |
transport.server_host | 127.0.0.1 | The server hostname or ip to listen on to transfer data. |
transport.server_idle_timeout | 360000 | The max timeout(in ms) of server idle. |
transport.server_port | 0 | The server port to listen on to transfer data. The system will assign a random port if it’s set to 0. |
transport.server_threads | 4 | The number of transport threads for server. |
transport.sync_request_timeout | 10000 | The timeout(in ms) to wait response after sending sync-request. |
transport.tcp_keep_alive | true | Whether enable TCP keep-alive. |
transport.transport_epoll_lt | false | Whether enable EPOLL level-trigger. |
transport.write_buffer_high_mark | 67108864 | The high water mark for write buffer in bytes, it will trigger the sending unavailable if the number of queued bytes > write_buffer_high_mark. |
transport.write_buffer_low_mark | 33554432 | The low water mark for write buffer in bytes, it will trigger the sending available if the number of queued bytes < write_buffer_low_mark.org.apache.hugegraph.config.OptionChecker$$Lambda$97/0x00000008001c8440@776a6d9b |
transport.write_socket_timeout | 3000 | The timeout(in ms) to write data to socket buffer. |
valuefile.max_segment_size | 1073741824 | The max number of bytes in each segment of value-file. |
worker.combiner_class | org.apache.hugegraph.computer.core.config.Null | Combiner can combine messages into one value for a vertex, for example page-rank algorithm can combine messages of a vertex to a sum value. |
worker.computation_class | org.apache.hugegraph.computer.core.config.Null | The class to create worker-computation object, worker-computation is used to compute each vertex in each superstep. |
worker.data_dirs | [jobs] | The directories separated by ‘,’ that received vertices and messages can persist into. |
worker.edge_properties_combiner_class | org.apache.hugegraph.computer.core.combiner.OverwritePropertiesCombiner | The combiner can combine several properties of the same edge into one properties at inputstep. |
worker.partitioner | org.apache.hugegraph.computer.core.graph.partition.HashPartitioner | The partitioner that decides which partition a vertex should be in, and which worker a partition should be in. |
worker.received_buffers_bytes_limit | 104857600 | The limit bytes of buffers of received data, the total size of all buffers can’t excess this limit. If received buffers reach this limit, they will be merged into a file. |
worker.vertex_properties_combiner_class | org.apache.hugegraph.computer.core.combiner.OverwritePropertiesCombiner | The combiner can combine several properties of the same vertex into one properties at inputstep. |
worker.wait_finish_messages_timeout | 86400000 | The max timeout(in ms) message-handler wait for finish-message of all workers. |
worker.wait_sort_timeout | 600000 | The max timeout(in ms) message-handler wait for sort-thread to sort one batch of buffers. |
worker.write_buffer_capacity | 52428800 | The initial size of write buffer that used to store vertex or message. |
worker.write_buffer_threshold | 52428800 | The threshold of write buffer, exceeding it will trigger sorting, the write buffer is used to store vertex or message. |
K8s Operator Config Options
NOTE: Option needs to be converted through environment variable settings, e.g. k8s.internal_etcd_url => INTERNAL_ETCD_URL
config option | default value | description |
---|---|---|
k8s.auto_destroy_pod | true | Whether to automatically destroy all pods when the job is completed or failed. |
k8s.close_reconciler_timeout | 120 | The max timeout(in ms) to close reconciler. |
k8s.internal_etcd_url | http://127.0.0.1:2379 | The internal etcd url for operator system. |
k8s.max_reconcile_retry | 3 | The max retry times of reconcile. |
k8s.probe_backlog | 50 | The maximum backlog for serving health probes. |
k8s.probe_port | 9892 | The value is the port that the controller bind to for serving health probes. |
k8s.ready_check_internal | 1000 | The time interval(ms) of check ready. |
k8s.ready_timeout | 30000 | The max timeout(in ms) of check ready. |
k8s.reconciler_count | 10 | The max number of reconciler thread. |
k8s.resync_period | 600000 | The minimum frequency at which watched resources are reconciled. |
k8s.timezone | Asia/Shanghai | The timezone of computer job and operator. |
k8s.watch_namespace | hugegraph-computer-system | The value is watch custom resources in the namespace, ignore other namespaces, the ‘*’ means is all namespaces will be watched. |
HugeGraph-Computer CRD
spec | default value | description | required |
---|---|---|---|
algorithmName | The name of algorithm. | true | |
jobId | The job id. | true | |
image | The image of algorithm. | true | |
computerConf | The map of computer config options. | true | |
workerInstances | The number of worker instances, it will instead the ‘job.workers_count’ option. | true | |
pullPolicy | Always | The pull-policy of image, detail please refer to: https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy | false |
pullSecrets | The pull-secrets of Image, detail please refer to: https://kubernetes.io/docs/concepts/containers/images/#specifying-imagepullsecrets-on-a-pod | false | |
masterCpu | The cpu limit of master, the unit can be ’m’ or without unit detail please refer to:https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu | false | |
workerCpu | The cpu limit of worker, the unit can be ’m’ or without unit detail please refer to:https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu | false | |
masterMemory | The memory limit of master, the unit can be one of Ei、Pi、Ti、Gi、Mi、Ki detail please refer to:https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory | false | |
workerMemory | The memory limit of worker, the unit can be one of Ei、Pi、Ti、Gi、Mi、Ki detail please refer to:https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory | false | |
log4jXml | The content of log4j.xml for computer job. | false | |
jarFile | The jar path of computer algorithm. | false | |
remoteJarUri | The remote jar uri of computer algorithm, it will overlay algorithm image. | false | |
jvmOptions | The java startup parameters of computer job. | false | |
envVars | please refer to: https://kubernetes.io/docs/tasks/inject-data-application/define-interdependent-environment-variables/ | false | |
envFrom | please refer to: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/ | false | |
masterCommand | bin/start-computer.sh | The run command of master, equivalent to ‘Entrypoint’ field of Docker. | false |
masterArgs | ["-r master", “-d k8s”] | The run args of master, equivalent to ‘Cmd’ field of Docker. | false |
workerCommand | bin/start-computer.sh | The run command of worker, equivalent to ‘Entrypoint’ field of Docker. | false |
workerArgs | ["-r worker", “-d k8s”] | The run args of worker, equivalent to ‘Cmd’ field of Docker. | false |
volumes | Please refer to: https://kubernetes.io/docs/concepts/storage/volumes/ | false | |
volumeMounts | Please refer to: https://kubernetes.io/docs/concepts/storage/volumes/ | false | |
secretPaths | The map of k8s-secret name and mount path. | false | |
configMapPaths | The map of k8s-configmap name and mount path. | false | |
podTemplateSpec | Please refer to: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-template-v1/#PodTemplateSpec | false | |
securityContext | Please refer to: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/ | false |
KubeDriver Config Options
config option | default value | description |
---|---|---|
k8s.build_image_bash_path | The path of command used to build image. | |
k8s.enable_internal_algorithm | true | Whether enable internal algorithm. |
k8s.framework_image_url | hugegraph/hugegraph-computer:latest | The image url of computer framework. |
k8s.image_repository_password | The password for login image repository. | |
k8s.image_repository_registry | The address for login image repository. | |
k8s.image_repository_url | hugegraph/hugegraph-computer | The url of image repository. |
k8s.image_repository_username | The username for login image repository. | |
k8s.internal_algorithm | [pageRank] | The name list of all internal algorithm. |
k8s.internal_algorithm_image_url | hugegraph/hugegraph-computer:latest | The image url of internal algorithm. |
k8s.jar_file_dir | /cache/jars/ | The directory where the algorithm jar to upload location. |
k8s.kube_config | ~/.kube/config | The path of k8s config file. |
k8s.log4j_xml_path | The log4j.xml path for computer job. | |
k8s.namespace | hugegraph-computer-system | The namespace of hugegraph-computer system. |
k8s.pull_secret_names | [] | The names of pull-secret for pulling image. |