REST API Reference

HugeGraph-LLM provides REST API endpoints for integrating RAG and Text2Gremlin capabilities into your applications.

Base URL

http://localhost:8001

Adjust the host and port to match the values you used when starting the service:

python -m hugegraph_llm.demo.rag_demo.app --host 127.0.0.1 --port 8001

Authentication

Currently, the API supports optional token-based authentication:

# Enable authentication in .env
ENABLE_LOGIN=true
USER_TOKEN=your-user-token
ADMIN_TOKEN=your-admin-token

Pass tokens in request headers:

Authorization: Bearer <token>
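
For example, with authentication enabled and USER_TOKEN set as above, an authenticated request would look like:

curl -X POST http://localhost:8001/rag \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-user-token" \
  -d '{"query": "Tell me about Al Pacino"}'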

RAG Endpoints

1. Complete RAG Query

POST /rag

Execute a full RAG pipeline including keyword extraction, graph retrieval, vector search, reranking, and answer generation.

Request Body

{
  "query": "Tell me about Al Pacino's movies",
  "raw_answer": false,
  "vector_only": false,
  "graph_only": true,
  "graph_vector_answer": false,
  "graph_ratio": 0.5,
  "rerank_method": "cohere",
  "near_neighbor_first": false,
  "gremlin_tmpl_num": 5,
  "max_graph_items": 30,
  "topk_return_results": 20,
  "vector_dis_threshold": 0.9,
  "topk_per_keyword": 1,
  "custom_priority_info": "",
  "answer_prompt": "",
  "keywords_extract_prompt": "",
  "gremlin_prompt": "",
  "client_config": {
    "url": "127.0.0.1:8080",
    "graph": "hugegraph",
    "user": "admin",
    "pwd": "admin",
    "gs": ""
  }
}

Parameters:

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| query | string | Yes | - | User's natural language question |
| raw_answer | boolean | No | false | Return the LLM answer without retrieval |
| vector_only | boolean | No | false | Use only vector search (no graph) |
| graph_only | boolean | No | false | Use only graph retrieval (no vector) |
| graph_vector_answer | boolean | No | false | Combine graph and vector results |
| graph_ratio | float | No | 0.5 | Ratio of graph vs. vector results (0-1) |
| rerank_method | string | No | "" | Reranker: "cohere", "siliconflow", or "" (disabled) |
| near_neighbor_first | boolean | No | false | Prioritize direct neighbors |
| gremlin_tmpl_num | integer | No | 5 | Number of Gremlin templates to try |
| max_graph_items | integer | No | 30 | Max items from graph retrieval |
| topk_return_results | integer | No | 20 | Top-K results kept after reranking |
| vector_dis_threshold | float | No | 0.9 | Maximum vector distance for a result to count as a match (0-1) |
| topk_per_keyword | integer | No | 1 | Top-K vectors per keyword |
| custom_priority_info | string | No | "" | Custom context to prioritize |
| answer_prompt | string | No | "" | Custom answer-generation prompt |
| keywords_extract_prompt | string | No | "" | Custom keyword-extraction prompt |
| gremlin_prompt | string | No | "" | Custom Gremlin-generation prompt |
| client_config | object | No | null | Override graph connection settings |

Response

{
  "query": "Tell me about Al Pacino's movies",
  "graph_only": {
    "answer": "Al Pacino starred in The Godfather (1972), directed by Francis Ford Coppola...",
    "context": ["The Godfather is a 1972 crime film...", "..."],
    "graph_paths": ["..."],
    "keywords": ["Al Pacino", "movies"]
  }
}

Example (curl)

curl -X POST http://localhost:8001/rag \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Tell me about Al Pacino",
    "graph_only": true,
    "max_graph_items": 30
  }'

2. Graph Retrieval Only

POST /rag/graph

Retrieve graph context without generating an answer. Useful for debugging or custom processing.

Request Body

{
  "query": "Al Pacino movies",
  "max_graph_items": 30,
  "topk_return_results": 20,
  "vector_dis_threshold": 0.9,
  "topk_per_keyword": 1,
  "gremlin_tmpl_num": 5,
  "rerank_method": "cohere",
  "near_neighbor_first": false,
  "custom_priority_info": "",
  "gremlin_prompt": "",
  "get_vertex_only": false,
  "client_config": {
    "url": "127.0.0.1:8080",
    "graph": "hugegraph",
    "user": "admin",
    "pwd": "admin",
    "gs": ""
  }
}

Additional Parameter:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| get_vertex_only | boolean | false | Return only matched vertex IDs without full details |

Response

{
  "graph_recall": {
    "query": "Al Pacino movies",
    "keywords": ["Al Pacino", "movies"],
    "match_vids": ["1:Al Pacino", "2:The Godfather"],
    "graph_result_flag": true,
    "gremlin": "g.V('1:Al Pacino').outE().inV().limit(30)",
    "graph_result": [
      {"id": "1:Al Pacino", "label": "person", "properties": {"name": "Al Pacino"}},
      {"id": "2:The Godfather", "label": "movie", "properties": {"title": "The Godfather"}}
    ],
    "vertex_degree_list": [5, 12]
  }
}

Example (curl)

curl -X POST http://localhost:8001/rag/graph \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Al Pacino",
    "max_graph_items": 30,
    "get_vertex_only": false
  }'

Text2Gremlin Endpoint

3. Natural Language to Gremlin

POST /text2gremlin

Convert natural language queries to executable Gremlin commands.

Request Body

{
  "query": "Find all movies directed by Francis Ford Coppola",
  "example_num": 5,
  "gremlin_prompt": "",
  "output_types": ["GREMLIN", "RESULT"],
  "client_config": {
    "url": "127.0.0.1:8080",
    "graph": "hugegraph",
    "user": "admin",
    "pwd": "admin",
    "gs": ""
  }
}

Parameters:

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| query | string | Yes | - | Natural language query |
| example_num | integer | No | 5 | Number of example templates to use |
| gremlin_prompt | string | No | "" | Custom prompt for Gremlin generation |
| output_types | array | No | null | Output types: ["GREMLIN", "RESULT", "CYPHER"] |
| client_config | object | No | null | Graph connection override |

Output Types:

- GREMLIN: the generated Gremlin query
- RESULT: the execution result from the graph
- CYPHER: a Cypher query (if requested)

Response

{
  "gremlin": "g.V().has('person','name','Francis Ford Coppola').out('directed').hasLabel('movie').values('title')",
  "result": [
    "The Godfather",
    "The Godfather Part II",
    "Apocalypse Now"
  ]
}

Example (curl)

curl -X POST http://localhost:8001/text2gremlin \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Find all movies directed by Francis Ford Coppola",
    "output_types": ["GREMLIN", "RESULT"]
  }'

Configuration Endpoints

4. Update Graph Connection

POST /config/graph

Dynamically update HugeGraph connection settings.

Request Body

{
  "url": "127.0.0.1:8080",
  "name": "hugegraph",
  "user": "admin",
  "pwd": "admin",
  "gs": ""
}

Response

{
  "status_code": 201,
  "message": "Graph configuration updated successfully"
}
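
Example (curl)

A sketch of the same update via curl, reusing the request body shown above:

curl -X POST http://localhost:8001/config/graph \
  -H "Content-Type: application/json" \
  -d '{
    "url": "127.0.0.1:8080",
    "name": "hugegraph",
    "user": "admin",
    "pwd": "admin",
    "gs": ""
  }'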

5. Update LLM Configuration

POST /config/llm

Update chat/extract LLM settings at runtime.

Request Body (OpenAI)

{
  "llm_type": "openai",
  "api_key": "sk-your-api-key",
  "api_base": "https://api.openai.com/v1",
  "language_model": "gpt-4o-mini",
  "max_tokens": 4096
}

Request Body (Ollama)

{
  "llm_type": "ollama/local",
  "host": "127.0.0.1",
  "port": 11434,
  "language_model": "llama3.1:8b"
}
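
Example (curl)

A sketch of switching the LLM to the OpenAI settings shown above (substitute your own API key):

curl -X POST http://localhost:8001/config/llm \
  -H "Content-Type: application/json" \
  -d '{
    "llm_type": "openai",
    "api_key": "sk-your-api-key",
    "api_base": "https://api.openai.com/v1",
    "language_model": "gpt-4o-mini",
    "max_tokens": 4096
  }'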

6. Update Embedding Configuration

POST /config/embedding

Update embedding model settings.

Request Body

{
  "llm_type": "openai",
  "api_key": "sk-your-api-key",
  "api_base": "https://api.openai.com/v1",
  "language_model": "text-embedding-3-small"
}
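
Example (curl)

A sketch using the request body above (substitute your own API key):

curl -X POST http://localhost:8001/config/embedding \
  -H "Content-Type: application/json" \
  -d '{
    "llm_type": "openai",
    "api_key": "sk-your-api-key",
    "api_base": "https://api.openai.com/v1",
    "language_model": "text-embedding-3-small"
  }'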

7. Update Reranker Configuration

POST /config/rerank

Configure reranker settings.

Request Body (Cohere)

{
  "reranker_type": "cohere",
  "api_key": "your-cohere-key",
  "reranker_model": "rerank-multilingual-v3.0",
  "cohere_base_url": "https://api.cohere.com/v1/rerank"
}

Request Body (SiliconFlow)

{
  "reranker_type": "siliconflow",
  "api_key": "your-siliconflow-key",
  "reranker_model": "BAAI/bge-reranker-v2-m3"
}
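
Example (curl)

A sketch of configuring the Cohere reranker with the body shown above:

curl -X POST http://localhost:8001/config/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "reranker_type": "cohere",
    "api_key": "your-cohere-key",
    "reranker_model": "rerank-multilingual-v3.0",
    "cohere_base_url": "https://api.cohere.com/v1/rerank"
  }'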

Error Responses

All endpoints return standard HTTP status codes:

| Code | Meaning |
|------|---------|
| 200 | Success |
| 201 | Created (configuration updated) |
| 400 | Bad Request (invalid parameters) |
| 500 | Internal Server Error |
| 501 | Not Implemented |

Error response format:

{
  "detail": "Error message describing what went wrong"
}
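
When scripting against the API, curl's -w flag is a convenient way to print the HTTP status code alongside the error body. The request below omits the required query field, so it should fail with a client-error status:

curl -s -w "\nHTTP status: %{http_code}\n" \
  -X POST http://localhost:8001/rag \
  -H "Content-Type: application/json" \
  -d '{"graph_only": true}'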

Python Client Example

import requests

BASE_URL = "http://localhost:8001"

# 1. Configure graph connection
graph_config = {
    "url": "127.0.0.1:8080",
    "name": "hugegraph",
    "user": "admin",
    "pwd": "admin"
}
requests.post(f"{BASE_URL}/config/graph", json=graph_config)

# 2. Execute RAG query
rag_request = {
    "query": "Tell me about Al Pacino",
    "graph_only": True,
    "max_graph_items": 30
}
response = requests.post(f"{BASE_URL}/rag", json=rag_request)
print(response.json())

# 3. Generate Gremlin from natural language
text2gql_request = {
    "query": "Find all directors who worked with Al Pacino",
    "output_types": ["GREMLIN", "RESULT"]
}
response = requests.post(f"{BASE_URL}/text2gremlin", json=text2gql_request)
print(response.json())

See Also