Completion Endpoints

Endpoints

The Completion service provides endpoints for text completion, service monitoring, and web interface access.

Service Information

GET /

Returns service health status and information. Also verifies database connectivity.

Response

{
  "status": "healthy",
  "env": "production",
  "ui": "https://completion.genstack.app/ui"
}

Status Codes

  • 200 - Service is healthy
  • 500 - Internal server error or database connection failure

Text Completion

POST /

Generates a text completion using the specified AI model.

Request

Content-Type: application/json

{
  "genstackToken": "user_auth_token",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Tell me about AI." }
  ],
  "model": "gpt-4-mini",
  "developerId": "developer_genstack_id", // optional
  "applicationId": "com.example.dev.app", // optional
  "temperature": 0.7,
  "maxOutputTokens": 500,
  "overrideGrossMarginPercent": 50 // optional
}

Parameters:

| Name | Type | Required | Allowed Values | Description |
|------|------|----------|----------------|-------------|
| genstackToken | string | Yes | Valid JWT token | Valid Genstack authentication token |
| messages | array | Yes | Array of message objects | Message objects, each with a role and content |
| model | string | Yes | "llama-3.3", "gpt-4-mini", "claude-3.5-sonnet" | ID of the AI model to use |
| developerId | string | No | Valid developer ID | Developer's unique identifier |
| applicationId | string | No | Valid application ID | Application's unique identifier |
| temperature | number | No | 0.0 to 5.0 (model dependent) | Sampling temperature |
| maxOutputTokens | number | No | 1 to 16384 (model dependent) | Maximum tokens in the completion |
| overrideGrossMarginPercent | number | No | 0-99 | Custom gross margin percentage |

Model-Specific Limits:

| Model | Temperature Range | Max Output Tokens | Max Input Tokens |
|-------|-------------------|-------------------|------------------|
| llama-3.3 | 0.0 - 5.0 | 2048 | 128000 |
| gpt-4-mini | 0.0 - 2.0 | 16384 | 128000 |
| claude-3.5-sonnet | 0.0 - 1.0 | 8192 | 200000 |
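The tables above can be enforced client-side before a request is sent. The sketch below encodes the per-model limits from this page; the helper itself (`validate_request` and its error messages) is hypothetical, not part of the API:

```python
# Per-model limits from the table above: (max temperature, max output tokens).
MODEL_LIMITS = {
    "llama-3.3": (5.0, 2048),
    "gpt-4-mini": (2.0, 16384),
    "claude-3.5-sonnet": (1.0, 8192),
}

def validate_request(body: dict) -> list[str]:
    """Return a list of problems with a POST / request body (empty if valid)."""
    errors = []
    for field in ("genstackToken", "messages", "model"):
        if field not in body:
            errors.append(f"missing required field: {field}")
    model = body.get("model")
    if model is not None and model not in MODEL_LIMITS:
        errors.append(f"unknown model: {model}")
        return errors
    if model in MODEL_LIMITS:
        max_temp, max_out = MODEL_LIMITS[model]
        temp = body.get("temperature")
        if temp is not None and not (0.0 <= temp <= max_temp):
            errors.append(f"temperature must be in 0.0-{max_temp} for {model}")
        max_tokens = body.get("maxOutputTokens")
        if max_tokens is not None and not (1 <= max_tokens <= max_out):
            errors.append(f"maxOutputTokens must be in 1-{max_out} for {model}")
    margin = body.get("overrideGrossMarginPercent")
    if margin is not None and not (0 <= margin <= 99):
        errors.append("overrideGrossMarginPercent must be in 0-99")
    return errors
```

Validating locally avoids spending a round trip (and potentially tokens) on a request the service would reject anyway.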

Response

{
  "model": "gpt-4-mini",
  "output": "AI, or Artificial Intelligence, refers to...",
  "usage": {
    "type": "completion",
    "promptTokens": 20,
    "completionTokens": 150,
    "inputCost": "0.000005",
    "outputCost": "0.001"
  },
  "balanceBefore": "1000",
  "balanceAfter": "660"
}
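Note that the cost and balance fields are decimal strings; summing them with floats can lose precision. A minimal sketch of totaling usage with Python's `decimal` module (the `usage` dict mirrors the example response above; `total_cost` is a hypothetical helper):

```python
from decimal import Decimal

def total_cost(usage: dict) -> Decimal:
    """Sum inputCost and outputCost, which the API returns as strings."""
    return Decimal(usage["inputCost"]) + Decimal(usage["outputCost"])

usage = {
    "type": "completion",
    "promptTokens": 20,
    "completionTokens": 150,
    "inputCost": "0.000005",
    "outputCost": "0.001",
}
# total_cost(usage) == Decimal("0.001005")
```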

Streaming Completion

POST /stream

Generates a text completion with streaming output, returning chunks of the completion as they are generated.

Request

Same parameters as the standard completion endpoint (POST /).

Response

Content-Type: text/event-stream

The response is a stream of Server-Sent Events (SSE). Each event contains a JSON payload with one of the following formats:

{
  "token": "next_word_or_token"
}
{
  "completed": true,
  "response": {
    // Same format as standard completion response
  }
}
{
  "aborted": true,
  "response": {
    // Partial completion response
  }
}

The response headers include an X-Stream-Id header containing a unique identifier for the stream.
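One way to consume the stream is to parse each SSE `data:` line as JSON and dispatch on which key is present. A minimal sketch assuming the three payload shapes above (parsing logic only, no network code; `consume_sse` is a hypothetical helper):

```python
import json

def consume_sse(lines):
    """Fold SSE data lines into (tokens, final_response, aborted)."""
    tokens, final, aborted = [], None, False
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip SSE comments, event names, and blank keep-alives
        payload = json.loads(line[len("data:"):].strip())
        if "token" in payload:
            tokens.append(payload["token"])
        elif payload.get("completed"):
            final = payload["response"]
        elif payload.get("aborted"):
            final, aborted = payload["response"], True
    return tokens, final, aborted

events = [
    'data: {"token": "AI"}',
    'data: {"token": " is"}',
    'data: {"completed": true, "response": {"model": "gpt-4-mini"}}',
]
tokens, final, aborted = consume_sse(events)
# tokens == ["AI", " is"], final == {"model": "gpt-4-mini"}, aborted is False
```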

POST /stream/shutdown

Terminates an active streaming completion.

Request

Content-Type: application/json

{
  "streamId": "unique_stream_identifier"
}

Response

  • 200 - Stream successfully terminated
  • 404 - Stream not found
  • 500 - Error terminating stream
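The identifier to send here is the one from the streaming response's X-Stream-Id header. A sketch of building the shutdown payload from those headers (the header and field names come from this page; the helper is hypothetical):

```python
import json

def shutdown_body(response_headers: dict) -> str:
    """Build the POST /stream/shutdown body from a streaming response's headers."""
    stream_id = response_headers["X-Stream-Id"]
    return json.dumps({"streamId": stream_id})

body = shutdown_body({"Content-Type": "text/event-stream", "X-Stream-Id": "abc123"})
# body == '{"streamId": "abc123"}'
```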

Web Interface

GET /ui

Provides a web-based interface for text completion.

Response

Returns an HTML page with the completion interface.

Error Responses

All error responses follow this format:

{
  "error": {
    "type": "ErrorType",
    "message": "Description of what went wrong",
    "details": {
      // Additional error context
    }
  }
}

Common error types include:

  • InvalidRequest
  • InsufficientFunds
  • InvalidCredentials
  • GatewayError
  • InternalError
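Because every error shares this envelope, clients can dispatch on `error.type`. The sketch below separates errors worth retrying from terminal ones; the retry classification is an assumption for illustration, not documented service behavior:

```python
import json

# Assumed retry policy: gateway/internal failures are treated as transient,
# while the other types indicate a problem with the request or account.
RETRYABLE = {"GatewayError", "InternalError"}

def classify_error(body: str) -> tuple:
    """Return (error type, whether a retry is reasonable)."""
    err = json.loads(body)["error"]
    return err["type"], err["type"] in RETRYABLE

kind, retry = classify_error(
    '{"error": {"type": "InsufficientFunds", "message": "Balance too low", "details": {}}}'
)
# kind == "InsufficientFunds", retry is False
```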