# Completion Endpoints

The Completion service provides endpoints for text completion, service monitoring, and web interface access.
## Service Information

### GET /

Returns service health status and information. Also verifies database connectivity.
#### Response

```json
{
  "status": "healthy",
  "env": "production",
  "ui": "https://completion.genstack.app/ui"
}
```
#### Status Codes

- `200` - Service is healthy
- `500` - Internal server error or database connection failure
## Text Completion

### POST /

Generates a text completion using the specified AI model.
#### Request

Content-Type: `application/json`

```json
{
  "genstackToken": "user_auth_token",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Tell me about AI." }
  ],
  "model": "gpt-4-mini",
  "developerId": "developer_genstack_id", // optional
  "applicationId": "com.example.dev.app", // optional
  "temperature": 0.7,
  "maxOutputTokens": 500,
  "overrideGrossMarginPercent": 50 // optional
}
```
**Parameters:**

| Name | Type | Required | Allowed Values | Description |
|------|------|----------|----------------|-------------|
| genstackToken | string | Yes | Valid JWT token | Genstack authentication token |
| messages | array | Yes | Array of message objects | Message objects with `role` and `content` fields |
| model | string | Yes | "llama-3.3", "gpt-4-mini", "claude-3.5-sonnet" | ID of the AI model to use |
| developerId | string | No | Valid developer ID | Developer's unique identifier |
| applicationId | string | No | Valid application ID | Application's unique identifier |
| temperature | number | No | 0.0 to 5.0 (model dependent) | Sampling temperature |
| maxOutputTokens | number | No | 1 to 16384 (model dependent) | Maximum tokens in the completion |
| overrideGrossMarginPercent | number | No | 0 to 99 | Custom gross margin percentage |
**Model-Specific Limits:**

| Model | Temperature Range | Max Output Tokens | Max Input Tokens |
|-------|-------------------|-------------------|------------------|
| llama-3.3 | 0.0 - 5.0 | 2048 | 128000 |
| gpt-4-mini | 0.0 - 2.0 | 16384 | 128000 |
| claude-3.5-sonnet | 0.0 - 1.0 | 8192 | 200000 |
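The limits above can be checked client-side before sending a request. The sketch below, in Python, encodes the table; the function and constant names are illustrative and not part of the Completion service.

```python
# Client-side sketch: validate completion parameters against the
# model-specific limits documented above. MODEL_LIMITS mirrors the table;
# validate_request is a hypothetical helper, not a service API.

MODEL_LIMITS = {
    "llama-3.3":         {"max_temperature": 5.0, "max_output_tokens": 2048},
    "gpt-4-mini":        {"max_temperature": 2.0, "max_output_tokens": 16384},
    "claude-3.5-sonnet": {"max_temperature": 1.0, "max_output_tokens": 8192},
}

def validate_request(body: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the body looks OK."""
    errors = []
    model = body.get("model")
    limits = MODEL_LIMITS.get(model)
    if limits is None:
        errors.append(f"unknown model: {model!r}")
        return errors
    temp = body.get("temperature")
    if temp is not None and not (0.0 <= temp <= limits["max_temperature"]):
        errors.append(f"temperature must be in 0.0-{limits['max_temperature']}")
    max_out = body.get("maxOutputTokens")
    if max_out is not None and not (1 <= max_out <= limits["max_output_tokens"]):
        errors.append(f"maxOutputTokens must be in 1-{limits['max_output_tokens']}")
    margin = body.get("overrideGrossMarginPercent")
    if margin is not None and not (0 <= margin <= 99):
        errors.append("overrideGrossMarginPercent must be in 0-99")
    return errors
```

Validating before the request avoids a round trip that would only return an `InvalidRequest` error.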
#### Response

```json
{
  "model": "gpt-4-mini",
  "output": "AI, or Artificial Intelligence, refers to...",
  "usage": {
    "type": "completion",
    "promptTokens": 20,
    "completionTokens": 150,
    "inputCost": "0.000005",
    "outputCost": "0.001"
  },
  "balanceBefore": "1000",
  "balanceAfter": "660"
}
```
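Note that the cost and balance fields are decimal strings. A minimal sketch of totaling usage, assuming the field names above, would sum them with `Decimal` rather than `float` to avoid rounding drift when accumulating across many requests:

```python
# Sketch: sum the usage costs from a completion response. The field
# names follow the response body documented above; total_cost itself is
# an illustrative helper, not part of the service.
from decimal import Decimal

def total_cost(usage: dict) -> Decimal:
    return Decimal(usage["inputCost"]) + Decimal(usage["outputCost"])

usage = {
    "type": "completion",
    "promptTokens": 20,
    "completionTokens": 150,
    "inputCost": "0.000005",
    "outputCost": "0.001",
}
```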
## Streaming Completion

### POST /stream

Generates a text completion with streaming output, returning chunks of the completion as they are generated.
#### Request

Same parameters as the standard completion endpoint (POST /).
#### Response

Content-Type: `text/event-stream`

The response is a stream of Server-Sent Events (SSE). Each event contains a JSON payload in one of the following formats.

A token chunk:

```json
{ "token": "next_word_or_token" }
```

A completion event:

```json
{
  "completed": true,
  "response": {
    // Same format as the standard completion response
  }
}
```

An abort event:

```json
{
  "aborted": true,
  "response": {
    // Partial completion response
  }
}
```
The response headers include an `X-Stream-Id` header containing a unique identifier for the stream.
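A consumer of this stream accumulates token chunks until it sees a terminal `completed` or `aborted` event. The sketch below classifies the decoded payloads; it assumes the raw SSE `data:` framing has already been stripped by whatever SSE client you use.

```python
# Sketch of consuming the SSE payload shapes documented above: each
# event's JSON is a token chunk, a terminal "completed" event, or an
# "aborted" event. consume_stream is an illustrative helper.
import json

def consume_stream(events):
    """Accumulate tokens until a terminal event; return (text, final_event)."""
    tokens = []
    for raw in events:
        payload = json.loads(raw)
        if "token" in payload:
            tokens.append(payload["token"])
        elif payload.get("completed") or payload.get("aborted"):
            return "".join(tokens), payload
    # Stream ended without a terminal event (e.g. connection dropped).
    return "".join(tokens), None
```

Checking `aborted` lets the client keep the partial completion rather than discarding it.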
### POST /stream/shutdown

Terminates an active streaming completion.
#### Request

Content-Type: `application/json`

```json
{ "streamId": "unique_stream_identifier" }
```
#### Response

- `200` - Stream successfully terminated
- `404` - Stream not found
- `500` - Error terminating stream
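Putting the two streaming endpoints together: the `streamId` to send here is the value of the `X-Stream-Id` header returned by POST /stream. A minimal sketch of building the shutdown request (the base URL is illustrative; substitute your deployment's host):

```python
# Sketch: terminating a stream. The stream id comes from the X-Stream-Id
# response header of POST /stream. BASE_URL is an assumed host.
import json
import urllib.request

BASE_URL = "https://completion.genstack.app"  # illustrative, adjust to yours

def build_shutdown_request(stream_id: str) -> urllib.request.Request:
    return urllib.request.Request(
        f"{BASE_URL}/stream/shutdown",
        data=json.dumps({"streamId": stream_id}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send: urllib.request.urlopen(build_shutdown_request(stream_id))
```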
## Web Interface

### GET /ui

Provides a web-based interface for text completion.

#### Response

Returns an HTML page with the completion interface.
## Error Responses

All error responses follow this format:

```json
{
  "error": {
    "type": "ErrorType",
    "message": "Description of what went wrong",
    "details": {
      // Additional error context
    }
  }
}
```
Common error types include:

- `InvalidRequest`
- `InsufficientFunds`
- `InvalidCredentials`
- `GatewayError`
- `InternalError`
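Clients can dispatch on the `type` field of this envelope. A sketch, assuming the error format and type names above (the exception classes are illustrative):

```python
# Sketch: map the documented error envelope to client-side exceptions.
# The "type" values are those listed above; the exception classes are
# hypothetical client-side names.
class CompletionError(Exception):
    pass

class InsufficientFundsError(CompletionError):
    pass

def raise_for_error(body: dict) -> None:
    """Raise if the response body carries the documented error envelope."""
    err = body.get("error")
    if err is None:
        return  # not an error response
    if err["type"] == "InsufficientFunds":
        raise InsufficientFundsError(err["message"])
    raise CompletionError(f'{err["type"]}: {err["message"]}')
```

Singling out `InsufficientFunds` lets a client prompt the user to top up instead of treating it as a generic failure.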