Completion Endpoints

Endpoints

The Completion service provides endpoints for text completion, service monitoring, and web interface access.

Service Information

GET /

Returns service health status and information. Also verifies database connectivity.

Response

{
  "status": "healthy",
  "env": "production",
  "ui": "https://completion.genstack.app/ui"
}

Status Codes

  • 200 - Service is healthy
  • 500 - Internal server error or database connection failure

Text Completion

POST /

Generates a text completion using the specified AI model.

Request

Content-Type: application/json

{
  "genstackToken": "user_auth_token",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Tell me about AI." }
  ],
  "model": "gpt-4-mini",
  "developerId": "developer_genstack_id", // optional
  "applicationId": "com.example.dev.app", // optional
  "temperature": 0.7,
  "maxOutputTokens": 500,
  "overrideGrossMarginPercent": 50 // optional
}

Parameters:

| Name | Type | Required | Allowed Values | Description |
|------|------|----------|----------------|-------------|
| genstackToken | string | Yes | Valid JWT token | Valid Genstack authentication token |
| messages | array | Yes | Array of message objects | Message objects, each with a role and content |
| model | string | Yes | "llama-3.3", "gpt-4-mini", "claude-3.5-sonnet" | ID of the AI model to use |
| developerId | string | No | Valid developer ID | Developer's unique identifier |
| applicationId | string | No | Valid application ID | Application's unique identifier |
| temperature | number | No | 0.0 to 5.0 (model dependent) | Sampling temperature |
| maxOutputTokens | number | No | 1 to 16384 (model dependent) | Maximum tokens in the completion |
| overrideGrossMarginPercent | number | No | 0-99 | Custom gross margin percentage |

Model-Specific Limits:

| Model | Temperature Range | Max Output Tokens | Max Input Tokens |
|-------|-------------------|-------------------|------------------|
| llama-3.3 | 0.0 - 5.0 | 2048 | 128000 |
| gpt-4-mini | 0.0 - 2.0 | 16384 | 128000 |
| claude-3.5-sonnet | 0.0 - 1.0 | 8192 | 200000 |
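The tables above can be enforced client-side before a request is sent. The sketch below encodes the per-model limits from this page; the helper itself (`validate_request` and its error messages) is hypothetical, not part of the API:

```python
# Per-model limits from the table above: (max temperature, max output tokens).
MODEL_LIMITS = {
    "llama-3.3": (5.0, 2048),
    "gpt-4-mini": (2.0, 16384),
    "claude-3.5-sonnet": (1.0, 8192),
}

def validate_request(body: dict) -> list[str]:
    """Return a list of problems with a POST / request body (empty if valid)."""
    errors = []
    for field in ("genstackToken", "messages", "model"):
        if field not in body:
            errors.append(f"missing required field: {field}")
    model = body.get("model")
    if model is not None and model not in MODEL_LIMITS:
        errors.append(f"unknown model: {model}")
        return errors
    if model in MODEL_LIMITS:
        max_temp, max_out = MODEL_LIMITS[model]
        temp = body.get("temperature")
        if temp is not None and not (0.0 <= temp <= max_temp):
            errors.append(f"temperature must be in 0.0-{max_temp} for {model}")
        max_tokens = body.get("maxOutputTokens")
        if max_tokens is not None and not (1 <= max_tokens <= max_out):
            errors.append(f"maxOutputTokens must be in 1-{max_out} for {model}")
    margin = body.get("overrideGrossMarginPercent")
    if margin is not None and not (0 <= margin <= 99):
        errors.append("overrideGrossMarginPercent must be in 0-99")
    return errors
```

Validating locally avoids spending a round trip (and potentially tokens) on a request the service would reject anyway.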

Response

{
  "model": "gpt-4-mini",
  "output": "AI, or Artificial Intelligence, refers to...",
  "usage": {
    "type": "completion",
    "promptTokens": 20,
    "completionTokens": 150,
    "inputCost": "0.000005",
    "outputCost": "0.001"
  },
  "balanceBefore": "1000",
  "balanceAfter": "660"
}
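Note that the cost and balance fields are decimal strings; summing them with floats can lose precision. A minimal sketch of totaling usage with Python's `decimal` module (the `usage` dict mirrors the example response above; `total_cost` is a hypothetical helper):

```python
from decimal import Decimal

def total_cost(usage: dict) -> Decimal:
    """Sum inputCost and outputCost, which the API returns as strings."""
    return Decimal(usage["inputCost"]) + Decimal(usage["outputCost"])

usage = {
    "type": "completion",
    "promptTokens": 20,
    "completionTokens": 150,
    "inputCost": "0.000005",
    "outputCost": "0.001",
}
# total_cost(usage) == Decimal("0.001005")
```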

Streaming Completion

POST /stream

Generates a text completion with streaming output, returning chunks of the completion as they are generated.

Request

Same parameters as the standard completion endpoint (POST /).

Response

Content-Type: text/event-stream

The response is a stream of Server-Sent Events (SSE). Each event contains a JSON payload with one of the following formats:

{
  "token": "next_word_or_token"
}
{
  "completed": true,
  "response": {
    // Same format as standard completion response
  }
}
{
  "aborted": true,
  "response": {
    // Partial completion response
  }
}

The response headers include an X-Stream-Id header containing a unique identifier for the stream.
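One way to consume the stream is to parse each SSE `data:` line as JSON and dispatch on which key is present. A minimal sketch assuming the three payload shapes above (parsing logic only, no network code; `consume_sse` is a hypothetical helper):

```python
import json

def consume_sse(lines):
    """Fold SSE data lines into (tokens, final_response, aborted)."""
    tokens, final, aborted = [], None, False
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip SSE comments, event names, and blank keep-alives
        payload = json.loads(line[len("data:"):].strip())
        if "token" in payload:
            tokens.append(payload["token"])
        elif payload.get("completed"):
            final = payload["response"]
        elif payload.get("aborted"):
            final, aborted = payload["response"], True
    return tokens, final, aborted

events = [
    'data: {"token": "AI"}',
    'data: {"token": " is"}',
    'data: {"completed": true, "response": {"model": "gpt-4-mini"}}',
]
tokens, final, aborted = consume_sse(events)
# tokens == ["AI", " is"], final == {"model": "gpt-4-mini"}, aborted is False
```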

POST /stream/shutdown

Terminates an active streaming completion.

Request

Content-Type: application/json

{
  "streamId": "unique_stream_identifier"
}

Response

  • 200 - Stream successfully terminated
  • 404 - Stream not found
  • 500 - Error terminating stream
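The identifier to send here is the one from the streaming response's X-Stream-Id header. A sketch of building the shutdown payload from those headers (the header and field names come from this page; the helper is hypothetical):

```python
import json

def shutdown_body(response_headers: dict) -> str:
    """Build the POST /stream/shutdown body from a streaming response's headers."""
    stream_id = response_headers["X-Stream-Id"]
    return json.dumps({"streamId": stream_id})

body = shutdown_body({"Content-Type": "text/event-stream", "X-Stream-Id": "abc123"})
# body == '{"streamId": "abc123"}'
```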

Web Interface

GET /ui

Provides a web-based interface for text completion.

Response

Returns an HTML page with the completion interface.

Error Responses

All error responses follow this format:

{
  "error": {
    "type": "ErrorType",
    "message": "Description of what went wrong",
    "details": {
      // Additional error context
    }
  }
}

Common error types include:

  • InvalidRequest
  • InsufficientFunds
  • InvalidCredentials
  • GatewayError
  • InternalError
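Because every error shares this envelope, clients can dispatch on `error.type`. The sketch below separates errors worth retrying from terminal ones; the retry classification is an assumption for illustration, not documented service behavior:

```python
import json

# Assumed retry policy: gateway/internal failures are treated as transient,
# while the other types indicate a problem with the request or account.
RETRYABLE = {"GatewayError", "InternalError"}

def classify_error(body: str) -> tuple:
    """Return (error type, whether a retry is reasonable)."""
    err = json.loads(body)["error"]
    return err["type"], err["type"] in RETRYABLE

kind, retry = classify_error(
    '{"error": {"type": "InsufficientFunds", "message": "Balance too low", "details": {}}}'
)
# kind == "InsufficientFunds", retry is False
```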