Perform chat completion with a streaming response. The request payload is a MultilingualUserInput object. Returns a stream of LLMChunk objects followed by an LLMReply object.
POST /v1/sutra-light/completion
Authorizations
authorization
string
header, required

Body
application/json
User input for a completion request.
model
enum<string>
required
A model name, for example 'sutra-light'.
Available options: sutra-light, sutra-pro, sutra-turbo
messages
object[]
required
The LLM prompt.
max_tokens
number
The maximum number of tokens to generate before terminating. This number cannot exceed the context window for the selected model. The default value is 1024.
temperature
number
Controls the randomness of the response; a lower temperature gives a less random response. Values are in the range [0, 2] with a default value of 0.3.
stop
object
May be a string, null or an array of strings.
presence_penalty
number
frequency_penalty
number
top_p
number
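A minimal request sketch in Python follows. The base URL, the Bearer-token header format, and the message fields ("role"/"content") are assumptions, not taken from this reference; check your deployment and the MultilingualUserInput schema for the exact shapes.

```python
# Sketch of a streaming completion request (assumed base URL and header format).
import requests

BASE_URL = "https://api.example.com"  # placeholder, not the documented host
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "sutra-light",  # or "sutra-pro" / "sutra-turbo"
    "messages": [
        # "role"/"content" are assumed message fields, per common chat-completion conventions
        {"role": "user", "content": "Summarize the plot of the Ramayana in Hindi."}
    ],
    "max_tokens": 256,       # must not exceed the model's context window (default 1024)
    "temperature": 0.3,      # range [0, 2], default 0.3
    "stop": None,            # string, null, or array of strings
}

response = requests.post(
    f"{BASE_URL}/v1/sutra-light/completion",
    headers={"authorization": f"Bearer {API_KEY}"},
    json=payload,
    stream=True,             # response arrives as an application/x-ndjson stream
    timeout=60,
)
response.raise_for_status()
```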
Response
200 - application/x-ndjson
A stream of newline-delimited JSON objects.
typeName
enum<string>
required
Available options: LLMChunk, LLMReply
isFinal
boolean
Indicates if this is the final chunk.
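Continuing the sketch above, the NDJSON stream can be consumed line by line. Only typeName and isFinal are documented here; the field carrying each chunk's text (assumed below to be "content") is a placeholder.

```python
import json

# `response` is the streaming requests.Response from the request sketch above.
for line in response.iter_lines():
    if not line:
        continue  # skip keep-alive blank lines
    event = json.loads(line)
    if event["typeName"] == "LLMChunk":
        # "content" is an assumed field name for the chunk text.
        print(event.get("content", ""), end="", flush=True)
    elif event["typeName"] == "LLMReply":
        # Final object in the stream; isFinal marks the end of generation.
        if event.get("isFinal"):
            break
```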