Perform a chat completion with a streaming response.
The request payload is a MultilingualUserInput
object. Returns a stream of LLMChunk
objects followed by an
LLMReply object.
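The response stream described above can be consumed by reading one serialized object per line: a sequence of LLMChunk objects, then a final LLMReply. A minimal sketch of that pattern, assuming newline-delimited JSON and an "object" discriminator field (both are assumptions; the actual wire format is not specified here):

```python
import json

def read_stream(lines):
    """Split a completion stream into its LLMChunk objects and final LLMReply.

    `lines` is any iterable of JSON strings, one object per line.
    The "object" field values used below are assumptions for illustration.
    """
    chunks, reply = [], None
    for line in lines:
        obj = json.loads(line)
        if obj.get("object") == "llm.reply":
            reply = obj          # terminal LLMReply object
        else:
            chunks.append(obj)   # intermediate LLMChunk objects
    return chunks, reply
```

In practice `lines` would come from the HTTP response body (for example, `response.iter_lines()` when using the `requests` library with `stream=True`).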
POST
Authorizations
Body
application/json
User input for a completion request.
The LLM prompt.
A model name, for example 'sutra-light'.
Available options: sutra-light, sutra-pro, sutra-turbo
The maximum number of tokens to generate before terminating. This number cannot exceed the context window for the selected model. The default value is 1024.
May be a string, null, or an array of strings.
Controls the randomness of the response; lower values produce more deterministic output. Values are in the range [0, 2], with a default of 0.3.
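Putting the parameters above together, a request body can be assembled and validated against the documented constraints before sending. The JSON field names (`prompt`, `model`, `max_tokens`, `temperature`, `stop`) are assumptions inferred from the descriptions above, not confirmed by the API schema:

```python
def build_completion_payload(prompt, model="sutra-light", max_tokens=1024,
                             temperature=0.3, stop=None):
    """Assemble a MultilingualUserInput-style request body.

    Field names are assumptions for illustration; defaults mirror the
    documented values (max_tokens=1024, temperature=0.3).
    """
    if model not in ("sutra-light", "sutra-pro", "sutra-turbo"):
        raise ValueError(f"unknown model: {model!r}")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be in [0, 2]")
    payload = {
        "prompt": prompt,
        "model": model,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    if stop is not None:          # may be a string or an array of strings
        payload["stop"] = stop
    return payload
```

The resulting dict would be sent as the `application/json` body of the POST request, with streaming enabled on the HTTP client so chunks arrive as they are generated.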