This API deploys an open-source or fine-tuned model that is in the Ready to Deploy state. Users can configure deployment parameters, including hyperparameters, scaling, and optimization settings, to control how the model scales and performs. The API response includes the model ID and the deployment status. Deployment runs asynchronously: after receiving the response, pass its dockStatusId to the Get Dock Status API to verify that the model deployed successfully.
Method: POST
Endpoint: https://{host}/api/public/models/{modelId}/deploy?modelType={modelType}
Content Type: application/json
Authorization: x-api-key header containing the API key used for authentication.
To use the API, create an API key.

URL Parameters

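PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | ENUM VALUES
host | The environment host. For example, for the environment URL https://agent-platform.domain.ai/, the host is agent-platform.domain.ai. | String | Required | N/A
modelId | The ID of the model to deploy, passed in the URL path. | String | Required | N/A
modelType | The type of model being deployed, passed as a query parameter. | String | Required | ["openSource", "fineTune"]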
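Because host, modelId, and modelType all end up in one URL, it can help to assemble it in a single place. The following is a minimal sketch, assuming the host may be supplied either as a bare host name or as a full environment URL (the table above shows the latter as its example); the function name build_deploy_url is hypothetical and not part of the platform.

```python
from urllib.parse import quote, urlencode

# modelType values listed in the URL parameter table
ALLOWED_MODEL_TYPES = {"openSource", "fineTune"}


def build_deploy_url(host: str, model_id: str, model_type: str) -> str:
    """Return the deploy endpoint URL (hypothetical helper, not a platform API)."""
    if model_type not in ALLOWED_MODEL_TYPES:
        raise ValueError(f"modelType must be one of {sorted(ALLOWED_MODEL_TYPES)}, got {model_type!r}")
    # Accept "agent-platform.domain.ai" as well as "https://agent-platform.domain.ai/".
    base = host.rstrip("/")
    if not base.startswith(("http://", "https://")):
        base = "https://" + base
    # modelId is a path segment, so percent-encode it; modelType goes in the query string.
    path = f"/api/public/models/{quote(model_id, safe='')}/deploy"
    return f"{base}{path}?{urlencode({'modelType': model_type})}"


print(build_deploy_url("https://agent-platform.domain.ai/", "cm-123", "openSource"))
```

Validating modelType on the client side surfaces a typo such as "finetune" before any request is sent.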

Sample Request

For an Open-Source Model
curl --location 'https://{host}/api/public/models/cm-2xxxxxxxxxxxxxxxxxx0/deploy?modelType=openSource' \
--header 'x-api-key: kg-axxxxxxx-5xx3-5xx8-bxxb-9xxxxxxxxxx-ebxxxxxx-5xxb-4xxb-9xx5-cxxxxxxxxx3' \
--header 'Content-Type: application/json' \
--data '{
    "name": "Flant5_model",
    "hyperParameters": {
      "temperature": 1,
      "maxTokens": 512,
      "topP": 1,
      "topK": 50,
      "stopSequence": []
    },
    "scalingParameters": {
      "maxBatchSize": 10,
      "minReplicas": 1,
      "maxReplicas": 2,
      "scaleUpDelay": 30,
      "scaleDownDelay": 600
    },
    "deviceType": "g5.xlarge",
    "optimizationInfo": {
      "optimizationType": "",
      "quantizationType": ""
    },
    "isDeployedPreviously": true
  }'
For a Fine-Tuned Model
curl --location 'https://{host}/api/public/models/cm-6xxxxxxxxxxxxxxxxxx9/deploy?modelType=fineTune' \
--header 'x-api-key: kg-2xxxxxxxxxxxxxxxxxxf-7xxxxxxx-7xx8-4xxf-8xx7-dxxxxxxxxxx3' \
--header 'Content-Type: application/json' \
--data '{
    "name": "gpt2",
    "hyperParameters": {
      "temperature": 1,
      "maxTokens": 512,
      "topP": 1,
      "topK": 50,
      "stopSequence": []
    },
    "scalingParameters": {
      "maxBatchSize": 10,
      "minReplicas": 1,
      "maxReplicas": 2,
      "scaleUpDelay": 30,
      "scaleDownDelay": 600
    },
    "deviceType": "g5.xlarge",
    "optimizationInfo": {
      "optimizationType": "",
      "quantizationType": ""
    },
    "isDeployedPreviously": true
  }'
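For callers who prefer a program over curl, the same request can be built with Python's standard library. This is a minimal sketch, not an official client: the helper names are hypothetical, YOUR_API_KEY and agent-platform.domain.ai are placeholders, and the body simply mirrors the open-source sample above.

```python
import json
import urllib.request


def build_deploy_request(url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build, without sending, the POST request that the curl samples issue."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send_deploy_request(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Same body as the open-source sample; the key and host are placeholders.
body = {
    "name": "Flant5_model",
    "hyperParameters": {"temperature": 1, "maxTokens": 512, "topP": 1, "topK": 50, "stopSequence": []},
    "scalingParameters": {"maxBatchSize": 10, "minReplicas": 1, "maxReplicas": 2,
                          "scaleUpDelay": 30, "scaleDownDelay": 600},
    "deviceType": "g5.xlarge",
    "optimizationInfo": {"optimizationType": "", "quantizationType": ""},
    "isDeployedPreviously": True,
}
req = build_deploy_request(
    "https://agent-platform.domain.ai/api/public/models/cm-2xxxxxxxxxxxxxxxxxx0/deploy?modelType=openSource",
    "YOUR_API_KEY",
    body,
)
# send_deploy_request(req) performs the call; it needs a reachable host and a valid key.
```

Separating request construction from sending makes the payload easy to inspect or log before it reaches a live environment.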

Body Parameters

The following deployment parameters can be configured and passed in the request body.

General Parameters
PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | ENUM VALUES
name | Name of the model to deploy. | String | Required | N/A
isDeployedPreviously | Indicates whether the model has been deployed before. | Boolean | Optional | [true, false]
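The general parameters sit at the top level of the body, next to the nested hyperParameters, scalingParameters, and optimizationInfo objects used in the sample requests. The sketch below assembles that shape; the function name build_deploy_body is hypothetical, and omitting optional groups when they are unset is an assumption, since both samples always send every group.

```python
from typing import Optional


def build_deploy_body(
    name: str,
    hyper_parameters: dict,
    device_type: str,
    scaling_parameters: Optional[dict] = None,
    optimization_info: Optional[dict] = None,
    is_deployed_previously: Optional[bool] = None,
) -> dict:
    """Assemble a deploy request body with the nesting used in the sample requests."""
    if not isinstance(name, str) or not name.strip():
        raise ValueError("name is required and must be a non-empty string")
    body = {
        "name": name,
        "hyperParameters": hyper_parameters,
        "deviceType": device_type,
    }
    # Assumption: optional groups may be left out when unset.
    if scaling_parameters is not None:
        body["scalingParameters"] = scaling_parameters
    if optimization_info is not None:
        body["optimizationInfo"] = optimization_info
    if is_deployed_previously is not None:
        # The table types isDeployedPreviously as Boolean, so reject other values.
        if not isinstance(is_deployed_previously, bool):
            raise TypeError("isDeployedPreviously must be true or false")
        body["isDeployedPreviously"] = is_deployed_previously
    return body
```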
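Hyperparameters (passed in the hyperParameters object)
PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | RANGE
temperature | Controls the randomness of the output. | Float | Required | 0-2
maxTokens | Maximum number of tokens allowed. | Int | Required | 0-512
topP | Probability threshold for nucleus (top-p) sampling. | Float | Required | 0-1
topK | Number of highest-probability tokens considered in top-k sampling. | Int | Required | 1-100
stopSequence | Sequences that stop the model's generation. | Array | Optional | N/A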
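The ranges above can be checked on the client before a deployment is submitted. This is a minimal sketch, assuming the bounds are inclusive and that stopSequence holds strings; neither detail is stated in the table, and the server remains the authority on what it accepts.

```python
from numbers import Real

# (field, lower bound, upper bound, integer only), from the hyperparameter table;
# treating the bounds as inclusive is an assumption.
HYPERPARAMETER_RULES = [
    ("temperature", 0, 2, False),
    ("maxTokens", 0, 512, True),
    ("topP", 0, 1, False),
    ("topK", 1, 100, True),
]


def validate_hyperparameters(hp: dict) -> list:
    """Return a list of problems in a hyperParameters object (empty if none)."""
    errors = []
    for field, low, high, integer_only in HYPERPARAMETER_RULES:
        if field not in hp:
            errors.append(f"{field} is required")
            continue
        value = hp[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            errors.append(f"{field} must be a number")
        elif integer_only and not isinstance(value, int):
            errors.append(f"{field} must be an integer")
        elif not low <= value <= high:
            errors.append(f"{field} must be between {low} and {high}, got {value}")
    # Assumption: stopSequence is an array of strings.
    stop = hp.get("stopSequence", [])
    if not isinstance(stop, list) or not all(isinstance(s, str) for s in stop):
        errors.append("stopSequence must be an array of strings")
    return errors
```

Returning every problem at once, rather than stopping at the first, lets a caller report all bad fields in one pass.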
Scaling Parameters (passed in the scalingParameters object)
PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | RANGE
maxBatchSize | Maximum batch size. | Int | Optional | 1-256
minReplicas | Minimum number of replicas. | Int | Optional | 1-10
maxReplicas | Maximum number of replicas. | Int | Optional | 1-50
scaleUpDelay | Delay before scaling up (ms). | Int | Optional | 1-1000
scaleDownDelay | Delay before scaling down (ms). | Int | Optional | 50-2000
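A similar client-side check works for the scaling parameters. This sketch assumes inclusive bounds and adds one rule the table does not state, that minReplicas should not exceed maxReplicas; treat that cross-field check as an assumption.

```python
# Inclusive ranges from the scaling parameter table (inclusivity is assumed).
SCALING_RANGES = {
    "maxBatchSize": (1, 256),
    "minReplicas": (1, 10),
    "maxReplicas": (1, 50),
    "scaleUpDelay": (1, 1000),
    "scaleDownDelay": (50, 2000),
}


def validate_scaling(sp: dict) -> list:
    """Return a list of problems in a scalingParameters object (empty if none)."""
    errors = []
    for field, (low, high) in SCALING_RANGES.items():
        if field not in sp:
            continue  # every scaling parameter is optional
        value = sp[field]
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{field} must be an integer")
        elif not low <= value <= high:
            errors.append(f"{field} must be between {low} and {high}, got {value}")
    # Assumed cross-field rule: the replica floor should not exceed the ceiling.
    min_r, max_r = sp.get("minReplicas"), sp.get("maxReplicas")
    if type(min_r) is int and type(max_r) is int and min_r > max_r:
        errors.append("minReplicas must not exceed maxReplicas")
    return errors
```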
Deployment Device & Optimization
PARAMETER | DESCRIPTION | TYPE | REQUIRED/OPTIONAL | ENUM VALUES
deviceType | Device (GPU instance) type for deployment. | String | Required | ["g4dn.xlarge", "g5.xlarge", "g5.2xlarge", "g6e.xlarge", "g4dn.12xlarge", "g5.12xlarge", "g5.48xlarge", "g4dn.metal"]
optimizationInfo | Object that holds the optimization settings below. | Object | Optional | N/A
optimizationType | Type of optimization, set inside optimizationInfo. | String | Optional | ["ctranslate2", "vllm"]
quantizationType | Type of quantization, set inside optimizationInfo. | String | Optional | ["no_quantization", "int8_float16"]
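The device and optimization fields are enumerations, so a set-membership check catches unsupported values early. In this sketch, treating an empty string as "not set" is an assumption based on the sample requests, which send "" for both optimization fields; the function name is hypothetical.

```python
# Enum values copied from the Deployment Device & Optimization table.
DEVICE_TYPES = {
    "g4dn.xlarge", "g5.xlarge", "g5.2xlarge", "g6e.xlarge",
    "g4dn.12xlarge", "g5.12xlarge", "g5.48xlarge", "g4dn.metal",
}
OPTIMIZATION_TYPES = {"ctranslate2", "vllm"}
QUANTIZATION_TYPES = {"no_quantization", "int8_float16"}


def validate_device_and_optimization(body: dict) -> list:
    """Return a list of problems with deviceType and optimizationInfo (empty if none)."""
    errors = []
    device = body.get("deviceType")
    if device not in DEVICE_TYPES:
        errors.append(f"deviceType must be one of {sorted(DEVICE_TYPES)}, got {device!r}")
    info = body.get("optimizationInfo")
    if info is None:
        return errors  # optimizationInfo is optional
    if not isinstance(info, dict):
        errors.append("optimizationInfo must be an object")
        return errors
    for field, allowed in (("optimizationType", OPTIMIZATION_TYPES),
                           ("quantizationType", QUANTIZATION_TYPES)):
        value = info.get(field, "")
        # Assumption: "" (as in the samples) means the field is not set.
        if value and value not in allowed:
            errors.append(f"{field} must be one of {sorted(allowed)}, got {value!r}")
    return errors
```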

Sample Response

{
  "dock-statusId": "ds-d0xxxxxd-bxx9-5xx0-8xx5-5bxxxxxxxxx1",
  "modelId": "cm-77xxxxxb-exx9-5xxc-8xx6-5xxxxxxxxxx1",
  "jobType": "MODELS",
  "action": "DEPLOY",
  "status": "IN_PROGRESS"
}
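The value needed for the follow-up status check is the dock status ID. Note that this sample response spells the key "dock-statusId", while the overview and the Response Parameters table call it dockStatusId. The sketch below reads either spelling rather than guessing which one the service actually returns; parse_deploy_response is a hypothetical name.

```python
import json


def parse_deploy_response(raw: str) -> dict:
    """Extract the fields needed for follow-up calls from a deploy response."""
    data = json.loads(raw)
    # Accept both spellings that appear in this documentation.
    dock_status_id = data.get("dockStatusId", data.get("dock-statusId"))
    if not dock_status_id:
        raise KeyError("response contains no dock status ID")
    return {
        "dockStatusId": dock_status_id,
        "modelId": data.get("modelId"),
        "status": data.get("status"),
    }
```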

Response Parameters

PARAMETER | DESCRIPTION | TYPE
dockStatusId | The unique identifier for tracking the model deployment. Pass it to the Get Dock Status API. | String
modelId | The ID of the model being deployed. | String
jobType | The type of job, for example, MODELS. | String
action | The action performed (DEPLOY). | String
status | The deployment status: SUCCESS, IN_PROGRESS, or FAILED. | String
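Because the deploy call returns while the job is still IN_PROGRESS, a caller typically polls until the status becomes SUCCESS or FAILED. This page does not describe the Get Dock Status request format, so the sketch takes a hypothetical fetch_status callable that you would implement around that API; it also assumes that API reports the same status values listed here. The 10-second interval and 30-minute timeout are arbitrary defaults, not platform recommendations.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"SUCCESS", "FAILED"}


def wait_for_deployment(
    dock_status_id: str,
    fetch_status: Callable[[str], str],
    interval_s: float = 10.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the deployment reaches SUCCESS or FAILED, or the timeout elapses.

    fetch_status is a hypothetical wrapper around the Get Dock Status API that
    returns the status string for the given dock status ID.
    """
    waited = 0.0
    while True:
        status = fetch_status(dock_status_id)
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"deployment {dock_status_id} still {status} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Injecting sleep keeps the loop testable without real delays; in production the default time.sleep is used.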