Deploy a Model API

This API deploys an open-source or fine-tuned model that is in the Ready to Deploy state. Users can configure deployment parameters, including hyperparameters, scaling, and optimization settings, to tune how the model scales and performs. The API response includes the model ID and the model deployment status. After receiving the response, pass the dockStatusId to the Get Dock Status API to verify that the model deployed successfully.

Method	POST
Endpoint	`https://{host}/api/public/models/{modelId}/deploy?modelType={modelType}`
Content Type	application/json
Authorization	`X-api-key` - The API key used for authentication.

To use the API, create an API key.

Query Parameters

PARAMETER	DESCRIPTION	TYPE	REQUIRED/OPTIONAL	ENUM VALUES
host	The environment URL. For example, `https://agent-platform.domain.ai/`.	String	Required	N/A
modelId	The model ID to deploy.	String	Required	N/A
modelType	Type of model being deployed.	String	Required	["openSource", "fineTune"]
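
As an illustration of how the three parameters above combine into the endpoint, here is a minimal Python sketch. The function name `build_deploy_url` is hypothetical, and the handling of a leading `https://` or a trailing slash in `host` is an assumption made so that the example host from the table also works.

```python
from urllib.parse import quote, urlencode

# Allowed modelType values, taken from the Query Parameters table.
MODEL_TYPES = {"openSource", "fineTune"}


def build_deploy_url(host: str, model_id: str, model_type: str) -> str:
    """Return the deploy endpoint URL for one model (hypothetical helper)."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"modelType must be one of {sorted(MODEL_TYPES)}")
    # Assumption: accept the host with or without a scheme and trailing slash.
    host = host.removeprefix("https://").rstrip("/")
    path = f"/api/public/models/{quote(model_id, safe='')}/deploy"
    return f"https://{host}{path}?{urlencode({'modelType': model_type})}"


url = build_deploy_url(
    "https://agent-platform.domain.ai/", "cm-2xxxxxxxxxxxxxxxxxx0", "openSource"
)
print(url)
# https://agent-platform.domain.ai/api/public/models/cm-2xxxxxxxxxxxxxxxxxx0/deploy?modelType=openSource
```

Rejecting an unknown `modelType` locally avoids a round trip for a request the table says cannot succeed.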

Sample Request

For an Open-Source Model

curl --location 'https://{host}/api/public/models/cm-2xxxxxxxxxxxxxxxxxx0/deploy?modelType=openSource' \
--header 'x-api-key: kg-axxxxxxx-5xx3-5xx8-bxxb-9xxxxxxxxxx-ebxxxxxx-5xxb-4xxb-9xx5-cxxxxxxxxx3' \
--header 'Content-Type: application/json' \
--data '{
    "name": "Flant5_model",
    "hyperParameters": {
      "temperature": 1,
      "maxTokens": 512,
      "topP": 1,
      "topK": 50,
      "stopSequence": []
    },
    "scalingParameters": {
      "maxBatchSize": 10,
      "minReplicas": 1,
      "maxReplicas": 2,
      "scaleUpDelay": 30,
      "scaleDownDelay": 600
    },
    "deviceType": "g5.xlarge",
    "optimizationInfo": {
      "optimizationType": "",
      "quantizationType": ""
    },
    "isDeployedPreviously": true
  }'

For a Fine-Tuned Model

curl --location 'https://{host}/api/public/models/cm-6xxxxxxxxxxxxxxxxxx9/deploy?modelType=fineTune' \
--header 'x-api-key: kg-2xxxxxxxxxxxxxxxxxxf-7xxxxxxx-7xx8-4xxf-8xx7-dxxxxxxxxxx3' \
--header 'Content-Type: application/json' \
--data '{
    "name": "gpt2",
    "hyperParameters": {
      "temperature": 1,
      "maxTokens": 512,
      "topP": 1,
      "topK": 50,
      "stopSequence": []
    },
    "scalingParameters": {
      "maxBatchSize": 10,
      "minReplicas": 1,
      "maxReplicas": 2,
      "scaleUpDelay": 30,
      "scaleDownDelay": 600
    },
    "deviceType": "g5.xlarge",
    "optimizationInfo": {
      "optimizationType": "",
      "quantizationType": ""
    },
    "isDeployedPreviously": true
  }'
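
For callers who prefer a script over curl, the same request can be sketched in Python with only the standard library. This is an illustration, not an official client: `build_deploy_request` is a hypothetical helper, the host is the example value from the Query Parameters table, and `YOUR_API_KEY` is a placeholder. The request is prepared but not sent.

```python
import json
import urllib.request


def build_deploy_request(url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request shown in the curl samples."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Body copied from the open-source sample request.
body = {
    "name": "Flant5_model",
    "hyperParameters": {
        "temperature": 1,
        "maxTokens": 512,
        "topP": 1,
        "topK": 50,
        "stopSequence": [],
    },
    "scalingParameters": {
        "maxBatchSize": 10,
        "minReplicas": 1,
        "maxReplicas": 2,
        "scaleUpDelay": 30,
        "scaleDownDelay": 600,
    },
    "deviceType": "g5.xlarge",
    "optimizationInfo": {"optimizationType": "", "quantizationType": ""},
    "isDeployedPreviously": True,
}

request = build_deploy_request(
    "https://agent-platform.domain.ai/api/public/models/"
    "cm-2xxxxxxxxxxxxxxxxxx0/deploy?modelType=openSource",
    "YOUR_API_KEY",  # placeholder, replace with a real key
    body,
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```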

Body Parameters

The following deployment parameters can be configured and passed in the request body.

General Parameters

PARAMETER	DESCRIPTION	TYPE	REQUIRED/OPTIONAL	ENUM VALUES
name	Name of the model to deploy.	String	Required	N/A
isDeployedPreviously	Indicates if the model was deployed before.	Boolean	Optional	[true, false]

Hyperparameters

PARAMETER	DESCRIPTION	TYPE	REQUIRED/OPTIONAL	RANGE
temperature	Controls randomness of output.	Float	Required	0-2
maxTokens	Maximum tokens allowed.	Int	Required	0-512
topP	Controls nucleus sampling.	Float	Required	0-1
topK	Controls top-K sampling.	Int	Required	1-100
stopSequence	Stop sequences for the model.	Array	Optional	N/A

Scaling Parameters

PARAMETER	DESCRIPTION	TYPE	REQUIRED/OPTIONAL	RANGE
maxBatchSize	Maximum batch size.	Int	Optional	1-256
minReplicas	Minimum replicas.	Int	Optional	1-10
maxReplicas	Maximum replicas.	Int	Optional	1-50
scaleUpDelay	Delay before scaling up (ms).	Int	Optional	1-1000
scaleDownDelay	Delay before scaling down (ms).	Int	Optional	50-2000

Deployment Device & Optimization

PARAMETER	DESCRIPTION	TYPE	REQUIRED/OPTIONAL	ENUM VALUES
deviceType	Device type for deployment.	String	Required	["g4dn.xlarge", "g5.xlarge", "g5.2xlarge", "g6e.xlarge", "g4dn.12xlarge", "g5.12xlarge", "g5.48xlarge", "g4dn.metal"]
optimizationInfo	Optimization details. Contains optimizationType and quantizationType.	Object	Optional	N/A
optimizationType	Type of optimization (field of optimizationInfo).	String	Optional	["ctranslate2", "vllm"]
quantizationType	Type of quantization (field of optimizationInfo).	String	Optional	["no_quantization", "int8_float16"]
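
The ranges and enum values in the tables above lend themselves to a client-side check before the request is sent. The sketch below is one possible way to do that. `validate_body` is a hypothetical helper, the bounds are assumed to be inclusive (the tables do not say), and the server remains the authority on what it accepts.

```python
# Ranges copied from the Hyperparameters and Scaling Parameters tables.
# Assumption: both ends of each range are inclusive.
RANGES = {
    ("hyperParameters", "temperature"): (0, 2),
    ("hyperParameters", "maxTokens"): (0, 512),
    ("hyperParameters", "topP"): (0, 1),
    ("hyperParameters", "topK"): (1, 100),
    ("scalingParameters", "maxBatchSize"): (1, 256),
    ("scalingParameters", "minReplicas"): (1, 10),
    ("scalingParameters", "maxReplicas"): (1, 50),
    ("scalingParameters", "scaleUpDelay"): (1, 1000),
    ("scalingParameters", "scaleDownDelay"): (50, 2000),
}
DEVICE_TYPES = {
    "g4dn.xlarge", "g5.xlarge", "g5.2xlarge", "g6e.xlarge",
    "g4dn.12xlarge", "g5.12xlarge", "g5.48xlarge", "g4dn.metal",
}


def validate_body(body: dict) -> list[str]:
    """Return a list of problems found in a deploy request body."""
    errors = []
    if not body.get("name"):
        errors.append("name is required")
    if body.get("deviceType") not in DEVICE_TYPES:
        errors.append("deviceType is not a listed device type")
    for (section, key), (low, high) in RANGES.items():
        value = body.get(section, {}).get(key)
        if value is None:
            # Only the hyperparameters with ranges are marked Required.
            if section == "hyperParameters":
                errors.append(f"{section}.{key} is required")
            continue
        if not low <= value <= high:
            errors.append(f"{section}.{key}={value} is outside {low}-{high}")
    return errors


sample_body = {
    "name": "Flant5_model",
    "hyperParameters": {"temperature": 1, "maxTokens": 512, "topP": 1, "topK": 50},
    "scalingParameters": {"minReplicas": 1, "maxReplicas": 2, "scaleDownDelay": 600},
    "deviceType": "g5.xlarge",
}
print(validate_body(sample_body))  # []
```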

Sample Response

{
  "dock-statusId": "ds-d0xxxxxd-bxx9-5xx0-8xx5-5bxxxxxxxxx1",
  "modelId": "cm-77xxxxxb-exx9-5xxc-8xx6-5xxxxxxxxxx1",
  "jobType": "MODELS",
  "action": "DEPLOY",
  "status": "IN_PROGRESS"
}

Response Parameters

PARAMETER	DESCRIPTION	TYPE
dockStatusId	The unique identifier for tracking the model deployment.	String
modelId	The ID of the model being deployed.	String
jobType	Specifies the type of job (for example, `MODELS`).	String
action	Indicates the performed action (`DEPLOY`).	String
status	Deployment status (`SUCCESS`, `IN_PROGRESS`, or `FAILED`).	String
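
A caller typically needs two things from this response: the ID to pass to the Get Dock Status API and whether the deployment has already finished. The following Python sketch shows one way to extract them. `parse_deploy_response` is a hypothetical helper. Because the sample response spells the key `dock-statusId` while the table lists `dockStatusId`, the sketch accepts either spelling, and treating `SUCCESS` and `FAILED` as final states is an assumption based on the status values listed above.

```python
import json

# Assumption: these two statuses are final, IN_PROGRESS is not.
TERMINAL_STATUSES = {"SUCCESS", "FAILED"}


def parse_deploy_response(raw: str) -> tuple[str, str, bool]:
    """Return (dock status ID, status, whether the status is final)."""
    data = json.loads(raw)
    # Accept both spellings of the key that appear in this document.
    dock_status_id = data.get("dockStatusId") or data.get("dock-statusId")
    if dock_status_id is None:
        raise KeyError("response contains no dock status ID")
    status = data["status"]
    return dock_status_id, status, status in TERMINAL_STATUSES


raw = """{
  "dock-statusId": "ds-d0xxxxxd-bxx9-5xx0-8xx5-5bxxxxxxxxx1",
  "modelId": "cm-77xxxxxb-exx9-5xxc-8xx6-5xxxxxxxxxx1",
  "jobType": "MODELS",
  "action": "DEPLOY",
  "status": "IN_PROGRESS"
}"""
dock_status_id, status, finished = parse_deploy_response(raw)
print(dock_status_id, status, finished)
# ds-d0xxxxxd-bxx9-5xx0-8xx5-5bxxxxxxxxx1 IN_PROGRESS False
```

When `finished` is false, the dock status ID is what the Get Dock Status API needs to report the final outcome.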
