On-premise

The Bioptimus Model Server runs H-Optimus and M-Optimus entirely within your infrastructure, keeping data on your hardware to meet residency, compliance, or air-gapped requirements. It serves the same JSON API as the SageMaker deployment, and the Bioptimus SDK works against both. Two package variants are available; each image contains its model plus the tissue segmentation model:

Package	Models	Key endpoints
H1	H-Optimus (`h1`), tissue-seg	`POST /api/embed/h1`, `POST /api/predict/tissue-seg`
M	M-Optimus (`m-optimus`), tissue-seg	`POST /api/embed/m-optimus`, `POST /api/predict/m-optimus`, `POST /api/predict/tissue-seg`

Prerequisites

Requirement	Detail
Docker	Version 20.10 or later
NVIDIA Container Toolkit	Required for GPU inference (install guide)
GPU	NVIDIA GPU with CUDA Compute Capability 8.6 (e.g. A10G-class, equivalent to SageMaker `ml.g5.xlarge`). A CUDA GPU is required — the container exits if none is detected.
CPU architecture	x86-64. Compute Capability 8.6 on x86-64 runs out of the box; other architectures require recompilation.
GPU access	Direct passthrough (`--gpus all`)
OS	Linux with an NVIDIA GPU
Disk	Enough to hold the image (weights are baked in)
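
A host-side probe along these lines may help check the GPU requirement before loading the image. It is a sketch rather than part of the Bioptimus tooling: it assumes `nvidia-smi` is on the PATH and that the installed driver supports the `compute_cap` query field, and the function names are illustrative.

```python
import subprocess
from typing import List


def parse_compute_caps(output: str) -> List[str]:
    """Turn nvidia-smi CSV output (one capability per line) into a list."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def query_compute_caps() -> List[str]:
    """Ask nvidia-smi for the compute capability of each visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if nvidia-smi fails
    )
    return parse_compute_caps(result.stdout)


def supported_gpus(caps: List[str], required: str = "8.6") -> List[int]:
    """Return the indices of GPUs whose capability equals `required`."""
    return [index for index, cap in enumerate(caps) if cap == required]
```

Calling `supported_gpus(query_compute_caps())` on the host returns the indices of GPUs that match the documented capability; an empty list suggests the container is unlikely to run out of the box.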

The REST API currently requires no authentication — run the server on a private network and restrict access at the network layer. See the API reference.

1. Load the container image

You receive the container as a compressed archive plus a sha256 checksum. Verify integrity, then load it into Docker.

H1 package

sha256sum -c bioptimus-h1-onpremise-v1.0.1.sha256
docker load < bioptimus-h1-onpremise-v1.0.1.tar.gz
docker images bioptimus-h1-onpremise

M package

sha256sum -c bioptimus-m-onpremise-v1.0.1.sha256
docker load < bioptimus-m-onpremise-v1.0.1.tar.gz
docker images bioptimus-m-onpremise
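
Where `sha256sum` is not available on the transfer host, the same integrity check can be sketched in Python. This is an illustrative alternative, not the documented procedure: it assumes the `.sha256` file uses the standard `sha256sum` layout (hex digest, whitespace, file name) and that the archive sits in the same directory as the checksum file.

```python
import hashlib
from pathlib import Path
from typing import Tuple


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_line(line: str) -> Tuple[str, str]:
    """Split one sha256sum-style line into (hex digest, file name)."""
    digest, name = line.strip().split(maxsplit=1)
    # A leading '*' marks binary mode in sha256sum output; drop it.
    return digest.lower(), name.lstrip("*")


def verify(checksum_file: Path) -> bool:
    """Check the archive named in checksum_file against its recorded digest."""
    first_line = checksum_file.read_text().splitlines()[0]
    digest, name = parse_checksum_line(first_line)
    return sha256_of(checksum_file.parent / name) == digest
```

For example, `verify(Path("bioptimus-h1-onpremise-v1.0.1.sha256"))` returns `True` only when the archive's digest matches the recorded one.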

2. Start the container

The server is self-contained — all weights and assets are baked into the image, so no volume mounts are required.

H1 package

docker run -d --name bioptimus-server --gpus all -p 8080:8080 \
  bioptimus-h1-onpremise:v1.0.1 serve

M package

docker run -d --name bioptimus-server --gpus all -p 8080:8080 \
  bioptimus-m-onpremise:v1.0.1 serve

Models load at startup. Follow the logs until the server is ready:

docker logs -f bioptimus-server 2>&1 | grep -m1 "models ready"

3. Air-gapped install

For environments with no outbound network, transfer the .tar.gz archive via your approved process, then run sha256sum -c, docker load, and docker run exactly as above. No registry access is needed — the image is self-contained.

4. Verify the deployment

Health check

H1 package

curl -s http://localhost:8080/ping | python3 -m json.tool
# {"status": "ok", "models": ["h1", "tissue-seg"]}

M package

curl -s http://localhost:8080/ping | python3 -m json.tool
# {"status": "ok", "models": ["m-optimus", "tissue-seg"]}

A 503 {"status": "loading"} means models are still initialising — retry. See the API reference for all health states.
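
The retry advice above can be automated with a small polling loop. The sketch below uses only the Python standard library and the `/ping` states mentioned on this page (`ok`, `loading`, `unhealthy`); the function names, retry count, and delay are illustrative choices, and it assumes every `/ping` reply carries a JSON body.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_ping(base_url: str, timeout: float = 5.0) -> dict:
    """GET /ping and return the decoded JSON body."""
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Non-2xx replies (such as 503 while loading) still carry a body.
        return json.load(err)


def wait_until_ready(base_url, fetch=fetch_ping, attempts=60, delay=5.0,
                     sleep=time.sleep):
    """Poll /ping until status is "ok"; return the list of deployed models."""
    for _ in range(attempts):
        try:
            body = fetch(base_url)
        except OSError:
            # Connection refused or reset: the container may still be starting.
            body = {}
        status = body.get("status")
        if status == "ok":
            return body.get("models", [])
        if status == "unhealthy":
            raise RuntimeError("server reports unhealthy; check the GPU and restart")
        sleep(delay)
    raise TimeoutError(f"server not ready after {attempts} attempts")
```

Calling `wait_until_ready("http://localhost:8080")` blocks until the server reports `ok` and returns the deployed model list, which can then be compared with the models expected for the package.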

Interactive API docs

Open http://localhost:8080/docs for a Swagger UI to explore endpoints and try calls. Service discovery is at http://localhost:8080/bioptimus/.

Test with the Bioptimus SDK

H1 package

from bioptimus.models.backbones import Backbone
from bioptimus.models.types import Models

# Lists the models the SDK knows about; /ping shows what is actually deployed.
print(Backbone.available_backbones())   # ['h1', 'm-optimus', 'tissue-seg']
model = Backbone(Models.H1, base_url="http://localhost:8080")

M package

from bioptimus.models.backbones import Backbone
from bioptimus.models.types import Models

# Lists the models the SDK knows about; /ping shows what is actually deployed.
print(Backbone.available_backbones())   # ['h1', 'm-optimus', 'tissue-seg']
model = Backbone(Models.M_OPTIMUS, base_url="http://localhost:8080")

For whole-slide inference, see the Bioptimus SDK.

Managing the container

docker stop bioptimus-server      # stop (preserves state)
docker start bioptimus-server     # restart
docker logs bioptimus-server      # view logs
docker rm -f bioptimus-server     # remove

Environment variables

Set these with -e when starting the container:

Variable	Default	Description
`PORT`	`8080`	Server port
`LOG_LEVEL`	`INFO`	Log level (`DEBUG`, `INFO`, `WARNING`, `ERROR`)
`SM_MODEL_DIR`	`/opt/ml/model`	Model artefact directory (used by SageMaker)
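
As an illustration of passing these variables, the helper below assembles a `docker run` argument list from a dictionary. It is a sketch with a hypothetical function name, and it assumes that `PORT` sets the port the server listens on inside the container, so the `-p` mapping has to target the same value.

```python
from typing import Dict, List, Optional


def docker_run_command(image: str,
                       env: Optional[Dict[str, str]] = None,
                       host_port: int = 8080,
                       name: str = "bioptimus-server") -> List[str]:
    """Build a `docker run` argument list with one -e flag per variable."""
    env = dict(env or {})
    # Assumption: PORT changes the in-container port, so map to it.
    container_port = env.get("PORT", "8080")
    command = [
        "docker", "run", "-d",
        "--name", name,
        "--gpus", "all",
        "-p", f"{host_port}:{container_port}",
    ]
    for key, value in env.items():
        command += ["-e", f"{key}={value}"]
    return command + [image, "serve"]
```

The resulting list can be passed to `subprocess.run(..., check=True)`; for example, `docker_run_command("bioptimus-h1-onpremise:v1.0.1", {"LOG_LEVEL": "DEBUG"})` reproduces the start command from step 2 with debug logging enabled.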

Troubleshooting

curl: (7) Failed to connect

Container not running or still starting. Check docker ps and docker logs bioptimus-server.

/ping returns {"status": "loading"}

Models are still loading. Wait for the models ready log line.

/ping returns {"status": "unhealthy"}

The GPU is no longer available, or the CUDA context was corrupted. Check nvidia-smi on the host and restart the container.

Container exits immediately

Common causes: no CUDA GPU detected (a GPU is required), port conflict, or insufficient memory. Run docker logs bioptimus-server for the error.

Out of GPU memory

The server auto-batches concurrent requests (max batch size 32). Reduce the number of concurrent SDK requests to lower peak memory use.
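
One way to cap client-side concurrency is a bounded thread pool. The sketch below is generic: `embed_one` stands in for whatever per-item request function you use (it is hypothetical, not an SDK call), and the default limit of 4 is an arbitrary starting point to tune against your GPU memory.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def embed_all(items: Iterable[T],
              embed_one: Callable[[T], R],
              max_concurrency: int = 4) -> List[R]:
    """Apply embed_one to each item with at most max_concurrency calls in flight.

    Results are returned in the same order as the input items.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(embed_one, items))
```

Lowering `max_concurrency` reduces how many requests reach the server at once, and so how many it can group into one batch.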
