The Bioptimus Model Server runs H-Optimus and M-Optimus entirely within your infrastructure, keeping data on your hardware for residency, compliance, or air-gapped requirements. It serves the same JSON API as the SageMaker deployment, and the bioptimus SDK works against both.

Two package variants are available. Each image contains the relevant model plus the tissue segmentation model:
| Package | Models | Key endpoints |
| --- | --- | --- |
| H1 | H-Optimus (h1), tissue-seg | POST /api/embed/h1, POST /api/predict/tissue-seg |
| M | M-Optimus (m-optimus), tissue-seg | POST /api/embed/m-optimus, POST /api/predict/m-optimus, POST /api/predict/tissue-seg |
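The package-to-endpoint mapping above can be kept as a small lookup when scripting against either variant. This is a minimal sketch: the paths are copied from the table, while `PACKAGE_ENDPOINTS` and `endpoint_urls` are hypothetical helper names, not part of the bioptimus SDK. Request bodies are not shown because they are defined in the API reference, not here.

```python
# Endpoint paths copied from the package table, keyed by package variant.
# PACKAGE_ENDPOINTS and endpoint_urls are illustrative names, not SDK API.
PACKAGE_ENDPOINTS = {
    "H1": ["/api/embed/h1", "/api/predict/tissue-seg"],
    "M": [
        "/api/embed/m-optimus",
        "/api/predict/m-optimus",
        "/api/predict/tissue-seg",
    ],
}


def endpoint_urls(package, base_url="http://localhost:8080"):
    """Return the full POST URLs served by the given package variant."""
    paths = PACKAGE_ENDPOINTS[package.upper()]
    # Strip a trailing slash so the joined URL never contains '//'.
    return [base_url.rstrip("/") + path for path in paths]
```

Calling `endpoint_urls("M", "http://gpu-host:8080")` (a hypothetical host name) would list the three URLs exposed by the M package.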

Prerequisites

| Requirement | Detail |
| --- | --- |
| Docker | Version 20.10 or later |
| NVIDIA Container Toolkit | Required for GPU inference (install guide) |
| GPU | NVIDIA GPU with CUDA Compute Capability 8.6 (e.g. A10G-class, equivalent to SageMaker ml.g5.xlarge). A CUDA GPU is required; the container exits if none is detected. |
| CPU architecture | x86-64. CC 8.6 on x86-64 runs out of the box; other architectures require recompilation. |
| GPU access | Direct passthrough (--gpus all) |
| OS | Linux with an NVIDIA GPU |
| Disk | Enough to hold the image (weights are baked in) |
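Before loading the image, it may help to confirm that the host GPU matches the Compute Capability listed above. The following is a sketch only: it assumes a driver recent enough for `nvidia-smi` to support the `compute_cap` query field, and the function names are hypothetical.

```python
import subprocess

REQUIRED_CC = (8, 6)  # Compute Capability named in the prerequisites


def parse_compute_caps(smi_output):
    """Parse lines such as '8.6' into (major, minor) tuples, one per GPU."""
    caps = []
    for line in smi_output.splitlines():
        line = line.strip()
        if not line:
            continue
        major, minor = line.split(".")
        caps.append((int(major), int(minor)))
    return caps


def query_compute_caps():
    """Ask nvidia-smi for each GPU's compute capability.

    Assumes the installed driver supports the 'compute_cap' query field.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_caps(result.stdout)


def has_supported_gpu(caps):
    """True if at least one GPU reports exactly the required capability."""
    return REQUIRED_CC in caps
```

On the host, `has_supported_gpu(query_compute_caps())` returning `False` suggests the image would not run out of the box on that machine.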
The REST API currently requires no authentication — run the server on a private network and restrict access at the network layer. See the API reference.

1. Load the container image

You receive the container as a compressed archive plus a sha256 checksum. Verify integrity, then load it into Docker.
sha256sum -c bioptimus-h1-onpremise-v1.0.1.sha256
docker load < bioptimus-h1-onpremise-v1.0.1.tar.gz
docker images bioptimus-h1-onpremise
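Where `sha256sum` is not available, the same integrity check can be approximated in Python. This sketch assumes the `.sha256` file uses the usual `sha256sum` layout (`<hex digest>  <file name>` per line) and, unlike `sha256sum -c`, resolves file names relative to the checksum file's directory; both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum_file(checksum_path):
    """Return True only if every listed file matches its recorded digest.

    Assumes '<hex>  <name>' lines, as written by sha256sum.
    """
    checksum_path = Path(checksum_path)
    all_ok = True
    for line in checksum_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        # A leading '*' marks binary mode in sha256sum output; drop it.
        name = name.strip().lstrip("*")
        if sha256_of(checksum_path.parent / name) != expected.lower():
            all_ok = False
    return all_ok
```

A `False` result means the archive should not be loaded into Docker.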

2. Start the container

The server is self-contained — all weights and assets are baked into the image, so no volume mounts are required.
docker run -d --name bioptimus-server --gpus all -p 8080:8080 \
  bioptimus-h1-onpremise:v1.0.1 serve
Models load at startup. Follow the logs until the server is ready:
docker logs -f bioptimus-server 2>&1 | grep -m1 "models ready"

3. Air-gapped install

For environments with no outbound network, transfer the .tar.gz archive via your approved process, then run sha256sum -c, docker load, and docker run exactly as above. No registry access is needed — the image is self-contained.

4. Verify the deployment

Step 1: Health check

curl -s http://localhost:8080/ping | python3 -m json.tool
# {"status": "ok", "models": ["h1", "tissue-seg"]}
A 503 {"status": "loading"} means models are still initialising — retry. See the API reference for all health states.
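The retry advice above can be automated with a small polling loop. This is a sketch under stated assumptions: it relies only on the `/ping` responses shown here (200 with `"status": "ok"`, 503 while loading), treats connection errors as "not up yet", and uses hypothetical names (`fetch_ping`, `wait_until_ready`) and arbitrary retry defaults.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_ping(base_url, timeout=5.0):
    """Return (http_status, parsed_json_or_None) for GET /ping."""
    try:
        with urllib.request.urlopen(base_url + "/ping", timeout=timeout) as resp:
            status, raw = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # Non-2xx responses (such as 503 while loading) arrive as HTTPError.
        status, raw = err.code, err.read()
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the server is not reachable yet.
        return None, None
    try:
        return status, json.loads(raw.decode("utf-8"))
    except ValueError:
        return status, None


def wait_until_ready(base_url="http://localhost:8080", attempts=60, delay=5.0,
                     fetch=fetch_ping, sleep=time.sleep):
    """Poll /ping until it reports status 'ok'; return the listed models."""
    for attempt in range(attempts):
        status, body = fetch(base_url)
        if status == 200 and isinstance(body, dict) and body.get("status") == "ok":
            return body.get("models", [])
        if attempt < attempts - 1:
            sleep(delay)
    raise TimeoutError("server not ready after %d attempts" % attempts)
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without a running server; in normal use the defaults are enough.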
Step 2: Interactive API docs

Open http://localhost:8080/docs for a Swagger UI to explore endpoints and try calls. Service discovery is at http://localhost:8080/bioptimus/.
Step 3: Test with the SDK

from bioptimus.models.backbones import Backbone, Models

print(Backbone.available_backbones())   # H1: ['h1', 'tissue-seg']
model = Backbone(Models.H1, base_url="http://localhost:8080")
For whole-slide inference, see the SDK guide.

Managing the container

docker stop bioptimus-server      # stop (preserves state)
docker start bioptimus-server     # restart
docker logs bioptimus-server      # view logs
docker rm -f bioptimus-server     # remove

Environment variables

Set with -e at start:
| Variable | Default | Description |
| --- | --- | --- |
| PORT | 8080 | Server port |
| LOG_LEVEL | INFO | Log level (DEBUG, INFO, WARNING, ERROR) |
| SM_MODEL_DIR | /opt/ml/model | Model artefact directory (used by SageMaker) |
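As an illustration of passing these variables with `-e`, the sketch below assembles a `docker run` argument list. It assumes that `PORT` controls the port the server listens on inside the container, so the container side of `-p` is set to the same value; `build_run_command` is a hypothetical helper, and the remaining flags mirror the start command shown earlier.

```python
import shlex

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def build_run_command(image, host_port=8080, container_port=8080,
                      log_level="INFO", name="bioptimus-server"):
    """Build the docker run argv with PORT and LOG_LEVEL set via -e."""
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError("unsupported log level: %r" % log_level)
    return [
        "docker", "run", "-d", "--name", name, "--gpus", "all",
        # Assumption: the server binds to PORT inside the container,
        # so the published container port must match it.
        "-p", "%d:%d" % (host_port, container_port),
        "-e", "PORT=%d" % container_port,
        "-e", "LOG_LEVEL=%s" % log_level,
        image, "serve",
    ]


if __name__ == "__main__":
    cmd = build_run_command("bioptimus-h1-onpremise:v1.0.1",
                            host_port=9000, container_port=9000,
                            log_level="DEBUG")
    # Print a copy-pasteable shell command rather than executing it.
    print(shlex.join(cmd))
```

Returning a list rather than a string keeps the command safe to pass to `subprocess.run` without shell quoting concerns.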

Troubleshooting

Container not running or still starting. Check docker ps and docker logs bioptimus-server.
Models are still loading (/ping returns 503 {"status": "loading"}). Wait for the "models ready" log line.
The GPU is no longer available, or the CUDA context was corrupted. Check nvidia-smi on the host and restart the container.
If the container fails to start, common causes are: no CUDA GPU detected (a GPU is required), a port conflict, or insufficient memory. Run docker logs bioptimus-server for the error.
The server auto-batches concurrent requests (max batch size 32). Reduce the number of concurrent SDK requests to lower peak memory use.
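One way to cap the number of concurrent requests on the client side is a fixed-size thread pool. This is a sketch, not SDK functionality: `run_bounded` is a hypothetical helper, `fn` stands in for whatever call sends a single request, and the default limit of 4 is an arbitrary starting point to tune against observed memory use.

```python
from concurrent.futures import ThreadPoolExecutor


def run_bounded(fn, items, max_in_flight=4):
    """Apply fn to each item with at most max_in_flight calls running at once.

    Results come back in the same order as the input items.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # pool.map preserves input order and re-raises the first exception.
        return list(pool.map(fn, items))
```

Lowering `max_in_flight` reduces how many requests the server can batch together at any moment, trading throughput for a lower memory peak.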