Service
Health check
Returns 200 with the list of loaded models when the server is ready. Returns 503 with a status reason while models are still loading or if the GPU/CUDA context is unavailable.
GET
Response
Healthy — all expected models loaded and GPU available
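As a rough illustration of how a client might act on the two status codes described above, here is a minimal Python sketch. The endpoint URL is not given in this section, so the caller must supply it. The JSON field names `models` and `reason` are assumptions made for the example, not documented names, and so are the helper names `interpret_health` and `check_health`.

```python
import json
import urllib.error
import urllib.request


def interpret_health(status_code, body):
    """Map a health-check response to a (ready, detail) pair.

    The field names "models" and "reason" are hypothetical; adjust them
    to match the actual response schema of the service.
    """
    if not isinstance(body, dict):
        body = {}
    if status_code == 200:
        # Ready: detail is the list of loaded models (assumed field name).
        return True, body.get("models", [])
    if status_code == 503:
        # Not ready: detail is the status reason (assumed field name).
        return False, body.get("reason", "unknown")
    raise ValueError(f"unexpected health status: {status_code}")


def check_health(url, timeout=5.0):
    """Call the health endpoint at `url` and interpret the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, raw = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the 503 body is still readable.
        status, raw = err.code, err.read()
    body = json.loads(raw) if raw else {}
    return interpret_health(status, body)
```

A caller could poll `check_health` with the service's real health URL until the first element of the result is true, logging the returned reason on each 503 in the meantime.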

