AWS SageMaker client. Sends JSON payloads to a SageMaker endpoint via the sagemaker-runtime invoke_endpoint API. The model_name field is injected into every request body for multi-model dispatch.

AWSClient

class AWSClient()
Sends JSON payloads to an AWS SageMaker endpoint.
endpoint_name
Name of the SageMaker endpoint.
model_name
Model identifier injected into the request body for SageMaker multi-model dispatch.
region_name
AWS region. When None, uses the default from the environment or boto_session.
boto_session
Optional pre-configured Session (e.g. with explicit credentials or a custom profile). When None, a default session is created using the standard credential chain.
timeout
Read timeout in seconds.
Example:
client = AWSClient(
    endpoint_name="h0-mini-prod",
    model_name="h0-mini",
)
# request_json is a serialized JSON request payload (a str) built by the caller.
resp_json = client.predict(request_json)
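The model_name injection described above can be pictured with a small standalone sketch. It is not the library's implementation: the helper name inject_model_name, the "model_name" key placement, and the "inputs" field are assumptions made for illustration only.

```python
import json


def inject_model_name(body: str, model_name: str) -> str:
    """Return the JSON object in `body` with a `model_name` field added.

    Hypothetical helper: the real client may inject the field differently.
    """
    payload = json.loads(body)
    if not isinstance(payload, dict):
        # Only a JSON object has fields, so anything else cannot carry model_name.
        raise ValueError("request body must be a JSON object")
    payload["model_name"] = model_name
    return json.dumps(payload)


# "inputs" is a made-up field; the actual request schema is defined by the model server.
request_json = json.dumps({"inputs": [1.0, 2.0]})
print(inject_model_name(request_json, "h0-mini"))
```

Under this reading, callers would never set model_name themselves; the client would own that field so that one endpoint can route to several models.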

predict

def predict(body: str) -> str
Invoke the SageMaker endpoint for prediction.
body
str
required
Serialized JSON request payload.
returns
str
Response body as a JSON string.
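For orientation, a call like predict plausibly reduces to a single sagemaker-runtime invoke_endpoint request. The sketch below uses the real boto3 parameter names (EndpointName, ContentType, Accept, Body) and reads the streaming response body, but the function invoke_json and the FakeRuntime stand-in are hypothetical, and the content type the actual client sends is an assumption.

```python
import io
import json
from typing import Any


def invoke_json(runtime: Any, endpoint_name: str, body: str) -> str:
    """Send a JSON string to an endpoint and return the response as a string.

    `runtime` is expected to behave like boto3.client("sagemaker-runtime").
    """
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumed; not confirmed by the docs above
        Accept="application/json",
        Body=body.encode("utf-8"),
    )
    # invoke_endpoint returns the payload as a stream under the "Body" key.
    return response["Body"].read().decode("utf-8")


class FakeRuntime:
    """Offline stand-in for the boto3 client so the sketch runs without AWS."""

    def invoke_endpoint(self, **kwargs: Any) -> dict:
        echoed = {
            "endpoint": kwargs["EndpointName"],
            "request": json.loads(kwargs["Body"]),
        }
        return {"Body": io.BytesIO(json.dumps(echoed).encode("utf-8"))}


print(invoke_json(FakeRuntime(), "h0-mini-prod", json.dumps({"inputs": [1, 2]})))
```

Passing the runtime client in as an argument keeps the sketch testable offline; in real use it would be replaced by a client created from a boto3 session.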

embed

def embed(body: str) -> str
Invoke the SageMaker endpoint for embedding.
body
str
required
Serialized JSON request payload.
returns
str
Response body as a JSON string.

metadata

def metadata() -> str
Fetch model metadata via /invocations. Sends a minimal request with mode set to "metadata" so the server returns model metadata instead of running inference.
returns
str
Response body as a JSON string.
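As a rough illustration of the minimal metadata request, the body might look like the output of the sketch below. The "mode" value comes from the description above; the helper name build_metadata_request and the presence of "model_name" alongside it are assumptions based on the statement that model_name is injected into every request body.

```python
import json


def build_metadata_request(model_name: str) -> str:
    """Build a minimal body asking the server for metadata instead of inference.

    Hypothetical helper: the exact set of fields the client sends is not documented here.
    """
    return json.dumps({"mode": "metadata", "model_name": model_name})


print(build_metadata_request("h0-mini"))
```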

predict_async

async def predict_async(body: str, session: Any = None) -> str
Predict asynchronously via run_in_executor. boto3 is not natively async, so the synchronous predict call is delegated to a thread pool.
body
str
required
Serialized JSON request payload.
session
Any
Unused. Accepted for interface compatibility with the Client protocol.
returns
str
Response body as a JSON string.
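The run_in_executor pattern described above can be sketched in isolation. Here blocking_predict is a hypothetical stand-in for the synchronous predict call (it just sleeps and upper-cases its input), and passing None as the executor selects the event loop's default thread pool; whether the real client uses the default pool or its own is not stated in this reference.

```python
import asyncio
import time
from typing import List


def blocking_predict(body: str) -> str:
    """Stand-in for a synchronous, blocking network call."""
    time.sleep(0.05)
    return body.upper()


async def predict_async(body: str) -> str:
    """Run the blocking call in a worker thread so the event loop stays free."""
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, blocking_predict, body)


async def main() -> List[str]:
    # The three calls overlap in worker threads; gather preserves input order.
    return list(await asyncio.gather(*(predict_async(b) for b in ["a", "b", "c"])))


results = asyncio.run(main())
print(results)
```

Because each call occupies a thread for its full duration, concurrency in this pattern is bounded by the size of the thread pool rather than by the event loop.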

embed_async

async def embed_async(body: str, session: Any = None) -> str
Embed asynchronously via run_in_executor. As with predict_async, the synchronous embed call is delegated to a thread pool.
body
str
required
Serialized JSON request payload.
session
Any
Unused. Accepted for interface compatibility with the Client protocol.
returns
str
Response body as a JSON string.