Introduction
This note walks through how to use the SageMaker deployment code that Hugging Face provides for its models.
Steps
On https://huggingface.co/openai/whisper-small , select Deploy > Amazon SageMaker to obtain the following code:
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'openai/whisper-small',
    'HF_TASK': 'automatic-speech-recognition'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.37.0',
    pytorch_version='2.1.0',
    py_version='py310',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,  # number of instances
    instance_type='ml.m5.xlarge'  # ec2 instance type
)
from sagemaker.serializers import DataSerializer
predictor.serializer = DataSerializer(content_type='audio/x-audio')

### Replace this part
# Make sure the input file "sample1.flac" exists
with open("sample1.flac", "rb") as f:
    data = f.read()
predictor.predict(data)
###
Upload an audio file in FLAC format to S3.
Read the audio file from S3 and confirm that a transcript is returned.
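The upload itself can be done with boto3's `upload_file`. A minimal sketch, wrapped in a small helper so the S3 client is passed in explicitly; the file, bucket, and key names below are just examples:

```python
def upload_audio(s3_client, local_path, bucket, key):
    """Upload a local audio file to S3 and return its S3 URI."""
    s3_client.upload_file(local_path, bucket, key)
    return f's3://{bucket}/{key}'

# Usage (requires AWS credentials; names are examples):
# import boto3
# upload_audio(boto3.client('s3'), '1272-128104-0000.flac',
#              'sagemaker-studio-f5d00bf0', '1272-128104-0000.flac')
```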
import boto3

s3 = boto3.client('s3')
bucket_name = 'sagemaker-studio-f5d00bf0'
object_key = '1272-128104-0000.flac'

response = s3.get_object(Bucket=bucket_name, Key=object_key)
file_content = response['Body'].read()

# Pass the bytes read from S3 (not the earlier local sample) to the endpoint
predictor.predict(file_content)
{'text': ' Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel.'}
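The endpoint keeps an ml.m5.xlarge instance running (and billing) until it is deleted. A small teardown sketch, assuming `predictor` is the object returned by `deploy()` above; `delete_model()` and `delete_endpoint()` are the standard SageMaker Predictor cleanup calls:

```python
def cleanup(predictor):
    """Delete the SageMaker model and endpoint so no further charges accrue."""
    predictor.delete_model()
    predictor.delete_endpoint()

# Usage: cleanup(predictor)
```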