AI-Powered Image Generation API Service with FLUX, Python, and Diffusers: A Quick Guide

by HeraHaven AI, 11 min read, 2024/11/29

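TL;DR

This article walks you through creating your own FLUX server with Python. The server lets you generate images from text prompts through a simple API. Whether you run it for personal use or deploy it as part of a production application, this guide will help you get started.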

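FLUX (by Black Forest Labs) has taken the world of AI image generation by storm over the past few months. Not only has it beaten Stable Diffusion (the previous open-source king) on many benchmarks, it has also surpassed proprietary models such as DALL-E and Midjourney on some metrics.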

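But how do you use FLUX in one of your own apps? You might consider a serverless host such as Replicate, but these can quickly become very expensive and may not give you the flexibility you need. That is where building your own custom FLUX server comes in handy.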

Prerequisites

Before diving into the code, let's make sure you have the necessary tools and libraries set up:

Python: You need Python 3 installed on your machine, preferably version 3.10.
torch: The deep learning framework we'll use to run FLUX.
diffusers: Provides access to the FLUX model.
transformers: A required dependency of diffusers.
sentencepiece: Required to run the FLUX tokenizer.
protobuf: Required to run FLUX.
accelerate: Helps load the FLUX model more efficiently in some cases.
fastapi: A framework for building a web server that can accept image generation requests.
uvicorn: Required to run the FastAPI server.
psutil: Lets us check how much RAM the machine has.

You can install all of the libraries by running the following command: pip install torch diffusers transformers sentencepiece protobuf accelerate fastapi uvicorn psutil
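Because some import names differ from the pip package names, a quick way to confirm the installation is to ask Python whether each module can be found. The sketch below uses only the standard library's importlib.util.find_spec; the module list is our own mapping of the packages above to their import names (protobuf, for instance, is imported as google.protobuf), so adjust it if your setup differs.

```python
import importlib.util

# Import names for the packages listed above (our own mapping; note that
# protobuf is imported as "google.protobuf", not "protobuf").
REQUIRED_MODULES = [
    "torch", "diffusers", "transformers", "sentencepiece",
    "google.protobuf", "accelerate", "fastapi", "uvicorn", "psutil",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec raises for a dotted name whose parent package is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    if absent:
        print("Missing modules:", ", ".join(absent))
    else:
        print("All required modules are importable")
```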

If you are using a Mac with an M1 or M2 chip, you should set up PyTorch with Metal for the best performance. Follow the official PyTorch with Metal guide before continuing.

If you plan to run FLUX on a GPU, make sure it has at least 12 GB of VRAM. If you run it on CPU/MPS instead (which will be slower), you need at least 12 GB of RAM.
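As a rough sanity check on memory, the size of a model's weights is its parameter count times the bytes per parameter. Black Forest Labs describes FLUX.1-dev as a 12-billion-parameter transformer, so the sketch below estimates its weights alone at about 22 GiB in fp16 (twice that in fp32), before the text encoders, the VAE, and activations are counted. Fitting into 12 GB therefore likely requires something like Diffusers' CPU offloading or a quantized checkpoint. These are back-of-the-envelope estimates, not measurements.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Ignores activations, text encoders, and the VAE, so real usage is higher.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weights_gib(num_params: float, dtype: str) -> float:
    """Approximate size of a model's weights in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 3)


if __name__ == "__main__":
    flux_transformer_params = 12e9  # FLUX.1-dev transformer: ~12 billion parameters
    for dtype in ("float32", "float16"):
        print(f"{dtype}: {weights_gib(flux_transformer_params, dtype):.1f} GiB")
```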

Step 1: Setting Up the Environment

Let's start the script by selecting the right device for running inference, based on the hardware you're using.

    device = 'cuda'  # can also be 'cpu' or 'mps'

    import os

    # MPS support in PyTorch is not yet fully implemented, so let unsupported
    # operations fall back to the CPU (set here, before torch is imported)
    if device == 'mps':
        os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

    import torch

    if device == 'mps' and not torch.backends.mps.is_available():
        raise Exception("Device set to MPS, but MPS is not available")
    elif device == 'cuda' and not torch.cuda.is_available():
        raise Exception("Device set to CUDA, but CUDA is not available")

You can specify cpu, cuda (for NVIDIA GPUs), or mps (for Apple's Metal Performance Shaders). The script then checks whether the selected device is available and raises an exception if it isn't.
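If you would rather not hard-code the device, a small helper can choose one from the availability checks. This is a hypothetical helper, not part of the original script; it takes the two availability flags as plain booleans so the selection logic can be tested without PyTorch installed.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer an NVIDIA GPU, then Apple's MPS backend, and fall back to the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# With PyTorch installed, the flags would come from torch itself. Remember that
# the script above sets PYTORCH_ENABLE_MPS_FALLBACK before importing torch.
#   import torch
#   device = pick_device(torch.cuda.is_available(), torch.backends.mps.is_available())
```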

Step 2: Loading the FLUX Model

Next, let's load the FLUX model. Loading the model in fp16 precision saves memory without sacrificing much quality.

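At this point, you may be asked to authenticate with Hugging Face, because the FLUX model is gated. To authenticate successfully, create a Hugging Face account, go to the model page and accept the license terms, create a Hugging Face token in your account settings, and add it to your machine as the HF_TOKEN environment variable.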
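Failing early with a readable message is friendlier than a download error halfway through loading. The hypothetical helper below only checks that HF_TOKEN (the variable huggingface_hub reads) is present in the environment; it does not verify that the token is valid or that the model license has been accepted.

```python
import os


def require_hf_token(env=os.environ) -> str:
    """Return the Hugging Face token from HF_TOKEN, or fail early with a clear message."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Accept the FLUX.1-dev license on Hugging Face, "
            "create an access token, and export it as HF_TOKEN."
        )
    return token


# Call this before FluxPipeline.from_pretrained(...):
#   require_hf_token()
```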

    from diffusers import FlowMatchEulerDiscreteScheduler, FluxPipeline
    import psutil

    model_name = "black-forest-labs/FLUX.1-dev"
    print(f"Loading {model_name} on {device}")

    pipeline = FluxPipeline.from_pretrained(
        model_name,
        # Diffusion models are generally trained on fp32, but fp16
        # gets us 99% there in terms of quality, with just half the (V)RAM
        torch_dtype=torch.float16,
        # Ensure we don't load any dangerous binary code
        use_safetensors=True,
        # We are using Euler here, but you can also use other samplers.
        # Loading it from the model repo keeps the model's own scheduler
        # settings instead of the library defaults.
        scheduler=FlowMatchEulerDiscreteScheduler.from_pretrained(
            model_name, subfolder="scheduler"
        ),
    ).to(device)

Here we use the diffusers library to load the FLUX model. The model we're using is black-forest-labs/FLUX.1-dev, loaded in fp16 precision.

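There is also FLUX Schnell, a timestep-distilled model that is faster at inference but produces less detailed images, as well as the closed-source FLUX Pro model. We use the Euler scheduler here, but you can experiment with others; the Diffusers documentation describes the available schedulers in more detail.

Image generation can be resource-intensive, so optimizing memory usage is important, especially when running on a CPU or on a device with limited memory.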

    # Recommended if running on MPS or CPU with < 64 GB of RAM
    total_memory = psutil.virtual_memory().total
    total_memory_gb = total_memory / (1024 ** 3)
    if (device == 'cpu' or device == 'mps') and total_memory_gb < 64:
        print("Enabling attention slicing")
        pipeline.enable_attention_slicing()

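This code checks the machine's total physical memory and, when running on CPU or MPS with less than 64 GB of RAM, enables attention slicing. Attention slicing computes attention in several smaller steps instead of all at once, which reduces memory usage during image generation at some cost in speed. This is essential on devices with limited resources.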
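To see why slicing can save memory without changing the output, here is a pure-Python toy of the idea: scaled dot-product attention computed all at once versus in chunks of query rows. Only `slice_size` rows of the score matrix exist at any moment, and each row's softmax depends only on that row, so the results match. This is an illustration under our own simplifications; Diffusers operates on tensors and chooses its own way of splitting the work.

```python
import math


def full_attention(queries, keys, values):
    """Scaled dot-product attention that builds the whole score matrix at once."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    # len(queries) x len(keys) scores: this matrix grows with the number of image tokens
    scores = [[scale * sum(a * b for a, b in zip(q, k)) for k in keys] for q in queries]
    outputs = []
    for row in scores:
        peak = max(row)  # subtract the row maximum for a numerically stable softmax
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output row is a weighted average of the value vectors
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def sliced_attention(queries, keys, values, slice_size):
    """Same result, but only slice_size rows of the score matrix exist at a time."""
    if slice_size < 1:
        raise ValueError("slice_size must be at least 1")
    outputs = []
    for start in range(0, len(queries), slice_size):
        chunk = queries[start:start + slice_size]
        outputs.extend(full_attention(chunk, keys, values))
    return outputs
```

The trade-off is the same one the paragraph above describes: more, smaller computations (slower) in exchange for a lower peak memory footprint.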

Step 3: Creating the API with FastAPI

Next, we'll set up a FastAPI server that provides an API for generating images.

    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel, Field, conint, confloat
    from fastapi.middleware.gzip import GZipMiddleware
    from io import BytesIO
    import base64

    app = FastAPI()

    # We will be returning the image as a base64 encoded string
    # which we will want compressed
    app.add_middleware(GZipMiddleware, minimum_size=1000, compresslevel=7)

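FastAPI is a popular framework for building web APIs in Python. Here we use it to create a server that can accept image generation requests. We also use GZip middleware to compress responses, which is especially useful when sending images back in base64 format.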
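Base64 inflates binary data by about a third (every 3 bytes become 4 characters), and gzip can win most of that back because base64 text uses only 64 distinct characters. The sketch below measures this using random bytes as a stand-in for PNG data, which is already compressed; treat the printed sizes as an illustration rather than a benchmark. The default compression level mirrors the compresslevel=7 passed to the middleware.

```python
import base64
import gzip
import random


def base64_and_gzip_sizes(payload: bytes, compresslevel: int = 7):
    """Return (raw bytes, base64 characters, gzipped base64 bytes) for a payload."""
    encoded = base64.b64encode(payload)
    compressed = gzip.compress(encoded, compresslevel=compresslevel)
    return len(payload), len(encoded), len(compressed)


if __name__ == "__main__":
    # Random bytes stand in for PNG data, which is already deflate-compressed
    fake_png = random.Random(0).randbytes(300_000)
    raw, b64, gz = base64_and_gzip_sizes(fake_png)
    print(f"raw: {raw} B, base64: {b64} B, gzipped base64: {gz} B")
```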

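In a production environment, you may want to store generated images in an S3 bucket or other cloud storage and return URLs instead of base64-encoded strings, so that you can take advantage of CDNs and other optimizations.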

Step 4: Defining the Request Model

Now we need to define a model for the requests our API will accept.

    class GenerateRequest(BaseModel):
        prompt: str
        seed: conint(ge=0) = Field(..., description="Seed for random number generation")
        height: conint(gt=0) = Field(..., description="Height of the generated image, must be a positive integer and a multiple of 8")
        width: conint(gt=0) = Field(..., description="Width of the generated image, must be a positive integer and a multiple of 8")
        cfg: confloat(gt=0) = Field(..., description="CFG (classifier-free guidance scale), must be a positive number")
        # At least one inference step is needed to produce an image
        steps: conint(gt=0) = Field(..., description="Number of inference steps, must be at least 1")
        batch_size: conint(gt=0) = Field(..., description="Number of images to generate in a batch")

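The GenerateRequest model defines the parameters needed to generate an image. The prompt field is the text description of the image you want to create. The other fields cover the random seed, the image dimensions, the guidance scale (cfg), the number of inference steps, and the batch size.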
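Since invalid values are only rejected once a request reaches the server, a client may want to check a payload before sending it. The hypothetical helper below mirrors the constraints stated in the field descriptions (non-negative seed, dimensions that are positive multiples of 8, positive cfg) and also requires at least one inference step and one image.

```python
def validate_generate_request(payload: dict) -> list[str]:
    """Client-side check mirroring GenerateRequest; returns a list of problems."""
    problems = []

    def is_int(value):
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, int) and not isinstance(value, bool)

    if not isinstance(payload.get("prompt"), str):
        problems.append("prompt must be a string")
    if not is_int(payload.get("seed")) or payload["seed"] < 0:
        problems.append("seed must be an integer >= 0")
    for field in ("height", "width"):
        value = payload.get(field)
        if not is_int(value) or value <= 0 or value % 8 != 0:
            problems.append(f"{field} must be a positive multiple of 8")
    cfg = payload.get("cfg")
    if not isinstance(cfg, (int, float)) or isinstance(cfg, bool) or cfg <= 0:
        problems.append("cfg must be a positive number")
    if not is_int(payload.get("steps")) or payload["steps"] < 1:
        problems.append("steps must be an integer >= 1")
    if not is_int(payload.get("batch_size")) or payload["batch_size"] < 1:
        problems.append("batch_size must be an integer >= 1")
    return problems
```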

Step 5: Creating the Image Generation Endpoint

Now let's create the endpoint that handles image generation requests.

    @app.post("/")
    async def generate_image(request: GenerateRequest):
        # Validate that height and width are multiples of 8
        # as required by FLUX
        if request.height % 8 != 0 or request.width % 8 != 0:
            raise HTTPException(status_code=400, detail="Height and width must both be multiples of 8")

        # Always calculate the seed on CPU for deterministic RNG
        # For a batch of images, seeds will be sequential like n, n+1, n+2, ...
        generator = [torch.Generator(device="cpu").manual_seed(i) for i in range(request.seed, request.seed + request.batch_size)]

        # The pipeline call is blocking; inside an async endpoint it also blocks
        # the event loop, so requests are effectively handled one at a time
        images = pipeline(
            height=request.height,
            width=request.width,
            prompt=request.prompt,
            generator=generator,
            num_inference_steps=request.steps,
            guidance_scale=request.cfg,
            num_images_per_prompt=request.batch_size,
        ).images

        # Convert images to base64 strings
        # (for a production app, you might want to store the
        # images in an S3 bucket and return the URLs instead)
        base64_images = []
        for image in images:
            buffered = BytesIO()
            image.save(buffered, format="PNG")
            img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
            base64_images.append(img_str)

        return {
            "images": base64_images,
        }

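This endpoint handles the image generation process. It first validates that the height and width are multiples of 8, as FLUX requires. It then generates images from the given prompt and returns them as base64-encoded strings.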
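Because each image in a batch gets its own CPU generator seeded with seed, seed + 1, and so on, you can reproduce one image from a batch by requesting it on its own with the matching seed. The helpers below are hypothetical and only build the request; the reproduced image should start from the same initial noise, though the final pixels may differ slightly because batched GPU computations are not guaranteed to be bit-identical.

```python
def seeds_for_batch(seed: int, batch_size: int) -> list[int]:
    """Seeds the endpoint assigns to each image: seed, seed + 1, ..., seed + batch_size - 1."""
    return list(range(seed, seed + batch_size))


def single_image_request(base_request: dict, index: int) -> dict:
    """Hypothetical helper: a request that regenerates image `index` of an earlier batch alone."""
    if not 0 <= index < base_request["batch_size"]:
        raise IndexError("index is outside the original batch")
    request = dict(base_request)  # copy, so the original request is left untouched
    request["seed"] = seeds_for_batch(base_request["seed"], base_request["batch_size"])[index]
    request["batch_size"] = 1
    return request
```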

Step 6: Starting the Server

Finally, let's add the code that starts the server when the script is run.

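    # on_event is deprecated in newer FastAPI releases in favor of
    # lifespan handlers, but it still works
    @app.on_event("startup")
    async def startup_event():
        print("Image generation server running")

    if __name__ == "__main__":
        import uvicorn
        uvicorn.run(app, host="0.0.0.0", port=8000)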

This code starts the FastAPI server on port 8000. Thanks to the 0.0.0.0 binding, the server is reachable not only at http://localhost:8000 but also from other devices on the same network via the host machine's IP address.
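Because of the 0.0.0.0 binding, anyone on your network can reach the server, and the endpoint has no authentication. If you only need local access, binding to 127.0.0.1 is the safer choice. The sketch below reads the host and port from environment variables; the names FLUX_HOST and FLUX_PORT are our own invention, not something uvicorn or FastAPI reads by themselves.

```python
import os


def server_address(env=os.environ) -> tuple[str, int]:
    """Read the bind host and port from hypothetical FLUX_HOST / FLUX_PORT variables."""
    host = env.get("FLUX_HOST", "0.0.0.0")
    port = int(env.get("FLUX_PORT", "8000"))
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return host, port


# In the server script, this would replace the hard-coded values:
#   host, port = server_address()
#   uvicorn.run(app, host=host, port=port)
```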

Step 7: Testing the Server Locally

Once your FLUX server is up and running, it's time to test it. You can interact with the server using curl, a command-line tool for making HTTP requests:

    curl -X POST "http://localhost:8000/" \
      -H "Content-Type: application/json" \
      -d '{
        "prompt": "A futuristic cityscape at sunset",
        "seed": 42,
        "height": 1024,
        "width": 1024,
        "cfg": 3.5,
        "steps": 50,
        "batch_size": 1
      }' | jq -r '.images[0]' | base64 -d > test.png

This command only works on UNIX-based systems with the curl, jq, and base64 utilities installed. Depending on the hardware hosting your FLUX server, it may also take several minutes to complete.
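On systems without curl, jq, or base64, the same test can be run from Python. This sketch uses only the standard library; the function names are our own, and the generous default timeout reflects the note above that generation may take several minutes.

```python
import base64
import json
import urllib.request
from pathlib import Path


def generate(url: str, payload: dict, timeout: float = 600.0) -> dict:
    """POST a generation request to the FLUX server and return the parsed JSON body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urllib does not ask for gzip, so the GZip middleware replies uncompressed
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def save_images(body: dict, out_dir: str = ".", prefix: str = "image") -> list:
    """Decode each base64 PNG in the response body and write it to out_dir."""
    paths = []
    for index, encoded in enumerate(body["images"]):
        path = Path(out_dir) / f"{prefix}_{index}.png"
        path.write_bytes(base64.b64decode(encoded))
        paths.append(path)
    return paths


# Example (requires the server from this guide to be running):
#   payload = {"prompt": "A futuristic cityscape at sunset", "seed": 42,
#              "height": 1024, "width": 1024, "cfg": 3.5, "steps": 50, "batch_size": 1}
#   print(save_images(generate("http://localhost:8000/", payload)))
```

With the payload from the curl example, this writes image_0.png to the current directory.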

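Conclusion

Congratulations! You have successfully built your own FLUX server with Python. This setup lets you generate images from text prompts through a simple API. If you are not satisfied with the results of the base FLUX model, consider fine-tuning it for even better performance on your specific use case.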

Full Code

The complete code used in this guide is below.

    device = 'cuda'  # can also be 'cpu' or 'mps'

    import os

    # MPS support in PyTorch is not yet fully implemented, so let unsupported
    # operations fall back to the CPU (set here, before torch is imported)
    if device == 'mps':
        os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

    import torch

    if device == 'mps' and not torch.backends.mps.is_available():
        raise Exception("Device set to MPS, but MPS is not available")
    elif device == 'cuda' and not torch.cuda.is_available():
        raise Exception("Device set to CUDA, but CUDA is not available")

    from diffusers import FlowMatchEulerDiscreteScheduler, FluxPipeline
    import psutil

    model_name = "black-forest-labs/FLUX.1-dev"
    print(f"Loading {model_name} on {device}")

    pipeline = FluxPipeline.from_pretrained(
        model_name,
        # Diffusion models are generally trained on fp32, but fp16
        # gets us 99% there in terms of quality, with just half the (V)RAM
        torch_dtype=torch.float16,
        # Ensure we don't load any dangerous binary code
        use_safetensors=True,
        # We are using Euler here, but you can also use other samplers.
        # Loading it from the model repo keeps the model's own scheduler
        # settings instead of the library defaults.
        scheduler=FlowMatchEulerDiscreteScheduler.from_pretrained(
            model_name, subfolder="scheduler"
        ),
    ).to(device)

    # Recommended if running on MPS or CPU with < 64 GB of RAM
    total_memory = psutil.virtual_memory().total
    total_memory_gb = total_memory / (1024 ** 3)
    if (device == 'cpu' or device == 'mps') and total_memory_gb < 64:
        print("Enabling attention slicing")
        pipeline.enable_attention_slicing()

    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel, Field, conint, confloat
    from fastapi.middleware.gzip import GZipMiddleware
    from io import BytesIO
    import base64

    app = FastAPI()

    # We will be returning the image as a base64 encoded string
    # which we will want compressed
    app.add_middleware(GZipMiddleware, minimum_size=1000, compresslevel=7)


    class GenerateRequest(BaseModel):
        prompt: str
        seed: conint(ge=0) = Field(..., description="Seed for random number generation")
        height: conint(gt=0) = Field(..., description="Height of the generated image, must be a positive integer and a multiple of 8")
        width: conint(gt=0) = Field(..., description="Width of the generated image, must be a positive integer and a multiple of 8")
        cfg: confloat(gt=0) = Field(..., description="CFG (classifier-free guidance scale), must be a positive number")
        # At least one inference step is needed to produce an image
        steps: conint(gt=0) = Field(..., description="Number of inference steps, must be at least 1")
        batch_size: conint(gt=0) = Field(..., description="Number of images to generate in a batch")


    @app.post("/")
    async def generate_image(request: GenerateRequest):
        # Validate that height and width are multiples of 8
        # as required by FLUX
        if request.height % 8 != 0 or request.width % 8 != 0:
            raise HTTPException(status_code=400, detail="Height and width must both be multiples of 8")

        # Always calculate the seed on CPU for deterministic RNG
        # For a batch of images, seeds will be sequential like n, n+1, n+2, ...
        generator = [torch.Generator(device="cpu").manual_seed(i) for i in range(request.seed, request.seed + request.batch_size)]

        # The pipeline call is blocking; inside an async endpoint it also blocks
        # the event loop, so requests are effectively handled one at a time
        images = pipeline(
            height=request.height,
            width=request.width,
            prompt=request.prompt,
            generator=generator,
            num_inference_steps=request.steps,
            guidance_scale=request.cfg,
            num_images_per_prompt=request.batch_size,
        ).images

        # Convert images to base64 strings
        # (for a production app, you might want to store the
        # images in an S3 bucket and return the URLs instead)
        base64_images = []
        for image in images:
            buffered = BytesIO()
            image.save(buffered, format="PNG")
            img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
            base64_images.append(img_str)

        return {
            "images": base64_images,
        }


    # on_event is deprecated in newer FastAPI releases in favor of
    # lifespan handlers, but it still works
    @app.on_event("startup")
    async def startup_event():
        print("Image generation server running")


    if __name__ == "__main__":
        import uvicorn
        uvicorn.run(app, host="0.0.0.0", port=8000)