
AI-Powered Image Generation API Service with FLUX, Python, and Diffusers: A Quick Guide

By HeraHaven AI | 11m | 2024/11/29

Too Long; Didn't Read

In this article, we'll walk you through building your own FLUX server in Python. The server lets you generate images from text prompts through a simple API. Whether you run it for personal use or deploy it as part of a production application, this guide will help you get started.


FLUX (by Black Forest Labs) has taken the world of AI image generation by storm over the last few months. Not only has it beaten Stable Diffusion (the previous open-source king) on many benchmarks, it has also surpassed proprietary models such as Dall-E and Midjourney on some metrics.

So how do you use FLUX in one of your own apps? You might consider serverless hosts such as Replicate, but these can get expensive quickly and may not offer the flexibility you need. That is where a custom FLUX server comes in handy.

Prerequisites

Before diving into the code, let's make sure you have the necessary tools and libraries set up:

Python: You'll need Python 3 installed on your machine, preferably version 3.10.
torch: The deep learning framework we'll use to run FLUX.
diffusers: Provides access to the FLUX model.
transformers: Required by diffusers.
sentencepiece: Required to run the FLUX tokenizer.
protobuf: Required to run FLUX.
accelerate: Helps load the FLUX model more efficiently in some cases.
fastapi: Framework for creating a web server that can accept image generation requests.
uvicorn: Required to run the FastAPI server.
psutil: Lets us check how much RAM the machine has.

You can install all of the libraries by running the following command: pip install torch diffusers transformers sentencepiece protobuf accelerate fastapi uvicorn psutil

If you're using a Mac with an M1 or M2 chip, you should set up PyTorch with Metal for best performance. Follow the official PyTorch with Metal guide before proceeding.

You'll also need at least 12 GB of VRAM if you plan to run FLUX on a GPU, or at least 12 GB of RAM if you run it on CPU/MPS (which will be slower).

Step 1: Setting Up the Environment

Let's start the script by picking the right device to run inference on, based on the hardware we're using.

    device = 'cuda'  # can also be 'cpu' or 'mps'

    import os

    # MPS support in PyTorch is not yet fully implemented
    if device == 'mps':
        os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

    import torch

    if device == 'mps' and not torch.backends.mps.is_available():
        raise Exception("Device set to MPS, but MPS is not available")
    elif device == 'cuda' and not torch.cuda.is_available():
        raise Exception("Device set to CUDA, but CUDA is not available")

You can specify cpu, cuda (for NVIDIA GPUs), or mps (for Apple's Metal Performance Shaders). The script then checks whether the selected device is available and raises an exception if it is not.
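If you would rather not hard-code the device, the choice can be derived from what the machine reports. The following is only a minimal sketch: pick_device is a hypothetical helper that is not part of the original script, and it takes the availability flags as plain booleans so that it runs without PyTorch installed.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device string: CUDA first, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# With PyTorch installed, the two flags would come from
# torch.cuda.is_available() and torch.backends.mps.is_available().
print(pick_device(cuda_available=False, mps_available=True))  # prints "mps"
```

The ordering (CUDA, then MPS, then CPU) is an assumption that follows the note above about CPU/MPS being slower; adjust it if your setup differs.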

Step 2: Loading the FLUX Model

Next, we'll load the FLUX model. We'll load it in fp16 precision, which saves memory without much loss in quality.

At this point, you may be asked to authenticate with HuggingFace, because the FLUX model is gated. To authenticate successfully, you'll need to create a HuggingFace account, go to the model page, accept the terms, then create a HuggingFace token from your account settings and add it to your machine as the HF_TOKEN environment variable.
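Because a missing token only shows up once the download starts, it can help to fail early with a clear message. Here is a small sketch of such a check; get_hf_token is a hypothetical helper (not part of the original script), and the only thing it assumes is the HF_TOKEN variable name mentioned above.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the HuggingFace token from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Create a token in your HuggingFace account "
            "settings and export it before loading the model."
        )
    return token


# Passing a dict instead of os.environ makes the helper easy to try out.
print(get_hf_token({"HF_TOKEN": "hf_example"}))  # prints "hf_example"
```

Calling get_hf_token() with no arguments before the model is loaded would read the real environment.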

    from diffusers import FlowMatchEulerDiscreteScheduler, FluxPipeline
    import psutil

    model_name = "black-forest-labs/FLUX.1-dev"

    print(f"Loading {model_name} on {device}")

    pipeline = FluxPipeline.from_pretrained(
        model_name,
        # Diffusion models are generally trained on fp32, but fp16
        # gets us 99% there in terms of quality, with just half the (V)RAM
        torch_dtype=torch.float16,
        # Ensure we don't load any dangerous binary code
        use_safetensors=True,
        # We are using Euler here, but you can also use other samplers
        scheduler=FlowMatchEulerDiscreteScheduler()
    ).to(device)

Here, we load the FLUX model using the diffusers library. The model we use is black-forest-labs/FLUX.1-dev, loaded in fp16 precision.

There is also a timestep-distilled model, FLUX Schnell, which offers faster inference but produces less detailed images, as well as FLUX Pro, which is closed-source. We use the Euler scheduler here, but you can experiment with others. The Diffusers documentation has more on schedulers.

Since image generation can be resource-intensive, it is important to optimize memory usage, especially when running on a CPU or a device with limited memory.

    # Recommended if running on MPS or CPU with < 64 GB of RAM
    total_memory = psutil.virtual_memory().total
    total_memory_gb = total_memory / (1024 ** 3)
    if (device == 'cpu' or device == 'mps') and total_memory_gb < 64:
        print("Enabling attention slicing")
        pipeline.enable_attention_slicing()

This code checks the total available memory and, when running on CPU or MPS, enables attention slicing if the system has less than 64 GB of RAM. Attention slicing reduces memory usage during image generation, which is essential for devices with limited resources.
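The decision above is just a threshold check, so it can be pulled into a small pure function that is easy to test on its own. This is a sketch only: should_slice_attention and its threshold_gb parameter are hypothetical names, and the memory size is passed in as a byte count (the value psutil.virtual_memory().total would give you) rather than read from the system.

```python
def should_slice_attention(device: str, total_memory_bytes: int,
                           threshold_gb: float = 64.0) -> bool:
    """Return True when running on CPU/MPS with less RAM than the threshold."""
    total_memory_gb = total_memory_bytes / (1024 ** 3)
    return device in ("cpu", "mps") and total_memory_gb < threshold_gb


GIB = 1024 ** 3
print(should_slice_attention("mps", 16 * GIB))    # True: MPS with 16 GB
print(should_slice_attention("cpu", 128 * GIB))   # False: plenty of RAM
print(should_slice_attention("cuda", 16 * GIB))   # False: rule only covers CPU/MPS
```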

Step 3: Creating the API with FastAPI

Next, we'll set up the FastAPI server, which will provide an API for generating images.

    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel, Field, conint, confloat
    from fastapi.middleware.gzip import GZipMiddleware
    from io import BytesIO
    import base64

    app = FastAPI()

    # We will be returning the image as a base64 encoded string
    # which we will want compressed
    app.add_middleware(GZipMiddleware, minimum_size=1000, compresslevel=7)

FastAPI is a popular framework for building web APIs with Python. Here, we use it to create a server that accepts image generation requests. We also add the GZip middleware to compress responses, which is particularly useful when sending images back in base64 format.
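To see why compression matters here: base64 turns every 3 bytes of input into 4 output characters, so an encoded image is roughly a third larger than the raw file. The sketch below uses only the standard library; the payload is made-up stand-in data rather than a real PNG, and base64_length is a hypothetical helper name.

```python
import base64
import gzip
import math


def base64_length(raw_length: int) -> int:
    """Length of standard (padded) base64 output for raw_length input bytes."""
    return 4 * math.ceil(raw_length / 3)


payload = bytes(range(256)) * 4          # 1024 bytes standing in for image data
encoded = base64.b64encode(payload)
compressed = gzip.compress(encoded, compresslevel=7)

assert len(encoded) == base64_length(len(payload))
print(f"raw: {len(payload)} bytes, base64: {len(encoded)} bytes, "
      f"gzip(base64): {len(compressed)} bytes")
```

With real PNG data, which is already compressed, gzip mainly wins back part of the base64 overhead rather than shrinking the image itself.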

In a production environment, you may want to store the generated images in an S3 bucket or other cloud storage and return URLs instead of base64-encoded strings, so you can take advantage of a CDN and other optimizations.

Step 4: Defining the Request Model

We now need to define a model for the requests our API will accept.

    class GenerateRequest(BaseModel):
        prompt: str
        seed: conint(ge=0) = Field(..., description="Seed for random number generation")
        height: conint(gt=0) = Field(..., description="Height of the generated image, must be a positive integer and a multiple of 8")
        width: conint(gt=0) = Field(..., description="Width of the generated image, must be a positive integer and a multiple of 8")
        cfg: confloat(gt=0) = Field(..., description="CFG (classifier-free guidance scale), must be a positive number")
        steps: conint(ge=0) = Field(..., description="Number of steps")
        batch_size: conint(gt=0) = Field(..., description="Number of images to generate in a batch")

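The GenerateRequest model defines the parameters required to generate an image. The prompt field is the text description of the image you want to create. The other fields are the seed, the image dimensions, the guidance scale (cfg), the number of inference steps, and the batch size.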

Step 5: Creating the Image Generation Endpoint

Now, let's create the endpoint that will handle image generation requests.

    @app.post("/")
    async def generate_image(request: GenerateRequest):
        # Validate that height and width are multiples of 8
        # as required by FLUX
        if request.height % 8 != 0 or request.width % 8 != 0:
            raise HTTPException(status_code=400, detail="Height and width must both be multiples of 8")

        # Always calculate the seed on CPU for deterministic RNG
        # For a batch of images, seeds will be sequential like n, n+1, n+2, ...
        generator = [torch.Generator(device="cpu").manual_seed(i) for i in range(request.seed, request.seed + request.batch_size)]

        images = pipeline(
            height=request.height,
            width=request.width,
            prompt=request.prompt,
            generator=generator,
            num_inference_steps=request.steps,
            guidance_scale=request.cfg,
            num_images_per_prompt=request.batch_size
        ).images

        # Convert images to base64 strings
        # (for a production app, you might want to store the
        # images in an S3 bucket and return the URLs instead)
        base64_images = []
        for image in images:
            buffered = BytesIO()
            image.save(buffered, format="PNG")
            img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
            base64_images.append(img_str)

        return {
            "images": base64_images,
        }

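This endpoint handles the image generation process. It first validates that the height and width are multiples of 8, as required by FLUX. It then creates one CPU-seeded generator per image in the batch (seeds n, n+1, n+2, ...), generates the images from the provided prompt, and returns them as base64-encoded strings.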
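On the client side, the two rules the endpoint relies on (dimensions that are multiples of 8, and sequential seeds per batch) can be mirrored with a couple of small helpers. This is only an illustrative sketch: snap_to_multiple and batch_seeds are hypothetical names that do not appear in the server code.

```python
from typing import List


def snap_to_multiple(value: int, multiple: int = 8) -> int:
    """Round a positive size up to the nearest multiple (8 by default)."""
    if value <= 0:
        raise ValueError("value must be a positive integer")
    return ((value + multiple - 1) // multiple) * multiple


def batch_seeds(seed: int, batch_size: int) -> List[int]:
    """Seeds the endpoint uses for a batch: seed, seed + 1, seed + 2, ..."""
    return list(range(seed, seed + batch_size))


print(snap_to_multiple(1020))  # prints 1024
print(snap_to_multiple(1024))  # prints 1024
print(batch_seeds(42, 3))      # prints [42, 43, 44]
```

Snapping sizes before sending a request avoids the 400 error, and knowing the seed sequence lets you regenerate a single image from a batch by requesting its seed with a batch_size of 1.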

Step 6: Starting the Server

Finally, let's add some code to start the server when the script is run.

    @app.on_event("startup")
    async def startup_event():
        print("Image generation server running")

    if __name__ == "__main__":
        import uvicorn
        uvicorn.run(app, host="0.0.0.0", port=8000)

This code starts the FastAPI server on port 8000. Thanks to the 0.0.0.0 binding, it is reachable not only from http://localhost:8000 but also from other devices on the same network, using the host machine's IP address.

Step 7: Testing Your Server Locally

Now that your FLUX server is up and running, it's time to test it. You can use curl, a command-line tool for making HTTP requests, to interact with your server:

    curl -X POST "http://localhost:8000/" \
      -H "Content-Type: application/json" \
      -d '{
        "prompt": "A futuristic cityscape at sunset",
        "seed": 42,
        "height": 1024,
        "width": 1024,
        "cfg": 3.5,
        "steps": 50,
        "batch_size": 1
      }' | jq -r '.images[0]' | base64 -d > test.png

This command only works on UNIX-based systems with the curl, jq, and base64 utilities installed. It may take a while to complete, depending on the hardware hosting the FLUX server.
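If curl, jq, or base64 are not available, the same request can be made from Python. The sketch below is one possible client using only the standard library; build_request, decode_images, and generate are hypothetical helper names, and the 600-second default timeout is an arbitrary assumption meant to allow for slow hardware.

```python
import base64
import json
import urllib.request
from typing import List


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the image generation endpoint."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_images(response_body: dict) -> List[bytes]:
    """Turn the base64 strings under the "images" key into raw PNG bytes."""
    return [base64.b64decode(item) for item in response_body["images"]]


def generate(url: str, payload: dict, timeout: float = 600.0) -> List[bytes]:
    """Send the request to a running server and return the decoded images."""
    with urllib.request.urlopen(build_request(url, payload), timeout=timeout) as resp:
        return decode_images(json.load(resp))


# Example usage against a running server (not executed here):
#   images = generate("http://localhost:8000/", {
#       "prompt": "A futuristic cityscape at sunset", "seed": 42,
#       "height": 1024, "width": 1024, "cfg": 3.5, "steps": 50, "batch_size": 1,
#   })
#   with open("test.png", "wb") as f:
#       f.write(images[0])
```

The payload keys match the GenerateRequest fields, and the response is read from the "images" key returned by the endpoint.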

Conclusion

Congratulations! You have successfully built your own FLUX server in Python. This setup lets you generate images from text prompts through a simple API. If you are not satisfied with the results of the base FLUX model, you might consider fine-tuning it for better performance on your specific use case.

Full code

You can find the full code used in this guide below:

    device = 'cuda'  # can also be 'cpu' or 'mps'

    import os

    # MPS support in PyTorch is not yet fully implemented
    if device == 'mps':
        os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

    import torch

    if device == 'mps' and not torch.backends.mps.is_available():
        raise Exception("Device set to MPS, but MPS is not available")
    elif device == 'cuda' and not torch.cuda.is_available():
        raise Exception("Device set to CUDA, but CUDA is not available")

    from diffusers import FlowMatchEulerDiscreteScheduler, FluxPipeline
    import psutil

    model_name = "black-forest-labs/FLUX.1-dev"

    print(f"Loading {model_name} on {device}")

    pipeline = FluxPipeline.from_pretrained(
        model_name,
        # Diffusion models are generally trained on fp32, but fp16
        # gets us 99% there in terms of quality, with just half the (V)RAM
        torch_dtype=torch.float16,
        # Ensure we don't load any dangerous binary code
        use_safetensors=True,
        # We are using Euler here, but you can also use other samplers
        scheduler=FlowMatchEulerDiscreteScheduler()
    ).to(device)

    # Recommended if running on MPS or CPU with < 64 GB of RAM
    total_memory = psutil.virtual_memory().total
    total_memory_gb = total_memory / (1024 ** 3)
    if (device == 'cpu' or device == 'mps') and total_memory_gb < 64:
        print("Enabling attention slicing")
        pipeline.enable_attention_slicing()

    from fastapi import FastAPI, HTTPException
    from pydantic import BaseModel, Field, conint, confloat
    from fastapi.middleware.gzip import GZipMiddleware
    from io import BytesIO
    import base64

    app = FastAPI()

    # We will be returning the image as a base64 encoded string
    # which we will want compressed
    app.add_middleware(GZipMiddleware, minimum_size=1000, compresslevel=7)

    class GenerateRequest(BaseModel):
        prompt: str
        seed: conint(ge=0) = Field(..., description="Seed for random number generation")
        height: conint(gt=0) = Field(..., description="Height of the generated image, must be a positive integer and a multiple of 8")
        width: conint(gt=0) = Field(..., description="Width of the generated image, must be a positive integer and a multiple of 8")
        cfg: confloat(gt=0) = Field(..., description="CFG (classifier-free guidance scale), must be a positive number")
        steps: conint(ge=0) = Field(..., description="Number of steps")
        batch_size: conint(gt=0) = Field(..., description="Number of images to generate in a batch")

    @app.post("/")
    async def generate_image(request: GenerateRequest):
        # Validate that height and width are multiples of 8
        # as required by FLUX
        if request.height % 8 != 0 or request.width % 8 != 0:
            raise HTTPException(status_code=400, detail="Height and width must both be multiples of 8")

        # Always calculate the seed on CPU for deterministic RNG
        # For a batch of images, seeds will be sequential like n, n+1, n+2, ...
        generator = [torch.Generator(device="cpu").manual_seed(i) for i in range(request.seed, request.seed + request.batch_size)]

        images = pipeline(
            height=request.height,
            width=request.width,
            prompt=request.prompt,
            generator=generator,
            num_inference_steps=request.steps,
            guidance_scale=request.cfg,
            num_images_per_prompt=request.batch_size
        ).images

        # Convert images to base64 strings
        # (for a production app, you might want to store the
        # images in an S3 bucket and return the URLs instead)
        base64_images = []
        for image in images:
            buffered = BytesIO()
            image.save(buffered, format="PNG")
            img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
            base64_images.append(img_str)

        return {
            "images": base64_images,
        }

    @app.on_event("startup")
    async def startup_event():
        print("Image generation server running")

    if __name__ == "__main__":
        import uvicorn
        uvicorn.run(app, host="0.0.0.0", port=8000)