Table of Contents

Introduction: Many Images, Few Labels
Part 1 — Data Exploration: Getting to Know the Data
Part 2 — Preprocessing: Speaking the Model's Language
Part 3 — Feature Extraction: Turning Images into Meaningful Numbers
Part 4 — Unsupervised Clustering: Discovering Structure Without Labels
Part 5 — Semi-Supervised Training: The Core Experiment
Part 6 — Scaling Up: A Realistic Roadmap

Introduction: Many Images, Few Labels

In an ideal world, every image in your dataset comes with a label. "Defective." "Normal." "Crack type A." "Scratch type B." In the real world, labeling is painfully expensive. A medical radiologist might get through 50 brain scans per hour, at a rate of around $200 per hour. An industrial quality inspector might manage around 100 images per hour. At scale, labeling 100,000 images can cost more than training the model itself.

Hence the paradox: companies collect enormous numbers of images (from cameras, sensors, user uploads) but can afford to label only a tiny fraction of them. This is exactly the situation semi-supervised learning was made for: combining a small labeled set with a large unlabeled one.

An analogy helps. Imagine you are a teacher with 30 students. You want to assess every essay, but you only have time to grade 5 of them thoroughly. So you grade those 5 carefully, and you notice patterns: students who make certain mistakes tend to make related ones, and strong essays share recognizable traits. Using those patterns, you can review the remaining 25 essays without reading every line of each one.
That, in a nutshell, is semi-supervised learning: a small amount of careful supervision, leveraged across a much larger pool of unlabeled data.

This article builds a complete semi-supervised imaging pipeline from the ground up. We will work through every stage, and through the reasoning behind each one, as we go.

The case study: detecting manufacturing defects on metal surfaces. A factory produces stainless steel plates, and cameras photograph every plate as it comes off the production line. Most plates are fine; some have defects (scratches, dents, pitting, cracks). We have 10,000 photos, but only 200 of them are labeled.

The Semi-Supervised Learning Pipeline

Before diving into code, let's understand the pipeline end to end. Seeing the flow first makes the purpose of each stage much clearer. Take a minute to study this diagram; every box is a step we will build:

THE COMPLETE SEMI-SUPERVISED PIPELINE:

[10,000 raw images] ──▶ [Exploration & Cleaning]
        │
        ▼
[Preprocessing: resize, normalize, histogram equalization]
        │
        ▼
[Feature Extraction: pretrained ResNet50 → 2048-dim embedding per image]
        │
   ┌─────────┴──────────┐
   │                    │
   ▼                    ▼
[200 LABELED images]  [9,800 UNLABELED images]
   │                    │
   │                    ▼
   │          [Clustering: K-Means, DBSCAN
   │           on embeddings → pseudo-labels]
   │                    │
   │                    ▼
   │          [WEAKLY labeled dataset]
   │           (cluster assignments)
   │                    │
   ▼                    ▼
┌─────────────────────────────────────┐
│ SEMI-SUPERVISED TRAINING:           │
│ 1. Pre-train CNN on weakly labeled  │
│ 2. Fine-tune CNN on strongly labeled│
│ 3. Compare vs supervised-only       │
└─────────────────────────────────────┘
        │
        ▼
[Evaluation: F1, AUC-ROC, confusion matrix, comparison]

The key insight: we will label the unlabeled photos ourselves (via clustering), then use that imperfect knowledge to give the final model a head start. Think of it as handing a student an imperfect study guide before the real exam: not flawless, but far better than nothing.

Terminology

Let's be explicit about the terms used throughout this article, since the distinctions matter:

Strongly labeled (or simply "labeled"): images whose label was assigned by a human expert. The gold standard. We have 200 of these.

Weakly labeled (or "pseudo-labeled"): images whose labels were inferred automatically by clustering. Cheap, but noisy. We will generate 9,800 of these.

Unlabeled: images with no label at all. That is their state before clustering.

Embedding: a numerical representation of an image, produced by a pre-trained neural network. Our main tool for making images comparable.

Further reading:
Semi-Supervised Learning - scikit-learn user guide
Google Research on Semi-Supervised Learning

Part 1 — Data Exploration: Getting to Know the Data

Why you must inspect your data before doing anything else. Image datasets have failure modes that tabular data doesn't: corrupted files that crash your training loop at 3 AM, inconsistent resolutions across images, mixed color modes (grayscale hiding among RGB), and class imbalance where 95% of the images are "normal." Skip exploration and your model will cheerfully learn garbage, and you won't know why performance is poor.
The golden rule of this part: never trust data you haven't inspected.

1.1 — Loading and counting the images

Our dataset lives in two top-level directories: one with labeled images (subfolders "normal" and "defect") and one with unlabeled images (no subfolders, just a flat collection of .png files).

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image
from pathlib import Path

data_dir = Path("data/metal_surfaces")
labeled_dir = data_dir / "labeled"
unlabeled_dir = data_dir / "unlabeled"

We use pathlib.Path rather than string concatenation because it handles path separators correctly on any operating system.

labeled_files = list(labeled_dir.glob("**/*.png"))
unlabeled_files = list(unlabeled_dir.glob("**/*.png"))

print(f"Labeled images: {len(labeled_files)}")
print(f"Unlabeled images: {len(unlabeled_files)}")
print(f"Total: {len(labeled_files) + len(unlabeled_files)}")

label_ratio = len(labeled_files) / (len(labeled_files) + len(unlabeled_files))
print(f"Label ratio: {label_ratio:.1%}")

The label ratio comes out around 2%. That 2% is our expert-labeled data. The other 98% is the goldmine we want to tap, and tapping it is the whole point of semi-supervised learning.

1.2 — Checking integrity: resolution, color mode, corruption

Next, we need to verify every single image. One corrupted file can crash an entire training run. Inconsistent resolutions will silently distort images if you resize carelessly. And inconsistent color modes (grayscale where your model expects RGB) will produce confusing shape errors. We write a defensive little function that tries to open each image and records its properties:

def get_image_info(filepath):
    """
    Try to open an image and record its properties.
    If the file is corrupted, Pillow will throw an exception.
    """
    try:
        img = Image.open(filepath)
        return {
            "path": str(filepath),
            "width": img.size[0],
            "height": img.size[1],
            "mode": img.mode,  # 'RGB', 'L' (grayscale), 'RGBA'
            "filesize_kb": os.path.getsize(filepath) / 1024,
            "corrupted": False,
        }
    except Exception:
        return {
            "path": str(filepath),
            "width": None,
            "height": None,
            "mode": None,
            "filesize_kb": None,
            "corrupted": True,
        }

Why the mode field matters: img.mode tells us the color layout. 'RGB' means three color channels (red, green, blue). 'L' means grayscale (a single channel). 'RGBA' means RGB plus an alpha (transparency) channel. Our model expects RGB, so we will have to convert everything later.

Now scan all 10,000 images:

all_files = labeled_files + unlabeled_files
image_info = [get_image_info(f) for f in all_files]
info_df = pd.DataFrame(image_info)

And summarize the results:

print(f"Total images scanned: {len(info_df)}")
print(f"Corrupted images: {info_df['corrupted'].sum()}")
print(f"\nResolution distribution:")
print(info_df[~info_df["corrupted"]][["width", "height"]].describe().round(0))
print(f"\nColor modes: {info_df['mode'].value_counts().to_dict()}")
print(f"File size (KB): min={info_df['filesize_kb'].min():.0f}, "
      f"max={info_df['filesize_kb'].max():.0f}, "
      f"mean={info_df['filesize_kb'].mean():.0f}")

What to look for in the output:

Corrupted images > 0: remove them immediately. A single bad file can crash your training run.
Mixed resolutions: if min ≠ max for width or height, the images come in different sizes. We will resize everything to 224x224 during preprocessing.
Mixed color modes: if you see both 'RGB' and 'L', you have a mix of color and grayscale images. We will convert everything to RGB.
Extreme file sizes: a 1 KB file is probably truncated or corrupted. A 50 MB file is probably the wrong format or resolution.
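The extreme-file-size check from the list above is easy to automate. A minimal sketch (the 5 KB and 10 MB thresholds and the helper name `flag_size_outliers` are illustrative, not from the article):

```python
import pandas as pd

def flag_size_outliers(info_df, min_kb=5, max_kb=10_000):
    """Return rows whose file size suggests truncation or a wrong format.

    Expects the 'filesize_kb' and 'corrupted' columns produced by
    get_image_info above.
    """
    ok = info_df[~info_df["corrupted"]]
    return ok[(ok["filesize_kb"] < min_kb) | (ok["filesize_kb"] > max_kb)]

# Toy demonstration with fabricated sizes:
df = pd.DataFrame({
    "path": ["a.png", "b.png", "c.png"],
    "filesize_kb": [0.8, 120.0, 50_000.0],
    "corrupted": [False, False, False],
})
print(flag_size_outliers(df)["path"].tolist())  # → ['a.png', 'c.png']
```

Flagged files are worth opening by hand before deciding whether to drop them.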
Remove the corrupted files:

corrupted_paths = set(info_df[info_df["corrupted"]]["path"].tolist())

if corrupted_paths:
    print(f"Removing {len(corrupted_paths)} corrupted images")
    labeled_files = [f for f in labeled_files if str(f) not in corrupted_paths]
    unlabeled_files = [f for f in unlabeled_files if str(f) not in corrupted_paths]

1.3 — Class distribution: how balanced is our labeled set?

In industrial and medical settings, defects are rare. Your labeled set might be 90% "normal" and 10% "defect." This matters enormously: a lazy model that always predicts "normal" would be 90% accurate while being completely useless. We measure the balance now so we can counteract it later.

class_counts = {}
for class_dir in labeled_dir.iterdir():
    if class_dir.is_dir():
        count = len(list(class_dir.glob("*.png")))
        class_counts[class_dir.name] = count

print("Class distribution (labeled set):")
for cls, count in class_counts.items():
    pct = count / sum(class_counts.values()) * 100
    print(f"  {cls}: {count} images ({pct:.1f}%)")

If the classes are imbalanced, we will compensate later with a technique called pos_weight in the loss function, which effectively tells the model "missing a defect is 4x worse than a false alarm."

Let's also visualize it:

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(class_counts.keys(), class_counts.values(), color=["#2ecc71", "#e74c3c"])
ax.set_title("Class Distribution (Labeled Images)")
ax.set_ylabel("Number of images")
plt.tight_layout()
plt.savefig("outputs/class_distribution.png", dpi=150)
plt.show()

1.4 — Visualizing sample images: actually looking before modeling

This may be the single most important step in the whole pipeline. Look at your data.
You might discover near-duplicate images, capture artifacts (black borders, rotations), quality problems (blur, overexposure), or that the defects are so subtle your task is harder than you assumed.

fig, axes = plt.subplots(2, 5, figsize=(15, 6))
fig.suptitle("Sample Images — Top: Normal | Bottom: Defect", fontsize=14)

for i, class_name in enumerate(["normal", "defect"]):
    class_files = list((labeled_dir / class_name).glob("*.png"))[:5]
    for j, filepath in enumerate(class_files):
        img = Image.open(filepath)
        axes[i, j].imshow(img, cmap="gray" if img.mode == "L" else None)
        axes[i, j].set_title(class_name, fontsize=10)
        axes[i, j].axis("off")

plt.tight_layout()
plt.savefig("outputs/sample_images.png", dpi=150)
plt.show()

Study the samples closely. Could you tell the classes apart yourself? If you can't, expect the model to struggle too. If the defects are obvious (a dark scratch, a visible crack), the task is more forgiving; the model mainly has to learn robustness. If they are subtle (a faint discoloration, a hairline crack), you will need careful preprocessing and good feature extraction.

Further reading:
Pillow documentation - the Python imaging library we use to load images
NEU Surface Defect Database - a real-world steel surface defect dataset you can experiment with

Part 2 — Preprocessing: Speaking the Model's Language

Why you can't feed raw images to a neural network. A pretrained CNN such as ResNet50 was trained on a very specific kind of input: 224x224 pixel images, in RGB, normalized with the mean and standard deviation computed on the ImageNet dataset. Feed it images of a different size, or pixel values scaled differently from what it saw in training, and the features it produces degrade. You are speaking the wrong language. ResNet50 "speaks ImageNet."
To make it understand our metal surface photos, we have to "translate" them into ImageNet's native format. The translation involves four steps:

Convert to RGB (3 channels)
Enhance contrast with histogram equalization
Resize to 224×224
Normalize pixel values to match ImageNet statistics

2.1 — What is histogram equalization, and why does it matter here?

Industrial images often have very poor contrast. The difference between a normal and a scratched surface can be just a few pixel intensity levels: barely visible to the eye, and barely usable by a model. Histogram equalization redistributes pixel intensities so that the full range (0 to 255) is used effectively. The result: subtle defects "pop out," both visually and numerically.

We use a refined variant called CLAHE (Contrast Limited Adaptive Histogram Equalization). Unlike global equalization (which applies one transformation to the entire image), CLAHE divides the image into small tiles (8x8 by default) and equalizes each tile separately. This brings out local detail far better.

An analogy: think of adjusting the brightness of a photo. Global equalization is a single brightness slider for the whole image; you can brighten the dark corners but wash out the already-bright center. CLAHE adjusts the brightness of each region independently, so every part of the image becomes legible.

2.2 — Building a custom PyTorch Dataset

PyTorch organizes data loading around two classes: Dataset (knows how to fetch one item) and DataLoader (knows how to batch and serve items). PyTorch does ship a built-in ImageFolder dataset, but it assumes every image has a label. Our data mixes labeled and unlabeled images, so we need a custom class.
We will build it step by step. First, the skeleton:

import torch
import torchvision.transforms as T
from torch.utils.data import Dataset, DataLoader
import cv2

class MetalSurfaceDataset(Dataset):
    """
    Custom Dataset that handles both labeled and unlabeled images.
    Returns -1 as the label for unlabeled images.
    """
    def __init__(self, image_paths, labels=None, transform=None):
        self.image_paths = image_paths
        self.labels = labels  # None for unlabeled images
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

__init__ stores the image paths and (optionally) their labels; __len__ tells PyTorch how many images there are.

Now the core method, __getitem__, which loads and preprocesses a single image. We will build it in three stages.

Stage 1 — Load the image and force RGB:

def __getitem__(self, idx):
    # Load image and convert to RGB
    # .convert("RGB") handles grayscale → RGB conversion automatically
    # (it duplicates the single channel into R, G, and B)
    img = Image.open(self.image_paths[idx]).convert("RGB")

Why .convert("RGB")? Because ResNet50 expects 3 channels. If the image is grayscale (1 channel), this duplicates the gray values into R, G, and B. If it is already RGB, nothing changes. If it is RGBA (4 channels), the alpha channel is dropped.
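The channel-duplication behavior described above is easy to verify on a tiny synthetic grayscale image (a standalone sketch, separate from the Dataset class):

```python
import numpy as np
from PIL import Image

# A tiny synthetic grayscale image (mode 'L', one channel)
gray = Image.fromarray(np.full((4, 4), 120, dtype=np.uint8), mode="L")
print(gray.mode)   # 'L'

rgb = gray.convert("RGB")
print(rgb.mode)    # 'RGB'

# All three channels now hold the same values as the original gray channel
r, g, b = rgb.split()
print(np.array_equal(np.array(r), np.array(gray)))  # True
```

The same call is a no-op on images that are already RGB, which is why it is safe to apply unconditionally.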
Stage 2 — Apply CLAHE histogram equalization:

    # Convert PIL Image → numpy array for OpenCV processing
    img_np = np.array(img)

    # Convert RGB → LAB color space
    # L = Lightness (brightness), A and B = color channels
    # We only equalize L (brightness) to avoid distorting colors
    img_lab = cv2.cvtColor(img_np, cv2.COLOR_RGB2LAB)

    # Create CLAHE object and apply to L channel
    # clipLimit=2.0 prevents over-amplification of noise
    # tileGridSize=(8,8) means 8x8 tiles for local equalization
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    img_lab[:, :, 0] = clahe.apply(img_lab[:, :, 0])

    # Convert back LAB → RGB → PIL Image
    img_np = cv2.cvtColor(img_lab, cv2.COLOR_LAB2RGB)
    img = Image.fromarray(img_np)

Why LAB, and why not apply CLAHE directly to RGB? Because equalizing R, G, and B independently shifts the colors: a blue can drift toward green, for example. Working in LAB space, we touch only the lightness (L) channel and leave the color channels (A, B) alone. This is standard practice in image processing.

Stage 3 — Apply transforms and return:

    # Apply resize + normalize transforms
    if self.transform:
        img = self.transform(img)

    # Return the label (or -1 if this image has no label)
    label = self.labels[idx] if self.labels is not None else -1
    return img, label

2.3 — The transformation pipeline: what each step does

Finally, let's define the transforms. Each one has a precise purpose:

preprocessing = T.Compose([
    T.Resize((224, 224)),   # (1) Resize to model's expected input size
    T.ToTensor(),           # (2) PIL Image → PyTorch Tensor, scale [0,1]
    T.Normalize(
        mean=[0.485, 0.456, 0.406],  # (3) Normalize with ImageNet statistics
        std=[0.229, 0.224, 0.225],
    ),
])

Let's unpack each step:

(1) Resize to 224×224. ResNet50's architecture requires exactly this input size. Feed it 300x400 images and the tensor shapes won't line up; PyTorch will raise an error. Resizing can slightly distort aspect ratios, which is worth keeping in mind for texture-sensitive tasks like defect detection.

(2) ToTensor. Does two things: it converts the pixel layout from HWC (Height × Width × Channels) to CHW (Channels × Height × Width), which is what PyTorch expects, and it rescales pixel values from [0, 255] integers to [0.0, 1.0] floats.

(3) Normalize with ImageNet mean and std. Those magic numbers, [0.485, 0.456, 0.406] and [0.229, 0.224, 0.225], are the per-channel (R, G, B) mean and standard deviation of pixel values across the entire ImageNet dataset. ResNet50 was trained on inputs normalized this way, so its internal weights "expect" inputs centered near 0 with roughly unit variance. Feeding it differently scaled values is like measuring in inches but reporting centimeters: the math quietly goes wrong.

2.4 — Creating the Datasets and DataLoaders

Now we wire everything together. One guiding principle shapes this part: labeled and unlabeled data must be kept strictly separate at all times.
First, gather the labeled image paths and their class labels:

labeled_paths = []
labeled_labels = []

for class_idx, class_name in enumerate(["normal", "defect"]):
    class_dir = labeled_dir / class_name
    for fp in class_dir.glob("*.png"):
        labeled_paths.append(str(fp))
        labeled_labels.append(class_idx)  # 0 = normal, 1 = defect

Then, gather the unlabeled image paths (there are no labels to collect):

unlabeled_paths = [str(fp) for fp in unlabeled_files]

Create the PyTorch Dataset objects:

labeled_dataset = MetalSurfaceDataset(labeled_paths, labeled_labels, preprocessing)
unlabeled_dataset = MetalSurfaceDataset(unlabeled_paths, labels=None, transform=preprocessing)

print(f"Labeled dataset: {len(labeled_dataset)} images")
print(f"Unlabeled dataset: {len(unlabeled_dataset)} images")

Finally, wrap them in DataLoaders. A DataLoader serves images in batches (batch_size=32 means 32 images at a time) and does so efficiently:

labeled_loader = DataLoader(labeled_dataset, batch_size=32, shuffle=False)
unlabeled_loader = DataLoader(unlabeled_dataset, batch_size=32, shuffle=False)

Why shuffle=False? Because we are about to extract features, and we need the outputs to line up exactly with our lists of file paths. We will shuffle later, during training.

Further reading:
torchvision.transforms documentation - the complete toolbox
CLAHE explained (OpenCV tutorial)
PyTorch Dataset & DataLoader

Part 3 — Feature Extraction: Turning Images into Meaningful Numbers

3.1 — Why raw pixels make a poor representation

A 224×224 RGB image contains 150,528 values (224 × 224 × 3 channels). Most of them are noise: small illumination differences, sensor artifacts, compression artifacts.
Worse: two photos of the same scratch, taken from slightly different angles or under different lighting, have completely different pixel values. If we compared or clustered on raw pixels, visually similar images could end up far apart.

What we need is a representation that captures what an image means ("this shows a scratch," "this is a clean surface") in a compact, stable numerical form. We need embeddings.

3.2 — Why a pretrained model, and why we don't train from scratch

Training a deep neural network from scratch takes a lot of data, typically millions of examples. We have 200 labeled images. If we trained ResNet50's roughly 25 million parameters on 200 images, the model would memorize every training image perfectly and fail completely on new ones. This is called overfitting.

Instead, we use a model pretrained on ImageNet, a dataset of 14 million images across 1,000 categories (dogs, cats, cars, buildings, and so on). That model has already learned to detect general-purpose visual features: edges, textures, shapes, color gradients, geometric patterns. These features are universal; they work on steel plates just as well as on cats.

Think of it as hiring an experienced inspector for your factory. They have never seen your stainless steel plates before, but they already know how to look: they spot irregularities, notice texture changes against a background, see deviations from a pattern. You only need to teach them what counts as a "defect" in your particular product.

3.3 — Loading ResNet50

Let's load the pretrained model:

import torchvision.models as models

resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

This single line downloads (on first use) a ResNet50 trained on ImageNet. The model already knows how to classify 1,000 everyday categories.
3.4 — Freezing the parameters

We use ResNet purely as a feature extractor, so we freeze it:

for param in resnet.parameters():
    param.requires_grad = False

requires_grad = False tells PyTorch "don't compute gradients for these parameters." That has two benefits: it prevents accidental modification of the pretrained weights, and it makes everything faster (no gradient tracking means less computation and less memory).

3.5 — Removing the classification head

ResNet50's architecture looks like this:

Input image (224×224×3)
        ↓
[Convolutional layers] — learn visual features
        ↓
[Average Pooling] — compress spatial dimensions → 2048-dim vector
        ↓
[Fully Connected layer] — classify into 1000 ImageNet categories
        ↓
Output (1000 probabilities)

We want the 2048-dimensional vector from the Average Pooling layer; that is our embedding. We do not want the final Fully Connected layer, because it maps to ImageNet's 1,000 classes (dog, cat, airplane, ...) and is useless for our task.

feature_extractor = torch.nn.Sequential(*list(resnet.children())[:-1])

What this line does: resnet.children() lists all of ResNet's layers in order. [:-1] keeps everything except the last one (the FC layer). Sequential(*...) assembles the remaining layers into a new model.

feature_extractor.eval()

eval() mode disables dropout and makes batch normalization use its stored running statistics. This makes the model deterministic: the same image always produces the same embedding.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
feature_extractor = feature_extractor.to(device)
print(f"Feature extractor ready on {device}")

Using a GPU (when available) speeds up feature extraction by roughly 10x.

3.6 — The extraction function

Now we write a function that feeds batches of images through the feature extractor and collects the embeddings. We will build it piece by piece.
The outer scaffolding:

def extract_embeddings(dataloader, model, device):
    """
    Feed all images through the model and collect embeddings.

    Returns:
        embeddings: numpy array, shape (n_images, 2048)
        labels: numpy array, shape (n_images,) — -1 if unlabeled
    """
    all_embeddings = []
    all_labels = []

We accumulate results in lists because we process images in batches (32 at a time), not all at once (which would not fit in GPU memory).

The main loop:

    with torch.no_grad():
        for batch_images, batch_labels in dataloader:
            batch_images = batch_images.to(device)
            features = model(batch_images)

torch.no_grad() disables gradient tracking, which matters because we are only running inference, not training. It sharply reduces memory use per batch and speeds things up. batch_images.to(device) moves the images to the GPU if we have one.

The output of model(batch_images) has shape (batch_size, 2048, 1, 1); the two trailing dimensions are spatial leftovers from average pooling. We need to squeeze them away:

            features = features.squeeze(-1).squeeze(-1)
            # Now shape is (batch_size, 2048) — that's our embedding

Finally, we move the results back to the CPU (numpy cannot work with GPU tensors) and collect them:

            all_embeddings.append(features.cpu().numpy())
            all_labels.append(batch_labels.numpy())

    return np.concatenate(all_embeddings), np.concatenate(all_labels)

np.concatenate joins all the per-batch arrays into a single array.

3.7 — Running the extraction

print("Extracting embeddings for labeled images...")
labeled_embeddings, labeled_labels_arr = extract_embeddings(
    labeled_loader, feature_extractor, device
)
print(f"  Shape: {labeled_embeddings.shape}")  # Expected: (200, 2048)

200 images, each now represented by a 2048-dimensional vector. That is a 73x compression compared to raw pixels (150,528 → 2,048), and the compressed representation is far more meaningful.
And the same for the unlabeled images:

print("Extracting embeddings for unlabeled images...")
unlabeled_embeddings, _ = extract_embeddings(
    unlabeled_loader, feature_extractor, device
)
print(f"  Shape: {unlabeled_embeddings.shape}")  # Expected: (9800, 2048)

We discard the labels (they are all -1) with the _ convention.

3.8 — Saving the embeddings (don't recompute them every time!)

Feature extraction is the most expensive step in this pipeline; for 10,000 images it can take on the order of half an hour, depending on hardware. Save the results so you never have to redo it:

np.save("data/labeled_embeddings.npy", labeled_embeddings)
np.save("data/labeled_labels.npy", labeled_labels_arr)
np.save("data/unlabeled_embeddings.npy", unlabeled_embeddings)
print("Embeddings saved to disk")

Later, you can reload them instantly with np.load("data/labeled_embeddings.npy").

3.9 — Sanity check: are the embeddings healthy?

Before moving on, a quick check. Garbage in, garbage out; let's verify that our embeddings look sane:

print(f"Embedding statistics:")
print(f"  Mean: {labeled_embeddings.mean():.4f}")
print(f"  Std: {labeled_embeddings.std():.4f}")
print(f"  Min: {labeled_embeddings.min():.4f}")
print(f"  Max: {labeled_embeddings.max():.4f}")
print(f"  NaN: {np.isnan(labeled_embeddings).any()}")
print(f"  Inf: {np.isinf(labeled_embeddings).any()}")

What to expect: a mean around 0.3-0.5, a std around 0.5-1.0, no NaN, no Inf. If you see NaNs, a corrupted image probably slipped through cleaning. If the minimum is exactly 0, that is normal: these features come out of a ReLU activation, which clips all negative values to zero.
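The save/reload pattern from section 3.8 generalizes to a small cache helper that recomputes only when the file is missing (a sketch; `load_or_compute` and `fake_extract` are illustrative names, with `fake_extract` standing in for extract_embeddings):

```python
import tempfile
from pathlib import Path

import numpy as np

def load_or_compute(cache_path, compute_fn):
    """Load a cached numpy array if it exists, else compute and cache it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        return np.load(cache_path)
    result = compute_fn()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    np.save(cache_path, result)
    return result

# Toy demonstration: the second call hits the cache, not the function
calls = []
def fake_extract():
    calls.append(1)
    return np.arange(6, dtype=np.float32).reshape(2, 3)

cache_file = Path(tempfile.mkdtemp()) / "demo_embeddings.npy"
a = load_or_compute(cache_file, fake_extract)
b = load_or_compute(cache_file, fake_extract)
print(len(calls))  # → 1: compute ran once; the second call loaded from disk
```

Delete the .npy files whenever the preprocessing or the model changes, so stale embeddings never leak into later experiments.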
Further reading:
Transfer Learning (PyTorch tutorial)
The ResNet paper (He et al., 2015) - the architecture we rely on
Feature extraction vs. fine-tuning - Stanford CS231n notes

Part 4 — Unsupervised Clustering: Discovering Structure Without Labels

4.1 — What clustering does, and why we need it

We now have 10,000 embeddings: numerical fingerprints, 2,048 numbers each, floating in a high-dimensional space. Somewhere in that space there are groupings that correspond to "normal" and "defect."

The underlying assumption: if our embeddings are good (and ResNet50 embeddings usually are), images of the same kind sit close together in embedding space. Clean surfaces cluster together; defective surfaces cluster together. Even without a single label, the structure is there, and clustering makes it visible.

But first, a practical problem: 2,048 dimensions are unwieldy, and some algorithms struggle at that dimensionality. We will reduce the dimensionality first.

4.2 — Standardize the embeddings

Clustering algorithms, K-Means in particular, compute distances between points. If one dimension ranges from 0 to 1000 and another from 0 to 0.01, the first dominates every distance completely; the second might as well not exist. Standardization (mean=0, std=1 per dimension) puts all features on an equal footing.
from sklearn.preprocessing import StandardScaler

# Combine labeled + unlabeled for joint standardization
all_embeddings = np.concatenate([labeled_embeddings, unlabeled_embeddings], axis=0)

scaler = StandardScaler()
all_embeddings_scaled = scaler.fit_transform(all_embeddings)

Why fit the scaler on labeled and unlabeled data together? Because they come from the same distribution (same factory, same cameras). Fitting one scaler on everything guarantees that both sets are scaled consistently.

# Split back — we'll need them separate later
labeled_scaled = all_embeddings_scaled[:len(labeled_embeddings)]
unlabeled_scaled = all_embeddings_scaled[len(labeled_embeddings):]

print(f"After standardization: mean={all_embeddings_scaled.mean():.4f}, "
      f"std={all_embeddings_scaled.std():.4f}")

4.3 — Reducing dimensionality with PCA

PCA (Principal Component Analysis) finds the directions of highest variance in the data and projects onto them. We will go from 2,048 down to 50 dimensions:

from sklearn.decomposition import PCA

pca = PCA(n_components=50, random_state=42)
all_pca = pca.fit_transform(all_embeddings_scaled)

print(f"PCA: 2048 → 50 dimensions")
print(f"Variance retained: {pca.explained_variance_ratio_.sum():.1%}")

Retaining ~95% of the variance means we throw away only about 5% of the information while shrinking the dimensionality roughly 40x. That makes t-SNE and DBSCAN dramatically faster and more stable.
We can also visualize how much each additional component contributes:

```python
plt.figure(figsize=(10, 4))
plt.plot(np.cumsum(pca.explained_variance_ratio_), marker="o", markersize=3)
plt.xlabel("Number of PCA components")
plt.ylabel("Cumulative explained variance")
plt.title("PCA: How many components do we need?")
plt.axhline(y=0.95, color="r", linestyle="--", label="95% threshold")
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig("outputs/pca_variance.png", dpi=150)
plt.show()
```

This "elbow" plot shows where adding more components stops paying off. If the curve flattens well before 50 components, you could get away with fewer.

4.4 — Visualize with t-SNE

t-SNE is a nonlinear dimensionality reduction technique built specifically for visualization. Images that are close in the high-dimensional space end up close in the 2D plot: it preserves local structure. That makes it ideal for checking whether normal and defective images separate naturally.

One crucial caveat: t-SNE distorts global distances, so the spacing between clusters means nothing. Use it for visualization only, and never cluster on t-SNE output; cluster on the original (or PCA-reduced) embeddings instead.

```python
from sklearn.manifold import TSNE

# Apply t-SNE on PCA output (faster and more stable than on raw 2048-dim)
tsne = TSNE(n_components=2, random_state=42, perplexity=30, n_iter=1000)
all_tsne = tsne.fit_transform(all_pca)
```

The perplexity parameter roughly controls the "neighborhood size" t-SNE considers, i.e. how many neighbors each point pays attention to. The default of 30 suits datasets of our size.
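To get a feel for the API on something small, here is t-SNE on a tiny synthetic set (both the data and the deliberately small perplexity are assumptions of this sketch, not values from our pipeline):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(42)
# Two well-separated synthetic "clusters", 30 points each, in 10 dimensions
X = np.vstack([rng.normal(0, 1, size=(30, 10)),
               rng.normal(8, 1, size=(30, 10))])

# perplexity must stay well below the number of samples
emb = TSNE(n_components=2, perplexity=5, init="random",
           random_state=42).fit_transform(X)

print(emb.shape)  # one 2-D point per input sample
```

On data this small you can eyeball the result directly; on 10,000 embeddings the same call just takes longer.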
Now we split the t-SNE output back apart:

```python
labeled_tsne = all_tsne[:len(labeled_embeddings)]
unlabeled_tsne = all_tsne[len(labeled_embeddings):]
```

And visualize:

```python
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Left plot: labeled images only, colored by true label
for cls_idx, cls_name, color in [(0, "Normal", "#2ecc71"), (1, "Defect", "#e74c3c")]:
    mask = labeled_labels_arr == cls_idx
    axes[0].scatter(labeled_tsne[mask, 0], labeled_tsne[mask, 1],
                    label=cls_name, alpha=0.7, s=40, c=color)
axes[0].set_title("t-SNE: Labeled Images (True Labels)")
axes[0].legend()
```

If you see two distinct blobs in this plot, green on one side and red on the other, that is an excellent sign. It means the ResNet50 embeddings already separate normal surfaces from defective ones.

```python
# Right plot: all images (unlabeled in gray, labeled overlaid)
axes[1].scatter(unlabeled_tsne[:, 0], unlabeled_tsne[:, 1],
                c="lightgray", alpha=0.2, s=10, label="Unlabeled")
for cls_idx, cls_name, color in [(0, "Normal", "#2ecc71"), (1, "Defect", "#e74c3c")]:
    mask = labeled_labels_arr == cls_idx
    axes[1].scatter(labeled_tsne[mask, 0], labeled_tsne[mask, 1],
                    label=f"Labeled: {cls_name}", alpha=0.8, s=40, c=color)
axes[1].set_title("t-SNE: All Images (Labeled in Color)")
axes[1].legend()

plt.tight_layout()
plt.savefig("outputs/tsne_visualization.png", dpi=150)
plt.show()
```

In the right plot, the gray cloud (unlabeled images) should overlap the colored points. That confirms the labeled and unlabeled images come from the same distribution, a necessary condition for semi-supervised learning to work.

4.5 — K-Means clustering

K-Means is the most widely used clustering algorithm. It partitions the data into k groups by repeatedly assigning every point to the nearest cluster center and then recomputing the centers. Since we know there are two categories (normal and defect), we start with k=2.
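The assign-then-update cycle just described can be written out by hand. A toy numpy sketch (not the scikit-learn implementation we actually use, and the 1-D data is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two 1-D groups, around 0 and around 5
X = np.concatenate([rng.normal(0, 0.5, 50), rng.normal(5, 0.5, 50)]).reshape(-1, 1)

# Start from two arbitrary centers
centers = np.array([[0.5], [1.0]])
for _ in range(10):
    # Assignment step: nearest center for every point
    d = np.abs(X - centers.T)      # (100, 2) distance matrix
    assign = d.argmin(axis=1)
    # Update step: each center moves to the mean of its assigned points
    centers = np.array([X[assign == k].mean(axis=0) for k in range(2)])

print(centers.ravel())  # converges near the true group means, 0 and 5
```

scikit-learn's KMeans does exactly this (with smarter initialization and several restarts), which is why n_init matters below.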
But we also try k=3, 4, and 5 to see whether the data hides finer structure (for example, distinct defect types).

To measure how well clusters match the true labels, we use the ARI (Adjusted Rand Index). ARI = 1.0 means the clustering agrees perfectly with the true labels, ARI ≈ 0.0 means it is no better than random assignment, and ARI < 0 means it is worse than random.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score

print("K-Means Clustering:")
print(f"  {'k':<5s} {'ARI':>8s} {'Silhouette':>12s}")
print(f"  {'-'*27}")

for k in [2, 3, 4, 5]:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    all_clusters = kmeans.fit_predict(all_embeddings_scaled)

    # ARI: compare clusters vs true labels (on labeled images only)
    labeled_clusters = all_clusters[:len(labeled_embeddings)]
    ari = adjusted_rand_score(labeled_labels_arr, labeled_clusters)

    # Silhouette: internal quality measure (no labels needed)
    # How well-separated are the clusters? Range: -1 to +1
    sil = silhouette_score(all_embeddings_scaled, all_clusters)

    print(f"  {k:<5d} {ari:>8.4f} {sil:>12.4f}")
```

The n_init=10 parameter makes K-Means run 10 times from different random initializations and keep the best result, which guards against getting stuck in a bad local minimum.

If k=2 gives the highest ARI, it confirms that our data has two dominant groups that line up with normal vs. defect.

4.6 — DBSCAN clustering (the density-based alternative)

DBSCAN works very differently from K-Means. Instead of fixing the number of clusters, you set two parameters:

eps (epsilon): the maximum distance between two points for them to count as neighbors. Think of it as "how close is close enough?"

min_samples: the minimum number of neighbors a point needs to seed a dense region (a cluster).
Think of it as "how dense must a region be before it counts as a cluster?"

DBSCAN discovers the number of clusters automatically and flags outliers (points that don't belong to any cluster, labeled -1). That can be genuinely useful here: the outliers may be rare defect types or anomalies worth a human look.

```python
from sklearn.cluster import DBSCAN

print("\nDBSCAN Clustering:")
print(f"  {'eps':<6s} {'min_s':<7s} {'clusters':>9s} {'noise':>7s} {'ARI':>8s}")
print(f"  {'-'*40}")
```

We sweep several parameter combinations to see which settings group the data best:

```python
for eps in [3.0, 5.0, 7.0, 10.0]:
    for min_samples in [5, 10, 20]:
        dbscan = DBSCAN(eps=eps, min_samples=min_samples)
        db_clusters = dbscan.fit_predict(all_pca)  # Use PCA-reduced data

        n_clusters = len(set(db_clusters)) - (1 if -1 in db_clusters else 0)
        n_noise = (db_clusters == -1).sum()

        if n_clusters >= 2:
            labeled_db = db_clusters[:len(labeled_embeddings)]
            mask = labeled_db != -1  # Exclude noise points from ARI
            if mask.sum() > 10:
                ari = adjusted_rand_score(labeled_labels_arr[mask], labeled_db[mask])
                print(f"  {eps:<6.1f} {min_samples:<7d} {n_clusters:>9d} "
                      f"{n_noise:>7d} {ari:>8.4f}")
```

Note that we run DBSCAN on the PCA-reduced data (all_pca, 50 dims) rather than on the full 2048 dimensions. DBSCAN suffers in very high dimensions because all pairwise distances start to look alike (the "curse of dimensionality").

Compare the best DBSCAN ARI against the best K-Means ARI, and pick the winner.
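The eps values above are a hand-picked grid. A common way to narrow the search, not shown in the pipeline itself, is the k-distance plot: sort every point's distance to its min_samples-th nearest neighbor and look for the knee; eps is usually chosen near that bend. A sketch on synthetic data (the blob parameters and the percentile shortcut are arbitrary assumptions):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)
min_samples = 10

# Distance from each point to its min_samples-th nearest neighbor
# (the query points are in the training set, so the first neighbor is
# the point itself at distance 0, consistent with DBSCAN's counting)
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
distances, _ = nn.kneighbors(X)
k_distances = np.sort(distances[:, -1])

# Normally you would plot k_distances and eyeball the knee; as a crude
# stand-in, a high percentile often lands near a workable eps.
eps_guess = np.percentile(k_distances, 90)
print(f"suggested eps ≈ {eps_guess:.2f}")
```

Points left of the knee sit inside dense regions; points right of it are candidates for DBSCAN's -1 noise label.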
4.7 — Visualize the clusters on the t-SNE plot

Let's see how the best clustering looks on our t-SNE visualization:

```python
best_kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
all_cluster_ids = best_kmeans.fit_predict(all_embeddings_scaled)

fig, ax = plt.subplots(figsize=(8, 6))
scatter = ax.scatter(all_tsne[:, 0], all_tsne[:, 1],
                     c=all_cluster_ids, cmap="coolwarm", alpha=0.4, s=15)
ax.set_title("K-Means Clusters (k=2) on t-SNE")
plt.colorbar(scatter, label="Cluster ID")
plt.tight_layout()
plt.savefig("outputs/kmeans_clusters_tsne.png", dpi=150)
plt.show()
```

If the two colors in this plot line up with the two blobs you saw in the labeled t-SNE plot, the clustering is working.

4.8 — Generating pseudo-labels for the unlabeled images

Now the key step: we take the cluster assignments and treat them as "weak labels" for the unlabeled images. There is one catch. K-Means assigns cluster IDs arbitrarily: cluster 0 could be "defect" or it could be "normal". We have to find out.

Extract the pseudo-labels:

```python
unlabeled_pseudo_labels = all_cluster_ids[len(labeled_embeddings):]
```

Align them with the real labels:

```python
labeled_cluster_ids = all_cluster_ids[:len(labeled_embeddings)]

# What fraction of labeled images in cluster 0 are actually "normal"?
```
```python
cluster_0_normal_rate = (labeled_labels_arr[labeled_cluster_ids == 0] == 0).mean()
cluster_1_normal_rate = (labeled_labels_arr[labeled_cluster_ids == 1] == 0).mean()

print(f"Cluster 0: {cluster_0_normal_rate:.1%} of labeled images are 'normal'")
print(f"Cluster 1: {cluster_1_normal_rate:.1%} of labeled images are 'normal'")
```

If cluster 0 turns out to be mostly defects (normal_rate < 50%), we flip the mapping:

```python
if cluster_0_normal_rate < 0.5:
    unlabeled_pseudo_labels = 1 - unlabeled_pseudo_labels
    print("Cluster IDs flipped to match convention (0=normal, 1=defect)")
```

And check the distribution:

```python
print(f"\nPseudo-label distribution:")
print(f"  Normal (0): {(unlabeled_pseudo_labels == 0).sum()} images")
print(f"  Defect (1): {(unlabeled_pseudo_labels == 1).sum()} images")
```

We now have two datasets of very different quality:

Strongly labeled: 200 images with trusted expert labels. High quality, small quantity.

Weakly labeled: 9,800 images with cluster-based pseudo-labels. Lower quality (some labels are wrong), but large quantity.

The golden rule: never mix these two. They play different roles in the next step.

Further reading:
K-Means (scikit-learn)
DBSCAN (scikit-learn)
How to Use t-SNE Effectively (Distill), essential reading

Part 5 — Semi-Supervised Training: The Core Experiment

5.1 — The logic behind our two-phase approach

Imagine training a new quality inspector at the factory.

Phase 1 (pre-training on pseudo-labels): you show them 9,800 photos sorted into rough piles and say, "These piles were put together automatically. The groupings are mostly right, but not 100% reliable." The inspector starts building a mental model.
Some of that mental model will be slightly off because the sorting wasn't perfect, but the broad picture, what normal surfaces look like and what defects look like, is real. After this phase, the inspector recognizes products reasonably well.

Phase 2 (fine-tuning on real labels): next you show them 200 photos labeled definitively by an expert: "These are DEFINITELY normal, and these are DEFINITELY defective." The inspector refines the mental model from phase 1, correcting its mistakes and sharpening judgment on the edge cases.

The hypothesis: an inspector who has seen 10,000 photos (building broad intuition) plus 200 verified examples (for calibration) will beat one who only ever saw the 200 verified examples.

To test this, we run two parallel experiments:

Experiment A, supervised only: train on the 200 labeled images alone.

Experiment B, semi-supervised: pre-train on the 9,800 pseudo-labeled images, then fine-tune on the 200 labeled images.

Same model architecture, same test set. The only difference is whether the model ever sees the pseudo-labeled data.

5.2 — Building the classifier: architecture

We use ResNet50 as the backbone again, but this time we replace the final layer with a binary classification head and train it (unlike Part 3, where we only extracted features).

```python
import torch.nn as nn

class DefectClassifier(nn.Module):
    """
    Binary classifier: Normal (0) vs Defect (1).
    Based on ResNet50 with a custom classification head.
    """
```
```python
    def __init__(self, dropout_rate=0.5):
        super().__init__()
        self.backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        num_features = self.backbone.fc.in_features  # 2048
```

We replace the original ImageNet head (2048 → 1000 classes) with our own:

```python
        self.backbone.fc = nn.Sequential(
            nn.Dropout(p=dropout_rate),   # Anti-overfitting
            nn.Linear(num_features, 1),   # Binary output
        )

    def forward(self, x):
        return self.backbone(x)
```

Why Dropout(0.5)? With only 200 labeled images and roughly 25 million parameters, overfitting is a very real danger. Dropout randomly deactivates 50% of the neurons on every training step, forcing the network to learn redundant representations. At inference time, all neurons are active again. It is one of the most effective regularization techniques for small datasets.

Why Linear(2048, 1)? For binary classification, a single output neuron with a sigmoid activation is mathematically equivalent to two neurons with softmax, but simpler and slightly more numerically stable.

5.3 — The loss function: handling class imbalance

Before writing the training loop, we need a loss function. We use BCEWithLogitsLoss (Binary Cross-Entropy with Logits), which fuses the sigmoid activation and the binary cross-entropy into a single, numerically stable operation.

The key ingredient is pos_weight:

```python
pos_weight = torch.tensor([4.0]).to(device)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```

pos_weight=4.0 tells the loss function: "a missed defect (a false negative) should cost 4 times more than a false alarm." This counters the class imbalance. Without it, the model could hit 80% accuracy by always predicting "normal", which is useless.

The value 4.0 is a rough estimate based on the class ratio: with 80% normal and 20% defect, the ratio suggests 4.0. You can tune this weight, but it is a sensible starting point.
In our case: pos_weight = 80/20 = 4.0.

5.4 — The optimizer: AdamW with weight decay

```python
import torch.optim as optim

optimizer = optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-4)
```

Why AdamW? It is Adam with decoupled weight decay (L2 regularization). weight_decay=1e-4 gently penalizes large weights, adding another layer of protection against overfitting. Think of it as telling the model to "prefer simpler explanations."

5.5 — The learning-rate scheduler

```python
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=3, factor=0.5)
```

This automatically reduces the learning rate when the validation loss stops improving. patience=3 means "wait 3 epochs without improvement before reducing"; factor=0.5 means "halve the learning rate." This matters near convergence: as the model closes in on an optimum, smaller steps are needed to settle into it.

5.6 — The training loop: one step at a time

Now we build the full training function. We will walk through every part of the loop in detail.
The training phase (one epoch = one full pass through the training data):

```python
from sklearn.metrics import f1_score

def train_model(model, train_loader, val_loader, epochs, lr, device, phase_name=""):
    """Train the model and track validation F1 score."""
    pos_weight = torch.tensor([4.0]).to(device)
    criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    optimizer = optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-4)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=3, factor=0.5)

    best_f1 = 0

    for epoch in range(epochs):
        # ---- TRAINING ----
        model.train()  # Enable dropout, update batch norm stats
        for images, labels in train_loader:
            images = images.to(device)
            labels = labels.float().unsqueeze(1).to(device)
            # .float() because BCEWithLogitsLoss expects float targets
            # .unsqueeze(1) adds a dimension: shape (batch,) → (batch, 1)

            optimizer.zero_grad()              # Reset gradients from previous batch
            outputs = model(images)            # Forward pass
            loss = criterion(outputs, labels)  # Compute loss
            loss.backward()                    # Compute gradients (backpropagation)
            optimizer.step()                   # Update weights
```

Every batch goes through the classic cycle: forward pass → compute loss → backpropagate gradients → update weights → reset gradients for the next batch.

The validation phase (runs after every epoch):

```python
        # ---- VALIDATION ----
        model.eval()  # Disable dropout, use fixed batch norm stats
        all_preds, all_true = [], []
        val_loss_total = 0

        with torch.no_grad():  # No gradients needed for evaluation
            for images, labels in val_loader:
                images = images.to(device)
                outputs = model(images)

                # Compute validation loss
                val_loss_total += criterion(
                    outputs, labels.float().unsqueeze(1).to(device)
                ).item()

                # Convert raw logits → binary predictions
                # sigmoid maps logits to [0, 1], then threshold at 0.5
                preds = (torch.sigmoid(outputs) >= 0.5).int().cpu().numpy().flatten()
                all_preds.extend(preds)
                all_true.extend(labels.numpy())
```

Note the call to model.eval().
It matters: model.eval() disables dropout (all neurons active) and makes batch normalization use its running statistics instead of per-batch statistics. Without it, your validation metrics would be noisy and unreliable.

Track metrics and update the scheduler:

```python
        val_f1 = f1_score(all_true, all_preds, average="binary")
        scheduler.step(val_loss_total)  # Reduce LR if loss plateaued

        if val_f1 > best_f1:
            best_f1 = val_f1

        if (epoch + 1) % 5 == 0:
            print(f"  [{phase_name}] Epoch {epoch+1}/{epochs}: val_f1={val_f1:.4f}")

    print(f"  [{phase_name}] Best F1: {best_f1:.4f}")
    return best_f1
```

We track the best F1 across all epochs, not just the final one. Models often peak before training ends (later epochs can overfit).

5.7 — Preparing the data splits

We split the labeled data into train (70%) and test (30%). The test set is sacred: it is never used for training in either experiment. Stratification keeps the class ratio identical in both splits.

```python
from sklearn.model_selection import train_test_split

labeled_train_idx, labeled_test_idx = train_test_split(
    range(len(labeled_paths)),
    test_size=0.3,
    random_state=42,
    stratify=labeled_labels,
)
```

We create the three datasets we need:

```python
# 1. Labeled training data (for supervised training + fine-tuning)
train_labeled_ds = MetalSurfaceDataset(
    [labeled_paths[i] for i in labeled_train_idx],
    [labeled_labels[i] for i in labeled_train_idx],
    preprocessing,
)

# 2. Test data (for evaluation only — NEVER used for training)
test_ds = MetalSurfaceDataset(
    [labeled_paths[i] for i in labeled_test_idx],
    [labeled_labels[i] for i in labeled_test_idx],
    preprocessing,
)
```
```python
# 3. Weakly labeled data (pseudo-labels from clustering)
weakly_labeled_ds = MetalSurfaceDataset(
    unlabeled_paths,
    unlabeled_pseudo_labels.tolist(),
    preprocessing,
)
```

And the DataLoaders:

```python
train_labeled_loader = DataLoader(train_labeled_ds, batch_size=16, shuffle=True)
test_loader = DataLoader(test_ds, batch_size=16, shuffle=False)
weakly_labeled_loader = DataLoader(weakly_labeled_ds, batch_size=32, shuffle=True)

print(f"Train (labeled): {len(train_labeled_ds)} images")
print(f"Test: {len(test_ds)} images")
print(f"Weakly labeled: {len(weakly_labeled_ds)} images")
```

Note the different batch sizes: 16 for the small labeled set (more weight updates per epoch), 32 for the large weakly labeled set (faster throughput). shuffle=True on the training loaders stops the model from learning the order of the samples.

5.8 — Experiment A: supervised only (the baseline)

This is the simple experiment. We train a fresh model using ONLY the 140 labeled training images (the other 60 are held out for testing). This is the performance we would get without any semi-supervised learning.

```python
print("=" * 60)
print("EXPERIMENT A: SUPERVISED ONLY (140 labeled images)")
print("=" * 60)

model_supervised = DefectClassifier(dropout_rate=0.5).to(device)
f1_supervised = train_model(
    model_supervised, train_labeled_loader, test_loader,
    epochs=30, lr=1e-4, device=device, phase_name="Supervised"
)
```

5.9 — Experiment B: semi-supervised (the two-phase approach)

Now the full pipeline. Phase 1 gives the model broad intuition from the 9,800 pseudo-labeled images. Phase 2 refines it on the 140 real labels.
Phase 1, pre-training on the weakly labeled data:

```python
print("\n" + "=" * 60)
print("EXPERIMENT B: SEMI-SUPERVISED")
print("=" * 60)

model_semi = DefectClassifier(dropout_rate=0.5).to(device)

print("\nPhase 1: Pre-train on pseudo-labeled data (9,800 images)...")
train_model(
    model_semi, weakly_labeled_loader, test_loader,
    epochs=10, lr=1e-4, device=device, phase_name="Pre-train"
)
```

We train for only 10 epochs here because the pseudo-labels are noisy. Training too long on noisy labels would bake their errors into the model.

Phase 2, fine-tuning on the strongly labeled data:

```python
print("\nPhase 2: Fine-tune on real labeled data (140 images)...")
f1_semi = train_model(
    model_semi, train_labeled_loader, test_loader,
    epochs=20, lr=5e-5, device=device, phase_name="Fine-tune"
)
```

Note the lower learning rate (5e-5 vs. 1e-4 in phase 1). This is crucial. With a high learning rate during fine-tuning, the model would quickly "forget" everything it absorbed during pre-training: the gradients would be large and would overwrite the pre-trained weights. The lower rate lets the model make small adjustments to its existing knowledge, preserving the broad intuition from phase 1 while correcting the details with trusted labels.

It is the factory inspector again: in phase 2 you don't restart their training from scratch; you gently correct their mistakes while keeping their overall intuition intact.

5.10 — Final evaluation: the moment of truth

Now we compare the two models on the same test set, with metrics that actually matter. First, the evaluation function:

```python
from sklearn.metrics import roc_auc_score, classification_report

def full_evaluation(model, test_loader, device, name):
    """
    Evaluate on the test set.
    Returns F1 score and AUC-ROC.
    """
```
```python
    model.eval()
    all_preds, all_probs, all_true = [], [], []

    with torch.no_grad():
        for images, labels in test_loader:
            outputs = model(images.to(device))
            probs = torch.sigmoid(outputs).cpu().numpy().flatten()
            all_probs.extend(probs)
            all_preds.extend((probs >= 0.5).astype(int))
            all_true.extend(labels.numpy())
```

We collect both probabilities (for AUC-ROC, which measures ranking quality) and binary predictions (for F1, which measures classification quality at the 0.5 threshold):

```python
    f1 = f1_score(all_true, all_preds, average="binary")
    auc = roc_auc_score(all_true, all_probs)

    print(f"\n{name}:")
    print(f"  F1 Score: {f1:.4f}")
    print(f"  AUC-ROC: {auc:.4f}")
    print(classification_report(
        all_true, all_preds, target_names=["Normal", "Defect"]
    ))

    return f1, auc
```

Why F1 and not accuracy? Because with imbalanced classes, accuracy misleads. A model that always predicts "normal" scores 80% accuracy while catching 0% of the defects. F1 is the harmonic mean of precision and recall, so it rewards models that actually handle the minority class.

Why AUC-ROC? It measures how well the model ranks images (do defective images get higher scores than normal ones?), independent of any particular threshold. An AUC of 1.0 means perfect ranking; 0.5 means random.
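The accuracy trap described above is easy to reproduce with toy arrays (an assumed 80/20 imbalance, not our actual test set):

```python
from sklearn.metrics import accuracy_score, f1_score

y_true = [0] * 80 + [1] * 20   # 80 normal, 20 defect
y_pred = [0] * 100             # a "model" that always predicts normal

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, zero_division=0)

print(acc)  # looks respectable
print(f1)   # exposes that no defect is ever caught
```

The same predictions score 80% on accuracy and 0 on F1, which is exactly why F1 is the headline metric here.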
Run the evaluation for both models:

```python
f1_sup, auc_sup = full_evaluation(
    model_supervised, test_loader, device, "SUPERVISED ONLY"
)
f1_semi, auc_semi = full_evaluation(
    model_semi, test_loader, device, "SEMI-SUPERVISED"
)
```

And the final comparison:

```python
print("=" * 60)
print("FINAL COMPARISON")
print("=" * 60)
print(f"  {'Metric':<12s} {'Supervised':>12s} {'Semi-supervised':>16s} {'Delta':>8s}")
print(f"  {'-'*50}")
print(f"  {'F1':<12s} {f1_sup:>12.4f} {f1_semi:>16.4f} {f1_semi - f1_sup:>+8.4f}")
print(f"  {'AUC-ROC':<12s} {auc_sup:>12.4f} {auc_semi:>16.4f} {auc_semi - auc_sup:>+8.4f}")
```

If the Delta column shows positive numbers, the unlabeled data genuinely helped: the pseudo-labels, imperfect as they are, gave the model a head start that supervised training on 200 images alone could not match.

5.11 — Interpreting the results

Here is how to read the outcome:

F1 improves by 0.05 or more: a clear win for semi-supervised learning. The unlabeled data carried real signal.

F1 improves by 0.01 to 0.04: a modest gain. Semi-supervised learning helped, but the margin is small. Consider improving the clustering quality or making the pseudo-labeling more conservative.

F1 flat or worse: the pseudo-labels were too noisy to help, or the clustering failed to find meaningful structure. Try different feature extractors, different clustering algorithms, or stricter confidence thresholds for the pseudo-labels.
Further reading:
The pseudo-labeling paper (Lee, 2013), the original method
Semi-Supervised Learning Survey (van Engelen & Hoos), a comprehensive overview of the field
PyTorch training loop best practices

Part 6 — Scaling to millions of images: a realistic roadmap

The business question: "Your proof of concept works on 10,000 images. We have 4 million images in production. Can we run this pipeline on a €5,000 budget?"

This is exactly the question you will face on real projects. Let's work through it honestly.

Compute costs

Feature extraction is the bottleneck. For our 10,000 images on a single GPU it takes about 30 minutes. Scaling linearly:

4,000,000 images ÷ 10,000 images × 30 min = 12,000 min = 200 GPU-hours

At roughly €2/hour for a cloud GPU instance (an Azure NC-series machine with a T4 or A10 GPU), that is €400.

Clustering: standard K-Means keeps the whole dataset in memory to compute distances. With 4M embeddings of 2048 dimensions (4-byte floats each):

4,000,000 × 2,048 × 4 bytes = ~32 GB just for the embeddings

That does not fit in RAM on most machines. The solution: MiniBatchKMeans from scikit-learn, which processes the data in batches of (say) 10,000 at a time. Nearly identical results, a fraction of the memory.

CNN training: pre-training on 4M pseudo-labeled images takes about 50 GPU-hours, roughly €100.

Storage costs

Raw images: 4M × ~50 KB average = 200 GB. Embeddings: 4M × 2048 × 4 bytes = 32 GB. On Azure Blob Storage at ~€0.02/GB/month, that comes to about €5/month.

Labeling strategy

If 200 labels are not enough at this scale, we can buy more. At €1 per image (including quality control), 2,000 extra labels come to €2,000. But there is a smarter way: active learning.
Active learning lets the model choose which images get labeled next. Instead of labeling 2,000 random images, the model flags the images it is most uncertain about, the ones that teach it the most. This typically needs roughly 3 to 4 times fewer labels for the same performance gain.

With active learning, we might need only 500 extra labels instead of 2,000: €500.

The business calculation:

Feature extraction (GPU):  €400
CNN training (GPU):        €100
Storage (year 1):          €60
Additional labeling:       €500 – €2,000
──────────────────────────────────────
TOTAL:                     €1,060 – €2,560

Comfortably under the €5,000 budget, with room left for experimentation and iteration.

Practical tips for scaling:

Use cloud GPUs, not local hardware. Pay by the hour, only for what you actually use.

Use MiniBatchKMeans instead of regular KMeans. Same quality, around 100x less memory.

Build a streaming data pipeline with batch processing. You cannot hold 4M images in RAM at once. Use a PyTorch DataLoader with num_workers > 0 for parallel loading.

Use active learning to maximize the value of every human label. Each label should count.

Store and version the embeddings, not just the raw images. Re-extracting embeddings for 4M images costs €400; loading stored embeddings costs almost nothing.

Further reading:
MiniBatchKMeans (scikit-learn)
Active learning overview: how to label smarter, not more

Conclusion

Semi-supervised learning is not a workaround; it is a strategy.
You exploit structure discovered in unlabeled data (through embeddings and clustering), convert it into weak supervision, and use it to give your supervised model a head start. The final model does not replace the trusted labels; it builds on top of them.

Let's recap the complete pipeline we built:

Exploration: scanned the 10,000 images for corruption, inconsistent formats, and class imbalance. We met the data on its own terms.

Preprocessing: converted every image into the format ResNet50 expects: 224×224, RGB, CLAHE-enhanced, ImageNet-normalized.

Feature extraction: used a pretrained ResNet50 to turn each image into a 2048-dimensional embedding capturing its visual essence.

Clustering: applied K-Means and DBSCAN to group the unlabeled images into clusters, then assigned pseudo-labels based on cluster membership.

Semi-supervised training: pre-trained a CNN on 9,800 pseudo-labeled images, then fine-tuned on 200 real labels, and compared against a supervised-only baseline.

Scaling analysis: estimated compute, storage, and labeling costs for 4M images, showing the whole thing fits a €5,000 budget.

The key takeaways:

A pretrained CNN can extract meaningful features from specialized domains without any additional training.

Clustering on embeddings reveals natural groupings that correlate with the real classes.

Pseudo-labels are noisy, but a model pre-trained on many noisy labels and fine-tuned on a few clean ones often beats a model trained on the clean labels alone.

And the semi-supervised approach grows more valuable as labels become scarcer and more expensive.
This pattern shows up across many domains: medical imaging, industrial quality control, satellite imagery, document classification, biodiversity monitoring. Wherever labels are expensive and unlabeled data is abundant, which in 2025 is almost everywhere.