III. METHODOLOGY OF THE STUDY
In this experimental study, the researchers imaged Artemia nauplii in saltwater samples saturated with nanoparticles, capturing images of the nauplii under a microscope. In some samples the nauplii were freely moving, while other samples were fixed before image capture.
A. Dataset
The dataset employed in this study consisted of 1000 images, of which 800 were allocated for training and 200 for validation. To save storage and computational resources, all images were converted from their original TIF format to PNG, which significantly reduced file size while keeping the images efficient to process.
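As an illustration of this preprocessing step, a minimal conversion sketch is shown below; the folder names and the use of the Pillow library are assumptions, since the paper does not specify its tooling.

```python
# Minimal sketch of the TIF-to-PNG conversion step (assumed folder layout
# and Pillow tooling; the paper only states that TIF images were converted
# to PNG).
from pathlib import Path
from PIL import Image

SRC = Path("raw_tif")   # assumed folder holding the original TIF frames
DST = Path("png")       # assumed output folder for the PNG copies
DST.mkdir(exist_ok=True)

for tif_path in SRC.glob("*.tif"):
    with Image.open(tif_path) as img:
        # PNG is a lossless format, so pixel values are preserved while
        # the files become smaller and faster to load than uncompressed TIF.
        img.save(DST / (tif_path.stem + ".png"), format="PNG")
```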
The PNG format made the dataset easier to view and manage without compromising image quality. Furthermore, to facilitate model training with YOLO and to reduce computational overhead, all images were resized from their original dimensions of 2048x2044 pixels to a standardized size of 640x640 pixels, ensuring uniformity across the dataset.
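A comparable sketch of the resize step is given below, again with assumed paths; only the 640x640 target size comes from the text.

```python
# Sketch of the 2048x2044 -> 640x640 resize used to prepare YOLO inputs
# (paths and resampling filter are assumptions).
from pathlib import Path
from PIL import Image

SRC = Path("png")             # assumed folder of converted PNG images
DST = Path("resized_640")     # assumed output folder for resized images
DST.mkdir(exist_ok=True)
TARGET = (640, 640)

for png_path in SRC.glob("*.png"):
    with Image.open(png_path) as img:
        # Note the slight aspect-ratio change when mapping 2048x2044
        # onto a 640x640 square.
        img.resize(TARGET, resample=Image.BILINEAR).save(DST / png_path.name)
```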
Resizing the images to a standardized 640x640 pixels made the nauplii smaller in pixel dimensions than in the original, larger-sized images, which affects the measured speed of the nauplii (calculated in pixels per frame). The same physical movement of a nauplius corresponds to fewer pixels in the resized images than in the originals, so the calculated speed in pixels per frame would appear artificially low. It is therefore essential to account for the scale factor when relating pixel displacements to real-world distances and speeds.
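The sketch below illustrates the scale correction implied here; the 640/2048 and 640/2044 factors follow from the stated image sizes, while the example displacement is purely hypothetical.

```python
# Back-of-the-envelope scale correction for speeds measured on the
# resized 640x640 frames (the example displacement is hypothetical).
ORIG_W, ORIG_H = 2048, 2044
NEW_W, NEW_H = 640, 640

scale_x = NEW_W / ORIG_W   # ~0.3125
scale_y = NEW_H / ORIG_H   # ~0.3131

def speed_in_original_pixels(dx_resized, dy_resized):
    """Convert a per-frame displacement measured on the resized frames
    back into the original sensor's pixel units."""
    dx = dx_resized / scale_x
    dy = dy_resized / scale_y
    return (dx ** 2 + dy ** 2) ** 0.5

# A nauplius moving 5 pixels per frame in the resized video corresponds
# to about 16 pixels per frame at the original resolution.
print(speed_in_original_pixels(5, 0))  # ~16.0
```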
The 1000 images were selected manually from a larger collection of 2250 images to ensure clarity and accurate positioning of the objects within each frame. This manual curation refined the dataset toward high-quality, representative samples, improving the reliability of the object detection models in capturing and recognizing Artemia, Cyst, and Excrement.
B. Class-instances
The training dataset consisted of a total of 1,716 annotated class instances as follows: Artemia holds the majority with 1,368 instances, making up a substantial 79.7% of the total instances. Cyst is represented by 297 instances which is 17.3% of the training set. Lastly, excrement has 51 annotated instances, approximately 3% of the training dataset.
In the validation set, we have a total of 435 class instances. Here, the class distribution follows a similar trend. Artemia is the most dominant class with 355 instances, approximately 81.6% of the validation set. Cysts are represented by 66 instances, contributing 15.2% to the validation dataset. Excrement appeared only 14 times, just 3.2% of the total validation instances. The similar allocation of instances between the training and validation datasets allowed us to comprehensively evaluate the models' performance on two datasets while maintaining a proportional representation of the three classes.
The total dataset, combining the training and validation sets, has the following class distribution: Artemia stands out as the dominant class with 1,723 instances, roughly 80.1% of the entire dataset; this majority presence reflects the ecological significance of Artemia in marine environments. Cysts account for 363 instances, about 16.9% of the dataset. Excrement is the least common class with 65 instances, approximately 3.0% of the dataset. Recognizing this class distribution is essential for our research, as it provides the basis for evaluating the object detection models' performance.
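For reference, the percentages above can be reproduced directly from the reported counts, as in the short sketch below.

```python
# Sanity check of the class-distribution percentages from the counts
# reported in this subsection (training + validation combined).
train = {"Artemia": 1368, "Cyst": 297, "Excrement": 51}
val   = {"Artemia": 355,  "Cyst": 66,  "Excrement": 14}

total = {cls: train[cls] + val[cls] for cls in train}
n = sum(total.values())  # 2151 instances overall

for cls, count in total.items():
    print(f"{cls}: {count} ({100 * count / n:.1f}%)")
# Artemia: 1723 (80.1%), Cyst: 363 (16.9%), Excrement: 65 (3.0%)
```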
C. Structural Similarity
The samples in this dataset were derived from three distinct concentrations: 50mg, 100mg, and a controlled concentration. Specifically, it includes 400 images each from the 50mg and 100mg concentrations and 200 images from the controlled concentration, ensuring that each concentration level is well represented. The importance of the Structural Similarity Index (SSIM) in this context cannot be overstated. SSIM is a metric used to measure the similarity between two images, and in our study it plays a crucial role in assessing the similarities between images in the dataset. This assessment is vital: by evaluating the SSIM, we can verify that images across the different concentrations maintain consistent quality, which is crucial for accurate analysis and comparison.
The SSIM also helps in determining how static or dynamic the samples appear in terms of structure and appearance. This is essential when studying the impact of different concentrations on the subjects. A higher SSIM would indicate more structural similarity and less dynamic change, whereas a lower SSIM might suggest significant alterations due to concentration differences. It helps in identifying any distinct structural changes that occur due to varying concentrations.
Another quantity measured is the Mean Squared Error (MSE), which captures the average squared difference between the pixel values of the two compared images. An MSE of 0 indicates perfect similarity, as when an image is compared with itself; for differing images the MSE is greater than 0, with higher values indicating greater differences between the images.
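For completeness, the standard definitions of these two metrics are as follows (these are the textbook formulas, not reproduced from the paper):

$$\mathrm{MSE}(x, y) = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( x_{ij} - y_{ij} \right)^2$$

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where $\mu$, $\sigma^2$, and $\sigma_{xy}$ denote the (local) means, variances, and covariance of the two images, and $C_1$, $C_2$ are small stabilizing constants.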
The MSE values in figure 3 represent the average of the squared differences between the pixel intensities of two images. As mentioned earlier, an MSE of 0 indicates no difference between the compared images (perfect similarity); for instance, when '50mg (1).png' is compared with itself, the MSE is 0.00, indicating a perfect match. The SSIM values, in turn, measure the similarity between two images in terms of luminance, contrast, and structure, and range from -1 (no similarity) to 1 (perfect similarity). For example, '50mg (1).png' compared with itself has an SSIM of 1.00, as expected for identical images.

Five samples from each concentration were used to measure and evaluate these values. Images with a yellow border are the reference images for their row: all MSE and SSIM values displayed beneath the other images in that row are calculated against the yellow-bordered reference. For example, in the second row, '50mg (2).png' has a yellow border, indicating it is the reference image for the comparisons in that row.

For the 50mg concentration, '50mg (1).png' has comparisons with MSE values ranging from 0.00 to 0.03 and SSIM values from 0.52 to 1.00, while '50mg (2).png' has MSE values ranging from 0.00 to 0.03 and SSIM values from 0.54 to 1.00. '50mg (3).png' has MSE values ranging from 0.00 to 0.03 and SSIM values from 0.56 to 1.00, while '50mg (4).png' has MSE values ranging from 0.00 to 0.02 and SSIM values from 0.60 to 1.00. The last image, '50mg (5).png', has MSE values ranging from 0.00 to 0.03 and SSIM values from 0.52 to 1.00.
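A short sketch of how such a pairwise grid can be computed is given below; scikit-image is used here as one common choice, and the file names simply mirror the labels in the figures.

```python
# Sketch of the pairwise MSE/SSIM grid shown in figures 3-5: each of the
# five samples in a row is compared against that row's reference image.
# (The scikit-image dependency and the file names are assumptions.)
import numpy as np
from skimage.io import imread
from skimage.metrics import structural_similarity as ssim

def compare(path_a, path_b):
    a = imread(path_a, as_gray=True)          # grayscale floats in [0, 1]
    b = imread(path_b, as_gray=True)
    mse = float(np.mean((a - b) ** 2))        # 0.00 means identical pixels
    score = ssim(a, b, data_range=1.0)        # 1.00 means identical structure
    return mse, score

names = [f"50mg ({i}).png" for i in range(1, 6)]
for ref in names:                             # one row per reference image
    for other in names:
        mse, score = compare(ref, other)
        print(f"{ref} vs {other}: MSE={mse:.2f}, SSIM={score:.2f}")
```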
Considering the low MSE values and the high SSIM values, the images in the 50mg concentration are very similar to one another. This high degree of similarity indicates minimal changes in texture, luminance, and contrast between the images.
In figure 4, the MSE values for the comparisons are again low, ranging from 0.01 to 0.02, which indicates that the pixel intensity differences between the images are minor, while the SSIM values are mostly in the range of 0.61 to 0.70, with some reaching 1.00 (when comparing an image with itself). These values are slightly lower than in the previous set, indicating a small decrease in structural similarity, but they still reflect a high degree of similarity overall. '100mg (1).png' has comparisons with MSE values ranging from 0.00 to 0.02 and SSIM values from 0.61 to 1.00, while '100mg (2).png' has MSE values ranging from 0.00 to 0.02 and SSIM values from 0.62 to 1.00. '100mg (3).png' has MSE values ranging from 0.00 to 0.01 and SSIM values from 0.64 to 1.00, while '100mg (4).png' has MSE values ranging from 0.00 to 0.02 and SSIM values from 0.66 to 1.00. '100mg (5).png' has MSE values ranging from 0.00 to 0.02 and SSIM values from 0.61 to 1.00.
These patterns indicate that, when comparing different images, there is a slight decrease in SSIM values relative to the previous (50mg) set, suggesting that these images are somewhat less similar to each other than those in the 50mg set. However, the similarities are still high, which suggests that the differences between the images may still be subtle.
In figure 5, 'ctrl (1).png' is compared to others with MSE values ranging from 0.00 to 0.01 and SSIM values from 0.72 to 1.00, while 'ctrl (2).png' shows MSE values from 0.00 to 0.01 and SSIM values from 0.71 to 1.00 when compared to others. 'ctrl (3).png' has MSE values from 0.00 to 0.01 and SSIM values from 0.72 to 1.00 in its comparisons while 'ctrl (4).png' is compared with others showing MSE values from 0.00 to 0.01 and SSIM values from 0.71 to 1.00. Finally, 'ctrl (5).png' has MSE values from 0.00 to 0.01 and SSIM values from 0.72 to 1.00 in comparisons.
The controlled concentration images exhibit consistently low MSE values and high SSIM values across comparisons. These metrics indicate a high degree of similarity among the images within the controlled set, with minimal variations in pixel intensity, texture, and structure. The high SSIM values and low MSE values observed in the controlled concentration comparisons reinforce the uniformity and stability of these samples, providing a solid foundation for the integrity of the experimental design and the subsequent data analysis. The consistency in these key image quality metrics facilitates a more accurate and clear understanding of the impact of concentration levels on the subjects under study.
D. Training
To train the object detection models, we adopted a structured training procedure. The dataset, comprising 1000 images with 800 for training and 200 for validation, was prepared as previously described. For this comparative study, we employed the YOLO (You Only Look Once) object detection framework, specifically YOLOv5 and YOLOv8. Training was conducted using the YOLOv5s (YOLO version 5, small) and YOLOv8s (YOLO version 8, small) pretrained model architectures.
The training process encompassed 10 epochs, each iterating once over the entire training dataset in mini-batches of 8 images (batch size of 8). During training, the models optimized their internal parameters by minimizing a loss function with stochastic gradient descent (SGD), allowing them to learn and adapt to the features, variations, and distinctive characteristics of the objects of interest within the dataset.
By progressively upgrading from YOLOv5 to YOLOv8, we explored the advancements in model architectures and their impact on object detection performance. The choice of 10 training epochs and a batch size of 8 struck a balance between model convergence and computational efficiency, while ensuring that the models could effectively detect and classify Artemia, Cyst, and Excrement within the images.
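A minimal training sketch matching this setup (10 epochs, batch size 8, 640x640 inputs) is shown below; the dataset YAML name is an assumed placeholder, and the exact training scripts used by the authors are not specified in the text.

```python
# Fine-tuning the pretrained small variants as described above
# (the dataset YAML name "artemia.yaml" is an assumed placeholder).
from ultralytics import YOLO

model_v8 = YOLO("yolov8s.pt")                 # pretrained YOLOv8s weights
model_v8.train(
    data="artemia.yaml",                      # classes: Artemia, Cyst, Excrement
    epochs=10,                                # 10 passes over the training set
    batch=8,                                  # 8 images per gradient update
    imgsz=640,                                # matches the resized 640x640 inputs
    optimizer="SGD",                          # stochastic gradient descent
)

# YOLOv5s can be trained analogously with the official YOLOv5 repository:
#   python train.py --img 640 --batch 8 --epochs 10 \
#       --data artemia.yaml --weights yolov5s.pt
```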
Authors:
(1) Mahmudul Islam Masum, School of Computing and Information Sciences, Florida International University, Miami, USA ([email protected]);
(2) Arif Sarwat, Department of Electrical and Computer Engineering, Florida International University, Miami, USA ([email protected]);
(3) Hugo Riggs, Department of Electrical and Computer Engineering, Florida International University, Miami, USA ([email protected]);
(4) Alicia Boymelgreen, Department of Mechanical and Materials Engineering, Florida International University, Miami, USA ([email protected]);
(5) Preyojon Dey, Department of Mechanical and Materials Engineering, Florida International University, Miami, USA ([email protected]).