IV. RESULTS AND DISCUSSION
For object detection, we employed the YOLOv5 model and YOLOv8 model by utilizing YOLOv5s and YOLOv8s pretrained models respectively. We primarily relied on precision, recall, F-score, and mean Average Precision (mAP) metrics to assess object detection accuracy.
Precision measures the model's ability to make accurate positive predictions, minimizing false positives. Recall, on the other hand, measures how effectively the model detects the positive instances present in the dataset. These metrics are calculated using the following formulas, where "Pos" represents Positive, "Neg" represents Negative, "T" stands for True, and "F" denotes False:

Precision = TPos / (TPos + FPos)

Recall = TPos / (TPos + FNeg)
Accuracy serves as an additional metric to gauge classification performance, considering both true and false predictions:

Accuracy = (TPos + TNeg) / (TPos + TNeg + FPos + FNeg)
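As an illustration, all of these metrics can be computed from the four raw outcome counts. The counts in this sketch are hypothetical, chosen only to demonstrate the arithmetic, not taken from our results:

```python
# Compute precision, recall, F1, and accuracy from raw outcome counts.
# The counts passed below are illustrative placeholders.
def detection_metrics(t_pos, f_pos, t_neg, f_neg):
    precision = t_pos / (t_pos + f_pos)
    recall = t_pos / (t_pos + f_neg)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    accuracy = (t_pos + t_neg) / (t_pos + f_pos + t_neg + f_neg)
    return precision, recall, f1, accuracy

p, r, f1, acc = detection_metrics(t_pos=90, f_pos=10, t_neg=80, f_neg=20)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f} accuracy={acc:.2f}")
```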
The F1-Confidence Curve illustrates how sensitive the model is to the threshold used for classifying a positive instance, and it identifies the threshold that provides the best balance between precision and recall. It also shows how robust the model is across different confidence levels, which is vital in applications where the costs of false positives and false negatives differ. The curve can provide insights beyond what a single F1 score might reveal, especially when models have similar F1 scores but different precision-recall balances.
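A minimal sketch of how such a curve can be traced: sweep candidate confidence thresholds and compute F1 at each, counting a detection as a positive prediction when its score clears the threshold. The scores and correctness flags below are synthetic, not the paper's data:

```python
import numpy as np

# Trace an F1-confidence curve over a list of candidate thresholds.
# `scores` are detection confidences; `is_true_positive` flags whether each
# detection matched a ground-truth object; `n_total_objects` is the number
# of ground-truth objects (used to count missed detections as false negatives).
def f1_confidence_curve(scores, is_true_positive, n_total_objects, thresholds):
    curve = []
    for t in thresholds:
        keep = scores >= t
        tp = int(np.sum(is_true_positive & keep))
        fp = int(np.sum(~is_true_positive & keep))
        fn = n_total_objects - tp  # ground-truth objects left undetected
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        curve.append((t, f1))
    return curve

scores = np.array([0.95, 0.90, 0.80, 0.60, 0.40, 0.20])
correct = np.array([True, True, True, False, True, False])
curve = f1_confidence_curve(scores, correct, n_total_objects=5,
                            thresholds=[0.1, 0.5, 0.85])
best_t, best_f1 = max(curve, key=lambda point: point[1])
```

The threshold with the highest F1 on this sweep plays the same role as the 0.241 and 0.234 operating points reported for YOLOv5 and YOLOv8 below.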
Furthermore, we evaluate the overall object detection performance using the mean Average Precision (mAP). This metric represents the average of the Average Precision (AP) calculated for all the classes being detected.
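As a small illustration, mAP is simply the arithmetic mean of the per-class AP values; the APs below are placeholders, not our measurements:

```python
# mAP as described above: the mean of Average Precision over all detected classes.
# The per-class AP values here are illustrative placeholders.
def mean_average_precision(ap_per_class):
    return sum(ap_per_class.values()) / len(ap_per_class)

ap = {"Artemia": 0.80, "Cyst": 0.70, "Excrement": 0.60}
map_value = mean_average_precision(ap)  # mAP@0.5 if each AP used an IoU threshold of 0.5
```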
A. YOLOv5 training outcome
The training of YOLOv5 on the curated dataset delivered highly encouraging results, particularly noteworthy given the challenging conditions presented by fisheries images. A microscopic imaging approach is vital for capturing detailed images in aquatic environments, where images are often out of focus and blurry; the setting becomes especially challenging with microscopic subjects such as nanoparticles and Artemia nauplii. In a study [7] that identified, classified, and separated larvae using an aspiration pipette or water stream, researchers likewise used a microscopic approach to capture images. Despite the complexities of microscopy, YOLOv5 demonstrated its robustness and adaptability, precisely detecting and analyzing objects within these images.
The confusion matrix of YOLOv5 shows that the pre-trained model had a success rate of 0.97 for Artemia, 0.86 for Cyst, and 0.64 for Excrement. The false positives were negligible for Artemia and Cyst with a rate of 0.01 and 0.02 respectively.
The F1-Confidence curve serves as a dynamic representation of the model's precision-recall trade-off at different confidence thresholds. At a threshold of 0.241, the model achieved an F1 score of 0.76, a balanced measure of precision and recall. This signifies that the model maintained both high precision, minimizing false positives, and high recall, effectively capturing positive instances within the dataset.
The precision-confidence curve is a crucial visualization that showcases the model's precision at different confidence thresholds. A precision score of 1.00 signifies that the model made no false-positive predictions at the confidence threshold of 0.843. In other words, when the model identified an object with a confidence score exceeding 0.843, it was unequivocally accurate in its predictions.
In the precision-recall curve, the model achieved an average precision (AP) of 0.766 across all classes at a threshold of 0.5, indicating that it effectively minimized false positives while capturing the majority of positive instances within the dataset.
On the recall-confidence curve, at an extremely low confidence threshold of 0.000, the model achieved an outstanding recall of 0.96 across all classes.
Understanding the rate of loss during both the training and validation phases is crucial for evaluating the performance of object detection models. During the initial stages of training, box loss started at approximately 0.09, a reasonable starting point. As training progressed through the 10 epochs, box loss decreased gradually and consistently, stabilizing around the 5th epoch, which reflected the model's ability to efficiently predict bounding boxes for objects. The validation loss curve follows a less smooth trajectory than the training curve: it started at approximately 0.06, rose to approximately 0.07, and then gradually decreased to around 0.04. After the 5th epoch, the validation curve stabilized at a commendable box loss rate of 0.03.
The training curve showed an initial object loss of approximately 0.035, which stabilized after just three epochs, ultimately reaching a final loss rate of 0.020. On the other hand, the validation curve displayed a more inconsistent pattern but ultimately achieved a superior loss rate of less than 0.014 at the end of training.
The class loss during training initially exceeded 0.025. After a mere 2-3 epochs, however, the loss rate dropped below 0.010, with an excellent final rate of 0.005. In the validation phase, the class loss was initially slightly above 0.014, with a few spikes in the curve. The final loss rate was around 0.005, signifying that the model excelled in accurate class prediction during both the training and validation phases.
B. YOLOv8 training outcome
The confusion matrix of YOLOv8 shows that the pre-trained model had a success rate of 0.91 for Artemia, 0.92 for Cyst with 0 false positives, and 0.33 for Excrement.
The F1-Confidence curve for YOLOv8 illustrates the model's precision-recall trade-off across various confidence thresholds. At a threshold of 0.234, the model consistently achieved an F1 score of 0.66.
At a confidence threshold of 0.849, the YOLOv8 model achieved a perfect precision score of 1.00 for all classes.
The precision-recall curve is an essential visualization that assesses how well the model balances precision (minimizing false positives) and recall (capturing positive instances) at varying confidence thresholds. At a moderate confidence threshold, the model maintained a balanced precision-recall performance, achieving a mAP@0.5 of 0.658 across all classes.
On the recall-confidence curve, the YOLOv8 model achieved an outstanding recall of 0.89 across all classes.
In training, the rate of box loss was initially at 1.70 before exhibiting a consistent downward trajectory. After 10 training epochs, the final box loss rate was at less than 1.45. The validation phase also demonstrated a notable reduction in box loss by reaching an even more impressive rate of 1.40 after 10 epochs. The class loss during training followed a similar pattern. The model initially struggled with a class loss of approximately 2.5 which indicated challenges in correctly classifying objects. However, the model made progress by reducing the class loss to a rate of 1.0 after 10 epochs. On the other hand, the model's class loss exhibited a slightly different trajectory during validation. It started at a value of 1.6 before having some spikes in the curve. However, the curve gradually smoothed out beyond the 5th epoch, reaching a final rate of approximately 0.05.
During training, the Distribution Focal Loss (DFL) started at an initial value of over 1.65. As training progressed, the loss exhibited a consistent downward trend, ultimately converging to a final rate of less than 1.45.

In the validation phase, the DFL was initially at 1.75, indicating some initial challenges. After the first two epochs, it showed an encouraging decrease, reaching nearly 1.60, before rising again to a peak of 1.70. Despite some spikes in the validation curve, the DFL became more stable over time, with a final loss rate of less than 1.50.
C. Inferencing Outputs
To evaluate the object detection capabilities of YOLOv5 and YOLOv8 on real-world data, a total of 25 images were subjected to testing for the detection of Artemia, cyst, and excrement. These images were processed using both models, each trained on the labeled dataset.
What makes the inference results intriguing is the subtle performance differences observed across the two models:
YOLOv5's Artemia and Cyst Detection: In several cases (Figures 22 and 23), YOLOv5 outperformed YOLOv8 in the detection of Artemia and cyst. Its ability to accurately identify and classify these objects showcased its proficiency, particularly in scenarios where precision and accuracy were paramount.
Challenging Excrement Detection: However, it is noteworthy that YOLOv5 faced challenges when it came to the detection of excrement. In these instances (Figure 24), YOLOv5 exhibited limitations, as it struggled to identify and detect excrement accurately. This observation suggests that YOLOv5 may require further fine-tuning or specialized training for enhanced performance in this specific detection task.
One reason for this difference between YOLOv5 and YOLOv8 in detecting excrement may be YOLOv8's use of Distribution Focal Loss (DFL), which plays a crucial role in addressing the challenges associated with object detection. DFL is applied to the bounding-box regression loss, with binary cross-entropy used for the classification loss. These losses are specifically designed to enhance the detection of smaller objects, excrement in our case. DFL extends Focal Loss from discrete to continuous labels, optimizing and improving quality estimation and class prediction [8]. It enables YOLOv8 to represent the flexible distributions present in real data more accurately, reducing the risk of inconsistencies in detection results.
Another key advantage of DFL is its ability to handle class imbalance effectively [9]. In our dataset, there is a huge imbalance between the classes. DFL assigns higher weights to challenging examples, excrement in our case. This function allows the network to focus on learning the probabilities of values around the continuous locations of target bounding boxes [10]. This ensures that the model can adapt to arbitrary and flexible distributions, improving its ability to detect and classify objects accurately, even in challenging scenarios.
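For reference, a minimal sketch of the DFL formulation from [10]: a continuous regression target is supervised through the predicted probabilities of its two nearest discrete bin locations, weighted by proximity. The distributions and target value below are illustrative only, not drawn from either model's training:

```python
import math

# Distribution Focal Loss (DFL) for a single regression target, following
# the formulation in Generalized Focal Loss [10]. `probs` is a softmax
# distribution over integer bin locations 0..n-1; `target` is the continuous
# ground-truth offset, assumed to lie strictly inside the bin range.
def dfl(probs, target):
    left = int(math.floor(target))  # nearest bin at or below the target
    right = left + 1                # nearest bin above the target
    w_left = right - target         # weight grows as the target nears `left`
    w_right = target - left
    return -(w_left * math.log(probs[left]) + w_right * math.log(probs[right]))

# A distribution concentrated around the target incurs a low loss...
sharp = [0.05, 0.45, 0.45, 0.05]
# ...while a flat (uncertain) distribution is penalized more heavily.
flat = [0.25, 0.25, 0.25, 0.25]
loss_sharp = dfl(sharp, target=1.5)
loss_flat = dfl(flat, target=1.5)
```

Because the loss is driven by the probability mass placed near the true location, hard or rare targets with diffuse predictions contribute larger gradients, which is the weighting behavior described above.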
With these findings, the choice between YOLOv5 and YOLOv8 should be driven by the specific requirements and variations of the task. Further studies and more extensive training may help confirm these initial observations and guide the selection of the most suitable model for a given application.
V. CONCLUSION
The outcomes derived from the evaluation of YOLOv5 and YOLOv8 in object detection present an intriguing and subtle picture. The SSIM analysis in our study further contextualizes the object detection performance of YOLOv5 and YOLOv8. The high SSIM values observed across various image concentrations emphasize the importance of maintaining consistent image quality and structural integrity in training datasets. For example, the precision in detecting Artemia and cyst using YOLOv5 suggests that its performance benefits significantly from high-quality, structurally similar images. On the other hand, the slight overall decrease in SSIM values with increased concentration levels indicates the subtle challenges YOLOv8 faces in maintaining detection accuracy across varying image qualities and structural similarities. While both models exhibit strengths and capabilities, the findings suggest that YOLOv5 may excel over YOLOv8 in certain scenarios. However, it is equally apparent that the performance of these models can be context-dependent and class-specific.
One of the noteworthy observations from the evaluation is that YOLOv5 demonstrated superior performance in detecting Artemia and cyst in several instances, showcasing its potential in precision-driven detection tasks. Its agility and accuracy in these areas are promising, particularly for applications where accurate object recognition is paramount.
However, it is equally important to acknowledge that YOLOv5 exhibited limitations, particularly in the detection of less-represented classes, such as excrement. In cases where there are limited instances of a class within the labeled dataset, YOLOv5 appeared to struggle in accurately detecting those class objects.
On the other hand, YOLOv8 demonstrated robustness in detecting objects across a wider range of classes, even in scenarios with limited instances of a class. This observation implies that YOLOv8 may offer greater versatility and adaptability in certain detection tasks, even when training data is scarce for specific classes.
ACKNOWLEDGMENT
This work is supported by the National Science Foundation (award number: 2038484, year: 2020).
REFERENCES
[1] Dey, P., Bradley, T. M., & Boymelgreen, A. (2023). The impact of selected abiotic factors on artemia hatching process through real-time observation of oxygen changes in a microfluidic platform. Scientific Reports, 13(1). https://doi.org/10.1038/s41598-023-32873-1
[2] Liu, Q., Gong, X., Li, J., Wang, H., Liu, R., Liu, D., Zhou, R., Xie, T., Fu, R., & Duan, X. (2023). A multitask model for realtime fish detection and segmentation based on Yolov5. PeerJ Computer Science, 9. https://doi.org/10.7717/peerj-cs.1262
[3] Li, J., Liu, C., Lu, X., & Wu, B. (2022). CME-yolov5: An efficient object detection network for densely spaced fish and small targets. Water, 14(15), 2412. https://doi.org/10.3390/w14152412
[4] Jain, S. (2023, May 26). DeepSeaNet: Improving underwater object detection using efficientdet. arXiv.org. https://arxiv.org/abs/2306.06075
[5] Ye, X., Liu, Y., Zhang, D., Hu, X., He, Z., & Chen, Y. (2023). Rapid and accurate crayfish sorting by size and maturity based on improved Yolov5. Applied Sciences, 13(15), 8619. https://doi.org/10.3390/app13158619
[6] Wang, J., & Yu, N. (2022). UTD-Yolov5: A real-time underwater targets detection method based on attention improved YOLOv5. https://arxiv.org/abs/2207.00837
[7] Zhang, G., Yu, X., Huang, G., Lei, D., & Tong, M. (2021). An improved automated zebrafish larva high-throughput imaging system. Computers in Biology and Medicine, 136, 104702. https://doi.org/10.1016/j.compbiomed.2021.104702
[8] Terven, J., & Cordova-Esparza, D. (2023, October 8). A comprehensive review of Yolo: From Yolov1 and beyond. arXiv.org. https://arxiv.org/abs/2304.00501
[9] Casas, E., Ramos, L., Bendek, E., & Rivas-Echeverría, F. (2023). Assessing the effectiveness of YOLO architectures for smoke and wildfire detection. IEEE Access, 11, 96554–96583. https://doi.org/10.1109/access.2023.3312217
[10] Li, X., Wang, W., Wu, L., Chen, S., Hu, X., Li, J., Tang, J., & Yang, J. (2020, June 8). Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. arXiv.org. https://arxiv.org/abs/2006.04388
Authors:
(1) Mahmudul Islam Masum, School of Computing and Information Sciences, Florida International University Miami, USA ([email protected]);
(2) Arif Sarwat, Department of Electrical and Computer Engineering, Florida International University Miami, USA ([email protected]);
(3) Hugo Riggs, Department of Electrical and Computer Engineering, Florida International University Miami, USA ([email protected]);
(4) Alicia Boymelgreen, Department of Mechanical and Materials Engineering, Florida International University Miami, USA ([email protected]);
(5) Preyojon Dey, Department of Mechanical and Materials Engineering, Florida International University Miami, USA ([email protected]).