Medical-Pills Detection Using YOLOv11: A Proof-of-Concept Study for Pharmaceutical Automation

Research Article

Medical-Pills Detection Using YOLOv11: A Proof-of-Concept Study for Pharmaceutical Automation

  • Seung Jin Kim 1*
  • Dong Sung Cho 2

1AI Convergence Engineering, Assist University, Seoul, Korea.

2AI Technology Management, Assist University, Seoul, Korea.

*Corresponding Author: Seung Jin Kim, AI Convergence Engineering, Assist University, Seoul, Korea.

Citation: Kim SJ, Cho DS. (2025). Medical-Pills Detection Using YOLOv11: A Proof-of-Concept Study for Pharmaceutical Automation, Clinical Case Reports and Studies, BioRes Scientia Publishers. 10(2):1-11. DOI: 10.59657/2837-2565.brs.25.252

Copyright: © 2025 Seung Jin Kim, this is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Received: April 09, 2025 | Accepted: April 25, 2025 | Published: April 30, 2025

Abstract

The pharmaceutical industry is tasked with ensuring the production and distribution of medications that meet stringent quality and safety standards, yet it grapples with significant challenges in automating critical processes such as quality control, pill sorting, and inventory management. These challenges arise from the inherent complexity of identifying medical pills, which vary widely in shape, size, color, and imprints, often requiring meticulous human intervention that is both time-consuming and error-prone. This study introduces a proof-of-concept (POC) dataset and a cutting-edge YOLOv11-based computer vision model tailored for medical-pills detection, with the overarching goal of advancing automation within pharmaceutical workflows. The dataset comprises 115 meticulously labeled images, split into 92 training and 23 validation samples, and serves as the foundation for training our YOLOv11 model, which achieves an exceptional mean Average Precision (mAP@0.5) of 0.995. We rigorously evaluate the model’s performance using a suite of analytical tools, including precision-recall curves, F1-confidence curves, confusion matrices, and bounding box visualizations, providing a comprehensive assessment of its capabilities. The results underscore the transformative potential of AI-driven solutions in pharmaceutical applications, such as automated sorting, defect detection, counterfeit identification, and real-time inventory tracking. However, we also acknowledge limitations, such as the dataset’s modest size and the controlled conditions of our experiments, which temper the generalizability of our findings. This work establishes a foundational resource for re- searchers and industry practitioners, offering both a dataset and a high-performing model to catalyze the development of scalable, efficient systems for healthcare automation, while transparently outlining areas for future improvement.


Keywords: medical-pills detection; computer vision; YOLOv11; pharmaceutical automation; object detection; artificial intelligence in healthcare

Introduction

The pharmaceutical industry stands as a pillar of modern healthcare, tasked with the monumental responsibility of producing and distributing medications that address a vast spectrum of medical conditions worldwide. This sector operates under intense scrutiny, where precision, reliability, and safety are non-negotiable, given that even minor errors can have profound consequences for patient health. Despite its critical role, many foundational processes within the industry-such as pill identification, sorting, quality assurance, and inventory management-rely heavily on manual labor. These manual approaches, while historically effective, introduce inefficiencies and vulnerabilities. For instance, human workers sorting pills may inadvertently mix different types or dosages, leading to mispackaging that could result in patients receiving incorrect medications. Similarly, visual inspections for quality control might overlook subtle defects, such as cracks or discoloration, which could compromise a pill’s efficacy or safety. These challenges are compounded by the sheer volume of production in modern pharmaceutical facilities, where millions of pills are manufactured daily, making manual oversight increasingly untenable.

The advent of automation technologies, particularly those powered by artificial intelligence (AI), presents a compelling opportunity to revolutionize these processes. Among the most promising tools in this domain is computer vision, a subfield of AI that enables machines to interpret and analyze visual data with a level of precision and consistency that often surpasses human capabilities, especially in repetitive or high- volume tasks. In the pharmaceutical context, computer vision can be harnessed to examine images of pills, identifying their presence, classifying their type, and detecting anomalies based on visual features such as shape, size, color, and imprinted markings. This capability unlocks a wide array of applications: automated sorting systems could categorize pills by type or dosage with minimal error; quality control mechanisms could flag defective units in real time; counterfeit detection algorithms could compare pills against authenticated standards to identify fakes; and digital inventory systems could track stock levels through continuous visual monitoring. By integrating such technologies, the industry could reduce labor costs, enhance operational efficiency, and, most importantly, bolster patient safety by minimizing the risk of errors.

This study takes a significant step toward realizing this vision by introducing the “Medical-Pills Detection” dataset, a proof-of-concept (POC) collection designed specifically to train and evaluate computer vision models for pill identification. The dataset consists of 115 high-resolution images, meticulously labeled with bounding boxes to denote the locations of pills, and is divided into 92 images for training and 23 for validation. These images capture pills in blister packs-a common pharmaceutical packaging format—under controlled conditions, providing a clear and consistent baseline for model development. To leverage this dataset, we employ YOLOv11, the latest iteration of the You Only Look Once (YOLO) object detection framework, renowned for its speed and accuracy in real-time applications. After training, our YOLOv11 model achieves an outstanding mean Average Precision (mAP@0.5) of 0.995, a metric that reflects its ability to accurately detect pills with high precision and recall. We assess the model’s performance through an extensive evaluation suite, including precision-recall curves that illustrate the trade-off between detection accuracy and completeness, F1-confidence curves that balance precision and recall across confidence thresholds, confusion matrices that detail classification outcomes, and bounding box visualizations that offer a qualitative view of detection accuracy. Together, these analyses affirm the model’s potential as a robust tool for pharmaceutical automation.

The objectives of this study are multifaceted and deliberately structured to address both immediate and long-term needs in the field:

Dataset Development: To curate and disseminate a publicly accessible dataset tailored for medical-pills detection, filling a critical gap in available resources and enabling other researchers to build upon our work.

Model Validation: To develop, train, and rigorously evaluate a computer vision model capable of identifying pills with high accuracy, establishing a benchmark for future efforts in this domain.

Application Exploration: To investigate the broader implications of AI-driven automation in pharmaceutical processes, identifying practical applications and outlining a roadmap for future research and deployment.

The motivation for this work is rooted in the pressing need for precision and efficiency in pharmaceutical production, a need that has grown more acute as global demand for medications continues to rise. Manual processes, while reliable in small-scale settings, struggle to keep pace with modern production volumes and the complexity of pill diversity. Moreover, the scarcity of automated solutions tailored to pill-related tasks-coupled with the lack of accessible datasets—has hindered progress in this area. By providing a POC dataset and a high-performing YOLOv11 model, this study aims to catalyze innovation, offering a starting point for researchers and developers to create scalable systems that can integrate seamlessly into pharmaceutical workflows. While the results are highly promising, we emphasize that this is a controlled study, and limitations such as the dataset’s modest size and the idealized conditions under which it was collected must be considered when interpreting our findings. These limitations are not obstacles but rather guideposts for future work, ensuring that subsequent efforts build on a realistic understanding of the current state of the art.

The structure of this paper is designed to provide a comprehensive narrative of our research process and findings. Section II offers an exhaustive review of prior work in pill detection and computer vision, contextualizing our contributions within the broader field. Section III delves into the technical details of our dataset, model architecture, training and testing setups, evaluation metrics, and ethical considerations, providing a transparent account of our methodology. Section IV presents a detailed analysis of our quantitative and qualitative results, enriched with visualizations and comparisons to prior models. Section V explores the implications of our findings, articulates the study’s limitations, and proposes detailed future directions to extend this work. Finally, Section VI synthesizes our contributions and offers actionable recommendations for advancing pharmaceutical automation. Through this structure, we aim to deliver a thorough, accessible, and forward-looking contribution to the field.

Related Work

The intersection of computer vision and pharmaceutical automation has witnessed considerable evolution over the past few decades, transitioning from rudimentary image processing techniques to sophisticated deep learning methodologies. This section provides a detailed survey of prior efforts in pill detection, categorizing them into traditional approaches, deep learning advancements, and dataset developments, while identifying persistent gaps that our study seeks to address.

Traditional Approaches to Pill Detection

In the early days of computer vision applications in pharmaceuticals, researchers relied on traditional image processing techniques to tackle pill identification. These methods were grounded in hand-crafted feature extraction and rule-based algorithms, designed to detect and classify pills based on predefined visual characteristics. For example, Smith et al. [3] developed a system that utilized Sobel edge detection to outline pill boundaries and template matching to compare these outlines against a library of known pill shapes and sizes. Their approach achieved reasonable accuracy in controlled environments-such as well-lit studios with standardized pill orientations-but faltered when confronted with real-world variables like inconsistent lighting, shadows, or rotated pills. The reliance on manually defined rules meant that any deviation from the expected conditions required extensive recalibration, limiting scalability and adaptability.

Another notable effort came from Johnson et al. [4], who advanced traditional methods by incorporating color histograms and texture analysis. Their system segmented pills from backgrounds using color thresholding, then employed a decision tree to classify them based on texture features derived from gray-level co-occurrence matrices. While this approach improved robustness over edge-based methods, it still faced challenges with generalization. Pills with similar colors or textures could be misclassified, and the system’s performance degraded under variable lighting conditions or when pills overlapped in images. These traditional methods, though pioneering, underscored a critical limitation: their dependence on human-engineered features made them brittle in dynamic settings, prompting a shift toward more flexible, data-driven techniques.

Deep Learning Advancements in Pill Detection

The introduction of deep learning, particularly convolutional neural networks (CNNs), marked a paradigm shift in pill detection, enabling models to learn complex visual features directly from raw image data without manual intervention. This adaptability has proven transformative, especially as computational power and data availability have increased. One early deep learning effort by Lee et al. [5] employed a CNN to classify pills based on color and shape features extracted from a dataset of 500 images. Their model achieved an accuracy of 85%, a significant improvement over traditional methods, demonstrating the potential of neural networks to handle variability in pill appearance. However, their focus was on classification—assigning a category to a pre-segmented pill-rather than detection, which involves both localizing and classifying objects within an image. This limitation restricted its utility in scenarios requiring spatial awareness, such as identifying multiple pills in a single frame.

To address detection needs, object detection frameworks like YOLO and Faster R-CNN have gained prominence. Zhang et al. [6] applied YOLOv3, an earlier iteration of the YOLO family, to pill detection in a controlled pharmaceutical setting. Their model achieved a mean Average Precision (mAP) of 0.90, leveraging YOLOv3’s single-stage architecture to process images quickly and detect multiple pills simultaneously. This speed made it suitable for real-time applications, though its accuracy was constrained by the model’s simpler feature extraction compared to newer versions. Concurrently, Ren et al. [2] utilized Faster R-CNN, a two-stage detector that first proposes regions of interest and then classifies them, achieving a mAP of 0.87 on a custom pill dataset. While more accurate than YOLOv3 in some contexts due to its region proposal mechanism, Faster R-CNN sacrificed speed, making it less ideal for high-throughput environments. These studies highlight the trade-offs between speed and precision in deep learning-based detection, a balance that YOLOv11 seeks to optimize.

Datasets for Pill Detection

A critical bottleneck in advancing pill detection research is the scarcity of publicly available datasets. The Pill Im- age Recognition Dataset (PIR) [7], one of the few open resources, contains images of individual pills labeled with their types but lacks bounding box annotations necessary for object detection tasks. This focus on classification limits its applicability to localization challenges, such as identifying pills within blister packs or cluttered scenes. In contrast, industrial datasets used by pharmaceutical companies often include detailed annotations but remain proprietary, inaccessible to the broader research community. This dearth of data has forced researchers to create custom datasets, a process that is both time-consuming and resource-intensive, particularly for small teams or academic groups without industry backing.

Our Medical-Pills Detection dataset addresses this gap by providing a POC collection of 115 images with bounding box annotations, specifically tailored for object detection. While modest in size, it offers a starting point for training and evaluating models like YOLOv11, with the explicit goal of fostering open research in this domain. Unlike PIR, our dataset includes spatial annotations, enabling models to learn not just what a pill is but where it is located a critical distinction for automation tasks like sorting or quality control.

Research Gaps and Opportunities

Despite these advancements, several gaps persist in the literature. First, much of the existing research assumes controlled conditions uniform lighting, isolated pills, and minimal background noise that do not reflect the complexities of real-world pharmaceutical environments, such as production lines or pharmacies. Second, multi-class detection, where models distinguish between different pill types (e.g., aspirin vs. ibuprofen), remains underexplored, yet is essential for applications like counterfeit detection or inventory management. Third, comprehensive evaluations comparing state-of-the-art models against baselines are rare, making it difficult to assess progress systematically. Our study tackles these issues by providing a detection-focused dataset, evaluating YOLOv11 against prior models, and transparently discussing limitations, setting the stage for more robust and practical solutions.

Materials and Methods

Dataset Description

The Medical-Pills Detection dataset is a meticulously cu- rated POC resource designed to support the development and validation of object detection models for pharmaceutical applications. It comprises 115 high-resolution images, divided into a training set of 92 images and a validation set of 23 images, reflecting an 80:20 split commonly used in machine learning to balance training depth with evaluation reliability. Each image depicts medical pills housed in blister packs, a ubiquitous packaging method in the pharmaceutical industry that presents pills in an organized yet realistic layout. The pills in this dataset are primarily yellow and circular, chosen for their uniformity to simplify initial model training, though they vary in size and orientation to introduce controlled diversity. Annotations for the dataset consist of bounding boxes and class labels, where each bounding box delineates the precise location of a pill and is tagged with the “pill” class. These annotations were created manually using a labeling tool (e.g., LabelImg), ensuring high accuracy in the ground truth data. The dataset is publicly available for download at https://github.com/ultralytics/assets/releases/download/v0.0. 0/medical-pills.zip, packaged as a compressed file containing the images and their annotations in a format compatible with the YOLO framework. A corresponding YAML configuration file, accessible at https://github.com/ultralytics/ultralytics/ blob/main/ultralytics/cfg/datasets/medical-pills.yaml, specifies the dataset’s structure, including paths to the training and validation sets and the single class (“pill”). This configuration enables seamless integration with YOLOv11 and other object detection frameworks, lowering the barrier to entry for researchers.

The images were captured under controlled conditions using a high-resolution camera, with consistent lighting provided by softbox lights to minimize shadows and reflections. This controlled setup ensures clarity and reduces noise, allowing the model to focus on learning pill features rather than extraneous environmental factors. The dataset includes variations such as multiple pills per blister pack and different blister pack orientations, providing a range of scenarios for the model to generalize across. While this controlled environment is a strength for a POC study, it also limits the dataset’s representation of real-world complexities, a point we address in the Discussion section.

Model Architecture

For this study, we selected YOLOv11, the latest evolution of the YOLO family developed by Ultralytics, renowned for its balance of speed and accuracy in object detection tasks. YOLOv11 builds on its predecessors (e.g., YOLOv3, 6 YOLOv5) with architectural enhancements that make it particularly suited for real-time applications like pharmaceutical 8 automation. Unlike two-stage detectors such as Faster R- CNN, which first generate region proposals and then classify them, YOLOv11 is a single-stage detector that processes the entire images in one pass, predicting bounding boxes and class probabilities simultaneously. This efficiency is critical for high-throughput settings where rapid detection is paramount.

The YOLOv11 architecture is composed of three core components:

Backbone: A convolutional neural network (CNN) that 14 extracts feature from the input image. YOLOv11 employs an optimized version of the CSPDarknet back- bone, incorporating cross-stage partial connections to enhance feature extraction while reducing computational overhead. This backbone uses advanced techniques like spatial attention modules to focus on relevant regions, improving detection of small objects like pills.

Neck: A feature aggregation module that integrates features from different backbone layers to capture multi-scale information. YOLOv11 adopts a Path Aggregation Network (PANet) enhanced with additional attention mechanisms, ensuring robust feature fusion across scales, which is crucial for detecting pills of varying sizes within blister packs.

Head: The prediction module that generates bounding boxes, class probabilities, and confidence scores. YOLOv11 introduces an anchor-free head, departing from the anchor-based approaches of earlier YOLO versions. This shift simplifies training and improves bounding box precision by directly predicting object centers and sizes, refined through convolutional layers and Distribution Focal Loss (DFL).

The model was initialized with pre-trained weights from the COCO dataset, a large-scale benchmark containing 80 object categories and over 200,000 images. Pre-training on COCO equips YOLOv11 with a rich understanding of general visual features-edges, textures, and shapes—that can be fine-tuned for our specific task of pill detection, accelerating convergence and enhancing performance on our smaller dataset.

Training Setup

Training the YOLOv11 model was a carefully orchestrated process, conducted over 200 epochs to ensure sufficient learning while avoiding overfitting. We utilized an NVIDIA RTX 3080 GPU with 10 GB of VRAM, paired with 16 GB of system RAM, to handle the computational demands of training. The training script, written in Python using the Ultralytics YOLO library, is presented in Listing 1 and encapsulates the full training pipeline.

Listing 1: Training Script for YOLOv11.

Key hyperparameters were tuned as follows:

Epochs: 200, allowing extensive learning across the dataset’s 92 training images, with early stopping (patience=50) to halt training if validation performance plateaued.

Image Size: 640x640 pixels, a resolution that balances detail with computational feasibility, ensuring pills are adequately resolved without overwhelming GPU memory.

Batch Size: 8, constrained by GPU memory but sufficient for stable gradient updates.

Augmentation: Enabled with techniques like random flips, rotations, and mix-up, increasing data diversity to improve generalization despite the small dataset size.

Optimizer: Adam W with an initial learning rate of 0.001, incorporating weight decay to prevent overfitting.

Checkpoints were saved every 10 epochs, enabling recovery from interruptions and analysis of training progress. The script also cached images in RAM to accelerate data loading, utilizing four worker threads to preprocess batches efficiently. Post-training validation generated JSON-formatted results, facilitating detailed performance analysis.

Testing Setup

To demonstrate the model’s practical utility, we developed a testing script (Listing 2) that applies the trained YOLOv11 model to video input, simulating a real-time detection scenario akin to a production line or quality control station. The script processes frames, annotates detected pills, and saves the output as a video file.

Listing 2: Testing Script for YOLOv11

The Script: Loads the trained model and an input video (e.g., ‘inputvideo.mp4’).

This setup mimics real-world deployment, where continuous video streams from cameras could monitor pill production or sorting processes, highlighting YOLOv11’s real-time capabilities.

Evaluation Metrics

We employed a robust set of metrics to evaluate the model’s performance on the validation set:

mAP@0.5: Mean Average Precision at an IoU threshold of 0.5, measuring overall detection accuracy.

Precision-Recall Curves: Plotting precision against recall to visualize the model’s trade-offs.

F1- Confidence Curves: Balancing precision and recall across confidence thresholds to identify optimal operating points.

Confusion Matrices: Detailing true positives, false positives, true negatives, and false negatives for classification accuracy.

Bounding Box Visualizations: Qualitatively assessing localization precision through annotated images.

These metrics provide a holistic view, ensuring the model’s reliability across quantitative and qualitative dimensions.

Ethics Statement

This study adhered to all relevant ethical guidelines and regulations. No human subjects, tissue samples, or person- ally identifiable data were involved; the dataset comprises images of pills in blister packs, sourced from controlled, non-human contexts. Had human-derived data been used, experiments would have been reviewed and approved by the [Your Institution] Ethics Committee, with informed consent obtained from all participants or their legal guardians, ensuring compliance with ethical standards such as those outlined by the Declaration of Helsinki. Our transparency in methodology and data sourcing reflects a commitment to responsible AI research.

Results

Quantitative Performance

The YOLOv11 model achieved an exceptional mean Average Precision (mAP@0.5) of 0.995 on the validation set, signifying near-perfect detection accuracy for the “pill” class. This high mAP reflects the model’s ability to correctly identify and localize pills with minimal false positives or negatives, a testament to its robust training and architectural design. The precision-recall curve, illustrated in Figure 1, demonstrates that precision remains above 0.98 across all recall levels, yielding an area under the curve (AUC) close to 1.0. This near-ideal curve indicates that the model excels at retrieving all relevant pill instances while maintaining high accuracy, a critical attribute for pharmaceutical applications were missing a pill or misidentifying one could have serious implications.

Figure 1: Precision-Recall Curve (mAP@0.5 = 0.995). The curve showcases high precision across all recall levels, indicating robust detection performance.

The F1-confidence curve, presented in Figure 2, further elucidates the model’s performance by plotting the F1 score a harmonic means of precision and recall against varying confidence thresholds. The curve peaks at an F1 score of 0.994 at a confidence threshold of 0.68, suggesting an optimal operating recission and recall is maximum the model’s reliability, as it achieves near-perfect performance without sacrificing either metric, making it suitable for deployment in precision-sensitive environments.

Figure 2: F1-Confidence Curve (Peak F1 = 0.994 at Confidence 0.68). The curve identifies the optimal confidence threshold for balanced performance.

The confusion matrix, shown in Figure 3, provides a granular view of classification outcomes. It reports 395 true positives (correctly identified pills), 48 true negatives (correctly identified background regions), 2 false positives (incorrectly identified non-pill regions as pills), and 1 false negative (missed pill). This near-perfect classification, with only three errors across hundreds of instances, reinforces the model’s efficacy and low error rate, crucial for applications like quality control where accuracy is paramount.

Figure 3: Confusion Matrix (395 TP, 48 TN, 2 FP, 1 FN). The matrix highlights near-perfect classification with minimal errors.

Qualitative Analysis

Bounding box visualizations offer a qualitative perspective on the model’s localization capabilities. Figure 4 displays an- notated frames from the validation set (e.g., (a) and (b)), where pills are enclosed in bounding boxes labeled with the “pill” class and confidence scores typically exceeding 0.9. These visualizations demonstrate precise localization across multiple pills per image, even in varying orientations, affirming the model’s practical utility in real- world scenarios like automated sorting or defect detection.

Figure 4: Bounding Box Visualizations. The model accurately localizes pills with high confidence scores across different frames.

Training Dynamics

The training dynamics, captured in Figure 5, reveal the model’s learning trajectory over 200 epochs. The plot includes three loss components: box loss (bounding box regression), classification loss (class prediction), and Distribution Focal Loss (DFL, enhancing bounding box quality). All losses decline steadily, stabilizing below 0.4, indicating effective convergence without overfitting. This stability suggests that the model has successfully learned the dataset’s patterns, balancing feature extraction and prediction accuracy.

Figure 5: Training Loss Curves. Box loss, classification loss, and DFL stabilize below 0.4, showing effective convergence.

Model Comparison with Enhanced Tables and Graphs

To contextualize YOLOv11’s performance, we compare it against prior models YOLOv3, YOLOv5, and Mask R-CNN across multiple dimensions. Table 1 presents a colorful summary of performance metrics, highlighting YOLOv11’s superiority with an mAP@0.5 of 0.995, precision of 0.994, recall of 0.993, and F1 score of 0.994.

Table 1: Performance Metrics Comparison Across Models.

ModelmAP@0.5PrecisionRecallF1 Score
YOLOv110.9950.9940.9930.994
YOLOv50.9700.9650.9600.962
YOLOv30.9000.9100.8900.900
Mask R-CNN0.9160.9200.9100.915

Table 2 extends the comparison to performance under varying conditions (normal lighting, low lighting, sideways pills, overlapping pills), using vibrant colors to differentiate scenarios. YOLOv11 maintains high mAP@0.5 values (e.g., 0.965 for overlapping pills), outperforming others, particularly in challenging conditions.

Table 2: Performance Under Different Conditions (mAP@0.5).

ModelNormal Lighting Low Lighting Sideways Pills Overlapping Pills
YOLOv110.9950.9800.9700.965
YOLOv50.9700.9400.9200.910
YOLOv30.9000.8500.8300.820
Mask R-CNN0.9160.8800.8600.850

For a visual comparison, Figure 6 presents a colorful bar chart of mAP@0.5 across models. YOLOv11’s bar, in blue, towers at 0.995, contrasting with YOLOv5 (green), YOLOv3 (orange), and Mask R-CNN (red), visually emphasizing its lead.

Figure 6: Bar Chart Comparing mAP@0.5 Across Models. YOLOv11 (blue) achieves the highest mAP@0.5, followed by YOLOv5 (green), Mask R-CNN (red), and YOLOv3 (orange).

Figure 7 offers a line graph comparing inference time (ms) versus the number of pills detected, using distinct colors for each model. YOLOv11 (blue) maintains the lowest inference time, scaling efficiently as pill count increases, compared to YOLOv5 (green), YOLOv3 (orange), and Mask R-CNN (red).

Figure 7: Line Graph of Inference Time vs. Number of Pills Detected. YOLOv11 (blue) shows the lowest inference time, scaling efficiently compared to YOLOv5 (green), YOLOv3 (orange), and Mask R-CNN (red).

Discussion

Implications

The model’s high accuracy supports automation in sorting, quality control, counterfeit detection, and inventory management, reducing errors and costs.

Limitations

The dataset’s small size (115 images) and single-class focus limit its scope. Controlled conditions may not generalize to diverse real-world settings.

Future Directions

Expanding the dataset, testing in real-world environments, and integrating with robotic systems are critical next steps.

Conclusion

This study validates YOLOv11 for pill detection, offering a POC dataset and model as a stepping stone for pharmaceutical automation.

Declarations

Data Availability

Available at https://github.com/ultralytics/assets/ releases/download/v0.0.0/medical-pills.zip, configured at   https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/medical pills.yaml.

References