Key performance characteristics
The device performs a wide range of tasks, each evaluated with the performance metrics appropriate to it. These metrics cover accuracy, error rates, and the ability to correctly identify and segment relevant features in images.
An in-depth review of some performance metrics can be found in our published research, listed below:
- Automatic SCOring of Atopic Dermatitis Using Deep Learning: A Pilot Study.
- Dermatology Image Quality Assessment (DIQA): Artificial intelligence to ensure the clinical utility of images for remote consultations and clinical trials.
- Automatic International Hidradenitis Suppurativa Severity Score System (AIHS4): A novel tool to assess the severity of hidradenitis suppurativa using artificial intelligence.
- Automatic Urticaria Activity Score (AUAS): Deep Learning-based Automatic Hive Counting for Urticaria Severity Assessment.
These publications describe the performance of the device in detail. The metrics for the recognition of visible ICD classes and for each visible clinical sign are:
- Recognition of visible ICD classes
  - Top-1 accuracy: 74.07%
  - Top-3 accuracy: 86.76%
  - Top-5 accuracy: 90.20%
- Malignancy
  - AUC: 0.96
- Presence of a dermatological condition
  - AUC: 0.99
- Critical complexity
  - AUC: 0.94
- Erythema intensity
  - RMAE: 13.30%
- Edema intensity
  - RMAE: 16.0%
- Oozing intensity
  - RMAE: 19.40%
- Excoriation intensity
  - RMAE: 9.60%
- Lichenification intensity
  - RMAE: 8.70%
- Dryness intensity
  - RMAE: 11.30%
- Induration intensity
  - RMAE: 9.20%
- Desquamation intensity
  - RMAE: 10.45%
- Pustulation intensity
  - RMAE: 15.00%
- Exudation intensity
  - BAC: 64.00%
- Edges intensity
  - BAC: 74.00%
- Affected tissues intensity
  - BAC: 69.00%
- Inflammatory lesion count
  - Precision: 86.60%
  - Recall: 79.70%
- Hive count
  - Precision: 68.40%
  - Recall: 57.10%
- Nodule count
  - MAE: 2.16
- Abscess count
  - MAE: 2.16
- Draining tunnel count
  - MAE: 2.16
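As an illustration only, the following Python sketch shows how a Top-k accuracy figure such as the Top-1, Top-3 and Top-5 values above can be computed from ranked predictions. The function name, the class names and the sample data are hypothetical; they are not taken from the device or from its validation data.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   true_labels: Sequence[str], k: int) -> float:
    """Fraction of images whose true class is among the first k ranked predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("one ranked list is required per true label")
    if not true_labels:
        raise ValueError("at least one sample is required")
    hits = sum(1 for ranked, truth in zip(ranked_predictions, true_labels)
               if truth in ranked[:k])
    return hits / len(true_labels)


# Hypothetical ranked outputs for four images (most likely class first).
ranked = [
    ["psoriasis", "eczema", "tinea"],
    ["nevus", "melanoma", "seborrheic keratosis"],
    ["acne", "rosacea", "folliculitis"],
    ["urticaria", "eczema", "psoriasis"],
]
# Hypothetical confirmed diagnoses for the same four images.
truth = ["psoriasis", "melanoma", "folliculitis", "scabies"]

top1 = top_k_accuracy(ranked, truth, k=1)  # only the first image is a hit
top3 = top_k_accuracy(ranked, truth, k=3)  # the first three images are hits
```

In this toy example the Top-3 value is higher than the Top-1 value because two of the correct classes appear in second or third position, which mirrors why the reported Top-3 and Top-5 accuracies exceed the Top-1 accuracy.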
The metrics for additional processors:
- Image modality detection
  - AUC: 0.9957
- Skin structure detection
  - AUC: 0.9957
- Image quality assessment
  - Linear correlation: 0.74
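To help interpret AUC values like the ones above, the sketch below computes the area under the ROC curve through its pairwise interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. This is a minimal illustration with hypothetical scores and labels, not the evaluation code used for the device.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores and ground-truth labels (1 = positive class).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)  # 8 of the 9 positive/negative pairs are ordered correctly
```

An AUC of 1.0 means every positive case is scored above every negative case, while 0.5 corresponds to random ordering.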
Furthermore, we have conducted additional clinical investigations and are working with the investigators and hospitals involved to publish their results. The following list discloses some outcomes of these studies:
- LEGIT_MC_EVCDAO_2019: The device demonstrated excellent performance in malignancy prediction, which makes it a valuable tool for prioritizing patients according to their risk of malignancy. The AUC for malignancy prediction was 87.28% (88.26% in the extension), which is comparable to that of expert HCPs and points to the potential of the device to improve clinical workflows. For skin lesion recognition in general, the Top-5 accuracy was 88.83% (83.16% in the extension), which supports the device's intended use as a clinical decision-support tool. For melanoma specifically, the AUC was 76.75% (82.38% in the extension), which is considerably high and fulfils the goals set out in the hypotheses of the study. This study has been conducted on 96 subjects from two hospitals (Hospital Universitario Cruces and Hospital Universitario Basurto) since 2020, in collaboration with two senior dermatologists. The skin diseases studied were different types of nevi, vascular lesions, cutaneous neoplasms (benign and malignant), and keratoses.
- LEGIT_COVIDX_EVCDAO_2022: The comprehensive analysis of the CUS, Data Utility questionnaire, SUS, and Patient Satisfaction questionnaire has provided valuable insights into the tool's effectiveness in supporting dermatologists in their clinical practice. The observed sample mean of 76.67 on the CUS suggests that the device was positively received by the participating specialists. The specialists unanimously agreed on its ease of use and rated it highly for optimizing time according to each patient's needs. The device was also rated highly for its efficiency in generating reports. These outcomes affirm the device's potential to streamline clinical workflows and enhance patient care. This study was conducted with a cohort of 160 patients from the Dermatology Department of Hospital Universitario de Torrejón, and includes different types of keratoses, pigmented lesions (benign and malignant), and inflammatory lesions.
- LEGIT.HEALTH_DAO_Derivación_O_2022: This study reveals that approximately 29% of the referrals, even those from teledermatology, involve common and easily diagnosable conditions, about half of which are related to seborrheic keratosis. Primary care doctors exhibit a notably low sensitivity of approximately 25% in the crucial task of deciding whether to refer a patient to secondary care, particularly to dermatologists. Regarding the waiting list, the analysis assumes that patients could have received treatment earlier and that the appointment delays were a result of the hospital's waiting list. As of today, 51 subjects have been recruited for this study, a number that will eventually increase to 400. The skin diseases observed in the current cohort include different kinds of keratoses, nevi, pigmented lesions (both benign and malignant), and eczemas.
The following is a non-exhaustive list of the metrics used to measure the performance of the device:
- Top-5 Accuracy: Measures the frequency of the correct class appearing within the top 5 predictions of the device.
- AUC (Area Under the Curve): Area under the ROC curve; indicates the device's ability to differentiate between categories across all decision thresholds.
- MAE (Mean Absolute Error): The average absolute difference between predicted and actual severity levels.
- RMAE (Relative Mean Absolute Error): The MAE expressed relative to the range of the severity scale and reported as a percentage.
- Precision: Proportion of correct positive classes among all positive identifications by the device.
- Accuracy: Overall rate of correct identification for both positive and negative classes.
- Sensitivity: Device's proficiency in identifying positive cases accurately.
- IoU (Intersection over Union): Used in image segmentation; measures the overlap between prediction and actual segmentation.
- Cohen's Kappa: Assesses agreement between predicted and actual diagnoses, adjusting for chance.
- Recall: Device's ability to identify all relevant cases of a specific class.
- BAC (Balanced Accuracy): Average of the true positive rate (sensitivity) and the true negative rate (specificity).
- Correlation: Strength and direction of the relationship between predicted and actual severity scores.
- TPR (True Positive Rate): Proportion of actual positive cases that the device correctly identifies; equivalent to sensitivity and recall.
- 1 Day Ahead Accuracy: Measures the accuracy of predictions for the following day.
- 7 Days Ahead Accuracy: Measures prediction accuracy for a seven-day timeframe.
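The sketch below illustrates how several of the metrics in this list can be computed. It is a minimal example under stated assumptions: binary labels are encoded as 0 and 1, segmentation masks are represented as sets of pixel coordinates, and all function names and sample values are hypothetical rather than taken from the device.

```python
from typing import Dict, Sequence, Set, Tuple

Pixel = Tuple[int, int]


def binary_metrics(predicted: Sequence[int], actual: Sequence[int]) -> Dict[str, float]:
    """Precision, recall (sensitivity, TPR), specificity, accuracy and BAC.

    Assumes both classes occur in `actual` and at least one positive prediction.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "precision": tp / (tp + fp),
        "recall": recall,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs),
        "bac": (recall + specificity) / 2,
    }


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and actual values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def intersection_over_union(predicted: Set[Pixel], actual: Set[Pixel]) -> float:
    """Overlap between two masks given as sets of (row, column) pixels."""
    union = predicted | actual
    if not union:
        raise ValueError("at least one mask must be non-empty")
    return len(predicted & actual) / len(union)


# Hypothetical binary predictions and ground truth for ten images.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0, 0, 1],
                         [1, 1, 0, 0, 0, 0, 0, 0, 1, 1])

# Hypothetical lesion counts for four images.
mae = mean_absolute_error([3, 5, 0, 2], [2, 5, 1, 4])

# Hypothetical 2x2 masks that overlap in one column.
iou = intersection_over_union({(0, 0), (0, 1), (1, 0), (1, 1)},
                              {(0, 1), (1, 1), (0, 2), (1, 2)})
```

In this example the accuracy (0.8) differs from the balanced accuracy because the negative class is larger than the positive class, which is the situation balanced accuracy is meant to correct for.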
Keep in mind that the device performs very different tasks. For instance, in the task of image quality assessment, the relevant metrics are Pearson correlation, Spearman correlation and balanced accuracy. However, for the quantification, counting and extent measurement of clinical signs, the relevant metrics may be AUC, RMAE and IoU, among others, depending on the clinical sign. This is explained in the publications listed above.
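For reference, the two correlation metrics mentioned for image quality assessment can be sketched as follows. Pearson correlation measures linear agreement, while Spearman correlation is the Pearson correlation of the ranks and therefore measures monotonic agreement. The function names and the sample scores are hypothetical and serve only as an illustration.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length, with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical predicted quality scores and expert ratings for five images.
predicted_quality = [1.0, 2.0, 3.0, 4.0, 5.0]
expert_rating = [1.0, 4.0, 9.0, 16.0, 25.0]

linear = pearson(predicted_quality, expert_rating)
monotonic = spearman(predicted_quality, expert_rating)
```

Here the two sequences are perfectly ordered but not linearly related, so the Spearman value is 1.0 while the Pearson value is slightly lower. This is one reason both metrics may be reported for the same task.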
As such, providing an exhaustive list of every performance metric for each processor would be impractical and unhelpful. Instead, we advise you to review the publications listed above, which describe the performance of the device in detail.