Endpoints
Explanation
The device works in a rather straightforward way: you send images, and you get back a DiagnosticReport, as defined in HL7's FHIR® specification. The following chart explains the basics:
Diagnostic support
Although "diagnosis" is the everyday way of describing the output of the device, keep in mind that what the device actually outputs is an interpretative distribution representation of possible International Classification of Diseases (ICD) classes that might be represented in the pixel content of the image.
Healthcare practitioners and organisations may indeed use the data output by the device to inform a diagnosis, but what the device itself outputs is not a diagnosis. This is appropriately signalled in the output of the device, which, following the FHIR standard, is a `DiagnosticReport` with a status of `preliminary`.
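As a minimal sketch, assuming the response has been parsed from FHIR JSON into a Python dict, a client could check that it received a preliminary `DiagnosticReport` before using it. The payload below is hypothetical and heavily trimmed; only the `resourceType` and `status` fields come from the FHIR `DiagnosticReport` resource, and the helper name is illustrative.

```python
import json

# Hypothetical, heavily trimmed response body. Only resourceType and status,
# both standard fields of the FHIR DiagnosticReport resource, are shown.
response_body = """
{
  "resourceType": "DiagnosticReport",
  "status": "preliminary"
}
"""


def is_preliminary_diagnostic_report(resource: dict) -> bool:
    """Return True if the resource is a FHIR DiagnosticReport marked as preliminary."""
    return (
        resource.get("resourceType") == "DiagnosticReport"
        and resource.get("status") == "preliminary"
    )


report = json.loads(response_body)
print(is_preliminary_diagnostic_report(report))  # True
```

A check like this keeps downstream code from treating the device output as a final report.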
Severity measure
Although "severity measure" is the everyday way of describing the output of the device, keep in mind that what the device actually outputs is quantifiable data on the intensity, count and extent of clinical signs such as erythema, desquamation, and induration, among others.
Healthcare practitioners and organisations may indeed use the data output by the device to determine how severely a patient is affected, but what the device itself outputs is not the severity. This is appropriately signalled in the output of the device, which, following the FHIR standard, is a `DiagnosticReport` with a status of `preliminary`.
Clinical indicators
Clinical indicators are a set of values derived from the diagnostic support output of the device. This output is an interpretative distribution representation of possible International Classification of Diseases (ICD) classes that might be represented in the pixel content of the image. In other words, it is a probability distribution in which every ICD-11 category is assigned a probability between 0 and 1, and, as with any probability distribution, these probabilities sum to 1.
Condition confirmation
The diagnostic support output of the device covers a wide range of ICD-11 categories, plus one class that is not an ICD-11 condition: `Non-specific lesion`. This class is activated when the device does not find any condition in the image.
With this in mind, it is possible to determine from the predicted probability distribution how likely it is that the picture contains any kind of condition (`hasCondition`). This could be done by summing the probabilities of all classes except `Non-specific lesion`:

$$\text{hasCondition} = \sum_{\substack{i=1 \\ i \neq \text{NSL}}}^{N} p_i$$

where $N$ is the number of classes in the output and $p_i$ is the probability of the $i$-th class. However, there is a faster and more efficient way to calculate it:

$$\text{hasCondition} = 1 - p_{\text{NSL}}$$

where $p_{\text{NSL}}$ is the probability given to the `Non-specific lesion` category in the probability distribution. As the entire probability distribution sums to 1, the probability of the image depicting a condition is simply the total probability (1) minus the `Non-specific lesion` probability.
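The equivalence of the two formulas can be checked with a short sketch. It assumes the output has been collected into a Python dict mapping class names to probabilities; apart from `Non-specific lesion`, the class names and values are hypothetical placeholders, as are the function names.

```python
import math

NSL = "Non-specific lesion"


def has_condition_by_sum(probs: dict[str, float]) -> float:
    """Sum the probabilities of every class except Non-specific lesion."""
    return sum(p for name, p in probs.items() if name != NSL)


def has_condition(probs: dict[str, float]) -> float:
    """Shortcut: because the distribution sums to 1, hasCondition = 1 - p_NSL."""
    return 1.0 - probs[NSL]


# Hypothetical distribution; "Condition A/B/C" are placeholder class names.
probs = {
    NSL: 0.10,
    "Condition A": 0.60,
    "Condition B": 0.25,
    "Condition C": 0.05,
}
assert math.isclose(sum(probs.values()), 1.0)

print(round(has_condition(probs), 2))         # 0.9
print(round(has_condition_by_sum(probs), 2))  # 0.9
```

The shortcut reads a single value instead of iterating over every category, which is why it is the cheaper option.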
Weighted sum findings
Several clinical indicators (`pigmentedLesion`, `urgentReferral`, `highPriorityReferral`, and `malignancy`) are obtained via a weighted sum of the device's output, using category weights defined specifically for each finding. These weights are binary values (0 or 1) and indicate which ICD-11 categories from the output contribute to the finding's value.
- For the `pigmentedLesion` finding, all ICD-11 categories that correspond to a pigmented lesion are given a weight of 1 ($w_i = 1$), and 0 ($w_i = 0$) otherwise.
- For the `urgentReferral` finding, all ICD-11 categories related to conditions that require urgent referral (i.e. that should be referred within 0 to 48 hours) are given a weight of 1 ($w_i = 1$), and 0 ($w_i = 0$) otherwise.
- For the `highPriorityReferral` finding, all ICD-11 categories related to conditions that, despite not requiring an urgent referral, have a higher referral priority than others (i.e. that should be referred within 7 to 15 days) are given a weight of 1 ($w_i = 1$), and 0 ($w_i = 0$) otherwise.
- For the `malignancy` finding, all ICD-11 categories related to malignancy (skin cancer) are given a weight of 1 ($w_i = 1$), and 0 ($w_i = 0$) otherwise.
The value of each finding ($F$) is computed as the weighted sum of the device's output (i.e. the probability distribution):

$$F = \sum_{i=1}^{N} w_i \, p_i$$

where $N$ is the total number of predicted ICD-11 categories, and $w_i$ and $p_i$ are the weight and probability of the $i$-th category of the distribution.
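A minimal sketch of the weighted sum, assuming the distribution and each finding's binary weights are stored as Python dicts keyed by category name. The category names, probabilities and weight tables below are illustrative only; the device's actual per-finding weights are not listed here. Categories missing from a weight table are treated as $w_i = 0$.

```python
def finding_value(probs: dict[str, float], weights: dict[str, int]) -> float:
    """Weighted sum F = sum_i w_i * p_i; categories absent from weights count as w_i = 0."""
    return sum(weights.get(name, 0) * p for name, p in probs.items())


# Hypothetical distribution and weight tables, for illustration only.
probs = {
    "Melanoma": 0.30,
    "Melanocytic nevus": 0.45,
    "Psoriasis": 0.20,
    "Non-specific lesion": 0.05,
}
malignancy_weights = {"Melanoma": 1}
pigmented_weights = {"Melanoma": 1, "Melanocytic nevus": 1}

print(round(finding_value(probs, malignancy_weights), 2))  # 0.3
print(round(finding_value(probs, pigmented_weights), 2))   # 0.75
```

With binary weights, each finding is simply the total probability mass assigned to the categories flagged for it, so its value always lies between 0 and 1.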
Performance indicators
Performance indicators are a set of values that provide a deeper understanding of the device's skin disease recognition performance, both on a given input and on the internal test data used during development. Like the clinical indicators, they are derived from the diagnostic support output of the device, which is an interpretative distribution representation of possible International Classification of Diseases (ICD) classes that might be represented in the pixel content of the image. In other words, it is a probability distribution in which every ICD-11 category is assigned a probability between 0 and 1, and these probabilities sum to 1.
Top-K sensitivity and specificity
In order to measure the skin disease recognition performance of the device for each specific ICD-11 category, we compute category-wise top-K sensitivity and specificity metrics on our hold-out test set. These are common metrics for binary classification scenarios:
- Sensitivity (also known as true positive rate) is the probability of a positive output, conditioned on the test case truly being positive.
- Specificity (or true negative rate) is the probability of a negative output, conditioned on the test case truly being negative.
To apply these metrics to our multiclass scenario, we use the following strategy for each ICD-11 category $c$. We also adapt the metrics to the diagnostic support use case (i.e. looking at the top-K suggestions instead of just the top-1 prediction):
- If the ground truth label of an image corresponds to category $c$, it is considered a positive (1) case; if it is any other ICD-11 category, it is a negative (0) case.
- If category $c$ is within the top-K predicted classes, the prediction is considered a positive (1) output. If category $c$ is not in the top-K list, the output is negative (0).
As the predictions and ground truth labels have been converted to a binary case (0/1), we can compute sensitivity and specificity using true positives, false negatives, false positives, and true negatives:
| | Positive output | Negative output |
|---|---|---|
| Positive label | ✔️ True positive (TP) | ❌ False negative (FN) |
| Negative label | ❌ False positive (FP) | ✔️ True negative (TN) |
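Sensitivity and specificity are then computed as:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad \text{Specificity} = \frac{TN}{TN + FP}$$

We compute both metrics for several values of $K$ (1, 3, and 5), resulting in the top-1, top-3, and top-5 sensitivity and specificity performance indicators.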
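The binarisation strategy and the confusion-matrix counts can be sketched as follows. This assumes each test image comes with a ground truth label and a dict of predicted probabilities. The toy data uses three placeholder classes, so it sweeps $K$ over 1 to 3 instead of the 1, 3 and 5 used on the real hold-out set. The function name is hypothetical.

```python
def top_k_sens_spec(y_true: list[str], y_probs: list[dict[str, float]],
                    category: str, k: int) -> tuple[float, float]:
    """Category-wise top-K sensitivity and specificity for one category."""
    tp = fn = fp = tn = 0
    for label, probs in zip(y_true, y_probs):
        # K classes with the highest predicted probability (sorted() is stable on ties).
        top_k = sorted(probs, key=probs.get, reverse=True)[:k]
        positive_label = label == category     # ground truth binarised for this category
        positive_output = category in top_k    # output binarised via the top-K list
        if positive_label and positive_output:
            tp += 1
        elif positive_label:
            fn += 1
        elif positive_output:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical toy test set with placeholder classes A, B and C.
y_true = ["A", "B", "A", "C"]
y_probs = [
    {"A": 0.7, "B": 0.2, "C": 0.1},
    {"A": 0.5, "B": 0.3, "C": 0.2},
    {"A": 0.3, "B": 0.5, "C": 0.2},
    {"A": 0.1, "B": 0.3, "C": 0.6},
]
for k in (1, 2, 3):
    print(k, top_k_sens_spec(y_true, y_probs, "A", k))
# 1 (0.5, 0.5)
# 2 (1.0, 0.5)
# 3 (1.0, 0.0)
```

The toy output shows the trade-off behind the top-K design: a larger $K$ makes it easier for the true category to appear in the list (higher sensitivity) but also flags it on more negative cases (lower specificity).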
Entropy
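This indicator is included to provide the user with an estimate of the uncertainty associated with the diagnostic support output of the device. Normalised entropy ($H$) is defined as:

$$H = -\frac{1}{\ln N} \sum_{i=1}^{N} p_i \ln p_i$$

where $\ln$ is the natural logarithm, $N$ the total number of ICD-11 categories, and $p_i$ the probability of the $i$-th category in the probability distribution (with the convention $0 \ln 0 = 0$). Dividing by $\ln N$, the entropy of a uniform distribution, keeps $H$ between 0 and 1.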
Low entropy values indicate that the mass of the distribution is concentrated on a small number of categories, which can be interpreted as the device being confident about its prediction. Conversely, high entropy values indicate that the mass of the probability distribution is spread more evenly across categories (reaching 1 when it is uniform), suggesting that the device is not confident about its prediction.
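A minimal sketch of the normalised entropy, assuming the distribution is given as a plain list of probabilities. Zero probabilities are skipped, which applies the $0 \ln 0 = 0$ convention. Returning 0 for a single category is an assumption of this sketch, made to avoid dividing by $\ln 1 = 0$.

```python
import math


def normalised_entropy(probs: list[float]) -> float:
    """H = -(1 / ln N) * sum_i p_i ln p_i, skipping p_i = 0 (0 ln 0 = 0)."""
    n = len(probs)
    if n < 2:
        return 0.0  # assumption: one category carries no uncertainty (ln 1 = 0)
    h = sum(-p * math.log(p) for p in probs if p > 0)
    return h / math.log(n)


print(round(normalised_entropy([0.25, 0.25, 0.25, 0.25]), 6))  # 1.0
print(normalised_entropy([1.0, 0.0, 0.0, 0.0]))               # 0.0
print(normalised_entropy([0.7, 0.1, 0.1, 0.1]))               # between 0 and 1
```

The uniform case gives the maximum value of 1 and the one-hot case gives 0, matching the interpretation above.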