Dataset Development

Dataset Development Process

The ICCR cancer datasets are developed under a quality framework which dictates both how the datasets look as well as what should be included. Each dataset has been developed by an international panel, the Dataset Authoring Committee (DAC), and includes Core and Non-core elements. The roles and responsibilities of the DAC and others in the dataset development process are outlined in Roles and Responsibilities for the ICCR dataset development process.

Core elements are those which are essential for the clinical management, staging or prognosis of the cancer. These elements will either have evidentiary support at Level III-2 or above, based on prognostic factors in the National Health and Medical Research Council (NHMRC) levels of evidence document below:

Explanatory notes

Definitions of these study designs are provided on pages 7-8 How to use the evidence: assessment and application of scientific evidence (NHMRC 2000b) and in the accompanying Glossary.
These levels of evidence apply only to studies of assessing the accuracy of diagnostic or screening tests. To assess the overall effectiveness of a diagnostic test there also needs to be a consideration of the impact of the test on patient management and health outcomes (Medical Services Advisory Committee 2005, Sackett and Haynes 2002). The evidence hierarchy given in the ‘Intervention’ column should be used when assessing the impact of a diagnostic test on health outcomes relative to an existing method of diagnosis/comparator test(s). The evidence hierarchy given in the ‘Screening’ column should be used when assessing the impact of a screening test on health outcomes relative to no screening or alternative screening methods.
If it is possible and/or ethical to determine a causal relationship using experimental evidence, then the ‘Intervention’ hierarchy of evidence should be utilised. If it is only possible and/or ethical to determine a causal relationship using observational evidence (eg. cannot allocate groups to a potential harmful exposure, such as nuclear radiation), then the ‘Aetiology’ hierarchy of evidence should be utilised.
A systematic review will only be assigned a level of evidence as high as the studies it contains, excepting where those studies are of level II evidence. Systematic reviews of level II evidence provide more data than the individual studies and any meta-analyses will increase the precision of the overall results, reducing the likelihood that the results are affected by chance. Systematic reviews of lower level evidence present results of likely poor internal validity and thus are rated on the likelihood that the results have been affected by bias, rather than whether the systematic review itself is of good quality. Systematic review quality should be assessed separately. A systematic review should consist of at least two studies. In systematic reviews that include different study designs, the overall level of evidence should relate to each individual outcome/result, as different studies (and study designs) might contribute to each different outcome.
The validity of the reference standard should be determined in the context of the disease under review. Criteria for determining the validity of the reference standard should be pre-specified. This can include the choice of the reference standard(s) and its timing in relation to the index test. The validity of the reference standard can be determined through quality appraisal of the study (Whiting et al 2003).
Well-designed population based case-control studies (eg. population based screening studies where test accuracy is assessed on all cases, with a random sample of controls) do capture a population with a representative spectrum of disease and thus fulfil the requirements for a valid assembly of patients. However, in some cases the population assembled is not representative of the use of the test in practice. In diagnostic case-control studies a selected sample of patients already known to have the disease are compared with a separate group of normal/healthy people known to be free of the disease. In this situation patients with borderline or mild expressions of the disease, and conditions mimicking the disease are excluded, which can lead to exaggeration of both sensitivity and specificity. This is called spectrum bias or spectrum effect because the spectrum of study participants will not be representative of patients seen in practice (Mulherin and Miller 2002).
At study inception the cohort is either non-diseased or all at the same stage of the disease. A randomised controlled trial with persons either non-diseased or at the same stage of the disease in both arms of the trial would also meet the criterion for this level of evidence.
All or none of the people with the risk factor(s) experience the outcome; and the data arises from an unselected or representative case series which provides an unbiased representation of the prognostic effect. For example, no smallpox develops in the absence of the specific virus; and clear proof of the causal link has come from the disappearance of small pox after large-scale vaccination.
This also includes controlled before-and-after (pre-test/post-test) studies, as well as adjusted indirect comparisons (ie. utilise A vs B and B vs C, to determine A vs C with statistical adjustment for B).
Comparing single arm studies ie. case series from two studies. This would also include unadjusted indirect comparisons (ie. utilise A vs B and B vs C, to determine A vs C but where there is no statistical adjustment for B).
Studies of diagnostic yield provide the yield of diagnosed patients, as determined by an index test, without confirmation of the accuracy of this diagnosis by a reference standard. These may be the only alternative when there is no reliable reference standard.

Note A: Assessment of comparative harms/safety should occur according to the hierarchy presented for each of the research questions, with the proviso that this assessment occurs within the context of the topic being assessed. Some harms (and other outcomes) are rare and cannot feasibly be captured within randomised controlled trials, in which case lower levels of evidence may be the only type of evidence that is practically achievable; physical harms and psychological harms may need to be addressed by different study designs; harms from diagnostic testing include the likelihood of false positive and false negative results; harms from screening include the likelihood of false alarm and false reassurance results.

Note B: When a level of evidence is attributed in the text of a document, it should also be framed according to its corresponding research question eg. level II intervention evidence; level IV diagnostic evidence; level III-2 prognostic evidence.

Note C: Each individual study that is attributed a “level of evidence” should be rigorously appraised using validated or commonly used checklists or appraisal tools to ensure that factors other than study design have not affected the validity of the results.

From: Merlin T, Weston A, Tooher R. Extending an evidence hierarchy to include topics other than treatment: revising the Australian ‘levels of evidence’. BMC Medical Research Methodology, 2009.

In rare circumstances, where level III-2 evidence is not available an element may be made a Core element where there is unanimous agreement by the DAC. An appropriate staging system, e.g. Pathological TNM staging, would normally be included as a Core element.

Non-core elements are those which are unanimously agreed should be included in the dataset but are not supported by level III‐2 evidence. These elements may be clinically important and recommended as good practice but are not yet validated or regularly used in patient management.

The dataset development process is outlined in the Guidelines for the Development of ICCR Datasets document and in the figure below:
To ensure a consistent approach to the content of ICCR datasets, terms and their recommended usage have been defined and are available in the ICCR Harmonisation Guidelines.

These documents will be updated periodically in order to maintain currency and to take advantage of improvements in process and agreement in terminology that will be achieved as the collaborative process progresses.

New datasets

There is a close interdependency between cancer reporting datasets and the World Health Organization (WHO) Classification of Tumours ‘Blue Books’, and therefore the ICCR has made a commitment to publish datasets in synchrony with the WHO updates. Therefore, new datasets will generally be developed in synchrony with the WHO updates, however, changes to other dependent publications such as TNM or International Federation of Gynaecology and Obstetrics (FIGO) staging will also trigger development of new datasets. Additional datasets may be scheduled for development if a specific need arises and resources are available.

Updates to datasets

ICCR Datasets will be scheduled for review and possible revision every three years at a minimum. Updates before the date of formal review may also be undertaken as a result of errors, changes to dependent publications such as the WHO Classification of Tumours or staging systems e.g. Union for International Cancer Control (UICC), American Joint Committee on Cancer (AJCC), FIGO, or significant changes in clinical or diagnostic evidence or management related to a specific cancer for example.

The process of updating ICCR Datasets is described in the Guidelines for the Development of ICCR Datasets document.

Dataset Development