A movement for patients, relatives and carers. Harnessing the patient voice to build confidence in the use of patient data for research and analysis.

A selection of the frequently asked questions

"There is no data about us, without us, so use our data to help us, and all the other cancer patients who will be coming in the future."

Patient advocate, use MY data

Cancer registration underpins what we know about cancer in the UK.

Traditionally cancer registration was about manually finding and counting new cases of cancer, and only recorded a few items of data. These were to identify the person and link their data from different hospitals, and to know something about the actual tumour, such as what type of cancer it was, where in the body it was, when it was diagnosed and how it was diagnosed.

Registration is now of a much higher quality than in the past, as the data are fed directly from hospital and lab systems, without the need for data to be re-entered. This makes the data feeds much quicker, and improves reliability, because the data are taken directly from the systems which are used to treat patients, and the different data can be cross-checked to look for data anomalies.

Having this extra data means that registration is not just about incidence (the number of new cases in a year), mortality (the number of deaths in a year) and survival (the numbers of people alive after a diagnosis).

Registration now allows us to look at differences in the actual care given, whether this is surgery, radiotherapy or chemotherapy. It allows us to look at the detailed routes by which people are diagnosed, and also to look at delays in this process.

And because all this relies on actual patient records, we are also able to look at differences across the country, differences between the sexes, differences for ethnic groups, and other factors.

Now that we have linked the patient experience records into the registration records, we can directly map the care given to a patient to the experience and outcomes that they themselves are reporting.

Because we have the data, we are able to produce tools, reports and analyses which identify variation across the country, at hospital, commissioning group and locality levels. We can do this by types of cancer and many other factors.

All hospitals have their own IT systems, which hold details about the patients they treat (or have treated). The information comes from:

- the patient when they attend an appointment and give information, which is entered into the IT system

- the GP, who will send information about you to the hospital if they refer you for suspected cancer, or if they are asked by the treating clinical team for particular information to support your care.

Any form of treatment or investigation that a patient has is recorded by the hospital. It is common for hospitals to have different IT systems in different parts of the hospital, although more hospitals are now running more integrated systems, often called the EPR (Electronic Patient Record).

Where hospitals have more than one system, a patient’s data may be held on several of them. Hospitals should keep the data on the different systems aligned, but sometimes the data can be out of step. This can result in patients being asked to give their details again when they visit different departments of the same hospital. This is much less common than it used to be, and usually happens when specific departments are still running their own ‘standalone’ systems.

By its very nature, data which is taken directly from the systems which are used in the hospitals to treat the patients should be high quality. Feeding this data directly to the registration process removes possible transcription errors.

However, there are errors in virtually all data sources, so additional processes are needed to check the accuracy.

First is the principle of multiple sources – taking data for the same patient from different data systems. If there is an error in one source, you should see this when it is compared to another source.
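As an illustration only, the multiple-sources principle can be sketched in a few lines of Python. The field names, values and the `find_discrepancies` helper below are hypothetical and are not taken from any registry system; the sketch simply compares the fields that two sources share and reports where they disagree.

```python
from typing import Dict, List


def find_discrepancies(source_a: Dict[str, str], source_b: Dict[str, str]) -> List[str]:
    """Return the names of fields held by both sources whose values differ."""
    shared_fields = source_a.keys() & source_b.keys()
    return sorted(f for f in shared_fields if source_a[f] != source_b[f])


# Hypothetical records for the same patient, taken from two different systems.
hospital_record = {"diagnosis_date": "2015-01-14", "tumour_site": "C71", "sex": "F"}
pathology_record = {"diagnosis_date": "2015-01-14", "tumour_site": "C70", "sex": "F"}

# The two systems disagree on the tumour site, so that field is flagged for review.
print(find_discrepancies(hospital_record, pathology_record))  # ['tumour_site']
```

A mismatch does not say which source is wrong, only that the record needs a closer look.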

The second method is to use highly trained registration staff, who double-check each record before it is used for analysis.

A key stage in quality assurance is to reflect the data back to the clinical teams that supplied it in the first place. This is done at different stages and different time points, to ensure errors are spotted quickly and also to provide useful intelligence back to the clinical teams.

The most recent check is to let patients have access to their own data. Initially this was for a small subset of patients – those with a brain tumour. The initial brain tumour portal, developed with brainstrust, has now begun to extend with the support of Cancer Research UK.

The types of cancer which are registered are dictated by the legal coverage within which registration works. There is a list of ‘registrable conditions’, which comprises a set of codes used across the world to record the types of cancer. The codes, called ICD (International Classification of Diseases) codes, have developed through the years, and as each new version is issued, it is assigned a number. The current version in use across the NHS is ICD-10.

The list of conditions that can be registered is shown in the attached document:

This is what cancer registration is all about. In legal terms, this translates as being the legal body which hosts the cancer registry:

- for England – Public Health England (PHE)

- for Scotland – the Information Services Division (ISD) of the Scottish government

- for Northern Ireland – the Cancer Registry is hosted by Queen’s University Belfast

- for Wales – the Welsh Cancer Intelligence and Surveillance Unit is hosted by Public Health Wales.

This could be answered in two ways: are the data which are collected technically accurate, or are the data that are collected the right things to collect in order to answer the questions we need answering?

Firstly, experience shows that data always contains errors, particularly when it is used outside the context in which it was originally recorded. To counter this there are many mechanisms for checking and assuring the data, and a great deal of time and energy is spent on them.

See "How do we know cancer registration data are accurate?" for further information about data accuracy.

On the second angle (are we collecting the right data in the first place), there are regular reviews across the UK, driven by things such as national strategies, changes in treatments (or new treatments being introduced), or changes in organisational structures.

The main risk of sharing and using patient data is identification.

Patient data are held on NHS computers and databases and there is the risk that any computer system could be hacked. The NHS runs a separate, secure, encrypted national network (N3), to which only approved organisations can get access. All personal data which moves across the network is encrypted, including NHS to NHS email.

Before any data is released from the cancer registration system, there are formal processes to assess the potential risk of a release of data, including where the data contains small numbers. This includes a ‘privacy impact assessment’, which is used to identify and minimise confidentiality risks.

By default, all data released should be anonymised. Rather than de-identifying data by removing a patient’s name or address, applying anonymisation techniques minimises the possibility that someone could be re-identified from other data (such as a geographical location or dates related to admissions/treatments).

As part of the privacy impact assessment of data where name, address and NHS number have already been removed, there is an assessment of whether the data are anonymised in accordance with an anonymisation standard developed by the Health and Social Care Information Centre (Standard ISB 1523: Anonymisation Standard for Publishing Health and Social Care Data). This is a ‘k-anonymity’ check.

K-anonymity is used to limit how unique the combinations of field values in a dataset can be, so that no single individual can be identified. The fields concerned are often called indirect identifiers because they can be used in combination to identify individuals. These include:

- any derivation of date of birth (such as age range)

- gender

- ethnic category

- any derivation of postcode (such as area code)

- event dates (such as the exact hospital admission date; the admission month and year alone is acceptable)

- employer

- occupation or staff group.

The data may need to be transformed so that there is a minimum of three people (k = 3) who all share the same controlled characteristics.
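A minimal sketch of what such a check might look like is given below, assuming the data are held as simple rows of already-generalised values. The field names, sample rows and the `groups_below_k` helper are hypothetical illustrations and do not reproduce the method set out in the anonymisation standard itself.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def groups_below_k(
    rows: List[Dict[str, str]],
    indirect_identifiers: Sequence[str],
    k: int = 3,
) -> Dict[Tuple[str, ...], int]:
    """Return each combination of indirect identifiers shared by fewer than k rows."""
    counts = Counter(tuple(row[f] for f in indirect_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical, already-generalised rows (age range, gender, postcode area).
rows = [
    {"age_range": "60-69", "gender": "F", "area": "LS"},
    {"age_range": "60-69", "gender": "F", "area": "LS"},
    {"age_range": "60-69", "gender": "F", "area": "LS"},
    {"age_range": "70-79", "gender": "M", "area": "YO"},
]

# Only one person has the last combination, so it fails the k = 3 threshold.
print(groups_below_k(rows, ["age_range", "gender", "area"]))
# {('70-79', 'M', 'YO'): 1}
```

Any combination returned here would need further transformation (for example, wider age ranges or larger areas) or suppression before release.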

There is slight variation in the timing that the data are processed in the different countries of the UK, but in general data begins to arrive at the registry (from hospitals, path labs, etc.) by the fourth month after diagnosis. So for anyone diagnosed in January, the first three months of their data will start to arrive in April. Further data continues to arrive (for treatments, follow ups, etc.) from that point.

Around eight months after diagnosis these data flows are relatively complete, so the cancer registration process can compute a coherent picture of the diagnosis (and initial treatments). Summary records (about activity and some initial outcomes) are then made available for the clinical teams.
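The timings above can be worked through as simple month arithmetic. The sketch below is illustrative only: the diagnosis date is made up, the `add_months` helper is hypothetical, and the offsets of three and eight months are a reading of the approximate figures given in the text rather than fixed rules.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that falls `months` after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


diagnosis = date(2016, 1, 20)  # hypothetical diagnosis in January

first_data_expected = add_months(diagnosis, 3)  # data begins to arrive
summary_expected = add_months(diagnosis, 8)     # data flows relatively complete

print(first_data_expected.strftime("%B %Y"))  # April 2016
print(summary_expected.strftime("%B %Y"))     # September 2016
```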

Data will continue to arrive from that point, whenever a patient has a hospital interaction, or any treatments are given.

It’s mostly static, but yes, it does undergo minor changes through time (as do all datasets).

The Cancer Outcomes and Services Dataset (COSD) is a large dataset which applies to England and covers a range of activities such as audit and registration. Having one dataset means that whenever data are recorded across the NHS, it is done consistently and to the same rules.

The COSD has had small developments through the years, but these are now mostly modifications or clarifications. No major changes are anticipated at present.

The COSD only applies to England, though it is largely consistent with dataset definitions across the rest of the UK.

Yes. All the data which feeds registration is retrospective, though some data arrives faster than others.

Once a registration is started, it remains active or ‘open’, so that data will continue to flow when clinical activity is recorded for the patient. This allows long-term effects to be recorded, and is the best way to look for recurrence, or disease-free survival, both of which remain difficult to measure accurately.

No, this isn’t true. All disease conditions are coded, using an international classification scheme such as ICD (see "Is my disease covered?" question). The NHS across the UK has been using these coding systems for over 30 years. Other coding schemes are used in particular specialist areas, such as pathology, which uses a coding scheme called SNOMED. Again, these have been used across the NHS for many years. One of the things that the registries do is to provide a "mapping" between ICD and SNOMED coding.

Primary care coding in GP practices uses a different method, using terminology which gives a greater granularity which better suits primary care. Again, this is all mappable to standard national codes.

Registries only record cases within a defined list of codes (see "Is my disease covered?" question).

If we consider the pathway to start with a suspicion of cancer by a GP, then if test results rule out cancer, the registry does not record further information about the patient.

Once a patient is diagnosed, their data will continue to flow from that point onwards, up to the point of death.

If a patient chooses to pay for their own private medical care, the data held by those private providers does not usually flow to the cancer registry.
