Data Extraction and Extraction Tips

What is Data Extraction?

With screening complete, review teams take the information they need from the studies included in the review. This is data extraction: the process of carefully collecting relevant information from studies and organizing it in a way that enables analysis and synthesis. It’s an interesting stage in any project because it is where the review starts to add value, producing work that is worth more than the sum of its parts.

The list of included studies is only the beginning. Each study reference represents a large amount of information, and because you will be interested in some but not all of it, your approach to data extraction needs to be planned and purposeful to avoid a data deluge.

The data can be split into two types: data from the study and data about the study. An example of data from the study is the dose and timing of the intervention – this informs the evidence synthesis. An example of data about the study is the description of the allocation of patients to the study arms – this informs the assessment of study quality. Both types of data are usually collected by two reviewers who work independently of each other because this helps to minimize the risk of data entry error.

During data extraction, reviewers design, test, and populate forms. The forms are then checked, and conflicts are investigated and resolved. There is no ‘best’ way to build a data extraction form; much depends on the type of review, the type of data, and the specific outcomes of interest. However, the basic process can be organized into five steps:

Plan
Pilot
Extract
Compare and reach consensus
Export

Tip 1: Follow a Data Extraction Framework:

At the core of a successful systematic review is a structured approach, and a data extraction framework is instrumental in achieving this. Frameworks such as PICO (Population, Intervention, Comparison, Outcome) are systematic and organized: they enhance reliability and transparency, make the review process more efficient, and support the validity and credibility of the findings.

Tip 2: Plan Ahead:

Meticulous planning is a must. Plan your data extraction approach before diving into the process: this minimizes the need for rework and prepares the team for uncertainties that may arise during the review. A well-thought-out plan sets the stage for a smoother, more successful extraction. The starting point for designing a data extraction form is the review question and, if you have one, the research plan. The form must be well structured and easy to use; if items are unclear, reviewers are more likely to make mistakes when using it. Designing good data extraction forms takes practice, so it’s a good idea to take advice from people with experience and to specify in advance the process for resolving conflicts between the reviewers.

Tip 3: Pilot the Template:

Piloting the template before full-scale data extraction begins is also vital. It saves time and ensures that the whole team is familiar with the template’s components. Identifying potential issues early makes it less likely that the template will need editing during extraction, which in turn reduces the time spent re-extracting studies or cleaning data.

If the test data match your expectations, the form is performing well. If some test data are wide of the mark, you have a useful opportunity to explore why and to improve the form.

Is the form capturing everything you need?
Is it capturing data that you don’t need?
Are any of the questions potentially confusing or open to interpretation?
Does it provide data in a format you can use as set out in your research plan?

This testing phase can be repeated until the team is satisfied that the form is fit for purpose. It’s important to train individual reviewers at the pilot stage and throughout the extraction process to ensure that they understand the requirements and code data in exactly the same way. Discrepancies between reviewers that persist once they are trained could indicate that the form is still not clear enough to gather data reliably.
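One way to spot unclear items during the pilot is to calculate, for each form field, the proportion of piloted studies on which two reviewers recorded the same value; fields with low agreement are candidates for rewording or extra training. The sketch below is a minimal illustration, assuming each reviewer’s pilot extractions are stored as a dictionary mapping a study ID to its field values. The layout, the sample data, and the 0.8 threshold are all hypothetical choices, not a standard.

```python
from typing import Dict, List

# Hypothetical layout: {study_id: {field_name: value}} for each reviewer.
Extraction = Dict[str, Dict[str, str]]


def normalise(value: str) -> str:
    """Ignore case and surrounding whitespace so trivial differences don't count."""
    return value.strip().lower()


def field_agreement(a: Extraction, b: Extraction) -> Dict[str, float]:
    """Per field, the share of jointly piloted studies where both reviewers agree."""
    shared = sorted(set(a) & set(b))
    fields = sorted({f for s in shared for f in a[s]} | {f for s in shared for f in b[s]})
    result: Dict[str, float] = {}
    for fld in fields:
        matches = sum(
            normalise(a[s].get(fld, "")) == normalise(b[s].get(fld, ""))
            for s in shared
        )
        result[fld] = matches / len(shared)
    return result


def unclear_fields(agreement: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Fields whose agreement falls below the (hypothetical) threshold."""
    return [fld for fld, rate in agreement.items() if rate < threshold]


if __name__ == "__main__":
    reviewer_a = {
        "study1": {"population": "Adults with asthma", "dose": "200 mcg"},
        "study2": {"population": "Children", "dose": "100 mcg twice daily"},
    }
    reviewer_b = {
        "study1": {"population": "adults with asthma", "dose": "200 mcg/day"},
        "study2": {"population": "Children", "dose": "100 mcg"},
    }
    rates = field_agreement(reviewer_a, reviewer_b)
    print(rates)  # {'dose': 0.0, 'population': 1.0}
    print(unclear_fields(rates))  # ['dose']
```

In this made-up example the reviewers agree on the population but record the dose differently each time, which suggests the dose item needs clearer instructions (for example, whether to include frequency and units).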

Tip 4: Extract the Right Amount of Data:

Effective data extraction is an art that we encourage researchers to master. It involves extracting all relevant data while avoiding unnecessary information that may impede the analysis. Striking this balance ensures that your analysis can be completed without the need to continually refer back to original sources. Discerning between crucial and non-essential data is a reliable way to streamline the workflow.

With the form ready to go, you’ll now use it to transfer the data safely from the study reports into the review. Although the specifics will be different, each review will likely extract data on the following aspects of the studies:

Participants
Interventions
Outcomes
Results

These bullets represent data from the study only. Data about the study (for example, risk of bias data) are extracted using a separate form and are not required for all review types (for example, scoping reviews typically do not collect this type of data). Your review might be interested in some, rather than all, of the interventions in a particular study. There is no need to extract information on interventions that are outside the scope of the review, but it is helpful to list all of the study’s interventions for readers who may wish to replicate one.
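To make the split between the two kinds of data concrete, the sketch below models one form for data from the study (participants, interventions, outcomes, results) and a separate form for data about the study (a risk-of-bias judgement). It assumes Python dataclasses as the storage format; every field name and the three-level judgement scale are hypothetical and would come from your own plan. Each intervention carries an `in_scope` flag so that all of a study’s interventions can be listed while only the relevant ones are analysed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Intervention:
    name: str
    dose: str
    timing: str
    in_scope: bool  # out-of-scope arms are listed for readers but not analysed


@dataclass
class StudyDataForm:
    """Data FROM the study: feeds the evidence synthesis."""
    study_id: str
    participants: str
    interventions: List[Intervention] = field(default_factory=list)
    outcomes: List[str] = field(default_factory=list)
    results: Dict[str, str] = field(default_factory=dict)  # outcome -> reported result

    def in_scope_interventions(self) -> List[Intervention]:
        return [i for i in self.interventions if i.in_scope]


ALLOWED_JUDGEMENTS = {"low", "some concerns", "high"}  # hypothetical scale


@dataclass
class RiskOfBiasForm:
    """Data ABOUT the study: feeds the quality assessment (not needed for every review type)."""
    study_id: str
    allocation_description: str
    judgement: str
    notes: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the agreed coding scheme so reviewers code data the same way.
        if self.judgement not in ALLOWED_JUDGEMENTS:
            raise ValueError(f"Unknown judgement: {self.judgement!r}")


if __name__ == "__main__":
    form = StudyDataForm(
        study_id="study1",
        participants="Adults with asthma",
        interventions=[
            Intervention("Drug A", "200 mcg", "daily, 12 weeks", in_scope=True),
            Intervention("Exercise", "n/a", "weekly", in_scope=False),
        ],
        outcomes=["FEV1"],
    )
    print([i.name for i in form.in_scope_interventions()])  # ['Drug A']
```

Restricting judgements to a fixed set of values is one design choice among several; a spreadsheet with drop-down lists or a dedicated review tool can enforce the same consistency.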

Tip 5: Communicate Regularly & Keep a Log for Reporting Checklists:

Communication is a cornerstone during the data extraction phase. Regular updates within the team ensure a shared understanding of the processes and early identification of potential issues requiring protocol or template amendments. We recommend keeping a comprehensive log of communication, decisions, and processes during data extraction to facilitate the completion of checklists, often required by journals during manuscript submission.

The data from the two reviewers must now be compared. Discrepancies must be resolved here and a consensus reached. It is this consensus data that is exported in step 5. If the values in the forms do not match because one reviewer has made a coding error, the error can be corrected here. If the reviewers have reached different judgements, the conflict can be resolved by discussion between the reviewers and, if that doesn’t work, by referral to another member of the team who will make the final decision. It is good practice to specify in advance the process for resolving conflict among the reviewers and to keep a record of these disagreements.
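The comparison step can be sketched as follows, assuming each reviewer’s completed forms are flat dictionaries keyed by study ID and field name (a hypothetical layout). Matching values pass straight into the consensus data; each mismatch is written to a disagreement log together with how it was resolved. The caller-supplied `resolve` function is a placeholder for the human process (discussion between the reviewers, or referral to another team member), and the CSV export is only one possible format for step 5.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple

Forms = Dict[str, Dict[str, str]]  # {study_id: {field: value}}
# resolve(study_id, field, value_a, value_b) -> (final_value, how_resolved)
Resolver = Callable[[str, str, str, str], Tuple[str, str]]


def build_consensus(a: Forms, b: Forms, resolve: Resolver) -> Tuple[Forms, List[dict]]:
    """Merge two reviewers' forms, resolving and logging every discrepancy."""
    consensus: Forms = {}
    log: List[dict] = []
    for study_id in sorted(set(a) | set(b)):
        row_a, row_b = a.get(study_id, {}), b.get(study_id, {})
        consensus[study_id] = {}
        for fld in sorted(set(row_a) | set(row_b)):
            va, vb = row_a.get(fld, ""), row_b.get(fld, "")
            if va == vb:
                consensus[study_id][fld] = va
                continue
            final, how = resolve(study_id, fld, va, vb)
            consensus[study_id][fld] = final
            # Keep a record of every disagreement for later reporting.
            log.append({"study_id": study_id, "field": fld, "reviewer_a": va,
                        "reviewer_b": vb, "final": final, "resolved_by": how})
    return consensus, log


def export_csv(consensus: Forms) -> str:
    """Export consensus data as CSV text; missing fields are left blank."""
    fields = sorted({f for row in consensus.values() for f in row})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["study_id"] + fields)
    writer.writeheader()
    for study_id, row in consensus.items():
        writer.writerow({"study_id": study_id, **row})
    return buf.getvalue()


if __name__ == "__main__":
    a = {"study1": {"n": "120", "dose": "200 mcg"}}
    b = {"study1": {"n": "120", "dose": "200 mcg/day"}}
    # Hypothetical decisions recorded after the reviewers discussed each conflict.
    agreed = {("study1", "dose"): ("200 mcg/day", "discussion")}
    consensus, log = build_consensus(a, b, lambda s, f, va, vb: agreed[(s, f)])
    print(log[0]["resolved_by"])  # discussion
    print(export_csv(consensus))
```

The log doubles as the record of disagreements recommended above, and can feed the reporting checklists mentioned in Tip 5.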

Data Extraction Best Practices

Once you complete screening and have your final set of included studies, you may begin Data Extraction.

The Cochrane Handbook recommends:

Data collection for systematic reviews should be performed using structured data collection forms. Given the important functions of data collection forms, ample time and thought should be invested in their design. (Cochrane Handbook 5.4.1)
All data collection forms and data systems should be thoroughly pilot-tested before launch. (Cochrane Handbook MECIR Box 5.4.a). Testing should involve several people extracting data from at least a few articles. (Cochrane Handbook 5.4.3)
Use (at least) two people working independently to extract study characteristics from reports of each study, and define in advance the process for resolving disagreements. - highly desirable (Cochrane Handbook 5.5.2)
Use (at least) two people working independently to extract outcome data from reports of each study, and define in advance the process for resolving disagreements. - mandatory (Cochrane Handbook 5.5.2)

Cochrane Handbook for Systematic Reviews of Interventions: Chapter 5: Collecting data

A practical Guide to Data Extraction for Intervention Systematic Reviews (Cochrane)
