PDF annotation
Coding and extracting data from PDFs

Extraction of content from articles, also known as coding, is an important part of evidence synthesis, especially for meta-analyses, which require multiple predefined parameters to be extracted from each article. The task is usually tedious, so several people, potentially including external helpers, may be involved in coding. Software tools that make content extraction efficient and index the extracted content against field labels are highly desirable.

The most significant barrier to such tools is that the majority of articles are available in PDF format, and content in PDF files is embedded in a highly abstract and protected manner. The main contribution of the prototype is selective access to content in PDF articles. The tool is built around the ReactJS JavaScript framework, so it can be deployed as a web application either on a local virtual web server in a desktop environment or on a centrally hosted web server. The application takes a CSV file whose headers are the fields to be extracted and loads PDF files from a server folder. Coding is then performed through a right-click menu that brings up the list of fields. The application saves the fields, the contents or values selected in the PDF, and any user comments back to the CSV file as a new row per PDF.
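The CSV round trip described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the prototype's own ReactJS code. The function names, the `pdf` and `comments` column names, and the sample file name and values are all hypothetical; the source does not say how the tool labels these columns.

```python
import csv
import io

# Hypothetical column names; the source does not specify them.
PDF_COLUMN = "pdf"
COMMENT_COLUMN = "comments"


def read_fields(csv_text: str) -> list[str]:
    """Return the header row, i.e. the fields a coder can assign text to."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def append_coded_row(csv_text: str, pdf_name: str,
                     selections: dict[str, str], comment: str = "") -> str:
    """Append one row for one PDF and return the updated CSV text."""
    fields = read_fields(csv_text)
    if not fields:
        raise ValueError("the CSV file has no header row")
    unknown = set(selections) - set(fields)
    if unknown:
        raise ValueError(f"not in the CSV header: {sorted(unknown)}")

    # Fields the coder did not fill in stay empty.
    row = {name: selections.get(name, "") for name in fields}
    if PDF_COLUMN in row:
        row[PDF_COLUMN] = pdf_name
    if COMMENT_COLUMN in row:
        row[COMMENT_COLUMN] = comment

    out = io.StringIO()
    csv.DictWriter(out, fieldnames=fields, lineterminator="\n").writerow(row)
    # Make sure the existing text ends with a newline before appending.
    prefix = csv_text if csv_text.endswith("\n") else csv_text + "\n"
    return prefix + out.getvalue()


# Made-up example data, for illustration only.
sheet = "pdf,sample_size,effect_size,comments\n"
sheet = append_coded_row(sheet, "example_article.pdf",
                         {"sample_size": "n = 120"}, "see table 2")
print(sheet)
```

In this sketch each call stands for one finished PDF: the header supplies the entries that a right-click menu would list, and the text selected for each field becomes one cell of the new row.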