Language: Python
Paperweight is driven by a combination of natural language processing (NLP) algorithms. In the evidence synthesis process, the first steps typically require reviewers to manually build a database of the articles and journals they want to summarize. This usually involves an exhaustive search of Google Scholar using manually chosen keywords. The approach is vulnerable to bias: depending on the keywords selected, the reviewer may be more likely to find some articles or journals than others. To address this problem, Paperweight aims to remove the need for reviewers to choose keywords manually when forming their search queries.
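The source does not say which NLP algorithms Paperweight combines. As a rough illustration only, the sketch below assumes one simple approach: ranking candidate terms from a few seed abstracts by summed TF-IDF, so that query terms come from the literature itself rather than from a reviewer's guess. `suggest_keywords`, the stopword list and the sample abstracts are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "for", "to",
             "with", "is", "are", "was", "were", "by", "we", "this", "that"}


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens longer than two letters, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if len(t) > 2 and t not in STOPWORDS]


def suggest_keywords(documents: list[str], top_k: int = 5) -> list[str]:
    """Rank terms by TF-IDF summed over the seed documents (hypothetical helper)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many seed documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores: Counter = Counter()
    for tokens in tokenized:
        if not tokens:
            continue
        for term, count in Counter(tokens).items():
            # Smoothed IDF keeps terms found in every document above zero.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            scores[term] += (count / len(tokens)) * idf
    # Highest score first; ties broken alphabetically so output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_k]]


if __name__ == "__main__":
    seed_abstracts = [
        "Exercise therapy reduces chronic low back pain in adults.",
        "A randomized trial of exercise therapy for chronic neck pain.",
        "Yoga and exercise programs for chronic pain management.",
    ]
    print(suggest_keywords(seed_abstracts, top_k=3))  # ['chronic', 'exercise', 'pain']
```

Because the scores are computed over the seed set alone, terms shared by all seed abstracts rank highest; weighting against a larger background corpus would be one way to push generic words further down.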
Full-text PDFs are almost always the most reliable source of information in academic articles. Although several resources support extracting data from full-text documents, in most cases the extracted information is incomplete, inaccurate, or unavailable. PDFs were designed for visual presentation rather than data extraction, so copying and pasting text from a PDF often produces unexpected results. In this first version, the project lets users easily copy text from a PDF and attempts to identify the references automatically.
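The source does not describe how references are identified. A minimal sketch, assuming the text has already been copied out of the PDF as a plain string (no PDF library is used here): `clean_pdf_text` repairs two common copy/paste artifacts, words hyphenated across line breaks and hard line wraps, and `split_references` treats everything after a "References" or "Bibliography" heading as the reference list and splits it at numbered markers such as `[3]` or `3.`. Both function names and the heuristics are assumptions.

```python
import re


def clean_pdf_text(raw: str) -> str:
    """Repair common artifacts in text copied from a PDF, keeping paragraph breaks."""
    cleaned = []
    for para in re.split(r"\n\s*\n", raw):
        # Rejoin words hyphenated across a line break: "extrac-\ntion" -> "extraction".
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Unwrap hard line breaks and collapse runs of spaces left by the layout.
        para = " ".join(para.split())
        if para:
            cleaned.append(para)
    return "\n\n".join(cleaned)


def split_references(raw: str) -> list[str]:
    """Return numbered entries that follow a 'References'/'Bibliography' heading."""
    heading = re.search(r"^[ \t]*(?:references|bibliography)[ \t]*$", raw,
                        flags=re.IGNORECASE | re.MULTILINE)
    if heading is None:
        return []
    # Prepend a newline so the first entry's marker is matched like the others.
    section = "\n" + raw[heading.end():]
    # A new entry starts at each line beginning with "[12]" or "12." plus a space.
    parts = re.split(r"\n[ \t]*(?:\[\d+\]|\d+\.)[ \t]+", section)
    # parts[0] is whatever precedes the first marker (normally blank), so skip it.
    return [clean_pdf_text(p) for p in parts[1:] if p.strip()]


if __name__ == "__main__":
    copied = (
        "References\n"
        "[1] Smith J. Auto-\nmated screening. 2019.\n"
        "[2] Doe A. Search strategies\nfor reviews. 2020.\n"
    )
    for entry in split_references(copied):
        print(entry)
```

Both rules are heuristics: rejoining hyphens also merges genuine compounds that happen to break at the hyphen, and a wrapped line that starts with a number followed by a period would be mistaken for a new entry.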
Citations downloaded from bibliographic databases and other resources, such as Google Scholar, often lack details such as abstracts or volume and page numbers. These details matter for several reasons, for example screening records in systematic reviews or locating full-text documents. This functionality is intended to fill in missing information, including abstracts, for a set of citation files.
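The source does not say where the missing details come from. A minimal sketch of the merging step, assuming each citation is a Python dict and that a candidate record for the same work has already been retrieved from some other source (the retrieval itself is not shown): missing or blank fields are filled from the candidate, existing values are never overwritten, and nothing is merged unless the normalized titles match. `fill_missing`, the field names and the title check are illustrative choices, not Paperweight's documented behavior.

```python
import re

# Hypothetical field names; real citation files (RIS, BibTeX, ...) use their own keys.
FIELDS = ("abstract", "volume", "issue", "pages", "doi", "year")


def normalize_title(title: str) -> str:
    """Lowercase and replace punctuation with spaces so near-identical titles match."""
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())


def is_missing(value) -> bool:
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def fill_missing(record: dict, candidate: dict,
                 fields: tuple[str, ...] = FIELDS) -> tuple[dict, list[str]]:
    """Return a filled copy of `record` and the list of fields that were filled."""
    merged = dict(record)
    filled: list[str] = []
    # Guard against attaching details from the wrong article.
    title = normalize_title(record.get("title") or "")
    if not title or title != normalize_title(candidate.get("title") or ""):
        return merged, filled
    for field in fields:
        # Only fill gaps; values already present in the record are kept as-is.
        if is_missing(merged.get(field)) and not is_missing(candidate.get(field)):
            merged[field] = candidate[field]
            filled.append(field)
    return merged, filled


if __name__ == "__main__":
    record = {"title": "Exercise for Low-Back Pain", "year": "2019", "abstract": ""}
    candidate = {"title": "Exercise for low back pain.", "year": "2018",
                 "abstract": "We reviewed trials of exercise therapy.",
                 "volume": "12", "pages": "1-10"}
    merged, filled = fill_missing(record, candidate)
    print(filled)          # ['abstract', 'volume', 'pages']
    print(merged["year"])  # 2019 (existing value kept)
```

Exact title matching after normalization is deliberately strict; a fuzzier comparison, or matching on DOI when one is present, would catch more pairs at the cost of more false matches.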
Defining a good search strategy for a systematic review can be particularly challenging. Common problems include: two people asked to define a strategy will produce entirely different ones; the number of hits can be prohibitively high; relevant references are missed because a specific keyword was omitted; few means exist to validate search strategies; strategies are difficult to adapt to other databases; and errors may be introduced when adapting them between databases, among others.
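The source lists problems rather than a method. One widely used way to make a strategy explicit and checkable, shown here only as an illustrative sketch, is to group synonyms into concepts (joined with OR), combine the concepts with AND, and measure recall against a small set of references already known to be relevant. `build_query` and `check_recall` are hypothetical helpers, and the query uses generic Boolean syntax rather than the syntax of any particular database.

```python
def build_query(concepts: dict[str, list[str]]) -> str:
    """OR together the synonyms within each concept, then AND the concepts."""
    blocks = []
    for terms in concepts.values():
        # Quote multi-word terms so they are searched as exact phrases.
        quoted = [f'"{term}"' if " " in term else term for term in terms]
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)


def check_recall(retrieved: set[str], known_relevant: set[str]) -> tuple[float, set[str]]:
    """Return the fraction of known relevant records retrieved, and those missed."""
    if not known_relevant:
        raise ValueError("known_relevant must contain at least one record ID")
    found = retrieved & known_relevant
    return len(found) / len(known_relevant), known_relevant - retrieved


if __name__ == "__main__":
    concepts = {
        "population": ["adults", "older adults"],
        "intervention": ["exercise", "physical activity"],
    }
    print(build_query(concepts))
    # (adults OR "older adults") AND (exercise OR "physical activity")
    score, missed = check_recall({"r1", "r2", "r3"}, {"r1", "r2", "r4", "r5"})
    print(score, sorted(missed))  # 0.5 ['r4', 'r5']
```

Keeping the concept groups as data rather than as a hand-typed string makes it easier to regenerate the query when moving to another database, and the missed records from the recall check may point to keywords the strategy omits.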