Paperweight is driven by a combination of natural language processing (NLP) algorithms. In evidence synthesis, the first steps typically require reviewers to manually build a database of the articles and journals they want to summarize. This entails an exhaustive search of Google Scholar using manually chosen keywords. The approach is vulnerable to bias: depending on the keywords selected, the reviewer is more likely to find some articles or journals than others. To tackle this problem, Paperweight seeks to remove the need for reviewers to manually choose the keywords that form their search queries.
Full-text PDFs are almost always the most reliable source of information about academic articles. Although several resources support extracting data from full-text documents, the extracted information is often incomplete, inaccurate, or unavailable. PDFs were designed to look good, not to have data extracted from them, so copying and pasting from a PDF often produces unexpected results. In this first version, the project lets users easily copy text from a PDF and attempts to automatically identify the references.
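To make the reference-identification step concrete, here is a minimal sketch of one way it could work on text already copied out of a PDF. It is not Paperweight's actual implementation, which is not described here. The function name `extract_references`, the list of heading words, and the assumption that entries start with markers such as `[1]` or `1.` are all hypothetical choices made for illustration.

```python
import re

# Assumption: the reference list sits under a heading on its own line.
HEADING = re.compile(
    r"^[ \t]*(?:references|bibliography|works cited)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)
# Assumption: each entry begins a line with "[n]" or "n." followed by a space.
ENTRY_START = re.compile(r"^[ \t]*(?:\[\d+\]|\d+\.)[ \t]+", re.MULTILINE)


def extract_references(text: str) -> list[str]:
    """Return the numbered entries that follow the last references-style heading."""
    headings = list(HEADING.finditer(text))
    if not headings:
        return []
    # Use the last heading in case the word "References" also appears earlier.
    tail = text[headings[-1].end():]
    starts = [m.start() for m in ENTRY_START.finditer(tail)]
    entries = []
    for begin, end in zip(starts, starts[1:] + [len(tail)]):
        # Strip the leading marker, then rejoin lines wrapped by the PDF layout.
        body = ENTRY_START.sub("", tail[begin:end], count=1)
        entries.append(" ".join(body.split()))
    return entries


sample = (
    "Intro text.\n\n"
    "References\n"
    "[1] A. Author. Title one.\n"
    "    Journal, 2020.\n"
    "[2] B. Writer. Title two. 2021.\n"
)
for ref in extract_references(sample):
    print(ref)
```

A heuristic like this breaks easily: unnumbered author-year reference lists are not detected at all, and a wrapped line that happens to begin with something like `2019. ` would be mistaken for a new entry. This is the kind of unexpected result that PDF-derived text tends to produce.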