This AI Can Summarize Research Papers, And Present Them In A Few Sentences

Robot scholar”

Academic papers, scholarly papers, position papers, thesis and scientific papers, can all contain terms and languages that are complex.

Understanding these papers isn’t something people can learn to do overnight. It takes practice, and also time. It can take anywhere between an hour for reading short papers, to six hours for longer ones.

While there are indeed people who are good at reading and understanding long-written papers, most people don't have such ability and/or patience.

For those people, an AI has been developed by the researchers at the Allen Institute for Artificial Intelligence.

The team has developed an AI model that can summarize text from the papers, and present it in a few sentences in the form of TLDR (Too Long Didn’t Read).

This way, users who would want a quick summary of a research document, can easily extract the main points of the research, in mere seconds.

Initially. the team has rolled this model out to the Allen Institute’s Semantic Scholar search engine for papers, and only on papers that are related to computer science on search results or the author’s page.

On its web page, Semantic Scholar wrote that:

"The new TLDR feature in Semantic Scholar puts automatically generated single-sentence paper summaries right on the search results page, allowing you to quickly locate the right papers and spend your time reading what matters to you."

"TLDRs (Too Long; Didn't Read) are super-short summaries of the main objective and results of a scientific paper generated using expert background knowledge and the latest GPT-3 style NLP techniques. This new feature is available in beta for nearly 10 million papers and counting in the computer science domain in Semantic Scholar."

The AI summarizes the paper users want to read, by taking the most important parts from a paper which include the abstract, introduction, and conclusion section, and then summary them.

To make this happen, the researchers first “pre-trained” the model on the English language.

Then, they created a SciTLDR data set that consists of more than 5,400 summaries of computer science papers. After that, the model was further trained on more than 20,000 titles of research papers to reduce dependency on domain knowledge while writing a synopsis.

An example TLDR of a scientific paper. A TLDR is typically composed of the most important information found in certain sections of a paper. (Credit: Allen Institute for AI)
"Staying up to date with scientific literature is an important part of any researchers’ workflow, and parsing a long list of papers from various sources by reading paper abstracts is time-consuming."

"TLDRs help users make quick informed decisions about which papers are relevant, and where to invest the time in further reading. TLDRs also provide ready-made paper summaries for explaining the work in various contexts, such as sharing a paper on social media."

The result of this, the AI is capable of summarizing papers that are as long as 5,000 words, into just 21 words in average.

According to the researchers' paper, this is a compression ratio of 238.

Following the success of this AI on English-written computer science-related papers, the researchers plan to expand the model to papers in other fields.