We recently launched the online demonstrator for our quotation extraction system. It automatically extracts both direct and indirect reported speech, associated speakers, and reporting verbs from German news article texts. In the demonstrator, you can enter or paste text yourself and see how well quotes are detected. Quotations are highlighted directly in the text and can also be displayed as extracted raw text.
You can try out the live demonstrator here.
Quotation Extraction – What We Extract
Our system detects three main types of elements in the text: reported speech acts, their speakers, and, if present, the reporting verb used to mark the reported speech. In the demonstrator, each of these types is highlighted in the text as it is detected.
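To make the three element types concrete, here is a minimal sketch of how one extracted quotation could be represented. The class name `Quotation`, its field names, and the example sentence are assumptions made for illustration; they are not the demonstrator's actual output format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quotation:
    """One reported speech act with its speaker and optional reporting verb.

    Hypothetical structure for illustration only.
    """
    text: str                             # the reported speech itself
    speaker: str                          # who said it
    reporting_verb: Optional[str] = None  # e.g. "sagte"; None if no verb is present
    is_direct: bool = True                # True for direct, False for indirect speech


# Invented example sentence: "Anna Schmidt sagte, sie werde den Vorschlag prüfen."
example = Quotation(
    text="sie werde den Vorschlag prüfen",
    speaker="Anna Schmidt",
    reporting_verb="sagte",
    is_direct=False,
)
```

The reporting verb is optional because, as described above, it is only extracted when one is present.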
Reported speech comes in two flavours: direct and indirect speech. To extract reported speech, we go beyond a simple recognition of direct quotations marked by quotation marks. Our system uses deep linguistic parsing to recognize both simple and complex formulations of quotations in text. This lets us capture a much wider range of the statements that news articles attribute to people than quotation marks alone would reveal.
We also identify who these people are by detecting the speaker of each quote in the text, because a quotation without its speaker would be of little use. If a speaker is referenced only by part of their name or by a pronoun, we resolve the reference to the speaker's full name as mentioned earlier in the text.
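The idea of resolving a partial name or pronoun to an earlier full name can be illustrated with a deliberately naive sketch. It is not the method our system uses: the function `resolve_speaker`, the small pronoun list, and the "most recent mention wins" rule are all assumptions chosen to keep the example short.

```python
from typing import List, Optional

# Tiny illustrative pronoun list (assumption); gender and number are ignored here.
PRONOUNS = {"er", "sie", "he", "she"}


def resolve_speaker(reference: str, earlier_mentions: List[str]) -> Optional[str]:
    """Map a partial name or pronoun to a full name mentioned earlier.

    earlier_mentions lists full names in the order they appeared in the text.
    Returns None if nothing suitable is found.
    """
    ref = reference.strip().lower()
    if not earlier_mentions:
        return None
    if ref in PRONOUNS:
        # Naive heuristic: a pronoun refers to the most recently mentioned person.
        return earlier_mentions[-1]
    # Partial name: find the most recent full name containing it as a whole token.
    for name in reversed(earlier_mentions):
        if ref in (part.lower() for part in name.split()):
            return name
    return None


mentions = ["Anna Schmidt", "Peter Meyer"]  # invented names
resolved_name = resolve_speaker("Schmidt", mentions)
resolved_pronoun = resolve_speaker("er", mentions)
```

A real resolver would need at least gender and number agreement and syntactic context. This sketch only shows the input and output of the step.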
It can sometimes be useful to know how a quotation was reported in the news. After all, there is a difference between a person “suggesting” something and “insisting on” it. Therefore, if a verb was used to report a speech act, we detect it in the text and associate it with its appropriate quotation.
Applications of Quotation Extraction
An automatic quotation extraction system such as ours makes it possible to track and collect the reported statements of individual people in the news over long periods of time. This helps to structure news data in more meaningful ways and provides valuable input for many use cases in news analysis.
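As a small illustration of tracking one person's statements over time, the sketch below groups already extracted quotations by speaker and orders them by publication date. The record layout `(speaker, date, text)`, the function name `quotes_by_speaker`, and the sample data are hypothetical and not taken from our system.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

Record = Tuple[str, date, str]  # (speaker, publication date, quoted text)


def quotes_by_speaker(records: List[Record]) -> Dict[str, List[Tuple[date, str]]]:
    """Group quotations by speaker, each group sorted chronologically."""
    timeline: Dict[str, List[Tuple[date, str]]] = defaultdict(list)
    # Sorting once by date keeps every per-speaker list in chronological order.
    for speaker, day, text in sorted(records, key=lambda r: r[1]):
        timeline[speaker].append((day, text))
    return dict(timeline)


# Invented sample data for illustration.
records = [
    ("Anna Schmidt", date(2012, 3, 5), "second statement"),
    ("Peter Meyer", date(2012, 2, 1), "only statement"),
    ("Anna Schmidt", date(2012, 1, 10), "first statement"),
]
timeline = quotes_by_speaker(records)
```

With the quotations structured this way, questions such as "what has this person said about a topic, and when?" reduce to a simple lookup followed by filtering.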
The INAMET Project
The ongoing INAMET project is a joint project of DAI Labor and Neofonie. The goal of INAMET is to create a hierarchical overview of news content over long spans of time. Many different aspects of the news, such as sentiment and who said what to whom, are aggregated in relation to news topics and events so that the user can explore them visually.
The contributions of CC IRML at DAI Labor to the project are the quotation extraction and opinion mining systems. The latter is currently under development and will soon be integrated into the demonstrator system.
Detailed project information can be found at the INAMET project page.
Author and contact: Sascha Narr