Search results

1 – 2 of 2


Article

Publication date: 8 November 2022

PFSA-ID: an annotated Indonesian corpus and baseline model of public figures statements attributions

Yohanes Sigit Purnomo W.P., Yogan Jaya Kumar and Nur Zareen Zulkarnain

Abstract

Purpose

To date, corpora for the quotation extraction and quotation attribution tasks in Indonesian remain limited in quantity and depth. This study aims to develop an Indonesian corpus of public figure statement attributions and a baseline model for attribution extraction, thereby helping to foster research in information extraction for the Indonesian language.

Design/methodology/approach

The methodology is divided into corpus development and extraction model development. During corpus development, data were collected and annotated. The development of the extraction model entails feature extraction, the definition of the model architecture, parameter selection and configuration, model training and evaluation, as well as model selection.

Findings

The Indonesian corpus of public figure statement attributions achieved a 90.06% level of agreement between the annotator and experts and could serve as a gold-standard corpus. Furthermore, the baseline model predicted most labels and achieved an F-score of 82.026%.
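
As a rough illustration of the two kinds of figures reported here, the sketch below computes a simple percent agreement between two label sequences and an F-score for one label. It is a minimal sketch under assumptions: the abstract does not say how agreement was measured or how the F-score was averaged, and the label names (`SPEAKER`, `CUE`, `STATEMENT`, `O`) and the toy data are hypothetical, not taken from the corpus.

```python
def percent_agreement(labels_a, labels_b):
    """Share of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    if not labels_a:
        raise ValueError("label sequences must not be empty")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def f_score(gold, predicted, label):
    """Harmonic mean of precision and recall for a single label."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical token labels, for illustration only.
expert = ["SPEAKER", "CUE", "O", "O", "O"]
annotator = ["SPEAKER", "CUE", "O", "STATEMENT", "O"]

agreement = percent_agreement(annotator, expert)
f_for_o = f_score(expert, annotator, "O")
```

Percent agreement does not correct for chance agreement; a chance-corrected measure such as Cohen's kappa is a common alternative when label distributions are skewed.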

Originality/value

To the best of the authors’ knowledge, the resulting corpus is the first corpus for attribution of public figures’ statements in the Indonesian language, which makes it a significant step for research on attribution extraction in the language. The resulting corpus and the baseline model can be used as a benchmark for further research. Other researchers could follow the methods presented in this paper to develop a new corpus and baseline model for other languages.

Details

Global Knowledge, Memory and Communication, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2514-9342

Article

Publication date: 2 December 2020

Understanding quotation extraction and attribution: towards automatic extraction of public figure’s statements for journalism in Indonesia

Yohanes Sigit Purnomo W.P., Yogan Jaya Kumar and Nur Zareen Zulkarnain

Abstract

Purpose

Extracting information from unstructured data is a challenging task for computational linguistics. Statements by public figures that journalists attribute in a story are one type of information that can be processed into structured data. A knowledge base built from such data would therefore be very beneficial for further uses such as opinion mining, claim detection and fact-checking. This study aims to understand statement extraction tasks and the models that have already been applied, in order to formulate a framework for further study.

Design/methodology/approach

This paper presents a literature review of selected previous research that specifically addresses quotation extraction and quotation attribution. Research works that discuss corpus development related to quotation extraction and quotation attribution are also considered. The findings of the review are used as a basis for proposing a framework to direct further research.

Findings

This study has three findings. Firstly, the extraction process still consists of two main tasks, namely the extraction of quotations and the attribution of quotations. Secondly, most extraction algorithms rely on rule-based approaches or traditional machine learning. Lastly, available corpora are limited in quantity and depth. Based on these findings, a statement extraction framework for Indonesian language corpus and model development is proposed.
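
To make the two-task split concrete, the following is a minimal rule-based sketch: one function extracts directly quoted spans and a second attributes each span to a speaker. It is not the framework proposed in the paper. The cue-verb list, the function names and the English example sentence are all assumptions chosen for illustration; Indonesian text would need its own cue words and name patterns.

```python
import re

# Hypothetical cue verbs; a real system would use a curated list.
CUE_VERBS = ("said", "stated", "explained")

QUOTE_RE = re.compile(r'"([^"]+)"')
# Cue verb followed by one or more capitalised words, read as the speaker.
SPEAKER_RE = re.compile(
    r"\s*,?\s*(?:" + "|".join(CUE_VERBS) + r")\s+([A-Z]\w*(?:\s+[A-Z]\w*)*)"
)


def extract_quotations(text):
    """Task 1: find directly quoted spans as (quote, end_offset) pairs."""
    return [
        (m.group(1).strip().rstrip(",."), m.end())
        for m in QUOTE_RE.finditer(text)
    ]


def attribute_quotation(text, end_offset):
    """Task 2: look for 'cue verb + name' immediately after the quote."""
    m = SPEAKER_RE.match(text, end_offset)
    return m.group(1) if m else None


def extract_statements(text):
    """Run both tasks and pair each quote with its speaker (or None)."""
    return [
        {"quote": quote, "speaker": attribute_quotation(text, end)}
        for quote, end in extract_quotations(text)
    ]


example = '"The budget will be revised," said Budi Santoso on Monday.'
statements = extract_statements(example)
```

Rules like these only cover direct quotations followed by a cue verb; indirect speech, a speaker placed before the quote and pronoun references all fall outside them.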

Originality/value

The paper serves as a guideline for formulating a statement extraction framework based on the findings of the literature study. The proposed framework includes corpus development in the Indonesian language and a model for public figure statement extraction. Furthermore, this study could be used as a reference for producing a similar framework for other languages.

Details

Global Knowledge, Memory and Communication, vol. 70 no. 6/7

Type: Research Article

ISSN: 2514-9342
