ArAutoSenti: Automatic annotation and new tendencies for sentiment classification of Arabic messages

Date

2020-08

Publisher

Springer

Type

Article

Peer reviewed

Yes

Abstract

A corpus-based sentiment analysis approach for messages written in Arabic and its dialects is presented and implemented. The originality of this approach lies in the automatic construction of the annotated sentiment corpus, which relies mainly on a sentiment lexicon that is itself constructed automatically. For the classification step, shallow and deep classifiers are used, with features extracted using word embedding models. To validate the constructed corpus, a manual review was carried out, which found that 85.17% of the annotations were correct. The approach is applied to the under-resourced Algerian dialect and tested on two external test corpora presented in the literature. The results are very encouraging, with an F1-score of up to 88% on the first test corpus and up to 81% on the second. These results represent improvements of 20% and 6%, respectively, over existing work in the research literature.
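
The lexicon-driven annotation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, its scores, the whitespace tokenisation, and the names `TOY_LEXICON`, `auto_annotate` and `build_corpus` are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy lexicon (assumption): word -> polarity score.
# The paper's lexicon is built automatically and is far larger.
TOY_LEXICON: Dict[str, float] = {
    "جميل": 1.0,   # "beautiful"
    "رائع": 1.0,   # "wonderful"
    "سيء": -1.0,   # "bad"
    "حزين": -1.0,  # "sad"
}


def auto_annotate(message: str,
                  lexicon: Dict[str, float] = TOY_LEXICON) -> Optional[str]:
    """Label a message by summing the lexicon scores of its tokens.

    Returns "positive", "negative", or None when the score is zero
    (no lexicon hit, or hits that cancel out).
    """
    # Naive whitespace tokenisation (assumption); real Arabic text
    # would need normalisation and dialect-aware preprocessing.
    score = sum(lexicon.get(token, 0.0) for token in message.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


def build_corpus(messages: List[str],
                 lexicon: Dict[str, float] = TOY_LEXICON
                 ) -> List[Tuple[str, str]]:
    """Keep only the messages that received a label."""
    corpus = []
    for message in messages:
        label = auto_annotate(message, lexicon)
        if label is not None:
            corpus.append((message, label))
    return corpus
```

In this sketch, messages with no net polarity are simply left out of the corpus; whether and how the published method filters such messages is not stated in this record.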

Description

The file attached to this record is the author's final peer reviewed version.

Keywords

Arabic sentiment analysis, Arabic and its dialects, Automatic resources construction, shallow/deep classification, word embedding, document embedding

Citation

Guellil, I., Azouaou, A., Chiclana, F. (2020) ArAutoSenti: Automatic annotation and new tendencies for sentiment classification of Arabic messages. Social Network Analysis and Mining,
