Lexicon‐pointed hybrid N‐gram Features Extraction Model (LeNFEM) for sentence level sentiment analysis

dc.cclicence: CC-BY-NC (en)
dc.contributor.author: Mutinda, James
dc.contributor.author: Mwangi, Waweru
dc.contributor.author: Okeyo, George
dc.date.acceptance: 2021-01-20
dc.date.accessioned: 2021-02-18T11:36:24Z
dc.date.available: 2021-02-18T11:36:24Z
dc.date.issued: 2021-02-07
dc.description: open access article (en)
dc.description.abstract: Sentiment analysis of social media textual posts can provide information and knowledge that is applicable in social settings, business intelligence, evaluation of citizens' opinions in governance, and mood‐triggered devices in the Internet of Things. Feature extraction and selection are key determinants of the accuracy and computational cost of machine learning models for such analysis. Most feature extraction and selection techniques utilize bag of words, N‐grams, and frequency‐based algorithms, especially Term Frequency‐Inverse Document Frequency. However, these approaches do not consider relationships between words, ignore word characteristics, and suffer from high feature dimensionality. In this paper, we propose and evaluate a feature extraction and selection approach for sentence level sentiment analysis that utilizes a fixed hybrid N‐gram window for feature extraction and the minimum redundancy maximum relevance (MRMR) algorithm for feature selection. The approach improves on existing feature extraction techniques, specifically N‐grams, by generating a hybrid vector from words, Part of Speech (POS) tags, and word semantic orientation. The vector is extracted using a static trigram window whose position is identified by a lexicon at the point where a sentiment word appears in a sentence. A blend of the words, POS tags, and sentiment orientations of the static trigram is used to build the feature vector. The optimal features from the vector are then selected using the MRMR algorithm. Experiments were carried out using the public Yelp dataset to compare the performance of the proposed model with existing feature extraction models (BOW, normal N‐grams, and lexicon‐based bag of words semantic orientations). With supervised machine learning classifiers, the proposed model achieved the highest F‐measure (88.64%), compared with the best baseline result of 83.55%. A Wilcoxon test confirmed that the proposed approach performed significantly better than the baseline approaches. Comparative performance analysis with other datasets further affirmed that the proposed approach is generalizable. (en)
dc.funder: No external funder (en)
dc.identifier.citation: Mutinda, J., Mwangi, W., Okeyo, G. (2021) Lexicon-pointed hybrid N-gram Features Extraction Model (LeNFEM) for sentence level sentiment analysis. Engineering Reports, e12374. (en)
dc.identifier.doi: https://doi.org/10.1002/eng2.12374
dc.identifier.issn: 2577-8196
dc.identifier.uri: https://dora.dmu.ac.uk/handle/2086/20632
dc.language.iso: en_US (en)
dc.peerreviewed: Yes (en)
dc.publisher: Wiley (en)
dc.researchinstitute: Cyber Technology Institute (CTI) (en)
dc.subject: feature selection (en)
dc.subject: lexicon (en)
dc.subject: minimum redundancy maximum relevance (en)
dc.subject: sentence level SA (en)
dc.subject: sentiment classification (en)
dc.subject: TF-IDF (en)
dc.subject: N-gram2vec model (en)
dc.title: Lexicon‐pointed hybrid N‐gram Features Extraction Model (LeNFEM) for sentence level sentiment analysis (en)
dc.type: Article (en)
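
Illustration (not part of the deposited record): the abstract describes a static trigram window, located by a lexicon wherever a sentiment word appears, from which words, POS tags, and sentiment orientations are blended into one feature vector. The Python sketch below shows one way that step might look. It is a minimal sketch under stated assumptions and not the authors' implementation: the lexicon, the POS lookup table, the padding token, the function name, and the choice to centre the window on the sentiment word are all hypothetical. The MRMR selection step is not shown.

```python
from typing import Dict, List, Union

# Hypothetical toy lexicon: word -> semantic orientation (+1 positive, -1 negative).
TOY_LEXICON: Dict[str, int] = {"great": 1, "tasty": 1, "slow": -1, "rude": -1}

# Hypothetical lookup table standing in for a real POS tagger.
TOY_POS: Dict[str, str] = {
    "the": "DET", "food": "NOUN", "was": "VERB", "great": "ADJ",
    "but": "CONJ", "service": "NOUN", "slow": "ADJ",
}

PAD = "<PAD>"  # assumed filler when the window runs past the sentence boundary

Feature = List[Union[str, int]]


def trigram_features(
    tokens: List[str],
    lexicon: Dict[str, int] = TOY_LEXICON,
    pos_table: Dict[str, str] = TOY_POS,
) -> List[Feature]:
    """Build one hybrid feature row per sentiment word found in `tokens`.

    Each row is: 3 words + 3 POS tags + 3 orientations, taken from a
    trigram window assumed to be centred on the sentiment word.
    """
    rows: List[Feature] = []
    for i, token in enumerate(tokens):
        if token not in lexicon:
            continue  # the lexicon "points" only at sentiment words
        words: List[str] = []
        tags: List[str] = []
        orientations: List[int] = []
        for j in (i - 1, i, i + 1):
            if 0 <= j < len(tokens):
                word = tokens[j]
                words.append(word)
                tags.append(pos_table.get(word, "X"))   # "X" = unknown tag
                orientations.append(lexicon.get(word, 0))  # 0 = neutral
            else:
                words.append(PAD)
                tags.append(PAD)
                orientations.append(0)
        rows.append(words + tags + orientations)
    return rows


sentence = "the food was great but service was slow".split()
features = trigram_features(sentence)
```

For the example sentence, `features` holds two rows, one for "great" and one for "slow"; the second row is padded because "slow" is the last token. In a real pipeline the toy tables would be replaced by a sentiment lexicon and a POS tagger, and the rows would be encoded numerically before feature selection.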

Files

Original bundle
Name:
eng2.12374.pdf
Size:
1.52 MB
Format:
Adobe Portable Document Format
Description:
Article on feature selection and sentiment analysis.
License bundle
Name:
license.txt
Size:
4.2 KB
Format:
Item-specific license agreed upon to submission