Elsevier

Expert Systems with Applications

Volume 40, Issue 16, 15 November 2013, Pages 6266-6282
Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network

https://doi.org/10.1016/j.eswa.2013.05.057

Highlights

  • We focus on the role of Twitter and social media in the business environment.

  • We develop tools to collect a large data set of more than 10 million brand-specific tweets.

  • We develop a reduced (1/8th) Twitter-specific lexicon to replace traditional sentiment lexicons.

  • We demonstrate the lexicon provides improved corpus coverage and sentiment analysis performance.

  • We develop comparative sentiment classification models using DAN2 and SVM.

Abstract

Twitter messages are increasingly used to determine consumer sentiment towards a brand. The existing literature on Twitter sentiment analysis uses various feature sets and methods, many of which are adapted from more traditional text classification problems. In this research, we introduce an approach to supervised feature reduction using n-grams and statistical analysis to develop a Twitter-specific lexicon for sentiment analysis. We augment this reduced Twitter-specific lexicon with brand-specific terms for brand-related tweets. We show that the reduced lexicon set, while significantly smaller (only 187 features), reduces modeling complexity, maintains a high degree of coverage over our Twitter corpus, and yields improved sentiment classification accuracy. To demonstrate the effectiveness of the devised Twitter-specific lexicon compared to a traditional sentiment lexicon, we develop comparable sentiment classification models using SVM. We show that the Twitter-specific lexicon is significantly more effective in terms of classification recall and accuracy metrics. We then develop sentiment classification models using the Twitter-specific lexicon and the DAN2 machine learning approach, which has demonstrated success in other text classification problems. We show that DAN2 produces more accurate sentiment classification results than SVM while using the same Twitter-specific lexicon.

Section snippets

Background

Twitter offers a unique dataset in the world of brand sentiment. Public figures and brands receive sentiment messages directly from consumers in real time in a public forum. Both the targeted and competing brands have the opportunity to dissect these messages to determine changes in consumer sentiment. Taking advantage of this data, however, requires researchers to analyze the immense volume of data that Twitter produces each day, referred to as the Twitter fire hose. As noted by

Twitter sentiment analysis literature review and modeling approach

Twitter is a popular and rapidly growing computer-mediated communication platform. Twitter users create micro-blog status update messages called tweets to communicate with other users for various reasons on a wide variety of topics. These tweets often contain valuable information and the perspectives and opinions of users on issues related to business and society (Gleason, 2013, Jansen et al., 2009). Researchers have developed various approaches to monitor Twitter in real-time for the

Data collection and preparation

We use the Twitter API v1.0 for our data collection. The API is a moving target and changed several times over the course of our investigation. The most common and consistent method for gathering data is to request a paged set of data for a given query. The subject (brand) selected for this research is Justin Bieber. At the time of this research, his was the largest Twitter account, receiving more than 300,000 tweets daily, eclipsing
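The paged request pattern mentioned in this section can be sketched as a cursor-following loop. This is a minimal illustration, not the authors' collection tool: `fetch_page`, the `Page` type, and the cursor values are hypothetical, and `fake_fetch_page` is an in-memory stand-in so the example runs without any network access or real Twitter endpoint.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page type: (tweets on this page, cursor for the next page or None).
Page = Tuple[List[str], Optional[str]]


def collect_tweets(fetch_page: Callable[[str, Optional[str]], Page],
                   query: str,
                   max_pages: int = 100) -> Iterator[str]:
    """Yield tweets for `query`, following page cursors until none remain."""
    cursor: Optional[str] = None
    for _ in range(max_pages):
        tweets, cursor = fetch_page(query, cursor)
        yield from tweets
        if cursor is None:  # the service reported no further pages
            break


def fake_fetch_page(query: str, cursor: Optional[str]) -> Page:
    """In-memory stand-in for a real API call (assumption, for illustration only)."""
    pages = {
        None: (["tweet 1", "tweet 2"], "p2"),
        "p2": (["tweet 3"], None),
    }
    return pages[cursor]


collected = list(collect_tweets(fake_fetch_page, "example brand"))
```

Passing the fetch function in as a parameter keeps the paging loop independent of any particular API version, which matters when the service changes during a study.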

Feature engineering

Twitter sentiment analysis is a special case of text classification. Text classification problems are complex in nature and are always characterized by high dimensionality (Yang & Pedersen, 1997). To reduce this complexity, researchers begin by applying preprocessing techniques to the original documents in order to produce a simplified text.

We use standard preprocessing activities in our feature engineering stage. These are: (1) removing stop words, (2) stemming,
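The two preprocessing steps named above, followed by the n-gram extraction mentioned in the abstract, might look like the sketch below. The stop-word list, the tokenizer pattern, and `crude_stem` are assumptions made for illustration; in particular, `crude_stem` is a simple suffix stripper standing in for a real stemmer, and nothing here is taken from the authors' implementation.

```python
import re
from typing import List

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "the", "is", "are", "am", "to", "of", "and", "i", "so"}


def crude_stem(token: str) -> str:
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet: str) -> List[str]:
    """Lowercase, tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z0-9#@']+", tweet.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return all contiguous n-token sequences, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = preprocess("I am loving the new songs")
bigrams = ngrams(tokens, 2)
```

For the sample tweet, `tokens` is `["lov", "new", "song"]` and `bigrams` is `["lov new", "new song"]`. The tokenizer keeps `#` and `@` so that hashtags and mentions survive as single tokens.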

Automated, supervised sentiment analysis

As stated earlier, we consider Twitter sentiment analysis as a text classification problem. We use DAN2 and SVM as two methods for this analysis. Both are supervised machine learning approaches that require a training dataset for their learning stage. Once trained, each model can be used to automatically assign a sentiment to previously unseen input (tweets).

Once the feature set is defined, in order to assess a tweet’s or a corpus’s sentiment, a functional form and
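As a rough illustration of the supervised setup described in this section, the sketch below maps tweets to binary lexicon features and trains a small linear SVM by sub-gradient descent on the regularized hinge loss. Everything in it is an assumption for illustration: the four-term `LEXICON` is hypothetical (the paper's reduced lexicon has 187 features), the toy training tweets are invented, and the hyperparameters are arbitrary. It does not reproduce the authors' SVM configuration, and DAN2 is not shown.

```python
from typing import List, Sequence, Tuple

# Hypothetical four-term lexicon, for illustration only.
LEXICON = ["love", "great", "hate", "awful"]


def featurize(tweet: str) -> List[float]:
    """Binary vector: 1.0 where the lexicon term occurs in the tweet."""
    words = set(tweet.lower().split())
    return [1.0 if term in words else 0.0 for term in LEXICON]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     epochs: int = 200, lr: float = 0.1,
                     lam: float = 0.01) -> Tuple[List[float], float]:
    """Sub-gradient descent on the L2-regularized hinge loss; labels are +1/-1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The regularization term shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:  # margin violated, so the hinge term is active
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (positive sentiment) or -1 (negative sentiment)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Invented toy training data: (tweet, label).
train = [("i love this song", 1), ("great show tonight", 1),
         ("i hate this track", -1), ("awful concert", -1)]
X = [featurize(t) for t, _ in train]
y = [label for _, label in train]
w, b = train_linear_svm(X, y)
pred_pos = predict(w, b, featurize("love it great"))
pred_neg = predict(w, b, featurize("awful i hate it"))
```

With a fixed lexicon, every tweet becomes a vector of the same length regardless of its wording, which is what allows the same feature set to be fed to different classifiers for comparison.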

Conclusion

This research makes several contributions to Twitter sentiment analysis, demonstrated through application on a corpus of tweets related to the Justin Bieber brand. Earlier research on Twitter classification has classified factual-sounding tweets as neutral (Go et al., 2009). Using this approach, the authors state that “more than 80%” of tweets contain no sentiment. Our approach to sentiment analysis has increased sensitivity, accounting for tweets with mild sentiment (positive and negative),

References (63)

  • Bermingham, A., & Smeaton, A. (2010). Classifying sentiment in microblogs: Is brevity an advantage? In Proceeding of...
  • Bermingham, A., & Smeaton, A. (2011). On using twitter to monitor political sentiment and predict election results. In...
  • Bifet, A., & Frank, E. (2010). Sentiment knowledge discovery in Twitter streaming data. In Proceeding of 13th...
  • Blitzer, J., Dredze, M., & Pereira, F. (2007). Biographies, bollywood, boom-boxes and blenders: Domain adaptation for...
  • Bollen, J., Pepe, A., & Mao, H. (2011b). Modeling public mood and emotion: Twitter sentiment and socio-economic...
  • Cha, M., Haddadi, H., Benevenuto, F., & Gummadi, K. (2010). Measuring user influence in Twitter: The million follower...
  • Chung, J., & Mustafaraj, E. (2011). Can collective sentiment expressed on twitter predict political elections? In...
  • Das, S., et al. (2007). Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science.
  • Dave, K., Lawrence, S., & Pennock, D. (2003). Mining the peanut gallery: Opinion extraction and semantic classification...
  • Davidov, D., Tsur, O., & Rappoport, A. (2010). Enhanced sentiment learning using twitter hashtags and smileys. In...
  • Frakes, W. B., et al. (1992). Data structures and algorithms: Information retrieval.
  • Gamon, M. (2004). Sentiment classification on customer feedback data: Noisy data, large feature vectors, and the role...
  • Ghiassi, M., et al. (2005). A dynamic artificial neural network model for forecasting time series events. International Journal of Forecasting.
  • Gleason, B. (2013). #Occupy wall street: Exploring informal learning about a social movement on Twitter. American Behavioral Scientist.
  • Go, A., Bhayani, R., & Huang, L. (2009). Twitter sentiment classification using distant supervision. Technical report,...
  • Hsieh, C., Moghbel, C., Fang, J., & Cho, J. (2013). Experts vs. the crowd: examining popular news prediction...
  • Hu (2013). Real-time Twitter sentiment toward Thanksgiving and Christmas holidays. Social Networking.
  • Huang, S., Peng, W., Li, J., & Lee, D. (2013). Sentiment and topic analysis on social media: A multi-task multi-label...
  • Jansen, B., et al. (2009). Twitter power: Tweets as electronic word of mouth. Journal of the American Society for Information Science and Technology.
  • Jiang, L., Yu, M., Zhou, M., Liu, X., & Zhao, T. (2011). Target-dependent Twitter sentiment classification. In...
  • Joachims, T. Making large-scale SVM learning practical.
