Sentiment Analysis – The Data Story Guide

Sentiment analysis is the automated analysis of text data with one of the following objectives:

Identifying sentiment in the text (i.e., the extent to which the content expresses positive or negative opinions).
Identifying specific objects in text and the sentiment attached to these objects (e.g., 'cereal' is an object and 'like' indicates an opinion about the object).
Identifying specific features of objects in text and the sentiment attached to these features (e.g., if a person likes sugary cereal, then the object is cereal and the feature is sugary).

Methods

Sentiment analysis typically involves one or more of:

Word scoring, whereby specific words or phrases are assigned a value (e.g., -1 for a negative word, 0.5 for a mildly positive word) and the average or total value is computed for each observation (e.g., each respondent in a survey).
Predictive models using case studies. This involves assessing whether certain words or combinations of words correlate with known opinions (e.g., ratings on websites, ratings within questionnaires, opinions deduced through coding).
Natural language processing, which seeks to obtain a deep understanding of the structure of text information (e.g., breaking a text into sentences, recognizing the roles of nouns, verbs, adverbs, etc.). This can involve both the application of statistical methods and the use of rules based on an understanding of the structure of language. (Technically, word scoring and predictive models can be viewed as examples of natural language processing; the key distinction is whether the method is crude, as in these two approaches, or attempts to synthesize the meaning of the text.)
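
As a rough illustration of word scoring, the following minimal Python sketch assigns a value to each known word and computes the total and the average for each observation. The lexicon, its values and the example responses are assumptions made up for this illustration; they are not drawn from any published sentiment dictionary.

```python
import re

# Hypothetical lexicon: the words and values are illustrative assumptions,
# not taken from any published sentiment dictionary.
LEXICON = {
    "love": 1.0,
    "great": 1.0,
    "like": 0.5,
    "boring": -0.5,
    "hate": -1.0,
}


def score_text(text, lexicon=LEXICON):
    """Return (total, average) of the lexicon values of the scored words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    values = [lexicon[w] for w in words if w in lexicon]
    total = sum(values)
    # Observations with no scored words get an average of 0.0.
    average = total / len(values) if values else 0.0
    return total, average


# Made-up survey responses, one per observation.
responses = [
    "I love this cereal and the box looks great",
    "I like the taste but hate the price",
    "No opinion",
]
scores = [score_text(r) for r in responses]
for response, (total, average) in zip(responses, scores):
    print(f"{total:+.1f} {average:+.2f}  {response}")
```

The second response shows why the method is crude: 'like' and 'hate' refer to different features, but the score simply nets them against each other.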

The difficulty of the problem

An opinion can be thought of as possessing the following properties:[1]

The object being evaluated. For example, a phone.
A specific feature of the object. For example, the size of a phone.
The polarity of the opinion (i.e., the extent to which the opinion is positive or negative). For example, a person who 'loves' a phone has a strong positive opinion, whereas a person who is 'bored' by a phone perhaps has a weak negative opinion.
The holder of the opinion. For example, I may love a phone, my boss may hate a phone and my wife may be bored by a phone.
The time at which an opinion was held.
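
One way to make these five properties concrete is a small record type. The sketch below is only an illustration: the field names, the numeric polarity scale and the polarity values given to each holder are assumptions, not part of the cited framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Opinion:
    """One opinion, holding the five properties listed above."""
    obj: str                 # the object being evaluated
    feature: Optional[str]   # a specific feature, or None for the object as a whole
    polarity: float          # assumed scale: -1.0 (strongly negative) to +1.0 (strongly positive)
    holder: str              # who holds the opinion
    time: Optional[str]      # when the opinion was held, if known


# Illustrative values only; the polarity numbers are assumptions.
opinions = [
    Opinion("phone", None, 1.0, "I", None),         # I love the phone
    Opinion("phone", None, -1.0, "my boss", None),  # my boss hates it
    Opinion("phone", None, -0.3, "my wife", None),  # my wife is bored by it
]


def polarity_by_holder(items, obj):
    """Map each opinion holder to the polarity they expressed about obj."""
    return {o.holder: o.polarity for o in items if o.obj == obj}


print(polarity_by_holder(opinions, "phone"))
```

Keeping the holder and the time as separate fields is what allows the same object to carry several conflicting opinions at once.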

To accurately compute sentiment it is necessary to extract each of these properties from a given piece of text. To appreciate the difficulty of this, consider the following example:

Last week I bought a Nokia Lumia and my girlfriend bought an iPhone a year ago. She thinks my new phone is too big. We called each other when we got home. She couldn't hear me clearly and she thinks this a problem with the Nokia but I'm not so sure as she often has trouble hearing things. We also found that voice recognition in each phone was no where near as good as Android. However, the 920 phone takes great pictures.

Key challenges presented by this example include:

Three separate objects are discussed (the Nokia Lumia, an iPhone and an un-named Android phone).
The Nokia Lumia is explicitly mentioned in three different ways (Nokia Lumia, Nokia and 920).
The objects are implied but not explicitly mentioned in two of the sentences (i.e., the sentences discussing the size of the phone and voice recognition).
Two separate times are mentioned.
Two features are mentioned ('pictures' and 'voice recognition').
Two further features are implied but never explicitly mentioned ('big' implies the feature of size and 'couldn't hear me clearly' implies a problem with a microphone).
Some of the opinions are expressed in a direct manner (e.g., 'great'), whereas others are comparative ('no where near as good').
The word 'but' operates as a negation of the opinion that is expressed prior to its use.
The text contains an error in the use of language ('no where' rather than 'nowhere').

An attempt to accurately infer the sentiment expressed in this example needs to disentangle all of these aspects, reducing the text to a series of distinct opinions (i.e., each with the five properties described above).
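
To give a sense of just one of these steps, the sketch below maps the different names used for each phone to a single canonical object. The alias table is hand-written for this example and is an assumption; constructing such a table automatically is itself a hard problem.

```python
import re

# Hypothetical alias table, written by hand for this example: it maps each
# surface form to one canonical object name.
ALIASES = {
    "nokia lumia": "Nokia Lumia",
    "nokia": "Nokia Lumia",
    "920": "Nokia Lumia",
    "iphone": "iPhone",
    "android": "Android",
}

# Longest aliases first, so "nokia lumia" is tried before "nokia".
_PATTERN = re.compile(
    "|".join(
        r"\b" + re.escape(alias) + r"\b"
        for alias in sorted(ALIASES, key=len, reverse=True)
    ),
    re.IGNORECASE,
)


def objects_mentioned(sentence):
    """Return the canonical objects explicitly named in a sentence, in order."""
    found = []
    for match in _PATTERN.finditer(sentence):
        canonical = ALIASES[match.group(0).lower()]
        if canonical not in found:
            found.append(canonical)
    return found


sentences = [
    "Last week I bought a Nokia Lumia and my girlfriend bought an iPhone a year ago.",
    "She thinks my new phone is too big.",
    "However, the 920 phone takes great pictures.",
]
for s in sentences:
    print(objects_mentioned(s), "<-", s)
```

The second sentence yields an empty list: the object is only implied ('my new phone'), so a simple alias lookup cannot attach the opinion 'too big' to the Nokia Lumia.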

The application of sentiment analysis

Due to the inherent difficulty of sentiment analysis, its application in the coding of survey data is limited. Its main application is currently as a quick tool for tracking sentiment in situations where there are insufficient resources to manually code text data.

Also known as

Opinion mining.

References

[1] Liu, B. (2010). Sentiment Analysis and Subjectivity. In N. Indurkhya and F. J. Damerau (Eds.), Handbook of Natural Language Processing. Chapman & Hall/CRC.