Language-Independent Twitter Sentiment Analysis

We recently presented our work on a language-independent approach to sentiment analysis (positive or negative emotions) in tweets at KDML at LWA 2012, Dortmund, Germany.

We also present our evaluation dataset of human-annotated sentiments in tweets, collected using Amazon Mechanical Turk. You can download the dataset for research purposes.

A Language-Independent Sentiment Classification Approach

In this work, we (Sascha Narr, with the aid of Michael Hülfenhaus) took a look at a previously proposed heuristic for classifying positive and negative sentiment in tweets. Word-feature-based classifiers need a huge number of training samples to perform well. Having humans label such training data is expensive, so we use a heuristic that saves a lot of effort.

The heuristic makes use of emoticons found in tweets as indicators of the sentiment of the tweet. Happy smileys mean a positive sentiment in most cases and angry or sad smileys often indicate negative emotions.

These “noisy labels” are not perfect, but with them we can gather a training set for many classifiers almost automatically. This approach works not just for tweets, but for any text collection that contains a fair share of emoticons.
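The heuristic can be sketched in a few lines. The smiley seed sets below are illustrative placeholders, not the actual sets used in the paper:

```python
# A minimal sketch of the emoticon heuristic: a tweet containing a happy
# smiley gets a noisy "positive" label, a sad smiley a "negative" one.
# Tweets with no emoticons, or with conflicting ones, are discarded.

POSITIVE_SMILEYS = {":)", ":-)", ":D", "=)", "^_^"}
NEGATIVE_SMILEYS = {":(", ":-(", ":'(", "=("}

def noisy_label(tweet):
    """Return 'positive', 'negative', or None (no or conflicting emoticons)."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE_SMILEYS for t in tokens)
    has_neg = any(t in NEGATIVE_SMILEYS for t in tokens)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    return None  # ambiguous or unlabeled: not usable as training data

print(noisy_label("just got my exam results :)"))  # positive
print(noisy_label("missed the bus again :("))      # negative
```

Running this over a large tweet stream yields a labeled training set with no human effort beyond curating the seed sets.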

The emoticon heuristic seems well suited to automatically training classifiers for many different languages. Smileys exist in texts of basically any modern language used on the internet. The smileys themselves may differ between languages, so we need a seed set of smileys that covers a large number of the different smileys used worldwide. As long as our smiley sets are as complete as possible, the heuristic can be applied universally.

The only problem we might run into is conflicting smileys between languages, where a happy smiley in one language is used as a negative one in another. Luckily, these cases are very rare. We conducted experiments to examine whether the approach really works well on different languages.

In our paper we implemented the heuristic and tested it using a Naive Bayes classifier and human-annotated evaluation tweets in 4 languages.
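To make the training setup concrete, here is a small hand-rolled multinomial Naive Bayes over word features, trained on noisily-labeled tweets. The toy data, tokenization, and add-one smoothing are assumptions for illustration; the paper's exact feature extraction and implementation may differ:

```python
# Sketch: train a word-feature Naive Bayes sentiment classifier on tweets
# whose labels come from the emoticon heuristic (toy data shown here).
from collections import Counter, defaultdict
import math

train = [
    ("i love this song :)", "positive"),
    ("what a great day :)", "positive"),
    ("i hate mondays :(", "negative"),
    ("this is so sad :(", "negative"),
]

def tokenize(text):
    # Strip the emoticons used for labeling, so the classifier must
    # rely on the remaining word features alone.
    return [t for t in text.lower().split() if t not in {":)", ":("}]

class_counts = Counter()
word_counts = defaultdict(Counter)
vocab = set()
for text, label in train:
    class_counts[label] += 1
    for tok in tokenize(text):
        word_counts[label][tok] += 1
        vocab.add(tok)

def classify(text):
    scores = {}
    for label in class_counts:
        logp = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for tok in tokenize(text):
            # Laplace (add-one) smoothing over the vocabulary
            logp += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        scores[label] = logp
    return max(scores, key=scores.get)

print(classify("i love this song"))  # positive
```

In practice the training set would contain millions of noisily-labeled tweets rather than four, which is what makes the heuristic worthwhile.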

Sentiment Evaluation Dataset

The dataset contains tweets that have been human-annotated with sentiment labels by 3 Mechanical Turk workers.
There are 12597 tweets in 4 languages: English, German, French and Portuguese.
The labels annotated are positive, neutral, negative and n/a.
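One plausible way to collapse the three per-tweet annotations into a single evaluation label is majority vote; note this aggregation rule (and the list-based input format) is an assumption for illustration, not necessarily how the dataset is distributed:

```python
# Hypothetical sketch: aggregate the three Mechanical Turk annotations
# for a tweet into one label by majority vote; fall back to 'n/a' when
# all three annotators disagree.
from collections import Counter

def majority_label(labels):
    """Return the label chosen by at least 2 of 3 annotators, else 'n/a'."""
    top, count = Counter(labels).most_common(1)[0]
    return top if count >= 2 else "n/a"

print(majority_label(["positive", "positive", "neutral"]))  # positive
print(majority_label(["positive", "neutral", "negative"]))  # n/a
```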


Our evaluation results demonstrate that the method performs quite well overall, but its performance can vary strongly between languages. The best classification result is achieved by our English classifier.

The performance of the Portuguese classifier, however, is quite low. Since Portuguese tweets contain an average amount of emoticons, the emoticon heuristic works as expected, so the lower performance must be due to specific characteristics of the Portuguese language. Unfortunately, we have not yet been able to analyse these reasons further.

We also evaluated a classifier trained on a mix of the 4 languages. This classifier performs only marginally worse than the averaged results of the individual classifiers. It is therefore also feasible to train a classifier on more than one language. This can be useful, for example, if you do not have the means to separate the texts in your dataset by language.

Please read our paper for a full and detailed evaluation and analysis of the results!

Accuracy of the classifiers for 4 languages and the combined classifier. The baseline is a mock-classifier that always guesses 'positive'.
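The accuracy of the always-positive baseline from the figure caption is simply the fraction of positive tweets in the evaluation set, as this toy sketch shows (the labels here are made up, not taken from the dataset):

```python
# Sketch of the majority-class baseline: a mock classifier that always
# predicts 'positive' is correct exactly on the positive tweets, so its
# accuracy equals the positive fraction of the gold labels (toy data).
gold = ["positive", "negative", "positive", "neutral", "positive"]
baseline_accuracy = sum(1 for g in gold if g == "positive") / len(gold)
print(baseline_accuracy)  # 0.6
```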

Slides of the Talk


Paper & Poster

View the paper

View the poster we presented at KDML

