Research title: Emotion Discovery from Heterogeneous Text Collections
Start date: October 2012
Text is an important means not only of conveying facts but also of expressing emotions. Emotion analysis is the computational study of emotions expressed in text (e.g. tweets, messages and blogs). With the advent of social media services such as Twitter, Facebook and MySpace, users have become more forthcoming in expressing their opinions (interests, preferences, ideas), generating huge volumes of emotive content. Until recently, research in emotion analysis has relied on existing sentiment lexicons and other general-purpose emotion lexicons (GPEL) to discover emotions in text. However, sentiment lexicons (which lack fine-grained emotion information) and GPEL (which are static and formal in nature) are less effective for emotion analysis in inherently dynamic domains such as social media. The aim of my research is to address these limitations of the existing tools for emotion analysis.
My research is focused on the following objectives:
- Developing corpus-independent methods for learning domain-specific word-emotion lexicons.
- Evaluating the quality of the lexicons through a variety of emotion analysis tasks, such as word-emotion quantification, document classification and emotion ranking.
- Developing methods for adapting the lexicons to data streams in the context of social media.
- Developing novel text-based representations for artistic multimedia, such as humour and music videos, using emotion lexicons in order to enhance its recommendation and retrieval.
- Studying the relationship between emotions and sentiments in order to establish whether they are independent or interdependent.
Publications:
- Anil Bandhakavi, Nirmalie Wiratunga, Deepak P., Stewart Massie. Generating Word-Emotion Lexicon from #Emotional Tweets. In Proceedings of the 3rd Joint Conference on Lexical and Computational Semantics (*SEM 2014), Dublin, Ireland.
- Anil Bandhakavi, Nirmalie Wiratunga, Deepak P., Stewart Massie. A Mixture Unigram model for Emotion analysis of text. To be submitted to a special issue of the Elsevier Knowledge-Based Systems journal (KBNLP 2015).
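To make the idea of a word-emotion lexicon concrete, the following is a minimal sketch of learning one from emotion-labelled text (for example, tweets labelled by an emotion hashtag). It is not the method of the publications listed above: it simply estimates P(emotion | word) from smoothed co-occurrence counts. The function name `build_lexicon`, the `smoothing` parameter and the whitespace tokenisation are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_lexicon(labelled_docs, smoothing=1.0):
    """Return {word: {emotion: P(emotion | word)}} from (text, emotion) pairs.

    Illustrative count-based estimate with additive smoothing; a
    hypothetical helper, not the model used in the research above.
    """
    counts = defaultdict(Counter)  # word -> emotion -> co-occurrence count
    emotions = set()
    for text, emotion in labelled_docs:
        emotions.add(emotion)
        for word in text.lower().split():  # naive whitespace tokenisation
            counts[word][emotion] += 1

    ordered = sorted(emotions)
    lexicon = {}
    for word, per_emotion in counts.items():
        total = sum(per_emotion.values()) + smoothing * len(ordered)
        lexicon[word] = {
            e: (per_emotion[e] + smoothing) / total for e in ordered
        }
    return lexicon


# Toy, made-up data purely to show the call shape.
docs = [("happy sunny day", "joy"), ("sad rainy day", "sadness")]
lexicon = build_lexicon(docs)
print(lexicon["happy"])
print(lexicon["day"])
```

Because the scores come straight from raw counts, they inherit any class imbalance in the labelled collection; a real lexicon would need to account for that.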
SICSA-funded summer internship at IBM Research, India, 2015
Sampling Algorithms for Big Data
The aim of this research is to offer cost-effective solutions for applications that rely heavily on big data by reducing the data size, and thereby the computational cost and processing overhead. In this work we develop sampling algorithms for large-scale Twitter data in order to extract high-quality representative samples. A high-quality representative sample is one that accurately preserves the statistical properties of the universe (the full data set). We propose and develop a sampling algorithm based on a statistical model (a language model) that can effectively preserve properties specific to Twitter data, such as word-frequency (keywords, hashtags), word-topic, word-sentiment and word-emotion distributions. We empirically evaluate the quality of the samples extracted using the proposed algorithm and compare them with those extracted using random sampling and stratified sampling. Experiments on the preservation of word-frequency and word-sentiment distributions suggest that the proposed method extracts higher-quality samples than the baselines. We expect similar performance gains for word-emotion and word-topic distributions.
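As a rough illustration of what "preserving the word-frequency distribution" can mean in practice, the sketch below draws a simple random sample (one of the baselines named above) and measures how far its word distribution is from that of the full collection. The proposed language-model-based algorithm is not described in enough detail here to reproduce, so it is not shown. The use of Jensen-Shannon divergence as the quality measure, and the helper names `word_distribution`, `js_divergence` and `random_sample`, are assumptions made for this example.

```python
import math
import random
from collections import Counter


def word_distribution(docs):
    """Relative word frequencies over a list of documents (whitespace tokens)."""
    counts = Counter(w for d in docs for w in d.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two distributions.

    0 means identical distributions; 1 means disjoint support.
    """
    div = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        m = 0.5 * (pw + qw)
        if pw > 0:
            div += 0.5 * pw * math.log2(pw / m)
        if qw > 0:
            div += 0.5 * qw * math.log2(qw / m)
    return div


def random_sample(docs, k, seed=0):
    """Simple random sample of k documents without replacement (baseline)."""
    return random.Random(seed).sample(docs, k)


# Made-up toy collection, only to show how the pieces fit together.
universe = ["good day", "bad day", "good mood", "bad traffic"] * 25
sample = random_sample(universe, 20)
score = js_divergence(word_distribution(universe), word_distribution(sample))
print(f"JS divergence of random sample from universe: {score:.4f}")
```

A lower divergence indicates a sample whose word frequencies are closer to those of the universe; the same comparison could in principle be applied to word-sentiment or word-emotion distributions.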