Introduction

Social media sites such as Twitter and Facebook have emerged as popular tools for people to express their opinions and sentiments on various topics. The large amount of data these media provide is extremely valuable for mining trending topics and events. However, this massive volume also requires the mining approach to be efficient in terms of computation and storage. We propose an efficient, scalable system to detect events from tweets. The system does not use any Twitter-specific features and thus can be readily adapted to other social media sites. Because tweets are very short and diverse in nature, traditional approaches designed for long, well-formatted documents do not scale to tweets. Our approach detects events by exploiting both their textual and temporal components. The system does not require a target entity to be specified; it automatically detects generic events from a set of tweets. The key components of our system are an extraction scheme for event-representative keywords, an efficient mechanism for storing the keywords' appearance patterns, and a hierarchical clustering technique based on the co-occurring features that keywords share. Our approach is considerably more time- and memory-efficient than most state-of-the-art systems, while achieving competitive precision and recall.
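The summary above does not state which similarity measure or linkage the clustering step uses, so the following is only a minimal sketch of the general idea of grouping keywords by shared co-occurrence, not the authors' method. It assumes (hypothetically) that each keyword's "features" are the keyword itself plus every other keyword it appears with in at least one tweet, that similarity is the Jaccard index of those sets, and that clusters are merged with single linkage until no pair reaches a threshold. The function names, the threshold value, and the sample tweets are all illustrative, and the sketch ignores the temporal component and the storage mechanism entirely.

```python
from itertools import combinations

def feature_sets(tweets, keywords):
    """Hypothetical features: each keyword plus the keywords it co-occurs with in any tweet."""
    feats = {k: {k} for k in keywords}
    for tokens in tweets:
        present = set(tokens) & keywords
        for a, b in combinations(sorted(present), 2):
            feats[a].add(b)
            feats[b].add(a)
    return feats

def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_keywords(tweets, keywords, threshold=0.3):
    """Single-linkage agglomerative clustering over keyword feature sets (an assumed scheme)."""
    keywords = set(keywords)
    feats = feature_sets(tweets, keywords)
    clusters = [{k} for k in sorted(keywords)]
    while True:
        best_sim, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            # Single linkage: similarity of the closest pair of members.
            sim = max(jaccard(feats[a], feats[b]) for a in clusters[i] for b in clusters[j])
            if sim > best_sim:
                best_sim, best_pair = sim, (i, j)
        # Stop when fewer than two clusters remain or no pair is similar enough.
        if best_pair is None or best_sim < threshold:
            break
        i, j = best_pair  # i < j, so deleting index j leaves index i valid
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

# Illustrative, made-up tweets that are already tokenized.
tweets = [
    ["earthquake", "tokyo", "tsunami", "warning"],
    ["tsunami", "warning", "coast"],
    ["earthquake", "tokyo"],
    ["final", "goal", "messi"],
    ["messi", "goal", "worldcup"],
]
keywords = {"earthquake", "tokyo", "tsunami", "warning", "final", "goal", "messi", "worldcup"}

for cluster in cluster_keywords(tweets, keywords):
    print(sorted(cluster))
```

On this toy input the keywords split into one earthquake-related group and one football-related group, because no keyword from one group co-occurs with any keyword from the other. A real system would also need the temporal signal and a far more efficient pairwise search than this quadratic loop.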