The World Wide Web initially consisted of a collection of texts in the form of HTML documents. Over the years, organizations and, increasingly, individual users have added a variety of multimedia content. Today, popular web portals attract viewers not only with captivating stories but also with audio, images, and videos.
Users continue to struggle with the vast amount of information at their fingertips, and finding relevant information has become increasingly challenging.
Organizations operating popular portals have introduced recommender systems to help users find interesting content. These systems automatically process the collection of items and derive a small set of suggestions. Research has established tools for processing text automatically; multimedia remains considerably more difficult to handle.
Content-agnostic methods, such as collaborative filtering, rely on detailed user profiles. News publishers cannot provide such profiles because readers tend to visit their portals anonymously. Consequently, news publishers tend to combine non-personalized and content-based methods. Content-based filtering describes each item by a set of features, computes similarities between items, and uses them to find content that matches users' preferences. So far, multimedia content has been largely ignored due to technical difficulties.
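To make the content-based idea concrete, here is a minimal sketch, assuming each article is described by a sparse vector of weighted annotations (the feature names such as "football" or "parliament" are hypothetical) and that a reader's profile is simply the sum of the feature vectors of articles viewed in the current anonymous session. Cosine similarity is used here as one common choice of similarity measure; it is not necessarily what any particular publisher uses.

```python
import math

Features = dict[str, float]


def cosine_similarity(a: Features, b: Features) -> float:
    """Cosine similarity between two sparse feature vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(name, 0.0) for name, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profile(viewed: list[Features]) -> Features:
    """Session profile (assumption): sum the features of the articles viewed so far."""
    profile: Features = {}
    for item in viewed:
        for name, weight in item.items():
            profile[name] = profile.get(name, 0.0) + weight
    return profile


def recommend(profile: Features, items: dict[str, Features], k: int) -> list[str]:
    """Return the ids of the k items most similar to the profile."""
    ranked = sorted(items.items(),
                    key=lambda kv: cosine_similarity(profile, kv[1]),
                    reverse=True)
    return [item_id for item_id, _ in ranked[:k]]


# Hypothetical catalogue of articles described by image annotations.
items = {
    "a1": {"football": 0.9, "crowd": 0.3},
    "a2": {"parliament": 1.0, "speech": 0.4},
    "a3": {"stadium": 0.8, "football": 0.2},
}
profile = build_profile([{"football": 1.0, "stadium": 0.5}])
print(recommend(profile, items, k=2))
```

Because the profile is built only from the current session, this approach needs no persistent user account, which is what makes content-based filtering attractive for anonymous news readers.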
This motivated us to set up the “MediaEval – Multimedia for Recommender System” benchmark. The benchmark asks participants to predict the most popular news items based on image features. Participants obtain a data set spanning six weeks and have to predict which items will collect the most views in the following weeks. To simplify getting started, we have computed a set of image annotations. Statistics and preliminary observations are described in the task overview paper.
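A very simple way to approach the prediction task is sketched below, assuming a linear scoring function over image annotations. The annotation names ("face", "ball", "text") and the weights are hypothetical and hand-set for illustration; in practice the weights would be fitted on the six-week training window. The sketch also computes precision at k, the share of the predicted top-k items that are among the actual top-k by views, as one plausible way to check a ranking; it is not necessarily the benchmark's official evaluation measure.

```python
Features = dict[str, float]


def predict_views(features: Features, weights: Features) -> float:
    """Linear popularity score (assumption): weighted sum of image annotations."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Ids of the k highest-scoring items."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [item_id for item_id, _ in ranked[:k]]


def precision_at_k(predicted: dict[str, float], actual_views: dict[str, float], k: int) -> float:
    """Fraction of the predicted top-k that also appears in the actual top-k."""
    hits = set(top_k(predicted, k)) & set(top_k(actual_views, k))
    return len(hits) / k


# Hypothetical weights and per-article annotations.
weights = {"face": 2.0, "ball": 1.5, "text": -0.5}
articles = {
    "n1": {"face": 1.0, "text": 0.2},
    "n2": {"ball": 1.0},
    "n3": {"text": 1.0},
    "n4": {"face": 0.5, "ball": 0.5},
}
predicted = {aid: predict_views(f, weights) for aid, f in articles.items()}
actual_views = {"n1": 1200, "n2": 900, "n3": 100, "n4": 400}  # made-up counts
print(top_k(predicted, 2), precision_at_k(predicted, actual_views, 2))
```

Here the predicted top two are n1 and n4, while the actual top two are n1 and n2, so precision at 2 is 0.5.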
If you have promising ideas about how to extract useful features from images to predict the popularity of news articles, check out the challenge details here.
We have randomly selected three images from the categories sports, local, and politics to give you an impression of the data.