RecSys 2012 Experiences

Author: Benjamin Kille

This blog post summarizes my thoughts about RecSys 2012. It was the first time I participated in RecSys. I experienced the conference as a venue for both academia and industry, which led to manifold discussions and exchanges of ideas. The conference offered presentations on a broad variety of topics related to recommender systems. Paper presentations, tutorials and workshops, along with a plenitude of posters, provided new insights, ideas and opportunities for discussion.

Workshop on Recommendation Utility Evaluation: Beyond RMSE (RUE)

The workshop’s keynote was given by Carlos Gomez-Uribe (Netflix). Carlos provided valuable insights into recommendation evaluation at Netflix. Long-term success requires both offline evaluation based on data sets and A/B testing on actual user feedback. Offline data sets can be used to verify whether an adaptation of the recommendation system is promising. If such an adaptation succeeds in the offline evaluation, A/B testing is initiated to gather actual feedback. This procedure lets researchers fail cheaply. Another take-away message was to use (long-term) business-related metrics to evaluate the recommender system: the lowest RMSE does not necessarily imply better customer retention.
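
As a rough illustration of the offline step described above, the following Python sketch computes the RMSE of two recommenders on a handful of held-out ratings and only promotes the candidate to an A/B test if it beats the baseline. The ratings, the variable names and the simple "lower RMSE wins" rule are assumptions made up for this example; they are not taken from the keynote.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical held-out ratings and the predictions of two recommenders.
held_out = [4.0, 3.0, 5.0, 2.0]
baseline_predictions = [3.5, 3.0, 4.0, 3.0]
candidate_predictions = [4.0, 3.5, 4.5, 2.5]

baseline_rmse = rmse(baseline_predictions, held_out)
candidate_rmse = rmse(candidate_predictions, held_out)

# Assumed gate: only a candidate that wins offline moves on to A/B testing.
promote_to_ab_test = candidate_rmse < baseline_rmse
print(f"baseline={baseline_rmse:.3f} candidate={candidate_rmse:.3f} "
      f"promote={promote_to_ab_test}")
```

Passing this gate only says that the change is worth testing online. Whether it improves a business metric such as retention still has to be measured in the A/B test itself.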

Besides the keynote, there were two paper-presentation sessions and a poster session. Recommender systems evaluation proved to be a diverse subject. Papers and presentations highlighted a variety of different aspects, leading to fruitful discussions. A major topic in the discussion was the lack of access to actual systems for academic researchers. Without such access, A/B testing is limited to industrial researchers. The plista contest addresses this: it allows researchers to maintain their own code and connect to an API that grants access to actual user interactions. Another interesting topic was the evaluation of implicit feedback. Recommending scientific articles (Mendeley) or commercials in TV programs is not based on numerical ratings. Such domains require alternative evaluation strategies.



I decided to attend Bart Knijnenburg’s tutorial on user studies. Bart discussed a variety of user-study-related issues in chronological order. He pointed out that it is critical to define what exactly the user study is supposed to measure. Bart suggested using structural equation models to assess the outcome of user studies. Such models make it possible to consider interdependencies between different factors.

For the second tutorial session, I struggled to decide which tutorial to attend. Xavier Amatriain (Netflix) gave a talk on large-scale recommender systems. The other tutorial covered recommender systems challenges. I finally decided to attend the latter in order to prepare for the workshop on recommender systems challenges. Fortunately, Xavier put his slides on SlideShare so that I could have a look at them afterwards. The tutorial started with an introduction to recommender system challenges by Domonkos Tikk, Andreas Hotho and Alan Said. After that, a panel discussion with Yehuda Koren (Google), Darren Vengroff (RichRelevance) and Torben Brodt (plista) invited the audience to ask the panelists about their experiences with the Netflix Prize Challenge, the Reclab Prize Challenge and the plista contest, respectively. Numerous questions targeted data preparation and the selection of suitable evaluation criteria. Privacy issues arising from the use of user profile data were another major topic of the discussion.



There were plenty of interesting papers and posters presented at the conference. I will focus on a few of those that I found particularly interesting.

Using graph partitioning techniques for neighbour selection in user-based collaborative filtering DOI
Alejandro Bellogin and Javier Parapar

A question that has been bothering me for a while is how to determine exactly which other users should be considered when recommending items to a target user. The authors extend the approach of using Pearson’s correlation coefficient with a normalized cut method. Users are represented as nodes in a graph. The edges between the nodes are weighted according to the users’ similarity. The normalized cut method makes it possible to discover suitable neighbors efficiently.
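
The graph construction described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the authors' implementation: the toy ratings are made up, negative correlations are simply dropped when building edges, and instead of searching for the best partition the code only evaluates the standard normalized cut value cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V) for a partition that is given by hand.

```python
import math
from itertools import combinations


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def similarity_graph(ratings):
    """Undirected user graph; edge weight = positive Pearson similarity."""
    edges = {}
    for u, v in combinations(sorted(ratings), 2):
        weight = pearson(ratings[u], ratings[v])
        if weight > 0:  # assumption: negative similarities are dropped
            edges[(u, v)] = weight
    return edges


def normalized_cut(edges, part_a):
    """Normalized cut value of the bipartition (part_a, all other users)."""
    cut = assoc_a = assoc_b = 0.0
    for (u, v), weight in edges.items():
        u_in_a, v_in_a = u in part_a, v in part_a
        if u_in_a != v_in_a:
            cut += weight
        # assoc(X, V) sums the weights from nodes in X to all nodes, so an
        # edge counts once for every endpoint that lies in X.
        assoc_a += weight * (u_in_a + v_in_a)
        assoc_b += weight * ((not u_in_a) + (not v_in_a))
    if assoc_a == 0 or assoc_b == 0:
        return math.inf
    return cut / assoc_a + cut / assoc_b


# Hypothetical toy data: two pairs of like-minded users.
ratings = {
    "alice": {"i1": 5, "i2": 4, "i3": 1},
    "bob": {"i1": 4, "i2": 5, "i3": 2},
    "carol": {"i1": 1, "i2": 2, "i3": 5},
    "dave": {"i1": 2, "i2": 1, "i3": 4},
}
edges = similarity_graph(ratings)
good_split = normalized_cut(edges, {"alice", "bob"})
bad_split = normalized_cut(edges, {"alice", "carol"})
print(edges, good_split, bad_split)
```

A low normalized cut value means the partition separates users who have little in common while keeping similar users together, which is what makes the resulting groups candidates for neighborhoods.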


When recommenders fail: predicting recommender failure for algorithm selection and combination DOI

Michael Ekstrand and John Riedl

This paper opened a new perspective on the problem I discussed in my paper (Modelling Difficulty in Recommender Systems) at the RUE workshop. Instead of estimating a user-specific difficulty explicitly, the authors introduce an approach to determine when a recommendation algorithm will fail. This can be seen as a method to detect particularly difficult users. The authors present an evaluation including several state-of-the-art recommendation algorithms to verify whether different algorithms fail for identical users, which attributes may cause such failures and how hybrid systems could counteract them.


BlurMe: inferring and obfuscating user gender based on ratings DOI

Udi Weinsberg, Smriti Bhagat, Stratis Ioannidis and Nina Taft

The authors investigate a privacy-related issue in recommender systems. In order to obtain useful recommendations, a user needs to provide information on her preferences (either explicitly as ratings or implicitly). The authors developed a method that predicts a user’s gender from the provided ratings alone and is correct in four out of five cases. The authors suggest that users with privacy concerns rate items that are particularly popular with the opposite gender. This costs only a little recommendation accuracy while successfully defeating the gender detection mechanism.


User effort vs. accuracy in rating-based elicitation DOI
Paolo Cremonesi, Franca Garzotto and Roberto Turrin

This paper targets a central question in recommender systems research: how much information is required to provide adequate recommendations? Requiring users to enter too much information before they receive recommendations might keep them from using the system. On the other hand, requiring too few ratings might hurt recommendation quality. The authors evaluate this trade-off with a large user study. The take-home message given at the presentation: ten ratings are enough.


Multiple objective optimization in recommender systems DOI
Mario Rodriguez, Christian Posse and Ethan Zhang

This paper aligns perfectly with the findings of the RUE workshop. Generally, a recommender system should satisfy various objectives, such as recommendation accuracy, customer retention and user experience. The presented approach provides a framework to optimize a recommender system with respect to several criteria at a time. The authors formulate their concrete problem as finding candidates for job offers whose skills match the requirements (objective 1) and who are willing to change jobs in the near future (objective 2). The authors report A/B test results indicating a 42% increase in willing candidates without a major loss in skill matching.



The conference featured two keynotes, one held by Jure Leskovec (Stanford University) and another given by Ron Kohavi (Microsoft).

Jure focused on a problem related to social networks: how do a person’s status and her interests influence us when we evaluate her? He presented results of experiments on three data sets: Wikipedia, Epinions and Slashdot. His findings indicate that users tend to evaluate users with higher status more positively. In addition, he reported that the presence of negative edges allows for more accurate link predictions than relying on positive edges alone.

Ron gave the industry keynote. His talk focused on evaluation-related topics. He explained that recommender system operators need to be aware of their long-term goals. Evaluation criteria must be defined appropriately. Ambiguous criteria may lead to wrong conclusions that endanger the system’s success. Just like Carlos Gomez-Uribe, Ron mentioned the need for online testing procedures such as A/B tests. In order to obtain representative groups of users for continuously testing new features, he proposed using A/A tests, in which both groups receive the same treatment. If an A/A test shows major differences between the groups, the partitioning must be changed.
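
The A/A idea can be illustrated with a small Python sketch. It compares the same metric in two groups that received identical treatment and flags the split when the difference is larger than chance would suggest. The two-sample z statistic, the 1.96 threshold and the click data are assumptions chosen for this example, not details from the keynote, and with groups this small the normal approximation is only indicative.

```python
import math
import statistics


def z_statistic(group_a, group_b):
    """Two-sample z statistic for the difference of the group means."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    if standard_error == 0:
        return 0.0 if mean_a == mean_b else math.inf
    return (mean_a - mean_b) / standard_error


def split_is_suspicious(group_a, group_b, z_threshold=1.96):
    """True if two identically treated groups differ more than expected."""
    return abs(z_statistic(group_a, group_b)) > z_threshold


# Hypothetical per-user outcomes (1 = clicked a recommendation, 0 = did not).
balanced_a = [0, 1, 1, 0, 1, 0, 1, 1]
balanced_b = [1, 0, 1, 1, 0, 1, 0, 1]
skewed_a = [1, 1, 1, 1, 1, 1, 1, 0]
skewed_b = [0, 0, 0, 0, 0, 0, 0, 1]

balanced_flag = split_is_suspicious(balanced_a, balanced_b)
skewed_flag = split_is_suspicious(skewed_a, skewed_b)
print(balanced_flag, skewed_flag)
```

If a split is flagged even though both groups see the same system, any later A/B result on that split would be hard to trust, which is why the partitioning is changed first.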


Workshop on Recommender System Challenges

On the day after the conference, another workshop session took place. I attended the Workshop on Recommender System Challenges. The workshop included paper presentations and a hands-on session with three recommender system libraries: Apache Mahout by Sebastian Schelter, MyMediaLite by Zeno Gantner and LenskitRS by Michael Ekstrand.

Additionally, three talks demonstrated the industrial perspective on recommender systems. Domonkos Tikk introduced how recommendation algorithms are ported into business scenarios at Gravity. He mentioned that algorithmic accuracy does not account for the full recommender system experience. Other aspects such as interface and interactions have a significant influence on the perceived quality. Torben Brodt (plista) introduced the plista contest. Researchers can connect to the plista system and provide actual recommendations to users browsing news article websites. The quality is evaluated and the results are fed back to the researchers. Kris Jack (Mendeley) talked about recommending interesting scientific articles to researchers. He stated that Mendeley does not spend more than $60 to run their system using Amazon’s Elastic MapReduce framework. In addition, he showed how their system was optimized with respect to both costs and recommendation quality.

The hands-on session allowed the participants to experiment with the available frameworks. One finding of the hands-on session was that all frameworks focus heavily on collaborative filtering, based on either explicit or implicit user preferences. Content-based recommenders require more effort, mostly due to varying input formats, and are therefore not yet available as standard implementations.



I really enjoyed my first RecSys. All participants were interested in exchanging ideas. It was really great to personally meet the researchers behind the names on articles covering recommender systems.


My favorite quotes:

Gabor Takacs: “If you don’t like math, you may leave the room now.”
Qiang Yang: “If you have any questions (concerning next year’s RecSys in Hong-Kong), ask me … or anyone who looks Chinese.”
