Presenting our Chatbot Research at the LWDA Conference 2019 in Berlin

Author: Andreas Lommatzsch

The 2019 LWDA conference was held in Berlin from September 30th to October 2nd, 2019. This year’s venues were the Smart Data Forum (next to the TU Berlin) and the Berlin School of Library and Information Science (next to the main building of the Humboldt-University Berlin). The conference is organized by the German Computer Science Society (GI). The core topics of the conference are Knowledge Discovery and Machine Learning, Databases, and Information Retrieval.

Among the many interesting presentations, I would like to highlight the keynote "Beyond research data infrastructures: exploiting artificial & crowd intelligence towards building research knowledge graphs" by Stefan Dietze. The talk underlined the importance of datasets and of aggregating datasets for research. The main challenges are ambiguity and missing metadata for the available datasets. Converting crawled data into knowledge graphs by applying semantic and ML methods (e.g. NER, NED, Sentiment Detection) provides the basis for new research fields, especially in the social sciences. I liked the talk because we made similar observations in our research projects (e.g. [1] and [2]). The datasets we created are provided on our dataset web page.

CC IRML presented current research in the domain of chatbot systems at the conference. Our contribution "An Information Retrieval-based Approach for Building Intuitive Chatbots for Large Knowledge Bases" reports on our experience running the virtual assistant "Bobbi". Bobbi is a chatbot providing information related to services and locations of the Berlin Administration. The paper discusses how to build chatbots without training data (cold-start problem) and explains how to efficiently handle the wide variety of observed user intentions. The research uses data which we have collected in the live system deployed on the official website of the city of Berlin. We presented the results in a 30-minute talk. In addition, we participated in the poster session to discuss our work more directly with attendees.
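To illustrate the general idea of answering questions by retrieval rather than by a trained intent classifier, here is a minimal sketch. It is not the implementation behind Bobbi: the tiny knowledge base, the function names, the TF-IDF weighting, and the score threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical mini knowledge base: service title -> descriptive keywords.
KNOWLEDGE_BASE = {
    "Apply for a passport": "passport application identity document travel",
    "Register a residence": "register residence address move apartment",
    "Apply for a parking permit": "parking permit resident car vehicle",
}


def tokenize(text):
    """Lowercase the text and split on whitespace, dropping question marks."""
    return text.lower().replace("?", " ").split()


def build_index(docs):
    """Return per-document term counts and smoothed IDF weights."""
    doc_counts = {
        title: Counter(tokenize(title + " " + body)) for title, body in docs.items()
    }
    n_docs = len(docs)
    doc_freq = Counter()
    for counts in doc_counts.values():
        doc_freq.update(counts.keys())
    idf = {t: math.log((1 + n_docs) / (1 + f)) + 1.0 for t, f in doc_freq.items()}
    return doc_counts, idf


def vectorize(counts, idf):
    """TF-IDF vector restricted to terms known to the index."""
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer(question, docs, min_score=0.1):
    """Return the best-matching entry title, or None if nothing is similar enough."""
    doc_counts, idf = build_index(docs)
    query = vectorize(Counter(tokenize(question)), idf)
    scored = [(cosine(query, vectorize(c, idf)), title) for title, c in doc_counts.items()]
    score, title = max(scored)
    return title if score >= min_score else None


best = answer("How do I get a parking permit for my car?", KNOWLEDGE_BASE)
```

Because the index is built directly from the knowledge base, a sketch like this needs no labelled dialogs, which is the essence of the cold-start setting described above. The threshold lets the system fall back to a clarifying question when no entry matches well.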

Presenting our Multimedia-based Recommender Approaches at the 19th I4CS Conference

Author: Andreas Lommatzsch

The International Conference on Innovative Internet Community Systems (I4CS) was held at the CongressPark Wolfsburg, June 24-26, 2019.

This year the conference focus was on Digital Innovations for the Public and Mobility Services. This focus was especially visible on the second day of the conference. The day started with a talk by the Mayor of Wolfsburg explaining the digitalization strategy of the city. Subsequently, Volkswagen AG presented the Wolfsburg.Digital program. The initiative supports innovative solutions for improving the quality of life through better digital infrastructure, efficient traffic management, infrastructure for e-mobility, and zero-carbon buildings. The presentations and the discussion gave interesting insights into the current state of development and the plans for the coming years. The poster session left plenty of room for discussing research in detail.

Overall, the conference offered exciting insights into current research projects and new ideas for further research. I presented our framework for computing multimedia-based recommendations. The framework and the publicly available real-world news dataset are used in the MediaEval benchmark, enabling researchers to evaluate new multimedia-based recommender algorithms. In addition to the conference presentation, I also presented the system in the poster session. This gave us time for detailed discussions and new cooperation ideas.

The highlight of the social program of the conference was a guided tour through the Volkswagen factory. The tour gave insights into the different vehicle production areas and showed how different cars are produced. Riding in an open Golf train, we saw all steps of the production process.

In 2020, the 20th edition of the I4CS conference will be held in Bhubaneswar, India.

Multi-Media Analysis for Recommender Systems

The World Wide Web initially comprised a collection of texts in the form of HTML documents. Over the years, organizations and, increasingly, users have added a variety of multimedia. Today, popular web portals attract viewers not only with captivating stories but also with audio, images, and videos.

Users continue to struggle with the vast amount of information available at their fingertips. Finding relevant information has become a greater challenge.

Organizations operating popular portals have introduced systems supporting users in their quest to find interesting content. These recommender systems take the collection of items, process it automatically, and derive a small set of suggestions. Research has established tools to process texts automatically. Dealing with multimedia remains more difficult.

Content-agnostic methods, such as Collaborative Filtering, rely on strong user profiles. News publishers cannot provide such profiles as readers tend to visit their portals anonymously. Consequently, news publishers tend to combine non-personalized and content-based methods. Content-based filtering takes features describing the item and establishes similarities among them to find content that matches users’ preferences. Hitherto, multimedia content has been largely ignored due to technical difficulties.
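The content-based idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: items are described by hypothetical keyword or image-label sets, similarity is measured with the Jaccard coefficient (one of many possible choices), and the catalogue and function names are invented for this example.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(read_items, catalogue, k=2):
    """Rank unread items by similarity to the features of the items already read."""
    profile = set()
    for item_id in read_items:
        profile |= set(catalogue[item_id])
    candidates = [
        (jaccard(profile, features), item_id)
        for item_id, features in catalogue.items()
        if item_id not in read_items
    ]
    # Highest similarity first; ties are broken by item id for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in candidates[:k]]


# Hypothetical catalogue: article id -> labels describing its text or image.
CATALOGUE = {
    "a1": ["football", "stadium", "crowd"],
    "a2": ["football", "player", "goal"],
    "a3": ["parliament", "speech"],
    "a4": ["stadium", "concert", "crowd"],
}

suggestions = recommend(["a1"], CATALOGUE, k=2)
```

Note that the profile here is built only from the current session, so the approach also works for anonymous readers who have no long-term user profile.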

This has motivated us to set up the “MediaEval – Multimedia for Recommender System” benchmark. The benchmark asks participants to predict the most popular news items based on image features. Participants obtain a data set spanning six weeks. They have to predict the items which will collect the most views in the following weeks. We have computed a set of image annotations to simplify getting started. Statistics and preliminary observations are described in the Task overview paper.
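To give a feeling for the task, the following sketch shows one naive way to approach it. It is not one of the official baselines: the scoring rule (average historical views per image label), the precision-at-k measure, and all data are assumptions made for illustration. The Task overview paper defines the actual evaluation methodology.

```python
from collections import defaultdict


def label_popularity(train):
    """Average number of views observed for items carrying each image label."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for labels, views in train:
        for label in labels:
            totals[label] += views
            counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}


def predict_top_k(items, popularity, k):
    """Rank new items by the mean popularity of their known labels."""
    def score(labels):
        known = [popularity[label] for label in labels if label in popularity]
        return sum(known) / len(known) if known else 0.0

    ranked = sorted(items, key=lambda item_id: (-score(items[item_id]), item_id))
    return ranked[:k]


def precision_at_k(predicted, actual_top):
    """Share of predicted items that are among the truly most viewed items."""
    if not predicted:
        return 0.0
    return len(set(predicted) & set(actual_top)) / len(predicted)


# Hypothetical training data: (image labels, views during the training weeks).
TRAIN = [
    (["football", "crowd"], 900),
    (["football"], 700),
    (["parliament"], 200),
    (["crowd", "street"], 400),
]
# Hypothetical new items published after the training period.
NEW_ITEMS = {
    "n1": ["football", "goal"],
    "n2": ["parliament", "street"],
    "n3": ["crowd"],
}

popularity = label_popularity(TRAIN)
predicted = predict_top_k(NEW_ITEMS, popularity, k=2)
```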

If you have promising ideas about how to extract useful features from images to predict news articles’ popularity, check out the challenge details here.

We have randomly selected three images from the categories sports, local, and politics to give you an impression of the data.

The Virtual Citizen Services Assistant would like to have a Name.

Author: Andreas Lommatzsch

Chatbots are one of the most exciting techniques supporting users in finding useful information and in solving complex tasks. In contrast to websites and search engines, chatbots provide a “natural” interaction scheme. Chatbots try to imitate human experts engaging with users in dialogs.
Chatbots combine methods for considering the context and apply learning algorithms to learn continuously from user feedback. Chatbots are often equipped to handle small talk, which gives the chatbot an individual personality and leads to human-like behavior.

The complex domain of public administration raises plenty of user questions. Consequently, the cities of Berlin and Hamburg have introduced chatbots that answer citizens’ questions about services and the administration.

Unfortunately, the Virtueller Bürger Service Assistent (Virtual Citizen Service Assistant) currently lacks a catchy name. So please help the assistant to get a name. The Senatsverwaltung für Inneres und Sport has initiated a call for name suggestions (due March 31, 2019). Prizes await the three highest-ranked suggestions. The best suggestion will receive an annual ticket for the Berliner Bäder Betriebe (worth 495 EUR). More details can be found in the Official Rules.

We are looking forward to your suggestions.

NewsREEL Multimedia at MediaEval’18

Author: Benjamin Kille

This year’s edition of our news recommendation challenge NewsREEL focused on multimedia data. We asked participants to estimate which articles would become popular solely based on their textual and visual features. A large-scale data set collected by our long-term partners at plista facilitated evaluating different algorithms.

The MediaEval benchmark brings different evaluation tasks together. This year’s edition took place from 29 to 31 October in Nice, France. The event offers task organisers the opportunity to present their challenges. Participants can discuss ideas and illustrate their results.

Three papers have been submitted for NewsREEL Multimedia. The overview paper [1] outlines the task, details the evaluation methodology, and presents the results. The baseline paper [2] illustrates the baselines against which we compared participants’ predictions. Ciobanu et al. [3] analysed how Google’s Vision API can be used to obtain more representative labels for images.

Besides the technical programme, the MediaEval organisers had prepared social events. On the first day, we visited the Château-Musée Grimaldi and avoided drowning during the subsequent city tour. On the second day, we enjoyed dinner at Chez David. Finally, on the third day, we dined at Château Le Cagnard. During the dinners, we had the opportunity to discuss and exchange ideas with a diverse group of researchers. We have come up with ideas to foster future cooperation.

Next year, MediaEval will return to Nice, co-located with ACM Multimedia.


[1] Lommatzsch, A., Kille, B., Hopfgartner, F. and Ramming, L., 2018, October. NewsREEL Multimedia at MediaEval 2018: News Recommendation with Image and Text Content. In Working Notes Proceedings of the MediaEval 2018 Workshop. CEUR-WS.


[2] Lommatzsch, A. and Kille, B., 2018. Baseline Algorithms for Predicting the Interest in News based on Multimedia Data. In Working Notes Proceedings of the MediaEval 2018 Workshop. CEUR-WS.


[3] Ciobanu, A., Lommatzsch, A. and Kille, B., 2018. Predicting the Interest in News based On Image Annotations. In Working Notes Proceedings of the MediaEval 2018 Workshop. CEUR-WS.


CIKM and INRA’18 in Turin

Author: Benjamin Kille

Turin hosted the 27th edition of the Conference on Information and Knowledge Management (CIKM) on 22-26 October 2018. The first day featured nine workshops. We co-organised the sixth edition of the International Workshop for News Recommendation and Analytics (INRA). Between twenty and thirty attendees listened to three keynote talks. Frank Hopfgartner, senior lecturer at Sheffield University, presented a comprehensive review of information retrieval as well as recommender systems evaluation initiatives. Anja Benner-Tischler, legal scholar at Kassel University, outlined the recently introduced EU General Data Protection Regulation (GDPR). Leif Ramming, the lead of plista’s machine learning team, discussed issues related to scaling up recommender systems on an industrial level. In addition, attendees followed six paper presentations. The seventh paper could not be presented in person due to visa issues.

The second day commenced with the keynote speech by Maarten de Rijke. He highlighted the interactive aspects of the environments in which today’s information access systems operate. Subsequently, the conference split into three rounds with five parallel sessions each. The day concluded with the first of three short, demo, and industry sessions. The third day kicked off with Edward Grefenstette’s keynote about how modelling rewards affects agents’ learning of language. The remainder of the day followed the same structure as before, with three rounds of five parallel sessions followed by the second short, demo, and industry session. Yoelle Maarek started the fourth day with her keynote about Amazon’s Alexa. Subsequently, attendees split up to listen to the final rounds of research talks. Besides the final short, demo, and industry session, the programme included a townhall discussion. The final day offered a selection of tutorials.

In 2019, CIKM will take place in Beijing, China on 3-7 November.

The TU Berlin participates in the Festival of Lights 2018

Author: Andreas Lommatzsch

Every year in October, the Festival of Lights Berlin illuminates over 50 buildings in the city. In 2018, the TU Berlin participated in the festival for the first time. Microscopic images showing structures of leaves and crystals are projected onto the TU-Tower at the Ernst-Reuter-Platz (one of the tallest buildings in the western part of Berlin). Impressions from our office and the Ernst-Reuter-Platz are shown in the following photos.

KI 2018 in Berlin

Author: Andreas Lommatzsch

The 41st edition of the German AI conference ("KI 2018") was held in Berlin, September 24-28, 2018. After two days of workshops and tutorials, the main conference ran from Wednesday to Friday. Each day of the main conference started with an interesting keynote. A highlight was the keynote given by Dietmar Jannach on session-based recommendation approaches. Professor Jannach discussed the transition from rating prediction to ranking with implicit feedback. He stressed that recommender systems research has to critically reflect on the desired output rather than marginally improve arbitrary criteria.

CC IRML contributed to the conference in two ways:
(1) On Monday, we held a half-day tutorial on stream-based recommendation algorithms. The tutorial discussed how to extend existing recommender algorithms and evaluation to stream-based scenarios. In addition, we explained the NewsREEL challenge, which offers the possibility of evaluating stream-based recommender algorithms both online and offline. About twenty participants attended and engaged in discussions.
(2) On Friday, B. Jain presented his work on "Condorcet’s Jury Theorem for Consensus Clustering." The talk discussed the theoretical justification for the consensus clustering method.

In 2019 the KI conference will be held in Kassel (23-26 Sept 2019); in 2020 the conference will come to Bamberg.

LWDA Conference in Mannheim

Authors: Andreas Lommatzsch and Asmaa Haja

The conference "Lernen, Wissen, Daten, Analysen" was held in Mannheim, August 22-24, 2018. The conference is organized by the German Computer Science Society (GI) and consists of 5 tracks: Information Retrieval (FGIR 2018), Knowledge Management (FGWM 2018), Knowledge Discovery, Data Mining, and Machine Learning (KDML 2018), Business Intelligence (WSBI 2018), and Large-Scale Data Management and Processing – Applications in Research and Industry (FGDB 2018).

The conference program consisted of research paper presentations, a poster session, and four interesting keynotes from different domains. The first keynote was given by Daniela Nicklas. The talk discussed the question of who watches the sensors that are used for monitoring real-world phenomena. The second keynote, given by Kerstin Bach, explained the "selfBACK" decision support system (DSS) project. The system uses Case-Based Reasoning to support patients with low back pain. Frank Giesler presented the third talk. He introduced a tried-and-tested approach that ensures successful data-driven digital product development through the close involvement of future users. The last talk, given by Stephan Mandt, gave an overview of some exciting recent developments in deep probabilistic modeling, which combines deep neural networks with probabilistic models for unsupervised learning.

The DAI-Labor presented research conducted in a student project. Our paper "Normalization of Time series for Improving Recommendations" studies how observed log data in recommender systems can be decomposed into several different characteristic patterns. We explain that the decomposition is the basis for understanding the lifecycle of the items and the characteristic user preferences. Furthermore, the decomposition can be used for trend-based prediction, extrapolating the extracted time series while taking into account the characteristic patterns. The methods have been studied in a News Recommendation scenario that is characterized by a continuously changing set of items and strong variation in the number of users during the day.
In addition to a talk, we also presented a poster allowing us to discuss details of the scenario and our approach individually.
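The following sketch illustrates one simple form such a normalization could take. It is an assumption for illustration, not the method from the paper: we estimate a periodic traffic profile from the overall request counts and divide an item's click counts by it, so that what remains reflects the item's own lifecycle rather than the time of day. The function names and toy numbers are hypothetical.

```python
def periodic_profile(total_requests, period=24):
    """Mean traffic per position in the period, scaled so the profile averages 1."""
    sums = [0.0] * period
    counts = [0] * period
    for t, value in enumerate(total_requests):
        sums[t % period] += value
        counts[t % period] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    overall = sum(means) / period
    return [m / overall if overall else 0.0 for m in means]


def normalize(item_clicks, profile):
    """Divide each observation by the periodic factor of its time slot."""
    period = len(profile)
    return [
        clicks / profile[t % period] if profile[t % period] else 0.0
        for t, clicks in enumerate(item_clicks)
    ]


# Toy example with a period of 2 slots (e.g. "night" and "day") instead of 24 hours.
total = [10, 30, 10, 30]
profile = periodic_profile(total, period=2)
item = [5, 15, 4, 12]
trend = normalize(item, profile)
```

In the toy data the raw item counts oscillate strongly, while the normalized series shows a slow decline, which is the kind of item-specific pattern that can then be extrapolated.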

Overall, the conference provided many interesting talks covering a wide spectrum of topics. The positive atmosphere was the basis for exciting discussions.

EYZ Media Research: Cooperation With DAI-Labor TU Berlin

Authors: Jing Juan, Andreas Lommatzsch, Antje Marx

Recommender systems have proven effective in helping the audience of streaming platforms find relevant movies and series hidden in the huge amount of available content. The question is how the audience can be assisted in finding the most interesting movies matching individual user preferences. Popular platforms like Netflix, Pandora, and Amazon successfully provide recommender systems. Collaborative filtering approaches (item-based and user-based) are popular for providing recommendations.
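As a minimal illustration of item-based collaborative filtering, consider the sketch below. It assumes implicit feedback (which user watched which title) and cosine similarity between items. The data and function names are hypothetical and do not describe the system built in the project.

```python
import math
from collections import defaultdict


def item_similarities(interactions):
    """Cosine similarity between items based on the users who watched them."""
    users_by_item = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            users_by_item[item].add(user)
    sims = {}
    for a in users_by_item:
        for b in users_by_item:
            if a == b:
                continue
            overlap = len(users_by_item[a] & users_by_item[b])
            if overlap:
                norm = math.sqrt(len(users_by_item[a]) * len(users_by_item[b]))
                sims[(a, b)] = overlap / norm
    return sims


def recommend(user, interactions, k=2):
    """Score unseen items by their summed similarity to the user's watched items."""
    sims = item_similarities(interactions)
    seen = interactions[user]
    scores = defaultdict(float)
    for (a, b), sim in sims.items():
        if a in seen and b not in seen:
            scores[b] += sim
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


# Hypothetical viewing histories.
WATCHED = {
    "u1": {"film_a", "film_b"},
    "u2": {"film_a", "film_b", "film_c"},
    "u3": {"film_a"},
    "u4": {"film_c", "film_d"},
}

suggestions = recommend("u3", WATCHED, k=2)
```

Here film_d is never suggested to u3 because no user has watched both film_a and film_d. With few users and many rarely watched titles, most item pairs have no overlap at all, which is why sparse data limits this approach.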

Standard recommendation approaches have several weaknesses in niche markets due to sparse user data, missing ratings, and highly specific items. Therefore, customized recommendation strategies are needed that adapt to the specific system features while catering to users’ demands. Such customization is possible when traditional recommender algorithms are tailored to the special item formats and user experience of the corresponding niche market, thereby adding value to the specific system or platform.
This has been the motivation for the cooperation between EYZ Media and the DAI-Labor of the TU Berlin.

The collaboration between EYZ Media and the DAI-Labor in 2018 considers both traditional recommendation strategies (content-based and collaborative filtering approaches) and a modern event-based approach (introducing domain-relevant trends from heterogeneous sources such as Twitter). Following the API and dataflow definitions from EYZ Media, the DAI-Labor supports the recommendation solution by building an Elasticsearch service, a collaborative filtering component, and an event-based recommender pipeline. The evaluation, based on both experts’ subjective opinions and offline evaluation on specific metrics, helps us better understand how recommenders boost user activity in the niche market.

Project details are discussed in the realeyz tech blog.


The DAI-Labor (Distributed Artificial Intelligence Laboratory) at the TU Berlin conducts research and development projects in order to provide solutions for a new generation of artificial intelligence systems and services. The competence center Information Retrieval and Machine Learning (IRML) focuses on developing innovative recommender systems and information retrieval systems combining machine learning techniques, graph-based approaches and social media analysis.


EYZ Media operates the VOD platform realeyz on the web, via Prime Video Channels, and as an app in the stores. Thanks to its focus on independent film, realeyz has built up a growing user base and is one of the top 3 VOD specialty providers in Germany. RealEYZ stands for intelligent, exciting, classic and innovative content in all formats and lengths, curated carefully and with passion. Every day brings a new film, all in their original versions, from a selection of 1,000+ titles, and the focus on indie movies appeals above all to young, metropolitan users. The use of many unique components created by EYZ’s own dev team, such as the recommender system with its algorithms that enable an individual and personal UX, as well as realeyz’s presence on different channels and in different technical environments, sharpen the profile, strengthen customer loyalty, and establish the realeyz brand as the essential hub for independent productions. RealEYZ is supported by the EU-CREATIVE program and is part of the EuroVoD network.

Supported by Investitionsbank Berlin