Bridging the Gap Between Data and
Decision-Makers: A Chatbot-Driven Database Query Solution
In today’s business world, quick access to key information (stored in huge databases)
is critical for effective decision-making. Unfortunately, existing dashboard tools
and query methods are often too rigid, requiring technical knowledge that most
employees do not have. As a result, they rely heavily on database engineers or
data analysts, leading to delays in crucial business decisions.
To address this problem, we developed a solution that simplifies the process by
introducing a chatbot interface that allows employees to query databases using
natural language. The chatbot maps the natural language question to a database
query; no technical expertise is required on the user side.
Here’s how it works:
Database Query Builder: A large language model (LLM) converts natural language queries into SQL statements, based on the database schema, including vocabulary and synonyms.
Database Connector: The system runs the generated query and returns the result in a structured format, such as JSON.
Processing Results: Results are transformed into meaningful text or visuals.
Output Components: The system provides responses as text or data visualizations like graphs or charts.
Multi-Modal Interface: Users can interact with the chatbot using both voice and text inputs.
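The first three components can be illustrated with a minimal sketch. It is not the implementation from the paper: the table, the synonym list, and the `fake_llm` function (a canned stand-in for a real LLM call) are all assumptions made for illustration, and SQLite is used only because it ships with Python.

```python
import json
import sqlite3

# Hypothetical schema and company vocabulary (assumptions, not from the paper).
SCHEMA = "CREATE TABLE orders (id INTEGER, customer TEXT, revenue REAL)"
SYNONYMS = {"turnover": "revenue", "client": "customer"}


def build_prompt(question: str) -> str:
    """Query builder input: schema, synonyms, and the user question in one prompt."""
    vocab = "\n".join(
        f"- '{term}' means column '{column}'" for term, column in SYNONYMS.items()
    )
    return (
        f"Schema:\n{SCHEMA}\n\nVocabulary:\n{vocab}\n\n"
        f"Write one SQLite SELECT statement answering: {question}"
    )


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; always returns the same canned query."""
    return (
        "SELECT customer, SUM(revenue) AS total FROM orders "
        "GROUP BY customer ORDER BY customer"
    )


def run_query(conn: sqlite3.Connection, sql: str) -> str:
    """Database connector: run a SELECT and return the rows as a JSON string."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    return json.dumps([dict(zip(columns, row)) for row in cursor.fetchall()])


def to_text(result_json: str) -> str:
    """Result processing: turn the JSON rows into a short readable sentence."""
    rows = json.loads(result_json)
    return "; ".join(
        ", ".join(f"{key}: {value}" for key, value in row.items()) for row in rows
    )


# Toy data so the sketch runs end to end.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Acme", 100.0), (2, "Acme", 50.0), (3, "Bolt", 70.0)],
)

question = "What is the turnover per client?"
sql = fake_llm(build_prompt(question))
result = run_query(conn, sql)
answer = to_text(result)
print(answer)
```

Rejecting anything that is not a SELECT statement is one simple safeguard against a generated query modifying data; a production system would likely need stricter checks, such as a read-only database user.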
The approach has been implemented in a web-based application allowing
ordinary employees to work with the system under real-life conditions. The
evaluation results show that simple questions are handled effectively, while
more complex queries benefit from additional prompt customization. By
incorporating company-specific terminology, the chatbot produces accurate SQL
queries and minimizes errors.
We presented the work as a regular paper at the LWDA conference 2024 (co-located with KI2024) in Würzburg, Germany. At the conference, we had many fruitful discussions about the considerable potential of the approach and ideas for future improvements. Looking ahead, we aim to further enhance the system with specialized LLMs, optimize query performance, and refine methods to detect and address hallucinations in responses.
This year’s Long Night of Science Berlin and Potsdam took place on 22 June 2024. We showcased our latest research and projects at the Center for Tangible AI and Digitalisation (ZEKI). As part of the Long Night of Science, presentations at ZEKI centered around innovations in autonomous driving, smart office equipment and care technologies, demonstrating their potential to transform everyday life.
Meanwhile, the Artificial Intelligence and Machine Learning Competence Centre focused on advances in large language models, intelligent conversational systems and explainable AI. We showcased dialogue models that recognize user intent and efficiently navigate conversations to provide relevant information. Practical demonstrations showed the ability of conversational systems to adapt to new objects and user-specific queries, increasing their usefulness in various applications. Discussions with visitors revealed a wide range of expectations and opinions about AI, reflecting a broad public interest in how these technologies are developed and deployed.
Compared to previous years, the number of visitors increased significantly, demonstrating the high level of interest in AI research and its potential to have a significant impact on various aspects of modern life. Overall, the Long Night of Science prompted interesting discussions about the future of AI, emphasizing innovation and the ethical dimensions of technology deployment.
Our robot always receives much attention when answering questions from visitors.
Explaining our Large Language and Dialog Models at the Long Night of Science
Impression from the Long Night of Science at the ZEKI Berlin
Recommender systems have become crucial in managing the overwhelming volume of news in today’s digital ecosystem by providing personalized, context-aware suggestions. The introduction of Large Language Models (LLMs) has significantly transformed the news industry by automating the generation of news content. News recommendation faces numerous challenges, including maintaining trustworthiness, fighting misinformation, preventing echo chambers, and addressing the fleeting nature of news topics. Furthermore, the generated content must carefully balance privacy concerns, incomplete user preferences, and the demand for fairness, bias reduction, diversity, and ethical considerations. Thus, news recommendation remains an interesting field for ongoing research.
The 12th International Workshop on News Recommendation and Analytics (INRA 2024) provides a platform to foster research in the domain of news recommendation. The workshop aims to bring together researchers and practitioners from various disciplines to discuss recent challenges and explore innovative solutions. The topics of interest include:
News analytics and recommendation
Automated news generation, summarization and opinion mining
Multi-modality in the news domain
Self-supervised learning for news recommendation
Fake news, misinformation, and filter bubbles
Legal aspects and ethics
Trustworthiness in the news ecosystems
The INRA 2024 workshop will be held in conjunction with RecSys 2024 in Bari, Italy. More details can be found on the Workshop webpage.
Artificial Intelligence (AI) is a rapidly evolving field of research that has garnered significant attention in recent months, particularly with the advent of Large Language Models (LLMs) and technologies capable of generating images (e.g. Stable Diffusion) and multimedia content. These innovations simplify content generation and the context-sensitive creation of documents, substantially changing how we approach information creation and management. Despite the potential of these technologies, administrative processes are often perceived as outdated, presenting numerous opportunities for substantial optimization.
In response to the great potential of these technologies, the ITDZ Berlin has inaugurated the Innovation Lab (InnoLab) to make use of AI’s potential within the administrative realm. This initiative aims to analyze the benefits of AI methodologies for specific requirements and needs. Through workshops, self-learning sessions, and events, ITDZ Berlin employees will gain hands-on experience with AI tools and technologies and evaluate their practical applications.
The primary goal of InnoLab is to discuss and understand the strengths and weaknesses of current models, as well as to keep up to date with emerging trends. This knowledge will guide the selection of appropriate tools to enhance daily operations at ITDZ. InnoLab was officially opened on 27 February 2024 by Deputy Chairperson Anne Lolas, who, after a brief introductory speech, initiated a countdown led by our Pepper robot, Bobbi, marking the official launch.
As a strategically critical initiative, the focus on AI and the continuous development of InnoLab will ensure that current trends and innovations are integrated and leveraged to improve administrative processes, making them more efficient and responsive to the evolving demands of our digital era.
The robot starts the countdown for the InnoLab opening.
Testing Large Language Models for administrative use cases.
The 14th Annual MediaEval Workshop was held on 1-2 February 2024 in conjunction with MMM in Amsterdam, Netherlands. MediaEval is a benchmarking initiative dedicated to fostering the creation and assessment of innovative algorithms in the multimedia field. It highlights the interdisciplinary nature of multimedia by offering datasets that combine different types of data, including text, audio, images, and video. In 2023, MediaEval introduced the following five tasks:
As in last year’s edition, the DAI-Lab of the TU Berlin has been actively involved in the organization of the NewsImages lab.
The participants of NewsImages develop algorithms for re-establishing the connection between texts and images extracted from news portals. Based on a set of news articles (each consisting of a text and an image), a set of texts and a set of images are defined. The lab participants have to reconstruct the connection between texts and images (originally defined by the news editors). As a special challenge, we partially replaced the original images with AI-generated images (using Stable Diffusion).
In 2023, 10 teams actively participated in the NewsImages task. Each team submitted up to 5 solutions for the task of re-matching news texts and images. Overall, the submitted solutions solved the task very successfully. The highest performance was reached by applying OpenCLIP. The winning team used extended fine-tuning to optimize the model for the concrete scenario.
In addition, several BLIP- and LLM-based methods were proposed. On average, these methods performed worse than CLIP, but they gave new insights into the patterns used for finding good images for news articles.
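The re-matching step itself can be sketched as ranking images by their similarity to a text in a shared embedding space. The sketch below is a simplified illustration, not a participant's system: the three-dimensional vectors are made-up toy values, whereas a real solution would obtain text and image embeddings from a model such as OpenCLIP.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equally long vectors (0.0 if one is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_images(text_vec, image_vecs):
    """Return image ids sorted from most to least similar to the text."""
    return sorted(
        image_vecs,
        key=lambda image_id: cosine(text_vec, image_vecs[image_id]),
        reverse=True,
    )


# Toy embeddings (hypothetical values standing in for model outputs).
text_vecs = {
    "article_a": [0.9, 0.1, 0.0],
    "article_b": [0.0, 0.2, 0.9],
}
image_vecs = {
    "img_1": [0.1, 0.1, 0.8],
    "img_2": [0.8, 0.2, 0.1],
    "img_3": [0.3, 0.9, 0.1],
}

rankings = {
    text_id: rank_images(vec, image_vecs) for text_id, vec in text_vecs.items()
}
print(rankings)
```

Each text receives a full ranking of candidate images rather than a single match, which is the form a rank-based evaluation needs.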
The next edition of MediaEval is planned for October 2025 in Dublin. The following pictures give an impression of MediaEval 2023 in Amsterdam.
Impressions from the MediaEval Workshop in Amsterdam
Insights from the NewsImages lab 2023
Impressions from the MediaEval Workshop in Amsterdam
Chatbots are one of the premier communication channels of the digital age, valued across many sectors for their instantaneous, round-the-clock interaction. Conventional (rule-based) chatbots (like Alice) have exhibited significant limitations, particularly in adapting to context, addressing specific inquiries, and supporting follow-up questions, often leading to a fragmented and unsatisfactory user experience. Large Language Models (LLMs) can help to overcome this problem, since they adapt well to the context and to the concrete user question. However, LLM-generated answers might be unreliable and based on unverified knowledge.
Our research focuses on combining the potential of LLMs with reliable semantic knowledge sources. Our paper introduces a novel methodology that combines LLMs with dedicated (NoSQL) databases to ground answer generation in verifiable information, enhancing both the reliability and the context relevance of the responses.
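The grounding idea can be sketched as follows: retrieve the most relevant entries from a knowledge store first, then hand only those entries to the LLM as the permitted basis for its answer. This is a minimal illustration under stated assumptions, not the method from the paper. The `SERVICES` list stands in for a NoSQL collection, its entries are invented examples, and word overlap is a deliberately naive retrieval score.

```python
import re

# Hypothetical document store standing in for a NoSQL collection (invented entries).
SERVICES = [
    {"id": "passport", "text": "A passport can be applied for in person at any citizens' office."},
    {"id": "parking", "text": "A resident parking permit is issued by the district office."},
]


def tokens(text):
    """Lowercase word set used for a naive overlap score."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, k=1):
    """Return the k documents sharing the most words with the question."""
    question_tokens = tokens(question)
    ranked = sorted(
        SERVICES,
        key=lambda doc: len(question_tokens & tokens(doc["text"])),
        reverse=True,
    )
    return ranked[:k]


def grounded_prompt(question):
    """Build an LLM prompt that restricts the answer to the retrieved facts."""
    facts = "\n".join(f"[{doc['id']}] {doc['text']}" for doc in retrieve(question))
    return (
        "Answer using only the facts below. If they are insufficient, say so.\n"
        f"Facts:\n{facts}\n\nQuestion: {question}"
    )


prompt = grounded_prompt("Where can I apply for a passport?")
print(prompt)
```

Tagging each fact with its source id lets the generated answer be traced back to a verifiable entry, which is the point of grounding.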
We have implemented a prototype aimed at answering questions pertaining to local administration services. This real-world application underscores our commitment to not only advancing theoretical understanding but also solving tangible challenges within the scenario.
Initial evaluations are promising, indicating effective function and a significant leap over traditional chatbot capabilities. Nevertheless, there is substantial room for refinement. The necessity for fine-tuning became evident to ensure comprehensive and reliable responses to all user queries.
We presented our research at this year’s LWDA conference in Marburg (Germany). The presentation in the knowledge management track and the poster session were both very fruitful. We are grateful for the engaging dialogues and the intellectual exchange at the LWDA conference; we anticipate further refining our approach through collaborative insights and continued research.
The following photos give visual impressions of the LWDA conference.
LWDA Conference 2023 in Marburg – Poster session
LWDA Conference 2023 in Marburg – Our Presentation
LWDA Conference 2023 in Marburg – social event / guided tour
The relation of news texts and images is an interesting research topic.
Since for a significant fraction of news events no photos are available, news
editors often use archived photos or stock images. Due to the advances of
generative AI methods, images may be generated to get well-fitting images
catching the attention of potential readers.
NewsImages is an evaluation lab for discussing, testing, and evaluating methods for re-matching news texts and images. NewsImages 2023 runs under the umbrella of MediaEval 2023. The NewsImages lab provides a dataset tailored for learning and evaluating strategies for reassigning news texts and images.
The NewsImages lab and the dataset are described in detail in the lab overview paper.
The NewsImages 2023 workshop will be held in Amsterdam in February 2024, in conjunction with MMM2024.
The dataset is available for download. Check out the details on the official web page and join NewsImages 2023!
The Long Night of Science (LNDW) is an annual event that takes place in Berlin and Potsdam. It is a science festival that offers visitors the opportunity to explore science and research in a fun and interactive way. In 2023, the Center for Tangible AI and Digitalization (ZEKI) participated in the Long Night of Science for the first time.
Our research group presented Bobbi, the chatbot of Berlin’s administration (developed in cooperation between the TU Berlin and the ITDZ Berlin). Usually, chatbot Bobbi reliably answers questions related to Berlin’s administration on Berlin’s official Service Portal. At the LNDW, visitors could meet our robot in person and talk face to face with Bobbi. As a special surprise, Bobbi had prepared a small quiz allowing visitors to test their knowledge about Bobbi’s job and the services offered by Berlin’s administration.
The presentation was attended by a large number of visitors. We discussed with them the expectations of and requirements for conversations with robots, as well as the technical foundations of chatbots. This included classical Natural Language Processing (NLP) methods as well as Transformers and Large Language Models. Visitors learned about recent trends in NLP and enjoyed the interaction with our chatbot Bobbi.
The strong interest in the project and the profound discussion with the visitors made the event a big success. We plan to present an improved, much more skilled version of chatbot Bobbi in 2024.
This year, NewsImages provided three news datasets from three different domains: (1) the RSS part provided news from different news portals (aggregated by the GDELT Project); (2) the TW part provided tweets related to news stories; and (3) the RT part provided news in German related to the war in Ukraine. The task in NewsImages consists of re-matching news texts and images. Details about the analyzed setting, the dataset, and the evaluation metrics are explained in the lab overview paper.
In this year’s challenge, 5 teams from Europe and Asia participated. The developed solutions provided very good results; the best team reached a Recall@5 of 0.6. The discussion of the results gave interesting insights into the relation between the news texts and the images used. The best results were obtained by annotating images using CLIP. For correlating text and image, considering only a short text (e.g. the headline) was most successful. This underlines that image and headline are often based on the same key element of the news message. The best results were reported for tweets, indicating that there is a strong overlap between the (relatively) short text and the accompanying image. The translation step typically required for the texts from the RT dataset reduced the recall by about 15%. The teams in the challenge mainly focused on optimizing the similarity models for connecting texts and images as well as on methods for bridging the semantic gap between news texts and images by enriching detected concepts with semantic information.
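For readers unfamiliar with the metric, the following sketch shows how Recall@K can be computed, assuming the common definition in which every text has exactly one correct image: it is the fraction of texts whose correct image appears among the top K ranked candidates. The rankings and ground truth below are toy values chosen for illustration, not data from the challenge.

```python
def recall_at_k(rankings, truth, k=5):
    """Fraction of texts whose correct image is among the top-k ranked images."""
    hits = sum(
        1 for text_id, ranked in rankings.items() if truth[text_id] in ranked[:k]
    )
    return hits / len(rankings)


# Toy ground truth: each text has exactly one correct image.
truth = {"t1": "i1", "t2": "i2", "t3": "i3", "t4": "i4", "t5": "i5"}

# Toy system output: ranked image ids per text, best first.
rankings = {
    "t1": ["i1", "i9", "i8", "i7", "i6"],        # correct image at rank 1
    "t2": ["i9", "i8", "i2", "i7", "i6"],        # correct image at rank 3
    "t3": ["i9", "i8", "i7", "i6", "i3"],        # correct image at rank 5
    "t4": ["i9", "i8", "i7", "i6", "i5", "i4"],  # correct image at rank 6: a miss
    "t5": ["i1", "i2", "i3", "i4", "i6"],        # correct image not ranked: a miss
}

print(recall_at_k(rankings, truth, k=5))  # 3 of 5 texts are hits
print(recall_at_k(rankings, truth, k=1))  # only t1 is a hit
```

Under this definition, a Recall@5 of 0.6 means that for 60% of the texts the correct image was among the five highest-ranked candidates.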
The LWDA conference (acronym for “Learning, Knowledge, Data, Analytics”) is a conference organized by the German Computer Science Society. The conference covers recent research in areas such as databases, information systems, knowledge discovery, machine learning, data mining, and knowledge management.
The presentations and poster sessions give a valuable overview of current research projects and results at mostly German universities. The conference provides a good platform for discussing future cooperation and for planning joint activities.
At the conference, I presented a system for generating clarification questions in a chatbot tailored to the needs of Berlin’s administration. The system combines language models, semantic annotations and clustering techniques to generate counter questions for ambiguous user inputs. This enables a virtual chatbot to guide users efficiently to the desired information. The details can be found in our paper.
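The basic idea of a counter question can be sketched in a strongly simplified form: when an ambiguous input matches candidate answers from more than one topic, the chatbot asks the user to choose between the topics. The sketch below groups candidates by a hand-assigned `topic` annotation as a stand-in for the clustering step. The candidate entries and the question template are invented for illustration and do not reproduce the system described in the paper.

```python
from collections import defaultdict

# Hypothetical candidate answers for an ambiguous input, each carrying a
# semantic annotation ("topic"). All entries are invented examples.
CANDIDATES = [
    {"title": "Apply for a passport", "topic": "passports"},
    {"title": "Report a lost passport", "topic": "passports"},
    {"title": "Register a vehicle", "topic": "vehicles"},
]


def group_by_topic(candidates):
    """Group candidate titles by their topic annotation."""
    groups = defaultdict(list)
    for candidate in candidates:
        groups[candidate["topic"]].append(candidate["title"])
    return dict(groups)


def clarification_question(candidates):
    """Return a counter question if candidates span several topics, else None."""
    groups = group_by_topic(candidates)
    if len(groups) < 2:
        return None  # unambiguous: no clarification needed
    return "Is your question about " + " or ".join(sorted(groups)) + "?"


print(clarification_question(CANDIDATES))
```

Asking about the groups rather than listing every candidate keeps the counter question short even when many services match.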
The following photos give an impression from the conference.
LWDA2022 – Campus Marienburg, University of Hildesheim