2017 Web Intelligence Summer School
An Event on Web Science & Web of Data
We are bringing together experts on various aspects of Web Intelligence and the Semantic Web.
The topic of the 2017 edition is Data Management and Question Answering with the Web of Data:
- Publication of web data:
Linked data, semantic web standards and techniques
- Understanding and analyzing a question in natural language:
Natural Language Processing
- Finding data to answer the question and to justify the answer
From Monday, July 3rd, to Friday, July 7th, students will learn from
formal presentations and hands-on sessions that will make them scientifically and practically competent.
This event is supported by the French-German University (Deutsch-Französische Hochschule, Université franco-allemande) and by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 642795, project: Answering Questions using Web Data (WDAqua).
Sharing, connecting, managing, analyzing and understanding data on the Web will enable better services for citizens, communities and industry. However, turning web data into successful services for the public and private sector requires skilled web and data scientists, and it still requires further research. In order to teach and train researchers and to create exchange opportunities, we are bringing together experts on various aspects of the Web. The 2017 edition of the Web Intelligence Summer School (WISS) will deal with:
Data Management and Question Answering with the Web of Data
We will bring together experts on various aspects directly related to this topic: the publication and management of web data, understanding and analyzing a question posed in natural language or as keywords, and finding data to answer the question and justify the answer.
During the week, students will learn from formal presentations and hands-on sessions that will make them scientifically and practically competent.
The week will be divided into half-day sessions, each of which provides scientific and practical knowledge to the attendees.
The speakers have been invited from well-known institutions in the field in order to ensure the highest quality of content and pedagogy.
Keywords: Data, Web, Linked Data, Data Management, NLP, Data Integration, Curation, Extraction, Querying
- Sören Auer (Universität Bonn – Fraunhofer IAIS)
- Hady Elsahar (Laboratoire Hubert Curien – Université Jean Monnet)
- José Gimenez-Garcia (Laboratoire Hubert Curien – Université Jean Monnet)
- Oudom Kem (Laboratoire Hubert Curien – École des Mines de Saint-Étienne)
- Manolis Koubarakis (National and Kapodistrian University of Athens)
- Christoph Lange (Universität Bonn – Fraunhofer IAIS)
- Ioanna Lytra (Universität Bonn – Fraunhofer IAIS)
- Pierre Maret (Laboratoire Hubert Curien – Université Jean Monnet)
- Kamal Singh (Laboratoire Hubert Curien - Université Jean Monnet)
- Antoine Zimmermann (Laboratoire Hubert Curien – École des Mines de Saint-Étienne)
- Asunción Gómez-Pérez
Question-Answering for providing Technology Intelligence Services to Small and Medium Enterprises
Benefits of Technology Intelligence (TI) services for small and medium enterprises (SMEs) include the assessment of the technologies in use, the identification of new technologies or technology-based products and services, opportunities for technical collaboration or technology innovation, the recognition of competitors and providers, and the detection of potential and existing customers for their portfolio of products and services. In this talk, I will present the design of innovative TI services that provide solutions to technology-based SMEs, and how the system is being implemented using IBM Bluemix and IBM Watson components.
- Mohamed Yahya
Question Answering in Bloomberg
Bloomberg is the world's leading financial data provider. The data, which comes in structured and unstructured flavors, is made available to our 400k subscribers through various functions in the "Bloomberg Terminal". In this three-part talk, I will present our ongoing effort to create a unified question answering (QA) interface within the Bloomberg Terminal to streamline users' access to financial knowledge.
- Javier D. Fernández
Managing Compressed Big Semantic Data: Theory and Practice
The steady adoption of Linked Data in recent years has led to a significant increase in the volume of RDF datasets. The potential of this Semantic Big Data is under-exploited when data management is based on traditional, human-readable RDF representations, which add unnecessary overheads when storing, exchanging and consuming RDF in the context of a large-scale and machine-understandable Semantic Web. This scenario calls for efficient and functional representation formats for RDF as an essential tool for RDF preservation, sharing, and management.
In the first part of the session, after introducing the main challenges emerging in a Big Semantic Data scenario, we will present fundamental concepts of Compact Data Structures and RDF self-indexes. We will analyze how RDF can be effectively compressed by detecting and removing two different sources of syntactic redundancy. Then, we will introduce HDT, a compact data structure and binary serialization format that keeps big datasets compressed, saving space while supporting search and browse operations without prior decompression. We will present HDT deployments in projects such as Linked Data Fragments, which provides a uniform and lightweight interface for accessing RDF on the Web; indexing/reasoning systems like HDT-FoQ and Jena; and LOD Laundromat, a project serving a crawl of a large subset of the Linked Open Data Cloud.
Finally, we will inspect the challenges of representing and querying evolving semantic data. In particular, we will present different modeling strategies and compact indexes to cope with RDF versions, allowing cross-time queries to understand and analyze the history and evolution of dynamic datasets.
In the hands-on session, students will gain experience in compressed big semantic data management, with a main focus on HDT and related applications.
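The redundancy-removal idea behind HDT's Dictionary and Triples components can be sketched in a few lines of plain Python. This is illustrative only: real HDT uses compact bitmaps and integer sequences rather than Python objects, and the URIs below are made-up examples, not data from the session.

```python
from collections import defaultdict

# Three toy RDF triples (subject, predicate, object); URIs are hypothetical.
triples = [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows", "http://example.org/bob"),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", '"Alice"'),
    ("http://example.org/bob", "http://xmlns.com/foaf/0.1/name", '"Bob"'),
]

# Redundancy source 1: the same long terms repeat across triples.
# Store each distinct term once and replace it with a small integer ID
# (the role of the HDT "Dictionary" component).
terms = sorted({t for triple in triples for t in triple})
term_to_id = {term: i for i, term in enumerate(terms)}
encoded = [tuple(term_to_id[t] for t in triple) for triple in triples]

# Redundancy source 2: triples sharing a subject repeat that subject.
# Group (predicate, object) pairs per subject, as in the adjacency-list
# layout of the HDT "Triples" component.
by_subject = defaultdict(list)
for s, p, o in encoded:
    by_subject[s].append((p, o))

def nt(term):
    # Wrap URIs in angle brackets, N-Triples style; literals stay quoted.
    return f"<{term}>" if term.startswith("http") else term

plain = "\n".join(f"{nt(s)} {nt(p)} {nt(o)} ." for s, p, o in triples)
print(len(plain), "bytes as plain N-Triples text")
print(sum(len(t) for t in terms), "bytes of dictionary, plus",
      len(encoded) * 3, "small integer IDs")
```

Even on three triples, the dictionary stores each URI once and the triples shrink to integer tuples; on millions of triples this gap, combined with succinct index structures, is what lets HDT answer lookups without prior decompression.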
- Vinh Nguyen
- Phil Archer
- Lydia Pintscher
Wikidata: current state and challenges
Wikidata is 4.5 years old now. As Wikimedia's structured data project, it has made great strides in that time. It has attracted a large community and figured out ways to maintain a multilingual and multicultural knowledge base in the open. In this talk, we'll look at the current state of the project, the challenges it still faces, and some of the remaining open research questions.
- Ricardo Usbeck
Question Answering - Problems and the Search for Solutions
Man-made AI is able to defeat the world's best Jeopardy! players, and thousands of research papers address Question Answering (QA), yet recent QA benchmarks based on the Web of Data still show huge performance gaps. Using the Question Answering over Linked Data (QALD) challenge, we identified two main root causes: a) a missing measurement of KPIs to drive research in the right direction, and b) missing technologies to solve particular types of questions generically. Here, we are going to inspect recent takes on improving the measurement of QA systems and suggest requirements for future developments. The takeaways will include a reviewer-convincing list of techniques for developing and publishing novel QA systems.
- Vanessa Lopez
Semantics and QA in Action: cognitive solutions for Integrated Care
Despite the enormous potential of Semantic Technologies to empower the world of information management, discovery and reuse, their impact on the market and society, and the extent to which they provide significant advantages for addressing real-life problems, have yet to break through. Answering complex questions on the web of data has been envisioned by some as the "killer app" of intelligent semantic systems, but are we there yet? Since the rise of Linked Data, has ontology-based QA tackled the main downsides of early QA approaches, which were narrow in scope and brittle?
There are two keys behind successful QA systems from an industry perspective (such as QA on top of the Google Knowledge Graph, and IBM's Watson system for Jeopardy!). Key 1 is strong knowledge acquisition techniques to mine the web for facts, moving away from approaches that require the underlying structured data and schema to be encoded and populated with the answers. Key 2 is moving from NLP approaches that required language to be completely and precisely translated into a formal representation to approaches that do not assume a complete representation of a question, but instead find and combine meaningful pieces of knowledge from questions, exploiting whatever sources provide evidence for answers, even in the presence of precision and recall errors.
Meanwhile, cognitive technologies promise significant societal impact in domains where there is a need to transform multidisciplinary information into actionable services. But with most information still unstructured, how do we demonstrate the impact of QA on the web of data? We argue that, to know whether we are answering the important questions and doing what needs to be done to push the state of the art, we need a story to tell. To do that, we look at the potential use of semantic and cognitive QA in the health care sector. In this talk, we present a use case that requires harvesting large amounts of data, and we discuss research challenges and future directions. In particular, a significant challenge remains: embodying cognitive approaches that combine semantics, NLP and learning to facilitate intuitive human interaction, in which professionals interact in a natural way with the system and the system reacts and adapts its understanding and knowledge to give better answers to questions.
- Thomas Pellissier
Practical Wikidata and libraries (hands-on)
- Manolis Koubarakis
Writing European Project Proposals
- Johann Stan
Patent writing and evaluation. An overview of patents in the QA domain
For more information, please click "Further Official Information" below.