CHIO
Techniques and Tools for Representing Data in Web-Scale Graphs
Lead researcher(s): Inmaculada Concepción Hernández Salmerón / David Ruiz Cortés
01/09/2023 – 31/08/2026
In this project, we aim to develop a collection of data engineering tools and techniques to represent and transform the information contained in web-scale graphs, exploit that information to generate new knowledge, and provide access to that knowledge by means of well-defined services.
IntegraKG
Methods and Tools for the Integration of Web Knowledge Graphs
Lead researcher(s): David Ruiz Cortés / Inmaculada Concepción Hernández
01/01/2022 – 31/05/2023
Data integration tasks such as the creation and refinement of knowledge graphs (KGs) increasingly have to deal with matching and fusing data from many sources, e.g., different web sites, existing knowledge bases, and repositories. Integrating new data sources and their entities into a KG is challenging because of the typically large number of different kinds of entities and relationships, the high degree of heterogeneity in their representations, and the often low data quality, with frequently incomplete, wrong, or contradictory information.
In this project, we deal with different tasks related to data integration, specifically in the context of knowledge graphs. We use embeddings to represent data in a low-dimensional space, which enables the application of state-of-the-art machine learning techniques to them.
DESK
Data Engineering as Support for Knowledge Graphs
Lead researcher(s): Inmaculada Concepción Hernández Salmerón / David Ruiz Cortés
01/06/2020 – 31/05/2022
Knowledge graphs allow efficient and flexible data storage, and they are now widely used by expert researchers and leading companies alike (e.g., Google, Facebook, Microsoft, Amazon, and Netflix).
Unfortunately, creating and maintaining those graphs are not trivial tasks, whether through automated information extraction, NLP processes, or manual work.
In this project, we focus on the essential data engineering tasks that produce knowledge graphs with complete, interlinked, trustworthy information suitable for data science analysis, namely creating, integrating, and refining the graphs, and optimizing our techniques (a sine qua non for any engineering approach).
MARTITA
Data Engineering Applied to the Extraction, Semantization, Refinement, and Exploitation of Web-Scale Knowledge Graphs
Lead researcher(s): Rafael Corchuelo Gil / David Ruiz Cortés
01/01/2020 – 31/12/2022
In this project, we aim to deal with the Web of Data, easily the largest data repository that exists nowadays. Although there are several approaches to generating linked open data in a structured, machine-readable format, many organisations still offer their data only in tabular format, published as HTML web pages.
To leverage the potential benefits of these data, it is necessary to process them, namely: extracting data from tables, endowing the data with semantics to build a knowledge graph, refining the graph to complete it and prune errors, and exploiting the graph so that users can query it with the help of virtual assistants.
VORTEX
Tools for Web Data Science
Lead researcher(s): David Ruiz Cortés / Rafael Corchuelo Gil
30/12/2016 – 29/12/2020
In this project, we explore Web Data Science, which is likely to be one of the hottest research areas in the short term. We cover several related topics, such as clustering and extracting information from web documents, endowing information with semantics and detecting duplicated information, performing advanced opinion analysis, and validating these approaches in the context of big datasets.
The challenge is that our proposals must require very little or no human intervention so that they can scale to the dimensions of the Web.
ISIDORO
Semantization and Publication of Open Data for the Integration of Electronic Services
Lead researcher(s): David Ruiz Cortés / Rafael Corchuelo Gil
01/01/2014 – 31/12/2017
Our goal in this project is to do applied research to craft knowledge, techniques, and tools that our industry can use to reduce the production costs associated with publishing semantically meaningful linked open data, creating and integrating added-value electronic services, and analysing the reactions of citizens in social media.