
Prof. Dr. Felix Naumann
Hasso-Plattner-Institut
für Softwaresystemtechnik
Prof.-Dr.-Helmert-Str. 2-3
D-14482 Potsdam, Germany
Many real-life databases lack sufficient structural information such as foreign keys. These constraints are often left undefined for performance reasons, for lack of knowledge about the data, or because dirty data does not fully satisfy them. We therefore want to detect foreign keys automatically.
This problem can be divided into two steps. First, find all inclusion dependencies, i.e., pairs of attributes A and B such that every value of A also occurs among the values of B. This definition captures the syntactic, automatically testable part of a foreign key constraint. Second, find heuristics that filter the foreign keys from the inclusion dependencies.
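The definition of an inclusion dependency can be illustrated with a naive pairwise test. This is only a minimal sketch of the definition, not the algorithm described on this page: the function name `find_inclusion_dependencies`, the dictionary-of-columns input, and the sample attribute names are assumptions made for illustration, and NULL values are simply ignored.

```python
from itertools import permutations


def find_inclusion_dependencies(columns):
    """Return all pairs (A, B) such that every value of A also occurs in B.

    `columns` maps an attribute name to its list of values (hypothetical
    input format). NULLs (None) are ignored in this sketch.
    """
    # Reduce every attribute to its set of distinct, non-NULL values.
    distinct = {
        name: {v for v in values if v is not None}
        for name, values in columns.items()
    }
    # Test every ordered pair of different attributes: quadratic in the
    # number of attributes.
    return sorted(
        (a, b)
        for a, b in permutations(distinct, 2)
        if distinct[a] <= distinct[b]  # subset test
    )


# Hypothetical sample data.
columns = {
    "orders.customer_id": [1, 2, 2, 3],
    "customers.id": [1, 2, 3, 4],
    "customers.age": [30, 41, 41, 25],
}
inds = find_inclusion_dependencies(columns)
```

Here `inds` contains the single pair `("orders.customer_id", "customers.id")`. Whether that inclusion dependency is really a foreign key is the question left to the second, heuristic step.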
We developed our algorithm SPIDER (Single Pass Inclusion DEpendency Recognition) to detect inclusion dependencies over large schemas. The challenge is that the number of attribute pairs to test grows quadratically with the number of attributes. SPIDER first lets the database system sort the values of each attribute and remove duplicates. Afterwards, it tests all attribute pairs in parallel while reading each value at most once. We showed that SPIDER clearly outperforms previous approaches to the exact detection of inclusion dependencies.
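The idea of testing all attribute pairs in parallel over sorted, duplicate-free value lists can be sketched as a multi-way merge. This is a simplified in-memory illustration, not the published SPIDER implementation: the name `spider_sketch`, the heap-based merge, and the sample data are assumptions, sorting is done in Python rather than in the database system, and the values are assumed to be mutually comparable and free of NULLs.

```python
import heapq

_END = object()  # sentinel marking an exhausted attribute


def spider_sketch(columns):
    """Return all pairs (A, B) where the values of A are included in B."""
    # Step 1: sort and de-duplicate each attribute (done in Python here).
    sorted_cols = {name: sorted(set(vals)) for name, vals in columns.items()}
    # Initially every other attribute is a candidate referenced attribute.
    refs = {a: set(sorted_cols) - {a} for a in sorted_cols}
    # One cursor per attribute, so each value is read at most once.
    cursors = {name: iter(vals) for name, vals in sorted_cols.items()}

    heap = []
    for name, cursor in cursors.items():
        first = next(cursor, _END)
        if first is not _END:
            heap.append((first, name))
    heapq.heapify(heap)

    # Step 2: merge all attributes simultaneously in value order.
    while heap:
        value = heap[0][0]
        group = set()  # attributes that contain the current smallest value
        while heap and heap[0][0] == value:
            _, name = heapq.heappop(heap)
            group.add(name)
        for name in group:
            # Only attributes that also contain this value stay candidates.
            refs[name] &= group
            nxt = next(cursors[name], _END)
            if nxt is not _END:
                heapq.heappush(heap, (nxt, name))

    return sorted((a, b) for a, bs in refs.items() for b in bs)


# Hypothetical sample data.
columns = {
    "orders.customer_id": [1, 2, 2, 3],
    "customers.id": [1, 2, 3, 4],
    "customers.age": [30, 41, 41, 25],
}
result = spider_sketch(columns)
```

For the sample data, `result` is `[("orders.customer_id", "customers.id")]`. The sketch keeps the two phases described above, but it omits refinements such as stopping early on attributes that can no longer take part in any candidate.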
We plan to extend SPIDER in two directions: detecting partial inclusion dependencies to handle dirty data, and detecting composite inclusion dependencies to cover composite keys and foreign keys.
Publications
- Jana Bauckmann, Ulf Leser, Felix Naumann, Véronique Tietz: Efficiently Detecting Inclusion Dependencies. International Conference on Data Engineering (ICDE 2007), Istanbul, Turkey, (poster paper, to appear).
- Jana Bauckmann, Ulf Leser, Felix Naumann, Joachim Schmid: Data Profiling: Effiziente Fremdschlüsselerkennung mit Aladin. German Information Quality Conference & Workshop, Bad Soden, November 2006.
- Jana Bauckmann: Efficiently Identifying Inclusion Dependencies in RDBMS. 18. Workshop über Grundlagen von Datenbanken (GI-Workshop), Wittenberg, June 2006.
- Jana Bauckmann, Ulf Leser, Felix Naumann: Efficiently Computing Inclusion Dependencies for Schema Discovery. Workshop InterDB (with ICDE06), Atlanta, April 2006.


