
Prof. Dr. Felix Naumann
Hasso-Plattner-Institut
für Softwaresystemtechnik
Prof.-Dr.-Helmert-Str. 2-3
D-14482 Potsdam, Germany
Topic
This algorithm matches the columns of two different tables (schema matching) with the help of duplicates, that is, rows in both tables that describe the same real-world object.
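The idea can be illustrated with a much simplified sketch. It is not the implementation shipped in DumasOnFiles.jar: it assumes the duplicate rows are already known and aligned by index, it compares values by exact equality, and it assigns columns greedily. The class name, method names, and sample data are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DuplicateMatchSketch {

    /**
     * counts[i][j] = number of duplicate pairs whose value in column i of the
     * first table equals the value in column j of the second table.
     * Assumption: rowsR[d] and rowsS[d] describe the same real-world object.
     */
    static int[][] countAgreements(String[][] rowsR, String[][] rowsS) {
        int[][] counts = new int[rowsR[0].length][rowsS[0].length];
        for (int d = 0; d < rowsR.length; d++) {
            for (int i = 0; i < rowsR[d].length; i++) {
                for (int j = 0; j < rowsS[d].length; j++) {
                    if (rowsR[d][i].equals(rowsS[d][j])) {
                        counts[i][j]++;
                    }
                }
            }
        }
        return counts;
    }

    /** Greedily picks the unused column pair with the highest positive count. */
    static Map<String, String> match(String[] colsR, String[] colsS, int[][] counts) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        boolean[] usedR = new boolean[colsR.length];
        boolean[] usedS = new boolean[colsS.length];
        while (true) {
            int best = 0, bi = -1, bj = -1;
            for (int i = 0; i < colsR.length; i++) {
                for (int j = 0; j < colsS.length; j++) {
                    if (!usedR[i] && !usedS[j] && counts[i][j] > best) {
                        best = counts[i][j];
                        bi = i;
                        bj = j;
                    }
                }
            }
            if (bi < 0) {
                break; // no remaining pair agrees on any duplicate
            }
            usedR[bi] = true;
            usedS[bj] = true;
            result.put(colsR[bi], colsS[bj]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical data: two tables with differently named and ordered columns.
        String[] colsR = {"name", "city"};
        String[] colsS = {"town", "fullName"};
        String[][] rowsR = {{"Ada", "Potsdam"}, {"Bob", "Berlin"}};
        String[][] rowsS = {{"Potsdam", "Ada"}, {"Berlin", "Bob"}};
        System.out.println(match(colsR, colsS, countAgreements(rowsR, rowsS)));
    }
}
```

Columns that never agree on any duplicate stay unmatched, which corresponds to the "Unmatched" lists in the tool's output.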
Corresponding Papers
- Schema Matching Using Duplicates
(by Alexander Bilke, Felix Naumann)
Requirements
- Java 1.5 or higher
- Two *.csv files
  - The first line of each file must hold the column names, separated by semicolons.
  - Separator character: semicolon (";")
  - Special columns must have the following names:
    - KeyCol (at least one is required): holds the primary key of the table
    - RWOId (at most one column): holds the ID of the real-world object
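Before running the tool, the header line of each file can be checked against these requirements. The following is a minimal sketch, assuming the special column names must match exactly (case-sensitive) and that surrounding whitespace may be ignored. The class and method names are hypothetical and not part of DumasOnFiles.jar.

```java
import java.util.ArrayList;
import java.util.List;

class HeaderCheck {

    /** Splits a header line on semicolons and trims each column name. */
    static List<String> parseHeader(String headerLine) {
        List<String> names = new ArrayList<String>();
        for (String name : headerLine.split(";", -1)) {
            names.add(name.trim());
        }
        return names;
    }

    /** Counts how many columns carry exactly the given name. */
    static int count(List<String> names, String wanted) {
        int n = 0;
        for (String name : names) {
            if (name.equals(wanted)) {
                n++;
            }
        }
        return n;
    }

    /** Returns true if the header has at least one KeyCol and at most one RWOId. */
    static boolean isValidHeader(String headerLine) {
        List<String> names = parseHeader(headerLine);
        return count(names, "KeyCol") >= 1 && count(names, "RWOId") <= 1;
    }

    public static void main(String[] args) {
        // Hypothetical header lines for illustration.
        System.out.println(isValidHeader("KeyCol;RWOId;A;B")); // has KeyCol, one RWOId
        System.out.println(isValidHeader("A;B;C"));            // KeyCol is missing
    }
}
```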
Command
- java -jar DumasOnFiles.jar <file1> <file2>
Sample files
Using the sample files listed below, you should get the following result:
- Matchings:
  - B -> B'
  - E -> E'
- Unmatched in sampleR.csv:
  - A, C, D
- Unmatched in sampleS.csv:
  - F, G
Download
- jar archive: DumasOnFiles.jar
- Sample files: sampleR.csv sampleS.csv


