Hasso-Plattner-Institut für Softwaresystemtechnik
Information systems group

Prof. Dr. Felix Naumann

Hasso-Plattner-Institut
für Softwaresystemtechnik
Prof.-Dr.-Helmert-Str. 2-3
D-14482 Potsdam, Germany

Information systems group

The research goal of the Information Systems Group is the efficient and effective management of heterogeneous information in large, autonomous systems. This includes methods for data profiling, data cleansing, search, and metadata management. Please also see our welcome video.

   

Research topics

An article in the Data Engineering Bulletin gives a good overview of some of our research topics: "Data Fusion in Three Steps: Resolving Schema, Tuple, and Value Inconsistencies" (2006). Further details can be found on our project and our publications page. In addition, we maintain a repeatability site to publish code and data.

  • Data quality / information quality: The quality of data is measured in many different dimensions. Quality values can be aggregated along data operations, for instance to calculate the quality of query results.
    Links: ICIQ 2009
    German: keyword "Datenqualität" (data quality) in Informatik Spektrum
  • Duplicate detection: Duplicates are multiple, different representations of the same real-world object, for instance, multiple records of a customer in a CRM database. In duplicate detection, we try to build systems that efficiently and effectively find such duplicates in large data sets.
    Links: Synthesis lecture, repeatability, DuDe
    German: duplicate detection explained for a general audience
  • Linked Open Data (LOD): More and more sources provide data in RDF form as linked open data. Such data serves as a use case in a variety of our projects.
    Links: HPI's open data activities, ProLOD
  • Service-oriented Computing (SOC): SOC has been a popular approach to building enterprise and distributed applications and is typically realized through Web Services. As the number of Web Services offered on the web has grown, their usability has become limited. In our research, we aim to increase the usability of public Web Services through information integration techniques, such as web crawling, annotation extraction, and classification.
    Links: PoSR, Depot (online demo)
  • Similarity Search: Queries often do not exactly match desired objects in the data store. To also find similar matches for a query, a similarity measure as well as a similarity-aware index structure are necessary.
    Links: Similarity search research project, Similarity Search Algorithms seminar (German)
  • Data profiling: When integrating heterogeneous sources, details of the schema, such as foreign key dependencies, are often unknown. We are developing data profiling methods to automatically detect these and other dependencies in very large databases. In the context of the Aladin project, these methods are applied to life sciences databases.
    Links
    German: ProLOD seminar
  • ETL Management: ETL processes are defined to integrate heterogeneous data into a data warehouse. ETL management is the systematic, semi-automatic management of large sets of such processes. It includes several simple operators, such as IMPORT and SEARCH, and more complex operators, such as MATCH, MERGE, or INVERT.
    Links: Bachelor projects GrETL and Moritz, METL
  • Data Fusion: Data fusion is the process of fusing multiple records representing the same real-world object, i.e., duplicates, into a single, consistent, and clean representation. Challenges are scalability over large data volumes and conflict resolution of contradictory values.
    Links: FuSem, Hummer, ACM computing survey, VLDB tutorial
  • Schema matching: Schema matching is the (semi-automatic) process of detecting attribute correspondences between two heterogeneous schemata. These correspondences can subsequently be used to create a schema mapping to be used for data transformation or data exchange.
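
To make the duplicate detection and data fusion topics above more concrete, here is a minimal sketch in Python. It is not the group's implementation and does not reflect DuDe, FuSem, or Hummer. The sample records, the similarity measure, the threshold of 0.85, and the conflict resolution rule "prefer the longest non-null value" are all assumptions chosen purely for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical customer records; records 1 and 2 describe the same person.
records = [
    {"id": 1, "name": "Jon Smith", "city": "Potsdam", "phone": None},
    {"id": 2, "name": "John Smith", "city": "Potsdam", "phone": "0331-555"},
    {"id": 3, "name": "Maria Keller", "city": "Berlin", "phone": "030-777"},
]


def similarity(a, b):
    """Average string similarity over name and city (an illustrative measure)."""
    fields = ("name", "city")
    scores = [
        SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio() for f in fields
    ]
    return sum(scores) / len(scores)


def find_duplicates(recs, threshold=0.85):
    """Naive all-pairs comparison; this is quadratic, so it only suits tiny inputs."""
    return [
        (a["id"], b["id"])
        for a, b in combinations(recs, 2)
        if similarity(a, b) >= threshold
    ]


def fuse(group):
    """Resolve conflicts per attribute: drop nulls, then keep the longest value."""
    fused = {}
    for key in ("name", "city", "phone"):
        values = [r[key] for r in group if r[key] is not None]
        fused[key] = max(values, key=len) if values else None
    return fused


pairs = find_duplicates(records)
merged = fuse([r for r in records if r["id"] in (1, 2)])
print(pairs)
print(merged)
```

The all-pairs loop illustrates the efficiency challenge: the number of comparisons grows quadratically with the number of records, so a system for large data sets has to avoid comparing most pairs. The fusion step illustrates the conflict resolution challenge: the two name values contradict each other, and some rule has to pick one.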

Teaching

  • Bachelor: We offer regular German-language lectures in database systems, namely Datenbanksysteme I (DBS I) and Datenbanksysteme II (DBS II). In addition, we offer the regular seminar "Beauty is our Business" and many other project-oriented seminars.
    One-year Bachelor Projects with 6-8 students finalize bachelor studies at HPI. Our group offers one or two such projects per year in cooperation with external partners.
  • Master: We alternately offer the bi-annual courses "Information Integration" and "Search Engines". In addition, we offer diverse specialized seminars, some theoretical, some project-oriented.

Library

Our group library catalog is online. Books can be borrowed.

The following word cloud was generated at http://www.wordle.net using this article.