Data Integration

by Michael Genesereth

Published 30 March 2010
Data integration is a critical problem in our increasingly interconnected but inevitably heterogeneous world. There are numerous data sources available in organizational databases and on public information systems like the World Wide Web. Not surprisingly, the sources often use different vocabularies and different data structures, being created, as they are, by different people, at different times, for different purposes.

The goal of data integration is to provide programmatic and human users with integrated access to multiple, heterogeneous data sources, giving each user the illusion of a single, homogeneous database designed for his or her specific need. The good news is that, in many cases, the data integration process can be automated.

This book is an introduction to the problem of data integration and a rigorous account of one of the leading approaches to solving this problem, viz., the relational logic approach. Relational logic provides a theoretical framework for discussing data integration. Moreover, in many important cases, it provides algorithms for solving the problem in a computationally practical way. In many respects, relational logic does for data integration what relational algebra did for database theory several decades ago. A companion web site provides interactive demonstrations of the algorithms.

Logic Programming is a style of programming in which programs take the form of sets of sentences in the language of Symbolic Logic.

Over the years, there has been growing interest in Logic Programming due to applications in deductive databases, automated worksheets, Enterprise Management (business rules), Computational Law, and General Game Playing. This book introduces Logic Programming theory, current technology, and popular applications.

In this volume, we take an innovative, model-theoretic approach to logic programming. We begin with the fundamental notion of datasets, i.e., sets of ground atoms. Given this fundamental notion, we introduce views, i.e., virtual relations; and we define classical logic programs as sets of view definitions, written using traditional Prolog-like notation but with semantics given in terms of datasets rather than implementation. We then introduce actions, i.e., additions and deletions of ground atoms; and we define dynamic logic programs as sets of action definitions.

In addition to the printed book, there is an online version of the text with an interpreter and a compiler for the language used in the text and an integrated development environment for use in developing and deploying practical logic programs.