Jun 2012, Volume 6 Issue 3

  • Select all

    We consider data exchange for XML documents: given source and target schemas, a mapping between them, and a document conforming to the source schema, construct a target document and answer target queries in a way that is consistent with the source information. The problem has primarily been studied in the relational context, in which dataexchange systems have also been built.

    Since many XML documents are stored in relations, it is natural to consider using a relational system for XML data exchange. However, there is a complexity mismatch between query answering in relational and in XML data exchange. This indicates that to make the use of relational systems possible, restrictions have to be imposed on XML schemas and mappings, as well as on XML shredding schemes.

    We isolate a set of five requirements that must be fulfilled in order to have a faithful representation of the XML data-exchange problem by a relational translation. We then demonstrate that these requirements naturally suggest the inlining technique for data-exchange tasks. Our key contribution is to provide shredding algorithms for schemas, documents, mappings and queries, and demonstrate that they enable us to correctly perform XML data-exchange tasks using a relational system.

    Min XIE, Laks V. S. LAKSHMANAN, Peter T. WOOD

    Classical recommender systems provide users with a list of recommendations where each recommendation consists of a single item, e.g., a book or DVD. However, several applications can benefit from a system capable of recommending packages of items, in the form of sets. Sample applications include travel planning with a limited budget (price or time) and twitter users wanting to select worthwhile tweeters to follow, given that they can deal with only a bounded number of tweets. In these contexts, there is a need for a system that can recommend the top-k packages for the user to choose from.

    Motivated by these applications, we consider composite recommendations, where each recommendation comprises a set of items. Each item has both a value (rating) and a cost associated with it, and the user specifies a maximum total cost (budget) for any recommended set of items. Our composite recommender system has access to one or more component recommender systems focusing on different domains, as well as to information sources which can provide the cost associated with each item. Because the problem of decidingwhether there is a recommendation (package)whose cost is under a given budget and whose value exceeds some threshold is NP-complete, we devise several approximation algorithms for generating the top-k packages as recommendations. We analyze the efficiency as well as approximation quality of these algorithms. Finally, using two real and two synthetic datasets, we subject our algorithms to thorough experimentation and empirical analysis. Our findings attest to the efficiency and quality of our approximation algorithms for the top-k packages compared to exact algorithms.

    Jaffer GARDEZI, Leopoldo BERTOSSI, Iluju KIRINGA

    Matching dependencies (MDs) are used to declaratively specify the identification (or matching) of certain attribute values in pairs of database tuples when some similarity conditions on other values are satisfied. Their enforcement can be seen as a natural generalization of entity resolution. In what we call the pure case of MD enforcement, an arbitrary value from the underlying data domain can be used for the value in common that is used for a matching. However, the overall number of changes of attribute values is expected to be kept to a minimum. We investigate this case in terms of semantics and the properties of data cleaning through the enforcement of MDs. We characterize the intended clean instances, and also the clean answers to queries, as those that are invariant under the cleaning process. The complexity of computing clean instances and clean query answering is investigated. Tractable and intractable cases depending on the MDs are identified and characterized.

    Pei LI, Xin Luna DONG, Andrea MAURINO, Divesh SRIVASTAVA

    Many data sets contain temporal records which span a long period of time; each record is associated with a time stamp and describes some aspects of a real-world entity at a particular time (e.g., author information in DBLP). In such cases, we often wish to identify records that describe the same entity over time and so be able to perform interesting longitudinal data analysis. However, existing record linkage techniques ignore temporal information and fall short for temporal data.

    This article studies linking temporal records. First, we apply time decay to capture the effect of elapsed time on entity value evolution. Second, instead of comparing each pair of records locally, we propose clustering methods that consider the time order of the records and make global decisions. Experimental results show that our algorithms significantly outperform traditional linkage methods on various temporal data sets.