Querying and ranking XML documents based on data synopses

Citation metadata

Authors: Weimin He and Teng Lv
Date: Oct. 2011
Publisher: Digital Information Research Foundation
Document Type: Report
Length: 5,252 words
Lexile Measure: 1580L

Abstract:

Interest in querying and ranking XML documents has grown in recent years. In this paper, we present a new framework for querying and ranking schema-less XML documents based on concise summaries of their structural and textual content. We introduce a novel data synopsis structure that summarizes the textual content of an XML document for efficient indexing. More importantly, we extend the traditional vector space model to rank XML documents effectively over the proposed data synopses. We conduct extensive experiments over XML benchmark data to demonstrate the advantages of the indexing scheme and the effectiveness of our ranking scheme. We also compare our framework with Lucene to demonstrate that our extended TF*IDF scoring function is effective.

Categories and Subject Descriptors: D.3.2 [Language Classifications]: Data-flow languages; H.3.1 [Content Analysis and Indexing]: I.7 [Document and Text Processing]: Markup languages; H.3.3 [Information Search and Retrieval]: Query Formulation

General Terms: XML, Information Retrieval, Data Processing

Keywords: XML, Query processing, Document ranking, Query synopses

Received: 11 June 2011, Revised 12 August 2011, Accepted 19 August 2011

Source Citation

Gale Document Number: GALE|A338892907