Index Support in BaseX, eXist-DB, Saxon, and libxml2: Explained

by XPath
March 12th, 2025
Academic Research Paper

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.

Abstract and 1 Introduction

2 Background

3 Approach and 3.1 Differential Testing for XML Processors

3.2 XPath Expression Generation

3.3 XML Generation

4 Evaluation

4.1 Effectiveness

4.2 Efficiency

4.3 Comparison to the State of the Art

4.4 Analysis of BaseX Historical Bug Reports

5 Related Work

6 Conclusion, Acknowledgments, and References

4.3 Comparison to the State of the Art

We are aware of only one automated testing approach that has been proposed to test XML processors [41]. It tackled the test oracle problem with differential testing, comparing the results of Microsoft's SQL Server with and without indexes. That approach was designed specifically to test SQL Server's index support, and the tool is not publicly available. Because of this narrow scope and the tool's unavailability, we could not directly compare the two approaches experimentally. However, we extended our tool to support differential testing across index configurations. The two approaches are complementary: XPress can apply differential testing not only across different XML processors, but also within a processor by creating or omitting indexes, which may reveal additional bugs.
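To illustrate how index configurations can fit into differential testing, the following Python sketch treats each configuration (a different processor, or the same processor with or without an index) as an opaque evaluator and flags the configurations whose results differ from a reference. The `Evaluator` type, the `differential_check` helper, and the two stub evaluators are hypothetical; this is a minimal sketch of the general idea, not XPress's actual implementation.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical: a configuration is any callable that evaluates an XPath string
# and returns the result items serialized as strings. A real harness would wrap
# processor APIs (e.g., one processor with and without an index) behind this type.
Evaluator = Callable[[str], List[str]]


def differential_check(query: str, configs: Dict[str, Evaluator]) -> Dict[str, List[str]]:
    """Evaluate `query` under every configuration and return the results of the
    configurations that disagree with the first (reference) configuration.

    An empty dict means all configurations agree. Results are compared as
    multisets, which assumes result order is irrelevant for the query.
    """
    names = list(configs)
    reference = Counter(configs[names[0]](query))
    mismatches: Dict[str, List[str]] = {}
    for name in names[1:]:
        result = configs[name](query)
        if Counter(result) != reference:
            mismatches[name] = result
    return mismatches


# Hypothetical evaluators: the indexed configuration is simulated as losing a match.
def without_index(query: str) -> List[str]:
    return ['<M v="a"/>']


def with_index(query: str) -> List[str]:
    return []


configs = {"no-index": without_index, "token-index": with_index, "other-processor": without_index}
print(differential_check("//M[contains-token(@v, 'a')]", configs))  # {'token-index': []}
```

A practical harness would also need to compare errors, since one configuration raising an error while another returns a result is itself a discrepancy worth reporting.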


Index support in BaseX, eXist-DB, Saxon, and libxml2. Database indexes are data structures built to speed up data retrieval [31], and their design is DBMS-specific. Not all XML processors are DBMSs: Saxon and libxml2 are in-memory processors and lack index support. BaseX and eXist-DB both maintain structural indexes by default, such as an index of all distinct node paths. For value indexes, which optimize queries on content values, BaseX automatically creates a text index and an attribute index, and users can define additional indexes. BaseX also provides a token index, which is used by specific functions such as contains-token. eXist supports range indexes, which can be defined on specific elements or attributes to speed up comparisons on their contents.
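As a conceptual illustration of what a token index does, the Python sketch below builds an inverted index from whitespace-separated tokens to node ids and checks that an index lookup returns the same nodes as a full scan applying contains-token semantics (codepoint collation only). The helpers `tokens`, `build_token_index`, `lookup`, and `scan` are hypothetical; the sketch does not reflect how BaseX actually stores or maintains its token index. The property it checks, that the index path and the scan path agree, is exactly what differential testing with and without an index exercises.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

XML_WS = " \t\r\n"  # the four XML whitespace characters


def tokens(value: str) -> List[str]:
    """Split a string at runs of XML whitespace, ignoring leading/trailing whitespace."""
    stripped = value.strip(XML_WS)
    return re.split(r"[ \t\r\n]+", stripped) if stripped else []


def contains_token(value: str, token: str) -> bool:
    """Codepoint-collation reading of fn:contains-token for one input string."""
    token = token.strip(XML_WS)
    return token != "" and token in tokens(value)


def build_token_index(attrs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Hypothetical inverted index: token -> ids of nodes whose value contains it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for node_id, value in attrs.items():
        for tok in tokens(value):
            index[tok].add(node_id)
    return dict(index)


def lookup(index: Dict[str, Set[int]], token: str) -> Set[int]:
    """Index path: a single dictionary lookup instead of scanning every node."""
    return index.get(token.strip(XML_WS), set())


def scan(attrs: Dict[int, str], token: str) -> Set[int]:
    """Non-index path: test every node's attribute value."""
    return {node_id for node_id, value in attrs.items() if contains_token(value, token)}


# Hypothetical @v attribute values keyed by node id.
attrs = {1: "a b", 2: "ab", 3: " a\t", 4: ""}
index = build_token_index(attrs)
print(sorted(lookup(index, "a")), sorted(scan(attrs, "a")))  # [1, 3] [1, 3]
```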


Methodology. We tested eXist's range index and BaseX's token index using the XPath expression generation approach described in Section 3.2. Because bugs we had previously found in eXist were still unfixed, we conducted differential testing within eXist, comparing the results obtained with and without a range index definition. For BaseX, we defined a token index and compared BaseX's results directly with those of Saxon.

Figure 9: Bug found in BaseX's token index.
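The within-processor oracle used for eXist can be sketched on a toy range index. In the following Python example, a hypothetical `SortedValueIndex` answers `value > bound` comparisons by binary search, and an injected off-by-one bug (using `bisect_left` instead of `bisect_right`) makes the indexed path also return nodes whose value equals the bound. Randomly generated inputs are evaluated with and without the index, and any disagreement is reported. The index, the injected bug, and the generator are all assumptions for illustration; they are not eXist's implementation or our query generator.

```python
import bisect
import random
from typing import List, Optional, Tuple


class SortedValueIndex:
    """Toy range index: (value, node_id) pairs sorted by value."""

    def __init__(self, values: List[int]):
        self.entries: List[Tuple[int, int]] = sorted((v, i) for i, v in enumerate(values))
        self.keys: List[int] = [v for v, _ in self.entries]

    def greater_than(self, bound: int, buggy: bool = False) -> List[int]:
        """Ids of nodes with value > bound, via binary search on the sorted keys."""
        # bisect_right skips all entries equal to `bound` (correct for '>');
        # the injected bug uses bisect_left, which wrongly keeps them.
        find = bisect.bisect_left if buggy else bisect.bisect_right
        pos = find(self.keys, bound)
        return sorted(i for _, i in self.entries[pos:])


def scan_greater_than(values: List[int], bound: int) -> List[int]:
    """Reference evaluation without an index: check every node."""
    return [i for i, v in enumerate(values) if v > bound]


def find_discrepancy(trials: int = 200, seed: int = 0, buggy: bool = True
                     ) -> Optional[Tuple[List[int], int, List[int], List[int]]]:
    """Generate small random inputs and compare indexed and non-indexed results."""
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.randint(0, 5) for _ in range(rng.randint(1, 8))]
        bound = rng.randint(0, 5)
        with_index = SortedValueIndex(values).greater_than(bound, buggy)
        without_index = scan_greater_than(values, bound)
        if with_index != without_index:
            return values, bound, with_index, without_index
    return None


print(find_discrepancy(buggy=True))
print(find_discrepancy(buggy=False))  # None
```

The discrepancy only surfaces when a generated value equals the bound, which hints at why index-oblivious generation can be inefficient: inputs that actually exercise the faulty index path may be rare.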


Results. Using this method, we detected one additional bug in BaseX [10] and no additional bugs in eXist. We reported the bug shown in Figure 9 to the BaseX developers, who quickly fixed it. The query selects all elements named M in the document whose attribute v contains the token "a". Without the token index, BaseX returned node M, as expected; with the token index, it unexpectedly returned an empty result. Overall, the results suggest that creating or omitting indexes can find additional bugs, but this strategy had low effectiveness. A possible explanation is that our test-case generation approach does not consider when indexes are applicable, which may reduce testing efficiency.
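For the query described above, the expected answer can be computed without any index. The following Python sketch evaluates a query of the form //M[contains-token(@v, "a")] by a full scan over a parsed document, using `xml.etree.ElementTree` and a codepoint-collation reading of contains-token. The exact query and document of Figure 9 are not reproduced here; the document `DOC` and the helper `select_m_with_token` are hypothetical, and the empty `index_result` merely stands in for the empty result BaseX returned with the token index.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

XML_WS = " \t\r\n"  # the four XML whitespace characters


def contains_token(value: str, token: str) -> bool:
    """Codepoint-collation reading of fn:contains-token for one input string."""
    token = token.strip(XML_WS)
    stripped = value.strip(XML_WS)
    parts = re.split(r"[ \t\r\n]+", stripped) if stripped else []
    return token != "" and token in parts


def select_m_with_token(xml_text: str, token: str) -> List[ET.Element]:
    """Index-free evaluation of //M[contains-token(@v, token)] by a full scan."""
    root = ET.fromstring(xml_text)
    # Element.iter() also visits the root itself, so a root element named M is
    # included, as it would be for // evaluated from the document node.
    return [e for e in root.iter("M")
            if e.get("v") is not None and contains_token(e.get("v"), token)]


# Hypothetical document; the actual test case from Figure 9 is not reproduced here.
DOC = '<R><M v="a"/><M v="b c"/><N v="a"/></R>'
expected = select_m_with_token(DOC, "a")
index_result: List[ET.Element] = []  # stand-in for the empty result returned with the token index
print(len(expected), len(expected) != len(index_result))  # 1 True
```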


This paper is available on arXiv under the CC BY 4.0 DEED license.


[10] https://github.com/BaseXdb/basex/issues/2222

Authors:

(1) Shuxin Li, Southern University of Science and Technology, China (work done during an internship at the National University of Singapore) (shuxin.li.lv@gmail.com);

(2) Manuel Rigger, National University of Singapore, Singapore (rigger@nus.edu.sg).

