Getting Your Way Around With Common XPath Methods

by Eugene Truuts, October 18th, 2023

This tutorial covers common XPath techniques for selecting elements and data within XML or HTML documents, with a short explanation and an example expression for each.


Node Selection

Node selection in XPath refers to choosing specific elements, attributes, or nodes within an XML or HTML document based on their type or location in the document’s hierarchy.

//img

Select all image elements on a page
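
As a minimal sketch of trying this out, the snippet below uses Python's standard xml.etree.ElementTree module, which implements only a subset of XPath. The page fragment and the file names in it are invented for illustration. Because ElementTree evaluates paths relative to the element they are called on, //img is written as .//img.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment, invented for this example.
PAGE = """
<html>
  <body>
    <div><img src="cover-1.jpg"/></div>
    <p>Some text <img src="cover-2.jpg"/></p>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# .//img finds every img element below the root, at any depth.
images = root.findall(".//img")
image_sources = [img.get("src") for img in images]
print(image_sources)
```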


Attribute Selection

Attribute selection in XPath involves choosing elements within an XML or HTML document based on their attributes’ values.

//*[@id = 'gridItemRoot']

Select every element whose id attribute equals “gridItemRoot” (here, each item in the Bestseller grid).
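
The same kind of attribute test can be sketched with Python's standard xml.etree.ElementTree module. The markup below is an invented stand-in for a product grid, not the real page, and the book titles are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical grid markup, invented for this example.
PAGE = """
<div id="grid">
  <div id="gridItemRoot"><span>Book A</span></div>
  <div id="gridItemRoot"><span>Book B</span></div>
  <div id="sidebar"><span>Ad</span></div>
</div>
"""

root = ET.fromstring(PAGE)

# [@id='gridItemRoot'] keeps only elements whose id has exactly this value.
items = root.findall(".//*[@id='gridItemRoot']")
titles = [item.findtext("span") for item in items]
print(titles)

# [@id] with no value keeps any descendant that has an id attribute at all.
with_id = root.findall(".//*[@id]")
print(len(with_id))
```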


Predicate Filtering

Predicate filtering in XPath applies conditions or filters to select specific elements or nodes based on certain criteria. Use conditions inside square brackets to filter elements.

//span[contains(@class, 'sc-price') and number(translate(., '$', '')) < 10.00]

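Select all price elements with a value below $10. Here translate() strips the “$” sign and number() converts the remaining text to a number before the comparison.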
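
To make each step of that predicate concrete, here is a rough Python approximation. The standard xml.etree.ElementTree module does not support contains(), translate(), or number(), so the filter is re-implemented by hand; a full XPath 1.0 engine (for example the third-party lxml package) could evaluate the original expression directly. The markup, prices, and the helper name is_cheap_price are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical price markup, invented for this example.
PAGE = """
<div>
  <span class="sc-price a-color">$8.99</span>
  <span class="sc-price">$14.50</span>
  <span class="label">$1.00</span>
  <span class="sc-price">$9.49</span>
</div>
"""

root = ET.fromstring(PAGE)


def is_cheap_price(span, limit=10.00):
    """Approximate the XPath predicate in plain Python (hypothetical helper)."""
    # contains(@class, 'sc-price'): substring test on the attribute value.
    if "sc-price" not in span.get("class", ""):
        return False
    # translate(., '$', ''): remove every '$' from the element's text.
    text = "".join(span.itertext()).replace("$", "")
    # number(...): non-numeric text becomes NaN in XPath, and NaN < limit
    # is false, so unparsable text is rejected here.
    try:
        return float(text) < limit
    except ValueError:
        return False


cheap = [span.text for span in root.iter("span") if is_cheap_price(span)]
print(cheap)
```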


Positional Selection

Positional selection in XPath involves choosing elements within an XML or HTML document based on their position or index in its structure.

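(//*[@id = 'gridItemRoot'])[4]

Select the fourth gridItemRoot element in the Bestseller grid. The parentheses make [4] index the whole result set; without them, //*[@id = 'gridItemRoot'][4] counts positions among each parent’s children and matches only an element that is the fourth such child of its own parent.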
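
One detail worth seeing in action: a position predicate such as [2] counts among the children of each parent, not across the whole document. The sketch below uses Python's standard xml.etree.ElementTree module on invented markup with two sections. ElementTree has no parenthesized (…)[n] form, so picking the n-th match overall is done by indexing the result list.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with grid items spread over two parents.
PAGE = """
<body>
  <section>
    <div id="gridItemRoot">A1</div>
    <div id="gridItemRoot">A2</div>
  </section>
  <section>
    <div id="gridItemRoot">B1</div>
    <div id="gridItemRoot">B2</div>
    <div id="gridItemRoot">B3</div>
  </section>
</body>
"""

root = ET.fromstring(PAGE)

# div[2] is evaluated per parent: the second div child of every section.
second_in_each_parent = [d.text for d in root.findall(".//div[2]")]
print(second_in_each_parent)

# No section has four div children, so div[4] matches nothing.
fourth_per_parent = root.findall(".//div[4]")
print(fourth_per_parent)

# Fourth match in document order: index the full result list (0-based).
all_items = root.findall(".//*[@id='gridItemRoot']")
fourth_overall = all_items[3].text
print(fourth_overall)
```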


Text Content Selection

Text content selection in XPath refers to choosing elements within an XML or HTML document based on the textual content contained within those elements.

//*[text()='The 48 Laws of Power']

Select the title element with “The 48 Laws of Power” text value
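
A small sketch of exact text matching with Python's standard xml.etree.ElementTree module follows. ElementTree has no text() node test, so the closest form it offers is [.='…'] (available since Python 3.7), which compares the element's complete text content. The markup and the second title are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical title markup, invented for this example.
PAGE = """
<div>
  <span class="title">The 48 Laws of Power</span>
  <span class="title">The 48 Laws of Power (Another Edition)</span>
  <span class="author">Some Author</span>
</div>
"""

root = ET.fromstring(PAGE)

# The comparison is exact: longer titles and padded text do not match.
matches = root.findall(".//*[.='The 48 Laws of Power']")
print([m.get("class") for m in matches])
```

Because the match is exact, extra whitespace or a subtitle in the element's text is enough to make it fail; a substring test such as contains() is the usual alternative in a full XPath engine.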


Logical Operators

Logical operators in XPath are used to combine or modify conditions within an XPath expression to make more complex selections.

//div[@id='gridItemRoot' and .//*[contains(@class, 'a-icon-star-small')] and .//span[contains(@class, 'sc-price') and number(translate(., '$', '')) < 10.00]]

Select all gridItemRoot elements that have both an “a-icon-star-small” element inside and a “sc-price” less than $10
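
In XPath, chaining predicates as [A][B] is equivalent to [A and B] when neither predicate is positional. Python's standard xml.etree.ElementTree module has no "and" keyword but does accept chained predicates, so a simplified version of the combined condition can be sketched as below. The markup and the data-title attribute are invented, and the [i] and [span] tests only look at direct children, which is narrower than the .// search in the full expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical grid markup; data-title exists only to label the results.
PAGE = """
<div id="grid">
  <div id="gridItemRoot" data-title="A">
    <i class="a-icon-star-small"/>
    <span class="sc-price">$8.99</span>
  </div>
  <div id="gridItemRoot" data-title="B">
    <span class="sc-price">$7.00</span>
  </div>
  <div id="other" data-title="C">
    <i class="a-icon-star-small"/>
    <span class="sc-price">$5.00</span>
  </div>
</div>
"""

root = ET.fromstring(PAGE)

# Three conditions, all of which must hold for the same div:
# the id matches, it has an <i> child, and it has a <span> child.
both = root.findall(".//div[@id='gridItemRoot'][i][span]")
matched_titles = [d.get("data-title") for d in both]
print(matched_titles)
```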


Axis Selection

Axis selection in XPath involves navigating the document’s hierarchy based on the relationships between elements and nodes, allowing you to select elements related to a specific context node.

Parent Selection

The parent axis selects the parent of the context node, letting you step one level up the document’s hierarchy. The abbreviation “..” is shorthand for parent::node().

//li[contains(.//span, 'Comics & Graphic Novels')]/parent::*
or
//li[contains(.//span, 'Comics & Graphic Novels')]/..

Select the entire Best sellers grid — a parent element of the “Comics & Graphic Novels” element.
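
Here is a minimal sketch of the “..” step using Python's standard xml.etree.ElementTree module. ElementTree does not support contains(), so the list item is located with the simpler [span='…'] predicate instead. The category list is invented for illustration; note that the ampersand must be written as &amp; in the XML source but appears as a plain & in the path string.

```python
import xml.etree.ElementTree as ET

# Hypothetical category list, invented for this example.
PAGE = """
<div>
  <ul id="categories">
    <li><span>Arts &amp; Photography</span></li>
    <li><span>Comics &amp; Graphic Novels</span></li>
    <li><span>History</span></li>
  </ul>
</div>
"""

root = ET.fromstring(PAGE)

# Find the <li> whose <span> child has this text, then step up one level.
parents = root.findall(".//li[span='Comics & Graphic Novels']/..")
parent_ids = [p.get("id") for p in parents]
print(parent_ids)
```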


Preceding Sibling Selection

Preceding sibling selection in XPath allows you to select elements that are siblings of a given context node and appear before it in the document’s hierarchy.

//li[contains(.//span, 'Comics & Graphic Novels')]/preceding-sibling::*

Select all preceding sibling elements for the “Comics & Graphic Novels” element.


Following Sibling Selection

Following sibling selection in XPath allows you to select elements that are siblings of a given context node and appear after it in the document’s hierarchy.

//li[contains(.//span, 'Comics & Graphic Novels')]/following-sibling::*

Select all following sibling elements for the “Comics & Graphic Novels” element.
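
To show what the two sibling axes return, the sketch below reproduces them with list slicing, since Python's standard xml.etree.ElementTree module does not implement the preceding-sibling or following-sibling axes. This is an emulation under that assumption, not the XPath expressions themselves, and the category list is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical category list, invented for this example.
PAGE = """
<ul>
  <li><span>Arts &amp; Photography</span></li>
  <li><span>Comics &amp; Graphic Novels</span></li>
  <li><span>History</span></li>
  <li><span>Travel</span></li>
</ul>
"""

root = ET.fromstring(PAGE)

# The <ul> wrapper is kept as a separate document root so ".." can reach it.
doc = ET.Element("doc")
doc.append(root)

target = doc.find(".//li[span='Comics & Graphic Novels']")
parent = doc.find(".//li[span='Comics & Graphic Novels']/..")

# Siblings are the parent's children; split them around the target.
siblings = list(parent)
position = siblings.index(target)
preceding = siblings[:position]       # like preceding-sibling::*
following = siblings[position + 1:]   # like following-sibling::*

preceding_labels = [li.findtext("span") for li in preceding]
following_labels = [li.findtext("span") for li in following]
print(preceding_labels)
print(following_labels)
```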


Child Selection

Child selection in XPath involves selecting elements that are direct children of a given parent element or context node within the XML or HTML document.

//li[contains(.//span, 'Comics & Graphic Novels')]/../child::*

Find the parent of the “Comics & Graphic Novels” element and get all child elements.
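
The same parent-then-children navigation can be sketched with Python's standard xml.etree.ElementTree module, where a bare * step plays the role of child::*. The markup is invented, and [span='…'] stands in for contains(), which ElementTree does not support.

```python
import xml.etree.ElementTree as ET

# Hypothetical category list, invented for this example.
PAGE = """
<div>
  <ul id="categories">
    <li><span>Arts &amp; Photography</span></li>
    <li><span>Comics &amp; Graphic Novels</span></li>
    <li><span>History</span></li>
  </ul>
</div>
"""

root = ET.fromstring(PAGE)

# Go up to the parent with "..", then down to all of its element children.
children = root.findall(".//li[span='Comics & Graphic Novels']/../*")
child_labels = [li.findtext("span") for li in children]
print(child_labels)
```

The result includes the starting element itself, because it is also a child of its own parent.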


Wildcards

Wildcard selection in XPath involves using wildcard symbols to match elements or attributes regardless of their specific names or values.

//*

Select all elements in the document.
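
A short sketch of the wildcard at different depths, using Python's standard xml.etree.ElementTree module on an invented catalog. One difference to keep in mind: in ElementTree, .//* called on the root returns its descendants only, whereas //* in full XPath also includes the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog, invented for this example.
PAGE = """
<catalog>
  <book><title>A</title><price>5</price></book>
  <magazine><title>B</title></magazine>
</catalog>
"""

root = ET.fromstring(PAGE)

# * alone matches direct children, whatever their tag name.
direct_children = [e.tag for e in root.findall("*")]
print(direct_children)

# .//* matches every descendant element, in document order.
all_descendants = [e.tag for e in root.findall(".//*")]
print(all_descendants)

# * can stand in for one step of a longer path: title under any child.
titles_any_parent = [e.text for e in root.findall("*/title")]
print(titles_any_parent)
```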


Functions

Functions in XPath are predefined operations or calculations that you can use within an XPath expression to manipulate or evaluate nodes, attributes, or values in XML or HTML documents.

//*[@class = 'a-size-small']//*[starts-with(text(), 'It')]

Select all titles starting with “It”


These methods offer versatile ways to locate specific elements, attributes, or data within XML and HTML documents, making XPath a powerful tool for tasks such as web scraping, data extraction, and test automation.

