XPath Cheat Sheet: A Quick Reference to Essential XPath Expressions
If you’re a developer who has tried to locate an HTML element on a web page programmatically, whether for automated testing, web scraping, or other automated tasks, you have probably come across XPath. It is supported by a wide range of tools and technologies for navigating HTML/XML documents, including web browsers, libraries, web scrapers, and automated testing frameworks like Selenium.
Because XPath applies to so many tasks, mastering it is a valuable skill for developers. However, constructing an XPath expression can be challenging when the document structure is complex. Here, we have compiled an XPath cheat sheet that puts the essential XPath expressions at your fingertips to save you time and frustration. Let’s dive in!
What is XPath
XPath, short for XML Path Language, is an expression language used to navigate and select elements (more generally, nodes) in an HTML or XML document. It lets you locate elements on a web page based on their tag name, attributes, text content, position in the document’s hierarchy, and more.
An XPath expression describes how to reach an element through the document's hierarchy. It's like a map that leads you from a starting point to the target. This means you can use XPath to locate an element on a web page even when it has no usable ID, class, name, or other attribute and simpler DOM lookups such as getElementById() can't find it.
XPath Basics
XPath uses a path notation similar to URLs to navigate the hierarchical structure of an HTML/XML document and find an element/node. The steps in an XPath expression are separated by slashes (“/” or “//”), which act as shorthand for two of the many axis types available.
Axis
An axis is used to define the relationships between elements in an HTML/XML document. It allows you to navigate the document structure in a specific direction or pattern and select elements based on their position or relationship to other elements.
The slashes (“/” and “//”) traverse down the document to select the child or descendant elements of the current element.
Axis | Description | Example |
---|---|---|
/ | Selects the child elements of the current element. | /html/body/h1 - Selects all <h1> elements that are direct children of the <body> element. |
// | Selects all descendant elements of the current element, regardless of their depth. | /html/body//h3 - Selects all <h3> elements anywhere inside the <body> element, at any depth. |
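To see the difference between “/” and “//” in practice, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath. The sample markup is made up for illustration. ElementTree evaluates paths relative to the element you call them on, so /html/body/h1 is written as ./body/h1 starting from the <html> element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; the root element is <html>.
html = ET.fromstring(
    "<html><body>"
    "<h1>Title</h1>"
    "<h3>Top-level section</h3>"
    "<div><h1>Nested title</h1><h3>Nested section</h3></div>"
    "</body></html>"
)

# "/" steps to direct children only: the equivalent of /html/body/h1,
# so the <h1> inside <div> is not matched.
child_h1 = [e.text for e in html.findall("./body/h1")]

# "//" reaches descendants at any depth: the equivalent of /html/body//h3.
any_h3 = [e.text for e in html.findall("./body//h3")]

print(child_h1)  # ['Title']
print(any_h3)    # ['Top-level section', 'Nested section']
```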
When used at the beginning of an XPath expression, the slashes indicate whether the expression is:
- an absolute XPath, which starts from the document's root element and navigates down the hierarchy until it reaches the target element, or
- a relative XPath, which starts with “//” so that matching can begin anywhere in the document, typically at a known element, and then describes the path to the target element based on its relationship to that element.
Axis | Description | Example |
---|---|---|
/ | Absolute XPath | /html/body/div - Selects all <div> elements that are direct children of the <body> element, which is a child of the root <html> element. |
// | Relative XPath | //h1[@class='title'] - Selects all <h1> elements with the class attribute set to 'title', regardless of their positions in the document. |
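The difference between the two styles becomes visible when the same kind of element appears at different depths. In the sketch below, which also uses Python's xml.etree.ElementTree and made-up markup, the absolute-style path reaches only the heading at the exact location it spells out. The relative path finds every matching heading.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with "title" headings at two different depths.
page = ET.fromstring(
    "<html><body>"
    "<h1 class='title'>Home</h1>"
    "<main><article><h1 class='title'>Post</h1><h1>Untitled</h1></article></main>"
    "</body></html>"
)

# Absolute style, /html/body/h1[@class='title']: every step is spelled out,
# so only the <h1> directly under <body> matches.
absolute = [e.text for e in page.findall("./body/h1[@class='title']")]

# Relative style, //h1[@class='title']: matching can start at any depth.
relative = [e.text for e in page.findall(".//h1[@class='title']")]

print(absolute)  # ['Home']
print(relative)  # ['Home', 'Post']
```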
Besides using slashes to traverse down the document to find the child or descendant elements, you can also traverse in other directions using these axes:
Axis | Description | Example |
---|---|---|
parent:: or .. | Selects the parent element of the current element. | //a/parent::* or //a/.. - Selects the parent element of each <a> element. |
ancestor:: | Selects all ancestor elements of the current element. | //a/ancestor::li - Selects all <li> elements that are ancestors of <a> elements. |
following:: | Selects all elements that appear after the current element in document order, excluding its descendants. | //div/li[2]/following::a - Selects all <a> elements that appear after the second <li> child of a <div>. |
preceding:: | Selects all elements that appear before the current element in document order, excluding its ancestors. | //div/li[2]/preceding::a - Selects all <a> elements that appear before the second <li> child of a <div>. |
following-sibling:: | Selects all sibling elements that appear after the current element. | //li/following-sibling::li - Selects all <li> elements that are siblings of <li> elements and appear after them. |
preceding-sibling:: | Selects all sibling elements that appear before the current element. | //li/preceding-sibling::li - Selects all <li> elements that are siblings of <li> elements and appear before them. |
self:: | Selects the current element. (It is not always necessary, but it is useful when you want to reference the current node explicitly, for example to test its name or apply predicates directly to it.) | //a/self::a - Selects the <a> elements themselves. |
@ | Selects the attributes of the current element. | //a/@href - Selects the href attribute of <a> elements. |
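ElementTree supports only the “..” abbreviation from this table. The Python sketch below therefore uses “..” for parent::. It reads attributes with Element.get() as a stand-in for @href, because ElementTree cannot return attribute nodes. It approximates following-sibling:: with a small hypothetical helper, following_siblings(). The menu markup is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical navigation menu.
menu = ET.fromstring(
    "<ul>"
    "<li id='home'><a href='/'>Home</a></li>"
    "<li id='blog'><a href='/blog'>Blog</a><a href='/rss'>RSS</a></li>"
    "<li id='about'>About</li>"
    "<li id='contact'>Contact</li>"
    "</ul>"
)

# //a/.. : each parent is returned once, even though 'blog' holds two <a>.
parents = [li.get("id") for li in menu.findall(".//a/..")]

# //a/@href : ElementTree cannot select attribute nodes, so read them instead.
hrefs = [a.get("href") for a in menu.iter("a")]

def following_siblings(root, elem, tag=None):
    """Hypothetical helper approximating following-sibling::tag
    (following-sibling::* when tag is None)."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    siblings = list(parent_of[elem])
    after = siblings[siblings.index(elem) + 1:]
    return [s for s in after if tag is None or s.tag == tag]

blog = menu.find("./li[@id='blog']")
later = [li.get("id") for li in following_siblings(menu, blog, "li")]

print(parents)  # ['home', 'blog']
print(hrefs)    # ['/', '/blog', '/rss']
print(later)    # ['about', 'contact']
```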
Node
After the axis, you specify a node test: the name or type of the node to be selected. It can be an element name, an attribute name, a wildcard (*), or a node type such as text() or comment().
Examples
- Element name - /html/body/div
- Element name with a predicate - //a[@class='link'], //a[2]
- Attribute name - //a/@href
- Node type test - //text()
- Wildcard - //*[@class='warning']
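Each node test above can be reproduced with Python's xml.etree.ElementTree, with two caveats. ElementTree has no attribute or text() node tests, so in this sketch attribute values are read with Element.get() and text nodes are collected with itertext(). The markup is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<body>"
    "<p class='warning'>Low disk</p>"
    "<div class='warning'>Expires <b>soon</b></div>"
    "<a class='link' href='/docs'>Docs</a>"
    "</body>"
)

# Element name: //div
divs = [e.tag for e in doc.findall(".//div")]
# Element name with a predicate: //a[@class='link']
links = doc.findall(".//a[@class='link']")
# Wildcard with a predicate: //*[@class='warning'] matches any tag name.
warning_tags = [e.tag for e in doc.findall(".//*[@class='warning']")]
# Attribute: //a/@href, emulated by reading the attribute of each matched <a>.
hrefs = [a.get("href") for a in links]
# Node type test: //text(), emulated by collecting every descendant text node.
texts = list(doc.itertext())

print(warning_tags)  # ['p', 'div']
print(hrefs)         # ['/docs']
print(texts)         # ['Low disk', 'Expires ', 'soon', 'Docs']
```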
XPath Examples
Writing XPath expressions can be challenging, especially for beginners. As XPath has its own syntax and set of rules that you need to understand in order to construct effective expressions, we have compiled a list of essential XPath expressions as a quick reference for you to refer to.
Basic Node Selection
Expression | Description |
---|---|
By Tag Name | |
//E | Select all <E> elements. |
By Class or ID | |
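//*[@class = 'class_name'] | Select all elements whose class attribute is exactly "class_name". |
//*[contains(@class, 'class_name')] | Select all elements whose class attribute contains "class_name" (this also matches partial names such as "class_name_2"). |
//*[@id = 'id_name'] | Select the element with ID "id_name". |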
By Attribute | |
//*[@attribute_name] | Select all elements with the "attribute_name" attribute. |
//*[@attribute_name = 'attribute_value'] | Select all elements with a specific attribute value. |
//*[starts-with(@attribute_name, 'value')] | Select all elements with an attribute value that starts with a certain value. |
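//*[ends-with(@attribute_name, 'value')] | Select all elements with an attribute value that ends with a certain value (XPath 2.0 and later only; in XPath 1.0, which browsers and Selenium use, combine substring() and string-length() instead). |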
//*[contains(@attribute_name, 'value')] | Select all elements with an attribute value that contains a certain value. |
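//*[@attribute_name != 'value'] | Select all elements that have the attribute with a value other than a certain value (elements without the attribute are not matched; use not(@attribute_name = 'value') to include them). |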
By Position | |
//E[1] | Select every <E> element that is the first <E> child of its parent. Use (//E)[1] for the first <E> in the whole document. |
//E[last()] | Select every <E> element that is the last <E> child of its parent. |
//E[n] | Select every <E> element that is the n-th <E> child of its parent. |
//E[position() >= n] | Select all <E> elements from the n-th position onward among their siblings. |
//E[position() <= n] | Select all <E> elements up to the n-th position among their siblings. |
//E[position() >= n and position() <= m] | Select all <E> elements between the n-th and m-th positions (inclusive). |
//E[position() mod 2 = 1] | Select all odd-numbered occurrences of <E>. |
//E[position() mod 2 = 0] | Select all even-numbered occurrences of <E>. |
By Text Content | |
//*[text()] | Select all elements that have at least one child text node (whitespace-only text counts). |
//*[text() = 'text_content'] | Select all elements with exact text content. |
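//*[contains(text(), 'substring')] | Select all elements whose text content contains a specific substring (in XPath 1.0 only the first child text node is checked; use contains(., 'substring') to search all of the element's text). |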
//*[starts-with(text(), 'substring')] | Select all elements with text content that starts with a specific substring. |
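//*[ends-with(text(), 'substring')] | Select all elements with text content that ends with a specific substring (XPath 2.0 and later). |
//*[substring(text(), string-length(text()) - string-length('substring') + 1) = 'substring'] | XPath 1.0 equivalent: select all elements with text content that ends with a specific substring. |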
//*[text() != 'undesired_text'] | Select all elements with text content that is not equal to a specific value. |
//*[matches(text(), 'regex_pattern')] | Select all elements with text content that matches a regular expression pattern (XPath 2.0 and later). |
Note: In predicates that compare values (e.g. [position() >= n]), you can use any of the comparison operators =, !=, <, >, <=, and >=.
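A frequent surprise in this table is that //E[1] counts positions among siblings. It therefore returns the first <E> of every parent, not a single element. The Python sketch below, using xml.etree.ElementTree and made-up markup, shows this behavior together with attribute and text predicates. ElementTree does not accept the parenthesized (//E)[1] form, so ordinary list indexing stands in for it. ElementTree's [.='text'] compares the element's full string value, not text().

```python
import xml.etree.ElementTree as ET

# Hypothetical page with two lists.
body = ET.fromstring(
    "<body>"
    "<ul id='fruits'><li>apple</li><li class='ripe'>banana</li><li>cherry</li></ul>"
    "<ul id='veg'><li>carrot</li><li class='ripe'>leek</li></ul>"
    "</body>"
)

def texts(path):
    """Return the text of every element matched by an ElementTree path."""
    return [e.text for e in body.findall(path)]

first_each = texts(".//li[1]")                # //li[1]: first <li> of EACH list
last_each = texts(".//li[last()]")            # //li[last()]: last <li> of each list
second_veg = texts(".//ul[@id='veg']/li[2]")  # //ul[@id='veg']/li[2]
with_class = texts(".//li[@class]")           # //li[@class]
cherry = texts(".//li[.='cherry']")           # string value equals 'cherry'
first_overall = body.findall(".//li")[0].text  # stands in for (//li)[1]

print(first_each)     # ['apple', 'carrot']
print(first_overall)  # apple
```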
Using Operators to Combine Expressions
XPath provides several operators that can be used to combine multiple expressions to perform logical operations. Using these operators, you can write XPath expressions to select elements based on more complex conditions.
Expression | Description |
---|---|
Union (\|) | |
//E1 \| //E2 | Select all <E1> and <E2> elements. |
(//E1 \| //E2)/* | Select all elements that are direct children of either <E1> or <E2> elements. |
//*[@class='class1'] \| //*[@class='class2'] | Select all elements with class "class1" or "class2". |
//*[@attribute1] \| //*[@attribute2] | Select all elements with the attribute "attribute1" or "attribute2". |
//*[text()='text1'] \| //*[text()='text2'] | Select all elements with the text content "text1" or "text2". |
//*[contains(text(), 'text1')] \| //*[contains(text(), 'text2')] | Select all elements that contain the text "text1" or "text2". |
AND | |
//*[contains(@class, 'class1') and contains(@class, 'class2')] | Select elements whose class attribute contains both "class1" and "class2". |
//*[@attribute1 and @attribute2] | Select elements with both the attribute "attribute1" and "attribute2". |
//*[@attribute1='value1' and @attribute2='value2'] | Select elements that have both the attribute "attribute1" and "attribute2" with specific values. |
//*[contains(text(), 'text1') and contains(text(), 'text2')] | Select elements that contain both the text "text1" and "text2". |
OR | |
//*[@class='class1' or @class='class2'] | Select elements with class "class1" or class "class2". |
//*[@attribute1 or @attribute2] | Select elements with the attribute "attribute1" or the attribute "attribute2". |
//*[text()='text1' or text()='text2'] | Select elements with the text content "text1" or the text content "text2". |
//*[contains(text(), 'text1') or contains(text(), 'text2')] | Select elements that contain the text "text1" or the text "text2". |
NOT | |
//*[not(@attribute_name)] | Select elements that do not have the attribute "attribute_name". |
//*[not(text()='text_content')] | Selects elements that have no child text node equal to "text_content". |
//*[not(@class)] | Selects elements that do not have a class attribute. |
//*[not(self::E)] | Selects elements that are not <E> elements. |
Note: For simple conditions like these, the results are the same, but the mechanisms differ. “|” merges the results of two separate path expressions into one node-set (in document order, without duplicates), while “or” combines two conditions inside a single predicate.
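ElementTree supports neither “|” nor “and”/“or”, but their semantics are easy to sketch in Python. The hypothetical xpath_union() helper below merges the matches of several paths, drops duplicates, and returns them in document order, which is what “|” does. Stacked predicates ([@a][@b]) produce the same result as [@a and @b]. Splitting the class attribute on whitespace is a Python stand-in for requiring two classes, and it is stricter than contains() because it matches whole class names only. The form markup is made up.

```python
import xml.etree.ElementTree as ET

form = ET.fromstring(
    "<form>"
    "<input class='class1' name='user' required='required'/>"
    "<input class='class2' name='pass'/>"
    "<button class='class1 class2' name='go'>Go</button>"
    "</form>"
)

def xpath_union(root, *paths):
    """Hypothetical helper mimicking XPath's "|": merge the node-sets of
    several paths, drop duplicates, and keep document order."""
    matched = {id(e) for path in paths for e in root.findall(path)}
    return [e for e in root.iter() if id(e) in matched]

# //*[@class='class1'] | //*[@class='class2']  (exact attribute match, so the
# button with class="class1 class2" is not included)
union = [e.get("name") for e in
         xpath_union(form, ".//*[@class='class1']", ".//*[@class='class2']")]

# Overlapping paths: each element appears only once in the union.
dedup = [e.get("name") for e in xpath_union(form, ".//input", ".//*[@name]")]

# //input[@name and @required], written as stacked predicates.
both_attrs = [e.get("name") for e in form.findall(".//input[@name][@required]")]

# Elements carrying both classes, matched as whole class names.
both_classes = [e.get("name") for e in form.iter()
                if {"class1", "class2"} <= set(e.get("class", "").split())]

print(union)         # ['user', 'pass']
print(dedup)         # ['user', 'pass', 'go']
print(both_attrs)    # ['user']
print(both_classes)  # ['go']
```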
Using Functions in an XPath Expression
XPath also provides some built-in functions that can be used to perform various operations on elements/nodes and their values. Using these functions in an XPath expression, you can navigate and manipulate HTML/XML structures to select them more efficiently.
Expression | Description |
---|---|
string-length() | |
//*[string-length(text()) > n] | Select elements where the text content length is greater than n. |
//*[string-length(text()) mod 2 = 0] | Select elements where the text content length is even. |
//*[string-length(text()) mod 2 = 1] | Select elements where the text content length is odd. |
//*[@attribute_name[string-length(.) < n]] | Select elements where the attribute value length is less than n. |
normalize-space() | |
//*[normalize-space(text()) = 'text_content'] | Select all elements that have the normalized text content 'text_content'. |
local-name() | |
//*[local-name() = 'element_name'] | Select all elements with the local name 'element_name', regardless of the namespace. |
namespace-uri() | |
//*[namespace-uri() = 'namespace_URI'] | Select elements with the specified namespace URI 'namespace_URI', regardless of the local name. |
count() | |
//*[count(*)=n] | Select all elements that have exactly n child elements. |
//*[count(*)=0] | Select all elements that do not have child elements. |
//*[count(E) = n] | Select all elements that have exactly n <E> child elements. |
//parent[count(child) = n] | Select all <parent> elements that have exactly n <child> elements. |
//*[count(@*) = n] | Select all elements that have exactly n attributes. |
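ElementTree has no count(), string-length(), or normalize-space(), so this Python sketch checks the same conditions on every element using ordinary code. It reads Element.text, which matches text() only when an element's content starts with text, so treat the results as an approximation. normalize_space() is a hypothetical helper that follows the XPath 1.0 rule: it collapses runs of spaces, tabs, carriage returns, and line feeds, then trims both ends. The markup is invented.

```python
import re
import xml.etree.ElementTree as ET

def normalize_space(s):
    """Hypothetical helper following XPath 1.0 normalize-space(): collapse runs
    of space, tab, CR, and LF into one space, then trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

# Hypothetical navigation bar; the first link's text is padded with whitespace.
nav = ET.fromstring(
    "<nav>"
    "<a href='/login'>\n   Sign   in \n</a>"
    "<a href='/help' target='_blank'>Help</a>"
    "<span/>"
    "</nav>"
)

# //*[normalize-space(text()) = 'Sign in']
sign_in = [e.get("href") for e in nav.iter()
           if normalize_space(e.text or "") == "Sign in"]

# //*[string-length(text()) > 10]: raw whitespace counts toward the length,
# so only the padded link qualifies ("Help" has 4 characters).
long_text = [e.get("href") for e in nav.iter() if len(e.text or "") > 10]

# //*[count(*) = 0]: len(elem) is the number of child elements.
leaves = [e.tag for e in nav.iter() if len(e) == 0]

# //*[count(@*) = 2]
two_attrs = [e.get("href") for e in nav.iter() if len(e.attrib) == 2]

print(sign_in)    # ['/login']
print(leaves)     # ['a', 'a', 'span']
print(two_attrs)  # ['/help']
```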
Other Functions
Some functions are not typically used to select nodes/elements directly. Instead, they perform operations like manipulating and retrieving data, either inside a predicate or as a standalone expression that returns a string, number, or boolean:
Function | Description | Example |
---|---|---|
String Functions | ||
concat() | Concatenates multiple strings together. | concat('Hello', ' ', 'World') - Returns the string 'Hello World'. |
substring() | Retrieves a portion of a string based on the specified start position and length. | substring('Hello World', 7) - Returns the string 'World'. |
substring-before() | Retrieves the substring before a specified delimiter. | substring-before("Hello, World", ",") - Returns the string "Hello". |
substring-after() | Retrieves the substring after a specified delimiter. | substring-after("Hello, World", ",") - Returns the string " World". |
translate() | Replaces characters in a string with other characters. | translate('Hello World', 'ol', 'OL') - Returns the string 'HeLLO WOrLd'. |
Math Functions | ||
sum() | Calculates the sum of the numeric values of a node set (or, in XPath 2.0+, a sequence of numbers). | sum(//span) - Returns the sum of the numeric values of all <span> elements. |
min() | Finds the minimum value from a sequence of numbers (XPath 2.0+). | min(//span) - Returns the smallest numeric value among the <span> elements. |
max() | Finds the maximum value from a sequence of numbers (XPath 2.0+). | max(//span) - Returns the largest numeric value among the <span> elements. |
round() | Rounds a numeric value to the nearest whole number. | round(3.8) - Returns 4. |
floor() | Rounds a numeric value down to the nearest integer. | floor(3.8) - Returns 3. |
ceiling() | Rounds a numeric value up to the nearest integer. | ceiling(3.8) - Returns 4. |
Type Conversions | ||
string() | Converts a numeric value to a string representation. | string(123) - Returns the string "123". |
number() | Converts a string representation of a number to a numeric value. | number('3.8') - Returns the numeric value 3.8. |
boolean() | Converts a value to true or false. The value can be a number, string, node set, or boolean. | boolean(//a) - Returns true if there is at least one <a> element in the document, otherwise false. |
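Two details in this table catch developers coming from other languages. XPath 1.0 string positions are 1-based, and round() rounds ties toward positive infinity rather than to the even neighbor, as Python's round() does. The sketch below re-implements substring(), translate(), and round() in Python following the XPath 1.0 definitions. The helper names are hypothetical, and edge cases such as NaN and infinite arguments are deliberately left out.

```python
import math
from typing import Optional

def xpath_round(x: float) -> int:
    """XPath 1.0 round(): nearest integer, ties toward positive infinity.
    Finite inputs only."""
    return math.floor(x + 0.5)

def xpath_substring(s: str, start: float, length: Optional[float] = None) -> str:
    """XPath 1.0 substring(): positions are 1-based, and the character at
    position p is kept when round(start) <= p < round(start) + round(length)."""
    first = xpath_round(start)
    end = math.inf if length is None else first + xpath_round(length)
    return "".join(ch for p, ch in enumerate(s, start=1) if first <= p < end)

def xpath_translate(s: str, src: str, dst: str) -> str:
    """XPath 1.0 translate(): replace each character found in src with the
    character at the same index in dst, or drop it if dst is shorter.
    The first occurrence of a character in src determines its replacement."""
    out = []
    for ch in s:
        i = src.find(ch)
        if i == -1:
            out.append(ch)        # not in src: keep unchanged
        elif i < len(dst):
            out.append(dst[i])    # mapped to its replacement
    return "".join(out)

print(xpath_substring("Hello World", 7))          # World
print(xpath_substring("12345", 1.5, 2.6))         # 234
print(xpath_translate("--aaa--", "abc-", "ABC"))  # AAA
print(xpath_round(2.5), round(2.5))               # 3 2
```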
Conclusion
As a developer, learning XPath can equip you with a versatile skillset that enables efficient HTML/XML document navigation for various types of projects. While this XPath cheat sheet serves as a handy resource to help you construct XPath expressions easily, there are tools that can generate them in a few clicks. If you’re looking to enhance your efficiency in preparing XPath expressions with these tools, check out our article on 9 Best Chrome Extensions to Find XPath for Selenium and Other Automation Tools.