XPath Tutorial

 


XPath is a major element in the W3C XSLT standard. Without XPath knowledge you will not be able to create XSLT documents.

XPath is a set of syntax rules for defining parts of an XML document.

What is XPath?

  • XPath is a syntax for defining parts of an XML document
  • XPath uses paths to define XML elements
  • XPath defines a library of standard functions
  • XPath is a major element in XSLT
  • XPath is not written in XML
  • XPath is a W3C Standard

Like Traditional File Paths

XPath uses path expressions to identify nodes in an XML document. These path expressions look very much like the expressions you see when you work with a computer file system:

w3schools/xpath/default.asp


XPath Example

Look at this simple XML document:

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<cd country="USA">
<title>Empire Burlesque</title>
<artist>Bob Dylan</artist>
<price>10.90</price>
</cd>
<cd country="UK">
<title>Hide your heart</title>
<artist>Bonnie Tyler</artist>
<price>9.90</price>
</cd>
<cd country="USA">
<title>Greatest Hits</title>
<artist>Dolly Parton</artist>
<price>9.90</price>
</cd>
</catalog>

The XPath expression below selects the root element, catalog:

/catalog

The XPath expression below selects all the cd elements of the catalog element:

/catalog/cd

The XPath expression below selects all the price elements of all the cd elements of the catalog element:

/catalog/cd/price
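
You can try these paths outside a browser with the sketch below, which uses Python's standard xml.etree.ElementTree module. Python is an assumption here, because the tutorial itself does not use it, and ElementTree implements only a subset of XPath. ET.fromstring() returns the catalog element itself. So /catalog corresponds to that element, and the rest of each path is written relative to it.

```python
import xml.etree.ElementTree as ET

# The sample catalog from above (XML declaration omitted).
CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)      # the element selected by /catalog

cds = root.findall("cd")           # /catalog/cd
prices = root.findall("cd/price")  # /catalog/cd/price

print(root.tag)                  # catalog
print(len(cds))                  # 3
print([p.text for p in prices])  # ['10.90', '9.90', '9.90']
```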

Note: If the path starts with a slash ( / ) it represents an absolute path to an element!

XPath Defines a Library of Standard Functions

XPath defines a library of standard functions for working with strings, numbers and Boolean expressions.

The XPath expression below selects all the cd elements that have a price element with a value larger than 10.80:

/catalog/cd[price>10.80]
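
ElementTree's path syntax has no numeric comparisons. This sketch, which makes the same Python assumption, therefore applies the price>10.80 predicate by hand. Like XPath, it converts the text of each price to a number before comparing. It keeps a cd if any of its price children passes the test.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

# /catalog/cd[price>10.80]: keep each cd with at least one price child
# whose numeric value is greater than 10.80.
expensive = [
    cd for cd in root.findall("cd")
    if any(float(price.text) > 10.80 for price in cd.findall("price"))
]

print([cd.findtext("title") for cd in expensive])  # ['Empire Burlesque']
```

One difference remains. float() raises ValueError on text that is not a number. XPath instead converts such text to NaN, so price>10.80 is simply false for that cd.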

XPath is used in XSLT

XPath is a major element of the XSLT standard. Without XPath knowledge you will not be able to create XSLT documents.

You can read more about XSLT in our XSLT tutorial.

XPath is a W3C Standard

XPath was released as a W3C Recommendation on 16 November 1999 as a language for addressing parts of an XML document.

XPath was designed to be used by XSLT, XPointer and other XML parsing software.

You can read more about XML and XSL standards in our W3C tutorial.

XPath uses path expressions to locate nodes within XML documents

XML Example Document

We will use this simple XML document to describe the XPath syntax:

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<cd country="USA">
<title>Empire Burlesque</title>
<artist>Bob Dylan</artist>
<price>10.90</price>
</cd>
<cd country="UK">
<title>Hide your heart</title>
<artist>Bonnie Tyler</artist>
<price>9.90</price>
</cd>
<cd country="USA">
<title>Greatest Hits</title>
<artist>Dolly Parton</artist>
<price>9.90</price>
</cd>
</catalog>

Locating Nodes

XML documents can be represented as a tree view of nodes (very similar to the tree view of folders you can see on your computer).

XPath uses a pattern expression to identify nodes in an XML document. An XPath pattern is a slash-separated list of child element names that describe a path through the XML document. The pattern “selects” elements that match the path.

The following XPath expression selects all the price elements of all the cd elements of the catalog element:

/catalog/cd/price

Note: If the path starts with a slash ( / ) it represents an absolute path to an element!

Note: If the path starts with two slashes ( // ) then all elements in the document that fulfil the criteria will be selected (even if they are at different levels in the XML tree)!

The following XPath expression selects all the cd elements in the document:

//cd
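
In ElementTree, the closest equivalent of //cd is .//cd evaluated from the root element. The sketch below assumes Python. It uses a small hypothetical document, not the catalog above, in which one cd is nested inside an extra box element. This shows that // reaches every level, while a plain child path does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with cd elements at two different depths.
DOC = """<catalog>
  <cd><title>Empire Burlesque</title></cd>
  <box><cd><title>Hide your heart</title></cd></box>
</catalog>"""

root = ET.fromstring(DOC)

all_cds = root.findall(".//cd")  # //cd: every cd, at any depth, in document order
direct_cds = root.findall("cd")  # /catalog/cd: only direct children of catalog

print([cd.findtext("title") for cd in all_cds])     # ['Empire Burlesque', 'Hide your heart']
print([cd.findtext("title") for cd in direct_cds])  # ['Empire Burlesque']
```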

Selecting Unknown Elements

Wildcards ( * ) can be used to select unknown XML elements.

The following XPath expression selects all the child elements of all the cd elements of the catalog element:

/catalog/cd/*

The following XPath expression selects all the price elements that are grandchild elements of the catalog element:

/catalog/*/price

The following XPath expression selects all price elements that have exactly two ancestor elements:

/*/*/price

The following XPath expression selects all elements in the document:

//*
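
The wildcard paths translate almost directly to ElementTree, as this sketch shows. It again assumes Python and the sample catalog. The parsed root already stands for /catalog, which is also /*, so both /catalog/*/price and /*/*/price become */price relative to it. For //*, the sketch uses Element.iter() because it also yields the root element, which .//* would leave out.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)     # stands for /catalog, which is also /*

children = root.findall("cd/*")   # /catalog/cd/*
prices = root.findall("*/price")  # /catalog/*/price and /*/*/price
everything = list(root.iter())    # //*: catalog itself plus all its descendants

print(len(children))    # 9: title, artist and price in each of the 3 cds
print(len(prices))      # 3
print(len(everything))  # 13: catalog + 3 cd + 9 children
```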

Selecting Branches

By adding square brackets (a predicate) to an XPath expression, you can narrow down which elements are selected.

The following XPath expression selects the first cd child element of the catalog element:

/catalog/cd[1]

The following XPath expression selects the last cd child element of the catalog element (Note: There is no function named first()):

/catalog/cd[last()]

The following XPath expression selects all the cd elements of the catalog element that have a price element:

/catalog/cd[price]

The following XPath expression selects all the cd elements of the catalog element that have a price element with a value of 10.90:

/catalog/cd[price=10.90]

The following XPath expression selects the price elements of those cd elements of the catalog element whose price element has a value of 10.90:

/catalog/cd[price=10.90]/price
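
Here are the same predicates in ElementTree, as a sketch under the same Python assumption. There is one difference. ElementTree compares the child's text as a string, so the value must be written exactly as it appears in the document ('10.90'). XPath's price=10.90 compares numbers instead.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)  # stands for /catalog

first = root.find("cd[1]")              # /catalog/cd[1] (positions start at 1)
last = root.find("cd[last()]")          # /catalog/cd[last()]
with_price = root.findall("cd[price]")  # /catalog/cd[price]
# [price='10.90'] is a string comparison on the price child's text.
at_1090 = root.findall("cd[price='10.90']")            # /catalog/cd[price=10.90]
prices_1090 = root.findall("cd[price='10.90']/price")  # /catalog/cd[price=10.90]/price

print(first.findtext("title"))        # Empire Burlesque
print(last.findtext("title"))         # Greatest Hits
print(len(with_price))                # 3
print([p.text for p in prices_1090])  # ['10.90']
```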

 

Selecting Several Paths

By using the | operator in an XPath expression you can select several paths.

The following XPath expression selects all the title and artist elements of the cd elements of the catalog element:

/catalog/cd/title | /catalog/cd/artist

The following XPath expression selects all the title and artist elements in the document:

//title | //artist

The following XPath expression selects all the title, artist and price elements in the document:

//title | //artist | //price

The following XPath expression selects all the title elements of the cd elements of the catalog element, and all the artist elements in the document:

/catalog/cd/title | //artist
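
ElementTree has no | operator. The sketch below emulates it with a hypothetical union() helper, which is not a library function. The helper evaluates each path, then walks the tree once and keeps every selected element. This produces what XPath's union returns: each element once, in document order.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

def union(context, *paths):
    """Hypothetical helper emulating 'path1 | path2 | ...'.

    Each path is evaluated relative to context. The combined result holds
    each element once and is returned in document order."""
    selected = set()
    for path in paths:
        selected.update(context.iterfind(path))
    # Walking the tree once restores document order and drops duplicates.
    return [e for e in context.iter() if e in selected]

both = union(root, ".//title", ".//artist")  # //title | //artist
mixed = union(root, "cd/title", ".//artist")  # /catalog/cd/title | //artist

print([e.text for e in both])
# ['Empire Burlesque', 'Bob Dylan', 'Hide your heart', 'Bonnie Tyler',
#  'Greatest Hits', 'Dolly Parton']
```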

Selecting Attributes

In XPath all attributes are specified by the @ prefix.

This XPath expression selects all attributes named country:

//@country

This XPath expression selects all cd elements which have an attribute named country:

//cd[@country]

This XPath expression selects all cd elements which have any attribute:

//cd[@*]

This XPath expression selects all cd elements which have an attribute named country with a value of ‘UK’:

//cd[@country='UK']
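
ElementTree can filter elements by attribute, but it cannot return attribute nodes themselves. This sketch, which makes the same Python assumption, handles both gaps. It approximates //@country by reading attribute values from each element. It writes //cd[@*], which ElementTree's syntax does not cover, as a plain Python test.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

# //@country: attribute nodes cannot be selected, so collect their values.
countries = [e.get("country") for e in root.iter() if "country" in e.attrib]
# //cd[@country]
with_country = root.findall(".//cd[@country]")
# //cd[@*]: no ElementTree syntax for "any attribute", so test attrib directly.
with_any_attribute = [cd for cd in root.iter("cd") if cd.attrib]
# //cd[@country='UK']
uk = root.findall(".//cd[@country='UK']")

print(countries)                                   # ['USA', 'UK', 'USA']
print(len(with_country), len(with_any_attribute))  # 3 3
print([cd.findtext("title") for cd in uk])         # ['Hide your heart']
```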

A location path expression results in a node-set.

Location Path Expression

A location path can be absolute or relative.

An absolute location path starts with a slash ( / ) and a relative location path does not. In both cases the location path consists of one or more location steps, each separated by a slash:

An absolute location path:

/step/step/...

A relative location path:

step/step/...

The location steps are evaluated in order, one at a time, from left to right. Each step is evaluated against the nodes in the current node-set, and the nodes it selects become the current node-set for the next step. If the location path is absolute, the initial node-set consists of the root node. If the location path is relative, the initial node-set consists of the node where the expression is being used. Location steps consist of:

  • an axis (specifies the tree relationship between the nodes selected by the location step and the current node)
  • a node test (specifies the node type and expanded-name of the nodes selected by the location step)
  • zero or more predicates (use expressions to further refine the set of nodes selected by the location step)

The syntax for a location step is:

axisname::nodetest[predicate]

Example:

child::cd[child::price=9.90]
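
The deliberately small Python sketch below makes the step-by-step evaluation concrete. The evaluate() function and its (axis, node test, predicates) tuples are invented for illustration. The sketch handles only the child and descendant axes and element nodes. ElementTree has no document node, so a dummy element stands in for the root node. Each step turns the current node-set into a new one, following the procedure described above.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

def evaluate(context, steps):
    """Hypothetical mini-evaluator for a location path (element nodes only).

    context: the current node-set, as a list of elements.
    steps: (axis, node_test, predicates) tuples. axis is "child" or
    "descendant"; node_test is a tag name or "*"; each predicate is a
    function (node, position, size) -> bool with 1-based positions."""
    for axis, node_test, predicates in steps:
        next_set = []
        for node in context:
            # 1. The axis picks candidate nodes relative to this node.
            if axis == "child":
                candidates = list(node)
            elif axis == "descendant":
                candidates = [e for e in node.iter() if e is not node]
            else:
                raise ValueError(f"unsupported axis: {axis}")
            # 2. The node test keeps candidates whose name matches.
            candidates = [e for e in candidates if node_test in ("*", e.tag)]
            # 3. Predicates filter in turn; positions are recounted each time.
            for predicate in predicates:
                size = len(candidates)
                candidates = [e for pos, e in enumerate(candidates, 1)
                              if predicate(e, pos, size)]
            # Add to the new node-set, skipping nodes already selected.
            for e in candidates:
                if e not in next_set:
                    next_set.append(e)
        context = next_set  # the result of this step feeds the next one
    return context

# Stand-in for the XPath root node: a dummy element whose only child is catalog.
document = ET.Element("document-root")
document.append(ET.fromstring(CATALOG))

def has_price_990(cd, pos, size):
    # [child::price=9.90]: true if any price child has the value 9.90.
    return any(float(p.text) == 9.90 for p in cd.findall("price"))

# /child::catalog/child::cd[child::price=9.90]/child::title
cheap_titles = evaluate([document], [
    ("child", "catalog", []),
    ("child", "cd", [has_price_990]),
    ("child", "title", []),
])
# /descendant::price[position()=last()]
last_price = evaluate([document], [
    ("descendant", "price", [lambda e, pos, size: pos == size]),
])

print([t.text for t in cheap_titles])  # ['Hide your heart', 'Greatest Hits']
print([p.text for p in last_price])    # ['9.90']
```

Predicates receive position and size only after the node test has run, and positions are recounted after each predicate. This is why cd[@type="classic"][5] and cd[5][@type="classic"] select different nodes.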

 

Axes and Node Tests

An axis defines a node-set relative to the current node. A node test is used to identify a node within an axis. We can perform a node test by name or by type.

Axis name | Description
ancestor | Contains all ancestors (parent, grandparent, etc.) of the current node. Note: this axis always includes the root node, unless the current node is the root node
ancestor-or-self | Contains the current node plus all its ancestors (parent, grandparent, etc.)
attribute | Contains all attributes of the current node
child | Contains all children of the current node
descendant | Contains all descendants (children, grandchildren, etc.) of the current node. Note: this axis never contains attribute or namespace nodes
descendant-or-self | Contains the current node plus all its descendants (children, grandchildren, etc.)
following | Contains everything in the document after the closing tag of the current node, excluding attribute and namespace nodes
following-sibling | Contains all siblings after the current node. Note: if the current node is an attribute node or namespace node, this axis is empty
namespace | Contains all namespace nodes of the current node
parent | Contains the parent of the current node
preceding | Contains everything in the document before the starting tag of the current node, excluding ancestors, attribute nodes and namespace nodes
preceding-sibling | Contains all siblings before the current node. Note: if the current node is an attribute node or namespace node, this axis is empty
self | Contains the current node
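
ElementTree elements do not record their parent, and several of these axes need it. The sketch below assumes Python, and ancestors() and siblings() are hypothetical helpers. It builds a child-to-parent map once and derives the ancestor, preceding-sibling and following-sibling axes from it, for element nodes only. The reverse axes are returned nearest node first, which is the order XPath uses for positions on those axes.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

# Child -> parent map, built once. The XPath root node has no counterpart
# here, so ancestors() stops at the catalog element.
parent_of = {child: parent for parent in root.iter() for child in parent}

def ancestors(elem):
    """ancestor axis (elements only), nearest ancestor first."""
    result = []
    while elem in parent_of:
        elem = parent_of[elem]
        result.append(elem)
    return result

def siblings(elem):
    """Return (preceding-sibling axis, following-sibling axis).

    preceding-sibling is a reverse axis, so it is listed nearest first;
    following-sibling is listed in document order."""
    parent = parent_of.get(elem)
    if parent is None:
        return [], []
    children = list(parent)
    i = children.index(elem)
    return children[:i][::-1], children[i + 1:]

price = root.find("cd[2]/price")
before, after = siblings(price)

print([a.tag for a in ancestors(price)])  # ['cd', 'catalog']
print([e.tag for e in before])            # ['artist', 'title']
print([e.tag for e in after])             # []
```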

Examples

Example | Result
child::cd | Selects all cd elements that are children of the current node (if the current node has no cd children, it will select an empty node-set)
attribute::src | Selects the src attribute of the current node (if the current node has no src attribute, it will select an empty node-set)
child::* | Selects all child elements of the current node
attribute::* | Selects all attributes of the current node
child::text() | Selects the text node children of the current node
child::node() | Selects all the children of the current node, whatever their node type
descendant::cd | Selects all the cd element descendants of the current node
ancestor::cd | Selects all cd ancestors of the current node
ancestor-or-self::cd | Selects all cd ancestors of the current node and, if the current node is a cd element, the current node as well
child::*/child::price | Selects all price grandchildren of the current node
/ | Selects the root node of the document

Predicates

A predicate filters a node-set into a new node-set. A predicate is placed inside square brackets ( [ ] ).

Examples

Example | Result
child::cd[child::price=9.90] | Selects all cd children of the current node that have a price child equal to 9.90
child::cd[position()=1] | Selects the first cd child of the current node
child::cd[position()=last()] | Selects the last cd child of the current node
child::cd[position()=last()-1] | Selects the last but one cd child of the current node
child::cd[position()<6] | Selects the first five cd children of the current node
/descendant::cd[position()=7] | Selects the seventh cd element in the document
child::cd[attribute::type="classic"] | Selects all cd children of the current node that have a type attribute with value classic
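
Most of these position tests have ElementTree equivalents. position()<6 does not, so this sketch uses a list slice for it. The sketch assumes Python and a hypothetical catalog of eight cds. XPath positions start at 1, while Python list indices start at 0.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog with eight cd children titled "CD 1" ... "CD 8".
root = ET.Element("catalog")
for n in range(1, 9):
    ET.SubElement(ET.SubElement(root, "cd"), "title").text = f"CD {n}"

def titles(cds):
    return [cd.findtext("title") for cd in cds]

cds = root.findall("cd")
first = titles(root.findall("cd[1]"))                # child::cd[position()=1]
last = titles(root.findall("cd[last()]"))            # child::cd[position()=last()]
last_but_one = titles(root.findall("cd[last()-1]"))  # child::cd[position()=last()-1]
# child::cd[position()<6]: positions 1 to 5 are list indices 0 to 4.
first_five = titles(cds[:5])
# /descendant::cd[position()=7]: the seventh cd in document order.
seventh = titles(list(root.iter("cd"))[6:7])

print(first, last, last_but_one)  # ['CD 1'] ['CD 8'] ['CD 7']
print(first_five)                 # ['CD 1', 'CD 2', 'CD 3', 'CD 4', 'CD 5']
print(seventh)                    # ['CD 7']
```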

 

Location Path Abbreviated Syntax

Abbreviations can be used when describing a location path.

The most important abbreviation is that child:: can be omitted from a location step.

Abbreviation | Meaning | Example
none | child:: | cd is short for child::cd
@ | attribute:: | cd[@type="classic"] is short for child::cd[attribute::type="classic"]
. | self::node() | .//cd is short for self::node()/descendant-or-self::node()/child::cd
.. | parent::node() | ../cd is short for parent::node()/child::cd
// | /descendant-or-self::node()/ | //cd is short for /descendant-or-self::node()/child::cd

Examples

Example | Result
cd | Selects all the cd elements that are children of the current node
* | Selects all child elements of the current node
text() | Selects all text node children of the current node
@src | Selects the src attribute of the current node
@* | Selects all the attributes of the current node
cd[1] | Selects the first cd child of the current node
cd[last()] | Selects the last cd child of the current node
*/cd | Selects all cd grandchildren of the current node
/book/chapter[3]/para[1] | Selects the first para of the third chapter of the book
//cd | Selects all the cd descendants of the document root and thus selects all cd elements in the same document as the current node
. | Selects the current node
.//cd | Selects the cd element descendants of the current node
.. | Selects the parent of the current node
../@src | Selects the src attribute of the parent of the current node
cd[@type="classic"] | Selects all cd children of the current node that have a type attribute with value classic
cd[@type="classic"][5] | Selects the fifth cd child of the current node that has a type attribute with value classic
cd[5][@type="classic"] | Selects the fifth cd child of the current node if that child has a type attribute with value classic
cd[@type and @country] | Selects all the cd children of the current node that have both a type attribute and a country attribute
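
The two cd[...][5] examples differ only in the order of their predicates. This sketch spells out both orders in plain Python. It assumes Python and a hypothetical catalog of ten cds in which the odd-numbered ones are classic. Filtering first and indexing first select different elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: ten cd children with id="1" ... id="10". The
# odd-numbered ones have type="classic", the even-numbered ones type="pop".
root = ET.Element("catalog")
for n in range(1, 11):
    ET.SubElement(root, "cd", id=str(n), type="classic" if n % 2 else "pop")

def is_classic(cd):
    return cd.get("type") == "classic"

cds = root.findall("cd")

# cd[@type="classic"][5]: filter first, then take the 5th remaining cd.
classic = [cd for cd in cds if is_classic(cd)]
filter_then_index = classic[4:5]

# cd[5][@type="classic"]: take the 5th cd, then keep it only if it is classic.
index_then_filter = [cd for cd in cds[4:5] if is_classic(cd)]

print([cd.get("id") for cd in filter_then_index])  # ['9']
print([cd.get("id") for cd in index_then_filter])  # ['5']
```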

 

 

XPath supports numerical, equality, relational, and Boolean expressions.

Numerical Expressions

Numerical expressions are used to perform arithmetic operations on numbers.

Operator | Description | Example | Result
+ | Addition | 6 + 4 | 10
- | Subtraction | 6 - 4 | 2
* | Multiplication | 6 * 4 | 24
div | Division | 8 div 4 | 2
mod | Modulus (division remainder) | 5 mod 2 | 1

Note: XPath always converts each operand to a number before performing an arithmetic operation.
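
The arithmetic rules can be sketched in Python with hypothetical xpath_number(), xpath_div() and xpath_mod() helpers. Two details are easy to miss. First, XPath numbers are IEEE 754 doubles, so division by zero produces Infinity or NaN instead of an error. Second, mod keeps the sign of the dividend (-5 mod 2 is -1), unlike Python's % operator.

```python
import math
import re

# XPath 1.0 number syntax: optional whitespace and minus sign, digits with an
# optional fraction (or a leading '.'), then optional whitespace.
_NUMBER = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")

def xpath_number(value):
    """Sketch of number() conversion for strings, booleans and numbers."""
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    # Strings that are not XPath numbers become NaN instead of raising.
    return float(value) if _NUMBER.fullmatch(value) else math.nan

def xpath_div(a, b):
    """a div b: floating-point division; x div 0 gives Infinity or NaN."""
    a, b = xpath_number(a), xpath_number(b)
    if b == 0:
        if a == 0 or math.isnan(a):
            return math.nan
        # The sign of the infinity follows the signs of both operands.
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b

def xpath_mod(a, b):
    """a mod b: the result takes the sign of the dividend (like C fmod)."""
    a, b = xpath_number(a), xpath_number(b)
    if b == 0 or math.isinf(a) or math.isnan(a) or math.isnan(b):
        return math.nan
    return math.fmod(a, b)

print(xpath_number("6") + xpath_number("4"))  # 10.0
print(xpath_div("8", "4"))                    # 2.0
print(xpath_mod(5, 2), xpath_mod(-5, 2))      # 1.0 -1.0
print(-5 % 2)                                 # 1 (Python's own operator differs)
print(xpath_number("abc"))                    # nan
```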

Equality Expressions

Equality expressions are used to test the equality between two values.

Operator | Description | Example | Result
= | Equal | price=9.80 | true (if price is 9.80)
!= | Not equal | price!=9.80 | false (if price is 9.80)

Testing Against a Node-Set

If the test value is tested for equality against a node-set, the result is true if the node-set contains any node with a value that matches the test value.

If the test value is tested for not equal against a node-set, the result is true if the node-set contains any node with a value that is different from the test value.

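As a result, a comparison against a node-set can be both equal and not equal at the same time. For example, if a cd has two price children with the values 9.80 and 9.90, both price=9.80 and price!=9.80 are true.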
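
The sketch below assumes Python and the sample catalog. It models the //price node-set as a list of numbers and implements both comparisons with any(), which applies the "any node" rule described above. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

# Values of the node-set //price, converted to numbers: 10.9, 9.9, 9.9.
prices = [float(p.text) for p in root.iter("price")]

def nodeset_eq(values, test):
    """node-set = value: true if ANY node's value equals the test value."""
    return any(v == test for v in values)

def nodeset_ne(values, test):
    """node-set != value: true if ANY node's value differs from the test value."""
    return any(v != test for v in values)

print(nodeset_eq(prices, 9.90))      # True
print(nodeset_ne(prices, 9.90))      # True, because of the 10.90 price
print(not nodeset_eq(prices, 9.90))  # False: not(price=9.90) is not price!=9.90
print(nodeset_eq([], 9.90), nodeset_ne([], 9.90))  # False False for an empty node-set
```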

Relational Expressions

Relational expressions are used to compare two values.

Operator | Description | Example | Result
< | Less than | price<9.80 | false (if price is 9.80)
<= | Less than or equal to | price<=9.80 | true
> | Greater than | price>9.80 | false
>= | Greater than or equal to | price>=9.80 | true

Note: XPath always converts each operand to a number before performing the evaluation.

Boolean Expressions

Boolean expressions are used to combine two conditions.

Operator | Description | Example | Result
or | Or | price=9.80 or price=9.70 | true (if price is 9.80)
and | And | price<=9.80 and price=9.70 | false (if price is 9.80)

XPath contains a function library for converting data.

XPath Function Library

The XPath function library contains a set of core functions for converting and translating data.

Node Set Functions

Name | Description | Syntax
count() | Returns the number of nodes in a node-set | number=count(node-set)
id() | Selects elements by their unique ID | node-set=id(value)
last() | Returns the position number of the last node in the processed node list | number=last()
local-name() | Returns the local part of a node's name. A qualified name usually consists of a prefix, a colon and the local name | string=local-name(node)
name() | Returns the name of a node | string=name(node)
namespace-uri() | Returns the namespace URI of a specified node | uri=namespace-uri(node)
position() | Returns the position in the node list of the node that is currently being processed | number=position()
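
When nodes come from ElementTree, some of these functions have direct counterparts, as this Python sketch shows. count() is len() of a result list. local-name() and namespace-uri() can be recovered from the {uri}local form in which ElementTree stores namespaced tags. name() cannot be fully reproduced, because ElementTree does not keep the original prefix. The document, the namespace URI and the helper names below are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced document; the URI is made up for illustration.
DOC = """<c:catalog xmlns:c="http://example.com/cds">
  <c:cd><c:price>10.90</c:price></c:cd>
  <c:cd><c:price>9.90</c:price></c:cd>
</c:catalog>"""

root = ET.fromstring(DOC)

def local_name(elem):
    """Approximates local-name(): the part of '{uri}local' after the brace."""
    return elem.tag.rpartition("}")[2]

def namespace_uri(elem):
    """Approximates namespace-uri(): empty string when there is no namespace."""
    return elem.tag[1:].partition("}")[0] if elem.tag.startswith("{") else ""

ns = {"c": "http://example.com/cds"}
cds = root.findall("c:cd", ns)

print(len(cds))             # 2, like count(/c:catalog/c:cd)
print(local_name(root))     # catalog
print(namespace_uri(root))  # http://example.com/cds
```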

String Functions

Name | Description | Syntax
concat() | Returns the concatenation of all its arguments | string=concat(val1, val2, ...)

Example:
concat('The',' ','XML')
Result: 'The XML'

contains() | Returns true if the second string is contained within the first string, otherwise it returns false | bool=contains(val,substr)

Example:
contains('XML','X')
Result: true

normalize-space() | Removes leading and trailing whitespace from a string, and replaces each internal sequence of whitespace with a single space | string=normalize-space(string)

Example:
normalize-space(' The   XML ')
Result: 'The XML'

starts-with() | Returns true if the first string starts with the second string, otherwise it returns false | bool=starts-with(string,substr)

Example:
starts-with('XML','X')
Result: true

string() | Converts the value argument to a string | string=string(value)

Example:
string(314)
Result: '314'

string-length() | Returns the number of characters in a string | number=string-length(string)

Example:
string-length('Beatles')
Result: 7

substring() | Returns the part of the string argument that begins at character position start (the first character is position 1) and is length characters long; if length is omitted, the rest of the string is returned | string=substring(string,start,length)

Example:
substring('Beatles',1,4)
Result: 'Beat'

substring-after() | Returns the part of the string argument that occurs after the first occurrence of the substr argument, or an empty string if substr does not occur | string=substring-after(string,substr)

Example:
substring-after('12/10','/')
Result: '10'

substring-before() | Returns the part of the string argument that occurs before the first occurrence of the substr argument, or an empty string if substr does not occur | string=substring-before(string,substr)

Example:
substring-before('12/10','/')
Result: '12'
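
The substring rules invite off-by-one mistakes, because XPath counts characters from 1. Here is a Python sketch with hypothetical helper names. It is limited to ordinary finite numbers, so the XPath rules for NaN and Infinity are left out. It also includes a helper for normalize-space(), described earlier, that treats only space, tab, carriage return and line feed as whitespace.

```python
import math
import re

def xpath_substring(s, start, length=None):
    """Sketch of substring() for finite numbers: character positions are
    1-based, and start and length are first rounded the way round() does
    (halves toward positive infinity)."""
    first = math.floor(start + 0.5)
    end = math.inf if length is None else first + math.floor(length + 0.5)
    return "".join(ch for pos, ch in enumerate(s, 1) if first <= pos < end)

def xpath_substring_before(s, sub):
    i = s.find(sub)
    return s[:i] if i >= 0 else ""

def xpath_substring_after(s, sub):
    i = s.find(sub)
    return s[i + len(sub):] if i >= 0 else ""

def xpath_normalize_space(s):
    # Splitting on runs of XML whitespace and dropping empty pieces removes
    # leading/trailing whitespace and collapses inner runs to one space.
    return " ".join(part for part in re.split(r"[ \t\r\n]+", s) if part)

print(xpath_substring("Beatles", 1, 4))      # Beat
print(xpath_substring("12345", 1.5, 2.6))    # 234
print(xpath_substring_after("12/10", "/"))   # 10
print(xpath_substring_before("12/10", "/"))  # 12
print(xpath_normalize_space(" The   XML "))  # The XML
```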

translate() | Performs a character-by-character replacement. It looks in the value argument for characters contained in string1, and replaces each one with the character at the same position in string2; characters with no counterpart in string2 are removed | string=translate(value,string1,string2)

Examples:
translate('12:30','30','45')
Result: '12:45'

translate('12:30','03','54')
Result: '12:45'

translate('12:30','0123','abcd')
Result: 'bc:da'
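
Python's str.translate() is close to translate(), but str.maketrans() requires two strings of equal length. The hypothetical xpath_translate() helper below therefore builds the mapping itself. Characters without a counterpart in string2 are deleted, and when a character repeats in string1, its first occurrence decides the replacement, as the XPath 1.0 rules specify.

```python
def xpath_translate(value, string1, string2):
    """Sketch of translate(): map each character of string1 to the character
    at the same position in string2, or delete it if string2 is shorter."""
    mapping = {}
    for i, ch in enumerate(string1):
        if ch not in mapping:  # the first occurrence of a character wins
            mapping[ch] = string2[i] if i < len(string2) else None
    # str.translate() deletes characters that are mapped to None.
    return value.translate({ord(k): v for k, v in mapping.items()})

print(xpath_translate("12:30", "30", "45"))      # 12:45
print(xpath_translate("12:30", "0123", "abcd"))  # bc:da
print(xpath_translate("--aaa--", "abc-", "ABC")) # AAA (the '-' characters are removed)
```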

Number Functions

Name | Description | Syntax
ceiling() | Returns the smallest integer that is not less than the number argument | number=ceiling(number)

Example:
ceiling(3.14)
Result: 4

floor() | Returns the largest integer that is not greater than the number argument | number=floor(number)

Example:
floor(3.14)
Result: 3

number() | Converts the value argument to a number | number=number(value)

Example:
number('100')
Result: 100

round() | Rounds the number argument to the nearest integer; a value exactly halfway between two integers is rounded up, toward positive infinity | number=round(number)

Example:
round(3.14)
Result: 3

sum() | Returns the total value of a set of numeric values in a node-set | number=sum(nodeset)

Example:
sum(/catalog/cd/price)
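
The following Python sketch computes sum() over the sample catalog with ElementTree and also models round(). The rounding helper is hypothetical and covers ordinary finite numbers only. It matters because Python's built-in round() sends halves to the nearest even integer, while XPath's round() sends them toward positive infinity.

```python
import math
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
  <cd country="USA"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

# sum(/catalog/cd/price): convert each node's text to a number, then add.
total = sum(float(p.text) for p in root.findall("cd/price"))

def xpath_round(x):
    """round() for ordinary finite numbers: halves go toward +infinity."""
    return math.floor(x + 0.5)

print(round(total, 2))                    # 30.7
print(xpath_round(3.14))                  # 3
print(xpath_round(2.5), round(2.5))       # 3 2
print(xpath_round(-2.5))                  # -2
print(math.ceil(3.14), math.floor(3.14))  # 4 3, as ceiling() and floor()
```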

Boolean Functions

Name | Description | Syntax
boolean() | Converts the value argument to Boolean and returns true or false | bool=boolean(value)
false() | Returns false | false()

Example:
number(false())
Result: 0

lang() | Returns true if the language of the current node, as set by the xml:lang attribute on the node or on its nearest ancestor that has one, matches the language argument or is a sublanguage of it, otherwise it returns false | bool=lang(language)
not() | Returns true if the condition argument is false, and false if the condition argument is true | bool=not(condition)

Example:
not(false())
Result: true

true() | Returns true | true()

Example:
number(true())
Result: 1
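
The conversion rules behind boolean() can be sketched as follows, with Python types standing in for XPath types and a list playing the role of a node-set. The helper name is hypothetical. One rule surprises many readers: the string 'false' converts to true, because every non-empty string does.

```python
import math

def xpath_boolean(value):
    """Sketch of boolean(): bool -> boolean, int/float -> number,
    str -> string, list -> node-set."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # A number is true unless it is positive or negative zero, or NaN.
        return not (value == 0 or math.isnan(value))
    if isinstance(value, (str, list)):
        # A string or node-set is true if it is non-empty.
        return len(value) > 0
    raise TypeError(f"unsupported type: {type(value).__name__}")

print(xpath_boolean(0), xpath_boolean(-0.0), xpath_boolean(math.nan))  # False False False
print(xpath_boolean(3.5))                    # True
print(xpath_boolean("false"))                # True: any non-empty string is true
print(xpath_boolean(""), xpath_boolean([]))  # False False
```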

 

 

We will use the CD catalog from our XML tutorial to demonstrate some XPath examples.

The CD catalog

If you have studied our XML tutorial, you will remember this XML document:

(An excerpt from the CD catalog)

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<cd>
<title>Empire Burlesque</title>
<artist>Bob Dylan</artist>
<country>USA</country>
<company>Columbia</company>
<price>10.90</price>
<year>1985</year>
</cd>
<cd>
<title>Hide your heart</title>
<artist>Bonnie Tyler</artist>
<country>UK</country>
<company>CBS Records</company>
<price>9.90</price>
<year>1988</year>
</cd>
.
.
.
.
</catalog>

If you have IE 5 or higher, you can look at the cdcatalog.xml file.

Selecting Nodes

We will demonstrate how to select nodes from the XML document by using the selectNodes method, which is specific to Internet Explorer's Microsoft XML DOM. This method takes a location path expression as an argument:

xmlobject.selectNodes(XPath expression)

Selecting cd Nodes

The following example selects all the cd nodes from the CD catalog:

xmlDoc.selectNodes("/catalog/cd")


Selecting the First cd Node

The following example selects only the first cd node from the CD catalog:

xmlDoc.selectNodes("/catalog/cd[0]")


Note: IE 5 treats [0] as the first node, but according to the W3C standard the first node is [1].
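
Outside IE 5, the W3C numbering applies. The quick check below uses Python's xml.etree.ElementTree rather than Internet Explorer, which is an assumption of this sketch. It shows that [1] selects the first cd, and that a position of 0 is rejected with a SyntaxError rather than treated as the first node.

```python
import xml.etree.ElementTree as ET

CATALOG = """<catalog>
  <cd country="USA"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>
  <cd country="UK"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>
</catalog>"""

root = ET.fromstring(CATALOG)

first = root.find("cd[1]")      # W3C numbering: [1] is the first cd
print(first.findtext("title"))  # Empire Burlesque

try:
    root.findall("cd[0]")
    zero_rejected = False
except SyntaxError as err:
    # ElementTree refuses position 0 instead of treating it as IE 5 did.
    zero_rejected = True
    print("cd[0] rejected:", err)
```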

Selecting price Nodes

The following example selects all the price nodes from the CD catalog:

xmlDoc.selectNodes("/catalog/cd/price")


Selecting price Text Nodes

The following example selects only the text from the price nodes:

xmlDoc.selectNodes("/catalog/cd/price/text()")


Selecting cd Nodes with Price>10.80

The following example selects all the cd nodes with a price>10.80:

xmlDoc.selectNodes("/catalog/cd[price>10.80]")


Selecting price Nodes with Price>10.80

The following example selects all the price nodes with a price>10.80:

xmlDoc.selectNodes("/catalog/cd[price>10.80]/price")

