How to Parse Your XML Data with Telegraf
By Samantha Wang / Use Cases, Product, Developer / Apr 14, 2021
In March, we released Telegraf 1.18, which included a wide range of new input and output plugins. One exciting new addition was an XML Parser Plugin that added support for another input data format to parse into InfluxDB metrics.
What is XML?
XML stands for eXtensible Markup Language and is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
XML is similar to HTML in being a markup language but is designed to be self-descriptive and to better store and transport data. For example, when you are trying to exchange data between incompatible systems and data needs to be converted, any data that is incompatible can be lost. XML aims to simplify that data sharing and transportation since it is stored in plain text format. This provides a software- and hardware-independent way of storing, transporting and sharing data.
Understanding your XML data
We will use the terms root, child, sub-child throughout this blog to help you understand which data points you’re trying to parse.
<root>
<child>
<subchild>.....</subchild>
</child>
</root>
XML documents must contain exactly one root element that is the parent of all other elements.
This XML weather example from OpenWeather is a good basic example to help us understand XML data structure and how to parse it.
<current>
<city id="5004223" name="Oakland">
<coord lon="-83.3999" lat="42.6667" />
<country>US</country>
<timezone>-14400</timezone>
<sun rise="2021-03-24T11:29:19" set="2021-03-24T23:50:05" />
</city>
<temperature value="62.26" min="61" max="64.4" unit="fahrenheit" />
<feels_like value="54.63" unit="fahrenheit" />
<humidity value="59" unit="%" />
<pressure value="1007" unit="hPa" />
<wind>
<speed value="12.66" unit="mph" name="Moderate breeze" />
<gusts value="24.16" />
<direction value="200" code="SSW" name="South-southwest" />
</wind>
<clouds value="75" name="broken clouds" />
<visibility value="10000" />
<precipitation mode="no" />
<weather number="803" value="broken clouds" icon="04d" />
<lastupdate value="2021-03-24T16:15:35" />
</current>
In our weather data, current is the root element, with city, temperature, wind and the other elements at the same level as its child elements.
An XML element is everything from the element’s start tag <element> to its end tag </element>. Some tags can close themselves, as in <coord />. Elements themselves can contain:
- Text - US in <country>US</country>
- Attributes - lon="-83.3999" and lat="42.6667" in the <coord> element <coord lon="-83.3999" lat="42.6667"/>
  - Attributes are designed to contain data related to a specific element. This will be especially important when we are parsing our data values. They can be emitted in a way that comes off a little strange but are still valid, such as <foo _="dance"></foo>.
- Child elements - <city> and <coord> are other elements in the <current> element.
The relationships between elements are described by the terms parent, child, and sibling.
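To make these terms concrete, here is a minimal sketch that reads a trimmed copy of the weather document with Python's standard-library ElementTree module. Python is only used for illustration here (Telegraf itself does not work this way), and the variable names are my own.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the weather document from above.
WEATHER_XML = """
<current>
  <city id="5004223" name="Oakland">
    <coord lon="-83.3999" lat="42.6667" />
    <country>US</country>
  </city>
  <temperature value="62.26" min="61" max="64.4" unit="fahrenheit" />
</current>
"""

root = ET.fromstring(WEATHER_XML)

# The root element and the tags of its direct children.
root_tag = root.tag
child_tags = [child.tag for child in root]

city = root.find("city")
coord = city.find("coord")

# Text content lives between an element's start and end tags.
country_text = city.find("country").text

# Attributes are exposed as a mapping of strings on the element.
coord_attributes = dict(coord.attrib)

print(root_tag, child_tags, country_text, coord_attributes)
```

Note that every attribute value comes back as a string, which is why type conversion matters later when configuring fields.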
What is XPath?
The Telegraf XML Parser breaks down an XML string into metric fields using XPath expressions and supports most XPath 1.0 functionality. The parser will use XPath syntax to identify and navigate XPath nodes in your XML data. XPath supports over 200 functions, and the functions supported by Telegraf XML Parser are listed in the underlying library repository.
Note: Usually XPath expressions select a node or a node-set and you have to call functions like string() or number() to access the node’s content. However, when we discuss the Telegraf XML Parser Plugin in more detail below, you’ll see that it handles this in the following way for convenience: both metric_selection and field_selection only select the node or node-set, so they are normal XPath expressions. However, all other queries will return the node’s “string-value” according to the XPath specification. You can convert the types using functions as shown below.
I found this XPath tutorial particularly helpful in understanding XPath terminology and expressions. There is also this XPath cheat sheet that gives you a one page view of using XPath selectors, expressions, functions and more.
Before parsing any data, take a look at your XML and understand the nodes and node-sets of the data you want to parse. This XPath tester will come in really handy in testing out XPath functions and making sure you are querying the correct path to parse specific XML nodes.
Path | Description
---|---
current | Selects the child node(s) named current relative to the current node. It will not descend into the node tree and only searches the children of the current node
/current | Selects the root element current
current/city | Selects all city elements that are children of current
//city | Selects all city elements no matter where they are in the document
current//country | Selects all country elements within the current element, no matter where they are in the XML tree
current//@name | Selects all attributes named name within the current element
current/city/@name or //city/@name | Selects attributes named name under the city element
current/city/* | Selects all the child element nodes under the city element
current/city/@* | Selects all attributes in the city element
W3Schools provides an extensive list of XPath syntax and dives deep into XPath axes with additional examples.
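If you want to experiment with a few of these paths locally, the sketch below uses Python's standard-library ElementTree. Keep in mind that this is only an approximation: ElementTree implements a limited subset of XPath, evaluates paths relative to the element you call it on, and cannot select attributes directly, so attribute values are read from the matched elements instead. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the weather document from above.
WEATHER_XML = """
<current>
  <city id="5004223" name="Oakland">
    <coord lon="-83.3999" lat="42.6667" />
    <country>US</country>
  </city>
  <wind>
    <speed value="12.66" unit="mph" name="Moderate breeze" />
    <direction value="200" code="SSW" name="South-southwest" />
  </wind>
  <clouds value="75" name="broken clouds" />
</current>
"""

current = ET.fromstring(WEATHER_XML)

# current/city: city elements that are children of current.
cities = current.findall("city")

# current//country: country elements anywhere below current.
countries = [el.text for el in current.findall(".//country")]

# current//@name: ElementTree cannot select attributes directly, so select
# every element that carries a name attribute and read the value from it.
names = [el.get("name") for el in current.findall(".//*[@name]")]

# current/city/*: all child elements of city.
city_children = [el.tag for el in current.findall("city/*")]

# current/city/@*: all attributes of the city element.
city_attributes = dict(cities[0].attrib)

print(countries, names, city_children, city_attributes)
```

For anything beyond these simple paths (functions, axes, absolute paths), an XPath tester is a more faithful way to check what a query selects.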
Configuring Telegraf to ingest XML
XML is currently one of the many supported input data formats for Telegraf. This means that any input plugin containing the data_format option can be set to xml and begin parsing your XML data, like this:
data_format = "xml"
Let’s discuss how to get your configuration just right to get that XML data into InfluxDB. As mentioned above, the XML parser breaks down an XML string into metric fields using XPath expressions. XPath expressions are what the parser uses to identify and navigate nodes in your XML data.
Here is the plugin’s default configuration for using the XML parser. As with other Telegraf configs, commented lines start with a pound sign (#).
[[inputs.tail]]
files = ["example.xml"]
## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "xml"
## Multiple parsing sections are allowed
[[inputs.tail.xml]]
## Optional: XPath-query to select a subset of nodes from the XML document.
#metric_selection = "/Bus/child::Sensor"
## Optional: XPath-query to set the metric (measurement) name.
#metric_name = "string('example')"
## Optional: Query to extract metric timestamp.
## If not specified the time of execution is used.
#timestamp = "/Gateway/Timestamp"
## Optional: Format of the timestamp determined by the query above.
## This can be any of "unix", "unix_ms", "unix_us", "unix_ns" or a valid Golang
## time format. If not specified, a "unix" timestamp (in seconds) is expected.
#timestamp_format = "2006-01-02T15:04:05Z"
## Tag definitions using the given XPath queries.
[inputs.tail.xml.tags]
name = "substring-after(Sensor/@name, ' ')"
device = "string('the ultimate sensor')"
## Integer field definitions using XPath queries.
[inputs.tail.xml.fields_int]
consumers = "Variable/@consumers"
## Non-integer field definitions using XPath queries.
## The field type is defined using XPath expressions such as number(), boolean() or string(). If no conversion is performed the field will be of type string.
[inputs.tail.xml.fields]
temperature = "number(Variable/@temperature)"
power = "number(Variable/@power)"
frequency = "number(Variable/@frequency)"
ok = "Mode != 'ok'"
Let’s walk through all the steps and components that will make up your XML parser configuration. Whenever you are setting up an XPath query in your configuration, the specified path can be absolute (starting with /) or relative. Relative paths use the currently selected node as reference.
- Select subset of nodes you want to parse (optional)
If you wish to parse only a subset of your XML data, you will use the metric_selection field to designate which part. In our weather example, if we only wanted to parse the data under the wind element, we would set this to current//wind. Let's go ahead and actually read the entire weather XML document, so I'm going to set my metric_selection = "/current". There will be one metric per node selected by metric_selection. A benefit of setting this field is that in subsequent configuration fields, I won't need to add "current/" to my query's pathname.
- Set measurement name (optional)
You can override the default measurement name (which will most likely be the plugin name) by setting the metric_name field. I'm going to set metric_name = "'weather'" to change the measurement name from http to weather. You can also set the XPath query for metric_name to derive the measurement name directly from a node in the XML document.
- Set the value you want as your timestamp and its format (optional)
If your XML data contains a specific timestamp you want to assign to your metrics, you will need to set the XPath query of that value. Our weather data has a lastupdate value that indicates the exact time this weather data was recorded. I'll set timestamp = "lastupdate/@value" to read in that value as my timestamp. If the timestamp field isn't set, the current time will be used as the timestamp for all created metrics.
From there, you can designate the format of the timestamp you just selected. This timestamp_format can be set to unix, unix_ms, unix_us, unix_ns, or an accepted Go "reference time". If timestamp_format isn't configured, Telegraf will assume your timestamp query is in unix format.
- Set the tags you want from your XML data
To designate the values in your XML you want as your tags, you will need to configure a tags subsection such as [inputs.http.xml.tags]. In your subsection you will add a line for each tag in tag-name = query format with the XPath query. For our weather data, I will add the city and country names as tags with city = "city/@name" and country = "city/country". Multiple tags can be set under one subsection.
- Configure the fields of integer type you want from your XML data
For your XML data values that are integers that you want to read in as fields, you must configure the field names and XPath queries in a fields_int subsection such as [inputs.tail.xml.fields_int]. This is because XML values are limited to a single type, string, so all your data will be of type string if not converted by an XPath function. This will follow the field_name = query format. In our weather data, values such as humidity and clouds are always integers so we will configure them in this subsection. Results of these fields_int queries will always be converted to int64.
[inputs.http.xml.fields_int]
humidity = "humidity/@value"
clouds = "clouds/@value"
- Configure the rest of your fields. Be sure to indicate the data type in the XPath function.
To add non-integer fields to the metrics, you will add the proper XPath query in a general fields subsection (ex: [inputs.http.xml.fields]) in the field_name = query format. It's crucial here to specify the data type of the field in your XPath query using the type conversion functions of XPath such as number(), boolean() or string(). If no conversion is performed in the query, the field will be of type string. In our weather data we have a combination of number and string values. For example, our wind speed is a number and will be specified as wind_speed = "number(wind/speed/@value)" whereas the wind description is text and will be formatted as a string in wind_desc = "string(wind/speed/@name)".
- Select a set of nodes from your XML data you want to parse as fields (optional)
If you have a large XML file with a large number of fields that would otherwise need to be individually configured, you can select a subset of them by configuring field_selection with an XPath query to the selection of nodes. This setting will also be commonly used if the node names are not yet known (ex: value of precipitation is not populated unless it's actively raining). Each node that is selected by field_selection forms a new field within the metric.
You can set the name and value of each field by using the optional field_name and field_value XPath queries. If these queries are not specified, the field's name defaults to the node name and the field's value defaults to the content of the selected field node. It is important to note that field_name and field_value queries are only used if field_selection is specified. You can also use these settings in combination with the other field specification subsections.
Based on the multi-node London bicycle example below, to retrieve all the attributes in the info elements, your field_selection settings would be configured as
field_selection = "child::info"
field_name = "name(@*[1])"
field_value = "number(@*[1])"
- Expand field names to a path relative to the selected node (optional)
If you want your field names that have been selected with field_selection to be expanded to a path relative to the selected node, you will need to set field_name_expansion = true. This setting allows you to flatten out nodes with non-unique names in the subtree. This would be necessary if we selected all leaf nodes as fields and those leaf nodes did not have unique names. If field_name_expansion wasn't set, we would end up with duplicate names in the fields.
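As a rough mental model of the first six steps, here is a small Python sketch that applies the same selections to a trimmed copy of the weather document. It is not how Telegraf implements the parser; build_metric is a hypothetical helper, ElementTree only approximates XPath, and reading the zone-less timestamp as UTC is an assumption on my part.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Trimmed copy of the OpenWeather sample document used in this post.
WEATHER_XML = """
<current>
  <city id="5004223" name="Oakland">
    <country>US</country>
  </city>
  <humidity value="59" unit="%" />
  <wind>
    <speed value="12.66" unit="mph" name="Moderate breeze" />
  </wind>
  <clouds value="75" name="broken clouds" />
  <precipitation mode="no" />
  <lastupdate value="2021-03-24T16:15:35" />
</current>
"""


def build_metric(xml_text):
    """Hypothetical helper: turn one weather document into a metric dict."""
    # metric_selection = "/current": the root element is the selected node,
    # so every query below is relative to it.
    node = ET.fromstring(xml_text)

    # timestamp = "lastupdate/@value" with the Go layout 2006-01-02T15:04:05.
    # Assumption: a timestamp without a zone is read as UTC.
    stamp = datetime.strptime(
        node.find("lastupdate").get("value"), "%Y-%m-%dT%H:%M:%S"
    ).replace(tzinfo=timezone.utc)

    speed = node.find("wind/speed")
    return {
        # metric_name = "'weather'"
        "name": "weather",
        # Tags always stay strings.
        "tags": {
            "city": node.find("city").get("name"),
            "country": node.findtext("city/country"),
        },
        "fields": {
            # fields_int entries are converted to integers.
            "humidity": int(node.find("humidity").get("value")),
            "clouds": int(node.find("clouds").get("value")),
            # number(...) and string(...) conversions in the fields section.
            "wind_speed": float(speed.get("value")),
            "wind_desc": speed.get("name"),
            # A comparison such as precipitation/@mode = 'yes' yields a Boolean.
            "is_it_raining": node.find("precipitation").get("mode") == "yes",
        },
        "timestamp": int(stamp.timestamp()),
    }


metric = build_metric(WEATHER_XML)
print(metric)
```

The point of the sketch is the split of responsibilities: one selected node becomes one metric, tags stay strings, and each field gets its type from the section or conversion function it is configured with.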
Examples!
Basic Parsing example: OpenWeather XML data
I have been referencing the OpenWeatherMap XML API response so far in this blog when explaining XML concepts and the steps for configuring your XML parser. This configuration should help you understand how to parse somewhat simple XML data with Telegraf. There is also a 5 day OpenWeather forecast test case in the plugin’s testcases folder.
You can sign up for a free API key to retrieve this XML data over HTTP. Once you have your API key (this may take a few hours after signing up), you can set your URL to specify the location(s) of your weather. My configuration below retrieves Oakland, New York, and London current weather data in imperial units (blame us Americans not knowing the metric system :)). If you want to test the example below, make sure you set your API_KEY as an environment variable to be read by the Telegraf config.
Weather configuration:
[[inputs.http]]
## OpenWeatherMap API, need to register for $API_KEY: https://openweathermap.org/api
urls = [
"http://api.openweathermap.org/data/2.5/weather?q=Oakland&appid=$API_KEY&mode=xml&units=imperial",
"http://api.openweathermap.org/data/2.5/weather?q=New%20York&appid=$API_KEY&mode=xml&units=imperial",
"http://api.openweathermap.org/data/2.5/weather?q=London&appid=$API_KEY&mode=xml&units=imperial"
]
data_format = "xml"
## Drop url and hostname from list of tags
tagexclude = ["url", "host"]
## Multiple parsing sections are allowed
[[inputs.http.xml]]
## Optional: XPath-query to select a subset of nodes from the XML document.
metric_selection = "/current"
## Optional: XPath-query to set the metric (measurement) name.
metric_name = "'weather'"
## Optional: Query to extract metric timestamp.
## If not specified the time of execution is used.
timestamp = "lastupdate/@value"
## Optional: Format of the timestamp determined by the query above.
## This can be any of "unix", "unix_ms", "unix_us", "unix_ns" or a valid Golang
## time format. If not specified, a "unix" timestamp (in seconds) is expected.
timestamp_format = "2006-01-02T15:04:05"
## Tag definitions using the given XPath queries.
[inputs.http.xml.tags]
city = "city/@name"
country = "city/country"
## Integer field definitions using XPath queries.
[inputs.http.xml.fields_int]
humidity = "humidity/@value"
clouds = "clouds/@value"
## Non-integer field definitions using XPath queries.
## The field type is defined using XPath expressions such as number(), boolean() or string(). If no conversion is performed the field will be of type string.
[inputs.http.xml.fields]
temperature = "number(temperature/@value)"
precipitation = "number(precipitation/@value)"
wind_speed = "number(wind/speed/@value)"
wind_desc = "string(wind/speed/@name)"
clouds_desc = "string(clouds/@name)"
lat = "number(city/coord/@lat)"
lon = "number(city/coord/@lon)"
## If "precipitation/@mode" value returns "no", is_it_raining will return false
is_it_raining = "precipitation/@mode = 'yes'"
Most of the settings for this weather configuration are explained above. The last field, is_it_raining, shows how you can use an XPath operator in your configuration to return a node-set, a string, a Boolean, or a number:
is_it_raining = "precipitation/@mode = 'yes'"
Weather output:
weather,city=New\ York,country=US clouds=1i,clouds_desc="clear sky",humidity=38i,is_it_raining=false,lat=40.7143,lon=-74.006,precipitation=0,temperature=58.15,wind_desc="Gentle Breeze",wind_speed=8.05 1617128228000000000
weather,city=London,country=GB clouds=0i,clouds_desc="clear sky",humidity=24i,is_it_raining=false,lat=51.5085,lon=-0.1257,precipitation=0,temperature=66.56,wind_desc="Light breeze",wind_speed=5.75 1617128914000000000
weather,city=Oakland,country=US clouds=90i,clouds_desc="overcast clouds",humidity=34i,is_it_raining=false,lat=42.6667,lon=-83.3999,precipitation=0,temperature=64.54,wind_desc="Moderate breeze",wind_speed=17.27 1617128758000000000
Multi-node selection example: COVID-19 Vaccine Distribution Allocations by Jurisdiction
Your XML data will commonly contain similar metrics for multiple sections (each section could be a different device; in this example, each section represents a different jurisdiction). You can use the XML Parser for multi-node selection to generate metrics for each chunk of data.
Considering this blog is being written during spring 2021, there is plenty of COVID-19 data out there. To stay somewhat optimistic, let’s take a look at some COVID-19 vaccine XML data provided by the Centers for Disease Control and Prevention (CDC). The CDC provides weekly allocation of vaccines by jurisdiction. There is an HTTP XML file for each vaccine manufacturer: Moderna, Pfizer or Janssen/Johnson & Johnson. Each vaccine has its own personality type too!
This COVID vaccine XML data will be a good example of how to do multi-node selection with the XML parser.
<response>
<row>
<row _id="row-vuan~mg8h_vwjk" _uuid="00000000-0000-0000-9614-D811B3DD0141" _position="0" _address="https://data.cdc.gov/resource/saz5-9hgg/row-vuan~mg8h_vwjk">
<jurisdiction>Connecticut</jurisdiction>
<week_of_allocations>2021-04-05T00:00:00</week_of_allocations>
<_1st_dose_allocations>50310</_1st_dose_allocations>
<_2nd_dose_allocations>50310</_2nd_dose_allocations>
</row>
<row _id="row-suay.uwx5_hiiz" _uuid="00000000-0000-0000-C448-E7F5D3B8E3CA" _position="0" _address="https://data.cdc.gov/resource/saz5-9hgg/row-suay.uwx5_hiiz">
<jurisdiction>Maine</jurisdiction>
<week_of_allocations>2021-04-05T00:00:00</week_of_allocations>
<_1st_dose_allocations>19890</_1st_dose_allocations>
<_2nd_dose_allocations>19890</_2nd_dose_allocations>
</row>
<row _id="row-dhdq_gsf8~rzrd" _uuid="00000000-0000-0000-6882-622E1430CDFA" _position="0" _address="https://data.cdc.gov/resource/saz5-9hgg/row-dhdq_gsf8~rzrd">
<jurisdiction>Massachusetts</jurisdiction>
<week_of_allocations>2021-04-05T00:00:00</week_of_allocations>
<_1st_dose_allocations>95940</_1st_dose_allocations>
<_2nd_dose_allocations>95940</_2nd_dose_allocations>
</row>
<row _id="row-jehx-8sxy_8dma" _uuid="00000000-0000-0000-56CD-DCA4760B56BC" _position="0" _address="https://data.cdc.gov/resource/saz5-9hgg/row-jehx-8sxy_8dma">
<jurisdiction>New York</jurisdiction>
<week_of_allocations>2021-04-05T00:00:00</week_of_allocations>
<_1st_dose_allocations>153270</_1st_dose_allocations>
<_2nd_dose_allocations>153270</_2nd_dose_allocations>
</row>
<row _id="row-chrx-6f37~qbn9" _uuid="00000000-0000-0000-30C3-4B8A23B1DF14" _position="0" _address="https://data.cdc.gov/resource/saz5-9hgg/row-chrx-6f37~qbn9">
<jurisdiction>New York City</jurisdiction>
<week_of_allocations>2021-04-05T00:00:00</week_of_allocations>
<_1st_dose_allocations>117000</_1st_dose_allocations>
<_2nd_dose_allocations>117000</_2nd_dose_allocations>
</row>
</row>
</response>
The above snippet was taken from the CDC COVID-19 Vaccine Distribution Allocations by Jurisdiction - Pfizer dataset.
This multi-node dataset doesn’t have many child values for us to configure, but it has many parent subsections. We will use week_of_allocations as our timestamp, jurisdiction as a tag, and _1st_dose_allocations and _2nd_dose_allocations as fields. Even though the Janssen/Johnson & Johnson data doesn’t contain _2nd_dose_allocations (one and done), we do not need a separate configuration for it; the parser just won’t emit a field for it.
I included processors.enum in my configuration. In the XML data itself there is no indicator besides the URL of which manufacturer the data belongs to. The enum processor I configured will add a tag with the manufacturer name for its corresponding URL.
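Conceptually, this mapping is just a lookup from the url tag to a new vaccine_type tag. The Python sketch below illustrates that idea only; add_vaccine_type is a hypothetical helper and not part of Telegraf.

```python
# URL-to-manufacturer table, mirroring the value mappings used in this post.
VALUE_MAPPINGS = {
    "https://data.cdc.gov/api/views/b7pe-5nws/rows.xml": "Moderna",
    "https://data.cdc.gov/api/views/saz5-9hgg/rows.xml": "Pfizer",
    "https://data.cdc.gov/api/views/w9zu-fywh/rows.xml": "Janssen",
}


def add_vaccine_type(tags):
    """Hypothetical helper: copy the tags and add vaccine_type when the url is known."""
    mapped = dict(tags)
    url = mapped.get("url")
    if url in VALUE_MAPPINGS:
        # The source tag is kept; the mapped value goes to a separate tag.
        mapped["vaccine_type"] = VALUE_MAPPINGS[url]
    return mapped


example_tags = add_vaccine_type(
    {"state": "Maine", "url": "https://data.cdc.gov/api/views/saz5-9hgg/rows.xml"}
)
print(example_tags)
```

Writing the result to a separate destination tag is what keeps the original url tag available alongside the new one.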
Configuration:
[[inputs.http]]
urls = [
"https://data.cdc.gov/api/views/b7pe-5nws/rows.xml", # Moderna
"https://data.cdc.gov/api/views/saz5-9hgg/rows.xml", # Pfizer
"https://data.cdc.gov/api/views/w9zu-fywh/rows.xml" # Janssen/Johnson & Johnson
]
data_format = "xml"
## Drop hostname from list of tags
tagexclude = ["host"]
[[inputs.http.xml]]
metric_selection = "//row"
metric_name = "'cdc-vaccines'"
timestamp = "week_of_allocations"
timestamp_format = "2006-01-02T15:04:05"
[inputs.http.xml.tags]
state = "jurisdiction"
[inputs.http.xml.fields_int]
1st_dose_allocations = "_1st_dose_allocations"
2nd_dose_allocations = "_2nd_dose_allocations"
[[processors.enum]]
[[processors.enum.mapping]]
## Name of the tag to map. Globs accepted.
tag = "url"
## Destination tag or field to be used for the mapped value. By default the
## source tag or field is used, overwriting the original value.
dest = "vaccine_type"
## Table of mappings
[processors.enum.mapping.value_mappings]
"https://data.cdc.gov/api/views/b7pe-5nws/rows.xml" = "Moderna"
"https://data.cdc.gov/api/views/saz5-9hgg/rows.xml" = "Pfizer"
"https://data.cdc.gov/api/views/w9zu-fywh/rows.xml" = "Janssen"
Output (a snippet based on the sample of XML vaccine data above; the full configuration will produce a much larger output):
cdc-vaccines,state=Connecticut,url=https://data.cdc.gov/api/views/saz5-9hgg/rows.xml,vaccine_type=Pfizer 1st_dose_allocations=60840i,2nd_dose_allocations=60840i 1617580800000000000
cdc-vaccines,state=Maine,url=https://data.cdc.gov/api/views/saz5-9hgg/rows.xml,vaccine_type=Pfizer 1st_dose_allocations=23400i,2nd_dose_allocations=23400i 1617580800000000000
cdc-vaccines,state=Massachusetts,url=https://data.cdc.gov/api/views/saz5-9hgg/rows.xml,vaccine_type=Pfizer 1st_dose_allocations=117000i,2nd_dose_allocations=117000i 1617580800000000000
cdc-vaccines,state=New\ York,url=https://data.cdc.gov/api/views/saz5-9hgg/rows.xml,vaccine_type=Pfizer 1st_dose_allocations=188370i,2nd_dose_allocations=188370i 1617580800000000000
cdc-vaccines,state=New\ York\ City,url=https://data.cdc.gov/api/views/saz5-9hgg/rows.xml,vaccine_type=Pfizer 1st_dose_allocations=143910i,2nd_dose_allocations=143910i 1617580800000000000
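To see how one metric per row falls out of the multi-node selection, here is a hedged Python sketch using two rows from the Pfizer sample. It approximates the configuration with ElementTree and is not Telegraf's implementation: rows_to_metrics is a hypothetical helper, skipping rows that yield no fields and reading the timestamp as UTC are assumptions, and the row attributes are trimmed.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Two rows from the Pfizer sample above; row attributes trimmed for brevity.
VACCINE_XML = """
<response>
  <row>
    <row _id="row-vuan~mg8h_vwjk">
      <jurisdiction>Connecticut</jurisdiction>
      <week_of_allocations>2021-04-05T00:00:00</week_of_allocations>
      <_1st_dose_allocations>50310</_1st_dose_allocations>
      <_2nd_dose_allocations>50310</_2nd_dose_allocations>
    </row>
    <row _id="row-suay.uwx5_hiiz">
      <jurisdiction>Maine</jurisdiction>
      <week_of_allocations>2021-04-05T00:00:00</week_of_allocations>
      <_1st_dose_allocations>19890</_1st_dose_allocations>
      <_2nd_dose_allocations>19890</_2nd_dose_allocations>
    </row>
  </row>
</response>
"""

# Field name -> query relative to each selected row.
FIELD_QUERIES = {
    "1st_dose_allocations": "_1st_dose_allocations",
    "2nd_dose_allocations": "_2nd_dose_allocations",
}


def rows_to_metrics(xml_text):
    """Hypothetical helper: one metric dict per row that carries data."""
    root = ET.fromstring(xml_text)
    metrics = []
    # metric_selection = "//row": every row element, at any depth.
    for row in root.findall(".//row"):
        fields = {}
        for name, query in FIELD_QUERIES.items():
            value = row.findtext(query)
            if value is not None:
                # A missing child (for example, no second dose) adds no field.
                fields[name] = int(value)
        if not fields:
            # Assumption: the outer wrapper row has no data and is skipped.
            continue
        # Assumption: the zone-less timestamp is read as UTC.
        stamp = datetime.strptime(
            row.findtext("week_of_allocations"), "%Y-%m-%dT%H:%M:%S"
        ).replace(tzinfo=timezone.utc)
        metrics.append({
            "name": "cdc-vaccines",
            "tags": {"state": row.findtext("jurisdiction")},
            "fields": fields,
            "timestamp_ns": int(stamp.timestamp()) * 10**9,
        })
    return metrics


metrics = rows_to_metrics(VACCINE_XML)
print(metrics)
```

Under the UTC assumption, 2021-04-05T00:00:00 converts to 1617580800000000000 nanoseconds, which is the timestamp shown in the output lines above.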
Using field selectors for batch field processing (example: London bicycle data)
Your XML data will often contain metrics with so many fields that it would be tedious to configure each field in the [inputs.tail.xml.fields] sub-section. Also, your XML data might generate fields that are unknown during configuration. In these situations, you can use field selectors to parse these metrics.
For our example, we’ll use the London cycle hire data provided by Transport for London. The data contains the latest time the data was updated (lastUpdate), which we’ll use as our timestamp. The info nodes contain the bicycle station status information that we’ll use as our fields.
<stations lastUpdate="1617397861012" version="2.0">
</stations>
<response>
<location id="1" name="River Street , Clerkenwell">
<info terminalName="001023" />
<info lat="51.52916347" />
<info long="-0.109970527" />
<info installDate="1278947280000" />
<temporary>false</temporary>
<info nbBikes="10" />
<info nbEmptyDocks="9" />
<info nbDocks="19" />
</location>
<location id="2" name="Phillimore Gardens, Kensington">
<info terminalName="001018" />
<info lat="51.49960695" />
<info long="-0.197574246" />
<info installDate="1278585780000" />
<temporary>false</temporary>
<info nbBikes="28" />
<info nbEmptyDocks="9" />
<info nbDocks="37" />
</location>
<location id="3" name="Christopher Street, Liverpool Street">
<info terminalName="001012" />
<info lat="51.52128377" />
<info long="-0.084605692" />
<info installDate="1278240360000" />
<temporary>false</temporary>
<info nbBikes="2" />
<info nbEmptyDocks="30" />
<info nbDocks="32" />
</location>
</response>
In our configuration, we’ll still use the metric_selection option to select all location nodes. For each location we then use field_selection to select all info child nodes of the location as field-nodes. This field selection is relative to the selected nodes. For each selected field-node, we will configure field_name and field_value to determine the field’s name and value, respectively. The field_name pulls the name of the first attribute of the node, while field_value pulls the value of the first attribute and converts the result to a number.
For our non-numerical fields, we can still use [inputs.tail.xml.fields] in conjunction with field_selection. We will still set the node temporary, which contains a string, to be read in as a field. Also, note that my timestamp is outside my metric_selection, so I had to make sure the XPath query to pull lastUpdate was an absolute path prefixed with /.
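The field selection logic can be sketched in a few lines of Python for a single location node. This is only an approximation of what the configuration expresses, not the parser's actual code; select_fields is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# The first location node from the London cycle hire sample above.
LOCATION_XML = """
<location id="1" name="River Street , Clerkenwell">
  <info terminalName="001023" />
  <info lat="51.52916347" />
  <info long="-0.109970527" />
  <info installDate="1278947280000" />
  <temporary>false</temporary>
  <info nbBikes="10" />
  <info nbEmptyDocks="9" />
  <info nbDocks="19" />
</location>
"""


def select_fields(location):
    """Hypothetical helper: build the fields of one location metric."""
    fields = {}
    # field_selection = "child::info": every info child of the location.
    for info in location.findall("info"):
        if not info.attrib:
            continue
        # field_name = "name(@*[1])" and field_value = "number(@*[1])":
        # take the first attribute's name, and its value as a number.
        name, raw = next(iter(info.attrib.items()))
        fields[name] = float(raw)
    # Explicit entry from the fields section: placement = "string(temporary)".
    fields["placement"] = location.findtext("temporary")
    return fields


fields = select_fields(ET.fromstring(LOCATION_XML))
print(fields)
```

Converting "001023" to a number drops the leading zeros, which is why terminalName shows up as 1023 in the output.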
Configuration:
[[inputs.tail]]
files = ["/pathname/london-cycle-for-hire.xml"]
data_format = "xml"
[[inputs.tail.xml]]
metric_selection = "response/child::location"
metric_name = "string('bikes')"
timestamp = "/stations/@lastUpdate"
timestamp_format = "unix_ms"
field_selection = "child::info"
field_name = "name(@*[1])"
field_value = "number(@*[1])"
[inputs.tail.xml.tags]
address = "@name"
id = "@id"
[inputs.tail.xml.fields]
placement = "string(temporary)"
Output:
bikes,address=River\ Street\ \,\ Clerkenwell,host=MBP15-SWANG.local,id=1 installDate=1278947280000,lat=51.52916347,long=-0.109970527,nbBikes=10,nbDocks=19,nbEmptyDocks=9,placement="false",terminalName=1023 1617397861000000000
bikes,address=Phillimore\ Gardens\,\ Kensington,host=MBP15-SWANG.local,id=2 installDate=1278585780000,lat=51.49960695,long=-0.197574246,nbBikes=28,nbDocks=37,nbEmptyDocks=9,placement="false",terminalName=1018 1617397861000000000
bikes,address=Christopher\ Street\,\ Liverpool\ Street,host=MBP15-SWANG.local,id=3 installDate=1278240360000,lat=51.52128377,long=-0.084605692,nbBikes=2,nbDocks=32,nbEmptyDocks=30,placement="false",terminalName=1012 1617397861000000000
More examples
There is a folder of XML test cases in the Telegraf GitHub repository with more examples. If you think you have an example XML document + XML parser configuration that will be helpful to the community, please contribute a PR containing the documents.
Quick tips and other helpful resources
If you’re looking to do generic troubleshooting, be sure to set debug = true in your agent settings. The parser will then (for the *_selection settings) walk up the nodes if the selection is empty and print how many children it found. This will help you see which part of the query could be causing the problem.
An XPath tester like XPather or Code Beautify’s XPath Tester will be your best friend while configuring your XML parser to help you make sure you are selecting the proper XPath query for your data. It will make configuration a lot less frustrating when you can visibly see what nodes your XPath query is selecting.
A few syntax things to reiterate: when you are setting up an XPath query in your configuration, the specified path can be absolute (starting with /) or relative. This will be important to remember if you are querying a node outside of your metric selection. If you don’t include the starting /, you’d end up querying a node in your selected metrics that may not exist.
Lastly, something I kept running into when querying an attribute (ex: lon & lat in <coord lon="-83.3999" lat="42.6667"/>) is remembering to include the / before @. I would accidentally query current/city/coord@lat, which would return nothing, when the correct query is current/city/coord/@lat.
Here are some resources that will help you have a better understanding of the Telegraf XML Parser and XPath:
- XPath Golang library used for the Telegraf XML Parser
- Xpath cheatsheet
- The W3Schools XPath Tutorial
- XPather
- XPath Tester and Evaluator
Incredibly massive shoutout to Sven Rebhan for building this plugin!
If you end up with any questions about parsing your XML data, please reach out to us (@Sven Rebhan if you’d like to chat with Sven specifically) in the #telegraf channel of our InfluxData Community Slack or post any questions on our Community Site.
Want to learn more about data acquisition through Telegraf? Register for free for InfluxDays EMEA to attend Jess Ingrassellino’s “Data Acquisition” talk covering Telegraf, CLI Integration to the cloud, and client libraries, on May 18, 2021.