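Using InfluxDB to Predict the Next Extinction Event
By Mark Herring / Product, Use Cases, Developer
An alternative title might be "How to Get JSON Data into InfluxDB Cloud 2.0", but that sounded far too boring!
OK, maybe the title is a bit over the top, but I wanted to use my new InfluxDB Cloud account to capture some real data (I chose meteorites, hence the title) and see how quickly I could visualize it. Full disclosure: I had never tried this before!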
- Part 1: The amazing way
- Part 2: The step-by-step reality
Just like on those cooking shows: here are a few ingredients... ta-da, here is the cake...
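Step 1 – Find the data
I looked for some interesting datasets and chose Earth Meteorite Landings. Why? I wanted to predict the end of the world, and what better way than a meteorite destroying the Earth. Hey, maybe that should be a movie... oh yes, that's been done.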
"name": "Aachen",
"id": "1",
"nametype": "Valid",
"recclass": "L5",
"mass": "21",
"fall": "Fell",
"year": "1880-01-01T00:00:00.000",
"reclat": "50.775000",
"reclong": "6.083330",
"geolocation": {
"type": "Point",
"coordinates": [
Step 2 – Parse the JSON with Telegraf
Next step: configure Telegraf to connect to the dataset and pull out the interesting data. Here is a snippet from my Telegraf configuration file:
interval = "10s"
## One or more URLs from which to read formatted metrics
urls = [
name_override = "meteorevent"
tagexclude = ["url"]
## HTTP method
method = "GET"
## Tag keys is an array of keys that should be added as tags.
tag_keys = [
## String fields is an array of keys that should be added as string fields.
json_string_fields = [
## Name key is the key to use as the measurement name.
json_name_key = ""
## Time key is the key containing the time that should be used to create the
## metric.
json_time_key = "year"
## Time format is the time layout that should be used to interpret the json_time_key.
## The time must be `unix`, `unix_ms`, `unix_us`, `unix_ns`, or a time in the
## "reference time". To define a different format, arrange the values from
## the "reference time" in the example to match the format you will be
## using. For more information on the "reference time", visit
## https://golang.ac.cn/pkg/time/#Time.Format
## ex: json_time_format = "Mon Jan 2 15:04:05 -0700 MST 2006"
## json_time_format = "2006-01-02T15:04:05Z07:00"
## json_time_format = "01/02/2006 15:04:05"
## json_time_format = "unix"
## json_time_format = "unix_ms"
json_time_format = "2006-01-02T15:04:05.000"
## Timezone allows you to provide an override for timestamps that
## don't already include an offset
## e.g. 04/06/2016 12:41:45
## Default: "" which renders UTC
## Options are as follows:
## 1. Local -- interpret based on machine localtime
## 2. "America/New_York" -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
## 3. UTC -- or blank/unspecified, will return timestamp in UTC
#json_timezone = ""
integer = ["mass"]
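To make the settings above more concrete, here is a minimal Python sketch of roughly what they ask Telegraf to produce for one record: a measurement named meteorevent, a timestamp taken from the year key, and mass converted to an integer field. It is an illustration, not Telegraf's actual code. The choice of recclass and fall as tags is an assumption (the tag_keys list is cut off in the snippet), and the helper names are hypothetical.

```python
from datetime import datetime

# Sample record copied from the dataset excerpt in Step 1 (geolocation omitted).
record = {
    "name": "Aachen",
    "id": "1",
    "nametype": "Valid",
    "recclass": "L5",
    "mass": "21",
    "fall": "Fell",
    "year": "1880-01-01T00:00:00.000",
    "reclat": "50.775000",
    "reclong": "6.083330",
}

EPOCH = datetime(1970, 1, 1)


def escape_tag(value):
    """Escape the characters that line protocol treats as separators in tags."""
    for char in (",", "=", " "):
        value = value.replace(char, "\\" + char)
    return value


def to_line_protocol(rec, measurement="meteorevent", tag_keys=("recclass", "fall")):
    """Build one line-protocol line; tag_keys is an assumed choice, not the author's."""
    when = datetime.strptime(rec["year"], "%Y-%m-%dT%H:%M:%S.%f")
    delta = when - EPOCH
    # Integer arithmetic keeps the nanosecond timestamp exact, even before 1970.
    ts_ns = (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000
    tags = ",".join(f"{key}={escape_tag(rec[key])}" for key in tag_keys)
    mass = int(float(rec["mass"]))  # mirrors integer = ["mass"]
    return f"{measurement},{tags} mass={mass}i {ts_ns}"


line = to_line_protocol(record)
print(line)
```

Because the Aachen fall is dated 1880, the resulting timestamp is a negative number of nanoseconds relative to the Unix epoch.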
Step 3 – Send the data to InfluxDB Cloud with Telegraf
## The URLs of the InfluxDB cluster nodes.
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
## urls exp:
urls = ["https://PUT IN YOUR URL.influxdata.com"]
## Token for authentication.
token = "$INFLUX_TOKEN"
## Organization is the name of the organization you wish to write to; must exist.
## Destination bucket to write into.
bucket = "Meteroite3"
Then just start Telegraf:
telegraf --config mytelegraf.conf
Step 4 – Visualize the data
Log in to InfluxDB Cloud and build a query:
from(bucket: "Meteroite3")
|> range(start: -100y)
|> filter(fn: (r) => r._field == "mass")
|> group()
|> aggregateWindow(every: 1y, fn: count, createEmpty: false)
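Step 0 – Create a Cloud 2.0 account
This was easy: I signed up for InfluxDB Cloud, verified my email, and logged in. I was inspired by Samantha Wang's blog, but wanted to use "my own" dataset. I chose the dataset described above and was ready to go.
Step 1 – Configure Telegraf
Installing Telegraf was easy, the documentation was very clear, and it was up and running quickly. Now all I needed to do was configure it! Back to Cloud, to get data from JSON.
I really like the "Load Data" visuals... but where is "Load from JSON"? Ugh... back to the docs, over to GitHub, and now I was deep down the rabbit hole!
On GitHub, I learned about the JSON parser plugin and created my first Telegraf configuration file.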
files = ["example"]
name_override = "meteorevent"
tagexclude = ["url"]
## HTTP method
method = "GET"
## Tag keys is an array of keys that should be added as tags.
tag_keys = [
## String fields is an array of keys that should be added as string fields.
json_string_fields = [
## Name key is the key to use as the measurement name.
json_name_key = ""
## Time key is the key containing the time that should be used to create the
## metric.
json_time_key = "year"
## Time format is the time layout that should be used to interpret the json_time_key.
## The time must be `unix`, `unix_ms`, `unix_us`, `unix_ns`, or a time in the
## "reference time". To define a different format, arrange the values from
## the "reference time" in the example to match the format you will be
## using. For more information on the "reference time", visit
## https://golang.ac.cn/pkg/time/#Time.Format
## ex: json_time_format = "Mon Jan 2 15:04:05 -0700 MST 2006"
## json_time_format = "2006-01-02T15:04:05Z07:00"
## json_time_format = "01/02/2006 15:04:05"
## json_time_format = "unix"
## json_time_format = "unix_ms"
json_time_format = "2006-01-02T15:04:05.000"
## Timezone allows you to provide an override for timestamps that
## don't already include an offset
## e.g. 04/06/2016 12:41:45
## Default: "" which renders UTC
## Options are as follows:
## 1. Local -- interpret based on machine localtime
## 2. "America/New_York" -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
## 3. UTC -- or blank/unspecified, will return timestamp in UTC
#json_timezone = ""
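Oh no, I don't have a file; mine is a URL. I thought about "cheating", but no, I wanted this to be a "real example", so I found out how to get data from a URL and updated the top of my configuration file: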
interval = "10s"
## One or more URLs from which to read formatted metrics
urls = [
Awesome! Data is coming in (I hope). Oh, but how do I get it into my Cloud instance? Back to the docs, where I found the information on manually configuring Telegraf.
urls = ["put in your URL"]
token = "$INFLUX_TOKEN"
organization = "orgname"
bucket = "example-bucket"
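For readers wondering what those four settings turn into, here is a hedged Python sketch of a write request against the InfluxDB v2 HTTP API (the /api/v2/write endpoint with a Token authorization header). It only builds the request and never sends it. The base URL, the token value and the function name are placeholders for illustration; this is not how Telegraf itself is implemented.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_write_request(base_url, org, bucket, token, lines):
    """Build (but do not send) a line-protocol write request for InfluxDB v2."""
    query = urlencode({"org": org, "bucket": bucket, "precision": "ns"})
    body = "\n".join(lines).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/api/v2/write?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
    )


# Placeholder values: substitute your own Cloud URL and token.
req = build_write_request(
    "https://example.invalid",
    org="orgname",
    bucket="example-bucket",
    token="my-token",
    lines=["meteorevent mass=21i"],
)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would actually send it, which is the part Telegraf's output plugin handles for you, along with batching and retries.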
Back to Cloud: I created my bucket and created my token. OK, now I was really ready.
Start Telegraf:
telegraf --config mytelegraf.conf
Error in plugin: [url=https://data.nasa.gov/resource/y77d-th95.json]: parsing time "1880-01-01T00:00:00.000" as "unix": cannot parse "1880-01-01T00:00:00.000" as "unix"
json_time_format = "2006-01-02T15:04:05Z07:00"
is wrong, but what should it be? From the JSON, I can see that the value looks like this:
"year": "1880-01-01T00:00:00.000",
But what should json_time_format = "2006-01-02T15:04:05Z07:00" be changed to?
I think I tried everything... so I "phoned a friend": David McKay, a brilliant DevRel who is only a Slack message away. He replied:
json_time_format = "2006-01-02T15:04:05.000"
integer = ["mass"]
float = ["reclat", "reclong"]
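The difference between the two layouts is easier to see outside Telegraf. The sketch below uses Python's strptime as a rough stand-in for Go's reference-time layouts: the format strings are approximate translations, not exact equivalents (for example, %f accepts one to six fractional digits, while the Go layout .000 expects exactly three). The helper name parses is hypothetical.

```python
from datetime import datetime

raw = "1880-01-01T00:00:00.000"

# Approximate Python counterparts of the two Go layouts discussed above.
WITH_MILLIS = "%Y-%m-%dT%H:%M:%S.%f"   # ~ "2006-01-02T15:04:05.000"
WITH_OFFSET = "%Y-%m-%dT%H:%M:%S%z"    # ~ "2006-01-02T15:04:05Z07:00"


def parses(value, fmt):
    """Return True when value matches fmt, False when strptime rejects it."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


print("millisecond layout:", parses(raw, WITH_MILLIS))
print("offset layout:", parses(raw, WITH_OFFSET))
```

The dataset's timestamps carry fractional seconds but no zone offset, so only the first layout matches them.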
[inputs.http]: Error in plugin: [url=https://data.nasa.gov/resource/y77d-th95.json]: JSON time key could not be found
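Checked Cloud... no data! Ugh...
OK, I was so close... what now? I briefly thought about giving up. No, I wasn't going to do that! Time to "phone another friend". (As an aside, why not ask David again? Because I figured that if I spread my requests in parallel across the organization, nobody would realize how "needy" I am. But I'm a glutton for punishment, so it's all in this blog anyway!) Russ Savage to the rescue (Russ is Director of Product Management at InfluxData). He spotted that some entries in the JSON had no year value. Another learning experience for me: if Telegraf hits an error in the JSON, it stops... and writes nothing!
So what to do? Well, I cheated... I copied the JSON file, deleted the offending entries, saved it, and went back to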
files = ["mynewdata.JSON"]
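The manual cleanup can also be scripted. Here is a minimal sketch, assuming the dataset has been downloaded as a JSON array of objects; the source file name meteorites.json and both function names are hypothetical, while mynewdata.JSON matches the file name used above.

```python
import json


def drop_records_without(records, key="year"):
    """Keep only the records that have a non-empty value for key."""
    return [rec for rec in records if rec.get(key)]


def clean_file(src, dst, key="year"):
    """Copy src to dst without the incomplete records; return how many were dropped."""
    with open(src, encoding="utf-8") as handle:
        records = json.load(handle)
    kept = drop_records_without(records, key)
    with open(dst, "w", encoding="utf-8") as handle:
        json.dump(kept, handle)
    return len(records) - len(kept)


# Example call (file names are placeholders):
#   dropped = clean_file("meteorites.json", "mynewdata.JSON")

sample = [
    {"name": "Aachen", "year": "1880-01-01T00:00:00.000"},
    {"name": "No year recorded"},  # made-up record to show the filter
]
print(drop_records_without(sample))
```

This keeps the original download untouched and makes it easy to see how many entries were missing a year.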
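Step 2 – Get the data...
Success... Telegraf got the data from the JSON into Cloud 2.0. I then killed the telegraf process after 15 seconds, since there was no need to load the same file again. I also logged a few issues against Telegraf.
Step 3 – Visualize the data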
from(bucket: "Meteroite3")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "file")
|> filter(fn: (r) => r._field == "mass")
|> aggregateWindow(every: v.windowPeriod, fn: sum)
|> yield(name: "sum")
from(bucket: "Meteroite3")
|> range(start: -100y)
|> filter(fn: (r) => r._field == "mass")
|> group()
|> aggregateWindow(every: 1y, fn: sum, createEmpty: false)
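As a sanity check on what the yearly aggregation computes, here is a small Python sketch that sums mass per calendar year over raw records. It is an illustration of the idea only, not Flux: it ignores the range() filter, and every sample record except Aachen is made up for the example.

```python
from collections import defaultdict


def mass_per_year(records):
    """Sum the mass field per calendar year, skipping incomplete records."""
    totals = defaultdict(float)
    for rec in records:
        if not rec.get("year") or not rec.get("mass"):
            continue  # nothing to aggregate for this record
        year = int(rec["year"][:4])
        totals[year] += float(rec["mass"])
    return dict(sorted(totals.items()))


sample = [
    {"name": "Aachen", "mass": "21", "year": "1880-01-01T00:00:00.000"},
    {"name": "Made-up A", "mass": "100", "year": "1951-01-01T00:00:00.000"},
    {"name": "Made-up B", "mass": "50", "year": "1951-01-01T00:00:00.000"},
    {"name": "Made-up C", "mass": "7"},  # no year, so it is skipped
]
print(mass_per_year(sample))
```

Swapping the running sum for a counter gives the per-year event count that the count variant of the query produces.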
Step 4 – Predict the future!
from(bucket: "Meteroite3")
|> range(start: -100y, stop:-8y)
|> filter(fn: (r) => r._field == "mass")
|> group()
|> aggregateWindow(every: 1y, fn: count, createEmpty: false)
|> holtWinters(n: 10, seasonality: 0, interval: 1y, withFit: true)
|> yield(name:"prediction")
"为了使用三重指数平滑(Holt-Winters)或双指数平滑,您的数据需要表现出趋势和季节性,或者只是趋势。您的数据没有明显的季节性或趋势。" – Anais Dotis-Georgiou