Using InfluxDB to Predict the Next Extinction Event

By Mark Herring / Product, Use Cases, Developer
Oct 21, 2019

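An alternative title could have been "How to Ingest JSON into InfluxDB Cloud 2.0", but that sounded far too boring!

OK, maybe the title is a bit of a stretch, but I wanted to use my new InfluxDB Cloud account to ingest some real data (I chose meteorites, hence the title) and see how quickly I could visualize it. Full disclosure: I had never tried this before!

I have split this post into two main sections:

Section 1: The quick way to awesome
Section 2: The step-by-step reality

Section 1: The quick way to awesome

Just like on those cooking shows, here are some ingredients... ta-da, the cake is done...

Step 1 – Find the data

I looked for some interesting datasets and picked Earth meteorite landings. Why? I wanted to predict the end of the world, and what better way than a meteorite destroying the Earth? Hey, maybe that should be a movie... oh yes, it already is.

The data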

{
  "name": "Aachen",
  "id": "1",
  "nametype": "Valid",
  "recclass": "L5",
  "mass": "21",
  "fall": "Fell",
  "year": "1880-01-01T00:00:00.000",
  "reclat": "50.775000",
  "reclong": "6.083330",
  "geolocation": {
    "type": "Point",
    "coordinates": [
      6.08333,
      50.775
    ]
  }
},

Step 2 – Parse the JSON with Telegraf

Next step: configure Telegraf to connect to the dataset and pull out the interesting data. Here is a snippet of my Telegraf config file:

[[inputs.http]]
  interval = "10s"

  ## One or more URLs from which to read formatted metrics
  urls = [
    "https://data.nasa.gov/resource/y77d-th95.json"
  ]

  name_override = "meteorevent"

  tagexclude = ["url"]

  ## HTTP method
  method = "GET"

  ## Assumed, not shown in the original snippet: select the JSON parser so
  ## that the json_* options below apply.
  data_format = "json"

  ## Tag keys is an array of keys that should be added as tags.
  tag_keys = [
    "recclass"
  ]

  ## String fields is an array of keys that should be added as string fields.
  json_string_fields = [
    "fall",
    "name",
    "mass"
  ]

  ## Name key is the key to use as the measurement name.
  json_name_key = ""

  ## Time key is the key containing the time that should be used to create the
  ## metric.
  json_time_key = "year"

  ## Time format is the time layout that should be used to interpret the json_time_key.
  ## The time must be `unix`, `unix_ms`, `unix_us`, `unix_ns`, or a time in the
  ## "reference time".  To define a different format, arrange the values from
  ## the "reference time" in the example to match the format you will be
  ## using.  For more information on the "reference time", visit
  ## https://golang.ac.cn/pkg/time/#Time.Format
  ##   ex: json_time_format = "Mon Jan 2 15:04:05 -0700 MST 2006"
  ##       json_time_format = "2006-01-02T15:04:05Z07:00"
  ##       json_time_format = "01/02/2006 15:04:05"
  ##       json_time_format = "unix"
  ##       json_time_format = "unix_ms"
  json_time_format = "2006-01-02T15:04:05.000"

  ## Timezone allows you to provide an override for timestamps that
  ## don't already include an offset
  ## e.g. 04/06/2016 12:41:45
  ##
  ## Default: "" which renders UTC
  ## Options are as follows:
  ##   1. Local               -- interpret based on machine localtime
  ##   2. "America/New_York"  -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
  ##   3. UTC                 -- or blank/unspecified, will return timestamp in UTC
  #json_timezone = ""

## Convert the "mass" string field into an integer field.
[[processors.converter]]
  [processors.converter.fields]
    integer = ["mass"]
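
To make the config above more concrete, here is a minimal Python sketch of roughly what it asks Telegraf to do with one record: `recclass` becomes a tag, `fall`, `name` and `mass` become fields (with `mass` converted to an integer), and `year` becomes the timestamp. This is only an illustration, not Telegraf's code. The function names are hypothetical, the sketch covers only the keys named in the config, and truncating fractional masses is my assumption.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def escape_tag(value):
    # Line protocol requires commas, spaces and equals signs in tags to be escaped.
    return value.replace(",", "\\,").replace(" ", "\\ ").replace("=", "\\=")


def quote_string(value):
    # String field values are double-quoted; backslashes and quotes are escaped.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line_protocol(record, measurement="meteorevent"):
    # json_time_key / json_time_format: read "year" as a UTC timestamp.
    when = datetime.strptime(record["year"], "%Y-%m-%dT%H:%M:%S.%f")
    delta = when.replace(tzinfo=timezone.utc) - EPOCH
    # Integer arithmetic keeps pre-1970 (negative) timestamps exact.
    nanos = (delta.days * 86400 + delta.seconds) * 10**9 + delta.microseconds * 1000
    # tag_keys: recclass is stored as a tag.
    tags = "recclass=" + escape_tag(record["recclass"])
    # json_string_fields keep fall, name and mass; the converter step turns
    # mass into an integer (truncation of fractions is an assumption here).
    fields = ",".join([
        "fall=" + quote_string(record["fall"]),
        "name=" + quote_string(record["name"]),
        "mass=" + str(int(float(record["mass"]))) + "i",
    ])
    return f"{measurement},{tags} {fields} {nanos}"


sample = {"name": "Aachen", "recclass": "L5", "mass": "21", "fall": "Fell",
          "year": "1880-01-01T00:00:00.000"}
print(to_line_protocol(sample))
```

Note that a fall in 1880 predates the Unix epoch, so the timestamp comes out negative.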

Step 3 – Send the data to InfluxDB Cloud with Telegraf

Set your token, then enter your organization and the bucket where you want the data to land:

[[outputs.influxdb_v2]]
  ## The URLs of the InfluxDB cluster nodes.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  ## urls exp: http://127.0.0.1:9999
  urls = ["https://PUT IN YOUR URL.influxdata.com"]

  ## Token for authentication.
  token = "$INFLUX_TOKEN"

  ## Organization is the name of the organization you wish to write to; must exist.
  organization = "PUT IN YOUR ORGANIZATION"

  ## Destination bucket to write into.
  bucket = "Meteroite3"

Then start Telegraf:

telegraf --config mytelegraf.conf

Step 4 – Visualize the data

from(bucket: "Meteroite3")
  |> range(start: -100y)
  |> filter(fn: (r) => r._field == "mass")
  |> group()
  |> aggregateWindow(every: 1y, fn: count, createEmpty: false)

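And display the results.

Wow, wasn't that easy!

Section 2: The step-by-step reality

I also decided to tell you what my real journey looked like. I hope it helps others who are on the same journey, because the platform was not as easy as the glossy marketing literature I had read made it out to be, but I know it will be much easier next time.

Step 0 – Create a Cloud 2.0 account

This was really easy: I signed up for InfluxDB Cloud, verified my email, and logged in. I was inspired by Samantha Wang's blog but wanted to use "my own" dataset. I picked the dataset described above and was ready to go.

Step 1 – Configure Telegraf

Installing Telegraf was very easy; the documentation was very clear and it was up and running quickly. Now all I needed to do was configure it! Back to Cloud to get data from JSON.

Loved the "Load your data" visualization... but where was "Load from JSON"? Grrr.. Back to the docs, over to GitHub, and now I was going deeper and deeper down the rabbit hole!

On GitHub, I read up on the JSON parser plugin and created my first config file for Telegraf.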

[[inputs.file]]
  files = ["example"]

  name_override = "meteorevent"

  tagexclude = ["url"]

  ## Assumed, not shown in the original snippet: select the JSON parser so
  ## that the json_* options below apply.
  data_format = "json"

  ## Tag keys is an array of keys that should be added as tags.
  tag_keys = [
    "recclass"
  ]

  ## String fields is an array of keys that should be added as string fields.
  json_string_fields = [
    "fall",
    "name",
    "mass"
  ]

  ## Name key is the key to use as the measurement name.
  json_name_key = ""

  ## Time key is the key containing the time that should be used to create the
  ## metric.
  json_time_key = "year"

  ## Time format is the time layout that should be used to interpret the json_time_key.
  ## The time must be `unix`, `unix_ms`, `unix_us`, `unix_ns`, or a time in the
  ## "reference time".  To define a different format, arrange the values from
  ## the "reference time" in the example to match the format you will be
  ## using.  For more information on the "reference time", visit
  ## https://golang.ac.cn/pkg/time/#Time.Format
  ##   ex: json_time_format = "Mon Jan 2 15:04:05 -0700 MST 2006"
  ##       json_time_format = "2006-01-02T15:04:05Z07:00"
  ##       json_time_format = "01/02/2006 15:04:05"
  ##       json_time_format = "unix"
  ##       json_time_format = "unix_ms"
  json_time_format = "2006-01-02T15:04:05.000"

  ## Timezone allows you to provide an override for timestamps that
  ## don't already include an offset
  ## e.g. 04/06/2016 12:41:45
  ##
  ## Default: "" which renders UTC
  ## Options are as follows:
  ##   1. Local               -- interpret based on machine localtime
  ##   2. "America/New_York"  -- Unix TZ values like those found in https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
  ##   3. UTC                 -- or blank/unspecified, will return timestamp in UTC
  #json_timezone = ""

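Oh no, I didn't have a file; mine was a URL. I thought about "cheating", but no, I wanted this to be a "real example", so I figured out how to pull data from a URL. I updated the top of my config file: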

[[inputs.http]]
  interval = "10s"

  ## One or more URLs from which to read formatted metrics
  urls = [
    "https://data.nasa.gov/resource/y77d-th95.json"
  ]

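Awesome! Data was coming in (I hoped). Oh, but how do I tell it to go to my Cloud instance? Back to the docs, where I found information on manually configuring Telegraf.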

[[outputs.influxdb_v2]]
  urls = ["put in your URL"]
  token = "$INFLUX_TOKEN"
  organization = "orgname"
  bucket = "example-bucket"

Back to Cloud: I created my bucket and created my token. OK, now I was really ready.

Start Telegraf:

telegraf --config mytelegraf.conf

And then…

ERROR:
Error in plugin: [url=https://data.nasa.gov/resource/y77d-th95.json]: parsing time "1880-01-01T00:00:00.000" as "unix": cannot parse "1880-01-01T00:00:00.000" as "unix"

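OK, I knew my

json_time_format = "2006-01-02T15:04:05Z07:00"

was wrong, but what should it be? From the JSON, I could see that the value looked like

"year": "1880-01-01T00:00:00.000",

But what should I change json_time_format = "2006-01-02T15:04:05Z07:00" to?

I think I tried everything... so I "phoned a friend", David McKay, a brilliant developer relations expert who happened to be just a Slack message away. He replied with

json_time_format = "2006-01-02T15:04:05.000"

OK, I have no idea how he got that... I couldn't see it in the docs, but beyond marveling at his brilliance, I was eager to move on... He also offered a suggestion for my config file.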

[[processors.converter]]
 [processors.converter.fields]
   integer = ["mass"]
   float = ["reclat", "reclong"]
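
For readers wondering where that layout string comes from: Telegraf uses Go's time formatting convention, in which a layout is Go's reference time (Mon Jan 2 15:04:05 MST 2006) rewritten in the shape of your data. The layout ending in `Z07:00` asks for a UTC offset, which the NASA timestamps do not carry, while `.000` matches three digits of fractional seconds. The Python sketch below is only an illustration of that mapping; the `go_layout_to_strptime` helper is hypothetical and covers just the tokens needed for this dataset.

```python
from datetime import datetime

# Go layout tokens (pieces of the reference time) and rough strptime counterparts.
# Only the tokens needed for this dataset are covered.
GO_TO_STRPTIME = [
    ("2006", "%Y"),   # four-digit year
    ("01", "%m"),     # month
    ("02", "%d"),     # day of month
    ("15", "%H"),     # hour, 24-hour clock
    ("04", "%M"),     # minute
    ("05", "%S"),     # second
    (".000", ".%f"),  # fractional seconds (Go: exactly 3 digits; %f: 1 to 6)
]


def go_layout_to_strptime(layout):
    # Replace each reference-time token with its strptime directive, in order.
    for go_token, py_token in GO_TO_STRPTIME:
        layout = layout.replace(go_token, py_token)
    return layout


layout = "2006-01-02T15:04:05.000"
fmt = go_layout_to_strptime(layout)
parsed = datetime.strptime("1880-01-01T00:00:00.000", fmt)
print(fmt, parsed.isoformat())
```

Reading the layout token by token against a sample value from the JSON is usually enough to work out the right string for a new dataset.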

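I updated my config... here we go!

Oops, another error

[inputs.http]: Error in plugin: [url=https://data.nasa.gov/resource/y77d-th95.json]: JSON time key could not be found

Checked Cloud... no data! Grrrrrr.

OK, I was so close... what to do? I briefly thought about giving up, but no, I wasn't going to do that! Time to "phone another friend" (side note: why not call David again? Well, I figured that if I spread my requests in parallel across the whole organization, nobody would realize just how "needy" I was). But I'm a glutton for punishment, so it is all revealed in this blog! Russ Savage to the rescue (Russ is Director of Product Management at InfluxData). He found that some entries in the JSON had no year value. Another learning experience for me: if Telegraf finds an error in the JSON, it stops... and writes nothing!

So what to do? Well, I cheated... I copied the JSON file, deleted the offending entries, saved it, and went back to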

[[inputs.file]]
  files = ["mynewdata.JSON"]
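
The manual cleanup described above can also be scripted. Here is a minimal Python sketch, assuming the dataset has already been downloaded to a local file; the `meteorites.json` source name and both function names are hypothetical, and only `mynewdata.JSON` comes from the config above.

```python
import json


def split_by_time_key(records, time_key="year"):
    # Records with a non-empty time key are safe for Telegraf's json_time_key;
    # the rest are the "offending" entries.
    good = [r for r in records if r.get(time_key)]
    bad = [r for r in records if not r.get(time_key)]
    return good, bad


def clean_file(src, dst, time_key="year"):
    # Read the downloaded JSON array, drop entries without a time key,
    # and write the remainder to a new file.
    with open(src, encoding="utf-8") as fh:
        records = json.load(fh)
    good, bad = split_by_time_key(records, time_key)
    with open(dst, "w", encoding="utf-8") as fh:
        json.dump(good, fh, indent=2)
    return len(good), len(bad)


demo = [
    {"name": "Aachen", "year": "1880-01-01T00:00:00.000"},
    {"name": "No year here"},
]
good, bad = split_by_time_key(demo)
print(len(good), "kept,", len(bad), "dropped")
# Hypothetical usage: clean_file("meteorites.json", "mynewdata.JSON")
```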

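Step 2 – Get the data…

Success... Telegraf got the data from the JSON into Cloud 2.0. I then killed the telegraf process after 15 seconds; there was no need to run the same file over and over. I also logged a few issues against Telegraf.

Step 3 – Visualize the data

This was almost the easiest part. I say "almost" because I tried the Data Explorer first.

Newbie tip: set the time range to something much longer than the past hour! My data is old!

After that change, wow... I had data!

What I learned is that the Data Explorer is nice, but probably too smart for me.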

from(bucket: "Meteroite3")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "file")
  |> filter(fn: (r) => r._field == "mass")
  |> aggregateWindow(every: v.windowPeriod, fn: sum)
  |> yield(name: "sum")

I simplified that script to

from(bucket: "Meteroite3")
  |> range(start: -100y)
  |> filter(fn: (r) => r._field == "mass")
  |> group()
  |> aggregateWindow(every: 1y, fn: sum, createEmpty: false)
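
To see what the simplified query computes, here is a rough Python equivalent: `group()` merges everything into one series, and `aggregateWindow(every: 1y, ..., createEmpty: false)` yields one value per year while skipping years with no data. This is a sketch under assumptions, not Flux's implementation: it works on the raw JSON records, ignores the `range()` cutoff, and the `aggregate_by_year` name is hypothetical.

```python
from collections import defaultdict


def aggregate_by_year(records, fn):
    # Bucket integer masses by the calendar year of each record.
    buckets = defaultdict(list)
    for record in records:
        if not record.get("year") or not record.get("mass"):
            continue  # skip entries that lack a timestamp or a mass
        year = int(record["year"][:4])
        buckets[year].append(int(float(record["mass"])))
    # Years without records never appear, like createEmpty: false.
    return {year: fn(values) for year, values in sorted(buckets.items())}


records = [
    {"year": "1880-01-01T00:00:00.000", "mass": "21"},
    {"year": "1880-01-01T00:00:00.000", "mass": "9"},
    {"year": "1951-01-01T00:00:00.000", "mass": "720"},
]
print(aggregate_by_year(records, sum))  # total mass per year
print(aggregate_by_year(records, len))  # number of falls per year
```

Passing `sum` mirrors the query above; passing `len` mirrors the `fn: count` variant used elsewhere in this post.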

Step 4 – Predict the future!

OK, so I decided to use Holt-Winters to predict the future

from(bucket: "Meteroite3")
  |> range(start: -100y, stop: -8y)
  |> filter(fn: (r) => r._field == "mass")
  |> group()
  |> aggregateWindow(every: 1y, fn: count, createEmpty: false)
  |> holtWinters(n: 10, seasonality: 0, interval: 1y, withFit: true)
  |> yield(name: "prediction")

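So it shows we are all going to live... don't worry!

I did speak with our ML expert, and this is what she said:

"In order to use triple exponential smoothing (Holt-Winters) or double exponential smoothing, your data needs to exhibit trend and seasonality, or just trend, respectively. Your data doesn't have any obvious seasonality or trend." – Anais Dotis-Georgiou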
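
To illustrate the point about trend, here is a minimal Python sketch of double exponential smoothing (Holt's linear method). It is not what Flux's `holtWinters()` does internally: the fixed `alpha` and `beta` weights, the initialization and the function name are all assumptions made for illustration.

```python
def holt_linear_forecast(series, horizon, alpha=0.5, beta=0.5):
    """Forecast `horizon` steps ahead with Holt's linear method (fixed weights)."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Start from the first value and the first observed step.
    level = float(series[0])
    trend = float(series[1] - series[0])
    for value in series[1:]:
        previous_level = level
        # Blend the new observation with the previous one-step-ahead estimate.
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        # Blend the latest change in level with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Extrapolate the final level along the final trend.
    return [level + step * trend for step in range(1, horizon + 1)]


trending = [1, 3, 5, 7, 9]   # a clear upward trend
flat = [4, 4, 4, 4]          # no trend at all
print(holt_linear_forecast(trending, 3))
print(holt_linear_forecast(flat, 2))
```

With a steady trend the forecast keeps climbing; with trendless data the method has nothing to extrapolate, which is the situation the quote describes for the meteorite counts.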

Thanks to everyone who helped me predict the next extinction event!
