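Powerful performance, limitless scale
Collect, organize, and act on massive volumes of high-velocity data. Any data is more valuable when you treat it as time series data. InfluxDB is the #1 time series platform, built to scale with Telegraf.
See ways to get started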
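Apache Mesos is an open source project for managing computer clusters. It abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), so that fault-tolerant and elastic distributed systems can be built and run effectively.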
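Why use the Apache Mesos Telegraf plugin?
The Apache Mesos Telegraf plugin lets you collect the observability metrics exposed by Mesos master and agent nodes and insert them into your InfluxDB instance. The metrics it collects let cluster operators monitor resource usage and catch problems early, before they affect workloads.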
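How to monitor Apache Mesos using the Telegraf plugin
The Apache Mesos Telegraf plugin collects metrics from Apache Mesos and inserts them into InfluxDB. By default, the plugin is not configured to collect from any Mesos node, because clusters can be deployed in many different ways. You need to specify the master and agent (slave) nodes that the plugin should collect metrics from.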
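To illustrate why the node list matters: Mesos masters and agents each serve their metrics as a flat JSON object at the `/metrics/snapshot` HTTP endpoint, so a collector has to be told which nodes to poll. The sketch below is not the plugin's implementation. The helper names `snapshot_urls` and `fetch_snapshot` and the example hosts are hypothetical, and the ports 5050 and 5051 are only the usual Mesos defaults.

```python
import json
from urllib.request import urlopen


def snapshot_urls(masters, agents):
    """Build the (role, URL) pair for every configured master and agent."""
    urls = []
    for role, nodes in (("master", masters), ("agent", agents)):
        for node in nodes:
            # Tolerate a trailing slash in the configured base address.
            urls.append((role, node.rstrip("/") + "/metrics/snapshot"))
    return urls


def fetch_snapshot(url, timeout=5.0):
    """Fetch one snapshot: a JSON object mapping metric name to a number."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical hosts; replace them with the nodes of your own cluster.
    for role, url in snapshot_urls(["http://master1:5050"], ["http://agent1:5051"]):
        print(role, url)
```

Calling `fetch_snapshot` on one of these URLs against a reachable node should return a dictionary keyed by metric names such as the ones listed in the following sections.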
Key Apache Mesos metrics to monitor
Some of the important Apache Mesos metrics you should proactively monitor include:
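Resources
master/cpus_percent: Percentage of allocated CPUs
master/cpus_used: Number of allocated CPUs
master/cpus_total: Number of CPUs
master/cpus_revocable_percent: Percentage of allocated revocable CPUs
master/cpus_revocable_total: Number of revocable CPUs
master/cpus_revocable_used: Number of allocated revocable CPUs
master/disk_percent: Percentage of allocated disk space
master/disk_used: Allocated disk space in MB
master/disk_total: Disk space in MB
master/disk_revocable_percent: Percentage of allocated revocable disk space
master/disk_revocable_total: Revocable disk space in MB
master/disk_revocable_used: Allocated revocable disk space in MB
master/gpus_percent: Percentage of allocated GPUs
master/gpus_used: Number of allocated GPUs
master/gpus_total: Number of GPUs
master/gpus_revocable_percent: Percentage of allocated revocable GPUs
master/gpus_revocable_total: Number of revocable GPUs
master/gpus_revocable_used: Number of allocated revocable GPUs
master/mem_percent: Percentage of allocated memory
master/mem_used: Allocated memory in MB
master/mem_total: Memory in MB
master/mem_revocable_percent: Percentage of allocated revocable memory
master/mem_revocable_total: Revocable memory in MB
master/mem_revocable_used: Allocated revocable memory in MB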
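As an example of putting the resource metrics to work, the sketch below flags resources whose allocation is close to capacity. It assumes `snapshot` is a dictionary of metric names to numbers, and it derives the fraction from the `_used` and `_total` values instead of relying on the scale of the `_percent` metrics. The function name `over_allocated` and the 0.9 threshold are illustrative choices, not part of Mesos or Telegraf.

```python
def over_allocated(snapshot, threshold=0.9,
                   resources=("cpus", "mem", "disk", "gpus")):
    """Return {resource: allocated fraction} for resources at or above threshold."""
    alerts = {}
    for name in resources:
        total = snapshot.get(f"master/{name}_total", 0)
        used = snapshot.get(f"master/{name}_used", 0)
        # Skip resources the cluster does not have (total of zero).
        if total > 0 and used / total >= threshold:
            alerts[name] = used / total
    return alerts


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    sample = {
        "master/cpus_total": 16, "master/cpus_used": 15,
        "master/mem_total": 65536, "master/mem_used": 8192,
    }
    print(over_allocated(sample))
```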
Master
master/elected: Whether this is the elected master
master/uptime_secs: Uptime in seconds
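In a multi-master deployment, `master/elected` is a natural health signal: exactly one master should report itself as elected. The following sketch assumes you have already gathered one snapshot dictionary per master and that an elected master reports the value 1. The function names and host labels are hypothetical.

```python
def elected_masters(snapshots):
    """Return the hosts whose snapshot reports master/elected == 1."""
    return [host for host, snap in snapshots.items()
            if snap.get("master/elected") == 1]


def leadership_status(snapshots):
    """Classify the cluster as 'ok', 'no leader', or 'multiple leaders'."""
    leaders = elected_masters(snapshots)
    if len(leaders) == 1:
        return "ok"
    return "no leader" if not leaders else "multiple leaders"


if __name__ == "__main__":
    # Made-up snapshots keyed by a host label.
    cluster = {
        "master1": {"master/elected": 1, "master/uptime_secs": 86400.0},
        "master2": {"master/elected": 0, "master/uptime_secs": 86390.0},
    }
    print(leadership_status(cluster))
```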
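System
system/cpus_total: Number of CPUs available on this master node
system/load_15min: Load average for the past 15 minutes
system/load_5min: Load average for the past 5 minutes
system/load_1min: Load average for the past minute
system/mem_free_bytes: Free memory in bytes
system/mem_total_bytes: Total memory in bytes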
Agents
master/slave_registrations
master/slave_removals
master/slave_reregistrations
master/slave_shutdowns_scheduled
master/slave_shutdowns_canceled
master/slave_shutdowns_completed
master/slaves_active
master/slaves_connected
master/slaves_disconnected
master/slaves_inactive
master/slave_unreachable_canceled
master/slave_unreachable_completed
master/slave_unreachable_scheduled
master/slaves_unreachable
Frameworks
master/frameworks_active
master/frameworks_connected
master/frameworks_disconnected
master/frameworks_inactive
master/outstanding_offers
Framework offers
master/frameworks/subscribed
master/frameworks/calls_total
master/frameworks/calls
master/frameworks/events_total
master/frameworks/events
master/frameworks/operations_total
master/frameworks/operations
master/frameworks/tasks/active
master/frameworks/tasks/terminal
master/frameworks/offers/sent
master/frameworks/offers/accepted
master/frameworks/offers/declined
master/frameworks/offers/rescinded
master/frameworks/roles/suppressed
Tasks
master/tasks_error
master/tasks_failed
master/tasks_finished
master/tasks_killed
master/tasks_lost
master/tasks_running
master/tasks_staging
master/tasks_starting
master/tasks_dropped
master/tasks_gone
master/tasks_gone_by_operator
master/tasks_killing
master/tasks_unreachable
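Several of the task metrics, such as `master/tasks_failed` and `master/tasks_finished`, are cumulative counts, so the change between two samples is usually more informative than the raw value. The sketch below computes the share of failed tasks among tasks that reached a terminal state between two snapshots. It assumes these metrics behave as counters that restart from zero when the master restarts. The helper names and the choice of which metrics count as terminal are illustrative.

```python
TERMINAL_COUNTERS = (
    "master/tasks_finished",
    "master/tasks_failed",
    "master/tasks_killed",
    "master/tasks_lost",
    "master/tasks_error",
)


def counter_delta(previous, current, name):
    """Increase of a counter between two snapshots, tolerating a reset."""
    before = previous.get(name, 0)
    after = current.get(name, 0)
    # A smaller value than before suggests the counter restarted from zero.
    return after - before if after >= before else after


def failure_ratio(previous, current):
    """Fraction of newly terminated tasks that failed, or None if none ended."""
    deltas = {n: counter_delta(previous, current, n) for n in TERMINAL_COUNTERS}
    total = sum(deltas.values())
    if total == 0:
        return None
    return deltas["master/tasks_failed"] / total
```

Returning `None` when no task terminated avoids a division by zero and lets the caller distinguish "no data" from "no failures".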
Messages
master/invalid_executor_to_framework_messages
master/invalid_framework_to_executor_messages
master/invalid_status_update_acknowledgements
master/invalid_status_updates
master/dropped_messages
master/messages_authenticate
master/messages_deactivate_framework
master/messages_decline_offers
master/messages_executor_to_framework
master/messages_exited_executor
master/messages_framework_to_executor
master/messages_kill_task
master/messages_launch_tasks
master/messages_reconcile_tasks
master/messages_register_framework
master/messages_register_slave
master/messages_reregister_framework
master/messages_reregister_slave
master/messages_resource_request
master/messages_revive_offers
master/messages_status_update
master/messages_status_update_acknowledgement
master/messages_unregister_framework
master/messages_unregister_slave
master/messages_update_slave
master/recovery_slave_removals
master/slave_removals/reason_registered
master/slave_removals/reason_unhealthy
master/slave_removals/reason_unregistered
master/valid_framework_to_executor_messages
master/valid_status_update_acknowledgements
master/valid_status_updates
master/task_lost/source_master/reason_invalid_offers
master/task_lost/source_master/reason_slave_removed
master/task_lost/source_slave/reason_executor_terminated
master/valid_executor_to_framework_messages
master/invalid_operation_status_update_acknowledgements
master/messages_operation_status_update_acknowledgement
master/messages_reconcile_operations
master/messages_suppress_offers
master/valid_operation_status_update_acknowledgements
evqueue
master/event_queue_dispatches
master/event_queue_http_requests
master/event_queue_messages
master/operator_event_stream_subscribers
Registrar
registrar/state_fetch_ms
registrar/state_store_ms
registrar/state_store_ms/max
registrar/state_store_ms/min
registrar/state_store_ms/p50
registrar/state_store_ms/p90
registrar/state_store_ms/p95
registrar/state_store_ms/p99
registrar/state_store_ms/p999
registrar/state_store_ms/p9999
registrar/state_store_ms/count
registrar/log/ensemble_size
registrar/log/recovered
registrar/queued_operations
registrar/registry_size_bytes
Allocator
allocator/allocation_run_ms
allocator/allocation_run_ms/count
allocator/allocation_run_ms/max
allocator/allocation_run_ms/min
allocator/allocation_run_ms/p50
allocator/allocation_run_ms/p90
allocator/allocation_run_ms/p95
allocator/allocation_run_ms/p99
allocator/allocation_run_ms/p999
allocator/allocation_run_ms/p9999
allocator/allocation_runs
allocator/allocation_run_latency_ms
allocator/allocation_run_latency_ms/count
allocator/allocation_run_latency_ms/max
allocator/allocation_run_latency_ms/min
allocator/allocation_run_latency_ms/p50
allocator/allocation_run_latency_ms/p90
allocator/allocation_run_latency_ms/p95
allocator/allocation_run_latency_ms/p99
allocator/allocation_run_latency_ms/p999
allocator/allocation_run_latency_ms/p9999
allocator/roles/shares/dominant
allocator/event_queue_dispatches
allocator/offer_filters/roles/active
allocator/quota/roles/resources/offered_or_allocated
allocator/quota/roles/resources/guarantee
allocator/resources/cpus/offered_or_allocated
allocator/resources/cpus/total
allocator/resources/disk/offered_or_allocated
allocator/resources/disk/total
allocator/resources/mem/offered_or_allocated
allocator/resources/mem/total
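The registrar and allocator timing metrics come as a base name plus statistic suffixes (`count`, `min`, `max`, `p50` through `p9999`). When working with a raw snapshot dictionary, it can help to regroup them by base name. This is a minimal sketch. The function name `group_timer_stats` is hypothetical and the suffix set is taken only from the metric names listed above.

```python
STAT_SUFFIXES = {"count", "max", "min", "p50", "p90", "p95", "p99", "p999", "p9999"}


def group_timer_stats(snapshot):
    """Group 'base/suffix' metrics into {base: {suffix: value}}."""
    grouped = {}
    for name, value in snapshot.items():
        # Split on the last slash: 'a/b/p99' -> base 'a/b', suffix 'p99'.
        base, _, suffix = name.rpartition("/")
        if base and suffix in STAT_SUFFIXES:
            grouped.setdefault(base, {})[suffix] = value
    return grouped


if __name__ == "__main__":
    # Made-up sample values for illustration only.
    sample = {
        "allocator/allocation_run_ms": 1.2,
        "allocator/allocation_run_ms/p99": 4.5,
        "allocator/allocation_run_ms/max": 9.0,
        "registrar/state_store_ms/p50": 2.0,
    }
    print(group_timer_stats(sample))
```

Metrics without a statistic suffix, such as `allocator/allocation_runs`, are left out of the grouped result.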
Mesos agent metric groups
- Resources
slave/cpus_percent
slave/cpus_used
slave/cpus_total
slave/cpus_revocable_percent
slave/cpus_revocable_total
slave/cpus_revocable_used
slave/disk_percent
slave/disk_used
slave/disk_total
slave/disk_revocable_percent
slave/disk_revocable_total
slave/disk_revocable_used
slave/gpus_percent
slave/gpus_used
slave/gpus_total
slave/gpus_revocable_percent
slave/gpus_revocable_total
slave/gpus_revocable_used
slave/mem_percent
slave/mem_used
slave/mem_total
slave/mem_revocable_percent
slave/mem_revocable_total
slave/mem_revocable_used
- Agent
slave/registered
slave/uptime_secs
- System
system/cpus_total
system/load_15min
system/load_5min
system/load_1min
system/mem_free_bytes
system/mem_total_bytes
- Executors
containerizer/mesos/container_destroy_errors
slave/container_launch_errors
slave/executors_preempted
slave/frameworks_active
slave/executor_directory_max_allowed_age_secs
slave/executors_registering
slave/executors_running
slave/executors_terminated
slave/executors_terminating
slave/recovery_errors
- Tasks
slave/tasks_failed
slave/tasks_finished
slave/tasks_killed
slave/tasks_lost
slave/tasks_running
slave/tasks_staging
slave/tasks_starting
- Messages
slave/invalid_framework_messages
slave/invalid_status_updates
slave/valid_framework_messages
slave/valid_status_updates
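On the agent side, a simple health check can combine `slave/registered` with the error metrics from the executor group. The sketch below assumes `snapshot` is a dictionary of one agent's metrics, that a registered agent reports `slave/registered` as 1, and that the error metrics are cumulative since the agent started. The function name `agent_problems` is hypothetical.

```python
ERROR_METRICS = (
    "slave/recovery_errors",
    "slave/container_launch_errors",
    "containerizer/mesos/container_destroy_errors",
)


def agent_problems(snapshot):
    """Return human-readable findings for one agent snapshot (empty if none)."""
    problems = []
    if snapshot.get("slave/registered") != 1:
        problems.append("agent is not registered with a master")
    for name in ERROR_METRICS:
        # Any non-zero value means at least one error since the agent started.
        if snapshot.get(name, 0) > 0:
            problems.append(f"{name} = {snapshot[name]}")
    return problems
```

Because the error values accumulate, a production check would more likely compare consecutive samples than alert on any non-zero value.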
You can learn more about Apache Mesos metrics on its documentation page.
For more information, please check out the documentation.