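Powerful performance, limitless scale
Collect, organize, and act on massive volumes of high-velocity data. Any data is more valuable when you treat it as time series data. InfluxDB is the #1 time series platform, built to scale with Telegraf.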
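Apache Mesos is an open source project for managing computer clusters. It abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), so that fault-tolerant and elastic distributed systems can be built and run effectively.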
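Why use the Apache Mesos Telegraf plugin?
The Apache Mesos Telegraf plugin collects the observability metrics exposed by Mesos master and agent nodes and writes them to your InfluxDB instance. The metrics it gathers let cluster operators monitor resource usage and detect problems early.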
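How to monitor Apache Mesos using the Telegraf plugin
The Apache Mesos Telegraf plugin collects metrics from Apache Mesos and writes them to InfluxDB. Because clusters can be deployed in many different ways, the plugin is not configured to collect metrics from Mesos by default. You need to specify the master and slave nodes that the plugin should collect metrics from.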
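As an illustration of specifying the master and slave nodes, the sketch below renders an `[[inputs.mesos]]` stanza for a Telegraf configuration file. The option names (`timeout`, `masters`, `slaves`) follow the plugin's sample configuration as best understood here, and the URLs and ports are placeholder values, so check both against the plugin documentation for your Telegraf version. The helper `render_mesos_input` is a hypothetical name used only for this example.

```python
import json


def render_mesos_input(masters, slaves=(), timeout_ms=100):
    """Render a Telegraf [[inputs.mesos]] stanza as TOML text.

    Assumption: the plugin accepts `timeout` (milliseconds), `masters`
    and `slaves` options. Verify against your Telegraf version.
    """
    lines = [
        "[[inputs.mesos]]",
        f"  timeout = {int(timeout_ms)}",
        # A JSON array of plain ASCII strings is also a valid TOML array.
        f"  masters = {json.dumps(list(masters))}",
    ]
    if slaves:
        lines.append(f"  slaves = {json.dumps(list(slaves))}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder endpoints; replace with the nodes of your own cluster.
    print(render_mesos_input(["http://localhost:5050"], ["http://localhost:5051"]))
```

The rendered text can be pasted into a Telegraf configuration file. Leaving `slaves` empty omits that line, so only master metrics would be collected.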
Key Apache Mesos metrics to monitor
Some of the important Apache Mesos metrics that you should proactively monitor include:
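Resources
- master/cpus_percent: Percentage of allocated CPUs
- master/cpus_used: Number of allocated CPUs
- master/cpus_total: Number of CPUs
- master/cpus_revocable_percent: Percentage of allocated revocable CPUs
- master/cpus_revocable_total: Number of revocable CPUs
- master/cpus_revocable_used: Number of allocated revocable CPUs
- master/disk_percent: Percentage of allocated disk space
- master/disk_used: Allocated disk space in MB
- master/disk_total: Disk space in MB
- master/disk_revocable_percent: Percentage of allocated revocable disk space
- master/disk_revocable_total: Revocable disk space in MB
- master/disk_revocable_used: Allocated revocable disk space in MB
- master/gpus_percent: Percentage of allocated GPUs
- master/gpus_used: Number of allocated GPUs
- master/gpus_total: Number of GPUs
- master/gpus_revocable_percent: Percentage of allocated revocable GPUs
- master/gpus_revocable_total: Number of revocable GPUs
- master/gpus_revocable_used: Number of allocated revocable GPUs
- master/mem_percent: Percentage of allocated memory
- master/mem_used: Allocated memory in MB
- master/mem_total: Memory in MB
- master/mem_revocable_percent: Percentage of allocated revocable memory
- master/mem_revocable_total: Revocable memory in MB
- master/mem_revocable_used: Allocated revocable memory in MB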
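To show how the `_used` and `_total` resource metrics relate, here is a minimal sketch that derives an allocation ratio for each resource from a flat mapping of metric names to values. The snapshot values are invented for illustration, and `allocation_ratios` is a hypothetical helper, not part of Telegraf or Mesos. Computing the ratio from `_used` and `_total` avoids assuming how the `_percent` values are scaled.

```python
def allocation_ratios(snapshot, prefix="master"):
    """Return used/total per resource, or None when the total is zero or missing."""
    ratios = {}
    for resource in ("cpus", "disk", "gpus", "mem"):
        total = snapshot.get(f"{prefix}/{resource}_total", 0)
        used = snapshot.get(f"{prefix}/{resource}_used", 0)
        # A cluster without GPUs reports a total of 0; avoid dividing by zero.
        ratios[resource] = used / total if total else None
    return ratios


if __name__ == "__main__":
    # Invented sample values, shaped like a flat metrics snapshot.
    sample = {
        "master/cpus_total": 16, "master/cpus_used": 4,
        "master/mem_total": 65536, "master/mem_used": 16384,
        "master/gpus_total": 0, "master/gpus_used": 0,
    }
    print(allocation_ratios(sample))
```

Passing `prefix="slave"` applies the same calculation to the per-node `slave/...` resource metrics.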
Master
- master/elected: Whether this is the elected master
- master/uptime_secs: Uptime in seconds
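System
- system/cpus_total: Number of CPUs available on this master node
- system/load_15min: Load average for the past 15 minutes
- system/load_5min: Load average for the past 5 minutes
- system/load_1min: Load average for the past minute
- system/mem_free_bytes: Free memory in bytes
- system/mem_total_bytes: Total memory in bytes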
Slaves
- master/slave_registrations
- master/slave_removals
- master/slave_reregistrations
- master/slave_shutdowns_scheduled
- master/slave_shutdowns_canceled
- master/slave_shutdowns_completed
- master/slaves_active
- master/slaves_connected
- master/slaves_disconnected
- master/slaves_inactive
- master/slave_unreachable_canceled
- master/slave_unreachable_completed
- master/slave_unreachable_scheduled
- master/slaves_unreachable
Frameworks
- master/frameworks_active
- master/frameworks_connected
- master/frameworks_disconnected
- master/frameworks_inactive
- master/outstanding_offers
Framework offers
- master/frameworks/subscribed
- master/frameworks/calls_total
- master/frameworks/calls
- master/frameworks/events_total
- master/frameworks/events
- master/frameworks/operations_total
- master/frameworks/operations
- master/frameworks/tasks/active
- master/frameworks/tasks/terminal
- master/frameworks/offers/sent
- master/frameworks/offers/accepted
- master/frameworks/offers/declined
- master/frameworks/offers/rescinded
- master/frameworks/roles/suppressed
Tasks
- master/tasks_error
- master/tasks_failed
- master/tasks_finished
- master/tasks_killed
- master/tasks_lost
- master/tasks_running
- master/tasks_staging
- master/tasks_starting
- master/tasks_dropped
- master/tasks_gone
- master/tasks_gone_by_operator
- master/tasks_killing
- master/tasks_unreachable
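One way to use the task metrics is to compare two successive snapshots and derive the share of tasks that ended unsuccessfully in between. The sketch below assumes that the terminal-state metrics (`tasks_finished`, `tasks_failed`, `tasks_killed`, `tasks_lost`, `tasks_error`) are cumulative counters, which is how the Mesos documentation describes them as far as known here. The helper name and sample values are hypothetical.

```python
# Terminal-state task metrics, assumed to be cumulative counters.
TERMINAL = (
    "master/tasks_finished",
    "master/tasks_failed",
    "master/tasks_killed",
    "master/tasks_lost",
    "master/tasks_error",
)


def failure_ratio(previous, current):
    """Share of tasks that ended in a non-finished state between two snapshots.

    Returns None when no task ended, or when a counter went backwards
    (for example after a master restart).
    """
    deltas = {k: current.get(k, 0) - previous.get(k, 0) for k in TERMINAL}
    if any(d < 0 for d in deltas.values()):
        return None
    ended = sum(deltas.values())
    if ended == 0:
        return None
    unsuccessful = ended - deltas["master/tasks_finished"]
    return unsuccessful / ended


if __name__ == "__main__":
    before = {"master/tasks_finished": 100, "master/tasks_failed": 5}
    after = {"master/tasks_finished": 190, "master/tasks_failed": 15}
    print(failure_ratio(before, after))
```

Working on deltas rather than raw values keeps the ratio meaningful for a recent window instead of the whole lifetime of the master process.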
Messages
- master/invalid_executor_to_framework_messages
- master/invalid_framework_to_executor_messages
- master/invalid_status_update_acknowledgements
- master/invalid_status_updates
- master/dropped_messages
- master/messages_authenticate
- master/messages_deactivate_framework
- master/messages_decline_offers
- master/messages_executor_to_framework
- master/messages_exited_executor
- master/messages_framework_to_executor
- master/messages_kill_task
- master/messages_launch_tasks
- master/messages_reconcile_tasks
- master/messages_register_framework
- master/messages_register_slave
- master/messages_reregister_framework
- master/messages_reregister_slave
- master/messages_resource_request
- master/messages_revive_offers
- master/messages_status_update
- master/messages_status_update_acknowledgement
- master/messages_unregister_framework
- master/messages_unregister_slave
- master/messages_update_slave
- master/recovery_slave_removals
- master/slave_removals/reason_registered
- master/slave_removals/reason_unhealthy
- master/slave_removals/reason_unregistered
- master/valid_framework_to_executor_messages
- master/valid_status_update_acknowledgements
- master/valid_status_updates
- master/task_lost/source_master/reason_invalid_offers
- master/task_lost/source_master/reason_slave_removed
- master/task_lost/source_slave/reason_executor_terminated
- master/valid_executor_to_framework_messages
- master/invalid_operation_status_update_acknowledgements
- master/messages_operation_status_update_acknowledgement
- master/messages_reconcile_operations
- master/messages_suppress_offers
- master/valid_operation_status_update_acknowledgements
evqueue
- master/event_queue_dispatches
- master/event_queue_http_requests
- master/event_queue_messages
- master/operator_event_stream_subscribers
Registrar
- registrar/state_fetch_ms
- registrar/state_store_ms
- registrar/state_store_ms/max
- registrar/state_store_ms/min
- registrar/state_store_ms/p50
- registrar/state_store_ms/p90
- registrar/state_store_ms/p95
- registrar/state_store_ms/p99
- registrar/state_store_ms/p999
- registrar/state_store_ms/p9999
- registrar/state_store_ms/count
- registrar/log/ensemble_size
- registrar/log/recovered
- registrar/queued_operations
- registrar/registry_size_bytes
Allocator
- allocator/allocation_run_ms
- allocator/allocation_run_ms/count
- allocator/allocation_run_ms/max
- allocator/allocation_run_ms/min
- allocator/allocation_run_ms/p50
- allocator/allocation_run_ms/p90
- allocator/allocation_run_ms/p95
- allocator/allocation_run_ms/p99
- allocator/allocation_run_ms/p999
- allocator/allocation_run_ms/p9999
- allocator/allocation_runs
- allocator/allocation_run_latency_ms
- allocator/allocation_run_latency_ms/count
- allocator/allocation_run_latency_ms/max
- allocator/allocation_run_latency_ms/min
- allocator/allocation_run_latency_ms/p50
- allocator/allocation_run_latency_ms/p90
- allocator/allocation_run_latency_ms/p95
- allocator/allocation_run_latency_ms/p99
- allocator/allocation_run_latency_ms/p999
- allocator/allocation_run_latency_ms/p9999
- allocator/roles/shares/dominant
- allocator/event_queue_dispatches
- allocator/offer_filters/roles/active
- allocator/quota/roles/resources/offered_or_allocated
- allocator/quota/roles/resources/guarantee
- allocator/resources/cpus/offered_or_allocated
- allocator/resources/cpus/total
- allocator/resources/disk/offered_or_allocated
- allocator/resources/disk/total
- allocator/resources/mem/offered_or_allocated
- allocator/resources/mem/total
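The registrar and allocator timers appear as a base metric name plus suffixes such as `/p50`, `/p99`, and `/count`. As a rough sketch, the helpers below collect those suffixed values for one timer and flag timers whose 99th percentile exceeds a limit. The function names, the sample values, and the 250 ms limit are all hypothetical and chosen only to illustrate the naming pattern.

```python
def timer_summary(snapshot, timer):
    """Collect the suffixed statistics (p50, p99, count, ...) of one timer metric."""
    prefix = timer + "/"
    return {
        name[len(prefix):]: value
        for name, value in snapshot.items()
        if name.startswith(prefix)
    }


def slow_timers(snapshot, timers, p99_limit_ms):
    """Return the timers whose p99 value exceeds the given limit in milliseconds."""
    return [
        timer
        for timer in timers
        if timer_summary(snapshot, timer).get("p99", 0) > p99_limit_ms
    ]


if __name__ == "__main__":
    # Invented sample values for illustration only.
    sample = {
        "allocator/allocation_run_ms": 12.0,
        "allocator/allocation_run_ms/p50": 15.0,
        "allocator/allocation_run_ms/p99": 480.0,
        "registrar/state_store_ms/p99": 40.0,
    }
    timers = ["allocator/allocation_run_ms", "registrar/state_store_ms"]
    print(slow_timers(sample, timers, p99_limit_ms=250.0))
```

Looking at a high percentile rather than the latest single value makes occasional slow allocation runs or registry writes visible even when typical runs are fast.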
Mesos slave metric groups
- Resources
  - slave/cpus_percent
  - slave/cpus_used
  - slave/cpus_total
  - slave/cpus_revocable_percent
  - slave/cpus_revocable_total
  - slave/cpus_revocable_used
  - slave/disk_percent
  - slave/disk_used
  - slave/disk_total
  - slave/disk_revocable_percent
  - slave/disk_revocable_total
  - slave/disk_revocable_used
  - slave/gpus_percent
  - slave/gpus_used
  - slave/gpus_total
  - slave/gpus_revocable_percent
  - slave/gpus_revocable_total
  - slave/gpus_revocable_used
  - slave/mem_percent
  - slave/mem_used
  - slave/mem_total
  - slave/mem_revocable_percent
  - slave/mem_revocable_total
  - slave/mem_revocable_used
- Agent
  - slave/registered
  - slave/uptime_secs
- System
  - system/cpus_total
  - system/load_15min
  - system/load_5min
  - system/load_1min
  - system/mem_free_bytes
  - system/mem_total_bytes
- Executors
  - containerizer/mesos/container_destroy_errors
  - slave/container_launch_errors
  - slave/executors_preempted
  - slave/frameworks_active
  - slave/executor_directory_max_allowed_age_secs
  - slave/executors_registering
  - slave/executors_running
  - slave/executors_terminated
  - slave/executors_terminating
  - slave/recovery_errors
- Tasks
  - slave/tasks_failed
  - slave/tasks_finished
  - slave/tasks_killed
  - slave/tasks_lost
  - slave/tasks_running
  - slave/tasks_staging
  - slave/tasks_starting
- Messages
  - slave/invalid_framework_messages
  - slave/invalid_status_updates
  - slave/valid_framework_messages
  - slave/valid_status_updates
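The metric names above share a small set of prefixes (`master`, `slave`, `system`, `registrar`, `allocator`, `containerizer`). The sketch below groups a flat snapshot by that first path segment, which can help when inspecting what a node reports before wiring it into Telegraf. It assumes the node serves a JSON object of metric names and values at `/metrics/snapshot`, as described in the Mesos documentation. The base URL passed to `fetch_snapshot` is a placeholder, and both function names are hypothetical.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def fetch_snapshot(base_url, timeout_s=5.0):
    """Fetch the flat metrics snapshot from a Mesos master or slave.

    Assumption: the node exposes JSON at <base_url>/metrics/snapshot.
    """
    with urlopen(f"{base_url}/metrics/snapshot", timeout=timeout_s) as resp:
        return json.load(resp)


def group_by_prefix(snapshot):
    """Group metrics by the first path segment of their name."""
    groups = defaultdict(dict)
    for name, value in snapshot.items():
        prefix, _, rest = name.partition("/")
        groups[prefix][rest] = value
    return dict(groups)


if __name__ == "__main__":
    # Invented sample instead of a live request, so the example runs offline.
    sample = {
        "slave/tasks_running": 3,
        "slave/registered": 1,
        "system/load_1min": 0.5,
        "containerizer/mesos/container_destroy_errors": 0,
    }
    for prefix, metrics in sorted(group_by_prefix(sample).items()):
        print(prefix, metrics)
```

Against a live node, `group_by_prefix(fetch_snapshot("http://localhost:5051"))` would give the same kind of grouping, with the address adjusted to your deployment.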
You can learn more about Apache Mesos metrics on its documentation page.
For more information, see the documentation.