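16 - Zabbix Monitoring Configuration Guide
This document describes how to deploy and configure the Zabbix monitoring system to monitor a 3-node Docker cluster.
Overview
Zabbix is an enterprise-grade open-source monitoring solution. It supports:
Host and container monitoring
Network device monitoring
Application monitoring
Alerting and notifications
Architecture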
┌─────────────────────────────────────────────────────────────────┐
│                   manage-net (172.20.5.0/24)                    │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                      Zabbix Server                      │    │
│  │                    172.20.5.31:10051                    │    │
│  └────────────────────────┬────────────────────────────────┘    │
│                           │                                     │
│  ┌────────────────────────┴────────────────────────────────┐    │
│  │                       Zabbix Web                        │    │
│  │                    172.20.5.32:8080                     │    │
│  │                     (Apache + PHP)                      │    │
│  └────────────────────────┬────────────────────────────────┘    │
│                           │                                     │
│  ┌────────────────────────┴────────────────────────────────┐    │
│  │                      Zabbix MySQL                       │    │
│  │                    172.20.5.33:3306                     │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │Agent-Node1  │    │Agent-Node2  │    │Agent-Node3  │          │
│  │172.20.5.41  │    │172.20.5.42  │    │172.20.5.43  │          │
│  └──────┬──────┘    └──────┬──────┘    └──────┬──────┘          │
└─────────┼──────────────────┼──────────────────┼─────────────────┘
          │                  │                  │
   Monitors Node1     Monitors Node2     Monitors Node3
IP Plan
| Component | IP address | Node | Port | Description |
|---|---|---|---|---|
| Zabbix Server | 172.20.5.31 | Node3 | 10051 | Zabbix main service |
| Zabbix Web | 172.20.5.32 | Node3 | 8080 | Web interface |
| Zabbix MySQL | 172.20.5.33 | Node3 | 3306 | Database |
| Zabbix Agent | 172.20.5.41 | Node1 | 10050 | Agent2 |
| Zabbix Agent | 172.20.5.42 | Node2 | 10050 | Agent2 |
| Zabbix Agent | 172.20.5.43 | Node3 | 10050 | Agent2 |
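Once the containers are running, the plan above can be checked mechanically. The following is a minimal sketch, not part of the original procedure: the function names are made up for illustration, it assumes an `nc` that supports `-z` and `-w`, and it has to run from a host or container that can reach manage-net.

```shell
# Print the planned endpoints as "name ip port", one per line.
# The values are copied from the IP plan table.
zabbix_endpoints() {
    cat <<'EOF'
zabbix-server 172.20.5.31 10051
zabbix-web 172.20.5.32 8080
zabbix-mysql 172.20.5.33 3306
agent-node1 172.20.5.41 10050
agent-node2 172.20.5.42 10050
agent-node3 172.20.5.43 10050
EOF
}

# Try a 2-second TCP connect to every endpoint and report OPEN or CLOSED.
# Assumption: the local nc understands -z (connect only) and -w (timeout).
check_endpoints() {
    zabbix_endpoints | while read -r name ip port; do
        if nc -z -w 2 "$ip" "$port" </dev/null >/dev/null 2>&1; then
            echo "OPEN   $name $ip:$port"
        else
            echo "CLOSED $name $ip:$port"
        fi
    done
}
```

Calling `check_endpoints` after the deployment steps gives a quick view of which planned ports actually answer.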
Deployment Steps
Step 1: Create the configuration directories
Run on all nodes:

mkdir -p /opt/cluster-deploy/config/{zabbix,zabbix-mysql}

Step 2: Create the Zabbix MySQL configuration
Run on Node3:

cat > /opt/cluster-deploy/config/zabbix-mysql/my.cnf << 'EOF'
[mysqld]
server-id = 100
bind-address = 0.0.0.0
port = 3306
datadir = /var/lib/mysql
socket = /var/lib/mysql/mysql.sock

log_bin = mysql-bin
binlog_format = ROW
# 7 days; replaces expire_logs_days, which is deprecated in MySQL 8.0
binlog_expire_logs_seconds = 604800

character-set-server = utf8mb4
collation-server = utf8mb4_bin

max_connections = 200
max_allowed_packet = 64M

innodb_buffer_pool_size = 256M
innodb_log_file_size = 64M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT

[client]
socket = /var/lib/mysql/mysql.sock

[mysql]
socket = /var/lib/mysql/mysql.sock
EOF
Step 3: Create the Zabbix Server configuration
Run on Node3:

cat > /opt/cluster-deploy/config/zabbix/zabbix_server.conf << 'EOF'
ListenPort=10051
LogType=console
DBHost=zabbix-mysql
DBPort=3306
DBName=zabbix
DBUser=zabbix
DBPassword=ZabbixStr0ng!Pass
HANodeName=ZabbixServer
NodeAddress=172.20.5.31:10051
EOF
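Step 4: Create the Zabbix Agent configuration
On each node, create the configuration for that node. The three files differ only in Hostname. They are mounted into the container as zabbix_agent2.conf, so they use Agent 2 parameter names: the log line limit is Plugins.Log.MaxLinesPerSecond, not the classic agent's MaxLinesPerSecond.
Node1 Agent configuration

cat > /opt/cluster-deploy/config/zabbix/zabbix_agentd-node1.conf << 'EOF'
Server=172.20.5.31
ServerActive=172.20.5.31
Hostname=Node1-Agent
BufferSend=5
BufferSize=100
Plugins.Log.MaxLinesPerSecond=20
Timeout=10
LogType=console
EOF

Node2 Agent configuration

cat > /opt/cluster-deploy/config/zabbix/zabbix_agentd-node2.conf << 'EOF'
Server=172.20.5.31
ServerActive=172.20.5.31
Hostname=Node2-Agent
BufferSend=5
BufferSize=100
Plugins.Log.MaxLinesPerSecond=20
Timeout=10
LogType=console
EOF

Node3 Agent configuration

cat > /opt/cluster-deploy/config/zabbix/zabbix_agentd-node3.conf << 'EOF'
Server=172.20.5.31
ServerActive=172.20.5.31
Hostname=Node3-Agent
BufferSend=5
BufferSize=100
Plugins.Log.MaxLinesPerSecond=20
Timeout=10
LogType=console
EOF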
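Because the per-node files are identical apart from Hostname, they could also be generated from one template. This is a sketch under assumptions rather than the guide's own method: the function names are hypothetical, and the log limit is written with the Agent 2 parameter name Plugins.Log.MaxLinesPerSecond.

```shell
# Print the Agent 2 configuration for node number $1 (1, 2 or 3).
render_agent_conf() {
    n="$1"
    cat <<EOF
Server=172.20.5.31
ServerActive=172.20.5.31
Hostname=Node${n}-Agent
BufferSend=5
BufferSize=100
Plugins.Log.MaxLinesPerSecond=20
Timeout=10
LogType=console
EOF
}

# Write zabbix_agentd-node<N>.conf for all three nodes into directory $1.
write_agent_confs() {
    conf_dir="$1"
    mkdir -p "$conf_dir" || return 1
    for node in 1 2 3; do
        render_agent_conf "$node" > "$conf_dir/zabbix_agentd-node${node}.conf" || return 1
    done
}
```

For example, `write_agent_confs /opt/cluster-deploy/config/zabbix` produces the same three file names used by the Compose files in the next step. A later change to a shared setting then only has to be made in one place.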
Step 5: Create the Docker Compose files
Node1 Zabbix Agent

cat > /opt/cluster-deploy/docker-compose-zabbix-node1.yml << 'EOF'
services:
  zabbix-agent:
    image: zabbix/zabbix-agent2:alpine-7.0-latest
    container_name: zabbix-agent
    networks:
      manage-net:
        ipv4_address: 172.20.5.41
    volumes:
      - ./config/zabbix/zabbix_agentd-node1.conf:/etc/zabbix/zabbix_agent2.conf:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      - ZBX_SERVER_HOST=172.20.5.31
    restart: unless-stopped

networks:
  manage-net:
    external: true
EOF

Node2 Zabbix Agent

cat > /opt/cluster-deploy/docker-compose-zabbix-node2.yml << 'EOF'
services:
  zabbix-agent:
    image: zabbix/zabbix-agent2:alpine-7.0-latest
    container_name: zabbix-agent
    networks:
      manage-net:
        ipv4_address: 172.20.5.42
    volumes:
      - ./config/zabbix/zabbix_agentd-node2.conf:/etc/zabbix/zabbix_agent2.conf:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      - ZBX_SERVER_HOST=172.20.5.31
    restart: unless-stopped

networks:
  manage-net:
    external: true
EOF
Node3 Zabbix Server + Web + Agent

cat > /opt/cluster-deploy/docker-compose-zabbix-node3.yml << 'EOF'
services:
  zabbix-agent:
    image: zabbix/zabbix-agent2:alpine-7.0-latest
    container_name: zabbix-agent
    networks:
      manage-net:
        ipv4_address: 172.20.5.43
    volumes:
      - ./config/zabbix/zabbix_agentd-node3.conf:/etc/zabbix/zabbix_agent2.conf:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      - ZBX_SERVER_HOST=172.20.5.31
    restart: unless-stopped

  zabbix-mysql:
    image: mysql:8.0
    container_name: zabbix-mysql
    hostname: zabbix-mysql
    networks:
      manage-net:
        ipv4_address: 172.20.5.33
    volumes:
      - zabbix-mysql-data:/var/lib/mysql
      - ./config/zabbix-mysql/my.cnf:/etc/mysql/conf.d/my.cnf:ro
    environment:
      - MYSQL_ROOT_PASSWORD=RootStr0ng!Pass
      - MYSQL_DATABASE=zabbix
      - MYSQL_USER=zabbix
      - MYSQL_PASSWORD=ZabbixStr0ng!Pass
    command:
      - --default-authentication-plugin=mysql_native_password
      # Binary logging is enabled in my.cnf, so the non-SUPER zabbix user
      # needs this to create the schema triggers during initialization
      - --log_bin_trust_function_creators=1
    restart: unless-stopped

  zabbix-server:
    image: zabbix/zabbix-server-mysql:alpine-7.0-latest
    container_name: zabbix-server
    hostname: zabbix-server
    networks:
      manage-net:
        ipv4_address: 172.20.5.31
    volumes:
      - zabbix-server-data:/var/lib/zabbix
      - ./config/zabbix/zabbix_server.conf:/etc/zabbix/zabbix_server.conf:ro
    environment:
      - DB_SERVER_HOST=zabbix-mysql
      - MYSQL_DATABASE=zabbix
      - MYSQL_USER=zabbix
      - MYSQL_PASSWORD=ZabbixStr0ng!Pass
      - ZBX_CACHESIZE=128M
      - ZBX_HISTORYCACHESIZE=64M
      - ZBX_TRENDCACHESIZE=32M
      - ZBX_VALUECACHESIZE=64M
    ports:
      - "10051:10051"
    depends_on:
      - zabbix-mysql
    restart: unless-stopped

  zabbix-web:
    image: zabbix/zabbix-web-apache-mysql:alpine-7.0-latest
    container_name: zabbix-web
    hostname: zabbix-web
    networks:
      manage-net:
        ipv4_address: 172.20.5.32
    volumes:
      - zabbix-web-data:/etc/zabbix/web
      - zabbix-web-logs:/var/log/httpd
    environment:
      - DB_SERVER_HOST=zabbix-mysql
      - MYSQL_DATABASE=zabbix
      - MYSQL_USER=zabbix
      - MYSQL_PASSWORD=ZabbixStr0ng!Pass
      - ZBX_SERVER_HOST=172.20.5.31
      - PHP_TZ=Asia/Shanghai
    ports:
      - "8080:8080"
    depends_on:
      - zabbix-mysql
      - zabbix-server
    restart: unless-stopped

networks:
  manage-net:
    external: true

volumes:
  zabbix-mysql-data:
  zabbix-server-data:
  zabbix-web-data:
  zabbix-web-logs:
EOF
Step 6: Start the Zabbix services

# Node1 - start the Agent
cd /opt/cluster-deploy
docker compose -f docker-compose-zabbix-node1.yml up -d

# Node2 - start the Agent
cd /opt/cluster-deploy
docker compose -f docker-compose-zabbix-node2.yml up -d

# Node3 - start Server + Web + Agent
cd /opt/cluster-deploy
docker compose -f docker-compose-zabbix-node3.yml up -d
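`up -d` returns as soon as the containers are created, while the server may still be waiting for the database schema. A small retry helper can bridge that gap before moving on to the web interface. This is an illustrative sketch; `wait_for` is a hypothetical helper, not a Docker or Zabbix command.

```shell
# Retry a command until it succeeds.
#   $1 = maximum attempts, $2 = seconds to sleep between attempts,
#   remaining arguments = the command to run
# Returns 0 on the first success, 1 if every attempt failed.
wait_for() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only if another attempt follows
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, `wait_for 30 5 docker exec zabbix-server pgrep zabbix_server` waits up to roughly two and a half minutes for the server process on Node3, and `wait_for 30 5 docker exec zabbix-agent zabbix_agent2 -t agent.ping` does the same for an agent.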
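Initializing the Zabbix Web Interface
First access
Open a browser and go to: http://192.168.64.130:8080
Default login credentials:
Username: Admin
Password: zabbix
Initial setup wizard
Welcome: click "Next step"
Check prerequisites: confirm that every check passes, then click "Next step"
Configure the database: keep the default settings, then click "Next step"
Zabbix server details:
  Host: 172.20.5.31
  Port: 10051
  Name: Zabbix server
Time zone: select Asia/Shanghai
Finish: click "Finish"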
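Adding Hosts to Monitoring
Adding the Node1 host
Go to Data collection → Hosts → Create host (Configuration → Hosts in Zabbix versions before 6.4).
Fill in the host details:
  Host name: Node1
  Host groups: select Linux servers
  Interfaces:
    Type: Agent
    IP address: 172.20.5.41
    Port: 10050
  Templates:
    Linux by Zabbix agent
    Docker by Zabbix agent 2 (if needed)
Click Add.
Passive checks only need the interface address. Active checks are matched by name, so if you rely on them, the host name entered here must equal the Hostname value in the agent configuration (Node1-Agent).
Adding the Node2 host
Same as above, with host name Node2 and IP 172.20.5.42.
Adding the Node3 host
Same as above, with host name Node3 and IP 172.20.5.43.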
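The same three hosts can also be created through the Zabbix JSON-RPC API instead of clicking through the form. The sketch below rests on several assumptions that are not in the original text: `ZABBIX_TOKEN` is an API token created in the web interface, `ZABBIX_URL` defaults to the frontend address used earlier, and the group and template IDs passed in are placeholders that have to be looked up first (for example with `hostgroup.get` and `template.get`).

```shell
# Build a JSON-RPC host.create request body for one node.
#   $1 = host name, $2 = agent IP, $3 = host group ID, $4 = template ID
# Interface type 1 is "Zabbix agent"; main=1 marks it as the default interface.
host_create_payload() {
    cat <<EOF
{
  "jsonrpc": "2.0",
  "method": "host.create",
  "params": {
    "host": "$1",
    "interfaces": [
      { "type": 1, "main": 1, "useip": 1, "ip": "$2", "dns": "", "port": "10050" }
    ],
    "groups": [ { "groupid": "$3" } ],
    "templates": [ { "templateid": "$4" } ]
  },
  "id": 1
}
EOF
}

# Send the request to the frontend.
# Assumptions: ZABBIX_TOKEN holds a valid API token and the frontend is
# reachable at ZABBIX_URL (Bearer authentication needs Zabbix 6.4 or later).
create_host() {
    host_create_payload "$@" | curl -sS -X POST \
        -H 'Content-Type: application/json-rpc' \
        -H "Authorization: Bearer ${ZABBIX_TOKEN}" \
        --data-binary @- \
        "${ZABBIX_URL:-http://192.168.64.130:8080}/api_jsonrpc.php"
}
```

A call such as `create_host Node1 172.20.5.41 <groupid> <templateid>` would then replace the manual steps for one node. Running `host_create_payload` on its own prints the request so it can be reviewed before anything is sent.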
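Configuring Items
Common items
| Item | Key | Description |
|---|---|---|
| CPU utilization | system.cpu.util | Total CPU utilization |
| Memory | vm.memory.size | Memory size (total by default) |
| Disk usage | vfs.fs.size | Filesystem usage; takes the mount point as a parameter |
| Network traffic | net.if.in / net.if.out | Network interface traffic; takes the interface name as a parameter |
| Docker info | docker.info | Docker system information as JSON, including container counts |
Adding a custom item
Go to Data collection → Hosts (Configuration → Hosts before 6.4).
Click the host name to open its details.
Click Items → Create item.
Fill in the item details:
  Name: Total containers
  Type: Zabbix agent
  Key: docker.info
  Type of information: Numeric (unsigned)
  Preprocessing: JSONPath $.Containers (docker.info returns a JSON document, so the count has to be extracted before it can be stored as a number)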
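Configuring Alerts
Creating a media type
Go to Alerts → Media types (Administration → Media types before 6.4).
Click Email.
Configure the SMTP server settings:
  SMTP server: smtp.example.com
  SMTP server port: 587
  SMTP helo: zabbix
  SMTP email: zabbix@example.com
Click Update.
Creating a trigger
Go to Data collection → Hosts.
Click the host that needs the trigger.
Click Triggers → Create trigger.
Fill in the trigger details:
  Name: High CPU utilization
  Severity: Warning
  Expression: last(/Node1/system.cpu.util)>80
Click Add.
Zabbix 5.4 and later use this function-first expression syntax; the older form {Node1:system.cpu.util.last()}>80 is not accepted by the 7.0 images used here.
Creating an action
Go to Alerts → Actions → Trigger actions, then click Create action.
Fill in the action details:
  Name: CPU alert notification
  Conditions: Trigger = High CPU utilization
  Operations:
    Operation type: Send message
    Send to: select a user group
    Media type: Email
Click Add.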
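Monitoring Docker with Zabbix
Enabling the Docker monitoring template
Zabbix Agent 2 has built-in support for Docker monitoring. Configure it as follows.
Add the plugin setting to the Agent configuration (shown for Node1; repeat on the other nodes with their own files):

cat >> /opt/cluster-deploy/config/zabbix/zabbix_agentd-node1.conf << 'EOF'
# Docker monitoring
Plugins.Docker.Endpoint=unix:///var/run/docker.sock
EOF

Restart the Agent:

docker restart zabbix-agent

Import the Docker template in Zabbix Web:
Download the template: https://git.zabbix.com/projects/ZBX/repos/zabbix/raw/templates/app/docker.yaml
Go to Data collection → Templates → Import (Configuration → Templates before 6.4).
Select the downloaded YAML file.
Click Import.
Monitoring container status
Available Docker items:
docker.container_info: container information
docker.container_stats: container statistics
docker.containers: container list
docker.images: image list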
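Before linking the template, it can help to confirm that the agent can actually talk to the Docker socket. The sketch below runs the Docker keys that need no container ID through `zabbix_agent2 -t`. The function names are hypothetical, and `docker.ping` is added here as a liveness check even though it is not in the list above.

```shell
# Docker item keys that take no parameters, so they can be tested
# without knowing a container ID.
docker_item_keys() {
    printf '%s\n' docker.ping docker.info docker.containers docker.images
}

# Run each key through `zabbix_agent2 -t` inside the agent container
# ($1, default zabbix-agent) and print the first line of every result.
test_docker_items() {
    container="${1:-zabbix-agent}"
    docker_item_keys | while read -r key; do
        printf '%s: ' "$key"
        docker exec "$container" zabbix_agent2 -t "$key" </dev/null 2>&1 | head -n 1
    done
}
```

If the results mention a permission error, the user inside the agent container probably cannot read /var/run/docker.sock. That is a common cause of empty Docker data and is worth checking before debugging the template.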
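Verifying Monitoring
Checking host status
Go to Monitoring → Hosts.
Confirm that the ZBX availability icon is green for all 3 nodes.
Viewing latest data
Go to Monitoring → Latest data.
Select a host to view its collected data.
Viewing graphs
Go to Monitoring → Hosts and click Graphs next to the host (older versions have a separate Monitoring → Graphs page).
Select the host and item to view the trend graph.
Common Queries
Listing containers

docker exec zabbix-agent zabbix_agent2 -t docker.containers

Checking system load

docker exec zabbix-agent zabbix_agent2 -t system.cpu.load

Checking memory

docker exec zabbix-agent zabbix_agent2 -t vm.memory.size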
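Troubleshooting
Zabbix Server fails to start

# View the logs
docker logs zabbix-server

# Check the MySQL connection
docker exec zabbix-server nc -zv zabbix-mysql 3306

Agent cannot connect to the Server

# View the Agent logs
docker logs zabbix-agent

# Check that the agent answers locally (this runs inside the agent
# container and does not test the network path to the server)
docker exec zabbix-agent zabbix_agent2 -t agent.ping

Database initialization fails
On first start, Zabbix initializes the database automatically. If that fails:

# Remove the volumes and initialize again (run on Node3).
# Warning: this deletes all Zabbix data.
cd /opt/cluster-deploy
docker compose -f docker-compose-zabbix-node3.yml down -v
docker compose -f docker-compose-zabbix-node3.yml up -d

Web interface shows "Zabbix server is not running"

# Check the Zabbix Server process
docker exec zabbix-server pgrep -a zabbix_server

# Restart the services
docker restart zabbix-server zabbix-web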
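Performance Tuning
Adjusting the Housekeeper settings

# Edit /opt/cluster-deploy/config/zabbix/zabbix_server.conf, then restart zabbix-server
HousekeepingFrequency=4
MaxHousekeeperDelete=5000

Configuring data retention
Go to Administration → Housekeeping (Administration → General → Housekeeper in older versions).
Adjust the number of days that history and trends are kept.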
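Retention periods translate directly into database size, so a rough estimate helps when choosing them. The sketch below follows the sizing rule of thumb from the Zabbix documentation (history grows with days × items ÷ update interval, trends with one row per item per hour). The default of about 90 bytes per stored value is an approximation that varies by database engine, and the function names are illustrative.

```shell
# Estimated bytes of history data.
#   $1 = retention days, $2 = number of items,
#   $3 = average update interval in seconds,
#   $4 = bytes per stored value (optional, default 90)
history_bytes() {
    echo $(( $1 * $2 * 86400 * ${4:-90} / $3 ))
}

# Estimated bytes of trend data (one aggregated row per item per hour).
#   $1 = retention days, $2 = number of items,
#   $3 = bytes per row (optional, default 90)
trend_bytes() {
    echo $(( $1 * $2 * 24 * ${3:-90} ))
}
```

With a hypothetical 3000 items polled every 60 seconds, `history_bytes 90 3000 60` prints 34992000000 (about 35 GB for 90 days of history), while `trend_bytes 365 3000` prints 2365200000 (about 2.4 GB for a year of trends). This is why shortening history retention usually saves far more space than shortening trend retention.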
Adjusting cache size and pollers

# Edit zabbix_server.conf
CacheSize=64M
StartPollers=10
StartPollersUnreachable=5
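Backup and Restore
Backing up Zabbix data

# Back up the MySQL data (the single quotes keep an interactive bash
# from treating "!Pass" as history expansion)
docker exec zabbix-mysql mysqldump -uroot -p'RootStr0ng!Pass' zabbix > zabbix_backup.sql

# Back up the configuration files
tar czf zabbix_config_backup.tar.gz /opt/cluster-deploy/config/zabbix

Restoring Zabbix data

# Restore the MySQL data
docker exec -i zabbix-mysql mysql -uroot -p'RootStr0ng!Pass' zabbix < zabbix_backup.sql

# Restore the configuration files
tar xzf zabbix_config_backup.tar.gz -C /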
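For regular use, the database backup can be wrapped in a small script that timestamps each dump and keeps only the most recent ones. This is a sketch under assumptions: the backup directory and the retention count of 7 are made-up defaults, and the password is read from the `MYSQL_ROOT_PASSWORD` variable that the Compose file already sets inside the zabbix-mysql container.

```shell
# Build a timestamped file name such as zabbix_backup_20240101-020000.sql.gz
backup_name() {
    echo "zabbix_backup_$(date +%Y%m%d-%H%M%S).sql.gz"
}

# Keep the newest $2 backups in directory $1 and delete the rest.
# The timestamp in the name sorts lexicographically, so a reverse sort
# puts the newest file first.
prune_backups() {
    dir="$1"
    keep="$2"
    ls -1 "$dir"/zabbix_backup_*.sql.gz 2>/dev/null | sort -r |
        tail -n +"$((keep + 1))" | while read -r old; do
            rm -f -- "$old"
        done
}

# Dump the zabbix database, compress it, then prune old dumps.
# $1 = backup directory (hypothetical default below).
run_backup() {
    dir="${1:-/opt/cluster-deploy/backup}"
    mkdir -p "$dir" || return 1
    out="$dir/$(backup_name)"
    # Write the plain dump first so a failed mysqldump is detected
    # instead of being hidden behind a pipe into gzip.
    docker exec zabbix-mysql sh -c \
        'mysqldump -uroot -p"$MYSQL_ROOT_PASSWORD" --single-transaction zabbix' \
        > "${out%.gz}" || { rm -f "${out%.gz}"; return 1; }
    gzip "${out%.gz}" || return 1
    prune_backups "$dir" 7
}
```

Running `run_backup` from cron on Node3 would give a rolling week of dumps. To restore one, decompress it with `gunzip` and feed it to the `mysql` command shown above.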