Collecting Nginx Logs with Flume + ZooKeeper + Kafka
Environment
| Software | Version |
|---|---|
| Centos | 3.10.0-862.el7.x86_64 |
| jdk | 1.8 |
| zookeeper | 3.4.10 |
| kafka | 1.2.2 |
| flume | 1.6.0 |
| Host | IP |
|---|---|
| c1 | 192.168.1.200 |
| c1_1 | 192.168.1.201 |
| c1_2 | 192.168.1.202 |
All hosts use the same user: hadoop.
Prerequisites
Set up passwordless SSH between the hosts.
This step is critical: if it is not configured correctly, connections between the hadoop and kafka cluster nodes will fail.
```
[hadoop@c1 ~]$ ssh-keygen
```
Repeat the same steps on the other two machines. Once done, ssh to every host (including the local one) to confirm that no password is required.
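The post only shows key generation; a minimal sketch of distributing the public key, assuming `ssh-copy-id` is available, would be to run this on each host:

```
# Run on every host after ssh-keygen: push the public key to all
# nodes (including this one) so ssh no longer asks for a password.
for host in c1 c1_1 c1_2; do
    ssh-copy-id hadoop@$host
done
```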
Installing the Software
```
Download jdk 1.8+
```
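The start scripts below run `source ~/.bash_profile` over ssh, so that file must export the relevant home directories and extend PATH. A sketch, assuming everything is unpacked under /home/hadoop (all directory names except zookeeper-3.4.10, which appears in the zkServer.sh output later, are assumptions):

```
# ~/.bash_profile (directory names are assumptions; adjust to your layout)
export JAVA_HOME=/home/hadoop/jdk1.8.0_181
export ZK_HOME=/home/hadoop/zookeeper-3.4.10
export KAFKA_HOME=/home/hadoop/kafka
export FLUME_HOME=/home/hadoop/apache-flume-1.6.0-bin
export PATH=$PATH:$JAVA_HOME/bin:$ZK_HOME/bin:$KAFKA_HOME/bin:$FLUME_HOME/bin
```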
Configuration Files
Flume configuration
```
# vim ${FLUME_HOME}/conf/nginx_kafka.conf
nginx-kafka.sources = r1
nginx-kafka.sinks = k1
nginx-kafka.channels = c1

nginx-kafka.sources.r1.type = exec
nginx-kafka.sources.r1.command = tail -f /home/hadoop/data/access.log
nginx-kafka.sources.r1.shell = /bin/sh -c

# Flume 1.6 Kafka sink syntax
nginx-kafka.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
nginx-kafka.sinks.k1.brokerList = c1:9092
nginx-kafka.sinks.k1.topic = nginxtopic
nginx-kafka.sinks.k1.batchSize = 10

nginx-kafka.channels.c1.type = memory

nginx-kafka.sources.r1.channels = c1
nginx-kafka.sinks.k1.channel = c1
```
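The memory channel above runs with Flume's defaults (capacity of 100 events). If `tail` produces bursts faster than the Kafka sink drains them, the channel fills and the source stalls; the standard memory-channel properties can raise the limits. The values below are illustrative assumptions, not part of the original config:

```
# Optional memory channel tuning (values are assumptions)
nginx-kafka.channels.c1.capacity = 10000
nginx-kafka.channels.c1.transactionCapacity = 1000
```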
ZooKeeper configuration
```
# cp ${ZK_HOME}/conf/zoo_sample.cfg ${ZK_HOME}/conf/zoo.cfg && vim ${ZK_HOME}/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/data/zookeeper
clientPort=2181
# Note: the local host's own server entry must be 0.0.0.0, not its hostname,
# otherwise the nodes cannot connect to each other
server.1=0.0.0.0:2888:3888
server.2=c1_1:2888:3888
server.3=c1_2:2888:3888
```
Create the ZooKeeper cluster ID
```
echo "1" > /home/hadoop/data/zookeeper/myid
```
Repeat on the other hosts: each host's myid value must match its server.x number in zoo.cfg. A loop version is sketched below.
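Since passwordless ssh is already in place, the three myid files can also be written in one loop from c1 (a convenience sketch, not in the original):

```
# Write each host's myid so it matches its server.N line in zoo.cfg
id=1
for host in c1 c1_1 c1_2; do
    ssh hadoop@$host "mkdir -p /home/hadoop/data/zookeeper && echo $id > /home/hadoop/data/zookeeper/myid"
    id=$((id + 1))
done
```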
Kafka configuration
Only a few settings in the Kafka configuration need to change:
```
# ${KAFKA_HOME}/config/server.properties
broker.id=0
host.name=c1
listeners=PLAINTEXT://192.168.1.200:9092
advertised.listeners=PLAINTEXT://c1:9092
zookeeper.connect=c1:2181,c1_1:2181,c1_2:2181
```
- broker.id: starts at 0 and must be unique within the cluster.
- listeners: fill in the host's IP.
- advertised.listeners: fill in the hostname.

This split works because listeners controls the address the broker binds to locally, while advertised.listeners is the address registered in ZooKeeper and handed to clients, so it must be a name the clients can resolve.
Make the same changes to the Kafka configuration on the other hosts, adjusting broker.id and the addresses for each. For example:
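A sketch of c1_1's file, derived from the host/IP table above (c1_2 follows the same pattern with broker.id=2 and its own addresses):

```
# ${KAFKA_HOME}/config/server.properties on c1_1
broker.id=1
host.name=c1_1
listeners=PLAINTEXT://192.168.1.201:9092
advertised.listeners=PLAINTEXT://c1_1:9092
zookeeper.connect=c1:2181,c1_1:2181,c1_2:2181
```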
Writing Cluster Start/Stop Scripts
ZooKeeper cluster scripts
```
# vim start_zookeeper.sh
#!/bin/bash
echo "start zkServer..."
for i in c1 c1_1 c1_2
do
    ssh hadoop@$i "source ~/.bash_profile;zkServer.sh start"
done
```
```
vim stop_zookeeper.sh
```
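The body of the stop script is not shown in the original; a minimal sketch mirroring the start script:

```
#!/bin/bash
echo "stop zkServer..."
for i in c1 c1_1 c1_2
do
    ssh hadoop@$i "source ~/.bash_profile;zkServer.sh stop"
done
```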
```
chmod a+x start_zookeeper.sh stop_zookeeper.sh
```
Kafka cluster scripts
```
# vim start_kafka.sh
#!/bin/sh
echo "start kafka..."
for i in c1 c1_1 c1_2
do
    ssh hadoop@$i "source ~/.bash_profile;kafka-server-start.sh -daemon \${KAFKA_HOME}/config/server.properties"
    echo "done"
done
```
```
vim stop_kafka.sh
```
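Again the stop script body is not shown; a minimal sketch using Kafka's bundled kafka-server-stop.sh:

```
#!/bin/sh
echo "stop kafka..."
for i in c1 c1_1 c1_2
do
    ssh hadoop@$i "source ~/.bash_profile;kafka-server-stop.sh"
done
```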
```
chmod a+x start_kafka.sh stop_kafka.sh
```
Running the Pipeline
Start the services
```
# Start ZooKeeper
[hadoop@c1 ~]$ ./start_zookeeper.sh
[hadoop@c1 ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower

# Start Kafka
[hadoop@c1 ~]$ ./start_kafka.sh
[hadoop@c1 ~]$ jps
2953 QuorumPeerMain  # ZooKeeper process
3291 Kafka           # Kafka process
3359 Jps
```
Create the topic
```
[hadoop@c1 ~]$ kafka-topics.sh --create --zookeeper c1:2181,c1_1:2181,c1_2:2181 --replication-factor 3 --partitions 1 --topic nginxtopic
```
Check the topic
```
[hadoop@c1 ~]$ kafka-topics.sh --zookeeper c1:2181,c1_1:2181,c1_2:2181 --list
nginxtopic
```
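To also verify the partition count and replica placement, kafka-topics.sh supports --describe (an extra check, not in the original):

```
[hadoop@c1 ~]$ kafka-topics.sh --zookeeper c1:2181,c1_1:2181,c1_2:2181 --describe --topic nginxtopic
```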
Start a consumer
```
[hadoop@c1 ~]$ kafka-console-consumer.sh --bootstrap-server c1:9092,c1_1:9092,c1_2:9092 --topic nginxtopic --from-beginning
```
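Before wiring up Flume, the pipeline can be smoke-tested by hand with the console producer (an extra check, not in the original); anything typed here should appear in the consumer window:

```
[hadoop@c1 ~]$ kafka-console-producer.sh --broker-list c1:9092 --topic nginxtopic
```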
Simulate the log stream
```
# vim create_log.sh
#!/bin/sh
# The access.log-* files are real logs pulled down from production
cat access.log-* | while read -r line
do
    echo "$line" >> /home/hadoop/data/access.log
    sleep 0.$(($RANDOM%5+1))  # throttle so the log is not written too fast
done
```
Start Flume
Open a new terminal window.
```
[hadoop@c1 ~]$ flume-ng agent --conf-file conf/nginx_kafka.conf -c conf/ --name nginx-kafka -Dflume.root.logger=DEBUG,console
```
After a short wait, both sides show activity:
(screenshot: Flume log output)
(screenshot: kafka-console-consumer output)
At this point the whole pipeline is up and running.
Troubleshooting
- `not in the sudoers file. This incident will be reported`
The user has no sudo privileges. Edit /etc/sudoers as root (ideally via visudo) and add the user:
```
...
## Allow root to run any commands anywhere
root    ALL=(ALL)   ALL
hadoop  ALL=(ALL)   ALL
...
```
- The ssh key has been added but a password is still required
This is usually a permissions problem:
```
chmod 700 ~/.ssh
chmod 644 ~/.ssh/authorized_keys
```
- ZooKeeper status reports `It is probably not running`
  - Passwordless ssh to the other hosts may not be working.
  - myid may not have been written correctly.

Detailed error messages can be found in zookeeper.out.