
Flume Basics, Part 3: Monitoring, Custom Source, and Custom Sink

Monitoring Flume with Ganglia

Installing and Deploying Ganglia

  1. Install httpd and PHP

    [rickyin@hadoop102 flume]$ sudo yum -y install httpd php
  2. Install the other dependencies

    [rickyin@hadoop102 flume]$ sudo yum -y install rrdtool perl-rrdtool rrdtool-devel
    [rickyin@hadoop102 flume]$ sudo yum -y install apr-devel
  3. Install Ganglia

    [rickyin@hadoop102 flume]$ sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
    [rickyin@hadoop102 flume]$ sudo yum -y install ganglia-gmetad
    [rickyin@hadoop102 flume]$ sudo yum -y install ganglia-web
    [rickyin@hadoop102 flume]$ sudo yum install -y ganglia-gmond

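    Ganglia consists of three components: gmond, gmetad, and gweb.
    gmond (Ganglia Monitoring Daemon) is a lightweight service installed on every node whose metrics you want to collect. It gathers system metrics such as CPU, memory, disk, network, and active-process data.
    gmetad (Ganglia Meta Daemon) is the service that aggregates all of this information and stores it on disk in RRD format.
    gweb (Ganglia Web) is Ganglia's visualization tool: a PHP front end that displays the data stored by gmetad in a browser, charting the various metrics collected while the cluster is running.
  4. Edit the configuration file /etc/httpd/conf.d/ganglia.conf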

    [rickyin@hadoop102 flume]$ sudo vim /etc/httpd/conf.d/ganglia.conf

    Change it to the following, so that access is allowed from all hosts:

    # Ganglia monitoring system php web frontend
    Alias /ganglia /usr/share/ganglia
    <Location /ganglia>
    Order deny,allow
    #Deny from all
    Allow from all
    # Allow from 127.0.0.1
    # Allow from ::1
    # Allow from .example.com
    </Location>
  5. Edit the configuration file /etc/ganglia/gmetad.conf

    [rickyin@hadoop102 flume]$ sudo vim /etc/ganglia/gmetad.conf
    Change it to:
    data_source "hadoop102" 192.168.1.102
  6. Edit the configuration file /etc/ganglia/gmond.conf

    [rickyin@hadoop102 flume]$ sudo vim /etc/ganglia/gmond.conf 

    Change the following settings (cluster name, udp_send_channel host, and udp_recv_channel bind):

    cluster {
    name = "hadoop102"
    owner = "unspecified"
    latlong = "unspecified"
    url = "unspecified"
    }
    udp_send_channel {
    #bind_hostname = yes # Highly recommended, soon to be default.
    # This option tells gmond to use a source address
    # that resolves to the machine's hostname. Without
    # this, the metrics may appear to come from any
    # interface and the DNS names associated with
    # those IPs will be used to create the RRDs.
    # mcast_join = 239.2.11.71
    host = 192.168.1.102
    port = 8649
    ttl = 1
    }
    udp_recv_channel {
    # mcast_join = 239.2.11.71
    port = 8649
    bind = 192.168.1.102
    retry_bind = true
    # Size of the UDP buffer. If you are handling lots of metrics you really
    # should bump it up to e.g. 10MB or even higher.
    # buffer = 10485760
    }
  7. Edit the configuration file /etc/selinux/config

    [rickyin@hadoop102 flume]$ sudo vim /etc/selinux/config
    Change it to:

    # This file controls the state of SELinux on the system.
    # SELINUX= can take one of these three values:
    # enforcing - SELinux security policy is enforced.
    # permissive - SELinux prints warnings instead of enforcing.
    # disabled - No SELinux policy is loaded.
    SELINUX=disabled
    # SELINUXTYPE= can take one of these two values:
    # targeted - Targeted processes are protected,
    # mls - Multi Level Security protection.
    SELINUXTYPE=targeted

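Note: disabling SELinux through this file only takes effect after a reboot. If you do not want to reboot right now, you can switch SELinux to permissive mode for the current session: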

[rickyin@hadoop102 flume]$ sudo setenforce 0
  8. Start Ganglia

    [rickyin@hadoop102 flume]$ sudo service httpd start
    [rickyin@hadoop102 flume]$ sudo service gmetad start
    [rickyin@hadoop102 flume]$ sudo service gmond start
  9. Open the Ganglia page in a browser

http://192.168.1.102/ganglia

Note: if you still get a permission-denied error after completing the steps above, change the permissions on the /var/lib/ganglia directory:

[rickyin@hadoop102 flume]$ sudo chmod -R 777 /var/lib/ganglia

Testing Monitoring with Flume

  1. Edit flume-env.sh in the /opt/module/flume/conf directory:

    JAVA_OPTS="-Dflume.monitoring.type=ganglia
    -Dflume.monitoring.hosts=192.168.1.102:8649
    -Xms100m
    -Xmx200m"
  2. Start the Flume job

    [rickyin@hadoop102 flume]$ bin/flume-ng agent \
    --conf conf/ \
    --name a1 \
    --conf-file job/flume-netcat-logger.conf \
    -Dflume.root.logger=INFO,console \
    -Dflume.monitoring.type=ganglia \
    -Dflume.monitoring.hosts=192.168.1.102:8649
  3. Send data and watch the Ganglia graphs

    [rickyin@hadoop102 flume]$ nc localhost 44444

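Custom Source

Introduction

A Source is the component responsible for receiving data into a Flume Agent. Source components can handle log data of many types and formats, including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, and legacy. Flume already ships with many source types, but they do not always meet the needs of real projects, in which case we have to write a custom source.

The official documentation describes the interface for custom sources: https://flume.apache.org/FlumeDeveloperGuide.html#source
According to that guide, a custom MySource must extend the AbstractSource class and implement the Configurable and PollableSource interfaces.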

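Implement the following methods:
getBackOffSleepIncrement() // not used for now
getMaxBackOffSleepInterval() // not used for now
configure(Context context) // initialize from the context (read the settings from the configuration file)
process() // fetch data, wrap it in an event, and write it to the channel; this method is called in a loop
Typical use case: reading data from MySQL or from other file systems.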
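
To make the role of the two back-off methods more concrete: when process() returns BACKOFF, the thread that drives the source pauses before polling again, and the pause grows with the number of consecutive back-offs up to a maximum. The sketch below is a simplified stand-alone model of that rule, not Flume's actual code. BackoffModel and sleepMillis are hypothetical names, and the formula min(consecutiveBackoffs * increment, maxInterval) is an assumption based on my reading of how Flume's PollableSourceRunner behaves.

```java
class BackoffModel {

    // Assumed rule: the pause grows linearly with the number of consecutive
    // BACKOFF results and is capped at maxInterval (all values in milliseconds).
    static long sleepMillis(long consecutiveBackoffs, long increment, long maxInterval) {
        return Math.min(consecutiveBackoffs * increment, maxInterval);
    }

    public static void main(String[] args) {
        // Hypothetical settings: increment = 1000 ms, maximum = 5000 ms.
        for (int n = 1; n <= 7; n++) {
            System.out.println("backoff #" + n + " -> sleep " + sleepMillis(n, 1000, 5000) + " ms");
        }
        // A source that returns 0 from both methods never pauses.
        System.out.println("increment 0, max 0 -> sleep " + sleepMillis(3, 0, 0) + " ms");
    }
}
```

Under this model, a source that returns 0 from both methods is polled again immediately after a BACKOFF, which is acceptable for a demo but can spin the CPU in a source that is frequently idle.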


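### Requirements

Use Flume to receive data, add a prefix to every record, and print the result to the console. The prefix should be configurable in the Flume configuration file.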


### Analysis

[Figure: analysis diagram]

### Implementation

Add the dependency to pom.xml:
<dependencies>
    <dependency>
        <groupId>org.apache.flume</groupId>
        <artifactId>flume-ng-core</artifactId>
        <version>1.7.0</version>
    </dependency>
</dependencies>


package com.atguigu;

import org.apache.flume.Context;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;

import java.nio.charset.StandardCharsets;
import java.util.HashMap;

public class MySource extends AbstractSource implements Configurable, PollableSource {

    // Fields that will be read from the configuration file
    private Long delay;
    private String field;

    // Read the configuration
    @Override
    public void configure(Context context) {
        // Default of 1000 ms, so that a missing "delay" key does not cause a
        // NullPointerException when the value is unboxed in process()
        delay = context.getLong("delay", 1000L);
        field = context.getString("field", "Hello!");
    }

    @Override
    public Status process() throws EventDeliveryException {

        try {
            // Build and send five events per call
            for (int i = 0; i < 5; i++) {
                // Create a new event each time: an event that has already been
                // handed to the channel must not be modified afterwards
                SimpleEvent event = new SimpleEvent();
                // Set the event headers (empty in this example)
                event.setHeaders(new HashMap<String, String>());
                // Set the event body
                event.setBody((field + i).getBytes(StandardCharsets.UTF_8));
                // Write the event to the channel
                getChannelProcessor().processEvent(event);
                Thread.sleep(delay);
            }
        } catch (Exception e) {
            e.printStackTrace();
            return Status.BACKOFF;
        }
        return Status.READY;
    }

    // Not used for now
    @Override
    public long getBackOffSleepIncrement() {
        return 0;
    }

    // Not used for now
    @Override
    public long getMaxBackOffSleepInterval() {
        return 0;
    }
}
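
configure(Context) receives keys relative to the component, so a property written as a1.sources.r1.delay in the agent's configuration file is looked up as plain delay. The following stand-alone sketch models that mapping with ordinary maps. ConfigKeys and subProperties are hypothetical names, and this is an illustration of the idea using only the JDK, not Flume's configuration code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ConfigKeys {

    // Returns the entries whose key starts with the given prefix,
    // with that prefix stripped from the key.
    static Map<String, String> subProperties(Map<String, String> all, String prefix) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : all.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(prefix) && key.length() > prefix.length()) {
                result.put(key.substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> all = new LinkedHashMap<>();
        all.put("a1.sources.r1.type", "com.atguigu.MySource");
        all.put("a1.sources.r1.delay", "1000");
        all.put("a1.sinks.k1.type", "logger");

        Map<String, String> ctx = subProperties(all, "a1.sources.r1.");
        System.out.println(ctx); // prints {type=com.atguigu.MySource, delay=1000}

        // A key that is absent falls back to the default, as with getString(key, default).
        System.out.println(ctx.getOrDefault("field", "Hello!")); // prints Hello!
    }
}
```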


### Testing

1. Package: build the code into a jar and put it in the lib directory under the Flume home (/opt/module/flume).
2. Configuration file
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = com.atguigu.MySource
a1.sources.r1.delay = 1000
#a1.sources.r1.field = atguigu

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1


3. Start the job

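## Custom Sink

### Introduction

A Sink continuously polls the Channel for events, removes them in batches, and either writes those batches to a storage or indexing system or sends them on to another Flume Agent.

Sinks are fully transactional. Before removing a batch of events from the Channel, each Sink starts a transaction with the Channel. Once the batch has been successfully written to the storage system or to the next Flume Agent, the Sink commits the transaction through the Channel. After the transaction is committed, the Channel removes those events from its internal buffer.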
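
The commit and rollback contract described above can be modelled in a few lines. This is a toy, single-threaded model written only with JDK collections. ToyChannel is a hypothetical class, not Flume's Channel or Transaction API, and it exists only to show that taken events are gone for good after a commit and return to the buffer after a rollback.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ToyChannel {

    private final Deque<String> buffer = new ArrayDeque<>();
    // Events taken since the last commit or rollback.
    private final List<String> taken = new ArrayList<>();

    void put(String event) {
        buffer.addLast(event);
    }

    // Returns the next event, or null if the buffer is empty.
    String take() {
        String event = buffer.pollFirst();
        if (event != null) {
            taken.add(event);
        }
        return event;
    }

    // Commit: the taken events are dropped permanently.
    void commit() {
        taken.clear();
    }

    // Rollback: the taken events go back to the front in their original order.
    void rollback() {
        for (int i = taken.size() - 1; i >= 0; i--) {
            buffer.addFirst(taken.get(i));
        }
        taken.clear();
    }

    int size() {
        return buffer.size();
    }

    public static void main(String[] args) {
        ToyChannel ch = new ToyChannel();
        ch.put("a");
        ch.put("b");
        ch.put("c");

        ch.take();
        ch.take();
        ch.rollback();                 // delivery failed: "a" and "b" return
        System.out.println(ch.size()); // prints 3

        ch.take();
        ch.commit();                   // delivery succeeded: "a" is gone
        System.out.println(ch.size()); // prints 2
    }
}
```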

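Sink destinations include hdfs, logger, avro, thrift, ipc, file, null, HBase, solr, and custom sinks. Flume already ships with many sink types, but they do not always meet the needs of real projects, in which case we have to write a custom sink.

The official documentation also describes the interface for custom sinks: https://flume.apache.org/FlumeDeveloperGuide.html#sink
According to that guide, a custom MySink must extend the AbstractSink class and implement the Configurable interface.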

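Implement the following methods:
configure(Context context) // initialize from the context (read the settings from the configuration file)
process() // take data (events) from the Channel; this method is called in a loop
Typical use case: reading data from the Channel and writing it to MySQL or to other file systems.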

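Requirements

Use Flume to receive data, add a prefix and a suffix to every record on the Sink side, and print the result to the console. The prefix and suffix should be configurable in the Flume job's configuration file.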
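
The prefix-and-suffix requirement boils down to one string operation on the event body. Here is a minimal stand-alone sketch that uses only the JDK (no Flume classes). SinkFormat and decorate are hypothetical names. It treats an unset prefix or suffix as empty, because plain Java string concatenation would otherwise render a null reference as the text "null".

```java
import java.nio.charset.StandardCharsets;

class SinkFormat {

    // Decodes the body as UTF-8 and wraps it in the prefix and suffix.
    // A null prefix or suffix is treated as an empty string.
    static String decorate(String prefix, byte[] body, String suffix) {
        String text = new String(body, StandardCharsets.UTF_8);
        return (prefix == null ? "" : prefix) + text + (suffix == null ? "" : suffix);
    }

    public static void main(String[] args) {
        byte[] body = "flume".getBytes(StandardCharsets.UTF_8);
        System.out.println(decorate("hello:", body, ":bye")); // prints hello:flume:bye
        System.out.println(decorate("hello:", body, null));   // prints hello:flume
    }
}
```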

Flow analysis:
[Figure: flow diagram]

Implementation

package com.atguigu;

import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.nio.charset.StandardCharsets;

public class MySink extends AbstractSink implements Configurable {

    // Create the Logger for this class
    private static final Logger LOG = LoggerFactory.getLogger(MySink.class);

    private String prefix;
    private String suffix;

    @Override
    public Status process() throws EventDeliveryException {

        // Status to return
        Status status;

        // Get the Channel bound to this Sink
        Channel ch = getChannel();

        // Get a transaction
        Transaction txn = ch.getTransaction();

        // Begin the transaction
        txn.begin();
        try {
            // Take one event from the Channel; null means the Channel is currently empty
            Event event = ch.take();
            if (event != null) {
                // Process the event (print it)
                LOG.info(prefix + new String(event.getBody(), StandardCharsets.UTF_8) + suffix);
                status = Status.READY;
            } else {
                // Nothing to read: back off instead of spinning inside the open transaction
                status = Status.BACKOFF;
            }

            // Commit the transaction
            txn.commit();
        } catch (Exception e) {

            // On error, roll the transaction back
            txn.rollback();
            status = Status.BACKOFF;
        } finally {

            // Close the transaction
            txn.close();
        }
        return status;
    }

    @Override
    public void configure(Context context) {

        // Read from the configuration file, with a default value
        prefix = context.getString("prefix", "hello:");

        // Read from the configuration file, no default value (null if the key is missing)
        suffix = context.getString("suffix");
    }
}
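
To try the sink, an agent configuration along the following lines should work. This is a sketch under assumptions: the source is taken to be a netcat source on localhost:44444 (the same port as the nc command in the monitoring test), and the suffix value is only an illustrative placeholder. The property names prefix and suffix match the keys read in configure().

```properties
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source (assumption: a netcat source, as in the monitoring test)
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Sink: the custom class; prefix falls back to "hello:" when left unset
a1.sinks.k1.type = com.atguigu.MySink
#a1.sinks.k1.prefix = atguigu:
a1.sinks.k1.suffix = :atguigu

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Setting suffix explicitly matters here because configure() reads it without a default.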