PipeBug: Monitoring Using Graphite, Logstash, Sensu, and Tessera

The dark side of the ELK stack. Unleash the Logstash mapping.

In memory of the beloved Kibana 3. We will never forget.

Part Three: Install Logstash

Logstash is a powerful system for managing logs: you can store and analyze almost any log with it. Logstash requires a Java runtime, at least Java 7 from Oracle or OpenJDK, which I already installed as described in part one of this tutorial.
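
A quick sanity check that a suitable JVM is already on the PATH:

java -version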

Logstash 1.5 looks promising, and it's faster at log processing than its predecessor. There are some performance improvements, plugin ecosystem changes, and an interesting publish-subscribe messaging integration based on Apache Kafka, which I plan to test later. So, I decided to give this new Logstash 1.5 a try:

cd /opt 
wget http://download.elasticsearch.org/logstash/logstash/logstash-1.5.0.beta1.tar.gz 
tar zxvf logstash-1.5.0.beta1.tar.gz 
ln -sf logstash-1.5.0.beta1 logstash
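
Before wiring anything up, a minimal smoke test confirms the install works: type a line, and Logstash should echo it back as an event (exit with Ctrl-C):

cd /opt/logstash 
bin/logstash -e 'input { stdin { } } output { stdout { } }'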

I defined some important variables in /etc/sysconfig/logstash:

# Set a home directory 
LS_HOME=/var/lib/logstash 
 
# logstash logging 
LS_USE_GC_LOGGING="true" 
 
# logstash configuration directory 
LS_CONF_DIR=/etc/logstash/conf.d

Also, I want to run Logstash as a daemon, so I created /etc/init.d/logstash:

#!/bin/sh 
# Init script for logstash 
#   * Sections: 20.2, 20.3 
# 
### BEGIN INIT INFO 
# Provides:          logstash 
# Required-Start:    $remote_fs $syslog 
# Required-Stop:     $remote_fs $syslog 
# Default-Start:     2 3 4 5 
# Default-Stop:      0 1 6 
# Short-Description: Logstash 
# Description:        Starts Logstash as a daemon. 
### END INIT INFO 
 
PATH=/sbin:/usr/sbin:/bin:/usr/bin 
export PATH 
 
if [ `id -u` -ne 0 ]; then 
  echo "You need root privileges to run this script" 
  exit 1 
fi 
 
name=logstash 
pidfile="/var/run/$name.pid" 
 
LS_USER=root 
LS_GROUP=logstash 
LS_HOME=/var/lib/logstash 
LS_HEAP_SIZE="1000m" 
LS_JAVA_OPTS="-Djava.io.tmpdir=${LS_HOME}" 
LS_LOG_DIR=/var/log/logstash 
LS_LOG_FILE="${LS_LOG_DIR}/$name.log" 
LS_CONF_DIR=/etc/logstash/conf.d 
LS_OPEN_FILES=16384 
LS_NICE=19 
LS_OPTS="" 
 
[ -r /etc/default/$name ] && . /etc/default/$name 
[ -r /etc/sysconfig/$name ] && . /etc/sysconfig/$name 
 
program=/opt/logstash/bin/logstash 
args="agent -f ${LS_CONF_DIR} -l ${LS_LOG_FILE} ${LS_OPTS}" 
 
start() { 
  JAVA_OPTS=${LS_JAVA_OPTS} 
  HOME=${LS_HOME} 
  export PATH HOME JAVA_OPTS LS_HEAP_SIZE LS_JAVA_OPTS LS_USE_GC_LOGGING 
 
  # set ulimit as (root, presumably) first, before we drop privileges 
  ulimit -n ${LS_OPEN_FILES} 
 
  # Run the program! 
  nice -n ${LS_NICE} chroot --userspec $LS_USER:$LS_GROUP / sh -c " 
    cd $LS_HOME 
    ulimit -n ${LS_OPEN_FILES} 
    exec \"$program\" $args 
  " > "${LS_LOG_DIR}/$name.stdout" 2> "${LS_LOG_DIR}/$name.err" & 
 
  # Generate the pidfile from here. If we instead made the forked process 
  # generate it there will be a race condition between the pidfile writing 
  # and a process possibly asking for status. 
  echo $! > $pidfile 
 
  echo "$name started." 
  return 0 
} 
 
stop() { 
  # Try a few times to kill TERM the program 
  if status ; then 
    pid=`cat "$pidfile"` 
    echo "Killing $name (pid $pid) with SIGTERM" 
    kill -TERM $pid 
    # Wait for it to exit. 
    for i in 1 2 3 4 5 ; do 
      echo "Waiting $name (pid $pid) to die..." 
      status || break 
      sleep 1 
    done 
    if status ; then 
      echo "$name stop failed; still running." 
    else 
      echo "$name stopped." 
    fi 
  fi 
} 
 
status() { 
  if [ -f "$pidfile" ] ; then 
    pid=`cat "$pidfile"` 
    if kill -0 $pid > /dev/null 2> /dev/null ; then 
      return 0 
    else 
      return 2 # program is dead but pid file exists 
    fi 
  else 
    return 3 # program is not running 
  fi 
} 
 
force_stop() { 
  if status ; then 
    stop 
    status && kill -KILL `cat "$pidfile"` 
  fi 
} 
 
case "$1" in 
  start) 
    status 
    code=$? 
    if [ $code -eq 0 ]; then 
      echo "$name is already running" 
    else 
      start 
      code=$? 
    fi 
    exit $code 
    ;; 
  stop) 
    stop 
    ;; 
  force-stop) 
    force_stop 
    ;; 
  status) 
    status 
    code=$? 
    if [ $code -eq 0 ] ; then 
      echo "$name is running" 
    else 
      echo "$name is not running" 
    fi 
    exit $code 
    ;; 
  restart) 
    stop && start 
    ;; 
  *) 
    echo "Usage: $SCRIPTNAME {start|stop|force-stop|status|restart}" >&2 
    exit 3 
  ;; 
esac 
 
exit $?

Make this init script executable:

chmod 0755 /etc/init.d/logstash
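
Note that the script assumes its home, log, and config directories already exist (the paths come from the variables above), so create them before the first start:

mkdir -p /var/lib/logstash /var/log/logstash /etc/logstash/conf.d 
service logstash start 
service logstash status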

Naturally, I want Logstash to start automatically if the server is rebooted:

sudo chkconfig --add logstash
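
To verify that the runlevels from the init script's LSB header were registered:

chkconfig --list logstash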

I'll need the MaxMind GeoLiteCity database in order to create a cool visitors map and get information about a visitor's country and city:

mkdir -p /usr/local/share/GeoIP/ 
cd /usr/local/share/GeoIP/ 
wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz 
gzip -d GeoLiteCity.dat.gz

It's out of the scope of this tutorial, but I should mention that the MaxMind database update should be automated: a new version is issued every month, and you want to keep the geodatabase fresh and current. Also, consider subscribing to the commercial version of the MaxMind geodatabase; it's actually a great product.
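
A minimal cron sketch for the monthly refresh, assuming the same download URL and path as above (the schedule and file name are my own choice, adjust to taste):

# /etc/cron.d/geoip-update: refresh GeoLiteCity on the 5th of every month 
0 4 5 * * root cd /usr/local/share/GeoIP && wget -q http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz && gzip -df GeoLiteCity.dat.gz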

I want to collect Apache logs, so I created a new Logstash config file, /etc/logstash/conf.d/logstash-httpd.conf. The processing pipeline contains three stages: input (generates events), filter (modifies them), and output (delivers them to a destination).

I want to process the Apache access log from the beginning of the file: match events against the predefined "COMBINEDAPACHELOG" grok pattern, remove or change some fields, resolve GeoIP details using the MaxMind database, and then ship the events to the specified index on a password-protected Elasticsearch instance. And yes, it uses the mapping template "http-log-logstash", which I will describe in detail later.

input { 
  file { 
    path => "/apache/logs/example.com-access_log" 
    type => "apache" 
    start_position => "beginning" 
  } 
}

filter { 
  if [type] == "apache" { 
    grok { match => { "message" => "%{COMBINEDAPACHELOG}" } } 
 
    fingerprint { 
      # source defaults to "message" and target to "fingerprint"; 
      # the fingerprint is later used as the Elasticsearch document_id 
      method => "SHA256" 
      key => "654321" 
    }

    date { 
      match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ] 
      remove_field => [ "timestamp", "message", "path", "ident", "auth", "logsource" ] 
    } 
 
    mutate { 
      replace => { type => "apache_access" } 
      gsub => [ 
        "referrer", "^\"", "", 
        "referrer", "\"$", "", 
        "agent", "^\"", "", 
        "agent", "\"$", "" 
      ] 
    } 
 
    useragent { 
      prefix => "agent." 
      source => "agent" 
      remove_field => [ "agent.major", "agent.patch", "agent.build", "agent.minor", "agent.os_major", "agent.os_minor" ] 
    } 
 
    if [clientip] { 
      geoip { 
        source => "clientip" 
        target => "geoip" 
        database => "/usr/local/share/GeoIP/GeoLiteCity.dat" 
        # the two add_field calls build a [longitude, latitude] array, 
        # which is the order a geo_point mapping expects 
        add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ] 
        add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ] 
      } 
      mutate {
        convert => [ "[geoip][coordinates]", "float" ]
        remove_field => [ "[geoip][ip]" ]
      }
    } 
  } 
} 
 
output { 
  if [host] == "example.com" and [type] == "apache_access" { 
    elasticsearch { 
      host => "example.com" 
      password => "12345" 
      port => 9200 
      protocol => "http" 
      user => "admin" 
      index => "example-%{+YYYY.MM.dd}" 
      template => "/etc/logstash/http-log-logstash.json" 
      template_name => "http-log-logstash" 
      document_id => "%{fingerprint}" 
      template_overwrite => true 
    } 
  } 
}
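
Before starting the daemon, it's worth validating the configuration; the Logstash agent has a built-in config test flag:

/opt/logstash/bin/logstash agent -f /etc/logstash/conf.d --configtest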

Pay attention! In order to save space, I removed all the duplicate fields (e.g., "geoip.ip" duplicates the already defined "clientip") and fields that do not carry any useful information (e.g., "agent.major", "agent.patch", "agent.build", etc.).

In this example, I called my index example-YYYY.MM.dd. I don't need to create the index manually before using it; Elasticsearch will create it automatically on the first write.

Logstash uses the event's timestamp to choose the index name, so by default a new index is created with the first event of each new day.
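
Once events start flowing, the daily indices can be listed with the Elasticsearch _cat API (using the same credentials as in the output section above):

curl -u admin:12345 'http://example.com:9200/_cat/indices/example-*?v'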

In this part of the tutorial, I described installing Logstash on CentOS 6.5 with a start-up script, installing the MaxMind GeoLiteCity database, and the Logstash configuration file. Please continue to the next chapter, and don't hesitate to leave your comments or suggestions below.

Part One: Install Elasticsearch

Part Two: Elasticsearch tuning

Part Three: Install Logstash (you are here)

Part Four: Logstash mapping

Part Five: Install Kibana 4 and create dashboard

Andrey Kanevsky, DevOps engineer @ DevOps Ltd.

Elasticsearch, Kibana, Logstash and Grafana are trademarks of Elasticsearch BV.
Nagios is a trademark of Nagios Enterprises.
Sensu is a trademark of Heavy Water Operations.
PagerDuty is a trademark of PagerDuty, Inc.