Logstash Intro

Logstash (the L in the ELK Stack) is probably the most popular log analytics platform. It is responsible for aggregating data from different sources, processing it, and sending it down the pipeline, usually to be indexed directly in Elasticsearch.

In the presented setup, Logstash bundles the messages that come from the Filebeat instances, processes them, and passes them on to Elasticsearch. In our case we have an Elasticsearch cluster (Open Distro) managed by AWS, while most of the rest, including Logstash, runs in a Kubernetes cluster.

Logstash Deployment

While it’s possible to run several Logstash instances, that’s not needed in our case, so this is an example of a Deployment with a single instance. Note also that we use the OSS build of the Logstash Docker image, docker.elastic.co/logstash/logstash-oss:7.7.1; with the default distribution we had connection problems with the AWS Elasticsearch Service (Open Distro).

Kubernetes Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: logstash-deployment
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: logstash
  template:
    metadata:
      labels:
        app: logstash
    spec:
      containers:
      - name: logstash
        env:
          - name: LOGSTASH_PW
            valueFrom:
              secretKeyRef:
                name: elasticsearch-secrets
                key: LOGSTASH_PASSWORD
        image: docker.elastic.co/logstash/logstash-oss:7.7.1
        ports:
        - containerPort: 5044
        volumeMounts:
          - name: config-volume
            mountPath: /usr/share/logstash/config
          - name: logstash-pipeline-volume
            mountPath: /usr/share/logstash/pipeline
        resources:
          limits:
            memory: "4Gi"
            cpu: "2500m"
          requests:
            memory: "4Gi"
            cpu: "800m"
      volumes:
      - name: config-volume
        configMap:
          name: logstash-configmap
          items:
            - key: logstash.yml
              path: logstash.yml
      - name: logstash-pipeline-volume
        configMap:
          name: logstash-configmap
          items:
            - key: logstash.conf
              path: logstash.conf
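
The Deployment references a Secret named elasticsearch-secrets, which is not shown in this post. A minimal sketch could look like the following; the password value is of course a placeholder:

apiVersion: v1
kind: Secret
metadata:
  name: elasticsearch-secrets
  namespace: kube-system
type: Opaque
stringData:
  # Placeholder value; injected into the container as LOGSTASH_PW above
  LOGSTASH_PASSWORD: change-me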

Kubernetes Service

Logstash is exposed as a Service to our cluster:

kind: Service
apiVersion: v1
metadata:
  name: logstash-service
  namespace: kube-system
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  selector:
    app: logstash
  ports:
  - protocol: TCP
    port: 5044
    targetPort: 5044
  type: LoadBalancer
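
Filebeat instances running in the same cluster can then ship their events to this Service via its cluster DNS name. A sketch of the relevant Filebeat output section (the DNS name is derived from the Service name and namespace above):

# filebeat.yml excerpt: point the Logstash output at the Service above
output.logstash:
  hosts: ["logstash-service.kube-system.svc.cluster.local:5044"]

Shippers outside the cluster would use the DNS name of the internal load balancer instead.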

Basic Logs Processing Configuration

The configuration of the Logstash processing pipeline usually starts in logstash.conf. Below you find a basic example with three sections:

  • input - defines the source of events
  • filter - defines your processing
  • output - defines the sink

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-configmap
  namespace: kube-system
data:
  logstash.yml: |
    http.host: "0.0.0.0"
    path.config: /usr/share/logstash/pipeline    
  logstash.conf: |
    input {
      beats {
        port => 5044
      }
    }
    filter {

      if [kubernetes][labels][logstyle] == "nginx" {
        #Nginx
        grok {
          match => { "message" => ["%{IPORHOST:[nginx][access][remote_ip]} - %{DATA:[nginx][access][user_name]} \[%{HTTPDATE:[nginx][access][time]}\] \"%{WORD:[nginx][access][method]} %{DATA:[nginx][access][url]} HTTP/%{NUMBER:[nginx][access][http_version]}\" %{NUMBER:[nginx][access][response_code]} %{NUMBER:[nginx][access][body_sent][bytes]}( \"%{DATA:[nginx][access][referrer]}\")?( \"%{DATA:[nginx][access][agent]}\")?",
          "%{IPORHOST:[nginx][access][remote_ip]} - %{DATA:[nginx][access][user_name]} \\[%{HTTPDATE:[nginx][access][time]}\\] \"-\" %{NUMBER:[nginx][access][response_code]} -" ] }
        }

        # date {
        #  match => [ "[nginx][access][time]", "dd/MMM/YYYY:H:m:s Z" ]
        #  remove_field => "[nginx][access][time]"
        # }

        useragent {
          source => "[nginx][access][agent]"
          target => "[nginx][access][user_agent]"
          remove_field => "[nginx][access][agent]"
        }

        geoip {
          source => "[nginx][access][remote_ip]"
          target => "[nginx][access][geoip]"
        }
      }
      else if [kubernetes][pod][labels][app] == "filebeat" {
        #filebeat
        grok {
          match => [ "message", "(?<timestamp>%{TIMESTAMP_ISO8601})\s+%{LOGLEVEL:level}\s+%{DATA}\s+%{GREEDYDATA:logmessage}" ]
        }
      }
      else {
        #HTD java
        grok {
          match => [ "message", "(?<timestamp>%{TIMESTAMP_ISO8601}) - \[(?<thread>[A-Za-z0-9-]+)\] %{LOGLEVEL:level}\s+(?<class>[A-Za-z0-9.]*\.[A-Za-z0-9#_]+)\s* - %{GREEDYDATA:logmessage}" ]
        }        
      }
      
    }
    output {
      elasticsearch {
        ilm_enabled => false
        hosts => ["https://notforeveryone-eyes.es.amazonaws.com:443"]
        user => 'logstash'
        password => '${LOGSTASH_PW}'
        index => "logstash-beta-%{+YYYY.MM.dd}"
      }
    }    
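
The conditionals above rely on Filebeat enriching each event with Kubernetes metadata, so the nginx branch only fires for pods carrying a matching label. A hypothetical pod template excerpt for such a workload:

# Hypothetical Deployment excerpt for an nginx workload: the logstyle
# label is what the filter branches on via [kubernetes][labels][logstyle]
template:
  metadata:
    labels:
      app: my-nginx     # hypothetical app name
      logstyle: nginx   # routes these logs into the nginx grok branch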

That’s it. Feel free to continue with Installing Filebeat to Kubernetes.

Evaluation

This setup has been very robust in my scenario. In our case we haven’t found a better place for the configuration than a ConfigMap. That has one drawback: a change in the configuration does not trigger a restart of Logstash (which could be a good thing as well). We don’t suffer much from this, but if you have an improvement proposal, I’m happy to hear your feedback.
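
One possible mitigation, not part of the setup above: Logstash can watch its pipeline files and reload them on change, which pairs well with ConfigMap volume updates (kubelet syncs those into the pod with some delay). A sketch of the logstash.yml additions:

# logstash.yml additions (sketch): enable automatic pipeline reload
config.reload.automatic: true
config.reload.interval: 30s   # how often Logstash checks for changes

Note that logstash.yml itself is not reloadable; this only covers the pipeline defined in logstash.conf.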

I hope this helps someone get started with Logstash.