
Sharing my experience with real-time log indexing in Elasticsearch

Elasticsearch automatically generates a type mapping from the log data, but in most cases that mapping is not correct and has to be changed. This is my experience with doing that.

1) run elasticsearch

2) run logstash with a configuration file

elk:/data1/elasticsearch/logstash-1.4.0> vi logstash-teton_runtime.conf

input{
    file{
        codec => json
        path => ["/data1/elasticsearch/temp/*.log"]
        start_position => "end"
    }
}

output{
    elasticsearch {
        cluster => "locketCast"
        node_name => "logstash-teton_runtime"
        host => "xxx.xxx.xxx.xxx"
        index => "teton_runtime"
    }
}

elk:/data1/elasticsearch/logstash-1.4.0> bin/logstash -f logstash-teton_runtime.conf

3) create a log file matching /data1/elasticsearch/temp/*.log

{"androidId":"91e8b0c3d89","facebookId":"11250","type":0,"numberOfMyFollowers":0,"externalBalance":0,"userType":"F","uniqueId":"dSTSbIg","id":326599,"profileImagePath":"http://graph.facebook.com/1125547890/picture?type=large&width=200&height=200","tos":true,"activeUser":1,"Action":"unlock","$gender":"unknown","event":"Impression_test","age":36,"name":"abcdef","Campaign_id":"tec_cus_9321","created_at":1400294301000,"gender":"unknown","udob":578600000,"active_timestamp":1404520448000,"longitude":"0.00000000","$age":"04/30/1970","os":"4.1.2","user_group":"2","status":"YET_TO_REQUEST","zipcode":"12345","cash_amount":0,"manu":"LGE","email":"marshall.s@yahoo.com","appVersion":"1.5.3","dob":"04/30/1988","latitude":"0.00000000"}

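=> issue 1) : "type":0 => the field name "type" cannot be used as-is. Judging from the error below, Logstash passes the event's "type" value to Elasticsearch as the document type, and the number 0 is rejected, so the field should be renamed.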
error logs:
Failed to flush outgoing items {:outgoing_count=>1, :exception=>#<NameError: no method 'type' for arguments (org.jruby.RubyFixnum) on Java::OrgElasticsearchActionIndex::IndexRequest>, :backtrace=>["/data1/elasticsearch/logstash-1.4.0/lib/logstash/outputs/elasticsearch/protocol.rb:225:in `build_request'", "/data1/elasticsearch/logstash-1.4.0/lib/logstash/outputs/elasticsearch/protocol.rb:205:in `bulk'", "org/jruby/RubyArray.java:1613:in `each'", "/data1/elasticsearch/logstash-1.4.0/lib/logstash/outputs/elasticsearch/protocol.rb:204:in `bulk'",
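
One way to work around issue 1 is to rename the field before the line is written to the log file. The following is only a minimal Python sketch and was not part of my original setup; the helper name `rename_field` and the replacement name `user_type_code` are hypothetical.

```python
import json


def rename_field(line, old="type", new="user_type_code"):
    """Return the JSON log line with the top-level key `old` renamed to `new`.

    `user_type_code` is a hypothetical replacement name; any name that
    Logstash does not treat specially should do.
    """
    event = json.loads(line)
    if old in event:
        # Rebuild the dict so the renamed key keeps its original position.
        event = {(new if key == old else key): value for key, value in event.items()}
    return json.dumps(event, separators=(",", ":"))


if __name__ == "__main__":
    sample = '{"androidId":"91e8b0c3d89","type":0,"userType":"F"}'
    print(rename_field(sample))
```

Only the top-level key is renamed; nested objects are left untouched.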

=> issue 2) : none of the fields are indexed individually; each log line is stored whole as a single string field, "message".
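
One plausible cause of issue 2 is a line that does not parse as JSON, in which case the json codec can fall back to storing the raw text in "message". That is an assumption on my part, not something I verified for every case. A minimal Python sketch for checking a log file before feeding it to Logstash; `find_invalid_lines` is a hypothetical helper name.

```python
import json


def find_invalid_lines(lines):
    """Return (line_number, reason) pairs for lines that are not JSON objects."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines carry no event
        try:
            parsed = json.loads(text)
        except ValueError as error:  # json.JSONDecodeError subclasses ValueError
            problems.append((number, str(error)))
            continue
        if not isinstance(parsed, dict):
            problems.append((number, "not a JSON object"))
    return problems


if __name__ == "__main__":
    sample = ['{"event":"Impression_test","age":36}\n', '{"event":\n']
    for number, reason in find_invalid_lines(sample):
        print(number, reason)
```

In practice the `lines` argument could simply be an open file handle, since iterating a text file yields its lines.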

Solution )
step 1) check the current type mapping

elk:/data1/elasticsearch> curl -XGET 'http://localhost:9200/teton_runtime/logs/_mapping'
{"teton_runtime":{"mappings":{"logs":{"properties":{"@timestamp":{"type":"date","format":"dateOptionalTime"},"@version":{"type":"string"},"androidId":{"type":"string"},"externalBalance":{"type":"long"},"facebookId":{"type":"string"},"host":{"type":"string"},"id":{"type":"long"},"message":{"type":"string"},"numberOfMyFollowers":{"type":"long"},"path":{"type":"string"},"profileImagePath":{"type":"string","store":true},"uniqueId":{"type":"string"},"userType":{"type":"string"}}}}}}
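
The one-line response is hard to read. A minimal Python sketch that flattens a response of this shape into a field-to-type table; `field_types` is a hypothetical helper, and the nesting (index, then "mappings", then type, then "properties") is taken from the output above.

```python
import json


def field_types(mapping_response, index, doc_type):
    """Map each field name to its mapped type in a _mapping response."""
    properties = mapping_response[index]["mappings"][doc_type]["properties"]
    return {name: spec.get("type") for name, spec in properties.items()}


if __name__ == "__main__":
    # Shortened copy of the response shown above.
    response = json.loads(
        '{"teton_runtime":{"mappings":{"logs":{"properties":{'
        '"id":{"type":"long"},"message":{"type":"string"},'
        '"profileImagePath":{"type":"string","store":true}}}}}}'
    )
    for name, kind in sorted(field_types(response, "teton_runtime", "logs").items()):
        print(name, kind)
```

Comparing this table with the fields in the log line makes it easy to spot which ones were never mapped.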

step 2) create a new type mapping based on the current one
curl -XPUT 'http://localhost:9200/teton_runtime/logs/_mapping' -d '
{
    "logs" : {
        "properties" : {
            "facebookId" : {"type" : "string"},
            "androidId" : {"type" : "string"},
            "numberOfMyFollowers" : {"type" : "long"},
            "externalBalance" : {"type" : "long"},
            "userType" : {"type" : "string"},
            "uniqueId" : {"type" : "string"},
            "id" : {"type" : "long"},
            "profileImagePath" : {"type" : "string", "store" : true},
            "tos" : {"type" : "boolean"},
            "activeUser" : {"type" : "string"},
            "Action" : {"type" : "string"},
            "$gender" : {"type" : "string"},
            "event" : {"type" : "string"},
            "age" : {"type" : "integer"},
            "name" : {"type" : "string"},
            "Campaign_id" : {"type" : "string"},
            "created_at" : {"type" : "date"},
            "gender" : {"type" : "string"},
            "udob" : {"type" : "date"},
            "active_timestamp" : {"type" : "string"},
            "longitude" : {"type" : "string"},
            "$age" : {"type" : "date", "format" : "MM/dd/yyyy"},
            "os" : {"type" : "string"},
            "user_group" : {"type" : "string"},
            "status" : {"type" : "string"},
            "zipcode" : {"type" : "string"},
            "cash_amount" : {"type" : "long"},
            "manu" : {"type" : "string"},
            "email" : {"type" : "string"},
            "appVersion" : {"type" : "string"},
            "dob" : {"type" : "date", "format" : "MM/dd/yyyy"},
            "latitude" : {"type" : "string"}
        }
    }
}
'
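
A mapping body this long is tedious to type by hand. A minimal Python sketch that generates a body of the same shape from a field-to-type table; `build_mapping` and its parameters are hypothetical names, and the result would still have to be sent with a curl command like the one above.

```python
import json


def build_mapping(doc_type, types, extra=None):
    """Build a _mapping request body from a {field: type} table.

    `extra` optionally adds per-field settings such as {"store": True}.
    """
    extra = extra or {}
    properties = {}
    for field, kind in types.items():
        spec = {"type": kind}
        spec.update(extra.get(field, {}))
        properties[field] = spec
    return {doc_type: {"properties": properties}}


if __name__ == "__main__":
    body = build_mapping(
        "logs",
        {"age": "integer", "tos": "boolean", "profileImagePath": "string"},
        extra={"profileImagePath": {"store": True}},
    )
    print(json.dumps(body, indent=4))
```

Keeping the field-to-type table in one place also makes it easier to review which fields are deliberately mapped as strings.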

step 3) delete the current index
curl -XDELETE 'http://localhost:9200/teton_runtime'

step 4) append the log lines to a file matching /data1/elasticsearch/temp/*.log, making sure the file ends with a newline (an empty last line)
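
As far as I understand, the file input only emits a line once it is terminated by a newline, which is why the empty last line matters. A minimal Python sketch of appending events so that every line is terminated; `append_event` is a hypothetical helper, and the demo writes to a temporary file rather than the real log directory.

```python
import json
import os
import tempfile


def append_event(path, event):
    """Append one event as a single JSON line terminated by a newline."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, separators=(",", ":")) + "\n")


if __name__ == "__main__":
    # Demo on a temporary file; the real target would be a file matching
    # /data1/elasticsearch/temp/*.log.
    with tempfile.TemporaryDirectory() as folder:
        log_path = os.path.join(folder, "demo.log")
        append_event(log_path, {"event": "Impression_test", "age": 36})
        with open(log_path, encoding="utf-8") as handle:
            print(repr(handle.read()))
```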

ex)
curl -XGET 'http://localhost:9200/impression/logs/_mapping'

curl -XDELETE 'http://localhost:9200/impression/logs/_mapping'

curl -XPUT 'http://localhost:9200/impression/logs/_mapping' -d '
{
    "logs" : {
        "properties" : {
            "carrier" : {"type" : "string", "index": "not_analyzed"}
        }
    }
}
'

* http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/

