
Posts

Showing posts from April, 2015

Apache Tajo (with Hadoop Library)

We wanted to analyse huge user activity logs using various query conditions, so we started with ELK (Elasticsearch, Logstash, Kibana). However, Kibana's query results didn't meet our business needs, so I wrote some customised search queries and data types. After some struggling, we decided to move to Apache Tajo for its flexible query conditions and easier ETL. To save the cost of the Tajo slave instances on AWS, I used this batch application to run them on dynamically created spot instances.

 - 1 Tajo master server (EC2 minimum size)
 - N Tajo slave servers (large size spot instances)

tz-tajo
=====================================

I made a batch application for cohort analysis with Apache Tajo. Its features are as follows:

1. create 10 spot instances
2. check the instances' status and the Tajo workers' health
3. create a Tajo table on S3 for TETON log files (user, event)
4. create a result Tajo table on S3 for saving results
5. execute cohort query and insert the result to the result
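The order of the batch steps listed above can be sketched roughly as follows. This is only an illustration of the control flow, not the actual tz-tajo code: the `Cluster` interface, its method names, the `fakeCluster` stand-in and the placeholder statement strings are all hypothetical, and no real AWS or Tajo API is called.

```java
import java.util.ArrayList;
import java.util.List;

public class CohortBatchSketch {

    // Hypothetical abstraction over AWS and Tajo; these are NOT real AWS or Tajo APIs.
    interface Cluster {
        List<String> requestSpotInstances(int count);
        boolean isWorkerHealthy(String instanceId);
        void execute(String statement);
    }

    // Runs the batch steps in order and returns a log of what was done.
    static List<String> run(Cluster cluster, int workerCount) {
        List<String> log = new ArrayList<>();

        // Step 1: ask for the spot instances that will host the Tajo workers.
        List<String> ids = cluster.requestSpotInstances(workerCount);
        log.add("requested " + ids.size() + " spot instances");

        // Step 2: stop early if any worker is not healthy.
        for (String id : ids) {
            if (!cluster.isWorkerHealthy(id)) {
                throw new IllegalStateException("worker not healthy: " + id);
            }
        }
        log.add("all workers healthy");

        // Steps 3-5: placeholder descriptions, not real Tajo SQL.
        String[] statements = {
            "create log table on S3",
            "create result table on S3",
            "run cohort query and store result"
        };
        for (String statement : statements) {
            cluster.execute(statement);
            log.add("executed: " + statement);
        }
        return log;
    }

    // In-memory stand-in so the sketch runs without any cloud access.
    static Cluster fakeCluster(final List<String> executed) {
        return new Cluster() {
            public List<String> requestSpotInstances(int count) {
                List<String> ids = new ArrayList<>();
                for (int i = 0; i < count; i++) {
                    ids.add("worker-" + i);
                }
                return ids;
            }
            public boolean isWorkerHealthy(String instanceId) {
                return true;
            }
            public void execute(String statement) {
                executed.add(statement);
            }
        };
    }

    public static void main(String[] args) {
        List<String> executed = new ArrayList<>();
        for (String line : run(fakeCluster(executed), 10)) {
            System.out.println(line);
        }
    }
}
```

Failing fast in step 2 keeps the later table creation and query from running against a cluster that is only partly up, which matters when the workers are spot instances that may not all be granted.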

Setup for Elasticsearch

0. preparation

 - add jars to lib folder (/tz.search/lib)
   elasticsearch-1.0.0.jar
   lucene-analyzers-common-4.6.1.jar
   lucene-codecs-4.6.1.jar
   lucene-core-4.6.1.jar
   lucene-grouping-4.6.1.jar
   lucene-highlighter-4.6.1.jar
   lucene-join-4.6.1.jar
   lucene-memory-4.6.1.jar
   lucene-misc-4.6.1.jar
   lucene-queries-4.6.1.jar
   lucene-queryparser-4.6.1.jar
   lucene-sandbox-4.6.1.jar
   lucene-spatial-4.6.1.jar
   lucene-suggest-4.6.1.jar

 - add jars to classpath
   <classpathentry kind="lib" path="lib/elasticsearch-1.0.0.jar"/>
   <classpathentry kind="lib" path="lib/lucene-core-4.6.1.jar"/>
   <classpathentry kind="lib" path="lib/lucene-analyzers-common-4.6.1.jar"/>
   <classpathentry kind="lib" path="lib/lucene-codecs-4.6.1.jar"/>
   <classpathentry kind="lib" path="lib/lucene-queries-4.6.1.jar"/>
   <classpathentry kind="lib" path="lib/luce