
Apache Tajo (with Hadoop Library)

We wanted to analyse huge user activity logs under a variety of query conditions.
We started with ELK (Elasticsearch, Logstash, Kibana), but Kibana's query results didn't meet our business needs, so I wrote some customised search queries and data types.
After some struggle, we decided to move to Apache Tajo for its flexible query conditions and easier ETL.

To save on the cost of the Tajo slave instances in AWS, I wrote this batch application, which creates and terminates spot instances dynamically.


1 Tajo master server (EC2 minimum size)
N Tajo slave servers (large-size spot instances)



tz-tajo
=====================================

I made a batch application for cohort analysis with Apache Tajo. It works as follows:

1. create 10 spot instances
2. check the instances' status and the Tajo workers' health
3. create a Tajo table on S3 for the TETON log files (user, event)
4. create a Tajo result table on S3 to store the results
5. execute the cohort query and insert the result into the result table
6. download the result CSV file from S3
7. create a table in MySQL
8. load the result CSV file into the MySQL table
9. terminate the 10 spot instances
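
The steps above can be sketched as a small runner that always terminates the spot instances, even when a middle step fails, so a broken query does not leave paid instances running. This is only a minimal sketch and not the code in tz-tajo: `run_batch`, `provision`, `steps` and `terminate` are hypothetical names, and the real AWS, Tajo and MySQL calls are replaced by placeholder callables.

```python
from typing import Callable, List, Sequence

# A step is any zero-argument callable; the real steps would call AWS, Tajo and MySQL.
Step = Callable[[], None]


def run_batch(provision: Step, steps: Sequence[Step], terminate: Step) -> None:
    """Run provision, then each step in order; always run terminate at the end."""
    try:
        provision()          # step 1: create the spot instances
        for step in steps:   # steps 2-8: health check, tables, query, CSV, MySQL
            step()
    finally:
        terminate()          # step 9: runs even if an earlier step raised


if __name__ == "__main__":
    log: List[str] = []

    def record(name: str) -> Step:
        # Placeholder step that only records its name (assumption for illustration).
        return lambda: log.append(name)

    run_batch(
        record("create spot instances"),
        [record("check health"), record("run cohort query"), record("load into MySQL")],
        record("terminate spot instances"),
    )
    print(log)
```

Putting the termination in `finally` is a design choice for cost control: the exception from the failed step still propagates, so the batch is reported as failed, but the instances are cleaned up first.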

The only thing I still have to fix is downloading the result CSV file from S3.
Now that we have a base for an analysis automation tool, we just need to change the Tajo query for the next requirement.
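
One way to keep the query as the only part that changes between requirements is to hold it as a template and fill in the table and column names per run. The sketch below is an assumption for illustration, not the query used in tz-tajo: the template text, `build_query`, and every table and column name are hypothetical.

```python
from string import Template

# Hypothetical cohort query: counts distinct users per (cohort, period) pair
# and writes them into the result table.
COHORT_INSERT = Template(
    "INSERT OVERWRITE INTO $result_table\n"
    "SELECT $cohort_key, $period_key, COUNT(DISTINCT $user_key) AS users\n"
    "FROM $source_table\n"
    "GROUP BY $cohort_key, $period_key"
)


def build_query(template: Template, **params: str) -> str:
    """Render the template; raises KeyError if a placeholder is not supplied."""
    return template.substitute(**params)


if __name__ == "__main__":
    print(
        build_query(
            COHORT_INSERT,
            result_table="cohort_result",   # hypothetical table name
            source_table="event_log",       # hypothetical table name
            cohort_key="signup_week",       # hypothetical column name
            period_key="active_week",       # hypothetical column name
            user_key="user_id",             # hypothetical column name
        )
    )
```

Plain string substitution is only appropriate when the parameters come from the batch's own configuration; it does no escaping, so it should not be fed untrusted input.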

https://github.com/doohee323/tz-tajo

