Tuesday, 26 March 2013

GenericOptionsParser, Tool, and ToolRunner for Running a Hadoop Job

Hadoop comes with a few helper classes for making it easier to run jobs from the command line. GenericOptionsParser is a class that interprets common Hadoop command-line options and sets them on a Configuration object for your application to use as desired. You don’t usually use GenericOptionsParser directly, as it’s more convenient to implement the Tool interface and run your application with the ToolRunner, which uses GenericOptionsParser internally:
public interface Tool extends Configurable {
  int run(String[] args) throws Exception;
}

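The example below shows a very simple implementation of Tool for running a Hadoop MapReduce job.
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountConfigured extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    Configuration conf = getConf();
    // Print every property; ToolRunner has already applied the command-line options.
    for (Map.Entry<String, String> entry : conf) {
      System.out.printf("%s=%s%n", entry.getKey(), entry.getValue());
    }
    return 0;
  }

  // main() belongs inside the class so that "hadoop WordCountConfigured" can launch it.
  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new WordCountConfigured(), args);
    System.exit(exitCode);
  }
}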

We make WordCountConfigured a subclass of Configured, which is an implementation of the Configurable interface. All implementations of Tool need to implement Configurable (since Tool extends it), and subclassing Configured is often the easiest way to achieve this. The run() method obtains the Configuration using Configurable’s getConf() method; the program can then read any property from it, for example by iterating over it and printing each property to standard output.

WordCountConfigured’s main() method does not invoke its own run() method directly. Instead, we call ToolRunner’s static run() method, which takes care of creating a Configuration object for the Tool, before calling its run() method. ToolRunner also uses a GenericOptionsParser to pick up any standard options specified on the command line, and set them on the Configuration instance. We can see the effect of picking up the properties specified in conf/hadoop-localhost.xml by running the following command:
hadoop WordCountConfigured -conf conf/hadoop-localhost.xml -D mapred.job.tracker=localhost:10011 -D mapred.reduce.tasks=n

Options specified with -D take priority over properties from the configuration files. This is very useful: you can put defaults into configuration files, and then override them with the -D option as needed. A common example of this is setting the number of reducers for a MapReduce job via -D mapred.reduce.tasks=n. This overrides the number of reducers set on the cluster or in any client-side configuration files. The other options that GenericOptionsParser and ToolRunner support are listed in the table below.
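To make the precedence rule concrete, here is a small sketch in plain Java. It is not Hadoop's GenericOptionsParser and does not use any Hadoop class; OptionPrecedenceSketch, Parsed, and parse are hypothetical names, and the default values are made up for illustration. It only models the idea described above: start from the defaults read from configuration files, then let each -D key=value pair overwrite them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of option precedence. NOT Hadoop's GenericOptionsParser:
 * every name in this class is hypothetical and exists only for illustration.
 */
final class OptionPrecedenceSketch {

    /** Merged properties plus the arguments left over for the application. */
    static final class Parsed {
        final Map<String, String> properties;
        final List<String> remainingArgs;

        Parsed(Map<String, String> properties, List<String> remainingArgs) {
            this.properties = properties;
            this.remainingArgs = remainingArgs;
        }
    }

    /**
     * Starts from the file-level defaults, then applies each "-D key=value"
     * pair on top, so command-line values win. Other arguments pass through.
     * A pair without '=' is silently dropped in this sketch.
     */
    static Parsed parse(Map<String, String> fileDefaults, String[] args) {
        Map<String, String> merged = new LinkedHashMap<>(fileDefaults);
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-D".equals(args[i]) && i + 1 < args.length) {
                String pair = args[++i];
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    merged.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            } else {
                remaining.add(args[i]);
            }
        }
        return new Parsed(merged, remaining);
    }

    public static void main(String[] args) {
        // Made-up defaults standing in for values loaded from a configuration file.
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("mapred.reduce.tasks", "1");
        defaults.put("mapred.job.tracker", "localhost:8021");

        Parsed parsed = parse(defaults,
                new String[] {"-D", "mapred.reduce.tasks=4", "input", "output"});

        System.out.println(parsed.properties.get("mapred.reduce.tasks")); // prints 4
        System.out.println(parsed.properties.get("mapred.job.tracker"));  // prints localhost:8021
        System.out.println(parsed.remainingArgs);                         // prints [input, output]
    }
}
```

The property that was overridden on the command line takes the new value, the one that was not keeps its file default, and the non-option arguments are handed back untouched, which is roughly what a Tool sees in the args passed to run().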

GenericOptionsParser and ToolRunner options

Option | Description
-D property=value | Sets the given Hadoop configuration property to the given value. Overrides any default or site properties in the configuration, and any properties set via the -conf option.
-conf filename ... | Adds the given files to the list of resources in the configuration. This is a convenient way to set site properties, or to set a number of properties at once.
-fs uri | Sets the default filesystem to the given URI. Shortcut for -D fs.default.name=uri
-jt host:port | Sets the jobtracker to the given host and port. Shortcut for -D mapred.job.tracker=host:port
-files file1,file2,... | Copies the specified files from the local filesystem (or any filesystem if a scheme is specified) to the shared filesystem used by the jobtracker (usually HDFS) and makes them available to MapReduce programs in the task’s working directory.
-archives archive1,archive2,... | Copies the specified archives from the local filesystem (or any filesystem if a scheme is specified) to the shared filesystem used by the jobtracker (usually HDFS), unarchives them, and makes them available to MapReduce programs in the task’s working directory.
-libjars jar1,jar2,... | Copies the specified JAR files from the local filesystem (or any filesystem if a scheme is specified) to the shared filesystem used by the jobtracker (usually HDFS), and adds them to the MapReduce task’s classpath. This option is a useful way of shipping JAR files that a job is dependent on.

Monday, 25 March 2013

Apache Ant - Tutorial

1. Build Tools
Build tools automate repetitive tasks such as compiling source code, generating documentation, running tests, and building the JAR. Well-known build tools include Apache Ant and Maven.


2. Overview of Ant

Ant (Another Neat Tool) is a Java library used mostly for building and deploying Java applications. Ant provides built-in tasks to compile, build, test, and package a Java application. Ant builds are based on three building blocks:
Tasks: A task is a unit of work, for example compiling or packaging.
Targets: A target groups tasks and can be invoked directly via Ant.
Extension Points: An extension point is like a target but performs no work of its own; it only coordinates other targets.

3. Building the Java Application: Using Apache Ant

Create a build.xml file in your project root directory (build.xml is the default name; a different file name can be passed to Ant with -f). Given below is a sample build file, with comments inline.
<?xml version="1.0" ?>
<!-- default names the target that runs when Ant is started without one -->
<project name="nRelate Analytics" default="createjar">
  <!-- Sets variables which can later be used. -->
  <!-- The value of a property is accessed via ${} -->
  <property name="src.dir" location="src" />
  <property name="build.dir" location="build" />
  <property name="dist.dir" location="dist" />
  <property name="lib.dir" location="lib" />

  <!--
    Create a classpath container which can be later used in the ant task
  -->
  <path id="build.classpath">
    <fileset dir="${lib.dir}">
      <include name="**/*.jar" />
    </fileset>
  </path>

  <!-- Deletes the existing build and dist directories -->
  <target name="clean">
    <delete dir="${build.dir}" />
    <delete dir="${dist.dir}" />
  </target>

  <!-- Creates the build and dist directories -->
  <target name="makedir">
    <mkdir dir="${build.dir}" />
    <mkdir dir="${dist.dir}" />
  </target>

  <!-- Compiles the Java code (using the libraries on build.classpath) -->
  <target name="compile" depends="clean, makedir">
    <javac srcdir="${src.dir}" destdir="${build.dir}" classpathref="build.classpath">
    </javac>
  </target>

  <!-- Creates the deployable jar file -->
  <target name="createjar" depends="compile">
    <!-- Note the "/" after ${dist.dir}; without it the jar lands next to dist, not inside it -->
    <jar destfile="${dist.dir}/LogNormalizer.jar" basedir="${build.dir}">
      <!-- Adding to the manifest file the class from which to start execution -->
      <manifest>
        <attribute name="Main-Class" value="test.MainClass" />
      </manifest>
    </jar>
  </target>
</project>
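The manifest in the build file names test.MainClass as the class to start. As a rough sketch of what such an entry point needs to look like, here is a placeholder class with that name and package. Only the package, the class name, and the public static main method follow from the build file; the describe helper and the message it prints are hypothetical and stand in for the real application logic.

```java
package test;

/**
 * Hypothetical entry point matching the Main-Class attribute in the manifest.
 * The body is a placeholder, not the real application logic.
 */
public class MainClass {

    /** Builds a one-line summary of the arguments the program was started with. */
    public static String describe(String[] args) {
        return "Started with " + args.length + " argument(s): " + String.join(" ", args);
    }

    /** The JVM calls this method when the JAR is started with java -jar. */
    public static void main(String[] args) {
        System.out.println(describe(args));
    }
}
```

The file would live at src/test/MainClass.java so that the javac task picks it up from src.dir. Once the createjar target has built the JAR, it can be started with java -jar followed by the path to the JAR, and the JVM reads Main-Class from the manifest to find this class.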

4. Run your Ant build from the command line

Open a command line and switch to your project directory. Type in the following commands.
# Run the build
ant -f build.xml
# build.xml is the default build file, so you can also run
ant
To run a specific target, pass its name as below.
# Run build.xml with an explicit target
ant -f build.xml createjar

The build should finish successfully and generate the build artifacts under the dist directory.


Friday, 22 March 2013

My Personal collections

This is my personal collection of books that I wish to read and books that I have finished so far. I firmly believe in Thomas Jefferson's quote - "A life without a book is meaningless". I want my life to be meaningful, and so here comes this blog. I will keep updating this list, and I welcome suggestions for good books as well as feedback.

Drowned ...


Two States - Chetan Bhagat

This is a beautiful love story between an IIM student from Delhi and a girl from Tamil Nadu. It follows how they fall in love and carry the relationship through to marriage, overcoming differences of caste, culture, language, and family background. It is based on the author's own story. A good one to read.






One Night at the Call Center - Chetan Bhagat

Another drama from Chetan, in which the entire story unfolds over a single night. It portrays life at a call center, with a love story woven in. Fairly interesting, though not as good as Two States.








Who Will Cry When You Die - Robin Sharma

Robin Sharma writes about the simple things that make life happier. The book is a collection of lessons touching on everything from being kind to others to mental and physical fitness, going green, staying positive, and a lot more.








The Greatness Guide - Robin Sharma

Robin explains how one should lead one's life, both as a leader and as a follower. He has woven in great sayings of legends and proverbs that stick in your mind. He simply states that "the first point of how the world sees you should begin from you".







Five Point Someone - Chetan Bhagat

This is another real-life story, drawn from the author's time at IIT. A mix of fun and love from his college days, written in simple language like his other books. Once you start, you cannot put it down without finishing it.






What Young India Wants - Chetan Bhagat

It's a collection of essays and articles that the author wrote for various newspapers in the north. It also includes some good letters he wrote to political leaders. Most of the incidents in this book concern the northern part of India.








The Alchemist - Paulo Coelho

This is a story in which the author writes about treasure, the desert, travel, and love. It's about a young boy who risks his life to fulfil his destiny. He insists that “If you really desire to achieve something, the whole world conspires in helping you to achieve it”.








Revolution 2020 - Chetan Bhagat

A love triangle. One guy fails to get into IIT but becomes the director of an engineering college. The other cracks the exams, gets into the best college, and ends up running his own newspaper, hoping to reform a corrupt nation with his ideas and broad mind. Whom the girl finally marries makes for an interesting ending.







Kane and Abel - Jeffrey Archer

This is one of the best books that I have ever read. It follows the lives of two people from birth to death: the journeys they went through and the difficulties they faced in building their lives. I highly recommend this book to anyone. The author holds his nerve throughout the story. I have decided to read a few more of his books. He deserves to be called one of the best authors in the world.









To be drowned....


A Thousand Splendid Suns -  Khaled Hosseini



Wednesday, 20 March 2013

Google Launches Realtime API for Google Drive

Google has launched a Realtime API for Google Drive, enabling developers to create apps that can tap into Drive's real-time editing capabilities.

The API provides developers with collaborative versions of data objects such as maps, lists, strings, and JSON values, which are automatically synchronized, and all modifications to them are stored. Developers can create apps that read from and write to these objects like any local object.

If the basic set of objects isn't enough, developers can also create custom objects and references, including trees and arbitrary graph structures.

Interested developers can check out the Google Drive Realtime API technical documentation.

 

Source: Mashable.com