Hadoop Tutorial :Intro to Big Data and HDFS

This tutorial aims to give you an overall understanding of Big Data in analytics and the various strategies involved. It also discusses HDFS, the most important component of Hadoop, in detail.

Continue reading “Hadoop Tutorial :Intro to Big Data and HDFS”

Research Paper : Parallel Computing Solutions – Hadoop Mapreduce

This is a research paper that we submitted to ICAPADS-2012, an IEEE conference on high-performance distributed computing. It describes a MapReduce-based solution to the maze traversal problem, which is applicable to many practical problems.

Continue reading “Research Paper : Parallel Computing Solutions – Hadoop Mapreduce”

MongoDB Tutorial : Introduction and Step by Step Installation

This article covers various MongoDB concepts and gives step-by-step installation guidelines for Windows.

Continue reading “MongoDB Tutorial : Introduction and Step by Step Installation”

Hadoop Tutorial : Getting started with Hadoop and Mapreduce

This tutorial covers various resources that you can use to learn about Hadoop and MapReduce. It also discusses how to approach learning Big Data as a subject in its entirety.

Continue reading “Hadoop Tutorial : Getting started with Hadoop and Mapreduce”

Sqoop Tutorial : Hadoop : Other Sqoop capabilities explored

In the last article, Sqoop : Hadoop : Importing data from RDBMS to HDFS, we explored the basic feature of importing data into HDFS using Sqoop. In this article we will explore other important tools that Sqoop provides.

Continue reading “Sqoop Tutorial : Hadoop : Other Sqoop capabilities explored”

Sqoop Tutorial : Hadoop : Importing data from RDBMS to HDFS

In this article we will go through a very important technique: importing data from an SQL table to HDFS. We will do so on a sample database, say ‘bigdata’, and a sample table, say ‘employee’, containing employee data.

We will do this in 3 parts. Part 1 is in the scope of this article; we will look at the remaining parts in subsequent articles.

Continue reading “Sqoop Tutorial : Hadoop : Importing data from RDBMS to HDFS”

Hadoop Tutorial : Custom Record Reader with TextInputFormat

In this Hadoop tutorial we will modify our previous wordcount program, which used our own custom mapper and reducer, by implementing a concept called a custom record reader. Before we attack the problem, let us look at some theory required to understand the topic.

Continue reading “Hadoop Tutorial : Custom Record Reader with TextInputFormat”

Hadoop : WordCount with Custom Mapper and Reducer

So here is the next article in the series. In the last post we learnt how to write wordcount without using explicit custom mappers or reducers. You can find the post here.

Today we will go a step further and rewrite the same wordcount program using our own custom mappers as well as reducers.

We will use 2 classes in addition to our wordcount class.

Class WCMap

Continue reading “Hadoop : WordCount with Custom Mapper and Reducer”

Hadoop : A wordcount without explicit mapper/reducer

So I was trying out my first Hadoop program and I was a little wary of writing a mapper and reducer. But I still wanted the program to give me the word count for all words in the input files.

So I wrote a Hadoop driver program with TokenCounterMapper as the map class. This class is provided by Hadoop; it tokenizes the input text and emits each word with a count of 1.

Just the recipe that I ordered …
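To make the "emit each word with count 1" idea concrete, here is a minimal plain-Python sketch of that map step. It is only an illustration and not Hadoop code: the function name `token_counter_map` and the sample sentence are made up for this example, and it assumes tokens are separated by whitespace.

```python
from typing import Iterator, Tuple


def token_counter_map(line: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical stand-in for the map step: yield (word, 1) per token."""
    # Assumption: tokens are whitespace-separated.
    for token in line.split():
        yield token, 1


# Every occurrence is emitted separately; nothing is summed at this stage.
pairs = list(token_counter_map("to be or not to be"))
print(pairs)
```

Note that repeated words such as "to" and "be" appear twice in the output, each with a count of 1. Adding those counts up per word is left to a later step.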

Now I needed a reducer which …

Continue reading “Hadoop : A wordcount without explicit mapper/reducer”