Sqoop Tutorial : Hadoop : Other Sqoop capabilities explored

Data Export using Sqoop

In the last article Sqoop : Hadoop : Importing data from RDBMS to HDFS we explored basic feature of importing the data in HDFS using Sqoop. In this article we will try to explore other very important tools that Sqoop provides.

 

 

 

 

Sqoop Provides Following features to us :

  • Data Import to HDFS – Single as well as all tables from DB
  • Data Export from HDFS to RDBMS
  • Merging results from consequent imports
  • Listing Databases and Tables
  • Running Sqoop Metastore
  • Working with saved jobs
  • Evaluating potential queries.

We will try to explore Some of these features .

Prerequisites :

Squoop 1.4.1 and Hadoop 1.0.1 but be installed and configured properly. You can get the instructions here

We will create the sqoop configuration file without the import command for this article so that we can use any command we want. So the example configuration file (/home/hduser/bigdatadboptions.txt) will look like –

–connect
jdbc:mysql://10.225.65.158:3306/bigdata
–username
root
–password
test

1. Data Import from RDBMS to HDFS

  • Single table import

We Explored this ingle table dataimport in the last article . You can read the article here.

  • Multiple Table import

For importing data from all tables in Database lets have a database with 2 tables “employee” and “skills” with some data in it. We can import all tables from database bigdata using following command.

../bin/sqoop import-all-tables –options-file /home/hduser/bigdatadboptions.txt

After running this Sqoop spawns 2 mapreduce jobs to import both the tables . We can verify this by listing hdfs.

[[email protected] importtemps]$ hadoop dfs -ls /user/hduser
Found 2 items

drwxr-xr-x – hduser supergroup 0 2012-08-09 05:53 /user/hduser/employee
drwxr-xr-x – hduser supergroup 0 2012-08-09 05:54 /user/hduser/skills

2. Data Export

For this lets assume we have a mysql table called skills with following fields:

+————+————-+——+—–+———+——-+
| Field | Type | Null | Key | Default | Extra |
+————+————-+——+—–+———+——-+
| skill_id | int(11) | NO | PRI | NULL | |
| skill_name | varchar(50) | NO | | NULL | |
+————+————-+——+—–+———+——-+

Also we have a file in hdfs at /user/hduser/skills with following 2 comma separated records

1,Java
2,Hadoop

Now we will run the export command as following :

../bin/sqoop export –options-file /home/hduser/bigdatadboptions.txt –table skills –export-dir /user/hduser/skills

As a result sqoop spawned mapreduce and put both the records in the skills table. The verified output from skills table is as follows:

+———-+————+
| skill_id | skill_name |
+———-+————+
| 1 | Java |
| 2 | Hadoop |
+———-+————+

3. Listing databases and tables

Listing of databases in RDBMS and sqoop also can be achieved using following commands

  • Listing Databases

../bin/sqoop list-databases –options-file /home/hduser/bigdatadboptions.txt

  • Listing Tables in a Database

../bin/sqoop list-tables –options-file /home/hduser/bigdatadboptions.txt

This is in no way a very comprehensive post about capabilities of sqoop. It would however give you basic understanding of most important features of sqoop and how to work with them.

A more comprehensive sqoop documentation can be found here http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html

Thanks and Happy Coding

Advertisements

One thought on “Sqoop Tutorial : Hadoop : Other Sqoop capabilities explored

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s