Be aware that Sqoop is not atomic. After a data load, it is good practice to do a record count on both sides and make sure they match.
Log into your HDP Sandbox as root to bring up a terminal window (instructions are provided in the Sandbox). The loopback address 127.0.0.1 is a non-routable IP address that refers to the local host; it appears in the JDBC connect strings below because MySQL runs on the Sandbox itself.
Demo One: Move data from a relational database into your Hadoop cluster. Then use HDFS commands to verify that the files reside in HDFS.
Connect to MySQL using the mysql client, create a database and build a simple table.
# mysql
mysql> CREATE DATABASE sqoopdb;
mysql> USE sqoopdb;
mysql> CREATE TABLE mytab (id int not null auto_increment primary key, name varchar(20));
mysql> INSERT INTO mytab VALUES (null, 'Tom');
mysql> INSERT INTO mytab VALUES (null, 'George');
mysql> INSERT INTO mytab VALUES (null, 'Barry');
mysql> INSERT INTO mytab VALUES (null, 'Mark');
mysql> GRANT ALL ON sqoopdb.* to root@localhost;
mysql> GRANT ALL ON sqoopdb.* to root@'%';
mysql> exit;
-- The sqoop command needs both database privileges and HDFS access: the grants above (root@'%' lets
-- the map tasks connect from any host), plus write access to HDFS, which is why it runs as the hdfs user.
# su - hdfs
$ sqoop import --connect jdbc:mysql://127.0.0.1/sqoopdb --username root --direct --table mytab --m 1
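-- The --direct flag uses MySQL's mysqldump fast path rather than plain JDBC, and --m 1 runs a single
-- map task, so the import produces one output file (part-m-00000) under the hdfs user's home directory.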
$ hadoop fs -lsr mytab
$ hadoop fs -cat mytab/part-m-00000
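-- Because Sqoop is not atomic, verify the record counts match on both sides. A quick check, assuming
-- the passwordless root login used above (each line of the text output is one record):
$ mysql -u root -e "SELECT COUNT(*) FROM sqoopdb.mytab;"
$ hadoop fs -cat mytab/part-m-00000 | wc -l
-- Both should report 4.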
-- Demo Two: Load data from a relational database into Hive. Then query the data using Hive.
# mysql
mysql> USE sqoopdb;
mysql> CREATE TABLE newtab (id int not null auto_increment primary key, name varchar(20));
mysql> INSERT INTO newtab VALUES (null, 'Tom');
mysql> INSERT INTO newtab VALUES (null, 'George');
mysql> INSERT INTO newtab VALUES (null, 'Barry');
mysql> INSERT INTO newtab VALUES (null, 'Mark');
mysql> exit;
# su - hdfs
$ sqoop import --connect jdbc:mysql://127.0.0.1/sqoopdb --username root --table newtab \
--direct --m 1 --hive-import
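-- The --hive-import flag creates the table definition in the Hive metastore and moves the imported
-- files into the Hive warehouse directory, so the data is immediately queryable from Hive.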
-- Hive provides a command line interface for working with the data. Using the Hive metastore, users
-- can query the data through a SQL-like interface. Whoever runs the hive command must have read
-- access to the underlying files in HDFS.
$ hive
hive> show tables;
hive> SELECT * FROM newtab;
hive> exit;
$
-- The physical files are stored in the HDFS directory defined by the following property in the
-- /etc/hive/conf/hive-site.xml file:
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/apps/hive/warehouse</value>
</property>
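-- You can confirm the setting from the command line; SET with just a key prints its current value:
$ hive -e "set hive.metastore.warehouse.dir;"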
-- Look at the data location in HDFS.
$ hadoop fs -lsr /apps/hive/warehouse/newtab
-- Look at the data contents.
$ hadoop fs -cat /apps/hive/warehouse/newtab/part-m-00000
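-- As in Demo One, verify the counts. Hive can run the count on the Hadoop side:
$ hive -e "SELECT COUNT(*) FROM newtab;"
$ mysql -u root -e "SELECT COUNT(*) FROM sqoopdb.newtab;"
-- Both should return 4.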
-- You can use the following help commands, along with the Sqoop documentation, to work through many more examples of moving data between your Hadoop cluster and a relational database.
$ sqoop help
$ sqoop help import
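-- Sqoop moves data in the other direction too. A sketch using a hypothetical target table exporttab,
-- which must already exist in MySQL with a matching schema; because --hive-import wrote the files with
-- Hive's default ^A (\001) field delimiter, the export has to be told how to parse them:
$ sqoop export --connect jdbc:mysql://127.0.0.1/sqoopdb --username root --table exporttab \
  --export-dir /apps/hive/warehouse/newtab --input-fields-terminated-by '\001' --m 1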