
Using Sqoop

Food for thought: brother, you've never really put in the work, which is why every operation you find unbelievable looks like a cheat.

1. Importing data

   In Sqoop, "import" means transferring data from outside the big-data cluster (an RDBMS) into the big-data cluster (HDFS, Hive, HBase); this is done with the import tool.

   The direction of import and export is always judged from the big-data platform's side: loading MySQL data into the big-data platform is an import, and pushing data from the big-data platform back to MySQL is an export.

Authoritative reference: the official Sqoop documentation.

1.1 RDBMS to HDFS

  • Make sure the MySQL service is running normally
  • Prepare the MySQL data

Data preparation

create database leco;
use leco;
create table staff(
    id int(4) primary key not null auto_increment, 
    name varchar(255), 
    sex varchar(255)
);
insert into staff(name, sex) values('caimz', 'male');
insert into staff(name, sex) values('summer', 'female');

Note: on Windows, MySQL's character set often defaults to gbk, so inserting Chinese text will fail unless the database/connection character set is switched to utf8.

Detailed session
[root@master ~]# mysql -uroot -proot
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 658
Server version: 5.7.26 MySQL Community Server (GPL)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| cc                 |
| hive               |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
6 rows in set (0.02 sec)

mysql> create database leco;
Query OK, 1 row affected (0.04 sec)

mysql> use leco;
Database changed
mysql> create table staff(
    ->     id int(4) primary key not null auto_increment, 
    ->     name varchar(255), 
    ->     sex varchar(255)
    -> );
Query OK, 0 rows affected (0.07 sec)

mysql> insert into staff(name, sex) values('caimz', 'male');
Query OK, 1 row affected (0.01 sec)

mysql> insert into staff(name, sex) values('summer', 'female');
Query OK, 1 row affected (0.00 sec)

mysql> show tables;
+----------------+
| Tables_in_leco |
+----------------+
| staff          |
+----------------+
1 row in set (0.01 sec)

mysql> select * from staff;
+----+--------+--------+
| id | name   | sex    |
+----+--------+--------+
|  1 | caimz  | male   |
|  2 | summer | female |
+----+--------+--------+
2 rows in set (0.00 sec)

1.1.1 Importing a full table

  Here we import an entire table from a MySQL database into HDFS.

Full-table import

sqoop import \
--connect jdbc:mysql://master:3306/leco \
--username root \
--password root \
--table staff \
--target-dir /user/leco \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"
Detailed session
[root@master ~]# sqoop import \
> --connect jdbc:mysql://master:3306/leco \
> --username root \
> --password root \
> --table staff \
> --target-dir /user/leco \
> --delete-target-dir \
> --num-mappers 1 \
> --fields-terminated-by "\t"
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/08/08 13:28:08 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/08/08 13:28:08 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/08/08 13:28:09 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/08/08 13:28:09 INFO tool.CodeGenTool: Beginning code generation
Thu Aug 08 13:28:09 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/08/08 13:28:09 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `staff` AS t LIMIT 1
19/08/08 13:28:09 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `staff` AS t LIMIT 1
19/08/08 13:28:09 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/01f4389cf129be60aad0970b9a6f68b2/staff.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/08/08 13:28:13 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/01f4389cf129be60aad0970b9a6f68b2/staff.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.5/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-1.3.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/08/08 13:28:14 INFO tool.ImportTool: Destination directory /user/leco is not present, hence not deleting.
19/08/08 13:28:14 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/08/08 13:28:14 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/08/08 13:28:14 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/08/08 13:28:14 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/08/08 13:28:14 INFO mapreduce.ImportJobBase: Beginning import of staff
19/08/08 13:28:14 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/08/08 13:28:14 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/08/08 13:28:14 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.186.10:8032
Thu Aug 08 13:28:17 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/08/08 13:28:17 INFO db.DBInputFormat: Using read commited transaction isolation
19/08/08 13:28:17 INFO mapreduce.JobSubmitter: number of splits:1
19/08/08 13:28:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564669682555_0018
19/08/08 13:28:18 INFO impl.YarnClientImpl: Submitted application application_1564669682555_0018
19/08/08 13:28:18 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1564669682555_0018/
19/08/08 13:28:18 INFO mapreduce.Job: Running job: job_1564669682555_0018
19/08/08 13:28:26 INFO mapreduce.Job: Job job_1564669682555_0018 running in uber mode : false
19/08/08 13:28:26 INFO mapreduce.Job:  map 0% reduce 0%
19/08/08 13:28:34 INFO mapreduce.Job:  map 100% reduce 0%
19/08/08 13:28:34 INFO mapreduce.Job: Job job_1564669682555_0018 completed successfully
19/08/08 13:28:34 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=125987
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=29
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5208
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=5208
        Total vcore-milliseconds taken by all map tasks=5208
        Total megabyte-milliseconds taken by all map tasks=5332992
    Map-Reduce Framework
        Map input records=2
        Map output records=2
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=197
        CPU time spent (ms)=1830
        Physical memory (bytes) snapshot=154382336
        Virtual memory (bytes) snapshot=2117656576
        Total committed heap usage (bytes)=81264640
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=29
19/08/08 13:28:34 INFO mapreduce.ImportJobBase: Transferred 29 bytes in 19.9979 seconds (1.4502 bytes/sec)
19/08/08 13:28:34 INFO mapreduce.ImportJobBase: Retrieved 2 records.
[root@master ~]# hadoop fs -ls /user/leco/
Found 2 items
-rw-r--r--   3 root supergroup          0 2019-08-08 13:28 /user/leco/_SUCCESS
-rw-r--r--   3 root supergroup         29 2019-08-08 13:28 /user/leco/part-m-00000
[root@master ~]# hadoop fs -text /user/leco/part-m-00000
1   caimz   male
2   summer  female
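The part file is tab-separated because the job passed --fields-terminated-by "\t". As a quick local illustration (the sample file name here is made up, not produced by Sqoop), the columns can be pulled apart with standard tools:

```shell
# Recreate the two records exactly as Sqoop wrote them (tab-delimited),
# then extract the second (name) column. cut splits on tabs by default.
printf '1\tcaimz\tmale\n2\tsummer\tfemale\n' > part-m-00000.sample
cut -f2 part-m-00000.sample
```

This prints the name column only (caimz, then summer), confirming the field layout that downstream jobs will have to parse.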

1.1.2 Importing query results

  Often it is the result of a query, rather than a whole table, that we want to import into the Hadoop platform.

Query import

sqoop import \
--connect jdbc:mysql://master:3306/leco \
--username root \
--password root \
--target-dir /user/leco \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query 'select name,sex from staff where id <=1 and $CONDITIONS;'

Tip: Sqoop requires the query to contain '$CONDITIONS' in the WHERE clause (otherwise it fails with "must contain '$CONDITIONS' in WHERE clause"). If the query is wrapped in double quotes, $CONDITIONS must be escaped as \$CONDITIONS so the shell does not expand it as one of its own variables.

If you use double quotes, change the command as follows:

sqoop import \
--connect jdbc:mysql://master:3306/leco \
--username root \
--password root \
--target-dir /user/leco \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query "select name,sex from staff where id <=1 and \$CONDITIONS;"
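A small shell sketch of why the escape matters: inside single quotes the shell passes $CONDITIONS through literally, while inside double quotes only the escaped form survives unchanged:

```shell
# Single quotes: no expansion, Sqoop sees the literal token.
echo 'where id <= 1 and $CONDITIONS'
# Double quotes with escape: the backslash suppresses expansion,
# producing exactly the same literal text.
echo "where id <= 1 and \$CONDITIONS"
# Double quotes without escape: $CONDITIONS is an (empty) shell
# variable here, so the token silently disappears.
echo "where id <= 1 and $CONDITIONS"
```

The first two lines print identical text; the third loses the placeholder, which is exactly the failure mode the tip above warns about.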

Detailed session
[root@master ~]# sqoop import \
> --connect jdbc:mysql://master:3306/leco \
> --username root \
> --password root \
> --target-dir /user/leco \
> --delete-target-dir \
> --num-mappers 1 \
> --fields-terminated-by "\t" \
> --query 'select name,sex from staff where id <=1 and $CONDITIONS;'
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/08/08 14:01:33 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/08/08 14:01:33 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/08/08 14:01:33 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/08/08 14:01:33 INFO tool.CodeGenTool: Beginning code generation
Thu Aug 08 14:01:34 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/08/08 14:01:34 INFO manager.SqlManager: Executing SQL statement: select name,sex from staff where id <=1 and  (1 = 0) ;
19/08/08 14:01:34 INFO manager.SqlManager: Executing SQL statement: select name,sex from staff where id <=1 and  (1 = 0) ;
19/08/08 14:01:34 INFO manager.SqlManager: Executing SQL statement: select name,sex from staff where id <=1 and  (1 = 0) ;
19/08/08 14:01:34 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/de160ffff6ccd016ac92f2abe3d5259a/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/08/08 14:01:35 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/de160ffff6ccd016ac92f2abe3d5259a/QueryResult.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.5/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-1.3.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/08/08 14:01:37 INFO tool.ImportTool: Destination directory /user/leco deleted.
19/08/08 14:01:37 INFO mapreduce.ImportJobBase: Beginning query import.
19/08/08 14:01:37 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/08/08 14:01:37 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/08/08 14:01:37 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.186.10:8032
Thu Aug 08 14:01:40 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/08/08 14:01:40 INFO db.DBInputFormat: Using read commited transaction isolation
19/08/08 14:01:40 INFO mapreduce.JobSubmitter: number of splits:1
19/08/08 14:01:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564669682555_0019
19/08/08 14:01:41 INFO impl.YarnClientImpl: Submitted application application_1564669682555_0019
19/08/08 14:01:41 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1564669682555_0019/
19/08/08 14:01:41 INFO mapreduce.Job: Running job: job_1564669682555_0019
19/08/08 14:01:48 INFO mapreduce.Job: Job job_1564669682555_0019 running in uber mode : false
19/08/08 14:01:48 INFO mapreduce.Job:  map 0% reduce 0%
19/08/08 14:01:57 INFO mapreduce.Job:  map 100% reduce 0%
19/08/08 14:01:58 INFO mapreduce.Job: Job job_1564669682555_0019 completed successfully
19/08/08 14:01:59 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=125714
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=11
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=6692
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=6692
        Total vcore-milliseconds taken by all map tasks=6692
        Total megabyte-milliseconds taken by all map tasks=6852608
    Map-Reduce Framework
        Map input records=1
        Map output records=1
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=259
        CPU time spent (ms)=1950
        Physical memory (bytes) snapshot=159064064
        Virtual memory (bytes) snapshot=2119004160
        Total committed heap usage (bytes)=90701824
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=11
19/08/08 14:01:59 INFO mapreduce.ImportJobBase: Transferred 11 bytes in 21.6702 seconds (0.5076 bytes/sec)
19/08/08 14:01:59 INFO mapreduce.ImportJobBase: Retrieved 1 records.
[root@master ~]# hadoop fs -ls /user/leco/
Found 2 items
-rw-r--r--   3 root supergroup          0 2019-08-08 14:01 /user/leco/_SUCCESS
-rw-r--r--   3 root supergroup         11 2019-08-08 14:01 /user/leco/part-m-00000
[root@master ~]# hadoop fs -text /user/leco/part*
caimz   male

1.1.3 Importing specific columns

Importing specific columns

sqoop import \
--connect jdbc:mysql://master:3306/leco \
--username root \
--password root \
--target-dir /user/leco \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--columns id,sex \
--table staff

If --columns lists several columns, separate them with commas and do not add spaces after the commas.
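One reason the spaces matter: an unquoted space makes the shell split the column list into separate arguments before Sqoop ever sees it. A quick sketch of the word splitting:

```shell
# With a space after the comma, "id, sex" becomes two shell words,
# so --columns only receives "id," and "sex" is a stray argument.
set -- --columns id, sex
echo $#   # 3 arguments
# Without the space, the list stays a single argument.
set -- --columns id,sex
echo $#   # 2 arguments
```

Quoting the list ("id, sex") avoids the shell split, but keeping the list space-free is the safer habit.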

Detailed session
[root@master ~]# sqoop import \
> --connect jdbc:mysql://master:3306/leco \
> --username root \
> --password root \
> --target-dir /user/leco \
> --delete-target-dir \
> --num-mappers 1 \
> --fields-terminated-by "\t" \
> --columns id,sex \
> --table staff
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/08/08 14:13:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/08/08 14:13:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/08/08 14:13:26 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/08/08 14:13:26 INFO tool.CodeGenTool: Beginning code generation
Thu Aug 08 14:13:26 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/08/08 14:13:27 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `staff` AS t LIMIT 1
19/08/08 14:13:27 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `staff` AS t LIMIT 1
19/08/08 14:13:27 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/74794c917409ffb84a7004544b56eba5/staff.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/08/08 14:13:28 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/74794c917409ffb84a7004544b56eba5/staff.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.5/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-1.3.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/08/08 14:13:29 INFO tool.ImportTool: Destination directory /user/leco deleted.
19/08/08 14:13:29 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/08/08 14:13:29 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/08/08 14:13:29 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/08/08 14:13:29 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/08/08 14:13:29 INFO mapreduce.ImportJobBase: Beginning import of staff
19/08/08 14:13:29 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/08/08 14:13:29 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/08/08 14:13:29 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.186.10:8032
Thu Aug 08 14:13:31 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/08/08 14:13:31 INFO db.DBInputFormat: Using read commited transaction isolation
19/08/08 14:13:31 INFO mapreduce.JobSubmitter: number of splits:1
19/08/08 14:13:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564669682555_0020
19/08/08 14:13:32 INFO impl.YarnClientImpl: Submitted application application_1564669682555_0020
19/08/08 14:13:32 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1564669682555_0020/
19/08/08 14:13:32 INFO mapreduce.Job: Running job: job_1564669682555_0020
19/08/08 14:13:39 INFO mapreduce.Job: Job job_1564669682555_0020 running in uber mode : false
19/08/08 14:13:39 INFO mapreduce.Job:  map 0% reduce 0%
19/08/08 14:13:45 INFO mapreduce.Job:  map 100% reduce 0%
19/08/08 14:13:45 INFO mapreduce.Job: Job job_1564669682555_0020 completed successfully
19/08/08 14:13:45 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=125980
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=16
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3283
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=3283
        Total vcore-milliseconds taken by all map tasks=3283
        Total megabyte-milliseconds taken by all map tasks=3361792
    Map-Reduce Framework
        Map input records=2
        Map output records=2
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=318
        CPU time spent (ms)=1510
        Physical memory (bytes) snapshot=159576064
        Virtual memory (bytes) snapshot=2117361664
        Total committed heap usage (bytes)=92798976
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=16
19/08/08 14:13:45 INFO mapreduce.ImportJobBase: Transferred 16 bytes in 16.0635 seconds (0.996 bytes/sec)
19/08/08 14:13:45 INFO mapreduce.ImportJobBase: Retrieved 2 records.
[root@master ~]# hadoop fs -ls /user/leco/
Found 2 items
-rw-r--r--   3 root supergroup          0 2019-08-08 14:13 /user/leco/_SUCCESS
-rw-r--r--   3 root supergroup         16 2019-08-08 14:13 /user/leco/part-m-00000
[root@master ~]# hadoop fs -text /user/leco/part*
1   male
2   female

1.1.4 Filtering rows with a WHERE clause

WHERE filter

sqoop import \
--connect jdbc:mysql://master:3306/leco \
--username root \
--password root \
--target-dir /user/leco \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--table staff \
--where "id=2"

WHERE filter combined with column selection

sqoop import \
--connect jdbc:mysql://master:3306/leco \
--username root \
--password root \
--target-dir /user/leco \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--table staff \
--columns id,sex \
--where "id=2"

This selects specific columns and applies a row filter at the same time.

Detailed session
[root@master ~]# sqoop import \
> --connect jdbc:mysql://master:3306/leco \
> --username root \
> --password root \
> --target-dir /user/leco \
> --delete-target-dir \
> --num-mappers 1 \
> --fields-terminated-by "\t" \
> --table staff \
> --where "id=2"
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/08/08 14:18:58 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/08/08 14:18:58 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/08/08 14:18:58 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/08/08 14:18:58 INFO tool.CodeGenTool: Beginning code generation
Thu Aug 08 14:18:58 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/08/08 14:18:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `staff` AS t LIMIT 1
19/08/08 14:18:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `staff` AS t LIMIT 1
19/08/08 14:18:59 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/e220ed27dc2c455f1d8dbab25fde1737/staff.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/08/08 14:19:00 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/e220ed27dc2c455f1d8dbab25fde1737/staff.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.5/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-1.3.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/08/08 14:19:01 INFO tool.ImportTool: Destination directory /user/leco deleted.
19/08/08 14:19:01 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/08/08 14:19:01 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/08/08 14:19:01 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/08/08 14:19:01 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/08/08 14:19:01 INFO mapreduce.ImportJobBase: Beginning import of staff
19/08/08 14:19:01 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/08/08 14:19:01 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/08/08 14:19:01 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.186.10:8032
Thu Aug 08 14:19:04 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/08/08 14:19:04 INFO db.DBInputFormat: Using read commited transaction isolation
19/08/08 14:19:04 INFO mapreduce.JobSubmitter: number of splits:1
19/08/08 14:19:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564669682555_0021
19/08/08 14:19:05 INFO impl.YarnClientImpl: Submitted application application_1564669682555_0021
19/08/08 14:19:05 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1564669682555_0021/
19/08/08 14:19:05 INFO mapreduce.Job: Running job: job_1564669682555_0021
19/08/08 14:19:11 INFO mapreduce.Job: Job job_1564669682555_0021 running in uber mode : false
19/08/08 14:19:11 INFO mapreduce.Job:  map 0% reduce 0%
19/08/08 14:19:16 INFO mapreduce.Job:  map 100% reduce 0%
19/08/08 14:19:17 INFO mapreduce.Job: Job job_1564669682555_0021 completed successfully
19/08/08 14:19:18 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=126128
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=16
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3162
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=3162
        Total vcore-milliseconds taken by all map tasks=3162
        Total megabyte-milliseconds taken by all map tasks=3237888
    Map-Reduce Framework
        Map input records=1
        Map output records=1
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=147
        CPU time spent (ms)=1950
        Physical memory (bytes) snapshot=160710656
        Virtual memory (bytes) snapshot=2117566464
        Total committed heap usage (bytes)=88080384
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=16
19/08/08 14:19:18 INFO mapreduce.ImportJobBase: Transferred 16 bytes in 16.6825 seconds (0.9591 bytes/sec)
19/08/08 14:19:18 INFO mapreduce.ImportJobBase: Retrieved 1 records.
[root@master ~]# hadoop fs -ls /user/leco/
Found 2 items
-rw-r--r--   3 root supergroup          0 2019-08-08 14:19 /user/leco/_SUCCESS
-rw-r--r--   3 root supergroup         16 2019-08-08 14:19 /user/leco/part-m-00000
[root@master ~]# hadoop fs -text /user/leco/part*
2   summer  female

1.2 RDBMS to Hive

  Data can be imported not only into HDFS but also directly into Hive.

Related import arguments (from the Sqoop reference):

Argument                     Description
--column-family <family>     Sets the target column family for the import
--hbase-create-table         If specified, create missing HBase tables
--hbase-row-key <col>        Specifies which input column to use as the row key. If the input table has a composite key, <col> must be a comma-separated list of the composite-key attributes
--hbase-table <table-name>   Specifies an HBase table to use as the target instead of HDFS
--hbase-bulkload             Enables bulk loading

Create the Hive table in advance:

create table staff_hive
(
    id int, 
    name string
) 
row format delimited fields terminated by '\t'
location '/user/hive/warehouse/staff_hive/';
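The DDL above uses fields terminated by '\t' so that it matches the --fields-terminated-by "\t" the import job writes with. A quick local sketch (the sample file is made up, not a real Sqoop output) of what happens when the delimiters do or do not match:

```shell
# One tab-delimited record, as Sqoop would write it.
printf '1\tcaimz\n' > row.sample
# Matching delimiter: the fields separate cleanly.
awk -F'\t' '{print $2}' row.sample   # prints: caimz
# Mismatched delimiter (e.g. a comma): the whole line collapses
# into a single field, which in Hive shows up as NULL columns.
awk -F','  '{print NF}' row.sample   # prints: 1
```

If the Hive table's delimiter disagrees with the Sqoop job's, the import "succeeds" but every query returns mangled or NULL columns, so it is worth checking this pairing first.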

Personally this feels a bit clumsy: before importing into Hive you have to create the table yourself, and the columns, their number, and the data location must all be planned in advance to match the Sqoop job. (Sqoop also has a --create-hive-table option that lets it create the table for you.)

RDBMS to Hive

sqoop import \
--connect jdbc:mysql://master:3306/leco \
--username root \
--password root \
--table staff \
--num-mappers 1 \
--hive-import \
--delete-target-dir \
--fields-terminated-by "\t" \
--hive-overwrite \
--hive-table staff_hive
Detailed session

```
[root@master ~]# sqoop import \
> --connect jdbc:mysql://master:3306/leco \
> --username root \
> --password root \
> --table staff \
> --num-mappers 1 \
> --hive-import \
> --delete-target-dir \
> --fields-terminated-by "\t" \
> --hive-overwrite \
> --hive-table staff_hive
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/08/08 14:37:12 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/08/08 14:37:12 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/08/08 14:37:13 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/08/08 14:37:13 INFO tool.CodeGenTool: Beginning code generation
Thu Aug 08 14:37:13 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/08/08 14:37:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM staff AS t LIMIT 1
19/08/08 14:37:13 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM staff AS t LIMIT 1
19/08/08 14:37:13 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/67cb79afcfabe2efcb79b43ef79e1061/staff.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/08/08 14:37:15 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/67cb79afcfabe2efcb79b43ef79e1061/staff.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.5/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-1.3.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/08/08 14:37:15 INFO tool.ImportTool: Destination directory staff deleted.
19/08/08 14:37:15 WARN manager.MySQLManager: It looks like you are importing from mysql.
19/08/08 14:37:15 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
19/08/08 14:37:15 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
19/08/08 14:37:15 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
19/08/08 14:37:15 INFO mapreduce.ImportJobBase: Beginning import of staff
19/08/08 14:37:15 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/08/08 14:37:16 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/08/08 14:37:16 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.186.10:8032
Thu Aug 08 14:37:18 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/08/08 14:37:18 INFO db.DBInputFormat: Using read commited transaction isolation
19/08/08 14:37:18 INFO mapreduce.JobSubmitter: number of splits:1
19/08/08 14:37:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564669682555_0024
19/08/08 14:37:19 INFO impl.YarnClientImpl: Submitted application application_1564669682555_0024
19/08/08 14:37:19 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1564669682555_0024/
19/08/08 14:37:19 INFO mapreduce.Job: Running job: job_1564669682555_0024
19/08/08 14:37:27 INFO mapreduce.Job: Job job_1564669682555_0024 running in uber mode : false
19/08/08 14:37:27 INFO mapreduce.Job:  map 0% reduce 0%
19/08/08 14:37:33 INFO mapreduce.Job:  map 100% reduce 0%
19/08/08 14:37:33 INFO mapreduce.Job: Job job_1564669682555_0024 completed successfully
19/08/08 14:37:33 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=125993
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=29
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3600
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=3600
        Total vcore-milliseconds taken by all map tasks=3600
        Total megabyte-milliseconds taken by all map tasks=3686400
    Map-Reduce Framework
        Map input records=2
        Map output records=2
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=63
        CPU time spent (ms)=1830
        Physical memory (bytes) snapshot=168505344
        Virtual memory (bytes) snapshot=2117533696
        Total committed heap usage (bytes)=80740352
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=29
19/08/08 14:37:33 INFO mapreduce.ImportJobBase: Transferred 29 bytes in 17.5846 seconds (1.6492 bytes/sec)
19/08/08 14:37:33 INFO mapreduce.ImportJobBase: Retrieved 2 records.
19/08/08 14:37:33 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table staff
Thu Aug 08 14:37:33 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/08/08 14:37:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM staff AS t LIMIT 1
19/08/08 14:37:33 INFO hive.HiveImport: Loading uploaded data into Hive

Logging initialized using configuration in jar:file:/usr/local/apache-hive-1.2.2-bin/lib/hive-common-1.2.2.jar!/hive-log4j.properties
OK
Time taken: 1.603 seconds
Loading data to table default.staff_hive
Table default.staff_hive stats: [numFiles=1, numRows=0, totalSize=29, rawDataSize=0]
OK
Time taken: 1.116 seconds

[root@master ~]# hadoop fs -ls /user/hive/warehouse/staff_hive
Found 1 items
-rwxr-xr-x   3 root supergroup         29 2019-08-08 14:37 /user/hive/warehouse/staff_hive/part-m-00000

hive (default)> select * from staff_hive;
OK
staff_hive.id   staff_hive.name
1   caimz
2   summer
Time taken: 0.355 seconds, Fetched: 2 row(s)
```

Note: this process happens in two steps. The first step imports the data into HDFS; the second step migrates the data from HDFS into the Hive warehouse. The default staging directory for the first step is /user/<user>/<table>.
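The two locations involved can be sketched as simple path arithmetic (a minimal illustration; the user name, table name, and warehouse root are assumptions matching this walkthrough's defaults):

```python
# Sketch of the two HDFS locations involved in a Hive import.
# Step 1: a MapReduce job lands the rows in a temporary directory,
#         /user/<user>/<table> by default.
# Step 2: Sqoop invokes Hive's LOAD DATA, which moves the files
#         into the Hive warehouse directory.

def staging_dir(user: str, table: str) -> str:
    """Default temporary directory used by step 1."""
    return f"/user/{user}/{table}"

def warehouse_dir(hive_table: str, root: str = "/user/hive/warehouse") -> str:
    """Final location after step 2 (default warehouse, default database)."""
    return f"{root}/{hive_table}"

print(staging_dir("root", "staff"))    # /user/root/staff
print(warehouse_dir("staff_hive"))     # /user/hive/warehouse/staff_hive
```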

Bug

Q: The following error occurs when importing into Hive:
Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.

A: Add the Hive libraries to Hadoop's classpath:
[root@master ~]# tail -1 /etc/profile
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
[root@master ~]# source /etc/profile

1.3 RDBMS to HBase

Argument Description
--column-family <family> Sets the target column family for the import
--hbase-bulkload Enables bulk loading
--hbase-create-table If specified, create missing HBase tables
--hbase-row-key <col> Specifies which input column to use as the row key. If the input table has a composite key, <col> must be a comma-separated list of the composite key columns
--hbase-table <table-name> Specifies an HBase table to use as the target instead of HDFS

RDBMS to HBase

sqoop import \
--connect jdbc:mysql://master:3306/leco \
--username root \
--password root \
--table staff \
--columns "id,name,sex" \
--column-family "info" \
--hbase-create-table \
--hbase-row-key "id" \
--hbase-table "hbase_company" \
--num-mappers 1 \
--split-by id

When Sqoop imports MySQL data into HBase, the target table does not need to be created in advance (with --hbase-create-table Sqoop creates it); for Hive, by contrast, the table needs to be created manually.
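The way the import arranges rows into HBase cells can be modeled in a few lines (a simplified sketch, not Sqoop's actual code; the sample rows mirror the staff table above):

```python
# Model of the row -> HBase cell mapping performed by the import:
# the --hbase-row-key column becomes the row key, and every other
# selected column becomes a cell under the --column-family.

def rows_to_cells(rows, columns, row_key_col, family):
    """Return {row_key: {"family:qualifier": value}} for the given rows."""
    key_idx = columns.index(row_key_col)
    cells = {}
    for row in rows:
        row_key = str(row[key_idx])
        cells[row_key] = {
            f"{family}:{col}": str(val)
            for col, val in zip(columns, row)
            if col != row_key_col
        }
    return cells

staff = [(1, "caimz", "male"), (2, "summer", "female")]
print(rows_to_cells(staff, ["id", "name", "sex"], "id", "info"))
```

The result matches what `scan 'hbase_company'` shows later: row key `1` with cells `info:name=caimz` and `info:sex=male`, and so on.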

Detailed procedure
[root@master ~]# sqoop import \
> --connect jdbc:mysql://master:3306/leco \
> --username root \
> --password root \
> --table staff \
> --columns "id,name,sex" \
> --column-family "info" \
> --hbase-create-table \
> --hbase-row-key "id" \
> --hbase-table "hbase_company" \
> --num-mappers 1 \
> --split-by id
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/08/08 15:21:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
... (output omitted) ...

19/08/08 15:21:58 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/local/hadoop-2.6.5/lib/native
19/08/08 15:21:58 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
19/08/08 15:21:58 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
19/08/08 15:21:58 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
19/08/08 15:21:58 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
19/08/08 15:21:58 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-957.el7.x86_64
19/08/08 15:21:58 INFO zookeeper.ZooKeeper: Client environment:user.name=root
19/08/08 15:21:58 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
19/08/08 15:21:58 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root
19/08/08 15:21:58 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,slave1:2181,slave2:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@43826ec
19/08/08 15:21:58 INFO zookeeper.ClientCnxn: Opening socket connection to server master/192.168.186.10:2181. Will not attempt to authenticate using SASL (unknown error)
19/08/08 15:21:58 INFO zookeeper.ClientCnxn: Socket connection established to master/192.168.186.10:2181, initiating session
19/08/08 15:21:58 INFO zookeeper.ClientCnxn: Session establishment complete on server master/192.168.186.10:2181, sessionid = 0x16c6f43f28b0001, negotiated timeout = 40000
19/08/08 15:21:59 INFO mapreduce.HBaseImportJob: Creating missing HBase table hbase_company
19/08/08 15:22:02 INFO client.HBaseAdmin: Created hbase_company
19/08/08 15:22:02 INFO client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
19/08/08 15:22:02 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x16c6f43f28b0001
19/08/08 15:22:02 INFO zookeeper.ZooKeeper: Session: 0x16c6f43f28b0001 closed
19/08/08 15:22:02 INFO zookeeper.ClientCnxn: EventThread shut down
19/08/08 15:22:02 WARN mapreduce.TableMapReduceUtil: The addDependencyJars(Configuration, Class<?>...) method has been deprecated since it is easy to use incorrectly. Most users should rely on addDependencyJars(Job) instead. See HBASE-8386 for more details.
19/08/08 15:22:02 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.186.10:8032
Thu Aug 08 15:22:06 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/08/08 15:22:06 INFO db.DBInputFormat: Using read commited transaction isolation
19/08/08 15:22:06 INFO mapreduce.JobSubmitter: number of splits:1
19/08/08 15:22:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564669682555_0026
19/08/08 15:22:07 INFO impl.YarnClientImpl: Submitted application application_1564669682555_0026
19/08/08 15:22:07 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1564669682555_0026/
19/08/08 15:22:07 INFO mapreduce.Job: Running job: job_1564669682555_0026
19/08/08 15:22:13 INFO mapreduce.Job: Job job_1564669682555_0026 running in uber mode : false
19/08/08 15:22:13 INFO mapreduce.Job:  map 0% reduce 0%
19/08/08 15:22:19 INFO mapreduce.Job:  map 100% reduce 0%
19/08/08 15:22:19 INFO mapreduce.Job: Job job_1564669682555_0026 completed successfully
19/08/08 15:22:19 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=156243
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=1
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3723
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=3723
        Total vcore-milliseconds taken by all map tasks=3723
        Total megabyte-milliseconds taken by all map tasks=3812352
    Map-Reduce Framework
        Map input records=2
        Map output records=2
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=84
        CPU time spent (ms)=2560
        Physical memory (bytes) snapshot=198623232
        Virtual memory (bytes) snapshot=2141908992
        Total committed heap usage (bytes)=89128960
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=0
19/08/08 15:22:19 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 17.347 seconds (0 bytes/sec)
19/08/08 15:22:19 INFO mapreduce.ImportJobBase: Retrieved 2 records.

Check HBase
hbase(main):001:0> list
TABLE                                                                   
cmz_music_table                                                                      
emp                                                                                    
hbase_company                                                                          
3 row(s) in 0.1760 seconds
=> ["cmz_music_table", "emp", "hbase_company"]
hbase(main):002:0> scan 'hbase_company'
ROW                                              COLUMN+CELL                           
 1                                               column=info:name, timestamp=1565248938239, value=caimz                                                                                      
 1                                               column=info:sex, timestamp=1565248938239, value=male                                                                                        
 2                                               column=info:name, timestamp=1565248938239, value=summer                                                                                     
 2                                               column=info:sex, timestamp=1565248938239, value=female                                                                                      
2 row(s) in 0.1580 seconds

Note: Sqoop 1.4.6 only supports automatic creation of HBase tables with HBase versions earlier than 1.0.1.

If the import fails for this reason, the workaround is to create the HBase table manually:
hbase> create 'hbase_company','info'

Then scan the table in HBase to see the imported rows:
hbase> scan 'hbase_company'

2. Exporting data

  In Sqoop, "export" means transferring data from the big-data cluster (HDFS, Hive, HBase) to a non-big-data cluster (an RDBMS); this uses the export keyword.

2.1 Hive/HDFS to RDBMS

  Before starting, clear the existing rows from the MySQL table; otherwise the export will conflict with the identical rows already there. If there is no conflict, clearing is unnecessary.

mysql> show tables;
+----------------+
| Tables_in_leco |
+----------------+
| staff          |
+----------------+
1 row in set (0.00 sec)

mysql> select * from staff;
+----+--------+--------+
| id | name   | sex    |
+----+--------+--------+
|  1 | caimz  | male   |
|  2 | summer | female |
+----+--------+--------+
2 rows in set (0.00 sec)

mysql> truncate staff;
Query OK, 0 rows affected (0.03 sec)

mysql> select * from staff;
Empty set (0.00 sec)

Hive/HDFS to MySQL

sqoop export \
--connect jdbc:mysql://master:3306/leco \
--username root \
--password root \
--table staff \
--num-mappers 1 \
--export-dir /user/hive/warehouse/staff_hive \
--input-fields-terminated-by "\t"

Note: if the table does not exist in MySQL, Sqoop will not create it automatically; create it beforehand.
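The role of `--input-fields-terminated-by "\t"` can be illustrated by parsing one line of the export file the way the export mappers conceptually do (a simplified sketch; the sample line mirrors the part-m-00000 file written by the earlier import):

```python
# Each line of the HDFS export file is split on the declared field
# terminator ("\t" here), and the pieces are bound, in order, to the
# target table's columns to build the row that gets inserted.

def parse_export_line(line, columns, terminator="\t"):
    """Split one HDFS record into a column -> value mapping."""
    values = line.rstrip("\n").split(terminator)
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(values)}")
    return dict(zip(columns, values))

record = parse_export_line("1\tcaimz\tmale", ["id", "name", "sex"])
print(record)  # {'id': '1', 'name': 'caimz', 'sex': 'male'}
```

If the terminator given here does not match the one used when the data was written, every line splits into the wrong number of fields and the export job fails.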

Detailed procedure
[root@master ~]# sqoop export \
> --connect jdbc:mysql://master:3306/leco \
> --username root \
> --password root \
> --table staff \
> --num-mappers 1 \
> --export-dir /user/hive/warehouse/staff_hive \
> --input-fields-terminated-by "\t"
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/08/08 16:20:45 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/08/08 16:20:45 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/08/08 16:20:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/08/08 16:20:45 INFO tool.CodeGenTool: Beginning code generation
Thu Aug 08 16:20:45 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/08/08 16:20:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `staff` AS t LIMIT 1
19/08/08 16:20:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `staff` AS t LIMIT 1
19/08/08 16:20:45 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/81e30efa96e33c60243ec1d139234f29/staff.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/08/08 16:20:47 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/81e30efa96e33c60243ec1d139234f29/staff.jar
19/08/08 16:20:47 INFO mapreduce.ExportJobBase: Beginning export of staff
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.5/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-1.3.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/08/08 16:20:47 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/08/08 16:20:48 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
19/08/08 16:20:48 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
19/08/08 16:20:48 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/08/08 16:20:48 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.186.10:8032
19/08/08 16:20:51 INFO input.FileInputFormat: Total input paths to process : 1
19/08/08 16:20:51 INFO input.FileInputFormat: Total input paths to process : 1
19/08/08 16:20:52 INFO mapreduce.JobSubmitter: number of splits:1
19/08/08 16:20:52 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
19/08/08 16:20:52 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564669682555_0028
19/08/08 16:20:52 INFO impl.YarnClientImpl: Submitted application application_1564669682555_0028
19/08/08 16:20:52 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1564669682555_0028/
19/08/08 16:20:52 INFO mapreduce.Job: Running job: job_1564669682555_0028
19/08/08 16:20:58 INFO mapreduce.Job: Job job_1564669682555_0028 running in uber mode : false
19/08/08 16:20:58 INFO mapreduce.Job:  map 0% reduce 0%
19/08/08 16:21:03 INFO mapreduce.Job:  map 100% reduce 0%
19/08/08 16:21:03 INFO mapreduce.Job: Job job_1564669682555_0028 completed successfully
19/08/08 16:21:03 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=125696
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=175
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters 
        Launched map tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=2771
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=2771
        Total vcore-milliseconds taken by all map tasks=2771
        Total megabyte-milliseconds taken by all map tasks=2837504
    Map-Reduce Framework
        Map input records=2
        Map output records=2
        Input split bytes=143
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=54
        CPU time spent (ms)=1190
        Physical memory (bytes) snapshot=166068224
        Virtual memory (bytes) snapshot=2117251072
        Total committed heap usage (bytes)=81788928
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=0
19/08/08 16:21:03 INFO mapreduce.ExportJobBase: Transferred 175 bytes in 15.2571 seconds (11.4701 bytes/sec)
19/08/08 16:21:03 INFO mapreduce.ExportJobBase: Exported 2 records.

Check that the data was exported into MySQL
mysql> select * from staff;
+----+--------+--------+
| id | name   | sex    |
+----+--------+--------+
|  1 | caimz  | male   |
|  2 | summer | female |
+----+--------+--------+
2 rows in set (0.01 sec)

3. Running Sqoop from a script

  Package the Sqoop command into an options file (.opt format), then execute it with --options-file.

Script execution

mkdir opt
cat> opt/job_HDFS2RDBMS.opt<<EOF
export
--connect
jdbc:mysql://master:3306/leco
--username
root
--password
root
--table
staff
--num-mappers
1
--export-dir
/user/hive/warehouse/staff_hive
--input-fields-terminated-by
"\t"
EOF
sqoop --options-file opt/job_HDFS2RDBMS.opt

The script's file extension does not matter; just follow a consistent convention, the way Hive scripts are conventionally named .hql.
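The options-file format is simply one token per line: options and their values each get their own line, and blank lines and #-comments are ignored. A small parser illustrates how such a file expands back into the equivalent command line (a sketch of the format, not Sqoop's exact parser; the quote-stripping rule is an assumption based on how `"\t"` behaves in the file above):

```python
# Expand a Sqoop-style options file into an argv-like token list.
# Rules modeled: one option or value per line; blank lines and
# lines starting with '#' are skipped; a token wrapped in matching
# quotes has the quotes stripped.

def expand_options_file(text):
    argv = []
    for line in text.splitlines():
        token = line.strip()
        if not token or token.startswith("#"):
            continue  # comment or blank line
        if len(token) >= 2 and token[0] == token[-1] and token[0] in ("'", '"'):
            token = token[1:-1]  # strip surrounding quotes
        argv.append(token)
    return argv

opt = """export
--connect
jdbc:mysql://master:3306/leco
--num-mappers
1
"""
print(expand_options_file(opt))
```

The expanded list is exactly what `sqoop export --connect jdbc:mysql://master:3306/leco --num-mappers 1 ...` would receive on the command line.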

Detailed procedure
mysql> select * from staff;
+----+--------+--------+
| id | name   | sex    |
+----+--------+--------+
|  1 | caimz  | male   |
|  2 | summer | female |
+----+--------+--------+
2 rows in set (0.01 sec)

mysql> truncate staff;
Query OK, 0 rows affected (0.02 sec)

[root@master ~]# mkdir opt
[root@master ~]# cat> opt/job_HDFS2RDBMS.opt<<EOF
> export
> --connect
> jdbc:mysql://master:3306/leco
> --username
> root
> --password
> root
> --table
> staff
> --num-mappers
> 1
> --export-dir
> /user/hive/warehouse/staff_hive
> --input-fields-terminated-by
> "\t"
> EOF
[root@master ~]# cat opt/job_HDFS2RDBMS.opt 
export
--connect
jdbc:mysql://master:3306/leco
--username
root
--password
root
--table
staff
--num-mappers
1
--export-dir
/user/hive/warehouse/staff_hive
--input-fields-terminated-by
"\t"
[root@master ~]# sqoop --options-file opt/job_HDFS2RDBMS.opt
Warning: /usr/local/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/local/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
19/08/08 16:30:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
19/08/08 16:30:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
19/08/08 16:30:02 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
19/08/08 16:30:02 INFO tool.CodeGenTool: Beginning code generation
Thu Aug 08 16:30:02 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
19/08/08 16:30:03 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `staff` AS t LIMIT 1
19/08/08 16:30:03 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `staff` AS t LIMIT 1
19/08/08 16:30:03 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop
Note: /tmp/sqoop-root/compile/f7909ed980b2b41bad0d8d3fe2d44e56/staff.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
19/08/08 16:30:04 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f7909ed980b2b41bad0d8d3fe2d44e56/staff.jar
19/08/08 16:30:04 INFO mapreduce.ExportJobBase: Beginning export of staff
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.5/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-1.3.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/08/08 16:30:04 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
19/08/08 16:30:05 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
19/08/08 16:30:05 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
19/08/08 16:30:05 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
19/08/08 16:30:05 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.186.10:8032
19/08/08 16:30:08 INFO input.FileInputFormat: Total input paths to process : 1
19/08/08 16:30:08 INFO input.FileInputFormat: Total input paths to process : 1
19/08/08 16:30:08 INFO mapreduce.JobSubmitter: number of splits:1
19/08/08 16:30:08 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
19/08/08 16:30:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564669682555_0029
19/08/08 16:30:08 INFO impl.YarnClientImpl: Submitted application application_1564669682555_0029
19/08/08 16:30:08 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1564669682555_0029/
19/08/08 16:30:08 INFO mapreduce.Job: Running job: job_1564669682555_0029
19/08/08 16:30:15 INFO mapreduce.Job: Job job_1564669682555_0029 running in uber mode : false
19/08/08 16:30:15 INFO mapreduce.Job:  map 0% reduce 0%
19/08/08 16:30:20 INFO mapreduce.Job:  map 100% reduce 0%
19/08/08 16:30:20 INFO mapreduce.Job: Job job_1564669682555_0029 completed successfully
19/08/08 16:30:20 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=125696
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=175
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters 
        Launched map tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=2960
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=2960
        Total vcore-milliseconds taken by all map tasks=2960
        Total megabyte-milliseconds taken by all map tasks=3031040
    Map-Reduce Framework
        Map input records=2
        Map output records=2
        Input split bytes=143
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=244
        CPU time spent (ms)=2280
        Physical memory (bytes) snapshot=168570880
        Virtual memory (bytes) snapshot=2115022848
        Total committed heap usage (bytes)=90701824
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=0
19/08/08 16:30:20 INFO mapreduce.ExportJobBase: Transferred 175 bytes in 14.7309 seconds (11.8798 bytes/sec)
19/08/08 16:30:20 INFO mapreduce.ExportJobBase: Exported 2 records.

Check that the data was exported into MySQL
mysql> select * from staff;
+----+--------+--------+
| id | name   | sex    |
+----+--------+--------+
|  1 | caimz  | male   |
|  2 | summer | female |
+----+--------+--------+
2 rows in set (0.00 sec)