Spark SQL
Query the data stored in Hive through Spark SQL; screenshots of the steps are required.
- Switch to the target database and select the specified table;
- Steps:
sunyonggang@gg01:~/spark-1.4.0-bin-hadoop2.6$ ./bin/spark-sql --driver-class-path /home/sunyonggang/apache-hive-0.13.1-bin/lib/mysql-connector-java-5.1.36-bin.jar
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
spark-sql> show databases;
16/05/07 10:55:13 INFO ParseDriver: Parsing command: show databases
16/05/07 10:55:13 INFO ParseDriver: Parse Completed
16/05/07 10:55:15 INFO DAGScheduler: Job 0 finished: processCmd at CliDriver.java:423, took 0.342186 s
default
sogou
Time taken: 1.951 seconds, Fetched 2 row(s)
spark-sql> use sogou;
16/05/07 10:55:23 INFO ParseDriver: Parsing command: use sogou
16/05/07 10:55:23 INFO ParseDriver: Parse Completed
spark-sql> show tables;
16/05/07 10:55:28 INFO HiveMetaStore: 0: get_tables: db=sogou pat=.*
16/05/07 10:55:28 INFO audit: ugi=sunyonggang ip=unknown-ip-addr cmd=get_tables: db=sogou pat=.*
16/05/07 10:55:28 INFO DAGScheduler: Job 2 finished: processCmd at CliDriver.java:423, took 0.062450 s
sg false
sunyonggang false
userlog false
Time taken: 0.096 seconds, Fetched 3 row(s)
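The same session can also be driven programmatically instead of through the spark-sql shell. Below is a minimal sketch, assuming Spark 1.4's HiveContext and a hive-site.xml on the classpath pointing at the same metastore; the object name ShowSogouTables is illustrative, not part of the original project.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical standalone app mirroring the shell session above.
object ShowSogouTables {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ShowSogouTables"))
    val hive = new HiveContext(sc)  // picks up hive-site.xml for the metastore
    hive.sql("show databases").collect().foreach(println)
    hive.sql("use sogou")
    hive.sql("show tables").collect().foreach(println)
    sc.stop()
  }
}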
3. Write the SQL:
spark-sql> select address, count(*) as number from userlog group by address order by number desc;
吉首市 22435
龙山县 12445
永顺县 12146
凤凰县 10548
花垣县 10453
保靖县 7258
泸溪县 7102
古丈县 3721
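For reference, the same aggregation can be expressed through the DataFrame API instead of SQL. This is a minimal sketch under the same assumptions as above (metastore reachable via hive-site.xml; the object name AddressCounts is illustrative):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.functions.desc
import org.apache.spark.sql.hive.HiveContext

// Hypothetical app: address counts via the DataFrame API.
object AddressCounts {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("AddressCounts"))
    val hive = new HiveContext(sc)
    hive.sql("use sogou")
    hive.table("userlog")
      .groupBy("address")
      .count()                               // yields a "count" column
      .withColumnRenamed("count", "number")  // match the SQL alias
      .orderBy(desc("number"))
      .show()
    sc.stop()
  }
}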
Write a simple program that queries the Spark SQL Thrift server over JDBC; document the process with screenshots.
- Start the Thrift server:
sunyonggang@gg01:~/spark-1.4.0-bin-hadoop2.6$ ./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10000 --hiveconf hive.server2.thrift.bind.host=gg01 --master spark://gg01:7077 --driver-class-path /home/sunyonggang/apache-hive-0.13.1-bin/lib/mysql-connector-java-5.1.36-bin.jar --executor-memory 1g
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /home/sunyonggang/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-sunyonggang-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-gg01.out
sunyonggang@gg01:~/spark-1.4.0-bin-hadoop2.6$ cd ~/apache-hive-0.13.1-bin/
sunyonggang@gg01:~/apache-hive-0.13.1-bin$ ./bin/beeline
Beeline version 0.13.1 by Apache Hive
beeline> !connect jdbc:hive2://gg01:10000
scan complete in 3ms
Connecting to jdbc:hive2://gg01:10000
Enter username for jdbc:hive2://gg01:10000: sunyonggang
Enter password for jdbc:hive2://gg01:10000: ***********
Connected to: Spark SQL (version 1.4.0)
Driver: Hive JDBC (version 0.13.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://gg01:10000> show databases;
+----------+
| result |
+----------+
| default |
| sogou |
+----------+
2 rows selected (1.852 seconds)
0: jdbc:hive2://gg01:10000>
2. Access the data: add all the JARs under Hive's lib directory to the project, including the MySQL JDBC connector JAR.
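As an alternative to copying JARs by hand, the dependencies the client program actually needs can be declared in sbt. A sketch, with versions simply mirroring the cluster above; note that strictly speaking the MySQL connector is only needed server-side by the metastore, not by this JDBC client. The program itself follows.

// build.sbt (hypothetical): Hive JDBC driver plus hadoop-common,
// which the 0.13 driver needs on the client classpath.
libraryDependencies ++= Seq(
  "org.apache.hive" % "hive-jdbc" % "0.13.1",
  "org.apache.hadoop" % "hadoop-common" % "2.6.0"
)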
package com.github.sunyonggang

import java.sql.DriverManager

/**
 * Created by sunyonggang on 16-5-7.
 *
 * Queries the userlog table in the sogou database through the
 * Spark SQL Thrift server, using the Hive JDBC driver.
 *
 * Table schema (hive> desc userlog):
 *   id        int
 *   username  varchar(100)
 *   phone     bigint
 *   address   varchar(25)
 *   age       int
 *   gender    varchar(5)
 *   level     varchar(25)
 *   logindate varchar(10)
 * Time taken: 0.376 seconds, Fetched: 8 row(s)
 */
object UseJdbcGetUserlog {
  def main(args: Array[String]) {
    // Register the Hive JDBC driver, then connect to the Thrift server.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://gg01:10000/sogou", "sunyonggang", "**************")
    val preStat = conn.prepareStatement("select * from userlog limit 5")
    val rs = preStat.executeQuery()
    // Print a few columns of each returned row.
    while (rs.next()) {
      println("id: " + rs.getInt("id") + " username: " + rs.getString("username") + " phone: " + rs.getString("phone") +
        " address: " + rs.getString("address") + " age: " + rs.getInt("age"))
    }
    rs.close()
    preStat.close()
    conn.close()
  }
}
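One caveat in the program above: if executeQuery throws, the connection is never released and the Thrift server keeps the session open. A defensive variant of the same flow (the try/finally handling is my addition, not part of the original run):

import java.sql.DriverManager

object UseJdbcGetUserlogSafe {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://gg01:10000/sogou", "sunyonggang", "**************")
    try {
      val stat = conn.prepareStatement("select * from userlog limit 5")
      try {
        val rs = stat.executeQuery()
        while (rs.next()) println(rs.getInt("id") + "\t" + rs.getString("username"))
      } finally stat.close()
    } finally conn.close()  // always release the Thrift server session
  }
}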
3. Check the results:
/home/sunyonggang/jdk1.7.0_79/bin/java -Didea.launcher.port=7533 -Didea.launcher.bin.path=/home/sunyonggang/idea-IC-145.597.3/bin -Dfile.encoding=UTF-8 -classpath com.intellij.rt.execution.application.AppMain com.github.sunyonggang.UseJdbcGetUserlog
log4j:WARN No appenders could be found for logger (org.apache.thrift.transport.TSaslTransport).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
id: 1 username: user1 phone: 15200776350 address: 凤凰县 age: 25
id: 2 username: user2 phone: 15200771700 address: 凤凰县 age: 41
id: 3 username: user3 phone: 13487837597 address: 吉首市 age: 23
id: 4 username: user4 phone: 15074344263 address: 吉首市 age: 21
id: 5 username: user5 phone: 15007439761 address: 吉首市 age: 20