Reading Hive Data with SparkSQL from a Local IDEA Run: A Detailed Guide

Submitted by a site user 673 2022-11-21



Environment:

Hadoop version: 2.6.5

Spark version: 2.3.0

Hive version: 1.2.2

master host: 192.168.100.201

slave1 host: 192.168.100.201

The pom.xml is as follows:

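<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.spark</groupId>
    <artifactId>spark_practice</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <spark.core.version>2.3.0</spark.core.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.core.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.core.version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.38</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>2.3.0</version>
        </dependency>
    </dependencies>
</project>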


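Note: the hive-site.xml configuration file must be placed in the project's resources directory (src/main/resources in a standard Maven layout) so that it ends up on the classpath at run time.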
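A quick way to confirm that the file really is on the classpath is to look it up as a resource before starting Spark. The following is a minimal sketch; the object name `HiveSiteCheck` and the method `findOnClasspath` are made up for illustration and are not part of the original project.

```scala
import java.net.URL

object HiveSiteCheck {
  // Looks up a resource by name on the classpath; None means it was not found.
  def findOnClasspath(name: String): Option[URL] =
    Option(Thread.currentThread().getContextClassLoader.getResource(name))

  def main(args: Array[String]): Unit =
    findOnClasspath("hive-site.xml") match {
      case Some(url) =>
        println(s"hive-site.xml found at $url")
      case None =>
        // Without hive-site.xml, Spark does not know about the remote metastore
        // and works against a local metastore_db created in the working directory.
        println("hive-site.xml is NOT on the classpath")
    }
}
```

If the file is reported missing even though it sits in the resources directory, rebuilding the project so that resources are copied to the output directory is usually worth trying first.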

The hive-site.xml configuration is as follows:

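<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://192.168.100.201:9083</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node01:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <property>
        <name>hive.zookeeper.quorum</name>
        <value>node01,node02,node03</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>node01,node02,node03</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.100.201:9000</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>datanucleus.autoCreateSchema</name>
        <value>true</value>
    </property>
    <property>
        <name>datanucleus.autoStartMechanism</name>
        <value>checked</value>
    </property>
</configuration>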
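To double-check which metastore address the application will pick up, the configuration can be parsed into a name/value map. This is only a sketch using the JDK's built-in XML parser; `HiveSiteReader` and `parse` are hypothetical names, and the code assumes every property element has both a name and a value child.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

object HiveSiteReader {
  // Parses Hadoop-style <configuration> XML into a name -> value map.
  // Assumes every <property> has both a <name> and a <value> child.
  def parse(xml: String): Map[String, String] = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    val props = doc.getElementsByTagName("property")
    (0 until props.getLength).map { i =>
      val p = props.item(i).asInstanceOf[Element]
      def text(tag: String): String =
        p.getElementsByTagName(tag).item(0).getTextContent.trim
      text("name") -> text("value")
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Inline sample with the metastore entry from the configuration above.
    val sample =
      """<configuration>
        |  <property>
        |    <name>hive.metastore.uris</name>
        |    <value>thrift://192.168.100.201:9083</value>
        |  </property>
        |</configuration>""".stripMargin
    println(parse(sample).get("hive.metastore.uris"))
  }
}
```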

Main class:

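import org.apache.spark.sql.SparkSession

object SparksqlTest2 {
  def main(args: Array[String]): Unit = {
    // Build a local SparkSession with Hive support; the metastore address
    // is read from hive-site.xml on the classpath.
    val spark: SparkSession = SparkSession
      .builder()
      .master("local[*]")
      .appName("Java Spark Hive Example")
      .enableHiveSupport()
      .getOrCreate()

    // List databases and tables, then read the person table.
    spark.sql("show databases").show()
    spark.sql("show tables").show()
    spark.sql("select * from person").show()

    spark.stop()
  }
}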

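Prerequisite: the queries run against the default database, and the person table contains three rows.

Before testing, make sure the Hadoop cluster is up and running, then start the Hive metastore service: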

./bin/hive --service metastore
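Before launching the Spark job from IDEA, it can help to verify that the metastore port is reachable from the development machine. The sketch below only tests TCP connectivity; it does not speak the Thrift protocol. `MetastoreProbe` and `isReachable` are illustrative names, and the host and port are taken from hive.metastore.uris in hive-site.xml.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object MetastoreProbe {
  // Returns true if a TCP connection to host:port succeeds within timeoutMs.
  def isReachable(host: String, port: Int, timeoutMs: Int = 2000): Boolean =
    Try {
      val socket = new Socket()
      try socket.connect(new InetSocketAddress(host, port), timeoutMs)
      finally socket.close()
    }.isSuccess

  def main(args: Array[String]): Unit = {
    // Host and port from hive.metastore.uris (thrift://192.168.100.201:9083).
    val ok = isReachable("192.168.100.201", 9083)
    println(
      if (ok) "metastore port is open"
      else "cannot reach the metastore on 192.168.100.201:9083"
    )
  }
}
```

An open port does not prove that the service is healthy, but a closed one explains a connection failure right away.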

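Run the program. It prints the list of databases, the tables in default, and the three rows of person.

If it fails with the following error: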

Exception in thread "main" org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.io.IOException: (null) entry in command string: null chmod 0700 C:\Users\dell\AppData\Local\Temp\c530fb25-b267-4dd2-b24d-741727a6fbf3_resources;

 at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)

 at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)

 at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)

 at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)

 at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)

 at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)

 at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)

 at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.<init>(HiveSessionStateBuilder.scala:69)

 at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:69)

 at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)

 at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)

 at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)

 at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)

 at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)

 at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)

 at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)

 at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)

 at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)

 at com.tongfang.learn.spark.hive.HiveTest.main(HiveTest.java:15)

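Solution: this error appears on Windows when Hadoop cannot find winutils.exe, which it uses to run chmod (hence the "null chmod" in the message).

1. Download the Hadoop Windows binaries from https://github.com/steveloughran/winutils

2. In the run configuration of the main class, set the environment variable HADOOP_HOME=D:\winutils\hadoop-2.6.4, where the value is the directory of the Hadoop Windows binaries (the one whose bin subdirectory contains winutils.exe).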
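To see what Hadoop will find at run time, the lookup can be reproduced in a few lines. The sketch assumes the usual resolution order (the hadoop.home.dir system property first, then the HADOOP_HOME environment variable); `WinutilsCheck` and its methods are hypothetical helpers, not part of Hadoop or of the original project.

```scala
import java.io.File

object WinutilsCheck {
  // Resolves the Hadoop home: the hadoop.home.dir system property first,
  // then the HADOOP_HOME environment variable.
  def hadoopHome(props: String => Option[String],
                 env: String => Option[String]): Option[String] =
    props("hadoop.home.dir").orElse(env("HADOOP_HOME"))

  // winutils.exe is expected in the bin subdirectory of the Hadoop home.
  def winutilsFile(home: String): File =
    new File(new File(home, "bin"), "winutils.exe")

  def main(args: Array[String]): Unit =
    hadoopHome(k => sys.props.get(k), k => sys.env.get(k)) match {
      case Some(home) =>
        val exe = winutilsFile(home)
        println(s"Hadoop home: $home, winutils.exe present: ${exe.isFile}")
      case None =>
        println("Neither hadoop.home.dir nor HADOOP_HOME is set")
    }
}
```

As an alternative to the environment variable, a commonly used workaround is to call System.setProperty("hadoop.home.dir", "D:\\winutils\\hadoop-2.6.4") at the very start of main, before the SparkSession is created.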

