Hive development interfaces (which interface layers Hive provides)

Reader submission 478 2023-03-19

This article covers Hive's development interfaces and explains which interface layers Hive provides.

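Contents:

1. How to connect to a Hive database from Python on Windows
2. Does connecting to Hive from Python require the sasl library?
3. What is a data warehouse, where does it store its data, and which technologies does a BI project need?
4. What kind of program does Hive ultimately get translated into for execution?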

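How to connect to a Hive database from Python on Windows

Because Hive versions differ, the way to connect to Hive from Python differs as well.
Searching the web for "python hive" turns up several solutions. Most go like this: first copy $HIVE_HOME/lib/py from the Hive root directory into Python's library path, that is, site-packages, or simply put your new Python code in the same directory as the copied py libraries. Then call the Thrift interface provided in that directory. The example is very simple, something like this: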
# Uses the original HiveServer Thrift interface (ThriftHive) shipped in $HIVE_HOME/lib/py.
# The syntax below is valid on both Python 2.6+ and Python 3.
from hive_service import ThriftHive
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

def hiveExe(sql):
    try:
        # Open a buffered binary-protocol connection to the Thrift port
        transport = TSocket.TSocket('127.0.0.1', 10000)
        transport = TTransport.TBufferedTransport(transport)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        client = ThriftHive.Client(protocol)
        transport.open()
        # Run the statement and print every returned row
        client.execute(sql)
        print("The return value is : ")
        print(client.fetchAll())
        print("............")
        transport.close()
    except Thrift.TException as tx:
        print('%s' % (tx.message))

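Does connecting to Hive from Python require the sasl library?

Clients connect to Hive through HiveServer2. HiveServer2 is a rewrite of HiveServer, which could not handle concurrent requests from multiple clients. HiveServer2 is currently implemented on Thrift RPC and was designed to give better support to open API clients such as JDBC and ODBC. It was introduced in Hive 0.11.

Starting HiveServer2

Starting HiveServer2 is very simple: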

$ $HIVE_HOME/bin/hiveserver2

or

$ $HIVE_HOME/bin/hive --service hiveserver2

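By default, HiveServer2's Thrift service listens on port 10000 and its Web UI on port 10002. The Web UI can be opened in a browser and shows some basic information about Hive. If the page cannot be reached, HiveServer2 did not start successfully.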
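As a quick complement to checking the Web UI by hand, the following is a minimal sketch that tests whether the two default ports accept TCP connections. The helper name port_is_open and the host 127.0.0.1 are assumptions made for illustration; an open port only suggests that something is listening, not that HiveServer2 is healthy.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Default ports named in the text: 10000 (Thrift) and 10002 (Web UI).
    for name, port in (("Thrift", 10000), ("Web UI", 10002)):
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(f"HiveServer2 {name} port {port}: {state}")
```

If hive.server2.thrift.port has been changed in hive-site.xml, pass that port instead.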

Testing the client connection with beeline

Once HiveServer2 is running, we can connect to it with beeline, the client tool that ships with Hive.

$ $HIVE_HOME/bin/beeline

beeline> !connect jdbc:hive2://localhost:10000

After a successful login the following prompt appears, and you can start writing HQL statements.

0: jdbc:hive2://localhost:10000>

Error: User: xxx is not allowed to impersonate anonymous

When connecting to HiveServer2 with !connect in beeline, the following error may appear:

Caused by: org.apache.hadoop.ipc.RemoteException: User: xxx is not allowed to impersonate anonymous

Here xxx is my operating system user name. The fix is to add a proxy-user configuration for xxx to Hadoop's core-site.xml file:

<property>
  <name>hadoop.proxyuser.xxx.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.xxx.hosts</name>
  <value>*</value>
</property>

After restarting HDFS, beeline can connect to HiveServer2 successfully.
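The two proxy-user properties differ only in the user name and the groups/hosts suffix, so they can be generated rather than typed by hand. The sketch below is one way to do that with the standard library; the function name proxyuser_properties is hypothetical, and the output still has to be pasted inside the <configuration> element of core-site.xml manually.

```python
import xml.etree.ElementTree as ET

def proxyuser_properties(user, groups="*", hosts="*"):
    """Build the two <property> elements for hadoop.proxyuser.<user>."""
    props = []
    for suffix, value in (("groups", groups), ("hosts", hosts)):
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = f"hadoop.proxyuser.{user}.{suffix}"
        ET.SubElement(prop, "value").text = value
        props.append(prop)
    return props

if __name__ == "__main__":
    # "xxx" is the placeholder user name used in the text.
    for prop in proxyuser_properties("xxx"):
        print(ET.tostring(prop, encoding="unicode"))
```

A value of * allows impersonation from any group or host, which is convenient for a test setup; narrower values are usually preferable elsewhere.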

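Common configuration

For HiveServer2 configuration, see the official documentation "Setting Up HiveServer2".

Some commonly used hive-site.xml settings:

hive.server2.thrift.port: the TCP port to listen on. Defaults to 10000.

hive.server2.thrift.bind.host: the host the TCP interface binds to.

hive.server2.authentication: the authentication mode. Defaults to NONE (plain SASL), meaning no authentication check is performed. Other options are NOSASL, KERBEROS, LDAP, PAM and CUSTOM.

hive.server2.enable.doAs: whether queries are executed as the impersonated connecting user. Defaults to true.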
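To see which of these settings a given hive-site.xml overrides, a small reader like the one below may help. It is a sketch under stated assumptions: read_hs2_settings and SAMPLE are hypothetical names, the defaults are the ones listed in the text, and the sample XML is invented for the demonstration.

```python
import xml.etree.ElementTree as ET

# Defaults as listed in the text; no default is stated there for bind.host.
DEFAULTS = {
    "hive.server2.thrift.port": "10000",
    "hive.server2.thrift.bind.host": None,
    "hive.server2.authentication": "NONE",
    "hive.server2.enable.doAs": "true",
}

def read_hs2_settings(xml_text):
    """Return the settings above, taking values from a hive-site.xml string when present."""
    settings = dict(DEFAULTS)
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name in settings:
            settings[name] = prop.findtext("value")
    return settings

# Invented sample that overrides only the Thrift port.
SAMPLE = """<configuration>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10001</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for key, value in read_hs2_settings(SAMPLE).items():
        print(f"{key} = {value}")
```

To inspect a real installation, read the file contents into a string and pass them to read_hs2_settings.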

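Connecting to HiveServer2 from a Python client

Python has three clients for connecting to HiveServer2: pyhs2, pyhive and impyla. The example on the official site uses pyhs2, but the pyhs2 project has announced that it is no longer supported and recommends impyla and pyhive instead. We use impyla here.

Installing impyla

impyla's required dependencies are:

six

bitarray

thriftpy (thrift on Python 2.x)

Hive support needs two more packages:

sasl

thrift_sasl

The source for impyla and its dependencies can be downloaded from PyPI.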
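Before running an impyla script, it can be useful to confirm that the packages listed above are importable. The following sketch assumes each package's import name matches the name in the list, and missing_modules is a hypothetical helper. thriftpy/thrift is left out because which one applies depends on the Python version.

```python
import importlib.util

# Module names taken from the dependency list above (assumed to match import names).
REQUIRED = ("six", "bitarray")
HIVE_EXTRAS = ("sasl", "thrift_sasl")

def missing_modules(names):
    """Return the module names that cannot be found by this interpreter."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    print("missing core deps:", missing_modules(REQUIRED))
    print("missing Hive deps:", missing_modules(HIVE_EXTRAS))
```

An empty list for the Hive extras means sasl and thrift_sasl are available, which is what the default NONE (plain SASL) authentication mode relies on.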

impyla example

The following example connects to HiveServer2 with impyla:

from impala.dbapi import connect

# PLAIN matches the default hive.server2.authentication setting (NONE)
conn = connect(host='127.0.0.1', port=10000, database='default', auth_mechanism='PLAIN')
cur = conn.cursor()
cur.execute('SHOW DATABASES')
print(cur.fetchall())
cur.execute('SHOW TABLES')
print(cur.fetchall())
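The cursor/execute/fetchall sequence in the example is the standard Python DB-API 2.0 pattern, so it can be wrapped in a small helper that also closes the cursor. The sketch below uses sqlite3 as a stand-in connection so it runs without a Hive server. fetch_all is a hypothetical name, and using it with an impyla connection is an assumption that holds only as far as that connection follows DB-API 2.0.

```python
import sqlite3
from contextlib import closing

def fetch_all(conn, sql):
    """Run one statement on a DB-API 2.0 connection and return all rows."""
    with closing(conn.cursor()) as cur:
        cur.execute(sql)
        return cur.fetchall()

if __name__ == "__main__":
    # sqlite3 stands in for a HiveServer2 connection so the sketch runs anywhere.
    with closing(sqlite3.connect(":memory:")) as conn:
        print(fetch_all(conn, "SELECT 1"))
```

With a live server, the conn object returned by connect(...) in the example would be passed in place of the sqlite3 connection, for instance fetch_all(conn, 'SHOW DATABASES').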


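What is a data warehouse, where does it store its data, and which technologies does a BI project need?

A data warehouse is still a database: the data is still stored in a database, but that database is architected and developed according to data-warehouse principles. BI projects mainly use data warehousing, OLAP and data mining. Broken down further, this includes development on mainstream databases such as Oracle, DB2 and SQL Server, as well as Java, Cognos, BO, BIEE, SAS, SPSS, Clementine, Weka and so on.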

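What kind of program does Hive ultimately get translated into for execution?

Hive queries are ultimately translated into MapReduce jobs for execution (when MapReduce is the execution engine).

There are three main user interfaces: CLI, Client and WUI. The CLI is the most commonly used; starting it also starts a Hive instance. Client is the Hive client, through which users connect to the Hive Server.

When starting in Client mode, you must specify the node where the Hive Server runs and start the Hive Server on that node. WUI accesses Hive through a browser.

Further reading:

Hive is not suited to applications that need low latency, such as online transaction processing (OLTP). Hive query execution strictly follows the Hadoop MapReduce job model: Hive converts the user's HiveQL statements into MapReduce jobs through its interpreter and submits them to the Hadoop cluster.

Hadoop monitors the job as it runs and then returns the result to the user. Hive was not designed for online transaction processing and does not provide real-time queries or row-level updates. Its best use is batch processing over large data sets, for example web log analysis.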
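To make the MapReduce job model concrete, here is a toy, single-process sketch of the map, shuffle and reduce steps that a query such as SELECT page, COUNT(*) FROM log GROUP BY page would conceptually go through. The table name, column name and sample rows are invented for illustration, and this is not how Hive actually generates or runs its jobs.

```python
from collections import defaultdict

def map_phase(rows):
    """Emit (key, 1) for every input row, like a mapper for COUNT(*) ... GROUP BY."""
    for row in rows:
        yield row["page"], 1

def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values of each key to produce the final counts."""
    return {key: sum(values) for key, values in groups.items()}

# Invented sample "web log" rows.
LOG = [{"page": "/a"}, {"page": "/b"}, {"page": "/a"}]

if __name__ == "__main__":
    print(reduce_phase(shuffle(map_phase(LOG))))
```

On a real cluster each phase runs distributed across many nodes and reads from HDFS, which is where the batch-oriented latency described above comes from.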

Source: Baidu Baike, hive
