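简单之美 | HBase-0.90.4 Cluster Installation and Configuration

HBase is the Hadoop database: it provides random, real-time read/write access to your Big Data. It is an open-source implementation of Google's Bigtable; see the paper Bigtable: A Distributed Storage System for Structured Data. HBase's storage model can be summed up in three words: distributed, versioned, column-oriented. HBase is not limited to running on HDFS; you can also deploy an HBase instance on your local file system to store data.

Prerequisites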

hbase-0.90.4.tar.gz [http://labs.renren.com/apache-mirror//hbase/stable/hbase-0.90.4.tar.gz]
zookeeper-3.3.4.tar.gz

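The following sections walk through the Standalone and Distributed installation procedures.

Standalone mode

In this mode, a single HBase instance is installed and configured on your local file system, so the setup is fairly simple.
First, make sure the local machine can be reached over ssh without a password. Configure it as follows: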

ssh-keygen -t dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

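Check the permissions: ~/.ssh should be 755 and ~/.ssh/authorized_keys should be 644 (stricter modes such as 700 and 600 also work). If they are not, run the following commands: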

chmod 755 ~/.ssh
chmod 644 ~/.ssh/authorized_keys
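
As a hedged aside, the permission check described above can be scripted. The sketch below assumes the condition that matters is that neither path is writable by group or others, which is what sshd's default StrictModes setting looks at; the function name is hypothetical and not part of ssh.

```shell
# Succeeds (exit status 0) if the given path is NOT writable by group or others.
# not_group_or_world_writable is a hypothetical helper name, not part of ssh.
not_group_or_world_writable() {
    # In `ls -ld` output, character 6 is the group write bit and character 9
    # is the write bit for others, e.g. drwxr-xr-x gives "--".
    bits=$(ls -ld "$1" | cut -c6,9)
    [ "$bits" = "--" ]
}
```

It could be called as not_group_or_world_writable ~/.ssh and again for ~/.ssh/authorized_keys; modes such as 755, 700, 644 and 600 all pass, while 775 or 666 do not.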

Next, install and configure HBase as follows:

cd /home/shirdrn/hadoop
tar -xvzf hbase-0.90.4.tar.gz
cd hbase-0.90.4

Edit the JAVA_HOME setting in conf/hbase-env.sh so that it points to your JAVA_HOME directory:

export JAVA_HOME=/usr/java/jdk1.6.0_16

Other settings, such as the HBASE_* variables, can be configured as needed.
Then edit conf/hbase-site.xml, for example:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/shirdrn/hadoop/hbase-0.90.4/data</value>
  </property>
</configuration>

This sets HBase's data directory to a directory on the local file system.
Now you can start the HBase instance, which provides the local storage service:

bin/start-hbase.sh

Once startup has finished, follow the HBase log to check whether it started successfully:

tail -500f logs/hbase-shirdrn-master-localhost.log

Or check whether the HMaster process exists:

ps -ef | grep HMaster

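The log shows that the HBase instance has started all of the HBase and ZooKeeper daemons, and that they all run in the same JVM. Next, start the HBase shell and try out the basic data storage commands: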

bin/hbase shell
hbase(main):001:0> help
hbase(main):002:0> status
hbase(main):003:0> version
# Create table 'pagedb' with the column families metadata, text and status
hbase(main):004:0> create 'pagedb', 'metadata', 'text', 'status'
# Insert data
hbase(main):005:0> put 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'metadata:site', 'www.mafengwo.cn'
hbase(main):006:0> put 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'metadata:pubdate', '2011-12-20 22:09'
hbase(main):007:0> put 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'text:title', '南国之境'
hbase(main):008:0> put 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'text:content', '如果海會說话， 如果風愛上砂 我會聆聽浪花，...'
hbase(main):009:0> put 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'status:extracted', '0'
hbase(main):010:0> put 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'status:httpcode', '200'
hbase(main):011:0> put 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'status:indexed', '1'
# Scan table 'pagedb'
hbase(main):012:0> scan 'pagedb'
# Get all columns of the row 'http://www.mafengwo.cn/i/764197.html'
hbase(main):013:0> get 'pagedb', 'http://www.mafengwo.cn/i/764197.html'
# Get the metadata column family of the row 'http://www.mafengwo.cn/i/764197.html'
hbase(main):014:0> get 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'metadata'
# Get the column metadata:site of the row 'http://www.mafengwo.cn/i/764197.html'
hbase(main):015:0> get 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'metadata:site'
# Increment the counter column status:state by 4 (the column is created if it does not exist)
hbase(main):016:0> incr 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'status:state', 4
# Change the value of status:httpcode to 500
hbase(main):017:0> put 'pagedb', 'http://www.mafengwo.cn/i/764197.html', 'status:httpcode', '500'
# Count the rows in table 'pagedb'
hbase(main):018:0> count 'pagedb'
# Disable table 'pagedb'
hbase(main):019:0> disable 'pagedb'
# Enable table 'pagedb'
hbase(main):020:0> enable 'pagedb'
# Empty table 'pagedb'
hbase(main):021:0> truncate 'pagedb'
# List all tables
hbase(main):022:0> list
# Delete the row 'http://www.mafengwo.cn/i/764197.html'
hbase(main):023:0> deleteall 'pagedb','http://www.mafengwo.cn/i/764197.html'
# Drop table 'pagedb'; the table must be disabled first
hbase(main):024:0> disable 'pagedb'
hbase(main):025:0> drop 'pagedb'

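To practice with more commands, use help to see the full list.

Distributed mode

A distributed HBase installation runs on top of an HDFS cluster, so the first thing to do is configure a distributed HDFS cluster correctly and make sure the Namenode and Datanode processes all start properly. HBase is a distributed NoSQL database built on HDFS, and in cluster mode its nodes need to be coordinated, so HBase uses ZooKeeper directly as its distributed coordination service; replication of the stored data itself is left to HDFS. For an introduction to ZooKeeper, see the official documentation: http://zookeeper.apache.org.
HBase follows a master/slave architecture: a cluster has one HBase Master Server, whose role is similar to that of the Namenode in HDFS, while the Region Servers act as slave nodes, similar to the Datanodes in HDFS.
Distributed installations fall into two modes, depending on whether ZooKeeper is managed by HBase: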

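A ZooKeeper cluster managed by HBase: starting and stopping the HBase cluster also starts and stops the ZooKeeper cluster
An external ZooKeeper cluster: a ZooKeeper cluster that is completely independent of HBase, which HBase does not start or stop

Below, we install against a separately installed ZooKeeper cluster that is not managed by HBase. With the official documentation at hand, it is easy to get it installed, configured and running.
1. Install and configure the HDFS cluster
Start the HDFS cluster with one master as the Namenode and the other 3 slaves as Datanodes.
The master service listens on port 9000.
2. Create the HBase storage directory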

# Create the directory hdfs://master:9000/hbase
hadoop fs -mkdir /hbase
# Verify that /hbase was created
hadoop fs -lsr /

3. Configure HBase
(1) Unpack the HBase package and set the system environment variables by appending the following to the end of ~/.bashrc:

export JAVA_HOME=/home/hadoop/installation/jdk1.6.0_30
export HADOOP_HOME=/home/hadoop/installation/hadoop-0.22.0
export HBASE_HEAPSIZE=128
export HBASE_MANAGES_ZK=false

Apply the settings:

. ~/.bashrc

(2) Edit the hbase-0.90.4/conf/hbase-env.sh script:
First, rename one of the directories under hbase-0.90.4:

hadoop@master:~/installation/hbase-0.90.4$ mv hbase-webapps/ webapps

HBase looks for a webapps directory by default. Then edit the script so that it contains the following:

export JAVA_HOME=/home/hadoop/installation/jdk1.6.0_30
export HADOOP_HOME=/home/hadoop/installation/hadoop-0.22.0
export HBASE_HEAPSIZE=128
export HBASE_MANAGES_ZK=false
export HBASE_CLASSPATH=$HBASE_HOME/

HBASE_MANAGES_ZK=false means that an external ZooKeeper cluster is used rather than one managed by the HBase cluster.
(3) Edit conf/hbase-site.xml as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://master:9000/hbase</value>
                <description>The directory shared by RegionServers.</description>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
                <description>The mode the cluster will be in. Possible values are false: standalone and pseudo-distributed setups with managed Zookeeper true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)</description>
        </property>
        <property>
                <name>hbase.zookeeper.property.dataDir</name>
                <value>/home/hadoop/storage/zookeeper</value>
                <description>Property from ZooKeeper's config zoo.cfg. The directory where the snapshot is stored.</description>
        </property>
        <property>
                <name>hbase.zookeeper.quorum</name>
                <value>slave-01,slave-02,slave-03</value>
                <description>Comma-separated list of servers in the ZooKeeper quorum.</description>
        </property>
</configuration>

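In the configuration above:
hbase.rootdir sets the HBase root directory to hdfs://master:9000/hbase on HDFS; this directory is shared by the Region Servers in the cluster. Remember to create /hbase on HDFS before starting the HBase cluster by running hadoop fs -mkdir /hbase on master.
hbase.cluster.distributed specifies that we are installing in fully distributed mode
hbase.zookeeper.property.dataDir sets the storage directory of the ZooKeeper cluster used by HBase
hbase.zookeeper.quorum lists the nodes of the ZooKeeper cluster that coordinates the HBase cluster; an odd number of nodes is recommended, since an even-sized ensemble tolerates no more failures than the next smaller odd one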
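
As a side note on the size of the quorum, ZooKeeper needs a strict majority of its nodes to be up, which is why odd ensemble sizes are the usual choice. The sketch below only does the arithmetic; the function names are made up for illustration and are not part of ZooKeeper or HBase.

```shell
# Number of nodes that must be up for an ensemble of $1 nodes to keep a majority.
zk_majority() {
    echo $(( $1 / 2 + 1 ))
}

# Number of node failures an ensemble of $1 nodes can tolerate.
zk_tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Compare a few ensemble sizes.
for n in 3 4 5; do
    echo "$n nodes: majority $(zk_majority "$n"), tolerates $(zk_tolerated_failures "$n") failure(s)"
done
```

With these definitions a 4-node ensemble tolerates one failure, the same as the 3-node ensemble used here, while 5 nodes tolerate two.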

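Because HBase depends on this ZooKeeper cluster, make sure it has started successfully before starting the HBase cluster.
(4) Next, check whether the Hadoop version in HBase's lib directory matches the version of the HDFS cluster started earlier: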

rm ~/installation/hbase-0.90.4/lib/hadoop-core-0.20-append-r1056497.jar
cp ~/installation/hadoop-0.22.0/*.jar ~/installation/hbase-0.90.4/lib/

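I simply deleted the Hadoop jar shipped in the HBase package and replaced it with the jars of the Hadoop version in use. This step is important and easy to miss if you do not read the official documentation closely: the Hadoop jar under lib in the HBase package is version 0.20 by default, and if the HDFS you started runs 0.22, HBase reports a version mismatch error at startup.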
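
Before deleting anything, it can help to see which Hadoop jars a given lib directory actually contains. The following is a minimal sketch; list_hadoop_jars is a hypothetical helper, and the hadoop-*.jar pattern is an assumption based on the jar name shown above.

```shell
# Print the file names of all hadoop-*.jar files in the directory given as $1.
# list_hadoop_jars is a hypothetical helper, not something shipped with HBase.
list_hadoop_jars() {
    for jar in "$1"/hadoop-*.jar; do
        # When nothing matches, the pattern is left unexpanded, so test for existence.
        if [ -e "$jar" ]; then
            basename "$jar"
        fi
    done
}
```

Running list_hadoop_jars ~/installation/hbase-0.90.4/lib before and after the replacement shows whether the bundled jar is gone and the cluster's jars are in place.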
(5) Edit conf/regionservers to list the slave nodes (Region Servers) of the HBase cluster:

slave-01
slave-02
slave-03

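One host per line; the entries above are the host names of the slave nodes. This is very similar to how the slave nodes of HDFS are configured.
(6) After the steps above, HBase is essentially configured on one machine (master). Now apply all of the environment variable settings above on every slave node as well, and then copy the configured HBase installation to each of them: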

scp -r ~/installation/hbase-0.90.4 hadoop@slave-01:/home/hadoop/installation
scp -r ~/installation/hbase-0.90.4 hadoop@slave-02:/home/hadoop/installation
scp -r ~/installation/hbase-0.90.4 hadoop@slave-03:/home/hadoop/installation
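
With more than a few slaves, the copy commands above could be generated from conf/regionservers instead of typed by hand. The sketch below is a dry run that only prints the commands; print_scp_commands is a hypothetical helper, the hadoop user is taken from the commands above, and skipping blank lines and lines starting with # is a convenience of this sketch rather than documented behavior of the regionservers file.

```shell
# Print one scp command per host listed in a regionservers-style file.
# Arguments: $1 = hosts file, $2 = local source directory, $3 = remote target directory.
print_scp_commands() {
    hosts_file=$1
    src=$2
    dest=$3
    # The extra test keeps a final line that lacks a trailing newline.
    while IFS= read -r host || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;   # skip blank lines and comment lines
        esac
        echo "scp -r $src hadoop@$host:$dest"
    done < "$hosts_file"
}
```

For example, print_scp_commands conf/regionservers ~/installation/hbase-0.90.4 /home/hadoop/installation prints the commands for review; piping that output to sh would run them.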

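4. Configure the ZooKeeper cluster
For installation, configuration and startup details, see the article at http://blog.csdn.net/shirdrn/article/details/7183503.
Before starting the HBase cluster, start the ZooKeeper cluster and make sure it is running normally.
5. Start the HBase cluster
Run the following script from the HBase installation directory on master:

bin/start-hbase.sh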

Use jps to list all the processes now running on master:

hadoop@master:~/installation/hbase-0.90.4$ jps
15899 SecondaryNameNode
15553 NameNode
21677 Jps
21537 HMaster

Here, HMaster is the master node service process of the HBase cluster.
The processes started on the slaves, taking slave-03 as an example:

hadoop@slave-03:~/installation/hbase-0.90.4$ jps
6919 HRegionServer
4212 QuorumPeerMain
7053 Jps
3483 DataNode

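Above, HRegionServer is the slave node service process of the HBase cluster, and QuorumPeerMain is the node service process of the ZooKeeper cluster.
Alternatively, check the logs for startup exceptions: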

On master:   tail -500f $HBASE_HOME/logs/hbase-hadoop-master-master.log
On slave-01: tail -500f $HBASE_HOME/logs/hbase-hadoop-regionserver-slave-01.log
On slave-02: tail -500f $HBASE_HOME/logs/hbase-hadoop-regionserver-slave-02.log
On slave-03: tail -500f $HBASE_HOME/logs/hbase-hadoop-regionserver-slave-03.log
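
Instead of reading a long log by eye, the FATAL and ERROR entries can be counted first. This is a small sketch, assuming the log4j-style layout seen in the HBase logs quoted in this article (timestamp, then the level surrounded by spaces); count_log_problems is a hypothetical name.

```shell
# Print the number of lines in log file $1 whose level is FATAL or ERROR.
count_log_problems() {
    # grep -c prints 0 but exits with status 1 when nothing matches,
    # so "|| true" keeps the function's exit status at 0 in that case.
    grep -cE ' (FATAL|ERROR) ' "$1" || true
}
```

A result of 0 for each log is a quick sign that startup went through; anything else is a cue to open the file and look at the surrounding lines.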

6. Verify the HBase installation
Start the HBase shell; if it shows output like the following, the HBase cluster has started successfully:

hadoop@master:~/installation/hbase-0.90.4$ hbase shell
12/01/09 01:14:09 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
12/01/09 01:14:09 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
12/01/09 01:14:09 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.4, r1150278, Sun Jul 24 15:53:29 PDT 2011


hbase(main):001:0> help
HBase Shell, version 0.90.4, r1150278, Sun Jul 24 15:53:29 PDT 2011
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.


COMMAND GROUPS:
  Group name: general
  Commands: status, version


  Group name: ddl
  Commands: alter, create, describe, disable, drop, enable, exists, is_disabled, is_enabled, list


  Group name: dml
  Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate


  Group name: tools
  Commands: assign, balance_switch, balancer, close_region, compact, flush, major_compact, move, split, unassign, zk_dump


  Group name: replication
  Commands: add_peer, disable_peer, enable_peer, remove_peer, start_replication, stop_replication


SHELL USAGE:
Quote all names in HBase Shell such as table and column names.  Commas delimit
command parameters.  Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:


  {'key1' => 'value1', 'key2' => 'value2', ...}


and are opened and closed with curley-braces.  Key/values are delimited by the
'=>' character combination.  Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc.  Constants do not need to be quoted.  Type
'Object.constants' to see a (messy) list of all constants in the environment.


If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:


  hbase> get 't1', "key\x03\x3f\xcd"
  hbase> get 't1', "key\003\023\011"
  hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"


The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/docs/current/book.html
hbase(main):002:0> status
3 servers, 0 dead, 0.0000 average load


hbase(main):003:0> version
0.90.4, r1150278, Sun Jul 24 15:53:29 PDT 2011


hbase(main):004:0>

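You can now run the same commands used earlier in the local file system installation.

Summary notes

1. Version mismatch errors
If a version mismatch error like the following appears at startup: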

2012-01-06 21:27:18,384 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.ipc.RemoteException: Server IPC version 5 cannot communicate with client version 3
        at org.apache.hadoop.ipc.Client.call(Client.java:740)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy5.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:113)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:215)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:177)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
        at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:364)
        at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:81)
        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:346)
        at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282)
2012-01-02 21:27:18,384 INFO org.apache.hadoop.hbase.master.HMaster: Aborting

This means the Hadoop and HBase versions do not match. A careful read of the documentation at http://hbase.apache.org/book.html#hadoop turns up the following explanation:

Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar under its lib directory. The bundled jar is ONLY for use in standalone mode. In distributed mode, it is critical that the version of Hadoop that is out on your cluster match what is under HBase. Replace the hadoop jar found in the HBase lib directory with the hadoop jar you are running on your cluster to avoid version mismatch issues. Make sure you replace the jar in HBase everywhere on your cluster. Hadoop version mismatch issues have various manifestations but often all looks like its hung up.

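Replacing the Hadoop Core jar under lib in the HBase package with the jar of the Hadoop version you are running fixes this.
2. Exceptions when running operations after the HBase cluster has started
If the HBase cluster starts normally but creating a table fails with an exception like the following: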

ERROR: org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: org.apache.hadoop.hbase.NotAllMetaRegionsOnlineException: Timed out (10000ms)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:334)
        at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:769)
        at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:743)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)

The fix is to edit the /etc/hosts file. Taking master as an example, it was changed as follows:

#127.0.0.1       localhost
192.168.0.180   master
192.168.0.191   slave-01
192.168.0.190   slave-02
192.168.0.189   slave-03
# The following lines are desirable for IPv6 capable hosts
#::1     ip6-localhost ip6-loopback
#fe00::0 ip6-localnet
#ff00::0 ip6-mcastprefix
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters

After that, the operations work without problems.
See also: http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/18868
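
To check the other nodes for the same issue, it may be useful to list which names a hosts file still maps to a loopback address. The sketch below assumes the underlying problem is a name resolving to 127.x.x.x, which the fix above suggests but the source does not state outright; loopback_names is a hypothetical helper.

```shell
# Print every host name that the hosts file $1 maps to a 127.x.x.x address.
loopback_names() {
    awk '
        /^[[:space:]]*#/ { next }          # skip commented-out lines
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break       # stop at a trailing comment
                print $i
            }
        }
    ' "$1"
}
```

Calling loopback_names /etc/hosts on each node prints nothing for the file shown above, since its only loopback line is commented out.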


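This article is published under the Attribution-NonCommercial-ShareAlike 4.0 license. You are welcome to repost, use and republish it, provided that you keep the attribution to the author 时延军 (including the link http://shiyanjun.cn), do not use it for commercial purposes, and release any derivative work under the same license. If you have any questions, please contact me.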
