HBase Essentials

By: Garg

Overview of this book

This book is intended for developers and Big Data engineers who want to know all about HBase at a hands-on level. Some familiarity with HDFS and MapReduce programming concepts is helpful for in-depth understanding, but no prior experience with HBase or similar technologies is required. This book is also for Big Data enthusiasts and database developers who have worked with other NoSQL databases and now want to explore HBase as another scalable database solution in the Big Data space.

Understanding HBase cluster components

In fully distributed and pseudo-distributed modes, an HBase cluster has several components, such as the HBase Master, ZooKeeper, RegionServers, and HDFS DataNodes, discussed as follows:

  • HBase Master: HBase Master coordinates the HBase cluster and is responsible for administrative operations. It is a lightweight process that does not require many hardware resources. A large cluster might run multiple HBase Master processes to avoid a single point of failure. In such a highly available cluster, only one HBase Master is active, and the remaining HBase Master servers stay in sync with the active server asynchronously. If the active master fails, the next HBase Master is selected with the help of the ZooKeeper ensemble.
  • ZooKeeper: ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. Like HBase Master, ZooKeeper is a lightweight process. By default, HBase starts and stops the ZooKeeper process itself, but ZooKeeper can also be managed separately. The HBASE_MANAGES_ZK variable in conf/hbase-env.sh, whose default value is true, signifies that HBase manages ZooKeeper. We can specify the ZooKeeper configuration in the native zoo.cfg file, or set values such as the client port directly in conf/hbase-site.xml. It is advisable to run an odd number of ZooKeeper servers in the ensemble, such as one, three, or five. The ensemble stays available only while a majority of its servers are up, so an even-sized ensemble tolerates no more host failures than the odd-sized ensemble that is one server smaller. The following is an example of hbase-site.xml with ZooKeeper settings:
    <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2222</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>ZooKeeperhost1,ZooKeeperhost2,ZooKeeperhost3</value>
    </property>
    <property>
      <name>hbase.zookeeper.property.dataDir</name>
      <value>/opt/zookeeper</value>
    </property>
    
  • RegionServers: In HBase, horizontal scalability is achieved through units called regions. A region is a sorted range of rows stored contiguously. In the HBase architecture, each region server hosts a set of regions. By default, the region server web UI runs on port 60030 in the 0.9x releases (16030 from HBase 1.0 onward). In an HBase cluster based on HDFS, Hadoop DataNodes and RegionServers are typically called slave nodes, as both are responsible for serving data and are usually colocated in the cluster. The region servers are listed in the conf/regionservers file, one per line, and they are started and stopped by the scripts that start and stop the HBase cluster.
  • HBase data storage system: HBase is developed using a pluggable architecture; hence, its data storage layer is not tied to HDFS. It can also be plugged into other file storage systems, such as the local filesystem (primarily used in standalone mode), S3 (Amazon's Simple Storage Service), CloudStore (also known as the Kosmos filesystem), or a self-developed filesystem.
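
The advice to use an odd number of ZooKeeper servers follows from majority arithmetic. The following is a minimal sketch of that arithmetic only; the function name tolerated_failures is a hypothetical helper and is not part of HBase or ZooKeeper:

```shell
# Majority arithmetic for a ZooKeeper ensemble (illustrative helper only).
# An ensemble of N servers needs a strict majority (more than N/2) running,
# so it can lose at most (N - 1) / 2 servers, using integer division.
tolerated_failures() {
  local n=$1
  echo $(( (n - 1) / 2 ))
}

# Compare ensemble sizes 1 through 6.
for n in 1 2 3 4 5 6; do
  echo "servers=$n tolerated_failures=$(tolerated_failures "$n")"
done
# servers=3 and servers=4 both tolerate 1 failure;
# servers=5 and servers=6 both tolerate 2.
```

Going from three servers to four adds a host without adding any failure tolerance, which is why odd sizes are usually preferred.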

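To make the idea of a region as a sorted range of rows more concrete, here is a small sketch that finds the region responsible for a row key. The region names and start keys are made up for illustration, and find_region is a hypothetical helper; this is not how HBase itself locates regions, only a picture of the sorted-range idea:

```shell
# Hypothetical region table: "<start_key> <region_name>", sorted by start key.
# Each region covers row keys from its start key (inclusive) up to the next
# region's start key (exclusive). "-" stands for the empty first start key.
REGION_TABLE='- region-1
g region-2
p region-3'

# find_region ROW_KEY prints the region whose key range contains ROW_KEY.
find_region() {
  printf '%s\n' "$REGION_TABLE" | LC_ALL=C awk -v key="$1" '
    # Appending "" forces a string (lexicographic) comparison.
    # The last start key that is <= the row key wins.
    $1 == "-" || ($1 "") <= (key "") { region = $2 }
    END { print region }
  '
}

find_region apple   # prints region-1
find_region grape   # prints region-2
find_region zebra   # prints region-3
```

Because the row keys are sorted, every key falls into exactly one range, which is what lets a cluster spread regions across region servers.
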
Apart from the components mentioned, there are other considerations, such as hardware and software requirements, that are not within the scope of this book.

Note

The backup HBase Master server and the additional region servers can be started in the pseudo-distributed mode using the utilities provided in the bin directory, as follows:

[root@localhost hbase-0.96.2-hadoop2]# bin/local-master-backup.sh start 2 3

The preceding command will start two additional HBase Master backup servers on the same box. Each HMaster server uses three ports (16010, 16020, and 16030 by default). The arguments 2 and 3 are port offsets that are added to these defaults, so the new backup servers will use ports 16012/16022/16032 and 16013/16023/16033.

[root@localhost hbase-0.96.2-hadoop2]# bin/local-regionservers.sh start 2 3

The preceding command will start two additional HBase region servers on the same box, using ports 16202/16302 and 16203/16303.
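
The port numbers quoted in this note are simply a base port plus the offset passed on the command line. The following sketch reproduces that arithmetic; offset_ports is a hypothetical helper, and the base ports are the ones stated in the note above (16200/16300 is inferred from the 16202/16302 figure):

```shell
# offset_ports OFFSET BASE_PORT... prints each base port plus OFFSET,
# joined with "/" (illustrative helper, not an HBase utility).
offset_ports() {
  local offset=$1
  shift
  local result=""
  local base
  for base in "$@"; do
    result="${result:+$result/}$(( base + offset ))"
  done
  echo "$result"
}

# Backup masters started with offsets 2 and 3:
offset_ports 2 16010 16020 16030   # prints 16012/16022/16032
offset_ports 3 16010 16020 16030   # prints 16013/16023/16033

# Additional region servers started with offsets 2 and 3:
offset_ports 2 16200 16300         # prints 16202/16302
offset_ports 3 16200 16300         # prints 16203/16303
```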

Start playing

Now that we have everything installed and running, let's start playing with it and try out a few commands to get a feel for HBase. HBase comes with a command-line interface that works in both local and distributed modes. The HBase shell is developed in JRuby and can run in both interactive mode (recommended for simple commands) and batch mode (recommended for running shell script programs). Let's start the HBase shell in the interactive mode as follows:

[root@localhost hbase-0.98.7-hadoop2]# bin/hbase shell

The preceding command starts the interactive shell, which prints a short banner and then waits for input at its prompt.

Type help and press Return to see a listing of the available shell commands and their options. Remember that all the commands are case-sensitive.

The following is a list of some simple commands to get your hands dirty with HBase:

  • status: This verifies whether HBase is up and running.
  • create '<table_name>', '<column_family_name>': This creates a table with one column family. We can specify multiple column family names as well.
  • list: This lists the tables.
  • put '<table_name>', '<row_num>', 'column_family:key', 'value': This puts a value into the table under the given row and column family. HBase is a schema-less database and provides the flexibility to store any type of data without defining it in advance.
  • get '<table_name>', '<row_num>': This reads a particular row from the table.
  • scan '<table_name>': This scans the complete table and outputs the results.
  • delete '<table_name>', '<row_num>', 'column_family:key': This deletes the specified value.
  • describe '<table_name>': This shows the metadata of the table.
  • drop '<table_name>': This drops the table. Before executing this command, we must first execute disable '<table_name>'.
  • exit: Finally, this exits the shell when you are done using HBase.
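
The commands above can also be collected into a file and run in batch mode. The following is a minimal sketch that writes such a file; the table name demo_table, the column family cf, the row key row1, and the file location are all made-up values for illustration:

```shell
# Write a sequence of HBase shell commands to a file for batch mode.
# All table, family, and row names here are illustrative only.
cmd_file="${TMPDIR:-/tmp}/hbase_demo_commands.txt"

cat > "$cmd_file" <<'EOF'
status
create 'demo_table', 'cf'
list
put 'demo_table', 'row1', 'cf:greeting', 'hello'
get 'demo_table', 'row1'
scan 'demo_table'
delete 'demo_table', 'row1', 'cf:greeting'
describe 'demo_table'
disable 'demo_table'
drop 'demo_table'
exit
EOF

# The file is passed to the shell as its only argument.
echo "Run with: bin/hbase shell $cmd_file"
```

The disable command comes before drop because a table must be disabled before it can be dropped, and the trailing exit keeps the shell from staying open at its prompt after the file has been processed.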

Note

Refer to the following link for more commands: http://wiki.apache.org/hadoop/Hbase/Shell
