Introduction to Apache HBase
Why Do We Need Apache HBase?
The traditional approach to data storage has been the relational database management system (RDBMS), used to store and maintain data, but then came the rise of Big Data. With Big Data came new solutions, and Hadoop is one of them. However, when we store a huge amount of data in Hadoop and then try to fetch a few records, we hit a major problem: the user has to scan the entire Hadoop Distributed File System to find even the smallest record. In other words, the limitation of Hadoop is that it does not provide random access to data. Apache HBase solves this problem.
What is Apache HBase?
Apache HBase is similar to a database management system, but with the added ability to access data randomly. HBase is a distributed, column-oriented database built on top of Hadoop’s file system. It is an open-source, non-relational, distributed database written in Java, developed as part of the Apache Software Foundation’s Apache Hadoop project, and it runs on top of HDFS.
Apache HBase is horizontally scalable and is modeled after Google’s Bigtable design to provide quick random access to huge amounts of structured data. It leverages the fault tolerance provided by the Hadoop file system and, as part of the Hadoop ecosystem, provides random, real-time read and write access to data stored in HDFS.
Apache HBase vs. Hadoop Distributed File System (HDFS)

| HBase | HDFS |
| --- | --- |
| HBase is built on top of HDFS. | HDFS is a distributed file system that stores very large files. |
| HBase provides fast lookups of individual records. | HDFS does not support fast individual record lookups. |
| It has low latency for single-record access. | It has high latency for single-record access. |
| It has in-built hash tables and indexed data, enabling faster lookups. | Only sequential data access is available. |
Apache HBase is a column-oriented database, and its tables are sorted by row key. The table schema defines only column families; each column family can hold any number of columns, and each column value is stored as a key-value pair.
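To make that data model concrete, here is a minimal sketch of creating such a table with the HBase 2.x Java client. The table name "users" and the column family "personal" are made-up examples, and the snippet assumes a reachable cluster configured via an hbase-site.xml on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        // Reads cluster settings from hbase-site.xml on the classpath (assumed setup).
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {

            // A hypothetical "users" table with one column family, "personal".
            // Only the column family belongs to the schema; columns are created on write.
            admin.createTable(
                TableDescriptorBuilder.newBuilder(TableName.valueOf("users"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("personal"))
                    .build());
        }
    }
}
```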
Apache HBase Features
- HBase is linearly scalable. Because it is built on top of HDFS, it inherits HDFS’s horizontal scalability, which makes it well suited for enterprises dealing with massive amounts of data.
- HBase has automatic failure support, which gives it fault tolerance by default.
- HBase provides consistent reads and writes, giving users random read and write access to their data.
- It integrates with Hadoop, both as a source and as a destination for MapReduce jobs.
- It offers an easy-to-use Java API for clients (see the sketch after this list).
- HBase provides data replication across clusters.
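As a minimal sketch of that Java client API, the following writes one row and reads it back by key. The table "users", column family "personal", and row key "user100" are assumptions carried over from the earlier example, not anything prescribed by HBase.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PutGetExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) {

            // Write: row key "user100", column personal:name = "Alice".
            Put put = new Put(Bytes.toBytes("user100"));
            put.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
            table.put(put);

            // Random read: fetch the single row back directly by its key.
            Get get = new Get(Bytes.toBytes("user100"));
            Result result = table.get(get);
            byte[] name = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
            System.out.println("name = " + Bytes.toString(name));
        }
    }
}
```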
Characteristics of HBase
HBase is a type of NoSQL database and is classified as a key-value store. In HBase:

- A value is identified by a key.
- Both keys and values are byte arrays.
- Values are stored in key order.
- Values can be accessed quickly by their keys (see the scan sketch after this list).
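Because rows are kept sorted by their byte-array keys, a contiguous key range can be read efficiently with a scan. The sketch below assumes the same hypothetical "users" table and "user…" row keys used above.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RangeScanExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("users"))) {

            // Keys are plain byte arrays and rows are stored sorted by key,
            // so a contiguous key range can be scanned without touching other rows.
            Scan scan = new Scan()
                .withStartRow(Bytes.toBytes("user100"))   // inclusive
                .withStopRow(Bytes.toBytes("user200"));   // exclusive

            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}
```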
Storage Model of HBase
The two major components of the storage model are as follows:
Partitioning:
- Tables are partitioned horizontally into regions.
- Each region is managed by a single RegionServer.
- A RegionServer may hold multiple regions (see the sketch after this section).
Persistence and Data Availability:
- HBase stores its data in HDFS. It does not replicate data across RegionServers itself; instead, it relies on HDFS replication for data availability.
- Updates and reads are processed through an in-memory store called the MemStore, which is periodically flushed to HDFS.
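To see the partitioning from the client side, the sketch below asks the cluster which region and RegionServer hold a given row key. It assumes the same hypothetical "users" table; RegionLocator is part of the standard HBase 2.x client API.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionLookupExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             RegionLocator locator = connection.getRegionLocator(TableName.valueOf("users"))) {

            // Each row key falls into exactly one region; the locator reports
            // which region holds it and which RegionServer hosts that region.
            HRegionLocation location = locator.getRegionLocation(Bytes.toBytes("user100"));
            System.out.println("Region: " + location.getRegion().getRegionNameAsString());
            System.out.println("Hosted on: " + location.getServerName());
        }
    }
}
```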
When to Use HBase?
- You have a largely invariable schema.
- You have enough data, in the millions or billions of rows.
- You need random reads and range scans by key.
- You have sufficient commodity hardware, with at least five nodes.
HBase: A Real-Life Example
Facebook’s Messenger platform needs to store over 135 billion messages every month, and it stores that data in HBase. Facebook chose HBase because it needed a system that could handle two data patterns: an ever-growing data set that is rarely accessed and an ever-growing data set that is highly volatile.
Author: SVCIT Editorial
Copyright: Silicon Valley Cloud IT, LLC