
Implementing HBase filters using Java APIs

In our previous blog, we discussed the need for and working of filters in HBase. In this blog, we will implement a filtering operation on a set of rows in an HBase table.
We also recommend that readers go through our posts on HBase listed below, as they will help in understanding the concepts covered in this post.
Beginners Guide For HBase
Working of HBase components
Read and Write Operations in HBase
Performing CRUD Operations on HBase using JAVA API
For the example below, we will use an existing table named “customer” from the HBase default namespace. As the image below shows, the HBase shell “list” command lists the tables present in that namespace.

Table “customer” contents :
As shown in the below image, the table “customer” consists of three rows, namely Kiran, Manjunath and Prateek, with a single column family named “order” and a column qualifier named “number”.

Scenario 1 :
Write a Java program that lists the row values of the “customer” table without using any filter.
Expected Output:
We can refer to the below screenshot to see what the expected output will be.
[Screenshot: all rows retrieved using a scan]
Source Code:
package com.acadgild.hbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class Filter_RowValue {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "customer");
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("order"), Bytes.toBytes("number"));
        ResultScanner result = table.getScanner(scan);
        for (Result res : result) {
            byte[] val = res.getValue(Bytes.toBytes("order"), Bytes.toBytes("number"));
            System.out.println("Row-value : " + Bytes.toString(val));
            System.out.println(res);
        }
        table.close();
    }
}

Here’s the explanation of each line of code, counting from the class declaration as line 1 :
In line 1, we declare a class named Filter_RowValue.
In line 3, the create() method of the HBaseConfiguration class returns a Configuration object, conf, loaded with the HBase configuration resources.
In line 4, we create an HTable instance, “table”, which communicates with a single HBase table; its constructor accepts the configuration object and the table name as parameters.
In line 5, we create a Scan instance, “scan”, to perform the scan operation.
In line 6, we use the addColumn method to restrict the scan to one column of the table “customer”, where “order” is the column family name and “number” is the column qualifier name within that family.
In line 7, we obtain a ResultScanner instance, “result”, which returns a scanner over the table “customer” as specified by the Scan object.
In line 8, a for-each loop iterates over the scanner, running once for each row it returns from the “customer” table until no rows remain.
In line 9, we read the value of the cell with column family “order” and column qualifier “number” from the current row and store it in the variable val.
In lines 10 and 11, we print val as a string, followed by the full Result for the row, which includes the row key and the column qualifier.
In line 13, we close the table.
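To make the scan semantics concrete without a running cluster, here is a minimal plain-Java sketch that models the table as a sorted map and mimics what addColumn and getValue do. It does not use the HBase client. The class ScanSketch, its methods, and the cell values (Flipkart, Amazon) are hypothetical stand-ins, since the real table contents appear only in the screenshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of an HBase scan; NOT the HBase client API.
final class ScanSketch {
    // Row key -> ("family:qualifier" -> value). A TreeMap keeps the row keys
    // sorted, mirroring how HBase returns rows in row-key order (for ASCII keys).
    private final TreeMap<String, Map<String, String>> rows = new TreeMap<>();

    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(family + ":" + qualifier, value);
    }

    // Models scan.addColumn(family, qualifier) followed by res.getValue(...):
    // only rows that contain the column are returned, with that column's value.
    List<String> scanColumn(String family, String qualifier) {
        String column = family + ":" + qualifier;
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            String value = row.getValue().get(column);
            if (value != null) {
                out.add(row.getKey() + " -> " + value);
            }
        }
        return out;
    }

    // Hypothetical sample data: the row keys come from the post,
    // the values are assumptions made for illustration only.
    static ScanSketch sampleTable() {
        ScanSketch table = new ScanSketch();
        table.put("Prateek", "order", "number", "Flipkart");
        table.put("Kiran", "order", "number", "Flipkart");
        table.put("Manjunath", "order", "number", "Amazon");
        return table;
    }

    public static void main(String[] args) {
        for (String line : sampleTable().scanColumn("order", "number")) {
            System.out.println("Row-value : " + line);
        }
    }
}
```

Rows come back in row-key order regardless of insertion order, and a row that lacks the requested column is simply absent from the results.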
Output:
[Screenshot: all rows retrieved using a scan]
Scenario 2 :
Write a Java program that lists the row values of the “customer” table where the value of the “number” column contains “Fli”, discarding the rows whose value does not contain “Fli”, using an HBase filter.
Expected Output :
We can refer to the below screenshot to see what the expected output will be.

package com.acadgild.hbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SubstringComparator;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class Filter_RowValue {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "customer");
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("order"), Bytes.toBytes("number"));
        Filter filter1 = new ValueFilter(CompareFilter.CompareOp.EQUAL,
                new SubstringComparator("Fli"));
        FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE, filter1);
        scan.setFilter(list);
        ResultScanner result = table.getScanner(scan);
        for (Result res : result) {
            byte[] val = res.getValue(Bytes.toBytes("order"), Bytes.toBytes("number"));
            System.out.println("Row-value : " + Bytes.toString(val));
            System.out.println(res);
        }
        table.close();
    }
}

Here’s the explanation of each line of code, counting from the class declaration as line 1 :
In line 1, we declare a class named Filter_RowValue.
In line 3, the create() method of the HBaseConfiguration class returns a Configuration object, conf, loaded with the HBase configuration resources.
In line 4, we create an HTable instance, “table”, which communicates with a single HBase table; its constructor accepts the configuration object and the table name as parameters.
In line 5, we create a Scan instance, “scan”, to perform the scan operation.
In line 6, we use the addColumn method to restrict the scan to one column of the table “customer”, where “order” is the column family name and “number” is the column qualifier name within that family.
In lines 7 and 8, we use the class ValueFilter to filter cells based on their value. It takes a CompareFilter.CompareOp operator (equal, greater, not equal, etc.) and a comparator (a ByteArrayComparable) such as SubstringComparator.
Here, “order” is the column family name, “number” is its column qualifier name, and “Fli” is the substring we look for in the values of the table “customer”. We use CompareOp.EQUAL with a SubstringComparator to check whether the value stored under the column qualifier “number” contains “Fli”; the comparison is case-insensitive.
In line 9, we declare a variable “list” of the FilterList class with FilterList.Operator.MUST_PASS_ONE. This operator includes a KeyValue if at least one filter in the list includes it, and it evaluates every filter in the list. FilterList.Operator.MUST_PASS_ALL, by contrast, stops the evaluation as soon as one filter does not include the KeyValue. With a single filter in the list, as here, both operators give the same result.
In line 10, we use the setFilter method to attach the filter list to the scan.
In line 11, we obtain a ResultScanner instance, “result”, which returns a scanner over the table “customer” as specified by the Scan object.
In line 12, a for-each loop iterates over the scanner, running once for each row it returns from the “customer” table until no rows remain.
In line 13, we read the value of the cell with column family “order” and column qualifier “number” from the current row and store it in the variable val.
In lines 14 and 15, we print val as a string, followed by the full Result for the row, which includes the row key and the column qualifier.
In line 17, we close the table.
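The filter behaviour described above can be sketched in plain Java, without HBase on the classpath. This is only a model under stated assumptions: the class FilterSketch, its Operator enum, and the helper names valueContains and filterList are hypothetical, and the sample values are made up. It mirrors two points from the explanation: a SubstringComparator used with EQUAL matches when the value contains the substring regardless of case, and the two FilterList operators combine filters as a logical OR (MUST_PASS_ONE) or a logical AND (MUST_PASS_ALL).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Plain-Java model of filter semantics; NOT the HBase client API.
final class FilterSketch {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // Models new ValueFilter(CompareOp.EQUAL, new SubstringComparator(substr)):
    // a cell value passes when it contains substr, ignoring case.
    static Predicate<String> valueContains(String substr) {
        String needle = substr.toLowerCase(Locale.ROOT);
        return value -> value.toLowerCase(Locale.ROOT).contains(needle);
    }

    // Models a FilterList built from several filters.
    static Predicate<String> filterList(Operator op, List<Predicate<String>> filters) {
        return value -> {
            if (op == Operator.MUST_PASS_ALL) {
                for (Predicate<String> f : filters) {
                    if (!f.test(value)) {
                        return false; // lazy: stop at the first filter that rejects
                    }
                }
                return true;
            }
            boolean included = false;
            for (Predicate<String> f : filters) {
                included |= f.test(value); // non-lazy: every filter is evaluated
            }
            return included;
        };
    }

    public static void main(String[] args) {
        // Hypothetical cell values, for illustration only.
        List<String> values = Arrays.asList("Flipkart", "Amazon", "Flight");
        List<Predicate<String>> filters =
                Arrays.asList(valueContains("Fli"), valueContains("kart"));
        Predicate<String> one = filterList(Operator.MUST_PASS_ONE, filters);
        Predicate<String> all = filterList(Operator.MUST_PASS_ALL, filters);
        for (String v : values) {
            System.out.println(v + " MUST_PASS_ONE=" + one.test(v)
                    + " MUST_PASS_ALL=" + all.test(v));
        }
    }
}
```

With only one filter in the list, as in Scenario 2, both operators accept exactly the same values, so the choice of operator matters only once a second filter is added.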
Output :

Thus, from the above steps we can observe how an HBase filter helped us retrieve only the rows whose “number” value contains “Fli”. We restricted the scan to a particular column family and column qualifier and passed the filter to the scan, so that the filter is applied on the server side instead of every row of the table being returned to the client.
We hope this post has been helpful in understanding the working of filters in HBase for retrieving results from an HBase table. In case of any queries, feel free to comment below and we will get back to you at the earliest.
Keep visiting our website for more posts on Big Data and other technologies.
 