
Different Types of Filters in HBase Shell

In this blog, we will look at the different types of filters available in the HBase shell, along with when and how each one can be used.
The filter language was introduced in Apache HBase 0.92. It lets you perform server-side filtering when accessing HBase over Thrift or within the HBase shell.
When reading data from HBase using Get or Scan operations, you can use custom filters to return only a subset of the results to the client. This typically does not reduce server-side I/O, but it does cut network bandwidth and the amount of data the client has to process. Filters are usually implemented using the Java API, but they can also be used from the HBase shell for testing and debugging.

The examples in this post use a table named “bulktable”. A plain scan of the table (scan 'bulktable') shows its full contents.

We will perform a few filter operations on this table.


The show_filters command lists the filters available in the HBase shell.

FirstKeyOnlyFilter

This filter doesn’t take any arguments. It returns only the first key-value from each row.

Syntax

FirstKeyOnlyFilter()

Example of FirstKeyOnlyFilter
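
As a minimal sketch, the plain Ruby snippet below assembles the scan command and prints it, so the line can be pasted into the HBase shell. It assumes the table is the “bulktable” used in this post; nothing here talks to a cluster.

```ruby
# Prints an HBase shell command; paste the output into `hbase shell` to run it.
table   = 'bulktable'             # table used throughout this post
filter  = "FirstKeyOnlyFilter()"  # filter-language expression, no arguments
command = %(scan '#{table}', {FILTER => "#{filter}"})
puts command
# prints: scan 'bulktable', {FILTER => "FirstKeyOnlyFilter()"}
```

Because only one cell per row comes back, this filter is often handy for cheap row counting.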

KeyOnlyFilter

This filter doesn’t take any arguments. It returns only the key part of each key-value.

Syntax

KeyOnlyFilter()

Example of KeyOnlyFilter
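
A hedged sketch of how this might be invoked: the Ruby below only builds and prints the shell command, assuming the “bulktable” table from this post.

```ruby
# Prints an HBase shell command; paste the output into `hbase shell` to run it.
table   = 'bulktable'        # table used throughout this post
filter  = "KeyOnlyFilter()"  # strips the value, keeps row/family/qualifier/timestamp
command = %(scan '#{table}', {FILTER => "#{filter}"})
puts command
# prints: scan 'bulktable', {FILTER => "KeyOnlyFilter()"}
```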

PrefixFilter

This filter takes one argument, a row-key prefix. It returns only the key-values in rows whose row key starts with the specified prefix.

Syntax

PrefixFilter('<row_prefix>')

Example of PrefixFilter
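
The snippet below is a small sketch of the quoting involved: the prefix goes inside single quotes, and the whole filter expression goes inside double quotes. The prefix 'row1' is a hypothetical value, not taken from the original table.

```ruby
# Prints an HBase shell command; paste the output into `hbase shell` to run it.
table   = 'bulktable'   # table used throughout this post
prefix  = 'row1'        # hypothetical row-key prefix
filter  = "PrefixFilter('#{prefix}')"
command = %(scan '#{table}', {FILTER => "#{filter}"})
puts command
# prints: scan 'bulktable', {FILTER => "PrefixFilter('row1')"}
```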

ColumnPrefixFilter

This filter takes one argument, a column prefix. It returns only the key-values in columns whose name starts with the specified prefix. The column prefix must be a qualifier (without the column family).

Syntax

ColumnPrefixFilter('<column_prefix>')

Example of ColumnPrefixFilter
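
As an illustrative sketch, the Ruby below prints a scan that keeps only columns whose qualifier starts with a given prefix. The prefix 'na' is a hypothetical value chosen for illustration.

```ruby
# Prints an HBase shell command; paste the output into `hbase shell` to run it.
table         = 'bulktable'   # table used throughout this post
column_prefix = 'na'          # hypothetical qualifier prefix (no column family)
filter  = "ColumnPrefixFilter('#{column_prefix}')"
command = %(scan '#{table}', {FILTER => "#{filter}"})
puts command
# prints: scan 'bulktable', {FILTER => "ColumnPrefixFilter('na')"}
```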

MultipleColumnPrefixFilter

This filter takes a list of column prefixes. It returns the key-values in columns whose name starts with any of the specified prefixes. Each column prefix must be a qualifier (without the column family).

Syntax

MultipleColumnPrefixFilter('<column_prefix>', '<column_prefix>', ..., '<column_prefix>')

Example of MultipleColumnPrefixFilter
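
The sketch below shows one way to build the argument list: each prefix is quoted separately and the quoted prefixes are joined with commas. The prefixes 'na' and 'ag' are hypothetical.

```ruby
# Prints an HBase shell command; paste the output into `hbase shell` to run it.
table    = 'bulktable'    # table used throughout this post
prefixes = ['na', 'ag']   # hypothetical qualifier prefixes
args     = prefixes.map { |p| "'#{p}'" }.join(', ')  # quote each prefix on its own
filter   = "MultipleColumnPrefixFilter(#{args})"
command  = %(scan '#{table}', {FILTER => "#{filter}"})
puts command
# prints: scan 'bulktable', {FILTER => "MultipleColumnPrefixFilter('na', 'ag')"}
```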


ColumnCountGetFilter

This filter takes one argument, a limit. It returns the first limit number of columns in the table.

Syntax

ColumnCountGetFilter(<limit>)

Example of ColumnCountGetFilter
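
A hedged sketch with a limit of 2 (an arbitrary value). Note that the limit is a bare number, not a quoted string. As the name suggests, the filter is primarily intended for Get requests, so how it behaves under a full scan is something to verify on your own cluster.

```ruby
# Prints an HBase shell command; paste the output into `hbase shell` to run it.
table   = 'bulktable'   # table used throughout this post
limit   = 2             # arbitrary column limit for illustration
filter  = "ColumnCountGetFilter(#{limit})"   # numeric argument, no quotes
command = %(scan '#{table}', {FILTER => "#{filter}"})
puts command
# prints: scan 'bulktable', {FILTER => "ColumnCountGetFilter(2)"}
```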


PageFilter

This filter takes one argument, a page size. It returns page size number of rows from the table.

Syntax

PageFilter(<page_size>)

Example of PageFilter
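
A minimal sketch with a page size of 2, an arbitrary value. On a multi-region table the filter is applied separately on each region server, so the client may see more rows than the page size.

```ruby
# Prints an HBase shell command; paste the output into `hbase shell` to run it.
table     = 'bulktable'   # table used throughout this post
page_size = 2             # arbitrary page size for illustration
filter  = "PageFilter(#{page_size})"   # numeric argument, no quotes
command = %(scan '#{table}', {FILTER => "#{filter}"})
puts command
# prints: scan 'bulktable', {FILTER => "PageFilter(2)"}
```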

InclusiveStopFilter

This filter takes one argument, a row key on which to stop scanning. It returns all key-values present in rows up to and including the specified row.

Syntax

InclusiveStopFilter('<stop_row_key>')

Example of InclusiveStopFilter
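
As a sketch, the Ruby below prints a scan that stops at, and includes, a given row. The row key 'row3' is hypothetical.

```ruby
# Prints an HBase shell command; paste the output into `hbase shell` to run it.
table    = 'bulktable'   # table used throughout this post
stop_row = 'row3'        # hypothetical row key; this row is included in the result
filter  = "InclusiveStopFilter('#{stop_row}')"
command = %(scan '#{table}', {FILTER => "#{filter}"})
puts command
# prints: scan 'bulktable', {FILTER => "InclusiveStopFilter('row3')"}
```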

QualifierFilter

This filter takes a compare operator and a comparator. It compares each qualifier name with the comparator using the compare operator and, if the comparison returns true, it returns all the key-values in that column. FamilyFilter works the same way but compares the column family name instead.

Syntax

QualifierFilter(<compareOp>, '<qualifier_comparator>')

Example of QualifierFilter
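
A hedged sketch of the comparator syntax: in the filter language a comparator is written as type:value, for example binary:name for an exact byte comparison. The qualifier 'name' is a hypothetical column.

```ruby
# Prints an HBase shell command; paste the output into `hbase shell` to run it.
table      = 'bulktable'   # table used throughout this post
compare_op = '='           # one of <, <=, =, !=, >, >=
qualifier  = 'name'        # hypothetical qualifier to match exactly
filter  = "QualifierFilter(#{compare_op}, 'binary:#{qualifier}')"
command = %(scan '#{table}', {FILTER => "#{filter}"})
puts command
# prints: scan 'bulktable', {FILTER => "QualifierFilter(=, 'binary:name')"}
```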

ValueFilter

This filter takes a compare operator and a comparator. It compares each value with the comparator using the compare operator and, if the comparison returns true, it returns that key-value.

Syntax

ValueFilter(<compareOp>, '<value_comparator>')
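
As an illustrative sketch, the snippet below prints a scan that keeps only cells whose value equals a given string. The value 'abc' is hypothetical, and the binary comparator is one choice among several (substring and regexstring are others).

```ruby
# Prints an HBase shell command; paste the output into `hbase shell` to run it.
table      = 'bulktable'   # table used throughout this post
compare_op = '='           # one of <, <=, =, !=, >, >=
value      = 'abc'         # hypothetical cell value to match
filter  = "ValueFilter(#{compare_op}, 'binary:#{value}')"
command = %(scan '#{table}', {FILTER => "#{filter}"})
puts command
# prints: scan 'bulktable', {FILTER => "ValueFilter(=, 'binary:abc')"}
```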

All of the filters above are basic filters in the HBase shell. Let’s look at a slightly more complex one.

SingleColumnValueFilter

This filter takes a column family, a qualifier, a compare operator and a comparator as arguments. If the specified column isn’t found, all the columns of that row are emitted. If the column is found and the comparison with the comparator returns true, all the columns of the row are emitted. If the condition fails, the row isn’t emitted.
This filter also takes two optional boolean arguments: filterIfColumnMissing and setLatestVersionOnly.
If the filterIfColumnMissing flag is set to true, the columns of the row won’t be emitted if the specified column to check isn’t found in the row. The default value is false.
If the setLatestVersionOnly flag is set to false, it checks previous versions (timestamps) too. The default value is true.
Both flags are optional, but you must set either both or neither.

Syntax

SingleColumnValueFilter('<family>', '<qualifier>', <compare operator>, '<comparator>', <filterIfColumnMissing_boolean>, <latest_version_boolean>)
SingleColumnValueFilter('<family>', '<qualifier>', <compare operator>, '<comparator>')
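
The sketch below builds the six-argument form. The family 'cf', qualifier 'name' and value 'abc' are hypothetical; with filterIfColumnMissing set to true, rows that lack the column are dropped instead of emitted.

```ruby
# Prints an HBase shell command; paste the output into `hbase shell` to run it.
table     = 'bulktable'   # table used throughout this post
family    = 'cf'          # hypothetical column family
qualifier = 'name'        # hypothetical qualifier
value     = 'abc'         # hypothetical value to compare against
filter_if_missing   = true   # drop rows that do not have the column
latest_version_only = true   # compare only the newest version of the cell
filter = "SingleColumnValueFilter('#{family}', '#{qualifier}', =, " \
         "'binary:#{value}', #{filter_if_missing}, #{latest_version_only})"
command = %(scan '#{table}', {FILTER => "#{filter}"})
puts command
# prints: scan 'bulktable', {FILTER => "SingleColumnValueFilter('cf', 'name', =, 'binary:abc', true, true)"}
```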

There are more filters than the ones covered here. You can see the full list of filters in HBase by running the show_filters command in the HBase shell.
We hope this post helped you understand the types of filters in HBase. Keep visiting www.acadgild.com for more updates on the courses.


prateek

An alumnus of the NIE-Institute Of Technology, Mysore, Prateek is an ardent Data Science enthusiast. He has been working at Acadgild as a Data Engineer for the past 3 years. He is a Subject-matter expert in the field of Big Data, Hadoop ecosystem, and Spark.

3 Comments

  1. I am facing an issue with comparing numbers in “SingleColumnValueFilter”. It doesn’t fetch values in the range specified.
    Following is the filter I am using.
    string startFilter = "SingleColumnValueFilter('cf','qualifier', >= ,'binary:" + Encoding.UTF8.GetString(HBaseGenericHelper.GetBigEndianByteArray(393)) + "',true,true)";
    string endFilter = "SingleColumnValueFilter('cf','qualifier', >= ,'binary:" + Encoding.UTF8.GetString(HBaseGenericHelper.GetBigEndianByteArray(395)) + "',true,true)";
    string finalFilter = startFilter + " AND " + endFilter;
    Can you please help?

  2. Hi PRATEEK,
    The article is really helpful.
    I have a question:
    Is there any way I can use multiple filters in a single command-line query on HBase? Something like:
    scan 'cricket',{FILTER=>"ColumnCountGetFilter(3)" and FILTER=>"ValueFilter(=,100)"}
