Getting Started with Apache Thrift

Apache Thrift is a software framework for implementing Remote Procedure Calls (RPC) in services, with cross-language support. In this blog post, we discuss how Apache Thrift works.
A Remote Procedure Call is a protocol that a program can use to request a service from a program located on another computer in a network, without having to understand the network details.
An RPC looks very much like a normal function call, except that the function lives remotely, on a different server, as part of a service that offers many such functions to its clients. The client therefore needs a way to know which functions the service exposes and what parameters they take.
This is where Apache Thrift comes in. It has its own “Interface Definition Language” (IDL), in which you define the functions and their parameters. You then use the Thrift compiler to generate the corresponding code in any supported language of your choice. This means you can implement a function in Java, host it on a server, and then call it remotely from Python or any other supported language.
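To make the idea concrete, here is a small plain-Java sketch of the RPC pattern that Thrift automates. It does not use the Thrift library: Calculator, CalculatorHandler, Request and RpcSketch are hypothetical names, the commented IDL is only an illustrative example, and the "network" is a plain function call. In real generated Java code, these roles are played by nested classes such as Iface, Client and Processor.

```java
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.function.Function;

// Entry point first so the single-file launcher (java RpcSketch.java) finds main.
final class RpcSketch {
    // Server side: find the named method on the handler and invoke it with the arguments.
    static Object dispatch(Calculator handler, Request req) {
        Object[] args = req.args == null ? new Object[0] : req.args;
        Class<?>[] types = new Class<?>[args.length];
        Arrays.fill(types, int.class); // simplification: every argument in this sketch is an int
        try {
            return Calculator.class.getMethod(req.method, types).invoke(handler, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot dispatch " + req.method, e);
        }
    }

    // Client side: a stub whose every method call becomes a Request handed to 'network'.
    static Calculator clientStub(Function<Request, Object> network) {
        return (Calculator) Proxy.newProxyInstance(
                Calculator.class.getClassLoader(),
                new Class<?>[] {Calculator.class},
                (proxy, method, args) -> network.apply(new Request(method.getName(), args)));
    }

    public static void main(String[] argv) {
        Calculator handler = new CalculatorHandler();
        // Here the "network" is a direct call; a real RPC stack would serialize the Request.
        Calculator client = clientStub(req -> dispatch(handler, req));
        System.out.println(client.add(2, 3)); // prints 5
    }
}

// Hypothetical IDL for this service:
//   service Calculator {
//     i32 add(1: i32 a, 2: i32 b)
//   }
// Stand-in for the Java interface a compiler would generate from it.
interface Calculator {
    int add(int a, int b);
}

// User-written implementation that runs on the server.
final class CalculatorHandler implements Calculator {
    @Override
    public int add(int a, int b) {
        return a + b;
    }
}

// A call as it would travel over the wire: method name plus arguments.
final class Request {
    final String method;
    final Object[] args;

    Request(String method, Object[] args) {
        this.method = method;
        this.args = args;
    }
}
```

The client code only ever sees the Calculator interface, so replacing the direct call with a real network hop would not change how add is called.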
We will look at what Thrift does, its architecture and its components. The Thrift networking stack has four layers, which from top to bottom are Server, Processor, Protocol and Transport.

Let’s now understand each component.

Transport

The Transport layer provides a simple abstraction for reading/writing from/to the network. This enables Thrift to decouple the underlying transport from the rest of the system (serialization/deserialization, for instance).

Here are some of the methods exposed by the Transport interface:

  • open
  • close
  • read
  • write
  • flush
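The operations listed above can be sketched as a small standalone Java interface with an in-memory implementation. This is not Thrift's own class (in the Java library the base class is TTransport); SimpleTransport and MemoryTransport are hypothetical names, and returning -1 when no data is available is an assumption of this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class TransportDemo {
    public static void main(String[] args) {
        MemoryTransport t = new MemoryTransport();
        t.open();
        byte[] msg = "hello".getBytes(StandardCharsets.UTF_8);
        t.write(msg, 0, msg.length);
        t.flush(); // only flushed bytes become readable
        byte[] buf = new byte[16];
        int n = t.read(buf, 0, buf.length);
        System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8)); // prints hello
        t.close();
    }
}

// The five operations listed above, as a minimal standalone interface.
interface SimpleTransport {
    void open();
    void close();
    int read(byte[] buf, int off, int len); // bytes read, or -1 if nothing is available
    void write(byte[] buf, int off, int len);
    void flush();
}

// Loopback transport backed by memory: bytes written and then flushed can be read back.
final class MemoryTransport implements SimpleTransport {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream(); // written, not yet flushed
    private byte[] readable = new byte[0];                                     // flushed, not yet read
    private int readPos = 0;
    private boolean isOpen = false;

    @Override public void open() { isOpen = true; }
    @Override public void close() { isOpen = false; }

    @Override
    public int read(byte[] buf, int off, int len) {
        ensureOpen();
        int available = readable.length - readPos;
        if (available == 0) return -1;
        int n = Math.min(len, available);
        System.arraycopy(readable, readPos, buf, off, n);
        readPos += n;
        return n;
    }

    @Override
    public void write(byte[] buf, int off, int len) {
        ensureOpen();
        pending.write(buf, off, len);
    }

    @Override
    public void flush() {
        ensureOpen();
        // Keep any unread bytes and append the newly flushed ones after them.
        byte[] unread = Arrays.copyOfRange(readable, readPos, readable.length);
        byte[] fresh = pending.toByteArray();
        readable = Arrays.copyOf(unread, unread.length + fresh.length);
        System.arraycopy(fresh, 0, readable, unread.length, fresh.length);
        readPos = 0;
        pending.reset();
    }

    private void ensureOpen() {
        if (!isOpen) throw new IllegalStateException("transport is not open");
    }
}
```

Because the rest of the stack talks only to the interface, the in-memory implementation could be swapped for a socket, file or HTTP one without touching the serialization code.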

Here are some of the transports available for the majority of Thrift-supported languages:

  • file: read/write to/from a file on disk
  • http: read/write over HTTP

In Java, for example, an HTTP client transport is created with the THttpClient class from org.apache.thrift.transport, passing it the URL of the service.
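An HTTP client transport works on a request/response basis: bytes written are collected locally and only sent when the transport is flushed, and the reply is then read back. The sketch below models that behaviour in plain Java. RequestResponseTransport is a hypothetical name, and the roundTrip function is an assumption standing in for the actual HTTP POST so that the example runs without a server.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

final class HttpTransportDemo {
    public static void main(String[] args) {
        // Hypothetical stand-in for the remote endpoint: replies "echo:" + request body.
        UnaryOperator<byte[]> fakeEndpoint = body ->
                ("echo:" + new String(body, StandardCharsets.UTF_8)).getBytes(StandardCharsets.UTF_8);

        RequestResponseTransport t = new RequestResponseTransport(fakeEndpoint);
        byte[] part1 = "pi".getBytes(StandardCharsets.UTF_8);
        byte[] part2 = "ng".getBytes(StandardCharsets.UTF_8);
        t.write(part1, 0, part1.length); // buffered locally, nothing sent yet
        t.write(part2, 0, part2.length);
        t.flush();                       // one round trip carrying "ping"

        byte[] buf = new byte[32];
        int n = t.read(buf, 0, buf.length);
        System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8)); // prints echo:ping
    }
}

// HTTP-style client transport sketch: write() buffers, flush() sends the buffer as a single
// request body, and read() consumes the response body returned by that request.
final class RequestResponseTransport {
    private final UnaryOperator<byte[]> roundTrip; // hypothetical: sends a request body, returns the response body
    private final ByteArrayOutputStream request = new ByteArrayOutputStream();
    private byte[] response = new byte[0];
    private int pos = 0;

    RequestResponseTransport(UnaryOperator<byte[]> roundTrip) {
        this.roundTrip = roundTrip;
    }

    void write(byte[] buf, int off, int len) {
        request.write(buf, off, len);
    }

    void flush() {
        response = roundTrip.apply(request.toByteArray());
        pos = 0;
        request.reset(); // the next call starts a new request
    }

    int read(byte[] buf, int off, int len) {
        int available = response.length - pos;
        if (available == 0) return -1;
        int n = Math.min(len, available);
        System.arraycopy(response, pos, buf, off, n);
        pos += n;
        return n;
    }
}
```

In a real client, roundTrip could wrap java.net.http.HttpClient to POST the buffered bytes. Buffering until flush() means that one whole RPC call travels as one HTTP request.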

Protocol

The Protocol abstraction defines a mechanism to map in-memory data structures to a wire format. In other words, a protocol specifies how data types use the underlying Transport to encode and decode themselves. The protocol implementation therefore governs the encoding scheme and is responsible for serialization and deserialization. Examples of protocols in this sense include JSON, XML, plain text and compact binary.
In Java, for example, a binary protocol is created by wrapping a transport: new TBinaryProtocol(transport).
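To show what a protocol adds on top of a transport, here is a plain-Java sketch of binary encoding for two value types: a 32-bit integer as four big-endian bytes, and a string as a 32-bit byte length followed by its UTF-8 bytes, which is the layout Thrift's binary protocol uses for i32 and string values. MiniBinaryWriter and MiniBinaryReader are hypothetical names, and they work on a byte array rather than a transport to keep the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinaryEncodingDemo {
    public static void main(String[] args) {
        MiniBinaryWriter w = new MiniBinaryWriter();
        w.writeI32(258);
        w.writeString("hi");
        byte[] wire = w.toByteArray();
        System.out.println(Arrays.toString(wire)); // prints [0, 0, 1, 2, 0, 0, 0, 2, 104, 105]

        MiniBinaryReader r = new MiniBinaryReader(wire);
        System.out.println(r.readI32() + " " + r.readString()); // prints 258 hi
    }
}

// Encoding side: fixed-width big-endian integers and length-prefixed UTF-8 strings.
final class MiniBinaryWriter {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    void writeI32(int v) {
        // Most significant byte first; write(int) keeps only the low 8 bits of each value.
        out.write(v >>> 24);
        out.write(v >>> 16);
        out.write(v >>> 8);
        out.write(v);
    }

    void writeString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeI32(utf8.length); // the prefix counts bytes, not characters
        out.write(utf8, 0, utf8.length);
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }
}

// Decoding side: ByteBuffer reads big-endian by default, matching the writer.
final class MiniBinaryReader {
    private final ByteBuffer in;

    MiniBinaryReader(byte[] wire) {
        this.in = ByteBuffer.wrap(wire);
    }

    int readI32() {
        return in.getInt();
    }

    String readString() {
        byte[] utf8 = new byte[readI32()];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

Reader and writer must agree on the exact layout, which is why client and server have to be configured with the same protocol.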

Processor

A Processor encapsulates the ability to read data from input streams and write to output streams. The input and output streams are represented by Protocol objects. Service-specific processor implementations are generated by the compiler. The Processor essentially reads data from the wire (using the input protocol), delegates processing to the handler (implemented by the user) and writes the response over the wire (using the output protocol).
In Java, for example, a generated processor can be exposed over HTTP inside a servlet container through the TServlet class, which is constructed from a processor and a protocol factory.
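The read, delegate, write cycle of a processor can be sketched in plain Java as follows. Everything here is hypothetical: CalculatorIface, AddHandler and CalculatorProcessor stand in for compiler-generated and user-written code, and WireIn/WireOut stand in for the input and output Protocol objects. For brevity, the request carries only the method name and arguments, omitting the message type and sequence id that Thrift's real message header includes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ProcessorDemo {
    public static void main(String[] args) {
        // Client side: encode a call to add(2, 3) as the method name followed by the arguments.
        WireOut request = new WireOut();
        request.writeString("add");
        request.writeI32(2);
        request.writeI32(3);

        // Server side: the processor reads the call, runs the handler and writes the reply.
        WireOut reply = new WireOut();
        new CalculatorProcessor(new AddHandler()).process(new WireIn(request.toByteArray()), reply);

        System.out.println(new WireIn(reply.toByteArray()).readI32()); // prints 5
    }
}

// Hypothetical service interface; with Thrift this would be generated from the IDL.
interface CalculatorIface {
    int add(int a, int b);
}

// User-written handler containing the business logic.
final class AddHandler implements CalculatorIface {
    @Override
    public int add(int a, int b) {
        return a + b;
    }
}

// Processor: read from the input protocol, delegate to the handler, write to the output protocol.
final class CalculatorProcessor {
    private final CalculatorIface handler;

    CalculatorProcessor(CalculatorIface handler) {
        this.handler = handler;
    }

    void process(WireIn in, WireOut out) {
        String method = in.readString();
        switch (method) {
            case "add": {
                int a = in.readI32();
                int b = in.readI32();
                out.writeI32(handler.add(a, b));
                break;
            }
            default:
                throw new IllegalArgumentException("unknown method: " + method);
        }
    }
}

// Tiny big-endian encoder and decoder standing in for the output and input protocols.
final class WireOut {
    private final ByteArrayOutputStream buf = new ByteArrayOutputStream();

    void writeI32(int v) {
        buf.write(v >>> 24);
        buf.write(v >>> 16);
        buf.write(v >>> 8);
        buf.write(v);
    }

    void writeString(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        writeI32(b.length);
        buf.write(b, 0, b.length);
    }

    byte[] toByteArray() {
        return buf.toByteArray();
    }
}

final class WireIn {
    private final ByteBuffer buf;

    WireIn(byte[] bytes) {
        buf = ByteBuffer.wrap(bytes);
    }

    int readI32() {
        return buf.getInt();
    }

    String readString() {
        byte[] b = new byte[readI32()];
        buf.get(b);
        return new String(b, StandardCharsets.UTF_8);
    }
}
```

The handler never touches bytes; keeping wire handling in the processor is what lets the same handler be served over any protocol and transport.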

Server

A Server pulls together all of the features described above.

  • Create a transport
  • Create input/output protocols for the transport
  • Create a processor based on the input/output protocols
  • Wait for incoming connections and hand them off to the processor
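The four steps above can be sketched as a minimal single-threaded server built on java.net sockets. This is not Thrift's server code (the Java library ships ready-made servers such as TSimpleServer and TThreadPoolServer); MiniServer and its single add method are hypothetical, and DataInputStream/DataOutputStream play the role of a simple binary protocol.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class MiniServerDemo {
    public static void main(String[] args) {
        try (MiniServer server = new MiniServer()) {
            Thread worker = new Thread(server::serveForever);
            worker.setDaemon(true);
            worker.start();
            System.out.println(MiniServer.callAdd(server.port(), 2, 3)); // prints 5
        }
    }
}

// Single-threaded server for one hypothetical method, add(i32, i32) -> i32.
final class MiniServer implements AutoCloseable {
    private final ServerSocket listener;

    MiniServer() {
        try {
            // Step 1: create the server transport (a loopback socket on a free port).
            listener = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int port() {
        return listener.getLocalPort();
    }

    // Step 4: wait for connections and hand each one to the processor, one at a time.
    void serveForever() {
        while (!listener.isClosed()) {
            try (Socket conn = listener.accept()) {
                // Step 2: input/output "protocols" over the connection (big-endian ints).
                DataInputStream in = new DataInputStream(conn.getInputStream());
                DataOutputStream out = new DataOutputStream(conn.getOutputStream());
                // Step 3: the processor reads the call, runs the handler logic, writes the reply.
                process(in, out);
                out.flush();
            } catch (IOException e) {
                // close() makes accept() throw; any other error just drops this connection.
            }
        }
    }

    private static void process(DataInputStream in, DataOutputStream out) throws IOException {
        int a = in.readInt();
        int b = in.readInt();
        out.writeInt(a + b);
    }

    // Client helper: connect, send the two arguments, read the result.
    static int callAdd(int port, int a, int b) {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            DataOutputStream out = new DataOutputStream(s.getOutputStream());
            out.writeInt(a);
            out.writeInt(b);
            out.flush();
            return new DataInputStream(s.getInputStream()).readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            listener.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because it handles one connection at a time, this mirrors the simplest server style; production servers usually hand connections to a thread pool or an event loop instead.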

Now that you know the basics of a Thrift server, you can build on them to install Thrift on CentOS and to write Python code that accesses HBase. Keep visiting our website Acadgild for more updates on Big Data and other technologies.
