In any Hadoop interview, knowledge of Sqoop and Kafka comes in handy because both play an important part in data ingestion. Sqoop is widely used to move data from an existing RDBMS into Hadoop and vice versa, while Kafka is a distributed messaging system that supports a publish-subscribe model for data ingestion, including streaming.
Let’s look at some of the most important interview questions on Sqoop and Kafka. Before that, see our Hadoop Interview Questions on Advanced MapReduce.
Hadoop Interview Questions Based on Sqoop and Kafka
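What will happen if the target directory already exists during a Sqoop import?
Ans: Sqoop runs a map-only job, and if the target directory already exists in HDFS, the job fails with an exception instead of overwriting it. To reuse the location, either remove the directory first (for example with the --delete-target-dir option) or add to the existing data with --append.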
What is the use of the warehouse directory in a Sqoop import?
Ans: The warehouse directory (--warehouse-dir) is the HDFS parent directory for table destinations. If we specify a target directory (--target-dir), all the imported files are stored directly in that location. With a warehouse directory, Sqoop creates a child directory inside it with the name of the table and stores all the files there. The two options cannot be used together.
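The difference between the two options can be sketched as a small path calculation. This is only an illustration of the rule described above, not Sqoop code; the function name import_destination and the example paths are hypothetical.

```python
import posixpath


def import_destination(table, target_dir=None, warehouse_dir=None):
    """Return the HDFS directory an import of `table` would write to.

    Hypothetical helper that mirrors the rule in the answer above:
    --target-dir is used as-is, --warehouse-dir gets a child
    directory named after the table.
    """
    if target_dir and warehouse_dir:
        # Sqoop treats the two options as incompatible.
        raise ValueError("use either target_dir or warehouse_dir, not both")
    if target_dir:
        return target_dir
    if warehouse_dir:
        return posixpath.join(warehouse_dir, table)
    raise ValueError("this sketch needs one of target_dir or warehouse_dir")


print(import_destination("orders", target_dir="/data/orders_dump"))
print(import_destination("orders", warehouse_dir="/data/warehouse"))
```

With a warehouse directory, several tables can share one parent location without their files mixing.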
What is the default number of mappers in a Sqoop job?
Ans: 4. The number can be changed with the -m (or --num-mappers) option.
How to bring data directly into Hive using Sqoop?
Ans: To bring data directly into Hive, add the --hive-import option to the Sqoop import command.
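We wish to bring data in CSV format into HDFS from an RDBMS source. A column in the RDBMS table contains ‘,’. How can the data be imported so that the fields stay distinct?
Ans: You can use the --optionally-enclosed-by option. Sqoop then wraps any field that contains the delimiter in the given enclosing character (for example a double quote), so the embedded comma is not read as a field separator.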
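The effect of optional enclosing can be illustrated with Python's standard csv module, which applies a similar rule in its QUOTE_MINIMAL mode: only fields containing the delimiter are enclosed. This is a sketch of the idea, not of Sqoop's own writer (Sqoop handles embedded enclosing characters with --escaped-by, whereas the csv module doubles them); the helper names and sample rows are made up.

```python
import csv
import io


def to_delimited(rows, enclosed_by='"'):
    """Write rows as comma-separated text, enclosing only fields that need it."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=",",
        quotechar=enclosed_by,
        quoting=csv.QUOTE_MINIMAL,  # enclose only when the field contains ',' etc.
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buf.getvalue()


def from_delimited(text, enclosed_by='"'):
    """Parse the text back into rows, honouring the enclosing character."""
    reader = csv.reader(io.StringIO(text), delimiter=",", quotechar=enclosed_by)
    return list(reader)


rows = [[1, "Smith, John", "NY"], [2, "Jane", "CA"]]
text = to_delimited(rows)
print(text)
print(from_delimited(text))
```

Only the field with the embedded comma is enclosed; the others are written unchanged, and reading the text back yields three fields per row.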
How to import data directly into HBase using Sqoop?
Ans: Use the --hbase-table option, together with --column-family to name the column family that will hold the imported columns. Sqoop imports the data into the table given as the argument to --hbase-table. Each row of the input table is transformed into an HBase put operation on a row of the output table.
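The row-to-put transformation can be pictured roughly as follows. This is a simplified sketch, assuming the row key comes from one input column, every other column goes into a single column family, and values are stored as the UTF-8 bytes of their string form; the function row_to_put and the sample row are hypothetical, and no HBase client is involved.

```python
def row_to_put(row, row_key_column, column_family):
    """Turn one relational row (a dict) into a (row_key, cells) pair.

    Simplified picture of an HBase put: the chosen column becomes the
    row key, every other column becomes a 'family:qualifier' cell.
    """
    row_key = str(row[row_key_column]).encode("utf-8")
    cells = {
        f"{column_family}:{name}": str(value).encode("utf-8")
        for name, value in row.items()
        if name != row_key_column  # assumption: the key column is not repeated as a cell
    }
    return row_key, cells


key, cells = row_to_put({"id": 7, "name": "Ann", "city": "Pune"}, "id", "info")
print(key, cells)
```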
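What is incremental load in Sqoop?
Ans: An incremental load imports only the rows that are new or changed since the previous import, instead of the whole table. Sqoop supports this through the --incremental option (append or lastmodified mode) together with --check-column and --last-value.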
What is the benefit of using a Sqoop job?
Ans: When you must perform an incremental import repeatedly, you can create a saved Sqoop job for it and simply run that job each time. On every run, the job automatically picks up the last imported value and starts the import from the rows after that value.
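The bookkeeping that a saved job performs can be sketched in a few lines. This is a toy model of append-style incremental import, assuming a numeric check column that only grows; the class SavedJob and the in-memory "table" are illustrative stand-ins, not part of Sqoop.

```python
def incremental_append(source_rows, check_column, last_value):
    """Return rows whose check column exceeds last_value, plus the new last value."""
    new_rows = [r for r in source_rows if r[check_column] > last_value]
    new_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_last


class SavedJob:
    """Toy stand-in for a saved job that remembers the last imported value."""

    def __init__(self, check_column, last_value=0):
        self.check_column = check_column
        self.last_value = last_value

    def run(self, source_rows):
        rows, self.last_value = incremental_append(
            source_rows, self.check_column, self.last_value
        )
        return rows


table = [{"id": 1}, {"id": 2}]
job = SavedJob("id")
print(job.run(table))   # first run imports everything present so far
table.append({"id": 3})
print(job.run(table))   # second run imports only the newly added row
```

Because the job carries the last value forward, the caller never has to pass it by hand between runs.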
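Where does a Sqoop job store the last imported value?
Ans: In the Sqoop metastore, along with the rest of the saved job's definition. By default this is a private metastore kept under $HOME/.sqoop.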
What is Kafka?
Ans: It is a distributed, partitioned and replicated publish-subscribe messaging framework.
How is Apache Kafka different from Apache Flume?
Ans: Kafka is a publish-subscribe messaging system, whereas Flume is a system for data collection, aggregation, and movement.
What are the important elements of Kafka?
Ans: Kafka Producer, Consumer, Broker, and Topic.
What role does ZooKeeper play in a Kafka cluster?
Ans: ZooKeeper is the coordination service for the Kafka cluster: the brokers rely on it to keep track of which brokers are alive and to store cluster metadata such as topic configuration.
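How can a consumer control the offsets it has consumed?
Ans: Through automatic commit or manual commit. With automatic commit (enable.auto.commit set to true), the consumer commits offsets periodically in the background. With manual commit, the application disables auto-commit and commits offsets itself once the records have been processed, for example with commitSync() or commitAsync().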
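The practical difference between the two modes shows up after a restart, when the consumer resumes from the last committed offset. The following is a toy in-memory model, not the Kafka client API: the class ToyConsumer, its methods, and the list used as a log are all hypothetical, and real auto-commit is time-based rather than tied to every poll.

```python
class ToyConsumer:
    """In-memory model of a consumer reading one partition."""

    def __init__(self, log, auto_commit=True):
        self.log = log
        self.auto_commit = auto_commit
        self.position = 0    # next offset to read
        self.committed = 0   # offset to resume from after a restart

    def poll(self, max_records=2):
        if self.auto_commit:
            # Auto mode: records handed out earlier are committed
            # without any action from the application.
            self.committed = self.position
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def commit(self):
        """Manual mode: the application decides when progress is durable."""
        self.committed = self.position

    def restart(self):
        """Simulate a crash and restart: resume from the committed offset."""
        self.position = self.committed


log = ["a", "b", "c", "d"]
consumer = ToyConsumer(log, auto_commit=False)
print(consumer.poll())   # first batch
consumer.restart()       # crash before commit
print(consumer.poll())   # same batch is delivered again
consumer.commit()
consumer.restart()
print(consumer.poll())   # resumes after the committed batch
```

Committing only after processing means a crash leads to reprocessing rather than data loss, which is why manual commit is often chosen when every record must be handled at least once.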
We hope the above questions help you answer the Hadoop interview questions asked at various companies. For more details, enroll in the Big Data and Hadoop training conducted by Acadgild.