
Introduction to Solr Indexing

Apache Solr lets you build search engines for websites, databases, and files. Solr indexing works like the index at the back of a book: instead of reading every word on every page, you look up a keyword and go straight to the pages where it appears.
In other words, you maintain a list of the words found in the book, together with the page numbers on which each word occurs.
This is exactly what Solr does. It maintains a list that maps words, terms, and phrases to the places where they occur.
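To make the book analogy concrete, here is a minimal Python sketch of such a word-to-page mapping (an inverted index). It is only an illustration: the three-page "book" is made up, and Solr, which is built on Apache Lucene, does far more than this (text analysis, ranking, storage, and so on).

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each lowercase term to the sorted list of page numbers containing it."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_number)
    return {term: sorted(numbers) for term, numbers in index.items()}


# Hypothetical three-page "book" used only for illustration.
pages = {
    1: "solr builds an index",
    2: "the index maps terms to pages",
    3: "search the index instead of every page",
}

index = build_inverted_index(pages)
print(index["index"])   # pages that mention "index"
print(index["search"])  # pages that mention "search"
```

Looking up a term is now a single dictionary access instead of a scan over every page, which is the whole point of indexing.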
Solr is extremely reliable, scalable, and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation options of many of the world’s largest websites.
The section below describes the process of Solr indexing, that is, adding content from files on our local system to a Solr index. Adding content to an index makes it searchable by Solr.
If Solr is not yet installed on your system, see the Installing Solr guide first.
Today we will cover two different methods of Solr indexing.
We have created a sample file, books.xml, with data stored in XML format, as shown below.

<add>
  <doc>
    <field name="id">7777777</field>
    <field name="name">testing the solr</field>
    <field name="price">0.99</field>
    <field name="inStock">false</field>
    <field name="author">prateek</field>
    <field name="genre_s"/>
  </doc>
  <doc>
    <field name="id">88888888</field>
    <field name="name">AcadGild</field>
    <field name="price">1099.99</field>
    <field name="inStock">true</field>
    <field name="author">Vinodh Dham</field>
    <field name="genre_s">scifi</field>
  </doc>
  <doc>
    <field name="id">553573403</field>
    <field name="name">A Game of Thrones</field>
    <field name="price">7.99</field>
    <field name="inStock">true</field>
    <field name="author">George R.R. Martin</field>
    <field name="genre_s">fantasy</field>
  </doc>
</add>

You can copy the above XML to your own system for practice.
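If you would rather generate such a file than type it by hand, the following Python sketch builds the same kind of <add> document using only the standard library. The helper name build_add_xml is made up for this example, and only one of the sample records is shown.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Return a Solr <add> XML string for a list of {field_name: value} dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


books = [
    {"id": "553573403", "name": "A Game of Thrones", "price": "7.99",
     "inStock": "true", "author": "George R.R. Martin", "genre_s": "fantasy"},
]

xml_text = build_add_xml(books)
print(xml_text)
```

The resulting string can be written to a file such as books.xml and indexed like any other XML file.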
Next, we will go through the steps to index a file in Solr, and then run a query to confirm that the indexing worked.

Method 1: bin/post

Solr offers more than one way to index a file. Using bin/post is the core method here, because it is a thin wrapper around the tool that does the actual posting.
Here, the file is stored in solr-6.4.0/bin, and the terminal's working directory is solr-6.4.0.
Command: bin/post -c gettingstarted /usr/local/solr-6.4.0/bin/books.xml
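Under the hood, posting a file comes down to an HTTP POST to the collection's update handler. The Python sketch below builds such a request without sending it. It assumes Solr is listening on the default localhost:8983 and that the collection is named gettingstarted; the helper name build_update_request is made up for this example.

```python
import urllib.request

# Assumed defaults: Solr listening on localhost:8983 and a collection
# named "gettingstarted", as in the command above.
SOLR_BASE = "http://localhost:8983/solr"


def build_update_request(collection, xml_bytes, commit=True):
    """Build (but do not send) an HTTP POST that submits <add> XML to Solr."""
    url = f"{SOLR_BASE}/{collection}/update"
    if commit:
        url += "?commit=true"
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


payload = b'<add><doc><field name="id">7777777</field></doc></add>'
request = build_update_request("gettingstarted", payload)
print(request.get_method(), request.full_url)

# To actually send it, a running Solr instance is required:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The commit=true parameter asks Solr to make the new documents visible to searches right away, which is convenient for small experiments like this one.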

Solr indexing is now complete. See the screenshot below for the values indexed inside Solr. The output is shown in CSV format, as specified in the query form below.
Querying a Solr index is a topic that deserves a detailed discussion of its own. For now, we will run a simple query to confirm our indexing.

Querying Inside Solr to Check If the File Was Indexed

In the Solr UI, select the gettingstarted collection as shown below, and then click Query.

A form with many fields will appear. Do not worry about them; just check that the q field contains *:* and click Execute Query. Refer to the following screenshots:
Here, q is the query (*:* matches all documents) and wt is the response format (here, csv).
The results appear on the right-hand side of the same page, as shown in the screenshot below.
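The same check can be scripted. The Python sketch below builds the query URL with the q and wt parameters described above and pulls the id column out of a CSV response. It assumes the default localhost:8983 address; the helper names and the shortened sample response are made up for illustration.

```python
import csv
import io
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed default host and port


def build_select_url(collection, query="*:*", writer="csv"):
    """Build the URL for Solr's /select handler with q and wt parameters."""
    params = urlencode({"q": query, "wt": writer})
    return f"{SOLR_BASE}/{collection}/select?{params}"


def ids_from_csv(csv_text):
    """Extract the id column from a CSV response body."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["id"] for row in reader]


url = build_select_url("gettingstarted")
print(url)

# Hypothetical response body, shortened for illustration.
sample = "id,name\n7777777,testing the solr\n88888888,AcadGild\n"
print(ids_from_csv(sample))
```

If the ids of the documents you posted show up in the response, the indexing worked.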

This is how to index a single file in Solr. The query form has many other fields, and we will discuss each of them in detail in a separate blog post titled “Understanding the Query System in Solr.”

Indexing All the XML Files in a Directory Using Method 1 (bin/post)

Run the following command in the terminal to index the files into the running Solr instance.
Command: bin/post -c gettingstarted example/exampledocs/*.xml
*Note: The command is run from inside the Solr directory (solr-6.4.0).
Only the XML files inside solr-6.4.0/example/exampledocs will be indexed.

This indexes all the example files in XML format that ship in example/exampledocs/.
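Note that it is the shell, not Solr, that expands *.xml into a list of file names before the command runs. The Python sketch below shows which files such a pattern selects. The directory contents are made up and created in a temporary folder purely for illustration.

```python
import tempfile
from pathlib import Path


def xml_files_in(directory):
    """Return the sorted names of files in `directory` matching *.xml."""
    return sorted(path.name for path in Path(directory).glob("*.xml"))


# Hypothetical directory contents, created in a temporary folder for illustration.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("books.xml", "books.csv", "books.json", "monitor.xml"):
        (Path(tmp) / name).write_text("")
    selected = xml_files_in(tmp)

print(selected)
```

Files in other formats, such as books.csv and books.json here, are skipped by the pattern.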
Now, return to the browser where the Solr UI is running.

Querying Inside Solr to Check If the File Was Indexed

Your Solr index now contains the XML documents and is ready to query.
Querying a Solr index is an important topic in its own right. For now, the next question is: how do we index files in other formats?
In the same way, we can index many kinds of files. A few commands for testing different formats are listed below. The same test file has been saved in each format.

Example Commands

solr-6.4.0$ bin/post -c gettingstarted /usr/local/solr-6.4.0/bin/books.csv
solr-6.4.0$ bin/post -c gettingstarted /usr/local/solr-6.4.0/bin/books.json
solr-6.4.0$ bin/post -c gettingstarted /usr/local/solr-6.4.0/bin/books.pdf
*Note: All files are present inside solr-6.4.0/bin
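Different formats are handled differently on the Solr side. XML, CSV, and JSON can be parsed directly by the update handler, while rich documents such as PDFs need the extracting handler (Solr Cell). The Python sketch below is a simplified approximation of that routing decision, not the post tool's actual logic; the real tool may differ in detail (for example, in how it handles JSON), and the helper name choose_handler is made up.

```python
from pathlib import Path

# Content types for formats that Solr's /update handler can parse directly.
STRUCTURED_TYPES = {
    ".xml": "application/xml",
    ".csv": "text/csv",
    ".json": "application/json",
}


def choose_handler(filename):
    """Return (handler_path, content_type) for a file, based on its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in STRUCTURED_TYPES:
        return "/update", STRUCTURED_TYPES[suffix]
    # Assumption: anything else is a rich document (PDF, Word, ...) that must
    # go through the extracting handler, which needs Solr Cell to be enabled.
    return "/update/extract", "application/octet-stream"


for name in ("books.xml", "books.csv", "books.json", "books.pdf"):
    print(name, choose_handler(name))
```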

Method 2: SimplePostTool

The bin/post script currently delegates to a standalone Java program called SimplePostTool. This tool is bundled into an executable JAR and can be run directly using java -jar example/exampledocs/post.jar. See the help output and take it from there to post files, recurse a website or file system folder, or send direct commands to a Solr server.
The XML file we are indexing is named books.xml and is stored in solr-6.4.0/example/exampledocs/.
Note: This location is different from that of our previous sample file, although the sample data is the same.

Our JAR file, post.jar, is in the same location. Refer to the screenshot below.

java -Dc=gettingstarted -jar post.jar books.xml
*Note: The command is run from inside the directory solr-6.4.0/example/exampledocs/.
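If you want to call the JAR from a script, the command can be assembled as an argument list. The Python sketch below only builds and prints the command; the helper name post_jar_command is made up, and running it for real requires Java and a running Solr instance.

```python
import shlex


def post_jar_command(collection, files, jar="post.jar"):
    """Build the argument list for running SimplePostTool from its JAR."""
    return ["java", f"-Dc={collection}", "-jar", jar, *files]


command = post_jar_command("gettingstarted", ["books.xml"])
print(shlex.join(command))

# To run it (requires Java and a running Solr instance), from the directory
# that contains post.jar:
# import subprocess
# subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids quoting problems when file names contain spaces.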

As seen in the screenshot above, the file is indexed successfully. This is another way to index your files with Apache Solr.

