Skip to main content

Getting Started with Apache Lucene

Apache Lucene is a high-performance, full-featured text search engine library.

The API of lucene is very simple to use. Here is overview of some of the objects required to start using Apache Lucene.

Document and Fields

The class "org.apache.lucene.document.Document" is necessary container for the index. Lucene requires all indexed objects to provide an instance of Document. Each document defines one or several fields ( (org.apache.lucene.document.Field). Fields contain classified information about the document or metadata related to document. A sample classification is for example the creation date of a file, author of the document etc. These fields allow you to search later for a specific information in this classification.


IndexWriter

The class org.apache.lucene.index.IndexWriter creates the index. Via the method addDocument you can add an existing Document to the index. The constructor for IndexWriter expects the directory to store the index, and the analyzer for the content of the files. In addition a boolean flag is handed over which indicates if the index should be created new or if an existing index should be extended.

Analyser

The class org.apache.lucene.analysis.standard.StandardAnalyzer provides a standard analyzer. This part is responsible to analyse the text and to filter out certain fill-words, e.g. "and".

Searcher and Query

The class org.apache.lucene.search.Searcher provides the search functionality. org.apache.lucene.search.IndexSearcher searches over an index. What is to be searched is provided via the query (org.apache.lucene.search.Query) class. The search allows wildcard search, e.g. *, ?, logical operations (AND, OR, NOT) and much more, e.g. fussy.

Here's a simple example how to use Lucene for indexing and searching.

public class TestMyLucene
{
public static void main(String[] args) throws Exception
{
Analyzer analyzer = new StandardAnalyzer();

// Store the index in memory:
Directory directory = new RAMDirectory();

// To store an index on disk, use this instead (note that the
// parameter true will overwrite the index in that directory
// if one exists):
//Directory directory = FSDirectory.getDirectory("/tmp/testindex", true);
IndexWriter iwriter = new IndexWriter(directory, analyzer, true);
iwriter.setMaxFieldLength(25000);

Document doc = new Document();
String text = "This is the text to be indexed.";
doc.add(new Field("fieldname", text, Field.Store.YES, Field.Index.TOKENIZED));
iwriter.addDocument(doc);
iwriter.close();

// Now search the index:
IndexSearcher isearcher = new IndexSearcher(directory);

// Parse a simple query that searches for "text":
Query query = QueryParser.parse("text", "fieldname", analyzer);
Hits hits = isearcher.search(query);
System.out.println(hits.length());

// Iterate through the results:
for (int i = 0; i < hits.length(); i++)
{
Document hitDoc = hits.doc(i);
System.out.println(hitDoc.get("fieldname"));
}

isearcher.close();
directory.close();
}
}

In the above example, hits return the matching documents for the query provided.

Comments

Popular posts from this blog

Drools - An overview

For Java based applications the most challenging part has always been the business logic maintenance, and pick any applications which you find complex and if we ask ourself how complex it would be moving forward, the answer will always be nX times.

What do we do ? Drools comes for Rescue as a Rule Engine.

Drools provides mechanism:

a. To write business logic in simple english language
b. Easy to maintain and very simple to extend
c. Reusability of logic by defining keywords in a DSL file and using them in DSLR file.

But be careful nothing comes free, everything takes cost in terms of memory and time space.

Use Drools if you really have :

a. Business logic which you think is getting cluttered with multiple if conditions because of variety of scenarios
b. You will have growing demand of increase in the complexity
c. The business logic changes would be frequent (1 - 2 times a year would also be frequent)
d. Your server's have enough of memory as it is a memory hungary tool, it provi…

Java 8 streams performance on mathematical calculations

Java 8 Streams API supports many parallel operations to process the data, it abstracts low level multithreading logic. To test performance did following simple test to calculate factorial of first X number starting with N.

Following program calculates factorial of first 1000 numbers.

package com.java.examples; import java.math.BigInteger; import java.time.Duration; import java.time.Instant; import java.util.ArrayList; import java.util.List; publicclassMathCalculation { publicstaticvoid main(String[] args) { List<Runnable> runnables = generateRunnables(1, 1000); Instant start= Instant.now(); // Comment one of the lines below to test parallel or sequential streams runnables.parallelStream().forEach(r -> r.run()); // runnables.stream().forEach(r -> r.run()); Instant end = Instant.now(); System.out.println("Calculated in "+ Duration.between(start, end)); } privatestaticvoid factorial(int number) { int i; BigInteger fact = BigInteger.valueOf(1); for (i =1; i <= numb…

MQTT : Android step by step guide using Eclipse Paho

For MQTT integration, recently explored Paho Android project, very simple to use, here are the steps:

Intialize a client, set required options and connect.

    MqttAndroidClient mqttClient = new MqttAndroidClient(BaseApplication.getAppContext(), broker, MQTT_CLIENT_ID);
    //Set call back class
    mqttClient.setCallback(new MqttCallbackHandler(BaseApplication.getAppContext()));
    MqttConnectOptions connOpts = new MqttConnectOptions();
    IMqttToken token = mqttClient.connect(connOpts);


Subscribe to a topic.

    token.setActionCallback(new IMqttActionListener() {
      @Override
      public void onSuccess(IMqttToken arg0) {
           mqttClient.subscribe("TOPIC_NAME" + userId, 2, null, new IMqttActionListener() {
                @Override
                public void onSuccess(IMqttToken asyncActionToken) {
                    Log.d(LOG_TAG, "Successfully subscribed to topic.");
                }

                @Override
                public void onFailure…