
Getting Started with Apache Lucene

Apache Lucene is a high-performance, full-featured text search engine library.

The Lucene API is simple to use. Here is an overview of the main objects you need to start using Apache Lucene.

Document and Fields

The class org.apache.lucene.document.Document is the container for everything that goes into the index. Every object you want to index must be represented as a Document. Each document defines one or more fields (org.apache.lucene.document.Field). Fields hold classified information about the document or metadata related to it, for example the creation date of a file or the author of the document. These fields let you search later for specific information within that classification.
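To make the idea of fields concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: the class ToyDocuments and its methods doc and findByField are hypothetical names invented for this illustration, and a document is modelled as nothing more than a map from field name to value.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ToyDocuments {
    // Hypothetical stand-in for a document: field name -> field value.
    public static Map<String, String> doc(String... nameValuePairs) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameValuePairs.length; i += 2) {
            fields.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return fields;
    }

    // Return the documents whose given field equals the given value.
    public static List<Map<String, String>> findByField(
            List<Map<String, String>> docs, String field, String value) {
        return docs.stream()
                .filter(d -> value.equals(d.get(field)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                doc("author", "alice", "created", "2008-01-15"),
                doc("author", "bob", "created", "2008-02-01"));
        // Search only within the "author" classification.
        System.out.println(findByField(docs, "author", "bob").size()); // prints 1
    }
}
```

Lucene's Field carries more than a name and a value (for instance whether the value is stored and whether it is tokenized), but the lookup-by-classification idea is the same.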


IndexWriter

The class org.apache.lucene.index.IndexWriter creates the index. Its addDocument method adds an existing Document to the index. The constructor for IndexWriter expects the directory in which to store the index and the analyzer for the content of the documents. A boolean flag is also passed in, indicating whether a new index should be created or an existing index should be extended.
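The roles of the three constructor arguments can be illustrated with a toy inverted index in plain Java. This is only a sketch of the concept, not how Lucene stores its index: ToyIndexWriter is a hypothetical class, a Map stands in for the directory, and the analysis step is reduced to lower-casing and splitting on non-word characters.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class ToyIndexWriter {
    // term -> ids of the documents that contain it
    private final Map<String, SortedSet<Integer>> postings;
    private int nextDocId;

    // 'store' plays the role of the directory; 'create' mirrors the boolean flag:
    // true starts a fresh index, false extends whatever is already in the store.
    public ToyIndexWriter(Map<String, SortedSet<Integer>> store, boolean create) {
        if (create) {
            store.clear();
        }
        int maxId = -1;
        for (SortedSet<Integer> ids : store.values()) {
            if (!ids.isEmpty()) {
                maxId = Math.max(maxId, ids.last());
            }
        }
        this.postings = store;
        this.nextDocId = maxId + 1;
    }

    // Split the text into terms and record this document under each term.
    public int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    public static void main(String[] args) {
        Map<String, SortedSet<Integer>> store = new HashMap<>();
        new ToyIndexWriter(store, true).addDocument("This is the text to be indexed.");
        // Reopen with create=false: existing entries are kept and extended.
        new ToyIndexWriter(store, false).addDocument("More text.");
        System.out.println(store.get("text")); // prints [0, 1]
    }
}
```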

Analyzer

The class org.apache.lucene.analysis.standard.StandardAnalyzer provides a standard analyzer. It is responsible for analyzing the text, that is, splitting it into tokens, and for filtering out common filler words (stop words), e.g. "and".
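What an analyzer does can be sketched in a few lines of plain Java. The class ToyAnalyzer below is hypothetical and is not Lucene's StandardAnalyzer; its stop-word list is made up for the example and much shorter than a real one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ToyAnalyzer {
    // A tiny, made-up stop-word list used only for this illustration.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "be", "is", "the", "to");

    // Lower-case the text, split it on anything that is not a letter or digit,
    // and drop the stop words.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("This is the text to be indexed."));
        // prints [this, text, indexed]
    }
}
```

The same analysis has to be applied to the query text as to the indexed text, which is why the analyzer is passed both to the IndexWriter and to the query parser in the example further down.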

Searcher and Query

The class org.apache.lucene.search.Searcher provides the search functionality; its subclass org.apache.lucene.search.IndexSearcher searches over an index. What to search for is expressed as a query (org.apache.lucene.search.Query). Queries support wildcards (* and ?), logical operators (AND, OR, NOT), and much more, e.g. fuzzy search.
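As a rough illustration of the wildcard semantics, the plain-Java sketch below matches a pattern against a list of indexed terms, where * stands for any run of characters and ? for exactly one character. ToyWildcard and its methods are hypothetical names; Lucene's own wildcard queries are implemented differently, and the regular-expression translation here is just a convenient way to show the matching rules.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ToyWildcard {
    // Translate a wildcard pattern into a regular expression:
    // '*' matches any run of characters, '?' matches exactly one character.
    public static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Keep only the terms that match the whole wildcard pattern.
    public static List<String> matchTerms(List<String> terms, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        return terms.stream()
                .filter(t -> pattern.matcher(t).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> terms = List.of("test", "text", "texts", "tent");
        System.out.println(matchTerms(terms, "te?t")); // prints [test, text, tent]
        System.out.println(matchTerms(terms, "tex*")); // prints [text, texts]
    }
}
```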

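Here is a simple example of how to use Lucene for indexing and searching. It is written against an early Lucene API (for example the Hits class and Field.Index.TOKENIZED), which later releases replaced.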

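import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class TestMyLucene
{
    public static void main(String[] args) throws Exception
    {
        Analyzer analyzer = new StandardAnalyzer();

        // Store the index in memory:
        Directory directory = new RAMDirectory();

        // To store an index on disk, use this instead (note that the
        // parameter true will overwrite the index in that directory
        // if one exists; this also needs org.apache.lucene.store.FSDirectory):
        //Directory directory = FSDirectory.getDirectory("/tmp/testindex", true);

        // The flag true creates a new index instead of extending an existing one.
        IndexWriter iwriter = new IndexWriter(directory, analyzer, true);
        iwriter.setMaxFieldLength(25000);

        // Build a document with a single stored, tokenized field and index it.
        Document doc = new Document();
        String text = "This is the text to be indexed.";
        doc.add(new Field("fieldname", text, Field.Store.YES, Field.Index.TOKENIZED));
        iwriter.addDocument(doc);
        iwriter.close();

        // Now search the index:
        IndexSearcher isearcher = new IndexSearcher(directory);

        // Parse a simple query that searches for "text" in the field "fieldname".
        // A QueryParser instance is used here because the static
        // QueryParser.parse(query, field, analyzer) is deprecated.
        QueryParser parser = new QueryParser("fieldname", analyzer);
        Query query = parser.parse("text");
        Hits hits = isearcher.search(query);
        System.out.println(hits.length());

        // Iterate through the results:
        for (int i = 0; i < hits.length(); i++)
        {
            Document hitDoc = hits.doc(i);
            System.out.println(hitDoc.get("fieldname"));
        }

        isearcher.close();
        directory.close();
    }
}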

In the above example, hits holds the documents that match the given query.
