Friday, July 30, 2010

Java : Simple Lucene 3.0 example

Apache Lucene is a high-performance, full-featured text search engine library. Here's a simple example how to use Lucene for indexing and searching

To run this example you need to download lucene-3.0.2.zip from http://www.apache.org/dyn/closer.cgi/lucene/java

If you need more information about Lucene go to http://lucene.apache.org/java/docs/index.html

To use Lucene, an application should:

1. Create Documents by adding Fields;
2. Create an IndexWriter and add documents to it with addDocument();
3. Call QueryParser.parse() to build a query from a string; and
4. Create an IndexSearcher and pass the query to its search() method.


Lets create a directory called "AllFiles" that contains text files that we are going to index. We have a directory called "LuceneIndexDirectory". This will hold the index that lucene creates.

Now Lets create few files in "AllFiles" folder which will contain few key words which we will search. Here are the files below.

Look at sample folder structure



Java.txt

String
Object
ArrayList
Hashtable
Integer
Random

SQL.txt

Select
Group by
Where
From
random

Javascript.txt

object
Var
function
random

Now lets look at a simple example SimpleLucenExaple.java

First we will create index of all files in our "LuceneIndexDirectory" folder using createIndex(); method, then we will try to search few key words in our files using searchIndex(""); method

To know about this example lets look at Lucene 3.0.1 API

Assuming you have set lucene-core-3.0.2.jar, lucene-demos-3.0.2.jar in classpath.


/*
SimpleLucenExaple.java
*/

import java.io.File;
import java.io.FileReader;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

public class SimpleLucenExaple {

String allFiles = "AllFiles";

String luceneIndexDirectory = "LuceneIndexDirectory";

IndexSearcher searcher = null; //the searcher used to open/search the index

Query query = null; //the Query created by the QueryParser
TopDocs hits = null; //the search results

public void searchIndex(String searchString)
{
System.out.println("Searching.... '" + searchString + "'");

try
{
IndexReader reader = IndexReader.open(FSDirectory.open(new File(luceneIndexDirectory)), true);
searcher = new IndexSearcher(reader);

Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30 );//construct our usual analyzer

QueryParser qp = new QueryParser(Version.LUCENE_30 , "contents", analyzer);
query = qp.parse(searchString); //parse the query and construct the Query object

hits = searcher.search(query, 100); // run the query

if (hits.totalHits == 0)
{
System.out.println("No data found.");
}
else
{
for (int i = 0; i < hits.totalHits; i++) 


   Document doc = searcher.doc(hits.scoreDocs[i].doc); //get the next  document 
   String url = doc.get("path"); //get its path field  
   System.out.println("Found in :: "+url); }


catch (Exception e) 

   e.printStackTrace(); 

} 

public void createIndex() 


Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);  
try 
 
    // Store the index in file 

   Directory directory = new SimpleFSDirectory(new File(luceneIndexDirectory));      
   IndexWriter iwriter = new IndexWriter(directory, analyzer, true,MaxFieldLength.UNLIMITED); 
   File dir = new File(allFiles); 

   File[] files = dir.listFiles();  

   for (File file : files) 
   { 
    System.out.println(file.getPath());  
    Document doc = new Document(); 
    
    doc.add(new Field("path", file.getPath(), Field.Store.YES, Field.Index.ANALYZED )); 

    Reader reader = new FileReader(file.getCanonicalPath()); 

    doc.add(new Field("contents", reader)); iwriter.addDocument(doc); 
   }

   iwriter.optimize(); iwriter.close(); 
 } 
catch (Exception e) 


e.printStackTrace();




public static void main(String[] args) 


   SimpleLucenExaple obj = new SimpleLucenExaple(); 
   System.out.println("************Creating Index************"); 
   obj.createIndex(); 
   System.out.println("************Searching************"); 
   obj.searchIndex("Object AND Random"); 
   obj.searchIndex("Object"); 
   obj.searchIndex("random OR Object"); 
   obj.searchIndex("ObjectRandom"); 
   obj.searchIndex("Function"); 
   obj.searchIndex("Group"); 
   obj.searchIndex("form where"); 
}

}  


Console output

************Creating Index************
AllFiles\Java.txt
AllFiles\Javascript.txt
AllFiles\SQL.txt
************Searching************
Searching.... 'Object AND Random'
Found in :: AllFiles\Javascript.txt
Found in :: AllFiles\Java.txt
Searching.... 'Object'
Found in :: AllFiles\Javascript.txt
Found in :: AllFiles\Java.txt
Searching.... 'random OR Object'
Found in :: AllFiles\Javascript.txt
Found in :: AllFiles\Java.txt
Found in :: AllFiles\SQL.txt
Searching.... 'ObjectRandom'
No data found.
Searching.... 'Function'
Found in :: AllFiles\Javascript.txt
Searching.... 'form where'
Found in :: AllFiles\SQL.txt


To know about this example lets look at Lucene 3.0.1 API


Any feedback/comments? Do write me, I would love to answer it.