JavaXp.com | Java Experts Blog | Java Examples | API | Errors


Monday, December 10, 2012

Lucene - Updating index for an existing file

Updating an index can mean one of two things:

Adding a new file to an existing index
Updating an existing file

Adding a new file to an existing index

Adding a new file is simple; see my previous article.

Updating an existing file

When you call IndexWriter.addDocument() for a document that is already in the index, it does not overwrite the previous document; it adds another copy, so the index ends up with multiple copies of the same document.

Lucene cannot modify a document in place. To update an index incrementally, you must first delete the documents that changed and then re-add them. IndexWriter.updateDocument(Term, Document) does both steps in one call: it deletes every document containing the given term and then adds the new one. In this example we will see how to delete a file; you can then re-add the same file as described in my previous article.

How to delete documents from the index?
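The difference between re-adding and delete-then-add can be sketched with a toy model in plain Java. This is not Lucene code: ToyIndex and its methods are hypothetical names that only mimic the append-only behaviour described above, with the file path standing in for a unique key.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an index: adding never overwrites, it only appends.
class ToyIndex {
    // Each entry is {key, contents}.
    private final List<String[]> docs = new ArrayList<>();

    // Mimics addDocument(): always appends, even if the key already exists.
    void addDocument(String key, String contents) {
        docs.add(new String[] { key, contents });
    }

    // Mimics delete-by-term: removes every entry with this key, returns how many were removed.
    int deleteDocuments(String key) {
        int before = docs.size();
        docs.removeIf(d -> d[0].equals(key));
        return before - docs.size();
    }

    // Update = delete all old copies, then add the new version.
    void updateDocument(String key, String contents) {
        deleteDocuments(key);
        addDocument(key, contents);
    }

    // Number of copies currently held for this key.
    int count(String key) {
        int n = 0;
        for (String[] d : docs) {
            if (d[0].equals(key)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("AllFiles/Java.txt", "first version");
        index.addDocument("AllFiles/Java.txt", "second version"); // re-add: now a duplicate
        System.out.println("copies after re-adding: " + index.count("AllFiles/Java.txt"));

        index.updateDocument("AllFiles/Java.txt", "third version"); // delete, then add
        System.out.println("copies after update:    " + index.count("AllFiles/Java.txt"));
    }
}
```

In a real index the key would be a field that is unique per document, such as the file path or a primary key.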

IndexWriter allows you to delete by Term or by Query. The deletes are buffered and then periodically flushed to the index, and made visible once commit() or close() is called.

IndexReader can also delete documents, by Term or document number, but you must close any open IndexWriter before using IndexReader to make changes (and vice versa). IndexReader also buffers the deletions and does not write changes to the index until close() is called, but if you use that same IndexReader for searching, the buffered deletions take effect immediately. Unlike IndexWriter's delete methods, IndexReader's methods return the number of documents that were deleted.

Generally it's best to use IndexWriter for deletions, unless 1) you must delete by document number, 2) you need your searches to immediately reflect the deletions or 3) you must know how many documents were deleted for a given deleteDocuments invocation.

If you must delete by document number but would otherwise like to use IndexWriter, one common approach is to add a primary key field that holds a unique ID string for each document. You can then delete a single document by creating a Term containing the ID and passing it to IndexWriter's deleteDocuments(Term) method.

Once a document is deleted it will not appear in TermDocs nor TermPositions enumerations, nor any search results. Attempts to load the document will result in an exception. The presence of this document may still be reflected in the docFreq statistics, and thus alter search scores, though this will be corrected eventually as segments containing deletions are merged.

To know more, click here.

Lucene - Updating index files

Lucene 3.0 example - Indexing and searching database tables

Apache Lucene(TM) is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.
Apache Lucene is an open source project available for free download. To know more about Lucene, click here.

About the example:

Here is a simple Java program that creates index files from data fetched from a database, then searches the index and displays the results. We are using Lucene 3.0 and MySQL. The idea is simple: fetch the data from the database using JDBC (or Hibernate, if you prefer) and create the index files from it.

You can store whichever database columns you need in the index. If you only want to search one or two columns, there is no need to add every column to the index. Store the searchable column together with the primary key, search on that column, and use the returned primary key to look up the rest of the row.

Creating an index is straightforward: add the fields and field values to a Document. Searching can be done in several ways depending on your requirements. To search a single field, use QueryParser; to search multiple fields, use MultiFieldQueryParser. Queries can use wildcards (e.g. DATA*), logical operators (e.g. DATA1 OR DATA2), and so on.

To run the example below, add lucene-core-3.0.2.jar (for Lucene) and mysql-connector-java-5.1.5.jar (for JDBC with MySQL) to your application's classpath.
To download lucene-core-3.0.2.jar, click here.
To see how to install MySQL, click here.

org.apache.lucene.store.LockObtainFailedException

While creating index files with Lucene 3.0.2, I got the error below.

Exception:

org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/export/home/Lucene/indexFiles/write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:84)
        at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1060)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:882)

Solution:

Check your IndexWriter object. Make sure you close the IndexWriter after index creation completes.
Check for exceptions, and close the IndexWriter if one occurs.

The most likely cause is that an exception occurred during indexing and the IndexWriter was never closed, so its write.lock was never released. Closing the writer in a finally block covers both the normal and the error path.

Code snippet:

// Store the index on disk
Directory directory = new SimpleFSDirectory(new File(indexLoc));
IndexWriter iwriter = new IndexWriter(directory, analyzer, isNew, MaxFieldLength.UNLIMITED);
try {
    // ... add documents here ...
    iwriter.optimize();
} finally {
    // Always close the writer so that write.lock is released, even if indexing fails
    iwriter.close();
}
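To see why an unclosed writer blocks the next one, here is a small sketch in plain Java, not Lucene code. NativeFSLock relies on operating-system file locks through java.nio, so the sketch takes such a lock on a temporary file directly. The class name WriteLockDemo, its methods, and the temporary file are assumptions made for illustration only.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

class WriteLockDemo {

    // Tries to take an exclusive lock on the file; if obtained, releases it again.
    static boolean canLock(File lockFile) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
             FileChannel channel = raf.getChannel()) {
            FileLock lock;
            try {
                lock = channel.tryLock();
            } catch (OverlappingFileLockException e) {
                return false; // already held inside this JVM
            }
            if (lock == null) {
                return false; // held by another process
            }
            lock.release();
            return true;
        }
    }

    // Returns {obtainable while a "writer" holds the lock, obtainable after it is released}.
    static boolean[] run() {
        try {
            File lockFile = File.createTempFile("write", ".lock");
            lockFile.deleteOnExit();

            boolean whileHeld;
            RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
            FileLock held = raf.getChannel().lock(); // the first "writer" takes the lock
            try {
                whileHeld = canLock(lockFile);       // a second "writer" tries while it is held
            } finally {
                held.release();                      // the equivalent of closing the writer
                raf.close();
            }
            boolean afterRelease = canLock(lockFile);
            return new boolean[] { whileHeld, afterRelease };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("lock obtainable while first holder is open: " + result[0]);
        System.out.println("lock obtainable after first holder closed:  " + result[1]);
    }
}
```

The second attempt only succeeds once the first holder has released the lock, which is why the release belongs in a finally block.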

If you need complete tested code for Lucene 3.0.2 example, click here

Friday, July 30, 2010

Java : Simple Lucene 3.0 example

Apache Lucene is a high-performance, full-featured text search engine library. Here is a simple example of how to use Lucene for indexing and searching.

To run this example you need to download lucene-3.0.2.zip from http://www.apache.org/dyn/closer.cgi/lucene/java

If you need more information about Lucene go to http://lucene.apache.org/java/docs/index.html

To use Lucene, an application should:

1. Create Documents by adding Fields;
2. Create an IndexWriter and add documents to it with addDocument();
3. Call QueryParser.parse() to build a query from a string; and
4. Create an IndexSearcher and pass the query to its search() method.

Let's create a directory called "AllFiles" that contains the text files we are going to index. A second directory called "LuceneIndexDirectory" will hold the index that Lucene creates.

Now let's create a few files in the "AllFiles" folder containing the keywords we will search for. Here are the files.

Look at sample folder structure

Java.txt

String
Object
ArrayList
Hashtable
Integer
Random

SQL.txt

Select
Group by
Where
From
random

Javascript.txt

object
Var
function
random

Now let's look at a simple example, SimpleLucenExaple.java.

First we create an index of all the files in our "LuceneIndexDirectory" folder using the createIndex() method, then we search for a few keywords in those files using the searchIndex("") method.

To know more about this example, let's look at the Lucene 3.0.1 API.

This assumes lucene-core-3.0.2.jar and lucene-demos-3.0.2.jar are on the classpath.

/*
SimpleLucenExaple.java
*/

import java.io.File;
import java.io.FileReader;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.SimpleFSDirectory;
import org.apache.lucene.util.Version;

public class SimpleLucenExaple {

    String allFiles = "AllFiles"; // folder containing the text files to index

    String luceneIndexDirectory = "LuceneIndexDirectory"; // folder where Lucene stores the index

    IndexSearcher searcher = null; // the searcher used to open/search the index

    Query query = null; // the Query created by the QueryParser
    TopDocs hits = null; // the search results

    public void searchIndex(String searchString)
    {
        System.out.println("Searching.... '" + searchString + "'");

        IndexReader reader = null;
        try
        {
            // open the index read-only
            reader = IndexReader.open(FSDirectory.open(new File(luceneIndexDirectory)), true);
            searcher = new IndexSearcher(reader);

            // use the same analyzer that was used for indexing
            Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);

            QueryParser qp = new QueryParser(Version.LUCENE_30, "contents", analyzer);
            query = qp.parse(searchString); // parse the query and construct the Query object

            hits = searcher.search(query, 100); // run the query, returning at most 100 hits

            if (hits.totalHits == 0)
            {
                System.out.println("No data found.");
            }
            else
            {
                // iterate over the returned hits; totalHits can be larger than the 100 requested
                for (int i = 0; i < hits.scoreDocs.length; i++)
                {
                    Document doc = searcher.doc(hits.scoreDocs[i].doc); // get the next document
                    String url = doc.get("path"); // get its path field
                    System.out.println("Found in :: " + url);
                }
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        finally
        {
            // closing the searcher does not close a reader that was passed in, so close it here
            if (reader != null)
            {
                try { reader.close(); } catch (Exception e) { e.printStackTrace(); }
            }
        }
    }

    public void createIndex()
    {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
        IndexWriter iwriter = null;
        try
        {
            // Store the index on disk; "true" creates a new index, replacing any existing one
            Directory directory = new SimpleFSDirectory(new File(luceneIndexDirectory));
            iwriter = new IndexWriter(directory, analyzer, true, MaxFieldLength.UNLIMITED);

            File[] files = new File(allFiles).listFiles();
            if (files == null)
            {
                System.out.println("Folder not found: " + allFiles);
                return;
            }

            for (File file : files)
            {
                System.out.println(file.getPath());
                Document doc = new Document();

                // stored, so that search results can print where the match was found
                doc.add(new Field("path", file.getPath(), Field.Store.YES, Field.Index.ANALYZED));

                // tokenized and indexed from the reader, but not stored
                Reader reader = new FileReader(file.getCanonicalPath());
                doc.add(new Field("contents", reader));

                iwriter.addDocument(doc);
            }

            iwriter.optimize();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        finally
        {
            // always close the writer so that write.lock is released
            if (iwriter != null)
            {
                try { iwriter.close(); } catch (Exception e) { e.printStackTrace(); }
            }
        }
    }

    public static void main(String[] args)
    {
        SimpleLucenExaple obj = new SimpleLucenExaple();
        System.out.println("************Creating Index************");
        obj.createIndex();
        System.out.println("************Searching************");
        obj.searchIndex("Object AND Random");
        obj.searchIndex("Object");
        obj.searchIndex("random OR Object");
        obj.searchIndex("ObjectRandom");
        obj.searchIndex("Function");
        obj.searchIndex("Group");
        obj.searchIndex("form where");
    }

}

Console output

************Creating Index************
AllFiles\Java.txt
AllFiles\Javascript.txt
AllFiles\SQL.txt
************Searching************
Searching.... 'Object AND Random'
Found in :: AllFiles\Javascript.txt
Found in :: AllFiles\Java.txt
Searching.... 'Object'
Found in :: AllFiles\Javascript.txt
Found in :: AllFiles\Java.txt
Searching.... 'random OR Object'
Found in :: AllFiles\Javascript.txt
Found in :: AllFiles\Java.txt
Found in :: AllFiles\SQL.txt
Searching.... 'ObjectRandom'
No data found.
Searching.... 'Function'
Found in :: AllFiles\Javascript.txt
Searching.... 'form where'
Found in :: AllFiles\SQL.txt
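
Two things explain these results: the analyzer lowercases tokens both when indexing and when parsing the query, and QueryParser treats terms without an explicit operator as OR. The sketch below imitates that in plain Java. It is a toy model, not StandardAnalyzer: splitting on non-alphanumeric characters is a simplification, and ToyAnalyzerDemo and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class ToyAnalyzerDemo {

    // Splits on anything that is not a letter or digit and lowercases each token.
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // OR semantics: at least one query token occurs in the document.
    static boolean matchesAny(String query, String doc) {
        Set<String> docTokens = tokens(doc);
        for (String q : tokens(query)) {
            if (docTokens.contains(q)) {
                return true;
            }
        }
        return false;
    }

    // AND semantics: every query token occurs in the document.
    static boolean matchesAll(String query, String doc) {
        return tokens(doc).containsAll(tokens(query));
    }

    public static void main(String[] args) {
        String javaTxt = "String\nObject\nArrayList\nHashtable\nInteger\nRandom";
        String sqlTxt = "Select\nGroup by\nWhere\nFrom\nrandom";

        // Case does not matter, so both terms are found in Java.txt.
        System.out.println(matchesAll("Object Random", javaTxt));
        // "form" is not in SQL.txt, but "where" is, and one match is enough under OR.
        System.out.println(matchesAny("form where", sqlTxt));
        // "ObjectRandom" is a single token that no file contains.
        System.out.println(matchesAny("ObjectRandom", javaTxt));
    }
}
```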


Any feedback or comments? Write to me; I would love to answer.
