Since large web applications came into existence, searching data quickly and accurately has been one of their most important problems. For a while, I've worked with Lucene.NET, which is a C# port of the Lucene project.

I also work in PHP with Zend Framework's Lucene API, which brings me to my question. Good indexing usually requires running some NLP steps first, such as tokenizing and lemmatizing, among others. The question is:

Do you know of any good NLP programming framework/toolset using PHP?

PS: I'm well aware of the Zend API for Lucene, but indexing data properly is not just a matter of storing it and relying on Lucene; you need to perform some extra tasks, like those above.
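To make the "extra tasks" concrete, here is a minimal sketch of the kind of preprocessing meant by tokenizing and normalizing text before it reaches the index. It is written in Python purely for illustration, and the same steps would port to PHP. The function names and the suffix list are hypothetical, and the suffix stripping is a deliberately crude stand-in for real lemmatization, which needs a dictionary and part-of-speech information.

```python
import re

# Hypothetical toy suffix list, for illustration only.
# A real lemmatizer relies on a dictionary and part-of-speech data.
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize the text, then normalize each token."""
    return [crude_stem(token) for token in tokenize(text)]


print(analyze("Indexing the searched documents"))
```

The point of running the same analysis at index time and at query time is that "searched" and "search" end up as the same term in the index.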

Accepted Answer

I would suggest that you look at Solr, a search server built on top of Lucene. Solr exposes a REST-based HTTP API, and a very good PHP client is available for it. This lets you leverage the power of Lucene, including the NLP features you want, without doing any of the low-level programming yourself. You would also probably want to grab the trunk version of Solr, as NLP development is very active right now and new capabilities are being added all the time.
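Because Solr is queried over plain HTTP, any language can talk to it. The sketch below only builds a search URL; it is shown in Python for illustration, and a PHP client would assemble the same request. The host, port, and core name `articles` are assumptions, as is the `/solr/<core>/select` path layout, which depends on how your Solr instance is set up. The helper name `build_select_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10):
    """Build the URL for a Solr search request.

    Assumes the select handler lives at /solr/<core>/select;
    adjust the path to match your own deployment.
    """
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


# Assumed local instance and core name, for illustration only.
print(build_select_url("http://localhost:8983", "articles", "title:lucene"))
```

Tokenizing, stemming, and similar analysis are configured on the Solr side, so the client only sends the raw query text.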

Written by Paige Cook
The content is written by members of the stackoverflow.com community.
It is licensed under cc-wiki