Feb 01 2013
 

From the Apache Solr website:

Apache Solr is the popular open source enterprise search platform. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search.

From the Apache Tika website:

The Apache Tika toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.

Here are some instructions for installing Apache Solr (search engine) and Apache Tika (content analysis), the quick way …

The search and install commands are for a CentOS distro; on an Ubuntu distro, use ‘apt-cache search’ and ‘apt-get install’ in place of ‘yum search’ and ‘yum install’. All commands are run as root.

# yum search tomcat ...or... apt-cache search tomcat
# yum install tomcat6 ...or... apt-get install tomcat6

Installing Tomcat is the quickest way to install all the Java packages, libraries and dependencies needed by Apache Solr/Tika. Tomcat does not need to be configured or running in order to run an instance of Apache Solr/Tika, as both ship as self-contained Java jars.
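As a quick sanity check after the install, a sketch like the following can confirm that a java command actually ended up on the PATH. The have_cmd helper is a hypothetical name introduced here for illustration; it is not part of any package.

```shell
# Report whether a command is on the PATH; prints "found" or "missing".
# (have_cmd is an illustrative helper name, not a standard tool.)
have_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found"
    else
        echo "missing"
    fi
}

echo "java: $(have_cmd java)"
```

If this reports "found", running ‘java -version’ should print the installed Java version.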

# cd /usr/local/src/

To obtain Apache Solr: (Mirror link obtained from http://lucene.apache.org/solr/downloads.html)

# wget http://mirror.gopotato.co.uk/apache/lucene/solr/3.6.2/apache-solr-3.6.2.tgz
# tar xvfz apache-solr-3.6.2.tgz

To obtain Apache Tika: (Mirror link obtained from http://www.apache.org/dyn/closer.cgi/tika/tika-app-1.3.jar)

# wget http://mirror.ox.ac.uk/sites/rsync.apache.org/tika/tika-app-1.3.jar

Several Drupal modules need to be installed to use Apache Solr/Tika:

  1. Apache Solr Search Integration
  2. Apache Solr Attachments
  3. Apache Solr Autocomplete

Download, install and enable these modules in Drupal, for example under /var/www/drupal/sites/all/modules/contrib/

Copy the Solr configuration files from the Drupal Apache Solr Search Integration module into the Apache Solr configuration directory. This will overwrite three files: ‘protwords.txt’, ‘schema.xml’ and ‘solrconfig.xml’.

# cp /var/www/drupal/sites/all/modules/contrib/apachesolr/solr-conf/solr-3.x/* /usr/local/src/apache-solr-3.6.2/example/solr/conf/
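Because the copy above overwrites files that ship with Solr, it may be worth keeping the originals, ideally by taking a backup before running the copy. A minimal sketch, assuming a sibling directory named ‘conf.orig’ is acceptable (the backup_conf helper and the ‘.orig’ suffix are choices made here, not something Solr or the Drupal module requires):

```shell
# Copy a configuration directory to "<dir>.orig" unless a backup already exists.
# (backup_conf and the ".orig" suffix are illustrative assumptions.)
backup_conf() {
    conf_dir="${1%/}"    # tolerate a trailing slash
    if [ ! -d "$conf_dir" ]; then
        echo "no such directory: $conf_dir" >&2
        return 1
    fi
    if [ -e "$conf_dir.orig" ]; then
        echo "backup already exists: $conf_dir.orig"
        return 0
    fi
    # -R copies recursively, -p preserves modes and timestamps.
    cp -Rp "$conf_dir" "$conf_dir.orig"
    echo "backed up to $conf_dir.orig"
}
```

With the function defined in the current shell, usage would be:

# backup_conf /usr/local/src/apache-solr-3.6.2/example/solr/conf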

Apache Solr is now configured with the configuration files from the Drupal module’s ‘solr-conf/solr-3.x’ directory, as per the module’s installation instructions, and is available to the Drupal installation. To start Apache Solr:

# cd /usr/local/src/apache-solr-3.6.2/example/
# java -jar start.jar
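Solr stays in the foreground when started this way, so any check has to be run from a second terminal. The sketch below assumes that curl is installed and that the active solrconfig.xml defines the ‘/admin/ping’ request handler; both are assumptions, and the helper names are made up for illustration.

```shell
# Build the ping URL for a Solr base URL (a trailing slash is tolerated).
# Assumes a ping handler registered at /admin/ping in solrconfig.xml.
solr_ping_url() {
    echo "${1%/}/admin/ping"
}

# Return 0 if the ping handler answers with HTTP 200 within 5 seconds.
solr_is_up() {
    status=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$(solr_ping_url "$1")")
    [ "$status" = "200" ]
}
```

For example, on the Solr host itself:

# solr_is_up http://localhost:8983/solr && echo "Solr is answering"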

Copy the Apache Tika Java jar into an area accessible by Drupal, for example /var/www/drupal/tmp/

# cp /usr/local/src/tika-app-1.3.jar /var/www/drupal/tmp/

Apache Tika is now available to the Drupal installation; it is run by the Drupal Apache Solr Attachments module.
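Before relying on the module, it can help to run Tika by hand and confirm that the jar extracts text at all. The sketch below wraps the Tika app’s ‘--text’ option; the tika_extract helper name and the document path in the usage line are hypothetical.

```shell
# Extract plain text from a document with the Tika app jar.
# Usage: tika_extract /path/to/tika-app.jar /path/to/document
# (tika_extract is an illustrative helper name.)
tika_extract() {
    jar="$1"
    doc="$2"
    if [ ! -r "$jar" ]; then
        echo "cannot read jar: $jar" >&2
        return 1
    fi
    if [ ! -r "$doc" ]; then
        echo "cannot read document: $doc" >&2
        return 1
    fi
    # --text asks the Tika app to print the extracted plain text to stdout.
    java -jar "$jar" --text "$doc"
}
```

For example, with a sample PDF of your own:

# tika_extract /var/www/drupal/tmp/tika-app-1.3.jar /path/to/sample.pdf

Drupal runs Tika as the web server user (typically ‘apache’ on CentOS or ‘www-data’ on Ubuntu, though this depends on the setup), so if the manual run works as root but extraction fails from Drupal, file permissions on the jar and its directory are a likely place to look.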

The Drupal Solr modules will need to be configured appropriately. For example, the ‘Solr server URL’ option would be: http://192.168.0.99:8983/solr

… Job Done …


  2 Responses to “Simple Solr/Tika Installation”

  1. Thank you SO Much for this, you saved me and my buddy a ton of development time and headaches… you rock.

  2. Thanks a lot. Solr is configured and integrated with Drupal, but when I click on the Test your Tika Extraction button it shows the following error message.

    “Text can not be successfully extracted. Please check your settings.”

    please help….


© 2011 Indimon Internet Services

Site last updated March 11, 2017 @ 9:57 am; This content last updated February 5, 2013 @ 4:56 pm
