From the Apache Solr website:
Apache Solr is the popular open source enterprise search platform. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search.
From the Apache Tika website:
The Apache Tika toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.
Here are some instructions for installing Apache Solr (search engine) and Apache Tika (content analysis), the quick way …
The search and install commands are for a CentOS distro; on an Ubuntu distro replace ‘yum’ with ‘apt-get’ (and ‘yum search’ with ‘apt-cache search’). All commands are run as root.
# yum search tomcat ...or... apt-cache search tomcat
# yum install tomcat6 ...or... apt-get install tomcat6
Installing Tomcat is the quickest way to pull in the Java runtime, libraries and dependencies needed by Apache Solr/Tika. Tomcat itself does not need to be configured or running: the Solr example ships with its own embedded Jetty server (start.jar), and Tika is a self-contained Java jar.
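Because both Solr and Tika are started with `java -jar`, it is worth confirming that a Java runtime actually ended up on the PATH after installing Tomcat. The small POSIX shell helper below is a sketch; the function name `require_cmd` is hypothetical and not part of Solr, Tika or Tomcat.

```shell
# Hypothetical helper: succeed if a command is on the PATH, otherwise
# print an error to stderr and return non-zero.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "error: '$1' not found on PATH" >&2
  return 1
}
```

For example, `require_cmd java && java -version` prints the installed Java version, or an error if the Tomcat packages did not bring in a Java runtime.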
# cd /usr/local/src/
To obtain Apache Solr: (Mirror link obtained from http://lucene.apache.org/solr/downloads.html)
# wget http://mirror.gopotato.co.uk/apache/lucene/solr/3.6.2/apache-solr-3.6.2.tgz
# tar xvfz apache-solr-3.6.2.tgz
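If you repeat this on several machines, the download and extraction can be wrapped in a function that skips the download when the tarball is already present. This is a sketch assuming `wget` and `tar` are installed; `fetch_and_unpack` is a hypothetical name, and the mirror URL is the one used above, which may no longer be live.

```shell
# Hypothetical helper: download URL into DEST_DIR (unless the file is
# already there), then unpack the gzipped tarball inside DEST_DIR.
fetch_and_unpack() {
  url=$1
  dest_dir=$2
  tarball="$dest_dir/$(basename "$url")"
  if [ ! -f "$tarball" ]; then
    # Remove any partial file so a failed download is retried next time.
    wget -O "$tarball" "$url" || { rm -f "$tarball"; return 1; }
  fi
  tar xzf "$tarball" -C "$dest_dir"
}
```

Running `fetch_and_unpack http://mirror.gopotato.co.uk/apache/lucene/solr/3.6.2/apache-solr-3.6.2.tgz /usr/local/src` should leave the same /usr/local/src/apache-solr-3.6.2/ directory as the two commands above.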
To obtain Apache Tika: (Mirror link obtained from http://www.apache.org/dyn/closer.cgi/tika/tika-app-1.3.jar)
# wget http://mirror.ox.ac.uk/sites/rsync.apache.org/tika/tika-app-1.3.jar
Three Drupal modules need to be installed to use Apache Solr/Tika:
- Apache Solr Search Integration
- Apache Solr Attachments
- Apache Solr Autocomplete
Download these modules into your Drupal installation (for example into /var/www/drupal/sites/all/modules/contrib/), then install and enable them.
Copy the Solr configuration files shipped with the Drupal Apache Solr Search Integration module into the Apache Solr configuration directory. This overwrites three files: ‘protwords.txt’, ‘schema.xml’ and ‘solrconfig.xml’.
# cp /var/www/drupal/sites/all/modules/contrib/apachesolr/solr-conf/solr-3.x/* /usr/local/src/apache-solr-3.6.2/example/solr/conf/
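Because this copy overwrites the stock Solr files, you may want to keep the originals. A sketch of one way to do that, assuming the same directory layout as above; `install_drupal_solr_conf` is a hypothetical helper and the `.orig` suffix is an arbitrary choice.

```shell
# Hypothetical helper: back up the three stock files that the Drupal
# configuration replaces, then copy the Drupal files over them.
install_drupal_solr_conf() {
  src_dir=$1
  conf_dir=$2
  for f in protwords.txt schema.xml solrconfig.xml; do
    # Only back up once, so re-running never clobbers the stock copy.
    if [ -f "$conf_dir/$f" ] && [ ! -f "$conf_dir/$f.orig" ]; then
      cp -p "$conf_dir/$f" "$conf_dir/$f.orig"
    fi
  done
  cp "$src_dir"/* "$conf_dir"/
}
```

Calling `install_drupal_solr_conf /var/www/drupal/sites/all/modules/contrib/apachesolr/solr-conf/solr-3.x /usr/local/src/apache-solr-3.6.2/example/solr/conf` does the same as the `cp` above, except that the first run keeps the stock files as `*.orig`.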
Apache Solr is now configured with the configuration files from the Drupal module's ‘solr-conf/solr-3.x’ directory, as per the module's installation instructions, and is ready for use by the Drupal installation. To start Apache Solr:
# cd /usr/local/src/apache-solr-3.6.2/example/
# java -jar start.jar
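`java -jar start.jar` runs Solr in the foreground, so it stops when the terminal session ends. A sketch for running it in the background with `nohup` and logging to a file; `start_solr`, `SOLR_PID` and any log path you pass are hypothetical choices, not part of the Solr distribution.

```shell
# Hypothetical helper: start the Solr example (embedded Jetty) in the
# background, appending its output to LOG_FILE. Use an absolute log path.
start_solr() {
  example_dir=$1
  log_file=$2
  # The subshell keeps the caller's working directory unchanged; exec
  # replaces it, so SOLR_PID ends up being the Java process itself.
  (cd "$example_dir" && exec nohup java -jar start.jar) >>"$log_file" 2>&1 &
  SOLR_PID=$!
}
```

For example, `start_solr /usr/local/src/apache-solr-3.6.2/example /var/log/solr.log`, and later `kill "$SOLR_PID"` from the same shell to stop it.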
Copy the Apache Tika Java jar to a location accessible to Drupal, for example /var/www/drupal/tmp/:
# cp /usr/local/src/tika-app-1.3.jar /var/www/drupal/tmp/
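Before wiring Tika into Drupal, you can check that the jar runs by extracting plain text from a sample document with the tika-app `--text` option. The helper name `tika_extract_text` and any sample document path are hypothetical.

```shell
# Hypothetical helper: print the plain-text content of DOC using the
# tika-app jar, after checking that both files are readable.
tika_extract_text() {
  jar=$1
  doc=$2
  [ -r "$jar" ] || { echo "error: cannot read $jar" >&2; return 1; }
  [ -r "$doc" ] || { echo "error: cannot read $doc" >&2; return 1; }
  java -jar "$jar" --text "$doc"
}
```

For example, `tika_extract_text /var/www/drupal/tmp/tika-app-1.3.jar /tmp/sample.pdf` should print the text of the PDF.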
Apache Tika is now available to the Drupal installation; it is run by the Drupal Apache Solr Attachments module.
Finally, the Drupal Solr modules need to be configured. For example, the Solr server URL would be: http://192.168.0.99:8983/solr
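To confirm that the URL entered in Drupal is reachable from the web server, you can build it and request Solr's ping handler. This sketch assumes `curl` is installed and that the `solrconfig.xml` in use defines an `/admin/ping` request handler (the Solr 3.x example configuration does; check the Drupal-supplied file). The function names are hypothetical; 8983 is the default port of the Jetty server bundled with the Solr example.

```shell
# Hypothetical helper: print the Solr base URL for HOST [PORT];
# PORT defaults to 8983.
solr_base_url() {
  printf 'http://%s:%s/solr\n' "$1" "${2:-8983}"
}

# Hypothetical helper: returns non-zero on connection errors or an
# HTTP error status (4xx/5xx) from the ping handler.
solr_ping() {
  curl -fsS -o /dev/null "$(solr_base_url "$1" "${2:-8983}")/admin/ping"
}
```

For example, `solr_ping 192.168.0.99 && echo "Solr is up"`.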
… Job Done …