25 May 2013
How to parse / reindex all archived mails
Solution
In May 2009 we improved our email indexing and added support for some MS Office documents. This article describes how to install the latest indexing script and how to add support for doc and docx files. Our indexing scheme has two components: a parsing script (fetchdata.pl) and an indexer (Sphinx). It is assumed that Sphinx is installed and running.
You will need the latest version of fetchdata.pl; step 5 of the rebuild procedure below gives the download location.
Adding support for .doc and .docx files.
1) To process DOC documents (Word 97-2003), MPP uses antiword if the "antiword" application exists in PATH.
To install Antiword:
wget -c http://www.winfield.demon.nl/linux/antiword-0.37.tar.gz
tar xzvf antiword-0.37.tar.gz
cd antiword-0.37
sudo make -f Makefile.Linux
sudo make -f Makefile.Linux global_install
2) To process DOCX documents (Word 2007), MPP uses docx2txt if the "docx2txt" application exists in PATH.
To install Docx2txt:
wget -c http://garr.dl.sourceforge.net/sourceforge/docx2txt/docx2txt-0.3.tgz
tar xzvf docx2txt-0.3.tgz
cd docx2txt-0.3
sudo make install
3) To process PDF documents, MPP uses pdftotext if the "pdftotext" application exists in PATH.
Pdftotext is part of poppler (http://poppler.freedesktop.org/); install the appropriate binaries for your OS.
4) OpenOffice documents can be processed using the OpenOffice::OODoc Perl module.
On RHEL/Fedora/CentOS with the RPMForge repository enabled, install it using: yum install perl-OODoc
Installing using CPAN is also possible:
perl -MCPAN -e shell
install OpenOffice::OODoc
Note: OS X users please use /usr/local/mppbase/bin/perl -MCPAN -e shell
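Before re-indexing, it can help to confirm which of the optional converters above are actually visible in PATH. The following is a minimal sketch and not part of MPP: the helper name report_tool is hypothetical, the tool names are the ones given in steps 1-3, and the last check assumes the Perl module is installed under its CPAN name, OpenOffice::OODoc.

```shell
#!/bin/sh
# Report whether a command is visible in PATH.
# report_tool is a hypothetical helper, not something shipped with MPP.
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Tool names as given in steps 1-3 above.
for tool in antiword docx2txt pdftotext; do
    report_tool "$tool"
done

# Check that the Perl module from step 4 loads
# (assumes the CPAN name OpenOffice::OODoc).
if perl -MOpenOffice::OODoc -e 1 >/dev/null 2>&1; then
    echo "OpenOffice::OODoc: found"
else
    echo "OpenOffice::OODoc: missing"
fi
```

Run the check as the same user that runs fetchdata.pl from cron, since that user's PATH may differ from your login shell's. On OS X, substitute /usr/local/mppbase/bin/perl for perl, as in the note above.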
How to rebuild your email archive index:
Warning!!!
Re-indexing can take many hours for a large database. Full text search will not be available during this period but other services will not be affected. This process is CPU intensive.
1) stop Sphinx searchd daemon:
killall searchd
2) remove existing index files:
rm -f /usr/local/sphinx/var/data/mpp*
3) drop data from content_index and content_counter tables of MPP Archive DB
mysql -uroot -p
use mppdb;
truncate content_counter;
truncate content_index;
4) temporarily disable the cronjobs for fetchdata and indexer.
Use: crontab -e and comment out the following:
#5 * * * * /usr/local/MPP/scripts/fetchdata.pl >/dev/null 2>&1 </dev/null
#45 * * * * /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf mppdeltaindex --rotate >/dev/null 2>&1 </dev/null
5) Download the latest fetchdata.pl from ftp://ftp.messagepartners.com/pub/mpp4/scripts/fetchdata.pl and install it as /usr/local/MPP/scripts/fetchdata.pl:
cd /usr/local/MPP/scripts/
mv fetchdata.pl fetchdata.pl.old
wget -c ftp://ftp.messagepartners.com/pub/mpp4/scripts/fetchdata.pl
chmod 755 fetchdata.pl
6) Edit the MySQL credentials in fetchdata.pl to match your database, and set the $metadata variable to 0 or 1 depending on your setup (use $metadata = 1 if you are using MySQL only for metadata).
7) Run the fetchdata.pl parser (this can take some time if there are many messages in the DB)
perl /usr/local/MPP/scripts/fetchdata.pl
8) Index parsed data
/usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --all
9) start Sphinx searchd daemon
/usr/local/sphinx/bin/searchd --config /usr/local/sphinx/etc/sphinx.conf
10) re-enable the cronjobs that parse and index data
Use: crontab -e and uncomment the following:
5 * * * * /usr/local/MPP/scripts/fetchdata.pl >/dev/null 2>&1 </dev/null
45 * * * * /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf mppdeltaindex --rotate >/dev/null 2>&1 </dev/null
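Steps 4 and 10 edit the crontab by hand; the same change can be sketched as a pair of stream filters. This is an illustrative sketch only: the function names disable_mpp_jobs and enable_mpp_jobs are hypothetical, and the patterns assume the two job lines look exactly like the ones shown above (a commented-out job is a '#' followed directly by the schedule).

```shell
#!/bin/sh
# Hypothetical helpers (not part of MPP): read a crontab on stdin and
# write it to stdout with the two MPP indexing jobs disabled or restored.

# Prefix '#' to active lines that run fetchdata.pl or the mppdeltaindex job.
disable_mpp_jobs() {
    sed -e '/^[^#].*fetchdata\.pl/s/^/#/' \
        -e '/^[^#].*mppdeltaindex/s/^/#/'
}

# Strip the leading '#' from lines where it is followed directly by a
# schedule field (a digit or '*') and that mention one of the two jobs.
# Ordinary comment lines such as "# run fetchdata.pl" are left alone.
enable_mpp_jobs() {
    sed -e '/^#[0-9*].*fetchdata\.pl/s/^#//' \
        -e '/^#[0-9*].*mppdeltaindex/s/^#//'
}
```

On cron implementations where "crontab -" reads the new table from standard input, step 4 could then be run as "crontab -l | disable_mpp_jobs | crontab -" and step 10 as "crontab -l | enable_mpp_jobs | crontab -". Saving a copy first with "crontab -l > crontab.bak" is a sensible precaution.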
Article Details
Article ID: 54
Created On: 12 May 2009 03:28 PM