The information below is also available in the INSTALL.htdig-mm file which is installed by this patch.
Different versions of this patch are available for different versions of Mailman. There may be several versions of this patch for any given version of Mailman, typically as a result of MM version specific improvements or corrections of bugs in the patched code. The names of patch files for this patch are structured as follows:
htdig-<MM-version-no>-<patch-version-no>.patch[.gz]
Thus, for instance, patch file htdig-2.1.4-0.1.patch is patch version 0.1 for application to MM version 2.1.4 source code.
The <patch-version-no> is reset to 0.1 for the first patch version applicable to each new version of Mailman.
The .gz suffix, if present, says that the patch file has been compressed using gzip.
As a general rule, you should use the highest patch version number for the MM version you are installing.
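The naming scheme above can be checked mechanically. The following is a minimal sketch, not part of the patch: the function names are hypothetical, and it assumes that both version fields are purely numeric and dot-separated, as in the examples given here.

```python
import re

# Names of the form htdig-<MM-version-no>-<patch-version-no>.patch[.gz]
PATCH_NAME = re.compile(
    r"^htdig-(?P<mm>\d+(?:\.\d+)*)-(?P<patch>\d+(?:\.\d+)*)\.patch(?P<gz>\.gz)?$"
)


def parse_patch_name(filename):
    """Return (mm_version, patch_version, is_gzipped), or None if no match."""
    match = PATCH_NAME.match(filename)
    if match is None:
        return None
    mm = tuple(int(part) for part in match.group("mm").split("."))
    patch = tuple(int(part) for part in match.group("patch").split("."))
    return mm, patch, match.group("gz") is not None


def latest_patch_for(mm_version, filenames):
    """Pick the file with the highest patch version for one Mailman version."""
    best_name, best_patch = None, None
    for name in filenames:
        parsed = parse_patch_name(name)
        if parsed is None or parsed[0] != mm_version:
            continue
        if best_patch is None or parsed[1] > best_patch:
            best_name, best_patch = name, parsed[1]
    return best_name
```

Versions are compared as tuples of integers so that, for example, 0.10 would sort after 0.9.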
The current version of this patch is for Mailman 2.1.11:
| Mailman 2.1.11 | - | htdig-2.1.11-0.1.patch |
Be sure to read the notes in the changes section below about the patch version you are going to use.
Patches for previous versions of Mailman are frozen at the highest revision level they reached while those previous versions of MM were current.
Information about older Mailman and patch versions is given in the history section below.
The following changes are introduced by version 0.1 of this patch:
The rate at which extra languages are being supported by Mailman exceeds my capacity to cope. From htdig-2.1.9-0.1.patch on, only the English language (default) templates are guaranteed to have been patched. If another language is used, the following files in that language's default template directory should be checked after installation of this patch and, if necessary, modified per the changes made to the en language templates:
You must have a working installation of htdig with htsearch available and installed on either the machine on which you are running Mailman or on another machine which has access to Mailman list archives via NFS or some similarly competent network file sharing scheme.
Regardless of how you configure things to provide Mailman's Web UI, if it gives normal operation of the /mailman/private CGI script for providing access to private list archives, it should also support access to htdig search results via the /mailman/mmsearch and /mailman/htdig CGI scripts.
Warning: This patch has been tested with HTdig 3.1.6 and no testing has been done with the Beta versions of HTdig 3.2 at the time of writing. You may or may not encounter problems/issues not described here if you use HTdig 3.2 beta or stable releases.
Prior to installing this patch you may also need to install the other MM patches. This will depend on the version of Mailman and the version of this patch you are dealing with. For version 0.3 of this patch for MM 2.1.3 the latest version of patch #444879,
indexing-2.1.3-x.y.patch
For any other version of this patch, details of its prerequisites are in the version of the INSTALL.htdig-mm file which is installed by that patch.
This integration enables use of the htdig (http://www.htdig.org) search engine for searching mail list archives produced by pipermail, Mailman's built-in archiver.
You can use htdig without applying these patches to Mailman but you may find it awkward to achieve some of the features offered by this patch.
The main features of the patch are:
a common base URL for both public and private archive access via htsearch results. This means that htdig indices are unaffected by changing an archive from private to public and vice versa. All access to archives via htdig is controlled by wrapped CGI scripts called htdig.py and mmsearch.py.
Note that Mailman's attachment scrubber creates a problem when it extracts attachments from messages as they are being archived because it embeds absolute URLs to what it has extracted in the archived messages. This can only be fixed by running $prefix/bin/arch to rebuild the list's archive from its mbox file after changing its archive from private to public or vice versa. This problem is generic and unrelated to the use of this patch. One way of resolving it is to use the Mailman-MHonArc integration patch #???????? available from
$prefix/templates/ directory hierarchy so that site, virtual host, list and language tailoring of them can be done.
Create your Mailman build directory in the normal way.
You can apply the patch to either a fresh expansion of the Mailman source distribution or the one you used to build a currently working Mailman installation.
Execute the following command in the Mailman build directory:
patch -p1 < path-to-htdig-2.m.n-x.y.patch
Follow the configure and make procedures for regular Mailman as given in the $build/INSTALL file.
Then follow the Mailman-htdig configuration instructions given below.
$build/INSTALL
Adds a reference to this file to the standard installation notes.
$prefix/bin/check_perms
Changed to set the permissions for access to $prefix/archives/private/<listname>/htdig/ subdirectories to 2770. This prevents access by 'other', as a security measure.
$prefix/Mailman/Archiver/HyperArch.py
The changes in this file set up the per list htdig stuff such as config files and add the search forms to the list TOC pages.
$prefix/Mailman/Queue/ArchRunner.py
The changes in this file rewrite a list's TOC page if, when archiving a new message for the list, the update time of the list's TOC page is after the last time that rundig was run. This is only of relevance when one of the remote_nightly_htdig series of cron scripts (see below) is being used.
The only deficiency with this approach is that if no message is sent to the list after rundig is run for the list the TOC page is not rewritten to reflect that rundig was run.
$prefix/Mailman/Cgi/private.py
There is a security hole in the released Mailman code via which private.py will serve files such as a list's archive pipermail.pck and files in the list's archive database sub-directory. This hole also allows access to the list's archive htdig sub-directory. Fixes for this are applied. As htdig.py (see below) is based on private.py the same security fix has been incorporated into it.
$build/Mailman/Defaults.py.in
Adds the default configuration variables needed to support the Mailman-htdig integration.
$build/cron/crontab.in.in
Adds the nightly_htdig cron script to the default crontab.
$build/configure
$build/configure.in
$build/Makefile.in
$build/cron/Makefile.in
$build/src/Makefile.in
$build/bin/Makefile.in
Changes to configuration and Makefiles used for installing Mailman
$build/INSTALL.htdig-mm and $build/INSTALL.htdig-mm.html
These contain the material you are reading.
$prefix/cgi-bin/htdig
$prefix/Mailman/Cgi/htdig.py
These are a CGI script and its wrapper, which is always on the path of URLs returned from searches of htdig indices. The script provides secure access to such URLs in the same way that $prefix/cgi-bin/private and $prefix/Mailman/Cgi/private.py do. Both htdig.py and private.py ensure private archives are kept private, applying the same criteria for permitting access. Additionally, htdig.py delivers material from public archives without demanding any authentication.
$prefix/cgi-bin/mmsearch
$prefix/Mailman/Cgi/mmsearch.py
These are a CGI script and its wrapper. The script acts as a security wrapper for htdig's htsearch CGI script. It will only run htsearch if the user is authorized to access a list's archive. It applies the same criteria as $prefix/Mailman/Cgi/private.py. In the case of local htdig operation, this script runs htsearch as a sub-process and returns its results. In the case of remote htdig operation, mmsearch runs htsearch on the remote machine via one or other of the CGI scripts remote_mmsearch and remote-mmsearch.
$prefix/Mailman/Cgi/remote_mmsearch
$prefix/Mailman/Cgi/remote-mmsearch
These are companion scripts of mmsearch for use with remote htdig operation. They are run by mmsearch via HTTP requests, and in turn run htsearch as a sub-process, returning the results it delivers.
$prefix/bin/blow_away_htdig
this is a utility script for removing per list htdig data, e.g. the config file and indices/db files. This is necessary when:
htdig-2.1.1-0.2.patch or later from an earlier patch version, and prior to running nightly_htdig
$prefix/cron/nightly_htdig
$prefix/cron/remote_nightly_htdig
$prefix/cron/remote_nightly_htdig_noshare
$prefix/cron/remote_nightly_htdig.pl
These scripts all do the same thing; they can be installed as a cron task and run regularly to invoke htdig's rundig script to update mailing list search indices. Only one of these scripts is used, the choice of which depending on your system configuration.
nightly_htdig is used where Mailman and htdig run on the same system.
The remote_... scripts are used where Mailman and htdig live on different systems. You choose which one suits your needs best:
remote_nightly_htdig uses the same python files on both systems, that is the same .py and .pyc files are accessed, and it hence depends on compatible bytecode between the Mailman system and htdig system. It also accesses Mailman data files and depends on compatibility of data file contents, for example pickled Python values. This should work OK if the same version of python is being run on both systems, even where the systems are heterogeneous, for example one is Sun/Solaris and the other is PC/Linux.
remote_nightly_htdig_noshare shares no Python files between the two systems. While it is still written in Python it acquires information from the file system using directory listings and stat operations.
remote_nightly_htdig.pl is a rewrite of remote_nightly_htdig_noshare in Perl. It is for use where the htdig system does not have Python available on it: in which case, shame on you.
$prefix/templates/en/TOC_htsearch.html
$prefix/templates/en/htdig_access_error.html
$prefix/templates/en/htdig_auth_failure.html
$prefix/templates/en/htdig_conf.txt
These are English language templates special to the htdig integration:
TOC_htsearch.html
htdig_access_error.html
htdig_auth_failure.html
htdig_conf.txt
htdig.conf files generated by the patched code.
Configuration of the Mailman-htdig integration is carried out on the Mailman side. While you must have to hand some information about your htdig installation, you should not have to tinker much with htdig for the integration to work.
Most of the configuration of the integration is done by values assigned to python variables in either $prefix/Mailman/Defaults.py or $prefix/Mailman/mm_cfg.py.
If you opt to run htdig on a different machine or under a different HTTP server to the one running the HTTP server which provides Mailman's Web UI you will also have to edit whichever of the patch's three htdig related cron scripts you opt to run (remote_nightly_htdig, remote_nightly_htdig_noshare, or remote_nightly_htdig.pl) to add a small amount of configuration information.
Be careful when editing configuration information in $prefix/Mailman/mm_cfg.py, which is the only Mailman config file you should be editing. Check, double check and then recheck before going ahead. If you get either variable names or their values wrong, a lot of confusion in the operation of both Mailman and htdig can result.
You (and others supporting you) can spend hours trying to identify problems and looking for non-existent bugs as a consequence of such editing errors. Expect to find errors in these instructions; compensate for them and tell me when you do (r.barrett at openinfo.co.uk).
Also do read the htdig documentation, release notes etc. This patch integrates a working htdig with htsearch available. These notes are about Mailman and integrating it with that working htdig. It is up to you to sort out the htdig end of things.
This is getting ahead of things but some of you may already be asking "What if I've already been using an older version of this patch and want to start afresh?", or "I want to change from local to remote htdig or vice versa?"
In these cases your friend will be the $prefix/bin/blow_away_htdig script. It removes existing htdig related stuff out of your Mailman installation to the extent that it was added by this patch and added to by the normal operation of pipermail and nightly_htdig. With that removed and a revised Mailman configuration, the patched code will start rebuilding the htdig data.
But before you get carried away with blow_away_htdig, read the rest of these notes.
This patch adds a number of default variables to the file $prefix/Mailman/Defaults.py that affect operation of the Mailman-htdig integration. These are in addition to the standard Mailman defaults in that file. If, in the light of what is said below, you decide any of these are incorrect, you can override them in $prefix/Mailman/mm_cfg.py [NOT IN Defaults.py! See the comments in Defaults.py for why].
By default the Mailman-htdig integration is NOT ENABLED by the installation of this patch; the default value of the USE_HTDIG variable in Defaults.py turns off the operation of the integration. You have to actively override that default in mm_cfg.py to turn on operation of the integration.
Once a list is created, changing most of these variables will have either no effect or a bad effect. You will need to run $prefix/bin/blow_away_htdig script and/or $prefix/bin/arch to rebuild the archive pages if you make significant changes to the Mailman-htdig integration configuration variables.
The install process will not overwrite an existing mm_cfg.py file so you can freely make changes to this file. If you are re-installing a later version of this patch you may have to change what is already configured in the existing file and, if necessary, add extra configuration variables to it.
Most of the Mailman-htdig control variables default to sensible values which you will not need to change, especially if you are using local htdig. The semantics of most variables apply to both local and remote htdig operation but with some the values assigned will depend on whether htdig is viewing things from the same or a remote machine.
The first two variables control what is indexed by htdig. The values assigned are both embedded in the HTML generated by pipermail in the list archives and added. Changing the values of these variables will mean that all previously generated HTML pages in list archives will be out of date and you will probably want to rebuild existing archives using $prefix/bin/arch:
ARCHIVE_INDEXING_ENABLE
Defines a string telling htdig that it should look at the following material when building its indices.
Default: ARCHIVE_INDEXING_ENABLE = '<!--/htdig_noindex-->'
ARCHIVE_INDEXING_DISABLE
Defines a string telling htdig that it should not look at the following material when building its indices.
Default: ARCHIVE_INDEXING_DISABLE = '<!--htdig_noindex-->'
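As an illustration of how the two markers bracket material that htdig should skip, here is a small sketch. The helper name and the sample page content are hypothetical; only the two marker strings come from the defaults above, and the patched pipermail code decides for itself which parts of a page are bracketed.

```python
# Default marker strings quoted from the text above.
ARCHIVE_INDEXING_ENABLE = '<!--/htdig_noindex-->'
ARCHIVE_INDEXING_DISABLE = '<!--htdig_noindex-->'


def hide_from_htdig(html_fragment):
    """Wrap a fragment so that htdig skips it when building its indices."""
    return ARCHIVE_INDEXING_DISABLE + html_fragment + ARCHIVE_INDEXING_ENABLE


# Hypothetical page: the navigation link is hidden, the message text is not.
page = hide_from_htdig('<a href="index.html">Index</a>') + '<pre>Message text</pre>'
```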
USE_HTDIG
Semantics: 0 - don't use integrated htdig, 1 - use it
Turns Mailman-htdig integration on or off.
Default: USE_HTDIG = 0
Notes:
When USE_HTDIG is turned on, the patched code in Mailman will start adding htdig stuff for any archiving-enabled mail lists as new posts for each list are handled by Mailman. Until a new post is made after enabling USE_HTDIG, an existing mail list's archive will not be htdig searchable. When the new post is handled:
Even with this done, htdig searches only become available when htdig indices are constructed. This is done when one or other of the patch's htdig related cron scripts are run (nightly_htdig, remote_nightly_htdig, remote_nightly_htdig_noshare, or remote_nightly_htdig.pl, depending on how you configure your system). These can be run from the command line ahead of their scheduled cron time to get htdig searches operational.
Turning USE_HTDIG off will not remove htdig indices or search forms from existing archive-enabled lists. It will however stop htdig features from being added to newly created lists. If you want to eliminate htdig from your existing lists then use the $prefix/bin/blow_away_htdig script.
HTDIG_FILES_URL
This is the URL of the directory containing various HTML and Graphics files installed by htdig; files such as buttonr.gif, buttonl.gif and button1-10.gif. The URL must end with a '/'.
Default: HTDIG_FILES_URL = '/htdig/'
The default assumes the HTTP servers providing access to htdig and to Mailman's web UI are on the same machine and that a symbolic link called 'htdig' has been put into your HTTP server's top level HTML directory, pointing to the directory your htdig install has put the actual files into; this link is often to /usr/share/htdig. This value will depend on your htdig installation decisions and HTTP server's configuration files (typically /etc/httpd/httpd.conf on a late model Apache installation), i.e. the Alias through which the link to the htdig files is reached.
HTDIG_CONF_LINK_DIR
This is the name of a directory in which links to list specific htdig config files are placed.
Default: HTDIG_CONF_LINK_DIR = os.path.join(VAR_PREFIX, 'archives', 'htdig')
The VAR_PREFIX of the default is resolved to an actual file system path when Mailman's 'make install' is run. The 'os.path.join' creates a full file system path by gluing together the three pieces when Mailman is run. This definition puts the directory alongside the default PUBLIC_ARCHIVE_FILE_DIR and PRIVATE_ARCHIVE_FILE_DIR. Unless you are changing the value of these variables you probably do not want to change HTDIG_CONF_LINK_DIR.
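To make the 'gluing together' concrete, the sketch below evaluates the default with a made-up VAR_PREFIX. The value '/var/mailman' is purely an assumption for illustration; your installation's value is whatever was fixed at 'make install' time.

```python
import os.path

# Hypothetical value; the real VAR_PREFIX is fixed when 'make install' is run.
VAR_PREFIX = '/var/mailman'

HTDIG_CONF_LINK_DIR = os.path.join(VAR_PREFIX, 'archives', 'htdig')
PRIVATE_ARCHIVE_FILE_DIR = os.path.join(VAR_PREFIX, 'archives', 'private')

# Both directories share the same parent, the 'archives' directory.
same_parent = (os.path.dirname(HTDIG_CONF_LINK_DIR)
               == os.path.dirname(PRIVATE_ARCHIVE_FILE_DIR))
```

With this made-up prefix on a Unix-like system, HTDIG_CONF_LINK_DIR comes out as /var/mailman/archives/htdig.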
HTDIG_RUNDIG_PATH
This is the path in your file system to the rundig shell script that is installed as part of htdig. This tells one or other of the patch's htdig related cron scripts (nightly_htdig and remote_nightly_htdig) where to find rundig in order that they can execute it.
Default: HTDIG_RUNDIG_PATH = '/usr/local/bin/rundig'
HTDIG_HTSEARCH_PATH
This is the file path to the htsearch program in the htdig package.
Default: HTDIG_HTSEARCH_PATH = '/usr/local/bin/rundig'
This value will depend on your htdig installation decisions. This path is used by either the mmsearch CGI script (for local htdig) or the remote_mmsearch/remote-mmsearch CGI script (for remote htdig) to execute htsearch as a sub-process.
HTDIG_EXCLUDED_URLS
See htdig's configuration file documentation. The value of this MM variable is inserted into per-list htdig.conf files when they are created, as the value of an htdig exclude_urls directive. But if an exclusion in this value would prevent indexing of URLs for accessing the htdig.py CGI wrapper then that exclusion is omitted from that per-list htdig.conf file.
Default: HTDIG_EXCLUDED_URLS = '/cgi-bin/ .cgi'
Note: these are the same as the htdig 3.1.6 default values.
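The omission rule described for HTDIG_EXCLUDED_URLS can be sketched as follows. This is not the patch's code: the function name and the example URLs are hypothetical, and it assumes that htdig treats each space-separated pattern as a plain substring to look for in a URL.

```python
def effective_exclusions(excluded_urls, htdig_cgi_url):
    """Drop exclusion patterns that would match the htdig.py wrapper URL."""
    kept = [pattern for pattern in excluded_urls.split()
            if pattern not in htdig_cgi_url]
    return ' '.join(kept)


default = '/cgi-bin/ .cgi'

# Wrapper reached under /mailman/: nothing in the default gets in the way.
unchanged = effective_exclusions(default, 'http://lists.example.com/mailman/htdig/')

# Wrapper reached under /cgi-bin/: that pattern has to be left out.
reduced = effective_exclusions(default, 'http://lists.example.com/cgi-bin/mailman/htdig/')
```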
REMOTE_HTDIG
Semantics: 0 - htdig runs on local machine, 1 - on remote machine
Says whether htdig is going to be run on the same machine as Mailman or on another machine.
Default: REMOTE_HTDIG = 0
REMOTE_PRIVATE_ARCHIVE_FILE_DIR
Only relevant if REMOTE_HTDIG = 1. It is the file system path to the directory in which Mailman stores private archives, as seen by the machine running htdig.
Default: REMOTE_PRIVATE_ARCHIVE_FILE_DIR = os.path.join(VAR_PREFIX, 'archives', 'private')
The VAR_PREFIX of the default is resolved to an actual file system path when Mailman's 'make install' is run. The 'os.path.join' creates a full file system path by gluing together the three pieces when Mailman is run. If you assign a value to this in mm_cfg.py, just put the relevant explicit file system path in.
REMOTE_MMSEARCH_URL
Only relevant if REMOTE_HTDIG = 1. It is the URL on the htdig machine through which whichever of the remote_mmsearch/remote-mmsearch CGI scripts you have opted to use can be reached via an HTTP request.
Default: REMOTE_MMSEARCH_URL = '/cgi-bin/remote-mmsearch'
HTDIG_STRICT_FILE_PERM
Semantics: 0 - 'other' access allowed, 1 - 'other' access denied
Says whether 'other' has access permissions for per-list $prefix/archives/private/<listname>/htdig/ directories. For local htdig operation such access is not required and is a security hole if allowed. Such access may be needed if remote htdig is used; see notes on "Apache". $prefix/bin/check_perms should be run after changing the value of this variable in mm_cfg.py to update access permissions of existing directories.
Default: HTDIG_STRICT_FILE_PERM = 1
HTDIG_EXTRAS
You can assign a string value to this config variable and that string will be included in all of your site's list specific htdig configuration files when they are created. The value of the string can be any attribute declarations as defined at http://www.htdig.org/confindex.html.
Be cautious in what you do with this. Most sites will not need to use this at all. But if you have some idiosyncratic htdig installation it might help overcome problems in integrating with Mailman. If you think you need to use it I suggest:
HTDIG_EXTRAS in $prefix/Mailman/mm_cfg.py
$prefix/archives/private/<listname>/htdig/<listname>.conf.
HTDIG_EXTRAS from $prefix/Mailman/Defaults.py has been inserted. This value is only an htdig comment and does nothing.
HTDIG_EXTRAS in $prefix/Mailman/mm_cfg.py will make sense in the context of the rest of the htdig conf file's contents.
Python scripts added by this patch (nightly_htdig and its relatives) run the htdig rundig script identified by HTDIG_RUNDIG_PATH to build search indices for Mailman archives. Code added by this patch generates per-list htdig configuration files which are passed as a parameter to the rundig script. These configuration files identify a list specific directory ($prefix/archives/private/<listname>/htdig) in which list specific data files generated by and used by htdig are to be placed.
However, the rundig script identified by HTDIG_RUNDIG_PATH may attempt to generate some files in htdig's COMMON_DIR when it is first run by nightly_htdig; the files concerned are likely to be root2word.db, word2root.db, synonyms.db and possibly some others generated by htdig's htfuzzy program. The standard rundig script generates these files selectively if they do not already exist. Depending on how you have installed htdig and how the rundig script is first run, there may be a permissions problem when nightly_htdig executes rundig under the mailman UID if it tries to generate these files.
You may need to either give the mailman UID write permission over htdig's COMMON_DIR or, before the nightly_htdig script is first run, run htdig's htfuzzy executable with a sufficiently privileged UID in the manner that the rundig script would run htfuzzy, to create any necessary files in COMMON_DIR.
See htdig's documentation for further information on this topic.
When remote_mmsearch or remote-mmsearch scripts are used as part of a remote htdig strategy you may encounter a file permissions problem. This is because these scripts, which in turn execute htsearch as a sub-process, will be run with UID and GID of the remote Apache server.
By default, the permissions of the per-list $prefix/archives/private/<listname>/htdig/ directories only allow access for the mailman UID and GID and hence the remotely executed htsearch will be unable to access them.
If this problem is encountered, then you will have to use the HTDIG_STRICT_FILE_PERM configuration variable to say "open up the permissions" before running $prefix/bin/check_perms. You can then use a RewriteRule or similar in the Apache server's httpd.conf file to restrict access to $prefix/archives/private/<listname>/htdig/ directories via the web server.
This configuration is for when you are running Mailman, htdig, the HTTP server used to provide Mailman's web UI and htdig's htsearch CGI script, on the same machine.
You will need to:
HTDIG_RUNDIG_PATH to file $prefix/Mailman/mm_cfg.py.
HTDIG_HTSEARCH_PATH to file $prefix/Mailman/mm_cfg.py.
USE_HTDIG with the value 1 to $prefix/Mailman/mm_cfg.py.
USE_HTDIG = 1
If necessary you can override the values of any of the other configuration variables in file $prefix/Mailman/mm_cfg.py.
In particular you might need to change the HTDIG_FILES_URL variable from its default. This URL can be just the path i.e. absolute URL on the same server as that which serves Mailman's Web UI, or a full URL identifying the scheme (http), server, server port and path, for example http://mailer.yourdomain.tld:8080/htdig/
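Put together, the local settings might look like the fragment below. This is only a sketch: the two paths are illustrative and must match where your own htdig installation put rundig and htsearch, and HTDIG_FILES_URL is shown with its documented default.

```python
# Fragment for $prefix/Mailman/mm_cfg.py (local htdig).
# In a real mm_cfg.py these lines follow the existing "from Defaults import *".
# The two paths are illustrative; use the locations of your own htdig install.
HTDIG_RUNDIG_PATH = '/usr/local/bin/rundig'
HTDIG_HTSEARCH_PATH = '/usr/local/bin/htsearch'
HTDIG_FILES_URL = '/htdig/'   # must end with '/'
USE_HTDIG = 1                 # the integration stays off unless this is set
```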
This configuration is for when you are running htdig, and an HTTP server providing access to htsearch via remote_mmsearch or remote-mmsearch, on a different machine from the one that is running Mailman.
For this configuration to work, htdig's programs, both those run from command lines such as rundig and those run via CGI such as htsearch, must be able to see Mailman archives through NFS. In the examples below we'll assume that /mnt/mailman-archives on the htdig machine maps to $prefix/mailman/archives on the Mailman machine.
You should also arrange for the mailman UID and its GID to be common to both machines. Remember that when rundig is called on the htdig machine to produce search indices for each list it will be trying to write those files via NFS in Mailman's archive area and will thus need to run with an appropriate identity and permissions.
The differences between the local and remote configuration are:
You will need to:
HTDIG_HTSEARCH_PATH to file $prefix/Mailman/mm_cfg.py. This is path to htdig's htsearch on the remote machine running htdig. For example:
HTDIG_HTSEARCH_PATH = '/usr/local/bin/htsearch'
HTDIG_RUNDIG_PATH to file $prefix/Mailman/mm_cfg.py. This is path to rundig on the remote machine running htdig. For example:
HTDIG_RUNDIG_PATH = '/usr/local/bin/rundig'
REMOTE_MMSEARCH_URL to file $prefix/Mailman/mm_cfg.py. This must be a full URL referring to one of Mailman's remote_mmsearch/remote-mmsearch CGI scripts on the remote htdig machine, as seen from the Mailman local machine. For example:
REMOTE_MMSEARCH_URL = 'http://htdiggy.your.com/cgi-bin/remote-mmsearch'
HTDIG_FILES_URL to file $prefix/Mailman/mm_cfg.py. This must be a full URL referring to the directory containing htdig files on the remote htdig machine as seen from the Mailman local machine. This URL must end with a '/'. For example:
HTDIG_FILES_URL = 'http://htdiggy.your.com/htdig/'
REMOTE_PRIVATE_ARCHIVE_FILE_DIR to $prefix/Mailman/mm_cfg.py. This must be the absolute file system path to the directory in which Mailman stores private archives as seen by the machine running htdig. For example:
REMOTE_PRIVATE_ARCHIVE_FILE_DIR = '/mnt/mailman-archives/private'
USE_HTDIG with the value 1 to $prefix/Mailman/mm_cfg.py.
USE_HTDIG = 1
REMOTE_HTDIG with the value 1 to $prefix/Mailman/mm_cfg.py.
REMOTE_HTDIG = 1
HTDIG_STRICT_FILE_PERM with the value 0 to $prefix/Mailman/mm_cfg.py. This may be needed if the UID/GID under which Apache on the htdig server runs the remote mmsearch script is not mailman or in the mailman group. This change will open up a security hole which you may want to consider plugging; see under the heading "Apache permissions" for more details.
HTDIG_STRICT_FILE_PERM = 0
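Since mistakes in these values are easy to make, a quick sanity check of the constraints stated above may help. The sketch below is a standalone helper written for Python 3, not something shipped with the patch or usable inside Mailman itself; the function name and the dictionary layout are assumptions, and the example values are the ones used in this section.

```python
from urllib.parse import urlparse


def check_remote_settings(cfg):
    """Return a list of problems found in a dict of remote htdig settings."""
    problems = []
    if cfg.get('USE_HTDIG') != 1:
        problems.append('USE_HTDIG must be 1')
    if cfg.get('REMOTE_HTDIG') != 1:
        problems.append('REMOTE_HTDIG must be 1')
    for name in ('REMOTE_MMSEARCH_URL', 'HTDIG_FILES_URL'):
        parsed = urlparse(cfg.get(name, ''))
        if not (parsed.scheme and parsed.netloc):
            problems.append(name + ' must be a full URL')
    if not cfg.get('HTDIG_FILES_URL', '').endswith('/'):
        problems.append("HTDIG_FILES_URL must end with '/'")
    if not cfg.get('REMOTE_PRIVATE_ARCHIVE_FILE_DIR', '').startswith('/'):
        problems.append('REMOTE_PRIVATE_ARCHIVE_FILE_DIR must be an absolute path')
    return problems


example = {
    'USE_HTDIG': 1,
    'REMOTE_HTDIG': 1,
    'REMOTE_MMSEARCH_URL': 'http://htdiggy.your.com/cgi-bin/remote-mmsearch',
    'HTDIG_FILES_URL': 'http://htdiggy.your.com/htdig/',
    'REMOTE_PRIVATE_ARCHIVE_FILE_DIR': '/mnt/mailman-archives/private',
}
```

An empty list returned for your own values means only that these format rules hold; it says nothing about whether the paths and URLs actually exist.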
You have to choose one of the two remote mmsearch scripts found in $prefix/Mailman/Cgi - remote-mmsearch (a Perl script) and remote_mmsearch (a Python script) - to use and transfer it to the htdig machine. You need to add this script to the directory in which the web server on the htdig machine expects to find CGI scripts. Having transferred the script to your htdig machine you will need to use a text editor to set the values of four configuration variables below the heading "Edit the following configuration variables to suit your installation", namely:
MAILTO - this is the default mail address for your installation.
VALID_IP_LIST - this is a list of IP numbers from which the script should accept an HTTP request. Normally this should be set to the IP number of your machine running Mailman. If the list is empty the script will accept HTTP requests from any machine and be vulnerable to the exploit described under the heading "Private archive security problem prior to htdig-2.1.1-0.2.patch version" above.
HTDIG_CONF_LINK_DIR - this is the file path to the directory in which links to list specific htdig config files are placed, as viewed from the remote machine running htdig.
HTDIG_HTSEARCH_PATH - this is the file path to the htsearch program in the htdig package as viewed from the remote machine running htdig.
See "What is Installed by the Patch" for an explanation of the differences between these remote mmsearch scripts which both do the same job: being a security wrapper around htdig's htsearch program to restrict searching of a list's archive indexes to users authorised to see the contents of that archive.
Note: You may need to change the '#!' on the first line of whichever of the remote-mmsearch (Perl) and remote_mmsearch (Python) scripts you opt for so that the correct interpreter is used for running the script on the remote htdig machine. You may also need to verify the supporting packages/modules used by the selected script are installed on that system.
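The effect of VALID_IP_LIST, as described above, can be pictured with the following sketch. It is not taken from either remote mmsearch script; the function name is hypothetical and the addresses are documentation-only examples.

```python
def request_allowed(remote_addr, valid_ip_list):
    """Accept a request if the list is empty or names the caller's address."""
    if not valid_ip_list:
        # Empty list: any machine is accepted, which the text warns against.
        return True
    return remote_addr in valid_ip_list


# Hypothetical address of the machine running Mailman.
VALID_IP_LIST = ['192.0.2.10']
```

Listing only the Mailman machine's address means that requests arriving from anywhere else are refused.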
You have to choose one of the three remote_nightly_htdig scripts found in $prefix/cron - remote_nightly_htdig, remote_nightly_htdig_noshare and remote_nightly_htdig.pl - and transfer it to the htdig machine. See above under heading "What is Installed by the Patch" for an explanation of the differences between these scripts, which all do the same basic job. You should add the script to the crontab for the mailman UID on the htdig machine. But first you need to edit the selected script to add some configuration information. What has to be added depends on which script you opt to use. In each case the variables concerned are declared near the top of the script and you just have to enter the appropriate values:
remote_nightly_htdig - you only need to set the value of the python variable MAILMAN_PATH to be the directory $prefix as seen from the htdig machine. The whole Mailman installation must be accessible via NFS in order to use this script.
remote_nightly_htdig_noshare - you need to copy the values for the following configuration variables from either $prefix/Mailman/mm_cfg.py or $prefix/Mailman/Defaults.py to the script: REMOTE_PRIVATE_ARCHIVE_FILE_DIR, HTDIG_RUNDIG_PATH. The variables declared in remote_nightly_htdig_noshare use the same names. This script only requires that the archives directory of the Mailman installation be accessible via NFS.
remote_nightly_htdig.pl - you need to copy the values for the following configuration variables from either $prefix/Mailman/mm_cfg.py or $prefix/Mailman/Defaults.py to the script: REMOTE_PRIVATE_ARCHIVE_FILE_DIR, HTDIG_RUNDIG_PATH. Being a Perl script, the variables in remote_nightly_htdig.pl use the same names but prefixed with the '$' character. This script only requires that the archives directory of the Mailman installation be accessible via NFS.
Note: You may need to change the '#!' on the first line of whichever of these scripts you opt for so that the correct interpreter is used for running the script on the remote htdig machine. You may also need to verify the supporting packages/modules used by the selected script are installed on that system.
As with the nightly_htdig script when running with local htdig, these scripts can be run from the command line using the mailman UID in order to get htdig to construct an initial set of indices.
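The choice between the cron scripts can be summarised as a small decision function. This is only a reading aid based on the descriptions in these notes; the function and its parameter names are hypothetical, and the order of the tests reflects one reasonable interpretation.

```python
def choose_cron_script(remote_htdig, python_on_htdig_host=True,
                       whole_install_shared=False):
    """Suggest which cron script fits a given setup."""
    if not remote_htdig:
        # Mailman and htdig run on the same system.
        return 'nightly_htdig'
    if not python_on_htdig_host:
        # No Python on the htdig system: only the Perl rewrite can run there.
        return 'remote_nightly_htdig.pl'
    if whole_install_shared:
        # All of $prefix is visible via NFS and the Python versions match.
        return 'remote_nightly_htdig'
    # Only the archives directory is shared.
    return 'remote_nightly_htdig_noshare'
```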
$prefix/bin/arch. This will embed the ARCHIVE_INDEXING_ENABLE and ARCHIVE_INDEXING_DISABLE in the regenerated archive pages and, after nightly_htdig has been run, give improved search results.
$prefix/bin/blow_away_htdig script to remove all existing per list htdig config files and htdig indices/db files.
nightly_htdig script from the command line to generate a new set of per list htdig search indices.
If you change the version of htdig you run, you may find that the indices built with the earlier version are not compatible with the newer version of htdig's programs. In that case do the following:
$prefix/bin/blow_away_htdig script with the -i flag to remove all existing per list htdig indices/db files.
nightly_htdig script from the command line to generate new sets of per-list htdig search indices.
If you change the addressing scheme of the web_page_url for a list to or from http then you will need to rebuild the list's htdig configuration file(s) and the related htdig indices. Do the following:
Run the $prefix/bin/blow_away_htdig script to remove all existing per-list htdig material for the list(s) concerned.
Run the nightly_htdig script from the command line to generate new sets of per-list htdig search indices.
If you have just turned USE_HTDIG on or just used $prefix/bin/blow_away_htdig (without the -i flag) there will be no per-list htdig information saved in the archives.
When the first post to each archive-enabled list is archived by pipermail, the per-list htdig config file will be constructed and some directories and links added to your Mailman archive directories. The htdig search form will be added to the list's TOC page.
However, until one of the nightly_htdig scripts is run no htdig indices will be constructed. You can either wait for the script to run as a cron job or run it (while using the mailman UID) from the command line.
This patch is hopefully the final step in closing security holes in archive access.
In version htdig-2.1.3-0.1.patch, htdig.py was rebased on the standard MM release's private.py which had moved on since the snapshot of it used as the basis for htdig.py was originally taken. Among other things, htdig.py had been modified to prevent access to some files in list archive directories such as a list's archive pipermail.pck and files in the list's archive database sub-directory.
This rebasing action re-introduced to htdig.py the security holes, still extant in private.py despite it being later code, via which private.py would serve files such as a list's archive pipermail.pck and files in the list's archive database sub-directory.
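The kind of check described above can be sketched as follows. This is a minimal illustration only, not the code in htdig.py or private.py: the function name is_servable and the assumption that the requested path is given relative to the list's archive directory are hypothetical.

```python
import posixpath


def is_servable(relative_path):
    """Decide whether a path inside a list's archive directory may be served.

    Hypothetical sketch: refuses the archive's pipermail.pck file, anything
    under the 'database' sub-directory, and any path that tries to climb
    out of the archive directory.
    """
    # Normalise so that './database/x' and 'a/../database/x' are caught too.
    parts = [p for p in posixpath.normpath(relative_path).split("/") if p]
    if ".." in parts:
        return False            # attempt to escape the archive directory
    if parts and parts[0] == "database":
        return False            # files in the archive database sub-directory
    if parts and parts[-1] == "pipermail.pck":
        return False            # the archive's pickle file
    return True
```

Normalising the path before testing it is what stops a request such as 2008-July/../pipermail.pck from slipping past a check that only looks at the raw string.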
The permissions on these files and directories mean that they are inaccessible via the web server using /pipermail/ URIs if a list's archive is public.
Additionally, check_perms is now modified so that the list archive htdig subdirectory permissions are set to 2770 by default. Prior to htdig-2.1.1-0.2.patch, this could not be done as the htsearch script, being run with the uid and gid of the Apache server, could then not gain access to files in the htdig subdirectories. But, since the introduction of the mmsearch script, which runs with the mailman gid and spawns htsearch, it can. This prevents access to the list archive htdig subdirectories via /pipermail/ URIs. Up until htdig-2.1.3-0.2.patch this could only be achieved by using a RewriteRule or similar in the Apache server's httpd.conf file.
RewriteRule or similar in the Apache server's httpd.conf file for protection.
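To make the 2770 setting concrete, here is a small sketch that decodes a directory's mode bits and reports whether they match it. It is not part of check_perms; the name describe_mode and the dictionary keys are assumptions made for illustration.

```python
import stat

# setgid bit plus rwx for owner and group, and nothing for "other"
STRICT_HTDIG_DIR_MODE = 0o2770


def describe_mode(st_mode):
    """Summarise the permission bits of an st_mode value from os.stat()."""
    perms = stat.S_IMODE(st_mode)   # strip the file-type bits
    return {
        "setgid": bool(perms & stat.S_ISGID),
        "group_rwx": (perms & stat.S_IRWXG) == stat.S_IRWXG,
        "other_access": bool(perms & stat.S_IRWXO),
        "is_strict": perms == STRICT_HTDIG_DIR_MODE,
    }
```

Calling describe_mode(os.stat(path).st_mode) on a list's htdig subdirectory shows whether "other" users, such as the Apache server's uid when it is not in the mailman group, have any access. With 2770 they have none, which is why htsearch has to be spawned by something running with the mailman gid.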
The solution to this problem has been superseded in htdig-2.1.3-0.3.patch, which introduced the HTDIG_STRICT_FILE_PERM Mailman config variable as part of dealing with the permissions issue of htsearch access to per-list htdig directories when operating with remote htdig. See under the "Apache" heading above.
Versions of the Mailman-htdig integration patch installed by versions of this patch prior to htdig-2.1.1-0.2.patch allow a security exploit which can expose information, held in the per-list search indexes of private list archives, to unauthorised users.
Via the exploit an unauthorized user can submit a search query to htdig's htsearch CGI program without having been authenticated as a user allowed to access the list archive concerned. The results, returned in good faith by htsearch, will expose some information that the user is not entitled to see.
However, the security breakdown is not complete. Attempts to follow links returned by htsearch, which go via the htdig CGI script installed by this patch, will be blocked if the user is not authorized to access the list archive.
With htdig-2.1.1-0.2.patch and later versions of the patch:
If you are upgrading a Mailman installation that has an earlier version of the Mailman-htdig integration patch than that installed by htdig-2.1.1-0.2.patch or later, you need to make some changes to that installation:
Remove the symlink identified by the HTDIG_MAILMAN_LINK Mailman configuration variable. This link previously gave htdig programs access to per-list htdig configuration files. This is now done by other means and the symlink allows a security exploit that prejudices the privacy of list archives.
Remove the HTDIG_MAILMAN_LINK Mailman configuration variable from the $prefix/Mailman/mm_cfg.py file.
These changes are in addition to the normal installation instructions given below. Having configured and installed the newly patched version of Mailman you must:
Run $prefix/bin/blow_away_htdig with the -c option to rebuild per-list htdig conf files and delete existing per-list search indexes.
Run the $prefix/cron/nightly_htdig script from the command line to rebuild per-list search indexes using the revised per-list htdig conf files just created by blow_away_htdig.
If you install htdig from the htdig-3.2.0 binary rpm of RH7.1/2 Binary CD 1 of 2 you also have to install the htdig-web-3.2.0 binary rpm. This may be from RH 7.1/2 Binary CD 2 of 2 or CD 1 of 2 depending on whether you are using actual CDs or downloaded CD images.
htdig's graphics files must be accessible via your web server and the Mailman configuration variable HTDIG_FILES_URL set up accordingly. Depending on how you install htdig and Apache you may need to add Alias and/or ScriptAlias directives to your Apache configuration file to make the htdig components accessible. Check the Apache and htdig documentation.
| Version of patch | Version of Mailman |
|---|---|
htdig-2.1.11-0.1.patch
|
Mailman 2.1.11 |
htdig-2.1.10-0.1.patch
|
Mailman 2.1.10 |
htdig-2.1.9-0.1.patch
|
Mailman 2.1.9 |
htdig-2.1.7-0.1.patch
|
Mailman 2.1.7 and 2.1.8 |
htdig-2.1.6-0.1.patch
|
Mailman 2.1.6 |
htdig-2.1.4-0.1.patch
|
Mailman 2.1.4 |
htdig-2.1.3-0.5.patch
|
Mailman 2.1.3 |
htdig-2.1.3-0.4.patch
|
Mailman 2.1.3 |
htdig-2.1.3-0.3.patch
|
Mailman 2.1.3 |
htdig-2.1.3-0.2.patch
|
Mailman 2.1.3 |
htdig-2.1.3-0.1.patch
|
Mailman 2.1.3 |
htdig-2.1.2-0.4.patch
|
Mailman 2.1.2 |
htdig-2.1.2-0.3.patch
|
Mailman 2.1.2 |
htdig-2.1.2-0.2.patch
|
Mailman 2.1.2 |
htdig-2.1.2-0.1.patch
|
Mailman 2.1.2 |
htdig-2.1.1-0.5.patch
|
Mailman 2.1.1 |
htdig-2.1.1-0.4.patch
|
Mailman 2.1.1 |
htdig-2.1.1-0.3.patch
|
Mailman 2.1.1 |
htdig-2.1.1-0.2.patch
|
Mailman 2.1.1 |
htdig-2.1.1-0.1.patch
|
Mailman 2.1.1 |
htdig-2.1-0.3.patch
|
Mailman 2.1 |
htdig-2.1-0.2.patch
|
Mailman 2.1 |
htdig-2.1-0.1.patch
|
Mailman 2.1 |
htdig-2.1b6-0.1.patch
|
Mailman 2.1b6 |
htdig-2.1b5-0.1.patch
|
Mailman 2.1b5 |
htdig-2.1b4-0.1.patch
|
Mailman 2.1b4 |
htdig-2.1b3-0.3.patch
|
Mailman 2.1b3 |
htdig-2.1b3-0.2.patch
|
Mailman 2.1b3 |
htdig-2.1b3-0.1.patch
|
Mailman 2.1b3 |
htdig-2.1b2-0.1.patch
|
Mailman 2.1b2 |
htdig-2.0.13-0.2.patch
|
Mailman 2.0.13 |
htdig-2.0.13-0.1.patch
|
Mailman 2.0.13 |
htdig-2.0.12-0.1.patch
|
Mailman 2.0.12 |
htdig-2.0.11-0.1.patch
|
Mailman 2.0.11 |
htdig-2.0.10-0.2.patch
|
Mailman 2.0.10 |
htdig-2.0.10-0.1.patch
|
Mailman 2.0.10 |
htdig-2.0.9-0.1.patch
|
Mailman 2.0.9 |
htdig-2.0.8-0.1.patch
|
Mailman 2.0.8, 2.0.7, 2.0.6 and probably 2.0.3, 2.0.4 and 2.0.5 |
htdig-2.1.11-0.1.patch:
htdig-2.1.10-0.1.patch:
htdig-2.1.9-0.1.patch:
htdig-2.1.7-0.1.patch:
htdig-2.1.6-0.1.patch:
Templates in $build/templates/<lang>/ for the following languages are NOT modified by this patch or by its precursor indexing patch: ca, eu, sr, sv
The following files in a language's default template directory should be checked and if necessary modified per the changes made to the en language templates after installation of this patch if that other language is used:
templates/<lang>/archidxfoot.html
templates/<lang>/archidxhead.html
templates/<lang>/archtoc.html
templates/<lang>/archtocentry.html
templates/<lang>/archtocnombox.html
templates/<lang>/article.html
htdig-2.1.4-0.1.patch:
htdig.html from per-language directories under $build/templates, with the exception of the default templates/en/ directory, that were present in previous versions of this patch.
htdig-2.1.3-0.5.patch:
htdig.py and private.py; the security changes introduced by htdig-2.1.3-0.2 patch to these scripts incorrectly blocked access to the <listname>.mbox/<listname>.mbox file. The 0.5 revision of the patch corrects this error. This problem and a suggested fix were pointed out to me in a private email by Stephan Berndts <stb-mm at spline.de>.
htdig-2.1.3-0.4.patch:
htdig.py and introduced htdig.html templates. The changes mean that if the user is challenged for authentication, when the credentials are submitted and accepted, the URL requested which led to the challenge is then presented.
htdig-2.1.3-0.3.patch:
$prefix/bin/check_perms and $prefix/Mailman/Archiver/HyperArch.py to improve handling of htdig subdirectory permissions if remote htdig is used. End result is the same as with prior patch version in the case of local htdig.
Introduced the HTDIG_STRICT_FILE_PERM Mailman config variable as part of dealing with the permissions issue of htsearch access to per-list htdig directories when operating with remote htdig. See under the "Apache" heading above.
htdig-2.1.3-0.2.patch:
htdig-2.1.3-0.2 patch".
htdig-2.1.3-0.1.patch:
htdig-2.1.2-0.4.patch:
htdig-2.1.2-0.3.patch:
HyperArch.py so htdig-related code uses the quick_maketext() function instead of the Utils.maketext() function.
htdig-2.1.2-0.2.patch:
htdig-2.1.1-0.5.patch and carried forward into htdig-2.1.2-0.1.patch
htdig-2.1.2-0.1.patch:
htdig-2.1.1-0.5.patch:
/cgi-bin/ and .cgi. If MM is configured so that the URL for accessing the htdig.py cgi wrapper matches these excluded URLs (for instance by running ./configure with --with-cgi-ext=".cgi") then nothing gets indexed by rundig. The revised patch:
HTDIG_EXCLUDED_URLS which defaults to the old hard-wired value.
htdig.conf file a check is made against HTDIG_EXCLUDED_URLS and if anything in it would prevent indexing of the URL for accessing the htdig.py cgi wrapper for that list, it is omitted from the exclude_urls directive in that htdig.conf file.
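The check described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch's own code: the function names are hypothetical, HTDIG_EXCLUDED_URLS is assumed to be a whitespace-separated string such as "/cgi-bin/ .cgi", and an exclude pattern is assumed to match when it occurs anywhere in a URL.

```python
def usable_excludes(excluded_urls, htdig_cgi_url):
    """Return the exclude patterns that would NOT block the wrapper URL.

    excluded_urls -- whitespace-separated patterns (assumed format)
    htdig_cgi_url -- URL through which the list's htdig.py wrapper is reached
    """
    patterns = excluded_urls.split()
    # Keep a pattern only if it does not occur in the wrapper URL.
    return [p for p in patterns if p not in htdig_cgi_url]


def exclude_urls_directive(excluded_urls, htdig_cgi_url):
    """Build an exclude_urls line for a per-list htdig.conf file.

    An empty value is emitted when every pattern had to be dropped; whether
    the patch writes the line in that case is an assumption of this sketch.
    """
    kept = usable_excludes(excluded_urls, htdig_cgi_url)
    return ("exclude_urls: " + " ".join(kept)).rstrip()
```

With a wrapper URL such as http://lists.example.com/mailman/htdig.cgi/mylist/ (a made-up address), the ".cgi" pattern is dropped and only "/cgi-bin/" remains in the directive, so rundig is no longer prevented from indexing the list's pages.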
htdig-2.1.1-0.4.patch:
mmsearch.py and its remote kin remote-mmsearch and mm_search were overly restrictive on the form fields they were willing to accept. Extended the list so that multi-page search results worked.
htdig-2.1.1-0.3.patch:
mmsearch.py. This will only show if there is a problem with mmsearch running the htsearch program.
htdig-2.1.1-0.2.patch:
htsearch results page without the user being authorised to access the list. Any attempts to follow links on the results page were blocked correctly by $prefix/Mailman/htdig.py but there was leakage of private information from the list's search indexes on the page returned by htdig's htsearch CGI program. The exploit is removed by this patch's revisions. The following sections describe the problem, the solution and special actions required when updating a Mailman installation using an earlier version of this patch:
Note that there is no patch revision to deal with this security problem for MM 2.0.13 or earlier and you should seriously consider updating to MM 2.1.x if you want to implement this security fix.
htdig-2.1.1-0.1.patch:
htdig-2.1-0.3.patch:
$prefix/Mailman/htdig.py worked out content type of file being returned.
$prefix/Mailman/htdig.py adopts revised method for establishing the default URL introduced in 2.1 and as used in $prefix/Mailman/MailList.py
DEFAULT_URL in cron scripts $prefix/cron/remote_nightly_htdig_noshare and $prefix/cron/remote_nightly_htdig.pl
DEFAULT_URL in this document to DEFAULT_URL_PATTERN.
htdig-2.1-0.2.patch:
$prefix/Mailman/htdig.py. Fixes bug with htdig.py and problem of interaction with bug in $prefix/scripts/driver script (see patch #668685 for more details)
htdig-2.1-0.1.patch:
htdig-2.1b6-0.1.patch:
htdig-2.1b5-0.1.patch:
htdig-2.1b4-0.1.patch:
Mailman/Archiver/HyperArch.py have been extracted into files under the templates directory. Edit these with care if you must.
htdig-2.1b3-0.3.patch:
htdig-2.1b3-0.2.patch:
htdig-2.1b3-0.1.patch which showed up as logged errors in the operation of the ArchRunner qrunner at line 721 of HyperArch.py
htdig-2.1b3-0.1.patch:
updateTOC.py and replaced it with an alternate mechanism in a patch to $prefix/Mailman/Queue/ArchRunner.py to update list TOC page after reindexing by htdig. This new method is only exercised when the remote_nightly_htdig series of cron scripts are used.
remote_nightly_htdig series of cron scripts to reflect demise of updateTOC cgi script.
htdig-2.1b2-0.1.patch:
htdig-2.0.13-0.2.patch:
htdig-2.0.13-0.1.patch:
htdig-2.0.12-0.1.patch:
HTDIG_EXTRAS config variable to allow arbitrary htdig configuration parameters to be specified for addition to every htdig.conf file created, i.e. site-wide additions.
htdig-2.0.11-0.1.patch:
htdig-2.0.10-0.2.patch:
nightly_htdig cron script and its relatives. Doing import * inside a function removed.
htdig-2.0.10-0.1.patch:
src/Makefile.in to get clean patch application to MM 2.0.10
htdig-2.0.9-0.1.patch:
htdig-2.0.8-0.1.patch:
web_page_url for a list, which is usually the same as DEFAULT_URL from either $prefix/Mailman/Defaults.py or $prefix/Mailman/mm_cfg.py, when it doesn't use the http addressing scheme. This arises because htdig will only build indices if the URLs for pages use the http addressing scheme. There is a work-around for this problem posted in htdig's mail archives - see the copy in Appendix 1 to this document.
web_page_url of a list an additional htdig configuration file for use by htsearch is generated.
web_page_url configuration of any of your lists.
htdig-2.0.6-0.3.patch:
mm_cfg.py. The configuration variables concerned default to the previous fixed values so that this version is backwards compatible with earlier versions.
A technique for htdigging when Mailman's web_page_url uses the https addressing scheme is described in this archived e-mail: http://www.htdig.org/mail/1999/10/0187.html

The text of that e-mail is as follows:

[htdig] Re: Help about htdig indexing https files
------------------------------------------------------------------------
Gilles Detillieux (grdetil at scrc.umanitoba.ca)
Wed, 27 Oct 1999 10:18:31 -0500 (CDT)
------------------------------------------------------------------------

According to Edouard DESSIOUX:
> >Currently, htdig will not support URLs that begin with https://, even
> >when using local_urls to bypass the server. A trick that might work
> >would be to index using http:// instead, but use local_urls to point
> >to the directory that contains the contents of the secure server.
>
> I used that, and now, when i use htsearch, it work, except the fact
> that all my URL are http://x.y.z/ instead of https://x.y.z/
>
> >You'd need to use separate
> >configuration files for digging and searching, and use
> >url_part_aliases in each of these configuration files to rewrite the
> >http:// into https:// in the search results.
>
> This is the part i dont understand, and i would like you to explain.

It basically works as a search and replace. One url_part_aliases in the configuration file used by htdig maps the http://x.y.z/ into some special code like "*site", and another url_part_aliases in the configuration file used by htsearch maps the "*site" back into the value you want, i.e. https://x.y.z/. The substitution is left to right in htdig, and right to left in htsearch.

So, if you use the same config file for both, or the same setting for both, you get back what you started with (but saved some space in the database because of the encoding). However, if you use two separate config files with different url_part_aliases setting for htdig and htsearch, you can remap parts of URLs from one substring to another.

I hope this makes things clearer. I thought the current description at http://www.htdig.org/attrs.html#url_part_aliases was already quite clear.

--
Gilles R. Detillieux              E-mail: <grdetil@scrc.umanitoba.ca>
Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada)     Fax: (204)789-3930
------------------------------------
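The search-and-replace behaviour described in the e-mail can be modelled in a few lines. This is a deliberately simplified model, not htdig's implementation: the function names and the alias tables are invented for illustration, and plain string replacement stands in for whatever htdig and htsearch do internally.

```python
def encode_for_index(url, aliases):
    """Digging side: replace each real URL part with its code (left to right)."""
    for real, code in aliases:
        url = url.replace(real, code)
    return url


def decode_for_results(stored, aliases):
    """Searching side: replace each code with the real URL part (right to left)."""
    for real, code in aliases:
        stored = stored.replace(code, real)
    return stored


# Alias table as seen by the digging config: index over plain http.
DIG_ALIASES = [("http://x.y.z/", "*site")]
# Alias table as seen by the searching config: present results as https.
SEARCH_ALIASES = [("https://x.y.z/", "*site")]

stored = encode_for_index("http://x.y.z/pipermail/mylist/index.html", DIG_ALIASES)
shown = decode_for_results(stored, SEARCH_ALIASES)
```

Here stored holds the encoded form kept in the database and shown is the https URL presented in the search results. Decoding with DIG_ALIASES instead would give back the original http URL, which is the "same config file for both" case mentioned in the e-mail.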
Last updated: 18/07/08, 01:30 pm