Mailman Patch #444879 README.NOINDEXtags
If you define values for the ARCHIVE_INDEXING_ENABLE and
ARCHIVE_INDEXING_DISABLE configuration attributes in mm_cfg.py, you
may want to control the indexing activities of several of the search
engines that you allow to access your mail archives.
At the time of writing, the problem you face is that there is no standard
tag for exerting partial control over search engine indexing of a page, that
is, a way of telling the search engine to index only a specified part of
the page content. There is no formal or de facto standard equivalent to the
robots property of the HTML 4.0 META tag, e.g.
<META NAME=robots CONTENT="noindex,follow">,
which gives whole-page control with most search engines.
However, you should be able to put multiple start and stop indexing tags in the
values you assign to the ARCHIVE_INDEXING_ENABLE/DISABLE strings in mm_cfg.py.
For example, some writers on the web suggest using <NOINDEX> and </NOINDEX> tags
because they are recognised and honoured by a number of search engines.
The defaults recognised by htdig are actually HTML comments of the form
<!--htdig_noindex--> and <!--/htdig_noindex-->
You could combine these in your mm_cfg.py file as follows:
ARCHIVE_INDEXING_ENABLE = '<!--/htdig_noindex-->\n</NOINDEX>'
ARCHIVE_INDEXING_DISABLE = '<NOINDEX>\n<!--htdig_noindex-->'
Most browsers and search engines should be happy with the result because, in
general, they ignore tags they do not understand and act on those they do.
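
If you want to support more than two tag conventions, it may be easier to
build the two strings from a list of tag pairs so that the nesting stays
consistent. The following is only a sketch: the NOINDEX_TAG_PAIRS name and the
build_indexing_strings helper are hypothetical and are not part of Mailman or
of this patch. It assumes that each pair is given as (stop-indexing tag,
resume-indexing tag) and that the resume tags should be emitted in the reverse
order of the stop tags.

```python
# Hypothetical helper for mm_cfg.py; not provided by Mailman or this patch.
# Each pair is (tag that stops indexing, tag that resumes indexing).
NOINDEX_TAG_PAIRS = [
    ('<NOINDEX>', '</NOINDEX>'),
    ('<!--htdig_noindex-->', '<!--/htdig_noindex-->'),
]

def build_indexing_strings(pairs):
    """Return (enable, disable) strings with the tag pairs properly nested."""
    # Stop-indexing tags are written in the order listed.
    disable = '\n'.join([start for start, _ in pairs])
    # Resume-indexing tags are written in reverse order, so the pair
    # opened last is closed first.
    enable = '\n'.join([end for _, end in reversed(pairs)])
    return enable, disable

ARCHIVE_INDEXING_ENABLE, ARCHIVE_INDEXING_DISABLE = \
    build_indexing_strings(NOINDEX_TAG_PAIRS)
```

With the two pairs listed, this yields the same values as the literal
assignments shown earlier. Adding a further pair to the list extends both
strings without the start and stop tags getting out of step.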