Thu Jan 31 17:32:33 2002 Geoff Hutchison * Release of 3.1.6. * htdoc/confindex.html, htdoc/htsearch.html, htdoc/index.html, htdoc/mailarchive.html: Remove CSS link, not needed in these frameset pages. * htdoc/howto-mirror.html: Update with Jesse's latest version. Thu Jan 31 15:13:07 2002 Gilles Detillieux * Makefile.in: Fixed install-strip target to properly handle relative paths in INSTALL_PROGRAM when passing it to subdirectories. Thu Jan 31 11:41:39 2002 Gilles Detillieux * htdoc/FAQ.html: Updated questions 4.8 & 4.9 to emphasize use of doc2html over parse_doc.pl. Further clarified question 2.1. Thu Jan 31 10:14:23 2002 Gilles Detillieux * contrib/parse_doc.pl: Added comments explaining why you should not be using this script. Wed Jan 30 17:20:51 2002 Geoff Hutchison * htdoc/FAQ.html: Updated to mention 3.1.6 as the newest version and --with-rx as a fix for regex problems on BSDI. Wed Jan 30 17:15:49 2002 Gilles Detillieux * installdir/synonyms: Updated with the version contributed by David Adams, with minor changes. Kept old one as synonyms.original. * installdir/english.0: Changed lots more dubious uses of suffixes to get more appropriate and correct fuzzy endings expansions. Wed Jan 30 12:30:16 2002 Geoff Hutchison * htlib/Connection.cc (connect): Fixed bug with allow_EINTR and add support for looping when the connection returns EAGAIN (no more free local ports). Thanks to Ahmon Dancy for pointing out the EAGAIN issue. Tue Jan 29 09:59:58 2002 Gilles Detillieux * htdoc/FAQ.html: Updated with today's changes to maindocs FAQ. Mon Jan 28 16:54:15 2002 Gilles Detillieux * contrib/README: Added mentions of examples & xmlsearch, fixed typo. Sun Jan 27 23:13:11 2002 Geoff Hutchison * htdoc/*.html: Final batch of documentation updates. Sat Jan 26 23:28:25 2002 Geoff Hutchison * htdoc/*: More documentation updates from merging with the current maindocs CVS. 
Fri Jan 25 21:36:21 2002 Geoff Hutchison * acconfig.h, include/htconfig.h.in: Add USE_RX to potential configure #include macros. * htlib/gregex.h: Rename regex.h to prevent conflicts with system version. * htlib/regex.c, htlib/HtRegex.h: Ditto. * htfuzzy/EndingsDB.cc: Use same tests as HtRegex.h for rxposix.h, gregex.h or regex.h depending on configure results. * configure.in: Implement more flexible test for rx/regex, which will check for rxposix.h if --with-rx is supplied, will "fall back" to regex test if rxposix.h isn't available and will only use the htlib/ code and header for regex compile. * configure: Update using autoconf. Fri Jan 25 12:14:26 2002 Gilles Detillieux * contrib/whatsnew/README, contrib/whatsnew/whatsnew.html: Added an example of how to get a what's new listing from the new features in htsearch. Thu Jan 24 22:43:28 2002 Geoff Hutchison * htcommon/defaults.cc: Add ignore_dead_servers attribute to control whether indexing will continue to try to contact a dead server. * htdig/Retriever.cc: Only mark a server as dead if the ignore_dead_servers attribute is set. * htdoc/cf_byname.html, htdoc/cf_byprog.html, htdoc/attrs.html: Documentation updates. Thu Jan 24 15:32:59 2002 Geoff Hutchison * configure, configure.in: Add --with-rx option to switch to system rx code (e.g. on BSDI). Needs some touchups still, including checking that rxposix.h exists and if --without-rx was supplied for some reason. * htlib/HtRegex.h: Add conditional header for systems where rx is better than regex. * htlib/Makefile.in: Make sure regex.o is only compiled if it works on a given system via LIBOBJS as supplied by the configure script. Mon Jan 21 22:33:30 2002 Geoff Hutchison * htdoc/RELEASE.html: Add first shot at the release notes for 3.1.6. Still need to finish some of the htdoc/ merges, including the SF icons and such. * htdoc/*.html: First stab at many of the htdoc/merges including the new Copyright line. (It is 2002, after all.) 
Fri Jan 18 18:17:34 2002 Geoff Hutchison * htmerge/docs.cc: Add a test if the DB database has no URLs before proceeding. * htmerge/words.cc: Add a slightly more user-friendly error message if the word list file doesn't exist. Remove exit() statements since reportError does this for us. Fri Jan 18 16:47:50 2002 Gilles Detillieux * htdoc/attrs.html: Rewrote description of prefix_match_character to make it more clear, with crosslinks to related attributes, and described new wildcard matching feature. Added more explanations for relative days & months in startday et al. to make it clearer. Added more notes about to-strings in the url_part_aliases description and explained the example even more, as well as adding crosslinks to the new *_rewrite_rules. Fri Jan 18 15:56:11 2002 Gilles Detillieux * htsearch/htsearch.cc (setupWords), htsearch/parser.cc (perform_push): Added support for a wildcard word of "*" (or prefix_match_character if set and not empty) which returns all documents. Wed Jan 16 17:21:26 2002 Gilles Detillieux * htdoc/attrs.html, htdoc/hts_form.html: Described how to use relative dates for startyear et al. Wed Jan 16 16:58:05 2002 Gilles Detillieux * htsearch/Display.cc (buildMatchList): Fixed startday et al. to allow relative days, month & years if values are negative. Fri Jan 11 20:57:51 2002 Gilles Detillieux * htdoc/attrs.html: Updated descriptions for translate_* attributes to match the new default behavior. Fri Jan 11 17:48:54 2002 Gilles Detillieux * htdig/SGMLEntities.cc (translateAndUpdate): Added support for translate_latin1 attribute, to turn off ISO-8859-1-specific entities. * htcommon/defaults.cc: Added translate_latin1 attribute. * htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Documented it. Fri Jan 11 17:14:54 2002 Gilles Detillieux * contrib/xmlsearch.{README,tar.gz}: Removed older xmlsearch package. 
Fri Jan 11 17:06:09 2002 Gilles Detillieux * contrib/xmlsearch/*: Added files contributed by Nathan Hand and me to implement XML output from htsearch, including DTD, templates and config file. Wed Jan 9 22:08:21 2002 Gilles Detillieux * CONFIG.in: Fixed to allow setting BIN_DIR by configure option. * contrib/htdig-3.1.6.spec: Fixed to make use of new ./configure options for pathnames, do away with patch file. Used variables for many pathnames to allow easy changes. Wed Jan 9 16:22:32 2002 Gilles Detillieux * htdig/ExternalParser.cc (parse): Added support for max_keywords attribute. Wed Jan 9 16:10:44 2002 Gilles Detillieux * htdig/HTML.cc (HTML, do_tag), htdig/ExternalParser.cc (parse): Added support for description_meta_tag_names attribute. Ensure external parser interface accepts META descriptions even if 'description' is added to the keyword list. * htcommon/defaults.cc: Added description_meta_tag_names attribute. * htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Documented it. Tue Jan 8 17:39:24 2002 Gilles Detillieux * htdig/ExternalParser.cc (parse): Added support for use_doc_date attribute. Thu Jan 3 17:10:50 2002 Gilles Detillieux * htlib/Makefile.in, htlib/lib.h: Removed references to timegm, mytimegm and strptime functions. Removed C source for these. Thu Jan 3 16:43:31 2002 Gilles Detillieux * htdoc/htmerge.html: Added extra description for -m option to clear up common points of confusion, added note about LC_COLLATE environment variable. Fri Dec 21 18:52:32 2001 Gilles Detillieux * htdig/Retriever.cc: Added parsedcdate function, used by got_time, to parse DC date meta tags without requiring strptime or timegm. Thu Dec 20 12:25:47 2001 Gilles Detillieux * htdig/Document.cc: Added parsedate function, used by getdate, to parse date headers without requiring strptime or timegm, which have caused problems on some systems. 
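Note on the two date-parsing entries above (parsedcdate and parsedate): the following is a minimal sketch of turning a date string into seconds since the epoch without strptime or timegm. It is not the ht://Dig code. The names parse_iso_date_utc and days_from_civil are hypothetical, and the accepted input (an ISO 8601 style "YYYY-MM-DD" with an optional "Thh:mm:ss", taken as UTC) is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>

// Days between 1970-01-01 and the given proleptic Gregorian date.
// Hypothetical helper, not taken from the ht://Dig sources.
static long days_from_civil(int y, int m, int d)
{
    assert(m >= 1 && m <= 12);
    y -= (m <= 2);                                   // count Jan/Feb as the end of the previous year
    const long era = (y >= 0 ? y : y - 399) / 400;   // 400-year cycle number
    const long yoe = y - era * 400;                  // year of era, 0..399
    const long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  // day of year, March-based
    const long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // day of era
    return era * 146097 + doe - 719468;
}

// Parse "YYYY-MM-DD" or "YYYY-MM-DDThh:mm:ss" as UTC.
// Returns (time_t)-1 when the text does not match or a field is out of range.
std::time_t parse_iso_date_utc(const char *text)
{
    int y = 0, mo = 0, d = 0, h = 0, mi = 0, s = 0;
    const int n = std::sscanf(text, "%d-%d-%dT%d:%d:%d", &y, &mo, &d, &h, &mi, &s);
    if (n != 3 && n != 6)
        return static_cast<std::time_t>(-1);
    if (mo < 1 || mo > 12 || d < 1 || d > 31 ||
        h < 0 || h > 23 || mi < 0 || mi > 59 || s < 0 || s > 60)
        return static_cast<std::time_t>(-1);
    return static_cast<std::time_t>(days_from_civil(y, mo, d)) * 86400
           + h * 3600 + mi * 60 + s;
}
```

Doing the calendar arithmetic by hand keeps the result independent of the local time zone and of which libc extensions the platform provides.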
Thu Dec 20 11:51:26 CET 2001 Gabriele Bartolini * configure.in: reviewed directory settings * Makefile.in: ditto (for 'make install' of htdig.conf and rundig) Wed Dec 19 23:05:09 2001 Geoff Hutchison * configure.in: Add tests for ostream.h and iostream.h. * htlib/htString.h: Add HAVE_OSTREAM_H and HAVE_IOSTREAM_H preprocessor statements to deal with portability issues around the C++ header files. Wed Dec 19 13:33:55 2001 Gabriele Bartolini * configure.in: fixed bug in customisation of configure parameters * CONFIG.in: ditto * configure: re-generated with autoconf Tue Dec 18 16:12:17 2001 Gilles Detillieux * htsearch/Display.cc (displayMatch): Fixed to clear out old values of ANCHOR template variable for each result. Thu Dec 6 13:14:22 2001 Gilles Detillieux * contrib/examples/rundig.sh: Fixed to make use of DBDIR variable. Wed Nov 21 12:54:42 2001 Gilles Detillieux * htdoc/rundig.html: Added note about effect of changing database_base. * htmerge/docs.cc (convertDocs): Changed confusing message about total doc db size in stats. Wed Nov 21 11:37:52 2001 Gilles Detillieux * htsearch/TemplateList.cc (createFromString), htdoc/attrs.html: Treat template_map as a _quoted_ string list. Change tags to the HTML-4.0 compliant tags in builtin-long template. Tue Nov 20 17:13:27 2001 Gilles Detillieux * htlib/String.cc (String, append, sub): Added checks for negative lengths or start position to make code more fault-tolerant. Tue Nov 20 16:37:26 2001 Gilles Detillieux * htfuzzy/Synonym.cc (createDB): Check for lines with less than 2 words, to avoid segfault caused by calling Database::Put() with negative length for data field. Sat Nov 3 23:55:00 2001 Geoff Hutchison * htlib/htString.h: Add #include for ostream.h to solve compile problems with gcc3. * htlib/Connection.h, htlib/Connection.cc: Backport Connection class from 3.2 code--installs alarm() call to timeout connections and will retry connections a few times before giving up. 
Fri Nov 2 12:28:35 2001 Gilles Detillieux * htdig/HTML.cc, htdoc/attrs.html: Added support for dc.date, dc.date.created and dc.date.modified to use_doc_date handling. Fri Nov 2 12:12:59 2001 Gilles Detillieux * contrib/xmlsearch.README, contrib/xmlsearch.tar.gz: Added files contributed by Nathan Hand and me to implement XML output from htsearch, including DTD, templates and config file. Fri Nov 2 12:05:49 2001 Gilles Detillieux * htdig/HTML.cc (do_tag), htcommon/defaults.cc: Added ignore_alt_text attribute to avoid indexing alt text in img tags. * htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Documented it. Thu Nov 1 14:43:13 2001 Gilles Detillieux * htsearch/htsearch.cc (main): Fixed to only show file names in error messages when REQUEST_METHOD not set and -v option given, for security. Thu Nov 1 10:19:27 2001 Gilles Detillieux * htsearch/Display.cc, htsearch/Display.h: Added a localized method for outputting HTTP headers, added support for a new search_results_contenttype attribute to control that header. * htcommon/defaults.cc: Added default for it. * htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Documented it. Wed Oct 31 13:31:18 2001 Gilles Detillieux * installdir/english.0: Changed lots of dubious uses of suffixes to get more appropriate and correct fuzzy endings expansions. Tue Oct 23 14:06:37 2001 Gilles Detillieux * htdig/Retriever.cc (RetrievedDocument): Fixed handling of null return from getParsable(), to avoid segfault problem introduced by text/css conditional added Jul 25. Fri Oct 19 17:24:19 2001 Gilles Detillieux * htsearch/Display.cc (hilight): Added Stefan Nehlsen's idea for anchor_target attribute. * htcommon/defaults.cc: Added default for it. * htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Documented it. Sun Oct 14 22:05:30 2001 Gilles Detillieux * htdoc/attrs.html (external_parsers): Documented external converter chaining to same content-type, e.g. text/html->text/html-internal. 
Sun Oct 14 21:54:24 2001 Gilles Detillieux * htdoc/attrs.html, htdoc/cf_byprog.html, htdoc/cf_byname.html, htcommon/defaults.cc: Documented and declared startyear, etc. attributes used by htsearch. Sun Oct 14 21:16:19 2001 Gilles Detillieux * htdoc/htdump.html, htdoc/htload.html, htdoc/attrs.html, htdoc/cf_byprog.html, htdoc/contents.html: Documented htdump and htload, indicating which attributes are used by them. Fri Oct 12 14:58:15 2001 Gilles Detillieux * htlib/URL.cc (removeIndex): Fixed to make sure the matched file name is at the end of the URL. Tue Oct 2 09:34:43 2001 Gilles Detillieux * htdoc/attrs.html (start_url): Added a reference and link to limit_urls_to, explaining how the two are tied together. Fri Sep 28 17:19:45 2001 Gilles Detillieux * contrib/htdig-3.1.6.spec: Fixed %install to make symlinks for htdump & htload, added these to %files list. Fri Sep 28 15:38:00 2001 Gilles Detillieux * htsearch/Display.cc (displayMatch): Save rewritten URL in DocumentRef so it'll be used for star_patterns and template_patterns matching. Fri Sep 28 14:25:29 2001 Gilles Detillieux * htsearch/Display.cc (buildMatchList, displayMatch), htsearch/htsearch.cc (main): Added calls to pass search_rewrite_rules to HtURLRewriter class and use it to rewrite URLs in results. * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html, htcommon/defaults.cc: Added search_rewrite_rules attribute. Thu Sep 27 16:34:51 2001 Gilles Detillieux * htlib/Makefile.in, htlib/HtRegex.cc, htlib/HtRegex.h, htlib/HtRegexReplace.cc, htlib/HtRegexReplace.h, htlib/HtRegexReplaceList.cc, htlib/HtRegexReplaceList.h, htlib/HtURLRewriter.cc, htlib/HtURLRewriter.h: Added new classes to support regular expressions and implement url_rewrite_rules attribute, using Geoff's variation of Andy Armstrong's implementation of this. * htlib/URL.h, htlib/URL.cc: Added URL::rewrite() method. * htlib/htString.h: Added Nth() method for HtRegex class. 
* htdig/Retriever.cc (got_href, got_redirect): Added calls to url.rewrite(), and debugging output for this. * htdig/htdig.cc (main): Added calls to make instance of HtURLRewriter class. * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html, htcommon/defaults.cc: Added url_rewrite_rules attribute. Mon Sep 17 16:52:07 2001 Gilles Detillieux * htdoc/running.html: New documentation on how to run after configuring. * htdoc/rundig.html: New manual page for rundig script. * htdoc/install.html: Added link to running.html. * htdoc/contents.html: Added link to running.html, rundig.html, related projects. Updated links to contrib and developer site. Got rid of link to web site stats. Fri Sep 14 09:18:38 2001 Gilles Detillieux * htdig/Document.cc (RetrieveHTTP): Add port to Host: header when port is not default, as per RFC2616(14.23). Fixes bug #459969. Sat Sep 8 22:04:47 2001 Geoff Hutchison * acconfig.h, include/htconfig.h.in: Add undef for ALLOW_INSECURE_CGI_CONFIG, which if defined does about what you'd expect. (This is for any wrapper authors who don't want to rewrite but are willing to run insecure.) * htsearch/htsearch.cc: Only allow the -c flag to work when REQUEST_METHOD is undefined. Fixes PR#458013. Fri Aug 31 16:00:37 2001 Gilles Detillieux * htlib/URL.cc (URL): Fixed to call normalizePath() even if URL is relative but with absolute path. Should fix bug #408586. Fri Aug 31 15:21:49 2001 Gilles Detillieux * htdig/HTML.h, htdig/HTML.cc (HTML, parse, do_tag): Fixed buggy handling of nested tags that independently turn off indexing, so doesn't cancel tag. Add handling of tag. Fri Aug 31 14:33:41 2001 Gilles Detillieux [ Backport some 3.2.0b4 HTML parser changes. ] * htdig/HTML.cc (do_tag): Rewrite using Configuration class to separate tag attributes. Parse tags properly, looking for rather than src=. Add support for TITLE attributes in anchor and related tags. Treat tags as noindex tags, much like as suggested by Torsten. 
* htdig/HTML.cc(parse): Fix to prevent closing ">" from being passed to do_tag(). Wed Aug 29 10:20:55 2001 Gilles Detillieux * htdoc/attrs.html (allow_in_form, build_select_lists, limit_normalized, server_aliases, server_max_docs, server_wait_time, url_part_aliases): Added clarifications to allow_in_form, server_aliases and url_part_aliases descriptions. Changed word "directive" to "attribute" where appropriate. Added cross-link to server_aliases from limit_normalized, and to allow_in_form from build_select_lists. Mon Aug 27 17:22:56 2001 Gilles Detillieux * htdig/HTML.cc (do_tag): Improve handling of whitespace in META refresh handling. Fixes bug #406244. Mon Aug 27 16:38:43 2001 Gilles Detillieux * htdig/HTML.cc (parse): Fixed delete [] text (was missing []), added simple optimizations for comment & noindex_start skipping, handle decoded < entity correctly. Mon Aug 27 15:31:01 2001 Gilles Detillieux [ Backport 3.2.0b4 config files. ] * installdir/htdig.conf: Added .css to bad_extensions default, added missing closing ">", added mentions of accents & substring, fixed a couple typos in comments. * installdir/search.html: Add DTD tag for HTML 4 compliance. * installdir/{long, syntax, header, footer, wrapper, nomatch}.html: Add DTD tags, ALT attributes and remove bogus tags to fix invalid HTML pointed out in PR#901. Change all and tags to the HTML-4.0 compliant and tags. * htdoc/config.html: Updated with sample of latest htdig.conf and installdir/*.html, added blurb on wrapper.html. Thu Jul 26 15:05:29 2001 Gilles Detillieux * htcommon/defaults.cc, htsearch/parser.cc (perform_or), htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Added new attribute multimatch_method and used it to boost score on 'or' method with multiple matches. Thu Jul 26 14:25:01 2001 Gilles Detillieux * htcommon/defaults.cc, htsearch/parser.cc, htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Added new attribute boolean_syntax_errors and used it to generate syntax error messages for boolean method. 
Wed Jul 25 23:39:00 2001 Gilles Detillieux * htnotify/htnotify.cc: Changed calls to EmailNotification class to avoid compiler warnings. Wed Jul 25 23:15:24 2001 Gilles Detillieux * htcommon/defaults.cc, htsearch/htsearch.cc, htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Added new attribute boolean_keywords and used it to make LOGICAL_WORDS and parse "words" using boolean method. Wed Jul 25 22:31:19 2001 Gilles Detillieux * htlib/Dictionary.cc (Remove): Fixed so it doesn't clobber rest of chain when removing an entry, as suggested by Yariv Tal. Wed Jul 25 22:06:08 2001 Gilles Detillieux * htcommon/defaults.cc: Add new attributes htnotify_replyto, htnotify_webmaster, htnotify_prefix_file, htnotify_suffix_file. * htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Document them. * htnotify/htnotify.cc, htnotify/EmailNotification.{h,cc}, htnotify/Makefile.in: Added in code from Richard Beton to collect multiple URLs per e-mail address and allow customization of notification messages by reading in header/footer text as designated by the new attributes above. * htdoc/THANKS.html: Credit where due. Wed Jul 25 21:38:21 2001 Gilles Detillieux * htcommon/defaults.cc: Added .css to bad_extensions, for consistency with 3.2. * htdoc/attrs.html: Ditto for default value. Also set examples for translate_* and modification_time_is_now to false so the example is different than default. Wed Jul 25 17:26:07 2001 Geoff Hutchison * htdig/Document.cc (getParsable): Add conditional to catch text/css files to prevent these from being parsed as Plaintext. * htdig/htdig.cc: Quick fix to make the logging -l flag the default behavior. (Set to Retriever_logUrl from the start.) * htcommon/defaults.cc: Set modification_time_is_now to default to true (now that it works correctly). Also set translate_* attributes to true. * htdoc/htdig.html: Remove documentation for -l flag--now no longer used. * htdoc/attrs.html: Correct new default values for modification_time_is_now and translate_* attributes. 
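Note on the htlib/Dictionary.cc (Remove) entry above: removing one entry from a hash bucket's chain has to re-link the entries that follow it rather than drop them. The sketch below shows one common way to do that with a pointer-to-pointer walk. It is an illustration only; the Entry layout and the function names are hypothetical and are not taken from the ht://Dig Dictionary class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// One node of a singly linked hash-bucket chain (hypothetical layout).
struct Entry {
    std::string key;
    int value;
    Entry *next;
};

// Unlink and free the first entry whose key matches.
// Returns true if an entry was removed, false if the key was not found.
bool remove_from_chain(Entry *&head, const std::string &key)
{
    for (Entry **link = &head; *link != nullptr; link = &(*link)->next) {
        if ((*link)->key == key) {
            Entry *victim = *link;
            *link = victim->next;   // splice: the rest of the chain stays attached
            delete victim;
            return true;
        }
    }
    return false;
}

// Number of entries currently in the chain.
std::size_t chain_length(const Entry *head)
{
    std::size_t n = 0;
    for (; head != nullptr; head = head->next)
        ++n;
    return n;
}

// Free every entry and leave the chain empty.
void free_chain(Entry *&head)
{
    while (head != nullptr) {
        Entry *next = head->next;
        delete head;
        head = next;
    }
    assert(head == nullptr);
}
```

Walking with a pointer to the link itself means the head and the interior entries are handled by the same code path, so no special case can forget to preserve victim->next.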
Tue Jul 24 16:12:45 2001 Gilles Detillieux * htdoc/attrs.html: Added reference to maximum_page_buttons in the section on maximum_pages. Tue Jul 24 15:38:39 2001 Gilles Detillieux * htsearch/Display.cc (generateStars): Add NSTARS variable for template output as suggested by Caleb Crome (except here precision is 0). Fixes feature request #405787. * htdoc/hts_templates.html: Add description of NSTARS variable above. (Actually copied hts_templates.html from 3.2.0b4.) Tue Jul 24 14:21:53 2001 Gilles Detillieux * htsearch/Display.cc (expandVariables, outputVariable), htdoc/hts_templates.html: Add support for $=(var) template variable references, as suggested by Quim Sanmarti. Tue Jul 24 14:12:06 2001 Gilles Detillieux * htsearch/Display.cc (readFile): Added missing fclose() call, and debugging message for when file can't be opened. * htsearch/Display.cc (displayParsedFile): Added debugging message for when file can't be opened. Tue Jul 24 14:03:12 2001 Gilles Detillieux * htsearch/Display.cc (setVariables), htcommon/defaults.cc: Added maximum_page_buttons attribute, to limit buttons to less than maximum_pages. Fixes PR#731 & PR#781. * htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Documented it. Tue Jul 24 13:42:56 2001 Gilles Detillieux * htdoc/hts_templates.html, htsearch/Display.cc (displayMatch): Add METADESCRIPTION variable. Tue Jul 24 13:20:24 2001 Gilles Detillieux * htcommon/DocumentDB.{h,cc}: Added FindCoded() method to lookup docdb record with URL that's still encoded. * htsearch/Display.cc (display, displayMatch, buildMatchList): Use new method to avoid problems with URLs that are decoded and reencoded with another, more ambiguous url_part_aliases setting. Also fixed a problem with date range checking looking at ref before checking if it's null. Thu Jul 12 11:45:05 2001 Gilles Detillieux * contrib/conv_doc.pl, contrib/parse_doc.pl: Fixed EOF handling in dehyphenation, fixed to handle %xx codes in title made from URL. 
* contrib/doc2html/doc2html.pl, contrib/doc2html/pdf2html.pl, contrib/doc2html/swf2html.pl: Fixed to handle %xx codes in URL title. Thu Jul 5 11:23:40 2001 Geoff Hutchison * db/dist/config.guess: Update with more recent GNU version that recognizes various flavors of Mac OS X automatically. * htlib/DB2_db.cc: Only #include if we have it. Fixes compilation problems on Mac OS X. * htlib/String.cc: Include instead of deprecated . Fixes compilation problems with Mac OS X. * htlib/Configuration.cc: Make sure we never try to operate on strings of no length--accessing string[-1] is a bug--exposed on Mac OS X. Fri Jun 29 11:56:25 2001 Gilles Detillieux * htdig/Retriever.cc (got_redirect): Allow the redirect to accept relative redirects instead of just full URLs. Fri Jun 22 16:25:21 2001 Gilles Detillieux * htdoc/THANKS.html: Credit Marc Pohl and Robert Marchand. * htsearch/Display.cc (buildMatchList): Fix date_factor calculation to avoid 32-bit int overflow after multiplication by 1000, and avoid repetitive time(0) call, as contributed by Marc Pohl. Also move the localtime() call up before gmtime() call, to avoid clobbering gmtime's returned static structure (my thinko). Tue Jun 19 17:07:01 2001 Gilles Detillieux * htsearch/Display.cc (setVariables): Fixed handling of build_select_lists attribute, to deal with new restrict & exclude attributes. Fri Jun 15 17:45:40 2001 Gilles Detillieux * htdoc/require.html: Added mentions of accents, prefix & substring, taken from 3.2.0b4. * htdoc/htfuzzy: Added blurb on accents algorithm, taken from 3.2.0b4. * htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Added entry for accents_db attribute for htfuzzy and htsearch. Mentioned accents algorithm in description of search_algorithm. Noted effect of locale setting on floating point numbers in search_algorithm and locale descriptions. 
Fri Jun 15 16:47:09 2001 Gilles Detillieux * htfuzzy/Accents.{h,cc}, htfuzzy/Fuzzy.c (getFuzzyByName), htfuzzy/htfuzzy.cc (main, usage), htfuzzy/Makefile.in: Added latest version of Robert Marchand's accents fuzzy match algorithm. * htcommon/defaults.cc: Added accents_db attribute for this. * htsearch/htsearch.cc: Fixed parsing of search_algorithm not to use comma as separator, because it may be needed as decimal point in some locales. Fri Jun 15 16:30:19 2001 Gilles Detillieux * htfuzzy/Endings.cc (getWords): Undid change introduced in 3.1.3, in part. It now gets permutations of word whether or not it has a root, but it also gets permutations of one or more roots that the word has, based on a suggestion by Alexander Lebedev. * htfuzzy/EndingsDB.cc (createRoot): Fixed to handle words that have more than one root. * installdir/english.0: Removed P flag from wit, like and high, so they're not treated as roots of witness, likeness and highness, which are already in the dictionary. Thu Jun 7 17:09:46 2001 Gilles Detillieux * htcommon/defaults.cc: Add new attribute use_doc_date to use document meta information for the DocTime() field. * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: Document it. * htdig/HTML.cc(do_tag): Call Retriever::got_time if use_doc_date is set and we run across a META date tag. * htdig/Retriever.h, htdig/Retriever.cc: Add new got_date function. When called, sets the DocTime field of the DocumentRef after parsing is completed. Currently assumes ISO 8601 format for the date tag. Thu Jun 7 16:48:13 2001 Gilles Detillieux * htcommon/defaults.cc: Add new attribute any_keywords to allow ORing of keywords input parameter. * htsearch/htsearch.cc (addRequiredWords): Use it. Fix handling of empty search word list. * htsearch/Display.cc (excerpt, highlight): Fix handling of case where "words" is empty but "keywords" isn't. * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: Document any_keywords. 
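Note on the htsearch/Display.cc (buildMatchList) entry of Jun 22 2001 above: it names two classic C pitfalls, a 32-bit product that overflows and a second time-conversion call that overwrites the first call's static result. The sketch below illustrates both under stated assumptions. The actual date_factor formula is not given in this log, so scale_by_1000 only shows the widening step, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Multiply a 32-bit second count by 1000 without overflow:
// widen to 64 bits *before* the multiplication, not after it.
std::int64_t scale_by_1000(std::int32_t seconds)
{
    return static_cast<std::int64_t>(seconds) * 1000;
}

struct BrokenDownTimes {
    std::tm local;
    std::tm utc;
};

// localtime() and gmtime() may return pointers into the same static
// buffer, so each result is copied by value before the next call.
BrokenDownTimes break_down(std::time_t t)
{
    BrokenDownTimes out;

    const std::tm *lt = std::localtime(&t);
    assert(lt != nullptr);
    out.local = *lt;                // copy now; the next call may overwrite *lt

    const std::tm *gt = std::gmtime(&t);
    assert(gt != nullptr);
    out.utc = *gt;

    return out;
}
```

Copying the struct costs a few dozen bytes and removes any dependence on call order, which is an alternative to reordering the calls as the entry describes.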
Thu Jun 7 16:34:41 2001 Gilles Detillieux * htcommon/defaults.cc: Add new attribute plural_suffix to set the language-dependent suffix for PLURAL_MATCHES contributed by Jesse. * htsearch/Display.cc (setVariables): Use it. * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: Document it. Thu Jun 7 16:03:17 2001 Gilles Detillieux * htsearch/Display.{h,cc}, htcommon/defaults.cc: Added multi-excerpt feature and max_excerpts attribute, as contributed by Jim Cole. * htdoc/THANKS.html, htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: Credit where due, and document attribute. Thu Jun 7 15:27:33 2001 Gilles Detillieux * htdig/ExternalParser.cc: Backported from 3.2.0b3, fixing these problems: no longer confused by "; charset=..." in Content-Type, avoids security problems with popen() and shell parsing untrusted URL (PR#542, PR#951), avoids predictable temporary file name if mkstemp() exists, binary output from external converter no longer mangled, less ambiguous error messages, opens temp. file in binary mode on non-Unix systems. Thu Jun 7 15:10:14 2001 Gilles Detillieux * htcommon/DocumentDB.{h,cc}: Replace CreateSearchDB() with DumpDB(), add LoadDB(), both backported from 3.2.0b3. * htdig/htdig.cc (main, usage), htdig/Makefile.in, htdoc/htdig.html: Add handling of -m (minimal) option, file input for URLs, and arg 0 handling for htdump & htload. * htdig/HTML.cc (do_tag): Change all white space to blanks in meta description tag, for proper ASCII record dumps by htdump, and to fix bug #405771. * htlib/String.cc (= operator), htlib/htString.cc: change handling of 0 length strings. Add readLine() for htload support. Thu Jun 7 14:41:42 2001 Gilles Detillieux * htdig/Retriever.cc (got_href): Fix hop count mishandling. Thu Jun 7 14:23:47 2001 Gilles Detillieux * htmerge/db.cc (mergeDB), htmerge/words.cc (mergeWords), installdir/rundig: Fix various htmerge bugs. Quotes the temp. directory name and word_list name (PR#872). 
Correctly handles words beginning with +, - and ! when in extra_word_characters (PR#952). Corrects problems with bad wordlists generated by htmerge -m causing it to lose entries in words.db and problems with the sort program using non-ASCII collating having a similar effect. Thu Jun 7 14:13:56 2001 Gilles Detillieux * htsearch/htsearch.cc (main), htsearch/Display.cc (setVariables, createURL, buildMatchList), htdoc/THANKS.html, htdoc/hts_form.html, htdoc/hts_templates.html: Add Mike Grommet's date range search feature. Thu Jun 7 13:57:06 2001 Gilles Detillieux * htdig/Retriever.cc (GetLocal, GetLocalUser): Fix to allow compiling on AIX & other non-GNU compilers. Thu Jun 7 13:52:20 2001 Gilles Detillieux * htsearch/Display.cc (setVariables): Extend the handling of build_select_lists to handle select multiple, radio buttons and checkboxes. * htdoc/attrs.html, htdoc/hts_selectors.html: Describe this. Thu Jun 7 13:40:13 2001 Gilles Detillieux * htfuzzy/Exact.cc (Exact), htfuzzy/Prefix.cc (Prefix): Set the name field to the class name, as suggested by Jesse. Thu Jun 7 13:27:35 2001 Gilles Detillieux * contrib/htdig-3.1.6.spec, contrib/htdig-3.1.6-conf.patch, htdoc/where.html, .version, README: Bump to version 3.1.6. Thu Jun 7 11:58:28 2001 Gilles Detillieux * contrib/multidig/*: Backport from 3.2.0b3, including fixes below. * contrib/multidig/Makefile, gen-collect, db.conf, multidig.conf: Add missing trailing newlines as pointed out by Doug Moran. * contrib/multidig/Makefile (install): Make sure scripts have a+x permissions. Pointed out by Doug Moran. * contrib/multidig/new-collect: Fix typo to ensure MULTIDIG_CONF is set correctly. Thu Jun 7 11:37:52 2001 Gilles Detillieux * contrib/README: Add in descriptions for web site contrib directory, acroconv.pl & conv_doc.pl. * contrib/examples/rundig.sh: Update to most recent version for 3.1.x. * contrib/htparsedoc/htparsedoc: Add in contributed bug fixes from Andrew Bishop to work on SunOS 4.x machines. 
* contrib/acroconv.pl: Added external converter script to convert PDFs with acroread. Thu Jun 7 10:41:05 2001 Gilles Detillieux * htlib/ParsedString.cc (get), htsearch/Display.cc (expandVariables): Use isalnum() instead of isalpha() to allow digits in attribute and variable names, allow '-' in variable names too for consistency. Wed Jun 6 17:13:49 2001 Gilles Detillieux * htdig/HTML.cc (do_tag): Make parsing of meta robots tag case insensitive. Wed Jun 6 15:31:00 2001 Gilles Detillieux * contrib/doc2html/DETAILS, contrib/doc2html/README, contrib/doc2html/doc2html.cfg, contrib/doc2html/doc2html.sty, contrib/doc2html/doc2html.pl, contrib/doc2html/pdf2html.pl, contrib/doc2html/swf2html.pl: Added version 3.0 of doc2html, contributed by David Adams. Mon Jun 4 10:31:45 CEST 2001 Gabriele Bartolini * htdoc/cf_byname.html: I forgot to insert the 'restrict' attribute. Wed May 30 11:30:43 2001 Gabriele Bartolini * htsearch/htsearch.cc: two new attributes, used by htsearch, have been added: restrict and exclude. They can now give more control to template customisation through configuration files, allowing to restrict or exclude URLs from search without passing any CGI variables (although this specification overrides the configuration one). * htcommon/defaults.cc: ditto * htdoc/attrs.html: ditto * htdoc/cf_byname.html: ditto * htdoc/cf_byprog.html: ditto * htdoc/hts_form.html: ditto Sat May 5 21:43:32 2001 Geoff Hutchison * configure.in, configure: Add tests for wait.h, sys/wait.h, mkstemp() and malloc.h. * acconfig.h, include/htconfig.h.in: Update with autoheader for new tests. * htlib/regex.[h,c]: Update with backports from 3.2.0b4 development. Tue Feb 29 23:04:04 2000 Geoff Hutchison * htlib/DB2_db.cc (Error): Simply fprint the error message on stderr. This is not a method since the db.h interface expects a C function. (db_init): Don't set db_errfile, instead set errcall to point to the new Error function. 
Fri Feb 25 10:11:50 2000 Gilles Detillieux * htdoc/attrs.html (maximum_pages): Describe new behaviour (as of 3.1.4), where this limits total matches shown. Thu Feb 24 20:24:24 2000 Geoff Hutchison * htdoc/FAQ.html: Update to refer to 3.1.5 and edit comments about 3.2. Thu Feb 24 15:20:08 2000 Gilles Detillieux * htdoc/RELEASE.html, htdoc/main.html: Updated notes for 3.1.5 release. Thu Feb 24 10:37:45 2000 Gilles Detillieux * htdoc/attrs.html (external_parsers): Add references to FAQ 4.8 & 4.9. (local_default_doc): Give an expanded example. (logging): Explain log entry format. (star_blank): Fix some old typos (incorrect references to other attrs.) Wed Feb 23 13:58:24 2000 Gilles Detillieux * htcommon/cgi.cc(init): Fixed bug: array must be freed by delete [] buf, not just delete buf; (from Vadim). * installdir/syntax.html: Fixed a $(WORDS) I'd missed earlier. Tue Feb 22 12:40:22 2000 Gilles Detillieux * htdoc/RELEASE.html, htdoc/main.html: Updated notes for 3.1.5 release. * htlib/URL.cc (URL, normalizePath): Fix PR#779, to handle relative URLs correctly when there's a trailing ".." or leading "//". Thu Feb 17 15:58:53 2000 Gilles Detillieux * htdoc/RELEASE.html, htdoc/main.html: Add notes for 3.1.5 release. * htdoc/TODO.html, htdoc/author.html, htdoc/bugs.html, htdoc/cf_general.html, htdoc/cf_types.html, htdoc/cf_variables.html, htdoc/config.html, htdoc/howitworks.html, htdoc/htdig.html, htdoc/htfuzzy.html, htdoc/htmerge.html, htdoc/htnotify.html, htdoc/hts_form.html, htdoc/hts_general.html, htdoc/hts_method.html, htdoc/install.html, htdoc/isp.html, htdoc/mailing.html, htdoc/meta.html, htdoc/notification.html, htdoc/require.html, htdoc/uses.html, htdoc/where.html: Update copyright date and fix last modified date for automatic CVS update. Thu Feb 17 14:37:18 2000 Gilles Detillieux * installdir/htdig.conf: quote all HTML tag parameters. * htsearch/TemplateList.cc (createFromString), installdir/long.html, installdir/short.html: Use $&(URL) in templates. 
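Note on the htcommon/cgi.cc (init) entry of Feb 23 2000 above: memory obtained with the array form of new must be released with delete [], otherwise the behaviour is undefined. The sketch below shows the matching pair in a small buffer-copy routine. The function name copy_request_body is hypothetical and the routine is not the cgi.cc code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copy `length` bytes of request data through a temporary heap buffer.
// Hypothetical helper used only to show the new[] / delete [] pairing.
std::string copy_request_body(const char *data, std::size_t length)
{
    assert(data != nullptr);

    char *buf = new char[length + 1];   // array form of new
    std::memcpy(buf, data, length);
    buf[length] = '\0';

    std::string result(buf, length);

    delete [] buf;                      // must be delete [], not plain delete
    return result;
}
```

In code that is free to use the standard library, a std::string or std::vector<char> as the buffer avoids the manual pairing altogether.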
Thu Feb 17 14:01:34 2000 Gilles Detillieux * contrib/htdig-3.1.5.spec: Fix silly typos in %post script, make cron script a %config file. Thu Feb 17 10:34:05 2000 Gilles Detillieux [ Improve htsearch's HTML 4.0 compliance ] * htsearch/TemplateList.cc (createFromString): Use file name rather than internal name to select builtin-* templates, use $&(TITLE) in templates and quote HTML tag parameters. * installdir/long.html, installdir/short.html: Use $&(TITLE) in templates and quote HTML tag parameters. * htsearch/Display.cc (setVariables): quote all HTML tag parameters in generated select lists. * installdir/footer.html, installdir/header.html, installdir/nomatch.html, installdir/search.html, installdir/syntax.html, installdir/wrapper.html: Use $&(var) where appropriate, and quote HTML tag parameters. Thu Feb 17 10:00:26 2000 Gilles Detillieux * contrib/htdig-3.1.5.spec: Fix %post script to add more descriptive htdig.conf entries. Wed Feb 16 16:26:05 2000 Gilles Detillieux * contrib/htdig-3.1.5.spec, contrib/htdig-3.1.5-conf.patch, htdoc/where.html, .version, README: Bump to version 3.1.5. * htdoc/THANKS.html: Added new contributors. * htdoc/FAQ.html, htdoc/main.html: Updated to versions from web site. Wed Feb 16 15:49:28 2000 Gilles Detillieux * htlib/Configuration.h, htlib/Configuration.cc: split Add() method into Add() and AddParsed(), so that only config attributes get parsed. Use AddParsed() only in Read() and Defaults(). Wed Feb 16 15:02:47 2000 Gilles Detillieux * htlib/URL.h (encodeURL): Change list of valid characters to include only unreserved ones. * htlib/cgi.cc (init): Allow "&" and ";" as input parameter separators. * htsearch/Display.cc (createURL): Encode each parameter separately, using new unreserved list, before piecing together query string, to allow characters like "?=&" within parameters to be encoded. 
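The Feb 16 15:02:47 entry above describes encoding each query parameter separately, leaving only unreserved characters unescaped. A rough sketch of that idea follows. It assumes the RFC 2396 unreserved set (alphanumerics plus -_.!~*'()), which may differ from the list in htlib/URL.h, and encodeUnreserved and makeQuery are hypothetical names, not ht://Dig functions:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical sketch: percent-encode every byte that is not in the
// assumed RFC 2396 "unreserved" set.
std::string encodeUnreserved(const std::string &in)
{
    static const char hex[] = "0123456789ABCDEF";
    static const std::string marks = "-_.!~*'()";
    std::string out;
    for (unsigned char c : in) {
        bool keep = (c < 128 && std::isalnum(c)) ||
                    marks.find(static_cast<char>(c)) != std::string::npos;
        if (keep) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Encode each value on its own, then join the pieces, so that characters
// such as "?", "=" and "&" inside a value cannot be mistaken for separators.
std::string makeQuery(const std::string &words, const std::string &restrictTo)
{
    return "words=" + encodeUnreserved(words) +
           "&restrict=" + encodeUnreserved(restrictTo);
}
```

Encoding the assembled query string in one pass would either escape the separators themselves or leave them ambiguous inside values, which is why the pieces are encoded first.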
Wed Feb 16 14:42:02 2000 Gilles Detillieux * htsearch/Display.cc (encodeSGML, excerpt): Add encoding for characters that could pose problems in HTML output. * htsearch/Display.cc (expandVariables, outputVariables): Add support for $&(var) and $%(var) template variable references. This should fix PR#750, once we use this in common/*.html. Tue Feb 15 17:21:08 2000 Gilles Detillieux [ Applied a whole collection of patches and fixes from the archives ] * htdig/Server.cc (robotstxt): apply more rigorous parsing of multiple user-agent fields, and use only the first one. * htdig/Retriever.cc(GetLocal, GetLocalUser): Add URL-decoding enhancements to local_urls, local_default_urls & local_default_doc, to allow hex encoding of special characters. * htdoc/attrs.html: Document these. * htdig/Retriever.cc (IsValidURL): Fix problem with valid_extensions when an "extension" would include part of a directory path or server name, as contributed by Warren Jones. Also fix problem with valid_extensions matching failure when URL parameters follow extension, as reported by fxbois@cybercable.fr. * htdig/Document.cc (RetrieveLocal), htdig/Document.h, htdig/Retriever.cc(Initial, parse_url, GetLocal, GetLocalUser, IsLocalURL, got_href, got_redirect), htdig/Retriever.h, htdig/Server.cc(Server), htdig/Server.h: Apply Paul B. Henson's enhancements to local_urls, local_user_urls & local_default_doc. * htdoc/attrs.html: Document these. * htsearch/htsearch.cc (setupWords): Fix problem reported by D.J. Adams, in which bad_words removal failed on upper-case search words. * htsearch/Display.cc(setVariables), htcommon/defaults.cc: Added build_select_lists attribute, to generate selector menus in forms. * htdoc/hts_selectors.html: Added this page to explain this new feature, plus other details on select lists in general. * htdoc/hts_templates.html: Added relevant links to related attributes and selectors documentation. 
* htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Added relevant explanations and links to selectors documentation. * htlib/QuotedStringList.cc (Create): fix PR#743, where quoted string lists didn't allow embedded quotes of opposite sort in strings (e.g. "'" or '"'), and fix to avoid overrunning end of string if it ends with backslash. * htcommon/WordList.cc (valid_word): Applied Marc Pohl's fix to make this 8-bit clean on Solaris. * contrib/conv_doc.pl, contrib/parse_doc.pl: Applied Warren Jones's changes to these scripts. * htdig/PDF.cc (parseNonTextLine): Fix bogus escape sequences around Title parsing. (Fixes PR#740) * htsearch/Display.cc (display, displaySyntaxError), htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html, htcommon/defaults.cc: Add new attribute "nph" to send out non-parsed headers for servers that do not supply HTTP headers on CGI output (e.g. IIS). If nph is set, send out HTTP OK header, as suggested by Matthew Daniel (PR#727) * htdig/Document.cc (getdate): avoid strftime() altogether on filled-in tm structure, to avoid recurring segfault problems. (PR#734) * htlib/strptime.cc (mystrptime): Use Warren Jones's fix to deal with a web server that returns dates with a two digit year field. (Fixes PR#770) * htdig/HTML.cc (HTML, parse, do_tag), htcommon/defaults.cc, htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: Add max_keywords attribute to limit meta keyword spamming. Wed Dec 8 18:19:32 1999 Geoff Hutchison * htdoc/FAQ.html, htdoc/bugs.html: Update to refer to latest versions. (Update for 3.1.4 release.) Wed Dec 8 18:10:27 1999 Gilles Detillieux * htlib/QuotedStringList.cc (Create): Make sure that an empty token isn't ignored. Tue Dec 7 10:26:58 1999 Geoff Hutchison * htsearch/Display.cc (setVariables): Fix a compilation error by making a statement with '?' an explicit if-else statement. * htdoc/RELEASE.html: Change case_sensitive fix to a bug-fix, update release date for 12/9/99. (We certainly didn't release yesterday!)
Mon Dec 6 22:17:21 1999 Gilles Detillieux * htsearch/Display.cc(Display): Add missing call to setupTemplates(), for handling template_patterns. Oops! * htdoc/attrs.html: Fixed a couple typos in new attributes. * htdoc/ChangeLog: Update to latest version. Mon Dec 6 16:41:04 1999 Gilles Detillieux * htdoc/main.html: Update news with latest version. * htdig/htdig.cc(main), htdig/Document.cc(Document), htcommon/defaults.cc, htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: Add authorization attribute, settable by htdig -u. Also fixes PR#490, by setting authentication before robots.txt fetched. * htdoc/RELEASE.html: Update with latest fix. Fri Dec 3 17:31:47 1999 Gilles Detillieux * htcommon/DocumentRef.cc(Clear): Set docHopCount & docSig to 0, and clear docEmail, docNotification & docSubject strings to have a clean slate for Deserialize(), which assumes 0/empty for these. Fixes problem with hop counts getting clobbered. * htdoc/RELEASE.html: Update with latest fix. * htdoc/ChangeLog: Update to latest version. Fri Dec 3 12:12:19 1999 Gilles Detillieux * htdig/Document.cc: removed vestiges of internal Postscript support that never worked, and removed test for application/msword, which is handled only by external parser. * htdig/Makefile.in: removed Postscript.o from list. * htdig/Retriever.cc(parse_url): Fix compilation error; (Initial, got_href, got_redirect): Try to get the local filename for a server's robots.txt file and pass it along to the newly generated server. * htdig/Server.cc(Server): Retrieve the robots.txt file from the filesystem when possible; fix compilation error. * htdig/Server.h(Server): Add local_robots_file parameter to Server(). * htlib/HtWordType.h, htlib/HtWordType.cc: fix compilation errors. Fri Dec 3 10:52:57 1999 Gilles Detillieux * htdig/HTML.cc(parse, do_tag): Add handling of ... 
text, fix parsing of words in meta tags, disable indexing of meta tags when "noindex" state in effect, fix calculations of word positions to more accurately reflect relative positions. * htlib/HtWordType.h, htlib/HtWordType.cc: Add HtWordToken() function, to replace strtok() in HTML parser. * htdoc/RELEASE.html: Update with latest fixes. Fri Dec 3 09:02:55 1999 Gilles Detillieux * htlib/Configuration(Add): handle strings in single quotes, as in parm='value'. Thu Dec 2 16:14:28 1999 Gilles Detillieux * htdoc/attrs.html: Add Tom Metro's suggested revisions for pdf_parser and external_parsers. Thu Dec 2 15:15:03 1999 Gilles Detillieux * htdoc/mailing.html: Updated to version from htdig.org web site. * htcommon/defaults.cc: Add missing no_page_number_text and page_number_text attribute definitions. * htdoc/attrs.html(modification_time_is_now): Make the description a bit clearer as to how it may cut down on reindexing. Thu Dec 2 13:46:11 1999 Gilles Detillieux * htdig/Retriever.cc(parse_url), htdig/Server.cc(Server), htcommon/defaults.cc, htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: Add support for local_urls_only attribute. * htdoc/RELEASE.html: Update with latest feature. Thu Dec 2 11:02:07 1999 Gilles Detillieux * htlib/URL.cc(ServerAlias): Fix server_aliases processing to prevent infinite loop (as for local_urls in PR#688). Wed Dec 1 17:23:24 1999 Gilles Detillieux * htdig/Retriever.cc(parse_url), htdig/Server.h: add IsDead() methods to query and set server status, use them in Retriever to avoid repeated HTTP request to a dead server. (Needed for persistent local stuff.) Wed Dec 1 16:56:28 1999 Gilles Detillieux * htdig/Retriever.cc(GetLocal): Fix error in GetLocalUser() return value check, as suggested by Vadim. * contrib/conv_doc.pl: Added a sample external converter script. * htdoc/THANKS.html: A couple more additions. 
Tue Nov 30 15:02:25 1999 Gilles Detillieux * htdig/Retriever.cc(IsValidURL): Fix compilation error in valid_extensions list handling. * contrib/htdig-3.1.4.spec, contrib/htdig-3.1.4-conf.patch: Added sample RPM spec file and config patch for it. Tue Nov 30 14:01:51 1999 Gilles Detillieux * htdoc/where.html: Bump to version 3.1.4. * htdoc/THANKS.html: Added new contributors. * htdoc/isp.html, htdoc/uses.html, htdoc/main.html, htdoc/mailing.html: Updated to versions from htdig.org web site. Tue Nov 30 13:01:20 1999 Gilles Detillieux * htdoc/RELEASE.html: Add release notes for 3.1.4 release. * .version, README: Bump for 3.1.4. Tue Nov 30 11:03:34 1999 Gilles Detillieux * htdoc/attrs.html(backlink_factor): Added Geoff's clarification of what this attribute does. Tue Nov 30 09:47:05 1999 Gilles Detillieux * htdig/Document.cc(RetrieveLocal): Handle common extensions for text/plain, application/pdf & application/postscript. * htdig/Retriever.cc(IsValidURL): Add valid_extensions list handling, make it and bad_extensions case insensitive. * htcommon/defaults.cc: Add config attribute valid_extensions, with default as empty. * htdoc/attrs.html, htdoc/cf_by{name,prog}.html: Document it. Tue Nov 30 09:02:02 1999 Gilles Detillieux * htdig/Retriever.cc(got_href & got_redirect): remove all of Patrick's case insensitive server code, to replace it with Geoff's fix to URL.cc * htlib/URL.cc(normalizePath, path): If not case_sensitive, lowercase the URL. Should ensure that all URLs are appropriately lowercased, regardless of where they're generated. Mon Nov 29 20:25:01 1999 Gilles Detillieux * htdig/Retriever.cc, htdig/Retriever.h, htdig/Server.cc(push), htdig/Server.h: added Alexis's patch for persistent local digging even if HTTP server is down. Also made new GetLocal() method call GetLocalUser() itself, to simplify its use, and made it non-private, for eventual use by Server code. 
Mon Nov 29 19:18:20 1999 Gilles Detillieux * htdig/Retriever.cc(got_href & got_redirect): corrections to case insensitive server fix, to handle redirects, to make more thorough use of mapped URL, and to update it after normalization. Fri Nov 26 17:14:46 1999 Gilles Detillieux * htdig/Document.cc(RetrieveHTTP): always c.close() the connection when returning. * htdig/HTML.cc(HTML & do_tag): add code to turn off indexing between tags. Fri Nov 26 16:31:06 1999 Gilles Detillieux * htlib/Configuration.cc(Read): fixed to allow final line without terminating newline character, rather than ignoring it. * htlib/String.cc(write): added Alexis Mikhailov's fix to bump up pointer after writing a block. * htsearch/Display.cc(setVariables): added Alexis Mikhailov's fix to check the number of pages against maximum_pages at the right time. (Put it even earlier, to make sure nPages is at least 1.) * htsearch/Display.cc(generateStars): Remove extra newline after STARSRIGHT and STARSLEFT variables, noted by Torsten Neuer . Wed Nov 24 20:33:13 1999 Gilles Detillieux * installdir/htdig.conf: Add bad_extensions to make it more obvious to users how to exclude certain document types. Fix the comments for search_algorithm to refer to all the current possibilities. Add example of no_excerpt_show_top attribute in line with most users' expectations. (Geoff's changes) Wed Nov 24 20:02:32 1999 Gilles Detillieux * installdir/search.html (Match): Add Boolean to default search form, as suggested by PR#561. Tue Nov 23 23:03:45 1999 Gilles Detillieux * htsearch/Display.cc(setupTemplates), htsearch/Display.h: fixed a couple of compilation errors in template_patterns code. Tue Nov 23 22:16:31 1999 Gilles Detillieux * htdig/Retriever.cc(got_href): Applied Patrick's case insensitive server fix, to lowercase all URLs if case_sensitive is false. Tue Nov 23 22:08:22 1999 Gilles Detillieux * htlib/StringList.cc(Join): Applied Loic's patch to fix memory leak. 
Tue Nov 23 21:52:18 1999 Gilles Detillieux [Applied patch from Hanno Mueller , which includes...] * contrib/README: Add scriptname directory. * contrib/scriptname/*: An example of using htsearch within dynamic SSI pages * htcommon/defaults.cc: Add script_name attribute to override SCRIPT_NAME CGI environment variable. * htdoc/FAQ.html: Update question 4.7 based on including htsearch as a CGI in SSI markup. * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html, htdoc/hts_templates.html: Update based on behavior of script_name attribute. * htsearch/Display.cc: Set SCRIPT_NAME variable to attribute script_name if set and CGI environment variable if undefined. Tue Nov 23 21:29:03 1999 Gilles Detillieux * htdoc/FAQ.html: Added the past few months' updates to the FAQ. Tue Nov 23 21:20:35 1999 Gilles Detillieux * htcommon/defaults.cc, htsearch/Display.h, htsearch/Display.cc, htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html, htdoc/hts_templates.html: add template_patterns attribute, to select result templates based on URL patterns. Tue Nov 23 20:52:38 1999 Gilles Detillieux * htlib/cgi.h, htlib/cgi.cc(cgi & init), htsearch/htsearch.cc (main & usage): allow a query string to be passed as an argument. Tue Nov 23 20:35:05 1999 Gilles Detillieux * htsearch/Display.cc(setVariables & createURL), htsearch/htsearch.cc(main), htdoc/hts_templates.html: handle keywords input parameter like others, and make it propagate to followups. Tue Nov 23 20:25:45 1999 Gilles Detillieux * htdoc/attrs.html: removed vestigial references to MAX_MATCHES template variables in search_results_{header,footer}. * htdoc/hts_form.html: add disclaimer about keywords parameter not being limited to meta keywords. * htdoc/meta.html: add description of "keywords" meta tag property. add links to keywords_factor & meta_description_factor attributes. 
Tue Nov 23 20:07:20 1999 Gilles Detillieux * htsearch/Display.cc(setVariables & hilight): added Sergey's idea for start_highlight, end_highlight & page_number_separator attributes. * htcommon/defaults.cc: added defaults for these. * htdoc/attrs.html, htdoc/cf_by{name,prog}.html: documented them. Tue Nov 23 19:58:28 1999 Gilles Detillieux * htdig/ExternalParser.cc: added support for external converters as extension to external_parsers attribute. * htdoc/attrs.html: Updated external_parsers with new description and examples of external converters. Tue Nov 23 19:52:27 1999 Gilles Detillieux * htdig/HTML.cc(transSGML), htdig/SGMLEntities.cc(translateAndUpdate): Fix the infamous problem in htdig 3.1.3 of mangling URL parameters that contain bare ampersands (&), and not converting &amp; entities in URLs. * htdig/Retriever.cc(IsLocal & IsLocalUser): Fix PR#688, where htdig goes into an infinite loop if an entry in local_urls (or local_user_urls) is missing a '=' (or a ','). * htcommon/cgi.cc(cgi): Fix bug in reading long queries via POST method (PR#668). * htnotify/htnotify.cc(send_notification): apply Jason Haar's fix to quote the sender name "ht://Dig Notification Service". Wed Sep 22 11:12:38 1999 Geoff Hutchison * htdoc/ChangeLog, htdoc/isp.html, htdoc/FAQ.html, htdoc/RELEASE.html, htdoc/THANKS.html, htdoc/attrs.html, htdoc/bugs.html, htdoc/contents.html, htdoc/main.html, htdoc/require.html, htdoc/uses.html, htdoc/where.html: Update for 3.1.3 release and synch with latest versions from the website. Wed Sep 15 17:54:31 1999 Alexander Bergolth A few changes to satisfy the AIX xlC compiler: * htdig/htdig.cc: Moved variable declaration out of case block. * configure.in, htconfig.in: Add check for sys/select.h. Add "long unsigned int" to the possible getpeername_length types. * htlib/Connection.cc: Include sys/select.h. Sun Sep 12 15:02:19 1999 Geoff Hutchison * .version: Bump for 3.1.3. * README: Bump first line for 3.1.3 release, remove mention of rx directory. 
* htdoc/ChangeLog: Update with latest version. * htdoc/RELEASE.html: Add release notes for 3.1.3 release. Thu Sep 9 14:52:19 1999 Gilles Detillieux * contrib/parse_doc.pl: fix bug in pdf title extraction. Wed Sep 1 15:58:14 1999 Gilles Detillieux * htdig/Retriever.cc(got_word): add code to check for compound words and add their component parts to the word database. * htdig/PDF.cc(parseString), htdig/Plaintext.cc(parse): Don't strip punctuation or lowercase the word before calling got_word. That should be left up to got_word & Word methods. * htlib/StringMatch.h, htlib/StringMatch.cc(Pattern, IgnoreCase): Add an IgnorePunct() method, which allows matches to skip over valid punctuation, change Pattern() and IgnoreCase() to accommodate this. * htsearch/htsearch.cc(main, createLogicalWords): use IgnorePunct() to highlight matching words in excerpts regardless of punctuation, toss out old origPattern, and don't add short or bad words to logicalPattern. * htlib/HtWordType.h, htlib/HtWordType.cc(Initialize): set up and use a lookup table to speed up HtIsWordChar() and HtIsStrictWordChar(). Wed Sep 1 15:48:13 1999 Gilles Detillieux * htdig/PDF.cc(parse), htcommon/defaults.cc, htdoc/attrs.html: Fix PDF.cc to handle acroread in Acrobat 4, which has a bug with the -pairs option. It turns out that even without the -pairs option, acroread 4 is still prone to segmentation violations when generating PostScript, so acroread 3 is a better choice anyway. * htdoc/FAQ.html: Added the past few months' updates to the FAQ. * contrib/parse_doc.pl: Updated to latest version, adapted for xpdf 0.90. Wed Sep 1 15:39:41 1999 Gilles Detillieux Applied "bugfixes" patch collection, which I had posted to htdig@htdig.org mailing list in August. Changes include... * htsearch/Display.cc(expandVariables): Fix problem with $(VAR) at end of template string not being expanded. * htlib/URL.cc(URL): Fix PR#566 by setting the correct length of the string being matched. 'http://' is 7 characters. 
Submitted by . * htdig/HTML.h, htdig/HTML.cc(do_tag, transSGML): Fix the HTML parser to decode SGML entities within tag attributes. * htlib/URL.cc(ServerAlias): Fix server_aliases entries so port defaults to 80 if omitted. * htlib/URL.cc(removeIndex): Fix the infamous problem with files like left_index.html not getting indexed. PR#543 & PR#585. * htdig/PDF.cc(parseNonTextLine): Fixed a bug in the PDF parser: when the Title header was just the temporary file name, it wouldn't be used, but it also wouldn't be cleared from the _parsedString variable, so it ended up polluting the document excerpt. * htdig/Document.cc(RetrieveHTTP): Added error messages for unknown hosts. * htlib/cgi.cc(cgi): Fix PR#572, where htsearch crashed if CONTENT_LENGTH was not set but REQUEST_METHOD was. * htdig/HTML.cc(do_tag): Fix robots parsing to allow multiple directives to work correctly. Fixes PR#578, as provided by Chris Liddiard . * htsearch/htsearch.cc(main): Allow multiple keywords input parameters in search forms. * htdig/Document.cc(Reset, readHeader): Fix the bug in the handling of modification_time_is_now. * htfuzzy/Fuzzy.cc(getWords), htfuzzy/Metaphone.cc(vscode,generateKey): Should fix PR#514 in the bug database. It's Geoff's first attempt, with a minor correction, plus an added test in the vscode macro, which is where the problem seemed to be happening. This won't map accented vowels to their unaccented counterparts, but it should hopefully put an end to the segmentation faults. * include/htconfig.h.in, htcommon/WordReference.h, htcommon/WordList.cc(Word, Flush, BadWordFile), htcommon/DocumentRef.cc(AddDescription), htcommon/defaults.cc, htsearch/parser.cc(perform_push), htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: Change the maximum word length into a run-time option, rather than compile-time. * htsearch/Display.cc(displayMatch): Applied Torsten Neuer's fix for PR#554. * htdig/HTML.cc(HTML, do_tag): Added support for , and tags. 
* htdig/htdig.cc(main): Applied Geoff's patch to hide the username/password in the command line arguments. * htdig/Document.cc(readHeader): Fixed a few problems with header parsing, including PR#535 & PR#557. * htdig/Document.cc(getdate): This should help with PR#81 & PR#472, where strftime() would crash on some systems. Idea submitted by benoit.sibaud@cnet.francetelecom.fr. * COPYING, htdoc/COPYING, Makefile.in: Updated the FSF address in COPYING & Makefile.in. PR#595. * htdig/Retriever.cc(IsValidURL): Fix PR#493, to avoid rejecting a valid URL with ".." in it. * htlib/URL.cc(parse): Fix PR#348, to make sure a missing or invalid port number will get set correctly. * htsearch/Display.h, htsearch/Display.cc(excerpt): Fix declaration to refer to "first" as reference--ensures ANCHOR is properly set. Fixes PR#541 as suggested by . * htdig/ExternalParser.cc(parse): Quote the filename before passing it to the command-line to prevent shell escapes. Fixes PR#542. Also make error messages more useful. * htfuzzy/Endings.cc(getWords): Suffix-handling improvement (PR#560), to prevent inappropriate suffix stripping in endings fuzzy matches. * htlib/URLTrans.cc(encodeURL): Fix encoding so all non-ascii characters get hex-encoded. I think this is what PR#339 was all about. * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: Added descriptions for attributes that were missing, added a few clarifications, and corrected a few defaults and typos. Covers PR#558, PR#626, and then some. * configure.in, configure, include/htconfig.h.in, htlib/regex.c: Fix PR#545, to test for presence of alloca.h Wed Apr 21 22:45:16 1999 Geoff Hutchison * .version: Bump for final 3.1.2 release. * htdoc/where.html, htdoc/FAQ.html: Update to mention the new release. Tue Apr 20 13:34:22 1999 Gilles Detillieux * htdoc/RELEASE.html: Fixed a few typos, updated modification date. Tue Apr 20 10:54:59 1999 Geoff Hutchison * htdoc/RELEASE.html: Add notes on changes in the 3.1.2 release. 
* htdoc/contents.html, htdoc/mailarchive.html, htdoc/where.html, htdoc/uses.html: Update with versions from maindocs. * installdir/htdig.conf: Add example max_doc_size attribute to cut down on FAQ, also add comment on including a file for start_url. Mon Apr 19 15:40:24 1999 Gilles Detillieux * htcommon/WordList.cc(valid_word): fixed to avoid having the new HtIsStrictWordChar() test circumvent the allow_numbers option by allowing numbers all the time. Also fixed to allow HtIsStrictWordChar() to override iscntrl(), so extra_word_characters can define characters that a broken locale would define as control characters. Mon Apr 19 15:17:12 1999 Gilles Detillieux * htcommon/WordList.cc(valid_word): fixed bug introduced Jan 9, where it stopped scanning for control characters prematurely. Now also use iscntrl() to detect all control characters. Fri Apr 16 10:30:42 1999 Gilles Detillieux * htdoc/FAQ.html: fixed typo - use_meta_description was plural. Wed Apr 14 20:22:31 1999 Alexander Bergolth * htlib/regex.h: fixed compile problem with AIX xlc compiler Tue Apr 13 13:01:04 1999 Gilles Detillieux * htsearch/Display.cc(generateStars): Set status to -1 if URLimage.hasPattern() fails, to avoid empty URLimageList. (Fix to Mar 31 change.) Tue Apr 13 11:27:45 1999 Gilles Detillieux * htsearch/Display.h(class Display): move enum SortType up to public section, to avoid problem compiling on IBM AIX C++ compiler. Mon Apr 12 17:36:20 1999 Gilles Detillieux * htdoc/FAQ.html: added sections on indexing docs in other languages, practical & theoretical limits of ht://Dig. Fri Apr 9 16:47:34 1999 Gilles Detillieux * htdoc/FAQ.html: Fixed a few typos. Fri Apr 9 16:24:21 1999 Gilles Detillieux * htdig/Document.cc(RetrieveHTTP): Show "Unable to build connection" message at lower debug level. Fri Apr 9 15:17:53 1999 Gilles Detillieux * htdoc/FAQ.html: Added changes in maindocs from Mar 18, a few clarifications, and four new questions. 
Wed Apr 7 19:41:12 1999 Geoff Hutchison * htsearch/htsearch.cc (usage): Remove bogus -w flag. Thu Apr 1 11:58:20 1999 Gilles Detillieux * htsearch/htsearch.cc(main): Apply Gabriele's patch to avoid using an invalid matchesperpage CGI input variable. * htsearch/Display.cc(display) & (setVariables): Correct any invalid values for matches_per_page attribute to avoid div. by 0 error. Wed Mar 31 18:21:21 1999 Geoff Hutchison * htdig/htdig.cc: Undo March 30 change. * htdig/Retriever.cc: Use excludes.hasPattern before using the exclude list. (More elegant solution to problem, as pointed out by Gilles.) * htsearch/Display.cc: Remove code setting URLimage to a bogus pattern. Instead, check that URLimage.hasPattern() before using it. Wed Mar 31 15:16:36 1999 Gilles Detillieux * htfuzzy/Synonym.cc: Fix previous fix of minor memory leak. (db pointer wasn't properly set) Tue Mar 30 20:08:18 1999 Geoff Hutchison * htdig/htdig.cc: If exclude_urls attribute is set to empty, set it to something that will never match a URL to ensure nothing is excluded. * Makefile.config.in: Fix typo leading to HTLIBS referring to itself. Mon Mar 29 16:47:48 1999 Gilles Detillieux * htsearch/Display.cc(excerpt): Added patch from Gabriele to improve display of excerpts--show top of description always, otherwise try to find the excerpt. Mon Mar 29 15:57:06 1999 Geoff Hutchison * htdig/htdig.cc: Rename main.cc for consistency with other directories. * htdig/Makefile.in: Use it. Mon Mar 29 12:53:17 1999 Gilles Detillieux * htlib/HtWordType.h (HtIsWordChar): Avoid matching 0 when using strchr. (HtIsStrictWordChar): Ditto. (Patch from Hans-Peter Nilsson) Mon Mar 29 10:51:54 1999 Geoff Hutchison * htlib/regex.h, htlib/regex.c: Include glibc versions of the regex functions to override possibly buggy system versions. * htlib/Makefile.in: Use them. * htfuzzy/EndingsDB.cc: Use glibc regex functions instead of rx for massive speedups on non-English affix files. 
* configure, configure.in: Use the system timegm function if present. Don't configure rx since we don't use it any more. Don't worry about tsort since that was only needed for rx. * Makefile.in, Makefile.config.in: Ignore the rx directory if present. Thu Mar 25 12:24:18 1999 Gilles Detillieux * installdir/long.html, installdir/short.html: Remove backslashes before quotes in HTML versions of the builtin templates. * Makefile.in: Add long.html & short.html to COMMONHTML list, so they get installed in common_dir. Thu Mar 25 11:45:59 1999 Gilles Detillieux * htsearch/Display.cc(displayMatch), htcommon/defaults.cc, htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: Add date_format attribute suggested by Marc Pohl. Thu Mar 25 09:49:33 1999 Gilles Detillieux * htsearch/Display.cc(displayMatch): Avoid segfault when DocAnchors list has too few entries for current anchor number. Wed Mar 24 12:20:02 1999 Gilles Detillieux * htdig/main.cc (main): Call HtWordType::Initialize. (Missed this one yesterday. Oops!) Tue Mar 23 17:11:46 1999 Gilles Detillieux * backport Hans-Peter Nilsson's suite of changes for HtWordType and extra_word_characters support, to 3.1.2... * htlib/HtWordType.h (class HtWordType): New. * htlib/HtWordType.cc: New. * htlib/Makefile.in (OBJS): Add HtWordType.o * htdoc/attrs.html: Document attribute extra_word_characters. * htdoc/cf_byprog.html: Ditto. * htdoc/cf_byname.html: Ditto. * htcommon/defaults.cc (defaults): Add extra_word_characters. * htsearch/htsearch.h: Lose spurious extern declaration of unused variable valid_punctuation. * htsearch/htsearch.cc (main): Call HtWordType::Initialize. (setupWords): Use HtIsWordChar, HtIsStrictWordChar and HtStripPunctuation. Do not read valid_punctuation. * htsearch/Display.cc (excerpt): Use HtIsStrictWordChar. * htlib/StringMatch.cc (FindFirstWord): Ditto. (CompareWord): Ditto. * htdig/Retriever.h (class Retriever): Lose member valid_punctuation. * htdig/Retriever.cc (Retriever): Lose its initialization. 
* htdig/Postscript.h (class Postscript): Lose member valid_punctuation. * htdig/Postscript.cc (Postscript): Lose its initialization. (flush_word): Use HtStripPunctuation. (parse_string): Use HtIsWordChar, HtIsStrictWordChar and HtStripPunctuation. * htdig/Parsable.h (class Parsable): Lose member valid_punctuation. * htdig/Parsable.cc (Parsable): Lose its initialization. * htcommon/WordList.cc (valid_word): Use HtIsStrictWordChar. (BadWordFile): Use HtStripPunctuation. Do not read valid_punctuation. * htcommon/DocumentRef.cc (AddDescription): Use HtIsWordChar, HtIsStrictWordChar and HtStripPunctuation. Do not read valid_punctuation. * htdig/PDF.cc (parseString): Similar. * htdig/HTML.cc (parse): Similar. * htdig/Plaintext.cc (parse): Similar. Tue Mar 23 15:52:33 1999 Gilles Detillieux * .version: Bump to 3.1.2-dev. Tue Mar 23 14:50:37 1999 Gilles Detillieux * htlib/String.cc: Fix up code to be cleaner with memory allocation, inline next_power_of_2, fix some memory leaks. (Geoff's changes of Feb 22-25) Tue Mar 23 14:35:37 1999 Gilles Detillieux * htlib/HtWordCodec.cc(HtWordCodec): Fix bug with constructing from uninitialized variables! * htlib/HtURLCodec.cc (~HtURLCodec): Add missing deletion of myWordCodec. Tue Mar 23 14:18:16 1999 Gilles Detillieux * htdig/PDF.cc(parseString): Use minimum_word_length instead of hardcoded constant. Tue Mar 23 12:02:00 1999 Gilles Detillieux * htsearch/Display.cc(generateStars): Add in support for use_star_image which was lost when template support was put in way back when. Tue Mar 23 11:47:52 1999 Gilles Detillieux * Makefile.in: add missing ';' in for loops, between fi & done Mon Mar 22 19:26:56 1999 Gilles Detillieux * htcommon/DocumentRef.cc(AddDescription): Check to see that description isn't a null string or contains only whitespace before doing anything. Mon Mar 22 19:21:16 1999 Gilles Detillieux * htcommon/DocumentRef.h, htcommon/DocumentRef.cc: Fix #ifdef problems with zlib. 
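The Mar 29 htlib/HtWordType.h entry above ("Avoid matching 0 when using strchr") turns on a standard-library detail: strchr treats the terminating NUL as part of the string, so a search for character 0 always succeeds. A small sketch of the pitfall and the guard; isExtraWordChar is a hypothetical stand-in, not the real HtIsWordChar:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a "does c belong to the extra word characters?"
// test. Without the c != '\0' guard, a NUL byte would be reported as a
// word character, because strchr finds the string's own terminator.
bool isExtraWordChar(const char *extra, char c)
{
    return c != '\0' && std::strchr(extra, c) != nullptr;
}
```

A scanning loop that calls such a test on the character at the end of a buffer would otherwise run past the terminator.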
Mon Mar 22 19:14:40 1999 Gilles Detillieux * htdoc/attrs.html (template_name): Typo; used by htsearch, not htdig. Mon Mar 22 19:10:56 1999 Gilles Detillieux * htdig/Retriever.cc (got_href): Check if the ref is for the current document before adding it to the db. (From H-P Nilsson, Mar 8) Mon Mar 22 19:03:23 1999 Gilles Detillieux * htdoc/attrs.html: Rephrase and clarify entry for url_part_aliases. (From Hans-Peter Nilsson, Mar 2) Mon Mar 22 18:48:10 1999 Gilles Detillieux * htfuzzy/Synonym.cc: Fix minor memory leak. * htlib/Dictionary.h, htlib/Dictionary.cc(hashCode): Check if key can be converted to an integer using strtol. If so, use the integer as the hash code. (Geoff's patch) Mon Mar 22 18:23:11 1999 Gilles Detillieux * htlib/List.cc(Nth): Check for out-of-bounds requests before doing anything. Mon Mar 22 17:50:47 1999 Gilles Detillieux * htsearch/Display.cc(display): Free DocumentRef memory after displaying them. (displayMatch): Fix memory leak when documents did not have anchors, fix problems when documents did not have descriptions. Mon Mar 22 17:32:14 1999 Gilles Detillieux * htmerge/docs.cc(convertDocs): Replace previous verbose patch with H-P Nilsson's. Mon Mar 22 17:13:35 1999 Gilles Detillieux * htdig/Plaintext.cc, htmerge/words.cc: removed Log lines. Mon Mar 22 16:11:31 1999 Gilles Detillieux * htsearch/htsearch.cc: Add patch from Jerome Alet to allow '.' in config field but NOT './' for security reasons. Mon Mar 22 15:56:55 1999 Geoff Hutchison * installdir/long.html, installdir/short.html: Write out HTML versions of the builtin templates. (committed to 3.1.2 by Gilles) * installdir/htdig.conf: Add commented-out template_map and template_name attributes to use the on-disk versions. Mon Mar 22 15:13:33 1999 Gilles Detillieux * htcommon/defaults.cc, htdoc/attrs.html: Change default locale to "C", as H-P Nilsson recommended. * htlib/Configuration.cc(Add): Fix small memory leak in locale code, as Geoff discovered. 
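The Mar 22 18:48:10 entry above says Dictionary::hashCode uses the key's integer value when strtol can convert it. A sketch of how such a check might look. The function below is hypothetical, and the string hash used as a fallback is an assumption chosen for illustration, not the one in htlib/Dictionary.cc:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical sketch: use the numeric value of an all-digit key as its
// hash code, otherwise fall back to a simple string hash.
unsigned int hashCode(const char *key)
{
    char *end = nullptr;
    long n = std::strtol(key, &end, 10);
    if (end != key && *end == '\0')          // the whole key parsed as an integer
        return static_cast<unsigned int>(n);

    unsigned int h = 0;                      // assumed fallback: multiply-by-31 hash
    for (const char *p = key; *p; ++p)
        h = h * 31 + static_cast<unsigned char>(*p);
    return h;
}
```

Checking both that at least one character was consumed and that parsing reached the terminator keeps keys such as "42abc" or the empty string on the string-hash path.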
Mon Mar 22 15:03:10 1999 Gilles Detillieux * contrib/parse_doc.pl: uses pdftotext to handle PDF files, generates a head record with punctuation intact, extra checks for file "wrappers" & check for MS Word signature (no longer defaults to catdoc), strip extra punct. from start & end of words, rehyphenate text from PDFs, fix handling of minimum word length. Mon Mar 22 14:38:01 1999 Gilles Detillieux * htdig/Plaintext.cc(parse): Use minimum_word_length instead of hardcoded constant. Mon Mar 22 14:33:45 1999 Gilles Detillieux * htlib/Configuration.cc(Add): Fix function to avoid infinite loop on some systems, which don't allow all the letters in isalnum() that isalpha() does, e.g. accented ones. * htdig/HTML.cc: Fix three reported bugs about inconsistent handling of space and punctuation in title, href description & head. Now makes distinction between tags that cause word breaks and those that don't, and which of the latter add space. Mon Mar 22 14:25:34 1999 Gilles Detillieux * htmerge/docs.cc: Make htmerge -vv report reasons for deleting docs. * htmerge/words.cc(mergeWords): Fix to prevent description text words from clobbering anchor number of merged anchor text words. Fri Mar 19 17:09:21 1999 Gilles Detillieux * htdig/HTML.cc: Fix bug where noindex_start was empty, allow case insensitive matching of noindex_start & noindex_end. * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: Fix inconsistencies in documentation for noindex_start & noindex_end. Fri Mar 19 17:05:16 1999 Gilles Detillieux * htdig/HTML.cc: Add check for tag, terminating it at next href. Fri Mar 19 17:00:18 1999 Gilles Detillieux * htdig/Document.cc: Fix check of Content-type header in readHeader(), correcting bug introduced Jan 10 (for PR#91), and check against allowed external parsers. * htdig/HTML.cc: More lenient comment parsing, allows extra dashes. Fri Mar 19 16:52:51 1999 Gilles Detillieux * htdig/HTML.cc: Check for presence of more than one tag. 
* htlib/mytimegm.cc: Fix Y2K problems. Fri Mar 19 16:43:28 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htdig/HTML.cc: Add patch from Gabriele to ensure META descriptions are parsed, even if 'description' is added to the keyword list. Fri Mar 19 16:37:08 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htsearch/parser.h, htsearch/parser.cc: Clean up patch made for error messages, made on Feb 16. Tue Feb 16 23:48:09 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * configure.in, configure: Default to 'int' when we cannot establish type used by getpeername. * htdoc/RELEASE.html: Additional notes on everything fixed in 3.1.1. Tue Feb 16 23:45:26 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * contrib/parse_doc.pl: Add replacement for less-capable (and buggy) parse_word_doc.pl script. Handles Word, PS, RTF, and WordPerfect files, with appropriate file->text converters. * htsearch/parser.cc, htsearch/parser.h: Add more error messages when the boolean expression is invalid. Mon Feb 15 21:02:24 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/Document.cc(RetrieveLocal): Fix to ensure we report reading only max_doc_size bytes, even when the document is larger. * configure.in, configure: Add 'socklen_t' to getpeername check to prevent problems configuring on Solaris 7. * htdoc/RELEASE.html: Minor changes for 3.1.1 release. Sun Feb 14 16:29:48 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/Document.cc(retrieveHTTP, retrieveLocal): Fix document size when the document is larger than max_doc_size. Size should be that sent by the server or as given by stat(). * htdoc/*.html: More cleanups from Marjolein. Sat Feb 13 20:53:34 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/Retriever.cc(got_word): Ensure heading is in a normal range. * htdoc/RELEASE.html: Added information on the bugs fixed in 3.1.1. * htdoc/attrs.html: Added info on the changed syntax of the pdf_parser attribute in 3.1.0 and later. 
Sat Feb 13 20:29:26 1999 Marjolein Katsma <webmaster@javawoman.com> * htdoc/*.html: Cleaned up HTML, fixed typos, added appropriate HTML 4.0 syntax, added DTDs to files, other minor fixes. Fri Feb 12 19:58:28 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * .version: Bump for version 3.1.1. * configure.in, configure: Fix problems determining getpeername syntax under IRIX. * db/os/os_map.c: Fixed problems on AlphaLinux pointed out by Paul J. Meyer. Fri Feb 12 12:00:25 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htdig/ExternalParser.cc: Fix crashes noted by Frank Richter. * contrib/htparsedoc/parse_word_doc.pl: Use updated version (with fixed line breaks). * htnotify/htnotify.cc: Add patch mentioned in Feb 8 documentation change. Thu Feb 11 00:29:42 1999 Hans-Peter Nilsson <hp@axis.se> * htcommon/DocumentRef.cc (NUM_ASSIGN): Expand from unsigned types. (getnum): Use temporary for "unsigned short", and memcpy data into it instead of assignment. Tue Feb 9 19:21:55 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htdoc/FAQ.html, htdoc/where.html: Update for 3.1.0 release. * htdoc/uses.html: Added remaining backlog. * htdoc/RELEASE.html: Finish up release notes for 3.1.0. Tue Feb 9 19:19:13 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htdig/ExternalParser.cc: Ensure we remove the temporary file. Mon Feb 8 20:28:07 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htdoc/ma_menu: Change relative URLs to absolute URLs to www.htdig.org to reflect the changing mail archive. * htdoc/install.html: Add notes on new configure flags to set CONFIG variables. * htdoc/*.html: Ensure Last Modified date stamps are up-to-date. Mon Feb 8 20:26:40 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htdoc/meta.html, htdoc/notification.html: Add info on date formats for the htnotify-date tag, esp. in relation to ISO 8601. Sat Feb 6 23:24:19 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htcommon/DocumentRef.cc: Fixed compile problem when zlib is disabled. 
* htdoc/cf_byname.html, htdoc/cf_byprog.html, htdoc/attrs.html: Added entries for url_log, compression_level, noindex_start, noindex_end, allow_in_form, bad_querystr, no_title_text. * htdoc/THANKS.html: Added Gabriele Bartolini. * htdoc/uses.html, htdoc/FAQ.html, htdoc/bugs.html: Synch with the latest versions from the website tree. Fri Feb 5 19:57:39 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htnotify/htnotify.cc: Add function parse_date() to parse date strings from htnotify-date tags. It tries to be as flexible as possible about formatting and will report invalid dates. Based in part on code contributed by Gabriele Bartolini. Fri Feb 5 19:28:24 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * configure, configure.in: Add a test to ensure the zlib.h header file exists. * include/htconfig.h.in: Added definition for HAVE_ZLIB_H. * htcommon/DocumentRef.h, htcommon/DocumentRef.cc: Add checks for HAVE_ZLIB_H in addition to HAVE_LIBZ. Ensures the library is actually accessible, not just present. * htfuzzy/Soundex.cc: Fix typo. Thu Feb 4 22:51:37 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * Makefile.in: Clean up previous patch and tidy up HTML and dictionary installation. Thu Feb 4 22:31:35 1999 Ric Klaren <klaren@telin.nl> * Makefile.in, */Makefile.in: Add support for $INSTALL_ROOT, making it easier to build packages (e.g. RPMs) into directories for later processing. * htsearch/Display.cc: Tiny patch to silence a compiler warning. Thu Feb 4 13:03:44 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htfuzzy/Soundex.cc(generateKey): Skip initial non-alphabetic characters and explicitly skip characters without values. * htfuzzy/Metaphone.cc(generateKey): General bug-fixing, fixing a bug that corrupted the string to be processed, fixing typos, and ensuring keys generated fit the metaphone algorithm. * htfuzzy/Fuzzy.cc(getWords): Add debugging output of the fuzzy key used. 
* contrib/doclist/doclist.pl, contrib/doclist/listafter.pl, contrib/whatsnew/whatsnew.pl, contrib/urlindex.pl: Change to support additions to ht://Dig database format. Thu Feb 4 02:09:22 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/htsearch.cc: Add debugging information on words returned from fuzzy matching. * htfuzzy/Metaphone.cc(addWord): Fix bug where only one word would be stored per key in the database. * htfuzzy/Soundex.cc(addWord): Ditto. (generateKey): Rewrite to generate keys correctly. Wed Feb 3 19:24:36 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htdoc/htdig.html: Added documentation on the -l log and restart feature. * htdoc/htmerge.html: Added documentation on the -m merge database feature. * htdig/main.cc: Added documentation on the -l flag to the usage message. * .version: Bump to 3.1.0. Wed Feb 3 19:09:31 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htsearch/Display.cc: Add check for URLs with no / in the no_title code. * htdig/Document.cc: Fix problems with dates returned from servers with incorrect formats. Those simply missing the day of week are parsed correctly, otherwise output an error, use the current date, and keep going. Wed Feb 3 09:57:14 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * installdir/nomatch.html: Fix small typo. * htdoc/RELEASE.html: Finish up 3.1.0 release notes. * htdoc/TODO.html: Update with status and new directions. Wed Feb 3 14:22:11 1999 Alexander Bergolth <leo@leo.wu-wien.ac.at> * htsearch/Display.cc(setVariables): Removed some of yesterday's changes. Thanks to Gilles! Tue Feb 2 17:26:06 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htdig/PDF.h, htdig/PDF.cc: Fix problems with PDFs generated by CorelDraw. * htdoc/attrs.html: Fixed small typo. Tue Feb 2 21:02:25 1999 Alexander Bergolth <leo@leo.wu-wien.ac.at> * htsearch/Display.cc(setVariables,createURL): As pointed out by Gilles, append allow_in_form variables to the query strings only if they are given as input parameters. 
Tue Feb 2 10:29:09 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * configure, configure.in: Rewrite getpeername_length_t detection to use prototypes to eliminate type conversion. * htsearch/Display.cc(buildMatchList): Ensure scores are always positive or zero. Mon Feb 1 22:54:02 1999 Hans-Peter Nilsson <hp@axis.se> * htdoc/attrs.html: Correct "default" for "nothing_found_file". Mon Feb 1 14:44:32 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/Display.cc(displayMatch): Remove compiler warnings. * */Makefile.in: Define INSTALL_PROGRAM from configure script. Mon Feb 1 14:04:18 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htdig/ExternalParser.cc: Add checks to prevent wayward parsers from bringing down the dig. Sun Jan 31 23:15:36 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/WeightWord.cc(set): Ensure word is lowercased for accurate fuzzy comparisons. * htfuzzy/Fuzzy.cc(openIndex): Destroy the database reference if we cannot open the database. Fixes a coredump in classes that inherit this method. * Makefile.config.in: Remove bogus definitions of INSTALL. * Makefile.in: Define INSTALL, INSTALL_PROGRAM, INSTALL_SCRIPT, and INSTALL_DATA as defined by configure. Use them. * htdoc/RELEASE.html: Started release notes for version 3.1.0. Mon Feb 1 04:36:29 1999 Hans-Peter Nilsson <hp@axis.se> * htsearch/Display.cc (displayMatch): Fix leaking user of String(String *). * htfuzzy/Prefix.cc (getWords): Ditto. * htlib/htString.h, htlib/String.cc (String(const String &)): New. * htlib/htString.h, htlib/String.cc (String(const String &, int)): No default argument. * htlib/htString.cc, htlib/String.cc (String(String *)): Removed. Sun Jan 31 21:46:52 1999 Alexander Bergolth <leo@leo.wu-wien.ac.at> * htlib/Connection.cc: Include sys/time.h needed by select, fixes PR #322. Sun Jan 31 20:50:38 1999 Hans-Peter Nilsson <hp@axis.se> * htdig/Retriever.cc (Initial, GetRef, Need2Get, IsValidURL, got_href, got_redirect): Do not lowercase URLs. 
* htlib/HtURLCodec.h (class HtURLCodec): Fake a friend function. Sat Jan 30 22:29:50 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * configure, configure.in: Add support for program name transformations. * */Makefile.in: Do it. Sat Jan 30 21:16:50 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htmerge/docs.cc: Added translation of Dutch comment for us ignorant Americans. ;-) * installdir/rundig: As mentioned by Gilles, use sed with ls -t test. Add more comments for FAQs. * configure.in, configure: Add --disable-zlib to turn off compiling compression entirely. Add --with-cgi-bin-dir, --with-image-dir and --with-search-dir flags to set CONFIG variables. * CONFIG.in: Use them. Sat Jan 30 21:05:35 1999 Randy Winch <gumby@cafes.net> * htcommon/DocumentRef.h: If using compressed document databases, declare compress and decompress functions and the current state of the head (excerpt). * htcommon/DocumentRef.cc: Change document compression to only compress the DocHead field and only decompress when necessary. Sat Jan 30 03:49:21 1999 Hans-Peter Nilsson <hp@axis.se> * htcommon/DocumentRef.h: Add #ifdef around declaration of c_buffer. * htcommon/DocumentRef.cc: Remove spurious extra "static" from c_buffer definition. Add #ifdef HAVE_LIBZ around it. Fri Jan 29 13:30:11 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/htsearch.cc: Construct the StringMatch used for finding excerpts in two pieces--user input and post-fuzzy matching. Fixes problems with matching searches with punctuation. * htlib/StringMatch.cc(IgnoreCase): Fix small memory leak pointed out by Gilles. Thu Jan 28 21:36:03 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htdoc/*.html: Changed copyright information to mention the ht://Dig group, removing Andrew's name. * README, configure.in, Makefile.in: Ditto. * configure: Change mention of libg++ -> libstdc++. 
Thu Jan 28 12:53:40 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: Document new remove_default_doc attribute. * Makefile.in: Make sure we put the wrapper file in the right place. Make sure dictionaries are installed with the correct permissions. * installdir/rundig: Use a portable test for testing the endings and synonym databases. Also enhanced support for flags (-a, -s, -vvv, -c config). * htsearch/Display.cc: Fix bug when sorting results would cause a coredump. Wed Jan 27 20:00:40 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htdig/HTML.cc, htdig/SGMLEntities.cc, htdig/ExternalParser.cc, htcommon/WordList.cc, htcommon/DocumentRef.cc: Speedup by converting many config lookups into static variables. * htdoc/attrs.html, htdoc/hts_templates.cc, htdoc/cf_byname.html, htdoc/cf_byprog.html: Various minor fixes. * htsearch/Display.cc: Fix problems with star_patterns attribute. Wed Jan 27 13:02:39 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/SGMLEntities.cc: Use StringMatch class for matching " & < and > as defined by config options. Should speed up translation. * htdoc/THANKS.html: Minor updates for contributions towards 3.1.0. Tue Jan 26 19:29:08 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * include/htconfig.h.in: Define TRUE and FALSE if not defined. Change default of NO_WORD_COUNT (now undefined) for compatibility. * htdig/htdig.h: Remove definition of TRUE and FALSE (for consistency). * htcommon/DocumentDB.cc(Add, Delete, Exists, []): Do not lowercase the URL before storing it. URLs can be case-sensitive. Tue Jan 26 19:07:03 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htcommon/defaults.cc: Define remove_default_doc as option of default document to strip off URLs (e.g. /index.html -> /). * htlib/URL.cc(removeIndex): Use it. (normalizePath): Fix bug with stripping double slashes and the like from a query string. 
* htdig/Document.h, htdig/Document.cc: Add new variable contentLength and consider content-length headers when reading in documents. * htdig/PDF.cc: Fix broken code calling acroread. * htsearch/Display.cc: Allow braces in wrapper file. * htdoc/hts_general.html, htdoc/hts_templates.html: Add info on the wrapper alternative to separate header and footer files. * htdoc/config.html, installdir/header.html, installdir/nomatch.html, installdir/wrapper.html, installdir/search.html: Change sort option to be more grammatically correct. Tue Jan 26 21:19:02 1999 Hans-Peter Nilsson <hp@axis.se> * htmerge/docs.cc (convertDocs): Use HtURLCodec to encode URLs going into the doc_index database. * htsearch/Display.cc (buildMatchList): Use HtURLCodec to decode URLs from docIndex. * htcommon/defaults.cc (defaults): Fix typo with "case_sensitive". Tue Jan 26 18:08:19 1999 Alexander Bergolth <leo@leo.wu-wien.ac.at> * include/htconfig.h.in: Added HAVE_STRINGS_H. (I forgot that when I added the configure check.) * htdig/Retriever.h: Fix small compiler error. Removed Log-lines. Tue Jan 26 02:22:45 1999 Hans-Peter Nilsson <hp@axis.se> * htdig/main.cc (main): Fix typo "uncoded_db_compatbile". Mon Jan 25 19:38:31 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htlib/Configuration(Find): Make error message for missing entries conditional on the DEBUG symbol. Removes odd error messages under normal use. Sun Jan 24 23:55:57 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htmerge/db.cc, htmerge/docs.cc: Fix compiler errors. * htnotify/htnotify.cc: Similar. Sun Jan 24 14:13:37 1999 Hans-Peter Nilsson <hp@axis.se> * htcommon/WordRecord.h (struct WordRecord): Remove member count if NO_WORD_COUNT defined. * htmerge/db.cc (mergeDB): Remove handling. * htmerge/words.cc (mergeWords): Similar. * include/htconfig.h.in: Define NO_WORD_COUNT by default. 
Sun Jan 24 14:13:37 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/Display.cc(logSearch): Added fix from Gilles in case REMOTE_ADDR is NULL as well. * htnotify/htnotify.cc: Fix compiler warnings. * htlib/String.cc(indexOf): Use autoconf check for strstr, fix compiler warnings. * htlib/Configuration.cc(Find): Complain when option is not in the list. * htdig/HTML.cc(parse): Move declarations out of the loop. (parse): Don't add non-word characters to the excerpt if they're in the title. Fixes PR #80. Mon Jan 25 02:17:58 1999 Hans-Peter Nilsson <hp@axis.se> * htcommon/defaults.cc (defaults): New option "uncoded_db_compatible", default true. * htcommon/DocumentDB.h (DocumentDB::SetCompatibility): New function. (DocumentDB::myTryUncoded): New member. * htcommon/DocumentDB.cc (Constructor, Add(), operator[], Exists(), Delete()): Handle uncoded URL in database if myTryUncoded. * htdig/main.cc (main): Call (DocumentDB::)SetCompatibility() with option "uncoded_db_compatible". * htsearch/Display.cc (Display): Likewise. * htnotify/htnotify.cc (main): Likewise. * htmerge/docs.cc (convertDocs): Likewise. * htmerge/db.cc (mergeDB): Likewise. * htdoc/attrs.html, htdoc/cf_byname.html, htdoc/cf_byprog.html: Document option "uncoded_db_compatible". Sun Jan 24 15:21:02 1999 Hans-Peter Nilsson <hp@axis.se> * htlib/HtWordCodec.cc (HtWordCodec(StringList &, etc)): Check limits separately for "to" and "from". Do not calculate string-lengths separately for limit-checking; use methods Count() and length() on data near the final result. * htlib/HtWordCodec.cc (HtWordCodec constructors): Do not explicitly add '\0' to the pattern strings. * htlib/HtWordCodec.cc (code): Check for zero-length replacement list. Sat Jan 23 22:18:18 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/Retriever.cc(parse_url): If a server ignores the If-Modified-Since request, still compare the retrieved date to the stored date to see if it has been modified. 
Sat Jan 23 13:09:03 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htmerge/htmerge.cc: Unlink the db.docs.index file before we build it again. This ensures we have a clean copy and don't duplicate URLs. Fri Jan 22 23:12:12 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * include/htconfig.h.in: Cleaned up preprocessor definitions. * configure.in, configure: Fix NEED_PROTO_GETHOSTNAME check and make check for GETPEERNAME_LENGTH_T more flexible. * htlib/Connection.cc: Change __sun__ to NEED_PROTO_GETHOSTNAME since we prefer feature tests. Sat Jan 23 02:38:08 1999 Hans-Peter Nilsson <hp@axis.se> * htsearch/Display.cc (logSearch): Fix simple typo in last change. Sat Jan 23 01:18:05 1999 Hans-Peter Nilsson <hp@axis.se> * htlib/String.cc (operator =): Add const modifier: const String &. * htlib/htString.h (String::operator=(const String &)): Ditto. * htlib/DB2_db.h (class DB2_db): Make Put(), Get(), Exists() and Delete() use const modifiers on appropriate parameters. * htlib/DB2_db.cc: Ditto. * htlib/GDBM_db.h (class GDBM_db): Ditto. * htlib/GDBM_db.cc: Ditto. * htlib/Database.h (class Database): Ditto. * htlib/Database.cc (Put): Similar. * htlib/BTree.h (class BTree): Make Put(), Get() and Exists() use const modifiers on appropriate parameters. * htlib/BTree.cc: Ditto. * htcommon/DocumentDB.cc (Add, operator[], Exists, Delete): Remove needless temporary String. * htcommon/DocumentRef.cc (Deserialize): Ditto. Fri Jan 22 21:10:12 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htlib/Configuration.cc: Add support for keyword "include" to include other config files. * htdoc/cf_general.html: Document it. Thu Jan 21 23:25:37 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/Display.cc(logSearch): Check if HTTP_REFERER is NULL, if so, use a dash. (Otherwise we'll kill some syslog() services). 
Thu Jan 21 05:30:40 1999 Hans-Peter Nilsson <hp@axis.se> * htlib/HtURLCodec.h, htlib/HtURLCodec.cc, htlib/HtWordCodec.cc, htlib/HtWordCodec.h, htlib/HtCodec.cc, htlib/HtCodec.h: New files. * htlib/Makefile.in (OBJS): Add the corresponding *.o files * htcommon/DocumentDB.cc (Open, Read, Add, operator[], Exists, Delete, CreateSearchDB, URLs): Use HtURLCodec; ::encode() and ::decode() the URL used as a key. * htcommon/DocumentRef.cc (Serialize): Encode the URL using HtURLCodec. (Deserialize): Decode it. * htmerge/htmerge.h: #include <HtURLCodec.h> * htmerge/htmerge.cc (main): Check HtURLCodec for errors. * htnotify/htnotify.cc (main): Ditto. * htsearch/htsearch.cc (main): Ditto. * htdig/main.cc (main): Ditto. * htcommon/defaults.cc (defaults): Add common_url_parts and url_part_aliases. * htdoc/cf_byprog.html, htdoc/cf_byname.html, htdoc/attrs.html: Document url_part_aliases and common_url_parts. * htlib/StringMatch.h (StringMatch::Pattern): Add default parameter sep = '|'. * htlib/StringMatch.cc (Pattern): Similar. Wed Jan 20 20:20:35 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/Display.cc(logSearch): Use REMOTE_ADDR when REMOTE_HOST is unavailable (otherwise we silently dump core). Fixes PR #138. * htcommon/WordList.cc(valid_word): Words cannot be valid if they're shorter than minimum_word_length! Fixes PR #139. * htsearch/Display.cc(expandVariables): Allow variables of the form ${VAR}, fixes PR #121. Wed Jan 20 17:21:33 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htmerge/docs.cc: Fix logic to remove documents--missing else statements allow some "deleted" documents to not be removed. Wed Jan 20 11:52:18 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htlib/good_strtok.h, htlib/good_strtok.cc: Added fixes and speed improvements contributed by Andrew Bishop. 
* htdig/ExternalParser.cc, htdig/Server.cc, htlib/cgi.cc, htmerge/db.cc, htmerge/words.cc: Call good_strtok with appropriate parameters (explicitly include NULL first parameter, second param is char, not char *). * htcommon/WordList.cc(Word): Added check for adding words with weight zero. * htsearch/Display.h, htsearch/Display.cc: Revised setting ANCHOR variable: it will be empty if there is no excerpt which matches the search formula. Fixes problems with META descriptions. Based on a patch contributed by Marjolein. Wed Jan 20 00:30:12 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/SGMLEntities.cc: Declare extern config, since we now use config options. * htsearch/Display.cc: Fix typo causing compile problems. Tue Jan 19 23:51:38 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htcommon/defaults.cc: Added options translate_amp, _lt_gt, _quot as suggested by Marjolein to control SGML translation of these entities. * htdig/SGMLEntities.cc: Use them as contributed by Marjolein. Tue Jan 19 12:55:36 1999 Hans-Peter Nilsson <hp@axis.se> * htlib/StringMatch.cc (Pattern): Always set PreviousState before checking PreviousValue. * htlib/StringMatch.cc (FindFirst): Be "greedy"; match longest. (Compare): Ditto. * htcommon/DocumentRef.cc (MEMCPY_ASSIGN, NUM_ASSIGN): New macros for assigning portably to some possibly-enum numeric type. (getnum): Use them. * htlib/StringMatch.cc (FINAL): Remove. (MATCH_INDEX_MASK): Include highest bit. (Pattern, FindFirst, Compare, FindFirstWord, CompareWord): Do not use FINAL. (FindFirst, Compare, FindFirstWord, CompareWord): When shifting by INDEX_SHIFT, cast to unsigned. Mon Jan 18 17:43:29 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htcommon/defaults.cc: Added no_title_text option to allow configuration of the text when no title is available. Default is the filename. * htsearch/Display.cc: Use no_title_text to set the title appropriately, as contributed by Marjolein. 
* htsearch/Display.cc: Ensure PERCENT variable has a minimum of 1. Mon Jan 18 17:41:44 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htdig/Server.cc: Use max_doc_size when retrieving robots.txt files instead of a hard-coded 10k limit. * htdig/Document.cc: When reading chunks of document, if a chunk puts us over the max_doc_size limit, take everything up to that limit (rather than discarding the entire chunk). * htcommon/DocumentRef.cc: Fix thinko with compression_level. Sun Jan 17 21:48:05 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htdoc/(attrs.html, cf_byname.html, cf_byprog.html, config.html, hts_form.html, hts_templates.html): Add documentation for "sort" config and form input. * htcommon/defaults.cc: Added options "sort" and "sort_names" to pick result sorting order and text names for sort options. * htsearch/Display.cc: Added variable SORT to render a form menu for sort options, based on "sort" and "sort_names" options. * installdir/(wrapper.html, header.html, nomatch.html, footer.html, search.html, syntax.html): Add in sort option to form. Sun Jan 17 14:03:54 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/TemplateList.h htsearch/TemplateList.cc(createFromString): Ensure template_map config has three members for each template we add, contributed by Gabriele Bartolini <tlm@mbox.comune.prato.it>. * htsearch/Display.cc(Display): Take advantage of createFromString returning an error value to bail out of poorly-constructed template_maps, based on code contributed by <tlm@mbox.comune.prato.it>. * htdig/PDF.cc: Add debugging output of URLs causing problems. Also, switch system call to make it easier to call xpdf instead of acroread. * htcommon/defaults.cc: Change default pdf_parser attribute to include acrobat-specific flags. Fix mismatched naming of compression_level (was compression_factor). * htdig/Retriever.cc: Fix compiler warnings. 
* contrib/examples/updatedig: Added contributed rundig-type script from David Robley <webmaster@www.nisu.flinders.edu.au>. Sun Jan 17 13:42:43 1999 didier Gautheron <dgautheron@magic.fr> * htcommon/defaults.cc: add url_log parameter for save and restart function. * htdig/Retriever.cc, htdig/Retriever.h: Add save and restart function. * htdig/main.cc: Add option -l for save and restart function. * htdig/PDF.cc: Check to see if we have acroread before copying the pdf into TMPDIR! Fri Jan 15 07:23:30 1999 Hans-Peter Nilsson <hp@axis.se> * htcommon/DocumentRef.cc(Serialize): Save space when lengths can fit in an unsigned char or unsigned short. * htcommon/DocumentRef.cc(Deserialize): Handle expansion. Thu Jan 14 23:37:29 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htcommon/defaults.cc: Added options noindex_start and noindex_end to enable NOT indexing some sections of HTML. Contributed by Marjolein. * htdig/HTML.cc: Use them. * contrib/examples/rundig.sh: Add rundig example from Colin Viebrock with a few modifications for using less disk space. Thu Jan 14 23:27:24 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htlib/URL.cc: Fix parent path logic to ignore slashes in query string. Noted by Adam Coyne <adam@criticalmass.com>. Thu Jan 14 00:04:03 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * README: Fix for upcoming 3.1.0 release. * htcommon/defaults.cc: Set compression_factor to 0 for default (no compression). Thu Jan 14 03:16:15 1999 Hans-Peter Nilsson <hp@axis.se> * htdig/ExternalParser.cc (parse): Added support for 'm': meta element. * htdoc/attrs.html: Document it. Wed Jan 13 21:31:38 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * Makefile.in(install): Add wrapper.html to the common directory when installing. * contrib/examples: Added directory for example common files (e.g. badwords, dictionaries, templates, etc.) * contrib/examples/badwords: Added example bad_words file by Marjolein. * .version: Bump to 3.1.0dev. 
* htdig/HTML.cc(parse): Added slight fixes to the comment parsing code, contributed by Marjolein. Wed Jan 13 20:11:26 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htdoc/attrs.html: Fix typo with META example. * htdig/Document.cc: Use new StringList::Join function for http_proxy_exclude. * htnotify/htnotify.cc: Bring latest security patch from 3.1.0b4 onto the mainline source. * installdir/wrapper.html: New file to merge header and footer files. * htcommon/defaults.cc: Added search_results_wrapper for the location of the wrapper file, if used. (The default is empty, which uses header.html and footer.html) * htsearch/Display.cc: Added support for using the wrapper instead of header and footer if search_results_wrapper is set. * htsearch/htsearch.cc: Added check for sort config. * htsearch/Display.cc, htsearch/Display.h: Added support for sorting and reverse sorting by date, time, and score. Wed Jan 13 18:45:17 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htcommon/defaults.cc: Removed use_document_compression (redundant) and fixed problem with missing comma. Setting compression_factor to 0 is the equivalent of turning off use_document_compression. * htcommon/DocumentRef.cc(Serialize, Deserialize): Update from Randy Winch to eliminate use_document_compression and fix compilation problems noted by Hans-Peter. * htmerge/db.cc: Fixed problem with db.NextDocID() being set incorrectly, reported by Roman Dimov <roman@mark-itt.ru>. * htcommon/DocumentDB.h: Added IncNextDocID to allow big changes in db.NextDocID(), such as those above. * htdoc/THANKS.html: Added Akos Domotor. Wed Jan 13 07:07:35 1999 Hans-Peter Nilsson <hp@axis.se> * htsearch/htsearch.cc (setupWords): Remove parsedWords parameter with associated processing of original words - deletion of bad_words, spacing and on-the-fly modifiers. (main): Create originalWords from input, not via setupWords(). 
Tue Jan 12 09:16:49 1999 didier Gautheron <dgautheron@magic.fr> * htcommon/WordList.cc, htmerge/words.cc: Changed field order in db.wordlist. With the old order, words from HTML body and words from links to that url weren't merged sometimes. * htdig/Document.cc, htmerge/words.cc: Small speed improvements. * htdig/HTML.cc: Fixed small memory leak with bogus HTML and small speedups. * htdig/Retriever.cc(got_href) : if ref exists we have to call AddDescription even if max_hop_count is reached. It's important for wwwoffle (urls in the cache are restricted by max_hop_count) * htcommon/DocumentDB.cc, htcommon/DocumentDB.h, htdig/Retriever.cc, htlib/Dictionary.cc, htlib/Dictionary.h, htlib/Object.cc, htlib/Object.h, htlib/String.cc, htlib/htString.h, htcommon/WordList.cc: Speedups after gprof data. Tue Jan 12 07:23:35 1999 didier Gautheron <dgautheron@magic.fr> * htlib/Configuration.cc: Fixed time format to standard to avoid sending If-Modified-Since http headers in native format (which would be incorrect behavior). Use C locale. * htlib/Dictionary.h, htlib/Dictionary.cc: Add new method GetNextElement to directly return next object when iterating. Tue Jan 12 12:56:26 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htcommon/DocumentRef.h, htcommon/DocumentRef.cc(serialize, deserialize): Added support for compressing data using zlib if available, contributed by Randy Winch <gumby@cafes.net>. * htcommon/defaults.cc: Added config options use_document_compression and compression_factor for zlib support. * configure.in, include/htconfig.h.in: Added autoconf check for libz and deflate function. * configure: Generated from above change. Mon Jan 11 22:48:17 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htmerge/db.cc: Fixed thinko with setting the docIDs of new words in the destination wordlist. * htdoc/FAQ.html, htdoc/THANKS.html, htdoc/contents.html: Minor cleanups. * htdoc/RELEASE.html: Added release info from 3.1.0b4. 
* htdoc/uses.html: Alphabetized, added a form for requests, and added in lots of new sites. Mon Jan 11 02:42:51 1999 Hans-Peter Nilsson <hp@axis.se> * htsearch/htsearch.cc (setupWords): Do not skip words if "boolean" search. Mon Jan 11 00:42:51 1999 Hans-Peter Nilsson <hp@axis.se> * htdoc/hts_method.html: Add explanation of operator "not". * installdir/syntax.html: Added examples of correct logical expressions. Mon Jan 11 00:23:58 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htdoc/attrs.html(search_algorithm): Added prefix and substring matching--somehow slipped through the cracks! * htdoc/THANKS.html: Update to be more accurate as far as recent contributions. Sun Jan 10 00:06:59 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/Document.cc(readHeader): Added check for header status when considering content-types. Fixed PR #91. Sat Jan 9 20:52:49 1999 didier Gautheron <dgautheron@magic.fr> * htcommon/WordList.cc(valid_word): Break out of looping once we're sure the word is invalid. * htlib/Dictionary.cc(Remove, Exists): Remember special case of an empty dictionary. Sat Jan 9 20:16:25 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/HTML.cc(parse): Don't capitalize headers--this creates problems with non-ASCII values, since String::uppercase doesn't know how to capitalize them. Fixes PR #100. Sat Jan 9 14:47:17 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/Document.cc(getdate): Strip off weekday before calling strptime since some servers return invalid weekdays. Fixes PR #79. * htmerge/htmerge.h: Declare new mergeDB code. * htmerge/htmerge.cc: Set up merge_config file and add options for mergeDB code. * htmerge/db.cc: New file. Implements merging of two database sets specified by the merge_config and config variables. * htmerge/Makefile.in: Add db.o as an object to be compiled. 
Fri Jan 8 20:11:56 1999 Alexander Bergolth <bergolth@ariel.wu-wien.ac.at> * htdig/Plaintext.cc: fixed bug that inhibited compressing of whitespace * htlib/URL.cc: fixed problem in stripping anchors from URLs Thu Jan 7 23:29:32 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/HTML.cc(parse): Corrected problems with parsing comments, as contributed by Marjolein Katsma <webmaster@javawoman.com> and Gilles. * htsearch/Display.cc, htsearch/Display.h: Implement add_anchors_to_excerpt option and new variable ANCHOR as contributed by Marjolein. * htdoc/THANKS.html: Added new contributors. * README: Update for 1999 copyright, version, etc. Thu Jan 7 17:29:52 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htdoc/(attrs.html, cf_byname.html, cf_byprog.html): Fix typo noted by Joe Jah: keyword_factor -> keywords_factor. Thu Jan 7 14:32:34 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htsearch/Display.cc (display): The start template, if provided, should come out after the header, not before. * htcommon/defaults.cc, installdir/footer.html: Use the no_page_list_header stuff. Thu Jan 7 11:09:08 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * installdir/*.png: Add PNG versions of the default GIF graphics. Wed Jan 6 22:03:54 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htfuzzy/Synonym.cc, htfuzzy/htfuzzy.cc, htmerge/docs.cc, htmerge/words.cc, htdig/SGMLEntities.cc: Fix minor memory leaks. * htcommon/defaults.cc: Add .bin, .tgz, .rpm, .mov, .mpg, .avi to bad_extensions. * htdoc/attrs.html: Update documentation on default. * installdir/rundig: Removed check for age of synonym and endings DB. Nice feature, but it broke under too many shells. * htlib/DB2_db.cc: Change allocation of database cursors to match API in new version. * htdig/Retriever.cc(got_word): Skip changing to lowercase, we do it in WordList::Word. Wed Jan 6 14:49:47 1999 Gilles Detillieux <grdetil@scrc.umanitoba.ca> * htdoc/attrs.html: Added four new attributes, fixed defaults & typos. 
* htdoc/cf_byname.html: Added four new attributes. * htdoc/cf_byprog.html: Added four new attributes. Wed Jan 6 14:37:06 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * configure.in: Changed to require Autoconf 2.13 to eliminate bugs observed by users with older autoconf versions. * configure: Regenerated using Autoconf 2.13. Wed Jan 6 13:08:26 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htcommon/DocumentRef.cc: Applied fix from Dave Alden <alden@math.ohio-state.edu> to compile under SunPRO compilers by eliminating trailing comma in enum. Wed Jan 6 17:50:55 1999 Alexander Bergolth <bergolth@ariel.wu-wien.ac.at> * {.,htcommon,htdig,htfuzzy,htlib,htmerge,htnotify,htsearch}/ Makefile.in, Makefile.config.in: fixed relative path problem if install-sh is used. Wed Jan 6 17:12:04 1999 Alexander Bergolth <bergolth@ariel.wu-wien.ac.at> * htlib/StringList.cc: fixed bug in StringList::Join (oops!) Wed Jan 6 10:34:45 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htcommon/DocumentRef.cc(AddDescription): Remove delete instruction that fouls up everything (it was removing descriptions as we add them!). Wed Jan 6 14:52:11 1999 Hans-Peter Nilsson <hp@axis.se> * htlib/String.cc (allocate_space): Add missing [] to delete. Wed Jan 6 05:53:02 1999 Hans-Peter Nilsson <hp@axis.se> * htcommon/DocumentRef.cc(AddDescription): Do not add non-word characters to the wordlist. Wed Jan 6 00:28:19 1999 Hans-Peter Nilsson <hp@axis.se> * htdoc/cf_byname.html: Fixed html syntax "<br" and "/a>". Tue Jan 5 22:40:58 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/Display.cc: Check if we need to do backlink and date factoring (e.g. we don't if they're zero!), from a patch by Gilles. Tue Jan 5 20:57:02 1999 Alexander Bergolth <bergolth@ariel.wu-wien.ac.at> * configure.in, htlib/Connection.cc: Check for strings.h for those platforms that don't have it.
Tue Jan 5 14:24:52 1999 Geoff Hutchison <ghutchis@wso.williams.edu> * htcommon/DocumentRef.h: Added comments on the members (fields) of DocumentRef objects. * htcommon/defaults.cc: Added new option max_descriptions for limit on the number of descriptions to store (default 5, matches behavior pre 3.1.0b3). * htcommon/DocumentRef.cc: Support restriction of max_descriptions. * .version: Bump to 3.1.0b5dev. Tue Jan 5 20:07:05 1999 Alexander Bergolth <bergolth@ariel.wu-wien.ac.at> * htdig/Retriever.cc: fixed bug in bad_querystring detection Sat Jan 2 16:39:34 1999 Alexander Bergolth <leo@strike.wu-wien.ac.at> * htdig/main.cc, htlib/Configuration.cc: Added warning message if the locale selection was not successful. (e.g. because the locale definition is not installed) config["locale"] is now set to the return string of setlocale. * {.,htcommon,htdig,htfuzzy,htlib,htmerge,htnotify,htsearch}/ Makefile.in, Makefile.config.in, configure.in: Changed to allow compiling in separate build directories. Fri Jan 1 05:49:19 1999 Hans-Peter Nilsson <hp@axis.se> * htdoc/attrs.html: Describe more thoroughly how "pdf_parser" is used. * htdoc/attrs.html: Fix typo for anchor/attribute "allow_virtual_hosts". * htdoc/attrs.html: Correct and add more verbose description of external parser program parameters and fields. Sun Dec 27 14:52:45 1998 Alexander Bergolth <leo@strike.wu-wien.ac.at> * htlib/URL.cc: Small change in URL::removeIndex so that URLs are not stripped if a query string ends with /index.html * htsearch/Display.cc, htnotify/htnotify.cc: Added patches from Gilles Detillieux <grdetil@scrc.umanitoba.ca> to fix memory leaks. Sat Dec 19 17:53:44 1998 Alexander Bergolth <leo@strike.wu-wien.ac.at> * htdig/main.cc, htdig/htdig.h, htdig/Retriever.cc: Added new option bad_querystr. Allows exclusion when digging CGI-Scripts. * htsearch/htsearch.cc, htsearch/Display.cc: Added new option allow_in_form. Currently does not work with some special variable names!
* htcommon/defaults.cc: Added the two new options. Sat Dec 19 11:21:38 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * contrib/htparsedoc/parse_word_doc.pl: Update from Jesse. * .version: Bump for 3.1.0b4. * README: Ditto. * Makefile.in: Remove references to version number. * htnotify/htnotify.cc: Fix nasty security hole found by Werner Hett <hett@isbiel.ch>. Sat Dec 19 15:22:38 1998 Alexander Bergolth <leo@strike.wu-wien.ac.at> * htlib/StringList.cc, htlib/StringList.h: Added StringList::Join to simplify the creation of patterns for StringMatch. * htlib/String.cc: lastIndexOf(char ch) added * htlib/URL.cc: Changed URL::removeIndex to use local_default_doc. (index.html was hardcoded) local_default_doc can be a list. * htdig/main.cc, htlib/URL.cc: Use StringList::Join. Sun Dec 13 23:06:35 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/Display.cc: Fix potential coredump when calculating date_factor and backlink_factor on docs that aren't in the database. Sat Dec 12 23:17:56 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdoc/cf_byname.html, htdoc/cf_byprog.html, htdoc/attrs.html: Added docs for new options since version 3.1.0b2. * htdoc/RELEASE.html: Added notes on changes since 3.1.0b2 (we should keep this up rather than all-at-once). * htdoc/hts_templates: Include documentation on using CGI environment variables in templates with this version. * htdig/Retriever.cc(got_href): Added check to prevent currenthopcount from becoming -1. * htcommon/WordList.cc: Change undefined minimumWordLength to config("minimum_word_length"). Sat Dec 12 12:01:55 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * Makefile.in, Makefile.config.in, */Makefile.in: Added target mostlyclean to clean up, but leave compile-intensive targets (e.g. db, rx code). General cleanup too. * htdoc/where.html: Updated for eventual 3.1.0b3 release. * htcommon/WordList.cc: Added additional cleanups for the words in the bad word file, in case they have invalid punctuation, etc. 
Sat Dec 12 18:41:29 1998 Alexander Bergolth <leo@strike.wu-wien.ac.at> * htmerge/words.cc: Fix last update so that it compiles on AIX. Fri Dec 11 10:40:48 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/Retriever.cc: Added additional debugging info on the reason for excluding a URL, based on a patch by Benoit Majeau <Benoit.Majeau@nrc.ca>. * htmerge/words.cc: Fixed a bug where pointer, rather than strings were assigned. Silly references... * htsearch/Display.cc, htsearch/Display.h: Added patch from Gilles to allow CGI environment variables in templates. * htdig/HTML.cc: Fix core dump when META refresh tags don't have content portions. Thu Dec 10 22:28:44 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/Retriever.cc, htdig/Server.cc, htdig/Server.h: Changed support for server_wait_time to use delay() method in Server. Delay is from beginning of last connection to this one. Currently this also delays local digging, which may not be ideal. * htcommon/defaults.cc: Added option for server_max_docs as a limit on the number of docs returned from a server. * contrib/htparsedoc/parse_word_doc.pl: New version from Jesse. New code speedups and better matching of punctuation. * htdig/Document.cc: Check http_proxy_exclude to see if it's empty. If so, use the proxy. Mon Dec 7 21:46:34 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/htsearch.cc: Fix thinko with multiple excludes and restricts. Pointed out by Gilles. * htcommon/defaults.cc: Add new option server_wait_time for the number of seconds to wait between requests. * htdig/Retriever.cc: Use server_wait_time to call sleep() before requests. Should help prevent server abuse. :-) * htcommon/WordList.cc(valid_word): Remove unnecessary code. * htcommon/DocumentRef.cc: Fix typo that added description text that contained punctuation or was too short. Sun Dec 6 13:12:55 1998 Geoff Hutchison <ghutchis@ethel.williams.edu> * htsearch/parser.cc: Check for empty boolean searches and report an error. 
Fixes bug reported by Chuck O'Donnell <cao@bus.net>. * install-sh, mkinstalldirs: Import latest version from autoconf. * htcommon/DocumentRef.cc: Add the text of descriptions to the word database with weight description_factor. * htcommon/WordList.cc: Ensure duplicate words have minimum location and anchor attributes. * htcommon/WordRecord.h: Ensure blank WordRecords have a default count of 1 since a word has to exist to have a WordRecord! * htdig/ExternalParser.cc, htdig/PDF.cc, htfuzzy/EndingsDB.cc: Ensure temporary files are placed in TMPDIR if it's set. * htdig/Retriever.cc: Don't add the text of descriptions to the word db here, it's better to do it in the DocumentRef itself. * htmerge/words.cc: Check for word entries that are essentially duplicates and compact them. Sat Dec 5 01:10:46 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdoc/THANKS.html: Updated for recent submissions. * htdoc/FAQ.html: Cleaned up title. * htdoc/uses.html: Added more sites and cleaned up the HTML. Fri Dec 4 20:15:41 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * db/os/os_fsync.c, db/mutex/mutex.c: Patch from Klaus Mueller <K.Mueller@intershop.de> to compile under CygWinB20. * htdig/HTML.cc: Fix mistake in last update--file was included twice. * htdig/Retriever.cc: Do a check for blank URLs before adding them to the list to be retrieved. Fri Dec 4 19:21:17 1998 Didier Gautheron <dgautheron@magic.fr> * htdig/HTML.cc: Fix parser bug with < becoming a tag. * htlib/Dictionary.cc: Added check for empty dictionaries. * htlib/URL.cc: Allow server_aliases to work under virtual hosts. * htmerge/htmerge.cc: Remove previous db.words.db file before doing a word merging. Fixes bug with deleted documents keeping entries. * htdig/main.cc, htdig/Retriever.h, htdig/Retriever.cc: Added parameter to Initial function to prevent URLs from being checked twice during an update dig. * htcommon/WordList.cc, htmerge/words.cc: Don't store c:1 and a:0 entries in db.wordlist to save space. 
Fri Dec 4 19:08:28 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * configure.in, Makefile.in, Makefile.config.in: Remove DB_DIR and RX_DIR. * configure: Regenerated for configure.in changes. * htsearch/htsearch.cc: Added usage message for the command line. Fri Dec 4 18:52:55 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdoc/FAQ.html: Added question about phrase matching. Fri Dec 4 21:21:00 1998 Alexander Bergolth <leo@leo.wu-wien.ac.at> * configure.in: Check if the third argument of getpeername is a size_t* or an unsigned int*. * include/htconfig.h.in: Define GETPEERNAME_LENGTH_T. * htlib/Connection.cc: Use GETPEERNAME_LENGTH_T as the type of the third getpeername argument. Included strings.h which is needed for FD_ZERO on AIX. Thu Dec 3 23:03:15 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * configure.in: Check for getopt.h for those platforms that don't have it. Fix checks for db and rx dirs since these names won't change. * include/htconfig.h.in: Define HAVE_GETOPT_H. * configure: Generate from configure.in with latest autoconf (2.12.2). * htdig/Plaintext.cc: Removed compiler warnings. * htdig/main.cc, htfuzzy/htfuzzy.cc, htmerge/htmerge.cc, htnotify/htnotify.cc, htsearch/htsearch.cc: Use configure check to only include getopt.h when it exists. * htcommon/defaults.cc: Add new option http_proxy_exclude for servers that shouldn't use the proxy, from a patch by Gilles Detillieux. * htdig/Document.h, htdig/Document.cc: Use it, from a patch by Gilles. Tue Dec 1 21:36:37 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * Makefile.in: Fixed bug with "make depend," noted by Morgan Davis <mdavis@cts.com>. * htdig/main.cc, htfuzzy/htfuzzy.cc, htmerge/htmerge.cc, htnotify/htnotify.cc, htsearch/htsearch.cc: Add include <getopt.h> to help compiling under Win32 with CygWinB20. * htdig/Retriever.cc: Update hopcount correctly by taking the shortest paths to documents. * htlib/DB2_db.cc: Added fix from Alexander Bergolth for Berkeley DB under AIX. 
* htlib/StringMatch.cc: Added fix from Christian Schneider <cschneid@relog.ch>, discovered from behavior with limit_urls_to. Tue Dec 1 18:06:33 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdoc/hts_form.html: Explained why config fields reject periods. * htdoc/FAQ.html: Added information about Internal Server Errors. * htdoc/uses.html: Updated with more sites, change e-mail to Geoff. Sun Nov 29 21:26:56 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/htsearch.cc: Fix last update so it compiles (oops!). * htdig/Document.cc: As above! Sun Nov 29 20:06:58 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/htsearch.cc: Improved support for multiple restrict and exclude patterns, based on code from Gilles Detillieux and William Rhee <willrhee@umich.edu>. * htdig/Document.cc, htdig/PDF.cc: Fixed problems under FreeBSD where <sys/types.h> needed to be before <sys/stat.h>, noted by Gilles. * htdig/Server.cc: Fixed bug with robots.txt files containing tabs, based on patch from Christian Schneider <cschneid@relog.ch>. * htdig/Document.cc: Fixed core dumps caused by mystrptime returning NULL. Instead, we'll use the current timestamp. Noted by Michael Hauber <mhauber@datacore.ch> and <MARK_ALLEYNE@Non-HP-UnitedKingdom-om8.om.hp.com>. Fri Nov 27 19:09:33 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * db/*: Import of Sleepycat's Berkeley DB 2.5.9 * rx/*: Import of FSF rx 1.5 * configure, configure.in: Updated to deal with changes in db, rx directories. * Attic/db-2.4.14.tar.gz: Removed old db package for update. * htsearch/parser.cc: Removed bogus code with "%01" -> "|" * htlib/URL.cc: Considers URLs with "%7E" to be equivalent to "~" * htlib/String.cc: Changed MinimumAllocationSize to cut down on memory usage on small strings. * htdig/Retriever.h, htdig/Retriever.cc, htdig/HTML.cc: Changed Retriever::got_word to check for small words, valid_punctuation to remove bugs in HTML.cc.
* htcommon/defaults.cc: Changed backlink_factor to 1000, description_factor to 150, match_method to and, and meta_description_factor to 50. Should produce more accurate search results. * htcommon/WordList.cc: Fixed bug with bad_words and MAX_WORD_LENGTH, noted by Jeff Breidenbach <jeff@alum.mit.edu>. * README: Updated to reflect bug-tracking system. Tue Nov 24 15:57:28 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/Retriever.cc: Added patch to use local_default_doc with local_user_urls from Gilles Detillieux <grdetil@scrc.umanitoba.ca>. Mon Nov 23 18:57:16 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdoc/RELEASE.html, htdoc/bugs.html, htdoc/contents.html, htdoc/where.html: Updated for new bug reporting system. * htdoc/TODO.html: Updated To Do w/ current status. Sun Nov 22 14:03:06 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * installdir/rundig: Added checks for synonym databases older than the synonym files. * htcommon/defaults.cc: New config options "description_factor" for weighting words added as link descriptions, and "no_excerpt_show_top" to show the top of an excerpt instead of the "no_excerpt_text". * htdig/Retriever.cc: Use "description_factor" to weight link descriptions with the documents at the end of the link. * htsearch/Display.cc: Adjust date_factor and backlink_factor rankings to produce better results. * htsearch/Display.cc: Use "no_excerpt_show_top." * htsearch/htsearch.cc: Don't remove boolean operators from boolean search strings! Thu Nov 19 01:31:37 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdoc/FAQ.html: Update for -ldb problem on Digital UNIX. Wed Nov 18 05:14:53 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdoc/FAQ.html: Update FAQ w/ new questions, better responses. * htdoc/mailing.html: Mention additional archive at www.mail-archive.com. * htdoc/require.html: Update requirements (libstdc++ instead of libg++).
Tue Nov 17 23:13:04 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * contrib/wordfreq/wordfreq.pl: Added changes by Isoif. * htsearch/Display.cc: Added HTTP_REFERER to htsearch logging * htdig/Document.cc: Fixed memory leak as a result of thinko. * htcommon/DocumentRef.cc: Removed limit on number of link descriptions. Mon Nov 16 22:30:07 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htcommon/defaults.cc: Declare new config options backlink_factor and date_factor for counting document backlink counts and modified dates in rankings. * htsearch/Display.cc: Use above factors. * htsearch/ResultMatch.cc: Clarify getScore() comments. * htlib/mktime.c: Import new version. * installdir/htdig.conf: Add max_doc_size example (to help w/FAQ). Mon Nov 16 10:46:15 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/ExternalParser.cc: Add checks for null tokens, adapted from patch by Vadim Checkan. * htdig/Retriever.cc: Count docBackLinks accurately (previously all docs had count of 2!). Sun Nov 15 17:04:34 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/HTML.cc(do_tag): Fix for refresh tags w/o URLs. * htmerge/docs.cc, htmerge/words.cc: Change \r to \n, as mentioned by Andrew Bishop. * htcommon/DocumentRef.h, htcommon/DocumentRef.cc: Define new fields docBackLinks (backlink count) and docSig (document signature). * htdig/Retriever.cc: Keep track of docBackLinks. * htsearch/Display.cc: Add variable BACKLINKS to display the count. Sat Nov 14 20:30:18 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/HTML.cc(parse, do_tag): Ensure links respect META robot settings. Patch contributed by Michael Spann <mikes@mail.sv.dialogic.com>. * htdig/HTML.cc(do_tag): Eliminate bug that ignores "?" in URLs * htdig/HTML.cc(do_tag): Add support for META refresh tags as "redirects", submitted by Aidas Kasparas <kaspar@dobilas.infosistema.lt>. Thu Nov 12 04:13:26 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdoc/contents.html: Added link to jitterbug bug db.
Sun Nov 8 21:10:19 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdoc/ChangeLog, htdoc/RELEASE.html, htdoc/THANKS.html: Correct spelling error with Rene' Seindal's name. * htdoc/hts_templates.html: Update to improve clarity. Sun Nov 8 20:33:22 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/Document.cc: Changed reset to keep proxy settings--fixes bug noted by Didier Gautheron <dgautheron@magic.fr> Fri Nov 6 17:07:00 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * contrib/wordfreq/wordfreq.pl: Updated with patch from Isoif Fettich <ifettich@netsoft.ro> to use Berkeley DB. * contrib/whatsnew/whatsnew.pl: Fixed mistake from Oct 26 change. * contrib/htparsedoc/parse_word_doc.pl: Added file contributed by Jesse. * contrib/README: Updated to include short descriptions of the scripts. * contrib/multidig/*: New scripts to make working with multiple DB a little easier. * configure, configure.in: Added changes to support snapshots. * .version: Resurrected to automate snapshot versions. Wed Nov 4 20:13:10 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdoc/contents.html: Added "Contributors" for THANKS.html * htdoc/THANKS.html: Added acknowledgement to contributors. Wed Nov 4 15:02:43 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htnotify/htnotify.cc: Fixed buglet with -F flag to sendmail. * htdig/Plaintext.cc: Added patch from Vadim Chekan to change char to unsigned char to fix reading Cyrillic plaintext files. Mon Nov 2 15:34:53 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htnotify/htnotify.cc, Makefile.config.in, README: Changed "HTDig" to "ht://Dig." Sun Nov 1 20:34:14 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * Makefile.in: Fixed buglet with dist target. * htdig/Makefile.in: Fixed buglet with distclean target. * htdoc/FAQ.html, htdoc/RELEASE.html, htdoc/attrs.html htdoc/cf_byname.html, htdoc/cf_byprog.html, htdoc/htdig.html htdoc/hts_templates.html: Updated documentation for new features, bug-fixes in ht://Dig 3.1.0b2. 
* htlib/Makefile.in, htlib/lib.h: Call mytimegm.cc instead of timegm.c. * Attic/makedp: Remove file generated by configure * htdig/Document.cc: Remove const from *ext to fix compiler warning. Sun Nov 1 00:17:08 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/Display.cc: Added template var DESCRIPTION as first item in DESCRIPTIONS, as requested by Ryan Scott <test@netcreations.com>. * htlib/mytimegm.cc: Resurrected mytimegm() until problems with glibc version can be solved. * htdig/Document.cc, htdig/Retriever.cc, htfuzzy/Prefix.cc, htsearch/WeightWord.cc, htsearch/htsearch.cc: Replaced system calls with htlib/my* functions. Sat Oct 31 23:58:22 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htlib/URL.cc: Fixed compiler warning. * rx-1.5/Attic/Makefile, rx-1.5/Attic/config.log: Removed useless Makefile and config.log file. Tue Oct 27 22:53:03 1998 Andrew Scherpbier <andrew@contigo.com> * */Makefile.in (depend): Fixed so that 'make depend' works again. (Not sure exactly how long it was broken!) Tue Oct 27 20:00:16 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * Makefile.in: Fix buglet with distclean target * configure configure.in: Added check for LOCALTIME_R, removed test for timegm replacement, changed compiler for most tests to $CC. * include/htconfig.in: Added option for LOCALTIME_R. * htlib/timegm.c, htlib/mktime.c: Fixed some compilation problems. * htlib/Makefile.in: Remove mktime.o since source is included in timegm.o. Tue Oct 27 13:31:25 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htlib/mktime.c: Imported new version from glibc-2.0.99. * htcommon/DocumentDB.cc: Fixed bug noted by Vadim Chekan with CreateSearchDB. Mon Oct 26 15:27:28 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * Makefile.config.in, configure.in, configure: Fixed problem with -ldb, -lrx, etc. not being declared in $LIBS * htdoc/install.html: Added remarks about using ./configure --prefix= * README: Cleaned up for new URLs, version numbers, etc. 
* htsearch/htsearch.cc: Added patch by Esa Ahola fixing bug with not ignoring bad_words properly. * contrib/whatsnew/whatsnew.pl: Added fix from Jacques Reynes <Jacques.Reynes@cict.fr> to get whatsnew to work with Berkeley DB. * htdig/Retriever.cc, htdig/Document.cc: Fixed bug introduced by Oct 18 change. Authorization will not be cleared. * htlib/URL.cc: Fixed new -Wall warnings. Wed Oct 21 13:30:05 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htlib/timegm.c: Corrected Oct 17 change. Should now work. :-) * htcommon/defaults.cc: Added defaults for new directives server_aliases and limit_normalized. * htdig/HTML.cc: Cleaned up HTML parsing based on patch by Rene' Seindal. Wed Oct 21 18:31:00 1998 Alexander Bergolth <leo@leo.wu-wien.ac.at> * htlib/URL.cc, htlib/URL.h: Added patch to support translation of server names. (Configuration directive: server_aliases) * htdig/Retriever.cc, htdig/htdig.h, htdig/main.cc: Additional limiting after normalization of the URL. (Configuration directive: limit_normalized) Sun Oct 18 17:19:51 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htlib/Connection.h, htlib/Connection.cc: Define new function timeout() as adapted from a patch by Rene' Seindal. * htdig/Document.cc: Use it as adapted from a patch by Rene' Seindal. Sun Oct 18 16:33:58 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htcommon/DocumentDB.cc: Changed deserialize function to explicitly delete DocumentRef. * htcommon/DocumentRef.cc: Added trap for DOC_STRING value. * htdig/Retriever.cc: Delete and reallocate Document variable before retrieving. (Fixes database corruption bug) Removed code to add a "/" to every URL with a 404--servers should send a redirect in this case. Sat Oct 17 20:15:44 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htlib/timegm.c: Declare __gmtime_r if not defined Sat Oct 17 10:15:57 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * configure.in: Fixed problem with configuring DB_DIR introduced by Oct 11 change.
* configure: Regenerated by autoconf for above fix. * htlib/Connection.h, htlib/Connection.cc: Included fixes sent by Paul J. Meyer <pmeyer@rimeice.msfc.nasa.gov> to fix connections on Dec Alpha environments. * htsearch/Display.cc, htsearch/Display.h, htdoc/hts_templates.html: Added variable CURRENT as the number of the current match, adapted from a patch by Rene' Seindal <seindal@webadm.kb.dk> * htcommon/defaults.cc: Changed htdig.sdsu.edu to www.htdig.org in start_urls Wed Oct 14 03:43:22 1998 turtle <turtle@kiwi> * installdir/htdig.conf: fixed broken link pointed out by chris@impulsedata.net, moved maintainer stuff up in the file Sun Oct 11 22:16:27 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htlib/DB2_db.cc: Added fix suggested by Domotor Akos <dome@impulzus.sch.bme.hu> with (char *)NULL cast. * htlib/Attic/mytimegm.cc: Removed old mytimegm function. * installdir/syntax.html: Improved boolean method error message. It now gives examples of boolean expressions. * htcommon/defaults.cc, htsearch/Display.cc, htsearch/Display.h, htsearch/parser.cc: Added htsearch logging patch from Alexander Bergolth. * */Makefile.in, include/htconfig.h.in, htdig/Document.cc, htdig/Images.cc, Attic/.version, Makefile.config.in, Makefile.in, configure, configure.in, mkinstalldirs: Updated Makefiles and configure variables. * htfuzzy/Endings.cc, htfuzzy/Fuzzy.cc, htfuzzy/Prefix.cc, htfuzzy/htfuzzy.cc, htlib/DB2_db.cc, htcommon/DocumentDB.cc: Removed more -Wall warnings. Fri Oct 9 00:29:18 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/Retriever.cc: Fixed typo with "meta_desription_factor". * htdig/Images.cc: Use user_agent config in GET request. Thu Oct 8 09:05:41 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * installdir/syntax.html: Improved Boolean search description. 
Mon Oct 5 11:30:16 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * contrib/ewswrap/ewswrap.cgi, contrib/ewswrap/htwrap.cgi, contrib/ewswrap/README: New scripts, contributed by John Grohol PsyD <johngr@cmhcsys.com>. Fri Oct 2 13:11:24 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/Retriever.cc: Added check for docs removed with noindex. Now words in these docs should be ignored for the word db. Fri Oct 2 13:09:04 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * CONFIG Makefile.config.in Makefile.in */Makefile.in, htcommon/defaults.cc htdig/main.cc, htfuzzy/htfuzzy.cc, htmerge/htmerge.cc, htnotify/htnotify.cc include/htconfig.h.in: More configure improvements--use top_srcdir instead of HTDIG_TOP, use PACKAGE, VERSION, etc. Fri Oct 2 11:32:59 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htlib/StringList.cc: Added patch by Alexander Bergolth for bug with multiple delimiter characters Fri Oct 2 15:22:06 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * installdir/rundig, configure.in, CONFIG, CONFIG.in, aclocal.m4, configure: Improvements in configure.in, notably using --prefix= and --exec-prefix= Tue Sep 29 19:26:11 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/HTML.cc: Added patch from Tim Frost <tim@nz.eds.com> for single quotes around URLs. * htfuzzy/Prefix.cc: Added patch from Esa to fix Prefix matching for capitalization. * htcommon/defaults.cc: Added modification_time_is_now config * htdig/Document.cc, htdig/Retriever.cc: Added patch from Andrew Bishop <amb@gedanken.demon.co.uk> for above to use modification times when servers do not supply them. * htsearch/htsearch.cc: Added patch from Andrew Bishop for -c switch. Wed Sep 23 14:46:34 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htcommon/defaults.cc, htdig/Server.cc: Added case_sensitive attribute to work on case insensitive servers.
Wed Sep 23 11:58:22 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/Display.cc: re-fixed bug noted by Alexander Bergolth * htlib/Attic/timegm.cc, htlib/Makefile.in, htlib/mktime.c, htlib/mytimegm.cc, htlib/timegm.c: Switched to using glibc timegm replacement. * configure, configure.in, Makefile.config.in: Add configure searches for acroread and sendmail programs. * htnotify/Makefile.in, htnotify/htnotify.cc, htcommon/Makefile.in, htcommon/defaults.cc: Use them. * htdig/HTML.cc: Fix thinko in META robots tag. * htcommon/defaults.cc: Define iso_8601 date formatting option * htsearch/Display.cc, htnotify/htnotify.cc: Use it as suggested by Knut A. Syed <Knut.Syed@nhh.no> Fri Sep 18 14:35:02 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/Display.cc: Fixed bug noted by Alexander Bergolth <leo@strike.wu-wien.ac.at> in exclude logic * htdig/HTML.cc: Fixed bug in comma-separated keywords noted by <C.H.Liddiard@qmw.ac.uk> * installdir/synonyms: New version contributed by John Banbury <lijab@flinders.edu.au> Fri Sep 18 00:38:09 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * .version: Bump to 3.1.0b2 * htsearch/Makefile.in, htdig/Makefile.in, htfuzzy/Makefile.in, htlib/Makefile.in, htmerge/Makefile.in, htnotify/Makefile.in, htcommon/Makefile.in: Remove include .sniffdir directive. * htdig/HTML.cc: Fix horrible META description coding. * htfuzzy/EndingsDB.cc, htfuzzy/Fuzzy.cc htfuzzy/Synonym.cc, htfuzzy/htfuzzy.cc: Change "\r" to "\n" in statistics on suggestion of Andrew M. Bishop <amb@gedanken.demon.co.uk> * Makefile.config.in: Remove -ggdb from LDFLAGS. Tue Sep 15 22:31:48 1998 turtle <turtle@kiwi> * Makefile.in: add substitution for @DATABASE_DIR@ Thu Sep 10 00:06:58 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/HTML.cc: Change debug level of META tags. 
* htsearch/TemplateList.cc, htsearch/htsearch.cc, htsearch/Display.cc, htsearch/Display.h: Backed out builtin-long default from Monday, now use error handler Mon Sep 7 23:19:12 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * contrib/htparsedoc: Added contributed external parser for MS Word documents by Richard Jones <rjones@imcl.com>. * htdig/Document.cc: Added fix to use htparsedoc. * htdoc/*.html: Merged in new documentation for htdig-3.1.0b1. * htdig/HTML.cc: Extended "noindex" behavior in previous patch. * htcommon/defaults.cc: Added user_agent config option. * htdig/Document.cc: Use it. Mon Sep 7 00:34:19 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htcommon/DocumentRef.h: Added DocState for documents marked as "noindex". * htdig/HTML.cc, htdig/Retriever.h, htdig/Retriever.cc, htmerge/docs.cc: Use it to remove them. * htsearch/TemplateList.cc: Add default template of builtin-long to slot 0 in case of an error. * htsearch/Display.cc: Use it. Sun Sep 6 21:36:16 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htcommon/defaults.cc: Sorted the current list of defaults, added "pdf_parser" for the program to use in PDF.cc. * htdig/PDF.cc: Use it, checking for the file before calling system to fail gracefully. * htlib/URL.cc: Bug fix for http:/ v. http:// Sat Sep 5 23:11:48 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htlib/String.cc: Added patch by Zvi Har'El <rl@math.technion.ac.il> to indexOf function to prevent "false positive" matches. * installdir/nomatch.html, installdir/syntax.html: Fixed reference to ht://Dig 3.0. * htdig/Document.cc: Use robotstxt_name as user-agent as a more consistent approach. * htsearch/parser.cc: Convert "%01" to "|" to support <SELECT ... MULTIPLE> tags. Thu Sep 3 20:53:51 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/Makefile.in: Remove reference to -lgdbm * htsearch/Display.cc: Send Content-type header after all variable expansion is completed. 
* htcommon/WordList.cc: Removed warning under egcs-1.1 Tue Aug 11 08:58:34 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/Display.cc, htdig/Retriever.h, htdig/Retriever.cc, htdig/Parsable.h, htdig/Parsable.cc, htdig/HTML.h, htdig/HTML.cc, htcommon/defaults.cc, htcommon/DocumentRef.h, htcommon/DocumentRef.cc, htcommon/DocumentDB.cc: Second patch for META description tags. New field in DocDB for the desc., space in word DB w/ proper factor. * htmerge/docs.cc: Added statistic for total size of docs in DB. Thu Aug 6 10:15:22 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/Retriever.cc: Added "local_dir_doc" config option, the default filename in a directory. * htcommon/defaults.cc: Fixed "elipses" spelling mistake, local_dir_doc as above Tue Aug 4 11:34:46 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htlib/Configuration.cc: Added fix by Philippe Rochat <prochat@lbdsun.epfl.ch> to remove whitespace after config options. * htdig/HTML.cc, htdig/HTML.h: Added support for META robots tags. Mon Aug 3 16:50:46 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/ResultList.cc, htnotify/htnotify.cc, htmerge/htmerge.cc, htmerge/docs.cc, htlib/String.cc, htlib/ParsedString.cc, htfuzzy/Substring.cc, htfuzzy/Prefix.cc, htfuzzy/Exact.cc, htdig/SGMLEntities.cc, htdig/Retriever.cc, htdig/PDF.cc, htdig/HTML.cc, htdig/Document.cc: Fixed compiler warnings under -Wall Mon Aug 3 05:56:23 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/Display.cc: Spelling correction for "ellipses" Thu Jul 23 12:14:34 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/PDF.cc, htdig/PDF.h, htdig/Document.cc: Added files (and patch) from Sylvain Wallez for PDF parsing. Incorporates fix for non-Adobe PDFs. * htcommon/defaults.cc: Removed .pdf extension from bad_extensions. 
Wed Jul 22 10:04:31 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/Display.cc: Added patch from Sylvain Wallez <s.wallez.alcatel@e-mail.com> to use the filename if no title is found. * htnotify/htnotify.cc: Added patch from Chris Jason Richards <richards@cs.tamu.edu> to fix problems with sendmail. Tue Jul 21 09:56:58 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htsearch/Display.cc: Added patch by Rob Stone <rob@psych.york.ac.uk> to create new environment variables to htsearch: SELECTED_FORMAT and SELECTED_METHOD. Sun Jul 19 09:51:47 1998 Andrew Scherpbier <andrew@contigo.com> * configure.in (berkeley db stuff): Added the berkeley db .tar.gz to the distribution and modified configure.in to extract it if it needs to. Thu Jul 9 09:39:01 1998 Geoff Hutchison <ghutchis@wso.williams.edu> * htdig/Server.cc, htdig/Retriever.h, htdig/Retriever.cc, htdig/Document.h, htdig/Document.cc, htcommon/defaults.cc: Added support for local file digging using patches by Pasi Eronen <pe@iki.fi>. Patches include support for local user (~username) digging. * htdig/HTML.h, htdig/HTML.cc, htcommon/defaults.cc: Added support for META name=description tags. Uses new config-file option "use_meta_description" which is off by default. 
Mon Jun 22 05:02:01 1998 turtle <turtle@kiwi> * configure.in: Added test to make sure that the berkeley db library is present * .cvsignore: Ignore the berkeley db library * configure: changed * Makefile.config.in: Removed GDBM references * Makefile.in: Removed GDBM references * .version: updated version to 3.1.0b1 * README: Updated version # and website location * htdig/HTML.cc: Applied patch that prevented SGML entities that translate to valid_punctuation characters from becoming part of words * configure.in: Removed references to GDBM * htcommon/defaults.cc: Got rid of my email address as the default maintainer * htdig/htdig.conf: simple config file for development * htlib/String.cc, htlib/Attic/SDSU.h, htlib/Attic/SDSU.cc, htlib/DB2_db.cc, htlib/Connection.cc, htlib/Configuration.cc, htlib/BTree.cc: New Berkeley database stuff * htlib/.sniffdir/ofiles.incl: removed SDSU.* * installdir/syntax.html, installdir/search.html, installdir/rundig, installdir/nomatch.html, installdir/htdig.conf, installdir/footer.html: Changed to use the new http://www.htdig.org/ instead of the sdsu site Sun Jun 21 23:20:14 1998 turtle <turtle@kiwi> * rx-1.5/rx/Attic/config.log, htsearch/htsearch.cc, htsearch/Attic/display.cc, htsearch/Display.cc, htmerge/docs.cc, htlib/.sniffdir/ofiles.incl, htlib/Database.h, htlib/DB2_db.cc, htlib/DB2_db.h, htlib/Database.cc, htfuzzy/.sniffdir/ofiles.incl, htfuzzy/Prefix.cc, htfuzzy/Prefix.h, htfuzzy/Makefile.in, htfuzzy/Fuzzy.cc, htcommon/defaults.cc, configure.in, Makefile.in, Makefile.config.in: patches by Esa and Jesse to add BerkeleyDB and Prefix searching Mon Jun 15 18:15:50 1998 turtle <turtle@kiwi> * htdig/HTML.cc: Added suggestion by Chris Liddiard to add ',' to the list of separator characters for meta keyword parsing Tue May 26 03:58:14 1998 turtle <turtle@kiwi> * rx-1.5/rx/Attic/config.log, htlib/htString.h, htlib/cgi.cc, htlib/URL.cc, htlib/String.cc, htlib/ParsedString.cc, htlib/Database.cc, htlib/Connection.cc: Got rid of compiler 
warnings. * rx-1.5/rx/.cvsignore: added config.log Fri Apr 3 17:10:44 1998 turtle <turtle@kiwi> * htsearch/Display.cc: Patch to make excludes work Tue Mar 10 16:02:32 1998 turtle <turtle@kiwi> * htlib/strcasecmp.cc: Applied patch by Bernhard Griener to add arguments checks in the mystrncasecmp() function Sun Feb 22 17:43:49 1998 turtle <turtle@kiwi> * htdoc/mailing.html: New mailing list archive location Tue Feb 17 18:05:40 1998 turtle <turtle@kiwi> * htdoc/uses.html: added new one Thu Feb 12 22:22:15 1998 turtle <turtle@kiwi> * htdoc/uses.html: Added more sites Mon Jan 5 06:14:11 1998 turtle <turtle@kiwi> * configure, configure.in: Added check for fstream.h to get rid of the annoying emails about ht://Dig not compiling... * Makefile.config.in: Added include of the GDBM library back * .version: Now at version 3.0.9 * include/htconfig.h.in: Changed refs to time related stuff * htmerge/htmerge.cc, htmerge/docs.cc: format changes * htdig/Document.cc: Changed tm from pointer to real structure * htlib/.sniffdir/ofiles.incl, htlib/timegm.cc: Our own timegm function * rx-1.5/rx/.cvsignore, rx-1.5/rx/Attic/Makefile: cvs cleanup * htmerge/docs.cc: Fixed memory leak * htlib/lib.h: Added own replacement of timegm() * htlib/Dictionary.cc: Fixed memory leaks * htlib/Connection.cc: Fix by Pontus Borg for AIX. 
Changed 'size_t' to 'unsigned long' for the length parameter for getpeername() * htfuzzy/Metaphone.cc: formatting changes * htdig/Retriever.cc: fixed memory leak * htdig/Document.cc: * Alarm was not cancelled if readHeader returned anything but OK * Use our own timegm() replacement if necessary * htcommon/DocumentRef.h, htcommon/DocumentRef.cc: format changes * htcommon/DocumentDB.h: reformatting * htcommon/DocumentDB.cc: Fixed major memory leak * include/.cvsignore, include/Attic/htconfig.h, rx-1.5/.cvsignore, rx-1.5/Attic/config.cache, rx-1.5/Attic/config.status, rx-1.5/rx/.cvsignore, rx-1.5/rx/Attic/config.status, htlib/Attic/htlib.proj, htmerge/.cvsignore, htmerge/Attic/htmerge.proj, htnotify/.cvsignore, htnotify/Attic/htnotify.proj, htsearch/.cvsignore, htsearch/Attic/htsearch.proj, Attic/config.cache, htcommon/Attic/htcommon.proj, htfuzzy/.cvsignore, htfuzzy/Attic/htfuzzy.proj, lookfor: General cleanup of archived stuff * .cvsignore: config.cache added * htdig/.cvsignore: Added htdig Tue Dec 16 15:57:22 1997 turtle <turtle@kiwi> * htdig/Document.cc: Added little patch by Tobias Oetiker <oetiker@ee.ethz.ch> that should fix problems with timeouts. Thu Dec 11 00:28:59 1997 turtle <turtle@kiwi> * htlib/URL.h, htlib/URL.cc: Added double slash removal code. These were causing loops. Thu Oct 23 18:01:10 1997 turtle <turtle@kiwi> * htlib/Connection.cc: Fix by Pontus Borg for AIX. Changed 'size_t' to 'unsigned long' for the length parameter for getpeername() Mon Oct 13 02:13:52 1997 turtle <turtle@kiwi> * htdig/Attic/Makefile, htdig/Attic/htdig.proj: remove files that shouldn't be in the repository * htdig/.cvsignore: Ignore Makefile * htdoc/cf_byname.html, htdoc/cf_byprog.html, htdoc/attrs.html, htdoc/ChangeLog: Added documentation for the external_parsers attribute. 
Mon Jul 14 15:32:22 1997 turtle <turtle@kiwi> * htdoc/uses.html: added cambridge Wed Jul 9 15:57:30 1997 turtle <turtle@kiwi> * htdoc/uses.html: added the rhodos project Mon Jul 7 22:15:45 1997 turtle <turtle@kiwi> * htdig/Document.cc: Removed old getdate() code that replaced '-' with ' '. * htlib/URL.cc: Sequences of "/./" are now replaced with "/" to reduce the chance of infinite loops * htdig/Document.cc: Added better date parsing. Now also supports the old RFC 850 format Thu Jul 3 17:44:39 1997 turtle <turtle@kiwi> * htdoc/cf_byname.html, htdoc/cf_byprog.html, htcommon/defaults.cc, htdig/htdig.h, htdoc/attrs.html, htlib/Configuration.h, htlib/URL.cc, htdig/Attic/Makefile, htdig/Document.cc: Added support for virtual hosts Mon Jun 30 17:07:49 1997 turtle <turtle@kiwi> * htdoc/uses.html: Added DePaul University Tue Jun 24 14:59:45 1997 turtle <turtle@kiwi> * Makefile.in: Fixed syntax error in the installation target. Mon Jun 23 17:33:14 1997 turtle <turtle@kiwi> * htdig/Attic/teamball.conf, htdig/Attic/tsdsu.conf, htdig/Attic/rohan.conf, htdig/Attic/sdsu.conf, htdig/Attic/t.conf, htdig/Attic/nsdsu.conf, htdig/Attic/daztec.conf, htdig/Attic/max.conf, htdig/htdig.conf, htdig/Attic/Makefile, htdig/Attic/catalog.conf: Removed old config files * htdoc/FAQ.html: FAQ initial * htdoc/contents.html: Added link to the new FAQ * htdoc/FAQ.html: *** empty log message *** * htnotify/htnotify.cc: Added version info to the usage output * htfuzzy/htfuzzy.cc: Added version info to the usage output * htmerge/htmerge.cc: Added version info to usage message * htdig/main.cc: Added version info to the usage message Mon Jun 16 15:35:56 1997 turtle <turtle@kiwi> * installdir/footer.html: Changed the hardcoded version number to the new VERSION variable * htdoc/hts_templates.html: Added docs for the VERSION and PERCENT variables * htsearch/Display.cc: Added PERCENT and VERSION variables for the output templates Sat Jun 14 18:52:42 1997 turtle <turtle@kiwi> * htdig/Document.cc: Made redirect 
detection code more general Fri Jun 13 05:31:17 1997 turtle <turtle@kiwi> * htdoc/cf_general.html: Fixed typo Thu Jun 5 15:00:53 1997 turtle <turtle@kiwi> * htdoc/uses.html: added VG Gas Analysis Systems Tue Jun 3 17:49:05 1997 turtle <turtle@kiwi> * installdir/english.0.original, installdir/english.0: Added new English dictionary for the endings algorithm Thu May 29 14:56:40 1997 turtle <turtle@kiwi> * htdoc/uses.html: Added Indiana University Computer Security Office Wed May 28 14:47:25 1997 turtle <turtle@kiwi> * htdoc/main.html: Fixed typo Mon May 19 15:23:18 1997 turtle <turtle@kiwi> * htdoc/uses.html: Added Daily Californian Online Tue May 13 19:28:32 1997 turtle <turtle@kiwi> * htdoc/uses.html: Added The Reohr Group * htdoc/uses.html: Added the Linux Documentation Project Sun May 11 17:52:05 1997 turtle <turtle@kiwi> * htdoc/index.html: Made the contents frame a little wider so that text doesn't wrap * htdoc/uses.html: Added NOVA and Gajo & Associati Fri May 2 23:35:56 1997 turtle <turtle@kiwi> * htdoc/uses.html: added www.bajan.org Wed Apr 30 22:28:28 1997 turtle <turtle@kiwi> * htdoc/uses.html: Added Caldera, Inc.
Sun Apr 27 14:43:31 1997 turtle <turtle@kiwi> * htsearch/parser.cc, htsearch/parser.h, include/Attic/htconfig.h, htdoc/RELEASE.html, htdoc/uses.html, htdoc/where.html, htlib/URL.cc, htlib/strcasecmp.cc, htsearch/htsearch.cc, .version, README, htdig/Attic/Makefile, htdoc/ChangeLog: changes Mon Apr 21 15:44:39 1997 turtle <turtle@kiwi> * htsearch/htsearch.cc: Added code to check the search words against the minimum_word_length attribute Sun Apr 20 15:27:37 1997 turtle <turtle@kiwi> * CONFIG: Made paths more generic * htdig/Document.cc: Added include for ctype.h * htdig/Plaintext.cc: Fixed bug Tue Apr 1 17:56:57 1997 turtle <turtle@kiwi> * htdoc/uses.html: added ukc Sun Mar 30 01:18:16 1997 turtle <turtle@kiwi> * htdig/Attic/Makefile, htdoc/uses.html, Attic/Makefile.config, Attic/config.log, Attic/config.status, .cvsignore, Attic/Makefile, htsearch/Attic/Makefile, htsearch/.cvsignore, htnotify/Attic/Makefile, htnotify/.cvsignore, htmerge/.cvsignore, htmerge/Attic/Makefile, htlib/.cvsignore, htlib/Attic/Makefile, htfuzzy/.cvsignore, htfuzzy/Attic/Makefile, htcommon/.cvsignore, htcommon/Attic/Makefile: update Thu Mar 27 00:06:05 1997 turtle <turtle@kiwi> * htdig/Plaintext.cc: Applied patch supplied by Peter Enderborg <pme@ufh.se> to fix a problem with a pointer running off the end of a string. 
Mon Mar 24 04:33:26 1997 turtle <turtle@kiwi> * rx-1.5/rx/Attic/config.log, rx-1.5/rx/Attic/config.status, htsearch/htsearch.h, htsearch/parser.h, include/Attic/htconfig.h, rx-1.5/Attic/config.status, htsearch/Attic/Makefile, htsearch/ResultList.cc, htsearch/ResultMatch.h, htsearch/Template.h, htsearch/WeightWord.h, htlib/cgi.cc, htlib/htString.h, htlib/io.cc, htmerge/Attic/Makefile, htmerge/htmerge.h, htnotify/Attic/Makefile, htlib/StringList.cc, htlib/StringList.h, htlib/String_fmt.cc, htlib/URL.h, htlib/URLTrans.cc, htlib/Attic/SDSU.cc, htlib/Attic/String.h, htlib/ParsedString.h, htlib/String.cc, htfuzzy/htfuzzy.cc, htlib/Attic/Makefile, htlib/Configuration.cc, htlib/Connection.cc, htlib/Database.h, htdig/URLRef.h, htfuzzy/Attic/Makefile, htfuzzy/Exact.cc, htfuzzy/Fuzzy.h, htfuzzy/Substring.cc, htfuzzy/SuffixEntry.h, htdig/Plaintext.cc, htdig/Postscript.cc, htdig/SGMLEntities.cc, htdig/Server.cc, htdig/Server.h, htdig/Attic/Makefile, htdig/ExternalParser.cc, htdig/ExternalParser.h, htdig/Parsable.h, htcommon/Attic/Makefile, htcommon/DocumentRef.h, htcommon/WordList.cc, htcommon/WordList.h, htcommon/WordReference.h, htdig/Document.h, Attic/config.status, configure, configure.in, Attic/Makefile, Attic/Makefile.config, Attic/config.cache, Attic/config.log, Makefile.config.in: Renamed the String.h file to htString.h to help compiling under win32 * Makefile.in: Updated "make dist" to remove CVS stuff Fri Mar 14 17:15:32 1997 turtle <turtle@kiwi> * htcommon/defaults.cc: Changed default value for remove_bad_urls to true Thu Mar 13 18:37:50 1997 turtle <turtle@kiwi> * htnotify/htnotify.cc, Attic/Makefile.config, htdig/SGMLEntities.cc, htdoc/uses.html: Changes Thu Feb 27 00:52:52 1997 turtle <turtle@kiwi> * htdoc/uses.html: new uses Mon Feb 24 17:52:55 1997 turtle <turtle@kiwi> * htsearch/htsearch.cc, htnotify/Attic/Makefile, htsearch/Attic/Makefile, htlib/strcasecmp.cc, htmerge/Attic/Makefile, htlib/Attic/Makefile, htlib/String.cc, htlib/StringMatch.cc, 
htdig/SGMLEntities.cc, htfuzzy/Attic/Makefile, htdig/Attic/Makefile, htcommon/Attic/Makefile, htcommon/WordList.cc: Applied patches supplied by "Jan P. Sorensen" <japs@garm.adm.ku.dk> to make ht://Dig run on 8-bit text without the global unsigned-char option to gcc. Sun Feb 23 17:29:38 1997 turtle <turtle@kiwi> * htdoc/uses.html: *** empty log message *** Tue Feb 18 15:03:03 1997 turtle <turtle@kiwi> * htdoc/uses.html: New uses of ht://Dig Tue Feb 11 00:38:48 1997 turtle <turtle@kiwi> * htsearch/htsearch.cc: Renamed the very bad wordlist variable to badWords Mon Feb 10 17:32:47 1997 turtle <turtle@kiwi> * htlib/Connection.cc, htdig/Document.h, htdig/Document.cc, htcommon/DocumentRef.cc, htcommon/DocumentRef.h: Applied AIX specific patches supplied by Lars-Owe Ivarsson <lars-owe.ivarsson@its.uu.se> Fri Feb 7 18:04:13 1997 turtle <turtle@kiwi> * htlib/URL.cc: Fixed problem with anchors without a URL Mon Feb 3 17:37:59 1997 turtle <turtle@kiwi> * .version, README: updated stuff to 3.0.8 * Many files: Initial CVS Local Variables: add-log-time-format: current-time-string End: