skWare's HTML Search Engine Online Documentation

Alpha Release 0.08a


Introduction

skWare's HTML Search Engine (skSearchEngine) is a search engine designed to search html pages on a Wildcat! 5 BBS. It is written mainly in wcCode, with some components written in C.

Usage of this software is covered by skWare's Software License, which you should read. You should also check out skWare's Software Warranty, and you may want to look at the Trademarks page.

You should definitely read the FAQ, Software Notes, Software History and the To Do List.

This documentation assumes that Wildcat! has been installed on drive d: in the directory wc5, so that the default wildcat directory is d:\wc5, as this is how our development system is set up. Your system may not be set up like this.

There are two phases to using skSearchEngine, namely Index Generation and Online Query. These are discussed in more detail below.

Index Generation

This is the heart of skSearchEngine. The better the index, the better the search results returned to the users. Currently skSearchEngine supports one index, which is generated on a directory basis. This means that you can create an index on one or more directories in your http directory structure.

To generate an index you need to run the indexing program. The command used to run the indexing program is as follows:-

wcRun -r sksearchengine_idxgen mode type http_dir disk_dir

The mode can be CREATE or APPEND.

Mode    Description
CREATE  This should be used for the first run of the indexing program as it will create the index database from scratch.
APPEND  This should be used for subsequent runs of the indexing program as the index records generated will be added to the index database.

The type can be TYPE1, TYPE2, TYPE3 or TYPE4.

Type   Description
TYPE1  This should be used where you want to index files with an extension of .htm.
TYPE2  This should be used where you want to index files with an extension of .html.
TYPE3  This should be used where you want to index files with an extension of .txt.
TYPE4  This should be used where you want to index files with an extension of .text.
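The type table can be read as a simple lookup from file extension to type. The following is a minimal sketch in Python, not part of skSearchEngine; the names TYPE_BY_EXTENSION and index_type_for are illustrative, and matching the extension without regard to case is an assumption.

```python
from pathlib import PurePath

# Extensions and type names are taken from the table above.
TYPE_BY_EXTENSION = {
    ".htm": "TYPE1",
    ".html": "TYPE2",
    ".txt": "TYPE3",
    ".text": "TYPE4",
}


def index_type_for(filename):
    """Return the index type for a file name, or None if no type covers it."""
    # Assumption: the extension is compared case-insensitively.
    extension = PurePath(filename).suffix.lower()
    return TYPE_BY_EXTENSION.get(extension)
```

For example, index_type_for("faq.html") gives TYPE2, while a file such as logo.gif is not covered by any type.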

The http_dir is the path that the index generation program should use when creating links. This should be relative to your wildcat http directory. For example, if you are indexing the directory d:\wc5\http\skware_se, the http_dir should be /skware_se. Note the use of "/" instead of "\" and that there is no trailing "/".

The disk_dir is the full path of the directory that you wish to index. In the example above this parameter would be d:\wc5\http\skware_se. This time note the use of "\" instead of "/" and that there is no trailing "\".
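The relationship between the two parameters can be sketched in Python as follows. This is only an illustration and not part of skSearchEngine: the function name http_dir_for is hypothetical, the http root d:\wc5\http is the one assumed in this documentation, and the case of indexing the http root itself is not handled.

```python
from pathlib import PureWindowsPath


def http_dir_for(disk_dir, http_root=r"d:\wc5\http"):
    """Derive the http_dir argument from a disk_dir below the http root."""
    # relative_to() raises ValueError if disk_dir is not under http_root.
    relative = PureWindowsPath(disk_dir).relative_to(PureWindowsPath(http_root))
    # as_posix() switches "\" to "/"; the leading "/" is added here and
    # no trailing "/" is produced.
    return "/" + relative.as_posix()
```

With the example directory above, http_dir_for(r"d:\wc5\http\skware_se") gives /skware_se.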

You should also look at the example batch file sksearchengine_idexgen.bat to see how the example index database shipped with skSearchEngine was created.
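As a rough sketch of how the modes and types combine over several runs (this is not the contents of the shipped batch file), the following Python function builds the command lines for one directory, using CREATE for the first run and APPEND for the rest. The function name index_commands and the default choice of TYPE1 and TYPE2 are assumptions made for illustration.

```python
def index_commands(http_dir, disk_dir, types=("TYPE1", "TYPE2")):
    """Build wcRun command lines: CREATE for the first run, APPEND afterwards."""
    commands = []
    for run_number, index_type in enumerate(types):
        # Only the first run creates the index database from scratch.
        mode = "CREATE" if run_number == 0 else "APPEND"
        commands.append(
            f"wcRun -r sksearchengine_idxgen {mode} {index_type} {http_dir} {disk_dir}"
        )
    return commands


if __name__ == "__main__":
    for line in index_commands("/skware_se", r"d:\wc5\http\skware_se"):
        print(line)
```

Each printed line follows the command format given earlier and could be placed in a batch file of your own.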

When indexing html files we have tried to exclude all html tags from the index. This has been done because, when searching for occurrences of a word such as html, the last thing you want is every page on your system being returned because it contains an html tag at the start of the document. If you find some html tags that you think should be excluded, send email to Steve Davies and we will try to include them in the next release.
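The idea of keeping tags out of the index can be illustrated with a short Python sketch. This is not how skSearchEngine itself is implemented (it is written in wcCode and C); it uses Python's standard html.parser module, and the names TextOnly and indexable_words are hypothetical.

```python
import re
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only text content, so tag names never reach the word list."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are skipped.
        self.chunks.append(data)


def indexable_words(html_source):
    """Return the lowercase words of a document with all tags excluded."""
    parser = TextOnly()
    parser.feed(html_source)
    parser.close()
    text = " ".join(parser.chunks)
    return re.findall(r"[a-z0-9']+", text.lower())
```

With this approach a page that contains the word html only inside its markup contributes no occurrence of html to the word list.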

One other point about index generation concerns the output produced by online query. Consider the following example of a link produced by the online query program.

skWare's skSearchEngine HTML Search Engine Online Documentation
138 Summary: Online documentation index page for skWare's HTML search engine.

  • The underlined part is the link, and its text is taken from the html title tag. If there is no title tag in the html document, the file name is used. You should look at the html source of this document for further information on how this is done. An example of the title tag used to produce the link shown above is as follows:-

    <title>skWare's HTML Search Engine Online Documentation</title>
  • The number 138 in bold is the number of times the word appears in the document.
  • The summary is taken from the meta name=description tag. If there is no meta name=description tag, the message "no summary is available" will be used. You should look at the html source of this document for further information on how this is done. An example of the meta description tag used to produce the summary shown above is as follows:-

    <meta name="DESCRIPTION" content="Online documentation index page for skWare's HTML search engine.">
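The fallback rules in the list above can be sketched in Python as follows. This is an illustration only, not the indexing program's actual code; the names LinkInfo and link_info are hypothetical, and the fallback message is the wording quoted above.

```python
from html.parser import HTMLParser


class LinkInfo(HTMLParser):
    """Pick up the title text and the meta description of a document."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.summary = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.summary = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def link_info(html_source, file_name):
    """Return (link text, summary), applying the fallbacks described above."""
    parser = LinkInfo()
    parser.feed(html_source)
    parser.close()
    title = parser.title.strip() or file_name
    summary = parser.summary or "no summary is available"
    return title, summary
```

A document with neither tag yields its file name as the link text and the fallback message as the summary.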

Online Query

The online query program is what the user uses to find html pages on your BBS. In the current release the user can specify one or two words and use a joiner such as "and" or "or" in their search criteria.
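The effect of the two joiners can be sketched in Python using sets of pages. This is a simplified illustration, not the query program's implementation: the index layout (a word mapped to the set of pages containing it), the function name run_query and the case-insensitive matching are all assumptions.

```python
def run_query(index, first_word, joiner=None, second_word=None):
    """Evaluate a one-word or two-word query against a word-to-pages mapping."""
    first = index.get(first_word.lower(), set())
    if joiner is None or second_word is None:
        return first
    second = index.get(second_word.lower(), set())
    if joiner.lower() == "and":
        # Pages containing both words.
        return first & second
    if joiner.lower() == "or":
        # Pages containing either word.
        return first | second
    raise ValueError(f"unsupported joiner: {joiner!r}")
```

With "and" only pages containing both words are returned, whereas "or" returns pages containing at least one of them.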

Setup of the query program is straightforward: all you need to do is provide your users with a link to the query program on one or more of your html pages. An example of the link is as follows:-

<a href="/code/html-sksearchengine.wcx?BtnClear=Clear"><b>skWare's HTML Search Engine</b></a>

Please note that the case of BtnClear=Clear is important and should be specified as shown.

Configuration File

The configuration file controls the operation of the query program and, to a lesser extent, the indexing program. It is called sksearchengine.cfg and must be in your Wildcat data directory (d:\wc5\data). It is an 'ini' type file, the contents of which are described in the following table.
Parameter     Description
RegCode       Once registered, this parameter should contain your registration code as supplied by skWare. In the shareware version this should contain the word "SHAREWARE" without the quotes.
IndexGenLog   This flag controls whether the indexing program creates a log containing details of the indexing process. It can be either Y for yes or N for no. It is strongly recommended that this be left as the default (Y). The log file is called sksearchengine_idxgen.log and is written in your default wildcat directory.
QueryLog      This flag controls whether the query program creates a log containing details of queries made by users. It can be either Y for yes or N for no. It is strongly recommended that this be left as the default (Y). The log file is called sksearchengine_query.log and is written in your default wildcat directory.
HomePageLink  This is the link to your home page. It is used by the Home button so that the user can return quickly to your home page.
BackPageLink  This is the link used by the Back button so that the user can return quickly to the page that called the query program.
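To make the table concrete, here is a sketch in Python that reads a file of this shape with the standard configparser module. The parameter names and the Y defaults come from the table above; the section name [skSearchEngine] and the two link values are assumptions made purely for illustration, so check the configuration file shipped with skSearchEngine for the real layout.

```python
import configparser

# Hypothetical sample: the section name and link values are assumptions.
SAMPLE = """\
[skSearchEngine]
RegCode=SHAREWARE
IndexGenLog=Y
QueryLog=Y
HomePageLink=/index.htm
BackPageLink=/index.htm
"""


def load_settings(text, section_name="skSearchEngine"):
    """Parse ini-style text and apply the defaults described in the table."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser[section_name]
    return {
        "RegCode": section.get("RegCode", fallback="SHAREWARE"),
        # Both log flags default to Y when they are not present.
        "IndexGenLog": section.get("IndexGenLog", fallback="Y").upper() == "Y",
        "QueryLog": section.get("QueryLog", fallback="Y").upper() == "Y",
        "HomePageLink": section.get("HomePageLink", fallback=None),
        "BackPageLink": section.get("BackPageLink", fallback=None),
    }
```

Calling load_settings(SAMPLE) returns a dictionary in which both log flags are true and RegCode is SHAREWARE.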

Installation

To install skSearchEngine you should follow these steps.

Footer Customisation

To customise the footers displayed in the query, help and results pages you should modify the supplied templates to suit your needs. The templates supplied are described in the following table:-
Template File                           Used By
sksearchengine_html_help_footer.htm     This template is used as the footer to the help page. If the query program does not find this file it will use the internal default.
sksearchengine_html_query_footer.htm    This template is used as the footer to the online query page. If the query program does not find this file it will use the internal default.
sksearchengine_html_results_footer.htm  This template is used as the footer to the query results page. If the query program does not find this file it will use the internal default.
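The fallback behaviour described in the table, where the template file is used if found and the internal default otherwise, can be sketched in Python as follows. This is not the query program's code; the function name load_footer and the DEFAULT_FOOTER text are hypothetical, since the real internal default is not documented here.

```python
from pathlib import Path

# Hypothetical stand-in for the program's internal default footer.
DEFAULT_FOOTER = "<hr>"


def load_footer(template_path, default=DEFAULT_FOOTER):
    """Return the template contents, or the default if the file is missing."""
    path = Path(template_path)
    if path.is_file():
        return path.read_text()
    return default
```

The same function would serve all three templates, since they differ only in file name.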

Copyright © 1997 skWare, All Rights Reserved