README file for Chatbot::Eliza 1.04 


NAME
    Chatbot::Eliza - A clone of the classic Eliza program

SYNOPSIS
      use Chatbot::Eliza;

      $mybot = new Chatbot::Eliza;
      $mybot->command_interface;

      # see below for details

DESCRIPTION
    This module implements the classic Eliza algorithm. The original
    Eliza program was written by Joseph Weizenbaum and described in
    the Communications of the ACM in 1966. Eliza is a mock Rogerian
    psychotherapist. It prompts for user input, and uses a simple
    transformation algorithm to change user input into a follow-up
    question. The program is designed to give the appearance of
    understanding.

    This program is a faithful implementation of the program
    described by Weizenbaum. It uses a simplified script language
    (devised by Charles Hayden). The content of the script is the
    same as Weizenbaum's.

    This module encapsulates the Eliza algorithm in the form of an
    object. This should make the functionality easy to incorporate
    in larger programs.

INSTALLATION
    The current version of Chatbot::Eliza.pm is available on CPAN:

      http://www.perl.com/CPAN/modules/by-module/Chatbot/

    To install this package, just change to the directory which you
    created by untarring the package, and type the following:

            perl Makefile.PL
            make
            make test
            make install

    This will copy Eliza.pm to your perl library directory for use
    by all perl scripts. You will probably need to be root to do
    this, unless you have installed a personal copy of perl.

USAGE
    This is all you need to do to launch a simple Eliza session:

            use Chatbot::Eliza;

            $mybot = new Chatbot::Eliza;
            $mybot->command_interface;

    You can also customize certain features of the session:

            $myotherbot = new Chatbot::Eliza;

            $myotherbot->name( "Hortense" );
            $myotherbot->debug( 1 );

            $myotherbot->command_interface;

    These lines set the name of the bot to be "Hortense" and turn on
    the debugging output.

    When creating an Eliza object, you can specify a name and an
    alternative scriptfile:

            $bot = new Chatbot::Eliza "Brian", "myscript.txt";

    You can also use an anonymous hash to set these parameters. Any
    of the fields can be initialized using this syntax:

            $bot = new Chatbot::Eliza {
                    name       => "Brian", 
                    scriptfile => "myscript.txt",
                    debug      => 1,
                    prompts_on => 1,
                    memory_on  => 0,
                    myrand     => 
                            sub { my $N = defined $_[0] ? $_[0] : 1;  rand($N); },
            };

    If you don't specify a script file, then the new object will be
    initialized with a default script. The module contains this
    script within itself.

    You can also call the object's methods directly from a calling
    program. The code below takes an arbitrary string and retrieves
    Eliza's reply to it:

            my $string = "I have too many problems.";
            my $reply  = $mybot->transform( $string );

    You can easily create two bots, each with a different script,
    and see how they interact:

            use Chatbot::Eliza;

            my ($harry, $sally, $he_says, $she_says);

            $sally = new Chatbot::Eliza "Sally", "histext.txt";
            $harry = new Chatbot::Eliza "Harry", "hertext.txt";

            $he_says  = "I am sad.";

            # Seed the random number generator.
            srand( time ^ ($$ + ($$ << 15)) );      

            while (1) {
                    $she_says = $sally->transform( $he_says );
                    print $sally->name, ": $she_says \n";
            
                    $he_says  = $harry->transform( $she_says );
                    print $harry->name, ": $he_says \n";
            }

    Mechanically, this works well. However, the results depend
    entirely on the script data. Two mock Rogerian therapists
    talking to each other usually do not produce a sensible
    conversation, of course.

    After each call to the transform() method, the debugging output
    for that transformation is stored in a variable called
    $debug_text.

            my $reply      = $mybot->transform( "My foot hurts" );
            my $debugging  = $mybot->debug_text;

    This feature is always available, even if the instance's $debug
    variable is set to 0.

    Calling programs can specify their own random-number generators.
    Use this syntax:

            $chatbot = new Chatbot::Eliza;
            $chatbot->myrand(
                    sub {
                            #function goes here!
                    }
            );

    The custom random function should have the same prototype as
    perl's built-in rand() function. That is, it should take a
    single (numeric) expression as a parameter, and it should return
    a floating-point value greater than or equal to 0 and less than
    that number.

    What this code actually does is pass a reference to an anonymous
    subroutine (a "code reference"). See the perlref manpage for
    details on how code references work.

    If you don't specify any custom rand function, then the Eliza
    object will just use the built-in rand() function.
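    As an illustration of that contract, the sketch below builds a
    seeded generator (a Park-Miller "minimal standard" generator
    wrapped in a closure), which can be handy for replaying the same
    sequence of choices in tests. The helper make_seeded_rand() is
    hypothetical and not part of Chatbot::Eliza; only the myrand()
    call shown in the comment uses the interface documented above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a code ref that mimics perl's rand($N).
# It takes an optional upper bound (default 1) and returns a
# floating-point number in the range [0, $N).
sub make_seeded_rand {
    my ($seed) = @_;
    my $m     = 2147483647;               # 2**31 - 1
    my $state = $seed % $m || 1;          # state must be non-zero
    return sub {
        my $N = defined $_[0] ? $_[0] : 1;
        $state = ($state * 16807) % $m;   # Park-Miller step; state stays in 1 .. $m-1
        return $N * ($state - 1) / ($m - 1);
    };
}

my $gen1 = make_seeded_rand(42);
my $gen2 = make_seeded_rand(42);
printf "%.6f %.6f\n", $gen1->(10), $gen2->(10);   # same seed, same value

# With the module installed, you might then write:
#   $chatbot->myrand( make_seeded_rand(42) );
```

    Because each closure keeps its own state, two bots given separate
    generators do not disturb each other's sequences.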

MAIN DATA MEMBERS
    Each Eliza object uses the following data structures to hold the
    script data in memory:

  %decomplist

    *Keys*: the set of keywords; *Values*: strings containing the
    decomposition rules.

  %reasmblist

    *Keys*: strings which each join a keyword and one of its
    decomposition rules; *Values*: the set of possible reassembly
    statements for that keyword and decomposition rule.

  %reasmblist_for_memory

    This structure is identical to `%reasmblist', except that these
    rules are only invoked when a user comment is being retrieved
    from memory. These contain comments such as "Earlier you
    mentioned that...," which are only appropriate for remembered
    comments. Rules in the script must be specially marked in order
    to be included in this list rather than `%reasmblist'. The
    default script only has a few of these rules.

  @memory

    A list of user comments which an Eliza instance is remembering
    for future use. Eliza does not remember everything, only some
    things. In this implementation, Eliza will only remember
    comments which match a decomposition rule which actually has
    reassembly rules that are marked with the keyword
    "reasm_for_memory" rather than the normal "reasmb". The default
    script only has a few of these.

  %keyranks

    *Keys*: the set of keywords; *Values*: the rank of each
    keyword.

  @quit

    "quit" words -- that is, words the user might use to try to exit
    the program.

  @initial

    Possible greetings for the beginning of the program.

  @final

    Possible farewells for the end of the program.

  %pre

    *Keys*: words which are replaced before any transformations;
    *Values*: the respective replacement words.

  %post

    *Keys*: words which are replaced after the transformations and
    after the reply is constructed; *Values*: the respective
    replacement words.

  %synon

    *Keys*: words which can appear (prefixed with "@") in
    decomposition rules; *Values*: the synonyms which are treated
    just like the key word during matching of decomposition rules.

  Other data members

    There are several other internal data members. Hopefully these
    are sufficiently obvious that you can learn about them just by
    reading the source code.

METHODS
  new()

        my $chatterbot = new Chatbot::Eliza;

    new() creates a new Eliza object. This method also calls the
    internal _initialize() method, which in turn calls the
    parse_script_data() method, which initializes the script data.

        my $chatterbot = new Chatbot::Eliza 'Ahmad', 'myfile.txt';

    The Eliza object defaults to the name "Eliza", and it contains
    default script data within itself. However, using the syntax
    above, you can specify an alternative name and an alternative
    script file.

    See the parse_script_data() method for a description of the
    format of the script file.

  command_interface()

        $chatterbot->command_interface;

    command_interface() opens an interactive session with the Eliza
    object, just like the original Eliza program.

    If you want to design your own session format, then you can
    write your own while loop and your own functions for prompting
    for and reading user input, and use the transform() method to
    generate Eliza's responses. (*Note*: you do not need to invoke
    preprocess() and postprocess() directly, because these are
    invoked from within the transform() method.)

    But if you're lazy and you want to skip all that, then just use
    command_interface(). It's all done for you.

    During an interactive session invoked using command_interface(),
    you can enter the word "debug" to toggle debug mode on and off.
    You can also enter the keyword "memory" to invoke the
    _debug_memory() method and print out the contents of the Eliza
    instance's memory.
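    If you do write your own loop, its shape might look like the
    sketch below. To keep it self-contained and testable, the reply
    generator and the quit test are passed in as code refs; in a real
    program the reply generator could be something like
    sub { $mybot->transform($_[0]) }. The names run_session(),
    $reply_for and $is_quit are hypothetical, and the greeting and
    farewell strings are placeholders rather than lines taken from
    the default script.

```perl
use strict;
use warnings;

# Hypothetical session loop: reads lines from $in, writes replies to $out,
# and stops at end of input or when $is_quit says the user wants to leave.
sub run_session {
    my ($in, $out, $reply_for, $is_quit) = @_;
    print {$out} "Eliza: How do you do. Please tell me your problem.\n";
    while (defined(my $line = <$in>)) {
        chomp $line;
        next unless $line =~ /\S/;            # skip blank input
        if ($is_quit->($line)) {
            print {$out} "Eliza: Goodbye.\n";
            last;
        }
        print {$out} "Eliza: ", $reply_for->($line), "\n";
    }
}

# Demo with in-memory filehandles and stand-in callbacks.
my $script = "I feel tired\n\nbye\nnever reached\n";
open my $in, '<', \$script or die $!;
my $transcript = '';
open my $out, '>', \$transcript or die $!;
run_session($in, $out,
    sub { "Why do you say \"$_[0]\"?" },
    sub { $_[0] =~ /^\s*(bye|quit)\b/i });
close $out;
print $transcript;
```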

  preprocess()

        $string = preprocess($string);

    preprocess() applies simple substitution rules to the input
    string. Mostly this is to catch variant spellings, misspellings,
    contractions and the like.

    preprocess() is called from within the transform() method. It is
    applied to user-input text BEFORE any other processing, and
    before a reassembly statement has been selected.

    It uses the hash `%pre', which is created during the parse of
    the script.
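    A minimal sketch of that substitution step, assuming whole-word,
    case-insensitive replacement driven by a `%pre'-style hash. The
    pre_process() helper is hypothetical; the "equivalent => alike"
    pair comes from the "pre:" line in the script example later in
    this document, and the other entries are illustrative.

```perl
use strict;
use warnings;

# Sample table in the shape of %pre: keys are words found in user
# input, values are their replacements. Entries are illustrative.
my %pre = (
    equivalent => 'alike',
    dont       => "don't",
    cant       => "can't",
);

# Hypothetical helper: replace each whole word that appears in the table.
# Lookup is case-insensitive; unknown words pass through unchanged.
sub pre_process {
    my ($string, $table) = @_;
    $string =~ s/\b([\w']+)\b/exists $table->{lc $1} ? $table->{lc $1} : $1/ge;
    return $string;
}

print pre_process("They are EQUIVALENT, so I dont care", \%pre), "\n";
# prints "They are alike, so I don't care"
```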

  postprocess()

        $string = postprocess($string);

    postprocess() applies simple substitution rules to the
    reassembly rule. This is where "I" and "you" (and similar
    pronouns) are swapped. postprocess() is called from within the
    transform() method.

    It uses the hash `%post', created during the parse of the
    script.
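    The important detail when swapping pronouns is to make every
    replacement in a single pass: replacing "i" with "you" and then
    "you" with "I" in two separate passes would undo the first swap.
    A minimal sketch, assuming a `%post'-style hash; the helper name
    post_process() and the table contents are hypothetical.

```perl
use strict;
use warnings;

# Sample table in the shape of %post (illustrative entries only).
my %post = (
    i      => 'you',
    me     => 'you',
    my     => 'your',
    am     => 'are',
    you    => 'I',
    your   => 'my',
    myself => 'yourself',
);

# Hypothetical helper: one regex pass, so each word is replaced at most
# once and an "i" turned into "you" is never turned back by the "you" rule.
sub post_process {
    my ($string, $table) = @_;
    $string =~ s/\b(\w+)\b/exists $table->{lc $1} ? $table->{lc $1} : $1/ge;
    return $string;
}

print post_process("i am angry with my boss", \%post), "\n";
# prints "you are angry with your boss"
```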

  _testquit()

         if ($self->_testquit($user_input) ) { ... }

    _testquit() detects words like "bye" and "quit" and returns true
    if it finds one of them as the first word in the sentence.

    These words are listed in the script, under the keyword "quit".
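    A sketch of that check, assuming the quit words are held in a
    `@quit'-style list and compared case-insensitively against the
    first word of the input. test_quit() is a hypothetical
    stand-alone function, not the module's private method, and the
    sample words are illustrative.

```perl
use strict;
use warnings;

my @quit = qw(bye goodbye quit);   # illustrative quit words

# Hypothetical stand-alone quit test: returns 1 if the first word of
# $input (ignoring leading spaces and case) is one of @words, else 0.
sub test_quit {
    my ($input, @words) = @_;
    my ($first) = $input =~ /^\s*([\w']+)/ or return 0;
    my $hit = grep { lc $first eq lc $_ } @words;
    return $hit ? 1 : 0;
}

for my $s ("bye for now", "Goodbye!", "I said bye") {
    printf "%-12s -> %d\n", $s, test_quit($s, @quit);
}
```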

  _debug_memory()

         $self->_debug_memory();

    _debug_memory() is a special function which returns the contents
    of Eliza's memory stack.

  transform()

        $reply = $chatterbot->transform( $string, $use_memory );

    transform() applies transformation rules to the user input
    string. It invokes preprocess(), does transformations, then
    invokes postprocess(). It returns the transformed output string,
    called `$reasmb'.

    The algorithm embedded in the transform() method has three main
    parts:

    1   Search the input string for a keyword.

    2   If we find a keyword, use the list of decomposition rules for
        that keyword, and pattern-match the input string against
        each rule.

    3   If the input string matches any of the decomposition rules, then
        randomly select one of the reassembly rules for that
        decomposition rule, and use it to construct the reply.
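    Steps 2 and 3 can be sketched in a few lines of Perl: each "*" in
    a decomposition rule becomes a lazy capture group, the literal
    words must match as whole words, and "(N)" in the chosen
    reassembly rule is replaced by the text captured by the Nth group.
    This is a simplified illustration under those assumptions (no
    synonyms, no "goto", no memory), and the helper names are
    hypothetical rather than taken from the module.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a decomposition rule such as "* i remember *"
# into a case-insensitive regex. "*" matches any (possibly empty) run of
# text; every other token must match as a whole word.
sub decomp_to_regex {
    my ($rule) = @_;
    my @parts = map { $_ eq '*' ? '(.*?)' : '\b' . quotemeta($_) . '\b' }
                split ' ', $rule;
    my $body = join '\s*', @parts;
    return qr/^\s*$body\s*$/i;
}

# Returns an array ref of captured fragments, or undef if the rule fails.
sub match_decomp {
    my ($rule, $input) = @_;
    my $re = decomp_to_regex($rule);
    return unless $input =~ $re;
    # $#+ is the number of groups in the last successful match;
    # @- and @+ hold each group's start and end offsets.
    return [ map { substr($input, $-[$_], $+[$_] - $-[$_]) } 1 .. $#+ ];
}

# Replace (1), (2), ... in a reassembly rule with the captured fragments.
sub reassemble {
    my ($template, $captures) = @_;
    $template =~ s/\((\d+)\)/defined $captures->[$1 - 1] ? $captures->[$1 - 1] : ''/ge;
    return $template;
}

my @reasmb = ('Do you often think of (2) ?',
              'Does thinking of (2) bring anything else to mind ?');
my $input  = 'Lately i remember my old dog';
if (my $caps = match_decomp('* i remember *', $input)) {
    # Step 3: pick one reassembly rule at random.
    print reassemble($reasmb[int rand @reasmb], $caps), "\n";
}
```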

    transform() takes two parameters. The first is the string we
    want to transform. The second is a flag which indicates where
    this string came from. If the flag is set, then the string has
    been pulled from memory, and we should use reassembly rules
    appropriate for that. If the flag is not set, then the string is
    the most recent user input, and we can use the ordinary
    reassembly rules.

    The memory flag is only set when the transform() method is
    called recursively. The mechanism for setting this parameter is
    embedded in the transform() method itself. If the flag is set
    inappropriately, it is ignored.

  How memory is used

    In the script, some reassembly rules are special. They are
    marked with the keyword "reasm_for_memory", rather than the usual
    "reasmb". Eliza "remembers" a comment when it matches a
    decomposition rule that has at least one reassembly rule for
    memory. An Eliza object remembers up to `$max_memory_size'
    (default: 5) user input strings.

    If, on a later call, the transform() method fails to find any
    appropriate decomposition rule for a user's comment, and there
    are comments stored in the memory array, then Eliza may elect to
    ignore the most recent comment and instead pull one of the
    strings out of memory. In this case, the transform() method is
    called recursively with the memory flag set.

    Honestly, I am not sure exactly how this memory functionality
    was implemented in the original Eliza program. Hopefully this
    implementation is not too far from Weizenbaum's.
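    The bookkeeping described above can be pictured as a bounded
    queue. The sketch below assumes that when the queue is full the
    oldest comment is discarded, and that recall removes a randomly
    chosen entry; both are illustrative assumptions, as are the
    helper names remember() and recall(). Only the default size of 5
    comes from the description of `$max_memory_size' above.

```perl
use strict;
use warnings;

my $max_memory_size = 5;   # default size given in the documentation

# Hypothetical helper: store a comment, dropping the oldest entries
# (assumed policy) once the queue would exceed $max.
sub remember {
    my ($memory, $comment, $max) = @_;
    push @$memory, $comment;
    shift @$memory while @$memory > $max;
    return scalar @$memory;
}

# Hypothetical helper: remove and return one remembered comment, chosen
# with $rand (which follows rand()'s prototype); undef if memory is empty.
sub recall {
    my ($memory, $rand) = @_;
    return unless @$memory;
    my $i = int $rand->(scalar @$memory);
    return splice @$memory, $i, 1;
}

my @memory;
remember(\@memory, "my $_ is broken", $max_memory_size)
    for qw(car phone heart watch bike door);
print join(" | ", @memory), "\n";   # "my car is broken" has been dropped
my $old = recall(\@memory, sub { rand $_[0] });
print "Recalled: $old\n";
```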

    If you don't want to use the memory functionality at all, then
    you can disable it:

            $mybot->memory_on(0);

    You can also achieve the same effect by making sure that the
    script data does not contain any reassembly rules marked with
    the keyword "reasm_for_memory". The default script data only has
    4 such items.

  parse_script_data()

        $self->parse_script_data;
        $self->parse_script_data( $script_file );

    parse_script_data() is invoked from the _initialize() method,
    which is called from the new() function. However, you can also
    call this method at any time against an already-instantiated
    Eliza instance. In that case, the new script data is *added* to
    the old script data. The old script data is not deleted.

    You can pass a parameter to this function, which is the name of
    the script file, and it will read in and parse that file. If you
    do not pass any parameter to this method, then it will read the
    data embedded at the end of the module as its default script
    data.

    If you pass the name of a script file to parse_script_data(),
    and that file is not available for reading, then the module
    dies.

Format of the script file
    This module includes a default script file within itself, so it
    is not necessary to explicitly specify a script file when
    instantiating an Eliza object.

    Each line in the script file can specify a key, a decomposition
    rule, or a reassembly rule.

      key: remember 5
        decomp: * i remember *
          reasmb: Do you often think of (2) ?
          reasmb: Does thinking of (2) bring anything else to mind ?
        decomp: * do you remember *
          reasmb: Did you think I would forget (2) ?
          reasmb: What about (2) ?
          reasmb: goto what
      pre: equivalent alike
      synon: belief feel think believe wish

    The number after the key specifies the rank. If a user's input
    contains the keyword, then the transform() function will try to
    match one of the decomposition rules for that keyword. If one
    matches, then it will select one of the reassembly rules at
    random. The number (2) here means "use whatever set of words
    matched the second asterisk in the decomposition rule."
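    The rank matters when an input contains more than one keyword. A
    sketch of that selection, assuming a `%keyranks'-style hash and
    that a higher number means higher priority. pick_keyword() is
    hypothetical, and apart from "remember 5" and "my 2", which come
    from the script examples in this section, the ranks are sample
    values.

```perl
use strict;
use warnings;

my %keyranks = (remember => 5, my => 2, dream => 3);   # sample ranks

# Hypothetical helper: return the highest-ranked keyword that appears as
# a word in $input, or undef if none does. Ties keep the earlier word.
sub pick_keyword {
    my ($input, $ranks) = @_;
    my $best;
    for my $word (split /\W+/, lc $input) {
        next unless exists $ranks->{$word};
        $best = $word if !defined $best || $ranks->{$word} > $ranks->{$best};
    }
    return $best;
}

print pick_keyword("My friend said I remember nothing", \%keyranks), "\n";
# prints "remember"
```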

    If you specify a list of synonyms for a word, then you should use
    a "@" when you use that word in a decomposition rule:

      decomp: * i @belief i *
        reasmb: Do you really think so ?
        reasmb: But you are not sure you (3).

    Otherwise, the script will never check to see if there are any
    synonyms for that keyword.
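    One way to implement the "@" convention is to expand each "@word"
    into an alternation of the word and its synonyms, captured as its
    own group. Under that assumption the groups in "* i @belief i *"
    are numbered 1 (first "*"), 2 ("@belief") and 3 (second "*"),
    which is consistent with the "(3)" in the example above. The
    helper name and the `%synon' shape (a word mapped to a list of
    synonyms, built from the "synon:" line shown earlier) are
    hypothetical.

```perl
use strict;
use warnings;

# Assumed shape for %synon, built from "synon: belief feel think believe wish".
my %synon = (belief => [qw(feel think believe wish)]);

# Hypothetical helper: like a plain decomposition regex, but "@word"
# becomes a capture group matching the word or any of its synonyms.
sub decomp_to_regex_syn {
    my ($rule, $synon) = @_;
    my @parts;
    for my $tok (split ' ', $rule) {
        if ($tok eq '*') {
            push @parts, '(.*?)';
        }
        elsif ($tok =~ /^\@(\w+)$/) {
            my @alts = ($1, @{ $synon->{$1} || [] });
            push @parts, '\b(' . join('|', map { quotemeta($_) } @alts) . ')\b';
        }
        else {
            push @parts, '\b' . quotemeta($tok) . '\b';
        }
    }
    my $body = join '\s*', @parts;
    return qr/^\s*$body\s*$/i;
}

my $re = decomp_to_regex_syn('* i @belief i *', \%synon);
if (my @groups = ('Well i think i should leave' =~ $re)) {
    # (3) refers to the third group: the text after the second "i".
    print "But you are not sure you $groups[2].\n";
    # prints "But you are not sure you should leave."
}
```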

    Reassembly rules should be marked with *reasm_for_memory* rather
    than *reasmb* when they are meant for use on a user comment that
    has been retrieved from memory.

      key: my 2
        decomp: * my *
          reasm_for_memory: Let's discuss further why your (2).
          reasm_for_memory: Earlier you said your (2).
          reasm_for_memory: But your (2).
          reasm_for_memory: Does that have anything to do with the fact that your (2) ?

How the script file is parsed
    Each line in the script file contains an "entrytype" (key,
    decomp, synon) and an "entry", separated by a colon. In turn,
    each "entry" can itself be composed of a "key" and a "value",
    separated by a space. The parse_script_data() function parses
    each line out, and splits the "entry" and "entrytype" portion of
    each line into two variables, `$entry' and `$entrytype'.

    Next, it uses the string `$entrytype' to determine what sort of
    stuff to expect in the `$entry' variable, if anything, and
    parses it accordingly. In some cases, there is no second level
    of key-value pair, so the function does not even bother to
    isolate or create `$key' and `$value'.

    `$key' is always a single word. `$value' can be null, or one
    single word, or a string composed of several words, or an array
    of words.

    Based on all these entries and keys and values, the function
    creates two giant hashes: `%decomplist', which holds the
    decomposition rules for each keyword, and `%reasmblist', which
    holds the reassembly phrases for each decomposition rule. It
    also creates `%keyranks', which holds the ranks for each key.

    Six other data structures are created: `%reasmblist_for_memory',
    `%pre', `%post', `%synon', `@initial', and `@final'.
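    A condensed sketch of this two-level split, run against a few of
    the script lines from the examples above. It is an illustration,
    not the module's parser: the helper name parse_script_lines(),
    the use of array refs for the lists, the assumption that each
    reassembly line belongs to the most recent "decomp", and the
    "$key#$decomp" form of the `%reasmblist' keys are all
    hypothetical choices made for the example.

```perl
use strict;
use warnings;

# Hypothetical parser illustrating the entrytype/entry split.
sub parse_script_lines {
    my %data;
    my ($thiskey, $thisdecomp);
    for my $line (@_) {
        # First level: "entrytype: entry"
        next unless $line =~ /^\s*(\w+)\s*:\s*(.*?)\s*$/;
        my ($entrytype, $entry) = ($1, $2);

        if ($entrytype eq 'key') {                   # entry = "word rank"
            my ($key, $rank) = split ' ', $entry;
            ($thiskey, $thisdecomp) = ($key, undef);
            $data{keyranks}{$key} = defined $rank ? $rank : 0;
        }
        elsif ($entrytype eq 'decomp') {             # no second-level split
            next unless defined $thiskey;
            $thisdecomp = $entry;
            push @{ $data{decomplist}{$thiskey} }, $entry;
        }
        elsif ($entrytype eq 'reasmb' or $entrytype eq 'reasm_for_memory') {
            next unless defined $thiskey and defined $thisdecomp;
            my $list = $entrytype eq 'reasmb' ? 'reasmblist' : 'reasmblist_for_memory';
            push @{ $data{$list}{"$thiskey#$thisdecomp"} }, $entry;
        }
        elsif ($entrytype eq 'pre' or $entrytype eq 'post') {   # "word replacement"
            my ($key, $value) = split ' ', $entry, 2;
            $data{$entrytype}{$key} = $value;
        }
        elsif ($entrytype eq 'synon') {              # "word syn1 syn2 ..."
            my ($key, @value) = split ' ', $entry;
            $data{synon}{$key} = \@value;
        }
        elsif ($entrytype =~ /^(?:initial|final|quit)$/) {
            push @{ $data{$entrytype} }, $entry;
        }
    }
    return \%data;
}

my $script = <<'END';
key: remember 5
  decomp: * i remember *
    reasmb: Do you often think of (2) ?
    reasmb: Does thinking of (2) bring anything else to mind ?
key: my 2
  decomp: * my *
    reasm_for_memory: Earlier you said your (2).
pre: equivalent alike
synon: belief feel think believe wish
END

my $data = parse_script_lines(split /\n/, $script);
print "rank of 'remember': $data->{keyranks}{remember}\n";
```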

CHANGES
    * Version 1.02-1.04 - January 2003
          Added a Norwegian script, kindly contributed by 
          Mats Stafseng Einarsen.  Thanks Mats!

    * Version 1.01 - January 2003
          Added an empty DESTROY method, to eliminate
          some pesky warning messages.  Suggested by
          Stas Bekman. 

    * Version 0.98 - March 2000
          Some changes to the documentation.

    * Versions 0.96-0.97 - October 1999
          One tiny change to the regex which implements
          reassembly rules.  Thanks to Gidon Wise for
          suggesting this improvement. 

    * Versions 0.94-0.95 - July 1999
          Fixed a bug in the way the bot invokes its random function
          when it pulls a comment out of memory. 

    * Version 0.93 - June 1999
          Calling programs can now specify their own random-number generators.  
          Use this syntax:

                $chatbot = new Chatbot::Eliza;
                $chatbot->myrand( 
                        sub { 
                                #function goes here! 
                        } 
                );

          The custom random function should have the same prototype
          as perl's built-in rand() function.  That is, it should take
          a single (numeric) expression as a parameter, and it should 
          return a floating-point value between 0 and that number.  

          You can also now use a reference to an anonymous hash 
          as a parameter to the new() method to define any fields 
          in that bot instance:

                $bot = new Chatbot::Eliza {
                        name       => "Brian",
                        scriptfile => "myscript.txt",
                        debug      => 1,
                };

    * Versions 0.91-0.92 - April 1999
          Fixed some misspellings. 

    * Version 0.90 - April 1999
          Fixed a bug in the way individual bot objects store 
          their memory.  Thanks to Randal Schwartz and to 
          Robert Chin for pointing this out.

          Fixed a very stupid error in the way the random
          function is invoked.  Thanks to Antony Quintal
          for pointing out the error. 

          Many corrections and improvements were made 
          to the German script by Matthias Hellmund.  
          Thanks, Matthias!

          Made a minor syntactical change, at the suggestion
          of Roy Stephan.

          The memory functionality can now be disabled by setting the
          memory_on field of a bot to 0, like so:

                $bot->memory_on(0);

          Thanks to Robert Chin for suggesting that. 

    * Version 0.40 - July 1998
          Re-implemented the memory functionality. 

          Cleaned up and expanded the embedded POD documentation.  

          Added a sample script in German.  

          Modified the debugging behavior.  The transform() method itself 
          will no longer print any debugging output directly to STDOUT.  
          Instead, all debugging output is stored in a module variable 
          called "debug_text".  The "debug_text" variable is printed out 
          by the command_interface() method, if the debug flag is set.   
          But even if this flag is not set, the variable debug_text 
          is still available to any calling program.  

          Added a few more example scripts which use the module.  

            simple       - simple script using Eliza.pm
            simple.cgi   - simple CGI script using Eliza.pm
            debug.cgi    - CGI script which displays debugging output
            deutsch      - script using the German script
            deutsch.cgi  - CGI script using the German script
            twobots      - script which creates two distinct bots

    * Version 0.32 - December 1997
          Fixed a bug in the way Eliza loads its default internal script data.
          (Thanks to Randal Schwartz for pointing this out.) 

          Removed the "memory" functions internal to Eliza.  
          When I get them working properly I will add them back in. 

          Added one more example program.

          Fixed some minor errors in the embedded POD documentation.

    * Version 0.31
          The module is now installable, just like any other self-respecting
          CPAN module.  

    * Version 0.30
          First release.

AUTHOR
    John Nolan jpnolan@sonic.net January 2003.

    Implements the classic Eliza algorithm by Prof. Joseph
    Weizenbaum. Script format devised by Charles Hayden.