[pcp-discuss:] Thinking as evolution on behaviours

From: Menno RUBINGH (rubingh@delftnet.nl)
Date: Thu Nov 09 2000 - 22:22:43 GMT

     Hi Cybernetics people,

     I take the liberty to send with this email a brief description of my AI
     robot brain design ideas. My purpose with this is to ask all of you, if
     you can, to point me to existing data that overlaps with this, and above
     all to people and groups who might at this moment be working on these
     kinds of things. Thanks very much for any pointers you can give me.

    -----

     * Goal :
       ~~~~

     To create a computer program (called ''Brain Program'') that makes
     a PC operate like the brain of an intelligent robot.

     The PC is connected to

      1. a set of robot arms (''Motor'') controlled by the brain program, and
      2. a set of Sensors that provide the brain with data about its environment.

     Therefore the assembly { Brain + Motors + Sensors } acts like an
     intelligent robot.

     
     * Further analysis of the goal :
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

     The Brain Program, in my opinion, is the essential part of the
     intelligent robot. The Brain Program should be constructed in such a
     way that it automatically learns to make use of input Sensor data and
     learns to operate its Motors, even in the case where e.g. new
     Sensors/Motors are plugged into the robot in the course of its life.
     That is: the Brain Program compensates for the idiosyncrasies of the
     particular Sensors/Motors that it is connected with.

     I see the above problem in evolutionary terms : The Brain Program
     should control the robot ( = { Brain+Motors+Sensors} ) in such a way
     that the robot survives. The ''intelligence'' of the robot consists
     in the fact that it learns autonomously how to survive optimally.

     For the time being, I only look at this survival of the robot as
     survival of the individual robot itself. My goal is to make a robot
     that learns selfishly useful behaviour. The last word ''useful'' is
     here to be interpreted as a synonym for behaviour that is favourable
     for the survival of the robot.

     * General design :
       ~~~~~~~~~~~~~~

     I've created a minimally simple Brain Program that consists
     essentially of a neural network. This simple Brain Program is constructed
     as follows (source code and some documentation at
     http://www.rubingscience.org/aiclub/toc.html) :
     
     The data maintained by the Brain Program is a list of items, each of
     which is arbitrarily called a ''Brain Cell''. The contents of each
     Brain Cell is as follows:
                                                 Motor      Priority
          Sensor values                          value       value
           s1  s2  s3  s4  s5        sN            m           p
         +---+---+---+---+---+ ... +---+         +----+      +---+
         | a | b | c | d | e |     | z |         | m1 |      | P |
         +---+---+---+---+---+ ... +---+         +----+      +---+
      
     which means: if the Sensors are in states (a,b,c,d,e,f,...,z), then
     actuate Motor 'm1'.

     This presupposes that the robot has N Sensors (labelled s1...sN) and M
     Motors (labelled m1...mM). The Sensor states are discrete, and might be
     e.g. binary values (e.g. 1 = light sensor receives light, 0 = doesn't
     receive light).
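     As a concrete sketch of this structure (in Python, with names of my
     own choosing -- the actual source code at the URL above may represent
     it differently), a Brain Cell could look like this:

     ```python
     from dataclasses import dataclass
     from typing import List

     @dataclass
     class BrainCell:
         """One candidate behaviour: a Sensor pattern, the Motor to
         actuate when that pattern is seen, and a Priority value."""
         sensors: List[int]   # discrete Sensor states (s1 .. sN), e.g. binary
         motor: int           # which Motor (m1 .. mM) to actuate
         priority: float      # non-negative real 'population size'

     # Example: light sensor on (1), bump sensor off (0) -> actuate motor 0.
     cell = BrainCell(sensors=[1, 0], motor=0, priority=1.0)
     ```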

     Each Brain Cell must be unique. The Brain Program operates in
     discrete, successive moves, each of which consists of :

          1. Receive the states of the Sensors.

          2. Choose from all Brain Cells the one cell whose sensor values
             are most ''like'' the input Sensor states.
             The likelihood with which a certain Brain Cell is selected,
             *also* depends on the ''Priority'' value p of the Brain Cell.
             Higher p means higher likelihood of being selected.

          3. Get the Motor value from the chosen Brain Cell, and
             send a signal to that Motor that makes that motor go active
             for a (short) time.

          4. Get the pain/pleasure feedback signal that results from the
             physical action of the activated Motor, and change the Priority
             values of the Brain Cells on the basis of the value of that
             pain/pleasure signal. (More on this below.)

     In step 2, if the degree of alike-ness between the input Sensor states
     and the chosen Brain Cell is below a certain threshold, the Sensor input
     is considered a ''new'' thing, and a different action is executed
     instead: a set of new Brain Cells is inserted, whose Sensor states are
     that ''new'' set of Sensor input values, covering all possible values
     for the Motor ('m').
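     Step 2 and the novelty rule could be sketched as follows. This is a
     minimal sketch under my own assumptions: the similarity measure (the
     fraction of matching Sensor slots), the threshold value, and the
     similarity-times-Priority weighting are all illustrative choices that
     the description above leaves open:

     ```python
     import random
     from dataclasses import dataclass
     from typing import List, Optional

     @dataclass
     class BrainCell:               # minimal stand-in for the structure above
         sensors: List[int]
         motor: int
         priority: float

     def similarity(a: List[int], b: List[int]) -> float:
         """Fraction of Sensor slots on which two patterns agree."""
         return sum(x == y for x, y in zip(a, b)) / len(a)

     def select_cell(cells: List[BrainCell], sensor_input: List[int],
                     threshold: float = 0.5) -> Optional[BrainCell]:
         """Pick a cell stochastically, weighting similarity by Priority.
         Returns None when no cell is similar enough (a 'new' situation)."""
         if not cells:
             return None
         if max(similarity(c.sensors, sensor_input) for c in cells) < threshold:
             return None
         weights = [similarity(c.sensors, sensor_input) * c.priority
                    for c in cells]
         return random.choices(cells, weights=weights)[0]

     def insert_novel(cells: List[BrainCell], sensor_input: List[int],
                      num_motors: int, init_priority: float = 1.0) -> None:
         """For a 'new' pattern, insert one cell per possible Motor value."""
         for m in range(num_motors):
             cells.append(BrainCell(list(sensor_input), m, init_priority))
     ```

     Here a higher Priority makes a cell proportionally more likely to be
     chosen, which realizes the requirement that higher p means higher
     likelihood of being selected.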

     Each of the Brain Cells is a PROGRAM in which is encoded the action that
     the robot executes when the sensors see the pattern (a,b,c,....z). That
     is: all behaviours that the robot can potentially execute are stored in
     the Brain Cells. The sensor input values, in this minimally complex
     design, fairly directly control the robot's actions.

     The set of all the Brain Cells in the robot's brain is a population of
     competing programs. (I suspect that this may overlap with some of
     Koza's ideas/designs.)

     The Priority value of a Brain Cell is a non-negative real value, and
     represents the population size of the program (= behaviour) coded in
     that Brain Cell.
     
     Additionally to the input Sensors, the robot is also equipped with
     sensors that sense the pain/pleasure state of the robot. Pain might
     mean that the robot has bumped into a wall and has thereby damaged
     itself, or that the robot's fuel level is becoming uncomfortably low.
     Pleasure might mean the robot's fuel level has just increased. All
     pain/pleasure sensor inputs are summed (pain with negative weights,
     pleasure with positive weights) in a fixed (probably hard-wired)
     function, and the resulting signal of that function is fed to the Brain.
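     A minimal sketch of that fixed summing function (the particular
     weights in the example are illustrative assumptions, not values from
     the design above):

     ```python
     def feedback_signal(pain_inputs, pain_weights,
                         pleasure_inputs, pleasure_weights):
         """Fixed (probably hard-wired) function: pain counts negatively,
         pleasure positively; the scalar result is fed to the Brain."""
         pain = sum(w * x for w, x in zip(pain_weights, pain_inputs))
         pleasure = sum(w * x for w, x in zip(pleasure_weights, pleasure_inputs))
         return pleasure - pain

     # Example: bump sensor fires (weight 2.0) while the fuel level has
     # just increased (weight 1.0): net signal 1.0 - 2.0 = -1.0, i.e. Pain.
     signal = feedback_signal([1], [2.0], [1], [1.0])
     ```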

     The set of Brain Cells is a *list*, in which the Brain Cell that has
     just been executed is always removed from its old place and moved to the
     top of the list. Near the top of the list are thus always the Brain
     Cells that have been used most recently. When a pain/pleasure signal is
     received, the Priority values of the Brain Cells near the top of the
     list are multiplied by a non-negative quantity that depends on the value
     v of the pain/pleasure signal as C * exp( v ), or a similar function.
     The effect
     of this is that a net-positive feedback signal (meaning ''Pleasure'')
     rewards the most-recently executed Brain Cells, by increasing the
     Priority value (population size) of these Brain cells; analogously, a
     net-negative feedback signal (meaning ''Pain'') decreases the Priority
     value (population size) of those Brain Cells.

     Good behaviour, i.e. behaviour that results in Pleasure signals,
     therefore reaches ever higher Priority, and is therefore more and more
     likely to be re-executed. Bad behaviour, i.e. behaviour that results in
     Pain signals, gets lower and lower Priority, and thereby gets executed
     less and less.
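     The recency list and the multiplicative Priority update could be
     sketched like this (Python; the depth of the update and the constant
     C are illustrative parameters):

     ```python
     import math
     from dataclasses import dataclass
     from typing import List

     @dataclass
     class BrainCell:               # minimal stand-in
         sensors: List[int]
         motor: int
         priority: float

     def move_to_top(recency_list: List[BrainCell], cell: BrainCell) -> None:
         """The just-executed cell goes to the head of the list."""
         recency_list.remove(cell)
         recency_list.insert(0, cell)

     def apply_feedback(recency_list: List[BrainCell], v: float,
                        C: float = 1.0, depth: int = 3) -> None:
         """Multiply the Priorities of the most recently used cells by
         C * exp(v): Pleasure (v > 0) grows them, Pain (v < 0) shrinks
         them, and a Priority never goes negative."""
         factor = C * math.exp(v)
         for cell in recency_list[:depth]:
             cell.priority *= factor
     ```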

    * Ideas for extending the above simple design
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     
     In my perception, the key idea in the above is that the Brain Cells are
     programs in which *behaviour* is encoded, and that these programs
     compete with each other via an evolutionary process. These evolutionary
     processes are what makes the robot adapt and learn, and what makes the
     robot ''intelligent'' and also *autonomous*.

     At the moment, the very simple and minimal version of the Brain Program
     that I have running right now -- which can effectively control a
     simple simulated robot that learns to avoid walls and to ''seek
     out'' food pellets in a simulated environment it walks around in --
     doesn't yet do genetic mutation or crossover of the contents of Brain
     Cells. This would IMO be the next interesting thing to extend the
     above design with. Crossover IMO could be included e.g. as an extra
     step in the cyclical operation of the Brain: e.g. in each cycle select
     two high-Priority Brain Cells, let them mate, and insert the created
     offspring behaviour pattern as a new Brain Cell, with a certain (small)
     population size (= Priority). In this way, it would seem possible to
     create in the operation of the Brain an infrastructure through which
     successful sub-patterns in Brain Cells can be communicated between Brain
     Cells.
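     One possible form of that extra mating step, as a sketch under my own
     assumptions (the single random splice point and the way the offspring
     inherits one parent's Motor are illustrative choices):

     ```python
     import random
     from dataclasses import dataclass
     from typing import List

     @dataclass
     class BrainCell:               # minimal stand-in
         sensors: List[int]
         motor: int
         priority: float

     def mate(parent_a: BrainCell, parent_b: BrainCell,
              init_priority: float = 0.1) -> BrainCell:
         """Splice the two Sensor patterns at a random cut point, inherit
         one parent's Motor, and start the offspring behaviour at a small
         population size (Priority)."""
         cut = random.randint(1, len(parent_a.sensors) - 1)
         child_sensors = parent_a.sensors[:cut] + parent_b.sensors[cut:]
         child_motor = random.choice([parent_a.motor, parent_b.motor])
         return BrainCell(child_sensors, child_motor, init_priority)
     ```

     The offspring then competes in the population like any other Brain
     Cell, so successful sub-patterns can spread between Brain Cells.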

     One of my prominent longer-term goals is to extend the above simple
     neural-network design into a Brain in which the programs (which are the
     entities that undergo evolution) consist of ''memes'', in the sense of
     being pieces of data/information that the robot communicates to/from
     other intelligent entities in its environment. I mean that in such a
     meme-communicating robot, the ''*thinking*'' going on in the Brain
     consists of an evolutionary process on these ''memes''; that is :
     ''thinking'' is nothing else than a dumb evolutionary process on these
     ''memes'' (which are themselves only dumb pieces of data). The
     meme-communicating robot would have one or more buttons, providing to
     the Brain the pain/pleasure signals, that the entities with which the
     robot communicates can press when they are pleased or disgusted with
     what the robot says to them. Result would be that the robot learns to
     communicate in a way that these parties find ''pleasant'' -- or in other
     words : the robot learns that behaviour that is in that case optimal for
     the survival of the robot is to communicate in an interesting and
     pleasant way with those people empowered to press its buttons.
     
     An even more interesting variant on the latter would be to eliminate the
     buttons, and to replace them by a pain/pleasure signal derived from the
     amount of new data that the robot learns. This would result in a robot
     that -- independently of whether it ''pleases'' persons in its
     surroundings -- autonomously and independently seeks to maximize its own
     knowledge.
     
     (A further thesis of mine is of course that scientists are already
     robots of that last kind :-).)

    ---
    

    The above is a fairly minimal description of my ideas. (But enough for now, given this medium of emails in a mailing list, I think.)

    Again, anyone who can point me to people who are already working on things like this, please inform me of those, if you will. At the moment, I have a feeling that I cannot proceed easily without interaction with others with overlapping ideas. It IMO just has to be the case that there exist research groups which do things that overlap with the above, but I find it a very strenuous task to find them. Thanks very much !

    ---

    Best regards, Menno (rubingh@delftnet.nl)

    Ir. Menno Rubingh,
    Scientific programmer, Software designer, & Software documentation writer
    Doelenstraat 62, 2611 NV Delft, Netherlands
    phone +31 15 2146915 (answering machine backup)
    email rubingh@delftnet.nl
    http://www.rubinghscience.org/
    ========================================
    Posting to pcp-discuss@lanl.gov from "Menno RUBINGH" <rubingh@delftnet.nl>


