Hi Cybernetics people,
I take the liberty to send with this email a brief description of my AI
robot brain design ideas. My purpose with this is to ask all of you, if
you can, to point me to existing data that overlaps with this, and above
all to people and groups who might at this moment be working on these
kinds of things. Thanks very much for any pointers you can give me.
-----
* Goal :
~~~~
To create a computer program (called ''Brain Program'') that makes
a PC operate like the brain of an intelligent robot.
The PC is connected to
1. a set of robot arms (''Motor'') controlled by the brain program, and
2. a set of Sensors that provide the brain with data about its environment.
Therefore the assembly { Brain + Motors + Sensors } acts like an
intelligent robot.
* Further analysis of the goal :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Brain Program, in my opinion, is the essential part of the
intelligent robot. The Brain Program should be constructed in such a
way that it automatically learns to make use of input Sensor data and
learns to operate its Motors, even in the case where e.g. new
Sensors/Motors are plugged into the robot in the course of its life.
That is: the Brain Program compensates for the idiosyncrasies of the
particular Sensors/Motors that it is connected with.
I see the above problem in evolutionary terms : The Brain Program
should control the robot ( = { Brain + Motors + Sensors } ) in such a way
that the robot survives. The ''intelligence'' of the robot consists in
the fact that it learns autonomously how to survive optimally.
For the time being, I only look at this survival of the robot as
survival of the individual robot itself. My goal is to make a robot
that learns selfishly useful behaviour. The word ''useful'' is
here to be interpreted as a synonym for behaviour that is favourable
for the survival of the robot.
* General design :
~~~~~~~~~~~~~~
I've created a minimally simple Brain Program that consists basically
of a neural network. This simple Brain Program is constructed
as follows (source code and some documentation at
http://www.rubinghscience.org/aiclub/toc.html) :
The data maintained by the Brain Program is a list of items, each of
which is arbitrarily called a ''Brain Cell''. The contents of each
Brain Cell is as follows:
  Sensor values                        Motor    Priority
                                       value     value
  s1  s2  s3  s4  s5      sN            m          p
+---+---+---+---+---+--- ... --+---+  +----+    +---+
| a | b | c | d | e |          | z |  | m1 |    | P |
+---+---+---+---+---+--- ... --+---+  +----+    +---+
which means: if the Sensors are in states (a,b,c,d,e,f,...,z), then
actuate Motor 'm1'.
This presupposes that the robot has N Sensors (labelled s1...sN) and M
Motors (labelled m1...mM). The Sensor states are discrete, and might be
e.g. binary values (e.g. 1 = light sensor receives light, 0 = doesn't
receive light).
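As a concrete illustration, a Brain Cell could be represented like this
in Python (the field names are my own labels for the parts described
above, not part of the design itself) :

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class BrainCell:
    """One competing behaviour: a sensor pattern, a motor, a priority."""
    sensors: Tuple[int, ...]  # discrete states (a, b, c, ..., z) for s1..sN
    motor: int                # index of the motor (m1..mM) to actuate
    priority: float           # non-negative "population size" of this behaviour
```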
Each Brain Cell must be unique. The Brain Program operates in
discrete, successive moves, each of which consists of :
1. Receive the states of the Sensors.
2. Choose from all Brain Cells the single cell whose sensor values
are most ''like'' the input Sensor states.
The probability with which a certain Brain Cell is selected
*also* depends on the ''Priority'' value p of the Brain Cell.
Higher p means higher likelihood of being selected.
3. Get the Motor value from the chosen Brain Cell, and
send a signal to that Motor that makes that motor go active
for a (short) time.
4. Get the pain/pleasure feedback signal that results from the
physical action of the activated Motor, and change the Priority
values of the Brain Cells on the basis of the value of that
pain/pleasure signal. (More on this below.)
In step 2, if the degree of alike-ness between the input Sensor states
and the best-matching Brain Cell is below a certain threshold, the
Sensor input is considered a ''new'' thing, and a different action than
the above is executed, namely: a set of new Brain Cells is inserted,
whose Sensor states are that ''new'' set of Sensor input values, with
one cell for each possible value of the Motor ('m').
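One such move (steps 1 to 3, including the novelty check of step 2)
could be sketched in Python roughly as follows. The Hamming-style
similarity measure, the 0.5 novelty threshold, and the initial priority
of 1.0 for new cells are my own illustrative assumptions, not part of
the design above :

```python
import random

def similarity(cell_sensors, inputs):
    # Fraction of sensor positions on which the cell agrees with the input
    # (a simple Hamming-style alike-ness measure; an assumption here).
    return sum(a == b for a, b in zip(cell_sensors, inputs)) / len(inputs)

def one_move(cells, inputs, num_motors, threshold=0.5):
    # Novelty check: if even the best-matching cell is too unlike the
    # input, insert one new cell per possible motor value.
    best = max((similarity(c["sensors"], inputs) for c in cells), default=0.0)
    if best < threshold:
        for m in range(num_motors):
            cells.append({"sensors": tuple(inputs), "motor": m, "priority": 1.0})
    # Step 2: stochastic selection, weighted by alike-ness *and* priority.
    weights = [similarity(c["sensors"], inputs) * c["priority"] for c in cells]
    chosen = random.choices(cells, weights=weights, k=1)[0]
    # The just-executed cell moves to the top of the list (used in step 4).
    cells.remove(chosen)
    cells.insert(0, chosen)
    # Step 3: return which motor to actuate.
    return chosen["motor"]
```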
Each of the Brain Cells is a PROGRAM in which is encoded the action that
the robot executes when the sensors see the pattern (a,b,c,....z). That
is: all behaviours that the robot can potentially execute are stored in
the Brain Cells. The sensor input values, in this minimally complex
design, fairly directly control the robot's actions.
The set of all the Brain Cells in the robot's brain is a population of
competing programs. (I suspect that this may overlap with some of
Koza's ideas/designs.)
The Priority value of a Brain Cell is a non-negative real value, and
represents the population size of the program (= behaviour) coded in
that Brain Cell.
In addition to the input Sensors, the robot is also equipped with
sensors that sense the pain/pleasure state of the robot. Pain might
mean that the robot has bumped into a wall and has thereby damaged
itself, or that the robot's fuel level is becoming uncomfortably low.
Pleasure might mean the robot's fuel level has just increased. All
pain/pleasure sensor inputs are summed (pain with negative weights,
pleasure with positive weights) in a fixed (probably hard-wired)
function, and the resulting signal of that fixed function is fed to the Brain.
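A minimal sketch of that fixed aggregation function, assuming simple
linear weights (the linear form is my assumption; any fixed function
would do) :

```python
def feedback_signal(pain_inputs, pleasure_inputs, pain_weights, pleasure_weights):
    # Fixed, hard-wired weighted sum: pain contributes negatively,
    # pleasure positively; the net result is fed to the Brain.
    pain = sum(w * x for w, x in zip(pain_weights, pain_inputs))
    pleasure = sum(w * x for w, x in zip(pleasure_weights, pleasure_inputs))
    return pleasure - pain
```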
The set of Brain Cells is a *list*, in which the Brain Cell that has
just been executed is always removed from its old place and moved to the
top of the list. Near the top of the list are thus always the Brain
Cells that have been used most recently. When a pain/pleasure signal is
received, the Priority values of the Brain Cells near the top of the
list are multiplied by a non-negative quantity that depends on the value
v of the pain signal as C * exp( v ), or a similar function. The effect
of this is that a net-positive feedback signal (meaning ''Pleasure'')
rewards the most-recently executed Brain Cells, by increasing the
Priority value (population size) of these Brain cells; analogously, a
net-negative feedback signal (meaning ''Pain'') decreases the Priority
value (population size) of those Brain Cells.
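The priority update could be sketched as follows. The depth of 3
most-recently-used cells and the constant C = 1.0 are illustrative
assumptions; the C * exp( v ) form is the one given above :

```python
import math

def apply_feedback(cells, v, C=1.0, depth=3):
    # Multiply the priorities of the `depth` most recently used cells
    # (the cells near the top of the list) by C * exp(v):
    # v > 0 (net Pleasure) rewards them, v < 0 (net Pain) punishes them.
    factor = C * math.exp(v)
    for cell in cells[:depth]:
        cell["priority"] *= factor
```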
Good behaviour, i.e. behaviour that results in Pleasure signals,
therefore reaches ever higher Priority, and is therefore more and more
likely to be re-executed. Bad behaviour, i.e. behaviour that results in
Pain signals, gets lower and lower Priority, and thereby gets executed
less and less.
* Ideas for extending the above simple design
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In my perception, the key idea in the above is that the Brain Cells are
programs in which *behaviour* is encoded, and that these programs
compete with each other via an evolutionary process. These evolutionary
processes are what makes the robot adapt and learn, and what makes the
robot ''intelligent'' and also *autonomous*.
At the moment, the very simple and minimal version of the Brain Program
that I have running right now can effectively control a simple
simulated robot that learns to avoid walls and to ''seek out'' food
pellets in the simulated environment it walks around in. It doesn't
yet do genetic mutation or crossover of the contents of Brain Cells --
this would be IMO the next interesting thing to extend the above design
with. Crossover IMO could be included e.g. as an extra
step in the cyclical operation of the Brain: e.g. in each cycle select
two high-Priority Brain Cells, let them mate, and insert the created
offspring behaviour pattern as a new Brain Cell, with a certain (small)
population size (= Priority). In this way, it would seem possible to
create in the operation of the Brain an infrastructure through which
successful sub-patterns in Brain Cells can be communicated between Brain
Cells.
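Such a crossover step might look roughly like this in Python. The
one-point crossover on the sensor pattern, the priority-weighted parent
selection, and the initial offspring priority of 0.1 are my own
illustrative choices :

```python
import random

def crossover_step(cells):
    # Select two high-Priority parents (selection weighted by priority).
    weights = [c["priority"] for c in cells]
    p1, p2 = random.choices(cells, weights=weights, k=2)
    # One-point crossover on the sensor pattern; motor from either parent.
    cut = random.randrange(1, len(p1["sensors"]))
    child = {
        "sensors": p1["sensors"][:cut] + p2["sensors"][cut:],
        "motor": random.choice([p1["motor"], p2["motor"]]),
        "priority": 0.1,  # small initial population size for the offspring
    }
    cells.append(child)
    return child
```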
One of my prominent longer-term goals is to extend the above simple
neural-network design into a Brain in which the programs (which are the
entities that undergo evolution) consist of ''memes'', in the sense of
being pieces of data/information that the robot communicates to/from
other intelligent entities in its environment. I mean that in such a
meme-communicating robot, the ''*thinking*'' going on in the Brain
consists of an evolutionary process on these ''memes''; that is :
''thinking'' is nothing else than a dumb evolutionary process on these
''memes'' (which are themselves only dumb pieces of data). The
meme-communicating robot would have one or more buttons, providing to
the Brain the pain/pleasure signals, that the entities with which the
robot communicates can press when they are pleased or disgusted with
what the robot says to them. Result would be that the robot learns to
communicate in a way that these parties find ''pleasant'' -- or in other
words : the robot learns that behaviour that is in that case optimal for
the survival of the robot is to communicate in an interesting and
pleasant way with those people empowered to press its buttons.
An even more interesting variant on the latter would be to eliminate the
buttons, and to replace them by a pain/pleasure signal derived from the
amount of new data that the robot learns. This would result in a robot
that -- independently of whether it ''pleases'' persons in its
surroundings -- autonomously and independently seeks to maximize its own
knowledge.
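That curiosity-driven pain/pleasure signal could be as simple as the
following sketch, where ''new data learned'' is measured by the number
of Brain Cells inserted since the last move (the scale factor is an
arbitrary assumption) :

```python
def curiosity_signal(cells_before, cells_after, scale=0.1):
    # Pleasure proportional to how many new Brain Cells were just created,
    # i.e. how much genuinely new sensor data the robot encountered.
    return scale * max(0, cells_after - cells_before)
```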
(A further thesis of mine is of course that scientists are already
robots of that last kind :-).)
---
The above is a fairly minimal description of my ideas. (But enough for
now, given this medium of emails in a mailing list, I think.)
Again, anyone who can point me to people who are already working on things like this, please inform me of those, if you will. At the moment, I have a feeling that I cannot proceed easily without interaction with others with overlapping ideas. It IMO just has to be the case that there exist research groups which do things that overlap with the above, but I find it a very strenuous task to find them. Thanks very much !
---
Best regards, Menno (rubingh@delftnet.nl)
Ir. Menno Rubingh, Scientific programmer, Software designer, &
Software documentation writer
Doelenstraat 62, 2611 NV Delft, Netherlands
phone +31 15 2146915 (answering machine backup)
email rubingh@delftnet.nl
http://www.rubinghscience.org/
Posting to pcp-discuss@lanl.gov from "Menno RUBINGH" <rubingh@delftnet.nl>
This archive was generated by hypermail 2b29 : Thu Nov 09 2000 - 22:22:48 GMT