Dan outlines some of the features and requirements that need to be considered for the development of sophisticated, high-fidelity simulated worlds in the context of a massively distributed, shared environment.
In the two years since
Linden Lab released the source code for their client, a vibrant
open source community has grown around the Second Life protocol.
With projects such as the first version of
realXtend, the possibility of moving beyond the limitations of
that protocol is starting to be explored. Features such as
inverse-kinematics-based avatar control and mesh-based prims extend
the platform in ways that go beyond the capabilities of
Second Life. There are also parallel open source virtual world
efforts, such as
Qwaq (based on
Open Croquet), Sun's
Wonderland platform, and the
Web3D/X3D set of formats and runtime tools.
One of my core interests in this space has been to find a way to
bring some of my own and others' work in robotics, AI, and advanced
simulation into this arena of shared virtual worlds. There are
numerous reasons why this would be beneficial. Firstly, it would
allow researchers to leverage a common set of tools, fostering
collaboration and improving our ability to reproduce and critique
each other's work. Perhaps even more importantly, it could open the
way for large numbers of people to interact with virtual entities
(AIs, if you prefer). This is a potentially major advance in AI
research, as it represents a real-world, renewable source of
training data, which many researchers think is critical for
advancement in the field. It would also be ultra cool.
(Editor's note: See also the earlier article "What do
virtual worlds have to offer AI?")

Before I describe this work and the unique requirements it
presents, let's clarify some of the jargon. Linden Lab uses the
term 'sim', short for simulator, to signify a portion of virtual
space supported by Linden's servers.
Opensim, the open-source, mostly Second Life-compatible server,
inherits this usage. When I talk about simulation and simulators, I
mean something quite different. In my work in robotics, I build
software to simulate complex robots. This software uses some of the
same underlying components as a Second Life or Opensim simulator -
a physics engine, 3D representation of objects, and so on. However,
the level of detail is quite different. The simulators I use in my
robotics work are generally operating at a much higher degree of
realism than a realXtend or Second Life vehicle or avatar. They
require proportionally greater resources in terms of CPU and
memory, and of course the development time to get a simulation of
this sort working is considerable.
While my own work in robotics requires a high degree of physical
realism, there is work being done in this field that may have
different requirements, such as robot swarms, where navigation and
interaction are more important than physical realism. There is also
a body of AI research into subjects such as language comprehension
that may not need physical realism at all. However, my work is on
physically realistic simulation, so I will be focused on that
specific issue in this article.
There are several related issues that come up in the course of
attempting to integrate 'serious' simulation into the Second
Life-derived model:
Basic Architecture
In Second Life (and related platforms), the user's visual
representation of the world consists of three major parts: the
background (land, sky, sea); a set of objects (called 'prims' in
Second Life); and avatars, representing the human (or possibly
scripted) players. These three major elements have very different
architectures, and present an almost mutually exclusive set of
features to the system. Putting aside the background elements,
let's look at the prim and avatar systems.
Prims
Prims are physical objects that are described as a primitive
shape (cube, cylinder, pyramid, sphere etc.), with various possible
modifications such as cuts, tapers, skews, and hollows. Prims are
controlled by the server software. Static prims, which generally
don't move unless they are being edited, make up architectural
elements such as walls and other building elements. Other prims are
allowed to move around the environment. They can be controlled by
the physics engine (balls that roll), scripts (doors), or both
(vehicles). They can have arbitrary textures, but do not have the
ability to utilize deformable meshes, skeletons, or client-side
animation.
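As a rough illustration of the description above, a prim can be thought of as a base shape plus a handful of modifiers and two control flags. This is a minimal Python sketch; the class, field names, and numeric ranges are hypothetical and are not the actual Second Life or Opensim data model.

```python
from dataclasses import dataclass
from enum import Enum


class Shape(Enum):
    """Base primitive shapes named in the text."""
    CUBE = "cube"
    CYLINDER = "cylinder"
    PYRAMID = "pyramid"
    SPHERE = "sphere"


@dataclass
class Prim:
    """Hypothetical prim record: a base shape plus optional modifications."""
    shape: Shape
    cut: float = 0.0      # illustrative 0..1 amounts, not real protocol values
    taper: float = 0.0
    skew: float = 0.0
    hollow: float = 0.0
    physical: bool = False  # moved by the physics engine
    scripted: bool = False  # moved by a script

    def control(self) -> str:
        """Describe what, if anything, moves this prim."""
        if self.physical and self.scripted:
            return "physics+script"
        if self.physical:
            return "physics"
        if self.scripted:
            return "script"
        return "static"


wall = Prim(Shape.CUBE)                           # architectural element
ball = Prim(Shape.SPHERE, physical=True)          # rolls under physics
door = Prim(Shape.CUBE, scripted=True)            # opened by a script
vehicle = Prim(Shape.CUBE, taper=0.3, physical=True, scripted=True)
```

Note what the record lacks: there is no field for a skeleton, a deformable mesh, or an animation, which is the limitation the paragraph above describes.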
Avatars
Avatars are always defined as skeletons with deformable mesh
(skin and clothes). Their motion is usually controlled by the
user's input (walk, run, fly etc). They do react to the physics
engine, but in a rudimentary way: avatars are represented at the
physical level as 'capsules', egg or pill-shaped shells that
provide basic collision with walls and other avatars. This is an
important point: there is no physical representation of, say, your
arm extending out and touching an object. Complex avatar motion is
achieved through client-side animation, which in our case is based
on the .bvh file format (see
http://en.wikipedia.org/wiki/Biovision_Hierarchy). Again, it's
important to realize that the animation of an avatar is not
reflected at the physics layer, at least not in the present
implementation of Opensim.
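The consequence of the capsule representation can be shown with a small collision test. The sketch below is a generic capsule-versus-sphere check in Python, not code from any Second Life or Opensim physics engine. The names, the dimensions, and the Z-up axis convention are assumptions for illustration.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Capsule:
    """Upright capsule: a vertical segment inflated by a radius (Z is up)."""
    center: Vec3
    radius: float
    half_height: float  # half the length of the inner vertical segment


def distance_to_axis(capsule: Capsule, point: Vec3) -> float:
    """Distance from a point to the capsule's inner vertical segment."""
    cx, cy, cz = capsule.center
    px, py, pz = point
    # Clamp the point's height onto the segment to find the nearest axis point.
    nearest_z = min(max(pz, cz - capsule.half_height), cz + capsule.half_height)
    return math.dist((px, py, pz), (cx, cy, nearest_z))


def capsule_touches_sphere(capsule: Capsule, center: Vec3, radius: float) -> bool:
    """True if the capsule and the sphere overlap or touch."""
    return distance_to_axis(capsule, center) <= capsule.radius + radius


# Hypothetical avatar: roughly 1.8 m tall, 0.3 m body radius.
avatar = Capsule(center=(0.0, 0.0, 1.0), radius=0.3, half_height=0.6)

# A small ball 0.7 m in front of the avatar at chest height, about where an
# outstretched hand would be drawn by a client-side animation.
ball_center = (0.7, 0.0, 1.2)
touching = capsule_touches_sphere(avatar, ball_center, 0.1)
```

Here `touching` is false: the animated arm may appear to reach the ball on screen, but the physics layer only knows about the capsule, and the ball lies outside it.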
There is one more subtle but critical point to keep in mind
about avatars vs. prims: avatars, by definition, are associated
with a user account. Users have abilities that go outside the
physical and graphic simulation layer - they can buy and sell
things; they can chat with each other, join groups, get banned for
TOS violations, and so forth. They represent an entity within the
social network of Second Life (or one of the Opensim grids, or the
emerging hypergrid). There are complexities - avatars can have
'prim'-based attachments - but fundamentally, if you want to be
part of the Second Life society, you need an avatar.
Users
Users, through their avatars, can interact with each other
(chat, give/receive items…). They can interact with prims,
by creating and editing them (if they have the permissions), or
through menus and scripts. It may seem obvious, but it's worth
pointing out the hierarchy: prims cannot create or edit avatars
(!); they cannot buy, sell, give, or receive assets on their own
behalf. They are, by definition, owned and controlled by at least
one user, who is in turn represented by a single
avatar.
The problems this hierarchy and separation of capabilities pose
to those of us working in AI are hopefully becoming evident. If I
design a robot using prims, scripts, and physics, there is no
obvious way to give that entity the same rights and privileges as a
"real" user. I also can't take advantage of graphics abilities such
as skeletons, deformable skin, or animations. On the other hand, if
I design a robot in the typical 'bot' fashion, by pretending to be
a user (using libsecondlife for instance), right at the start I
must accept a number of constraints that may not be desirable: I
need to look roughly human (because of skeleton constraints); I
must have skin; I can't interact physically at a fine-grained level
with the rest of the system; and my behavior must be animated - I
cannot generate low-level behavior on-the-fly.
Distributed simulation and synchronization
Second Life started as a proprietary system run by Linden on a
local server farm. They made the choice to divide up the simulation
by geography; in essence, each 256×256 meter region of land
is assigned its own processing stack (set of CPU threads), and
operates in parallel with all the other regions. Interactions
between region modules are achieved through network
communication.
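To make the geographic split concrete, here is a minimal Python sketch of how a world position could map onto a grid of 256×256 meter regions. The function names and the flat global coordinate scheme are assumptions for illustration, not Linden's or Opensim's actual code.

```python
from typing import Tuple

REGION_SIZE = 256.0  # meters per region side, as described above


def locate(x: float, y: float) -> Tuple[Tuple[int, int], Tuple[float, float]]:
    """Split a global position into (region grid index, position within region)."""
    region_x, local_x = divmod(x, REGION_SIZE)
    region_y, local_y = divmod(y, REGION_SIZE)
    return (int(region_x), int(region_y)), (local_x, local_y)


def crosses_boundary(start: Tuple[float, float], end: Tuple[float, float]) -> bool:
    """True if moving from start to end hands the object to another region."""
    return locate(*start)[0] != locate(*end)[0]


region, local = locate(300.0, 10.0)   # region (1, 0), local position (44.0, 10.0)
handoff = crosses_boundary((250.0, 10.0), (260.0, 10.0))
```

In this scheme, an object that moves ten meters from x = 250 to x = 260 changes owner: a different processing stack becomes responsible for it, and the handoff has to go over the network.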
There are obvious objections to this model, the main one being
that popular sims get overloaded, and sparsely visited sims waste
resources. Some of this can be addressed with load balancing and
clever thread management, but in my opinion it represents a
fundamental flaw in the Second Life architecture. Putting the
resource issue aside for the moment, there is a deeper flaw in this
state of affairs. To build a useful and reliable simulated world
for AI research, it is critical that you not introduce major
defects (artifacts) into the simulation. Learning algorithms are
notorious for finding devious ways to "cheat" researchers' goals.
If something in the simulation is non-physical, you can pretty much
bet that your genetic algorithm or neural network will find that
defect and use it to its advantage, avoiding the more difficult,
"honest" approach you are trying to develop.
Boundaries between regions in Second Life, especially in the
case where the regions are separated by network lag, are places
where the physics of the simulation breaks down in nasty and
difficult-to-predict ways. One reason for this is that events in
Second Life have no concept of operative time. An event is sent
over the wire, and the receiving node assumes that this event is
happening 'now', in relation to that node's local concept of time.
This creates some issues on the client side (perhaps I'll post
about that later), but more fundamentally, it's a problem at the
server level. Region modules in Opensim pass messages back and
forth between each other, for instance posting the position,
rotation, linear and angular velocity of objects, so that you can
visualize what is happening across region boundaries. Since these
messages are subject to server lag, each sim in effect has its own
version of reality. In my sim, your objects are delayed by the
average network lag time between our sims; in your sim, my objects
are similarly delayed. Clearly this is not acceptable if our goal
is to have a coherent, realistic, detailed simulation of the
environment.
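The size of the discrepancy is easy to estimate: an object moving at velocity v, reported over a link with lag t, is displaced by roughly v × t in the receiving sim. The Python sketch below compares treating an update as happening "now" with stamping it with a send time and extrapolating forward. The `sent_at` field, the shared clock, and the constant-velocity assumption are all hypothetical. This is plain dead reckoning for illustration, not how Second Life, Opensim, or Croquet handle time.

```python
from dataclasses import dataclass


@dataclass
class Update:
    """Hypothetical cross-region object update (one dimension for brevity)."""
    sent_at: float   # sender's clock in seconds; assumes sims share a clock
    position: float  # meters
    velocity: float  # meters per second


def apply_as_now(update: Update) -> float:
    """Receiver treats the event as current and uses the position unchanged."""
    return update.position


def apply_with_timestamp(update: Update, received_at: float) -> float:
    """Extrapolate by the elapsed time, assuming constant velocity."""
    return update.position + update.velocity * (received_at - update.sent_at)


update = Update(sent_at=10.0, position=250.0, velocity=10.0)
received_at = 10.25  # 250 ms of lag between the two sims

# Where the object really is when the update arrives.
true_position = update.position + update.velocity * (received_at - update.sent_at)

naive_error = true_position - apply_as_now(update)                        # 2.5 m
stamped_error = true_position - apply_with_timestamp(update, received_at)  # 0.0 m
```

With 250 ms of lag and an object moving at 10 m/s, the untimestamped view is off by 2.5 meters. Extrapolation only hides the error while the motion stays predictable. A collision or a change of direction during the lag window still produces two different versions of events, which is why a shared notion of time matters.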
Open Croquet, an alternative virtual world platform, has
developed a concept of distributed, synchronized simulation that
sounds pretty good to me (I haven't actually deployed it, so I'm
taking some of this on faith; decide for yourself at
http://www.opencroquet.org/index.php/TeaTime_Architecture).
In summary, I hope I've managed to outline some
of the features and requirements that need to be considered for the
development of sophisticated, high-fidelity simulated worlds in the
context of a massively distributed, shared environment. In a later
post, I hope to share some ideas on what we can do to support
these concepts in future releases of Opensim and
realXtend.