
Future 3D web: Authoritative client side physics

Makes the server lightweight

Edited by: Jani Pirkola


Dan Miller challenges the current model of server side authoritative physics and shows the way forward. After reading this you should see the light.

This is the follow-up to Dan's previous articles: "Virtual Worlds, Virtual Robots, and AI: beyond gaming and social spaces" and "The future of Virtual Reality is not server centric".

If everyone is running their own physics, how do we resolve conflicts? Who is authoritative with regard to position and orientation of objects? The answer may surprise you, but it makes a lot of sense. Everyone is responsible for their own behavior, and everyone is authoritative with regard to their own state. But first, we have to talk a little about time.

A favorite band of mine from the '70s, Chicago, had a hit song entitled "Does Anybody Really Know What Time It Is?" The song continues with the lyric, "Does anybody really care?" I think perhaps this song was written with a precognitive understanding of the Second Life event protocol. In the SL world, events happen "now", meaning whenever you get the message. This ends up working out to some degree by virtue of two facts: (1) in Second Life, all activity is mediated by the server, and (2) ping times tend to be somewhat stable between machines on the 'net. Since the server is the only source of behavior, and the ping times are relatively stable, everyone tends to get the same events with a predictable delay. (Note, however, that this situation totally breaks down at the boundary between two sims that have substantial lag between them. Linden runs all the servers on a grid at a co-located facility, but the OpenSim grids and hypergrid are distributed across the world.)


If we want to move to a model where behavior is computed on different machines, we have to up our game a bit in terms of sophistication when it comes to time. If Alice and Bob are moving about on Charlie's server, and there is a 50 millisecond lag between Alice and Charlie (as well as between Bob and Charlie), then Alice sees Bob's avatar delayed by about 100 milliseconds, and vice versa. A problem then arises if Alice and Bob might bump into each other. One can imagine a scenario where, in Alice's world, she just misses being bumped into by Bob; but in Bob's world, Bob and Alice collide. How do we manage this discrepancy?

It's actually not all that difficult. If we have a good knowledge of our network, and know the average ping time as well as the standard deviation, we can devise a buffering scheme that smooths over time lags. In this example, the buffer time might be set to 125 ms, which is a bit longer than the expected 100 ms delay for a message travelling from Alice to Bob through the server. When Alice presses a key to move her avatar forward, an event message is broadcast, with a time stamp set 125 milliseconds in the future. Alice's avatar itself doesn't respond immediately; instead, it waits for the event to become timely, and after 125 milliseconds, the avatar begins to move (note this is not much worse than the SL case, with a varying delay of around 100 ms when the network is responsive). Assuming it takes 100 ms for this message to get to Bob, there is still 25 ms left until Bob also starts moving Alice's avatar in his version of reality. If everything goes according to plan, both Alice and Bob will see Alice's avatar move at the same time, and it will therefore be in the same place according to both client machines. If Bob is moving as well, Alice and Bob will agree precisely on where each avatar is at every moment, and their calculation of affairs, including potential collisions, will be synchronized.
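
The scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any existing viewer or server: the names `choose_buffer_ms`, `TimedEvent`, `EventBuffer` and `make_event` are made up for the example, the safety factor `k` is an arbitrary choice, and the sketch assumes that all machines share a synchronized clock (something the scheme needs but the article does not spell out).

```python
import heapq
import statistics
from dataclasses import dataclass, field


def choose_buffer_ms(delay_samples_ms, k=3.0):
    """Pick a buffer time as mean delay plus k standard deviations.

    k is an assumed safety factor; a larger k tolerates more jitter
    at the cost of more input latency.
    """
    mean = statistics.mean(delay_samples_ms)
    stdev = statistics.pstdev(delay_samples_ms)
    return mean + k * stdev


@dataclass(order=True)
class TimedEvent:
    execute_at_ms: float                  # shared-clock time at which to act
    sender: str = field(compare=False)
    action: str = field(compare=False)


class EventBuffer:
    """Holds time-stamped events until they become timely."""

    def __init__(self):
        self._heap = []

    def push(self, event):
        heapq.heappush(self._heap, event)

    def pop_due(self, now_ms):
        """Return every event stamped at or before now_ms, earliest first."""
        due = []
        while self._heap and self._heap[0].execute_at_ms <= now_ms:
            due.append(heapq.heappop(self._heap))
        return due


def make_event(sender, action, now_ms, buffer_ms):
    """Stamp an input event buffer_ms into the future before broadcasting."""
    return TimedEvent(now_ms + buffer_ms, sender, action)


# Alice presses "forward" at t = 0 with a 125 ms buffer.
alice_view, bob_view = EventBuffer(), EventBuffer()
event = make_event("alice", "move_forward", now_ms=0, buffer_ms=125)
alice_view.push(event)   # queued locally at once
bob_view.push(event)     # in the article's example this arrives at t = 100
```

Calling `pop_due(124)` on either buffer returns nothing, while `pop_due(125)` returns the event on both, so the two clients start moving Alice's avatar at the same moment. An event that arrives after its time stamp has passed is simply returned by the next `pop_due` call; how to correct for that lateness is left open here.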

One big advantage to such an arrangement is how it responds to times when the network may be overloaded, or responding poorly for some reason. In Second Life, if the network goes down, you are stuck in one place (surreally, you can look around and your camera will follow, but you cannot move). In this new model, you can continue to walk around, smell the virtual flowers, and do anything you want to do in the environment that doesn't depend on network communication. The only problem is, you will not get accurate update information for other avatars or objects that you don't control -- and they won't get the right information from you. Once the network recovers, everyone will get update messages, and the problem can be corrected (just as it happens in SL today). The difference is, you didn't spend that time in a virtual cage. It's worth pointing out that this model, where messages are buffered and time-stamped, is basically what we use today to implement streaming protocols. In particular, VoIP and teleconferencing apps have used this technique for many years.

There are many details to be worked out, but this is the rough outline of a set of capabilities that I think would go a long way towards increasing the scalability and robustness of the Metaverse. Just as the REST concept revolutionized thinking about the architecture of the web, I think we need some sort of conceptual foundation for real-time, immersive worlds that takes account of the stochastic nature of network connections, and doesn't simply fail to scale when more than a few dozen avatars decide to show up to an event. I'm not claiming that these ideas are magic bullets, but I think they throw some perspective on why we choose to do things in certain ways, and on how those choices are likely to impact the user experience as the system scales.

One element of this idea I'd like to examine a bit more closely is the relationship between physics, scripts, and animation. All three of these phenomena have the ability to direct the behavior of elements in the world. However, due mostly (in my opinion) to historical reasons, they present a very non-orthogonal set of capabilities that are distributed among various elements in somewhat arbitrary fashion. I would make the potentially inflammatory proposal that they should be integrated into a single concept, which I referred to previously as the "behavior server". Whether I want to use pre-sequenced animation, rigid body simulation, or some custom logic coded in a script, in all three cases my goal is to control the behavior of an element in the world. In my proposed scenario, all such behavior is invoked on the machine that owns the object in question. In the case of an avatar running a typical Second Life-style animated walk sequence, my "behavior server" would (locally) invoke a BVH (avatar animation) player function, apply it to my avatar's skeleton, and broadcast the resulting joint angles using a buffered stream of update messages as previously explained. Rather than have the other clients need to load animation assets, they will get a sequence of avatar motions that is exactly what my animation behavior server outputs. In the case where I want to use either physics (i.e. "ragdoll") or other custom logic to control my avatar, the other clients simply operate as before -- they faithfully replay whatever position, orientation, and joint angle commands they receive from me.

In this way, we decouple the underlying principle by which we generate behavior -- animation vs. physics vs. some other logic -- from the job of transmitting and reproducing that behavior. What is interesting about this idea from a scalability and robustness angle is this: if some bright hacker codes up a new way of controlling the avatar, they can put that capability into a custom client. Anyone who wants to exhibit that behavior needs the new client; however, *everyone*, even those with old clients, can *see* the new behavior. This applies to techniques like inverse kinematics, more realistic physics, new scripting languages, or any other possible innovation. By reducing the client's playback capability to "do what the messages tell you to do", we enable a Metaverse where many different ways of doing things can safely exist simultaneously.
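
A minimal sketch of this decoupling, in Python, might look as follows. Everything here is hypothetical and chosen for illustration: `PoseUpdate`, `BehaviorServer` and `PlaybackClient` are invented names, and `walk_animation` and `ragdoll_physics` are toy stand-ins for a real animation player and a real physics engine. The point is only that the playback side never learns which kind of behavior produced the numbers it replays.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PoseUpdate:
    owner: str
    execute_at_ms: float   # time stamp, already buffered into the future
    position: tuple
    joint_angles: dict


def walk_animation(t_ms):
    """Toy stand-in for a pre-sequenced animation: swing one knee."""
    return {"left_knee": 30.0 * math.sin(2 * math.pi * t_ms / 1000.0)}


def ragdoll_physics(t_ms):
    """Toy stand-in for a physics-driven source: the joint relaxes."""
    return {"left_knee": 45.0 * math.exp(-t_ms / 500.0)}


class BehaviorServer:
    """Runs on the owner's machine; any callable can drive the pose."""

    def __init__(self, owner, behavior, buffer_ms):
        self.owner = owner
        self.behavior = behavior
        self.buffer_ms = buffer_ms

    def tick(self, now_ms, position):
        return PoseUpdate(self.owner, now_ms + self.buffer_ms,
                          position, self.behavior(now_ms))


class PlaybackClient:
    """Does what the messages say; knows nothing about how they were made."""

    def __init__(self):
        self.pending = []
        self.scene = {}    # owner -> latest applied PoseUpdate

    def receive(self, update):
        self.pending.append(update)

    def advance(self, now_ms):
        due = [u for u in self.pending if u.execute_at_ms <= now_ms]
        self.pending = [u for u in self.pending if u.execute_at_ms > now_ms]
        for update in sorted(due, key=lambda u: u.execute_at_ms):
            self.scene[update.owner] = update


client = PlaybackClient()
server = BehaviorServer("alice", walk_animation, buffer_ms=125)
client.receive(server.tick(0, (0.0, 0.0, 0.0)))
client.advance(125)                  # the update is now timely and applied

server.behavior = ragdoll_physics    # new behavior, same old client
client.receive(server.tick(200, (0.0, 0.0, 0.0)))
client.advance(325)
```

Swapping `walk_animation` for `ragdoll_physics` changes nothing in `PlaybackClient`, which is the property the paragraph above relies on: a new way of generating behavior needs a new owner-side component, not a new viewer.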

And now for my final act of blasphemy. Everything I just said about avatars should also be possible for "prims". I'll go one step further: from the display perspective, the client doesn't need to know the difference between prims and avatars. There should be a generic scene element, which I refer to as an "existent". This element can be anything from a simple cube to a furry alien with compound eyes and wheels instead of legs. The fact that some of these existents are avatars should be a feature, not a fundamental distinction. The job of the client (as far as viewing the scene; there are other client responsibilities such as inventory management and so on) is simply to display these existents properly, updating their state according to buffered messages received from their respective owners (mediated through the server for now -- though there are p2p ramifications here that I won't go into). The idea is to factor out the semantic capabilities (is a human, can buy things, can create prims, can terraform etc) from the visual presentation (is a cube, is a skeleton with deformable mesh, has textures abc and xyz, etc).
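
One possible shape for such an "existent" is sketched below in Python. The field names (`presentation`, `capabilities`, `joint_angles` and so on) and the capability strings are assumptions made for the example; the article does not define a concrete schema. The sketch keeps the visual presentation and the semantic capabilities in separate fields, and accepts state updates only from the owner, who is authoritative for that state.

```python
from dataclasses import dataclass, field


@dataclass
class Presentation:
    """What the viewer needs in order to draw the thing."""
    mesh: str
    textures: list = field(default_factory=list)
    has_skeleton: bool = False


@dataclass
class Existent:
    existent_id: str
    owner: str
    presentation: Presentation
    # Semantic capabilities live apart from the visual description.
    capabilities: set = field(default_factory=set)
    position: tuple = (0.0, 0.0, 0.0)
    orientation: tuple = (0.0, 0.0, 0.0, 1.0)   # quaternion (x, y, z, w)
    joint_angles: dict = field(default_factory=dict)

    def apply_update(self, sender, position, orientation, joint_angles=None):
        """Apply a state update, but only if it comes from the owner."""
        if sender != self.owner:
            return False
        self.position = position
        self.orientation = orientation
        if joint_angles is not None:
            self.joint_angles = dict(joint_angles)
        return True


def render_list(existents):
    """The viewer reads presentation and state, never capabilities."""
    return [(e.presentation.mesh, e.position) for e in existents]


cube = Existent("cube-1", "bob", Presentation("unit_cube"))
avatar = Existent("av-1", "alice",
                  Presentation("humanoid", ["skin"], has_skeleton=True),
                  capabilities={"is_human", "can_buy", "can_create_prims"})

avatar.apply_update("alice", (1.0, 0.0, 0.0), (0.0, 0.0, 0.0, 1.0),
                    {"left_knee": 12.0})
cube.apply_update("alice", (5.0, 5.0, 5.0), (0.0, 0.0, 0.0, 1.0))  # rejected
```

Here the cube and the avatar go through exactly the same display path in `render_list`; being an avatar shows up only as a set of capabilities.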

There are many ramifications of these ideas, and of course it's all just mental gymnastics until someone (presumably I) takes the time to actually try to implement some of this stuff. All sorts of questions come to mind -- how do you invoke synchronized animations, such as Second Life's poseballs? -- but personally, I like challenging questions almost as much as I like unexpected answers.

Article tagged: 3d web


3 comment(s) for “Future 3D web: Authoritative client side physics”


kripken said on Saturday, July 11, 2009 (12:43:36 AM):
(Can't find you on IRC, so posting here.)

1. Introducing lag in order to synchronize events perfectly is one way to go, but the price is high. And somewhat unnecessary, as people generally can't notice differences between remote clients anyhow. Without introducing lag, physics glitches are possible, and there are standard ways to deal with them (none perfect, of course, but at least 95% of the time you have a nice lag-free experience).

2. Unifying animations with physics etc., leading to sending complete joint information from the 'control location', would be simpler and more elegant than current implementations. But it would be very sensitive to network delays, as there is a lot of information to send here, and it is very time-sensitive. The issue is not just lag but also inter-packet delays. Basically you will need to buffer for a while, just like streaming of audio and video, again, introducing unpleasant lag.

I'm not saying the Second Life way here is better. It isn't good at all. But introducing even more lag seems a step backward. I would suggest instead simply adopting the existing approaches used in 3D multiplayer games like Quake etc. (which is what my project, the Intensity Engine, does). These issues are new to virtual worlds but old in gaming tech.

Joel Foner said on Saturday, July 11, 2009 (11:07:32 AM):
A traditional way to have client-side physics with arbitration between various clients whose physics engines might not "agree" is to have both client and server-side physics operational.

Client-side physics runs without network transport delays and enables local movements to be smooth, and occasionally the client cross-checks with the server and makes corrections to the client's assertions about positions of things.

This does mean that once in a while the client-based objects will move to a new position or stutter, but in general that's better than having slow movement all the time due to having to roundtrip every motion to the server, right? :)

luckyman418 said on Tuesday, July 14, 2009 (8:49:29 AM):
what is that????????
there is danger to jump!