
Blizzard's Tim Ford: Overwatch architecture design and network synchronization

Original reprinted from: Tencent GAD game developer platform

Translation: kevinan

At the GDC 2017 session "Overwatch Gameplay Architecture and Netcode", Tim Ford of Blizzard presented the design of Overwatch's game architecture and network synchronization. Let's take a look.

Hello, everyone. This talk is about the gameplay architecture and netcode of Overwatch. The usual rules apply: silence your phones; remember to fill out the survey when you leave; swap off Hanzo and push the payload! (Laughter)

I am Tim Ford, a lead on Blizzard's Overwatch development team. I have been on this team since the project started in the summer of 2013. Before that I was on the Titan project, but this talk has nothing to do with Titan. (Laughter)

Some of the techniques shared in this talk are used to reduce the complexity of a growing code base (translator's note: readers unfamiliar with the concept of code complexity should look it up). To achieve this we followed a strict architecture. Finally, we will show how to manage complexity by discussing an inherently complex problem: network synchronization (netcode).

Overwatch is an online team-based hero shooter set in a near-future world. Its main feature is the diversity of its heroes, each of whom has unique abilities.

Overwatch uses an architecture called "Entity Component System", which I will refer to as ECS from here on.

ECS differs from the component models popular in some off-the-shelf engines, and differs even more from the classic Actor model of the late 1990s and early 2000s. Our team has many years of experience with those architectures, so choosing ECS was a bit of a "grass is greener on the other side" decision. However, we built a prototype beforehand, so the decision was not impulsive.

After more than three years of development, we found that the ECS architecture can manage rapidly growing code complexity. Although I am happy to share the advantages of ECS, you should know that everything I am talking about today is hindsight.

ECS Architecture Overview

The ECS architecture looks like this. There is a World, which is a collection of Systems and entities. (Translator's note: "system" here means the S in ECS, not a system in the general sense; for readability it is written System from here on.) An entity is just an ID, and that ID corresponds to a set of components. Components store game state and have no behavior. Systems have behavior but no state.

This may sound surprising: components have no functions, and Systems have no fields.
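The split described above can be sketched in a few lines. This is a minimal illustration, not Overwatch code: `Velocity`, `World` and `MovementSystemUpdate` are hypothetical names chosen for the example, and storing each component type in its own hash map is an assumption made to keep the sketch short.

```cpp
#include <cstdint>
#include <unordered_map>

using EntityId = std::uint32_t;  // an entity is only an ID

// Components: state only, no member functions.
struct Transform { float x = 0.0f; float y = 0.0f; };
struct Velocity  { float dx = 0.0f; float dy = 0.0f; };  // hypothetical component

// The world maps entity IDs to the components they own.
struct World {
    std::unordered_map<EntityId, Transform> transforms;
    std::unordered_map<EntityId, Velocity>  velocities;
};

// A System: behavior only, no fields of its own.
// It acts on every entity that owns both a Velocity and a Transform.
void MovementSystemUpdate(World& world, float dt) {
    for (auto& [id, velocity] : world.velocities) {
        auto it = world.transforms.find(id);
        if (it == world.transforms.end()) continue;  // incomplete tuple: skip
        it->second.x += velocity.dx * dt;
        it->second.y += velocity.dy * dt;
    }
}
```

An entity that lacks one of the two components is simply never visited, which is the "slice" idea the talk returns to below.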

System and components used by the ECS engine

The left-hand side of the diagram lists the Systems in tick order, and the right-hand side shows the components owned by different entities. When you select a System on the left, all the corresponding components light up on the right, like keys on a piano. We call such a group of components a tuple. (Translator's note: judging from what follows, tuples are mainly used through the Sibling function, which fetches the other components in the same tuple, so a tuple is a kind of virtual grouping.)

A System iterates over all of its tuples and performs operations (that is, behavior) on their state. Remember that components contain no functions; they only store state.

Most important Systems look at more than one component. As you can see, the Transform component here is used by many Systems.

An example of a System poll (tick) from the prototype engine

This is the tick function of the physics System. It is very straightforward: it steps an internal physics engine at a fixed rate. The physics engine might be Box2D or Domino (Blizzard's own physics engine). After simulating the physical world, it iterates over the tuple collection, uses the proxy stored in the DynamicPhysicsComponent to fetch the underlying physics representation, and copies it to the Transform component and the Contact component (which will be used extensively later).

The System does not know what an entity is. It only cares about a small slice (a specific subset) of the component set and performs a set of behaviors on that slice. Some entities have as many as 30 components and some only 2 or 3; the System does not care how many, only about the subset of components it operates on.

Take the example in this prototype engine (pointing to Figure 7 above): this is the player character entity, which can do a lot of cool things, and on the right is the bullet entity that the player can fire.

At runtime, no System knows or cares what these entities are; each one just operates on a subset of the components attached to an entity.

The implementation of the ECS architecture in Overwatch looks like this.

EntityAdmin is the World. It stores a collection of all Systems and a hash table of all entities. The table key is the entity ID, a 32-bit unsigned integer that uniquely identifies the entity in the entity array. Each entity in turn stores its entity ID and an optional resource handle that points to the entity's Asset (translator's note: this presumably relies on a separate Blizzard asset management system); the asset defines the entity.

Component is a base class with hundreds of subclasses. Each subclass contains the member variables required by the behaviors that Systems run on it. The only use of polymorphism here is to override lifecycle functions such as Create and the destructor. Apart from that, an instance of a component subclass offers only a few helper functions for convenient access to internal state. These helpers are not behaviors (translator's note: this follows the principle above that components have no behavior); they are simple accessors.

Finally, EntityAdmin calls Update on every System, and each System does some work. Figure 9 above shows how we use it. Instead of operating on a fixed tuple of components, we pick one primary component to iterate over and then fetch its sibling components as the behavior needs them. So you can see that the operations here are only performed on tuples of entities that have both Derp and Herp components.
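The iterate-then-fetch-sibling pattern just described might look roughly like the sketch below. Only the names `EntityAdmin`, `Sibling`, `Derp` and `Herp` come from the talk; the storage layout, the `Add` and `All` helpers (the latter standing in for the `ComponentItr<>` template mentioned later) and the `owner` field are assumptions made for illustration.

```cpp
#include <cstdint>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>
#include <vector>

using EntityId = std::uint32_t;

// Base class: polymorphism is used only for lifetime management.
struct Component {
    virtual ~Component() = default;
    EntityId owner = 0;  // assumed back-reference to the owning entity
};
struct DerpComponent : Component { int derps = 0; };
struct HerpComponent : Component { int herps = 0; };

class EntityAdmin {
public:
    template <typename T>
    T& Add(EntityId id) {  // hypothetical helper
        auto component = std::make_unique<T>();
        component->owner = id;
        T& ref = *component;
        m_entities[id][std::type_index(typeid(T))] = std::move(component);
        return ref;
    }
    // Fetch another component on the same entity, or nullptr if absent.
    template <typename T>
    T* Sibling(EntityId id) {
        auto entity = m_entities.find(id);
        if (entity == m_entities.end()) return nullptr;
        auto slot = entity->second.find(std::type_index(typeid(T)));
        if (slot == entity->second.end()) return nullptr;
        return static_cast<T*>(slot->second.get());
    }
    // Hypothetical stand-in for ComponentItr<T>: every component of type T.
    template <typename T>
    std::vector<T*> All() {
        std::vector<T*> out;
        for (auto& entry : m_entities)
            if (T* c = Sibling<T>(entry.first)) out.push_back(c);
        return out;
    }
private:
    using ComponentMap =
        std::unordered_map<std::type_index, std::unique_ptr<Component>>;
    std::unordered_map<EntityId, ComponentMap> m_entities;
};

// Iterate the primary component, then pull in its sibling.
void DerpAndHerpSystemUpdate(EntityAdmin& admin) {
    for (DerpComponent* derp : admin.All<DerpComponent>()) {
        HerpComponent* herp = admin.Sibling<HerpComponent>(derp->owner);
        if (herp == nullptr) continue;  // no Herp: not part of this tuple
        derp->derps += herp->herps;
    }
}
```

A production implementation would presumably store components contiguously per type rather than behind per-entity maps; the sketch favors brevity over cache behavior.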

Overwatch client's System and component list

There are about 46 different Systems and 103 components here. The cool animation on this slide is just there to catch your eye. (Laughter)

Then the server

You can see that some Systems need many components and some need only a few. Ideally, we try to make sure that a System that depends on many components treats them the way a pure function would (translator's note: a pure function is one without side effects), that is, without mutating their state. We do have a small number of Systems that need to change component state, in which case they must manage that complexity themselves.

Below is a real System code

This System manages player connections. It is responsible for kicking AFK players on all of our game servers (AFK: Away From Keyboard, meaning idle for a long time).

The System iterates over all Connection components (translator's note: translating the name literally would not be appropriate here). A Connection component manages a player's network connection on the server and is attached to the entity that represents the player. That entity can be a player, a spectator, or another player-controlled character. The System neither knows nor cares about these details; its job is to kick AFK players.

Each tuple with a Connection component also contains an InputStream component and a Stats component (translator's note: apparently used to record match statistics). We read your actions from the InputStream component to make sure you are doing something, such as pressing keys, and we read from the Stats component whether you are contributing to the game in some way.

As long as you keep doing these things, your AFK timer is reset. Otherwise, we send a message to your client through the network connection handle stored on the Connection component and kick you.

An entity must have the complete tuple for the System to act on it. A bot entity in our game, for example, has no Connection component and no InputStream component, only a Stats component, so it is not affected by the AFK kick. The behavior of the System depends on the complete "slice". Frankly, there is no point in wasting resources on kicking AFK bots.
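The AFK check can be sketched as follows. This is an illustration under stated assumptions, not the code from the slide: the field names, the timer stored on `Connection`, the `kicked` flag (standing in for sending a kick message over the connection handle) and the rule that both input and contribution are required to reset the timer are all hypothetical.

```cpp
#include <cstdint>
#include <unordered_map>

using EntityId = std::uint32_t;

// State-only components; all field names are assumptions.
struct Connection  { float afkSeconds = 0.0f; bool kicked = false; };
struct InputStream { bool hadInputThisTick = false; };
struct Stats       { bool contributedThisTick = false; };

struct World {
    std::unordered_map<EntityId, Connection>  connections;
    std::unordered_map<EntityId, InputStream> inputs;
    std::unordered_map<EntityId, Stats>       stats;
};

// Acts only on entities that own the full Connection/InputStream/Stats tuple.
void PlayerConnectionSystemUpdate(World& world, float dt, float afkLimitSeconds) {
    for (auto& [id, connection] : world.connections) {
        auto input = world.inputs.find(id);
        auto stat  = world.stats.find(id);
        if (input == world.inputs.end() || stat == world.stats.end()) continue;

        // Assumption: the player must both send input and contribute.
        if (input->second.hadInputThisTick && stat->second.contributedThisTick) {
            connection.afkSeconds = 0.0f;
        } else {
            connection.afkSeconds += dt;
            if (connection.afkSeconds >= afkLimitSeconds) {
                connection.kicked = true;  // stands in for the kick message
            }
        }
    }
}
```

A bot with only a `Stats` component never appears in `world.connections`, so the loop never reaches it, which matches the point about incomplete tuples.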

Why can't I use the traditional object-oriented programming model directly?

The System update above raises a question: why can't we use the traditional object-oriented programming (OOP) component model? For example, why not override an Update function on the Connection component that keeps tracking and checking for AFK?

The answer is that the Connection component is used by several behaviors at once: the AFK check; the list of connected players who can receive broadcast network messages; storing state such as the player's name; and storing state such as the player's unlocked achievements. So, with the traditional OOP approach, which of these behaviors should go in the component's Update? And where should the rest go?

In traditional OOP, a class is both behavior and data, but the Connection component has no behavior; it is just state. Connection does not fit the OOP concept of an object at all: it means completely different things to different Systems at different times.

So what are the conceptual advantages of separating behavior from state?

Imagine the cherry tree in your front yard. Subjectively, that tree means something completely different to you, to the chair of your community association, to the gardener, to a bird, to the property tax assessor and to termites. Given the same state describing the tree, different observers see different behaviors. The tree is a subject that different observers treat differently.

By analogy, the player entity, or more precisely the Connection component, is a subject that different Systems treat differently. The player connection System discussed earlier treats the Connection component as the subject of AFK kicks; ConnectUtility treats it as the subject of broadcast player network messages; on the client, the user interface System treats it as the subject of a pop-up UI element that shows the player's name on the scoreboard.

Why arrange behavior this way? It turns out that separating behaviors by subjective perception makes it much easier to describe everything a tree does, and the same principle applies to game objects.

However, as we implemented this industrial-strength ECS architecture, we ran into new problems.

First, we struggled with the rules we had set: components cannot have functions, and Systems cannot have state. Surely a System should be allowed some state, right? Some legacy Systems imported from non-ECS architectures had member variables; what is the harm? Take InputSystem: you could store the player's input in InputSystem, and any other System that needs to know whether a button is pressed would only need a pointer to InputSystem.

Storing a single global variable in a component seemed silly, because nobody develops a new component type just to instantiate it once (translator's note: with several instances there would be several copies of the global variable, which is obviously unreasonable); that seemed self-evident. Components are usually accessed by iteration, as we saw earlier (translator's note: through the ComponentItr<> function template), and that access pattern looks odd for a component that has only one instance in the whole game.

In any case, we went this way for a while. We stored one-off state in Systems and provided a global accessor. The whole access path can be seen in Figure 16 (the key is the g_game->m_inputSystem line).

If one System can call another System, compile times suffer, because Systems need to include each other's headers. Suppose I refactor InputSystem, moving some functions around and modifying its header (translator's note: Client/System/Input/InputSystem.h). Then every System that depends on that header to read input state has to be recompiled, which is annoying. It also creates a lot of coupling, because Systems expose their internal behavior to each other.

As you can see at the bottom of Figure 16, there is a PostBuildPlayerCommand function, which is the main value InputSystem provides here. Suppose I want to add a new feature: CommandSystem needs to fill in some extra structures to send to the server, based on the player's input. Should the new functionality go into CommandSystem or into PostBuildPlayerCommand? Am I exposing internal implementation between Systems?

As the code base grows, choosing where to add new behavior becomes ambiguous. The CommandSystem behavior above fills in some structures. Why mix them together? Why put it here rather than somewhere else?

Anyway, we got by like this for a while, until the Killcam requirement came along.

To implement Killcam, we need two different, parallel game environments: one for rendering live gameplay and one for Killcam. I will show how they are implemented next.

The first step is straightforward: I add a second ECS World. Now there are two Worlds: liveGame for the normal game and replayGame for replays.

Replay works like this: the server sends down about 8 to 12 seconds of network game data, then the client flips Worlds and starts rendering the replayAdmin World to the player's screen. It then feeds the network game data to replayAdmin, pretending the data is really coming from the network. At this point, all the Systems, all the components and all the behaviors do not know that they are not being predicted (prediction is a synchronization technique discussed later); they think the client is running live on the network, just like normal gameplay.

Sounds cool? If anyone wants to learn more about replay technology, I suggest you attend Phil Orwig's talk tomorrow, in this same room, at 11 am.

In any case, what we learned is this. First, all call sites that relied on global System access suddenly broke (translator's note: Tim changes the subject abruptly here, which is hard to follow). In addition, there is no longer a single global EntityAdmin; there are now two. System A can no longer reach a global System B directly; it has to go through the shared EntityAdmin somehow, which is very roundabout.

After Killcam, we spent a lot of time reviewing the flaws in our programming model: odd access patterns, long compile times and, most dangerous of all, coupling between the internals of Systems. It looked like we had a big problem.

The final solution to these problems rests on one realization: there is nothing wrong with a component that has only a single instance! Based on this principle, we implemented singleton components.

These components belong to a single anonymous entity and can be accessed directly through EntityAdmin. We moved most of the state that lived in Systems into singletons.

I should mention that state accessed by only one System is actually very rare. We kept this habit when developing new Systems: whenever a System needs some state, we create a singleton to store it, and almost every time some other System turns out to need that state too. So this approach resolves the coupling of the previous architecture in advance.

Below is an example of a singleton input.

All the button state lives in a singleton; we simply moved it out of InputSystem. Any System that wants to know whether a button is pressed just grabs that singleton component and asks. After this change, some very cumbersome coupling problems disappeared, and we were once again following the ECS philosophy: Systems have no state; components have no behavior.
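A singleton component reached through the admin object, rather than through a global System pointer, could be sketched like this. `EntityAdmin` and `SingletonInput` are named in the talk; the `Singleton<T>()` accessor, the field names and the two free functions are hypothetical.

```cpp
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// State only. The fields are assumptions for illustration.
struct SingletonInput {
    bool  jumpPressed = false;
    float moveX = 0.0f;
};

class EntityAdmin {
public:
    // Returns the one instance of T owned by this admin, creating it on first use.
    template <typename T>
    T& Singleton() {
        std::shared_ptr<void>& slot = m_singletons[std::type_index(typeid(T))];
        if (!slot) slot = std::make_shared<T>();
        return *static_cast<T*>(slot.get());
    }
private:
    std::unordered_map<std::type_index, std::shared_ptr<void>> m_singletons;
};

// Upstream: the input System fills the singleton from the raw device state.
void InputSystemUpdate(EntityAdmin& admin, bool rawJumpKey) {
    admin.Singleton<SingletonInput>().jumpPressed = rawJumpKey;
}

// Downstream: any System can ask the singleton, without including InputSystem.
bool WantsToJump(EntityAdmin& admin) {
    return admin.Singleton<SingletonInput>().jumpPressed;
}
```

Because the state hangs off an `EntityAdmin` instance instead of a process-wide global, two worlds such as `liveGame` and `replayGame` each get their own copy, which is the property the Killcam requirement called for.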

Buttons are not behaviors. The movement System has a behavior that controls local player movement and uses this singleton to predict it. MovementStateSystem has a behavior that packs this button information and sends it to the server (translator's note: once again, the same button state is a different subject to different Systems).

It turns out that singletons are very common: 40% of the components in our entire game are singletons.

Once we had moved System state into singletons, we broke the shared System functions down into Utility functions that operate on those singletons. This still involves some coupling, which we will discuss in more detail.

After the change shown in Figure 22, InputSystem still exists (translator's note: although it cannot be seen in the figure). It is responsible for reading input from the operating system and filling in SingletonInput; Systems downstream can then read the same input and do what they need with it.

Things like button mapping can be implemented in the singleton, decoupled from CommandSystem.

We also moved the PostBuildPlayerCommand function into CommandSystem, where it belongs. Now we can guarantee that all modifications to the player's input command (PlayerCommand) happen here and only here. Player commands are important data structures that are synchronized over the network and used to simulate the game.

When we introduced singleton components, we did not realize that we were creating a development pattern that decouples code and reduces complexity. In this example, CommandSystem is the only place that can produce side effects on the player's input command (translator's note: a side effect means that, besides returning a value, a function call has an additional effect on its caller, such as modifying a global variable).

Any programmer can easily understand how player commands change, because within one System update only this code can change them. And if you want to add code that modifies player commands, it is equally clear that it can only go in this source file. All the ambiguity is gone.

Now let's turn to another issue: shared behavior.

Shared behavior arises when the same behavior is used by multiple Systems.

Sometimes two observers of the same subject are interested in the same behavior. Going back to the cherry tree example, both the chair of your community association and your gardener may want to know how many leaves the tree will drop in the spring.

They may act differently on the result (the chair may yell at you, while the gardener will just get back to work), but the behavior here is the same.

For example, a lot of code cares about hostility: are entity A and entity B hostile to each other? Hostility is determined by three optional components: filter bits, pet master and pet. Filter bits store the team index; pet master stores the unique keys of all the pets it owns; pet is typically used for things like Torbjörn's turret.

If neither of the two entities has filter bits, they are not hostile. Two doors, for example, are not hostile to each other, because their filter bits components carry no team index.

If the two entities are on the same team, they are naturally not hostile. That is easy to understand.

If they belong to teams that are always hostile, they also check the pet master components on themselves and on each other, to make sure each pet's relationship to the other side is correct. This solved a problem: if you are hostile to everyone, then when you build a turret, the turret attacks you immediately (translator's note: I do not understand why). Yes, that was a bug, and we fixed it. (Laughter)

If you want to check the hostility of a projectile in flight, you simply trace back to whoever fired it and check that entity. Very simple.

This example is implemented as a single function, CombatUtilityIsHostile, which takes two entities as arguments and returns true or false to indicate whether they are hostile. Numerous Systems call this function.
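A read-only utility in the spirit of these rules might look like the sketch below. Only the function name and the three component names come from the talk. The concrete rules are a guess at what the slide shows: `kAlwaysHostileTeam` is a hypothetical team index, an entity missing filter bits on either side is treated as non-hostile, and the projectile rule is omitted.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

using EntityId = std::uint32_t;

constexpr int kAlwaysHostileTeam = 0;  // hypothetical "hostile to everyone" team

struct FilterBits { int teamIndex; };
struct PetMaster  { std::vector<EntityId> pets; };  // keys of owned pets

// The components are optional, so an entity may carry none of them.
struct Entity {
    std::optional<FilterBits> filterBits;
    std::optional<PetMaster>  petMaster;
};
using World = std::unordered_map<EntityId, Entity>;

// True if `master` has a PetMaster component that lists `pet`.
bool OwnsPet(const World& world, EntityId master, EntityId pet) {
    auto it = world.find(master);
    if (it == world.end() || !it->second.petMaster) return false;
    for (EntityId owned : it->second.petMaster->pets)
        if (owned == pet) return true;
    return false;
}

// Pure function: reads a few components and mutates nothing.
bool CombatUtilityIsHostile(const World& world, EntityId a, EntityId b) {
    auto ia = world.find(a);
    auto ib = world.find(b);
    if (ia == world.end() || ib == world.end()) return false;
    const auto& teamA = ia->second.filterBits;
    const auto& teamB = ib->second.filterBits;
    if (!teamA || !teamB) return false;  // e.g. doors have no team
    if (teamA->teamIndex == kAlwaysHostileTeam ||
        teamB->teamIndex == kAlwaysHostileTeam) {
        // Hostile to everyone except yourself and your own pets.
        return a != b && !OwnsPet(world, a, b) && !OwnsPet(world, b, a);
    }
    return teamA->teamIndex != teamB->teamIndex;
}
```

Because the function takes the world by `const` reference, the compiler enforces the "read-only, no side effects" property that makes it safe to call from many Systems.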

Figure 25 shows the Systems that call this function. As you can see, it uses only three components, which is very few, and it only reads them. More importantly, they are pure data: these Systems never modify the data inside, they just read it.

Let's look at another example of a Utility function.

In this example, we applied different rules to sharing behavior through a Utility function.

If you want to call a Utility function from many places, it should depend on very few components and have few or no side effects. If your Utility function depends on many components, try to limit the number of call sites.

Our example here is CharacterMoveUtil, which moves the player on each tick of the game simulation. It has two call sites: one simulates the player's input commands on the server, and the other predicts the player's input on the client.

We keep replacing function calls between Systems with Utility functions and moving state from Systems into singleton components.

Replacing function calls between Systems with shared Utility functions does not magically eliminate complexity; you almost always have to make statement-level adjustments.

Just as you can hide side effects behind a publicly accessible System function, you can hide them behind a Utility function.

If you call such Utility functions from several places, you introduce serious side effects throughout the game loop. Because they happen behind a function call they are less obvious, but this is still terrible coupling.

If you take away only one thing from this talk, let it be this: when there is only one call site, the complexity of a behavior stays low, because all of its side effects are confined to the place where the call occurs.

Let's take a look at the techniques we use to reduce this type of coupling.

When you find that a behavior may have serious side effects but still has to be performed, ask yourself: does this code need to run right now?

A good singleton component can resolve coupling between Systems through deferment. Deferment stores the state the behavior needs and postpones the side effect to a better point in the current frame.

For example, there are many call sites in the code that want to spawn an impact effect.

They include hitscan bullets (translator's note: instant hits with no travel time); explosive projectiles with travel time; Zarya's particle beam, which looks like it cracks the wall and has to stay in contact with the target while firing; and sprays.

Creating an impact effect has big side effects, because you create a new entity on screen, which indirectly affects lifetime, threading, scene management and resource management.

An impact effect's lifetime only needs to begin before the frame is rendered, which means it does not have to be created in the middle of the game simulation at all those different call sites.

Figure 30 below is a small part of the code that creates impact effects. Based on the Transform (position, rotation and scale), the impact type and the material data, and after going through LOD, scene management and priority management, it finally produces the required effect.

This code ensures that persistent effects such as bullet holes and scorch marks do not stack up oddly. For example, you fire Tracer's guns at a wall and leave a cluster of pockmarks, and then Pharah fires a rocket that leaves a large scorch mark on top of them. You definitely want the pockmarks deleted, or it looks ugly, like the flicker caused by Z-fighting. I do not want to perform that deletion everywhere; it is best done in one place.

I need to modify this code, but there is a lot of it and there are many call sites, each of which has to be tested after every change. And there will be more and more heroes, each needing new effects. So I just copy and paste this function call everywhere. No big deal, it's only a function call, not a nightmare. (Laughter)

In fact, doing this creates side effects at every call site. Programmers have to spend more mental effort remembering how this code works. That is where code complexity lies, and it should definitely be avoided.

So we have the Contact singleton.

It contains an array of pending contact records, each with enough information to create the effect later in the frame. To spawn an effect, you just append a new record and fill in the data. Later in the frame, when the scene update runs and rendering is being prepared, ResolveContactSystem iterates over the array and spawns the effects according to the LOD rules and how the effects overlap each other. This way, even though the side effects are serious, they happen at only one call site per frame.
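The deferral pattern can be sketched as a queue that call sites append to and a single resolver that drains it. `ResolveContactSystem` is named in the talk; `SingletonContact`, `PendingContact`, `AddContact`, the per-frame `budget` parameter and the `spawnedOut` vector (standing in for actually creating effect entities) are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Enough data to create the effect later in the frame (fields are assumed).
struct PendingContact {
    float x;
    float y;
    float z;
    int   impactType;
};

// Singleton component: state only.
struct SingletonContact {
    std::vector<PendingContact> pending;
};

// What the many call sites do: record the request, with no other side effect.
void AddContact(SingletonContact& contact, const PendingContact& record) {
    contact.pending.push_back(record);
}

// The single place where effects are created, run once per frame.
// Spawns at most `budget` effects and leaves the rest queued for later frames.
// Returns how many effects were spawned this frame.
std::size_t ResolveContactSystemUpdate(SingletonContact& contact,
                                       std::size_t budget,
                                       std::vector<PendingContact>& spawnedOut) {
    const std::size_t count = std::min(budget, contact.pending.size());
    const auto first = contact.pending.begin();
    const auto last  = first + static_cast<std::ptrdiff_t>(count);
    spawnedOut.insert(spawnedOut.end(), first, last);  // stand-in for spawning
    contact.pending.erase(first, last);
    return count;
}
```

With a budget of 3 and five queued records, the first frame spawns three effects and the second frame the remaining two, which is the kind of spike smoothing described in the next paragraph. LOD rules and overlap removal would also live in this one function.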

Besides reducing complexity, deferment has many other advantages. Data and instructions stay local in the cache, which improves performance. You can set a performance budget for effects: for example, if 12 D.Vas are shooting a wall at the same time, they produce hundreds of effects, and you do not have to create them all immediately. You can create only the effects for the D.Va you control and spread the rest over subsequent frames, smoothing out performance spikes. There are many benefits like this. You can now also implement some complex logic: even if ResolveContactSystem needs multi-threaded cooperation to determine the orientation of a single particle effect, that is now easy to do. Deferment is really cool.

Utility functions, singletons and deferment are just a small part of the patterns we built up over three years of working with the ECS architecture. Beyond the rules that Systems have no state and components have no behavior, these techniques also dictate how we solve problems in Overwatch.

Following these constraints means you have to use a lot of tricks to solve problems. However, these techniques ultimately led to a sustainable, decoupled and concise code base. It constrains you and pushes you into a pit, but it is a "pit of success".

With that in mind, let's talk about one really hard problem and how ECS simplifies it.

As gameplay engineers, the most important problem we solved is network synchronization (netcode).

The first goal here is a responsive online action game. To be responsive, you must predict the player's actions. If every action has to wait for a round trip to the server, high responsiveness is impossible. Even so, the client cannot be trusted, because some jerks cheat. This truth about FPS games has not changed in 20 years.

The things that need to be responsive include movement, abilities, weapons with abilities, and hit registration.

All of these follow one principle: the player must see a response immediately after pressing a button, even when network latency is high.

As this slide demonstrates, the ping is already 250 ms, yet all my actions get immediate feedback. It looks perfect, with no delay at all.

However, a predicting client combined with server authority and network latency has a side effect: misprediction, or prediction failure. The main symptom of a misprediction is that you fail to do what you thought you had done.

Although the server has to correct your action, the price is not input delay. We use determinism to reduce the probability of misprediction; the specifics follow.

The setup is unchanged and the ping is still 250 milliseconds. I think I jumped, but the server disagrees: I am yanked back to where I was and frozen (freezing is one of the hero Mei's abilities). Here (in the video on the slide) you can even see the whole prediction process. At the start, prediction tries to move us into the air, and even puts the gorilla's (Winston's) jump ability on cooldown. This is the right behavior. We do not want predictions to be correct only nine times out of ten, so we want to respond as quickly as possible.

If you happen to play this game from Sri Lanka and get frozen by Mei, you may well see a misprediction.

Below I will first lay out some guidelines and then discuss how this technique uses ECS to reduce complexity.

This talk does not cover the details of general data replication, remote entity interpolation or backwards reconciliation.

We are standing squarely on the shoulders of giants, using techniques described elsewhere in the literature. The slides that follow assume everyone is familiar with them.

Determinism

Deterministic simulation relies on clock synchronization, a fixed update period and quantization. Both the server and the client run on this synchronized clock and these quantized values. Time is quantized into what we call command frames. Each command frame is fixed at 16 milliseconds, or 7 milliseconds in esports matches.

The simulation rate is fixed, so elapsed computer clock time has to be converted into a whole number of fixed command frames. We use a loop accumulator to advance the frame number.

In our ECS framework, any System that needs to predict, or that simulates based on player input or its results, uses UpdateFixed instead of Update. UpdateFixed is called once per fixed command frame.
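A loop accumulator of the kind mentioned above is commonly written as follows. This is a generic sketch of the fixed-timestep technique, not Overwatch's implementation; `CommandFrameClock`, `Advance` and the callback signature are hypothetical, and only the 16 millisecond frame length comes from the talk.

```cpp
#include <cstdint>

constexpr double kCommandFrameMs = 16.0;  // fixed command frame length

struct CommandFrameClock {
    double        accumulatorMs = 0.0;  // elapsed time not yet simulated
    std::uint32_t commandFrame  = 0;    // current command frame number
};

// Converts elapsed wall-clock time into whole command frames and calls
// `updateFixed` once per frame. Returns how many frames were stepped.
template <typename UpdateFixedFn>
int Advance(CommandFrameClock& clock, double elapsedMs, UpdateFixedFn&& updateFixed) {
    clock.accumulatorMs += elapsedMs;
    int steps = 0;
    while (clock.accumulatorMs >= kCommandFrameMs) {
        clock.accumulatorMs -= kCommandFrameMs;
        ++clock.commandFrame;
        updateFixed(clock.commandFrame);  // e.g. run every System's UpdateFixed
        ++steps;
    }
    return steps;
}
```

Feeding in 40 ms of elapsed time steps two command frames and carries the remaining 8 ms over to the next call, so the simulation always advances in identical 16 ms increments regardless of the render frame rate.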

Assuming the output stream is stable, the client will always be ahead of the server, ahead of the length of about half an RTT plus a cache frame. The RTT here is the PING value plus the logical processing time. In the example in Figure 39 above, our RTT is 160 milliseconds, half is 80 milliseconds, plus 1 frame, we are 16 milliseconds per frame, which is the advance amount of the client relative to the server.

The vertical lines in the figure represent the frames being processed. The client starts simulating and reports the input for frame 19 to the server. Some time later (basically half an RTT plus the buffer time), the server begins to simulate that frame. This is what I mean when I say the client is always ahead of the server.

The client wants to accept player input as early as possible, as close as possible to the present moment. If it had to wait for the server's reply before responding, the game would feel sluggish. You definitely want the buffer in Figure 39 to be as small as possible (the smaller the buffer, the closer the simulation is to the present). By the way, the game runs at 60 Hz, and the animation I am playing is at one hundredth of normal speed (so that the audience can follow it clearly).

The client's prediction System reads the current input and then simulates Tracer's movement. Here I use the joystick to represent Tracer's input being reported. The Tracer here (frame 14) is the movement state I am simulating at this moment. After a full RTT plus the buffer time, the authoritative Tracer comes back from the server to the client (Annotation: this is best followed with the video of the talk; a static article cannot fully convey it). What comes back is a server-verified snapshot of the movement state. The side effect of the server being the simulation authority is that verification takes an additional half RTT to get back to the client.

So why does the client use a ring buffer to record its movement history? It makes comparison with the results returned by the server easy. If the result matches the server's simulation, the client happily continues with the next input. If the results differ, that is a misprediction, and the client needs to reconcile.

The simple option would be to overwrite the client's state directly with the result from the server, but that result is already "old" relative to the current input, because the server's reply usually describes a moment a few hundred milliseconds in the past.

Besides the ring buffer above, we keep another ring buffer that stores the player's inputs. Because the movement code is deterministic, given a starting movement state and the player's inputs, the process is easy to reproduce. So once a packet comes back from the server and the prediction turns out to be wrong, we replay all of your inputs until we catch up with the present. As shown at frame 17 in Figure 41 below, the client thinks Tracer is running, while the server says you were stunned, perhaps by McCree's Flashbang.

What happens next is that, when the client receives the packet describing the character's state, we restore the movement state to the last one verified by the server and recompute all subsequent inputs until we catch up with the current time (frame 25).
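The rollback-and-replay step can be sketched as below. This is a toy model under stated assumptions: movement is one-dimensional, a stun simply blocks movement, and `PredictedMover`, `simulate`, and `RING_SIZE` are hypothetical names rather than anything from the game's code.

```python
RING_SIZE = 64  # assumed capacity; must exceed the number of frames the client runs ahead


def simulate(position, move, stunned):
    """Deterministic toy movement step: a stunned character does not move."""
    return position if stunned else position + move


class PredictedMover:
    def __init__(self):
        self.frame = 0
        self.position = 0
        self.stunned_until = 0            # first frame on which the stun is over
        self.inputs = [0] * RING_SIZE     # ring buffer of player inputs per frame
        self.states = [0] * RING_SIZE     # ring buffer of predicted positions per frame

    def predict(self, move):
        """Simulate the next command frame locally and remember input and result."""
        self.frame += 1
        slot = self.frame % RING_SIZE
        self.inputs[slot] = move
        self.position = simulate(self.position, move, self.frame < self.stunned_until)
        self.states[slot] = self.position

    def on_server_snapshot(self, frame, position, stunned_until):
        """Compare with history; on mismatch, restore and replay. Returns True on misprediction."""
        if self.states[frame % RING_SIZE] == position and stunned_until == self.stunned_until:
            return False
        # Restore the last server-verified state, then replay every later input.
        self.position = position
        self.stunned_until = stunned_until
        self.states[frame % RING_SIZE] = position
        for f in range(frame + 1, self.frame + 1):
            stunned = f < self.stunned_until
            self.position = simulate(self.position, self.inputs[f % RING_SIZE], stunned)
            self.states[f % RING_SIZE] = self.position
        return True
```

Because `simulate` is deterministic, replaying the stored inputs from the verified state reproduces exactly what the server will compute for the frames it has not reached yet.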

Now the client has reached frame 27 (above), and we receive frame 17 from the server. Once the state of the client's Tracer in Figure 41 has been corrected to "stunned" and we are resynchronized, it is as if we had fallen back to a lockstep algorithm.

We need to know how long we stay stunned.

After frame 33 in the figure below, the client knows it is no longer stunned, and the server is simulating the same situation. There are no more strange catch-up problems. Once back in this movement state, the player's current input can be sent again.

However, the client's network does not guarantee this stability, and packets get lost. Input in our game is sent over a custom reliable UDP layer. Input packets from the client often fail to reach the server; that is, they are lost. The server tries to keep a small buffer of input that has not been simulated yet, but keeps it as small as possible so that the game stays responsive.

Once this buffer is empty, the server can only "guess" based on your last input. When the real input arrives, it tries to reconcile and makes sure that none of your actions are lost, but mispredictions can still occur.
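A minimal sketch of the starved-buffer behavior, assuming the server pulls exactly one input per command frame. `ServerInputBuffer` and its fields are illustrative names, and the later reconciliation step is left out.

```python
from collections import deque


class ServerInputBuffer:
    """Holds inputs that have arrived but have not been simulated yet (hypothetical)."""

    def __init__(self):
        self.pending = deque()
        self.last_input = None
        self.guessed_frames = 0   # how many frames were simulated from a guess

    def receive(self, command):
        self.pending.append(command)

    def next_input(self):
        """Return the input for the next command frame, duplicating the last one if starved."""
        if self.pending:
            self.last_input = self.pending.popleft()
        else:
            self.guessed_frames += 1
        return self.last_input
```

A non-zero `guessed_frames` is the kind of signal the server could use to tell the client that its input is arriving too late.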

Now it is time to witness the miracle.

As you can see in the figure above, some packets from the client have been lost. When the server notices, it duplicates the previous input as a prediction and hopes the guess is right, and it sends a message to the client: "Hey buddy, I am missing packets. Something is not quite right." What happens next is even stranger: the client dilates time and simulates faster than the agreed frame rate.

In this example, the agreed frame time is 16 milliseconds, and the client pretends that it is now 15.2 milliseconds, so that it runs further ahead. As a result, inputs arrive at the server faster and faster. The buffer on the server grows as well, which lets it get through the packet loss without running dry.
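To see why the server's buffer grows, here is a rough sketch that counts frames on both sides. It assumes no packet loss and ignores latency, and it works in whole microseconds to avoid rounding issues; the function and constant names are made up, while 16 ms and 15.2 ms are the values from the example.

```python
NOMINAL_FRAME_US = 16_000   # agreed command frame length, in microseconds
DILATED_FRAME_US = 15_200   # what the client pretends the frame length is


def frames_simulated(wall_us, frame_us):
    """Whole command frames completed in a given span of wall-clock time."""
    return wall_us // frame_us


def server_buffer_growth(wall_us, client_frame_us=DILATED_FRAME_US,
                         server_frame_us=NOMINAL_FRAME_US):
    """Inputs produced by the dilated client minus inputs consumed by the server."""
    produced = frames_simulated(wall_us, client_frame_us)
    consumed = frames_simulated(wall_us, server_frame_us)
    return produced - consumed
```

Under these assumptions the client gains one buffered input roughly every 304 milliseconds, since 20 dilated frames fit in the time of 19 nominal ones.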

This technique works well, especially on the Internet, where jitter is common and packet loss and PING are unstable. Even if you played this game from the International Space Station, it would work. So I think this scheme is really awesome.

For those of you taking notes: the message has arrived here, and I will now zoom in on the time scale. Notice that we really are ticking faster. You can see that the slope on the right side of the figure is getting flatter: input is reported more quickly than before. At the same time, the buffer on the server grows and can tolerate more packet loss. If packets are lost, there is a chance to make up for them while the buffer drains.

Once the server sees that your network is healthy again, it sends you a message: "Hey, everything is fine now." The client then does the opposite: it shrinks the time scale and sends packets at a slower rate. At the same time, the server reduces the size of its buffer.

This process runs continuously. The goal is to avoid overshooting and to minimize mispredictions through input redundancy.

As I mentioned earlier, once the server is starved it duplicates the last input, right? Once the client catches up, the server stops duplicating input, but inputs still risk being missed because of packet loss. The solution is for the client to maintain a sliding window of inputs. This technique has been around since QuakeWorld.

Instead of sending only the input for the current frame 19, we send all inputs from the last acknowledged movement state up to the present. In the example above, the last acknowledgment from the server was for frame 4, and we have just simulated frame 19. We pack the input of every frame in between into a single packet. Players usually perform at most one action per 1/60th of a second, so the data is small once compressed. Typically, if you are holding the "forward" button now, you were already moving forward in the previous frame.

The result is that, even if a packet is lost, the next packet that arrives still carries all of the inputs. This fills every hole left by packet loss before the server actually simulates those frames. So this feedback loop, the buffer that can grow, and the sliding window together ensure that you lose nothing to packet loss, and even when packets are lost, no mispredictions result.
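The sliding window can be sketched as a pair of small classes. Everything here is hypothetical (`InputWindowSender`, `InputWindowReceiver`, packets modeled as plain dictionaries keyed by frame number), and compression is left out.

```python
class InputWindowSender:
    """Client side: keeps every input since the last frame the server acknowledged."""

    def __init__(self):
        self.unacked = {}   # frame number -> input

    def record(self, frame, command):
        self.unacked[frame] = command

    def build_packet(self):
        """Each packet carries the whole window, not just the newest input."""
        return dict(self.unacked)

    def on_ack(self, acked_frame):
        """Drop inputs the server has confirmed it already simulated."""
        self.unacked = {f: c for f, c in self.unacked.items() if f > acked_frame}


class InputWindowReceiver:
    """Server side: merges windows, so a later packet fills holes left by lost ones."""

    def __init__(self):
        self.inputs = {}

    def on_packet(self, packet):
        newly_filled = 0
        for frame, command in packet.items():
            if frame not in self.inputs:
                self.inputs[frame] = command
                newly_filled += 1
        return newly_filled
```

The redundancy costs bandwidth, which is why the talk stresses that consecutive inputs are usually identical and compress well.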

Next, I will show you the animation again. This time it runs at double speed, which is 1/50 of normal speed.

All the sources of instability are here: PING jitter, packet loss, client time dilation, the input window filling in all the holes, mispredictions, and server corrections. We play them all together for you to see.

I do not want to go into too much detail on the next topic, because it is the subject of Dan Reed's talk (Annotation: already translated). It is part of the same session track, and I strongly recommend that you attend; it is really great. It is in this same room, starting right after I finish.

All abilities are written in Statescript, Blizzard's own declarative scripting language. One great advantage of the scripting system is that it can scrub back and forth through time. Abilities are predicted on the client and then validated by the server; just like movement in the previous example, we can roll back and replay all the inputs. Abilities use the same rollback and roll-forward principle as movement: first return to the last verified snapshot, then replay inputs up to the current time.

You will remember the example of the server correction caused by the stun; abilities are handled the same way. Both the client and the server run a deterministic simulation of the ability. The client is ahead of the server, so the client simulates first and the server follows later. The client handles a misprediction by rolling back to the server snapshot and then rolling forward, just like the animation in the slides. What is shown here is Reaper's Wraith Form. The boxes in Figure 45 (Annotation: Statescript States) represent Wraith Form, and with these boxes I can confidently play cool effects and animations.

These boxes close when Wraith Form ends. The small animations show the States closing within the same frame. Then, shortly after Wraith Form, we get a message from the server: "Hey, about the Wraith Form you predicted: roll back, so these States are open again, then re-simulate all the inputs, and these States close again." This is basically the rollback and roll-forward that happens every time the server sends an update.

Being able to predict movement is cool, and it means we can predict every ability, and we do. We can do the same for weapons and other modules.

Now let's discuss prediction and confirmation of hit registration.

ECS makes this very convenient. Remember, if an entity has the tuple of components that a behavior requires, it becomes a subject of that behavior. If your entity is hostile (remember the hostility check we talked about earlier) and has a ModifyHealthQueue component, it can be hit by another player and is subject to hit registration.

Of these two components, one is used for the hostility check and the other is ModifyHealthQueue. ModifyHealthQueue records all the damage and healing applied to you on the server. Like the Contact singleton, it is deferred, because there are multiple call sites and the side effects are large. We defer the calculation because we do not want to spawn a pile of effects in the middle of the projectile simulation.

By the way, damage is not predicted on the client at all, because clients cannot be trusted: they may be cheaters.

Hit registration, however, is handled on the client. If you have a MovementState component and are a remote object, not controlled by the local player, the movement System repositions you by interpolation. Standard interpolation takes place between the last two MovementStates received, a technique that has been around since the Quake era.
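Interpolating between the last two received states can be sketched as plain linear interpolation. The `MovementState` fields below (a frame number and a 2D position) are assumptions for the example; the real component certainly stores more.

```python
from dataclasses import dataclass


@dataclass
class MovementState:
    frame: int
    x: float
    y: float


def interpolate(older, newer, render_frame):
    """Linearly blend two received states; the blend factor is clamped to [0, 1]."""
    span = newer.frame - older.frame
    t = 0.0 if span == 0 else (render_frame - older.frame) / span
    t = max(0.0, min(1.0, t))
    return (older.x + (newer.x - older.x) * t,
            older.y + (newer.y - older.y) * t)
```

Clamping means this sketch never extrapolates past the newest state; going beyond it is the separate extrapolation case discussed later in the talk.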

The System does not care whether you are a moving platform, a turret, a door, or Pharah; you only need a MovementState component. The MovementState component is also responsible for storing the ring buffer. Remember the ring buffer? It was used earlier to store Tracer's positions.

With the MovementState component, the server rewinds you to the frame the attacker reported before it computes the hit. This is backwards reconciliation. All of this is orthogonal to the ModifyHealthQueue component, which decides whether you can take damage. We also need to rewind doors, platforms, and the payload, since they may block the bullet. Generally speaking, if you are hostile and have a MovementState component, you will be rewound and may take damage.

Being rewound is a behavior driven by a set of Utility functions operating on MovementState; taking damage is another behavior, one that is deferred through its component. These two behaviors are independent, each operating on its own slice of components.

The shooting process is a bit abstract, so I will break it down here.

The boxes in Figure 47 are the bounding volumes of each entity. The logical bounds are basically the union of the entity's recent snapshots, so the logical bounds around Genji represent the character's whole range of motion over the past half second. If I shoot in the direction of the crosshair, I first intersect these bounds before rewinding the character, because depending on my PING the character could be anywhere within the bounds.

In this example, if I shoot in this direction, I only need to rewind Ana, because the bullet intersects only her bounds. There is no need to rewind Reinhardt and his shield, the payload, or the door behind them.
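The broad-phase test described here can be sketched in 2D with axis-aligned boxes. This is only an illustration: boxes are `(min_x, min_y, max_x, max_y)` tuples, the ray test is the standard slab method, and the function names and the sample data are invented for the example.

```python
def union_bounds(history):
    """Smallest box containing every box in an entity's recent snapshot history."""
    return (min(b[0] for b in history), min(b[1] for b in history),
            max(b[2] for b in history), max(b[3] for b in history))


def ray_hits_box(origin, direction, box):
    """Slab test: does a ray starting at origin along direction touch the box?"""
    t_near, t_far = 0.0, float("inf")
    for axis in range(2):
        o, d = origin[axis], direction[axis]
        lo, hi = box[axis], box[axis + 2]
        if d == 0:
            if o < lo or o > hi:
                return False      # parallel to this slab and outside it
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def entities_to_rewind(origin, direction, histories):
    """Only entities whose history bounds the shot crosses need to be rewound."""
    return [name for name, history in histories.items()
            if ray_hits_box(origin, direction, union_bounds(history))]
```

Entities the ray cannot possibly have touched in the last half second are skipped entirely, which keeps the expensive rewind limited to a few candidates.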

Shooting is like movement in that it can be mispredicted.

The green figure here is the client's view of Reaper, and the yellow one is the server's view. The small green dots are where the client thinks its bullets hit, and the thin green lines are the paths of the bullets. When the server verifies, the blue-violet hemispheres show the actual hit positions.

This is an entirely contrived example, because the deterministic simulation is very reliable. To reproduce a misprediction while shooting, I set my packet loss rate to 60% and kept shooting at him for 20 minutes before I managed to reproduce it (laughter).

I have to mention that the simulation is this precise thanks to our colleagues on the QA team. They never take "no" for an answer. Because other games on the market do not predict hit registration to this level of accuracy, our QA friends did not believe me at all and paid no attention to what I said. They just kept filing bugs, more and more of them, and every time we checked whether the bug was real, it was. I want to express my deep gratitude to them; their work is what lets us ship such a great product.

If your PING value is particularly high, hit prediction is turned off.

Once the PING value exceeds 220 milliseconds, we defer some hit effects and no longer predict them; we wait for confirmation from the server. The reason is that the client is extrapolating and we do not want to rewind the target that far. I do not want victims to feel that they ran desperately behind a wall for cover, only to be pulled back and take damage anyway. So we added a layer of protection that limits the rewind once extrapolation has gone on for a while. The video below demonstrates this (Annotation: watching the video is highly recommended).

When PING is 0, the impact of the shot is predicted, but the hit marker and the health bar are not; they are rendered only after the server's reply arrives.

When PING reaches 300 milliseconds, the impact is not predicted, because the target is being extrapolated and is not actually where he appears to be. Here we use dead reckoning (DR); the result is close, but he is really not there. This happens when Reaper strafes back and forth, which extrapolation cannot predict correctly. Here we do not spare your feelings: your network is simply too bad.

In this last video the effect is especially noticeable, with PING at 1 second. Reaper moves in the same way and is extrapolated. By the way, even with PING as slow as 1 second, every action on the client is still predicted and responds immediately, although most of the predictions are wrong. Really, I should have used my ultimate (High Noon); then I could definitely have killed him.

Here is another example of a misprediction. The PING value is still not great, 150 milliseconds. Under these conditions, whenever movement is mispredicted, the hit is mispredicted too. Let's watch it in slow motion.

Look: blood spray is shown, but there is no health bar and no hit marker, so the predicted impact was wrong. The server rejected it as not a legitimate hit. The reason the impact prediction failed is that Mei's Ice Wall came up. You "thought" you were still standing on the ground when you fired, but by the time the server simulated it, the Ice Wall had already lifted you into the air, and that caused the misprediction.

When we fixed these subtle hit-prediction errors, we found that most of them disappeared once the client and server agreed on position, so we spent a lot of time getting positions aligned.

The next example is a misprediction that involves gameplay as well as movement.

The PING value is still 150 milliseconds. You want to shoot this Reaper, but he is in Wraith Form. When the arrow hits him, the client predicts blood spray, but there is no hit marker and no health bar. We did not hit him at all, because he had already entered Wraith Form.

In cases like this the attacker is favored most of the time, unless the victim does something to mitigate the attack. In this example, Reaper's Wraith Form gives him 3 seconds of invulnerability. Either way, we did not really hit Reaper.

Let me put it philosophically: you are Reaper and you have entered Wraith Form, yet the server could well play all the effects and let you die, because you could not enter that state quickly enough.

ECS simplifies network synchronization. The Systems used by the netcode know which entities are players, and it is very straightforward: basically, if an entity is controlled by something with a Connection component, it is a player.

The Systems also know which targets need to be rewound to the frame of the attacker's shot: any entity with a MovementState component is rewound.

The behavior inherent in this relationship between entities and components is that MovementState can be scrubbed along the timeline.

Figure 52 above is an overview of all the Systems and components, of which only a few are involved in network synchronization, and this is the most complicated problem we know of. Two of the Systems, NetworkEvent and NetworkMessage, are the core of the netcode module and take part in typical network behavior such as receiving input and sending output.

There are a few other Systems, countable on one hand: InterpolateMovement, Weapons, State, and MovementState. I would especially like to delete MovementState, because I do not like it. So in the netcode module only about 3 Systems touch gameplay-related components, which are the ones highlighted in the list on the right, and the netcode only reads those components. What actually modifies data is something like ModifyHealthQueue, because damage done to enemies is real.

Looking back after so many years of using ECS, what have I learned?

I somewhat wish that our Systems and Utilities had followed the canonical ECS approach of operating on tuples. What we do is a bit unusual: we iterate over one component and reach all of its sibling components through it. For truly complex behavior, the tuple model forces you to know exactly what you access. If a behavior requires a tuple of 40 components, your system design is probably too complicated and the tuples are in conflict.

Another cool side effect of tuples is that you know in advance which state each System can access. In our prototype engine, which used tuples, we could tell that 2 or 3 Systems operated on different sets of components, because their purpose is known from the tuple definition. This design scales very easily. As in the earlier piano-like animation, you can see several Systems light up at the same time, simply because the sets of components they operate on are different.

Since it is known in advance which components each System reads and writes, System updates can run gameplay code on multiple threads. For example, the Transform component is very popular, but only a few Systems actually modify it; most only read it. So when you define a tuple, you can mark a component as "read-only", which means that even if several Systems operate on that component, they only read it and can be processed in parallel.
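One way to picture how read-only markings enable parallelism is a conflict check over declared reads and writes. This is a sketch under assumptions, not the scheduler described in the talk: `SystemSpec`, the greedy batching, and the System and component names in the usage note are all invented, and any required ordering between Systems is ignored.

```python
from dataclasses import dataclass


@dataclass
class SystemSpec:
    name: str
    reads: frozenset = frozenset()    # components accessed read-only
    writes: frozenset = frozenset()   # components that may be modified


def conflicts(a, b):
    """Two Systems conflict if either one writes a component the other touches."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & (a.reads | a.writes))


def schedule_batches(systems):
    """Greedily group Systems so that no two Systems in a batch conflict."""
    batches = []
    for system in systems:
        for batch in batches:
            if not any(conflicts(system, other) for other in batch):
                batch.append(system)
                break
        else:
            batches.append([system])
    return batches
```

For instance, two Systems that only read `Transform` land in the same batch and could run on separate threads, while a System that writes `Transform` is pushed into a batch of its own.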

Entity lifecycle management takes some care, especially for entities created in the middle of a frame. In the early days we deferred both creation and destruction: when you said "Hey, I want to create an entity", it was actually done at the end of that frame. As it turned out, deferring destruction is no problem, but deferring creation has a lot of side effects. In particular, when you request a new entity in System A and want to use it in System B, deferred creation means you have to wait a frame before you can use it.

This is a bit uncomfortable, and it also adds a lot of internal complexity (Annotation: here, complexity means hidden rules that you have to keep in your head, like hardcoded values). We wanted to change this part of the code so that entities can be created in the middle of a frame and used right away.

Making these changes after the game had shipped was terrifying. The patch went out in version 1.2 or 1.3, and I stayed up all night the night it went live.

We spent about a year and a half developing ECS usage rules like those in the canonical examples earlier, and we had to rework some existing code to fit the new architecture. The rules include: components have no functions; Systems have no state; shared code goes into Utils; complex side effects are deferred through queues in components, especially singleton components; and Systems cannot call functions of other Systems, not even ones with a similar name. This approach was shared by Blizzard a few years ago.

There is still a large amount of code that does not follow these rules, and unsurprisingly it is the main source of complexity and maintenance work. You can see this in the number of code changes or the number of bugs.

So if you have legacy code that cannot fit the ECS rules, do not force it in. Keep the subsystems clean; there is no need to create proxy components to wrap them.

Different system designs solve problems in different ways.

ECS is a tool for integrating a large number of Systems; design principles that do not suit a system should not be forced onto it.

ECS was designed to integrate and decouple a large number of modules, and many Systems and their components are just the tip of an iceberg.

Iceberg modules expose only a small surface to the other ECS Systems, while internally they hold a large amount of state and data structures that are proxied or inaccessible from the ECS layer.

These icebergs are quite visible in the threading model. Most ECS work, such as updating the Systems, happens on the main thread (top of Figure 58). We also use a lot of multithreading in the fork-and-join style. In this example, a character fires a lot of projectiles; the script System says we need to spawn some projectiles, and we fork several worker threads to do the work. Also shown here is ResolvedContactSystem, which wants to spawn some impact effects and likewise uses a few worker threads.

The behind-the-scenes work of projectile simulation is isolated and invisible to the ECS layer above, which is good.

Another cool example is AIPetDataSystem, which fits the fork-and-join pattern well. At the ECS level there is only a little coupling, something like "Hey, this is a destructible door, so you may need to rebuild paths in these areas." Behind the scenes there is a lot of work, such as gathering all the triangles, rendering, and clipping. None of that is related to ECS, and we should keep in mind that ECS does not belong in those problem domains.

The video here shows PathValidationSystem. The paths are everything colored blue, the surfaces that the AI can walk on. In fact, paths are used not only for AI but also by many hero abilities, so the data has to be synchronized between the server and the client.

Zenyatta in the video destroys these objects, and you will see the broken objects fall below the surface. Then the door over there opens and we stitch those surfaces together. PathValidationSystem only needs to say "Hey, the triangles have changed", and the iceberg uses all its data to rebuild the paths.

Now I am ready to wrap up today's talk.

ECS is the glue of Overwatch. It is cool because it helps you integrate a large number of disparate systems with minimal coupling. If you plan to define your rules with ECS, or really with any architecture, only a few programmers should need to touch the physics code, the scripting engine, or the audio library, but everyone should be able to use the glue code to integrate the systems.

If you enforce these constraints, you will succeed.

As it turns out, network synchronization is really complicated, so it must be decoupled from the rest of the engine as much as possible. ECS is a good way to do that.

Finally, before taking questions, I would like to thank our team members, especially the gameplay engineers, who spent three years creating such a wonderful work of art. We established the principles together, the architecture kept evolving, and the results are there for all to see.


Original link: http://gad.qq.com/article/detail/7212152
