Jump to content

Server Profiling test


Recommended Posts

Hi guys and gals,

it's time for another profiling run of the server, so we can eliminate the hot spots and get the game running a bit better.

With the new server hardware some of us have noticed the game doesn't seem to be running quite as well as it did, although this may be our collective imaginations. Whatever the reason, the current server hardware should be more than capable of handling everything smoothly.

I'm sure we can improve the code a fair bit, so I've been running the profiler on the dev server, which has highlighted some of the 'background' inefficiencies which have been re-written now. This was mainly dealing with MOB vs MOB interactions, which could produce quite a high server load at times.

I'll bet though that if we run the profiler on the main server with everyone online it'll yield some pretty interesting data on where the bottlenecks are.

So, some time tomorrow, probably evening US time, I'll do a profiling run. This shouldn't affect the experience too much, but unfortunately the way visual studio is rigged up we need to stop the Net7 server to see the results once the run (lasting about 30 mins or so) has collected enough data. It'll be restarted straight away.

Then the 'fun' begins seeing where the server is getting bogged down, and hopefully re-coding to try and do something about it.

Yesterday I came online and from the in-game profile data it said there had been a sector cycle in Glenn that took 18 seconds. The normal execution time for a sector cycle is meant to be less than 0.1 second.

So, I'd like to get a raid group going to the FB, people prospecting in groups and misc combat and missions - all in all normal business for the server.

Hopefully we can get this server running a lot more smoothly. When we start running in release mode, we'll see another big performance leap, but I would like to see the debug build running smoothly first.

cheers,

TB
  • Upvote 1
Link to comment
Share on other sites

Static have been having a horrible time in the Fish Bowl recently. Some of us have found disabling sound in the game config seems to help a bit, but lots of lag and disconnections are making us less enthusiastic to risk our activation devices when we can't guarantee raid completion.

But, I'll be in class tomorrow until 8pm central so I'm not sure if I can organize any raids during your run, but I hope it goes well. Keep up the good work, Tien. Edited by Alyseria
Link to comment
Share on other sites

[quote name='Alyseria' timestamp='1301986523' post='38265']
Static have been having a horrible time in the Fish Bowl recently. Some of us have found disabling sound in the game config seems to help a bit, but lots of lag and disconnections are making us less enthusiastic to risk our activation devices when we can't guarantee raid completion.
[/quote]

Alyseria,

We have been running several raids a day with no issue in the FB. The buffer fix the TB put in did seem to help a lot. One thing to check: In the FB you really must have the packet optimization and prototype reorder enabled on the launcher. The catch is that one of the last several updates unchecked them so many people did not notice and plaid right through. Checking those should improve your FB experience greatly.

Lannister
Link to comment
Share on other sites

In the past 2 days I have personally ran at least 8 fishbowls with a 6 man raid successfully in 30-40mins start to finish, no CTDs and very very minimal lag (and believe me we we're pounding the mobs with tons of AOEs all 4 PWs were firing Shield Sap and Shield Nova constantly, the only place we so any noticeable lag (although not game ruining) was on Nesshix Spawn who formerly use to be the CTD King!

Group was 6 people and maybe had 1-3 pickers outside the group in the bowl with us. As Lannister mentioned its important to have Packet Optimization checked (not sure Reorder plays any role) but I always have both checked anyhow. And I honestly the FB can be done with a 4-6man group so perhaps running with more than 1 group in the zone could be the cause for lag as well? Most I've ever had in is maybe 2 groups (2nd group purely low level looters if any as they were not needed to kill the Raid).
  • Upvote 1
  • Downvote 1
Link to comment
Share on other sites

I'm really looking forward to seeing the profiler report on this! Call me a nerd, but nothing's more fun than looking at a bunch of numbers and charts about method execution time! B)
Link to comment
Share on other sites

[quote name='C Del' timestamp='1302025548' post='38292']
I'm really looking forward to seeing the profiler report on this! Call me a nerd, but nothing's more fun than looking at a bunch of numbers and charts about method execution time! B)
[/quote]
NERD! ;) :P
Link to comment
Share on other sites

Nerd! I am a geek too but wouldn't go that far!

I've had to SQL profiling for my job .... Boring .... Until the time comes to tell the developer it's there code that sucks ass ;) and my SQL server runs sweet as anything B) *smug*
Link to comment
Share on other sites

[quote name='C Del' timestamp='1302025548' post='38292']
I'm really looking forward to seeing the profiler report on this! Call me a nerd, but nothing's more fun than looking at a bunch of numbers and charts about method execution time! B)
[/quote]

Right on!

Would anyone else be interested in seeing the results, with a brief explanation of what the offending method does, for the top 10 or so bottlenecks?
Link to comment
Share on other sites

Have someone analyze and manufacture (alot during this too) lately it seems this is been hit with severe lag and pushing that Manufacture button 200+ times for ammo/comps is a DRAG and then some. I think I click manufacture and it takes anywhere from 4-10secs minimum before I can push again for the next.


Some other (not sure if belongs here but possibly could boost performance).

Shut off every user being able to see when everyone logs on. In live you would only see those messages if that person was on your friends list. I know sometimes its a help, but mostly its more an annoyance to see people constantly logging on/off as they switch between alts, crash, etc. This just may be me but it pushes what little window I have for information off the screen and fills up the logs rather quickly with useless info.

The other one would be the info chat window showing the components in hold flash upon every push of the manufacture button, if not mistaken (correct me if wrong again) but wasnt it just an error message for an item that was not present/missing, thus preventing you from pushing the manufacture button? That could be another hint as to why that takes forever too the fact it checks all that and sends a message? I assume the checks are necessary but is it really helpful to keep saying over and over 200+ times in the chat window that X items are in hold (again see comps/ammo/items requiring 5+ component builds).
Link to comment
Share on other sites

[quote name='RojoMuerte' timestamp='1302060278' post='38335']
Have someone analyze and manufacture (alot during this too) lately it seems this is been hit with severe lag and pushing that Manufacture button 200+ times for ammo/comps is a DRAG and then some. I think I click manufacture and it takes anywhere from 4-10secs minimum before I can push again for the next.
[/quote]

according to the source code this isn't lag - it's coded to take a certain amount of time to finish manufacture etc.

Maybe there could be some sort of progress sound triggered or something like that?
Link to comment
Share on other sites

[quote name='Tienbau' timestamp='1302061050' post='38336']
according to the source code this isn't lag - it's coded to take a certain amount of time to finish manufacture etc.

Maybe there could be some sort of progress sound triggered or something like that?
[/quote]


Can you verify what time its suppose to take? Since server move its taking almost 2X as long as before (for more than just me, several in game are experiencing severe delay/lag while at the manufacturing terminal. Assuming you say source code this cannot be modified? If it can set that mess to 1-2secs LOL.

Not sure about the sound (so routine now) but isn't there a noise of some sorts made when the screen flashes "Rojo has manufactured X item at Y %" ? I'd say gather yourself some components to build some ammo say 20-30 stacks... Ooh even better start with making the ammo comps for said stacks :) Then work on the ammo. Look at the delays between each manufacture button push (it varies noticeably between each push) and as mentioned it seems to be taking longer than normal (even before was a bit long if you ask me, and only worsened as of late).
Link to comment
Share on other sites

ok I'll have a look at it. The manu terminals make heavy use of STL, which we know really slows down the server, so it could be they need a recode. With any luck the profiling will show up problems. OTOH if the problem has suddenly gotten worse with the new server it points to a different cause.

Speaking of which - it wasn't possible to do as currently the profiling is broken on the main server. I was able to make quite a few speed optimisations for all the 'background' processes from the dev server results which may make things work better after the next restart, but let's wait and see eh.
Link to comment
Share on other sites

[quote name='Tienbau' timestamp='1302059175' post='38334']
Right on!

Would anyone else be interested in seeing the results, with a brief explanation of what the offending method does, for the top 10 or so bottlenecks?
[/quote]

While I'm not NERD enough to want to look at all the raw data and process it I am GEEK enough to be interested in what the results of the processing look like and what it all means :unsure:
Link to comment
Share on other sites

[quote name='Tienbau' timestamp='1302061050' post='38336']
according to the source code this isn't lag - it's coded to take a certain amount of time to finish manufacture etc.

Maybe there could be some sort of progress sound triggered or something like that?
[/quote]

In live here is how I remember the ammo manufacturing process as a PW, time wise, and supported by some Fraps Video I took back in the day.

You click the button, and there is the sound effect of the build which was/is comprised of the audio effect repeating series of sound pips,like: !!-!-!-! with a final 'bleep' pip about 1 sec later that was accompanied by the screen message in green about the build quality. This took about 4 seconds total time.
There has always been a progress sound associated with it.

As it is now this seems to fire like normal, however the build delivery and screen messaging are delayed for an additional 4-8 seconds, minimum.
There is definitely a difference that has occurred since the server move. Prior to the move, in regards to Live, it still took one or 2 extra seconds, its something I noticed nearly right out of the gate upon returning, but it has gotten more severe/lengthly.

Also, in talking to some of the major builders regarding this, they all reported new delays, of essentially the same amount of time.

[EDIT]::
I was unable to login for a bit, server down, but I just got on and went to test the ammo building issue, it seemed resolved!! Arx Spartoi if it matters, but it looked good. Will update as required.


Thanks!
-Kii Edited by Kii
Link to comment
Share on other sites

The results of the first profiling run are in.

To begin with, the easiest way to go through optimisations is by looking at execution times on a program branch basis - so this is how much total time is spent at the head of a branch. Optimisations are easiest to make because usually it just needs a small conceptual change to improve code speed:

[code]Player::CheckForNewMissions 29.09%[/code]

This one was a bit of a surprise! There was a mission check being performed once per 2 seconds for any available new mission while the player is floating in space.

I have changed this so checks are only performed once when the player comes to a stop. Hopefully this will cull a lot of checks.

[code]ObjectManager::PerformObjectAdmin 26.14%[/code]

This one I expected. Every 100ms the sector server updates all the dynamic objects in the sector. This checks for respawn, updates mob positions and behaviours, object state changes and so on. It should be the most expensive function, but I guess there are opertunities for optimisation - eg we could limit the checks for respawns and state changes to say once per 5 seconds.

[code]MOB::UpdateObject 22.39%[/code]

again this is expected to be an expensive routine as this checks the mob's levels, regenerates shields etc, sends mob information changes (hull/shield levels). There are probably a few things we could do to make this run better.

[code]MOB::UpdateObjectVisibilityList 11.57%[/code]

another expensive routine - this checks when mobs come in and out of player's view. Perhaps this could be done from the player side, then we could optimise how often it fires depending on the player's speed and scan range?

[code]MissionHandler::GetMissionStartNPC 7.75%[/code]

This was easy to optimise out, there was a helper dump for devs looking at which NPC's gave missions. However it only needed to be called when a dev was checking, not every time for non-devs.

The rest of the branch heads become less relevant and it wouldn't be worthwhile to look at optimising them yet.

---------------------------------------------------------------

The second way to look at the data is to see which individual functions are taking the most time (note that these percentages are on a different scheme to the above stats):

[code]PlayerManager::GetNextPlayerOnList 6.00%[/code]

This function is used extensively to iterate through player lists so we can process each player. It's used a hell of a lot throughout the server, to cycle through players in a sector, to cycle through player range lists and so on.

I knew when I wrote it that it would be used a lot, so it's fairly optimally coded - initially we used STL which is pretty sub-optimal. I can tell you there won't be any STL in retail games' main loops.

It might be worth re-visiting to see if there are any further optimisations we can make.

There's a bulk of STL calls after this, which we can optimise by re-coding the STL 'scaffolding' by hand where the time is critical, or just try to avoid using the routine so much (which is where the branch profile dump comes in handy).

We got mutex lock/unlock taking a bit of time, there's a few of these that can now be removed as a lot were placed before we had worked out a good server architecture, and we needed those mutex locks for data safety. Most have been removed as we gradually discovered the optimum architecture for how E&B works, but there's a few lurking in there still.

That's pretty much it for round 1!

thanks to everyone who participated, we'll be running further, more directed profile runs in the future, but for now this ought to keep us going for a bit.

cheers,

TB
  • Upvote 2
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
×
×
  • Create New...