Regarding the Process for When the Server Goes Down


Specops

I know nothing about coding or servers, so dunno if these suggestions are feasible:

 

(1) Restart the server immediately after it crashes.

 

In another thread, some devs mentioned that we keep getting crashes on Live because, with so many players on, there are simply far more ways for crash-causing errors to occur (as opposed to one underlying problem the devs are unable to fix). If an error is so rare that it first happened weeks after Live began, why not restart immediately, without waiting for a hotfix, and figure out the problem while the server is back up? What are the chances that this same error crashes the server again before some other error does?

 

(2) Designate more people to be able to restart the server.

 

I don't know how hard it would be to do this remotely without having a dev client installed, etc., but we seem to have some blind spots in getting the server back up if it goes down, especially during US West Coast evening hours. Perhaps if it could be set up so that the people who can restart are not able to bring the server down, only start it up, there would be fewer issues with giving non-devs this ability?


I'll echo Specops in that I know nothing about coding or servers. My suggestion is more about keeping the player base informed: could you create a sticky thread on server status (up, down for an extended period, or whatever explanation is appropriate)? It would beat trying to log in to check the server status. The thread I'm suggesting would be for Dev/GM/official posts only and would be for relaying information. Just my two credits' worth.


The reason we don't have it restart automatically, or have a bunch of people who can restart it, is that we want to see what caused the crash and fix it before we start it back up.

That way the bug is fixed, and the same bug won't just crash it again in five minutes.

Only three active people currently have the technical know-how to debug and fix on the fly: Kyp, Zack, and Rolo.

The upper right-hand corner of the Net-7 portal site has a green, yellow, or red circle that indicates Up, Starting, or Down. If it is green, it will also tell you how many players are on.

http://www.net-7.org/

Leavon, check out the last page of this topic: https://forum.enb-emulator.com/index.php?/topic/5823-server-status/

It doesn't always get an official response right away, but it usually does, and you can bet someone will have posted if the server is down.

Yeah Stanig, that's fair, but if the 3 of them could work just as effectively off a log, immediate restarts might be feasible. Also, with the bugs, I thought each crash was generally from a bug that occurred for the first time. Unless for some reason a bug, once it occurs, keeps happening continuously thereafter (i.e. causing the server to crash within 5 minutes), what are the chances that the same bug will occur again before another new bug occurs and brings the server down?


Over my pay grade, honestly. I only make 2 cents a year; you'd have to ask one of the nickel makers. Joking aside, I really don't know anything past what I've overheard in dev meetings. I do know we have an ongoing multithread locking issue that happens under load, but there are other bugs that crop up as well, as people stumble across them.

I'd tell ya more if I had a clue, but it's over my head, to be completely honest. Imagine sitting in a dev meeting when you've got three C++ coders talking shop. XD

I just appreciate that you often respond :)


The biggest obstacle we face is the lack of people who can server code for the project: people with the time and expertise who are also, of course, just plain "nuts". I don't code myself (I will, however, always be available should the need arise for Apple ][ Applesoft BASIC from 1983), but from what I understand, you pretty much have to be an up-to-date coder for the efforts here at the project, i.e. "know your crap", so to speak, which usually means you code this stuff for a living, much like the web devs. Then, all fired up from coding at work M-F, you rush home to code here at the project for no pay, with no source code and no instructions on how it is compiled, all for the wonderful amount of $000000 the server devs get (which is beyond the meager $00 I get as a Game Master). I think perhaps you begin to see the problem. From what I gather from dev conversations, you also get the added "bonus" of trolling through old code from server devs past, and will most likely have to figure out, redo, or replace their stuff when that inevitable "oops, it blew up" moment arrives and the Server Hamsters rebel.

We have had an improvement of some note, however: 2 of the server devs are in Europe, so when the Server Hamsters inevitably rebel at, say, 3am on a Wednesday night, it is possible that from their day jobs in Europe they may see the server is down and do something about it. That is much better than when we only had 2 server devs, both here in the USA, who both had to work and (selfish that they were) sleep, which meant we sometimes had 15-hour turnarounds for server restarts during the week.

 

In the end, it is also kind of a "chicken and egg" problem. We went LIVE after massive testing on the Dev Server (yep, we have one of those) and all looked hunky-dory. Then we found that having 400 to 500 people on (much like the keggers of my youth at the old fraternity, which caused some unanticipated neighborhood issues) put a strain on the code and servers. Without going LIVE we would have stayed ignorant of the problem, because the large influx of people that showed more coding was needed would never have happened.

So by all means, if you know of some dedicated types who can server code at this level without real-life issues or wife/girlfriend aggro, or who are eccentric and independently wealthy and want to do this for fun and giggles, send them our way.

But most if not all of us, IMHO, work on the project in some manner at LEAST 20 hours a week for no $$$, no matter what level of staff. Alas, we can't just flip a switch yet and keep the server up.

As always, this is my rant and I take full responsibility for it, incoherent though it may be. But even at the best of times, no matter what skills you bring to the project, 3 out of 4 who attempt to become staff pretty much say "you guys are nuts" and walk away, which is probably the sensible response to reviving an MMORPG from nothing but the install CDs and no instructions. Luckily, sanity is not a big requirement here should you want to help the project, thank goodness.

Anyway, it is what it is server-wise, IMHO.

My 2c worth.

Searing


Speaking as a coder: sometimes a crash log gets overwritten when a restart happens. Sometimes there is information you simply cannot get without looking at the crash when it happens. Auto-restarting the server can wipe these logs and leave you with nothing to evaluate in order to fix the problem. Yes, experience taught me that.
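One way to reconcile quick restarts with the point above would be to move the crash log aside before the server is started again. What follows is only a minimal Python sketch of that idea, not how the Net-7 server actually works: the `archive_crash_log` helper, the `server.log` file name, and the `crash_archive` directory are all hypothetical.

```python
import shutil
import time
from pathlib import Path


def archive_crash_log(log_path, archive_dir):
    """Move the current log into archive_dir under a timestamped name.

    Returns the new path, or None if there was no log to preserve.
    (Hypothetical helper; the names and paths are assumptions.)
    """
    log_path = Path(log_path)
    archive_dir = Path(archive_dir)
    if not log_path.exists():
        return None
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = archive_dir / f"{log_path.stem}-{stamp}{log_path.suffix}"
    # Avoid clobbering an earlier archive made within the same second.
    counter = 1
    while target.exists():
        target = archive_dir / f"{log_path.stem}-{stamp}-{counter}{log_path.suffix}"
        counter += 1
    shutil.move(str(log_path), str(target))
    return target
```

A restart script could call something like `archive_crash_log("server.log", "crash_archive")` after the process dies and before launching it again. This only preserves what was written to disk; it does not help with information that exists only in the crashed process itself.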

 

Sometimes users just have to trust the IT people who know what is going on. We are doing the best we can with what we have.

I doubt the devs would rather dig through an error log than be out killing tengu, so they want to get the game running again as fast as possible.

The fewer people who have access to the server, the better, especially in this type of environment. One pissed-off player could wipe out years of work with just a few keystrokes and a server restart.

Incredibly correct, sir.

Part of the problem is that we'd have to dump the call stack and memory image to figure out what happened, and that's a huge amount of data for an application this large.
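
As a rough illustration of what "dumping the call stack" means, here is a small sketch in Python rather than the server's C++, using the standard-library `faulthandler` module. The `dump_stacks` helper and the output file name are hypothetical. A full memory image of a native process would be far larger than this text output, which is the size concern raised above.

```python
import faulthandler


def dump_stacks(path):
    """Write the call stack of every running thread to a text file."""
    with open(path, "w") as fh:
        # faulthandler writes straight to the file descriptor behind fh.
        faulthandler.dump_traceback(file=fh, all_threads=True)
    return path
```

Calling `dump_stacks("stacks.txt")` produces one stack listing per thread, most recent call first. That is usually small; it is the memory image that gets huge for a large application.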