
BUG: Frequent disconnects and loss of action bars since the last restart



I apologize in advance as I know the Technical Support forum is not exactly intended for bug reports but this is the best place I know to post this; if there is a better one please let me know.

 

Ever since the last major server restart which I believe occurred after maintenance (not this most recent one due to the login server, but the one before that) there has been a dramatic uptick in the number of instances of disconnect => lose action bars.

 

This is an old bug that I and many others have seen before, but it is happening MUCH more frequently now.

 

Here's a couple chat excerpts from recent in-game convos about this:
 

Quote

  (11): 5:49:36  [New Players] Oredeous: for the next 4 hours people kept getting blackholed in different sectors.  Kyp took it down and addressed the problems and brought the server back up in under 5 mins.

  (11): 5:50:58  [New Players] Codemonkeyx: things have been pretty rough for me... lots of DC's, losing action bars, etc.
  (11): 5:51:23  [New Players] Pwhulksmash: same dcs and action bars resetting
  (11): 5:51:47  [New Players] Lixxi: same here
  (11): 5:52:17  [New Players] Oredeous: ive been online since he brought the server back up until now with the exception of 6 hours, and it seems good for me. :)


 

 

Quote

  (8):22:37:11  [Guild] Codemonkeyx: !@#$%... got disconnected... just got my group back together and noticed my action bars gone, lol
  (8):22:37:31  [Guild] AntisJE: that been like every other day for me
  (8):22:37:53  [Guild] TeflonPrime: agreed, me too
  (8):22:37:54  [Guild] Codemonkeyx: yeah, things have been bad since this last restart
 

 

So for everyone but @Tweetz (:)) something seems to have changed for the worse recently. I decided to make this post since every time it has come up multiple others chime in that they're seeing the same thing, and it hasn't resolved itself in a few days.

 

FWIW my workaround for this problem has been to back up my Data\client\output\shortcut.ini file (with a date/time stamp) each time I launch the game. Whenever this happens, I go back to character select, copy my last good backup over top of shortcut.ini, then log back in, and that gets me my bars back 99% of the time. On one occasion I had to do it twice.

 

This always seems to be coupled with a disconnect, usually while gating or docking, so that may be the real root cause. My theory is that there are scenarios where the client hasn't yet read shortcut.ini but saves it anyway after a disconnect, clobbering the actual shortcuts with an empty file. (In other words, maybe this isn't fixable if it's a client bug, but perhaps the root cause of the more frequent disconnects is).
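A minimal sketch of that backup-and-restore workaround, in Python. The function names and the backup folder layout are hypothetical, and skipping empty files is an assumption based on the theory above that a clobbered shortcut.ini ends up empty; none of this is part of the game client or the Net-7 launcher.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_shortcuts(ini_path: Path, backup_dir: Path) -> Optional[Path]:
    """Copy shortcut.ini to a timestamped backup file.

    Returns the backup path, or None if the file is missing or empty
    (assumption: an empty file is the clobbered state, so never back it up).
    """
    if not ini_path.is_file() or ini_path.stat().st_size == 0:
        return None
    backup_dir.mkdir(parents=True, exist_ok=True)
    # A sortable timestamp means the newest backup sorts last by name.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = backup_dir / f"{ini_path.stem}_{stamp}{ini_path.suffix}"
    shutil.copy2(ini_path, target)
    return target


def restore_latest(ini_path: Path, backup_dir: Path) -> Optional[Path]:
    """Overwrite shortcut.ini with the newest backup, if any exists."""
    backups = sorted(backup_dir.glob(f"{ini_path.stem}_*{ini_path.suffix}"))
    if not backups:
        return None
    shutil.copy2(backups[-1], ini_path)
    return backups[-1]
```

The idea would be to call `backup_shortcuts` before launching the game and `restore_latest` from the character select screen after a disconnect that wipes the bars.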

Edited by Codemonkeyx
Add disconnects to title

  • Codemonkeyx changed the title to BUG: Frequent disconnects and loss of action bars since the last restart

An update to this, and to @Codemonkeyx's response in particular: whilst I'm not losing my action bars, I am frequently getting disconnected (maybe 1 sector change in 15-20, in aggregate). If it isn't on a sector change, then it's when I go back to the character login screen that I get the black screen of death.

 

As a side note, if you multi-client on a Windows box, bring up Task Manager and select the option of bringing the client to the front.  This allows you to correctly identify the erroneous client and end task it, instead of throwing the proverbial dart of hope at the one you think it's most likely to be.

 

Please don't advise we check the four boxes...  :( 


I get disconnected in the station: if I sit still for 2 minutes, it automatically disconnects me. That is one of the main reasons I'm not playing as much; re-logging in toons because I let them sit for a few minutes is annoying.


I very rarely have issues with random crashes unless the server has been up over 699 hours. Here are several things I can recommend (I'm not going to mention the packet optimization or prototype reorder boxes being marked on the Net-7 launcher):

 

1.  How long has it been since you restarted your home gateway/router? You should do it at least monthly so that it clears its cache and memory and resets your net connection, giving you a better connection.

 

2.  Have you left your computer online without restarting it for long periods of time? This can cause spurious data losses.

 

3. What is the weather like where you are, and between you and the server? This affects it a lot more than one would think, especially through the South lately with the massive rain storms, and in the Midwest with all of the tornadoes knocking down the lines. Even fiber optic cables cannot stand up to the wind speeds of a tornado or the flash-flood washouts of infrastructure. The most recent server restart was just as the severe weather started hitting the central, north and south parts of the country.

 


The first thing I'd ask beyond what Woody has said (and you can share this privately if preferred) is: what are your ISPs, and what type/model of modem do you have routing your local LAN? Usually that's the ISP's "modem", unless you're techy enough to do your own thing.

I would first wonder if there's a common thread here. If it's happening for a single character, it's something we could target. But if it's multi-client, then the nature of how the game server communicates comes into play, which is why we have the general rule that says we don't support it. I won't explain that in depth because it could have security implications, but it has to do with how packets are being handled betwixt client (your public IP/router outward interface from ISP router) and server.

The reason I ask is that we have seen issues in the past with people trying to use mobile hotspots (weirdly, 5G etc. works fine, and I've even heard of successful Steam Deck installs of E&B). Unfortunately, the thing with networking, and the reason it's such a headache, is that every device in the path is a potential issue. :)


On the days in question, parts of NA and EU had heavy internet outages on the backbone networks, specifically Level3 Communications and AT&T on their AS (Autonomous Systems, BGP level), and it took approximately 12-18 hours to clear up.


Posted (edited)

Thanks everyone for the responses so far.

 

First of all let me be clear in saying that I'm not trying to throw stones here. I understand better than most that operating a 24/7 network service of any kind is non-trivial, let alone a MMORPG.  I'm incredibly grateful for all the work you guys put in, for free.  By and large things work quite well and I've enjoyed hundreds of hours of playing this game again.  Thank you all for that!

 

I only started this conversation to share some data points in an effort to help improve what is already a great experience.

 

That said, let me address some of the comments in no particular order.

 

Re: my router, computer, the weather, the internet, networking, sunspots, et al...

 

I understand that there are a LOT of potential causes for problems.  The stack of technology and devices that all have to be operating within expected parameters for any of this to work is staggering to say the least, from my local power station to a low-level networking driver in my kernel to electromagnetic radiation.

 

But let's consider some things to try and remain grounded and focused:

  • This is not the only online game I play; if I was seeing connection issues here AND in other games, I would be looking at potential causes on my end and not wasting your time
  • Games are not the only thing I use the internet for; I work remotely and I'm constantly logged into dozens of other machines located all over the country. When there's a problem with my connection, I'm keenly aware of it and it's "mission critical" for me. For this reason I both live in a place with exceptionally reliable and fast internet service and monitor my equipment more closely than most because my livelihood depends on it.
  • I'm not saying this to suggest that there can't be problems on my end; there absolutely can be and there have been in the past, though rare. I say it mostly to point out that I have plenty of data points and the skills for troubleshooting so that I can tell the difference.
  • FWIW my ISP is a municipal fiber provider in a small city outside a major metro area, so it's highly unlikely that would be shared with anyone else here. The ISP modem is a Calix GigaPoint 803G, and behind that I have an ASUS router running custom firmware, Asuswrt-Merlin (to Woody's point most stock router firmware is absolute garbage filled with memory-leaks and bloatware requiring frequent reboots).  Everything in my local network is hardwired gigabit ethernet.

But perhaps more importantly:

  • The events I'm describing are directly correlated to the relatively uncommon in-game event of gating/docking/undocking. That can't be a coincidence. It would be one thing if I was getting disconnected while randomly flying through space, or in combat, or idling in a station. But this only seems to occur when gating/docking/undocking.
  • Are we suggesting that a tree fell on a power-line at the exact moment I was gating?  That every instance of this that myself, Tweetz, Pwhulksmash, Lixxi, Anti, and Tef, (and undoubtedly many others who haven't happened to be in these conversations) have experienced could be traced back to some larger event going on in the world that just happened to correlate with when they were individually gating/docking/undocking?  And conversely that such events very rarely occur when we aren't gating/docking/undocking?  It strains the imagination.
  • A much more likely explanation is that there is something unusual going on during gating/docking/undocking which is less than 100% reliable (recent experiences aside, I think everyone knows that to be true)
  • Finally, what I'm describing here is a change in behavior relative to, say, the prior 7 months.  Gating/docking/undocking has never been 100% reliable, but something seems to have changed to widen the window where issues occur, so that the otherwise rare symptom of action bars disappearing has become a much more common (daily) phenomenon.

However, on the topic of common threads and multi-client:

  • I will admit that this is curious because I know that all of the people listed above run multiple clients, though not necessarily to the same extent. Pwhulksmash, Anti, and myself basically always run full groups of six.  I'm not sure about Tweetz but I believe Tweetz, Lixxi, and Tef all frequently run at least 2-4 clients.
  • This may be a coincidence and biased by any number of factors though
    • The guild I'm in (with Anti and Tef)
    • More enfranchised players who play enough to notice a difference also being more likely to run multi-client or to speak up in such a discussion
    • Anecdotally it seems to me that most players run multi-client, albeit to very different degrees, with at least 2-3 (i.e. a JE and their main for wormholes, or a JE, a TT for healing, and their main, etc.)
    • It would therefore be more difficult to get data on people who only run a single client and whether they see similar issues at a similar per-capita rate
    • Even if multi-client isn't the cause, people who run 6 clients would be expected to see intermittent failures 6 times as often so there is also a perception bias
  • I know you said you wouldn't elaborate for security reasons, but it's hard for me to understand why it matters whether clients are running on the same machine or the same local network or the same internet when they all have to establish independent connections to the same server.
  • If this is somehow part of the issue and multi-client isn't supported I can accept that some issues are to be expected but that doesn't explain why it works most of the time nor why it would have gotten worse lately.
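The perception-bias point above can be made concrete with a little arithmetic. This is only a sketch: the per-transition failure probability `p` is a made-up placeholder (no measured rate exists in this thread), and it assumes each client's gate/dock/undock fails independently of the others, which may not hold if the clients share a cause.

```python
def expected_disconnects(p: float, clients: int, transitions: int) -> float:
    """Expected number of disconnects seen by one player.

    p           -- assumed chance that a single client drops on one transition
    clients     -- number of clients the player runs
    transitions -- number of gate/dock/undock events per client
    """
    return p * clients * transitions


def chance_any_client_drops(p: float, clients: int) -> float:
    """Chance that at least one client drops when the whole group transitions."""
    return 1.0 - (1.0 - p) ** clients


if __name__ == "__main__":
    p = 0.01  # hypothetical 1% failure rate per client per transition
    for n in (1, 2, 6):
        print(n, expected_disconnects(p, n, 100), chance_any_client_drops(p, n))
```

With these placeholder numbers, a player running 6 clients would expect 6 times as many disconnects as a single-client player over the same number of jumps, and roughly a 5.9% chance that any given group jump loses at least one character, even though the underlying per-client rate is identical.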

Of course, I don't expect all of that to convince you on its own.  The only way we're going to get to the bottom of this is with some data.  I've scoured the logs on the client side (chat.log, enb_client.log, authlogin.log, etc.) and unfortunately can't find anything which could be used to measure the frequency of these events.  The only messages that get logged for me are on other clients I'm grouped with indicating so-and-so "has left the group" which of course also shows up normally for many, many other completely normal game play reasons, perhaps most notably the Net-7 vault functionality which requires logging out (and therefore breaking group).  As a result there is no reliable way to extract any useful data from the client logs.

 

Do you guys have server-side logging which can differentiate between a client-initiated disconnect (e.g. /quit, Quit to Character Selection, Quit to Desktop, etc.) and an unexpected disconnect?  If you do and you keep a reasonable amount of historical data maybe that can be used to try and put some numbers to this?

 

Is there any other logging I could enable client-side which might help learn more about what's going on?  Would it help for us to start reporting data here of disconnect+action bar reset events, like:

  • Character Name
  • Gating/Docking/Undocking From
  • Gating/Docking/Undocking To
  • approx /time as reported by server
  • Number of clients we were running

etc.?
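If reports like this were collected, a consistent record shape would make them easier to tally later. A minimal sketch, assuming a CSV layout: the class and field names are hypothetical, and the `kind` (gate/dock/undock) and `lost_action_bars` fields are additions beyond the bullet list above.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable

FIELDS = ["character", "kind", "origin", "destination",
          "server_time_utc", "clients_running", "lost_action_bars"]


@dataclass(frozen=True)
class DisconnectReport:
    character: str
    kind: str               # e.g. "Gate", "Dock", "Undock" (assumed field)
    origin: str             # sector or station being left
    destination: str        # sector or station being entered
    server_time: datetime   # approximate /time as reported by the server
    clients_running: int
    lost_action_bars: bool  # assumed field: were the bars reset as well?


def to_csv(reports: Iterable[DisconnectReport]) -> str:
    """Render reports as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(FIELDS)
    for r in reports:
        writer.writerow([r.character, r.kind, r.origin, r.destination,
                         r.server_time.isoformat(), r.clients_running,
                         r.lost_action_bars])
    return buf.getvalue()
```

Storing the timestamp in ISO 8601 form keeps rows sortable and unambiguous about time zone, which matters when trying to line them up against server-side events.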

 

I recognize that the time you guys have to commit to this project is limited and that this may not be the most pressing issue, but I do think it's an impactful one that is seen widely by many players.  I also believe that it's likely a server-side issue of some kind and that some recent change (network security changes related to recent maintenance events?) seems to have increased the rate of occurrence.  Any data that we could add to this conversation would be valuable.  Trying to speculate about potential causes when we don't even know exactly what's happening is not likely to be fruitful.

Edited by Codemonkeyx

I have done everything possible to clear it up, but it still disconnects me. I guess it's something I am going to have to deal with, and I am so used to it now that it is no big deal. In the other games I play I have no issue, but it started around the last upgrade or patch, if that is any help.


It seems odd that it appears to be affecting only players in the NA region and not the EU, as far as I'm aware (please inform me if I'm wrong, i.e. you are having issues from a different region), although as far as I know there are only a handful of players active from the EU (including the UK).

 

This would point to a networking issue rather than a server-side one, in my opinion (yes, I get the coincidence that it "appears to have occurred since the last restart").

 

As a test (I don't have, nor have need of, VPN abilities, so I can't test this myself), could someone who actively uses VPN services test this for me? I.e. VPN to, let's say, Norway or somewhere similar, try to connect to the game, see if that's trouble-free or has issues, and report back?

 

Thanks in advance.


Posted (edited)

Lost both CodePP and CodeTT at the same time:

 

  • Character Name: CodePP
  • Type (Gate, Dock, Undock, Wormhole, Return to Base): Wormhole
  • From: Carpenter
  • To: Xipe Totec
  • approx /time as reported by server: 02:44:58 on 6/17/2024 (UTC+0)
  • Number of clients we were running: 6

 

  • Character Name: CodeTT
  • Type (Gate, Dock, Undock, Wormhole, Return to Base): Wormhole
  • From: Carpenter
  • To: Xipe Totec
  • approx /time as reported by server: 02:44:58 on 6/17/2024 (UTC+0)
  • Number of clients we were running: 6

 

CodeJE wormholed to Xipe Totec successfully so then I had to wormhole back to Carpenter, re-login CodePP and CodeTT, fly them to Yamuna's Weft in Carpenter to meet up with CodeJE, reform the group, then wormhole again to Xipe Totec to get everyone back together.

 

In this instance CodePP and CodeTT's action bars were both still intact.

Edited by Codemonkeyx

Posted (edited)

Almost the exact same thing happened as last night, except this time with CodePP and CodeJD:

 

  • Character Name: CodePP
  • Type (Gate, Dock, Undock, Wormhole, Return to Base): Wormhole
  • From: Carpenter
  • To: Xipe Totec
  • approx /time as reported by server: 09:59:52 on 06/18/2024 (UTC+0)
  • Number of clients we were running: 6

 

  • Character Name: CodeJD
  • Type (Gate, Dock, Undock, Wormhole, Return to Base): Wormhole
  • From: Carpenter
  • To: Xipe Totec
  • approx /time as reported by server: 09:59:52 on 06/18/2024 (UTC+0)
  • Number of clients we were running: 6

 

CodeJE wormholed to Xipe Totec successfully so then I had to wormhole back to Carpenter, re-login CodePP and CodeJD, fly them to Yamuna's Weft in Carpenter to meet up with CodeJE, reform the group, then wormhole again to Xipe Totec to get everyone back together.

 

Once again CodePP and CodeJD's action bars were both still intact.

Edited by Codemonkeyx

Posted (edited)

Lost CodePP:

 

  • Character Name: CodePP
  • Type (Gate, Dock, Undock, Wormhole, Return to Base): Wormhole
  • From: Odin Rex
  • To: Endriago
  • approx /time as reported by server: 11:37:01 on 06/19/2024 (UTC+0)
  • Number of clients we were running: 6

 

Once again didn't lose action bars, just disconnect and dance to rejoin group. (I promise that these disconnects have been more frequently coupled with missing action bars lately, but haven't had one result in that symptom since I started reporting them here).

Edited by Codemonkeyx

I don't typically run multiple clients, so I don't really have a dog in this race ...

 

But out of curiosity, are your toons 'staggered' when they go through these wormholes, by any chance?

 

I play with folks who multibox, and they have their toons staggered at about 3 seconds or so apart and don't get the issues you have been describing here. If you're not staggered, perhaps just give it a shot and see if it alleviates the problem?

 

If you already are, and it's still happening, then I respectfully bow out as I have not a clue.

 

Fly safe!

 

Alurra


1 hour ago, alurra said:

But out of curiosity, are your toons 'staggered' when they go through these wormholes, by any chance?

 

I haven't been doing anything like that, no, I'll give that a shot, thanks!


So.. very very high level, when a character from your account transfers between sectors he's being handed off between 'servers' so to speak. These connect you to the new 'map' and give you everything going on there, that is your gating process that you know and love so much. :)

The main reason we didn't want to support multi-client has to do with this, because the way our server works relies on a UDP/TCP translation that occurs in the process. This is probably something we could improve, but it would require a major overhaul of the server as everyone knows it. It would, however, eliminate a lot of things like lags in the graphics in the client, etc.

Anyway, long story short the reason I asked what I asked wasn't because I doubted your gear or competency, but because we've had that sorta thing happen in the past so it is something I rule out. 

If it continues, my recommendation would be to resolve the IP for 'sunrise.net-7.org' and then set up a packet capture between your computer and the server so we can look at the traffic. As far as server logs go, some things are logged, while others are simply output to the console and don't get saved to a text type of log, so it depends on what exactly we're hunting for. However, the only things I've recently changed in our environment are security-related updates, so if a sudden issue like this occurred, those would be the likely cause: some change in how the firewall 'senses' traffic, for example. To understand that I would have to see all of the traffic, as my IP address always passes that firewall and I can't 'see' those kinds of issues.


Thanks for the extra context; I wasn't aware that TCP was involved. I thought all the server communication was UDP, so that's an interesting new data point.

 

I'm going to try @alurra's suggestion, but in the event that I still see issues after that, how does this sound for the packet capture:

  • Save a pcap file using tcpdump
  • Capture all TCP and UDP traffic with host `sunrise.net-7.org`
  • Say, +/- 1 minute around the time of the disconnect?
  • Send the packet capture to you directly via a forum PM?

e.g.

sudo tcpdump -w /tmp/capture.pcap host sunrise.net-7.org
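The command above captures everything exchanged with the host, ICMP included. If only TCP and UDP are wanted, as in the plan, the capture filter could be narrowed with `and (tcp or udp)`. A small sketch that builds either command line; the helper names are hypothetical, and only the standard `tcpdump -w` option and pcap filter primitives (`host`, `tcp`, `udp`) are used.

```python
import shlex
from typing import List


def tcpdump_command(host: str, out_file: str, tcp_udp_only: bool = True) -> List[str]:
    """Build a tcpdump argument list that writes a pcap for one host."""
    expr = f"host {host}"
    if tcp_udp_only:
        # Restrict the capture to the two transports of interest.
        expr += " and (tcp or udp)"
    # The filter is passed as a single argument so the shell never
    # interprets the parentheses.
    return ["sudo", "tcpdump", "-w", out_file, expr]


def as_shell(argv: List[str]) -> str:
    """Quote the argument list for pasting into a POSIX shell."""
    return shlex.join(argv)


if __name__ == "__main__":
    print(as_shell(tcpdump_command("sunrise.net-7.org", "/tmp/capture.pcap")))
```

Note that the narrower filter would drop ICMP, so a ping test would no longer show up in the capture.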

 

Then I can also send you a post-processed log in addition to the pcap. Here's an example of a single ICMP echo (with no response because sunrise.net-7.org doesn't respond to pings, a good security posture :)):

 

tcpdump -tttt -vvv -x -r /tmp/capture.pcap
reading from file /tmp/capture.pcap, link-type EN10MB (Ethernet), snapshot length 262144
2024-06-19 15:18:28.543864 IP (tos 0x0, ttl 64, id 8486, offset 0, flags [DF], proto ICMP (1), length 84)
    tungsten > 216.219.87.147: ICMP echo request, id 36, seq 1, length 64
	0x0000:  4500 0054 2126 4000 4001 2691 c0a8 01db
	0x0010:  d8db 5793 0800 2e0a 0024 0001 244b 7366
	0x0020:  0000 0000 6b4c 0800 0000 0000 1011 1213
	0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
	0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233
	0x0050:  3435 3637

 

Any other specifics that you would want to see for that?


I finally caught an instance of this with the network capture running where the action bars were also lost.  This happened a couple days ago but I needed to figure out how to trim down the size of the capture file to just the relevant timeframe and just remembered to do that.

  • Character Name: Codemonkeyx
  • Type (Gate, Dock, Undock, Wormhole, Return to Base): Undock
  • From: Yasuragi (City/Station)
  • To: Yasuragi Area (Swooping Eagle Planet)
  • approx /time as reported by server: 09:45:49 on 06/20/2024 (UTC+0)
  • Number of clients running: 6

I'll send the network capture files to Kyp directly:

Codemonkeyx_dc_Yasuragi_Yasuragi_Area_20240620_094449_to_094549_lost_action_bars.pcap

Codemonkeyx_dc_Yasuragi_Yasuragi_Area_20240620_094449_to_094549_lost_action_bars.pcap.txt
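For anyone else who needs to trim a long capture down to the relevant timeframe, here is one way it could be done with only the Python standard library. This is a sketch, not necessarily how the files above were produced: it assumes the classic pcap format that `tcpdump -w` writes (not pcapng), compares whole seconds only, and the function name is hypothetical.

```python
import struct

# Classic pcap magic numbers as they appear on disk, mapped to byte order.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def trim_pcap(data: bytes, start_epoch: int, end_epoch: int) -> bytes:
    """Return a pcap containing only packets with start <= ts_sec <= end."""
    endian = PCAP_MAGICS.get(data[:4])
    if endian is None:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    out = bytearray(data[:24])  # the 24-byte global header is kept unchanged
    offset = 24
    while offset + 16 <= len(data):
        # Per-packet header: seconds, sub-seconds, captured length, wire length.
        ts_sec, _frac, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        record_end = offset + 16 + incl_len
        if record_end > len(data):
            break  # truncated final record; stop rather than emit garbage
        if start_epoch <= ts_sec <= end_epoch:
            out += data[offset:record_end]
        offset = record_end
    return bytes(out)
```

The window bounds are Unix timestamps in UTC, so a one-minute window before a disconnect would be the disconnect time minus 60 seconds through the disconnect time.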

