gibney$:[handbook]IPCS_TROUBLESHOOTING.MEM

Troubleshooting the IPCS messaging system
-----------------------------------------

02-Oct-2003 Thu
---------------

Symptoms:
  o SHOTCLOCK command on Europa does not create window

Status:
  o evtmgr (on epicsrv) not responding to requests from computers on
    Physics network. Confirmed by:

      $ usrtest
      Enter ['']: req woof

  o on Kees (the cell netmailer for Physics network):

      $ nm_status
        :
      epicsrv epicsrv 8 TCP REQ

    The REQ status indicates that Kees' IPCSMAILER is requesting a
    connection with epicsrv. However, the RUNNSTX log file
    NSTX$:[LOGS.RUNNSTX]runnstx.log shows that RUNNSTX is not aware
    of a problem. Strange ...

  o on epicsrv (cell netmailer for Engineering network)

      > nm_status
        :
      pppl_res pppl_res 2 TCP CONN CELL ()
        :
      RUNNSTX pppl_res pppl_res ad21a021 fbac7a3f epicsrv pppl_res

    Looks like everything is fine, and RUNNSTX is connected in spite
    of the "REQ" status noted on Kees! Stranger ...

Solution:
  o Restart evtmgr on epicsrv (see ipcs_nstx_restart.mem). Doesn't seem
    to make sense, but it worked -- see nm_status below.

  o On Kees: RUNNSTX.LOG shows that connection with evtmgr was lost and
    then re-established

      $ nm_status
        :
      epicsrv epicsrv 10 TCP CONN CELL ()
        :
      EVTMGR_NSTX epicsrv epicsrv 275c0000 9c277c3f pppl_res epicsrv

  NOTE: if only the evtmgr is restarted it should not be necessary to
        restart any processes. Like RUNNSTX above, the processes will
        try periodically to re-establish connection with the evtmgr.
        However, if IPCSMAILER is restarted then many processes need
        to be restarted. See ipcs_nstx_restart.mem for details.

03-Oct-2003 Fri
---------------

Symptom:
  o usrtest on Kees did not work:

      KEES$ usrtest
      Enter ['']: req woof
      xxx: evt_wait4ack: err from ipcs sts=no_net_receiver
      xxx: evtHello: err from evt_wait4ack, connecting to EVTMGR_NSTX

Status:
  o Kees nm_status showed epicsrv status as DISC. EVTMGR_NSTX not listed
    as registered process. However, RUNNSTX.LOG showed no problems.

  o epicsrv nm_status showed everything ok. RUNNSTX listed as a
    registered process.

Solution:
  Once again, this doesn't make a lot of sense.

  o Since RUNNSTX appeared to be known to epicsrv, wanted to test
    whether it would actually receive an event (check RUNNSTX.LOG):

      On rich:
      > declare_evt NSTX_SOC 1234

  o SURPRISE! RUNNSTX got the event and responded appropriately, PLUS
    Kees nm_status showed all normal, including EVTMGR_NSTX registered
    as known program! Go figure.
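
Appendix: comparing the two nm_status views
-------------------------------------------

Both entries above come down to the same check: the two cell netmailers
disagree about the state of the link between them (REQ or DISC on Kees,
CONN on epicsrv). The Python sketch below automates that comparison on
nm_status output that has been pasted into strings. It is only an
illustration. The line layout "<node> <node> <number> TCP <STATUS>" is
inferred from the excerpts in this memo and is not a documented format,
and the function names link_status and views_disagree are hypothetical,
not part of IPCS.

```python
import re

# ASSUMPTION: a connection line looks like
#   <node> <node> <number> TCP <STATUS> [CELL ()]
# as in the nm_status excerpts above. Registered-process lines
# (e.g. "RUNNSTX pppl_res pppl_res ad21a021 ...") do not match,
# because their third field is not a plain decimal number followed by TCP.
CONN_LINE = re.compile(r"^\s*(\S+)\s+\S+\s+\d+\s+TCP\s+(\S+)")


def link_status(nm_status_text):
    """Map each remote node name to its link status (CONN, REQ, DISC, ...)."""
    result = {}
    for line in nm_status_text.splitlines():
        match = CONN_LINE.match(line)
        if match:
            result[match.group(1)] = match.group(2)
    return result


def views_disagree(text_a, peer_in_a, text_b, peer_in_b):
    """Compare how two netmailers see the link between them.

    text_a / text_b are pasted nm_status outputs from each side;
    peer_in_a / peer_in_b are the names under which each side lists
    the other. Returns (disagree, status_seen_by_a, status_seen_by_b).
    A status of None means the peer was not listed at all.
    """
    status_a = link_status(text_a).get(peer_in_a)
    status_b = link_status(text_b).get(peer_in_b)
    return status_a != status_b, status_a, status_b


# The 02-Oct-2003 situation, using the excerpts from this memo.
kees_view = """\
epicsrv epicsrv 8 TCP REQ
"""
epicsrv_view = """\
pppl_res pppl_res 2 TCP CONN CELL ()
RUNNSTX pppl_res pppl_res ad21a021 fbac7a3f epicsrv pppl_res
"""

if __name__ == "__main__":
    print(views_disagree(kees_view, "epicsrv", epicsrv_view, "pppl_res"))
```

A disagreement such as REQ on one side and CONN on the other is the
pattern seen on 02-Oct; on that date restarting evtmgr on epicsrv
cleared it (see ipcs_nstx_restart.mem).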