Custom Actions (Scripts) and Logon/Logoff Hangs

*This post originally appeared on the AppSense blog prior to the rebrand in January 2017, when AppSense, LANDESK, Shavlik, Wavelink, and HEAT Software merged under the new name Ivanti.

Blog_Banners_main-page[1]
Every so often I get an incident with an unexplained intermittent logon/logoff hang that is down to an Environment Manager Logon/Logoff Custom Action (script).

By default, a Node has a 30 second timeout but the timeout configured in a Custom Action changes the timeout of the entire Node it resides in.  When creating a new Custom Action, the default timeout is set to 0 seconds, which represents wait for an infinite time.  This has the additional effect of setting the node to an infinite timeout.  In the case of a Custom Action that intermittently hangs, Environment Manager is doing what it has been configured to do and waiting indefinitely for the script to complete.

Ideally set a timeout on all Custom Actions to reduce the risk of a node hanging indefinitely.

If you want to test the impact of a long running or hung Custom Action on each Trigger type.  Configure a basic “WScript.Sleep(300000)” vbScript Custom Action, which will result in a 5-minute sleep.  Test under each Trigger type you’re interested in to see the impact on Computer Startup, Shutdown, Logon, Logoff, Process Start/Stop etc.

You can also use this feature to your advantage.  If the actions within a Node takes longer than 30 seconds to complete, the Node will timeout.  Any actions that do not already have a thread created will not be executed and child Nodes will also not run.  If you have actions that you suspect might be contributing to Node timeout e.g. connecting to multiple remote site resources.  Add a blank Custom Action to the node and set an appropriate timeout.  This should extend the timeout of the node.

Not as easy to troubleshoot as it sounds

This type of issue tends to be quite rare as the vast majority of scripts will complete.  Unfortunately, it has historically (EM 8.5 and earlier) been quite tricky and time consuming to troubleshoot.  The issue can be very intermittent, as it tends to be related to something in the script not working intermittently e.g. access to a network resource that is intermittently not available.  Therefore, this is difficult to catch with logs.

Once caught the issue doesn’t stand out in the logs.  Environment Manager is doing what it’s configured to do, waiting an infinite amount of time for an action to complete.  Therefore, there is no error message related to the action.  As Environment Manager Policy processing is multithreaded, a new thread is created for each node, condition and action.  In a big configuration with hundreds of actions, the Custom Action thread will start but the rest of the configuration will continue to be processed at the same time.  Eventually when Environment Manager has processed all other remaining logon actions, Environment Manager will appear to just sit there idly in the logs and occasionally processing a background task.  The issue has occurred typically tens or hundreds of thousands of lines earlier in the log.  The trick at this point is to try and determine which of the EM Internal Tasks didn’t complete and then which of the hundreds of threads that started didn’t complete, a job made a little more interesting as thread ID’s can be reused.

EMMon [Orphan] Action

The new Log Parser in Environment Manager 8.6 and above (EMMon) will show orphaned threads.  These are threads that have started but didn’t complete.

After capturing a log and opening using EMMon, ensure that “Orphans” is ticked in the window that shows Types to display.  You can find orphaned threads by either sorting by Type, as an orphaned thread is prefixed by “[Orphan]”, or sorting by Duration (descending) and orphaned threads should appear at the top of the list.

From this screen shot we can see that there is an orphaned Task “EM_TASK_NODE_EXECUTION_RUN_LOGON_ACTION”.  We can also see that the LOGON trigger is orphaned, which typically represents the “Pre-Desktop” trigger within EM Console.  So we have an indication that logon is hung as Logon Policy actions have not completed, rather than an issue restoring Personalization settings.  We can also see that there is a single node that has not completed and a single action which was most likely from the node that has not completed.

It would be nice to review the “[Orphan] Action” item directly but unfortunately EMMon only shows a couple of lines that doesn’t provide an additional clue.  To get further information about the action, it would be necessary to convert the ETL to a text based log using Etl2Txt and to follow the thread.  Notepad++ is useful for this, as you can use the “Find All in Current Document” and search for the Thread ID.  As the thread never stopped, the Thread ID would not have been recycled for another action.  So the last actions of the thread relate to the hung action.

An alternative is to review the details within the orphaned node, this will provide the name of the node.  From there you can review your configuration for custom actions or conditions.  After reviewing your configuration, you can also review the actions completed within the node within EMMon.  In the example below, I would review “NodeA” within the configuration initially.  However, for certainty I would review the text log using the technique described above.

Hopefully this blog provides a heads up of a relatively uncommon issue but one that can be easily avoided by setting a timeout value.