How to Troubleshoot Agent Discovery

Issue

Agents fail to be discovered during a task.

Status: Failed
Result: Cannot Find Agent
Return code: 1110

Agent Discovery Process

When a scheduled task is set to Push, these are the steps it goes through to discover a client to provide the task to.

Logs

These logs will be used in diagnosing this issue:

C:\ProgramData\Landesk\log\PolicyTaskHandler.log
C:\Program Files\LANDesk\ManagementSuite\log\PolicyTaskHandler.exe.log

Is the machine online?

Yes - Device is online

If the machine is online and continues to fail discovery, continue on to Setup Discovery Troubleshooting Test.

No - Device is not online

The most common reason for a machine failing to be discovered is that it is offline. A machine that is not online cannot be discovered.

Can you ping the device?

Can you validate through some other method it is online?

Setup Discovery Troubleshooting Test

When the LDMS Core attempts to discover an agent, it sends a 'discovery' packet to the client. When the client receives the discovery packet, it responds with information about the device to let the Core know who received the discovery request.

Follow these steps to set up a Discovery Troubleshooting Test.

Enable verbose policy task handler logging

When the Core sends a discovery packet, it logs the attempt in the PolicyTaskHandler.exe.log, provided verbose logging for this process has been enabled.

On the LDMS Core, in the Scheduled Task panel, select the Configuration Settings option (gear icon) and choose Default Scheduled Task Settings

In the Default scheduled task settings window under Accelerated Push, check the box for Enable verbose policy task handler logging
Click Save

Note: The changes will only take effect on Tasks started after the change has been saved.

Setup Wireshark on core

Because the Core is involved in a large amount of network traffic, we want to filter down the traffic that Wireshark actually listens for.

In Wireshark choose Capture | Options.

In the Wireshark: Capture Options window choose Capture Filter:

In the Wireshark: Capture Filter - Profile: Default window click the New button
Select New Filter at the bottom of the list
Set the following -
- Filter name: Arp or UDP Port 9595 or ICMP
- Filter string: arp or udp port 9595 or icmp
  - (The filter string is case sensitive, so make it just as listed above)
The entry in the list will now reflect the Filter Name entered
Click Ok

In the Wireshark: Capture Options window, the Capture filter will show the Arp or UDP Port 9595 or ICMP filter.

Note: If the bar is green, the filter is acceptable. If the bar is red, the filter is malformed. Go back and make it match the filter string exactly as listed above.

Click Start to begin the Capture

Example of malformed filter.

End Other PolicyTaskHandler Processes

If you have run several tasks, you may have multiple instances of PolicyTaskHandler.exe running, all of which are sending discovery packets.

To limit the amount of traffic we gather in Wireshark, and in the PolicyTaskHandler.exe.log, we want to end any other PolicyTaskHandler.exe processes running. This can be done through Task Manager.

Note: Ending this process will stop any agent discovery occurring on that particular task. Because terminating the process stops discovery, the Scheduled Task may stay in a 'Discovering Agent' state indefinitely. The task will need to be re-run later for a status change.

Create Scheduled Task to Test

You can use an existing scheduled task if desired, so long as you terminated its running PolicyTaskHandler.exe process as outlined above.

Schedule the task, and assign machines to be pushed to.
Right click the task and choose Info
Take note of the TaskID
- We will use this when looking through logs to identify which lines the task belongs to

Were Discovery Attempts Logged?

When the Core attempts to discover a machine, it will log the machine info attempting to be discovered in the PolicyTaskHandler.exe.log (so long as verbose logging was enabled).

07/20/2015 07:00:33.2837 INFO 6812:1 RollingLog : [Task: 7zip 9.20 - 7/20/2015 7:00:18 AM, TaskID: 1541, ProcID: 6812] : Discover: Discovering machine: [96-AGENT] using it's known ip address [10.14.130.61]...
07/20/2015 07:00:35.2996 INFO 6812:1 RollingLog : [Task: 7zip 9.20 - 7/20/2015 7:00:18 AM, TaskID: 1541, ProcID: 6812] : Discover: Successfully discovered machine: [96-AGENT]
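When the log is large, it can help to pull out only the discovery lines for one TaskID. The following is a minimal sketch, assuming the log lines follow the same layout as the two sample lines above; the function name discovery_summary and the regular expressions are illustrative and are not part of LDMS.

```python
import re

# Patterns inferred from the two sample log lines above (an assumption:
# other LDMS versions may word these lines differently).
ATTEMPT = re.compile(
    r"TaskID: (?P<task_id>\d+),.*?"
    r"Discovering machine: \[(?P<name>[^\]]+)\] "
    r"using it's known ip address \[(?P<ip>[^\]]+)\]"
)
SUCCESS = re.compile(
    r"TaskID: (?P<task_id>\d+),.*?"
    r"Successfully discovered machine: \[(?P<name>[^\]]+)\]"
)


def discovery_summary(lines, task_id):
    """Return (attempts, discovered) for one TaskID.

    attempts maps machine name -> IP address that was tried;
    discovered is the set of machine names logged as successful.
    """
    wanted = str(task_id)
    attempts, discovered = {}, set()
    for line in lines:
        match = ATTEMPT.search(line)
        if match and match.group("task_id") == wanted:
            attempts[match.group("name")] = match.group("ip")
            continue
        match = SUCCESS.search(line)
        if match and match.group("task_id") == wanted:
            discovered.add(match.group("name"))
    return attempts, discovered


# The two sample lines from this article.
SAMPLE = [
    "07/20/2015 07:00:33.2837 INFO 6812:1 RollingLog : [Task: 7zip 9.20 - 7/20/2015 7:00:18 AM, TaskID: 1541, ProcID: 6812] : Discover: Discovering machine: [96-AGENT] using it's known ip address [10.14.130.61]...",
    "07/20/2015 07:00:35.2996 INFO 6812:1 RollingLog : [Task: 7zip 9.20 - 7/20/2015 7:00:18 AM, TaskID: 1541, ProcID: 6812] : Discover: Successfully discovered machine: [96-AGENT]",
]

attempts, discovered = discovery_summary(SAMPLE, 1541)
print(attempts)    # {'96-AGENT': '10.14.130.61'}
print(discovered)  # {'96-AGENT'}
```

If attempts is empty for your TaskID, no discovery attempts were logged for that task.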

Yes

Continue to Were All Discovery Packets Sent?

No

If Discovery attempts were still not logged, verify that new entries are being logged at all to the PolicyTaskHandler.exe.log.

If PolicyTaskHandler.exe runs, it should log accordingly, and use the verbose option if enabled. If it is not doing this, it is likely that PolicyTaskHandler.exe is not running at all, which means the task will not move to 'Unable to discover agent'.

Verify the Landesk Scheduler Service is running.

Review other logs to identify if there are any errors that may indicate why PolicyTaskHandler was unable to run.

Example: TaskHandlerProxy.exe.log shows PolicyTaskHandler.exe couldn't be found.

07/24/2015 08:03:28.0657 INFO 8044:1 RollingLog : TaskHandlerProxy: C:\Program Files\LANDesk\ManagementSuite\PolicyTaskHandler.exe /TASKID=2554
07/24/2015 08:03:28.2991 INFO 8044:1 RollingLog : TaskHandlerProxy: run process exception - The system cannot find the file specified

Were All Discovery Packets Sent?

To verify the traffic we are looking at is for our test task specifically, do the following:

In the PolicyTaskHandler.exe.log, look for the first line that lists Discovering machine for our TaskID

Note: Our task ID is 1547

RollingLog : [Task: 7zip 9.20 - 7/24/2015 7:55:58 AM, TaskID: 1547, ProcID: 10500] : Discover: Discovering machine: [TABLET] using it's known ip address [10.0.0.6]...

In the Wireshark trace, filter for only discovery packets for the first IP address that belongs to our task.

Example: udp.port == 9595 && ip.addr == 10.0.0.6

The filtered trace should show the Source as the LDMS Core, and the Destination as the client IP address we got from the PolicyTaskHandler.exe.log

Example: LDMS Core - 10.14.130.58 Client - 10.0.0.6

Repeat this check against the 2nd and 3rd IP addresses as listed in PolicyTaskHandler.exe.log for the taskID.

Spot check a couple more from the list as well. Sometimes only a single packet is sent, to the first IP, and this is easier to identify by checking several addresses in the list.
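Typing the same display filter for several addresses is error-prone, so the filters can be generated instead. This is a minimal sketch that builds the filter string shown in the example above for each IP address; the helper name display_filter and the address list are illustrative.

```python
import ipaddress

DISCOVERY_PORT = 9595  # UDP port used by the filters in this article


def display_filter(client_ip, port=DISCOVERY_PORT):
    """Build the Wireshark display filter for one client IP address."""
    # Raises ValueError on a mistyped address before it reaches Wireshark.
    ipaddress.ip_address(client_ip)
    return f"udp.port == {port} && ip.addr == {client_ip}"


# Hypothetical list: use the addresses logged for your own TaskID.
for ip in ["10.0.0.6", "10.0.0.10", "10.0.0.17"]:
    print(display_filter(ip))
```

Each printed line can be pasted into the Wireshark display filter bar.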

All Discovery Packets Sent

If discovery packets were sent to each of the IP addresses checked, continue to: Did the Client Respond to Discovery?

Only Single Discovery Packet Sent

If no discovery packets were sent to any of the clients on the task, but PolicyTaskHandler.exe.log shows that we attempted to discover a device by IP, Windows may have not got

Single Discovery Packet

ARP Responses Received, No Discovery Packet Sent or Single Discovery Packet Sent

PolicyTaskHandler.exe.log shows many discovery attempts for a task
Wireshark shows a single Discovery Packet go out

-or-

PolicyTaskHandler.exe.log shows many discovery attempts for a task
Wireshark shows ARP Responses to local subnet ARP Requests, but no Discovery Packets go out

This is typically caused by 3rd party security software on the LDMS core stopping the UDP Traffic, typically due to its resemblance to a UDP Flood Attack. We have seen Symantec Endpoint Protection (SEP) behave in this manner.

To correct this issue:

Configure the security software to allow UDP Traffic out from LDMS

-or-

Remove the software from the LDMS Core

We have also seen instances where the discovery packet sent to a particular device loops or bounces between two or more nodes. Each bounce decrements the TTL value until it reaches zero. At this point a TTL error is sent back to the core and Windows shuts down further network communication for that task. This results in failure to discover all remaining devices in the task.

To correct this issue, the cause of the loop must be determined and resolved by the networking team.

Discovery Packets Started then Stopped

PolicyTaskHandler.exe.log shows attempts to discover devices, and Wireshark shows discovery packets at first, but they stop prematurely.

Each packet carries a defined value for how long it is allowed to move across the network. This is called the Time to Live (TTL), and by default a discovery packet's TTL is set to 128. Each time a packet passes through a routing hop on the network (for example, a router), the value is decremented by 1. So if it takes 3 hops to get from Core to Client, the TTL on the packet when the client gets it would be 128 - 3 = 125. A TTL of 128 is enough to get around the world, so it is plenty to get through your network.

If, however, the packet bounces around so much that its TTL reaches 0, an ICMP packet is sent to the original source of the packet indicating Time-to-live exceeded (Time to live exceeded in transit).

Note: This is most commonly caused by a routing loop in the network.
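The TTL arithmetic described above can be sketched as follows. This is only an illustration of the subtraction, using the default value of 128 stated in this article; the function name ttl_on_arrival is illustrative.

```python
DEFAULT_TTL = 128  # default discovery packet TTL stated in this article


def ttl_on_arrival(hops, initial_ttl=DEFAULT_TTL):
    """Return the TTL left after `hops` decrements.

    A result of 0 means the packet expired in transit, which is when a
    Time-to-live exceeded ICMP packet goes back to the sender.
    """
    return max(initial_ttl - hops, 0)


print(ttl_on_arrival(3))    # 125
print(ttl_on_arrival(200))  # 0
```

A packet caught in a routing loop keeps adding hops until the second case is reached.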

In order to send packets out, LDMS passes parameters to a Windows method indicating the type of packet and the destination IP address. The OS (Windows) responds indicating it received the request, and LDMS logs the discovery attempt to PolicyTaskHandler.exe.log.

It has been observed that when a TTL exceeded ICMP packet is returned to the Core, the operating system stops sending new UDP packets for the current PolicyTaskHandler process. LDMS continues to pass parameters to the OS asking Windows to send out packets, the method returns that it received the request successfully, and LDMS logs the attempt, but the OS does not send any further packets for the existing PolicyTaskHandler process.

Example: A task contains 6 IP addresses: 10.0.0.1, 10.0.0.2, 10.0.0.6, 10.0.0.10, 10.0.0.17, 10.0.0.50

Discovery packets are sent until a TTL Exceeded ICMP packet is returned to the core.

PolicyTaskHandler.exe.log indicates that we continued passing parameters to the OS to send out packets for us, despite no more packets showing in Wireshark.

Discover: Discovering machine: [TABLET] using it's known ip address [10.0.0.1]...
Discover: Discovering machine: [SQLTEMP] using it's known ip address [10.0.0.2]...
Discover: Discovering machine: [JANE] using it's known ip address [10.0.0.6]...
Discover: Discovering machine: [SAM] using it's known ip address [10.0.0.10]...
Discover: Discovering machine: [NICK] using it's known ip address [10.0.0.17]...
Discover: Discovering machine: [JOE] using it's known ip address [10.0.0.50]...
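To find the point where packets stopped, the addresses logged by PolicyTaskHandler.exe.log can be compared against the destinations that actually appear in the Wireshark trace. The sketch below assumes both lists have already been collected by hand; the captured list is hypothetical and only illustrates a trace that stopped after the third address.

```python
def first_unsent(logged_ips, captured_ips):
    """Return the first logged IP with no packet in the capture, or None."""
    seen = set(captured_ips)
    for ip in logged_ips:
        if ip not in seen:
            return ip
    return None


# Addresses from the log excerpt above, in the order they were logged.
logged = ["10.0.0.1", "10.0.0.2", "10.0.0.6", "10.0.0.10", "10.0.0.17", "10.0.0.50"]

# Hypothetical: destinations that show discovery packets in the trace.
captured = ["10.0.0.1", "10.0.0.2", "10.0.0.6"]

print(first_unsent(logged, captured))  # 10.0.0.10
```

The address just before the one returned is a reasonable place to start looking for the TTL exceeded packet.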

To correct this issue -

This is typically caused by routing loops on the network. Running Tracert against the IP may provide more information regarding where potential loops exist.

The preferred fix is to correct the routing loop. The packet has a high enough Time-to-Live value to get around the world, so it should not bounce around your network so many times that the TTL is exceeded.
Alternatively, if the IP address triggering the TTL exceeded packet belongs to a device on a scheduled task, remove the device from that task so that no discovery packets are sent to that address, and add it to a task that is available via Policy. Because a policy does not send discovery packets, this avoids the problem, and the client can still get the task when it checks in.

Note: This issue has been seen to occur after some TTL Exceeded packets, but not necessarily all. If Discovery packets stop being sent after a TTL Exceeded packet is received on the core, the address involved should be reviewed for potential routing loops and treated accordingly.

Discovery Packets Not Sent

PolicyTaskHandler.exe.log shows attempts to discover devices, but no discovery packets (shown with udp.port == 9595) are going to them.

No ARP Response

When determining who to send the discovery packet to, Windows will check if the client IP address is in the same subnet as the sender. If the Core and client are in the same subnet, an ARP Request will go out, asking who currently has the target IP Address. If any machines on the subnet receive the ARP Request and contain information on who has the IP address, they respond with the client information so Windows can route the packet to the client.

Example: The Core (10.14.130.58) is trying to discover Client (10.14.130.61). Because they are in the same subnet, an ARP Request is issued asking who has the 10.14.130.61 address. An ARP Response is sent back from a machine on the subnet indicating which device has the IP, and the Discovery packet is issued.

If there is no ARP Response to the request, Windows will not know where to send the discovery packet, and it will not be sent.

Example: The Core (10.14.130.58) is trying to discover Client (10.14.130.250). Because they are in the same subnet, an ARP Request is issued asking who has the 10.14.130.250 address. No machines send an ARP Response to indicate who has the IP, so no discovery packet is sent.
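Whether an ARP Request should be expected depends on whether the Core and client are in the same subnet. This can be checked with a short sketch using Python's standard ipaddress module. The subnet mask 255.255.255.0 is an assumption for illustration, since this article does not state the mask in use; substitute the Core's actual mask.

```python
import ipaddress


def same_subnet(core_ip, client_ip, netmask):
    """Return True if client_ip falls inside the Core's subnet."""
    # strict=False lets us pass a host address rather than the network address.
    core_net = ipaddress.ip_network(f"{core_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(client_ip) in core_net


MASK = "255.255.255.0"  # assumed mask, not taken from this article

print(same_subnet("10.14.130.58", "10.14.130.61", MASK))  # True
print(same_subnet("10.14.130.58", "10.0.0.6", MASK))      # False
```

When the result is True, look for an ARP Request and Response for the client address in the trace before expecting a discovery packet.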

Did the Client Respond to Discovery?

After the Core sends a discovery packet to the client, it looks for a response from the client indicating it is online and available.

In the Wireshark trace, apply a filter to show only discovery packets to and from the client's IP address.

Example: udp.port == 9595 && ip.addr == 10.14.130.61

There should be a discovery packet going from the LDMS Core (Source) to the Client (Destination). This is reflected in the first line of the screenshot.
There should be a response from the Client (Source) to the LDMS Core (Destination).

Yes - Client responded

Continue to Client responds but is still marked 'Failed to Discover'

No - Client did not respond

Continue to Does the Client receive the discovery packet?

Does the Client receive the discovery packet?

Verify the client is actually online. If it is not online, it cannot receive the discovery packet.

If the device is online, install Wireshark on the client and trace the discovery attempt.

Example: udp.port == 9595 && ip.addr == 10.14.130.61

{steps on manually doing a single discovery)

Note: Core IP = 10.14.130.58, Client IP = 10.14.130.61

No discovery packet received

Discovery Packet received, but no response returned to Core

Yes - Client received discovery packet

If Wireshark shows that the Client did see the discovery packet, but did not respond to it, this could indicate an issue with the Agent. Things to try:

Verify the LANDesk(R) Management Agent is started.
Reboot the client

If the agent services are all running and the client still does not return a response to the discovery packet, check Client side logs under C:\ProgramData\Landesk\Logs for any errors.

No - Client did not receive discovery packet

If Wireshark does not show that a discovery packet was received, this is an indication that the packet is not routing to the client correctly. This could be caused by:

Firewall blocking packets/ports
Network routing packets to wrong machines

Correct this issue internally to allow the client to receive the packet.

Client responds but is still marked 'Failed to Discover'

In LDMS 9.6, discovery attempts now include client validation. This helps to verify that the machine responding to the Discovery attempt is the machine we are intending to reach.

This is done by parsing the client's Discovery response, which contains inventory information about the device that is responding.

You can simulate the discovery and view the response with the following command: pds2dis.exe ping {client ip}

Example: "C:\Program Files\LANDesk\ManagementSuite\pds2dis.exe" ping 10.14.130.61

Response:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<discover>
<response><address>10.14.130.61</address><XHSH>3D820E0A</XHSH><APID>ping</APID><MAID>FE495D5D-404A-234C-9CAB-697F258E8DB7</MAID><FQCN>96-Agent.evdomain.local</FQCN><MACA>000C29C44B0A</MACA><MASK>255.255.255.00.0.0.0</MASK><AGRP>EVDOMAIN</AGRP><OSFM index="7">Win32</OSFM><OSNM index="4">WinNT</OSNM><OSVR>060223f0</OSVR><CERT>;55e3c398</CERT><PONG size="8">Q0JBOAAAAAA=</PONG></response>
</discover>
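To read the response more easily, the XML can be split into its individual fields. This is a minimal sketch using Python's standard xml.etree.ElementTree on a shortened copy of the sample response above. It only lists tag names and values and does not interpret what each tag means; the function name response_fields is illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample response shown above.
RESPONSE = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<discover>
<response><address>10.14.130.61</address><XHSH>3D820E0A</XHSH><MAID>FE495D5D-404A-234C-9CAB-697F258E8DB7</MAID><FQCN>96-Agent.evdomain.local</FQCN><MACA>000C29C44B0A</MACA></response>
</discover>"""


def response_fields(xml_bytes):
    """Return {tag: text} for each element inside <response>."""
    root = ET.fromstring(xml_bytes)
    response = root.find("response")
    if response is None:
        return {}
    return {child.tag: (child.text or "") for child in response}


fields = response_fields(RESPONSE)
for tag, value in fields.items():
    print(f"{tag}: {value}")
```

The resulting dictionary makes it straightforward to compare individual values with what is stored in Inventory.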

If the response contains information that does not match the client's Inventory information in the LDMS core, the responder will be considered the 'wrong machine' and the device marked as 'failed to discover'.

This can be viewed within the C:\ProgramData\Landesk\Log\PolicyTaskHandler.log

07/20/2015 07:00:33.2837 PDS2CallbackFunc: not the same deviceId - ip=3d820e0a

Note: The IP address is listed here as a hexadecimal representation of the IPv4 address. Visit this article on how to convert the values to dotted IPv4 addresses.

How to Convert IP from Hex to Dotted Format
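For a quick conversion, the hex value can also be decoded with a few lines of Python. This sketch assumes the bytes are stored in reverse order, which is consistent with the sample value 3d820e0a and the client address 10.14.130.61 used in this article; the function name hex_to_dotted is illustrative.

```python
import ipaddress


def hex_to_dotted(hex_ip):
    """Convert an 8-digit hex IP from the log into dotted IPv4 form.

    Assumes the bytes appear in reverse order, as in the sample above.
    """
    raw = bytes.fromhex(hex_ip)
    if len(raw) != 4:
        raise ValueError("expected exactly 8 hex digits")
    return str(ipaddress.IPv4Address(raw[::-1]))


print(hex_to_dotted("3d820e0a"))  # 10.14.130.61
```

If the result does not look like an address on your network, the byte order assumption may not hold for that log entry.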

Check Client with Real-time Discovery

A device that PolicyTaskHandler.exe.log shows as failed to discover, but that Wireshark shows as having responded to the discovery, likely has Inventory data that conflicts with what was returned in the discovery response.

This can be checked quickly by:

Right click the client in Inventory and choose Scheduled Tasks and Diagnostics

In the Scheduled tasks and diagnostics window click the Real-time Discovery button (globe in upper left corner)

The Discovery Information window will open. At this point, the LDMS Core is sending Discovery packets to the client, waiting for returned xml data.
- All fields stay at 'No data' - This indicates that we never got a response from the Client. This machine would be marked 'Failed to Discover'.
- Discovery response XML populates, and all fields remain same color - This indicates that the Client responded, and appears to be the Client we were looking for. This machine would be a successful Discovery.
- Discovery response XML populates, and one or more fields are red - This indicates that the Client responded, but the response contained data that conflicts with what we have in Inventory. This machine would be marked as Failed to Discover, and not be given the task.
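The three outcomes above can be sketched as a simple comparison between the fields in a discovery response and the values held in Inventory. This is only an illustration of the logic: which fields LDMS actually compares is not stated in this article, and the field names and inventory values below are hypothetical.

```python
def classify_discovery(response, inventory):
    """Return (outcome, mismatched_fields) for one discovery attempt.

    response and inventory are dicts of field name -> value. Only fields
    present in both are compared.
    """
    if not response:
        return "no response", []
    mismatched = sorted(
        name for name in response
        if name in inventory and response[name] != inventory[name]
    )
    return ("mismatch" if mismatched else "match"), mismatched


# Values taken from the sample response in this article.
response = {
    "MAID": "FE495D5D-404A-234C-9CAB-697F258E8DB7",
    "MACA": "000C29C44B0A",
}
# Hypothetical inventory record with a stale MAC address.
inventory = {
    "MAID": "FE495D5D-404A-234C-9CAB-697F258E8DB7",
    "MACA": "000C29AAAAAA",
}

print(classify_discovery(response, inventory))  # ('mismatch', ['MACA'])
print(classify_discovery({}, inventory))        # ('no response', [])
```

The fields returned in the mismatch list correspond to the ones that would need correcting in Inventory.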

If Real-time Discovery shows a mismatch, correct the inventory issues at hand in order to Discover the affected client.