Thursday, August 3, 2017

SCCM SUP WSUS Pool keeps stopping or the server is unresponsive

SCCM SUP WSUS Pool keeps stopping or the server is unresponsive

Scenario: Our WSUS/SUP had become unresponsive and the decision to reinstall the server role was made. After the Site server had been reinstalled  I became aware that Windows Defender updates were failing to update (3 days old) and even though the updates were sync'd, downloaded, and deployed in SCCM the client was still unable to receive them.


Client Log analysis:

ScanAgent.log

ScanJob({999C9FFA-A463-4BE8-8771-67EE96D4140B}): CScanJob::OnScanComplete -Scan Failed with Error=0x80240440
ScanJob({999C9FFA-A463-4BE8-8771-67EE96D4140B}): CScanJobManager::OnScanComplete- failed at CScanJob::OnScanComplete with error=0x80240440



Update Deployment.log

Job error (0x80240440) received for assignment ({bf7a48e6-d220-4070-bb9b-ecc239107584}) action        UpdatesDeploymentAgent       
Updates will not be made available       

WUAHandler.log

Async searching of updates using WUAgent started.       
Async searching completed.       
OnSearchComplete - Failed to end search job. Error = 0x8024401c.        
Scan failed with error = 0x8024401c.        
Its a WSUS Update Source type ({3AAB6A76-CE2D-4E8A-9F11-741AE69677A2}), adding it.        
OS Version is 6.3.9600   
Existing WUA Managed server was already set (http://CMSUP.domain.co.uk:8530), skipping Group Policy registration.        

Added Update Source ({3AAB6A76-CE2D-4E8A-9F11-741AE69677A2}) of content type: 2        

WindowsUpdate.log

Report WARNING: CSerializationHelper:: InitSerialize failed : 0x80070002
AU        WARNING: Failed to get Wu Exemption info from NLM, assuming not exempt, error = 0x80070032
WS        WARNING: Nws Failure: errorCode=0x803d0006
WS        WARNING: Original error code: 0x80072ee2
WS        WARNING: There was an error communicating with the endpoint at 'http://CMSUP.domain.co.uk:8530/ClientWebService/client.asmx'.
WS        WARNING: There was an error receiving the HTTP reply.
WS        WARNING: The operation did not complete within the time allotted.
WS        WARNING: The operation timed out


Analysis Results:
Since I removed the role and put it back it was like the role had been installed for the first time.  Every client within SCCM (9k+) would need to do a Full Software Update & scan cycle. This would generate a heavy load on the ISS pool on the SUP server.  It is very important to configure the pool correctly otherwise the server will stop responding to clients as the client will receive "Service Unavailable" responses similar to DOS.

WSUS Pool Config
Queue Length: 25000
Limit (Percent) 70
Limit Action Throttle
Maximum workers (0 Numa) (1 Default) (2 if you have the server resource, not NUMA)
"Service Unavailable" Response TcpLevel
Failure Interval (Minutes) 30
Maximum Failures 60
Private Memory Limit (KB) 0


Conclusion
 Once we had the SUP role reinstalled synchronizing with SCCM (see WCM.log, Wsyncmgr.log) as well as the correct WSUS pool settings, we notice the server was still spiking CPU with ISS workers taking the full 70% limit without breaks.
Upon reviewing the Client Policy (within SCCM Client Settings) for the estate the "Software Update Scan schedule" was set to 1 day (not the recommended 7 days). This had the affect of overloading the WSUS server with 503 errors in the IIS log "service unavailable". As the schedule was every day the server could not get through the backlog before the whole process began again.
After setting the scan schedule to 7 days, and as clients were checking for policy every hour we notice a steady decline in CPU activity within the hour and all clients were able to complete there scan Software Update & scan cycle  and all clients were able to download Software updates.




No comments:

Post a Comment