Scope agreement for your reported issue.
Issue Definition: Server "xxxxxxxxx" takes a long time to restart, while a
shutdown followed by a cold start is normal. When the restart is initiated from
the Start menu, the complete reboot cycle takes more than 20 minutes; however,
if we shut down the machine and perform a cold start, the machine comes up in
no more than 5 minutes. This is a domain controller, and the issue started
after the server was updated with the latest hotfixes/Windows updates. The
reboot gets stuck while the machine is shutting down, and we see a black
screen on the iLO console during that time.
Environment:
OS Name: Microsoft® Windows Server® 2008 Standard
Version: 6.0.6002 Service Pack 2 Build 6002
System Name: xxxxxxxxx
System Manufacturer: HP
System Model: ProLiant DL360 G7
System Type: x64-based PC
Time Zone: Eastern Standard Time
Installed Physical Memory (RAM): 12.0 GB
Total Physical Memory: 12.0 GB
Available Physical Memory: 10.3 GB
Total Virtual Memory: 13.6 GB
Available Virtual Memory: 12.2 GB
Page File Space: 2.00 GB
Page File: C:\pagefile.sys
Scope Agreement: The case will be marked as resolved and ready to close once
we have successfully resolved the issue "Server "xxxxxxx" takes long time to
restart while shutting down and reboot is normal", or if we conclude that this
is by-design behavior. Any further issues that require dedicated
troubleshooting will be treated as a new case and handled by the concerned
team. If the issue is found to be caused by a third-party application, we will
provide best-effort support and you will have to engage the application vendor.
Please note: Microsoft tickets are server- and issue-specific. One ticket can
be used for only one server and one issue. If the problem is found to be due
to third-party code, we will provide information to substantiate this.
We will now begin working together to resolve your issue. If you do not agree
with the scope defined above, or would like to amend it, please let me know as
soon as possible. If you have any questions or concerns, please don't hesitate
to contact me.
Best Regards,
xxxxxxx
I just sent the action plan to Marcus.
Here it is:
Hey xxxxxxx,
I am the current owner of your case, where rebooting takes over an hour but
shutdown and start takes 5 minutes.
From the previous engineer’s notes, the action plan was to test with Symantec
removed.
Have you had a chance to do that yet?
If so, are you still having the issue?
If so, the next action plan would be to:
1. Set up the server for a complete memory dump.
2. From diagnostics, it appears your machine is already set up for a complete
dump using NMI.
3. The service 'HP ProLiant System Shutdown Service' is configured to start
automatically on this machine. This service may cause a memory dump to be
corrupted or not created at all.
4. Please stop and disable 'HP ProLiant System Shutdown Service'.
5. After configuring the server, shut down, start, then perform a restart to
reproduce the issue.
6. When the server hangs on shutdown, crash it using the NMI switch. This may
or may not work, depending on where the machine is actually hanging.
7. If it does not work, we can then proceed with getting a reboot xperf trace.
I uploaded 2k8r2-x64-xperf.zip to your workspace.
Please download and extract it to the server having the issue.
Open an admin command prompt.
From the extracted directory, run xperf.mgr.bat /?
If prompted to disablepagingexecutive, type yes.
Run xperf.mgr.bat rebootcycle
Wait for the machine to reboot.
Log on.
Xperf will now compile files into c:\*.etl
From an admin command prompt, run xperf.mgr.bat clean
Zip and upload c:\*.etl
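The last step above (zip the traces before uploading) can be scripted. A minimal sketch in Python, assuming Python is available on the machine where the files are bundled; the function name `zip_etl_traces` and the archive name in the usage note are placeholders and are not part of the xperf package:

```python
import glob
import os
import zipfile


def zip_etl_traces(source_dir, archive_path):
    """Bundle every *.etl file found directly in source_dir into one zip.

    Returns the sorted list of file names that were added.
    """
    pattern = os.path.join(source_dir, "*.etl")
    trace_files = sorted(glob.glob(pattern))
    if not trace_files:
        raise FileNotFoundError("no .etl files found in " + source_dir)

    # ZIP_DEFLATED compresses the traces. ZipFile enables ZIP64 by default,
    # so archives larger than 4 GB are allowed.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in trace_files:
            # Store only the file name so the archive has no directory prefix.
            archive.write(path, arcname=os.path.basename(path))
    return [os.path.basename(path) for path in trace_files]
```

On the server this would be called as `zip_etl_traces("C:\\", "C:\\xperf-traces.zip")`, since the traces are written to c:\. The function raises an error instead of producing an empty archive, so a missing trace is noticed before the upload.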
####################
CASE ARCHIVED
Since your downtime window is more than a couple of weeks away, we typically
request archiving the case until you have new information.
When a case is archived, you can always contact me with the case number for
the same issue and I will reopen it. If I am not available, you can also call
back into the queue.
You have next steps of:
- Gathering a complete dump
- Capturing an xperf rebootcycle trace, either if the above fails or, since
you have a downtime window, alongside it. It may be best to do both, since it
takes so long to get a window approved.
We were unable to create a crash dump using the NMI switch. Hence, we
generated an ETL (event trace log) through the xperf tool.
Results from xperf were inconclusive.
The only item of interest is Cpqnicmgmt.exe, which was the only
process/service that stood out in the services view.
When would be a good time to have a call and discuss next steps?
Thanks,
##########
Our next step would be to determine why NMI is not working.
It is very rare for NMI not to work.
There may be some configuration changes that we can make to get NMI to work.
Other than that, we could run some isolation tests, such as testing in safe
mode or disabling processes/services and retesting.
Lastly, we may have to hook up a debugger to the server to try to capture
additional information when it hangs.
If this is just a single server with this issue, it may be best to rebuild.
Please let me know if you have availability during my working hours, or how
you would like to proceed.
Thanks,
xxxxxxx
After discussion with the case contact and reviewing what has been done, here
are the current limitations we are hitting:
For slow reboot issues, we have only a couple of tools that can be used to
diagnose the issue.
ISOLATE:
We would first try to isolate the issue by looking at recent changes on the
server, such as hotfixes, application installs or updates, and drivers.
We would also test to see if the same issue occurs while booting into safe
mode.
TROUBLESHOOT:
The primary tool would be capturing a complete memory dump of the server
while it is in a hung state.
The secondary tool would be to capture an XPerf trace of the server reboot.
For this case, we tried to isolate the issue but found no recent changes, and
booting into safe mode still had the issue. We then tried to capture a
complete memory dump using NMI but were unable to capture one, possibly
because of where the hang occurs in the shutdown process. We were then able to
capture an Xperf trace, but it did not show anything conclusive, as the trace
stopped before the hang occurred.
At this point, we have exhausted all of the standard troubleshooting
techniques. Since this is a single-server issue, the server is a domain
controller, and the server does not have any special configuration, the best
action would be to rebuild the server.
Thanks,
xxxxxx xxxxx
Windows Platforms Core – Reliability Team – Support Escalation Engineer
Office: xxxxxxx
Working Hours: Mon – Fri 8AM - 5PM EST
###############################################
Microsoft recommended rebuilding this server --- :-(
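A side note on the failed NMI dump: the thread never mentions page file size. As far as I know, a complete memory dump needs a page file on the boot volume at least as large as physical RAM plus a small margin (Microsoft's guidance for this OS generation cited 1 MB), and the Environment block lists 12.0 GB of RAM against 2.00 GB of page file space. The sketch below is only that arithmetic. The function name and the `margin_mb` default are my assumptions, and the Page File Space figure may not reflect the configured maximum, so this is something to check and not a confirmed cause.

```python
def pagefile_fits_complete_dump(ram_gb, pagefile_gb, margin_mb=1):
    """Return True if the page file could hold a complete memory dump.

    margin_mb is an assumed safety margin on top of physical RAM; adjust it
    to whatever the guidance for your OS version says.
    """
    required_mb = ram_gb * 1024 + margin_mb
    return pagefile_gb * 1024 >= required_mb


# Values taken from the Environment block in the scope email.
print(pagefile_fits_complete_dump(ram_gb=12.0, pagefile_gb=2.00))  # prints False
```

If the configured page file really was that small, enlarging it before the next NMI attempt would be worth trying on any other server that shows the same symptom.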