What can you do if you’ve completed maintenance on a component (a memory DIMM, as in the example below) but keep receiving failure messages?
First, try clearing the error messages after completing the maintenance. Then check whether the error threshold is reached again. If it is, you may need to replace the component.
How do you clear them? Easy: log in to the ILOM (Integrated Lights Out Manager) command-line interface over ssh.
ssh [email protected]
-> show /SYS/MB/P0/D3
Expected:
[...]
fault_state = Faulted
[..]
-> set /SYS/MB/P0/D3 clear_fault_action=true
Are you sure you want to clear /SYS/MB/P0/D3 (y/n)? y
-> show /SYS/MB/P0/D3
[Expected]
/SYS/MB/P0/D3
   Targets:
      PRSNT
      SERVICE
   Properties:
      type = DIMM
      ipmi_name = MB/P0/D3
      fru_name = 16384MB DDR4 SDRAM DIMM
      fru_manufacturer = Samsung
      fru_part_number = %
      fru_rev_level = 01
      fru_serial_number = %
      fault_state = OK
      clear_fault_action = (none)
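If you capture the `show` output to a file or a variable (for example, from a monitoring script), a small parser can tell you whether the fault has come back. The following is only a rough sketch in Python: the function names `parse_ilom_properties` and `needs_clearing` are made up for illustration, it assumes the properties appear as `key = value` lines as in the session above, and it does not connect to ILOM itself.

```python
def parse_ilom_properties(show_output: str) -> dict[str, str]:
    """Collect 'key = value' lines from captured ILOM `show` output into a dict."""
    props = {}
    for line in show_output.splitlines():
        # Property lines use ' = ' with spaces; command echoes such as
        # 'clear_fault_action=true' have no spaces and are skipped.
        key, sep, value = line.partition(" = ")
        if sep:
            props[key.strip()] = value.strip()
    return props


def needs_clearing(show_output: str) -> bool:
    """Return True when the target reports fault_state = Faulted."""
    return parse_ilom_properties(show_output).get("fault_state") == "Faulted"


# Sample text copied from the 'after clearing' output in this post.
sample_after_clear = """\
/SYS/MB/P0/D3
   Properties:
      type = DIMM
      fault_state = OK
      clear_fault_action = (none)
"""

print(needs_clearing(sample_after_clear))  # False
```

If `needs_clearing` returns True again some time after you cleared the fault, that matches the "threshold reached again" case described earlier, and the component is a candidate for replacement.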
I hope this helps if you’re facing this issue.
If you have any questions or thoughts, please leave them in the comments.