-- MarcosASeco - 14 Mar 2007
| *Node* | *Failure* | *Date of Failure* | *Action Taken* | *Date of Repair* |
| nodo077.inv.usc.es | faulty disk | 01/02/2011 | Disk swapped with the one from nodo025 | 03/02/2011 |
| nodo069.inv.usc.es | faulty disk | 01/02/2011 | Disk swapped with the one from nodo026 | 03/02/2011 |
| nodo065.inv.usc.es | faulty motherboard and power supply | 01/02/2011 | motherboard and power supply swapped with those from nodo025 | 03/02/2011 |
| lhcb065.usc.cesga.es | On 20/10/2008 the memory of lhcb079 was exchanged with the memory of this machine; after about a week and several 'Correctable ECC' errors the machine would not reboot because it could not find any memory | 01/11/2008 | All problems in the Caton machines were related to the power supply. The power supplies were changed on all machines between August and September 2008 | 01/04/2009 |
| lhcb074.usc.cesga.es | unknown | 02/06/2008 | problem solved after unplugging and re-plugging all the memory | 13/06/2008 |
| lhcb023.usc.cesga.es | faulty disk | 05/12/2007 | disk replaced | - |
| lhcb030.usc.cesga.es | faulty disk | 05/12/2007 | disk replaced | - |
| lhcb085.usc.cesga.es | faulty power supply | 06/05/2008 | power supply exchanged with the one in lhcb070 | 20/05/2008 |
| nodo109.inv.usc.es | faulty disk | 07/02/2011 | Disk replaced | 10/02/2011 |
| lhcb025.usc.cesga.es | faulty disk | 07/07/2007 | disk replaced | - |
| lhcb069.usc.cesga.es | faulty motherboard | 08/01/2008 | motherboard replaced | 12/02/2008 |
| lhcb063.usc.cesga.es | SCSI timeout, lost disk access | 08/03/2007 | system restarted | - |
| lhcb074.usc.cesga.es | faulty power supply | 08/04/2008 | power supply replaced | 25/04/2008 |
| lhcb066.usc.cesga.es | Periodically one memory bank was disabled because too many 'Correctable ECC' errors occurred; after a reboot things returned to normal. The cause was a faulty DIMM, which was identified by moving half of the modules to another machine (lhcb064), where the failures then appeared. The failures were reproducible by running memtest long enough | 08/04/2008 | Faulty DIMM replaced | 03/11/2008 |
| lhcb062.usc.cesga.es | SCSI timeout, lost disk access | 10/07/2007 | system restarted | - |
| lhcb036.usc.cesga.es | faulty disk | 10/08/2007 | disk replaced | - |
| lhcb021.usc.cesga.es | faulty disk | 10/10/2007 | disk replaced | - |
| lhcb022.usc.cesga.es | faulty disk | 10/10/2007 | disk replaced | - |
| lhcb031.usc.cesga.es | faulty disk | 10/10/2007 | disk replaced | - |
| lhcb038.usc.cesga.es | faulty disk | 10/10/2007 | disk replaced | - |
| lhcb018.usc.cesga.es | kernel panic | 12/03/2006 | system restarted | - |
| lhcb036.usc.cesga.es | probably a faulty power supply | 12/03/2008 | in progress (now working) | - |
| lhcb070.usc.cesga.es | faulty sensor fan and faulty power supply (coming from lhcb085 on 20/05/2008) | 13/05/2008 | motherboard and power supply replaced | 02/07/2008 |
| lhcb079.usc.cesga.es | after too many 'Correctable ECC' errors the machine would not reboot because it could not find any memory | 15/10/2008 | after unplugging and re-plugging all the memory the problem disappeared | 16/10/2008 |
| lhcb027.usc.cesga.es | faulty disk | 17/04/2009 | Disk replaced | 20/04/2009 |
| lhcb079.usc.cesga.es | faulty motherboard and power supply | 18/03/2008 | motherboard and power supply replaced | 23/04/2008 |
| lhcb080.usc.cesga.es | faulty motherboard and power supply | 18/03/2008 | motherboard and power supply replaced | 23/04/2008 |
| lhcb086.usc.cesga.es | faulty motherboard and power supply | 18/03/2008 | motherboard and power supply replaced | 23/04/2008 |
| lhcb029.usc.cesga.es | faulty disk | 19/09/2007 | disk replaced | - |
| lhcb041.usc.cesga.es | faulty disk | 19/09/2007 | disk replaced | - |
| lhcb061.usc.cesga.es | faulty memory: DIMM4 | 20/02/2007 | module replaced | - |
| lhcb079.usc.cesga.es | faulty motherboard | 20/02/2008 | motherboard and power supply replaced | 05/03/2008 |
| lhcb020.usc.cesga.es | faulty disk | 27/07/2007 | disk replaced | - |
| lhcb054.usc.cesga.es | faulty DIMM | 30/06/2008 | DIMM replaced | 28/07/2008 |
| lhcb052.usc.cesga.es | python process consumed all available memory | 30/07/2007 | system restarted | - |
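
The dates in this log are written as day/month/year, and a repair date of "-" means none was recorded. As a minimal sketch (not part of the original log), the time each node spent out of service could be computed as below. The function name `days_out_of_service` is hypothetical, and the sample rows are copied from the entries above.

```python
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"  # the log uses day/month/year


def days_out_of_service(failed, repaired):
    """Return the number of days between failure and repair.

    Returns None when no repair date was recorded ("-").
    """
    if repaired.strip() == "-":
        return None
    start = datetime.strptime(failed, DATE_FORMAT)
    end = datetime.strptime(repaired, DATE_FORMAT)
    return (end - start).days


# A few entries copied from the log: (node, date of failure, date of repair)
rows = [
    ("nodo109.inv.usc.es", "07/02/2011", "10/02/2011"),
    ("lhcb085.usc.cesga.es", "06/05/2008", "20/05/2008"),
    ("lhcb069.usc.cesga.es", "08/01/2008", "12/02/2008"),
    ("lhcb023.usc.cesga.es", "05/12/2007", "-"),
]

for node, failed, repaired in rows:
    print(node, days_out_of_service(failed, repaired))
```

Parsing the dates explicitly avoids the ordering problem of comparing them as text, where "10/02/2011" would sort before "20/05/2008".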

Revision: r1.23 - 10 Feb 2011 - 17:32 - MarcosASeco