Difference Topic HardwareFailures (r1.23 - 10 Feb 2011 - MarcosASeco)

META TOPICPARENT HardAtUSC
-- MarcosASeco - 14 Mar 2007
Node Failure Date of Failure Action Taken Date of Repair
Line: 33 to 33

lhcb066.usc.cesga.es Periodically one bank of memory was disabled because too many 'Correctable ECC' errors occurred; after a reboot things returned to normal. The cause of the failures was a faulty DIMM. The faulty DIMM was identified after moving half of the modules to another machine (lhcb064), where the failures then appeared. These failures were reproducible by running memtest long enough 08/04/2008 Faulty DIMM replaced 03/11/2008
lhcb065.usc.cesga.es On 20/10/2008 the memory of lhcb079 was exchanged with the memory of this machine; after around a week and several 'Correctable ECC' errors, the machine would not reboot because it was unable to find any memory 01/11/2008 All problems in the Caton machines were related to the power supply. The power supplies were changed on all machines between August and September 2008 01/04/2009
lhcb027.usc.cesga.es faulty disk 17/04/2009 Disk replaced 20/04/2009
Added:
>
>
nodo077.inv.usc.es faulty disk 01/02/2011 Disk swapped with the one from nodo025 03/02/2011
nodo069.inv.usc.es faulty disk 01/02/2011 Disk swapped with the one from nodo026 03/02/2011
nodo065.inv.usc.es faulty motherboard and power supply 01/02/2011 motherboard and power supply swapped with those from nodo025 03/02/2011
nodo109.inv.usc.es faulty disk 07/02/2011 Disk replaced 10/02/2011

Difference Topic HardwareFailures (r1.22 - 15 Jun 2009 - MarcosASeco)

META TOPICPARENT HardAtUSC
-- MarcosASeco - 14 Mar 2007
Node Failure Date of Failure Action Taken Date of Repair
Line: 32 to 32

lhcb079.usc.cesga.es after too many 'Correctable ECC' errors the machine would not reboot because it was unable to find any memory 15/10/2008 after unplugging and plugging in all the memory again, the problem disappeared 16/10/2008
lhcb066.usc.cesga.es Periodically one bank of memory was disabled because too many 'Correctable ECC' errors occurred; after a reboot things returned to normal. The cause of the failures was a faulty DIMM. The faulty DIMM was identified after moving half of the modules to another machine (lhcb064), where the failures then appeared. These failures were reproducible by running memtest long enough 08/04/2008 Faulty DIMM replaced 03/11/2008
lhcb065.usc.cesga.es On 20/10/2008 the memory of lhcb079 was exchanged with the memory of this machine; after around a week and several 'Correctable ECC' errors, the machine would not reboot because it was unable to find any memory 01/11/2008 All problems in the Caton machines were related to the power supply. The power supplies were changed on all machines between August and September 2008 01/04/2009
Changed:
<
<
lhcb027.usc.cesga.es faulty disk 17/04/2009    
>
>
lhcb027.usc.cesga.es faulty disk 17/04/2009 Disk replaced 20/04/2009

Difference Topic HardwareFailures (r1.21 - 20 Apr 2009 - MarcosASeco)

META TOPICPARENT HardAtUSC
-- MarcosASeco - 14 Mar 2007
Node Failure Date of Failure Action Taken Date of Repair
Line: 32 to 32

lhcb079.usc.cesga.es after too many 'Correctable ECC' errors the machine would not reboot because it was unable to find any memory 15/10/2008 after unplugging and plugging in all the memory again, the problem disappeared 16/10/2008
lhcb066.usc.cesga.es Periodically one bank of memory was disabled because too many 'Correctable ECC' errors occurred; after a reboot things returned to normal. The cause of the failures was a faulty DIMM. The faulty DIMM was identified after moving half of the modules to another machine (lhcb064), where the failures then appeared. These failures were reproducible by running memtest long enough 08/04/2008 Faulty DIMM replaced 03/11/2008
lhcb065.usc.cesga.es On 20/10/2008 the memory of lhcb079 was exchanged with the memory of this machine; after around a week and several 'Correctable ECC' errors, the machine would not reboot because it was unable to find any memory 01/11/2008 All problems in the Caton machines were related to the power supply. The power supplies were changed on all machines between August and September 2008 01/04/2009
Added:
>
>
lhcb027.usc.cesga.es faulty disk 17/04/2009    

Difference Topic HardwareFailures (r1.20 - 17 Apr 2009 - MarcosASeco)

META TOPICPARENT HardAtUSC
-- MarcosASeco - 14 Mar 2007
Node Failure Date of Failure Action Taken Date of Repair
Line: 31 to 31

lhcb054.usc.cesga.es faulty DIMM 30/06/2008 DIMM replaced 28/07/2008
lhcb079.usc.cesga.es after too many 'Correctable ECC' errors the machine would not reboot because it was unable to find any memory 15/10/2008 after unplugging and plugging in all the memory again, the problem disappeared 16/10/2008
lhcb066.usc.cesga.es Periodically one bank of memory was disabled because too many 'Correctable ECC' errors occurred; after a reboot things returned to normal. The cause of the failures was a faulty DIMM. The faulty DIMM was identified after moving half of the modules to another machine (lhcb064), where the failures then appeared. These failures were reproducible by running memtest long enough 08/04/2008 Faulty DIMM replaced 03/11/2008
Changed:
<
<
lhcb065.usc.cesga.es On 20/10/2008 the memory of lhcb079 was exchanged with the memory of this machine; after around a week and several 'Correctable ECC' errors, the machine would not reboot because it was unable to find any memory 01/11/2008
>
>
lhcb065.usc.cesga.es On 20/10/2008 the memory of lhcb079 was exchanged with the memory of this machine; after around a week and several 'Correctable ECC' errors, the machine would not reboot because it was unable to find any memory 01/11/2008 All problems in the Caton machines were related to the power supply. The power supplies were changed on all machines between August and September 2008 01/04/2009

Difference Topic HardwareFailures (r1.19 - 05 Nov 2008 - MarcosASeco)

META TOPICPARENT HardAtUSC
-- MarcosASeco - 14 Mar 2007
Node Failure Date of Failure Action Taken Date of Repair
Line: 26 to 26

lhcb086.usc.cesga.es faulty motherboard and power supply 18/03/2008 motherboard and power supply replaced 23/04/2008
lhcb074.usc.cesga.es faulty power supply 08/04/2008 power supply replaced 25/04/2008
lhcb085.usc.cesga.es faulty power supply 06/05/2008 power supply exchanged with the one in lhcb070 20/05/2008
Changed:
<
<
lhcb070.usc.cesga.es faulty sensor fan and faulty power supply (coming from lhcb085 on 20/05/2008) 13/05/2008 motherboard and power supply replaced 2/07/2008
>
>
lhcb070.usc.cesga.es faulty sensor fan and faulty power supply (coming from lhcb085 on 20/05/2008) 13/05/2008 motherboard and power supply replaced 02/07/2008

lhcb074.usc.cesga.es unknown 02/06/2008 after unplugging and plugging again all the memory the problem was solved 13/06/2008
lhcb054.usc.cesga.es faulty DIMM 30/06/2008 DIMM replaced 28/07/2008
lhcb079.usc.cesga.es after too many 'Correctable ECC' errors the machine would not reboot because it was unable to find any memory 15/10/2008 after unplugging and plugging in all the memory again, the problem disappeared 16/10/2008
Added:
>
>
lhcb066.usc.cesga.es Periodically one bank of memory was disabled because too many 'Correctable ECC' errors occurred; after a reboot things returned to normal. The cause of the failures was a faulty DIMM. The faulty DIMM was identified after moving half of the modules to another machine (lhcb064), where the failures then appeared. These failures were reproducible by running memtest long enough 08/04/2008 Faulty DIMM replaced 03/11/2008
lhcb065.usc.cesga.es On 20/10/2008 the memory of lhcb079 was exchanged with the memory of this machine; after around a week and several 'Correctable ECC' errors, the machine would not reboot because it was unable to find any memory 01/11/2008

Difference Topic HardwareFailures (r1.18 - 16 Oct 2008 - MarcosASeco)

META TOPICPARENT HardAtUSC
-- MarcosASeco - 14 Mar 2007
Node Failure Date of Failure Action Taken Date of Repair
Line: 27 to 27

lhcb074.usc.cesga.es faulty power supply 08/04/2008 power supply replaced 25/04/2008
lhcb085.usc.cesga.es faulty power supply 06/05/2008 power supply exchanged with the one in lhcb070 20/05/2008
lhcb070.usc.cesga.es faulty sensor fan and faulty power supply (coming from lhcb085 on 20/05/2008) 13/05/2008 motherboard and power supply replaced 2/07/2008
Changed:
<
<
lhcb074.usc.cesga.es faulty power supply 02/06/2008 after unplugging and plugging again all the memory the problem was solved 13/06/2008
>
>
lhcb074.usc.cesga.es unknown 02/06/2008 after unplugging and plugging again all the memory the problem was solved 13/06/2008

lhcb054.usc.cesga.es faulty DIMM 30/06/2008 DIMM replaced 28/07/2008
Added:
>
>
lhcb079.usc.cesga.es after too many 'Correctable ECC' errors the machine would not reboot because it was unable to find any memory 15/10/2008 after unplugging and plugging in all the memory again, the problem disappeared 16/10/2008

Difference Topic HardwareFailures (r1.17 - 05 Aug 2008 - MarcosASeco)

META TOPICPARENT HardAtUSC
-- MarcosASeco - 14 Mar 2007
Node Failure Date of Failure Action Taken Date of Repair
Line: 18 to 18

lhcb038.usc.cesga.es faulty disk 10/10/2007 disk replaced -
lhcb023.usc.cesga.es faulty disk 05/12/2007 disk replaced -
lhcb030.usc.cesga.es faulty disk 05/12/2007 disk replaced -
Changed:
<
<
lhcb069.usc.cesga.es faulty motherboard 08/01/2008 motherboard replaced -
lhcb079.usc.cesga.es faulty motherboard 20/02/2008 motherboard and power supply replaced -
>
>
lhcb069.usc.cesga.es faulty motherboard 08/01/2008 motherboard replaced 12/02/2008
lhcb079.usc.cesga.es faulty motherboard 20/02/2008 motherboard and power supply replaced 05/03/2008

lhcb036.usc.cesga.es probably a faulty power supply 12/03/2008 in progress (now working) -
lhcb079.usc.cesga.es faulty motherboard and power supply 18/03/2008 motherboard and power supply replaced 23/04/2008
lhcb080.usc.cesga.es faulty motherboard and power supply 18/03/2008 motherboard and power supply replaced 23/04/2008
Line: 28 to 28

lhcb085.usc.cesga.es faulty power supply 06/05/2008 power supply exchanged with the one in lhcb070 20/05/2008
lhcb070.usc.cesga.es faulty sensor fan and faulty power supply (coming from lhcb085 on 20/05/2008) 13/05/2008 motherboard and power supply replaced 2/07/2008
lhcb074.usc.cesga.es faulty power supply 02/06/2008 after unplugging and plugging again all the memory the problem was solved 13/06/2008
Changed:
<
<
lhcb054.usc.cesga.es 90% of memory is not working 30/06/2008    
>
>
lhcb054.usc.cesga.es faulty DIMM 30/06/2008 DIMM replaced 28/07/2008

Difference Topic HardwareFailures (r1.16 - 10 Jul 2008 - MarcosASeco)

META TOPICPARENT HardAtUSC
-- MarcosASeco - 14 Mar 2007
Node Failure Date of Failure Action Taken Date of Repair
Line: 25 to 25

lhcb080.usc.cesga.es faulty motherboard and power supply 18/03/2008 motherboard and power supply replaced 23/04/2008
lhcb086.usc.cesga.es faulty motherboard and power supply 18/03/2008 motherboard and power supply replaced 23/04/2008
lhcb074.usc.cesga.es faulty power supply 08/04/2008 power supply replaced 25/04/2008
Changed:
<
<
lhcb085.usc.cesga.es faulty power supply 06/05/2008 power supply replaced 20/05/2008
lhcb070.usc.cesga.es faulty sensor fan 13/05/2008    
lhcb074.usc.cesga.es faulty power supply 02/06/2008    
>
>
lhcb085.usc.cesga.es faulty power supply 06/05/2008 power supply exchanged with the one in lhcb070 20/05/2008
lhcb070.usc.cesga.es faulty sensor fan and faulty power supply (coming from lhcb085 on 20/05/2008) 13/05/2008 motherboard and power supply replaced 2/07/2008
lhcb074.usc.cesga.es faulty power supply 02/06/2008 after unplugging and plugging again all the memory the problem was solved 13/06/2008
lhcb054.usc.cesga.es 90% of memory is not working 30/06/2008    

Difference Topic HardwareFailures (r1.15 - 02 Jun 2008 - MarcosASeco)

META TOPICPARENT HardAtUSC
-- MarcosASeco - 14 Mar 2007
Node Failure Date of Failure Action Taken Date of Repair
Line: 25 to 25

lhcb080.usc.cesga.es faulty motherboard and power supply 18/03/2008 motherboard and power supply replaced 23/04/2008
lhcb086.usc.cesga.es faulty motherboard and power supply 18/03/2008 motherboard and power supply replaced 23/04/2008
lhcb074.usc.cesga.es faulty power supply 08/04/2008 power supply replaced 25/04/2008
Changed:
<
<
lhcb085.usc.cesga.es faulty power supply 06/05/2008    
>
>
lhcb085.usc.cesga.es faulty power supply 06/05/2008 power supply replaced 20/05/2008
lhcb070.usc.cesga.es faulty sensor fan 13/05/2008    
lhcb074.usc.cesga.es faulty power supply 02/06/2008    

Difference Topic HardwareFailures (r1.14 - 12 May 2008 - MarcosASeco)

META TOPICPARENT HardAtUSC
-- MarcosASeco - 14 Mar 2007
Node Failure Date of Failure Action Taken Date of Repair
Line: 25 to 25

lhcb080.usc.cesga.es faulty motherboard and power supply 18/03/2008 motherboard and power supply replaced 23/04/2008
lhcb086.usc.cesga.es faulty motherboard and power supply 18/03/2008 motherboard and power supply replaced 23/04/2008
lhcb074.usc.cesga.es faulty power supply 08/04/2008 power supply replaced 25/04/2008
Added:
>
>
lhcb085.usc.cesga.es faulty power supply 06/05/2008    

Difference Topic HardwareFailures (r1.13 - 25 Apr 2008 - MarcosASeco)

META TOPICPARENT HardAtUSC
-- MarcosASeco - 14 Mar 2007
Changed:
<
<
Node Failure Date of Failure Action Taken
lhcb018.usc.cesga.es kernel panic 12/03/2006 system restarted
lhcb061.usc.cesga.es faulty memory: DIMM4 20/02/2007 module replaced
lhcb063.usc.cesga.es SCSI timeout, lost disk access 08/03/2007 system restarted
lhcb025.usc.cesga.es faulty disk 07/07/2007 disk replaced
lhcb062.usc.cesga.es SCSI timeout, lost disk access 10/07/2007 system restarted
lhcb020.usc.cesga.es faulty disk 27/07/2007 disk replaced
lhcb052.usc.cesga.es python process consumed all available memory 30/07/2007 system restarted
lhcb036.usc.cesga.es faulty disk 10/08/2007 disk replaced
lhcb029.usc.cesga.es faulty disk 19/09/2007 disk replaced
lhcb041.usc.cesga.es faulty disk 19/09/2007 disk replaced
lhcb021.usc.cesga.es faulty disk 10/10/2007 disk replaced
lhcb022.usc.cesga.es faulty disk 10/10/2007 disk replaced
lhcb031.usc.cesga.es faulty disk 10/10/2007 disk replaced
lhcb038.usc.cesga.es faulty disk 10/10/2007 disk replaced
lhcb023.usc.cesga.es faulty disk 05/12/2007 disk replaced
lhcb030.usc.cesga.es faulty disk 05/12/2007 disk replaced
lhcb069.usc.cesga.es faulty motherboard 08/01/2008 motherboard replaced
lhcb079.usc.cesga.es faulty motherboard 20/02/2008 motherboard and PowerSup replaced
lhcb036.usc.cesga.es probably a faulty PowerSup 12/03/2008 in progress (now working)
lhcb007.usc.cesga.es needs a check at the site 18/03/2008 in progress
lhcb079.usc.cesga.es probably a burned motherboard 18/03/2008 motherboard and PowerSup replaced
lhcb080.usc.cesga.es probably a burned motherboard 18/03/2008 motherboard and PowerSup replaced
lhcb086.usc.cesga.es probably a burned motherboard 18/03/2008 motherboard and PowerSup replaced
>
>
Node Failure <