Node | Failure | Date of Failure (DD/MM/YYYY) | Action Taken | Date of Repair (DD/MM/YYYY) |
---|---|---|---|---|
lhcb018.usc.cesga.es | kernel panic | 12/03/2006 | system restarted | - |
lhcb061.usc.cesga.es | faulty memory: DIMM4 | 20/02/2007 | module replaced | - |
lhcb063.usc.cesga.es | SCSI timeout, lost disk access | 08/03/2007 | system restarted | - |
lhcb025.usc.cesga.es | faulty disk | 07/07/2007 | disk replaced | - |
lhcb062.usc.cesga.es | SCSI timeout, lost disk access | 10/07/2007 | system restarted | - |
lhcb020.usc.cesga.es | faulty disk | 27/07/2007 | disk replaced | - |
lhcb052.usc.cesga.es | Python process consumed all available memory | 30/07/2007 | system restarted | - |
lhcb036.usc.cesga.es | faulty disk | 10/08/2007 | disk replaced | - |
lhcb029.usc.cesga.es | faulty disk | 19/09/2007 | disk replaced | - |
lhcb041.usc.cesga.es | faulty disk | 19/09/2007 | disk replaced | - |
lhcb021.usc.cesga.es | faulty disk | 10/10/2007 | disk replaced | - |
lhcb022.usc.cesga.es | faulty disk | 10/10/2007 | disk replaced | - |
lhcb031.usc.cesga.es | faulty disk | 10/10/2007 | disk replaced | - |
lhcb038.usc.cesga.es | faulty disk | 10/10/2007 | disk replaced | - |
lhcb023.usc.cesga.es | faulty disk | 05/12/2007 | disk replaced | - |
lhcb030.usc.cesga.es | faulty disk | 05/12/2007 | disk replaced | - |
lhcb069.usc.cesga.es | faulty motherboard | 08/01/2008 | motherboard replaced | 12/02/2008 |
lhcb079.usc.cesga.es | faulty motherboard | 20/02/2008 | motherboard and power supply replaced | 05/03/2008 |
lhcb036.usc.cesga.es | probably a faulty power supply | 12/03/2008 | in progress (now working) | - |
lhcb079.usc.cesga.es | faulty motherboard and power supply | 18/03/2008 | motherboard and power supply replaced | 23/04/2008 |
lhcb080.usc.cesga.es | faulty motherboard and power supply | 18/03/2008 | motherboard and power supply replaced | 23/04/2008 |
lhcb086.usc.cesga.es | faulty motherboard and power supply | 18/03/2008 | motherboard and power supply replaced | 23/04/2008 |
lhcb074.usc.cesga.es | faulty power supply | 08/04/2008 | power supply replaced | 25/04/2008 |
lhcb085.usc.cesga.es | faulty power supply | 06/05/2008 | power supply exchanged with the one in lhcb070 | 20/05/2008 |
lhcb070.usc.cesga.es | faulty fan sensor and faulty power supply (received from lhcb085 on 20/05/2008) | 13/05/2008 | motherboard and power supply replaced | 02/07/2008 |
lhcb074.usc.cesga.es | unknown | 02/06/2008 | all memory modules unplugged and reseated, which solved the problem | 13/06/2008 |
lhcb054.usc.cesga.es | faulty DIMM | 30/06/2008 | DIMM replaced | 28/07/2008 |
lhcb079.usc.cesga.es | after too many 'Correctable ECC' errors, the machine would not reboot because it could not find any memory | 15/10/2008 | all memory modules unplugged and reseated, after which the problem disappeared | 16/10/2008 |
lhcb066.usc.cesga.es | Periodically, one memory bank was disabled after too many 'Correctable ECC' errors occurred; after a reboot, things returned to normal. The cause was a faulty DIMM. The specific DIMM was identified by moving half of the modules to another machine (lhcb064), where the failures then appeared. The failures could be reproduced by running memtest long enough | 08/04/2008 | faulty DIMM replaced | 03/11/2008 |
lhcb065.usc.cesga.es | On 20/10/2008, the memory of lhcb079 was swapped with the memory of this machine; after about a week and several 'Correctable ECC' errors, the machine would not reboot because it could not find any memory | 01/11/2008 | All problems in the Caton machines were related to the power supply. The power supplies were changed on all machines between August and September 2008 | 01/04/2009 |
lhcb027.usc.cesga.es | faulty disk | 17/04/2009 | | |