[BOINCstats] Willy
 
Forum moderator - Administrator - Developer - Tester - Translator
BAM!ID: 1
Joined: 2006-01-09
Posts: 9456
Credits: 353,172,950
World-rank: 4,968

2009-04-27 22:56:15

It seems that the BOINCstats update server (the oldest server in the BOINCstats/PrimeGrid server "park", now about 5 years old) is running out of steam. It's producing large numbers of errors (Machine check exception errors) and it has an increasing number of database failures (corrupt tables). So far I have been able to restore everything (backups are made frequently and stored on another server), but I'm just waiting for the big crash.

So, in preparation for the big crash, what are your ideas about what hardware the next server should have and where I should get it? Keep in mind that BOINCstats has a limited budget. (Maybe some server manufacturer (Sun, Dell, HP) wants to sponsor a server???)

Needs:

  • loads of memory (minimum 8GB)
  • minimum two CPU cores
  • very fast disk drives
  • enough storage for the databases (~50GB), XML files and backups
  • two network interfaces



Just some useless facts about the current update server:


  • AMD Athlon X2 4400+
  • 4GB DDR400 (mainboard maximum)
  • Tyan server grade mainboard (with 2x NIC)
  • Adaptec 29160 SCSI controller
  • 1 SCSI boot disk 37GB 10k-RPM
  • 2 SCSI data disks 74GB 15k-RPM (Linux software RAID 0 for speed)
  • Linux Debian Sarge



This doesn't mean I'm buying yet. I prefer to save the current server, so if anybody has a clue what part of the hardware the Machine check exception error is linked to then feel free to enlighten me. As soon as I can I will run some tests on it but for that I need physical access.
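
For anyone who wants to narrow down machine check errors before physical access is possible, here is a minimal sketch of how the kernel log lines could be tallied. It is only an illustration: the function name `summarize_mce`, the two patterns and the sample lines are all made up for this example, and the exact wording of machine check messages differs between kernel versions, so the patterns would need adjusting to the real log.

```python
import re
from collections import Counter

# Assumed patterns: any line mentioning a machine check, and an optional
# "Bank N" field. Real kernel messages may be worded differently.
MCE_PATTERN = re.compile(r"machine check", re.IGNORECASE)
BANK_PATTERN = re.compile(r"\bbank\s+(\d+)", re.IGNORECASE)


def summarize_mce(lines):
    """Count machine-check lines, grouped by bank number when one is present."""
    counts = Counter()
    for line in lines:
        if not MCE_PATTERN.search(line):
            continue  # not a machine check line
        match = BANK_PATTERN.search(line)
        key = f"bank {match.group(1)}" if match else "unknown bank"
        counts[key] += 1
    return counts


# Made-up sample lines for illustration only, not taken from the real server.
sample = [
    "kernel: CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f",
    "kernel: eth0: link up",
    "kernel: Machine check events logged",
    "kernel: CPU 1: Machine Check Exception: 4 Bank 4: b200000000070f0f",
]
result = summarize_mce(sample)
for bank, count in sorted(result.items()):
    print(bank, count)
```

If the errors cluster on a single bank, that may hint at which unit of the CPU is reporting them. What each bank number means depends on the processor, so that has to be looked up in the vendor documentation.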

Please do not PM, IM or email me for support (they will go unread/ignored). Use the forum for support.
BAM!ID: 64136
Joined: 1970-01-01
Posts: 0
Credits: 0
World-rank: 0

2009-04-28 06:45:47

... so if anybody has a clue what part of the hardware the Machine check exception error is linked to then feel free to enlighten me. As soon as I can I will run some tests on it but for that I need physical access.

My first guess from what you describe would be bad blocks on the data disk, or a failing disk controller.
Guest

2009-04-28 13:59:09

... so if anybody has a clue what part of the hardware the Machine check exception error is linked to then feel free to enlighten me. As soon as I can I will run some tests on it but for that I need physical access.

My first guess from what you describe would be bad blocks on the data disk, or a failing disk controller.


I'd check the memory first.
[BOINCstats] Willy
 
Forum moderator - Administrator - Developer - Tester - Translator
BAM!ID: 1
Joined: 2006-01-09
Posts: 9456
Credits: 353,172,950
World-rank: 4,968

2009-06-14 09:00:29

I checked the memory and hard disks; both were reported as fine.

Still, it keeps failing. Just now the host update failed with a "Segmentation fault", and the recovery fails with the same error. I'm now trying to recover by copying the tables back from the www server.
noderaser
 
BAM!ID: 13859
Joined: 2006-12-03
Posts: 839
Credits: 483,300,310
World-rank: 4,000

2009-06-17 05:45:23

From my limited experience with command-line Linux, I think a segmentation fault generally means there are bad sectors on the hard disk.
Kokomiko
 
BAM!ID: 16496
Joined: 2007-01-06
Posts: 8
Credits: 85,894,207
World-rank: 12,859

2009-06-17 17:34:08

Often the PSU has aged and its voltage is no longer stable enough. This is the part most users don't think about, but I have often found the failure there.
Index :: BOINCstats general :: Server trouble