By December 18, 2006

Ashes, ashes, we all fall down.

I bit the bullet and bought the makings of a hard disk array for my file server Janus last week. We were using 700GB+ of storage and we needed more space. I planned on building a RAID-Z array in Solaris 10 via ZFS. I read this blog post from a Sun developer about building a multi-terabyte server with Sol10 and ZFS. Mark said it was “pretty easy.” I studied up on other documentation on Solaris 10 and ZFS. I was willing to give this a shot. Based on Mark’s guide, I purchased this four-port Syba card and three 500GB disks from Newegg. They were delivered on Friday.
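A quick back-of-the-envelope check on what those three disks should yield. This is a minimal sketch, assuming single-parity RAID-Z gives up roughly one disk's worth of space to parity. The function name `raidz_usable_gb` is made up for illustration, and real usable space comes in somewhat lower once ZFS metadata and GB-versus-GiB accounting are factored in.

```python
def raidz_usable_gb(disk_count, disk_size_gb, parity_disks=1):
    """Rough usable capacity of a RAID-Z style array, in GB.

    Assumes all disks are the same size and that parity costs
    about `parity_disks` worth of raw capacity (an approximation).
    """
    if disk_count <= parity_disks:
        raise ValueError("need more disks than parity disks")
    return (disk_count - parity_disks) * disk_size_gb


# Three 500GB disks, single parity.
print(raidz_usable_gb(3, 500))  # 1000
```

So the plan was to go from 700GB+ in use to roughly 1TB of parity-protected space.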

It’s Monday now, and I’ve learned a lot. The most important thing I learned was never trust a Solaris guru when he says something was “pretty easy.”

If you want the short version, skip to the end. Here’s the long version, as of this afternoon.

Friday.

  • Removed all of my existing hard drives from Janus. I put the OS drive in one spot on my desk, my data drives in another. I put a spare 40GB drive in to serve as the Solaris 10 OS drive. Jumpers set properly.
  • Put my three new hard drives in the lower drive cage and wire them up to the Syba card.
  • I installed the Syba controller. The DVD Solaris 10 installer was thrown for an infinite loop, stammering “APIC Error Interrupt on CPU 0 status 0 = 8, status 1 = 8.” I have no idea what that means.
  • I research the APIC error for a few hours and try various motherboard BIOS configurations to fix it. Go to bed. Total time spent so far: two and a half hours.

Saturday.

  • I decided to upgrade the BIOS on my motherboard.
    • The link to the latest BIOS from the manufacturer’s site was broken. I FTPed in and got the correct BIOS.
    • The self-extracting archive was in Chinese. Luckily I pressed the right button and the files extracted successfully.
  • I didn’t have any floppies of my own in the house, so I downloaded Bart’s FreeDOS BIOS boot disk CD image. I put the BIOS update file in the appropriate folder and gave it a shot. FreeDOS reported “ROOT FAT KERNEL GO” and then crashed. Research on the Web suggested that USB devices might cause this error. I disconnected my external USB hard drives. No luck.
  • Started to consider the RAID 5 in Windows XP Pro registry hack.
  • Thinking my motherboard didn’t like Bart’s CD, I decided to burn the 256K BIOS directly to a 700MB CD. Total burn time: 27 seconds, mostly due to write caching on Nero.
  • My motherboard will not read BIOS files off of a CD — floppy only. Jesus.
  • Decided to put the BIOS file on a CD with a bootable floppy image, similar to the Bart/FreeDOS approach. Downloaded the win98sc.exe boot disk image. Discovered that Nero has its own boot disk image. At least my burn times were consistent: another 27 seconds.
  • No dice. I appropriated a floppy disk (sorry honey, I backed up your data beforehand just in case). Updated the BIOS successfully.
  • Still got the same APIC error from the Solaris 10 installer.
  • Drank some coffee.
  • Disconnected my PCI firewire card, network card, and the SATA card in case there was a resource conflict. No dice.
  • Manually disabled ACPI in the Solaris installer by passing the -B acpi-user-options=2 boot option … still didn’t work.
  • Fuck Solaris. I re-read the tutorials on enabling RAID 5 in Windows XP Pro.
  • The example configuration lines given in the Tom’s Hardware tutorial weren’t the same as the ones in my file. Uh oh. Oh well, I’ll deal with that later. Let’s get the card and drives going.
  • In order to use WXP RAID 5 I would have to disable the Syba card’s RAID drivers and install the card’s regular SATA drivers. I put my old system drive in. The machine wouldn’t boot to Windows with the new card installed, even in Safe Mode. The machine crashed while/during/after loading amdagp.sys.
  • Tried the Syba card in all of the available PCI slots, including my 64-bit PCI slots.
  • Installed the Syba card in Lady Jaye’s machine with one new drive attached.
  • Lost my screwdriver somewhere. Used my hot spare. Redundancy isn’t just about data, folks.
  • Installed the RAID drivers for the Syba card. According to the manufacturer’s Web site, one couldn’t install the non-RAID drivers without doing the RAID ones first. The written instructions stated the contrary, but I figured the Web site was the most current/correct.
  • Tried to install the non-RAID drivers. The computer crashed during installation.
  • Windows XP Pro will not install the non-RAID drivers, either crashing or stating that a better driver is already installed. I uninstall all of the drivers found in the RAID/SCSI section of the Hardware Management control panel.
  • Reboot. Windows XP Pro automatically reinstalls the RAID drivers. I uninstall them again. Physically remove the card from the workstation and put it in another PCI slot.
  • The Syba card was not recognized in the second slot. Ran the installation software that came with the card, but this was just for array management. The software reported that no array could be found. No shit.
  • Remove the card from Lady Jaye’s workstation and put it back into my file server.
  • Found my screwdriver.
  • Enter the Syba’s BIOS configuration screen. Discovered that my $20 card does CPU-assisted RAID 5. It won’t allow for online capacity expansion, but I figure that’s not a big deal for right now. Hoping to avoid the Windows XP Pro drivers and RAID 5 hack unpleasantness, I configure the RAID 5 array in the card’s BIOS.
  • I disconnect everything in my machine except for the system hard drive, video card and RAM. Computer will POST, but not boot to Windows.
  • Read, read, read about amdagp.sys. I pull my OS drive and slave it to Lady Jaye’s workstation. I copy over a fresh amdagp.sys. Return the OS drive to Janus. Boot. No luck.
  • Pull the AGP video card out of Janus. Power down and open my workstation so I can switch video cards. Realize that I have PCI Express video cards in all the other computers in the house. Have a problem putting my workstation’s case cover back on. I launch the cover into the hallway to teach it a lesson.
  • Wonder if the RAM is bad. I take out the four sticks and try them each individually. No dice.
  • Decide to reinstall Windows XP on my original OS drive. Copy over ~80GB of data onto an external hard drive.
  • Try to reinstall Windows XP. The text-based portion of the setup goes fine. When the installer reports “Starting Windows,” the screen goes blank, just like trying to boot into Windows regularly.
  • Downgrade my BIOS in an attempt to return everything to its original, working condition. Nothing.
  • Install a backup power supply. Maybe my old one is dying out. The 5 volt rail is about, I shit you not, 1/8″ too short to make it to the plug on the motherboard. I install a weaker power supply to no effect. Fuck it. I’m done for the night.
  • Total time spent: over ten hours.
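For context on the “CPU-assisted RAID 5” the Syba BIOS offers: RAID 5 protects data with XOR parity, and on a card like this the host CPU does that math rather than a dedicated chip. The snippet below is only a toy illustration of the parity idea, not the card's firmware or driver. The block contents and the helper name `xor_blocks` are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a three-disk array: two data blocks plus one parity block.
data_a = b"JANUS001"
data_b = b"JANUS002"
parity = xor_blocks([data_a, data_b])

# If the disk holding data_b dies, XOR the survivors to rebuild it.
rebuilt_b = xor_blocks([data_a, parity])
assert rebuilt_b == data_b
```

Every write means recomputing parity like this, which is the work a cheap card leaves to the CPU.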

Sunday.

  • Get ahold of Teach and Rangerette; luckily they have a spare video card. Lady Jaye and I head over, jabberjaw for a bit, and I pull an old AGP card out of a dead machine.
  • Drive home, install the card — same problem. I think either the motherboard or the AGP slot is dead. I’m waiting for Stilts to arrive today or tomorrow with a Mach 64 2MB PCI video card. With my luck, it will be too old to work in WXP.
  • Total time spent: twelve hours, including drive time.

The end (for now). I have seven hard drives, one network card, two AGP cards, two optical drives, a multi-flash card reader, a floppy drive, a FireWire card, three power supplies and a SATA card on my desk, floor, or computer case. I’m starting to formulate my repair/replace strategies. I could buy a PCI video card if the problem is the AGP slot. This is the cheapest solution at about $25 + shipping. Everything gets more expensive from there. Janus is currently a dual AMD Socket A system. Getting a replacement motherboard is pretty much out of the question; they were rare when Socket As were still being manufactured. I could buy a new processor/motherboard combo, but I am not sure if I could reuse the 2.5GB of RAM in Janus because it is registered ECC RAM. That means I’d have to buy more RAM, and that adds to the cost. Bah.

Keep your fingers crossed that Stilts’s aging PCI card saves the day. He’ll be here tonight.

Posted in: technology

3 Comments on "Ashes, ashes, we all fall down."

  1. Markie says:

    Stay away from SUN. I had to set up hw and sw once and it almost did me in… There was this particular, undocumented, fiber configuration that only the techs knew about… WTF…

    Mark

  2. Holy fscking shit man!

    I most definitely do not envy you! I’ve always steered clear of Solaris on x86 (and most other x86 platforms for that matter) exactly because of stuff like this! Computing is my line of work, not my hobby.

    /me hands you a big cup of cocoa.

    Yes, there’s marshmallows in there… 😉

  3. Tom says:

    I’ve been running the card (from Newegg even) in a Fedora Core 5 system for a while now. It works well. I am looking at getting the card (a 2nd one) running under Solaris 10 u3 so I can do ZFS.

    Don’t use the RAID stuff on the card. It’s software RAID and Linux/Solaris can do that itself, thank you very much. In fact, the card’s RAID makes the disks nonstandard, as you found with XP.

    I’m looking at flashing the BIOS on the card to make it IDE, non RAID.