TryCatchFinally.net Some SQL, some .NET, and whatever else

22May/150

Correcting OGG-01733 in Goldengate – Trail file header file size mismatch

Warning - Oracle/Goldengate support will probably get mad at you if you try this. It worked great for me, but they recommended we reload the data from scratch, so that's probably what they'd recommend for you too. Just know that this is 100% unofficial and unsupported 🙂

Quick summary:

We have Goldengate replication from Oracle 11.2 on Lunux to MSSQL 2012 on Windows, and we ran into an OGG-01733 error "Trail file header file size value {X} for trail file {Y} differs from actual size of the file ({Z})", which caused an ABEND where we were stuck. We opened a ticket with Oracle support and after a week with very little response, the concluded that I should just perform a new initial load on the destination - since the trail files had already been pumped to the destination server and removed from the extract server, they were unable to troubleshoot further.

It turns out the work-around was to open the trail file in a hex editor and manually update the trail file header to make it think it was supposed to be the size it actually was. After saving the file again and resuming replication, it continued on its merry way and applied the transactions without another complaint.

Steps to resolve this error message:

  1. Make a backup of your trail file - you know, since you're editing it and might want a second shot.
  2. Open the report file and make a note of the size the file is currently ("Z") and the size it's supposed to be ("X"). I'll refer to those as X and Z further down.
  3. Use a decimal-to-hex converter like this one to convert both of these values to their hex equivalent (now I'll call them "HX" and "HZ")
  4. Load up the trail file in your favorite hex editing tool - I like using Notepad++ in combination with the HEX-editor plug-in (once the file is loaded, select "HEX-Editor" from the plug-ins menu, and then select "View in Hex")
  5. Perform a search (if you're using Notepad++, ensure the data type is set to "Hexadecimal") for your "HX" value - the size the file thinks it should be. However, you need to search for an even number of digits - if your hex value is an odd number of digits, either drop the leftmost (largest) one or add a zero to the left (I dropped a digit):
  6. Goldengate - Hex Editor

    • Side note: You can see that my trail file size isn't too far into the file - under 300 bytes from the beginning. However, since it's stored in hex, it's not something that's easily viewable in the file (where you will see some file path and server version information if you look to the right where the ASCII is displayed. Also, in my image, the file size is preceded by a quite a few zeroes - my trail files are set to 100MB, but it appears Goldengate supports up to 4GB trail files using the 32 bytes in the header file. Back to fixing this...
  7. CAREFULLY edit the HX value you've found to be the new HZ value - the actual size of the file. In particular, don't move any of the bytes around or add/remove anything, just fix the values you need to change so that the file size is stored in the same location.
  8. Save the file and close it.
  9. Resume replication right where you left off (assuming you made a backup and the edited the original trail file) - it should check the new file size, see the transaction that was previously beyond the file size limit, and then apply it and move on!

Conclusion?

What causes this behavior? I can't find any clear documentation or explanation at all - when searching for this error, the only meaningful links I can find at all are either in an Oriental language and have basic details as well as a dire warning to call Oracle support immediately or a case where somebody receives it on an initial load and the forum's advice is "your table is too small to mess with this - just export it to CSV and reload it that way".

When we looked at the list of trail files, we noticed something particularly odd - the trail files near the offending file all had ascending "last modified" timestamps, as you'd expect, but this file was actually out of order:

05/01/2015  03:45 AM        99,999,462 SV002351
05/01/2015  04:38 AM        99,999,802 SV002352
05/01/2015  08:13 AM        99,999,367 SV002353
05/01/2015  10:09 AM        99,999,936 SV002354
05/01/2015  11:05 AM        99,999,630 SV002355
                                                 <-- File should be right here
05/01/2015  11:41 AM               891 SV002357
05/01/2015  11:47 AM        99,999,462 SV002358
05/01/2015  11:50 AM        99,999,280 SV002359
05/01/2015  11:58 AM        99,999,314 SV002360
05/01/2015  12:09 PM        99,999,910 SV002361
05/01/2015  12:40 PM        99,998,043 SV002362
05/01/2015  01:16 PM        99,999,754 SV002363
05/01/2015  01:34 PM        72,017,446 SV002356  <-- But it's down here
05/01/2015  02:05 PM        99,999,516 SV002364
05/01/2015  02:40 PM        99,999,966 SV002365

The file contained two additional transactions beyond the stated header size and the actual end of the file, and they were both time-stamped correctly to have been located in that file (they were both stamped 10:34AM, along with the transactions that were earlier in the file, and since the server is an hour off because of time zone, they were in the right file).

The fact that it's smaller than the others, and that it's followed by a file containing no transactions (just a header) led me to believe the file was cut short by a network interruption of some kind. We're using a local extract and a separate pump, as we're advised to do, but the connection still drops from time to time. In this case, I can only imagine it was in the middle of committing something, was interrupted, and then somehow these transactions were suspended for some reason and then added to the file later. I can't imagine why, but when they're added, the file header isn't updated.

Hopefully this explanation and work-around have helped somebody else - we pulled our hair out for a week going back and forth with Oracle support and scouring the internet (unsuccessfully) for any relevant information - in the end, going rogue and editing the file was the only way (short of a complete reload) to get things moving again!

25Mar/101

Visual Studio solutions don’t load when you double-click on them

When I installed Windows 7, double-clicking on .SLN files to load them in Visual Studio stopped working - I would get the hourglass for a few moments, and then nothing. It turns out that it's because I had set Visual Studio to always run as an Administrator, but when you double-click a .SLN file, you're not actually running Visual Studio - you're running the "Visual Studio Version Selector", even if you only have one version of Visual Studio installed.

To resolve the problem, you'll need to set the Version Selector to also run as an Administrator - to do this, find the Version Selector EXE:

C:\Program Files\Common Files\microsoft shared\MSEnv\VSLauncher.EXE (or "Program Files (x86)" if you have an x64 installation)

Right-click on it, go to the "Compatibility" tab, and check the box that says "Run this program as an administrator". Since I have multiple users on my laptop that I use for development, I had to click "Change Settings for all users" before checking the "Run as admin" box.

Voila! The VS Version Selector will now properly load the solution files when you double-click on them - enjoy!

16Sep/090

Expand a disk partition under Windows XP or Server 2003

Vista/Server 2008 include support for expanding hard disk partitions in the Disk Management MMC snap-in, but XP/2003 support it as well (if you're not afraid of the command line). To expand a partition:

  1. Open a command window
  2. Type "DISKPART" and press Enter to run the partition manager
  3. "LIST DISK" and press enter to show general disk information, including unused space
  4. "LIST VOLUME" to show the volumes. Make note of the volume you'd like to modify
  5. "SELECT VOLUME 0", where "0" is actually the volume number you're expanding
  6. "EXTEND" to expand the partition into all available space
  7. "EXIT" to leave DISKPART

There you go - enjoy your newly expanded partition, with no reboot necessary!

WARNING: Messing with your partitions can cause serious problems, including losing the ability to boot Windows or permanently deleting data. Only use this utility to if you know what you're doing! You've been warned!

22Apr/091

Resolving “STOP 0x0000007B INACCESSIBLE_BOOT_DEVICE” after you make changes to your hard drive configuration

I'd heard that, unlike previous versions of Windows, a Vista installation could be moved to a new computer without any problems, and would just re-detect all the hardware and reconfigure itself. So when I got my new computer, I imaged my boot partition to the new hard drive (it was much bigger than my old one), attempted to boot Vista, and got this error:

STOP: 0x0000007B (0x818B51B0, 0xC00000010, 0x00000000,0x00000000) INACCESSIBLE_BOOT_DEVICE

WTF? After much digging, it turns out that my new motherboard had AHCI enabled by default, whereas my old one didn't. This is a SATA standard that supports Native Command Queuing (NCQ), special power management, and some other features. While Vista supports it out of the box, if no AHCI drives are connected when Vista is installed, it disables the drivers, and any attempt to switch to AHCI mode later will give you a blue screen at boot time. The general consensus online is that you have to reinstall Windows, but you do not!

To resolve the error:

    1. In the system BIOS, switch AHCI mode off. This will probably mean something like "Compatibility mode" for the drive - look for a setting that sounds like it does this, either for the controller or the drive itself. This will allow you to boot into Windows again
      1. a.  If you still can't boot into Windows, you may need to rebuild the boot sector - not as scary as it sounds! Boot the the Vista install DVD, and when prompted, select "Repair installation". After thinking for about 15 seconds, the install should say that it's found the problem and corrected it, and you can reboot - Vista should come right up after that.
    2. Once you're in Windows, load REGEDIT and navigate to [HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Msahci]
    3. Set the value "Start" to "0". This will tell Windows to search for AHCI drives when it boots.

Reboot, and switch your hard drive back into AHCI mode when you do. Windows will boot, detect it, and install drivers. It will probably ask you to reboot again

  1. Resolved! Windows boots on your old Vista installation. Aside from this minor hiccup, Vista did indeed cooperate with my new hardware - it detected everything and came right up. It asked me to reactivate, but since it's a corporate copy, that was no big deal.