ORACLE-L: On ZFS & Snapshots

December 12, 2006

I posted the following to oracle-l earlier today, in reference to a discussion I’d started about using Solaris 10’s ZFS filesystem (particularly its snapshotting ability) in tandem with an online Oracle instance for use in refreshing a copy of that instance on a development server.

I. ZFS Snapshot-Based Refreshes

The goal is to find a fast way to periodically update a pre-prod environment using a copy of the current production database, and to do so with minimal production outage and without requiring a lengthy restore-and-recover operation.

So we have a production environment, running on ZFS, from which we take a base snapshot and populate the target server. (This is effectively an entire copy of the filesystem, and will thus take some time, but will only need to be performed once.) Note that the copy of the snapshot is never opened by Oracle – it’s merely a backup of the production database as of a point in time.

Going forward, we periodically snapshot production (without needing to bring it down) and apply that snapshot incrementally to the most recent target snapshot copy. If we do this frequently enough we end up with a series of small updates to the copy.

And all of this, for our use case, is to simplify the occasional rebuilding of the pre-production database. To do the rebuild, we simply clone the latest ZFS snapshot on the target, freeing us from the need to restore (duplicate) from RMAN, and allowing us to avoid any production down time. It’s just a clone of the most recent snapshot on the target.
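The snapshot, incremental-apply, and clone steps described above can be sketched with the standard zfs subcommands (snapshot, send, receive, clone). This is only a rough sketch under stated assumptions: the dataset names prod/oradata and copy/oradata, the target host preprod, and the use of ssh as transport are all hypothetical, and a single dataset stands in for what may in practice be separate datafile and log filesystems. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of a snapshot-based refresh. Every name here is hypothetical:
# SRC/DST datasets, the TARGET host, and ssh as the transport.
SRC=prod/oradata        # production dataset (assumed name)
DST=copy/oradata        # dataset holding the copy on the target (assumed name)
TARGET=preprod          # target host (assumed name)
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only; set to 0 to execute them

run() {
    # Print the command line in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        sh -c "$*"
    fi
}

# One-time step: take a base snapshot and send a full copy to the target.
seed() {        # usage: seed <snapshot-name>
    run "zfs snapshot $SRC@$1"
    run "zfs send $SRC@$1 | ssh $TARGET zfs receive $DST"
}

# Periodic step: snapshot production and send only the changes since <prev>.
refresh() {     # usage: refresh <prev-snapshot> <new-snapshot>
    run "zfs snapshot $SRC@$2"
    run "zfs send -i $SRC@$1 $SRC@$2 | ssh $TARGET zfs receive $DST"
}

# Rebuild step (run on the target): clone a received snapshot into a
# writable filesystem that pre-production can open.
rebuild() {     # usage: rebuild <snapshot-name> <clone-dataset>
    run "zfs clone $DST@$1 $2"
}
```

For example, `seed base` once, then `refresh base mon`, `refresh mon tue`, and finally `rebuild tue copy/preprod` on the target. An incremental receive expects the target dataset to be unchanged since the previous snapshot, which fits the point above that the copy itself is never opened by Oracle.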

But, of course, that clone is inconsistent, in that the snapshot was taken when the datafiles may have been in an inconsistent state (SCN-wise). Because we took the datafile snapshot first, though, we can recover the datafiles using the snapshotted logs, allowing us to open the database. Voila!

(In case it’s not clear, pre-production will diverge from production at times, hence the need for the refresh. This is an operational requirement here.)

I haven’t yet heard an argument as to why this wouldn’t work, and I’m fairly convinced, indeed, that it will.

II. BEGIN BACKUP … END BACKUP AND PITR

I posited that a hot backup via BEGIN BACKUP … END BACKUP would not allow one to recover the associated datafile(s) to a point in time between those calls.

I found this note in the Oracle documentation that led me to that hypothesis (emphasis mine):


Like any other backup, an online backup is composed of image
copies of all the datafiles that make up a tablespace. The point to
remember is that as these files are being backed up they may also be
in the process of being written to by the detached process DBWR. Some
characteristics of an online backup are

    o  users are allowed normal access to all online tablespaces;
    thus, users can access the tablespace being backed up.

    o  when used for recovery the backup can only be used to return
    to the most recent state of the database, not to a previous
    state

    o  only the database files comprising a tablespace are backed
    up; the log files are being archived and the control file
    does not need to be backed up if there had been no
    structural change to the database since the last control file
    backup

Yet if every change made during the hot backup copies the changed block to the log, I’m not exactly clear why PITR wouldn’t be possible. It’s a lot more data in the log, sure, but the implication is that every SCN reflected in the redo stream is associated with the block that changed; indeed, with an image of the entire changed block.

So perhaps I am misinterpreting that statement.
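For reference, the hot backup sequence under discussion looks roughly like the statements printed by the sketch below. The helper name hot_backup_sql and the tablespace name passed to it are hypothetical; the function only prints SQL for illustration and does not connect to a database or copy any files.

```shell
#!/bin/sh
# hot_backup_sql TABLESPACE
# Print the statements that bracket an OS-level copy of one tablespace's
# datafiles. Nothing is executed; the caller supplies the tablespace name.
hot_backup_sql() {
    ts=$1
    cat <<EOF
ALTER TABLESPACE $ts BEGIN BACKUP;
-- copy the datafiles of $ts at the OS level while it is in backup mode
ALTER TABLESPACE $ts END BACKUP;
ALTER SYSTEM ARCHIVE LOG CURRENT;
EOF
}
```

The question raised above is whether a datafile copied between the first and third statements can be recovered to an SCN that falls inside that window, rather than only rolled forward past END BACKUP.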

[Note that the statement “The point to remember is that as these files are being backed up they may also be in the process of being written to by the detached process DBWR” does not apply to the ZFS scenario I outlined above, hence my conclusion that we could do without ever having to bring the datafiles into backup mode.]


OS{DBA,OPER} Settings Ignored During Oracle10gR2 Installation on Solaris (x86)

December 8, 2006

As part of my work to deploy Oracle10gR2 on our Solaris 10 (x86) hosts, I ran across an annoying bug in which the OUI ignores the setting of the OSDBA and OSOPER groups the user has specified during installation, and compiles into the Oracle binaries the default name, dba, instead. This bug is significant, of course, because OS-based authentication is essential for connecting to and starting idle instances; if you aren’t a member of the OSDBA group on a given host, you won’t be able to start up an instance on that host.

For many environments, however, the bug doesn’t introduce a true problem: most people tend to use the default dba group anyhow, and thus things are kosher from the get-go.

But in mixed database environments – those which support multiple database vendors – an Oracle OSDBA group name other than dba is often selected in order to distinguish the name of the group from possible DBA groups in the other environments. In these cases, though, the bug introduces a mismatch between what you use as your OSDBA group, and what the OUI actually compiles into the Oracle binaries.

At my company, for instance, we use both Sybase and Oracle, and thus set our OSDBA group name to oradba, in order to distinguish it from the Sybase-related groups. Changing the OSDBA group to dba is not an option. Because of this, in the presence of this bug, installing Oracle10gR2 on Solaris (x86) out-of-the-box left us unable to start new instances. Therefore, it was imperative we fix the bug.

A search on Metalink revealed a variation on the issue (in which it happened under different circumstances but with the same result, namely an inability to connect as a sysdba using OS group-based authentication), but the specified fix was unacceptable: “Change the user’s group to dba.”

After a little digging, I discovered the error was in the $ORACLE_HOME/rdbms/lib/config.s file, a small piece of assembler code in which the OSDBA group is hard-coded. By modifying this file, rebuilding the corresponding object file (config.o), and then relinking the Oracle executables, we were able to resolve the issue. The patch we used to correct the bug follows. (We ran into a different problem during compilation, this one related to the NOKPIC flag options in env_rdbms.mk. A patch for it is included below too.)

The patch we applied to config.s:

12c12
 .LV13:        .string "oradba"
14c14
 .LV12:        .string "oradba"

The patch we applied to env_rdbms.mk:

274c274
 NOKPIC_ASFLAGS=-xarch=amd64

The command we used to relink:

make -f ins_rdbms.mk config.o ioracle

With the new executable and library in place, we were finally able to connect as sysdba via OS group authentication.
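The edit-and-relink steps described above can be sketched as a small script. The helper name patch_osdba is hypothetical, and the sketch assumes the group names appear in config.s as .string "dba" lines, as the patch above suggests; it rewrites every such line, so check the result against the saved copy before relinking.

```shell
#!/bin/sh
# patch_osdba FILE GROUP
# Replace the hard-coded "dba" group on every .string line of FILE with
# GROUP, keeping the untouched original as FILE.orig.
patch_osdba() {
    file=$1
    group=$2
    cp "$file" "$file.orig" || return 1
    # Only lines containing .string are touched; other lines pass through.
    sed "/\.string/s/\"dba\"/\"$group\"/" "$file.orig" > "$file"
}

# Intended use (not executed here):
#   cd "$ORACLE_HOME/rdbms/lib"
#   patch_osdba config.s oradba
#   diff config.s.orig config.s      # review the change first
#   make -f ins_rdbms.mk config.o ioracle
```

Keeping config.s.orig makes it easy to confirm that only the group strings changed, and to restore the shipped file if the relink fails.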