Occasionally, a DBA faces an Oracle RAC node that needs to be fixed, usually after applying a nasty patch.
Currently, my first approach is to remove the node and then add it back. There are other ways to try to fix the problem, such as opening an SR with Oracle, but they usually take more time.
Even though it sounds complicated, it's not. I will show you how to recover a node from these disaster scenarios:
A. DB binaries corrupted on ol8-19-rac2
B. GRID binaries corrupted on ol8-19-rac2
C. GRID and DB binaries corrupted on ol8-19-rac2
Oracle RAC with 2 nodes: ol8-19-rac1 and ol8-19-rac2
CDB: cdbrac
PDB: pdb1
Grid Version
34318175;TOMCAT RELEASE UPDATE 19.0.0.0.0 (34318175)
34160635;OCW RELEASE UPDATE 19.16.0.0.0 (34160635)
34139601;ACFS RELEASE UPDATE 19.16.0.0.0 (34139601)
34133642;Database Release Update : 19.16.0.0.220719 (34133642)
33575402;DBWLM RELEASE UPDATE 19.0.0.0.0 (33575402)
DB Version
34086870;OJVM RELEASE UPDATE: 19.16.0.0.220719 (34086870)
34160635;OCW RELEASE UPDATE 19.16.0.0.0 (34160635)
34133642;Database Release Update : 19.16.0.0.220719 (34133642)
When you see <dbenv>, load the DB HOME variables.
When you see <gridenv>, load the GRID HOME variables.
When you see <bnode>, execute the command on the broken node.
When you see <anode>, execute the command on any other working node.
I am using an installation where both the DB and GRID homes are owned by the oracle user, and I load environment variables to switch between them (see the sketch below). The same procedure works even if the installation uses two different users (usually oracle and grid).
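If you don't already have profile aliases or scripts for this, a minimal sketch of what loading each environment could look like is below. The DB home path matches the one used later in this article; the GRID home path (/u01/app/19.0.0/grid) and the ASM SID (+ASM2) are assumptions, so adjust them to your installation.
# DB environment (what <dbenv> means in this article)
[oracle@ol8-19-rac2 ~]$ export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1
[oracle@ol8-19-rac2 ~]$ export ORACLE_SID=cdbrac2
[oracle@ol8-19-rac2 ~]$ export PATH=$ORACLE_HOME/bin:$PATH
# GRID environment (what <gridenv> means) -- assumed path and SID, adjust as needed
[oracle@ol8-19-rac2 ~]$ export ORACLE_HOME=/u01/app/19.0.0/grid
[oracle@ol8-19-rac2 ~]$ export ORACLE_SID=+ASM2
[oracle@ol8-19-rac2 ~]$ export PATH=$ORACLE_HOME/bin:$PATH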
Note: always validate any procedure before you try it in a production environment.
In this scenario, the GRID binaries were unaffected, so we don't need to replace them. The database instance on the broken node is already down since its binaries are corrupted. First, back up the network configuration files (tnsnames.ora, sqlnet.ora, etc.) from the broken node's DB home so they can be restored after the reinstall.
<bnode dbenv>
[oracle@ol8-19-rac2 admin]$ mkdir -p /tmp/oracle; tar cvf /tmp/oracle/db_netadm.tar -C $ORACLE_HOME/network/admin .
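If you want to confirm the backup actually captured tnsnames.ora, sqlnet.ora, and any other files under network/admin before deinstalling, a quick listing of the archive is enough:
<bnode dbenv>
[oracle@ol8-19-rac2 admin]$ tar tvf /tmp/oracle/db_netadm.tar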
Next, run a local deinstall on the broken node to remove the corrupted DB home.
<bnode dbenv>
[oracle@ol8-19-rac2 admin]$ $ORACLE_HOME/deinstall/deinstall -local
Confirm the database name and choose yes to delete the instance. It's not unusual to see file deletion error messages; you can safely ignore them.
If we were adding an actual new node, we would be expected to run the cluster verification checks to confirm that everything is OK, but since the node was already part of the cluster, let's skip that. Because of that, it's not unusual to see the message "[WARNING] [INS-13014] Target environment does not meet some optional requirements." during the next step; ignore it.
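If you prefer to run the verification anyway, a minimal sketch of the pre-nodeadd check from a working node would look like the following (cluvfy ships with the GRID home, so run it with the GRID environment loaded):
<anode gridenv>
[oracle@ol8-19-rac1 ~]$ cluvfy stage -pre nodeadd -n ol8-19-rac2 -verbose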
From any other node, add the ex-broken node back. This step takes a while, as it copies the DB home files from the working node to the broken one.
<anode dbenv>
[oracle@ol8-19-rac1 ~]$ $ORACLE_HOME/addnode/addnode.sh -silent "CLUSTER_NEW_NODES={ol8-19-rac2}"
As root, execute the following on the ex-broken node.
<bnode>
[root@ol8-19-rac2 scripts]# /u01/app/oracle/product/19.0.0/dbhome_1/root.sh
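Optionally, you can compare the patch level of the freshly copied DB home against the list at the beginning of this article; opatch lspatches on the ex-broken node should report the same patches as the surviving node:
<bnode dbenv>
[oracle@ol8-19-rac2 ~]$ $ORACLE_HOME/OPatch/opatch lspatches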
From any other node, as the oracle user, execute the following for each existing database to add the node's instance back (it recreates the undo tablespace, redo logs, etc.).
<anode dbenv>
[oracle@ol8-19-rac1 ~]$ dbca -silent -ignorePrereqFailure -addInstance -nodeName ol8-19-rac2 -gdbName cdbrac -instanceName cdbrac2 -sysDBAUserName sys -sysDBAPassword SysPassword1
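To confirm dbca registered the new instance in the cluster configuration, you can check the database resource from any node; the output should list both cdbrac1 and cdbrac2 among the database instances:
<anode dbenv>
[oracle@ol8-19-rac1 ~]$ srvctl config database -db cdbrac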
Back on the ex-broken node, restore the network configuration backup taken at the beginning.
<bnode dbenv>
[oracle@ol8-19-rac2 admin]$ tar xfv /tmp/oracle/db_netadm.tar -C $ORACLE_HOME/network/admin
Finally, check that both instances are running.
[oracle@ol8-19-rac2 ~]$ srvctl status database -db cdbrac
Instance cdbrac1 is running on node ol8-19-rac1
Instance cdbrac2 is running on node ol8-19-rac2
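As an extra sanity check, you can also confirm both instances are OPEN from SQL*Plus; this is just a generic query against gv$instance, not something specific to this procedure:
<anode dbenv>
[oracle@ol8-19-rac1 ~]$ sqlplus / as sysdba
SQL> select inst_id, instance_name, status from gv$instance;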
Voilà, that's it. Your database should now be back online on the ex-broken node.
In the following article, I'll show how to recover from corrupted GRID binaries.