This whitepaper covers the experience of Intelligent Converters migrating a database from Oracle to PostgreSQL for a large cargo delivery company. The key reasons for the migration were reducing the cost of the database management system and moving to an open-source system with sufficient scalability, security and customization capabilities.
Here is a brief specification of the source database:
Database server: Oracle 12c
Database contains 190 tables, 50 GB of data
120,000 lines of PL/SQL in stored procedures, functions and triggers
The project took 2 months: half a month for assessment and planning, one month for migration and half a month for testing. Three specialists were involved in the migration: 2 developers and 1 database administrator.
Some Oracle data types have no direct equivalent in PostgreSQL. One such type is DATE, which contains both date and time parts.
There are two options for mapping Oracle dates into PostgreSQL: either use TIMESTAMP or set up the orafce extension to use the Oracle-style date type oracle.date.
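For example, a column definition under each option might look like this (a minimal sketch; the table and column names are hypothetical):
-- Option 1: plain TIMESTAMP
CREATE TABLE deliveries (created_at TIMESTAMP);

-- Option 2: Oracle-style date via the orafce extension
CREATE EXTENSION IF NOT EXISTS orafce;
CREATE TABLE deliveries_ora (created_at oracle.date);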
Spatial types also require special attention. Oracle has the built-in type SDO_GEOMETRY, while PostgreSQL needs the PostGIS extension installed to work with spatial data types.
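A minimal sketch of a spatial column after installing PostGIS (the table is hypothetical):
CREATE EXTENSION postgis;
CREATE TABLE depots (
    id integer PRIMARY KEY,
    location geometry(Point, 4326)  -- replaces Oracle SDO_GEOMETRY
);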
Oracle | PostgreSQL
---|---
BINARY_FLOAT | REAL
BINARY_INTEGER | INTEGER
BINARY_DOUBLE | DOUBLE PRECISION
BLOB, RAW(n), LONG RAW | BYTEA (1 GB limit)
CLOB, LONG | TEXT (1 GB limit)
DATE | TIMESTAMP
NUMBER, NUMBER(*) | DOUBLE PRECISION, or BIGINT if it is part of a primary key
NUMBER(n,0), NUMBER(n) | n < 5: SMALLINT; 5 <= n < 9: INT; 9 <= n < 19: BIGINT; n >= 19: DECIMAL(n)
NUMBER(p,s) | DECIMAL(p,s)
REAL | DOUBLE PRECISION
For numeric types it is important to understand how they are used in the database. If the focus is on accuracy, Oracle numeric types must be mapped to PostgreSQL NUMERIC. If the top priority is calculation speed, the best mapping is REAL or DOUBLE PRECISION.
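To illustrate the difference (the tables are hypothetical):
-- accuracy first: exact decimal semantics for monetary values
CREATE TABLE invoices (amount NUMERIC(12,2));

-- speed first: binary floating point for measurements that tolerate rounding
CREATE TABLE telemetry (reading DOUBLE PRECISION);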
We used the Oracle-to-PostgreSQL converter to automate migration of table definitions, indexes and constraints for this project. It maps Oracle types to the most appropriate PostgreSQL equivalents and allows customizing particular type mappings.
Since data migration may take a long time for large databases, it is extremely important to choose the right strategy and tools for this step.
There are three common approaches to data migration: snapshot, which reads all the data in one step; piecewise snapshot, which reads the data in parallel chunks; and changed data replication (CDR), which tracks and replicates incremental changes.
The snapshot method requires essential downtime of the source database for the entire period of reading the data, to avoid data loss or corruption during migration. The downtime for the piecewise snapshot approach is much shorter but still required. In this project we used the piecewise snapshot method to migrate large tables containing millions of rows; a sketch of chunked reading is shown below.
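A sketch of piecewise reading on the Oracle side, assuming the table has a numeric primary key (the table name and chunk size are hypothetical):
-- each chunk is extracted and loaded separately;
-- :start_id advances by the chunk size on every iteration
SELECT * FROM orders WHERE id >= :start_id AND id < :start_id + 100000;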
Changed data replication has two major implementations, each with its own weaknesses. The first is known as trigger-based CDR. It requires creating triggers on insert, update and delete for each table being replicated in the source database. These triggers track all changes by storing information about the change events in a special ‘history’ table; based on this data, the CDR tool replicates the changes to the target database. Trigger-based CDR puts extra overhead on the source database, since every data update is also written to the ‘history’ table.
The second method is called Transaction Log CDR. It uses the transaction logs created by Oracle to track and replicate changes into the target PostgreSQL database. The advantage of this approach over trigger-based CDR is that it does not modify the source database.
However, Transaction Log CDR has some weaknesses as well.
The most challenging aspects of data migration from Oracle to PostgreSQL are the source DBMS's specific data formats that have no direct equivalent in the target, and external data.
In the previous section of this article BYTEA was identified as the most suitable PostgreSQL data type for binary data. However, for large binary data (average field size of 10 MB or more) BYTEA is not recommended. The reason is a particularity of reading BYTEA data: a value can only be extracted in one fragment, piecewise reading is not possible, which may cause significant RAM overhead. The PostgreSQL LARGE OBJECT type can be used as a workaround. All LARGE OBJECT values are stored in the system table ‘pg_largeobject’ that is part of each database and can hold up to 4 billion rows. The maximum size of a LARGE OBJECT is 4 TB, and piecewise reading is available.
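A minimal sketch of piecewise reading with the server-side large-object functions; the OID 16440 is hypothetical, and large-object descriptors are only valid inside a transaction:
BEGIN;
SELECT lo_open(16440, 262144);  -- 262144 = INV_READ; returns descriptor 0
SELECT loread(0, 65536);        -- read the first 64 KB chunk
SELECT loread(0, 65536);        -- read the next 64 KB chunk
SELECT lo_close(0);
COMMIT;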
It is also important to properly migrate ROWID, a pseudo-column identifying the physical address of a record in a table. PostgreSQL has a similar service column called ctid; however, it is not a direct equivalent of ROWID, since the PostgreSQL documentation states that ctid may change due to the vacuuming procedure.
Oracle can link external data that is stored outside the database and process it as a regular table using the ‘external table’ feature of the DBMS. PostgreSQL uses the Foreign Data Wrapper infrastructure for the same purpose: for example, the ‘file_fdw’ extension exposes external CSV files as regular tables.
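A minimal file_fdw setup might look like this (the file path and table names are hypothetical):
CREATE EXTENSION file_fdw;
CREATE SERVER csv_source FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE ext_routes (
    route_id integer,
    origin text,
    destination text
) SERVER csv_source
OPTIONS (filename '/data/routes.csv', format 'csv', header 'true');
-- the CSV file can now be queried like a regular table:
SELECT * FROM ext_routes LIMIT 5;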
This section explores issues that were resolved during migration of Oracle packages, stored procedures, functions and triggers to PostgreSQL.
Packages. PostgreSQL does not have packages. The missing feature can be emulated by grouping all entries belonging to one package inside a PostgreSQL schema with the same name; global variables can be stored in a service table.
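For example, a hypothetical Oracle package ‘billing’ could be emulated as follows:
CREATE SCHEMA billing;

-- a packaged function becomes a function inside the schema
CREATE FUNCTION billing.tax_rate() RETURNS numeric
LANGUAGE sql AS $$ SELECT 0.20 $$;

-- package-level global variables live in a service table
CREATE TABLE billing.globals (name text PRIMARY KEY, value text);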
Empty strings. Unlike Oracle, PostgreSQL does not treat empty strings as equal to NULL. For example, in Oracle:
SQL> select NVL('','This is null') as null_test from dual;
NULL_TEST
------------
This is null
The same in PostgreSQL:
testdb=# select coalesce('','This is null') as null_test;
 null_test
-----------

(1 row)
String concatenation with NULL also behaves differently in Oracle and PostgreSQL. In Oracle:
SQL> Select 'String' || NULL as cs from dual;
CS
--
String
In PostgreSQL:
synctest=# Select 'String' || NULL as concat_str;
 concat_str
------------

(1 row)
Type casting. PostgreSQL requires strict type casting when calling functions and operators, or when INSERT and UPDATE statements place the results of expressions into a table.
The workaround is to use the pseudo-type ‘anyelement’:
create or replace function my_concat(str1 anyelement, str2 anyelement)
returns varchar
language plpgsql
as $$
begin
return str1::varchar || str2::varchar;
end;
$$;
Note: a stored procedure/function may declare one or several anyelement parameters, but at call time all of them must resolve to the same actual type.
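For example (hypothetical calls against the function above):
SELECT my_concat(1, 2);              -- OK: both arguments resolve to integer
SELECT my_concat(now(), now());      -- OK: both arguments resolve to timestamptz
-- SELECT my_concat(1, 'b'::text);   -- error: anyelement types differ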
Sequences. The syntax of sequence declaration is similar in Oracle and PostgreSQL. For example:
CREATE SEQUENCE SEQ1 START WITH 10 INCREMENT BY 2 MAXVALUE 1000000 CACHE 20 NO CYCLE;
However, referencing the next value of a sequence is different: in Oracle it is ‘sequence.nextval’, in PostgreSQL nextval(‘sequence’). (There are minor syntax differences as well: Oracle writes NOCYCLE, PostgreSQL NO CYCLE.)
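For example (the orders table is hypothetical):
-- Oracle:     INSERT INTO orders (id) VALUES (SEQ1.NEXTVAL);
-- PostgreSQL:
INSERT INTO orders (id) VALUES (nextval('seq1'));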
Triggers. In Oracle, the source code of the trigger body is included in the CREATE TRIGGER statement. PostgreSQL requires the trigger source code to be composed as a standalone function referenced from the CREATE TRIGGER statement:
CREATE OR REPLACE FUNCTION store_changes()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS
$$
BEGIN
IF (NEW.first_name <> OLD.first_name) OR (NEW.last_name <> OLD.last_name) OR (NEW.email <> OLD.email)
THEN
INSERT INTO changes_log(id,changed_on)
VALUES(OLD.id,now());
END IF;
RETURN NEW;
END;
$$;
CREATE TRIGGER make_changes
BEFORE UPDATE
ON employees
FOR EACH ROW
EXECUTE PROCEDURE store_changes();
Autonomous Transactions. Autonomous transactions in Oracle allow a subprogram to commit or roll back SQL operations without committing or rolling back the main transaction. For example, suppose some data is inserted into a table as part of an insert trigger, and this insert must succeed even if the trigger's main transaction fails. In this case the corresponding INSERT statement must be enclosed in an autonomous transaction:
CREATE OR REPLACE PROCEDURE insert_critical_data(v_data varchar2)
IS
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
INSERT INTO info_table (data) VALUES (v_data);
commit;
END;
Autonomous transactions are not supported in PostgreSQL directly. One possible workaround is to refactor the code so that it does not need autonomous transactions.
Otherwise, you can use the PostgreSQL module ‘dblink’. With dblink, a new connection is established, and a query executed through this connection is committed immediately, irrespective of the main transaction. For example, the following function inserts a row into a table, and the insert will be committed even if the calling transaction is rolled back:
CREATE OR REPLACE FUNCTION insert_critical_data(v_data TEXT)
RETURNS void
AS
$BODY$
DECLARE
v_sql text;
BEGIN
PERFORM dblink_connect('myconn',
'dbname=mydbname user=… password=… host=… port=…');
v_sql := format('INSERT INTO info_table (data)
VALUES (%L)', v_data);
PERFORM dblink_exec('myconn', v_sql);
PERFORM dblink_disconnect('myconn');
END;
$BODY$
LANGUAGE plpgsql;
Built-in Functions. Oracle and PostgreSQL provide similar but not identical sets of built-in functions. The table below lists Oracle functions that require porting to PostgreSQL equivalents:
Oracle | PostgreSQL
---|---
ADD_MONTHS($date, $n_month) | $date + $n_month * interval '1 month'
DECODE($exp, $when, $then, ...) | CASE $exp WHEN $when THEN $then ... END
INSTR($str1, $str2) | *POSITION($str2 in $str1)
ROWNUM | **row_number() over ()
SYSDATE | CURRENT_TIMESTAMP
SYS_GUID() | uuid_generate_v1()
* A complete port of the Oracle INSTR function to PostgreSQL for complex cases can be found at: https://www.postgresql.org/docs/9.0/static/plpgsql-porting.html#PLPGSQL-PORTING-APPENDIX
** The Oracle condition ‘where rownum < N’ must be converted into ‘limit N’ in PostgreSQL.

We used the Oracle-to-PostgreSQL Code Converter to partially automate conversion of triggers, stored procedures and functions. It converts Oracle built-in functions into PostgreSQL equivalents and handles most PL/SQL syntax patterns. However, complex source code may require manual post-processing of the output.
This section contains brief descriptions of PostgreSQL modules that can help with migration from Oracle to PostgreSQL and with testing the results of the migration.
One of the most important PostgreSQL modules when migrating from Oracle is orafce. It emulates specific types, functions and operators of the Oracle database management system.
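A short sketch, assuming the orafce package is installed on the server (the schema placement of its functions may vary between orafce versions):
CREATE EXTENSION orafce;
SET search_path TO oracle, "$user", public;
SELECT add_months(date '2024-01-31', 1);  -- Oracle-style month arithmetic
SELECT nvl(NULL::text, 'fallback');       -- Oracle NVL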
The extensions ‘pgTAP’ and ‘pg_prove’ can be used for functional testing of PostgreSQL functions. Installing ‘pgTAP’ creates a large set of stored functions for writing tests; a test function must be declared as ‘returns setof text’. A test script can then be run from the psql console client:
psql -h $db_host -p $db_port -U $db_user $db_name -f tests/name_of_test.sql
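A minimal sketch of such a test script (tests/name_of_test.sql); the my_concat function comes from the earlier type-casting example:
BEGIN;
SELECT plan(2);

SELECT has_function('my_concat');
SELECT is(my_concat(1, 2), '12'::varchar, 'my_concat joins casted integers');

SELECT * FROM finish();
ROLLBACK;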
Detailed information about pgTAP can be found here: http://pgtap.org/documentation.html
The PostgreSQL extension plpgsql_check helps validate PL/pgSQL code. For example, a missing variable declaration or a misspelled column name inside a PostgreSQL function will only fail at execution time. Use the capabilities of ‘plpgsql_check’ to diagnose such issues before going to production:
select * from plpgsql_check_function_tb('{name of function}');
The plprofiler module helps analyze the performance of PL/pgSQL code in functions and stored procedures.
The PostgreSQL extension oracle_fdw provides connections to Oracle databases through the Foreign Data Wrapper feature of the DBMS. For example, if the Oracle database ‘ORCL’ is running on the server ‘server.mydomain.com’, oracle_fdw can be configured as follows:
CREATE EXTENSION oracle_fdw;
CREATE SERVER oraserver FOREIGN DATA WRAPPER oracle_fdw OPTIONS (dbserver '//server.mydomain.com:1521/ORCL');
GRANT USAGE ON FOREIGN SERVER oraserver TO pguser;
CREATE USER MAPPING FOR pguser SERVER oraserver OPTIONS (user 'oracle user', password 'oracle password');
CREATE FOREIGN TABLE oratable ( { column definitions } ) SERVER oraserver OPTIONS (schema 'ORAUSER', table 'ORATAB');
Then you can use ‘oratable’ as a regular PostgreSQL table.
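For example, a hypothetical post-migration sanity check could compare row counts between the migrated PostgreSQL table and the Oracle original exposed through oracle_fdw:
SELECT (SELECT count(*) FROM migrated_table) AS pg_rows,
       (SELECT count(*) FROM oratable) AS ora_rows;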