In this article, we explore platform specifics, differences, and similarities in the way Oracle and PostgreSQL handle data types.
One key thing to note is that Oracle has two main tiers of data types - it supports ANSI standard data types, but these types are actually converted to Oracle specific types under the hood, sometimes with unexpected results.
For example, if you create a table with an BIGINT column in Oracle, it will create a NUMBER(38) column. If you create a SMALLINT column, it will also create a NUMBER(38) - probably not what you were expecting.
Oracle specific Types
Oracle provides a set of data types that are not available in PostgreSQL.
Oracle supports BFILE data type. This is used for storing pointers to large binary files outside the database while allowing the files to be accessed and manipulated via the database. This functionality is not directly present in PostgreSQL.
Oracle's ROWID and UROWID are two other unique types. ROWID is used to store the row identifier, which is a unique key for every row in the database. UROWID (Universal Rowid) can store both logical and physical rowids of a row from a nonpartitioned table or partitioned table, or from a table or view in a different database on the same network.
One important (and confusing) difference between PostgreSQL and Oracle is that Oracle’s FLOAT type is not a floating point number at all - it is a fixed precision value represented by Oracle’s NUMBER type. If you want an actual floating point number, you should use BINARY_DOUBLE or BINARY FLOAT.
Oracle-specific types can complicate migration from Oracle to PostgreSQL as they don't have direct counterparts. Custom solutions often must be explored to migrate or translate these data types correctly.
Ansi Data types
Oracle supports ANSI data types, which allows for greater readability for DBA and engineers who are used to working with different types of databases. It is worth mentioning again that these ANSI types are mapped to Oracle's native built-in types. As a result, when working with Oracle, we recommend using Oracle's built-in data types to avoid any unexpected issues regarding data processing and management due to the internal mapping that occurs.
Type Mapping
Below are common built-in Oracle types and their equivalent that we translate to in PostgreSQL.
Oracle Data Types | PostgreSQL Data Types
--------------------------------+--------------------------------
VARCHAR2 | TEXT
NVARCHAR2 | TEXT
NUMBER | NUMERIC
FLOAT | NUMERIC
LONG | TEXT
DATE | TIMESTAMP
BINARY_FLOAT | REAL
BINARY_DOUBLE | DOUBLE PRECISION
TIMESTAMP | TIMESTAMP
TIMESTAMP WITH TIME ZONE | TIMESTAMPTZ
TIMESTAMP WITH LOCAL TIME ZONE | TIMESTAMPTZ
INTERVAL YEAR TO MONTH | INTERVAL YEAR TO MONTH
INTERVAL DAY TO SECOND | INTERVAL DAY TO SECOND
RAW | BYTEA
LONG RAW | BYTEA
ROWID | TEXT
UROWID | TEXT
CHAR | TEXT
NCHAR | TEXT
CLOB | TEXT
NCLOB | TEXT
BLOB | BYTEA
BFILE | TEXT
Some things you need to keep in mind when mapping Oracle data types to PostgreSQL:
- Worth mentioning again: Oracle’s FLOAT type is not a floating point number at all - it is a fixed precision value represented by Oracle’s NUMBER type. If you want an actual floating point number, you should use BINARY_DOUBLE or BINARY FLOAT.
- LONG data type can hold a variable length string of up to 2 GB. The closest PostgreSQL type is 'TEXT', however, its size should not exceed 1 GB. Similarly, the Oracle’s LONG RAW can store binary data up to 2 GB. The 'BYTEA' equivalent in PostgreSQL can be used for storing binary data, but each row in a table in PostgreSQL should not exceed a total size of 1 GB.
- Oracle discourages the use of 'LONG RAW' datatype and recommends using 'BLOB' datatype instead for large binary data.
- In Oracle, 'TIMESTAMP WITH LOCAL TIME ZONE' converts and displays stored timestamps in the time zone of the database or the client session. From Oracle docs: “Data is normalized to the database time zone when it is stored in the database.When the data is retrieved, users see the data in the session time zone.” In PostgreSQL, you can use 'TIMESTAMPTZ' (timestamp with time zone) to store a timestamp with a timezone. However, 'TIMESTAMPTZ' does not store the time zone information. It normalizes the input value to the server's timezone and only the UTC value is stored. When querying, the timestamp value is adjusted to the client's timezone.
- Oracle’s 'ROWID' datatype is not directly equivalent to any datatype in PostgreSQL. 'ROWID' in Oracle is a pseudo column attached to each database table row, representing the physical row address. Typically, an auto-incrementing integer column such as 'SERIAL' or 'BIGSERIAL' is used in PostgreSQL as a unique identifier for rows. However, these are not the same as a 'ROWID', as they do not automatically change if the row's physical location changes.
- Oracle’s 'UROWID' represents the logical address of a row of an index-organized table. It holds extended 'ROWID' values which include block number, row number, and other necessary information to locate rows in a table. In PostgreSQL, there isn't a directly equivalent datatype, as it uses a different system for row identification.
- PostgreSQL does not have a direct equivalent to the 'BFILE' datatype. It typically expects all data to be stored within the database itself. For working with filesystem-stored files in PostgreSQL, you often would handle this at the application layer, not the database layer. In cases where you require large object data types, you could use PostgreSQL's 'BYTEA' type to store binary data, but these store the data in the database, not as external files.
String types
According to the PostgreSQL documentation, there is no real performance difference between the TEXT, VARCHAR, and CHAR types. Unless you need to constrain the length of a string column, you should use the TEXT type.
The relevant quote from the docs:
"There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact, character(n) is usually the slowest of the three because of its additional storage costs. In most situations, text or character varying should be used instead."
In fact, if you interpret that quote literally, the TEXT type should actually be the fastest, since it isn't blank padded (like the CHAR type) and isn't length constrained (like the CHAR or VARCHAR types).
Number types
In Oracle, the NUMBER data type is used to store numeric values that often contain precision and scale. This type allows for variable precision that can range from 1 to 38 significant digits, and scale which can be -84 to 127. Oracle will allow floating point values in this data type, with the scale dictating the number of digits to the right of decimal point, which can cause a difference between the entered and stored values due to rounding.
PostgreSQL, on the other hand, offers a variety of numeric types such as smallint, integer, bigint, decimal, numeric, real, and double precision. The equivalently broad type in PostgreSQL to Oracle’s NUMBER would be NUMERIC or DECIMAL type. This type allows an arbitrary precision number with no predetermined limits, other than being constrained within the system’s limits.
In PostgreSQL, the NUMERIC or DECIMAL type optionally supports enforicing precision (total number of digits) and scale (number of digits in the fractional component). However, unlike Oracle, the NUMERIC type in PostgreSQL exactly holds the number as entered with no adjustments for precision or scale leading to potential differences due to rounding.
Number types
In Oracle, the NUMBER data type is used to store numeric values that often contain precision and scale. This type allows for variable precision that can range from 1 to 38 significant digits, and scale which can be -84 to 127. Oracle will allow floating point values in this data type, with the scale dictating the number of digits to the right of decimal point, which can cause a difference between the entered and stored values due to rounding.
PostgreSQL, on the other hand, offers a variety of numeric types such as smallint, integer, bigint, decimal, numeric, real, and double precision. The equivalently broad type in PostgreSQL to Oracle’s NUMBER would be NUMERIC or DECIMAL type. This type allows an arbitrary precision number with no predetermined limits, other than being constrained within the system’s limits.
In PostgreSQL, the NUMERIC or DECIMAL type optionally supports enforicing precision (total number of digits) and scale (number of digits in the fractional component). However, unlike Oracle, the NUMERIC type in PostgreSQL exactly holds the number as entered with no adjustments for precision or scale leading to potential differences due to rounding.
Date and time types
In Oracle, confusingly, there are two types that store timestamp without timezone data: The DATE and TIMESTAMP types. The DATE type doesn’t store fractional second precision, while the various TIMESTAMP types, do. PostgreSQL provides two separate data types for date and time: DATE for storing dates (year, month, day) and TIME for storing time values (hour, minute, second).
For storing timestamps with fractional precision or time zone information, both Oracle and PostgreSQL offer TIMESTAMP datatypes. Oracle has distinct types to handle these scenarios: TIMESTAMP WITH TIME ZONE and TIMESTAMP WITH LOCAL TIME ZONE for timestamps with timezone data and TIMESTAMP for timestamps with fractional seconds. PostgreSQL offers TIMESTAMP and TIMESTAMPTZ.
Oracle also has built-in support for its own calendar system within the database. The proprietary Oracle Calendar supports different date systems through AD and BC separators.
Binary data types
In Oracle, binary data is primarily handled using BLOB, BFILE, and RAW data types. The BLOB datatype is used for storing binary data in the database, and it can store up to 4GB of binary data. RAW is used for smaller amounts of binary data, typically up to 2000 bytes. BFILE is a type that stores a locator to a binary file outside the database, potentially of a very large size, and it's able to handle operations on the file.
PostgreSQL, on the other hand, uses the BYTEA data type to store binary data as a variable-length array, and it can hold up to 1GB of binary data.