Imagine you’re planning a database migration between two servers.

You’re performing a FULL import into a brand-new DATABASE.

You have plenty of good reasons to “reorganize” the entire DATABASE: a SYSTEM tablespace over 60GB, a SYSAUX tablespace exceeding 40GB… you know, audit records written to the wrong places, dozens of tables with millions of rows that have never been reorganized, and the same situation with indexes…

So, you set up a fresh environment and kick off an impdp using NETWORK_LINK, with FLASHBACK_TIME="SYSTIMESTAMP" and a solid degree of parallelism.

Let’s say you go with PARALLEL=16 or even PARALLEL=32. The server has more than enough CPUs to handle it.

The process starts off like a frenzy.

In just 26 minutes, you’ve successfully migrated all the table data to the target environment.

And just when you’re about to pour yourself a celebratory whiskey to savor the sweet taste of success, you hit the real challenge: indexes, constraints, and statistics.

Those 26 minutes quickly stretch into 26 hours, with the undo retention behind your FLASHBACK_TIME dangerously flirting with an ORA-01555 error.

If you’re thinking the 32 processes are clashing with each other… that’s not the issue. In fact, it’s quite the opposite.

First of all: indexes are created by a single worker.

Secondly, impdp builds each index with PARALLEL 1, one after another, with no parallel DDL at all. Yep, you heard that right.

And it’s by design. Serializing index creation prevents multiple indexes from being built simultaneously on the same table, which would otherwise cause a flood of “TM” locks, leading to heavy contention in the library cache — and ultimately, significant delays due to lock management.
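If you want to see that kind of contention for yourself, a quick look at the active sessions while several indexes are being built on the same table shows it straight away. This is just an illustrative query against V$SESSION, not part of the recipe:

-- Sessions currently stuck on TM enqueues or library cache locks/pins
select sid, serial#, sql_id, event, blocking_session, seconds_in_wait
from   v$session
where  event in ('enq: TM - contention', 'library cache lock', 'library cache pin')
order by seconds_in_wait desc;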

This behavior isn’t new. In earlier versions (such as 11g — check out bug 8604502 if you’re curious), indexes were imported with PARALLEL 1 and stayed that way at the destination, even if they had a different parallel setting at the source. This was later improved by adding an ALTER INDEX after creation to reset the parallelism level accordingly.

In the case I’m describing, there were no partitioned objects at the source. However, for partitioned indexes, impdp still dedicates one worker per partition — sequentially — leading to further underutilization of system resources.

In short, the frenzy of 32 parallel processes quickly turns into a single thread, working through index creation and constraint enforcement one by one.
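By the way, it's easy to verify this while the import is running: once the data phase is over, only one Data Pump worker stays busy. Here's a sketch of the query I use for that, joining DBA_DATAPUMP_SESSIONS with V$SESSION; adapt it to your own job:

-- What each Data Pump session is doing right now
select dp.job_name, s.sid, s.status, s.event, s.sql_id
from   dba_datapump_sessions dp, v$session s
where  dp.saddr = s.saddr
order by dp.job_name, s.sid;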

To overcome this bottleneck, here’s my “homemade recipe” to manually inject a boost of parallelism into your Data Pump import — and give your migration that extra kick it deserves.

1.- Load only the data first.

During data loading, use as much parallelism as you can. In my case, around 500GB of data was transferred in 26 minutes via dblink. Maximum parallelism.
Exclude indexes, constraints, and statistics — you’ll import those later.

impdp network_link=BBDD_ORIGINAL_LINK full=yes logfile=migracion_solo_datos.log parallel=32 exclude=STATISTICS,CONSTRAINT,REF_CONSTRAINT,INDEX

2.- Generate an index file, and then another one for constraints.

impdp network_link=BBDD_ORIGINAL_LINK schemas=LISTA_USUARIOS_SEPARADOS_POR_COMA sqlfile=indices_ddls.sql include=INDEX

impdp network_link=BBDD_ORIGINAL_LINK schemas=LISTA_USUARIOS_SEPARADOS_POR_COMA sqlfile=constraints_ddls.sql include=CONSTRAINT,REF_CONSTRAINT

3.- Edit the file indices_ddls.sql to fix the PARALLEL 1 setting.

You’ll see that in the indices_ddls.sql file, all indexes are created like this:

 
-- First it creates the index with PARALLEL 1 
CREATE UNIQUE INDEX "USUARIO"."INDICE1" ON "USUARIO"."TABLA1" ("COLUMNA1") 
PCTFREE 10 INITRANS 2 MAXTRANS 255 STORAGE(INITIAL 131072 NEXT 1048576 MINEXTENTS 1 
MAXEXTENTS 2147483645 --> This part depends on the original STORAGE clause. 
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL 
DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT) 
TABLESPACE "INDICES" PARALLEL 1 ; --> This is the PARALLEL 1 you want to replace 

-- Then it resets the parallelism to the original value 
ALTER INDEX "USUARIO"."INDICE1" PARALLEL ; 

If you’re using “Find and Replace”, replace PARALLEL 1 with PARALLEL.
If you’re using vi, then: :%s/PARALLEL 1\>/PARALLEL/g (the \> word boundary keeps you from mangling any PARALLEL 16 or PARALLEL 12 that may appear in the ALTER INDEX statements).

4.- Prepare the gather_stats script.

Since the entire database has been reorganized and objects recreated, it’s best to generate fresh statistics:

 
begin
  dbms_stats.gather_database_stats(
    estimate_percent => dbms_stats.auto_sample_size,
    degree           => dbms_stats.auto_degree,   -- Automatic parallelism
    method_opt       => 'for all indexed columns size auto',
    cascade          => true);
end;
/

5.- Identify the largest tables and those with the most indexes and constraints in the source environment.

Here’s an example query to extract them. If you need to locate partitioned objects, use a similar approach by joining with DBA_IND_PARTITIONS.

 
column owner format a30 
column segment_name format a30 
column segment_type format a15 
select s.owner, s.segment_name, s.segment_type, round(s.bytes/1024/1024/1024) GB, count(*) TOTAL_INDICES
from   dba_segments s, dba_indexes i
where  s.segment_name = i.table_name
and    s.owner = i.table_owner
and    s.bytes/1024/1024/1024 > 3
and    s.owner in ('LISTA_USUARIOS')
and    s.segment_type = 'TABLE'
group by s.owner, s.segment_name, s.segment_type, round(s.bytes/1024/1024/1024)
order by 4 desc, 5 desc;

select s.owner, s.segment_name, s.segment_type, round(s.bytes/1024/1024/1024) GB, count(*) TOTAL_CONSTRAINTS
from   dba_segments s, dba_constraints c
where  s.segment_name = c.table_name
and    s.owner = c.owner
and    s.bytes/1024/1024/1024 > 3
and    s.owner in ('LISTA_USUARIOS')
and    s.segment_type = 'TABLE'
and    c.constraint_type in ('P','R') /* You're only interested in PRIMARY KEY and FOREIGN KEY constraints */
group by s.owner, s.segment_name, s.segment_type, round(s.bytes/1024/1024/1024)
order by 4 desc, 5 desc;
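And if, unlike my case, you do have partitioned indexes at the source, the same idea joined with DBA_IND_PARTITIONS gives you the partition counts per index, which is where the one-worker-per-partition behavior hurts the most. A sketch, using the same LISTA_USUARIOS placeholder:

select i.table_owner, i.table_name, i.index_name, count(*) TOTAL_PARTICIONES
from   dba_indexes i, dba_ind_partitions p
where  i.owner = p.index_owner
and    i.index_name = p.index_name
and    i.owner in ('LISTA_USUARIOS')
group by i.table_owner, i.table_name, i.index_name
order by 4 desc;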

6.- Do some manual tuning with the indexes.

It’s not a good idea to create all indexes for a single table in parallel — this will cause read contention and many waits. I recommend “isolating” the indexes of the large tables identified in the previous query.

Create a separate script for each of these large tables and prioritize those with more indexes and constraints.
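To give you an idea, a per-table script like indices_ddls_TABLA_GRANDE1.sql (the table and index names here are made up) is simply the CREATE INDEX statements for that one table, cut from indices_ddls.sql, with the parallel degree already corrected:

-- Hypothetical indices_ddls_TABLA_GRANDE1.sql: only the indexes of TABLA_GRANDE1, run one after another
CREATE UNIQUE INDEX "USUARIO"."TABLA_GRANDE1_PK" ON "USUARIO"."TABLA_GRANDE1" ("ID")
TABLESPACE "INDICES" PARALLEL ;

CREATE INDEX "USUARIO"."TABLA_GRANDE1_IDX1" ON "USUARIO"."TABLA_GRANDE1" ("COLUMNA1")
TABLESPACE "INDICES" PARALLEL ;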

7.- Apply the same manual care to constraints.

These large tables will also have their constraints applied sequentially, which can be heavy.
Identify foreign keys so you can prepare a script that first creates the primary keys on related tables, and then the foreign keys on the large tables.
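To map those dependencies, something along these lines does the job (a sketch over DBA_CONSTRAINTS; LISTA_USUARIOS is the same placeholder as before):

-- Foreign keys in your schemas and the parent tables whose keys they reference
select c.owner, c.table_name, c.constraint_name,
       r.owner parent_owner, r.table_name parent_table
from   dba_constraints c, dba_constraints r
where  c.constraint_type = 'R'
and    c.r_owner = r.owner
and    c.r_constraint_name = r.constraint_name
and    c.owner in ('LISTA_USUARIOS')
order by c.table_name, c.constraint_name;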

Remember: constraint scripts can’t be run until the related indexes are already created. In the script generated in step 2 (constraints_ddls.sql), you’ll find the PRIMARY KEY creation first, like:

 ALTER TABLE "USUARIO"."TABLA" ADD CONSTRAINT "TABLA_PK" PRIMARY KEY ("ID") 
USING INDEX "USUARIO"."INDICE_PK" ENABLE;

And in the second part of the script, all the FOREIGN KEYs:

 ALTER TABLE "USUARIO"."TABLA_REF" 
ADD CONSTRAINT "FK_TABLA" FOREIGN KEY ("COLUMNA")
REFERENCES "USUARIO"."TABLA" ("ID") ENABLE;

Use this as a base to build custom scripts per table.
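For example, a hypothetical constraints_ddls_TABLA_GRANDE1.sql (again, made-up names) would respect that order: primary keys first, then the foreign keys that depend on them:

-- 1) Primary keys first, on the large table and on the parent table it references
ALTER TABLE "USUARIO"."TABLA_GRANDE1" ADD CONSTRAINT "TABLA_GRANDE1_PK" PRIMARY KEY ("ID")
USING INDEX "USUARIO"."TABLA_GRANDE1_PK" ENABLE;

ALTER TABLE "USUARIO"."TABLA_PADRE" ADD CONSTRAINT "TABLA_PADRE_PK" PRIMARY KEY ("ID")
USING INDEX "USUARIO"."TABLA_PADRE_PK" ENABLE;

-- 2) Then the foreign keys pointing at those primary keys
ALTER TABLE "USUARIO"."TABLA_GRANDE1"
ADD CONSTRAINT "FK_TABLA_GRANDE1_PADRE" FOREIGN KEY ("ID_PADRE")
REFERENCES "USUARIO"."TABLA_PADRE" ("ID") ENABLE;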

8.- At import time, follow this order:

First, create the tables without indexes or constraints.
This is what you did in step 1. Add the FLASHBACK_TIME option to ensure consistency in case there’s transactional activity at the source.

 impdp network_link=BBDD_ORIGINAL_LINK full=yes logfile=importacion_tablas.log 
parallel=32 exclude=statistics,constraint,ref_constraint,index
flashback_time="systimestamp"

Once the tables are created, run the corrected index scripts. In separate sessions, run the scripts for the large tables.

Since these are high-volume tables, you do want their indexes to be created serially. But that doesn’t prevent you from launching index scripts for multiple large tables in parallel.
There won’t be DDL contention, since they’ll be locking different objects. However, if you launch all index creation scripts in multiple sessions, you will get waits when trying to create multiple indexes on the same table at once. The first script will lock the table, and the second will be forced to wait.

 
alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS';
set timing on
alter session set ddl_lock_timeout=1;
/* If the table is locked, now is not the time to wait. */
spool indices_ddls_1.log
/* You can run this in multiple sessions, just be aware of potential contention. */
select sysdate from dual;
/* The sysdate calls are just to measure how long each script takes. */
@indices_ddls.sql
select sysdate from dual;
spool off

/* And, in a separate session, the script for one of the large tables: */
alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS';
set timing on
alter session set ddl_lock_timeout=1;
spool lanza_indices_TABLA_GRANDE1.log
select sysdate from dual;
@indices_ddls_TABLA_GRANDE1.sql
select sysdate from dual;
spool off

Once the indexes for the large table are in place, you can launch the constraint scripts. Remember: primary keys require their indexes to exist first.

 
alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS';
set timing on
alter session set ddl_lock_timeout=1;
spool lanza_constraints_TABLA_GRANDE1.log
select sysdate from dual;
@constraints_ddls_TABLA_GRANDE1.sql
select sysdate from dual;
spool off

Once you finish with indexes and constraints for large tables, if overall import time is a concern, go ahead and launch the statistics generation script so it can start gathering stats on the large tables. Thanks to dbms_stats.auto_degree, this will also take advantage of parallelism.
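If you prefer to attack the big tables one by one before (or instead of) the full gather_database_stats from step 4, a per-table call like this one works, with the same automatic sample size and degree (TABLA_GRANDE1 is, once again, a made-up name):

begin
  dbms_stats.gather_table_stats(
    ownname          => 'USUARIO',
    tabname          => 'TABLA_GRANDE1',
    estimate_percent => dbms_stats.auto_sample_size,
    degree           => dbms_stats.auto_degree,   -- parallel stats gathering on the big table
    method_opt       => 'for all indexed columns size auto',
    cascade          => true);
end;
/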

With this “handcrafted parallelism,” you’ll wrap up the import in record time.
Here’s a summary of the same migration that used to take 26 hours with a standard import. With this recipe, it looked like this:

Final timings (from my latest migration)

Standard impdp with PARALLEL 32 but sequential index/constraint creation and a final DBMS_STATS: 26h.

The step-by-step impdp method in this article:

T1 – Migration start.
Table and row creation -> 26 minutes

T2 = (T1 + 26 minutes) – With tables and rows in place.
General index creation with 6 concurrent sessions -> 4h 25min
Large table indexes (~50GB each) with 4 concurrent sessions -> 1h
Medium table indexes (~5GB each) with 2 concurrent sessions -> 35min

T3 = (T1 + 26min + 4h 25min)
Constraint creation with 6 concurrent sessions (excluding large table constraints already created) -> 3h 30min

Total import time: 26min + 4h 25min + 3h 30min = 8h 21min
