A common scenario: in the development environment, a table’s ID column is missing its unique constraint, and developers have been creating their own data for testing. Suddenly someone realizes that the ID column has duplicates. The DBA is summoned. "Enable the unique key", the DBA is told, "but do not delete our data."
Can the DBA escape this catch-22? For sure, says this post — with an approach of replacing *only* the duplicate IDs with unique values, and letting everything else stay untouched.
We’ll see this in action with a simple case study: a table BOOKS with two columns {ref_no, book_name}, in which ref_no is expected to be unique — but isn’t.
SQL> desc books Name Null? Type ----------------------- -------- ---------------- REF_NO NUMBER(3) BOOK_NAME VARCHAR2(50) SQL> select * from books; REF_NO BOOK_NAME ---------- -------------------- 1 Alice in Wonderland 2 Oliver Twist 2 Pygmalion 3 LOTR 6 Animal Farm 6 The Hobbit 6 Life Of Pi 7 Atonement
In this example, ID column (ref_no) with values 2 and 6 are duplicates.
SQL> -- List duplicates SQL> select ref_no, count(*) 2 from books 3 group by ref_no 4 having count(*) > 1; REF_NO COUNT(*) ---------- ---------- 6 3 2 2
As you can see, if we fix one row with ref_no 2, and two rows with ref_no 6, the duplicate issue will get resolved.
Here’s how.
Steps to Replace Duplicate IDs with Unique Values
1. Find the maximum value [X] of ID column
SQL> select max(ref_no) from books; MAX(REF_NO) ----------- 7
2. Create sequence to generate ID, starting with [X+1]
Set the start point of the new sequence ahead of the value identified in step 1, so that numbers generated by the sequence do not clash with any existing value in the ID column.
SQL> create sequence ref_no_seq 2 start with 8; Sequence created.
3. Update ID for duplicates in the table, using the new sequence
The SQL:
update books set ref_no = ref_no_seq.nextval where rowid in -- Update only the duplicates ( select rowid from ( select rowid , ref_no , row_number() over (partition by ref_no order by book_name) sno from books ) where sno > 1 );
When run:
SQL> update books 2 set ref_no = ref_no_seq.nextval 3 where rowid in 4 -- Update only the duplicates 5 ( 6 select rowid 7 from 8 ( 9 select rowid 10 , ref_no 11 , row_number() over 12 (partition by ref_no 13 order by book_name) sno 14 from books 15 ) 16 where sno > 1 17 ); 3 rows updated.
Verification
Check the data after duplicates have been resolved:
SQL> select * from books 2 order by ref_no; REF_NO BOOK_NAME ---------- ------------------- 1 Alice in Wonderland 2 Oliver Twist 3 LOTR 6 Animal Farm 7 Atonement 8 Pygmalion 9 The Hobbit 10 Life Of Pi 8 rows selected.
All good now, and the unique key constraint can now be enabled.
Caution: Note that this method will give you unique IDs, not gapless IDs — if you’re insisting on that, you’re in for trouble anyway!