The DISTINCT keyword placed next to SELECT restricts the result to unique rows from a query.
DISTINCT is also a much abused keyword, often used as a quick fix to bad queries. Take this example from Oracle Applications:
-- Problem SELECT p.party_id , p.party_name FROM hz_parties p , hz_cust_accounts c WHERE c.party_id=p.party_id AND upper(p.party_name) LIKE 'BUSINESS WOR%' PARTY_ID PARTY_NAME ---------- ------------------ 4429 Business World 4429 Business World
You see that the query returns two rows. Since you’re only interested in one unique answer, how would you correct the query?
The lazy developer approach:
-- Wrong Approach SELECT DISTINCT p.party_id , p.party_name FROM hz_parties p , hz_cust_accounts c WHERE c.party_id=p.party_id AND upper(p.party_name) LIKE 'BUSINESS WOR%' PARTY_ID PARTY_NAME ---------- ------------------ 4429 Business World
Some developers simply stick a DISTINCT in, to suppress duplicate rows from an incorrect query. But is this a wise fix?
DISTINCT Abuse and its Consequences
You may have got the right answer with the lazy developer approach BUT:
- Your query will not perform optimally – DISTINCT causes an additional sort of data.
- It is possible that one of the joins in the query needs to be an EXISTS subquery, or the join condition is incomplete, or the data model is faulty. This needs to be investigated and the root cause fixed. By using DISTINCT, you are ignoring the problem, not resolving it.
How to Correct The Query Above
That query happened to need an EXISTS subquery in place of a join. The following modified query would work correctly, returning only unique rows:
-- Correct Approach SELECT p.party_id , p.party_name FROM hz_parties p WHERE upper(p.party_name) LIKE 'BUSINESS WOR%' AND EXISTS ( SELECT null FROM hz_cust_accounts c WHERE c.party_id=p.party_id) PARTY_ID PARTY_NAME ---------- ------------------ 4429 Business World
Summary
DISTINCT makes a query return unique rows only, and is often used injudiciously to suppress duplicate rows being returned by a bad query. Don’t use DISTINCT to cover up errors. Fix the problem at the source: run the query without DISTINCT, investigate why the duplicates occur and correct it.
{ 3 comments… read them below or add one }
Makes sense. So is there a right time to use ‘Distinct’?
When we actually want to list unique rows, and the data in question may be repeated.
e.g. To list all the database usernames that own objects:
select distinct owner from all_objects;
In a database with thousands of objects, skipping “distinct” would mean some painful sifting through the result set.
hi,
I have a doubt on joining tables. Suppose , in a table there are some 5 rows , and in another table 0 rows ( no records) , then wht shud be the output of cartisian product of the two tables ?
5 or 25 or null?