Delete duplicate records from table even when there is no key field

Hi all,yesterday only i was discussing with my colleague Kamlesh Samnani who is MCTS about deleting records which does not have any primary key field based on which we can delete the data (actually he only fired this question yesterday to me)

so i thought to write a blog post about it,there are many posts which talk about deleting duplicate records from the table which has some key field,so i started working on the query and what i got is what i am gonna write here.

Lets start creating the table and after inserting some data in it,we will fire our query in that and will check the output as well

CREATE TABLE [dbo].[Employee](
[EmployeeId] [int] NULL,
[Salary] [money] NULL
) ON [PRIMARY]

INSERT INTO Employee VALUES(1,1000)
INSERT INTO Employee VALUES(1,2000)
INSERT INTO Employee VALUES(1,3000)
INSERT INTO Employee VALUES(2,2000)
INSERT INTO Employee VALUES(3,3000)
INSERT INTO Employee VALUES(3,4000)
INSERT INTO Employee VALUES(4,4000)
INSERT INTO Employee VALUES(4,5000)

after creating table and inserting some data table will look like following

Now we will try solving the problem using two method,first one is using simple query which is quite interesting (It was new for me when i created this query and performed) and second approach is using CTE (Common Table Expression)

1) Using Simple Query with Row_Number()

DELETE temp FROM
  (
      SELECT id FROM
      (
        SELECT row_number() over (PARTITION BY EmployeeId order by EmployeeId) AS   
        id, EmployeeId,Salary 
        FROM Employee
      ) AS t WHERE id > 1
   ) temp
   SELECT * FROM Employee

Here what row_number() will do is will create numbers for each of the row,now we have PARTITION BY EmployeeId so it will reset the number when EmployeeId changes,so on every new EmployeeId there will be number generated from 1,so we can directly delete data which are > 1,because they are duplicate,now in case if we do not have any duplicate record this row_number() will generated only once and it will never be reset,so in that case no records would be having same number generated

2) Using CTE

;WITH temp (uniquid,id,salary) AS 
 (
    SELECT row_number() OVER (PARTITION BY EmployeeId order by EmployeeId) AS id, 
    EmployeeId,Salary 
    FROM Employee
 )
 DELETE FROM temp WHERE uniquid>1
 SELECT * FROM Employee

Output

Both of the query will produce the desire output and will delete the duplicate records

I hope this is interesting one right,honestly i was not knowing until i performed it my self yesterday,so you can also hands on...

Comments

UnknownAugust 14, 2010 at 4:03 AM
I left IT befor some years.but this query reminds me of our CMC days.
ReplyDelete
Replies

Add comment

Daivagna Nanavati

Search This Blog

Delete duplicate records from table even when there is no key field

Comments

Post a Comment

Popular posts from this blog

Set up leprechaun code generation with Sitecore XM Cloud Starterkit

401.1 Unauthorized with windows authentication error code 0xc000006d

POV - How does the SCRUNCH acquisition matters the most at this point from Sitecore?