Skip to main content

Delete duplicate records from table even when there is no key field

Hi all,yesterday only i was discussing with my colleague Kamlesh Samnani who is MCTS about deleting records which does not have any primary key field based on which we can delete the data (actually he only fired this question yesterday to me)

so i thought to write a blog post about it,there are many posts which talk about deleting duplicate records from the table which has some key field,so i started working on the query and what i got is what i am gonna write here.

Lets start creating the table and after inserting some data in it,we will fire our query in that and will check the output as well

CREATE TABLE [dbo].[Employee](
[EmployeeId] [int] NULL,
[Salary] [money] NULL
) ON [PRIMARY]

INSERT INTO Employee VALUES(1,1000)
INSERT INTO Employee VALUES(1,2000)
INSERT INTO Employee VALUES(1,3000)
INSERT INTO Employee VALUES(2,2000)
INSERT INTO Employee VALUES(3,3000)
INSERT INTO Employee VALUES(3,4000)
INSERT INTO Employee VALUES(4,4000)
INSERT INTO Employee VALUES(4,5000)


after creating table and inserting some data table will look like following



Now we will try solving the problem using two method,first one is using simple query which is quite interesting (It was new for me when i created this query and performed) and second approach is using CTE (Common Table Expression)

1) Using Simple Query with Row_Number()

DELETE temp FROM
  (
      SELECT id FROM
      (
        SELECT row_number() over (PARTITION BY EmployeeId order by EmployeeId) AS   
        id, EmployeeId,Salary 
        FROM Employee
      ) AS t WHERE id > 1
   ) temp
   SELECT * FROM Employee

Here what row_number() will do is will create numbers for each of the row,now we have PARTITION BY EmployeeId so it will reset the number when EmployeeId changes,so on every new EmployeeId there will be number generated from 1,so we can directly delete data which are > 1,because they are duplicate,now in case if we do not have any duplicate record this row_number() will generated only once and it will never be reset,so in that case no records would be having same number generated

2) Using CTE

;WITH temp (uniquid,id,salary) AS 
 (
    SELECT row_number() OVER (PARTITION BY EmployeeId order by EmployeeId) AS id, 
    EmployeeId,Salary 
    FROM Employee
 )
 DELETE FROM temp WHERE uniquid>1
 SELECT * FROM Employee

Output




Both of the query will produce the desire output and will delete the duplicate records

I hope this is interesting one right,honestly i was not knowing until i performed it my self yesterday,so you can also hands on...

Comments

  1. I left IT befor some years.but this query reminds me of our CMC days.

    ReplyDelete

Post a Comment

Popular posts from this blog

High CPU to completely normal CPU - SXA issue, SXA pages not loading in mobile device

  Hi Team, Today i am going to share one of the nightmarish issue with you all, We are having Sitecore 9.1.1 hosted in azure PaaS environment Our site was working just fine and no noise, but we have been working on a feature release where 7-8 months of development needed to be released to production, Big GO LIVE event right?  Also to make the development smoother we also introduced BLUE/GREEN deployment slots in the same release, so we can easily SWAP slots and go live Everything went well, we went live, we even did load and performance testing on our staging and pre-prod and we were confident enough of results Very next day we started getting "SITE DOWN" alerts, and also product owners and clients mentioned that site is very slow for them in US time and in our morning when we were accessing it, it was working lighting fast so we were clue less at start, but we started digging  1) First thing caught our eyes were HIGH CPU spikes, in US time, also without any traffic CPU u...

Experience Analytics not working

Hello Sitecorian, How many of you have encountered an error or the situation where graph on the experience analytics dashboard or graph on launchpad's dashboard is not working I recently encountered with this scenario where i was presented with the situation where simply graph was not coming up, Before digging completely into it and checking logs and everything, From very first high level thoughts pointed me to check Is the valid certificate is used for xconnect? Are thumbprints are correct in all configs ? We had scaled environment so i needed to check the thumbprint in all the needed roles. Is xconnect site is accessible from CD? Without any https exceptions? Is XDB is enable?  Is Tracking is enable?  Everything looked ok on first place, so i finally took my sleeves up and started digging into details, Below are the details steps and exceptions those were captured and i will also show you the topology of instances we were using. Scaled instance We are using...

An error occurred while receiving the HTTP response to This could be due to the service endpoint binding not using the HTTP protocol. This could also be due to an HTTP request context being aborted by the server (possibly due to the service shutting down). See server logs for more details.

You have noticed many times that everything was working fine and suddenly the below error starts coming and you find no way to work it out An error occurred while receiving the HTTP response to This could be due to the service endpoint binding not using the HTTP protocol. This could also be due to an HTTP request context being aborted by the server (possibly due to the service shutting down). See server logs for more details. The reason for this is the receiving size of WCF service is smaller then the data which is coming from service It was working before because it was small,So you will have to try to increase the receiving setting in your end point,Possible settings can be following maxStringContentLength="2147483647" maxReceivedMessageSize="2147483647" maxBufferSize="2147483647" maxArrayLength="2147483647" That would definately help you!!!