SQL UPDATE() updating millions of records nicely

Ever needed to update millions of records before ?
New to tackling this ?
I had this experience when our database contained 17 million records when 15.5 million were junk and could be disgarded. My mission was flag the unwanted records and remove them.
This blog is simply about how to go about updating a table with millions of records. Fear not there is an approach that wont lock your database for hours, or expontentially increase the size of your transaction log file. Its known as Chunked updates.

original source explained here http://www.sqlusa.com/bestpractices2005/hugeupdate/

In my case I needed to update 17 million records based on a cross reference match of each record against another table containing 5.5K records, a single update statement would have takens hours and hours.
The chunked approach on an optimised indexed set of tables took just 35 minutes on a single license quad core proc SQL server 2005 standard edition.
Heres is the SQL statement

So you can see from the code i was chunking in 500000 rows segments, tune this to your requirements I guessed and struck it lucky.

For those interested, yes i did take steps to prevent this from occuring again. I created an insert trigger that cross references the User Agent table flagging the record. A Job then does the clean up of periodically archiving the unwanted data into a redundant database for historical reference.

2 thoughts on “SQL UPDATE() updating millions of records nicely”

  1. Nice post Andre, I came to this blog post from your post
    I am in a similar situation, due to negligence in programming, my database has 3.5 million entries which needs to be cleaned up, but there is no quiet time for the product. There are a series of delete statements which need to be executed so as to maintain the referential integrity. I above post, i did not get the concept of copying the value into temp table, won’t it add to the time cost? also if have to delete from several tables, do i need as many temp tables? It would be really helpful if you can mail your view points in this regards.

    1. Hi there

      If i understand correctly you need to perform a delete statement on given data within a table, the referenced tables also require a cascade delete.

      Firstly dont rely on the enforce cascade delete on the relationships since it will cause a lock on all tables that will not be released until the transaction is complete.
      Let me explain my post a little more clearly.

      This post is more to do with updating rather than deleting. This method collected up a chunk of the ID’s (or other primary key / indexed field on the target table) of the records that needed updating based on a field containing a match within another table. This allows collecting the IDs as slow as you like without affecting the table, ie no locking will occur.
      Once collected, It then used the faster (Exists) method in SQL server to perform the fastest possible update to minimise impact since it will lock the table while this occurs.
      The while loop continues the process until all records have been updated.

      If you still need help with your delete let me know because i now have an archive routine that shifts millions of records from a series of related tables into a similarly structured “Archive” set of related tables with minimal impact to live processing

Leave a Reply to Andre Cancel reply

Your email address will not be published. Required fields are marked *