A Fast Way to Update Large Amount of Data in SQL Server With Select Into Statement
November 24th, 2007 by Andrew Chen
I frequently need to update large amount of data, like tens of millions of rows.
update o set o.data_field1=n.data_field1, o.data_field2=n.data_field2, o.data_field3=n.data_field from old_table o join new_table n on o.record_id=n.record_id
If you do this you will probably find your update query run for 24 hours and still can not finish. Even when you have indexes correctly built on both the new and old tables. I think a much faster approach is to create another table based on the old table and incorporate the updates into the new table while the table is built. To do that you need the help of a select … into query like this
select o.recordid,
case when n.data_field1 is not null then n.data_field1 else o.data_field1 end as data_field1,
case when n.data_field2 is not null then n.data_field2 else o.data_field2 end as data_field2,
case when n.data_field3 is not null then n.data_field3 else o.data_field3 end as data_field3,
o.other_fileds.....
into table_with_updates
from old_table o left join new_table n on o.record_id=n.record_id
A few notes about this query. First the old table is left join to new_table so that all records in the old table are in the result of the select statement. Second the three case statements are to use the data field values in the new table if it has one and use the value in the old table only when new table doesn’t have it. In effect this is an update of the data fields. Third “into table_with_updates” instructs SQL server to create a table called table_with_updates and dump the result set of the select statement into the table.
After the table with updates is built, you can drop the old table and rename the table with updates to the old table. If both the new table and old table have clustered indexes built on the record_id columns, the select…into query usually can finish within a couple hours. That is more than 10 times faster!




While I appreciate this is a time saving approach, and your testing indicates it has improved speed 10 fold…it may be worth mentioning that “Select Into” clauses are very inefficient.
I carried out some similar tests on 2 million rows, and found Select Into took a lot longer than an Insert Into…Select From query.
That’s probably true but I didn’t have that observasion. If anyone can provide some input on what is the underlying difference between “select * into table from” and “insert into table select * from” it will be great.