Somewhat surprisingly, batch processing stays as one of the most gaping holes in DB power-users’ knowledgebases… Processing records one by one, searching, fetching, updating- they seem to be fine with all of this no matter the type of the DB, but there is something about batching that many do not seem to grasp. What’s worse, is that they don’t even KNOW that they are really missing out…
Oh, do all of the records one by one, I can do that…I think…will just take a very long while, but guess I’ll live.
Seriously? Why not just do all at the same time, quickly, easily, and with minimal impact? And, if you cannot do that for some reason (or fear drastic impact), then at least put it on a loop, how hard is that?
Working with MS SQL I find 3 approaches to be the most useful:
1. Full batch (aka “all at once”)
UPDATE ATable SET AColumn = 'SOMENEWPREFIXVALUE' + AColumn WHERE AColumn = 'SOMEVALUE'
2. Loop batch (aka “in chunks”)
DECLARE @COUNT INT = 1000 SELECT 'Starting The Loop...' WHILE @@ROWCOUNT > 0 BEGIN UPDATE ATable SET AColumn = 'SOMENEWPREFIXVALUE' + AColumn WHERE ID IN (SELECT TOP (@COUNT) ID FROM ATable t (NOLOCK) WHERE t.AColumn = 'SOMEVALUE') END
3. Cursor (aka “one by one”)
DECLARE @ID INT, @ImportantValue VARCHAR(50) DECLARE itrCur CURSOR FOR SELECT ID,AColumn FROM ATable t (NOLOCK) WHERE t.AColumn = 'SOMEVALUE' OPEN itrCur FETCH NEXT FROM itrCur INTO @ID,@ImportantValue WHILE @@FETCH_STATUS = 0 BEGIN UPDATE ATable SET AColumn = 'SOMENEWPREFIXVALUE' + @ImportantValue WHERE ID = @ID WAITFOR DELAY '00:00:01' ; FETCH NEXT FROM itrCur INTO @ID,@ImportantValue END CLOSE itrCur DEALLOCATE itrCur
Many seem to be set on one and use it everywhere, but as you can guess by now- I do believe that all 3 have advantages and all 3 have major flaws. Knowing about them will allow one to choose the right tool for the right occasion and if nothing else, be proud of what is put out there.
- Scary and immediate
- Potential for indefinite locks
- Rollback on cancellation may take even more time or be even completely unavailable
- All at once means all-at-once chunk for logs and replication (may be a (-) if the HW/system was not set-up to handle this)
- quick and easy (assuming proper query/indexing/etc)
- Still scary
- Still issues w/locks and rollbacks if the batch size is too big
- “Manual” rollbacks (on cancel) are possible for the last ran batch only
- Multiple chunks for logs/replication to process (may be a (+) if the system is not so up-to-par)
- Looks uncool
- (Almost) all the positives of the full batch
- A bit more control over locks and rollback
- Ability to manually stop at any time with only partial apply
- Still quick and fairly easy
- Rollbacks not possible
- N number of lines for logs
- N number of lines for replication to process (may be a (+), but rarely)
- N number of hits against the table (albeit cursor takes care of caching the select part)
- Very slow (especially w/the artificial delay)
- Definitely not pretty
- Minimal lock and no rollback issues
- Not much thinking/optimization necessary