SQL – Cursor and Batching

Somewhat surprisingly, batch processing stays as one of the most gaping holes in DB power-users’ knowledgebases… Processing records one by one, searching, fetching, updating- they seem to be fine with all of this no matter the type of the DB, but there is something about batching that many do not seem to grasp. What’s worse, is that they don’t even KNOW that they are really missing out…

Oh, do all of the records one by one, I can do that…I think…will just take a very long while, but guess I’ll live.

Seriously? Why not just do all at the same time, quickly, easily, and with minimal impact? And, if you cannot do that for some reason (or fear drastic impact), then at least put it on a loop, how hard is that?

Working with MS SQL I find 3 approaches to be the most useful:
1. Full batch (aka “all at once”)

UPDATE
    ATable
SET 
    AColumn = 'SOMENEWPREFIXVALUE' + AColumn
WHERE
    AColumn = 'SOMEVALUE'

2. Loop batch (aka “in chunks”)

DECLARE @COUNT INT = 1000
SELECT
    'Starting The Loop...'
WHILE @@ROWCOUNT > 0 
    BEGIN
        UPDATE
            ATable
        SET 
            AColumn = 'SOMENEWPREFIXVALUE' + AColumn
        WHERE
            ID IN (SELECT TOP (@COUNT)
                    ID
                   FROM
                    ATable t (NOLOCK)
                   WHERE
                    t.AColumn = 'SOMEVALUE')
    END

3. Cursor (aka “one by one”)

DECLARE
    @ID INT,
    @ImportantValue VARCHAR(50)
DECLARE itrCur CURSOR
FOR
SELECT ID,AColumn
FROM
ATable t (NOLOCK)
WHERE
t.AColumn = 'SOMEVALUE'
OPEN itrCur
FETCH NEXT FROM itrCur INTO @ID,@ImportantValue
WHILE @@FETCH_STATUS = 0 
    BEGIN
        UPDATE
            ATable
        SET 
            AColumn = 'SOMENEWPREFIXVALUE' + @ImportantValue
        WHERE
            ID = @ID
        WAITFOR DELAY '00:00:01' ;
        FETCH NEXT FROM itrCur INTO @ID,@ImportantValue
    END
CLOSE itrCur
DEALLOCATE itrCur

Many seem to be set on one and use it everywhere, but as you can guess by now- I do believe that all 3 have advantages and all 3 have major flaws. Knowing about them will allow one to choose the right tool for the right occasion and if nothing else, be proud of what is put out there.

1. Full

    • Scary and immediate
    • Potential for indefinite locks
    • Rollback on cancellation may take even more time or be even completely unavailable
  • +
    • All at once means all-at-once chunk for logs and replication (may be a (-) if the HW/system was not set-up to handle this)
    • quick and easy (assuming proper query/indexing/etc)

2. Loop

    • Still scary
    • Still issues w/locks and rollbacks if the batch size is too big
    • “Manual” rollbacks (on cancel) are possible for the last ran batch only
    • Multiple chunks for logs/replication to process (may be a (+) if the system is not so up-to-par)
    • Looks uncool
  • +
    • (Almost) all the positives of the full batch
    • A bit more control over locks and rollback
    • Ability to manually stop at any time with only partial apply
    • Still quick and fairly easy

3. Cursor

    • Rollbacks not possible
    • N number of lines for logs
    • N number of lines for replication to process (may be a (+), but rarely)
    • N number of hits against the table (albeit cursor takes care of caching the select part)
    • Very slow (especially w/the artificial delay)
    • Definitely not pretty
  • +
    • Minimal lock and no rollback issues
    • Not much thinking/optimization necessary