Transaction Log – Paul S. Randal



Initial VLF sequence numbers and default log file size


We're teaching on-site with a financial client in New York for two weeks and in a session this morning on log file architecture I was asked why the VLF (virtual log file) sequence numbers in a new database don't start at one.

Here's an example:

CREATE DATABASE foo;
GO
DBCC LOGINFO ('foo');
GO

FileId  FileSize  StartOffset FSeqNo Status  Parity  CreateLSN
------  --------  ----------- ------ ------  ------  ---------
2       253952    8192        137    2       64      0
2       253952    262144      0      0       0       0

Why is the first sequence number (FSeqNo) 137 and not 1? (For an explanation of the various fields in the output of the undocumented DBCC LOGINFO, see the post Inside the Storage Engine: More on the circular nature of the log.)

The answer is that it's one more than the highest VLF sequence number currently in the model database. Let's see:

DBCC LOGINFO ('model');
GO

FileId  FileSize  StartOffset FSeqNo Status  Parity  CreateLSN
------  --------  ----------- ------ ------  ------  ---------
2       253952    8192        136    2       128     0
2       262144    262144      135    0       128     0

The highest FSeqNo in the output above is 136, which accounts for the 137 in the new user database.

Now I'll prove it by increasing the sequence number in the model database:

USE model;
GO
CREATE TABLE BigRows (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO
INSERT INTO BigRows DEFAULT VALUES;
GO 50
DBCC LOGINFO;
GO

FileId  FileSize  StartOffset FSeqNo Status  Parity  CreateLSN
------  --------  ----------- ------ ------  ------  ---------
2       253952    8192        136    0       128     0
2       262144    262144      137    2       128     0

And now I'll create another user database and the first VLF sequence number should be 138:

CREATE DATABASE foo2;
GO
DBCC LOGINFO ('foo2');
GO

FileId  FileSize  StartOffset FSeqNo Status  Parity  CreateLSN
------  --------  ----------- ------ ------  ------  ---------
2       253952    8192        138    2       64      0
2       360448    262144      0      0       0       0

And it is – proven!

The eagle-eyed among you may notice that the log file size of the database foo2 I just created is larger than the size of the foo database I initially created. This is because the default log file size is the higher of 0.5MB or 25% of the sum of the data files created. In this case, the data file is copied from the model database, and I made it larger when I created the BigRows table, so the log file of the new database has to be slightly larger.
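
If you want to check this for yourself, here's a quick sketch using sys.master_files to compare the initial file sizes of the two databases created above (sizes are reported in 8KB pages, so multiply by 8 to get KB):

SELECT
    DB_NAME ([database_id]) AS [Database],
    [name] AS [FileName],
    [type_desc] AS [FileType],
    [size] * 8 AS [SizeKB]
FROM sys.master_files
WHERE [database_id] IN (DB_ID (N'foo'), DB_ID (N'foo2'));
GO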

One question that came up in the post comments – why is the highest VLF sequence number in my model database 136? No particular reason except I've done a bunch of things in model which generated transaction log and so the VLF sequence numbers have increased. It's almost certain to be different in your SQL Server instances for that reason.

Next up tomorrow – results from the index count survey from last month.

The post Initial VLF sequence numbers and default log file size appeared first on Paul S. Randal.

TechNet Magazine: September 2011 SQL Q&A column


The September edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Online index operations logging changes in SQL Server 2008
  • Is that transaction contained in my full backup?
  • ALTER INDEX … REBUILD vs. ALTER INDEX … REORGANIZE
  • Avoiding regular log file shrinking

Check it out at http://technet.microsoft.com/en-us/magazine/hh395481.aspx.

The post TechNet Magazine: September 2011 SQL Q&A column appeared first on Paul S. Randal.

Understanding data vs log usage for spills in tempdb


Earlier this morning I noticed a discussion on the SQL MCM distribution list (that all the original MCM instructors are part of) that was trying to make sense of a huge disparity between tempdb data file usage and log file usage. I explained the answer and thought I'd share it with you all too.

The situation was a tempdb where the data files grew from 30GB to 120GB, running out of space on the disk, but the tempdb log file did not grow at all from its initial size of 1GB! How could that be?

One of the things to consider about tempdb is that logging in tempdb is ultra-efficient. Log records for updates in tempdb, for instance, only log the before image of the data instead of logging both before and after images. There is no need to log the after image – as that is only used for the REDO portion of crash recovery. As tempdb never gets crash-recovered, REDO never occurs. The before image *is* necessary, however, because transactions can be rolled back in tempdb, just like other databases, and so the before image of an update must be available to be able to successfully roll back the update.

Getting to the question though, I can easily explain the observed behavior by considering how a sort spill happens with tempdb.

I can simulate this using a gnarly query on the SalesDB database you can download from our resources page (see the top of the page for the sample databases to download).

I'm going to do a join of my Sales and Products tables and then sort the multi-million row result set by product name:

SELECT S.*, P.* from Sales S
JOIN Products P ON P.ProductID = S.ProductID
ORDER BY P.Name;
GO 

The query plan for this is (using Plan Explorer):

[Query plan image]

I know that the sort is going to spill out of memory into tempdb in this case. First I checkpoint tempdb (to clear out the log) and then after running the query, I can analyze the transaction log for tempdb.
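
The checkpoint step is simply the following (a minimal sketch; run it in tempdb immediately before the query above):

USE tempdb;
GO
CHECKPOINT;
GO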

Looking at the operation in the log:

SELECT
     [Current LSN],
     [Operation],
     [Context],
     [Transaction ID],
     [Log Record Length],
     [Description]
FROM fn_dblog (null, null);
GO

Current LSN            Operation       Context  Transaction ID Len Description
---------------------- --------------- -------- -------------- --- ----------------------------------------------------------
000000c0:00000077:0001 LOP_BEGIN_XACT  LCX_NULL 0000:00005e4d  120 sort_init;<snip>
000000c0:00000077:0002 LOP_BEGIN_XACT  LCX_NULL 0000:00005e4e  132 FirstPage Alloc;<snip>
000000c0:00000077:0003 LOP_SET_BITS    LCX_GAM  0000:00005e4e  60  Allocated 1 extent(s) starting at page 0001:0000aa48
000000c0:00000077:0004 LOP_MODIFY_ROW  LCX_PFS  0000:00005e4e  88  Allocated 0001:0000aa48;Allocated 0001:0000aa49;
000000c0:00000077:0005 LOP_MODIFY_ROW  LCX_PFS  0000:00005e4d  80  Allocated 0001:00000123
000000c0:00000077:0006 LOP_FORMAT_PAGE LCX_IAM  0000:00005e4d  84               
000000c0:00000077:0007 LOP_SET_BITS    LCX_IAM  0000:00005e4e  60               
000000c0:00000077:0009 LOP_COMMIT_XACT LCX_NULL 0000:00005e4e  52               
000000c0:00000077:000a LOP_BEGIN_XACT  LCX_NULL 0000:00005e4f  128 soAllocExtents;<snip>
000000c0:00000077:000b LOP_SET_BITS    LCX_GAM  0000:00005e4f  60  Allocated 1 extent(s) starting at page 0001:0000aa50
000000c0:00000077:000c LOP_MODIFY_ROW  LCX_PFS  0000:00005e4f  88  Allocated 0001:0000aa50;Allocated 0001:0000aa51;<snip>
000000c0:00000077:000d LOP_SET_BITS    LCX_IAM  0000:00005e4f  60               
000000c0:00000077:000e LOP_SET_BITS    LCX_GAM  0000:00005e4f  60  Allocated 1 extent(s) starting at page 0001:0000aa58
000000c0:00000077:000f LOP_MODIFY_ROW  LCX_PFS  0000:00005e4f  88  Allocated 0001:0000aa58;Allocated 0001:0000aa59;<snip>
000000c0:00000077:0010 LOP_SET_BITS    LCX_IAM  0000:00005e4f  60               
000000c0:00000077:0011 LOP_SET_BITS    LCX_GAM  0000:00005e4f  60  Allocated 1 extent(s) starting at page 0001:0000aa60
000000c0:00000077:0012 LOP_MODIFY_ROW  LCX_PFS  0000:00005e4f  88  Allocated 0001:0000aa60;Allocated 0001:0000aa61;<snip>
000000c0:00000077:0013 LOP_SET_BITS    LCX_IAM  0000:00005e4f  60               
000000c0:00000077:0014 LOP_COMMIT_XACT LCX_NULL 0000:00005e4f  52               
000000c0:00000077:0015 LOP_BEGIN_XACT  LCX_NULL 0000:00005e50  128 soAllocExtents;<snip>
000000c0:00000077:0016 LOP_SET_BITS    LCX_GAM  0000:00005e50  60  Allocated 1 extent(s) starting at page 0001:0000aa68
000000c0:00000077:0017 LOP_MODIFY_ROW  LCX_PFS  0000:00005e50  88  Allocated 0001:0000aa68;Allocated 0001:0000aa69;<snip>
000000c0:00000077:0018 LOP_SET_BITS    LCX_IAM  0000:00005e50  60               
000000c0:00000077:0019 LOP_SET_BITS    LCX_GAM  0000:00005e50  60  Allocated 1 extent(s) starting at page 0001:0000aa70
000000c0:00000077:001a LOP_MODIFY_ROW  LCX_PFS  0000:00005e50  88  Allocated 0001:0000aa70;Allocated 0001:0000aa71;<snip>
000000c0:00000077:001b LOP_SET_BITS    LCX_IAM  0000:00005e50  60               
000000c0:00000077:001c LOP_SET_BITS    LCX_GAM  0000:00005e50  60  Allocated 1 extent(s) starting at page 0001:0000aa78
000000c0:00000077:001d LOP_MODIFY_ROW  LCX_PFS  0000:00005e50  88  Allocated 0001:0000aa78;Allocated 0001:0000aa79;<snip>
000000c0:00000077:001e LOP_SET_BITS    LCX_IAM  0000:00005e50  60               
000000c0:00000077:001f LOP_SET_BITS    LCX_GAM  0000:00005e50  60  Allocated 1 extent(s) starting at page 0001:0000aa80
000000c0:00000077:0020 LOP_MODIFY_ROW  LCX_PFS  0000:00005e50  88  Allocated 0001:0000aa80;Allocated 0001:0000aa81;<snip>
000000c0:00000077:0021 LOP_SET_BITS    LCX_IAM  0000:00005e50  60               
000000c0:00000077:0022 LOP_COMMIT_XACT LCX_NULL 0000:00005e50  52               
000000c0:00000077:0023 LOP_BEGIN_XACT  LCX_NULL 0000:00005e51  128 soAllocExtents;<snip>

<snip>

000000cd:00000088:01d3 LOP_SET_BITS    LCX_GAM  0000:000078fc  60  Deallocated 1 extent(s) starting at page 0001:00010e50
000000cd:00000088:01d4 LOP_COMMIT_XACT LCX_NULL 0000:000078fc  52               
000000cd:00000088:01d5 LOP_BEGIN_XACT  LCX_NULL 0000:000078fd  140 ExtentDeallocForSort;<snip>
000000cd:00000088:01d6 LOP_SET_BITS    LCX_IAM  0000:000078fd  60               
000000cd:00000088:01d7 LOP_MODIFY_ROW  LCX_PFS  0000:000078fd  88  Deallocated 0001:00010e68;Deallocated 0001:00010e69;<snip>
000000cd:00000088:01d8 LOP_SET_BITS    LCX_GAM  0000:000078fd  60  Deallocated 1 extent(s) starting at page 0001:00010e68
000000cd:00000088:01d9 LOP_COMMIT_XACT LCX_NULL 0000:000078fd  52               
000000cd:00000088:01da LOP_MODIFY_ROW  LCX_PFS  0000:00005fac  80  Deallocated 0001:00000109
000000cd:00000088:01db LOP_SET_BITS    LCX_SGAM 0000:00005fac  60  ClearBit 0001:00000108
000000cd:00000088:01dc LOP_SET_BITS    LCX_GAM  0000:00005fac  60  Deallocated 1 extent(s) starting at page 0001:00000108
000000cd:00000088:01dd LOP_COMMIT_XACT LCX_NULL 0000:00005fac  52               

(I snipped out a few extraneous log records plus the 6 extra 'Allocated' and 'Deallocated' for each of the PFS row modifications.) 

One of the things I notice is that the sort spill space is allocated in extents, and almost the entire sort – from initialization, through allocating all the extents, to deallocating them – is contained in a handful of system transactions. But those transactions don't actually generate much transaction log.

Look at the soAllocExtents transaction with Transaction ID 00005e50. It's allocating 4 extents – i.e. 256KB – in a single system transaction (4 x mark an extent as unavailable in the GAM, 4 x bulk set the 8 PFS bytes for the 8 pages in the extent, 4 x mark an extent allocated in the IAM). The total size of the log records for this transaction is 1012 bytes. (The first soAllocExtents system transaction only allocates 3 extents; all the others allocate 4 extents.)

When the sort ends, the extents are deallocated one-at-a-time in system transactions called ExtentDeallocForSort. An example is the transaction with Transaction ID 000078fd. It generates log records totalling 400 bytes. This means each 256KB takes 4 x 400 = 1600 bytes to deallocate.

Combining the allocation and deallocation operations, each 256KB of the sort that spills into tempdb generates 2612 bytes of log records.

Now let's consider the original behavior that I explained. If the 90GB was all sort space:

  • 90GB is 90 x 1024 x 1024 = 94371840KB, which is 94371840 / 256 = 368640 x 256KB chunks.
  • Each 256KB chunk takes 2612 bytes to allocate and deallocate, so our 90GB would take 368640 x 2612 = 962887680 bytes of log, which is 962887680 / 1024 / 1024 = ~918MB of log.

And this would explain the observed behavior. 90GB of tempdb space can be allocated and used for a sort spill with roughly 918MB of transaction log, give or take a bit from my rough calculations.
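
If you want to plug in your own numbers, the arithmetic above can be expressed as a quick T-SQL sketch (the 2612 bytes-per-chunk figure comes from the log analysis earlier in this post):

DECLARE @SpillGB BIGINT = 90;
DECLARE @BytesPer256KB BIGINT = 2612; -- 1012 bytes to allocate + 1600 bytes to deallocate
SELECT
    (@SpillGB * 1024 * 1024) / 256 AS [Chunks256KB],
    (@SpillGB * 1024 * 1024) / 256 * @BytesPer256KB / 1024 / 1024 AS [ApproxLogMB];
GO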

Tempdb logs things very efficiently – especially things that spill out of memory. The next stop in debugging such a problem would be regularly capturing the output of sys.dm_db_task_space_usage to figure out who is using all the space and then digging in from there.
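
Here's a minimal sketch of the kind of query you could capture regularly from sys.dm_db_task_space_usage (internal objects cover sorts, hashes, and spools; all counts are in 8KB pages):

SELECT
    [session_id],
    [request_id],
    [internal_objects_alloc_page_count],
    [internal_objects_dealloc_page_count],
    [user_objects_alloc_page_count],
    [user_objects_dealloc_page_count]
FROM sys.dm_db_task_space_usage
WHERE [database_id] = 2 -- tempdb
ORDER BY [internal_objects_alloc_page_count] DESC;
GO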

Hope this helps explain things!

The post Understanding data vs log usage for spills in tempdb appeared first on Paul S. Randal.

Using fn_dblog, fn_dump_dblog, and restoring with STOPBEFOREMARK to an LSN


I've blogged a bunch about using the undocumented fn_dblog function I helped write (and I've got a lot more to come :-) but here's one I haven't mentioned on my blog before: fn_dump_dblog (although I have talked about it at SQL Connections last year).

Here's a scenario: someone dropped a table and you want to find out when it happened and maybe who did it. The default trace has also wrapped so you can't get the DDL trace from there anymore.

If the transaction log for the DROP hasn't yet been cleared from the active portion of the log then you'd be able to use fn_dblog to search through the log for the information you need. You might even be able to look in the inactive portion of the log by using trace flag 2536, which instructs the log reader to ignore the log truncation point and dump all possible log records from the log.
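
As a sketch, that would look something like the following (remember to turn the trace flag off again when you're done):

DBCC TRACEON (2536);
GO
SELECT [Current LSN], [Operation], [Transaction ID], [Transaction Name]
FROM fn_dblog (NULL, NULL)
WHERE [Transaction Name] LIKE '%DROPOBJ%';
GO
DBCC TRACEOFF (2536);
GO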

But what do you do if the pertinent log records just don't exist in the log anymore? They're only in your log backups. You could tediously inch your way through restoring the log backups a few seconds at a time until you find the point at which the DROP took place, and then restore to just before that point so you can get the data back.

Or you could save a whole ton of time and use fn_dump_dblog which allows you to dump and search log records from a log backup file, without having to restore the database!

Finding a DROP in the log

Here's an example – I'm going to create a table, populate it, back it up, then drop it.

USE MASTER;
GO
IF DATABASEPROPERTYEX ('FNDBLogTest', 'Version') > 0 DROP DATABASE FNDBLogTest;
GO

CREATE DATABASE FNDBLogTest;
GO
USE FNDBLogTest;
GO
SET NOCOUNT ON;
GO

-- Create tables to play with
CREATE TABLE ProdTable (c1 INT IDENTITY, c2 DATETIME DEFAULT GETDATE (), c3 CHAR (25) DEFAULT 'a');
CREATE TABLE ProdTable2 (c1 INT IDENTITY, c2 DATETIME DEFAULT GETDATE (), c3 CHAR (25) DEFAULT 'a');
GO

INSERT INTO ProdTable DEFAULT VALUES;
GO 1000

-- Take initial backups
BACKUP DATABASE FNDBLogTest TO DISK = 'D:\SQLskills\FNDBLogTest_Full.bak' WITH INIT;
GO
BACKUP LOG FNDBLogTest TO DISK = 'D:\SQLskills\FNDBLogTest_Log1.bak' WITH INIT;
GO

INSERT INTO ProdTable2 DEFAULT VALUES;
GO 1000

Now I'll drop the table and add some more log records:

DROP TABLE ProdTable;
GO

INSERT INTO ProdTable2 DEFAULT VALUES;
GO 1000

Now how can we find the point at which the table was dropped?

SELECT
    [Current LSN],
    [Operation],
    [Context],
    [Transaction ID],
    [Description]
FROM fn_dblog (NULL, NULL),
    (SELECT [Transaction ID] AS tid FROM fn_dblog (NULL, NULL) WHERE [Transaction Name] LIKE '%DROPOBJ%') fd
WHERE [Transaction ID] = fd.tid;
GO

Current LSN            Operation       Context           Transaction ID Description
---------------------- --------------- ----------------- -------------- --------------------------------
0000009d:0000021e:0001 LOP_BEGIN_XACT  LCX_NULL          0000:00001ff7  DROPOBJ; <snip>
0000009d:0000021e:0002 LOP_LOCK_XACT   LCX_NULL          0000:00001ff7 
0000009d:0000021e:0003 LOP_LOCK_XACT   LCX_NULL          0000:00001ff7 
0000009d:0000021e:0008 LOP_MODIFY_ROW  LCX_IAM           0000:00001ff7 
0000009d:0000021e:0009 LOP_MODIFY_ROW  LCX_PFS           0000:00001ff7  Deallocated 0001:0000009b
0000009d:0000021e:000a LOP_MODIFY_ROW  LCX_IAM           0000:00001ff7 
0000009d:0000021e:000b LOP_MODIFY_ROW  LCX_PFS           0000:00001ff7  Deallocated 0001:0000009c
0000009d:0000021e:000c LOP_MODIFY_ROW  LCX_IAM           0000:00001ff7 
0000009d:0000021e:000d LOP_MODIFY_ROW  LCX_PFS           0000:00001ff7  Deallocated 0001:0000009d
0000009d:0000021e:000e LOP_MODIFY_ROW  LCX_IAM           0000:00001ff7 
0000009d:0000021e:000f LOP_MODIFY_ROW  LCX_PFS           0000:00001ff7  Deallocated 0001:0000009e
0000009d:0000021e:0010 LOP_MODIFY_ROW  LCX_IAM           0000:00001ff7 
0000009d:0000021e:0011 LOP_MODIFY_ROW  LCX_PFS           0000:00001ff7  Deallocated 0001:0000009f
0000009d:0000021e:0012 LOP_MODIFY_ROW  LCX_PFS           0000:00001ff7  Deallocated 0001:0000009a
0000009d:0000021e:0013 LOP_HOBT_DDL    LCX_NULL          0000:00001ff7  Action 3 on HoBt 0xd:100 <snip>
0000009d:0000021e:0014 LOP_DELETE_ROWS LCX_MARK_AS_GHOST 0000:00001ff7 
0000009d:0000021e:0032 LOP_LOCK_XACT   LCX_NULL          0000:00001ff7
<snip>

Cool eh?

Now we take another log backup, which clears the log, and contains the log we just looked at.

BACKUP LOG FNDBLogTest TO DISK = 'D:\SQLskills\FNDBLogTest_Log2.bak' WITH INIT;
GO

Restoring using STOPBEFOREMARK 

The LSN for the LOP_BEGIN_XACT log record is where we need to restore to just before.

To do that we need to convert the LSN to the format necessary when using the STOPBEFOREMARK option for RESTORE. The option is documented but the format is not – how helpful!!

The LSN we have from the log dump above is 0000009d:0000021e:0001. To convert it:

  • Take the rightmost 4 characters (2-byte log record number) and convert to a 5-character decimal number, including leading zeroes, to get stringA
  • Take the middle number (4-byte log block number) and convert to a 10-character decimal number, including leading zeroes, to get stringB
  • Take the leftmost number (4-byte VLF sequence number) and convert to a decimal number, with no leading zeroes, to get stringC
  • The LSN string we need is stringC + stringB + stringA

So 0000009d:0000021e:0001 becomes '157' + '0000000542' + '00001' = '157000000054200001'.
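
If you'd rather not do the conversion by hand, here's a sketch of the same steps in T-SQL (this assumes SQL Server 2008 or later for the style-1 hex string conversion; it's just a convenience, not an official tool):

DECLARE @LSN VARCHAR (22) = '0000009d:0000021e:0001';
DECLARE @Dotted VARCHAR (22) = REPLACE (@LSN, ':', '.');

-- Leftmost part: VLF sequence number, no leading zeroes
DECLARE @StringC VARCHAR (10) = CONVERT (VARCHAR (10),
    CONVERT (BIGINT, CONVERT (VARBINARY (4), '0x' + PARSENAME (@Dotted, 3), 1)));
-- Middle part: log block number, padded to 10 digits
DECLARE @StringB VARCHAR (10) = RIGHT ('0000000000' + CONVERT (VARCHAR (10),
    CONVERT (BIGINT, CONVERT (VARBINARY (4), '0x' + PARSENAME (@Dotted, 2), 1))), 10);
-- Rightmost part: log record number, padded to 5 digits
DECLARE @StringA VARCHAR (5) = RIGHT ('00000' + CONVERT (VARCHAR (5),
    CONVERT (BIGINT, CONVERT (VARBINARY (2), '0x' + PARSENAME (@Dotted, 1), 1))), 5);

SELECT @StringC + @StringB + @StringA AS [STOPBEFOREMARK string];
-- Returns 157000000054200001 for the example LSN above
GO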

The restore sequence to restore to just before the DROP is therefore:

RESTORE DATABASE FNDBLogTest2
    FROM DISK = 'D:\SQLskills\FNDBLogTest_Full.bak'
    WITH MOVE 'FNDBLogTest' TO 'C:\SQLskills\FNDBLogTest2.mdf',
    MOVE 'FNDBLogTest_log' TO 'C:\SQLskills\FNDBLogTest2_log.ldf',
    REPLACE, NORECOVERY;
GO

RESTORE LOG FNDBLogTest2
    FROM DISK = 'D:\SQLskills\FNDBLogTest_Log1.bak'
    WITH NORECOVERY;
GO

RESTORE LOG FNDBLogTest2
    FROM DISK = 'D:\SQLskills\FNDBLogTest_Log2.bak'
    WITH STOPBEFOREMARK = 'lsn:157000000054200001',
    NORECOVERY;
GO

RESTORE DATABASE FNDBLogTest2 WITH RECOVERY;
GO

And the table is there again, right before the point it was dropped. You can see where I used the constructed LSN string in the final log restore.

Using fn_dump_dblog

So what if the log records are no longer in the log? You can use the fn_dump_dblog function. For instance, here is how you use it to look in the FNDBLogTest_Log2.bak backup:

SELECT COUNT (*) FROM fn_dump_dblog (
    NULL, NULL, 'DISK', 1, 'D:\SQLskills\FNDBLogTest_Log2.bak',
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT);
GO

You have to specify all the DEFAULT parameters (63 of them!) or it won't work. The other parameters are:

  • Starting LSN (usually just NULL)
  • Ending LSN (again, usually just NULL)
  • Type of file (DISK or TAPE)
  • Backup number within the backup file (for multi-backup media sets)
  • File name

So you could do the same query as I did above:

SELECT
    [Current LSN],
    [Operation],
    [Context],
    [Transaction ID],
    [Description]
FROM fn_dump_dblog (
    NULL, NULL, 'DISK', 1, 'D:\SQLskills\FNDBLogTest_Log2.bak',
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT),
    (SELECT [Transaction ID] AS tid
     FROM fn_dump_dblog (
         NULL, NULL, 'DISK', 1, 'D:\SQLskills\FNDBLogTest_Log2.bak',
         DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
         DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
         DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
         DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
         DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
         DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
         DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
         DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
         DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT)
     WHERE [Transaction Name] LIKE '%DROPOBJ%') fd
WHERE [Transaction ID] = fd.tid;
GO 

Which works perfectly, but takes much longer to run. 

So maybe you're wondering what all the other parameters to fn_dump_dblog are for? They are for specifying the media families of a media set that has more than one media family.

Here's an example using a log backup striped across two files:

BACKUP LOG FNDBLogTest
    TO DISK = 'D:\SQLskills\FNDBLogTest_Log3_1.bak',
    DISK = 'D:\SQLskills\FNDBLogTest_Log3_2.bak'
    WITH INIT;
GO 

If I try to use fn_dump_dblog and only specify a single file, I get an error:

SELECT COUNT (*) FROM fn_dump_dblog (
    NULL, NULL, 'DISK', 1, 'D:\SQLskills\FNDBLogTest_Log3_1.bak',
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT);
GO

Msg 3132, Level 16, State 1, Line 1
The media set has 2 media families but only 1 are provided. All members must be provided.

So I have to specify both media families:

SELECT COUNT (*) FROM fn_dump_dblog (
    NULL, NULL, 'DISK', 1, 'D:\SQLskills\FNDBLogTest_Log3_1.bak',
    'D:\SQLskills\FNDBLogTest_Log3_2.bak', DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
    DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT);
GO

Summary

So there you go – some more powerful tools to add to your disaster recovery arsenal.

Enjoy!

The post Using fn_dblog, fn_dump_dblog, and restoring with STOPBEFOREMARK to an LSN appeared first on Paul S. Randal.

Survey: transaction log files per database (code to run)


In this survey I'd like to see what the distribution of the number of log files per database is for your servers. I'm planning to do similar surveys through the rest of the year around data files and filegroups too.

Please run the following code (with example output):

SELECT
    DISTINCT [LogFiles],
    COUNT (*) AS [Databases]
FROM (
    SELECT
        COUNT (*) AS [LogFiles]
    FROM sys.master_files
    WHERE [type_desc] = N'LOG'
    GROUP BY [database_id]) [LogFilesCounts]
GROUP BY [LogFiles];
GO

LogFiles    Databases
----------- -----------
1           28
2           1

And send me the results for as many servers as you want. You can either email me plaintext or a spreadsheet or append the results as a comment to this post. Please do not add any more info to the results (like server name, version etc.) as it's not relevant for this survey and adds a bunch of time to the results processing.

Note that the script will only work on 2005 onwards.

I'll editorialize the results in a week or two.

Thanks!

The post Survey: transaction log files per database (code to run) appeared first on Paul S. Randal.

Multiple log files and why they’re bad


About a month ago I kicked off a survey with some code to run to figure out how many log files your databases have (see here). There are all kinds of misconceptions about transaction logs and how to configure them (and how they work) and I'm amazed at the misinformation I continue to see published. For instance, a few weeks back I skimmed through a video that stated in multiple places that the default transaction log growth rate is 10MB – it's not, it's 10% and has been since SQL Server 2005.
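
Incidentally, a quick way to check what your log files' autogrowth settings really are is a sketch like this against sys.master_files (growth is a percentage when is_percent_growth is 1, otherwise a number of 8KB pages):

SELECT
    DB_NAME ([database_id]) AS [Database],
    [name] AS [LogFile],
    CASE [is_percent_growth]
        WHEN 1 THEN CONVERT (VARCHAR (20), [growth]) + '%'
        ELSE CONVERT (VARCHAR (20), [growth] * 8) + ' KB'
    END AS [AutoGrowth]
FROM sys.master_files
WHERE [type_desc] = N'LOG';
GO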

I got data back from 1300 SQL Server instances across the world from 75 people (thanks!), with an average of 18 databases per instance including system databases (which jumped to an average of 29 per instance if I included 4 anomalous instances with many thousands of databases each).

Out of all those instances, only 32 had databases with more than one log file:

  • 23 instances with one database with two log files
  • 3 instances with two databases with two log files
  • 1 instance each with 3, 4, 8, 9, and 27 databases with two log files
  • 2 instances with one database with four log files

So 2.5% of instances surveyed had at least one database with more than one log file.

I think I'm pleased about that as I expected the number to be higher, but I also suspect I'm equating poorly configured VLF totals with general log mis-configuration, and a higher percentage of systems out there have transaction logs with too many VLFs. Let's settle for pleased :-)

But why do I care? And why should you care? Although it seems like there's nothing damaging about having multiple log files, I don't think that's true.

Firstly, having multiple log files implies that the first one ran out of space and because the second one still exists, the first one might still be pretty large (maybe the second one is large too!). I don't really care about that for crash recovery (which is bounded by how much unrecovered log there is), or performance of HA technologies like database mirroring, Availability Groups, or replication, which are bounded by transaction log generation rate, not size.

What I care about is performing a restore during disaster recovery. If the log files don't exist, they must be created and zero-initialized (twice, if you also restore a differential backup, as both the full and differential restores zero out the log). If the first log file is as big as it can be, and there's still a second log file, that's potentially a lot of log file to zero-initialize, which translates into more downtime during disaster recovery.

I would much rather that all the log files apart from the first one are removed once they're no longer needed, by simply waiting until all the active VLFs (marked with status 2 in the output from DBCC LOGINFO – see here) are in the first log file and then simply doing an ALTER DATABASE and removing the extra file, and then reducing the size of the remaining log file to something reasonable.
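
For example, here's a minimal sketch of doing that (the database and file names are hypothetical, and the second file can only be removed once it contains no active VLFs):

USE [master];
GO
ALTER DATABASE [MyDatabase] REMOVE FILE [MyDatabase_log2];
GO
-- Then shrink the remaining log file back to a reasonable size (target in MB)
USE [MyDatabase];
GO
DBCC SHRINKFILE (N'MyDatabase_log', 8192);
GO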

Which brings me to the second thing I care about: why was the extra log file needed in the first place? Why did the transaction log run out of space, necessitating creating another log file? That's the only explanation I can think of for having more than one log file as there is no performance gain from multiple log files – SQL Server will write to them sequentially, never in parallel (Jonathan demonstrates this neatly with Extended Events here.)

(I like to draw a parallel here with page splits. People fixate on the fact that they've got fragmentation, not the massive performance issue that created the fragmentation in the first place – page splits themselves!)

I blogged about the Importance of proper transaction log file size management more than three years ago (and here five years back), and many others have blogged about it too, but it's still one of the most common problems I see. Log growth can easily be monitored using the Log Growths performance counter in the Databases performance object and I'm sure someone's written code to watch for the log growth counter incrementing for databases and alerting the DBA.
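
For example, here's a minimal sketch of reading that counter from T-SQL via sys.dm_os_performance_counters (the object_name prefix varies for named instances, hence the LIKE):

SELECT
    RTRIM ([instance_name]) AS [Database],
    [cntr_value] AS [LogGrowths]
FROM sys.dm_os_performance_counters
WHERE [counter_name] = N'Log Growths'
    AND [object_name] LIKE N'%:Databases%'
    AND [instance_name] <> N'_Total'
ORDER BY [cntr_value] DESC;
GO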

For someone who's a DBA, there's no excuse for having out-of-control transaction logs IMHO, but for involuntary DBAs and those who administer systems where SQL Server sits hidden for the most part (e.g. Sharepoint), I can understand not knowing.

But now you do. Get reading these articles and get rid of those extra log files, and the need for them! Use the code in the original survey (see the link at the top) to see whether you've got any extra log files kicking around.

Enjoy!

PS For a general overview of logging, recovery, and the transaction log, see the TechNet Magazine article I wrote back in 2009.

The post Multiple log files and why they’re bad appeared first on Paul S. Randal.

Parent transaction ID in 2012 fn_dblog output


All kinds of little bits of information have been added to the output of various DMVs, functions, and commands in SQL Server 2012.

One which I only discovered recently, and about which I’m really excited (ok, I should probably get out more :-), is the inclusion of the parent transaction ID in the output of fn_dblog. This allows us to see which transaction is the parent of nested system transactions and other sub-transactions.

You might wonder why I care? It allows us to see the hierarchy and definitive order of transactions during operations we’re performing with SQL Server, to aid in working out how operations work under the covers, rather than having to guess and assume.

For instance, when inserting the first row into a new table, the allocation system has to allocate a page from a mixed extent. Using the new field, we can see the definitive order of operations:

CREATE TABLE t1 (c1 INT);
GO

CHECKPOINT;
GO

INSERT INTO t1 VALUES (1);
GO

SELECT
[Current LSN],
[Operation],
[Context],
[Transaction ID],
[AllocUnitName],
[Transaction Name],
[Parent Transaction ID]
FROM fn_dblog (NULL, NULL);
GO

Current LSN            Operation              Context       Transaction ID AllocUnitName           Transaction Name           Parent Transaction ID
---------------------- ---------------------- ------------- -------------- ----------------------- -------------------------- ---------------------
00000119:000001c2:0001  LOP_BEGIN_XACT        LCX_NULL      0000:0000a216  NULL                    INSERT                     NULL
00000119:000001c2:0002  LOP_BEGIN_XACT        LCX_NULL      0000:0000a217  NULL                    AllocHeapPageSimpleXactDML 0000:0000a216
00000119:000001c2:0003  LOP_LOCK_XACT         LCX_NULL      0000:0000a217  NULL                    NULL                       NULL
00000119:000001c2:0004  LOP_SET_BITS          LCX_SGAM      0000:00000000  Unknown Alloc Unit      NULL                       NULL
00000119:000001c2:0005  LOP_BEGIN_XACT        LCX_NULL      0000:0000a218  NULL                    AllocFirstPage             0000:0000a217
00000119:000001c2:0006  LOP_MODIFY_ROW        LCX_PFS       0000:0000a218  dbo.t1                  NULL                       NULL
00000119:000001c2:0007  LOP_BEGIN_XACT        LCX_NULL      0000:0000a219  NULL                    AllocMixedExtent           0000:0000a217
00000119:000001c2:0008  LOP_SET_BITS          LCX_GAM       0000:0000a219  Unknown Alloc Unit      NULL                       NULL
00000119:000001c2:0009  LOP_SET_BITS          LCX_SGAM      0000:0000a219  Unknown Alloc Unit      NULL                       NULL
00000119:000001c2:000a  LOP_COMMIT_XACT       LCX_NULL      0000:0000a219  NULL                    NULL                       NULL
00000119:000001c2:000b  LOP_MODIFY_ROW        LCX_PFS       0000:0000a217  Unknown Alloc Unit      NULL                       NULL
00000119:000001c2:000c  LOP_FORMAT_PAGE       LCX_IAM       0000:0000a217  dbo.t1                  NULL                       NULL
00000119:000001c2:000d  LOP_HOBT_DELTA        LCX_NULL      0000:0000a217  NULL                    NULL                       NULL
00000119:000001c2:000e  LOP_MODIFY_ROW        LCX_IAM       0000:0000a218  dbo.t1                  NULL                       NULL
00000119:000001c2:000f  LOP_CREATE_ALLOCCHAIN LCX_NULL      0000:0000a217  NULL                    NULL                       NULL
00000119:000001c2:0010  LOP_LOCK_XACT         LCX_NULL      0000:0000a217  NULL                    NULL                       NULL
00000119:000001c2:0011  LOP_COMMIT_XACT       LCX_NULL      0000:0000a218  NULL                    NULL                       NULL
00000119:000001c2:0012  LOP_HOBT_DELTA        LCX_NULL      0000:0000a217  NULL                    NULL                       NULL
00000119:000001c2:0013  LOP_LOCK_XACT         LCX_NULL      0000:0000a217  NULL                    NULL                       NULL
00000119:000001c2:0014  LOP_ROOT_CHANGE       LCX_CLUSTERED 0000:0000a217  sys.sysallocunits.clust NULL                       NULL
00000119:000001c2:0015  LOP_LOCK_XACT         LCX_NULL      0000:0000a217  NULL                    NULL                       NULL
00000119:000001c2:0016  LOP_ROOT_CHANGE       LCX_CLUSTERED 0000:0000a217  sys.sysallocunits.clust NULL                       NULL
00000119:000001c2:0017  LOP_FORMAT_PAGE       LCX_HEAP      0000:0000a217  dbo.t1                  NULL                       NULL
00000119:000001c2:0018  LOP_LOCK_XACT         LCX_NULL      0000:0000a217  NULL                    NULL                       NULL
00000119:000001c2:0019  LOP_ROOT_CHANGE       LCX_CLUSTERED 0000:0000a217  sys.sysallocunits.clust NULL                       NULL
00000119:000001c2:001a  LOP_COMMIT_XACT       LCX_NULL      0000:0000a217  NULL                    NULL                       NULL
00000119:000001c2:001b  LOP_INSERT_ROWS       LCX_HEAP      0000:0000a216  dbo.t1                  NULL                       NULL
00000119:000001c2:001c  LOP_SET_FREE_SPACE    LCX_PFS       0000:00000000  Unknown Alloc Unit      NULL                       NULL
00000119:000001c2:001d  LOP_COMMIT_XACT       LCX_NULL      0000:0000a216  NULL                    NULL                       NULL

Pretty darn cool, eh? We can see that the system started an implicit transaction for us, which it called INSERT. This then started a sub-transaction called AllocHeapPageSimpleXactDML to handle the internals of the allocating a page. This in turn started two further sub-sub-transactions to deal with the extent allocation and allocating the page.

Note that setting the free space of the new page in the PFS is not done as part of a transaction (i.e. it's non-transactional), as it would be impossible to maintain transactional free space bits without causing major blocking in the system.

In future I’ll be making use of this in my spelunking exercises.

Enjoy!

The post Parent transaction ID in 2012 fn_dblog output appeared first on Paul S. Randal.


New 7.5 hour online course on logging, recovery, and the transaction log


As you know we're recording a lot of content for Pluralsight, and they've just published my latest course today: SQL Server: Logging, Recovery, and the Transaction Log.

This is a carefully structured, 7.5 hour brain dump of everything I know about logging and recovery, which will be useful whether you're a beginner, a seasoned DBA, or you want to delve into more details. The course gives a wealth of practical, applicable knowledge that will help you avoid and recover from transaction log problems.

I can confidently say this is the most comprehensive coverage of logging and recovery that exists in the world today, and there aren't any others by someone who actually wrote code in that portion of the SQL Server Engine.

The course has 37 demos and 194 total recordings, split into the following modules:

  • Introduction
  • Understanding Logging
  • Transaction Log Architecture
  • Log Records
  • Checkpoints
  • Transaction Log Operations
  • Recovery and Crash Recovery
  • Recovery Models and Minimal Logging
  • Transaction Log Provisioning and Management
  • Transaction Log Backups
  • Corruption and Other HA/DR Topics

This means we now have 10 courses from the SQLskills team available on Pluralsight, totaling more than 40 hours of content, with several more coming online before year end and plans for another 25-30+ more courses each year going forward.

With individual subscriptions as low as US$30/month, more than 380 total IT courses, and newly introduced course transcripts and multi-language closed-captioning, Pluralsight is where it's at for online training.

Check out my new course at: SQL Server: Logging, Recovery, and the Transaction Log

I hope you like it!

The post New 7.5 hour online course on logging, recovery, and the transaction log appeared first on Paul S. Randal.

Tracking page splits using the transaction log


Whenever I’m teaching about index fragmentation I get asked how to track page splits proactively. This can be useful to discover fragmentation occurring in indexes you didn’t know had fragmentation problems, without running the sys.dm_db_index_physical_stats DMV (see here for how that works) against all the indexes in your databases. Today this came up multiple times, both in class and in email, so it’s time to bubble this blog post up to the top of the list.

You might think this is easy, as there's a page split counter in sys.dm_db_index_operational_stats and in the Access Methods perfmon object. However, neither of these distinguishes between 'good' splits and 'nasty' splits, which are my terms :-). A 'nasty' split is what we think of as just a page split – a data or index page having to split into two pages to make space for a record to be inserted or an existing record to expand. A 'good' split is what the Storage Engine calls adding a page on the right-hand side of the index leaf level as part of inserting new records in an ascending key index (e.g. a clustered index with a bigint identity column as the cluster key).

This is really annoying, as it makes both these methods of tracking page splits essentially useless.

If you’re running SQL Server 2012 or later, the solution is to use Extended Events, based on the new sqlserver.transaction_log event. Jonathan wrote a great post here that gives you the Extended Events sessions to use.

If you’re not running SQL Server 2012 or later, read on.

Before the sqlserver.transaction_log event was added, there was (and still is) the sqlserver.page_split event but that does not distinguish between ‘good’ splits and ‘nasty’ splits either, so some post processing is involved (essentially reading the page referenced in the event to see if it really split or not).

So what’s the answer?

Scanning the log for page splits

The easiest way to proactively see page splits occurring is to look in the transaction log. Whenever a page splits, an LOP_DELETE_SPLIT log record is generated so querying the transaction log can let you know what’s going on.

Some simple code to do this is:

SELECT
    [AllocUnitName] AS N'Index',
    (CASE [Context]
        WHEN N'LCX_INDEX_LEAF' THEN N'Nonclustered'
        WHEN N'LCX_CLUSTERED' THEN N'Clustered'
        ELSE N'Non-Leaf'
    END) AS [SplitType],
    COUNT (1) AS [SplitCount]
FROM
    fn_dblog (NULL, NULL)
WHERE
    [Operation] = N'LOP_DELETE_SPLIT'
GROUP BY [AllocUnitName], [Context];
GO

However, I don’t recommend doing this, for two reasons:

  1. Running fn_dblog will cause read I/Os on the transaction log, which can cause performance issues, especially if you’re running the scanner regularly and it happens to coincide with a log backup, for instance.
  2. Log clearing is disabled while fn_dblog is running, so on a system with a large amount of log to scan, this could interrupt the ability of the log to clear and cause log growth.

If you’re running in the full or bulk-logged recovery model, I recommend scanning your log backups for page splits instead of your actual log. If you’re only running in the simple recovery model, and you *really* want to run the script regularly, you’re going to have to run the script just before each checkpoint operation clears the log. But still, be careful you don’t interrupt the log clearing process.

Scanning a log backup for page splits

There are two options for this, using the fn_dump_dblog function I blogged about here:

  • Scanning a log backup on a system other than the production system.
  • Scanning a log backup on the production system.

If  you choose to use a system other than the production system, then unless you have a restored copy of the database, you will not be able to get the index name, as fn_dump_dblog does not give you the name and you will not have the metadata to allow looking up the index name from the allocation unit ID in the log.

So I’ve created two scripts for you, for when the database is and isn’t available on the server where the backup is located. I’ll extend these in future posts.

Have fun!

Scanning a log backup where the database is not available

SELECT
    [AllocUnitId],
    (CASE [Context]
        WHEN N'LCX_INDEX_LEAF' THEN N'Nonclustered'
        WHEN N'LCX_CLUSTERED' THEN N'Clustered'
        ELSE N'Non-Leaf'
    END) AS [SplitType],
    COUNT (1) AS [SplitCount]
FROM
    fn_dump_dblog (NULL, NULL, N'DISK', 1, N'C:\SQLskills\SplitTest_log.bck',
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
        DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT)
WHERE
    [Operation] = N'LOP_DELETE_SPLIT'
GROUP BY [AllocUnitId], [Context];
GO

Scanning a log backup where the database is available

SELECT
    CAST ([s].[name] AS VARCHAR) + '.' + CAST ([o].[name] AS VARCHAR) + '.' + CAST ([i].[name] AS VARCHAR) AS [Index],
    [f].[SplitType],
    [f].[SplitCount]
FROM
    (SELECT
        [AllocUnitId],
        (CASE [Context]
            WHEN N'LCX_INDEX_LEAF' THEN N'Nonclustered'
            WHEN N'LCX_CLUSTERED' THEN N'Clustered'
            ELSE N'Non-Leaf'
        END) AS [SplitType],
        COUNT (1) AS [SplitCount]
    FROM
        fn_dump_dblog (NULL, NULL, N'DISK', 1, N'C:\SQLskills\SplitTest_log.bck',
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT,
            DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT)
    WHERE
        [Operation] = N'LOP_DELETE_SPLIT'
    GROUP BY [AllocUnitId], [Context]) f
JOIN sys.system_internals_allocation_units [a]
    ON [a].[allocation_unit_id] = [f].[AllocUnitId]
JOIN sys.partitions [p]
    ON [p].[partition_id] = [a].[container_id]
JOIN sys.indexes [i]
    ON [i].[index_id] = [p].[index_id] AND [i].[object_id] = [p].[object_id]
JOIN sys.objects [o]
    ON [o].[object_id] = [p].[object_id]
JOIN sys.schemas [s]
    ON [s].[schema_id] = [o].[schema_id];
GO

The post Tracking page splits using the transaction log appeared first on Paul S. Randal.

The Accidental DBA (Day 30 of 30): Troubleshooting: Transaction Log Growth


This month the SQLskills team is presenting a series of blog posts aimed at helping Accidental DBAs ‘keep the SQL Server lights on’. It’s a little taster to let you know what we’ll be covering in our brand new Immersion Event for The Accidental DBA, which we’ll be presenting for the first time in September 2013 in Bellevue, WA. You can find all the other posts in this series at http://www.SQLskills.com/help/AccidentalDBA. Enjoy!

For the final post in our 30-day series I want to cover one of the most common problems for Accidental DBAs to come across – uncontrolled transaction log growth.

I remember an email we were sent a few years ago, which basically said: I’ve got a 10-GB database and a 987-GB transaction log. What happened?!?!

The number 1 cause of uncontrolled transaction log growth is the following:

  1. Someone creates a database, where the default recovery model is Full; or someone changes the recovery model of a database to Full (remember I explained about these back on day 7 of the series.)
    • Even though the recovery model is Full, the database behaves as if it’s using the Simple recovery model, where the transaction log will automatically clear (to allow reuse and help prevent growth) whenever a checkpoint occurs. This is called the pseudo-Simple recovery model. This situation continues until a database backup is performed.
  2. Someone performs a database backup, completing the transition of the database into the Full recovery model.
    • This database backup provides a basis for further log backups to be performed, and so the log will no longer clear until a log backup is performed.
  3. Nobody performs log backups.

As no-one is performing log backups, and the database is properly using the Full recovery model, the log will never clear. As the log doesn’t clear, it will eventually fill up and then require more space for more log records, and so it will automatically grow. And grow, and grow, until the disk volume runs out of space.

The trick here is simple: perform regular log backups of databases using the Full (or Bulk-Logged) recovery model.
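
For example (a sketch with hypothetical names; in practice you'd schedule this as a SQL Agent job or maintenance plan, using a unique file name for each backup):

BACKUP LOG [MyDatabase]
TO DISK = N'D:\SQLBackups\MyDatabase_log.bck';
GO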

This is all well and good as long as you know why the log cannot clear, as you may be taking log backups and the log is still growing because it cannot clear. SQL Server knows why the log can’t clear, and you can ask it using the following code:

SELECT [name], [log_reuse_wait_desc] FROM sys.databases;
GO

This will list the reasons why the log could not clear, for all databases, at the time the most recent log clearing was attempted. In the case I described above, the output for the database would be LOG_BACKUP, and there are a bunch of other reasons, including long-running transactions, unreplicated transactions, active data backups, and more. You can read the full list in SQL Server Books Online topic Factors That Can Delay Log Truncation.

Whatever the reason is that has made your database’s log grow, take corrective action to allow the log to clear, and then take preventative action to stop the log having to grow again. And if your log has already grown really large, you’ll want to shrink it down to a more reasonable size, and you can read more details on how to do that in Kimberly’s (old but still relevant) post here.

Apart from the log growing and potentially running out of space, log growth is problematic because:

  • Every time the log grows, the new portion of the log has to be zero initialized, which takes some time and causes the write operations on the database to pause.
  • Every time the log grows, more internal chunks are added to the log, and the more of these there are, the more performance impact there is on log operations (Kimberly’s post discusses this too).
  • As the log gets larger, the potential for more downtime during a disaster increases. If the entire database is destroyed, restoring from backups involves recreating the data and log files. The larger the log file is, the longer this takes because of zero initialization again.

Careful transaction log management is required as part of being responsible for a SQL Server instance, whether as an Accidental or experienced DBA.

Finally, we really hope you’ve enjoyed reading our post-a-day series during June as much as we’ve enjoyed writing it, and we also hope you’ve learned a bunch of information that will help you increase your DBA skills. To continue your DBA education, make sure to follow our blogs, subscribe to our bi-weekly SQLskills Insider newsletter, check out our online training, and consider coming to one of our in-person training classes – for the ultimate learning experience!

Thanks – Paul & the SQLskills team

The post The Accidental DBA (Day 30 of 30): Troubleshooting: Transaction Log Growth appeared first on Paul S. Randal.

How to tell who changed a log file characteristic?


My good friend Orson Weston (@weston12) posted a question to #sqlhelp on Twitter earlier: Is there a way out of the box to find when and who changed the max file size for a log file?

You can’t tell this from the default trace as that just logs that the database was altered, not the log file.

But yes, I can tell you who made a change to the configuration of the log file for a database, using transaction log analysis.

I just ran the following code to set up the demo:

CREATE DATABASE foo2;
GO
ALTER DATABASE foo2
MODIFY FILE (NAME = N'foo2_log', MAXSIZE = 100GB);
GO

If I want to know who changed the log file properties, I can rely on the fact that the log file is file ID 2, and that the log file characteristics are stored in the sysprufiles hidden system table. Slot 0 in that table is the MDF, slot 1 is the LDF. Using fn_dblog to look in the transaction log for changes to that slot in that table, I can run the following code:

SELECT
	SUSER_SNAME ([Transaction SID]) AS [User],
	[Begin Time] AS [Change Time]
FROM
	fn_dblog (NULL, NULL)
WHERE
	[Transaction ID] =
		(SELECT
			[Transaction ID]
		FROM
			fn_dblog (NULL, NULL)
		WHERE
			[Operation] = N'LOP_MODIFY_ROW'
			AND [AllocUnitName] = N'sys.sysprufiles.clst'
			AND [Slot ID] = 1)
	AND [Operation] = N'LOP_BEGIN_XACT';
GO

 

User                                        Change Time
------------------------------------------- ------------------------
APPLECROSS\paul                             2013/09/18 14:01:25:310

It’s harder to tell exactly what changed, but you can work that out by looking at the structure of the sysprufiles records and matching the Offset column in the log record in the fn_dblog output. Not impossible, but tedious. In most cases it’s enough to know that someone changed the log file characteristics at all.

If you’re interested in a different log file (why do you have more than one?) then figure out which slot ID it is in sysprufiles and substitute that slot ID in the code.

And if you don’t have any information in the current transaction log, you can scan your log backups using the same code and using fn_dump_dblog instead (see here for details, syntax, and a usage warning).

Hope this helps!

The post How to tell who changed a log file characteristic? appeared first on Paul S. Randal.

Are I/O latencies killing your performance?


In this post I explain some methods for investigating and reducing high tempdb and transaction log I/O latencies that can severely hamper the performance of your workload.

Back at the end of August I kicked off a survey asking you to run some code to calculate average I/O latencies and send me the results. I have to apologize for taking two months to get to this editorial as I kept putting off spending the time to collate all the results I was sent. I made up for it by putting in lots of useful information :-)

I received results from 1094 random servers around the world (thank you all!) hosting 25445 databases and it took me a while to pull all the results into SQL Server so I could aggregate the data. Here it is.

What is Good or Bad?

For everything we’re talking about, you have to consider two things:

  • What is good or bad for I/O latency?
  • Even if you have “bad” I/O latency, do you care?

Everyone has their idea of what constitutes good or bad I/O latency, and here’s my take:

  • Excellent: < 1ms
  • Very good: < 5ms
  • Good: 5 – 10ms
  • Poor: 10 – 20ms
  • Bad: 20 – 100ms
  • Shockingly bad: 100 – 500ms
  • WOW!: > 500ms

You may want to argue that my numbers are too high, too low, or wrong, based on what you’ve seen/experienced, and that’s fine, but this is my blog and these are my numbers :-)

Even if your I/O latency is in what I consider the “bad” category, it could be that your workload is performing within the acceptable bounds for your organization and your customers, and so you’re comfortable with that. I’m OK with this, as long as you are aware of your I/O latencies and you’ve consciously decided to accept them as they are. Being ignorant of your I/O latencies and just accepting the workload performance as it is, because that’s just what the performance is, is not acceptable to me.

On to the survey data…

Tempdb Data Files

For this data I worked out the average read and write latency over all tempdb data files for each instance. I didn’t see any instances where one tempdb data file had a huge latency compared to the others so I believe this is a valid approach.

[Chart: tempdb data file average read latency distribution]

The tempdb data file read latencies aren’t too bad, to be honest, with more than 93% of all the servers in the survey having read latencies less than 20ms.

[Chart: tempdb data file average write latency distribution]

This is very interesting – almost 42% of all the servers in the survey had average tempdb data file write latency of more than 20ms, and just over 12% of all servers had average tempdb data file write latency of more than half a second per write – that’s ridiculously high!

To be honest, I'm pretty shocked by these results, especially the relatively high number of servers with multi-second average write latencies for tempdb.

So if you check your average I/O latency for tempdb (using a script such as the one I blogged here, using sys.dm_io_virtual_file_stats) and find that it’s really high on my (or your) scale, what can you do?

Well, there are four approaches I can think of:

  1. Don’t do any investigation and just immediately move tempdb to a faster I/O subsystem, such as two SSD cards in a RAID-1 configuration (remember, one SSD is RAID-0, and that’s not good enough because if tempdb is corrupt or unavailable, your instance shuts down!). You might think that this is the lazy approach, but if you *know* you can’t make any changes to workload, this may be your only option. I bumped into this scenario while working with a client that supplies casino software. The optimal solution was to change some SPs to reduce tempdb usage, but that was going to take 18 months to be approved by the Gaming Commission in the state where the client’s client was located. The only solution in the meantime? Pony up for a faster tempdb I/O subsystem.
  2. Investigate the I/O subsystem where tempdb is located, looking for things like (non-exhaustive list):
    • Pathing/network issues, such as having a 1Gb switch in the middle of a 4Gb path, or having mismatched jumbo frame settings in your iSCSI configuration, or where the network to the SAN is being saturated by something other than SQL Server I/O traffic.
    • Incorrect SAN settings, such as not having write-caching enabled for a write-heavy workload, incorrect queue depth compared to your SAN vendor’s recommendations, or tempdb stuck on slow storage on an auto-tiering SAN because of incorrect SAN training.
    • Multiple users of the portion of the I/O subsystem where tempdb is located, such as having tempdb lumped in with other volatile databases, or even having LUNs shared between SQL Server and other applications like Exchange or IIS.
    • Only having a single tempdb data file so missing out on the usual higher I/O subsystem performance gained from having multiple data files in a filegroup (note: don’t confuse this with adding more tempdb data files to reduce PAGELATCH_XX contention on in-memory allocation bitmaps.)
  3. Try to reduce the usage of tempdb, by looking for (non-exhaustive list; there's a quick tempdb space-usage query sketched after this list):
    • Spill warnings in query plans (hash, sort, or exchange) indicating that there was not enough query execution memory and an operator had to spill results to tempdb. See these blog posts for more information: here, here, here, and a blog post of mine explaining how to understand the data vs. log usage for a memory spill to tempdb here.
    • Incorrect, excessive usage of temp tables, such as *always* using a temp table when it may be better sometimes to not do so, pulling more columns or rows into a temp table than are actually required (e.g. SELECT * into the temp table from a user table, with no WHERE clause), creating nonclustered indexes on a temp table that are not used. I’ve seen this over and over with client code.
    • Index rebuilds that use SORT_IN_TEMPDB.
    • Using one of the flavors of snapshot isolation and allowing long-running queries that cause the version store to grow very large.
    • There are lots of useful queries and other information in the whitepaper Working with tempdb in SQL Server 2005 (which is still applicable to all current versions.)
  4. A combination of #2 and #3, and then maybe you just have to move to a faster I/O subsystem, as in #1.
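Here's the tempdb space-usage sketch referenced in approach #3 above (mine, not from the original post). It breaks down reserved space in tempdb by user objects (temp tables and table variables), internal objects (sorts, hashes, and spools), and the version store, which helps decide which of the items above to chase first:

SELECT
    SUM ([user_object_reserved_page_count]) * 8 / 1024.0 AS [UserObjectsMB],
    SUM ([internal_object_reserved_page_count]) * 8 / 1024.0 AS [InternalObjectsMB],
    SUM ([version_store_reserved_page_count]) * 8 / 1024.0 AS [VersionStoreMB],
    SUM ([unallocated_extent_page_count]) * 8 / 1024.0 AS [FreeMB]
FROM [tempdb].sys.dm_db_file_space_usage;
GO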

One other thing to consider is the risk of making a change to your code and/or the cost of the engineering effort (dev and test) to do so. It may be cheaper and less risky to move to a faster I/O subsystem. Your call. Another issue you may have is that the bad code is in a 3rd-party application that you have no control over. In that case you may have no choice except to throw hardware at the problem.

Transaction Log Files

For this data I considered each database separately rather than aggregating per instance.

[Chart: average transaction log file read latency per database]

[Chart: average transaction log file write latency per database]

For the transaction log, you really want the average write latency to be in the 0-5ms range, and it’s good to see more than 79% of transaction log files in the survey are achieving that. I would say that write latency for the transaction log is much more important than read latency, as write latency slows down transactions in your workload. That’s not to say that you should ignore high read latencies, as these slow down log readers (such as log backups, transactional replication, change data capture, asynchronous database mirroring/availability groups), but log read latencies don’t usually slow down your workload unless you have transactions that are rolling back (the only time that transactions will cause log reads) or you rely heavily on change data capture.
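As a quick check on your own instance, here's a minimal sketch (again, not the exact survey script) that lists the average write latency for every transaction log file, worst first:

SELECT
    DB_NAME ([vfs].[database_id]) AS [DB],
    CASE WHEN [vfs].[num_of_writes] = 0 THEN 0
        ELSE [vfs].[io_stall_write_ms] / [vfs].[num_of_writes] END AS [AvgLogWriteLatency_ms]
FROM sys.dm_io_virtual_file_stats (NULL, NULL) AS [vfs]
JOIN sys.master_files AS [mf]
    ON [mf].[database_id] = [vfs].[database_id]
    AND [mf].[file_id] = [vfs].[file_id]
WHERE [mf].[type_desc] = N'LOG' -- log files only
ORDER BY [AvgLogWriteLatency_ms] DESC;
GO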

So you’re focusing on write latency. Again, there are multiple approaches that are the same as #1-#4 above. What you’re looking for in approach #3 is different though. I’ve written several detailed posts about transaction log performance tuning (including reducing the amount of log generated and changing the log block flush pattern) for SQL Sentry’s SQLPerformance.com blog so rather than duplicate those, I’ll point you to them:

And then there’s my 8-hour Pluralsight online training class that covers SQL Server: Understanding Logging, Recovery, and the Transaction Log.

Summary

It’s really important that you pay attention to the read and write I/O latencies for critical components of your SQL Server environment: tempdb and transaction logs. I’ve given you a lot of info above on what you can do inside and outside SQL Server to reduce these latencies and increase your workload throughput.

Unless you have no choice, don’t just throw some SSDs into the mix without first figuring out whether there are some things you can do inside SQL Server to reduce the I/O load, and if you do throw in some SSDs, make sure that you’re using them to host whatever files are your biggest I/O bottleneck, and make sure you’re using at least two of them in a RAID-1 configuration to protect against failure.

I hope this has been a useful read – happy tuning!

The post Are I/O latencies killing your performance? appeared first on Paul S. Randal.

More on using Transaction SID from the transaction log


Back in 2012 I blogged about using fn_dblog and fn_dump_dblog to figure out the point at which something occurred that you'd like to restore to just before (e.g. a table drop). I also mentioned that you can use the SUSER_SNAME () function on the [Transaction SID] column for the LOP_BEGIN_XACT log record of the operation to find out who performed the operation.

Yesterday in our IE2 Performance Tuning class in Tampa, someone asked me what the [Transaction SID] column would show if someone had run EXECUTE AS. As I wasn’t 100% certain, I decided to test and write a quick blog post.

First off I’ll set up a database and table to use:

USE [master];
GO
CREATE DATABASE [Test];
GO
ALTER DATABASE [Test] SET RECOVERY SIMPLE;
GO
USE [Test];
GO
CREATE TABLE [TestTable] ([c1] INT IDENTITY);
GO
INSERT INTO [TestTable] DEFAULT VALUES;
GO 5

Next I’ll create a Kimberly user for a SQL login, and a Katelyn user for a Windows login:

-- Create Kimberly login and user
CREATE LOGIN [KimberlyLogin] WITH PASSWORD = 'NiceWife';
CREATE USER [KimberlyUser] FOR LOGIN [KimberlyLogin];
EXEC sp_addrolemember N'db_owner', N'KimberlyUser';
GO

-- Create Katelyn user
CREATE USER [KatelynUser] FOR LOGIN [APPLECROSS\Katelyn];
EXEC sp_addrolemember N'db_owner', N'KatelynUser';
GO

Now I’ll delete a single row as me and each of the users and logins:

-- Delete as me
DELETE FROM [TestTable] WHERE [c1] = 1;
GO

-- Now delete as Kimberly user
EXECUTE AS USER = N'KimberlyUser';
DELETE FROM [TestTable] WHERE [c1] = 2;
REVERT;
GO

-- Now delete as Kimberly login
EXECUTE AS LOGIN = N'KimberlyLogin';
DELETE FROM [TestTable] WHERE [c1] = 3;
REVERT;
GO

-- Now delete as Katelyn user
EXECUTE AS USER = N'KatelynUser';
DELETE FROM [TestTable] WHERE [c1] = 4;
REVERT;
GO

-- Now delete as Katelyn login
EXECUTE AS LOGIN = N'APPLECROSS\Katelyn';
DELETE FROM [TestTable] WHERE [c1] = 5;
REVERT;
GO

Finally I’ll pull the [Transaction SID] for each of the delete operations and pass it into SUSER_SNAME ():

SELECT
	[Operation], [Transaction Name], [Transaction SID],
	SUSER_SNAME ([Transaction SID]) AS [WhoDidIt?]
FROM fn_dblog (NULL, NULL)
WHERE [Operation] = N'LOP_BEGIN_XACT'
AND [Transaction Name] = 'DELETE';
GO
Operation       Transaction Name  Transaction SID                                             WhoDidIt?
--------------- ----------------- ----------------------------------------------------------- -------------------
LOP_BEGIN_XACT  DELETE            0x0105000000000005150000003A5014D05A957BF8F5C8882EE8030000  APPLECROSS\paul
LOP_BEGIN_XACT  DELETE            0x9A9A69BEACF67E4994E2F2DEE35BC02F                          KimberlyLogin
LOP_BEGIN_XACT  DELETE            0x9A9A69BEACF67E4994E2F2DEE35BC02F                          KimberlyLogin
LOP_BEGIN_XACT  DELETE            0x0105000000000005150000003A5014D05A957BF8F5C8882EFE030000  APPLECROSS\Katelyn
LOP_BEGIN_XACT  DELETE            0x0105000000000005150000003A5014D05A957BF8F5C8882EFE030000  APPLECROSS\Katelyn

So the answer is that the log record contains the SID of who you’re executing as. The only way to tell who is really running the code would be through auditing.

Enjoy!

The post More on using Transaction SID from the transaction log appeared first on Paul S. Randal.

What is the most worrying cause of log growth (log_reuse_wait_desc)?


Two weeks ago I kicked off a survey that presented a scenario and asked you to vote for the log_reuse_wait_desc value you’d be most worried to see on a critical database with a 24×7 workload.

Here are the results:

[Chart: survey results for the most worrying log_reuse_wait_desc value]

Another very interesting spread of responses – as always, thanks to everyone who took the time to respond.

Remember that you have no knowledge about anything on the system except what the log_reuse_wait_desc is for the database with the constantly growing log.

Before I start discussing them, I'm going to explain the term log truncation that I'll be using. This is one of the two terms used (the other is log clearing) to describe the mechanism that marks currently-active VLFs (virtual log files) in the log as inactive so they can be reused. When there are inactive VLFs, the log can wrap around (see this post for more details) and doesn't have to grow. When there are no available VLFs because log truncation isn't possible, the log has to grow. When log truncation can't make any VLFs inactive, it records the reason why, and that's what the log_reuse_wait_desc value in sys.databases gives us. You can read more about how the log works in this TechNet Magazine article I wrote back in 2009, and get all the log info you could ever want in my logging and recovery Pluralsight course.

We also need to understand how the log_reuse_wait_desc reporting mechanism works. It gives the reason why log truncation couldn’t happen the last time log truncation was attempted. This can be confusing – for instance if you see ACTIVE_BACKUP_OR_RESTORE and you know there isn’t a backup or restore operation running, this just means that there was one running the last time log truncation was attempted. You can also see some weird effects – for instance if you do a log backup, and then see LOG_BACKUP immediately as the log_reuse_wait_desc value. I blogged about the reason for that phenomenon here.
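For reference, checking the current value is just a query against sys.databases; here's a minimal sketch (the database name is a placeholder):

SELECT [name], [recovery_model_desc], [log_reuse_wait_desc]
FROM sys.databases
WHERE [name] = N'YourDatabase';
GO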

Let’s quickly consider what each of the log_reuse_wait_desc values in the list above means (and this isn’t an exhaustive analysis of each one):

  • NOTHING: Just as it looks, this value means that SQL Server thinks there is no problem with log truncation. In our case though, the log is clearly growing, so how could we see NOTHING? Well, for this we have to understand how the log_reuse_wait_desc reporting works. The value is reporting what stopped log truncation last time it was attempted. If the value is NOTHING, it means that at least one VLF was marked inactive the last time log truncation occurred. We could have a situation where the log has a huge number of VLFs, and there are a large number of active transactions, with each one having its LOP_BEGIN_XACT log record in successive VLFs. If each time log truncation happens, a single transaction has committed, and it only manages to clear one VLF, it could be that the speed of log truncation is just waaaay slower than the speed of log record generation, and so the log is having to grow to accommodate it. I’d use my script here to see how many active transactions there are, monitor VLFs with DBCC LOGINFO, and also track the Log Truncations and Log Growths counters for the database in the Databases perfmon object. I need to see if I can engineer this case and repeatably see NOTHING.
  • CHECKPOINT: This value means that a checkpoint hasn’t occurred since the last time log truncation occurred. In the simple recovery model, log truncation only happens when a checkpoint completes, so you wouldn’t normally see this value. When it can happen is if a checkpoint is taking a long time to complete and the log has to grow while the checkpoint is still running. I’ve seen this on one client system with a very poorly performing I/O subsystem and a very large buffer pool with a lot of dirty pages that needed to be flushed when the checkpoint occurred.
  • LOG_BACKUP: This is one of the most common values to see, and says that you’re in the full or bulk_logged recovery model and a log backup hasn’t occurred. In those recovery models, it’s a log backup that performs log truncation. Simple stuff. I’d check to see why log backups are not being performed (disabled Agent job or changed Agent job schedule? backup failure messages in the error log?)
  • ACTIVE_BACKUP_OR_RESTORE: This means that there’s a data backup running or any kind of restore running. The log can’t be truncated during a restore, and is required for data backups so can’t be truncated there either.
  • ACTIVE_TRANSACTION: This means that there is a long-running transaction that is holding all the VLFs active. The way log truncation works is that it goes to the next VLF (#Y) from the last one (#X) made inactive the last time log truncation worked, and looks at that. If VLF #Y can't be made inactive, then log truncation fails and the log_reuse_wait_desc value is recorded. If a long-running transaction has its LOP_BEGIN_XACT log record in VLF #Y, then no other VLFs can be made inactive either, even if all the other VLFs after VLF #Y have nothing to do with our long-running transaction – there's no selective active vs. inactive. You can use this script to see all the active transactions (and see the quick sketch after this list).
  • DATABASE_MIRRORING: This means that the database mirroring partnership has some latency in it and there are log records on the mirroring principal that haven’t yet been sent to the mirroring mirror (called the send queue). This can happen if the mirror is configured for asynchronous operation, where transactions can commit on the principal before their log records have been sent to the mirror. It can also happen in synchronous mode, if the mirror becomes disconnected or the mirroring session is suspended. The amount of log in the send queue can be equated to the expected amount of data (or work) loss in the event of a crash of the principal.
  • REPLICATION: This value shows up when there are committed transactions that haven’t yet been scanned by the transaction replication Log Reader Agent job for the purpose of sending them to the replication distributor or harvesting them for Change Data Capture (which uses the same Agent job as transaction replication). The job could have been disabled, could be broken, or could have had its SQL Agent schedule changed.
  • DATABASE_SNAPSHOT_CREATION: When a database snapshot is created (either manually or automatically by DBCC CHECKDB and other commands), the database snapshot is made transactionally consistent by using the database’s log to perform crash recovery into the database snapshot. The log obviously can’t be truncated while this is happening and this value will be the result. See this blog post for a bit more info.
  • LOG_SCAN: This value shows up if a long-running call to fn_dblog (see here) is under way when the log truncation is attempted, or when the log is being scanned during a checkpoint.
  • AVAILABILITY_REPLICA: This is the same thing as DATABASE_MIRRORING, but for an availability group (2012 onward) instead of database mirroring.
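As mentioned in the ACTIVE_TRANSACTION entry above, here's a quick sketch (a cut-down version, not the full script I linked to; the database name is a placeholder) for finding the oldest active transaction and seeing how much log space each active transaction has reserved:

-- Oldest active transaction in a specific database
DBCC OPENTRAN (N'YourDatabase');
GO

-- All active transactions on the instance, with reserved log space
SELECT
    DB_NAME ([dt].[database_id]) AS [DB],
    [st].[session_id],
    [at].[transaction_begin_time],
    [dt].[database_transaction_log_bytes_reserved]
FROM sys.dm_tran_database_transactions AS [dt]
JOIN sys.dm_tran_session_transactions AS [st]
    ON [st].[transaction_id] = [dt].[transaction_id]
JOIN sys.dm_tran_active_transactions AS [at]
    ON [at].[transaction_id] = [dt].[transaction_id]
ORDER BY [at].[transaction_begin_time];
GO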

Ok – so maybe not such a quick consideration of the various values :-) Books Online has basic information about these values here.

So which one would *I* be the most worried to see for a 24 x 7, critical database?

Let’s think about some of the worst case scenarios for each of the values:

  • NOTHING: It could be the scenario I described in the list above, but that would entail a workload change having happened for me to be surprised by it. Otherwise it could be a SQL Server bug, which is unlikely.
  • CHECKPOINT: For a critical, 24×7 database, I’m likely using the full recovery model, so it’s unlikely to be this one unless someone’s switched to simple without me knowing…
  • LOG_BACKUP: This would mean something had happened to the log backup job so it either isn’t running or it’s failing. Worst case here would be data loss if a catastrophic failure occurred, plus the next successful log backup is likely to be very large.
  • ACTIVE_BACKUP_OR_RESTORE: As the log is growing, if this value shows up then it must be a long-running data backup. Log backups can run concurrently so I'm not worried about data loss if a problem occurs.
  • ACTIVE_TRANSACTION: Worst case here is that the transaction needs to be killed and then will take a long time to roll back, producing a lot more transaction log before the log stops growing.
  • DATABASE_MIRRORING: Worst case here is that a crash occurs and data/work loss occurs because of the log send queue on the principal, but only if I don’t have log backups that I can restore from. So maybe we’re looking at a trade off between some potential data loss or some potential down time (to restore from backups).
  • REPLICATION: The worst case here is that replication’s got itself badly messed up for some reason and has to be fully removed from the database with sp_removedbreplication, and then reconfigured again.
  • DATABASE_SNAPSHOT_CREATION: The worst case here is that there are some very long running transactions that are still being crash recovered into the database snapshot and that won’t finish for a while. It’s not possible to interrupt database snapshot creation, but it won’t affect anything in the source database apart from log truncation.
  • LOG_SCAN: This is very unlikely to be a problem.
  • AVAILABILITY_REPLICA: Same as for database mirroring, but we may have another replica that's up-to-date (the availability group send queue could be for one of the asynchronous replicas that is running slowly, while a synchronous replica is up-to-date).

The most worrying value is going to depend on what is likely to be the biggest problem for my environment: potential data loss, reinitialization of replication, very large log backup, or lots *more* log growth from waiting for a long-running transaction to commit or roll back.

I think I’d be most worried to see ACTIVE_TRANSACTION, as it’s likely that the offending transaction will have to be killed which might generate a bunch more log, cause more log growth, and more log to be backed up, mirroring, replicated, and so on. This could also lead to disaster recovery problems too – if a long-running transaction exists and a failover occurs, the long-running transaction has to be rolled back before the database is fully available (even with fast recovery in Enterprise Edition).

So there you have it. 37% of you agreed with me. I discussed this with Glenn just after I posted it and we agreed on ACTIVE_TRANSACTION as the most worrying type (so I didn’t just pick this because the largest proportion of respondents did).

I’m interested to hear your thoughts on my pick and the scenario, but please don’t rant about how I’m wrong or the scenario is bogus, as there is no ‘right’ answer, just opinions based on experience.

And that’s the trick when it comes to performance troubleshooting – although it’s an art and a science, so much of how you approach it is down to your experience.

Happy troubleshooting!

The post What is the most worrying cause of log growth (log_reuse_wait_desc)? appeared first on Paul S. Randal.


When is fast recovery used?


It’s been a bit light on technical posts here over the last few months but now that summer’s over I’ll be ramping up again with lots of cool stuff planned.

First up is a question that came up on the MCM distribution list this morning. There was a discussion of fast recovery (which I explained in detail in the post Lock Logging and Fast Recovery back in 2009); in a nutshell, it's the ability of Enterprise Edition to allow access to a database after the REDO (rolling forward committed transactions) phase of crash recovery has completed and before the UNDO (rolling back uncommitted transactions) phase has completed. The idea is that UNDO can take much longer than REDO, so early access to the database is a good thing, hence it being an Enterprise Edition feature (from SQL Server 2005 onward).

The question essentially became: when is fast recovery used?

The answer is that it’s used whenever a database is started up and needs to have recovery run on it. This means fast recovery will be used:

  • When SQL Server starts up after a crash or shutdown where a database was not cleanly shut down
  • After a cluster failover
  • After a database mirroring failover
  • After an availability group failover
  • When a database state is changed to ONLINE and crash recovery needs to be run

Note that I did not include:

  • When restoring a database from backups
  • When bringing a log shipping secondary database online (this is restoring from backups)

Fast recovery is NOT used during a restore operation. You’ll read in some places online that it is, but those places are incorrect.

So why isn’t it used during a restore sequence?

It’s to do with the underlying mechanism that allows fast recovery. Operations that make changes to a database are logged, and the log record includes a bitmap of what locks were held at the time (examples of this are in the blog post I referenced above). When crash recovery runs, the REDO phase also acquires all the locks necessary to do the UNDO phase, as the REDO phase knows which transactions in the log being recovered need to be rolled back. At the end of the REDO phase, access can be given to the database because recovery can guarantee that no user can block the UNDO phase, as the UNDO phase locks are already held.

So why doesn’t that mechanism work for restores? Well restore doesn’t do one REDO and one UNDO like crash recovery does. For each backup that is restored in the restore sequence, the REDO phase of recovery is performed. This avoids having a really long REDO phase at the end of the restore sequence (which could be, say, a week’s worth of transactions spread over tens or hundreds of backups), and having to have a huge transaction log to hold all those log records.

At the end of the restore sequence, all necessary REDO has already been performed, but the REDO operations have NOT been acquiring UNDO locks. The UNDO locks aren’t acquired because UNDO isn’t likely to be the next phase during a restore sequence. It’s likely to be another restore operation. In that case, it’s likely that some of the transactions that were uncommitted at the end of the last restore become committed during the next restore, so if UNDO locks had been acquired, they would have to be released again. This would involve either rescanning the log records involved or keeping track of which in-restore transactions had acquired which locks. Either of these would be complicated and time consuming, so the benefit hasn’t been deemed worthwhile for the engineering effort involved.

So no fast recovery during restores.

But hold on, I hear you say, database mirroring is just a constant REDO process so how come fast recovery works for that? Back in SQL Server 2005, when a database mirroring failover occurred, the database was momentarily set offline so that full crash recovery would be run when the database came back online, hence allowing fast recovery to work. From SQL Server 2008 onward, that doesn’t happen any more, so there is a mechanism that figures out what UNDO locks are necessary when a mirroring failover occurs, allowing fast recovery behavior. I guess technically that same mechanism could be ported over to the restore code base, but I think it would be difficult to do, and I don’t think there’s enough demand to make the engineering effort and possible destabilization of the restore code worthwhile.

Hope this helps explain things – let me know if you have any questions.

The post When is fast recovery used? appeared first on Paul S. Randal.

Delayed Durability in SQL Server 2014


One of the cool new features in SQL Server 2014 is delayed durability (available in all Editions), which is described in detail in Books Online here.

I think I’m going to see a lot of people turn this on, as you can get a profound increase in transaction throughput with the right workload. However, I also think a lot of people are going to turn this on without realizing the potential for data loss and making the appropriate trade off.

Why can it give a throughput boost?

I put together a contrived workload with a small table where 50 concurrent clients are updating the same rows, and the database log is on a slow I/O subsystem. Here’s a graph showing my test:

[Chart: Transactions/sec and Log Flushes/sec before and after enabling delayed durability]

At the obvious change point, that’s where I enabled delayed durability, with all transactions being forced to use it. Before the change, the number of Transactions/sec is equal to the number of Log Flushes/sec, as each transaction is holding locks that block all other transactions (I told you it’s a contrived workload). So why the profound jump in Transactions/sec when I forced delayed durability?

Under normal circumstances, when a transaction commits, the commit doesn’t complete until the log block (see this blog post for more details) containing the LOP_COMMIT_TRAN log record for the transaction has been flushed to disk and the write is acknowledged back to SQL Server as having completed, providing the durability of the transaction (the D in the ACID properties of the transaction). The transaction’s locks cannot be dropped until the log flush completes.

In my workload, all the other transactions are waiting for the one that is committing, as they all need the same locks, so Transactions/sec is tied to Log Flushes/sec in this case.

With delayed durability, the transaction commit proceeds without the log block flush occurring – hence the act of making the transaction durable is delayed. Under delayed durability, log blocks are only flushed to disk when they reach their maximum size of 60KB. This means that transactions commit a lot faster, hold their locks for less time, and so Transactions/sec increases greatly (for this workload). You can also see that the Log Flushes/sec decreased greatly as well, as previously it was flushing lots of tiny log blocks and then changed to only flush maximum-sized log blocks.

Note:

  • I was forcing all transactions to be delayed durable, but the facility exists to make the delayed durability choice per transaction too (see Books Online for more details).
  • There’s a bit more to the log block flushing too: under delayed durability, a log block will flush when it fills up, or if a non-delayed durable transaction commits, or if the new sp_flush_log proc is executed.

My good friend Aaron Bertrand over at SQL Sentry has a long post about delayed durability that looks into its performance implications in a little bit more depth so I recommend you check out his post as well.

So this looks great, for the right type of workload. But I bet you’re thinking:

What’s the catch?

Your transactions aren’t durable when they commit. Simple.

Now you may be thinking that if the system crashes, the most you’re going to lose is up to 60KB of transaction log. Wrong. If that last log block contains the LOP_COMMIT_TRAN log record for a long-running transaction, when the system crashes, and that log block isn’t on disk, that whole transaction will roll back during crash recovery. So the potential for work/data loss is greater than just 60KB.

And there’s more:

  • Log backups will not back up that unflushed log block, as it’s not on disk, so non-durable committed transactions may not be contained within a log backup.
  • Non-durable transactions that have committed are not protected by synchronous database mirroring or a synchronous availability group either, as these rely on log block flushes (and transmission to the mirror/replica).

For critical transactions, an sp_flush_log can be used, or per-transaction delayed durability used instead.
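To make the options concrete, here's a minimal sketch of the three knobs involved (the database name is a placeholder; see Books Online for the full syntax):

-- Database-level setting: DISABLED (the default), ALLOWED, or FORCED
ALTER DATABASE [YourDatabase] SET DELAYED_DURABILITY = ALLOWED;
GO

-- Per-transaction opt-in (only honored when the database setting is ALLOWED)
BEGIN TRANSACTION;
-- ... do some work ...
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);
GO

-- Force any unflushed log blocks to disk at critical points in the workload
EXEC sys.sp_flush_log;
GO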

So the million-dollar question is:

Should I enable delayed durability?

It depends. Is your business comfortable making the throughput vs. durability trade off? Does enabling it give a throughput boost? If yes to both, go ahead. If no to either, don’t enable it. That’s a very simplistic way of thinking about it, but that’s what it boils down to really.

There are lots of other things you can do to increase the throughput and performance of the transaction log, and I explained them in a blog post series:

As I stated above though, I think a lot of people are going to be wowed by the throughput boost (if their workload benefits) from delayed durability and see this as a no-brainer, without considering the potential for data loss.

Tempting, isn’t it?

The post Delayed Durability in SQL Server 2014 appeared first on Paul S. Randal.

Important change to VLF creation algorithm in SQL Server 2014


Since SQL Server 2014 was released back in April last year, there have been some rumblings about changes to how many VLFs are created when the log is grown or auto-grown (I'll just say auto-grown from now on, as that's the most common scenario). I experimented a bit and thought I'd figured out the algorithm change. Turns out I hadn't. There was a question on the MVP distribution list last week that rekindled the discussion and we collectively figured out that the algorithm was behaving non-deterministically… in other words we didn't know what it was doing. So I pinged my friends in CSS who investigated the code (thanks Bob Ward and Suresh Kandoth!) and explained the change.

The change is pretty profound, and is aimed at preventing lots of auto-growth from creating huge numbers of VLFs. This is cool because having too many (it depends on the log size, but many thousands is too many) VLFs can cause all manner of performance problems around backups, restores, log clearing, replication, crash recovery, rollbacks, and even regular DML operations.

Up to 2014, the algorithm for how many VLFs you get when you create, grow, or auto-grow the log is based on the size in question:

  • Less than 1 MB, complicated, ignore this case.
  • Up to 64 MB: 4 new VLFs, each roughly 1/4 the size of the growth
  • 64 MB to 1 GB: 8 new VLFs, each roughly 1/8 the size of the growth
  • More than 1 GB: 16 new VLFs, each roughly 1/16 the size of the growth

So if you created your log at 1 GB and it auto-grew in chunks of 512 MB to 200 GB, you’d have 8 + ((200 – 1) x 2 x 8) = 3192 VLFs. (8 VLFs from the initial creation, then 200 – 1 = 199 GB of growth at 512 MB per auto-grow = 398 auto-growths, each producing 8 VLFs.)

For SQL Server 2014, the algorithm is now:

  • Is the growth size less than 1/8 the size of the current log size?
  • Yes: create 1 new VLF equal to the growth size
  • No: use the formula above

So on SQL Server 2014, if you created your log at 1 GB and it auto-grew in chunks of 512 MB to 200 GB, you'd have:

  • 8 VLFs from the initial log creation
  • All growths up to the log reaching 4.5 GB use the formula above, so the seven growths that take the log to 1.5, 2, 2.5, 3, 3.5, 4, and 4.5 GB each add 8 VLFs = 56 VLFs
  • All growths once the log is 4.5 GB or larger create only 1 VLF per growth = (200 – 4.5) x 2 = 391 VLFs
  • Total = 391 + 56 + 8 = 455 VLFs

455 is a much more reasonable number of VLFs than 3192, and will be far less of a performance problem.
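If you want to play with the arithmetic yourself, here's a minimal sketch (purely illustrative, and it ignores the sub-1 MB case) that estimates the number of VLFs for a given creation size, fixed auto-growth size, and target size using the 2014 algorithm described above:

DECLARE @CurrentSizeMB FLOAT = 1024;    -- log created at 1 GB
DECLARE @GrowthMB      FLOAT = 512;     -- 512 MB auto-growth
DECLARE @TargetSizeMB  FLOAT = 204800;  -- grow until the log reaches 200 GB
DECLARE @VLFCount      INT   = 8;       -- VLFs from the initial 1 GB creation

WHILE @CurrentSizeMB < @TargetSizeMB
BEGIN
    IF @GrowthMB < @CurrentSizeMB / 8
        SET @VLFCount += 1;             -- small growth: 1 VLF equal to the growth size
    ELSE
        SET @VLFCount +=                -- otherwise the pre-2014 formula applies
            CASE WHEN @GrowthMB <= 64 THEN 4
                 WHEN @GrowthMB <= 1024 THEN 8
                 ELSE 16
            END;
    SET @CurrentSizeMB += @GrowthMB;
END;

SELECT @VLFCount AS [EstimatedVLFs];    -- 455 for these inputs
GO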

A commenter asked whether the compatibility level affects this. No – the compatibility level is ignored by the Storage Engine internals.

I think this is an excellent change and I can’t see any drawbacks from it (apart from that it wasn’t publicized when SQL Server 2014 was released). CSS will be doing a comprehensive blog post about this soon, but they were cool with me making people aware of the details of the change ASAP to prevent confusion.

You might think that it could lead to very large VLFs (e.g. you set a 4 GB auto-growth size with a 100 GB log), and it can. But so what? Having very large VLFs is only a problem if they’re created initially and then you try to shrink the log down. At a minimum you can only have two VLFs in the log, so you’d be stuck with two giant VLFs at the start of the log and then smaller ones after you’d grown the log again. That can be a problem that prevents the log being able to wrap around and avoid auto-growth, but that’s not anywhere near as common as having too many VLFs. And that’s NOT a scenario that the new algorithm creates. (As an aside, you can fix that problem by creating a database snapshot and then reverting to it, which deletes the log and creates a 0.5 MB log with two tiny VLFs… it’s a bugfeature that’s been there since 2005, but it breaks your log backup chain when you do it.)

There’s certainly more that can be done in future around VLF management (e.g. fixing the problem I describe immediately above), but this is a giant step in the right direction.

Enjoy!

The post Important change to VLF creation algorithm in SQL Server 2014 appeared first on Paul S. Randal.

Incomplete checkpoints and recovery


Back in 2009 I blogged about how checkpoints work (see How do checkpoints work and what gets logged) and I received a question in email on Monday that I thought would make a good little blog post.

The question is (paraphrased): What happens if a checkpoint starts but doesn’t finish before a crash occurs? Will that checkpoint be used for crash recovery?

The answer is no, it won’t. Now if I left it at that, that really would be a little blog post, so let me explain my answer :-)

The purpose of a checkpoint is to bring the pages in the data files up-to-date with what’s in the transaction log. When a checkpoint ends, there’s a guarantee that as of the LSN of the LOP_BEGIN_CKPT log record, all changes from log records before that point are persisted in the data files on disk. There’s no guarantee about logged changes after that point, only before it. In other words, all the log records before the LSN of the LOP_BEGIN_CKPT log record are no longer required for crash recovery, unless there’s a long running transaction that started before that LSN.

When the checkpoint ends, the boot page of the database (page 9 in file 1 – see here for some more info) is updated with the beginning LSN of the checkpoint (and then, if the database is using the simple recovery model, any log clearing/truncation can occur).
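If you want to see this for yourself, here's a minimal sketch (the database name is a placeholder) that dumps the boot page using the undocumented DBCC PAGE command; look for the dbi_checkptLSN field in the output, which is where that checkpoint LSN is stored:

DBCC TRACEON (3604);                    -- send DBCC PAGE output to the client
GO
DBCC PAGE (N'YourDatabase', 1, 9, 3);   -- file 1, page 9 = the boot page
GO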

So if a checkpoint started but didn't end before a crash, its LSN would not be in the boot page and so crash recovery would start from the previous checkpoint. This is good, because an incomplete checkpoint means there's no guarantee about which logged changes are persisted in the data files, and so crash recovery wouldn't be able to work correctly from only starting at the beginning of the incomplete checkpoint.

A corollary question could be: How does SQL Server guarantee that there’s always one complete checkpoint in the active portion of the log, in case a crash occurs?

The answer is that log clearing/truncation of a VLF containing an LOP_BEGIN_CKPT log record cannot happen until another complete checkpoint has occurred. In other words, a complete checkpoint has to occur since the last log clearing/truncation before the next one can happen. If a checkpoint hasn’t occurred, the log_reuse_wait_desc for the database in sys.databases will return CHECKPOINT. It’s not common to see this occur, but you might see it if there’s a very long running checkpoint (e.g. a very large update on a system with a slow I/O subsystem so the flushing of data file pages takes a long time) and very frequent log backups, so two log backups occur concurrently with a single checkpoint operation. It could also happen if you’ve messed with the sp_configure recovery interval and set it higher than the default.

Interesting, eh?

REPLICATION preventing log reuse but no replication configured


Last week, for the second time in as many weeks, I was sent a question in email from someone who had a transaction log that was growing out of control. He’d already queried log_reuse_wait_desc for the database (see this post for some more background) and the result was REPLICATION.

The problem was, there was no replication configured for that database. Just for completeness, he tried turning off the publish and merge publish settings for the database with sp_replicationdboption, ran sp_removedbreplication on the database, and then when those didn’t work, he also tried configuring and then removing replication. Nothing worked and the transaction log kept growing.

The problem turned out to be Change Data Capture. CDC uses the replication log scanning mechanism to harvest changes from the database, either piggy-backing on replication's Log Reader Agent job or creating its own capture job if replication isn't configured. If CDC is configured but the capture job isn't running, the log_reuse_wait_desc will show as REPLICATION, as the log manager doesn't have any way to know *why* the replication log scanner is configured, just that it is, and it hasn't run.

So, if you ever see REPLICATION as the log_reuse_wait_desc and don’t have replication configured, check the is_cdc_enabled flag in sys.databases too. And then either figure out why the CDC capture job isn’t running correctly (see Change Data Capture Agent Jobs in this BOL entry), or remove CDC if it’s not supposed to be there (see this BOL entry).
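A minimal sketch of that check, combining both columns in one query:

SELECT [name], [log_reuse_wait_desc], [is_cdc_enabled]
FROM sys.databases
WHERE [log_reuse_wait_desc] = N'REPLICATION';
GO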

Hope that helps a few people!
