Benjamin Franklin once said, “A place for everything and everything in its place.” While I assume that he was not thinking about NULL values in a SQL Server table, the premise still holds. Some data professionals believe that NULL values do not belong in any table design. However, the reality is that they exist, and I do not see that changing anytime soon.
Several years ago, I learned that when a table is designed, putting the columns that will hold mostly NULL values at the end of the table can help with performance. To be honest, at the time I took that lesson at face value; I never really understood why it would make a difference. Recently, I ran across my notes from that lesson and decided to investigate further.
Let’s examine a situation where a table has a variety of NULL placements. We will create a table, NullPlacementDemo, that has five columns in total: an identity column along with four variable-length character columns.
USE Scratch
GO
IF OBJECT_ID('dbo.NullPlacementDemo') IS NOT NULL
BEGIN
    DROP TABLE NullPlacementDemo
END
GO
CREATE TABLE NullPlacementDemo
    (id int identity(1,1),
     col1 varchar(10),
     col2 varchar(10),
     col3 varchar(10),
     col4 varchar(10))
GO
Now that the table structure is created, let’s put some data in it. I will insert three records.
- This record will have values in columns 1 & 2, with 3 & 4 NULL
- This record will have values in columns 1 & 3, with 2 & 4 NULL
- This record will have values in columns 1 & 4, with 2 & 3 NULL
INSERT INTO NullPlacementDemo (col1, col2)
SELECT REPLICATE('a',10), REPLICATE('b',10)
GO
INSERT INTO NullPlacementDemo (col1, col3)
SELECT REPLICATE('a',10), REPLICATE('b',10)
GO
INSERT INTO NullPlacementDemo (col1, col4)
SELECT REPLICATE('a',10), REPLICATE('b',10)
GO
Note that, in order to make the calculations easier, I am filling the variable-length columns to their maximum capacity of ten characters. Also note that the amount of data inserted is the same for all three records.
Anatomy of a Record
In order to understand why NULL placement matters, we need to investigate the anatomy of the record itself. I highly recommend the post by Paul Randal, “Inside the Storage Engine: Anatomy of a Record”. That post does an excellent job of explaining all of the crucial elements that comprise each record.
Taking the information from Paul’s blog, we can deduce that the size of each record should be 37 bytes. Remember that we inserted the exact same amount of data for all three records.
In this calculation, the tag bytes, NULL bitmap location, number of columns in the NULL bitmap, and variable column count are fixed sizes. The others are determined by how the table is configured:
- Fixed Data Length – the total size, in bytes, of the fixed-length data. We have an integer data type for the identity column, so four bytes
- Bit for every column – ‘N’ represents the total number of columns. 5/8 = 0.625, but you cannot have a partial byte, so we round up to one byte
- Variable Column Offset – ‘N’ represents the number of variable-length columns that have a value. We only inserted two values, so two two-byte entries: four bytes
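The arithmetic above can be sketched in a few lines. This is a back-of-the-envelope model based on the record layout described in Paul Randal’s post, not an official formula:

```python
import math

def record_size(fixed_data_bytes, total_columns,
                tracked_variable_columns, variable_data_bytes):
    """Estimate the on-disk size of a data record."""
    tag_bytes = 2                  # record status bits
    null_bitmap_location = 2       # pointer to the NULL bitmap
    null_bitmap_count = 2          # number of columns covered by the bitmap
    null_bitmap = math.ceil(total_columns / 8)   # one bit per column
    variable_count = 2             # count of variable-length columns tracked
    offset_array = 2 * tracked_variable_columns  # two bytes per tracked column
    return (tag_bytes + null_bitmap_location + fixed_data_bytes
            + null_bitmap_count + null_bitmap + variable_count
            + offset_array + variable_data_bytes)

# Record 1: a 4-byte int, 5 columns total, 2 variable columns tracked,
# and 20 bytes of variable data (two fully populated varchar(10) values).
print(record_size(4, 5, 2, 20))   # 37
```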
Remember that we need to turn on trace flag 3604 so that the output of the DBCC PAGE command is sent to the SSMS results window.
DBCC IND(Scratch, NullPlacementDemo, 1)
GO
DBCC TRACEON(3604)
DBCC PAGE('Scratch', 1, 300, 1)
DBCC TRACEOFF(3604)
GO
The results of DBCC IND tell me that I need to look at page ID 300 to see the contents of the page. The page ID will likely be different on your system.
Once we determine which page to look at, we use the DBCC PAGE command to examine the page itself. For simplicity, we are going to use a dump style of 1, which displays the size of each record.
We confirmed that the first record is 37 bytes, which matches the calculation above.
Wait. Hold the phone!
The other record sizes are different! How can that be? We inserted the EXACT same data each time! Recall that in the first record the two values were adjacent to each other. The second record had a NULL value between them, and the third record had two NULL values between them. The only difference between the three records was the NULL placement.
The Truth Shall Set You Free
The truth lies in how SQL Server processes the values for each column, along with the values in the NULL bitmap. Paul’s article from above teaches us, among other things:
- The NULL bitmap keeps track of whether or not each column has a NULL value, even if the column is non-nullable
- The variable column offset stores the end of each column value
Let’s look closer at the NULL bitmap, specifically for the second record. Remember that the slot array is zero-based, so the second row is in slot one. Using the output of DBCC PAGE, we can see the NULL bitmap byte, which is shown in hexadecimal.
If we convert the hexadecimal value 0x14 to binary, we get 10100. A zero indicates that the column is not null. Reading right to left, this value tells SQL Server that:
- the first two columns are not null (zero) – 10100
- the third column is null – 10100
- the fourth column is not null – 10100
- the fifth column is null – 10100
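The right-to-left reading of the bitmap can be replayed with a short sketch (the column names come from the demo table; bit 0 corresponds to the first column, and a set bit means NULL):

```python
# Decode the NULL bitmap byte 0x14 (binary 10100) for the second record.
bitmap = 0x14
columns = ['id', 'col1', 'col2', 'col3', 'col4']
for i, name in enumerate(columns):
    is_null = (bitmap >> i) & 1          # read bits right to left
    print(f"{name}: {'NULL' if is_null else 'not null'}")
```

As expected, only col2 and col4 are flagged as NULL, matching the second record we inserted.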
Using this information in conjunction with the variable column offset array, SQL Server has to keep track of that third column, which is NULL. Recall that the variable column offset array stores the end location of each column’s value. This allows SQL Server to know where the next column starts, and it also provides an easy way to determine how big each value is.
Even though the value in the third column is NULL, SQL Server has to allocate two bytes to track where that column ends. If it did not allocate these two bytes, it would not know where the fourth column starts, which, according to the NULL bitmap, is not null.
We can further see this by again looking at the DBCC Page output:
The underlined three in the above image represents the number of variable-length columns that are present. This means there are three two-byte entries for these columns. They are:
- Col1 – 0x001d (not null)
- Col2 – 0x001d (Null)
- Col3 – 0x0027 (not null)
These are highlighted in the above image. If we convert 0x001d to decimal, we get 29. If we count 29 bytes from the beginning of the record, we end up at the last ’61’, as shown below:
This represents the end of the first column. The second column, which is NULL, has a length of 0, and so it also ends at the same location, 0x001d.
The third two-byte entry, 0x0027, is 39 in decimal. This correlates to the last ’62’ in the output above. You’ll also notice that this is the same as the length of the record.
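The offset-array arithmetic can be checked in a few lines: each entry stores where a column’s data ends, so the difference between consecutive entries gives each column’s length. The offsets 0x001d and 0x0027 are the values read from the DBCC PAGE output, and the data start position follows from the record layout described earlier:

```python
# Where the variable-length data of record 2 begins:
# 2 (tag) + 2 (bitmap pointer) + 4 (int id) + 2 (bitmap column count)
# + 1 (bitmap) + 2 (variable count) + 6 (three offset entries) = 19
data_start = 19
offsets = [0x001d, 0x001d, 0x0027]   # end offsets for col1, col2 (NULL), col3
lengths, prev = [], data_start
for end in offsets:
    lengths.append(end - prev)       # this column's data length in bytes
    prev = end
print(lengths)   # [10, 0, 10] -> the NULL col2 occupies zero data bytes
```

Note that the last offset, 0x0027 = 39, is also the total record size, since the final column’s data runs to the end of the record.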
Since SQL Server has to store these extra two bytes for this NULL value, the size is two bytes larger, 39. Given that the third record has two NULL values between columns one and four, the record size is four bytes larger than the first record, thus 41 bytes in total.
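Putting it all together, the three record sizes fall straight out of the offset-array cost. This is a sketch of the arithmetic, not an official formula:

```python
# Each tracked variable-length column costs two bytes in the offset array.
# Record 1 tracks 2 offsets (col1, col2); record 2 tracks 3 (the NULL col2
# still needs an entry); record 3 tracks 4 (two NULL gaps before col4).
base_without_offsets = 37 - 2 * 2    # 33 bytes: everything except the offsets
for tracked in (2, 3, 4):
    print(base_without_offsets + 2 * tracked)   # 37, 39, 41
```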
At the end of the day, SQL Server is all about how many bytes it has to read and write to get the job done. If you have a table with 100 columns, a mixture of nullable columns, and millions of records, could this impact you? Possibly. If the table design calls for nullable columns and you organize them appropriately, you will help reduce the number of bytes that SQL Server has to handle. A number of factors may affect your mileage, of course, but reducing the amount of I/O is never a bad thing.
© 2015, John Morehouse. All rights reserved.