Why NULL Placement Matters

Last Updated on June 4, 2015 by John Morehouse

showmethebenjamins — Not those Benjamin’s

Benjamin Franklin once said, “A place for everything and everything in it’s place“. While I assume that he was not thinking about NULL values in a SQL Server table, the premise is still correct. Some data professionals believe that NULL values do not belong in any table design. However, the reality is that they exist and I do not see that changing anytime soon.

Several years ago, I learned that when a table is designed, putting columns that will have the majority of NULL values at the end of the table will help with performance. To be honest, at that time I took that lesson at face value. I never really understood why that would make a difference. Recently, I ran across my notes from that lesson and decided to investigate it further.

Let’s examine a situation where a table has a variety of NULL placements. We will create a table, NullPlacementDemo, that has five columns total. An identity column along with four variable character length columns.

USE Scratch
GO
IF OBJECT_ID('dbo.NullPlacementDemo') IS NOT NULL
	BEGIN
		DROP TABLE NullPlacementDemo
	END
GO
CREATE TABLE NullPlacementDemo (id int identity(1,1), col1 varchar(10), col2 varchar(10), col3 varchar(10),col4 varchar(10))
GO

Now that the table structure is created, let’s put some data in it. I will insert three records.

This record will have values in columns 1 & 2, with 3 & 4 NULL
This record will have values in columns 1 & 3, with 2 & 4 NULL
This record will have values in columns 1 & 4, with 2 & 3 NULL

INSERT INTO NullPlacementDemo (col1, col2)
	SELECT REPLICATE('a',10), REPLICATE('b',10)
GO
INSERT INTO NullPlacementDemo (col1, col3)
	SELECT REPLICATE('a',10), REPLICATE('b',10)
GO
INSERT INTO NullPlacementDemo (col1, col4)
	SELECT REPLICATE('a',10), REPLICATE('b',10)
GO

Note, that in order to make calculations easier, I am filling the variable length columns to their maximum capacity of ten characters. Also, note that the amount of data inserted is the same on all three records.

Anatomy of a Record

In order to understand why NULL placement matters, we need to investigate the anatomy of the record itself. I highly recommend the post by Paul Randal, “Inside the Storage Engine: Anatomy of a Record”. This post does an excellent job of telling us about all of the crucial elements that comprises each record.

NullPlacementMatters_3 Taking the information from Paul’s blog we can deduce that the size for each record should be 37 bytes. Remember that we inserted the exact same amount of data for all three records.

Regarding the calculations on the left, the tag bytes, NULL bitmap location, number of columns in NULL bitmap, and variable column count are fixed values. The others are determined on how the table is configured:

Fixed Data Length – the total size in bytes of the fixed data length. We have an integer data type for the identity, so four bytes
Bit for every column – ‘N’ represents the total number of columns. 5/8 = 0.625 but you cannot have a partial byte, so we round up to one byte.
Variable Column Offset – ‘N’ represents the total number of columns that have a value. We only inserted two values, therefore four bytes

In order to confirm this, we can use two of my favorite undocumented tools, DBCC IND and DBCC PAGE to look at the internal structure of the records.

Remember that we need to turn on trace flag 3604 to get the output of the DBCC Page command to send the results to the SSMS window.

DBCC IND(Scratch,NullPlacementDemo,1)
GO
DBCC TRACEON(3604)
DBCC PAGE('Scratch',1,300,1)
DBCC TRACEOFF(3604)
GO

The results of DBCC Ind tells me that I need to look at page ID 300 to see the contents of the page. The page ID will potentially different on your system.

Once we determine which page to look at, we’ll use the DBCC Page command to look at the page itself. For simplicity, we are going to use the dump style of one. This will display the size for each corresponding record.

We confirmed that the first record is 37 bytes, which matches the calculation above.

Wait. Hold the phone!

The other record sizes are different! How can that be? We inserted the EXACT same data each time! You will recall that in the first record both values were in conjunction to each other. The second record had a NULL value between them and the third record had two NULL values between them. The only thing different between all three records was the NULL placement.

The Truth Shall Set You Free

The truth is how SQL Server processes the values for each column along with values in the NULL bitmap. Paul’s article from above teaches us, among other things:

The NULL bitmap keeps track whether or not the column has a NULL value, even if it’s non-nullable
The variable column offset stores the end of each column value

Let’s look closer at the NULL bitmap, specifically for the second record. Remember that it’s zero based, thus the second row is in slot one. Using the output of DBCC Page, we can see the NULL Bitmap byte. This is in hexadecimal.

If we convert the value of fourteen to binary, we get the value 10100. Zero indicates that the column is not null. Reading right to left, this value tells SQL Server that:

the first two columns are not null (zero) – 10100
the third column is null – 10100
the fourth column is not null – 10100
the fifth column is null – 10100

Using this information in conjunction with the variable column offset array it has to keep track of that third column, which is NULL. We recall that the variable column offset array stores the end location of the column value. This allows SQL Server to know where the start of the next column is as well as provide an easy way to determine how big the value is.

Even though the value is NULL in the third column, it has to allocate two bytes in order to track where that column ends. If it did not allocate these two bytes, it wouldn’t know where the fourth column started since, according to the NULL bitmap, is not null.

We can further see this by again looking at the DBCC Page output:

The underlined three in the above image represents the number of variable columns that are present. This means that there are three – two byte entries for these columns. They are

Col1 – 0x001d (not null)
Col2 – 0x001d (Null)
Col3 – 0x0027 (not null)

These are highlighted in the above image. If we convert 0x001d to decimal, this is 29. If we count over 29 bytes from the beginning of the record, we end up at the last ’61’, as shown below:

This represents the end of the first column. The second column, which is NULL, has a length of 0 so also ends at the same location, 0x001d.

The third two byte entry, 0x0027 is 39 in decimal. This would correlate to the last ’62’ in the output above. You’ll also notice that this is the same as the length of the record.

Since SQL Server has to store these extra two bytes for this NULL value, the size is two bytes larger, 39. Given that the third record has two NULL values between columns one and four, the record size is four bytes larger than the first record, thus 41 bytes in total.

Summary

At the end of the day, SQL Server is all about how many bytes it has to read/write to get the job done. If you have a table that has 100 columns with a mixture of NULLable columns and multi-million records, could this impact you? Possibly. If the table design calls for nullable columns and you organize them appropriately, you will help to reduce the amount of bytes that SQL Server will have to handle. A number of factors may affect your milage of course, but reducing the amount of I/O is never a bad thing.

7 Replies to “Why NULL Placement Matters”

Nikhil Mahajan says:

June 10, 2015 at 08:08

This is really very helpful information. Thank you for sharing this article.
I have one question regarding this article.
Only null value between non null values matters..
I mean If i have 10 columns and only 1st has value rest 9 are null values.. Then this column placement is okay.

But if there even 1 null value between first to 10th column .. that will add extra 2 byte per record.

And what are the other benefits of this approach apart from reducing size of database ?

Loading...
John Morehouse says:

June 10, 2015 at 08:26

Hi Nikhil –

Thanks for stopping by! Glad you liked the post. You are correct; that column placement is okay.

The only other benefit that I can think of, at the moment, is helping to reduce the I/O that SQL server will need to do. If it doesn’t need to consume that address for those 2 extra bytes per column then the server will save on CPU cycles. In conjunction with I/O, proper column placement would also help to increase page density, which also helps to increase I/O. If more data can fit onto a page, then SQL server has less pages to read to find the data.

Overall this is probably minor in the grand scheme of things but it’s something to keep in mind when designing table structures.

Thanks again for stopping by!

Loading...
Marilyn says:

June 10, 2015 at 08:50

Thank you so much for sharing your knowledge with your readers. Great article and simple to understand. Happy SQL!

Loading...
Alex Friedman says:

June 10, 2015 at 08:54

Very interesting, thanks!

Loading...
Jesse says:

June 10, 2015 at 12:50

Very interesting and well written article! Thank you for sharing, I had not heard this piece of info before.

Loading...
LA says:

June 11, 2015 at 08:17

First of all, thank you for your article, well written.

Second, isn’t it the time to open SSMS and check out if it really matter or not from a performance perspective (CPU, IO, time for instance)?

Loading...
John Morehouse says:

June 14, 2015 at 09:27

Hi LA –

Thanks for stopping by! Yes, you are correct. As I finish out designing the table, I would do what is call a ‘fanout’. I would calculate the maximum row size for the table and then the subsequent row density. After that, if time allowed, I would load test it to see how the database functions not only under load, but also in volume. But yes, I would open SSMS and take the database for a road test. 😉

Thanks again!

Loading...