SQL Database Engine Blog

Reducing the Size of your Database in SQL Server 2005/SP2


An exciting new feature in SQL Server 2005/SP2 is the Vardecimal storage format. This storage format lets you reduce the size of your table significantly if the table has one or more columns of type decimal or numeric, without requiring any changes to your application.

 

Up until now, the decimal and numeric types have been stored as fixed-length data in SQL Server. Both of these types are functionally equivalent and have a format of (p, s), where p is the precision (the total number of decimal digits) and s is the scale, representing the number of digits after the decimal point. Depending on the precision (it can be declared in the range from 1 to 38), a decimal value can take anywhere from 5 bytes to 17 bytes. This can bloat the size of the table, especially when you have small decimal values in a column declared with a high precision. This issue is similar to char(17) vs. varchar(17). In that case, if most of your character data is 1 or 2 characters long but the maximum value is 17 characters long, you can reduce the size of the table by declaring the column to be of type varchar(17) instead of char(17).

 

The new vardecimal storage format stores the decimal/numeric values in a variable length storage format. It provides efficient storage of decimal/numeric data by eliminating the leading/trailing zeros and only storing the minimum required bytes. Using this format, you can get significant space savings (depending on your data distribution) in the space required to store decimal/numeric data. You can enable vardecimal storage format at a table level.

 

In our in-house testing, we have seen a significant reduction in the size of FACT tables that have a large number of decimal columns. FACT tables are typically the largest tables in a data warehouse. Here are some of the numbers from our testing.

 

Best case reduction in the size of the table:

  • 57%
  • 69%
  • 51%

 

In my next post (http://blogs.msdn.com/sqlserverstorageengine/archive/2006/11/13/estimating-the-space-savings-with-vardecimal-storage-format.aspx), I will describe when and how to enable Vardecimal storage format on one or more tables in your database.

 


Estimating the space savings with vardecimal storage format


Before enabling Vardecimal storage format, you may want to know the potential reduction in the size of the table. Clearly, if the table has no decimal/numeric columns, there will be no savings. Note that even if you have a table with decimal/numeric columns, there is no guarantee that you will be able to reduce the size of the table by enabling Vardecimal storage format. Again, this issue is similar to VARCHAR(17) vs. CHAR(17). If all the values in the column have 17 characters, the average row length will be larger with VARCHAR(17) because the values will be stored in the variable portion of the record structure. Recall that you need 2 bytes to store the offset of a variable length column. Also, if VARCHAR(17) is the only variable length column in the table, there is another overhead of 2 bytes to store the number of variable length columns in the row. So in the worst case, declaring the column as VARCHAR(17) may cost you 4 bytes more per row than CHAR(17).

 

SQL Server 2005/SP2 provides you a tool, a stored procedure, to estimate the ‘reduction in row size’ with Vardecimal storage format. The following example illustrates the reduction in row size for two tables that have the same schema but different data, t_decimal being the best case and t_decimal2 being the worst case (where each decimal value has the maximum 38 digits allowed by the declared precision).

 

create table t_decimal (c1 int, c2 decimal(10,2), c3 decimal (38,2), c4 varchar(10))

go

 

create table t_decimal2 (c1 int, c2 decimal(10,2), c3 decimal (38,2), c4 varchar(10))

go

 

-- insert rows into these tables.

declare @i int

select @i = 0

while (@i < 1000)

begin

        insert into t_decimal values (1, 0.0,0.0, 'hello')

        insert into t_decimal2 values

                (1,12345678.99,123456789012345678901234567890123499.99, 'hello')

         set @i = @i + 1

end

 

-- Now let us find the potential space savings for each of these tables

-- This is the best case

exec sys.sp_estimated_rowsize_reduction_for_vardecimal 't_decimal'

 

Here is the output. Note, in this case, you can reduce the size of the row by almost 50%. Also, if you have more decimal/numeric columns, the savings will be proportionally larger.

 

avg_rowlen_fixed_format  avg_rowlen_vardecimal_format  row_count
-----------------------  ----------------------------  ---------
46.00                    24.00                         1000

 

 

 

-- this is worst case. Note in this case, the average row length actually increases

-- with Vardecimal storage format.

--

exec sys.sp_estimated_rowsize_reduction_for_vardecimal 't_decimal2'

 

avg_rowlen_fixed_format  avg_rowlen_vardecimal_format  row_count
-----------------------  ----------------------------  ---------
46.00                    48.00                         1000

 

 

In the next blog post (http://blogs.msdn.com/sqlserverstorageengine/archive/2006/11/13/enabling-vardecimal-storage-format.aspx), I will discuss how to enable vardecimal storage format on the table.

Enabling vardecimal storage format


First, note that this feature is only available in the Enterprise Edition (EE) and Developer SKUs.

 

Enabling vardecimal storage format on a table is a two-step process:

 

First, you need to enable the database for Vardecimal storage format. This can be done using the stored procedure sp_db_vardecimal_storage_format. The exact command is as follows:

 

exec sp_db_vardecimal_storage_format '<dbname>', 'ON'

 

When the above command is executed, SQL Server internally bumps the database version number but no tables are enabled for Vardecimal storage format. The database version needs to be bumped to indicate that the data in this database can potentially have a different storage format (i.e. the Vardecimal storage format). This is used to prevent attaching a Vardecimal enabled database to earlier versions of SQL Server 2005 as those versions don’t know how to interpret the new storage format. You can only enable Vardecimal storage format on user databases.

 

To find out which databases are enabled for Vardecimal storage format, you can use the following command:

 

exec sp_db_vardecimal_storage_format

 

Once you have enabled the database for Vardecimal storage format, you can choose to enable one or more tables (based on the potential disk savings estimated using the tool described earlier) with this new storage format as follows:

 

sp_tableoption '<table-name>', 'vardecimal storage format', 1
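
For example, continuing with the t_decimal table from the previous post (the database name TestDB is hypothetical), a minimal sketch of the two steps together would be:

exec sp_db_vardecimal_storage_format 'TestDB', 'ON'
go
exec sp_tableoption 't_decimal', 'vardecimal storage format', 1
go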

 

This command, which is potentially an expensive one (of the same order as creating an index), converts all the rows in the table containing columns of type decimal/numeric to Vardecimal storage format. During this conversion, the table is locked and is not available. If the table has no clustered index, then all non-clustered indexes are rebuilt because the RIDs of the rows change due to the storage format change. However, if you have a clustered index on the table, then only the non-clustered indexes containing a decimal/numeric column as a key or included column need to be rebuilt. Note that you cannot enable vardecimal storage format on all tables. Before enabling vardecimal storage format, SQL Server needs to make sure that it can always revert back to the static storage format for decimal data and that updates of decimal/numeric data always succeed. If these conditions are not satisfied, the conversion to vardecimal storage format is denied.

 

To disable Vardecimal storage format on the table, you can use the following command

 

sp_tableoption '<table-name>', 'vardecimal storage format', 0

 

SQL Server guarantees that you can always revert back to the ‘static’ storage format unless you run out of disk space during the conversion. Note that the space overhead to enable/disable Vardecimal storage format is of the same order as building an index, and it is not an online operation.

 

You can use the following command to find out which table(s) have been enabled for Vardecimal storage format:

 

select objectproperty(object_id('<table-name>'), 'TableHasVarDecimalStorageFormat')

 

or

 

select name, object_id, type_desc

from sys.objects

where objectproperty(object_id, N'TableHasVarDecimalStorageFormat') = 1

 

 

I will discuss the boundary conditions for enabling vardecimal storage format and the implications of changing the database version on backup/recovery and mirroring in the next posts.

Vardecimal Storage Format and its implications on Backup/Recovery


Has anyone tried restoring or attaching a SQL Server 2005 database on SQL Server 2000? You will find that SQL Server 2000 fails this restore or attach. The reason is simple: SQL Server 2000 does not understand the physical structure changes in a SQL Server 2005 database. SQL Server detects this incompatibility using the database version stored in the boot page of the database. You cannot attach/restore a database with a higher database version on a SQL Server that does not support it. Note that it is always possible to attach/restore a SQL Server 2000 database to SQL Server 2005. In fact, this must be allowed, as otherwise customers would not be able to upgrade their databases to run with newer SQL Server versions. In this case, SQL Server 2005 understands the physical structures of the SQL Server 2000 database and converts them to SQL Server 2005-specific structures during the upgrade process.

 

You may wonder why I am telling you all this in the context of Vardecimal storage format. Well, the Vardecimal storage format is a new storage format, introduced in SQL Server 2005/SP2, to store decimal/numeric data. This new storage format is not understood by SQL Server 2005 RTM or SQL Server 2005/SP1. Just like you cannot attach a SQL Server 2005 database to SQL Server 2000, attaching/restoring a SQL Server 2005/SP2 database that has been enabled for Vardecimal storage format to earlier versions of SQL Server 2005 will fail. SQL Server implements this by incrementing the database version number when the database is enabled for Vardecimal storage format. When you disable Vardecimal storage format on a database, its database version is decremented so that the database can again be attached to earlier versions of SQL Server 2005. There is one interesting scenario/requirement that you need to be aware of when disabling Vardecimal storage format on the database.

 

Scenario:

  • Do a full physical backup (DB) on SQL Server 2005/SP1. This backup will have the database version that is supported by SQL Server 2005/SP1.
  • Attach the database to SQL Server 2005/SP2.
  • Enable Vardecimal storage format on the database.
  • Create a table and enable it for Vardecimal storage format.
  • Insert one row. The log records generated by this will have data in Vardecimal storage format.
  • Disable Vardecimal storage format on the database.
  • Do the log backup (L). Again, this log backup will have the database version that is supported by SQL Server 2005/SP1.

 

Now, if you restore (DB + L) on SQL Server 2005/SP1, it will not be able to detect that there were log records with Vardecimal storage format and will potentially fail unpredictably. To prevent this situation, SQL Server requires you to set the database to the SIMPLE recovery model before you can disable Vardecimal storage format on the database. When you do that, the log chain is broken and the above scenario is prevented. Note that you are not required to set the database to the SIMPLE recovery model when disabling Vardecimal storage format on individual table(s). It is ONLY needed when disabling it on the database.
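
To make the required sequence concrete, here is a minimal sketch (the database name TestDB is hypothetical):

alter database TestDB set recovery simple
go
exec sp_db_vardecimal_storage_format 'TestDB', 'OFF'
go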

Boundary conditions for enabling vardecimal storage format


Have you ever tried updating a variable length column and had it fail? Well, it can happen if the modified row cannot fit on the page. One simple example of this is as follows:

 

create table boundary (c1 char(8000), c2 char(20),

c3 varchar(23), c4 decimal(38,2))

go

 

When you create this table, it gives you the following warning

 

Warning: The table "boundary" has been created, but its maximum row size exceeds the allowed maximum of 8060 bytes. INSERT or UPDATE to this table will fail if the resulting row exceeds the size limit.

 

 

-- this is the max allowed value

insert into boundary values ('a', 'b', replicate ('1', 12), 0.0)

 

-- this update fails with error 511

update boundary set c3 = replicate('1', 13)

Msg 511, Level 16, State 1, Line 1
Cannot create a row of size 8061 which is greater than the allowable maximum of 8060.
The statement has been terminated.

 

However, if you update a fixed length column value, you will never get error 511 because a fixed length value, by definition, retains the same size regardless of the value. Most applications that update a fixed length column value don’t check for error 511.

 

You may wonder what happens when we enable vardecimal storage format on a table. Since the numeric/decimal data is now stored using variable length storage, an update to a decimal/numeric value can potentially fail just like it failed when updating the varchar column in the example above. Since applications are not expecting updates to a decimal/numeric column to fail (unless, of course, there are constraints defined on the decimal/numeric value), the application may encounter unexpected failures. To prevent this from happening, SQL Server allows enabling vardecimal storage format on a table if and only if it can guarantee that updates to decimal/numeric values will never fail with error 511. So for the table in the above example, if I enable vardecimal storage format as follows:

 

sp_tableoption 'boundary', 'vardecimal storage format', 1

go

 

You will get the following error.

 

Msg 1721, Level 16, State 2, Procedure sp_tableoption, Line 129

Altering table 'boundary' has failed because the row size using vardecimal storage format exceeds the maximum allowed table row size of 8060 bytes.

Reason:

For the insert above, since the decimal column has the value 0.0, it takes only 2 bytes (offset array) of storage in Vardecimal storage format. Had SQL Server allowed Vardecimal storage format on the ‘boundary’ table, then the following update to the decimal data could fail:

 

Steps:

  1. Update column c3 to ‘1234567890124567890’. When you do that, the row size becomes 8053 bytes.
  2. Now update the decimal column c4 to the maximum allowed number of digits (38), which requires 18 additional bytes of storage, but there is only room for (8060 - 8053) = 7 bytes.

 

Please ignore the size computations if you are not familiar with them; the point is that SQL Server guarantees that your updates to decimal/numeric data will not fail with error 511. You can read http://blogs.msdn.com/sqlserverstorageengine/archive/2006/06/23/644607.aspx for more details on the row format.

 

Since for the tables defined in most application schemas the row size is much less than 8060 bytes, we don’t need to concern ourselves with this issue. But it is reassuring to know that your application will not be caught off guard when Vardecimal storage format is enabled.

 

 

Besides this, SQL Server allows enabling Vardecimal storage format on a table only if it can guarantee that you can always disable it.

 

Data row before and after vardecimal storage format


Paul Randal, in one of his earlier blogs (paul-tells-all), described DBCC PAGE and the record layout. I thought it would be interesting to show how a row looks before and after Vardecimal storage format is enabled. So here it is.

 

Let us take a simple table:

create table t_simple (c1 char(5), c2 decimal(38,2))

go

 

insert into t_simple values ('aaaaa', 1.0)

go

 

If you run the command DBCC PAGE with option 3, you will get the following output:

 

….

Slot 0 Offset 0x60 Length 29

 

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP    

Memory Dump @0x44D0C060

 

00000000:   10001a00 61616161 61016400 00000000 †....aaaaa.d.....        

00000010:   00000000 00000000 00000200 fc††††††††.............

 

The key thing to note here is that the row length is 29 bytes, computed as follows:

  • Record Header = 4 bytes
  • Column C1 = 5 bytes
  • Null bit map and column count = 3 bytes
  • Fixed length decimal value = 17 bytes

 

Now, let us enable Vardecimal storage format on this table and look at the same row in the new storage format.
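
As a quick recap, a minimal sketch of the commands to switch this table to the new format (the database name TestDB is hypothetical; see the earlier posts for details) is:

exec sp_db_vardecimal_storage_format 'TestDB', 'ON'
go
exec sp_tableoption 't_simple', 'vardecimal storage format', 1
go

After the conversion, DBCC PAGE shows the row as follows: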

 

Slot 0 Offset 0x60 Length 18

 

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS

 

Memory Dump @0x44E8C060

 

00000000:   30000900 61616161 610200fc 01001200 †0...aaaaa.......        

00000010:   c019†††††††††††††††††††††††††††††††††..                      

 

Slot 0 Column 0 Offset 0x4 Length 5

 

c1 = aaaaa                          

 

Slot 0 Column 1 Offset 0x10 Length 2 [VarDecimal/VarNumeric]

 

c2 = 1.00                        

 

Note that the row length is now 18 bytes. So the size of the row is reduced from 29 bytes to 18 bytes, a reduction of roughly 38%. A couple of other interesting points:

  • Decimal value is now stored in variable length portion of the record. The value is represented as ‘c019’, which is just 2 bytes.
  • Since C2 now becomes the first variable length column, you see an overhead of 4 bytes for storing variable length column count (2 bytes) and offset array (2 bytes)

White Paper: Reducing Database Size by Using Vardecimal Storage Format


Hermann Daeubler and Sunil Agarwal have co-authored this white paper. Hermann is a Program Manager on the SQL team and is a SAP expert. Hermann was involved with the vardecimal storage format work from the very beginning, guiding us on customer scenarios and the performance implications/trade-offs. This paper provides a general overview of vardecimal storage format usage scenarios, restrictions, database migration to SQL Server 2005/SP2 in the context of vardecimal, space savings, and the impact on performance. You can access the paper at

http://msdn2.microsoft.com/en-us/library/bb508963.aspx.

 

Loading line-delimited JSON files in SQL Server 2016



One of the problems with the standard JSON format is the fact that you cannot continuously append JSON messages. If you want to have a valid array of JSON objects, you need to surround them with brackets, and once you add the final bracket, you cannot append new data.

Line-delimited JSON (a.k.a. JSON Lines or newline-delimited JSON) is an alternative JSON format that might be a good choice for continuous serialization of streaming JSON objects. LD-JSON addresses one of the main issues with the standard JSON format – the ability to continuously append valid JSON objects. LD-JSON introduces a few changes to the standard JSON format:

  1. A new line is the separator between objects.
  2. The stream of JSON objects is not surrounded with brackets, so you can continuously append JSON information to the file.

An example of LD-JSON content is shown below:

{"time":"2015-11-27T02:33:05.063","ip":"214.0.57.12","request":"/", "method":"GET", "status":"200"}
{"time":"2015-11-27T02:33:06.003","ip":"224.12.07.25","request":"/",method":"GET", "status":"200"}
{"time":"2015-11-27T02:33:06.032","ip":"192.10.81.115","request":"/contact", "method":"POST", "status":"500", "exception":"Object reference not set to an instance of object",”stackTrace”:”…” }
……..
{"time":"2015-11-27T02:37:06.203","ip":"204.12.27.21","request":"/login",method":"GET", "status":"200"}
{"time":"2015-11-27T02:37:12.016","ip":"214.0.57.12","request":"/login", "method":"POST","status":"404", "exception":"Potentially dangerous value was detected in request"}

Now we can read this file with OPENROWSET(BULK) and a format file, and load it into the table:

INSERT INTO MyTable(time, request, method, status)
SELECT time, request, method, status
 FROM OPENROWSET(BULK 'C:\logs\log-ld-json-2015-11-27.txt', FORMATFILE= 'c:\logs\csv.fmt') AS log_file
      CROSS APPLY OPENJSON(json) WITH( time datetime, status int, method varchar(10), request nvarchar(200)) AS log

You can read these values and load them into standard SQL tables. The query above requires a format file with the following content:

13.0
1
1 SQLCHAR 0 0 "\r\n" 1 json ""

Or you can use an equivalent XML format file:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="8000" COLLATION="Serbian_Latin_100_CI_AS"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="json" xsi:type="SQLNVARCHAR"/>
 </ROW>
</BCPFORMAT>

Note that the “json” column referenced in the T-SQL query above is defined in the format file.
You can explicitly specify the field size in the format file, or change the delimiter to \n, e.g.:

13.0
1
1 SQLCHAR 0 80000 "\n" 1 json ""

Moving data from relational to JSON columns and vice versa



Unlike pure relational or pure NoSQL databases, SQL Server does not force you to store data in either relational or JSON format. You can choose any mix you want: put some data in scalar columns and other data in JSON columns. As an example, if you have a Person table, you can put primary keys, foreign keys, and the most commonly used columns such as FirstName and LastName in scalar columns, and keep other, rarely used values as JSON key-value pairs. This might be useful if you have some columns that depend on the type of Person (e.g. a Student can have one set of columns, a Teacher another, etc.). Instead of breaking Student and Teacher into different tables or adding all possible columns to the Person table, you can put all the variable columns in JSON text.
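
As an illustration, a minimal sketch of such a table (the column names are illustrative; the examples below assume an AdditionalInfo column holding the JSON text) could be:

CREATE TABLE Person (
    PersonID int IDENTITY PRIMARY KEY,
    FirstName nvarchar(100) NOT NULL,
    LastName nvarchar(100) NOT NULL,
    AdditionalInfo nvarchar(max) NULL -- JSON text with the remaining, rarely used values
);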

Also, if you change your mind and decide that some value stored as a JSON field should be moved to a regular column, or vice versa, you can easily move the data around, e.g. move data from a column into JSON text or move some JSON property into a regular column.

As an example, you might find that the middle name field stored in the JSON text should be stored as a standard relational column. You can easily create a new column and move values from a path in the JSON text into the column:

ALTER TABLE Person
ADD MiddleName NVARCHAR(200)

UPDATE Person
SET MiddleName = JSON_VALUE(AdditionalInfo, '$.info.middleName'),
    AdditionalInfo = JSON_MODIFY(AdditionalInfo, '$.info.middleName', NULL)

In this example, we read values from the $.info.middleName path with JSON_VALUE and store them in the new MiddleName column. We also update the AdditionalInfo JSON text and delete the middleName key. Note that by default JSON_MODIFY suppresses NULL values: whenever you try to set a NULL value, JSON_MODIFY deletes the key instead of setting a null value. This way, JSON_MODIFY automatically cleans up unnecessary null values.

Also, you might decide to move values from an existing column into some JSON field. In this case you can modify the AdditionalInfo column and add these values on some property path:

UPDATE Person
SET AdditionalInfo = JSON_MODIFY(AdditionalInfo, '$.info.middleName', MiddleName)
GO
ALTER TABLE Person
DROP COLUMN MiddleName

Once you move the data, you can drop the column you don’t need. The new JSON_MODIFY function in SQL Server enables you to easily move values between scalar and JSON columns and redesign your schema.

Loading JSON files from Azure File Storage


Azure File Storage supports the SMB protocol, so you can map a local virtual drive to an Azure File Storage share using the following procedure:

  1. Create a file storage account (e.g. mystorage), a file share (e.g. logs), and a folder using the Azure portal or the Azure PowerShell SDK.
  2. Create an outbound rule in Windows Firewall on your computer that allows port 445. Note that this port might be blocked by your internet provider. If you get a DNS error (error 53) in the following step, then you have not opened that port or it is blocked by your ISP.
  3. Mount the Azure File Storage share as a local drive (e.g. t:) using the following command:
 net use [drive letter] \\[storage name].file.core.windows.net\[share name] /u:[storage account name] [storage account access key]

The example I used is:

net use t: \\mystorage.file.core.windows.net\sharejson /u:myaccont hb5qy6eXLqIdBj0LvGMHdrTiygkjhHDvWjUZg3Gu7bubKLg==

The storage account name and the primary or secondary storage account access key can be found in the Keys section under Settings in the Azure portal.

Now, if you set up your application to log data to a file in Azure File Storage (e.g. log-file.json), you can use queries like the one below to analyze data loaded from the path mapped to t:, i.e. \\mystorage.file.core.windows.net\sharejson\log-file.json.

Now, we can store some JSON file in this location. An example file might be:

[
 {"time":"2015-11-27T02:33:05.063","ip":"214.0.57.12","request":"/", "method":"GET", "status":"200"},
 {"time":"2015-11-27T02:33:06.003","ip":"224.12.07.25","request":"/", "method":"GET", "status":"200"},
 {"time":"2015-11-27T02:33:06.032","ip":"192.10.81.115","request":"/contact", "method":"POST", "status":"500", "exception":"Object reference not set to an instance of object","stackTrace":"…"}, ……
 {"time":"2015-11-27T02:37:06.203","ip":"204.12.27.21","request":"/login", "method":"GET", "status":"200"}, {"time":"2015-11-27T02:37:12.016","ip":"214.0.57.12","request":"/login", "method":"POST","status":"404", "exception":"Potentially dangerous value was detected in request"}
 ]

We can load this file using the standard OPENROWSET(BULK) command, which reads the entire content using the SINGLE_CLOB option, and parse it using the OPENJSON function:

SELECT log.*
 FROM OPENROWSET (BULK '\\mystorage.file.core.windows.net\sharejson\log-file.json', SINGLE_CLOB) as log_file
 CROSS APPLY OPENJSON(BulkColumn)
 WITH( time datetime, status varchar(5), method varchar(10),
 exception nvarchar(200)) AS log

Generate stored procedure that imports array of JSON object in table


The OPENJSON function enables you to easily write a simple statement that loads an array of JSON objects into a table. An example is:

INSERT INTO dbo.People(Name, Surname)
SELECT Name, Surname
FROM OPENJSON (@json) WITH (Name nvarchar(100), Surname nvarchar(100))

See details in this post OPENJSON – The easiest way to import JSON text into table.

Although this is a simple command, it might be hard to write if you have wide tables with 20-30 columns. Also, if some of the columns contain special characters, you will need to surround them with [ ] in SQL names and with ” ” in JSON paths.

Therefore, I have created a function that generates this script – you can download it here. The function looks like this:

CREATE FUNCTION
dbo.GenerateJsonInsertProcedure(@SchemaName sysname, @TableName sysname, @JsonColumns nvarchar(max))
RETURNS NVARCHAR(MAX)

In order to generate the insert stored procedure, you specify the schema name and the table name. Also, if some columns in the table contain JSON text and you will have nested JSON in your input, you can specify the list of these columns in the @JsonColumns parameter.

Now, let’s see how it works. I will generate a JSON insert stored procedure for the Person.Address table:

declare @SchemaName sysname = 'Person' --> Name of the schema of the table where we want to insert JSON
declare @TableName sysname = 'Address' --> Name of the table where we want to insert JSON
declare @JsonColumns nvarchar(max) = '||' --> List of pipe-separated NVARCHAR(MAX) column names that contain JSON text, e.g. '|AdditionalInfo|Demographics|'
print (dbo.GenerateJsonInsertProcedure(@SchemaName, @TableName, @JsonColumns))

In this case I will just print the script that the function returns. The output will be:

DROP PROCEDURE IF EXISTS [Person].[AddressInsertJson]
GO
CREATE PROCEDURE [Person].[AddressInsertJson](@Address NVARCHAR(MAX))
AS BEGIN
INSERT INTO Address([AddressLine1],[AddressLine2],[City],[StateProvinceID],[PostalCode],[ModifiedDate])
 SELECT [AddressLine1],[AddressLine2],[City],[StateProvinceID],[PostalCode],[ModifiedDate]
 FROM OPENJSON(@Address)
 WITH (
 [AddressLine1] nvarchar(120) N'strict $."AddressLine1"',
 [AddressLine2] nvarchar(120) N'$."AddressLine2"',
 [City] nvarchar(60) N'strict $."City"',
 [StateProvinceID] int N'strict $."StateProvinceID"',
 [PostalCode] nvarchar(30) N'strict $."PostalCode"',
 [ModifiedDate] datetime N'strict $."ModifiedDate"')
END

The function goes through all the columns in the specified table, checks each column’s type and whether it is required (in which case it generates the ‘strict’ modifier in the path), and creates the script. You can modify this script and remove unnecessary columns if you want.
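
For completeness, a hypothetical call of the generated procedure (the JSON payload below is made up) might look like this:

EXEC [Person].[AddressInsertJson] @Address = N'[
  {"AddressLine1":"1 Main St","City":"Redmond","StateProvinceID":79,
   "PostalCode":"98052","ModifiedDate":"2016-04-01T00:00:00"}
]';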

The script that generates the insert procedure is here.

What’s new for In-Memory OLTP in SQL Server 2016 since CTP3


SQL Server 2016 is making a lot of enhancements to In-Memory OLTP to make it easier to use and perform even better. In a previous post I listed all the new features that had been included in SQL Server 2016 up to and including CTP3. But we have added a lot of new features since then, including NULLable index key columns, LOB types and auto-update of statistics. Below are all the new features for In-Memory OLTP that we added between CTP3 and RC0. Underneath the list of new features you will find more detailed descriptions of LOBs and other off-row columns, ALTER TABLE improvements, and statistics improvements.

In-Memory OLTP features added between CTP3 and RC0

Transact-SQL Improvements:

  •  Query Surface Area in Native Modules:
    • LOB types [varchar(max), nvarchar(max), and varbinary(max)] for parameters and variables.
    • OUTPUT clause: In a natively compiled stored procedure, INSERT and UPDATE and DELETE statements can now include the OUTPUT clause.
    • @@SPID: this built-in function is now supported in natively compiled T-SQL modules, as well as in constraints on memory-optimized tables.
  • Support with memory-optimized tables for:
    • NULLable index key columns. It is now allowed to include NULLable columns in the keys of indexes on memory-optimized tables.
    • Large rows: LOB types [varchar(max), nvarchar(max), and varbinary(max)] can be used with columns in memory-optimized tables. In addition, you can have a memory-optimized table with row size > 8060 bytes, even when no column in the table is a LOB type. Detailed considerations are below, and a short T-SQL sketch follows this list.
    • UNIQUE indexes in memory-optimized tables. Indexes can now be specified as UNIQUE.
  • Heap scan: the query processor can now scan the rows in a table heap data structure in memory directly. When a full table scan is needed, this is more efficient than a full index scan.
  • Parallel scan: all index types, as well as the underlying table heap, now support parallel scan. This increases the performance of analytical queries that scan large sets of data.
  • Reduced downtime during upgrade: Upgrade from an earlier build of SQL Server 2016 to the latest build of SQL Server no longer runs database recovery. Therefore, data size no longer affects the duration of upgrade. For upgrade and attach/restore from SQL Server 2014, the database is restarted once, therefore the downtime experienced during upgrade of a database from SQL2014 is in the order of [time required for database recovery].
  • Log-optimized and parallel ALTER TABLE: Most ALTER TABLE scenarios now run in parallel and optimize writes to the transaction log. The optimization is that only the metadata changes are written to the transaction log. For a detailed discussion of exceptions, see below.
  • Statistics improvements:
    • Automatic update of statistics is now supported. It is no longer required to manually update statistics.
    • Sampling of statistics is now supported. This improves the performance of statistics collection.
    • Note that automatic recompilation of native modules is not supported. They need to be recompiled manually using sp_recompile.
    • More details about statistics below.
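
To make a few of these items concrete, here is a minimal sketch (table and procedure names are hypothetical, and the database is assumed to already have a MEMORY_OPTIMIZED_DATA filegroup) combining a NULLable index key column, a LOB column, and a natively compiled procedure that uses a LOB parameter and the OUTPUT clause:

-- Memory-optimized table with a NULLable index key column and an nvarchar(max) (LOB) column
CREATE TABLE dbo.Items
(
    ItemId      int IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
    Sku         nvarchar(50)  NULL,   -- NULLable column used as an index key
    Description nvarchar(max) NULL,   -- LOB column, stored off-row
    INDEX ix_Sku NONCLUSTERED (Sku)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- Natively compiled procedure with a LOB parameter and an OUTPUT clause
CREATE PROCEDURE dbo.usp_AddItem
    @Sku nvarchar(50),
    @Description nvarchar(max)        -- LOB parameter
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    INSERT INTO dbo.Items (Sku, Description)
    OUTPUT inserted.ItemId            -- OUTPUT clause in a native module
    VALUES (@Sku, @Description);
END
GO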

LOBs and other off-row columns

Large object (LOB) types varchar(max), nvarchar(max) and varbinary(max) are now supported with memory-optimized tables and table types, as well as natively compiled T-SQL modules, and the size limitations mirror that of disk-based tables (2GB limit on LOB values). In addition, you can have a memory-optimized table with a row size > 8060 bytes, even when no column in the table uses a LOB type. There is no run-time limitation on the size of rows or the data in individual columns; this is part of the table definition. Of course, all data does need to fit in memory.

Even though large columns are now supported, it is still recommended to have most columns fit within 8060 bytes for performance reasons. Details below.

The following T-SQL script illustrates a table with multiple large non-LOB columns and a single LOB column:

CREATE TABLE dbo.LargeTableSample
(
      Id   int IDENTITY PRIMARY KEY NONCLUSTERED,
      C1   nvarchar(4000),
      C2   nvarchar(4000),
      C3   nvarchar(4000),
      C4   nvarchar(4000),
      Misc nvarchar(max)
) WITH (MEMORY_OPTIMIZED = ON);
GO

LOB columns and other columns that do not fit in the 8060 byte in-row storage are stored off-row, while the in-row storage has an 8-byte reference to the off-row data. There is an internal table for each individual off-row column.

The logic that decides whether a given column lives on-row or off-row is as follows, and every ALTER TABLE operation ensures that these rules are followed.

  • If the columns do not fit in the 8060-byte row limit, the biggest columns are stored off-row. For example, adding a varbinary(2000) column to a table with a varbinary(8000) column that is in-row will cause the varbinary(8000) column to be moved off-row.
  • All index key columns must be stored in-row; if the index key columns in a table do not all fit in-row, adding the index will fail. Consider the same table as in the previous example. If an index is created on the varbinary(8000) column, that column will be moved back in-row, and the varbinary(2000) column will be moved off-row, as index key columns must live in-row.

The following query shows all columns that are stored off-row, along with their declared sizes. A size of -1 indicates a LOB column. All LOB columns are stored off-row.

SELECT object_name(moa.object_id) AS 'table', c.name AS 'column', c.max_length
FROM sys.memory_optimized_tables_internal_attributes moa
     JOIN sys.columns c ON moa.object_id = c.object_id AND moa.minor_id=c.column_id
WHERE moa.type=5

To get more details about the memory consumption of off-row columns, you can use the following query, which shows the memory consumption of all internal tables and their indexes that are used to store the off-row columns:

SELECT
  OBJECT_NAME(moa.object_id) AS 'table',
  c.name AS 'column',
  c.max_length,
  mc.memory_consumer_desc,
  mc.index_id,
  mc.allocated_bytes,
  mc.used_bytes
FROM sys.memory_optimized_tables_internal_attributes moa
   JOIN sys.columns c ON moa.object_id = c.object_id AND moa.minor_id=c.column_id
   JOIN sys.dm_db_xtp_memory_consumers mc ON moa.xtp_object_id=mc.xtp_object_id
WHERE moa.type=5

ALTER TABLE Optimizations

ALTER TABLE is used to make schema changes and to tune indexes. For details about syntax and examples, see the documentation about Altering Memory-Optimized Tables.

In SQL Server 2016, ALTER TABLE operations on memory-optimized tables are offline, meaning that the table is not available for queries while the operation is ongoing. All operations that make changes to the in-memory data structures, including column and index changes, result in a new copy of the table being created under the hood. An ALTER operation on a 10GB table takes roughly 1 minute when running in parallel on a server with 24 logical processors, and the time taken scales with the size of the table. The good news is that it is possible to combine multiple ADD, DROP, or ALTER operations in a single ALTER TABLE statement. For example, you could add a column, add an index, and add a constraint, all in one ALTER TABLE statement, as sketched below.

Most ALTER TABLE scenarios run in parallel and the operation is log-optimized, meaning that only the metadata changes are written to the transaction log. However, there are some ALTER TABLE operations that run single-threaded and are not log-optimized, meaning that a complete copy of the table is written to the transaction log as part of the ALTER TABLE transaction.

The following ALTER operations run single-threaded and are not log-optimized:

  • ADD/ALTER a column to use a large object (LOB) type: nvarchar(max), varchar(max), or varbinary(max).
  • ADD/DROP a COLUMNSTORE index.
  • ADD/ALTER an off-row column and ADD/ALTER/DROP operations that cause an in-row column to be moved off-row, or an off-row column to be moved in-row.
    • note: ALTER operations that increase the length of an off-row column are log-optimized
    • Refer to the description in the previous section to determine whether a given column is stored off-row.

Statistics Improvements

Statistics for memory-optimized tables are now updated automatically, and sampling of statistics is supported. With these changes, statistics management for memory-optimized tables works essentially in the same way as disk-based tables, and comes with the same tradeoffs.

  • The logic to decide whether stats should be updated mirrors that of disk-based tables, with one exception: disk-based tables have a mod-counter at the column level, while memory-optimized tables have a mod-counter at the row level. These mod-counters are used to track how many changes have been made, and if a certain threshold is reached, auto-update of statistics kicks in. TF 2453 and OPTION (RECOMPILE) with table variables are supported.
  • AUTO_UPDATE_STATISTICS_ASYNC is supported.
  • The sampling rate for statistics mirrors that of disk-based tables, and parallel sampling is supported.
  • For most statistics improvements to kick in, ensure your database is set to compatibility level = 130.
  • To enable pre-existing statistics to be updated automatically, a one-time manual update operation is required (see sample script below).
  • The recompilation of natively compiled modules is still manual. Use sp_recompile to recompile natively compiled modules.

One-time script for statistics: For memory-optimized tables that were created before SQL Server 2016 CTP3.3, you can run the following Transact-SQL script one time to update the statistics of all memory-optimized tables, and enable automatic update of statistics from then onward (assuming AUTO_UPDATE_STATISTICS is enabled for the database).

 

-- Assuming AUTO_UPDATE_STATISTICS is already ON for your database:
-- ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS ON;
GO
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 130;
GO
DECLARE @sql NVARCHAR(MAX) = N'';
SELECT
      @sql += N'UPDATE STATISTICS '
         + quotename(schema_name(t.schema_id))
         + N'.'
         + quotename(t.name)
         + ';' + CHAR(13) + CHAR(10)
   FROM sys.tables AS t
   WHERE t.is_memory_optimized = 1
;
EXECUTE sp_executesql @sql;
GO
-- Each row appended to @sql looks roughly like:
-- UPDATE STATISTICS [dbo].[MyMemoryOptimizedTable];

Next steps

For all the features added in SQL Server 2016 up to and including CTP3 see What’s new for In-Memory OLTP in SQL Server 2016 CTP3

For general information about In-Memory OLTP, see In-Memory OLTP (In-Memory Optimization).

To get started today, download the latest build of SQL Server 2016 or spin up an Azure VM with SQL Server 2016 pre-installed.

 

A Technical Case Study: High Speed IoT Data Ingestion Using In-Memory OLTP in Azure


In this post we look at a customer case study of an Internet of Things (IoT) scenario, where a large amount of device data is ingested into an Azure SQL Database. Because the data lives in a SQL database, it can be conveniently accessed and analyzed through SQL queries. In-Memory OLTP was used to achieve significant performance gains in data ingestion without making application changes.

This customer creates and sells popular consumer devices that reside in people’s homes. Every day, a subset of these devices emits diagnostic data. The diagnostic data is used downstream to:

  • Track quality of the devices.
  • Compare with the tests that were done on the device before it left the factory.
  • And analyze trends and individual issues, in order to improve customer care.

Azure SQL Database is a fast and convenient way to correlate telemetry with customer data and business data. Every day, there are 50,000 to 100,000 devices submitting diagnostics to the mid-tier app running in Microsoft Azure. The diagnostics for each device form a single large file that translates into 5,000 to 10,000 rows inserted into over 70 tables in the database. This translates to 750 million rows inserted into the database every day.


An efficient way to ingest sets of data into a SQL database is to use stored procedures with table-valued parameters (TVPs). The app uses a stored procedure with 70 TVPs to ingest the diagnostics into the SQL database, where the data can then be analyzed.

Here is an example of a traditional table type definition, plus a stored procedure that uses the table type:

CREATE TABLE Location
( LocationName VARCHAR(50) PRIMARY KEY
, CostRate INT );
GO
CREATE TYPE LocationTableType AS TABLE
( LocationName VARCHAR(50)
, CostRate INT );
GO
CREATE PROCEDURE dbo.usp_InsertProductionLocation
 @TVP LocationTableType READONLY
AS
BEGIN
 INSERT INTO Location (LocationName, CostRate)
 SELECT LocationName, CostRate FROM @TVP
END
GO

When the stored procedure is called from the client, the table-valued parameter (TVP) is created on the client side and passed to the server. On the server side, TVPs are stored in a system database called tempdb. Tempdb usage does count toward the DTU utilization of the database to some extent, with CPU and log IO utilization being the major factors.

When ingesting all this diagnostic data, the database reached the log IO maximum associated with the pricing tier of the database – P11 in this case. The log IO percentage reached 100%, limiting the data that the database could ingest. To overcome this limit, the customer could have increased the pricing tier of the database, or scaled out the database. However, those options would increase the cost.

The customer found that a better option is to use Azure SQL Database In-Memory technology. In-Memory OLTP is a technology that became available in Azure SQL Database recently. As of the time of this writing in April 2016, In-Memory OLTP is in public preview for standalone premium databases. At a high level, In-Memory OLTP uses memory-optimized data structures and algorithms that allow you to improve performance of transactional workloads on Azure SQL DB, without increasing the pricing tier.

The customer chose to use memory-optimized TVPs. The memory-optimization reduces CPU utilization. It also completely eliminates the log IO, because the data exists only in active memory. Note that if the stored procedure inserts the TVP data into a user table, log IO still occurs for the user table.

Here is an example of a memory-optimized table type, and a stored procedure that uses the type for a TVP:

CREATE TABLE Location
( LocationName VARCHAR(50) PRIMARY KEY
, CostRate INT );
GO
CREATE TYPE LocationTableType AS TABLE
( LocationName VARCHAR(50) INDEX ix_LocationName
, CostRate INT )
WITH (MEMORY_OPTIMIZED=ON);
GO
CREATE PROCEDURE dbo.usp_InsertProductionLocation
 @TVP LocationTableType READONLY
AS
BEGIN
 INSERT INTO Location (LocationName, CostRate)
 SELECT LocationName, CostRate FROM @TVP
END
GO

Notice that the only changes required are in the table type definition. No changes are required to the stored procedure or the client application.

For this application, the customer simply changed the table type definitions to memory-optimized. The result is a 30-40% improvement in the performance of the stored procedures used for data ingestion. This translates into being able to ingest 30-40% more data every day, and thus the ability to process diagnostics for that many more devices without increasing cost and without making changes to the application.

Next steps

In-Memory OLTP in Azure SQL Database is a great tool for your IoT data analytics. For more details about how you can use memory-optimization to improve performance of TVPs, table variables, as well as temp tables, see the following blog post:

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/03/21/improving-temp-table-and-table-variable-performance-using-memory-optimization/

Get started with In-Memory OLTP in Azure SQL Database:

https://azure.microsoft.com/en-us/documentation/articles/sql-database-in-memory/

JSON is available in Azure SQL Database


The JSON functionalities that were added in SQL Server 2016 are also available in Azure SQL Database; see Public preview: JSON in Azure SQL Database. All functionalities that you can use in SQL Server 2016 RC (i.e. FOR JSON, OPENJSON, built-in functions) are also available in Azure SQL Database, including the JSON_MODIFY function that was added in the RC version.

JSON is available in all service tiers (Basic, Standard, and Premium) but only in the new SQL Database V12. We don’t have plans to support it in older versions, since we expect that everyone will migrate to the new V12. If your JSON functions do not work, please check your version using SELECT @@VERSION.

Also, the OPENJSON function requires database compatibility level 130. If all functions work except OPENJSON, you need to set the latest compatibility level on the database. Note that level 120 is set by default even in new databases.
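
A quick sketch of the checks and the fix (the database name MyDb is hypothetical):

SELECT @@VERSION;
SELECT compatibility_level FROM sys.databases WHERE name = 'MyDb';
ALTER DATABASE MyDb SET COMPATIBILITY_LEVEL = 130;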

Increased nonclustered index key size with SQL Server 2016


SQL Server 2016 and Azure SQL Database have increased the maximum size for index keys with nonclustered indexes. The new maximum key size for nonclustered indexes is 1700 bytes. The maximum key size for clustered indexes remains 900 bytes.

Consider, for example, the following script which creates a table with a 1700 byte variable-length character column, creates an index on that column, and inserts a 1700-byte value in the column.

DROP TABLE IF EXISTS dbo.t1
GO
CREATE TABLE dbo.t1
( c1 VARCHAR(1700)
)
GO
CREATE INDEX ix_c1 ON dbo.t1(c1)
GO
INSERT t1 VALUES (REPLICATE('1', 1700))
GO

This script succeeds in SQL Server 2016 and Azure SQL Database. In earlier versions of SQL Server the CREATE INDEX statement would give a warning, and the INSERT statement would fail.

For more details see the Maximum Capacity Specification for SQL Server.

Two notes about columnstore indexes and memory-optimized tables:

  • When specifying a nonclustered columnstore index you can specify the columns to be stored in columnar format. This is not actually an index key, and there are no strict size limitations on the columns included in a nonclustered columnstore index.
  • For memory-optimized tables: the maximum index key size for nonclustered indexes is 2500 bytes; there is no strict limit on index key size for hash indexes.

Using DB Compatibility Level 130 with Old CE in SQL Server 2016


In SQL Server 2014 we introduced a revamped Cardinality Estimator (CE), which we further improved in SQL Server 2016 and Azure SQL Database. With the new CE most workloads are seeing better query plans and thus improved performance.

However, there are some workloads that experience plan regressions, and thus performance degradation, under the new CE. To keep using the old CE you would either have to use DB compatibility level 110 or fiddle with trace flags, which you probably don’t want to do or cannot do.

In SQL Server 2016 there are a lot of goodies under Compatibility Level 130, such as performance improvements with Columnstore and In-Memory OLTP and all the Query Optimizer enhancements done over the years under Trace Flag 4199. So we are faced with a situation where there are workloads that can benefit from some enhancements that come with Compatibility Level 130, but that cannot use the new CE.

The solution is to use one of the new database-scoped configuration options, namely the Legacy Cardinality Estimation option. You can enable this using the following ALTER DATABASE command:

ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON

If you set this option, and you set Compatibility Level to 130, you get all the performance benefits for query execution that come with SQL Server 2016, while still using old CE.

Warning: only use this option if the performance of your workload in a specific database scope is significantly worse with new CE, when compared with old CE.

Database Scoped Configuration


This release supports a new database-level object holding optional configuration values that control the performance and behavior of the application at the database level. This support is available in both SQL Server 2016 and SQL Database V12 using the new ALTER DATABASE SCOPED CONFIGURATION (Transact-SQL) statement. This statement modifies the default SQL Server 2016 Database Engine behavior for a particular database. Several benefits are expected from using this feature:

  • Allows you to set different configuration options at the database level
    • Previously these could only be set at the server level, or at the individual query level using query hints
    • This is especially important for Azure SQL DB, where certain options could not be configured at the database level.
  • Provides better isolation when setting different options for multiple databases/applications running on a single instance
  • Enables lower-level permissions that can easily be granted to individual database users or groups to set some configuration options
  • Allows you to set the database configuration options differently for the primary and the secondary database, which might be necessary for the different types of workloads that they serve

The following options can be configured (a short example follows the list):

  • Set the MAXDOP parameter to an arbitrary value (0,1,2, …) to control the maximum degree of parallelism for the queries in the database. It is recommended to switch to db-scoped configuration to set the MAXDOP instead of using sp_configure at the server level, especially for Azure SQL DB where sp_configure is not available. This value may differ between the primary and the secondary database. For example, if the primary database is executing an OLTP workload, the MAXDOP can be set to 1, while for the secondary database executing reports the MAXDOP can be set to 0 (defined by the system). For more information on MAXDOP see Configure the max degree of parallelism Server Configuration Option
  • Set the LEGACY_CARDINALITY_ESTIMATION option to enable the legacy query optimizer Cardinality Estimation (CE) model (applicable to SQL Server 2012 and earlier), regardless of the database compatibility level setting. This is equivalent to Trace Flag 9481. This option allows to leverage all new functionality provided with compatibility level 130, but still uses the legacy CE model (version 70) in case the latest CE model impacts the query performance. For more information on CE see Cardinality Estimation
  • Enable or disable PARAMETER_SNIFFING at the database level. Disable this option to instruct the query optimizer to use statistical data instead of the initial values for all local variables and parameters when the query is compiled and optimized. This is equivalent to Trace Flag 4136 or the OPTIMIZE FOR UNKNOWN query hint
  • Enable or disable QUERY_OPTIMIZER_HOTFIXES at the database level, to take advantage of the latest query optimizer hotfixes, regardless of the compatibility level of the database. This is equivalent to Trace Flag 4199
  • CLEAR PROCEDURE_CACHE, which allows you to clear the procedure cache at the database level without impacting other databases and without requiring sysadmin permission. This command can be executed with the ALTER ANY DATABASE SCOPED CONFIGURATION permission on the database, and the operation can be executed on the primary and/or the secondary
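
A minimal sketch of a few of these statements (the values are illustrative):

-- Limit parallelism for this database, but let a readable secondary use the system default
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 2;
ALTER DATABASE SCOPED CONFIGURATION FOR SECONDARY SET MAXDOP = 0;

-- Disable parameter sniffing for this database
ALTER DATABASE SCOPED CONFIGURATION SET PARAMETER_SNIFFING = OFF;

-- Clear the procedure cache for this database only
ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;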

For the T-SQL syntax and other details see ALTER DATABASE SCOPED CONFIGURATION (Transact-SQL)

 

Improved Query Performance with Compatibility Level 130 in Azure SQL Database



Azure SQL Database is transparently running hundreds of thousands of databases at many different Compatibility Levels, preserving and guaranteeing backward compatibility with the corresponding version of SQL Server for all its customers.

Therefore, nothing prevents customers from moving existing databases to the latest Compatibility Level to benefit from the new query optimizer and query processor features. As a reminder, in SQL Server 2008 and Azure SQL Database v11 the Compatibility Level was set to 100 by default; in SQL Server 2012 to 110; in SQL Server 2014 and Azure SQL Database v12 to 120; and today, with SQL Server 2016 and the latest updates of Azure SQL DB, comes the latest Compatibility Level, 130.

Starting in June 2016, the Azure SQL Database default will change from 120 to 130 for newly created databases. Databases created before June 2016 won’t be affected and will keep running at the Compatibility Level they were initially created with (100, 110 or 120). The same goes for databases migrated from Azure SQL Database v11 to v12.

In this blog, we will explore what Compatibility Level 130 brings to the table, what you should be paying attention to, and how to leverage its benefits and address possible side effects on query performance for existing SQL applications.

About Compatibility Level 130

First, if you want to know the current Compatibility Level of your database, execute the following SQL statement.

SELECT compatibility_level FROM sys.databases WHERE name = '<YOUR DATABASE_NAME>'

Before this change for newly created databases happens, let’s get some context, and briefly review what this change is all about through some very basic query examples and see how anyone can benefit from it.

Query processing in relational databases can be very complex and can lead to lots of computer science and mathematics to understand the inherent design choices and behaviors. In this document, the content has been intentionally simplified to ensure that anyone with some minimum technical background can understand the impact of the Compatibility Level change and determine how it can benefit applications.

Let’s have a quick look at what Compatibility Level 130 brings to the table. You can find more details at https://msdn.microsoft.com/en-us/library/bb510680.aspx, but here is a short summary:

  • The INSERT operation of an INSERT ... SELECT statement can be multi-threaded and have a parallel plan, while before this operation was single-threaded.
  • Queries over memory-optimized tables and table variables can now have parallel plans, while before these were also single-threaded.
  • Statistics for memory-optimized tables can now be sampled and are auto-updated. See https://msdn.microsoft.com/en-us/library/bb510411.aspx#InMemory for more details.
  • Batch mode vs. row mode changes with Column Store indexes:
    • Sorts on a table with a Column Store index are now in batch mode.
    • Windowing aggregates, such as T-SQL LAG/LEAD, now operate in batch mode.
    • Queries on Column Store tables with multiple DISTINCT clauses operate in batch mode.
    • Queries running under DOP=1 or with a serial plan also execute in batch mode.
  • Last, the Cardinality Estimation improvements actually come with Compatibility Level 120, but for those of you running at a lower Compatibility Level (i.e. 100 or 110), the move to Compatibility Level 130 will also bring these improvements, and they can also benefit the query performance of your applications.

Practicing Compatibility Level 130

First let’s get some tables, indexes and random data created to practice some of these new capabilities. The TSQL script examples can be executed under SQL Server 2016, or under Azure SQL Database. However, when creating an Azure SQL database, make sure you choose at the minimum a P2 database because you need at least a couple of cores to allow multi-threading and therefore benefit from these features.


-- Create a Premium P2 Database in Azure SQL Database
CREATE DATABASE MyTestDB (EDITION='Premium', SERVICE_OBJECTIVE='P2');
GO

-- Create 2 tables with a column store index on the second one (only available on Premium databases)
CREATE TABLE T_source(Color varchar(10), c1 bigint, c2 bigint)
CREATE TABLE T_target(c1 bigint, c2 bigint)
CREATE CLUSTERED COLUMNSTORE INDEX CCI ON T_target
GO

-- Insert few rows.
INSERT T_source VALUES ('Blue', RAND() * 100000, RAND() * 100000),
('Yellow', RAND() * 100000, RAND() * 100000),
('Red', RAND() * 100000, RAND() * 100000),
('Green', RAND() * 100000, RAND() * 100000),
('Black', RAND() * 100000, RAND() * 100000)
GO 200

INSERT T_source SELECT * FROM T_source
GO 10


Now, let’s have a look at some of the Query Processing features coming with Compatibility Level 130.

Parallel INSERT

The TSQL statements below execute the same INSERT operation under Compatibility Level 120 and then 130: the INSERT runs single-threaded under 120 and multi-threaded under 130.


-- Parallel INSERT ... SELECT ... in heap or CCI is available under 130 only
SET STATISTICS XML ON;

ALTER DATABASE MyTestDB SET COMPATIBILITY_LEVEL = 120
GO

-- The INSERT part is in serial
INSERT t_target WITH (tablock) SELECT C1, COUNT(C2) * 10 * RAND() FROM T_source GROUP BY C1 OPTION (RECOMPILE)

ALTER DATABASE MyTestDB SET COMPATIBILITY_LEVEL = 130
GO

-- The INSERT part is in parallel
INSERT t_target WITH (tablock) SELECT C1, COUNT(C2) * 10 * RAND() FROM T_source GROUP BY C1 OPTION (RECOMPILE)

SET STATISTICS XML OFF;


By requesting the actual query plan and looking at its graphical representation or its XML content, you can determine how the statement was actually executed. Looking at the plans side-by-side in figure 1, we can clearly see that the Column Store INSERT execution goes from serial in 120 to parallel in 130. Also note the change of the iterator icon in the 130 plan, which now shows two parallel arrows, illustrating that the iterator execution is indeed parallel. If you have large INSERT operations to complete, the parallel execution, tied to the number of cores at your disposal for the database, will perform better; up to 100 times faster depending on your situation!

Figure 1 INSERT operation changes from serial to parallel with Compatibility Level 130.

Figure 1

SERIAL Batch Mode

Similarly, moving to Compatibility Level 130 enables batch mode processing when processing rows of data. First, batch mode operations are only available when you have a column store index in place. Second, a batch typically represents ~900 rows and uses code logic optimized for multicore CPUs and higher memory throughput, and it directly leverages the compressed data of the Column Store whenever possible. Under these conditions, SQL Server 2016 can process ~900 rows at once instead of 1 row at a time; as a consequence, the overhead cost of the operation is shared by the entire batch, reducing the cost per row. This sharing of work, combined with column store compression, basically reduces the latency involved in a SELECT batch mode operation. You can find more details about the column store and batch mode at https://msdn.microsoft.com/en-us/library/gg492088.aspx.


-- Serial batch mode execution
SET STATISTICS XML ON;

ALTER DATABASE MyTestDB SET COMPATIBILITY_LEVEL = 120
GO

-- The scan and aggregate are in row mode
SELECT C1, COUNT(C2) FROM T_target GROUP BY C1 OPTION (MAXDOP 1, RECOMPILE)
GO

ALTER DATABASE MyTestDB SET COMPATIBILITY_LEVEL = 130
GO

-- The scan and aggregate are in batch mode; MAXDOP is forced to 1 to show that batch mode now also works with a serial plan
SELECT C1, COUNT(C2) FROM T_target GROUP BY C1 OPTION (MAXDOP 1, RECOMPILE)
GO

SET STATISTICS XML OFF;


By observing the query plans side-by-side in figure 2, we can see that the processing mode has changed with the Compatibility Level. When the queries are executed under both Compatibility Levels together, we can also see that most of the processing time is spent in row mode (86%) compared to batch mode (14%), where 2 batches have been processed. Increase the dataset and the benefit will increase as well.

Figure 2 SELECT operation changes from serial to batch mode with Compatibility Level 130.

Figure 2

Batch mode on Sort Execution

Similar to the above, but applied to a sort operation, the transition from row mode (Compatibility Level 120) to batch mode (Compatibility Level 130) improves the performance of the SORT operation for the same reasons.


-- Batch mode on sort execution
SET STATISTICS XML ON;

ALTER DATABASE MyTestDB SET COMPATIBILITY_LEVEL = 120
GO

-- The scan, aggregate and sort are in row mode
SELECT C1, COUNT(C2) FROM T_target GROUP BY C1 ORDER BY C1 OPTION (MAXDOP 1, RECOMPILE)
GO

ALTER DATABASE MyTestDB SET COMPATIBILITY_LEVEL = 130
GO

-- The scan, aggregate and sort are in batch mode; MAXDOP is forced to 1 to show that batch mode now also works with a serial plan
SELECT C1, COUNT(C2) FROM T_target GROUP BY C1 ORDER BY C1 OPTION (MAXDOP 1, RECOMPILE)
GO

SET STATISTICS XML OFF;


As visible side-by-side in figure 3, the row mode execution represents 81% of the cost, while the batch mode execution represents only 19% (respectively 81% and 56% on the sort operator itself).

Figure 3 SORT operation changes from row to batch mode with Compatibility Level 130.

Figure 3

Obviously, these samples only contain tens of thousands of rows, which is nothing compared to the data volumes found in most SQL Servers these days. Just project these gains against millions of rows instead, and this can translate into several minutes of execution time spared every day, depending on the nature of your workload.

Cardinality Estimation Improvements

Introduced with SQL Server 2014, any database running at Compatibility Level 120 or above makes use of the new Cardinality Estimation functionality. Essentially, cardinality estimation is the logic used to determine how SQL Server will execute a query, based on its estimated cost. The estimation is calculated using input from statistics associated with the objects involved in that query. Practically, at a high level, the Cardinality Estimation functions produce row count estimates along with information about the distribution of the values, distinct value counts, and duplicate counts contained in the tables and objects referenced in the query. Getting these estimates wrong can lead to unnecessary disk I/O due to insufficient memory grants (i.e. TempDB spills), or to the selection of a serial plan over a parallel plan, to name a few. In short, incorrect estimates can lead to an overall performance degradation of the query execution. On the other side, better, more accurate estimates lead to better query executions!

As mentioned before, query optimizations and estimates are a complex matter, but if you want to learn more about query plans and the cardinality estimator, you can refer to the document at https://msdn.microsoft.com/en-us/library/dn673537.aspx for a deeper dive.
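If you want to peek at the raw material the estimator works from, the statistics metadata is exposed through DMVs. The query below is a small sketch (the table name is simply reused from the earlier samples) that lists each statistics object on T_target together with its last update time, sampled row count, and modification counter:

-- Inspect the statistics the optimizer relies on for dbo.T_target
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.T_target');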

Which Cardinality Estimation do you currently use?

To determine which Cardinality Estimation model your queries are running under, let’s use the query samples below. Note that this first example runs under Compatibility Level 110, implying the use of the old Cardinality Estimation functions.


-- Old CE
ALTER DATABASE MyTestDB SET COMPATIBILITY_LEVEL = 110
GO

SET STATISTICS XML ON;

SELECT [c1]
FROM [dbo].[T_target]
WHERE [c1] > 20000
GO

SET STATISTICS XML OFF;


Once execution is complete, click on the XML link and look at the properties of the first iterator, as shown below. Note the property called CardinalityEstimationModelVersion, currently set to 70. This does not mean that the database Compatibility Level is set to the SQL Server 7.0 version (it is set to 110, as visible in the TSQL statements above); the value 70 simply represents the legacy Cardinality Estimation functionality available since SQL Server 7.0, which had no major revisions until SQL Server 2014 (which comes with a Compatibility Level of 120).

Figure 4 The CardinalityEstimationModelVersion is set to 70 when using a Compatibility Level of 110 or below.

Figure 4
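If you prefer not to click through the graphical plan, the same attribute can also be read programmatically from the plan cache. The query below is only a sketch (it requires permission to view server/database state, and the results depend on what happens to be cached), but it shows the idea:

-- Read CardinalityEstimationModelVersion from recently cached plans (results vary with plan cache contents)
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (10)
       st.text AS query_text,
       qp.query_plan.value('(//StmtSimple/@CardinalityEstimationModelVersion)[1]', 'int') AS ce_model_version
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE qp.query_plan IS NOT NULL;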

Alternatively, you can change the Compatibility Level to 130 and disable the use of the new Cardinality Estimation function by setting LEGACY_CARDINALITY_ESTIMATION to ON with the ALTER DATABASE SCOPED CONFIGURATION statement (see https://msdn.microsoft.com/en-us/library/mt629158.aspx for more information). This is exactly the same as using 110 from a Cardinality Estimation point of view, while using the latest query processing compatibility level. Doing so, you can benefit from the new query processing features coming with the latest Compatibility Level (i.e. batch mode), but still rely on the old Cardinality Estimation functionality if necessary.


-- Old CE
ALTER DATABASE MyTestDB SET COMPATIBILITY_LEVEL = 130
GO

ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;
GO

SET STATISTICS XML ON;

SELECT [c1]
FROM [dbo].[T_target]
WHERE [c1] > 20000
GO

SET STATISTICS XML OFF;


Simply moving to Compatibility Level 120 or 130 enables the new Cardinality Estimation functionality. In that case, the default CardinalityEstimationModelVersion will be set to 120 or 130 accordingly, as visible below.


-- New CE
ALTER DATABASE MyTestDB SET COMPATIBILITY_LEVEL = 130
GO

ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = OFF;
GO

SET STATISTICS XML ON;

SELECT [c1]
FROM [dbo].[T_target]
WHERE [c1] > 20000
GO

SET STATISTICS XML OFF;


Figure 5 The CardinalityEstimationModelVersion is set to 130 when using a Compatibility Level of 130.

Figure 5

Witnessing the Cardinality Estimation differences

Now, let’s run a slightly more complex query involving an INNER JOIN with a WHERE clause with some predicates, and let’s look at the row count estimate from the old Cardinality Estimation function first.


-- Old CE row estimate with INNER JOIN and WHERE clause
ALTER DATABASE MyTestDB SET COMPATIBILITY_LEVEL = 130
GO

ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;
GO

SET STATISTICS XML ON;

SELECT T.[c2]
FROM [dbo].[T_source] S
INNER JOIN [dbo].[T_target] T ON T.c1=S.c1
WHERE S.[Color] = 'Red' AND S.[c2] > 2000 AND T.[c2] > 2000 OPTION (RECOMPILE)
GO

SET STATISTICS XML OFF;


Executing this query returns 200,704 rows, while the row estimate with the old Cardinality Estimation functionality claims 194,284 rows. Obviously, as said before, these row counts will also depend on how often you ran the previous samples, which populate the sample tables over and over again at each run. The predicates in your query will also have an influence on the actual estimation, aside from the table shape, the data content, and how this data actually correlates.

Figure 6 The row count estimate is 194,284 or 6,000 rows off from the 200,704 rows expected.

Figure 6

In the same way, let’s now execute the same query with the new Cardinality Estimation functionality.


-- New CE row estimate with INNER JOIN and WHERE clause
ALTER DATABASE MyTestDB SET COMPATIBILITY_LEVEL = 130
GO

ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = OFF;
GO

SET STATISTICS XML ON;

SELECT T.[c2]
FROM [dbo].[T_source] S
INNER JOIN [dbo].[T_target] T ON T.c1=S.c1
WHERE S.[Color] = 'Red' AND S.[c2] > 2000 AND T.[c2] > 2000 OPTION (RECOMPILE)
GO

SET STATISTICS XML OFF;


Looking at the plan below, we now see that the row estimate is 202,877, which is higher than, and much closer to the actual count than, the old Cardinality Estimation's 194,284.

Figure 7 The row count estimate is now 202,877, instead of 194,284.

Figure 7

In reality, the result set is 200,704 rows (this depends on how often you ran the queries of the previous samples, and more importantly, because the TSQL uses the RAND() function, the actual values returned can vary from one run to the next). Therefore, in this particular example, the new Cardinality Estimation does a better job at estimating the number of rows, because 202,877 is much closer to 200,704 than 194,284 is! Last, if you change the WHERE clause predicates to equality (rather than ">" for instance), the estimates between the old and the new Cardinality Estimation could differ even more, depending on how many matches you get.

Obviously, in this case, being ~6000 rows off from the actual count does not represent a lot of data. Now, transpose this to millions of rows across several tables and more complex queries, and at times the estimate can be off by millions of rows; the risk of picking up the wrong execution plan, or of requesting insufficient memory grants leading to TempDB spills and therefore more I/O, is then much higher.

If you have the opportunity, practice this comparison with your most typical queries and datasets, and see for yourself how much the old and new estimates differ; some may drift further from reality, while others get closer to the actual row counts returned in the result sets. All of it will depend on the shape of your queries, the characteristics of your Azure SQL database, the nature and size of your datasets, and the statistics available about them. If you just created your Azure SQL Database instance, the query optimizer will have to build its knowledge from scratch instead of reusing statistics built up from previous query runs. So the estimates are very contextual and almost specific to every server and application situation. It is an important aspect to keep in mind!

Some considerations to take into account

Although most workloads would benefit from Compatibility Level 130, before adopting it for your production environment you basically have 3 options:

  1. You move to Compatibility Level 130 and see how things perform. If you notice some regressions, you simply set the Compatibility Level back to its original value, or you keep 130 and only revert the Cardinality Estimation to the legacy mode (as explained above, this alone could address the issue).
  2. You thoroughly test your existing applications under a similar production load, fine-tune, and validate the performance before going to production. In case of issues, as above, you can always go back to the original Compatibility Level, or simply revert the Cardinality Estimation to the legacy mode.
  3. The final option, and the most recent way to address these questions, is to leverage the Query Store; that’s today’s recommended option! To assist the analysis of your queries under Compatibility Level 120 or below versus 130, we cannot encourage you enough to use the Query Store. It is available with the latest version of Azure SQL Database V12, and it is designed to help you with query performance troubleshooting. Think of the Query Store as a flight data recorder for your database, collecting and presenting detailed historic information about all queries. This greatly simplifies performance forensics by reducing the time to diagnose and resolve issues. You can find more information at https://azure.microsoft.com/en-in/blog/query-store-a-flight-data-recorder-for-your-database/

At a high level, if you already have a set of databases running at Compatibility Level 120 or below and plan to move some of them to 130, or if your workload automatically provisions new databases that will soon be set to 130 by default, please consider the following:

  • Before changing to the new Compatibility Level in production, enable Query Store. You can refer to https://msdn.microsoft.com/en-us/library/bb895281.aspx for more information.
  • Next, test all critical workloads using representative data and queries in a production-like environment, and compare the performance you experience with what Query Store reports. If you observe some regressions, you can identify the regressed queries with the Query Store and use its plan forcing option (aka plan pinning); a minimal sketch of this flow is shown after this list. In such a case, you definitely stay with Compatibility Level 130 and use the former query plan as suggested by the Query Store.
  • If you want to leverage new features and capabilities of Azure SQL Database (which is running SQL Server 2016), but are sensitive to changes brought by the Compatibility Level 130, as a last resort, you could consider forcing the Compatibility Level back to the level that suits your workload by using an ALTER DATABASE statement. But first, be aware that the Query Store plan pinning option is your best option because not using 130 is basically staying at the functionality level of an older SQL Server version.
  • If you have multitenant applications spanning multiple databases, it may be necessary to update the provisioning logic of your databases to ensure a consistent Compatibility Level across all databases, old and newly provisioned ones alike. Your application workload performance could be sensitive to some databases running at different Compatibility Levels, and therefore Compatibility Level consistency across databases could be required in order to provide the same experience to your customers across the board. Note that this is not a mandate; it really depends on how your application is affected by the Compatibility Level.
  • Last, regarding the Cardinality Estimation, and just like changing the Compatibility Level, before proceeding in production, it is recommended to test your production workload under the new conditions to determine if your application benefits from the Cardinality Estimation improvements.
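To make the Query Store guidance above concrete, here is a minimal sketch of the flow: enable the Query Store before flipping the Compatibility Level, and, if a regression shows up later, force the previously good plan. The query_id and plan_id values are placeholders that you would look up in sys.query_store_query and sys.query_store_plan for the regressed query.

-- Enable the Query Store before changing the Compatibility Level (MyTestDB reused from the samples above)
ALTER DATABASE MyTestDB SET QUERY_STORE = ON;
ALTER DATABASE MyTestDB SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
GO
-- Later, if a query regresses under 130, pin the plan it used before (placeholder ids)
-- EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;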

Conclusion

Using Azure SQL Database to benefit from all the SQL Server 2016 enhancements can clearly improve your query executions, just as-is! Of course, like with any new feature, a proper evaluation must be done to determine the exact conditions under which your database workload operates best. Experience shows that most workloads are expected to at least run transparently under Compatibility Level 130 while leveraging the new query processing functions and the new Cardinality Estimation. That said, realistically, there are always some exceptions, and doing proper due diligence is important to determine how much you can benefit from these enhancements. And again, the Query Store can be of great help in doing this work!

As Azure SQL Database evolves, you can expect a Compatibility Level 140 in the future. When the time is appropriate, we will start talking about what this future Compatibility Level 140 will bring, just as we briefly discussed here what Compatibility Level 130 brings today.

For now, let’s not forget, starting June 2016, Azure SQL Database will change the default Compatibility Level from 120 to 130 for newly created databases. Be aware!

/Alain Lissoir


Query Optimizer Additions in SQL Server 2016


In SQL Server 2016 we have introduced a number of new Query Optimizer improvements. This article summarizes some of them and explains how you can leverage the benefits of the new enhancements. Expect deep-dive follow-up articles for some of them. Here is the short list:

  • Compatibility Level Guarantees
  • Query Optimizer Improvements under Trace Flag 4199
  • New Referential Integrity Operator
  • Parallel Update of Sampled Statistics
  • Sublinear Threshold for Update of Statistics
  • Additions to the New Cardinality Estimator (New CE)
  • Misc. Enhancements

Compatibility Level Guarantees

Starting with SQL Server 2016, we promise that after an upgrade there will be no plan changes if you stick with an old compatibility level, like 120 or 110. New features and improvements will be available under the latest compatibility level only.

This will make the upgrade experience much smoother. For example, when upgrading a database from SQL Server 2014 (compatibility level 120) to SQL Server 2016, the workload will continue to get the same query plans it used to. Similarly, when we make enhancements to Azure SQL DB capabilities, we will not affect the query plans of your workloads, as long as you don’t change the compatibility level.

As a result of this guarantee, the new Query Optimizer improvements will only be available in the latest compatibility level (130). You are encouraged to upgrade to the latest compatibility level to benefit from all the enhancements. To mitigate any unintended consequences of the resulting plan changes, please refer to the upgrade suggestions in this article.

Query Optimizer Improvements under Trace Flag 4199

Traditionally, to prevent unwanted plan changes, all Query Optimizer hotfixes from previous releases that result in plan changes have been put under a specific Trace Flag (4199) only. Details about this trace flag can be found here. The model going forward is that all improvements to the Query Optimizer will be released and on by default under successive database compatibility levels. As a result, we have enabled the improvements previously available only under trace flag 4199 by default under compatibility level 130.
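For reference, on an older compatibility level you can still opt into those hotfixes explicitly; a minimal, hedged sketch (the table name below is a placeholder):

-- Under compatibility level 130 the former TF 4199 fixes are on by default; on older levels you can still opt in
DBCC TRACEON (4199);        -- session scope; add -1 for global scope (requires elevated permissions)
-- Or per query, using the documented QUERYTRACEON hint:
-- SELECT COUNT(*) FROM dbo.YourTable OPTION (QUERYTRACEON 4199);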

New Referential Integrity Operator

SQL Server 2016 introduces a new Referential Integrity Operator (under compatibility level 130) that increases the limit on the number of other tables with foreign key references to a primary or unique key of a given table (incoming references), from 253 to 10,000. The new query execution operator does the referential integrity checks in place, by comparing the modified row to the rows in the referencing tables, to verify that the modification will not break the referential integrity. This results in much lower compilation times for such plans and comparable execution times.

Example:

CREATE TABLE Customer(Id INT PRIMARY KEY, CustomerName NVARCHAR(128))
CREATE TABLE ReferenceToCustomer1(CustomerId INT FOREIGN KEY REFERENCES Customer(Id))
CREATE TABLE ReferenceToCustomer2(CustomerId INT FOREIGN KEY REFERENCES Customer(Id))
CREATE TABLE ReferenceToCustomer3(CustomerId INT FOREIGN KEY REFERENCES Customer(Id))
...
DELETE Customer WHERE Id = 1

Old Plan:

FK Old Plan
New Plan:

FK New Plan

The first version of the new Referential Integrity Operator has the following constraints:

  • Greater than 253 foreign key references are only supported for DELETE and UPDATE operations.
  • A table with a foreign key reference to itself is still limited to 253 foreign key references.
  • Greater than 253 foreign key references are not currently available for column store indexes, memory-optimized tables, or Stretch Database tables.

Please refer to this article for more details.

Parallel Update of Sampled Statistics

Collection of statistics using FULLSCAN has been able to run in parallel since SQL Server 2005. In SQL Server 2016 under compatibility level 130, we have enabled parallel collection of statistics using SAMPLE (up to a degree of parallelism of 16), which decreases the overall elapsed time of a statistics update. Since auto-created statistics are sampled by default, all such statistics will be updated in parallel under the latest compatibility level.
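As a quick illustration, the statement below performs a sampled statistics update (the table and sampling percentage are arbitrary placeholders, not part of the original text); under compatibility level 130 the sampling scan behind it can now run in parallel:

-- Sampled statistics update; under compatibility level 130 the underlying scan can be parallel (up to DOP 16)
UPDATE STATISTICS dbo.Customer WITH SAMPLE 25 PERCENT;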

Sublinear Threshold for Update of Statistics

In the past, the threshold for the number of changed rows that triggers an auto-update of statistics was 20% of the table, which was inappropriate for large tables. Starting with SQL Server 2016 (compatibility level 130), this threshold is related to the number of rows in a table: the higher the number of rows, the lower the percentage of changed rows needed to trigger a statistics update. Note that this behavior was available under Trace Flag 2371 in previous releases.

For example, if a table had 1 billion rows, under the old behavior it would have taken 200 million rows to be changed before auto-stats update kicks in. In SQL Server 2016, it would take only 1 million rows to trigger auto stats update.
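To put rough numbers on it, the new threshold behaves approximately like the square root of 1,000 times the row count (the behavior Trace Flag 2371 enabled previously), versus the old flat 20%. The snippet below only illustrates that arithmetic; it is an approximation, not an exact internal specification:

-- Approximate auto-update thresholds for a 1-billion-row table (illustration only)
DECLARE @rows BIGINT = 1000000000;
SELECT old_threshold_rows = CAST(@rows * 0.20 AS BIGINT),          -- ~200,000,000 changed rows under the old rule
       new_threshold_rows = CAST(SQRT(1000.0 * @rows) AS BIGINT);  -- ~1,000,000 changed rows under compatibility level 130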

Additions to the New Cardinality Estimator (New CE)

SQL Server 2014 introduced a new Cardinality Estimator to address shortcomings in the cardinality estimator used in previous versions of the product. In the latest release, we have identified and fixed some inefficiencies in the new models that could result in bad plans. We are planning a separate blog post to summarize the New CE enhancements in SQL Server 2016.

Misc. Enhancements

As part of various scenarios such as column stores and In-Memory OLTP (aka Hekaton), we have introduced a number of Query Optimizer enhancements that enable newly introduced performance improvements. Below is a list of some of them:

  • Batch query processing in serial queries
  • Sort operators in batch mode
  • Window aggregates in batch mode
  • Distinct aggregates in batch mode
  • Parallel INSERT SELECT into heaps and CCI
  • Heap scans for memory-optimized tables
  • Parallel scans for memory-optimized tables
  • Sampled and auto-update stats for memory-optimized tables

This blog post has more details on the In-Memory OLTP improvements mentioned above.
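If you want to try a couple of these yourself, the sketch below creates a small memory-optimized table and runs an aggregate over it; under compatibility level 130 such a scan is eligible for a parallel plan, and its auto-created, sampled statistics are updated automatically. It assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup, and the table and column names are illustrative only.

-- Illustrative memory-optimized table (requires a MEMORY_OPTIMIZED_DATA filegroup in the database)
CREATE TABLE dbo.mo_orders
(
    order_id INT IDENTITY(1,1) NOT NULL PRIMARY KEY NONCLUSTERED,
    amount   DECIMAL(10,2) NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
-- Under compatibility level 130 this scan/aggregate is eligible for a parallel plan
SELECT COUNT(*) AS row_count, SUM(amount) AS total_amount FROM dbo.mo_orders;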

In-Memory OLTP Performance Demo Available for Download


In-Memory OLTP is a performance feature available in SQL Server (since 2014) and Azure SQL Database (currently in preview). With the memory-optimized tables, memory-optimized table types, and natively compiled stored procedures introduced by In-Memory OLTP, you can get great performance improvements in transactional applications.

To demonstrate the performance of In-Memory OLTP we typically compare the SQL Server performance before and after applying In-Memory OLTP. In various videos you may have seen the following application used to demo In-Memory OLTP performance:

in-memory-oltp-performance

We have had requests from various folks who wanted to get a hold of this demo to see how it performs in their environment, and to show it to others.

We have now made this demo available for everyone to download and give it a try. The released version of the demo, including binaries and easy setup scripts, is available at:

in-memory-oltp-demo-v1.0

To get started, download the zip file and follow the steps in the readme that is included with the release. Configuration parameters can be changed through the Configuration Settings, which are accessed through the Options menu in the application.

The source code of the demo is available through the SQL Server Samples GitHub repository.

 

Give the demo a try, and let us know any feedback!
