SQL Database Engine Blog

Encapsulate JSON parsing with inline table-valued functions


If you are storing JSON fields in SQL Server 2016 or Azure SQL Database, you will probably need to use OPENJSON to parse the JSON and extract individual fields.

As an example, in the new SQL Server 2016 WideWorldImporters sample database, the Application.People table has several standard columns and one JSON column that contains custom fields.

If you want to query both standard columns and JSON fields, you can write something like:

select FullName, LogonName, EmailAddress, Title, CommissionRate
from Application.People
 cross apply OPENJSON(CustomFields)
             WITH(Title nvarchar(50), HireDate datetime2, OtherLanguages nvarchar(max) as json,
                  PrimarySalesTerritory nvarchar(50), CommissionRate float)

FullName, LogonName, and EmailAddress come from standard columns, while Title and CommissionRate are stored in the JSON column CustomFields. In order to return the JSON fields, we use CROSS APPLY OPENJSON on the CustomFields column and define the schema of the returned JSON in the WITH clause.

This schema-on-query definition is handy if you need different JSON fields in different queries, but if you know that you will always use the same fields from the JSON column, you would have to copy and paste the same WITH clause into every query.

As an alternative, you can create an inline table-valued function that encapsulates the OPENJSON call and the WITH specification:

go
drop function if exists Application.PeopleData
go
create function Application.PeopleData(@data nvarchar(max))
returns table
as return(select *
 from OPENJSON(@data)
      WITH(Title nvarchar(50), HireDate datetime2,
           PrimarySalesTerritory nvarchar(50), CommissionRate float,
           OtherLanguages nvarchar(max) as json)
)
go

Now, the query can be simpler:

select FullName, LogonName, EmailAddress, Title, CommissionRate
from Application.People
     cross apply Application.PeopleData(CustomFields)

We no longer need to repeat the WITH clause in every query that parses JSON fields. If you know the JSON schema, you can encapsulate the parsing logic in one place and expose just the keys that you need in your queries.
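As a usage sketch, the function composes naturally with ordinary predicates; the commission threshold below is just an illustrative value, and IsSalesperson is another column of the same sample table:

select FullName, EmailAddress, CommissionRate
from Application.People
     cross apply Application.PeopleData(CustomFields)
where IsSalesperson = 1
  and CommissionRate > 1.0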

 

 


Handling inheritance with JSON


JSON support in SQL Server 2016 and Azure SQL Database enables you to handle custom fields and inheritance. As an example, imagine a Person/Employee/Salesperson structure where an Employee is a kind of Person and a Salesperson is a kind of Employee. This is a standard inheritance structure of entities. In earlier versions of SQL Server, you had several options for designing tables for this inheritance structure:

  1. Single table inheritance where you can put all fields from all sub-classes in one wide table (e.g. People)
  2. Multiple table inheritance where you create separate table for every entity
  3. Entity-attribute-value pattern where you keep the common fields in one table (e.g. People) and store all custom fields in a separate (PersonId, FieldName, FieldValue) table

In the new SQL Server 2016 WideWorldImporters sample database, we have a new approach for handling inheritance – using a JSON column with key-value pairs. The Application.People table stores all kinds of people and has only the columns that are common to all types of people. There are two flags, IsEmployee and IsSalesperson, that represent the type of person, and one JSON column (CustomFields) that contains the custom fields specific to certain kinds of people.

If a person is an employee (the IsEmployee column is equal to 1), they have some additional fields (like the OtherLanguages they speak, Title, and HireDate). Since these fields are specific to the employee type, they are stored in the JSON column as key-value pairs. We store them as JSON key-value pairs because we want to avoid additional sparse columns or a separate table for custom fields.

If a person is a salesperson (the IsSalesperson column is equal to 1), they have the additional fields PrimarySalesTerritory and CommissionRate.
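For illustration, the CustomFields value for a salesperson might look like the following (the values here are made up; the sample database stores similar key-value pairs):

{
  "OtherLanguages": ["Polish", "Chinese"],
  "HireDate": "2010-07-01T00:00:00",
  "Title": "Team Member",
  "PrimarySalesTerritory": "Plains",
  "CommissionRate": "0.98"
}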

This resembles the EAV model, except that the key-value pairs are not stored in a separate table.

Returning data as JSON

This model enables you to access all data with a single table read, e.g.:

select PersonID, FullName, PhoneNumber, FaxNumber, EmailAddress, CustomFields
from Application.People
where PersonID = 17

This query returns the common database columns plus all custom fields as one text field. If your client understands JSON (e.g. an AngularJS single-page app), the raw content of the CustomFields column can be displayed directly.

If you have a JavaScript client that understands JSON, you might return the entire result as JSON using the FOR JSON clause:

select PersonID, FullName, PhoneNumber, FaxNumber, EmailAddress, JSON_QUERY(CustomFields) CustomFields
from Application.People
where PersonID = 17
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER

Note one important detail here – we need to wrap CustomFields in a JSON_QUERY call. If you use the CustomFields column directly in FOR JSON, it will be surrounded with double quotes and escaped, because the column is treated as plain text. However, if you wrap it with JSON_QUERY, FOR JSON knows that the content of this column is valid JSON and does not escape it.

Returning data to clients that don’t understand JSON

In some cases you might want a relational view over the JSON values.

Imagine that you have Power BI or SSRS reports that connect to the People table as a data source. They don’t understand JSON and need standard columns. Reports usually require predefined columns, so you cannot use just the JSON text.

If you can write the T-SQL that defines the data source, you can use JSON_VALUE or OPENJSON in that query to read values from JSON, e.g.:

select PersonID, FullName, PhoneNumber, FaxNumber, EmailAddress, Title, HireDate
from Application.People
 cross apply OPENJSON(CustomFields)
             WITH(Title nvarchar(50), HireDate datetime2)

When a reporting tool executes this query, it sees Title and HireDate as standard columns. In this case it might be better to put the query in a stored procedure that the reporting tool calls.

Another alternative is to create views that encapsulate the JSON values and use those views as a source.

You can create an Employees view that filters only people with the IsEmployee flag set and adds the custom fields specific to employees:

drop view if exists Application.Employees
go
create view Application.Employees as
select PersonID, FullName, PhoneNumber, FaxNumber, EmailAddress, Title, HireDate
from Application.People
 cross apply OPENJSON(CustomFields)
             WITH(Title nvarchar(50), HireDate datetime2)
WHERE IsEmployee = 1

Or we can have a Salespeople view that filters only people with the IsSalesperson flag set and adds the custom fields specific to salespeople:

drop view if exists Application.Salespeople
go
create view Application.Salespeople as
select PersonID, FullName, PhoneNumber, FaxNumber, EmailAddress, Title, HireDate, PrimarySalesTerritory, CommissionRate
from Application.People
 cross apply OPENJSON(CustomFields)
             WITH(Title nvarchar(50), HireDate datetime2, PrimarySalesTerritory nvarchar(50), CommissionRate float)
WHERE IsSalesperson = 1

External applications can read data from these views without being aware that the custom fields come from a JSON column. The following queries return columns that look like any standard table columns:

select * from Application.Employees
select * from Application.Salespeople

The only constraint is that the fields coming from JSON cannot be updated directly through the views (e.g. UPDATE Application.Salespeople SET CommissionRate = 0.15 will not work). You would need to use the JSON_MODIFY function to update values in the JSON column.
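A minimal sketch of such an update against the base table (the person ID and the new rate are arbitrary example values):

UPDATE Application.People
SET CustomFields = JSON_MODIFY(CustomFields, '$.CommissionRate', 0.15)
WHERE PersonID = 2 AND IsSalesperson = 1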

Conclusion

There is no perfect structure for mapping OO inheritance to a relational database, but this is one approach. Compared to the other design approaches there are some pros and cons:

  1. Compared to single table inheritance: you will not have the table schema explosion of adding a new column for every property of every subclass. The downside is that accessing JSON values is slower than a direct column reference.
  2. Compared to multiple table inheritance: separate tables keep the properties small and manageable, but returning complete information about an entity requires many joins.
  3. Compared to the entity-attribute-value pattern: JSON is similar, but you have only one table access instead of reading from two tables.

 

 

Appending JSON arrays using JSON_MODIFY function


SQL Server 2016 and Azure SQL Database enable you to easily modify JSON objects and arrays. JSON_MODIFY updates the value of a property in a JSON string and returns the updated JSON string. Here I will show how to append objects to a JSON array.

In the new SQL Server 2016 WideWorldImporters sample database you can find the ReturnedDeliveryData JSON column in the Sales.Invoices table. This column contains delivery information formatted as JSON, as in the following example:

{
 "Events": [{
      "Event": "Ready for collection",
      "EventTime": "2013-01-01T12:00:00",
      "ConNote": "EAN-125-1051"
      }, {
      "Event": "DeliveryAttempt",
      "EventTime": "2013-01-02T07:05:00",
      "ConNote": "EAN-125-1051",
      "Comment": "Receiver not present"
  }],
  "DeliveredWhen": "2013-01-02T07:05:00",
  "ReceivedBy": "Klara Rakus"
}

In the $.Events array we have the complete history of events, delivery attempts, etc. If you want to see the values in the Events array, you can use the following query:

select di.*
from Sales.Invoices
    cross apply OPENJSON(ReturnedDeliveryData,'$.Events')
                WITH (Event nvarchar(100), EventTime datetime2,
                      ConNote nvarchar(100), Comment nvarchar(100) ) as di
where InvoiceID = 1

Now, imagine that you want to add a new event to the $.Events array. You don’t need to parse the entire JSON document to inject a new object. The JSON_MODIFY function can append a value to an array if you specify the append keyword in the JSON path (second parameter):

DECLARE @event nvarchar(4000)
SET @event = N'{"Event":"DeliveryAttempt","EventTime": "2013-02-02T08:15:00","Comment": "Receiver not present"}'
UPDATE Sales.Invoices
SET ReturnedDeliveryData = JSON_MODIFY(ReturnedDeliveryData, 'append $.Events', @event)
WHERE InvoiceID = 1

The JSON_MODIFY function finds the array on the path $.Events and appends the value provided as the third parameter. Unfortunately, this code will not do what we expect. If you provide just a string variable as the third parameter, JSON_MODIFY treats it like any other string: it wraps it in double quotes and escapes all special characters, so you end up with a string as the last array element instead of an object – something like:

{
 "Events": [{
      "Event": "Ready for collection",
      "EventTime": "2013-01-01T12:00:00",
      "ConNote": "EAN-125-1051"
      }, {
      "Event": "DeliveryAttempt",
      "EventTime": "2013-01-02T07:05:00",
      "ConNote": "EAN-125-1051",
      "Comment": "Receiver not present"
  }, "{\"Event\":\"DeliveryAttempt\",\"EventTime\": \"2013-02-02T08:15:00\",\"Comment\": \"Receiver not present\"}"
 ],
 "DeliveredWhen": "2013-01-02T07:05:00",
 "ReceivedBy": "Klara Rakus"
}

This is not what we need, because we want to append a JSON object, not an escaped JSON string. If you want to append a JSON object instead of a string that looks like an object, you need to wrap the third parameter with the JSON_QUERY(…) function:

DECLARE @event nvarchar(4000)
SET @event = N'{"Event":"DeliveryAttempt","EventTime": "2013-02-02T08:15:00","Comment": "Receiver not present"}'
UPDATE Sales.Invoices
SET ReturnedDeliveryData = JSON_MODIFY(ReturnedDeliveryData, 'append $.Events', JSON_QUERY(@event))
WHERE InvoiceID = 1

 

Since JSON_QUERY returns a valid JSON fragment, JSON_MODIFY knows that this is not a plain string and does not escape the result. JSON_QUERY without a second parameter behaves as a “cast to JSON”. Now you get the expected result:

{
 "Events": [{
      "Event": "Ready for collection",
      "EventTime": "2013-01-01T12:00:00",
      "ConNote": "EAN-125-1051"
      }, {
      "Event": "DeliveryAttempt",
      "EventTime": "2013-01-02T07:05:00",
      "ConNote": "EAN-125-1051",
      "Comment": "Receiver not present"
  }, {"Event":"DeliveryAttempt","EventTime": "2013-02-02T08:15:00","Comment": "Receiver not present"}
 ],
 "DeliveredWhen": "2013-01-02T07:05:00",
 "ReceivedBy": "Klara Rakus"
}

As you can see, appending a JSON object to an array is easy. You just need to be aware that JSON_MODIFY will convert JSON passed as a plain string into an escaped string if you don’t wrap it with JSON_QUERY.

IoT code sample – loading messages from Event Hub into Azure SQL Database


Paolo Salvatori created an example that simulates an Internet of Things (IoT) scenario where thousands of devices send events (e.g. sensor readings) to a backend system via a message broker. The backend system retrieves the events from the messaging infrastructure and stores them in a persistent repository in a scalable manner. The solution has the following components:

  1. Event Hub to collect messages
  2. Service Fabric to move messages from Event Hub into Azure SQL Database in JSON format
  3. Azure SQL Database, where the JSON messages are parsed using the OPENJSON function and stored

The architecture is shown in the following figure:

The device simulator sends messages to Event Hub, and a Service Fabric service reads the messages and stores them in Azure SQL Database. The messages are sent in JSON format, so they are parsed in the database using the OPENJSON function and stored in an Events table.
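The exact schema lives in the sample repository; as a rough sketch (the table and column names below are hypothetical), the database side of the load looks something like this:

create procedure dbo.ImportEvents @events nvarchar(max)
as
-- Parse a JSON array of device events and store one row per event
insert into dbo.Events (EventId, DeviceId, [Value], [Timestamp])
select EventId, DeviceId, [Value], [Timestamp]
from OPENJSON(@events)
     WITH (EventId int, DeviceId int, [Value] float, [Timestamp] datetime2)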

Source code is available on GitHub:

https://github.com/azure-cat-emea/servicefabricjsonsqldb

Announcing availability of SQL Server 2014 Express Docker image


We are excited to announce the public availability of the SQL Server 2014 Express Docker image for Windows Server Core based containers! The public repo is hosted on Docker Hub and contains the latest Docker image as well as pointers to the Dockerfile and the start PowerShell script (hosted on GitHub). We hope you will find this image useful and leverage it for your container-based applications!

Image Requirements

Windows Server Core TP5 v10.0.14300.1000

Docker Pull Command

docker pull microsoft/mssql-server-2014-express-windows

Docker Run Command

docker run -p 1433:1433 --env sa_password=YOUR_SA_PASSWORD microsoft/mssql-server-2014-express-windows

Further Reading

SQL Server in Windows Containers
Windows Containers Overview
Windows-based containers: Modern app development with enterprise-grade control.
Windows Containers: What, Why and How

IoT Smart Grid code sample


This code sample simulates an IoT Smart Grid scenario where multiple IoT power meters constantly send measurements to a SQL Server 2016 in-memory database. The sample leverages the following features: memory-optimized tables, table-valued parameters (TVPs), natively compiled stored procedures, system-versioned temporal tables (for building version history), clustered columnstore index, and Power BI (for data visualization). The combination of these features can be used to improve performance in high data input rate / shock absorber scenarios, as well as to address scenarios where a memory-optimized table exceeds the available memory (also referred to as the memory cliff).
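To give a flavor of how the ingestion features fit together, here is a simplified, hypothetical sketch (the real sample uses different table and procedure definitions):

create table dbo.MeterMeasurement (
    MeasurementId bigint identity primary key nonclustered,
    MeterId int not null,
    MeasurementValue float not null,
    MeasurementTime datetime2 not null
) with (memory_optimized = on, durability = schema_and_data)
go
create type dbo.MeasurementTVP as table (
    MeterId int not null,
    MeasurementValue float not null,
    MeasurementTime datetime2 not null,
    index ix_MeterId nonclustered (MeterId)
) with (memory_optimized = on)
go
create procedure dbo.InsertMeasurements @batch dbo.MeasurementTVP readonly
with native_compilation, schemabinding
as
begin atomic with (transaction isolation level = snapshot, language = N'us_english')
    -- The client sends a batch of readings as a TVP; the natively compiled
    -- procedure inserts them into the memory-optimized table in one call.
    insert into dbo.MeterMeasurement (MeterId, MeasurementValue, MeasurementTime)
    select MeterId, MeasurementValue, MeasurementTime from @batch
end
go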

[Figure: Power BI dashboard from the IoT Smart Grid sample]

The v1.0 release of the sample, including binaries and easy setup scripts, is available here:
IoT Smart Grid code sample v1.0 Release

The source code along with instructions on how to configure and run the sample can be found here:
IoT Smart Grid Source Code

Please give the sample a try, and let us know what you think!

Increased Memory Size for In-Memory OLTP in SQL Server 2016


We are happy to announce that SQL Server 2016 removes the size limitation on user data in memory-optimized tables. You can grow your memory-optimized tables as large as you like as long as you have enough available memory. This means that with Windows Server 2016 you can leverage all 12TB (Terabytes) of available memory in a given server with In-Memory OLTP. With this, you can achieve incredible performance gains (up to 30X) for very large transactional (OLTP) workloads.

When In-Memory OLTP was initially released in SQL Server 2014, there was one key size limitation: SQL Server supported up to 256GB of active user data in tables with durability SCHEMA_AND_DATA in a given database. This limit was due to limitations in the storage subsystem – there was a limit of 8192 checkpoint file pairs, each containing up to 128MB of data. 256GB was never a hard limit, and it was theoretically possible to store more data in durable tables. However, the storage size is dynamic in nature (even if the data size is not), since it is an append-only system with merge operations that can grow and shrink as the source files for merges go through the garbage collection process. Based on both the expected behavior of the system in various scenarios and our internal testing, we settled on the supported limit of 256GB.

With SQL Server 2016 we revamped the In-Memory OLTP storage subsystem, and as part of that work we removed the limit on number of checkpoint files, thereby removing any hard limit from supported data size. When SQL Server 2016 was initially released we stated that we supported up to 2TB of data in memory-optimized tables. This was the limit up to which we had tested, and found that the system worked well.

We have continued testing with large data sizes, and we have found that with 4TB of data in durable memory-optimized tables SQL Server 2016 continues to perform well. The machine in question had 5TB of memory total – the 1TB of overhead was used for operational needs such as supporting the online workload as well as database recovery. We have not found any scaling bottleneck when going from 2TB to 4TB, either in the throughput of the online workload, or with operations such as database recovery. We have thus decided to remove any limitation from our statement of supported data size. We will support any data size that your hardware can handle.

Further reading:

Columnstore Index vs BTree index


In the earlier blog post, why columnstore index, we discussed what a columnstore index is and why we need it. The columnstore storage model in SQL Server 2016 comes in two flavors: Clustered Columnstore Index (CCI) and Nonclustered Columnstore Index (NCCI). These indexes are quite different from traditional btree indexes. Here are the key differences:

  • No key column(s): This may come as a surprise – there are no key column(s), and yet these are considered indexes. The reason for not having key column(s) is that it would be very expensive to maintain row order based on key column(s). A rowstore is organized as rows in pages with an auxiliary structure, the row-offset table, that allows easy ordering of rows. The columnstore index, on the other hand, organizes data as columns and compresses each column into one or more segments; maintaining order would require either uncompressing a segment, inserting the row, and recompressing it, or maintaining the row(s) outside of the compressed segments. Since there are no key column(s), searching for qualifying rows, in the absence of nonclustered indexes, requires scanning the full columnstore index, which can get expensive unless one or more rowgroups can be eliminated based on filter conditions.
  • Heap vs Clustered Columnstore Index: One way to look at a clustered columnstore index is as a ‘heap’ that is organized as columns. Like a rowstore ‘heap’, there is no ordering of the rows. A nonclustered index leaf node refers to a row in the columnstore index as <rowgroup-id, row-number>, which is similar to how a row is referenced by RID <page-id, row-id> in a rowstore heap. When searching for a row through a nonclustered index on a CCI, the leaf row of the nonclustered index points to a <rowgroup-id, row-number>, and the row is then retrieved by accessing the referenced rowgroup.
  • Only One Columnstore Index: Unlike rowstore btree indexes, you can only create one columnstore index, either CCI or NCCI, on a table.
  • Index Fragmentation: A rowstore-based index is considered fragmented if (a) the physical order of pages is out of sync with the index-key order, or (b) the data pages (clustered index) or index pages (nonclustered index) are partially filled. A fragmented index leads to significantly higher physical IO and can put more pressure on memory, which can ultimately slow down queries. Most organizations run a periodic index maintenance job to defragment indexes; please refer to https://msdn.microsoft.com/en-us/library/ms189858.aspx#Fragmentation for best practices on how to maintain btree indexes. A columnstore index is considered fragmented if (a) 10% or more rows are marked as deleted in a compressed rowgroup, or (b) one or more smaller compressed rowgroups can be combined into a larger compressed rowgroup such that the resulting compressed rowgroup has at most 1 million rows. Note that if a compressed rowgroup has fewer than 1 million rows due to dictionary size, it is not considered fragmented, because there is nothing that can be done to increase its size. Also recall that a columnstore index consists of zero or more delta rowgroups, as shown in the picture below.

[Figure: columnstore index structure with delta and compressed rowgroups]

The rows within a delta rowgroup are organized as a regular btree rowstore and can get fragmented just like any other btree index, but we don’t consider this fragmentation because delta rowgroups are transitory – they eventually get compressed into compressed rowgroups. Please refer to https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/03/07/columnstore-index-defragmentation-using-reorganize-command/ for details on how to defragment a columnstore index, both NCCI and CCI.
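A quick sketch of the commands that post describes (the index and table names are placeholders):

-- Merge eligible compressed rowgroups and remove rows marked as deleted
ALTER INDEX cci_FactSales ON dbo.FactSales REORGANIZE

-- Additionally force all delta rowgroups to be compressed
ALTER INDEX cci_FactSales ON dbo.FactSales REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON)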

Thanks

Sunil Agarwal


Columnstore Index: Differences between Clustered/Nonclustered Columnstore Index


SQL Server 2016 provides two flavors of columnstore index: clustered (CCI) and nonclustered (NCCI). As shown in the simplified picture below, both indexes are organized as columns, but an NCCI is created on an existing rowstore table (right side of the picture), while a table with a CCI does not have an underlying rowstore table. Both kinds of tables can have one or more btree nonclustered indexes.

[Figure: clustered (CCI) vs. nonclustered (NCCI) columnstore index layout]

Other than this, the physical structures for storing data in delta and compressed rowgroups are identical, and both kinds of indexes share the same performance optimizations, including batch-mode operators. However, there are some key differences, and the table below lists the main ones.

[Table: main differences between CCI and NCCI]
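As a minimal sketch of how the two are created (table, column, and index names are placeholders):

-- NCCI: a columnstore index added on top of an existing rowstore table
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Orders
    ON dbo.Orders (OrderDate, Quantity, UnitPrice)

-- CCI: the columnstore is the primary storage; there is no underlying rowstore
CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales ON dbo.FactSales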

Thanks

Sunil Agarwal

 

Columnstore Index: Parallel load into clustered columnstore index from staging table


SQL Server has long supported parallel data load into a table using BCP, Bulk Insert and SSIS. The picture below shows a typical Data Warehouse configuration where data is loaded in parallel from external files using either BCP or SSIS.

[Figure: parallel data load from external files using BCP or SSIS]

Another common scenario of loading data is via a staging table. Data is first loaded into one or more staging tables, where it is cleansed and transformed, and then it is moved to the target table as shown below.

[Figure: data load via a staging table]

While this works well, one challenge is that although loading data into staging tables can be done in parallel, the data load from the staging table into the target table was single-threaded, which can slow down the overall ETL process. To give you an extreme example, let us say you are running on a 64-core machine and the last step in your ETL is to move data from the staging table to the target table. If the INSERT is single-threaded, only one of the 64 CPUs is used; ideally, you want all 64 cores working for a faster migration of data into the target table. With SQL Server 2016, you can move data from a staging table into a target table in parallel, which can reduce the overall data load time significantly.

Example: this example shows data migration from a staging table into a target table with a CCI, both with and without parallel insert.

-- Create a target table with CCI
select * into ccitest_temp from ccitest where 1=2
go
create clustered columnstore index ccitest_temp_cci on ccitest_temp
go

-- Move the data from the staging table into the target table.
-- You will note that it is single-threaded.
insert into ccitest_temp
select top 4000000 * from ccitest

Here is the query plan. You will note that the 4 million rows are being inserted single-threaded into the target table.

[Figure: query plan for the single-threaded insert]

This data movement took:

SQL Server Execution Times:
CPU time = 11735 ms, elapsed time = 11776 ms.

Here is how the rowgroups look. Note that there are 4 compressed rowgroups. These rowgroups were compressed directly because the number of rows was > 100k. Only one rowgroup has BULKLOAD as the trim reason, since the other three compressed rowgroups were not trimmed.

[Figure: rowgroups after the single-threaded insert]
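To see these rowgroup details on your own system, a query against the rowgroup DMV along these lines works (shown as a sketch):

SELECT row_group_id, state_desc, total_rows, trim_reason_desc
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('ccitest_temp')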

Now, let us run the same data migration but with TABLOCK.

-- Now use TABLOCK to get a parallel insert
insert into ccitest_temp with (TABLOCK)
select top 4000000 * from ccitest

Here is the query plan. You will note that the 4 million rows are now being inserted in parallel into the target table.

[Figure: query plan for the parallel insert]

This data movement took 7.9 seconds as compared to 11.7 seconds with the single-threaded plan – roughly a 33% reduction in elapsed time.

SQL Server Execution Times:
CPU time = 21063 ms, elapsed time = 7940 ms.

Here is how the rowgroups look. Note that there are 4 compressed rowgroups, each with 1 million rows, and the compression was triggered by BULKLOAD. The trim reason is BULKLOAD because each of the 4 threads (the machine I am testing on has 4 cores) ran out of rows after accumulating 1,000,000 rows and did not reach the magic number of 1,048,576.

[Figure: rowgroups after the parallel insert]

SQL Server 2016 requires the following conditions to be met for a parallel insert into a CCI:

  • Must specify TABLOCK
  • No NCI on the clustered columnstore index
  • No identity column
  • Database compatibility level is set to 130

While these restrictions are enforced in SQL Server 2016, they still cover the important scenarios, and we are looking into relaxing them in subsequent releases. Another interesting point is that you can also load into a ‘rowstore HEAP’ in parallel, as sketched below.
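A heap load follows the same TABLOCK pattern; here is a sketch reusing the staging table from above (ccitest_heap is just an illustrative target):

-- Heap target (no indexes); TABLOCK enables a parallel INSERT ... SELECT here as well
select * into ccitest_heap from ccitest where 1=2
go
insert into ccitest_heap with (TABLOCK)
select top 4000000 * from ccitest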

Thanks

Sunil Agarwal

JSON is generally available in Azure SQL Database!


JSON functionality is now generally available in Azure SQL Database! All functions that are available in SQL Server 2016 are also available in Azure SQL Database.

Azure SQL Database enables you to get values from JSON documents using the JSON_VALUE function, modify values in JSON text using the JSON_MODIFY function, transform JSON to a table using the OPENJSON function, or format SQL query results as JSON text using the FOR JSON clause:

[Figure: JSON functions in SQL Server and Azure SQL Database]

You can also index JSON values using NONCLUSTERED or Full-Text Search indexes.
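The usual pattern for a nonclustered index over a JSON value is a computed column; here is a sketch with hypothetical table and property names:

ALTER TABLE dbo.Products
    ADD Color AS JSON_VALUE(Data, '$.Color')

CREATE NONCLUSTERED INDEX ix_Products_Color ON dbo.Products (Color)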

JSON is available in all service tiers (Basic, Standard, and Premium), but only in the new SQL Database V12. You can see a quick introduction here, or more details on the Getting Started page. You can also find code samples that use JSON functions in Azure SQL Database in the official SQL Server/Azure SQL Database GitHub repository.

Note that the OPENJSON function requires database compatibility level 130. If all functions work except OPENJSON, you need to set the latest compatibility level on the database.
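Setting it is a one-liner (replace the database name with your own):

ALTER DATABASE [MyDatabase] SET COMPATIBILITY_LEVEL = 130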

Implementing Product Catalogs in SQL Server and Azure SQL database


A product catalog is one of the key scenarios in NoSQL systems. In a product catalog scenario, you need to store different types of products with different properties (e.g. phones have memory and CPU power; cars have a number of doors and a max speed; etc.).

If you try to model this in a relational database, you end up either with a single product table with a lot of sparse columns, where only 5-10 columns are used for any particular product type, or with a lot of tables (one for each product type) and joins across the many tables that contain the pieces of the data.

In NoSQL systems you can model products as JSON documents and put only the required key:value pairs in each product object. However, accessing key:value fields is slower than accessing columns directly.

SQL Server 2016 and Azure SQL Database, where the new JSON support is available, enable you to combine the best practices of the relational and NoSQL models. You can store common properties as standard table columns, put properties that are specific to some products in JSON documents, and choose whether to store some fields (e.g. tags, comments, etc.) as JSON collections or as separate tables.
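A minimal sketch of such a hybrid table (the names, columns, and constraints are illustrative, not the sample schema):

CREATE TABLE dbo.Product (
    ProductID int IDENTITY PRIMARY KEY,
    Name nvarchar(100) NOT NULL,   -- common property as a regular column
    Price money NOT NULL,          -- common property as a regular column
    Data nvarchar(max)             -- product-specific properties as JSON
        CONSTRAINT chk_Product_Data CHECK (ISJSON(Data) > 0),
    Tags nvarchar(max)             -- a JSON collection instead of a separate table
        CONSTRAINT chk_Product_Tags CHECK (ISJSON(Tags) > 0)
)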

If you want to see how to use this hybrid approach to design and index products in SQL Server 2016 and Azure SQL Database, see the following blog posts:

There are also sample apps in the SQL Server GitHub JSON samples that show how to implement a product catalog in SQL Server or Azure SQL Database. One app is an ASP.NET Core REST service that exposes products, and another is an ASP.NET Core web application that displays the products in a catalog; both use the JSON functionality in SQL Server 2016 and Azure SQL Database.

Finally, there is a new video on Channel 9 that explains how you can model products by combining relational and JSON data, and how to query or update the JSON fields:

Using JSON in SQL Server 2016 and Azure SQL Database | Data Exposed

If you need to design product catalogs and want to use the hybrid SQL+JSON approach, these resources and code samples might be helpful.

SQL Server 2016 – It Just Runs Faster: Always On Availability Groups Turbocharged


When we released Always On Availability Groups in SQL Server 2012 as a new and powerful way to achieve high availability, hardware environments included NUMA machines with low-end multi-core processors and SATA and SAN drives for storage (some SSDs). Performance issues surrounding Availability Groups were typically related to disk I/O or network speeds. As we moved towards SQL Server 2014, the pace of hardware accelerated. Our customers who deployed Availability Groups were now using servers for primary and secondary replicas with 12+ core sockets and flash storage SSD arrays providing microsecond to low-millisecond latencies. While we were confident in the design of SQL Server 2012, several customers reported performance problems that did not appear to be with disk subsystems, CPU, or networks. The rapid acceleration in technology brought on a new discovery and paradigm: disk I/O and CPU capacity were no longer the issue, and our design needed to scale and adapt to the modern hardware on the market. We needed to start thinking about how fast we can replicate to a synchronous secondary replica as a percentage of the speed of a standalone workload (one without a replica).

The result is a design for SQL Server 2016 that provides high availability for the most demanding workloads on the latest hardware with minimal impact and scalable for the future. Our design for SQL Server 2012 and SQL Server 2014 is still proven and meets the demands for many of our customers. However, if you are looking to accelerate your hardware, our Always On Availability Group design for SQL Server 2016 can keep pace.

First, we looked at the overall architecture of the replica design. Prior to SQL Server 2016, it could take as many as 15 worker thread context switches across the primary and secondary replicas to replicate a log block. With the super speed of a fast network and disks on the primary and secondary, we needed to streamline the design. Now the path can be as few as 8 worker thread context switches across both machines, provided the hardware can keep pace.

Other aspects of this new design include the ability for the LogWriter thread on the primary to directly submit network I/O to the secondary. Our communication workers can stream log blocks in parallel to the secondary and execute on hidden schedulers to avoid bottlenecks with other read workloads on the primary. On the secondary, we can spin up multiple LogWriter threads on NUMA machines and apply redo operations from the log in parallel. We also streamlined several areas of the code to avoid spinlock contention where it was not needed, and we improved our encryption algorithms (including taking advantage of AES-NI hardware) to ensure encryption could keep pace as well.

Our goal became clear: achieve 95% of the transaction log throughput of a standalone workload with a single synchronous secondary (90% if using encryption). The results we achieved were remarkable.

[Chart: log throughput vs. concurrent users for standalone, SQL Server 2014, SQL Server 2016, and SQL Server 2016 with encryption]

This chart shows our scaled results using an OLTP workload derived from TPC benchmarks. The blue line represents a standalone OLTP workload. The Y axis is throughput as measured by the Performance Monitor counter Databases:Log Bytes Flushed/Sec (if replicating log blocks is slow, the overall workload is slow and can’t push log bytes flushed on the primary). The X axis is the number of concurrent users as we pushed the workload. The yellow line represents throughput results for SQL Server 2014 with a single sync replica. The red line is SQL Server 2016 with a single sync replica, and the gray line is SQL Server 2016 with encryption. As we pushed the workload, SQL Server 2014 struggled to keep up with the scaling of a standalone workload, but SQL Server 2016 stays right with it, achieving our goal of 95%. And our scaling with encryption is right at the 90% line as compared to standalone.

These results are possible for anyone given a scalable hardware solution. Our tests used primary and secondary machines with Haswell-class, 2-socket, 18-core processors (72 logical CPUs with hyper-threading) and 384GB of RAM. Our transaction log was on a striped 4x800GB SSD and data on 4x1.8TB PCI-based SSD drives.

Performance is not the only reason to consider an upgrade to SQL Server 2016 for Always On Availability Groups. Consider these enhancements that make it a compelling choice:

  • Domain Independent Availability Groups. See more in this blog post.
  • Round-robin load balancing in readable secondaries
  • Increased number of auto-failover targets
  • Support for group-managed service accounts
  • Support for Distributed Transactions (DTC)
  • Basic HA in Standard edition
  • Direct seeding of new database replicas

So amp up your hardware and upgrade to SQL Server 2016 to see Always On Availability Groups turbocharged.

If you are new to SQL Server, get introduced to Always On Availability Groups with this video. Want to dive deeper? Learn more of the details from Kevin Farlee from this video or head right to our documentation at http://aka.ms/alwaysonavailabilitygroups.

Bob Ward
Principal Architect, Data Group, Tiger Team

In-Memory OLTP Videos: What it is and When/How to use it


In-Memory OLTP is the premier technology for optimizing the performance of transaction processing in SQL Server. Last week at Microsoft Ignite 2016 we presented two sessions about the In-Memory OLTP technology in SQL Server and Azure SQL Database. For those of you who did not attend the conference or did not make it to the sessions, here is a brief recap with links to the videos. At the bottom of this post you will find links to the demos used in the sessions, as well as further resources for In-Memory OLTP.

What is In-Memory OLTP?

Session title: Review In-Memory OLTP in SQL Server 2016 and Azure SQL Database

Link to the video: https://youtu.be/PuZ--v4c6HI

Recap of the session:

We explain why Microsoft decided to build the In-Memory OLTP feature. We go on to discuss the value proposition of In-Memory OLTP, as well as the key aspects of the technology that result in such great performance optimization of transactional workloads:

  • New data structures and data access methods built around the assumption that the active working set resides in memory
  • Lock- and latch-free implementation that provides high scalability
  • Native compilation of T-SQL modules for more efficient transaction processing

The demo (starting 34:22) illustrates the potential performance optimization you can achieve for transaction processing workloads, as well as the tools available in SSMS to get started with In-Memory OLTP in an existing application. We go on to review indexes and index recommendations for memory-optimized tables (starting 50:18). Finally, we review all the new features for In-Memory OLTP in SQL Server 2016 as well as Azure SQL Database (starting 1:03:11) that make it easier to adopt the technology and manage applications using it.

When and How to use In-Memory OLTP?

Session title: Explore In-Memory OLTP architectures and customer case studies

Link to the video: https://youtu.be/h111hyt5Ndk

Recap of the session:

This session addresses the when and how to use In-Memory OLTP from two different angles:
a) listing characteristics that make a workload suitable and those that are not so suitable for In-Memory OLTP
b) reviewing common application patterns and actual customer uses of In-Memory OLTP

We start the session with a brief recap of In-Memory OLTP, followed by:

  • 11:07 – discussion of durability options in SQL Server and impact on performance
  • 17:44  – demo illustrating performance with different levels of durability
  • 25:10 – when to use In-Memory OLTP – app characteristics indicating In-Memory OLTP may or may not be suitable
  • 34:52 – scenarios and case studies – common scenarios for using In-Memory OLTP with some example customer case studies and architecture diagrams

The scenarios we review are:

  • High throughput OLTP with low latency (37:08)
  • Shock-absorber for concurrent data load (42:42)
  • Internet of Things (IoT) data ingestion and analytics (49:22) with a demo (55:00) illustrating the use of memory-optimized temporal tables to both support ingesting and analyzing high volumes of IoT data, and also manage the memory footprint through automatic offload to disk of historical data.
  • Session state and caching (1:01:52)
  • Tempdb replacement (1:06:26), showing the benefits of replacing traditional table variables and temp tables with memory-optimized table variables and SCHEMA_ONLY tables
  • ETL – staging tables for data load and transformation (1:11:40)

Resources

Demos used in the session

  • In-Memory OLTP perf demo: this client application is used in both sessions, first to illustrate the potential perf benefits of In-Memory OLTP, and then to show the perf implications of using various durability settings. The script that comes with the demo uses a standard NONCLUSTERED index for reasons of convenience – to show the highest possible perf numbers, use a NONCLUSTERED HASH index instead, with a BUCKET_COUNT of about 10,000,000 (see the sketch after this list).
  • Memory-optimized table variables and temp tables: this blog post includes all demo scripts and instructions used in the session. In addition, it has instructions on how to start memory-optimizing table variables and temp tables in your applications.
  • IoT SmartGrid sample: this sample illustrates the use of SQL Server temporal memory-optimized tables to handle the load of IoT devices (in this case smart meters) being ingested into the database for reporting and analytics purposes
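For reference, a memory-optimized table with such a hash index looks roughly like this (names and columns are illustrative, not the demo schema):

CREATE TABLE dbo.PerfDemo (
    Id bigint NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 10000000),
    Payload nvarchar(256) NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)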

Samples

Documentation

Migrating apps

 

Find me on Twitter: @jdebruijn

Download and try some of the demos yourself, and let me know what you think!

Columnstore Index: In-Memory Analytics (i.e. columnstore index) Videos from Ignite 2016


I presented two talks on columnstore index at the Microsoft Ignite Conference 2016 in Atlanta, GA. The talks focused on describing the new enhancements to the columnstore index in SQL Server 2016, as well as sharing three customer success stories. Here is a high-level overview of each session and the video recording of the session.

Review ColumnStore Index in SQL Server 2016 and Azure SQL Database: The updateable clustered columnstore in Microsoft SQL Server 2016 offers a leading solution for your data warehouse workload, with an order of magnitude better data compression and query performance over traditional B-Tree based schemas. This session describes columnstore index internals, with deep insight into the data compression methodology for achieving high query performance, including improvements in the columnstore investments for SQL Server 2016 and Microsoft Azure SQL Database.

Link to the recorded session – https://www.youtube.com/watch?v=CTtzyyX3HQA

We used AdventureWorksDW2016CTP3 database for some of the demos.

Hear customer success stories with columnstore index in SQL Server 2016: In-memory analytics using columnstore index provides industry-leading performance for analytics workloads. This session covers some key customer workloads that have been successfully deployed in production, both for in-memory analytics and real-time operational analytics with SQL Server 2016. For each workload, the customer describes the scenario, the learnings, and the performance achieved. We had three customers – M-Files (https://www.m-files.com/en/intelligent-metadata-layer), Daman (https://www.damanhealth.ae/opencms/opencms/Daman/en/home/) and First American (http://www.firstam.com/) – sharing their applications and how they were able to leverage the columnstore index in SQL Server 2016 to significantly improve the experience of their users and customers. I am very confident that the customer testimonials will help you in your decision to move your application to the columnstore index in SQL Server 2016.

Link to recorded session – https://www.youtube.com/watch?v=h_CspR86SB8

Thanks,

Sunil Agarwal


Columnstore Index: Should I partition my columnstore index?


Table partitioning is a perfect way to manage large tables, especially in the context of a Data Warehouse (DW), which can be very large (think TBs). Table partitioning helps both in managing large amounts of data and in improving query performance by eliminating partitions that are not required. For example, if a FACT table stores SALES information for an organization, a common query pattern is to look at SALES data from the last week, last month, or last quarter, and so on. In this case, partitioning on, say, a weekly boundary can be helpful, as SQL Server will eliminate the other partitions during query execution.

The interesting thing with a columnstore index is that it is implicitly partitioned by rowgroups, an internal physical organization. Assuming incoming data arrives in date/time order, each compressed rowgroup is indirectly ordered by date/time. The columnstore index maintains Min/Max values for most columns within a rowgroup, so in this case the columnstore knows exactly the date/time range of the rows in each rowgroup. Now, if you run a query to look at the sales data from the last quarter, SQL Server can find those rows efficiently by looking at the metadata for each rowgroup and eliminating the ones that are out of range. This begs the question of whether you should even consider partitioning a table with a CCI, since it is already implicitly partitioned. The short answer is ‘Yes’, as illustrated below:

  • If there are a significant number of updates to compressed data, the implicit date/time ordering begins to dilute, because updated rows are inserted into the delta store. For example, if you update a row from the previous month, it gets deleted from its rowgroup and re-inserted along with the latest inserts. By partitioning the data, you can guarantee that all the data for a given week, month, or quarter stays in its respective partition.
  • If you query the date/time data along with some other attribute. For example, suppose you have three product lines (P1, P2 and P3) and you are capturing SALES data in date/time order for all three product lines. If you want to query the SALES of the last quarter for product line P1, SQL Server will need to process and eliminate the data for P2 and P3. Normally this is not an issue, because columnstore can process the data really fast. However, if you have a very large number of rows for each product line and your normal query pattern is to run analytics for a specific product line, then partitioning your columnstore index on product line can be beneficial.
  • Index maintenance can be targeted at the partition level. Similarly, other benefits, like marking filegroups containing older partitions as read-only, allow you to control the size of your incremental backups.
  • You can compress older partitions with the COLUMNSTORE_ARCHIVE option, which gives you an additional 30% in storage savings at the cost of slower query performance – which may be acceptable since older partitions may not be queried as often (see the sketch after this list).
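As a sketch of that last point (the table name and partition numbers are placeholders), archival compression can be applied per partition:

-- Rebuild only the oldest partition with archival compression
ALTER TABLE dbo.FactSales
    REBUILD PARTITION = 1
    WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)

-- Newer, frequently queried partitions stay on regular columnstore compression
ALTER TABLE dbo.FactSales
    REBUILD PARTITION = 2
    WITH (DATA_COMPRESSION = COLUMNSTORE)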

Thanks

Sunil Agarwal

SQL Server 2016 Express Edition in Windows containers


We are excited to announce the public availability of SQL Server 2016 Express Edition in Windows Containers! The image is now available on Docker Hub and the build scripts are hosted on our SQL Server Samples GitHub repository. This image can be used in both Windows Server Containers as well as Hyper-V Containers.

SQL Server 2016 Express Edition Docker Image | Installation Scripts

Please follow this blog post for detailed instructions on how to get started with SQL Server 2016 Express in Windows Containers.

Columnstore index: Why do we refer to it as In-Memory Analytics?

Columnstore Index: Which Columnstore Index is right for my workload?

In-Memory OLTP: Is your database just in memory or actually optimized for memory?
