SQL Database Engine Blog



CREATE OR ALTER – another great language enhancement in SQL Server 2016 SP1


We are happy to announce that SQL Server 2016 SP1 and SQL Server v.Next have a new T-SQL language statement – CREATE [OR ALTER]. This statement combines the CREATE and ALTER statements: it creates the object if it does not exist, or alters it if it is already there. CREATE OR ALTER can be applied to the following objects: stored procedures, functions, triggers, and views.

This is one of the most-voted language feature requests on the SQL Server Connect site, with more than 500 votes. Code that uses CREATE OR ALTER is shown in the following example:

create or alter procedure procTest
as
begin
 print (1)
end;
go
create or alter function fnTest()
returns int
as
begin
 return(1)
end;
go
create or alter view vwTest
as
 select 1 as col;
go
create or alter trigger trTest
on Product
after insert, update
as
 RAISERROR ('We love CREATE OR ALTER!', 1, 10);

We hope that this statement will help you write simpler T-SQL code.

EDIT (11/17/2016): This and other SQL Server 2016 SP1 topics are also available in the SQL Server Tiger blog.

In-Memory OLTP in Standard and Express editions, with SQL Server 2016 SP1


We just announced the release of Service Pack 1 for SQL Server 2016. With SP1 we made a push to bring a consistent programming surface area across all editions of SQL Server. One of the outcomes is that In-Memory OLTP (aka Hekaton), the premier performance technology for transaction processing, data ingestion, data load, and transient data scenarios, is now available in SQL Server Standard Edition and Express Edition, as long as you have SQL Server 2016 SP1.

In this blog post we recap what the technology is. We then describe the resource/memory limitations in Express and Standard Edition. We go on to describe the scenarios for which you’d want to consider In-Memory OLTP. We conclude with a sample script illustrating the In-Memory OLTP objects, and some pointers to get started.

How does In-Memory OLTP work?

In-Memory OLTP can provide great performance gains, for the right workloads. One of our customers managed to achieve 1.2 Million requests per second with a single machine running SQL Server 2016, leveraging In-Memory OLTP.

Now, where does this performance gain come from? In essence, In-Memory OLTP improves performance of transaction processing by making data access and transaction execution more efficient, and by removing lock and latch contention between concurrently executing transactions: it is not fast because it is in-memory; it is fast because it is optimized around the data being in-memory. Data storage, access, and processing algorithms were redesigned from the ground up to take advantage of the latest enhancements in in-memory and high concurrency computing.

Now, just because data lives in-memory does not mean you lose it when there is a failure. By default, all transactions are fully durable, meaning that you have the same durability guarantees you get for any other table in SQL Server: as part of transaction commit, all changes are written to the transaction log on disk. If there is a failure at any time after the transaction commits, your data is there when the database comes back online. In addition, In-Memory OLTP works with all high availability and disaster recovery capabilities of SQL Server, like AlwaysOn, backup/restore, etc.

To leverage In-Memory OLTP in your database, you use one or more of the following types of objects:

  • Memory-optimized tables are used for storing user data. You declare a table to be memory-optimized at create time.
  • Non-durable tables are used for transient data, either for caching or for intermediate result sets (replacing traditional temp tables). A non-durable table is a memory-optimized table that is declared with DURABILITY=SCHEMA_ONLY, meaning that changes to these tables do not incur any IO. This avoids consuming log IO resources for cases where durability is not a concern.
  • Memory-optimized table types are used for table-valued parameters (TVPs), as well as intermediate result sets in stored procedures. These can be used instead of traditional table types. Table variables and TVPs that are declared using a memory-optimized table type inherit the benefits of non-durable memory-optimized tables: efficient data access, and no IO.
  • Natively compiled T-SQL modules are used to further reduce the time taken for an individual transaction by reducing CPU cycles required to process the operations. You declare a Transact-SQL module to be natively compiled at create time. At this time, the following T-SQL modules can be natively compiled: stored procedures, triggers and scalar user-defined functions.

In-Memory OLTP is built into SQL Server, and starting with SP1, you can use all these objects in any edition of SQL Server. And because these objects behave very similarly to their traditional counterparts, you can often gain performance benefits while making only minimal changes to the database and the application. Plus, you can have both memory-optimized and traditional disk-based tables in the same database, and run queries across the two. You will find a Transact-SQL script showing an example of each of these types of objects towards the end of this post.

Memory quota in Express and Standard Editions

In-Memory OLTP includes memory-optimized tables, which are used for storing user data. These tables are required to fit in memory. Therefore, you need to ensure you have enough memory for the data stored in memory-optimized tables. In addition, both Standard Edition and Express Edition impose a per-database quota on data stored in memory-optimized tables.

To estimate memory size required for your data, consult the topic Estimate Memory Requirements for Memory-Optimized Tables.

These are the per-database quotas for In-Memory OLTP for all SQL Server editions, with SQL Server 2016 SP1:

Edition      In-Memory OLTP quota (per database)
Express      352 MB
Web          16 GB
Standard     32 GB
Developer    Unlimited
Enterprise   Unlimited

The following items count towards the database quota:

  • Active user data rows in memory-optimized tables and table variables. Note that old row versions do not count toward the cap.
  • Indexes on memory-optimized tables.
  • Operational overhead of ALTER TABLE operations, which can be up to the full table size.

If an operation causes the database to hit the cap, the operation will fail with an out-of-quota error:

Msg 41823, Level 16, State 171, Line 6
Could not perform the operation because the database has reached its quota for in-memory tables. See 'http://go.microsoft.com/fwlink/?LinkID=623028' for more information.

* Note: at the time of writing, this link points to an article about In-Memory OLTP in Azure SQL Database, which shares the same quota mechanism as SQL Server Express and Standard edition. We’ll update that article to discuss quotas in SQL Server as well.

If this happens, you will no longer be able to insert or update data, but you can still query it. Mitigations are to delete data or to upgrade to a higher edition. In the end, how much memory you need depends to a large extent on how you use In-Memory OLTP. The next section has details about usage patterns, as well as some pointers to ways you can manage the in-memory footprint of your data.

You can monitor memory utilization through DMVs as well as Management Studio. Details are in the topic Monitor and Troubleshoot Memory Usage. Note that memory reported in these DMVs and reports can be slightly higher than the quota, since they include memory required for old row versions. Old row versions do count toward the overall memory utilization and you need to provision enough memory to handle them, but they do not count toward the quota in Express and Standard editions.
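For a quick look at per-table memory consumption, you can also query the sys.dm_db_xtp_table_memory_stats DMV directly; a minimal sketch (run in the context of the user database):

-- memory consumed by each memory-optimized table and its indexes, largest first
SELECT OBJECT_NAME(object_id) AS table_name,
       memory_used_by_table_kb,
       memory_used_by_indexes_kb
FROM sys.dm_db_xtp_table_memory_stats
ORDER BY memory_used_by_table_kb DESC;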

Usage scenarios for In-Memory OLTP

In-Memory OLTP is not a magic go-fast button, and is not suitable for all workloads. For example, memory-optimized tables will not really bring down your CPU utilization if most of the queries are performing aggregation over large ranges of data – Columnstore helps for that scenario.

Here is a list of scenarios and application patterns where we have seen customers be successful with In-Memory OLTP.

High-throughput and low-latency transaction processing

This is really the core scenario for which we built In-Memory OLTP: support large volumes of transactions, with consistent low latency for individual transactions.

Common workload scenarios are: trading of financial instruments, sports betting, mobile gaming, and ad delivery. Another common pattern we’ve seen is a “catalog” that is frequently read and/or updated. One example is where you have large files, each distributed over a number of nodes in a cluster, and you catalog the location of each shard of each file in a memory-optimized table.

Implementation considerations

Use memory-optimized tables for your core transaction tables, i.e., the tables with the most performance-critical transactions. Use natively compiled stored procedures to optimize execution of the logic associated with the business transaction. The more of the logic you can push down into stored procedures in the database, the more benefit you will see from In-Memory OLTP.

To get started in an existing application, use the transaction performance analysis report to identify the objects you want to migrate, and use the memory-optimization and native compilation advisors to help with migration.

Data ingestion, including IoT (Internet-of-Things)

In-Memory OLTP is really good at ingesting large volumes of data from many different sources at the same time. And it is often beneficial to ingest data into a SQL Server database compared with other destinations, because SQL makes running queries against the data really fast, and allows you to get real-time insights.

Common application patterns are: Ingesting sensor readings and events, to allow notification, as well as historical analysis. Managing batch updates, even from multiple sources, while minimizing the impact on the concurrent read workload.

Implementation considerations

Use a memory-optimized table for the data ingestion. If the ingestion consists mostly of inserts (rather than updates) and the In-Memory OLTP storage footprint of the data is a concern, either use a temporal memory-optimized table to keep only the recent (hot) data in memory, or periodically offload older data to disk-based tables.

The following sample is a smart grid application that uses a temporal memory-optimized table, a memory-optimized table type, and a natively compiled stored procedure, to speed up data ingestion, while managing the In-Memory OLTP storage footprint of the sensor data: release and source code.

Caching and session state

The In-Memory OLTP technology makes SQL really attractive for maintaining session state (e.g., for an ASP.NET application) and for caching.

ASP.NET session state is a very successful use case for In-Memory OLTP. With SQL Server, one customer was able to achieve 1.2 million requests per second. In the meantime, they have started using In-Memory OLTP for the caching needs of all mid-tier applications in the enterprise. Details: https://blogs.msdn.microsoft.com/sqlcat/2016/10/26/how-bwin-is-using-sql-server-2016-in-memory-oltp-to-achieve-unprecedented-performance-and-scale/

Implementation considerations

You can use non-durable memory-optimized tables as a simple key-value store by storing a BLOB in a varbinary(max) column. Alternatively, you can implement a semi-structured cache with JSON support in SQL Server. Finally, you can create a full relational cache through non-durable tables with a full relational schema, including various data types and constraints.
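As an illustration, here is a minimal sketch of such a key-value cache (table and column names are hypothetical). Note that in SQL Server 2016, character columns in an index key on a memory-optimized table must use a BIN2 collation:

-- non-durable key-value cache; contents do not survive a server restart
CREATE TABLE dbo.KeyValueCache (
    CacheKey NVARCHAR(100) COLLATE Latin1_General_100_BIN2 NOT NULL
        PRIMARY KEY NONCLUSTERED,
    CacheValue VARBINARY(MAX) NOT NULL,
    ExpiresAt DATETIME2 NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);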

Get started with memory-optimizing ASP.NET session state by leveraging the scripts published on GitHub to replace the objects created by the built-in session state provider.

Tempdb object replacement

Leverage non-durable tables and memory-optimized table types to replace your traditional tempdb-based #temp tables, table variables, and table-valued parameters.

Memory-optimized table variables and non-durable tables typically reduce CPU utilization and completely remove log IO, when compared with traditional table variables and #temp tables.
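For example (a sketch with hypothetical names), a memory-optimized table type can stand in for a traditional table variable:

CREATE TYPE dbo.OrderIdList AS TABLE (
    OrderId INT NOT NULL PRIMARY KEY NONCLUSTERED
) WITH (MEMORY_OPTIMIZED = ON);
GO
-- the variable below lives in memory and incurs no tempdb or log IO
DECLARE @ids dbo.OrderIdList;
INSERT INTO @ids (OrderId) VALUES (1), (2), (3);
SELECT OrderId FROM @ids;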

Case study illustrating benefits of memory-optimized table-valued parameters: https://blogs.msdn.microsoft.com/sqlserverstorageengine/2016/04/07/a-technical-case-study-high-speed-iot-data-ingestion-using-in-memory-oltp-in-azure/

Implementation considerations

To get started see: Improving temp table and table variable performance using memory optimization.

ETL (Extract Transform Load)

ETL workflows often include load of data into a staging table, transformations of the data, and load into the final tables.

Implementation considerations

Use non-durable memory-optimized tables for the data staging. They completely remove all IO, and make data access more efficient.

If you perform transformations on the staging table as part of the workflow, you can use natively compiled stored procedures to speed up these transformations. If you can do these transformations in parallel you get additional scaling benefits from the memory-optimization.
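A minimal sketch of a memory-optimized staging table for such a workflow (names are hypothetical):

-- SCHEMA_ONLY staging: the data is transient, so no log or data-file IO is incurred
CREATE TABLE dbo.SalesStaging (
    RowId INT IDENTITY NOT NULL PRIMARY KEY NONCLUSTERED,
    RawRecord NVARCHAR(MAX) NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);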

Getting started

Before you can start using In-Memory OLTP, you need to create a MEMORY_OPTIMIZED_DATA filegroup. In addition, we recommend using database compatibility level 130 and setting the database option MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT to ON.

You can use the script at the following location to create the filegroup in the default data folder, and set the recommended settings:
https://raw.githubusercontent.com/Microsoft/sql-server-samples/master/samples/features/in-memory/t-sql-scripts/enable-in-memory-oltp.sql

The following script illustrates In-Memory OLTP objects you can create in your database:

-- configure recommended DB option
 ALTER DATABASE CURRENT SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT=ON
 GO
 -- memory-optimized table
 CREATE TABLE dbo.table1
 ( c1 INT IDENTITY PRIMARY KEY NONCLUSTERED,
   c2 NVARCHAR(MAX))
 WITH (MEMORY_OPTIMIZED=ON)
 GO
 -- non-durable table
 CREATE TABLE dbo.temp_table1
 ( c1 INT IDENTITY PRIMARY KEY NONCLUSTERED,
   c2 NVARCHAR(MAX))
 WITH (MEMORY_OPTIMIZED=ON,
       DURABILITY=SCHEMA_ONLY)
 GO
 -- memory-optimized table type
 CREATE TYPE dbo.tt_table1 AS TABLE
 ( c1 INT IDENTITY,
   c2 NVARCHAR(MAX),
   is_transient BIT NOT NULL DEFAULT (0),
   INDEX ix_c1 HASH (c1) WITH (BUCKET_COUNT=1024))
 WITH (MEMORY_OPTIMIZED=ON)
 GO
 -- natively compiled stored procedure
 CREATE PROCEDURE dbo.usp_ingest_table1
   @table1 dbo.tt_table1 READONLY
 WITH NATIVE_COMPILATION, SCHEMABINDING
 AS
 BEGIN ATOMIC
     WITH (TRANSACTION ISOLATION LEVEL=SNAPSHOT,
           LANGUAGE=N'us_english')

   DECLARE @i INT = 1

   WHILE @i > 0
   BEGIN
     INSERT dbo.table1
     SELECT c2
     FROM @table1
     WHERE c1 = @i AND is_transient=0

     IF @@ROWCOUNT > 0
       SET @i += 1
     ELSE
     BEGIN
       INSERT dbo.temp_table1
       SELECT c2
       FROM @table1
       WHERE c1 = @i AND is_transient=1

       IF @@ROWCOUNT > 0
         SET @i += 1
       ELSE
         SET @i = 0
     END
   END

 END
 GO
 -- sample execution of the proc
 DECLARE @table1 dbo.tt_table1
 INSERT @table1 (c2, is_transient) VALUES (N'sample durable', 0)
 INSERT @table1 (c2, is_transient) VALUES (N'sample non-durable', 1)
 EXECUTE dbo.usp_ingest_table1 @table1=@table1
 SELECT c1, c2 from dbo.table1
 SELECT c1, c2 from dbo.temp_table1
 GO

A perf demo using In-Memory OLTP can be found at: in-memory-oltp-perf-demo-v1.0.

Try In-Memory OLTP in SQL Server today!

Resources to get started:


Introducing Batch Mode Adaptive Memory Grant Feedback


SQL Server uses memory to store in-transit rows for hash join and sort operations. When a query execution plan is compiled for a statement, SQL Server estimates both the minimum required memory needed for execution and the ideal memory grant size needed to have all rows in memory.  This memory grant size is based on the estimated number of rows for the operator and the associated average row size.  If the cardinality estimates are inaccurate, performance can suffer:

  • For cardinality under-estimates, the memory grant can end up being too small and the rows then spill to disk, causing significant performance degradation compared to a fully memory-resident equivalent.
  • For cardinality over-estimates, the memory grant can be too large and the memory goes to waste. Concurrency can be impacted because the query may wait in a queue until enough memory becomes available, even though the query only ends up using a small portion of the granted memory.

You can sometimes address cardinality misestimates through a variety of methods, such as statistics management (update frequency, increased sample size), multi-column statistics, intermediate result sets instead of one single complex query, or avoiding constructs such as table variables that have fixed cardinality estimates. But some scenarios where poor estimates impact memory grant sizing are difficult to address directly without significant refactoring of the statement or the use of hints such as MIN_GRANT_PERCENT and MAX_GRANT_PERCENT.
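For reference, these hints are applied per query via OPTION; a minimal sketch against a hypothetical table:

-- cap this query's memory grant at 10% of the maximum available query memory
SELECT CustomerKey, SUM(SalesAmount) AS total_sales
FROM dbo.FactSales
GROUP BY CustomerKey
ORDER BY total_sales DESC
OPTION (MAX_GRANT_PERCENT = 10);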

When it comes to improving cardinality estimation techniques, there is no one single approach that works for all possible statements.  With that in mind, the Query Processing team has been working on a new wave of adaptive query processing improvements to handle the more intractable cardinality estimation issues that often result in poor query performance.

Batch mode adaptive memory grant feedback is the first improvement under the adaptive query processing family of features to be surfaced in the public preview of the next release of SQL Server on Linux and Windows.  You can now test this feature for non-production workloads and this feature will also be surfaced in Azure SQL DB in a future update.

Addressing repeating workloads, this improvement recalculates the actual memory required for a query and then updates the grant value for the cached plan. When an identical query statement is executed, it will use the revised memory grant size, reducing excessive memory grants that impact concurrency and fixing under-estimated memory grants that cause expensive spills to disk.

What kind of results can we expect to see?

One internal Microsoft customer runs a recurring process that generates summarized results from a very large telemetry data set. They query this large data set fifteen times, each time pulling different metrics. Of the fifteen separate queries, fourteen encounter spills to disk due to memory grant misestimates.

The following graph shows one example of using batch mode adaptive memory grant feedback. For the first execution of the customer’s query, duration was 88 seconds due to high spills:

DECLARE @EndTime datetime = '2016-09-22 00:00:00.000';
DECLARE @StartTime datetime = '2016-09-15 00:00:00.000';

SELECT TOP 10 hash_unique_bigint_id
FROM dbo.TelemetryDS
WHERE Timestamp BETWEEN @StartTime and @EndTime
GROUP BY hash_unique_bigint_id
ORDER BY MAX(max_elapsed_time_microsec) DESC

Memory Grant Feedback - Figure 1
With memory grant feedback enabled, for the second execution, duration is 1 second (down from 88 seconds) and we see spills are removed entirely and the grant is higher:
Memory Grant Feedback - Figure 2

How does batch mode adaptive memory grant feedback work?

For excessive grants, if the granted memory is more than two times the size of the actual used memory, memory grant feedback will recalculate the memory grant and update the cached plan.  Plans with memory grants under 1MB will not be recalculated for overages.

For insufficiently sized memory grants that result in a spill to disk for batch mode operators, memory grant feedback will trigger a recalculation of the memory grant. Spill events are reported to memory grant feedback and can be surfaced via the spilling_report_to_memory_grant_feedback XEvent event. This event returns the node id from the plan and spilled data size of that node.

Memory Grant Feedback - Figure 3

Can I see the adjusted memory grant in my execution plan?

Yes.  The adjusted memory grant will show up in the actual (post-execution) plan via the “GrantedMemory” property.  You can see this property in the root operator of the graphical showplan or in the showplan XML output:

<MemoryGrantInfo SerialRequiredMemory="1024" SerialDesiredMemory="10336" RequiredMemory="1024" DesiredMemory="10336" RequestedMemory="10336" GrantWaitTime="0" GrantedMemory="10336" MaxUsedMemory="9920" MaxQueryMemory="725864" />

Memory Grant Feedback - Figure 4
How do I enable batch mode adaptive memory grant feedback?

To have your workloads automatically eligible for this improvement, enable compatibility level 140 for the database.  For CTP1, you can set this using Transact-SQL.  For example:

ALTER DATABASE [WideWorldImportersDW] SET COMPATIBILITY_LEVEL = 140;

What if my memory grant requirements differ significantly based on parameter values of consecutive executions?

Different parameter values may also require different query plans in order to remain optimal. This type of query is defined as “parameter-sensitive.” For parameter-sensitive plans, memory grant feedback will disable itself on a query if it has unstable memory requirements.  The plan is disabled after several repeated runs of the query and this can be observed by monitoring the memory_grant_feedback_loop_disabled XEvent.

Does this feature help singleton executions?

Feedback can be stored in the cached plan after a single execution; however, it is the consecutive executions of that statement that benefit from the memory grant feedback adjustments. This improvement applies to repeated execution of statements.

How can I track when batch mode adaptive memory grant feedback is used?

You can track memory grant feedback events using the memory_grant_updated_by_feedback XEvent event.  This event tracks the current execution count history, the number of times the plan has been updated by memory grant feedback, the ideal additional memory grant before modification and the ideal additional memory grant after memory grant feedback has modified the cached plan.

Memory Grant Feedback - Figure 5
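If you want to capture these events yourself, here is a sketch of an Extended Events session covering the feedback events mentioned in this post (the session and file names are placeholders):

CREATE EVENT SESSION mgf_tracking ON SERVER
ADD EVENT sqlserver.memory_grant_updated_by_feedback,
ADD EVENT sqlserver.spilling_report_to_memory_grant_feedback,
ADD EVENT sqlserver.memory_grant_feedback_loop_disabled
ADD TARGET package0.event_file (SET filename = N'mgf_tracking.xel');
GO
ALTER EVENT SESSION mgf_tracking ON SERVER STATE = START;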

Why batch mode and not row mode for this improvement?

We are starting with batch mode, however we are looking to expand this to row mode as well for a future update.

Does this improvement work with Resource Governor and memory grant query hints?

The actual memory granted honors the query memory limit determined by resource governor or query hint.

What if the plan is evicted from cache?

Feedback is not persisted if the plan is evicted from cache.

Will this improvement work if I use OPTION (RECOMPILE)?

A statement using OPTION(RECOMPILE) will create a new plan and not cache it. Since it is not cached, no memory grant feedback is produced and it is not stored for that compilation and execution.  However, if an equivalent statement (i.e. with the same query hash) that did not use OPTION(RECOMPILE) was cached and then re-executed, the consecutive statement can benefit from memory grant feedback.

Will Query Store capture changes to the memory grant?

Memory grant feedback will only change the cached plan. Changes are not captured in Query Store for this version.

I’m interested in the other adaptive query processing features.  How can I stay informed, provide feedback and learn about how (and when) I can test my own workloads?

Please sign up for the adaptive query processing preview here: https://aka.ms/AdaptiveQPPreview

We’ll keep in touch with customers who fill out the survey and we will contact you regarding testing and feedback opportunities that will surface in early 2017.

Transaction Commit latency acceleration using Storage Class Memory in Windows Server 2016/SQL Server 2016 SP1


SQL Server 2016 SP1 adds a significant new performance feature, the ability to accelerate transaction commit times (latency) by up to 2-4X, when employing Storage Class Memory (NVDIMM-N nonvolatile storage). This scenario is also referred to as “persistent log buffer” as explained below.

This enhancement is especially valuable for workloads which require high frequency, low latency update transactions. These app patterns are common in the finance/trading industry as well as online betting and some process control applications.

One of the most significant performance bottlenecks for low latency OLTP transactions is writing to the transaction log. This is especially true when employing In-Memory OLTP tables which remove most other bottlenecks from high update rate applications. This feature works to remove that final bottleneck from the overall system, enabling stunning increases in overall speed. Previously, customers that needed these transaction speeds leveraged features such as delayed durability (grouping several transaction commits into a single write, which amortizes the IO overhead, but can put a small number of commits at risk if the write fails after the commit is acknowledged), or In-Memory OLTP with non-durable tables for applications where durability is not required, such as ASP Session state cache, or the midpoint in an ETL pipeline, where the data source can be recreated.
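For reference, delayed durability is controlled at the database level, with an optional per-commit opt-in; a minimal sketch, with a placeholder database name:

ALTER DATABASE [MyDB] SET DELAYED_DURABILITY = ALLOWED;
GO
-- opt in for an individual transaction:
BEGIN TRANSACTION;
-- ... work ...
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);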

Results:

As a sample of the results obtained, in one simple end to end SQL example of an In-Memory OLTP application, we compared the results when the transaction log was located on NVMe SSD (the fastest class of non-memory storage), against the same configuration with the SCM persistent log buffer configured. Each configuration was run for 20 seconds to collect throughput information:

                                             Rows updated   Updates per second   Avg. time per transaction (ms)
Log on NVMe                                  1,226,069      63,214               0.380
Log on NVMe with SCM persistent log buffer   2,509,273      125,417              0.191

As you can see, the persistent log buffer configuration was approximately 2X faster than putting the log on the fastest available storage. You can see the full video of this demo here:

https://channel9.msdn.com/Shows/Data-Exposed/SQL-Server-2016-and-Windows-Server-2016-SCM–FAST

How it works:

Storage Class Memory is surfaced by Windows Server 2016 as a disk device with special characteristics. When the filesystem is formatted as Direct Access Mode (DAX), the operating system allows byte-level access to this persistent memory.

SQL Server assembles transaction log records in a buffer, and flushes them out to durable media during commit processing. SQL will not complete the commit until the commit log record is durably stored on media. This hard flush can delay processing in very high transaction rate (low latency) systems.

With this new functionality, we use a region of memory which is mapped to a file on a DAX volume to hold that buffer. Since the memory hosted by the DAX volume is already persistent, we have no need to perform a separate flush, and can immediately continue with processing the next operation. Data is flushed from this buffer to more traditional storage in the background.

Example

The first diagram shows how traditional log processing works. Log records are copied into a buffer in memory until either the buffer is filled, or a commit record is encountered. When a commit record is encountered, the buffer must immediately be written to disk in the transaction log file, in order to complete the commit operation. This happens regardless of how full the log buffer is. If another commit comes in during the IO processing, it waits until the IO is complete.

 

before2

With this new feature, the log records are fully durable as soon as they are copied into the log buffer, so there is no need to write them to disk immediately. This means that you can collect potentially many commits into the same IO, which is the same as is done with Delayed Durability, but with the difference that there is no exposure to data loss. The log buffer is written to the transaction log when it fills up, just as it would if there didn’t happen to be any commit records in the stream before the buffer filled up.

 

after2

Setting it up:

First, the SCM must be setup and surfaced as a volume. The storage should automatically be surfaced as a volume in Windows, and can be identified using the Powershell command:

PS C:\Windows\system32> Get-PhysicalDisk | select bustype, healthstatus, size | sort bustype

BusType HealthStatus size
-------  ------------  ----

NVMe Healthy 1600321314816
NVMe Healthy 1601183940608
RAID Healthy 960197124096
RAID Healthy 300000000000
RAID Healthy 960197124096
SATA Healthy 960197124096
SATA Healthy 500107862016
SATA Healthy 960197124096
SATA Healthy 500107862016
SCM Healthy 8580476928
SCM Healthy 8580464640

Disks that are Storage Class Memory are reported as BusType SCM.

You can then format the volume using the /dax option to the format command as documented here: https://technet.microsoft.com/windows-server-docs/management/windows-commands/format

You can verify that the SCM based volume was formatted for DAX access using the command:

PS C:\Windows\system32> fsutil fsinfo volumeinfo d:

Volume Name : DirectAccess (DAX) NVDIMM
Volume Serial Number : 0xc6a4c99a
Max Component Length : 255
File System Name : NTFS
Is ReadWrite
Supports Case-sensitive filenames
Preserves Case of filenames
Supports Unicode in filenames
Preserves & Enforces ACL's
Supports Disk Quotas
Supports Reparse Points
Supports Object Identifiers
Supports Named Streams
Supports Hard Links
Supports Extended Attributes
Supports Open By FileID
Supports USN Journal
Is DAX volume
PS C:\Windows\system32>

Once the SCM volume is properly set up and formatted as a DAX volume, all that remains is to add a new log file to the database, using the same syntax as for any other log file, with the file residing on the DAX volume. The log file on the DAX volume will be sized at 20MB regardless of the size specified with the ADD FILE command:


ALTER DATABASE <MyDB> ADD LOG FILE (
    NAME = <DAXlog>,
    FILENAME = '<Filepath to DAX Log File>',
    SIZE = 20MB);

Disabling the persistent log buffer feature

In order to disable the persistent log buffer feature, all that is required is to remove the log file from the DAX volume:

ALTER DATABASE <MyDB> SET SINGLE_USER
ALTER DATABASE <MyDB> REMOVE FILE <DAXlog>
ALTER DATABASE <MyDB> SET MULTI_USER

 

Any log records being kept in the log buffer will be written to disk, and at that point there is no unique data in the persistent log buffer, so it can be safely removed.

Interactions with other features and operations

Log file IO

The only impact on the log file IO is that we will tend to pack the log buffers fuller, so we will do fewer IOs to process the same amount of data, thus making the IO pattern more efficient.

Backup/Restore

Backup/Restore operates normally. If the target system does not have a DAX volume at the same path as the source system, the file will be created, but will not be used, since the SCM prerequisite isn’t met. That file can be safely removed.

If there is a DAX volume at the same path as the source system, the file will function in persistent log buffer mode on the restored database.

If the target system has a DAX volume at a different path than the source system, the database can be restored using the WITH MOVE option to locate the persistent log buffer file to the correct path on the target system.

Transparent Database Encryption (TDE)

Because this feature puts log records in durable media immediately from the processor, there is no opportunity to encrypt the data before it is on durable media. For this reason, TDE and this feature are mutually exclusive; you can enable this feature, or TDE, but not both together.

Always On Availability Groups

Because Availability Group replication depends on the normal log-writing semantics, we disable persistent log buffer functionality on the primary; however, it can be enabled on a secondary, which will speed up synchronization, as the secondary does not need to wait for the log write in order to acknowledge the transaction.

To configure this in an Availability Group, you would create a persistent log buffer file on the primary in a location which is ideally a DAX volume on all replicas. If one or more replicas do not have a DAX volume, the feature will be ignored on those replicas.

Other Features

Other features (Replication, CDC, etc.) continue to function as normal with no impact from this feature.

More information on latency in SQL databases

https://channel9.msdn.com/Shows/Data-Exposed/Latency-and-Durability-with-SQL-Server-2016

 


SQL Server R Services – Why we built it


This is the first post in a series of blog posts about SQL Server R Services. We want to take you behind the scenes and explain why we have built this feature and deep dive into how it works.

Future posts will include details about the architecture, highlighting advanced use cases with code examples to explain what can be done with this feature.

Before we get into details, we want to start off by giving you some background on why we built SQL Server R Services and how the architecture looks.

Making SQL Server a data intelligence platform

SQL Server R Services is an in-database analytics feature that tightly integrates R with SQL Server. With this feature, we want to provide a data intelligence platform that moves intelligence capabilities provided with R closer to the data. So why is that a good thing? The short answer is that a platform like this makes it much easier to consume and manage R securely and at scale in applications in production.

intelligence-db

Challenges with traditional R language

There are three major challenges with traditional R. This is how we address those challenges by moving intelligence closer to the data in SQL Server:

Challenge 1 – Data Movement

Moving data from the database to the R runtime becomes painful as data volumes grow, and it carries security risks.

Our solution: Reduce or eliminate data movement with In-Database analytics

  • There is no need to move data when you can execute R scripts securely on SQL Server. You can still use your favorite R development tool and simply push the compute to execute on SQL Server using the compute context.

Challenge 2 – Operationalize R scripts and models

Calling R from your application in production is not trivial. Often, you must recode the R script in another language. This can be time-consuming and inefficient.

Our solution:

  • Use familiar T-SQL stored procedures to invoke R scripts from your application
  • Embed the returned predictions and plots in your application
  • Use the resource governance functionality to monitor and manage R executions on the server

Challenge 3 – Enterprise Performance and scale

R runs single-threaded and only accommodates datasets that fit into available memory.

Our solution:

  • Use SQL Server's in-memory querying and columnstore indexes
  • Leverage RevoScaleR support for large datasets and parallel algorithms

SQL Server Extensibility Architecture

The foundation of this data intelligence platform is the new extensibility architecture in SQL Server 2016.

Extensibility Framework – Why?

The way we make SQL Server and R work together is through a framework we call the extensibility architecture. Previously, CLR or extended stored procedures would let you run code outside the constructs of SQL Server, but in those cases the code still runs inside the SQL Server process space. Having external code run inside the SQL Server process space can cause disruption, and it is also not legally possible to embed runtimes that are not owned by Microsoft.

Instead, we have built a new generic extensibility architecture that enables external code, in this case R programs, to run, not inside the SQL Server process, but as external processes that launch external runtimes. If you install SQL Server with R Services, you will be able to see the new Launchpad service in SQL Server configuration manager:

sql-config-mgr

T-SQL interface: sp_execute_external_script

So how is an external script, like an R script, executed using the extensibility architecture? Well, we have created a new special stored procedure called sp_execute_external_script for that. This stored procedure has all the benefits of any other stored procedure. It has parameters, can return results and is executable from any TSQL client that can run queries. It also enables you to execute external scripts inside SQL Server.

When you execute the stored procedure sp_execute_external_script, we connect to the Launchpad service using a named pipe and send a message telling it what we want to run and how. Currently, R is the only supported language.
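As a minimal round-trip example (assuming R Services is installed and the 'external scripts enabled' configuration option is turned on), the following passes a result set into R and returns it unchanged:

EXEC sp_execute_external_script
    @language = N'R',
    @script = N'OutputDataSet <- InputDataSet;',
    @input_data_1 = N'SELECT CAST(1 AS INT) AS col'
WITH RESULT SETS ((col INT));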

Launchpad has a registration mechanism for launchers specific to a runtime/language. Based on the script type, it will invoke the corresponding launcher which handles the duties for invoking and managing the external runtime execution. This launcher creates a Satellite process to execute our R Scripts.

The Satellite process has a special DLL that knows how to exchange data with SQL Server to retrieve input rows/parameters and send back results and output parameters. Multiple such processes can be launched to isolate users from each other and achieve better scalability.

One major advantage with Launchpad is that it uses proven SQL Server technologies such as SQLOS and XEvent to enable XEvent tracing of the Launchpad service. You can read more about how to collect XEvents for R Services here.

Looking ahead

Without going into too much detail, this is how the extensibility architecture works. We hope that you found this interesting and that you will stay tuned for our coming blog posts on this topic.

Use this tutorial if you are new to SQL R Services and want to get started.

Author: Nellie Gustafsson

Columnstore Index Performance: SQL Server 2016 – Multiple Aggregates


The SQL product team has made significant improvements in columnstore index functionality, supportability, and performance in SQL Server 2016, based on feedback from customers. This blog series focuses on the performance improvements made as part of SQL Server 2016. Customers get these benefits automatically, with no changes to the application, when they upgrade to SQL Server 2016. Please refer to the following blog for details.

Thanks

Sunil Agarwal

JSON data in clustered column store indexes


Clustered columnstore indexes (CCI) in SQL Server vNext and Azure SQL Database support LOB types like NVARCHAR(MAX), which allows you to store strings of any size, including JSON documents of any size. With CCI you can get 3x compression and a query speedup compared to regular tables, without any application or query rewrites. In this post we will look at one experiment that compares the row-store and columnstore formats for storing JSON collections.

Why would you store JSON documents in CCI?

Clustered columnstore indexes are a good choice for analytics and storage – they provide high compression of data and faster analytic queries. In this post, we will see what benefits you can get from CCI when you store JSON documents.

I will assume that we have a single-column table with CCI that will contain JSON documents:

create table deals (
      data nvarchar(max),
      index cci clustered columnstore
);

This is equivalent to the collections that you might find in classic NoSQL databases, which store each JSON document as a single entity and optionally create indexes on these documents. The only difference is the CLUSTERED COLUMNSTORE index on this table, which provides the following benefits:

  1. Data compression – CCI uses various techniques to analyze your data and choose optimal compression algorithms to compress the data.
  2. Batch mode analytics – queries executed on CCI process rows in batches of 100 to 900 rows, which might be much faster than row-mode execution.

In this experiment I’m using 6,000,000 JSON documents exported from the TPC-H database. Rows from the TPC-H database are formatted as JSON documents using the FOR JSON clause and exported into tables with and without CCI. The format of the JSON documents used in this experiment is described in the paper TPC-H applied to MongoDB: How a NoSQL database performs, and shown in the following picture:

tpch-json

Ref: TPC-H applied to MongoDB: How a NoSQL database performs

JSON documents are stored in a standard table with a single column and in an equivalent table with CCI, and performance is compared.

Compression

First, we can check the compression ratio we get when we store JSON in a collection with CCI. We can execute the following query to get the size of the table:

exec sp_spaceused 'deals'

Results returned for the tables with and without CCI are:

  • Table with CCI: 6,165,056 KB
  • Table without CCI: 23,997,744 KB

The compression ratio in this case is 3.9x. Although CCI is optimized for scalar data compression, you might also get good compression on JSON data.

JSON analytic

The JSON functions available in SQL Server 2016 and Azure SQL Database enable you to parse JSON text and get values out of the JSON. You can use these values in any part of a SQL query. An example of a query that calculates the average extended price grouped by marketing segment is shown in the following sample:

select JSON_VALUE(data, '$.order.customer.mktsegment'), avg(CAST(JSON_VALUE(data, '$.extendedprice') as float))
from deals
group by JSON_VALUE(data, '$.order.customer.mktsegment')

Instead of joining different tables, you can just change the paths in the second parameter of the JSON_VALUE function to select different fields from the JSON that you want to analyze.

In this experiment we have 5 simple analytic queries that calculate the average value of some price column from the JSON, grouped by other JSON values (the queries are similar to the one above). The same queries are executed both on the row-store table and on the table with CCI, on an Azure SQL Database P11 instance, and the results are shown below:

Query   Columnstore (sec)   Row-store (sec)
Q1      11                  18
Q2      15                  33
Q3      17                  36
Q4      18                  39
Q5      21                  51

Depending on the query, you might get a 2-3x speedup in analytics on JSON data.

Conclusion

CLUSTERED COLUMNSTORE indexes provide compression and an analytic query speed-up. Without any table changes or query rewrites, you can get up to 4x compression and a 3x speed-up on your queries.

SQL Server 2016 SP1 and higher versions enable you to create COLUMNSTORE indexes in any edition (even the free Express edition), but in SQL Server 2016 there is a size constraint of 8KB on JSON documents. Within that constraint, you can use COLUMNSTORE indexes on your JSON data and get performance improvements without any additional query rewrites.


Exporting tables from SQL Server in JSON line-delimited format using BCP.exe


Line-delimited JSON is a common format used to exchange data between systems and to stream JSON data. SQL Server can be used to export the content of tables into line-delimited JSON format.

Line-delimited JSON is a variation of the JSON format where each JSON object is stored on a single line, with objects delimited by newline characters, e.g.:

{"ProductID":15,"Name":"Adjustable Race","Price":75.9900,"Quantity":50}
{"ProductID":16,"Name":"Bearing Ball","Color":"Magenta","Size":"62","Price":15.9900,"Quantity":90}
{"ProductID":17,"Name":"BB","Color":"Magenta","Size":"62","Price":28.9900,"Quantity":80}
{"ProductID":18,"Name":"Blade","Color":"Magenta","Size":"62","Price":18.0000,"Quantity":45}

Although this is not a valid JSON format, many systems use it to exchange data.

One advantage of the line-delimited JSON format over standard JSON is the fact that you can append new JSON objects at the end of the file without removing the closing array bracket, as you would have to with standard JSON.

In this post I will show you how to export the content of a table shown in the following listing in line-delimited JSON format:

CREATE TABLE Product (
 ProductID int IDENTITY PRIMARY KEY,
 Name nvarchar(50) NOT NULL,
 Color nvarchar(15) NULL,
 Size nvarchar(5) NULL,
 Price money NOT NULL,
 Quantity int NULL
)

If you want to select all rows from the table in JSON format, you can use standard FOR JSON clause:

select ProductID, Name, Color, Size, Price, Quantity
from Product for json path

This query will return all rows as JSON objects, separated by commas and wrapped in [ and ].

A small modification of the query will enable you to return one object per row:

select (select ProductID, Name, Color, Size, Price, Quantity for json path, without_array_wrapper)
from Product

You can use standard bcp.exe tool to generate line delimited JSON files using this query:

bcp "select (select ProductID, Name, Color, Size, Price, Quantity for json path, without_array_wrapper) from Product" queryout .\products.json  -c -S ".\SQLEXPRESS" -d ProductCatalog -T

Note that I’m using the queryout option because I have specified a T-SQL query to extract the data, and the -c option to generate output in character format. With -c, bcp does not prompt for each field; it uses char as the storage type, without prefixes, and \r\n (newline) as the row terminator.

Running this bcp command would generate line-delimited JSON file containing one JSON object for every row in the table.

 

Extreme 25x compression of JSON data using CLUSTERED COLUMNSTORE INDEXES


CLUSTERED COLUMNSTORE INDEXES (CCI) provide extreme data compression. In Azure SQL Database and SQL Server vNext you can create CCI on tables with NVARCHAR(MAX) columns. Since JSON is stored as NVARCHAR type, now you can store huge volumes of JSON data in tables with CCI. In this post, I will show you how you can get 25x compression on a table that contains JSON/NVARCHAR(MAX) column using CCI.

In this experiment, I will use the publicly available ContosoDW database, where we have the FactOnlineSales table with 12 million records. This fact table is related to a number of dimension tables such as DimStore, DimCustomer, DimProduct, etc., as shown in the following figure:

contosodwsales

 

Imagine that all these related tables are stored NoSQL-style, as a single JSON document per sale. In that case we would have a single table with a Data NVARCHAR(MAX) column where we would store the data from all related tables as JSON text – something like:

DROP TABLE IF EXISTS SalesRecords
GO
CREATE TABLE SalesRecords(
 OrderNumber nvarchar(20) not null,
 ProductKey int not null,
 StoreKey int not null,
 Date datetime not null,
 CustomerKey int not null,
 Data nvarchar(max) not null,
 INDEX cci CLUSTERED COLUMNSTORE
)

In this example, I will keep the int key columns as separate columns, and store all other columns from the FactOnlineSales table, plus all columns from the related tables, in a single Data column. The query that de-normalizes all related dimensions into a single column is at the end of this post, and it looks like:

INSERT INTO SalesRecords(OrderNumber, StoreKey, ProductKey, Date, CustomerKey, Data)
SELECT FactOnlineSales.SalesOrderNumber, FactOnlineSales.StoreKey, FactOnlineSales.ProductKey, FactOnlineSales.DateKey, FactOnlineSales.CustomerKey,
 (SELECT ... FROM DimensionTables FOR JSON PATH) as Data

I’m joining all related dimension tables, formatting them as JSON text using the FOR JSON clause, and loading everything into the SalesRecords table. This query populates the table with the CCI index.

I will also create a copy of this table, but without CCI (a plain heap table), using the following query:

select *
 into SalesRecordsRS
 from SalesRecords

Now, I will compare the sizes of the table with CCI and the table without CCI using the following queries:

exec sp_spaceused 'SalesRecordsRS'
exec sp_spaceused 'SalesRecords'

The results of these queries are shown below:

contosodwsales-space-used

The table without CCI has 101,020,920 KB, while the table with CCI has only 4,047,128 KB in the data column. With CCI we can compress a 100GB table down to 4GB – a 24.96x compression ratio!

Compression is not important only for storage savings. The following query executes in 46 seconds on the table with CCI, while on the heap table it takes 13 minutes 45 seconds:

select min(datalength(data)), avg(datalength(data)), max(datalength(data))
 from SalesRecords

Smaller disk IO and the batch mode execution provided by CCI enable queries like this to run up to 18x faster.

Conclusion

CLUSTERED COLUMNSTORE INDEXES provide extreme data compression in SQL Server and Azure SQL Database. With NVARCHAR(MAX) support in CCI, you can use it on JSON data stored in the database and get as much as 25x compression. CCI is therefore a perfect solution if you need to store a large volume of JSON data in your SQL database.

The ContosoDW database is publicly available for download, so you can use this database and the script below to re-create this table and try this in your environment.

SQL Script used to populate the table is:

INSERT INTO SalesRecords(OrderNumber, StoreKey, ProductKey, Date, CustomerKey, Data)
SELECT FactOnlineSales.SalesOrderNumber, FactOnlineSales.StoreKey, FactOnlineSales.ProductKey, FactOnlineSales.DateKey, FactOnlineSales.CustomerKey,
 (SELECT FactOnlineSales.PromotionKey,
 FactOnlineSales.SalesOrderLineNumber, FactOnlineSales.SalesQuantity, FactOnlineSales.SalesAmount, FactOnlineSales.CurrencyKey,
 FactOnlineSales.ReturnQuantity, FactOnlineSales.ReturnAmount, FactOnlineSales.DiscountQuantity, FactOnlineSales.DiscountAmount, FactOnlineSales.TotalCost,
 FactOnlineSales.UnitCost, FactOnlineSales.UnitPrice,
 DimProduct.ProductName AS [Product.Name], DimProduct.ProductDescription AS [Product.Description], DimProduct.Manufacturer AS [Product.Manufacturer],
 DimProduct.BrandName AS [Product.Brand], DimProduct.ClassName AS [Product.Class], DimProduct.StyleName AS [Product.Style],
 DimProduct.ColorName AS [Product.Color], DimProduct.Size AS [Product.Size], DimProduct.SizeRange AS [Product.SizeRange],
 DimProduct.Weight AS [Product.Weight], DimProduct.UnitCost AS [Product.UnitCost], DimProduct.UnitPrice AS [Product.UnitPrice],
 DimProduct.ImageURL AS [Product.ImageURL], DimProduct.ProductURL AS [Product.URL],
 DimProductSubcategory.ProductSubcategoryLabel AS [Product.SubcategoryLabel], DimProductSubcategory.ProductSubcategoryName AS [Product.SubcategoryName],
 DimProductSubcategory.ProductSubcategoryDescription AS [Product.SubcategoryDescription], DimProductCategory.ProductCategoryLabel AS [Product.CategoryLabel],
 DimProductCategory.ProductCategoryName AS [Product.CategoryName], DimProductCategory.ProductCategoryDescription AS [Product.CategoryDescription],
 DimCustomer.CustomerLabel AS [Customer.Label], DimCustomer.Title AS [Customer.Title], DimCustomer.FirstName AS [Customer.FirstName],
 DimCustomer.MiddleName AS [Customer.MiddleName], DimCustomer.LastName AS [Customer.LastName], DimCustomer.NameStyle AS [Customer.NameStyle],
 DimCustomer.BirthDate AS [Customer.BirthDate], DimCustomer.MaritalStatus AS [Customer.MaritalStatus], DimCustomer.Suffix AS [Customer.Suffix],
 DimCustomer.Gender AS [Customer.Gender], DimCustomer.EmailAddress AS [Customer.EmailAddress], DimCustomer.YearlyIncome AS [Customer.YearlyIncome],
 DimCustomer.TotalChildren AS [Customer.TotalChildren], DimCustomer.NumberChildrenAtHome AS [Customer.NumberChildrenAtHome],
 DimCustomer.Education AS [Customer.Education], DimCustomer.Occupation AS [Customer.Occupation],
 DimCustomer.HouseOwnerFlag AS [Customer.HouseOwnerFlag], DimCustomer.AddressLine1 AS [Customer.AddressLine1],
 DimCustomer.NumberCarsOwned AS [Customer.NumberCarsOwned], DimCustomer.AddressLine2 AS [Customer.AddressLine2],
 DimCustomer.Phone AS [Customer.Phone], DimCustomer.CompanyName AS [Customer.CompanyName], DimGeography_1.CityName AS [Customer.CityName],
 DimGeography_1.StateProvinceName AS [Customer.StateProvinceName], DimGeography_1.RegionCountryName AS [Customer.RegionCountryName],
 DimGeography_1.ContinentName AS [Customer.ContinentName], DimGeography_1.GeographyType AS [Customer.GeographyType],
 JSON_QUERY(CONCAT('{"type": "Feature","geometry": {"type": "Point","coordinates": [',DimGeography_1.Geometry.STX,',', DimGeography_1.Geometry.STY,']}}')) AS [Customer.Geometry],
 DimCurrency.CurrencyName AS [Currency.Name], DimCurrency.CurrencyDescription AS [Currency.Description],
 DimCurrency.CurrencyLabel AS [Currency.Label], DimPromotion.PromotionLabel AS [Promotion.Label], DimPromotion.PromotionName AS [Promotion.Name],
 DimPromotion.PromotionDescription AS [Promotion.Description], DimPromotion.DiscountPercent AS [Promotion.DiscountPercent],
 DimPromotion.PromotionType AS [Promotion.Type], DimPromotion.PromotionCategory AS [Promotion.Category], DimPromotion.StartDate AS [Promotion.StartDate],
 DimPromotion.EndDate AS [Promotion.EndDate], DimPromotion.MinQuantity AS [Promotion.MinQuantity], DimPromotion.MaxQuantity AS [Promotion.MaxQuantity],
 DimStore.StoreName AS [Store.Name], DimStore.StoreDescription AS [Store.Description], DimStore.StoreManager AS [Store.Manager],
 DimStore.StoreType AS [Store.Type], DimStore.Status AS [Store.Status], DimStore.OpenDate AS [Store.OpenDate], DimStore.CloseDate AS [Store.CloseDate],
 DimStore.ZipCode AS [Store.ZipCode], DimStore.ZipCodeExtension AS [Store.ZipCodeExtension], DimStore.StorePhone AS [Store.Phone],
 DimStore.StoreFax AS [Store.Fax], DimStore.AddressLine1 AS [Store.AddressLine1], DimStore.AddressLine2 AS [Store.AddressLine2],
 JSON_QUERY(CONCAT('{"type": "Feature","geometry": {"type": "Point","coordinates": [',DimStore.Geometry.STX,',', DimStore.Geometry.STY,']}}')) AS [Store.Geometry],
 JSON_QUERY(CONCAT('{"type": "Feature","geometry": {"type": "Point","coordinates": [',DimStore.GeoLocation.Lat,',', DimStore.GeoLocation.Long,']}}')) AS [Store.GeoLocation],
 DimGeography.CityName AS [Store.CityName],
 DimGeography.StateProvinceName AS [Store.StateProvinceName], DimGeography.RegionCountryName AS [Store.RegionCountryName],
 DimGeography.ContinentName AS [Store.ContinentName], DimGeography.GeographyType AS [Store.GeographyType],
 JSON_QUERY(CONCAT('{"type": "Feature","geometry": {"type": "Point","coordinates": [',DimGeography.Geometry.STX,',', DimGeography.Geometry.STY,']}}')) AS [Store.Geo.Location],
 DimGeography.GeographyKey AS [Store.GeographyKey], DimEntity.EntityLabel AS [Store.Entity.Label],
 DimEntity.EntityName AS [Store.Entity.Name], DimEntity.EntityDescription AS [Store.Entity.Description], DimEntity.EntityType AS [Store.Entity.Type],
 DimEntity.Status AS [Store.Entity.Status], DimDate.FullDateLabel AS [Date.FullDateLabel], DimDate.DateDescription AS [Date.DateDescription],
 DimDate.CalendarYear AS [Date.CalendarYear], DimDate.CalendarMonthLabel AS [Date.CalendarMonthLabel], DimDate.FiscalYear AS [Date.FiscalYear],
 DimDate.FiscalMonth AS [Date.FiscalMonth], DimDate.FiscalYearLabel AS [Date.FiscalYearLabel], DimDate.CalendarYearLabel AS [Date.CalendarYearLabel],
 DimDate.CalendarHalfYear AS [Date.CalendarHalfYear], DimDate.CalendarHalfYearLabel AS [Date.CalendarHalfYearLabel], DimDate.Datekey AS [Date.Datekey],
 DimDate.CalendarQuarter AS [Date.CalendarQuarter], DimDate.CalendarQuarterLabel AS [Date.CalendarQuarterLabel],
 DimDate.CalendarMonth AS [Date.CalendarMonth], DimDate.CalendarWeek AS [Date.CalendarWeek], DimDate.CalendarWeekLabel AS [Date.CalendarWeekLabel],
 DimDate.CalendarDayOfWeekLabel AS [Date.CalendarDayOfWeekLabel], DimDate.CalendarDayOfWeek AS [Date.CalendarDayOfWeek],
 DimDate.FiscalHalfYear AS [Date.FiscalHalfYear], DimDate.FiscalHalfYearLabel AS [Date.FiscalHalfYearLabel], DimDate.FiscalQuarter AS [Date.FiscalQuarter],
 DimDate.FiscalQuarterLabel AS [Date.FiscalQuarterLabel], DimDate.FiscalMonthLabel AS [Date.FiscalMonthLabel]
 FOR JSON PATH, WITHOUT_ARRAY_WRAPPER) as Data
FROM FactOnlineSales INNER JOIN
 DimDate ON FactOnlineSales.DateKey = DimDate.Datekey INNER JOIN
 DimStore ON FactOnlineSales.StoreKey = DimStore.StoreKey INNER JOIN
 DimProduct ON FactOnlineSales.ProductKey = DimProduct.ProductKey INNER JOIN
 DimPromotion ON FactOnlineSales.PromotionKey = DimPromotion.PromotionKey INNER JOIN
 DimCurrency ON FactOnlineSales.CurrencyKey = DimCurrency.CurrencyKey INNER JOIN
 DimCustomer ON FactOnlineSales.CustomerKey = DimCustomer.CustomerKey INNER JOIN
 DimGeography ON DimStore.GeographyKey = DimGeography.GeographyKey INNER JOIN
 DimProductSubcategory ON DimProduct.ProductSubcategoryKey = DimProductSubcategory.ProductSubcategoryKey INNER JOIN
 DimProductCategory ON DimProductSubcategory.ProductCategoryKey = DimProductCategory.ProductCategoryKey INNER JOIN
 DimGeography DimGeography_1 ON DimCustomer.GeographyKey = DimGeography_1.GeographyKey INNER JOIN
 DimEntity ON DimStore.EntityKey = DimEntity.EntityKey

Import and analyze IIS Log files using SQL Server


IIS generates log files that record a lot of information about HTTP requests, such as which URL was called, when the request happened, where it came from, etc. If you want to analyze information from log files you can use text search, regular expressions, or log analysis tools; however, this can be a tedious job. SQL Server enables you to import information from IIS log files into tables and use the T-SQL language to analyze it. In this post you will see how to load log files generated by IIS into a SQL Server table using the BULK INSERT command, and how to analyze the data using T-SQL.

IIS Log files

IIS generates textual log files in the following format:

#Software: Microsoft Internet Information Services 10.0
#Version: 1.0
#Date: 2016-12-14 20:43:33
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
2016-12-14 20:43:33 10.0.0.4 GET /AdventureWorks - 80 - 168.62.177.232 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+Trident/5.0;+AppInsights) - 404 0 2 753
2016-12-14 20:43:33 10.0.0.4 GET /AdventureWorks/Employees/Create - 80 - 70.37.147.45 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+Trident/5.0;+AppInsights) - 404 0 2 7613
2016-12-14 20:44:07 10.0.0.4 GET /AdventureWorks/Employees/Create - 80 - 65.54.78.59 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+Trident/5.0;+AppInsights) - 404 0 2 54
2016-12-14 20:44:38 10.0.0.4 GET /AdventureWorks - 80 - 94.245.82.32 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+Trident/5.0;+AppInsights) - 404 0 2 202
2016-12-14 20:45:05 10.0.0.4 GET /AdventureWorks - 80 - 207.46.98.172 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+Trident/5.0;+AppInsights) - 404 0 2 43

These are text files where fields are separated by spaces and lines are separated by new-lines, so they can easily be imported into SQL Server using bcp or the BULK INSERT command.

Analyzing log files in SQL Server

First, we need to create a table where the IIS log entries will be stored. An example is shown in the following code:

DROP TABLE IF EXISTS IISLOG
CREATE TABLE IISLOG (
 [DATE] [DATE] NULL,
 [TIME] [TIME] NULL,
 [s-ip] [VARCHAR] (16) NULL,
 [cs-method] [VARCHAR] (8) NULL,
 [cs-uri-stem] [VARCHAR] (255) NULL,
 [cs-uri-query] [VARCHAR] (2048) NULL,
 [s-port] [VARCHAR] (4) NULL,
 [s-username] [VARCHAR] (16) NULL,
 [c-ip] [VARCHAR] (16) NULL,
 [cs(User-Agent)] [VARCHAR] (1024) NULL,
 [cs(Referer)] [VARCHAR] (4096) NULL,
 [sc-STATUS] [INT] NULL,
 [sc-substatus] [INT] NULL,
 [sc-win32-STATUS] [INT] NULL,
 [time-taken] [INT] NULL,
 INDEX cci CLUSTERED COLUMNSTORE
)

When you look at the log file, you will see a line starting with #Fields: that lists all the columns that should be placed in the destination table.

#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken

You can create the table by looking at this list.
Note that I’m using a CLUSTERED COLUMNSTORE INDEX on this table. This is not mandatory, but a clustered columnstore index is a good choice for logs because it provides high data compression and speeds up analytic queries.

Loading and analyzing logs

Now when we have destination table, we can load logs using BULK INSERT command:

BULK INSERT iislog
FROM 'D:\Data\Documents\u_ex161214.log'
WITH (
 FIRSTROW = 2,
 FIELDTERMINATOR = ' ',
 ROWTERMINATOR = '\n'
)

Space is used as the field terminator and new-line as the row terminator. I’m using FIRSTROW = 2 to skip the first header row (if your log file contains several lines starting with #, like the sample above, you may need to remove them or increase FIRSTROW accordingly). Now that we have all the data in the table, we can use standard SQL to analyze it:

select [cs-uri-stem], avg([time-taken])
from IISLOG
group by [cs-uri-stem]
order by avg([time-taken]) desc

Once you load the logs into a table, you can perform any kind of analysis using T-SQL.
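For example, here is a small follow-up sketch against the same IISLOG table that breaks down request counts and average response time by HTTP status code (the column names come from the table definition above):

SELECT [sc-STATUS] AS StatusCode,
       COUNT(*) AS RequestCount,
       AVG([time-taken]) AS AvgTimeTakenMs
FROM IISLOG
GROUP BY [sc-STATUS]
ORDER BY RequestCount DESC;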


Parsing 4GB JSON with SQL Server


SQL Server 2016 and Azure SQL Database enable you to parse JSON text and transform it into tabular format. In this post, you will see that the JSON functions can handle very large JSON text – in this example, more than 4 GB.


First, I need a very large JSON document. I’m using the TPCH database, so I will export the content of the lineitem table to a file. The JSON can be exported using the bcp.exe program:

D:\Temp>bcp "select (select * from lineitem for json path)" queryout lineitems.json -d tpch -S .\SQLEXPRESS -T -w

Starting copy...


1 rows copied.

Network packet size (bytes): 4096

Clock Time (ms.) Total     : 103438 Average : (0.01 rows per sec.)


The query takes all rows from the lineitem table, formats them as JSON text, and returns them as a single cell. I’m using Unicode format (the -w flag). As a result, bcp.exe generates a 4.35 GB (4,677,494,824 bytes) file containing one big JSON array.

Now I will load the content of this file using OPENROWSET(BULK) and pass the content to the OPENJSON function, which will parse it, extract the values of the l_discount key, and compute the average value:

select avg([l_discount])
from openrowset(bulk 'D:\Temp\lineitems.json', SINGLE_NCLOB) f
 cross apply openjson(f.BulkColumn) with([l_discount] [money])

On my SQL Server 2016 Express edition, this query finishes in 1 min 53 sec.
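If you plan to run several analytic queries over the same document, it may be cheaper to parse the JSON once and materialize the parsed rows into a table. A minimal sketch, assuming a hypothetical LineItemsStaging table name and rough data types based on the TPCH lineitem schema:

SELECT li.l_orderkey, li.l_quantity, li.l_discount
INTO LineItemsStaging -- hypothetical staging table name
FROM OPENROWSET(BULK 'D:\Temp\lineitems.json', SINGLE_NCLOB) f
 CROSS APPLY OPENJSON(f.BulkColumn)
 WITH ([l_orderkey] BIGINT, [l_quantity] DECIMAL(15,2), [l_discount] MONEY) AS li;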

Conclusion

The functions that parse JSON in SQL Server 2016 do not have any constraint on the size of the JSON document. As you can see in this example, I can successfully parse a 4 GB JSON document, which is more than twice the 2 GB maximum size of an NVARCHAR(MAX) value that can be stored in a table.


SQL Server 2016 Developer Edition in Windows Containers


We are excited to announce the public availability of SQL Server 2016 SP1 Developer Edition in Windows Containers! The image is now available on Docker Hub and the build scripts are hosted on our GitHub repository. This image can be used in both Windows Server Containers as well as Hyper-V Containers.

SQL Server 2016 Developer Edition: Docker Image | Installation Scripts

We hope you will find this image useful and leverage it for your container-based applications!

Why use SQL Server in containers?

SQL Server 2016 in a Windows container would be ideal when you want to:

  • Quickly create and start a set of SQL Server instances for development or testing.
  • Maximize density in test or production environments, especially in microservice architectures.
  • Isolate and control applications in a multi-tenant infrastructure.

Prerequisites

Before you can get started with the SQL Server 2016 Developer Edition image, you’ll need a Windows Server 2016 or Windows 10 host with the latest updates, the Windows Container feature enabled, and the Docker engine.

Pulling and Running SQL Server 2016 in a Windows Container

Below are the Docker pull and run commands for running a SQL Server 2016 Developer instance in a Windows Container. Make sure that the mandatory sa_password environment variable meets the SQL Server 2016 password complexity requirements.

First, pull the image

docker pull microsoft/mssql-server-windows-developer

Then, run a SQL Server container

• Running a Windows Server Container (Windows Server 2016 only)
docker run -d -p 1433:1433 -e sa_password= -e ACCEPT_EULA=Y microsoft/mssql-server-windows-developer

• Running a Hyper-V Container (Windows Server 2016 or Windows 10)
docker run -d -p 1433:1433 -e sa_password= --isolation=hyperv -e ACCEPT_EULA=Y microsoft/mssql-server-windows-developer

Connecting to SQL Server 2016

From within the container

One of the ways to connect to the SQL Server instance from inside the container is by using the sqlcmd utility.

First, use the docker ps command to get the container ID that you want to connect to and use it to replace the parameter placeholder “DOCKER_CONTAINER_ID” in the commands below. You can use the docker exec -it command to create an interactive command prompt that will execute commands inside of the container by using either Windows or SQL Authentication.

• Windows authentication using container administrator account
docker exec -it "DOCKER_CONTAINER_ID" sqlcmd

• SQL authentication using the system administrator (SA) account
docker exec -it "DOCKER_CONTAINER_ID" sqlcmd -Usa
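Once connected, a simple query is enough to verify that the instance inside the container is up and running (any T-SQL statement works here; this is just an example):

SELECT @@VERSION;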

From outside the container

One of the ways to access SQL Server 2016 from outside the container is by installing SQL Server Management Studio (SSMS). You can install and use SSMS either on the host or on another machine that can remotely connect to the host. Please follow this blog post for detailed instructions on connecting to a SQL Server 2016 Windows Container via SSMS.

SQL 2016 Features Supported on Windows Server Core

Please refer to this link for all SQL Server 2016 features that are supported on a Windows Server Core installation.

Further Reading

Windows Containers Documentation
Container Resource Management
MSSQL Docker GitHub Repo
Tutorials for SQL Server 2016

Please give the SQL Server 2016 Developer image a try, and let us know what you think!

Thanks,
Perry Skountrianos
twitter | LinkedIn

Loading files from Azure Blob Storage into Azure SQL Database


Azure SQL Database enables you to directly load files stored on Azure Blob Storage using the BULK INSERT T-SQL command and OPENROWSET function.

Loading the content of a file from an Azure Blob Storage account into a table in SQL Database is now a single command:

BULK INSERT Product
FROM 'data/product.dat'
WITH ( DATA_SOURCE = 'MyAzureBlobStorage');


BULK INSERT is an existing T-SQL command that enables you to load files from the file system into a table. The new DATA_SOURCE option enables you to reference an Azure Blob Storage account.

You can also use OPENROWSET function to parse content of the file and execute any T-SQL query on returned rows:

SELECT Color, count(*)
FROM OPENROWSET(BULK 'data/product.bcp', DATA_SOURCE = 'MyAzureBlobStorage',
 FORMATFILE='data/product.fmt', FORMATFILE_DATA_SOURCE = 'MyAzureBlobStorage') as data
GROUP BY Color;

The OPENROWSET function enables you to specify the data source where the input file is placed, as well as the data source where the format file (the file that defines the structure of the input file) is placed.

If your file is placed on a public Azure Blob Storage account, you just need to define an EXTERNAL DATA SOURCE that points to that account:


CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
 WITH ( TYPE = BLOB_STORAGE, LOCATION = 'https://myazureblobstorage.blob.core.windows.net');

Once you define the external data source, you can use its name in BULK INSERT and OPENROWSET. If the storage account is not public, you need to create a database scoped credential that contains a Shared Access Signature, and reference that credential in the external data source definition:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'some strong password';
CREATE DATABASE SCOPED CREDENTIAL MyAzureBlobStorageCredential
 WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
 SECRET = 'sv=2015-12-11&ss=b&srt=sco&sp=rwac&se=2017-02-01T00:55:34Z&st=2016-12-29T16:55:34Z&spr=https&sig=copyFromAzurePortal';
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
 WITH ( TYPE = BLOB_STORAGE,
        LOCATION = 'https://myazureblobstorage.blob.core.windows.net',
        CREDENTIAL= MyAzureBlobStorageCredential);


You can find the full example with some sample files in the SQL Server GitHub repository.

Comparing performance of data access libraries using StackExchange/Dapper benchmark


One of the most important questions that you need to answer in your projects is which data access library you should use to access your data in a SQL Server database. One benchmark that you can use is the StackExchange/Dapper benchmark, which checks how fast different data access libraries can fetch a single row from the database.


In this post I will show you how to use the StackExchange/Dapper performance benchmark to evaluate the performance of different data access libraries. This is a simple benchmark that executes 500 SQL queries, each reading a single row from the Posts table and returning the result. An example of the hand-coded test, implemented using plain SqlCommand/SqlDataReader, is shown in the following code:

var postCommand = new SqlCommand();
postCommand.Connection = connection;
postCommand.CommandText = @"select Id, [Text], [CreationDate], LastChangeDate,
 Counter1,Counter2,Counter3,Counter4,Counter5,Counter6,Counter7,Counter8,Counter9 from Posts where Id = @Id";
var idParam = postCommand.Parameters.Add("@Id", System.Data.SqlDbType.Int);
idParam.Value = id; // id of the row to fetch (the benchmark sets this on each iteration)
using (var reader = postCommand.ExecuteReader())
{
 reader.Read();
 var post = new Post();
 /* populate fields in Post object */
}

The same query is implemented in different frameworks, and the test measures the elapsed time for 500 iterations of each implementation.
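For reference, the Posts table that the benchmark queries has roughly the following shape – a sketch inferred from the columns referenced in the query above; the actual table is created by the benchmark's setup code, and the exact data types here are assumptions:

CREATE TABLE Posts (
 Id INT IDENTITY PRIMARY KEY,
 [Text] NVARCHAR(MAX) NULL,
 CreationDate DATETIME NULL,
 LastChangeDate DATETIME NULL,
 Counter1 INT NULL, Counter2 INT NULL, Counter3 INT NULL,
 Counter4 INT NULL, Counter5 INT NULL, Counter6 INT NULL,
 Counter7 INT NULL, Counter8 INT NULL, Counter9 INT NULL
);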

Setting up the test

To set up the test, go to the StackExchange/Dapper GitHub repository and download the source code. The tests are part of a C# solution (Dapper.sln). When you open this solution you will find the Dapper.Tests project. You might need to change two things:

  1. Connection strings are hardcoded in the Tests.cs file with values like “Server=(local)\SQL2014;Database=tempdb;User ID=sa;Password=Password12!”. You might need to change these to match your own connection info.
  2. The project is compiled using dotnet sdk 1.0.0-preview2-003121, so you might get compilation errors if you don’t have a matching framework installed. I removed the line “sdk”: { “version”: “1.0.0-preview2-003121” } from global.json to fix this.

Now you will be able to build project and run tests.

Executing tests

To run the tests, start the Dapper.Tests project; the results are shown in the console. I executed the test with the following results:

Running 500 iterations that load up a post entity

Running...
hand coded took 122ms
Mapper QueryFirstOrDefault took 125ms
Dynamic Mapper Query (buffered) took 130ms
Dynamic Massive ORM Query took 140ms
Dapper.Contrib took 147ms
DataTable via IDataReader.GetValues took 157ms
Mapper Query (buffered) took 160ms
Linq 2 SQL Compiled took 168ms
Mapper Query (non-buffered) took 179ms
Dynamic Mapper Query (non-buffered) took 195ms
Dynamic Mapper QueryQueryFirstOrDefault took 198ms
Entity framework SqlQuery took 202ms
Linq 2 SQL ExecuteQuery took 302ms
Entity framework No Tracking took 365ms
Entity framework took 378ms
Simple.Data took 767ms
Linq 2 SQL took 905ms

As expected, hand-coded data access with data readers is the fastest approach. An interesting fact is that the Dapper library adds minimal overhead – it is only slightly slower than hand-coded data access. Compiled Linq 2 SQL also shows good results, with roughly 40% overhead compared to hand-coded data access.

Entity Framework with Database.SqlQuery<Post>(“SQL QUERY”) adds ~65% overhead, so this might be the preferred way to execute queries if Entity Framework performance is important to you.

Entity Framework with LINQ, e.g. entityContext.Posts.First(p => p.Id == id), is around 3x slower than the hand-coded data reader.

Conclusion

This test compares the performance of different data access libraries with the fastest, hand-coded approach. Using this test you can see the overhead of the different libraries that are built on top of standard ADO.NET.

Dapper is a simple and interesting micro-ORM that enables you to access your data with acceptable performance overhead. I have checked the test code and there is nothing specific to the Dapper library that might produce misleading results.

If you are using Entity Framework, consider using raw SQL queries where performance is important. Hopefully, future versions will bring Entity Framework LINQ performance closer to the SqlQuery method.

Introducing Batch Mode Adaptive Joins


For SQL Server 2017 and Azure SQL Database, the Microsoft Query Processing team is introducing a new set of adaptive query processing improvements to help fix performance issues that are due to inaccurate cardinality estimates. Improvements in the adaptive query processing space include batch mode memory grant feedback, batch mode adaptive joins, and interleaved execution.  In this post, we’ll introduce batch mode adaptive joins.

[Figure: batch mode Adaptive Join]

We have seen numerous cases where providing a specific join hint solved query performance issues for our customers.  However, the drawback of adding a hint is that we remove join algorithm decisions from the optimizer for that statement. While fixing a short-term issue, the hard-coded hint may not be the optimal decision as data distributions shift over time.

Another scenario is where we do not know up front what the optimal join should be, for example, with a parameter sensitive query where a low or high number of rows may flow through the plan based on the actual parameter value.

With these scenarios in mind, the Query Processing team introduced the ability to sense a bad join choice in a plan and then dynamically switch to a better join strategy during execution.

The batch mode adaptive joins feature enables the choice of a hash join or nested loop join method to be deferred until after the first input has been scanned.  We introduce a new Adaptive Join operator.  This operator defines a threshold that will be used to decide when we will switch to a nested loop plan.

Note: to see the new Adaptive Join operator, download SQL Server Management Studio 17.0.

How it works at a high level:

  • If the row count of the build join input is small enough that a nested loop join would be more optimal than a hash join, we will switch to a nested loop algorithm.
  • If the build join input exceeds a specific row count threshold, no switch occurs and we will continue with a hash join.

The following query is used to illustrate an adaptive join example:

SELECT  [fo].[Order Key], [si].[Lead Time Days], [fo].[Quantity]
FROM    [Fact].[Order] AS [fo]
INNER JOIN [Dimension].[Stock Item] AS [si]
       ON [fo].[Stock Item Key] = [si].[Stock Item Key]
WHERE   [fo].[Quantity] = 360;

The query returns 336 rows.  Enabling Live Query Statistics we see the following plan:

[Figure: Live Query Statistics plan for the Quantity = 360 query, showing the Adaptive Join operator]

Walking through the noteworthy areas:

  1. We have a Columnstore Index Scan used to provide rows for the hash join build phase.
  2. We have the new Adaptive Join operator. This operator defines a threshold that will be used to decide when we will switch to a nested loop plan.  For our example, the threshold is 78 rows.  Anything with >= 78 rows will use a hash join.  If less than the threshold, we’ll use a nested loop join.
  3. Since we return 336 rows, we are exceeding the threshold and so the second branch represents the probe phase of a standard hash join operation. Notice that Live Query Statistics shows rows flowing through the operators – in this case “672 of 672”.
  4. And the last branch is our Clustered Index Seek for use by the nested loop join had the threshold not been exceeded. Notice that we see “0 of 336” rows displayed (the branch is unused).

Now let’s contrast the plan with the same query, but this time for a Quantity value that only has one row in the table:

SELECT  [fo].[Order Key], [si].[Lead Time Days], [fo].[Quantity]
FROM    [Fact].[Order] AS [fo]
INNER JOIN [Dimension].[Stock Item] AS [si]
       ON [fo].[Stock Item Key] = [si].[Stock Item Key]
WHERE   [fo].[Quantity] = 361;

The query returns one row.  Enabling Live Query Statistics we see the following plan:

[Figure: Live Query Statistics plan for the Quantity = 361 query]

Walking through the noteworthy areas:

  1. With one row returned, you see the Clustered Index Seek now has rows flowing through it.
  2. And since we did not continue with the hash join build phase, you’ll see zero rows flowing through the second branch.

How do I enable batch mode adaptive joins?

To have your workloads automatically eligible for this improvement, enable compatibility level 140 for the database in SQL Server 2017 CTP 2.0 or greater.  This improvement will also be surfacing in Azure SQL Database.
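For example, assuming the WideWorldImportersDW sample database used by the queries above, enabling the new compatibility level is a one-line statement:

ALTER DATABASE [WideWorldImportersDW] SET COMPATIBILITY_LEVEL = 140;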

What statements are eligible for batch mode adaptive joins?

A few conditions make a logical join eligible for a batch mode adaptive join:

  • The database compatibility level is 140
  • The join is eligible to be executed both by an indexed nested loop join and by a hash join physical algorithm
  • The hash join uses batch mode – either through the presence of a Columnstore index in the query overall or a Columnstore indexed table being referenced directly by the join
  • The generated alternative solutions of the nested loop join and hash join should have the same first child (outer reference)

If an adaptive join switches to a nested loop operation, do we have to rescan the join input?

No.  The nested loop operation will use the rows already read by the hash join build.

What determines the adaptive join threshold?

We look at estimated rows and the cost of a hash join vs. nested loop join alternative and find an intersection where the cost of a nested loop exceeds the hash join alternative.  This threshold cost is translated into a row count threshold value.

[Figure: cost of a hash join vs. a nested loop join by row count, with the threshold at the intersection point]

The prior chart shows an intersection between the cost of a hash join vs. the cost of a nested loop join alternative.  At this intersection point, we determine the threshold.

What performance improvements can we expect to see?

Performance gains occur for workloads where, prior to adaptive joins being available, the optimizer chooses the wrong join type due to cardinality misestimates. For example, one of our customers saw a 20% improvement with one of the candidate workloads. And for one of our internal Microsoft customers, they saw the following results:

[Figure: performance results from an internal Microsoft customer workload]

Workloads with big oscillations between small and large input Columnstore index scans joined to other tables will benefit the most from this improvement.

Any overhead of using batch mode adaptive joins?

Adaptive joins will introduce a higher memory requirement than an index nested loop join equivalent plan.  The additional memory will be requested as if the nested loop was a hash join. With that additional cost comes flexibility for scenarios where row counts may fluctuate in the build input.

How do batch mode adaptive joins work for consecutive executions once the plan is cached?

Batch mode adaptive joins will work for the initial execution of a statement, and once compiled, consecutive executions will remain adaptive based on the compiled adaptive join threshold and the runtime rows flowing through the build phase of the Columnstore Index Scan.

How can I track when batch mode adaptive joins are used?

As shown earlier, you will see the new Adaptive Join operator in the plan and the following new attributes:

  • AdaptiveThresholdRows – shows the threshold used to decide when to switch from a hash join to a nested loop join.
  • EstimatedJoinType – what we expect the join type to be.
  • ActualJoinType * – in an actual plan, shows which join algorithm was ultimately chosen based on the threshold.

* Arriving post-CTP 2.0.

What does the estimated plan show?

We will show the adaptive join plan shape, along with a defined adaptive join threshold and estimated join type.

Will Query Store capture and be able to force a batch mode adaptive join plan?

Yes.

Will you be expanding the scope of batch mode adaptive joins to include row mode?

This first version supports batch mode execution; however, we are exploring row mode as a future possibility as well.

Thanks for reading, and stay tuned for more blog posts regarding the adaptive query processing feature family!
