Monday, 6 July 2015

MSSQL - Performance tuning of Scalar-Value User Defined Functions (UDFs)

Everybody knows that UDFs which return a single value (Scalar Value User Defined Functions / Scalar UDF) are very slow.
It is enough to move the scalar expression in to UDF and query performance will be several times reduced.
Yes, it is so and the same time not so. Let us check ...

For testing, we need a test table with the dates for about a thousand years:
-- Recreate test table
IF OBJECT_ID('dbo.TestUDFs') IS NOT NULL
    DROP TABLE dbo.TestUDFs;
GO
CREATE TABLE dbo.TestUDFs(
    Datum Date
);
GO
-- SET NOCOUNT ON to prevent extra log messages returned as part of the result set:
-- (See: https://msdn.microsoft.com/en-us/library/ms189837%28v=sql.110%29.aspx)
SET NOCOUNT ON;
-- Fill test-table
DECLARE
    @Date Date = GetDate(),
       @i INT  = 0;
WHILE @i < 1000 * 365 BEGIN
    SET @Date = DATEADD(day, 1, @Date);
    INSERT INTO dbo.TestUDFs VALUES (@Date);
    SET @i = @i + 1;
END;
-- restore NOCOUNT
SET NOCOUNT OFF;
GO
SELECT CNT = COUNT(*) FROM dbo.TestUDFs;
GO


We will create four different UDFs:

FIRST:
Scalar-value UDF, which uses variables for intermediate calculations.
The body of function is placed into BEGIN-END container.
-- Recreate scalar-value UDF with variables in body
IF OBJECT_ID('dbo.fn_scalar_with_vars') IS NOT NULL
    DROP FUNCTION dbo.fn_scalar_with_vars;
GO
CREATE FUNCTION dbo.fn_scalar_with_vars (@date DATETIME)
RETURNS INT
AS BEGIN
    DECLARE @DateKey INT;
    SELECT @DateKey = CONVERT(INT, CONVERT(VARCHAR(8), @date, 112));
    RETURN @DateKey;
END;
GO
SELECT dbo.fn_scalar_with_vars(GetDate());
GO

SECOND:
Scalar-value UDF, which does not use intermediate calculations, and almost immediately returns the result of the calculation.
The body of function is placed into BEGIN-END container.
-- Recreate scalar-value UDF with direct expression (without variables) in body
IF OBJECT_ID('dbo.fn_scalar_direct_expression') IS NOT NULL
    DROP FUNCTION dbo.fn_scalar_direct_expression;
GO
CREATE FUNCTION dbo.fn_scalar_direct_expression (@date DATETIME)
RETURNS INT
AS BEGIN
    RETURN CONVERT(INT, CONVERT(VARCHAR(8), @date, 112));
END;
GO
SELECT DateKey = dbo.fn_scalar_direct_expression(GetDate());
GO
Scalar-value UDF which returns not scalar value, but a table-result.
For data storage uses UDF a table variable.
The body of function is placed into BEGIN-END container.
-- Recreate table-value UDF which returns table-variable
IF OBJECT_ID('dbo.fn_non_inline_table') IS NOT NULL
    DROP FUNCTION dbo.fn_non_inline_table;
GO
CREATE FUNCTION dbo.fn_non_inline_table (@date DATETIME)
RETURNS @DateKey TABLE(DateKey INT)
AS BEGIN
    INSERT INTO @DateKey
    VALUES (CONVERT(INT, CONVERT(VARCHAR(8), @date, 112)));
RETURN;
END
GO
SELECT DateKey FROM dbo.fn_non_inline_table(GetDate());
GO
Finally UDF, which returns a non-scalar value, but a table.
There is no any BEGIN-END container, and the expression is placed immediately after the AS expression.
Such UDFs are called "Inline Table-Value UDF".
-- Recreate inlide table-value UDF
IF OBJECT_ID('dbo.fn_inline_table') IS NOT NULL
    DROP FUNCTION dbo.fn_inline_table;
GO
CREATE FUNCTION dbo.fn_inline_table (@date DATETIME)
RETURNS TABLE
AS
    RETURN SELECT DateKey = CONVERT(INT, CONVERT(VARCHAR(8), @date, 112));
GO
SELECT DateKey FROM dbo.fn_inline_table(GetDate());
GO


Let us test the direct evaluation of the expression in the query, and call of all four UDF variants.
Tests were performed on a virtual machine running on my NoteBook-e.
On other computers the results may differ.

The test script is designed so that as the first step we will store the current date and time in the variable.
Thereafter, for each row of the test table will be performed a calculation.
The test request will have to substitute this text: / * Put calculation there * /.
After that we will count the number of milliseconds between the stored and the current DateTime.

-- Script to test calculations
DECLARE
    @DateKey INT,
    @StartedAt DATETIME2 = GETDATE(); 

/* Put calculation there */

PRINT 'Elapsed time (Direct calculation): '
    + CAST(DATEDIFF(ms, @StartedAt, GETDATE()) AS VarChar(20)) + ' ms';

Calculation directly in query:
300 ms.
SELECT @DateKey = CONVERT(INT, CONVERT(VARCHAR(8), Datum, 112))
FROM dbo.TestUDFs;

Scalar-value UDF with variables and body:
1570 ms.
SELECT @DateKey = dbo.fn_scalar_with_vars(Datum)
FROM dbo.TestUDFs;

Scalar-value UDF with direct expression and body:
1274 ms.
SELECT @DateKey = dbo.fn_scalar_direct_expression(Datum)
FROM dbo.TestUDFs;

Non-inline table-value UDF with table-variable and body:
29623 ms.
SELECT @DateKey = DateKey
FROM dbo.TestUDFs CROSS APPLY dbo.fn_non_inline_table(Datum);

And now surprise - Inline table-value UDF:
210 ms!!! - at least the same result as in the first case, where a computation request occurs immediately.
SELECT @DateKey = DateKey
FROM dbo.TestUDFs CROSS APPLY dbo.fn_inline_table(Datum);



All-in-one Script (an execution takes about one minute):
-- SET NOCOUNT ON to prevent extra log messages:
-- See: https://msdn.microsoft.com/en-us/library/ms189837%28v=sql.110%29.aspx
SET NOCOUNT ON;
GO
IF OBJECT_ID('dbo.TestUDFs') IS NOT NULL
    DROP TABLE dbo.TestUDFs;
CREATE TABLE dbo.TestUDFs(
    [Date] Date
);
-- Fill test-table
DECLARE
    @Date Date = GetDate(),
       @i INT  = 0;
WHILE @i < 1000 * 365 BEGIN
    SET @Date = DATEADD(day, 1, @Date);
    INSERT INTO dbo.TestUDFs VALUES (@Date);
    SET @i = @i + 1;
END;
-- Check if table is filled
DECLARE @CNT INT = (SELECT COUNT(*) FROM dbo.TestUDFs);
PRINT 'Rows in TestUDFs table: ' + CAST(@CNT AS VARCHAR(20));
GO
IF OBJECT_ID('dbo.fn_scalar_with_vars') IS NOT NULL
    DROP FUNCTION dbo.fn_scalar_with_vars;
GO
CREATE FUNCTION dbo.fn_scalar_with_vars (@Date DATETIME)
RETURNS INT
AS BEGIN
    DECLARE @DateKey INT;
    SELECT @DateKey = CONVERT(INT, CONVERT(VARCHAR(8), @Date, 112));
    RETURN @DateKey;
END;
GO
IF OBJECT_ID('dbo.fn_scalar_direct_expression') IS NOT NULL
    DROP FUNCTION dbo.fn_scalar_direct_expression;
GO
CREATE FUNCTION dbo.fn_scalar_direct_expression (@Date DATETIME)
RETURNS INT
AS BEGIN
    RETURN CONVERT(INT, CONVERT(VARCHAR(8), @Date, 112));
END;
GO
IF OBJECT_ID('dbo.fn_non_inline_table') IS NOT NULL
    DROP FUNCTION dbo.fn_non_inline_table;
GO
CREATE FUNCTION dbo.fn_non_inline_table (@Date DATETIME)
RETURNS @DateKey TABLE(DateKey INT)
AS BEGIN
    INSERT INTO @DateKey
    VALUES (CONVERT(INT, CONVERT(VARCHAR(8), @Date, 112)));
RETURN;
END
GO
IF OBJECT_ID('dbo.fn_inline_table') IS NOT NULL
    DROP FUNCTION dbo.fn_inline_table;
GO
-- Recreate scalar-value UDF with direct expression (without variables) in body
IF OBJECT_ID('dbo.fn_scalar_direct_expression') IS NOT NULL
    DROP FUNCTION dbo.fn_scalar_direct_expression;
GO
CREATE FUNCTION dbo.fn_scalar_direct_expression (@Date DATETIME)
RETURNS INT
AS BEGIN
    RETURN CONVERT(INT, CONVERT(VARCHAR(8), @Date, 112));
END;
GO
-- Recreate table-value UDF which returns table-variable
IF OBJECT_ID('dbo.fn_non_inline_table') IS NOT NULL
    DROP FUNCTION dbo.fn_non_inline_table;
GO
CREATE FUNCTION dbo.fn_non_inline_table (@Date DATETIME)
RETURNS @DateKey TABLE(DateKey INT)
AS BEGIN
    INSERT INTO @DateKey
    VALUES (CONVERT(INT, CONVERT(VARCHAR(8), @Date, 112)));
RETURN;
END
GO
-- Recreate inlide table-value UDF
IF OBJECT_ID('dbo.fn_inline_table') IS NOT NULL
    DROP FUNCTION dbo.fn_inline_table;
GO
CREATE FUNCTION dbo.fn_inline_table (@Date DATETIME)
RETURNS TABLE
AS
    RETURN SELECT DateKey = CONVERT(INT, CONVERT(VARCHAR(8), @Date, 112));
GO
/********** Test all methods **********/
DECLARE
    @DateKey INT,
    @StartedAt DATETIME2 = GETDATE();

PRINT '
Calculation directly in query:'
SET @StartedAt = GETDATE();
SELECT @DateKey = CONVERT(INT, CONVERT(VARCHAR(8), [Date], 112))
FROM dbo.TestUDFs;
PRINT '    Elapsed time: '
    + CAST(DATEDIFF(ms, @StartedAt, GETDATE()) AS VarChar(20)) + ' ms';

PRINT '
Scalar-value UDF with variables and body:'
SET @StartedAt = GETDATE();
SELECT @DateKey = dbo.fn_scalar_with_vars([Date])
FROM dbo.TestUDFs;
PRINT '    Elapsed time: '
    + CAST(DATEDIFF(ms, @StartedAt, GETDATE()) AS VarChar(20)) + ' ms';

PRINT '
Scalar-value UDF with direct expression and body:'
SET @StartedAt = GETDATE();
SELECT @DateKey = dbo.fn_scalar_direct_expression([Date])
FROM dbo.TestUDFs;
PRINT '    Elapsed time: '
    + CAST(DATEDIFF(ms, @StartedAt, GETDATE()) AS VarChar(20)) + ' ms';

PRINT '
Non-inline table-value UDF with table-variable and body:'
SET @StartedAt = GETDATE();
SELECT @DateKey = DateKey
FROM dbo.TestUDFs CROSS APPLY dbo.fn_non_inline_table([Date]);
PRINT '    Elapsed time: '
    + CAST(DATEDIFF(ms, @StartedAt, GETDATE()) AS VarChar(20)) + ' ms';

PRINT '
Inline table-value UDF:'
SET @StartedAt = GETDATE();
SELECT @DateKey = DateKey
FROM dbo.TestUDFs CROSS APPLY dbo.fn_inline_table([Date]);
PRINT '    Elapsed time: '
    + CAST(DATEDIFF(ms, @StartedAt, GETDATE()) AS VarChar(20)) + ' ms';

GO
-- Restore SET NOCOUNT OFF
SET NOCOUNT OFF;

An execution of this Script generates this result message:

Rows in TestUDFs table: 365000

Calculation directly in query:
    Elapsed time: 300 ms

Scalar-value UDF with variables and body:
    Elapsed time: 1570 ms

Scalar-value UDF with direct expression and body:
    Elapsed time: 1274 ms

Non-inline table-value UDF with table-variable and body:
    Elapsed time: 29623 ms

Inline table-value UDF:
    Elapsed time: 210 ms