Thursday 19 January 2023

MSSQL: How to read an UTF-8 file (XML or JSON)?

It is not sosimple to read an UTF-8 file.

Solution:
1. Read file as binary (BLOB)
then
2. Cast content to XML (also for JSON and other formats!!!)
then
3. Cast to the NVARCHAR(MAX)

For example:
SELECT Content = CAST(CAST(Content AS XML) AS NVARCHAR(MAX)) -- !!! Double CAST because of UTF-8 source
FROM OPENROWSET(BULK N'... Path to my file ...', SINGLE_BLOB) AS T(Content)
Then you can save this Content directly to the Variable:
DECLARE @Content NVARCHAR(MAX) = (
    SELECT Content = CAST(CAST(Content AS XML) AS NVARCHAR(MAX)) -- !!! Double CAST because of UTF-8 source
    FROM OPENROWSET(BULK N'... Constant path to my file ...', SINGLE_BLOB) AS T(Content)
)
If path to yout file is a constant, the it works.
But if you want to put this path as parameter, then will be necessary to use dynamic SQL:
DECLARE @FilePath NVARCHAR(260) = N'... Parametrized path to my file ...';
DECLARE @ContentTable TABLE (Content VARCHAR(MAX));
INSERT @ContentTable
EXEC ('
	SELECT Content = CAST(CAST(Content AS XML) AS NVARCHAR(MAX)) -- !!! Double CAST because of UTF-8 source
	FROM OPENROWSET(BULK N''' + @FilePath + ''', SINGLE_BLOB) AS T(Content)
');
DECLARE @Content NVARCHAR(MAX) = (SELECT Content FROM @ContentTable);