Thursday, 19 January 2023

MSSQL: How to read an UTF-8 file (XML or JSON)?

It is not sosimple to read an UTF-8 file.

Solution:
1. Read file as binary (BLOB)
then
2. Cast content to XML (also for JSON and other formats!!!)
then
3. Cast to the NVARCHAR(MAX)

For example:
  1. SELECT Content = CAST(CAST(Content AS XML) AS NVARCHAR(MAX)) -- !!! Double CAST because of UTF-8 source
  2. FROM OPENROWSET(BULK N'... Path to my file ...', SINGLE_BLOB) AS T(Content)
Then you can save this Content directly to the Variable:
  1. DECLARE @Content NVARCHAR(MAX) = (
  2. SELECT Content = CAST(CAST(Content AS XML) AS NVARCHAR(MAX)) -- !!! Double CAST because of UTF-8 source
  3. FROM OPENROWSET(BULK N'... Constant path to my file ...', SINGLE_BLOB) AS T(Content)
  4. )
If path to yout file is a constant, the it works.
But if you want to put this path as parameter, then will be necessary to use dynamic SQL:
  1. DECLARE @FilePath NVARCHAR(260) = N'... Parametrized path to my file ...';
  2. DECLARE @ContentTable TABLE (Content VARCHAR(MAX));
  3. INSERT @ContentTable
  4. EXEC ('
  5. SELECT Content = CAST(CAST(Content AS XML) AS NVARCHAR(MAX)) -- !!! Double CAST because of UTF-8 source
  6. FROM OPENROWSET(BULK N''' + @FilePath + ''', SINGLE_BLOB) AS T(Content)
  7. ');
  8. DECLARE @Content NVARCHAR(MAX) = (SELECT Content FROM @ContentTable);