Tuesday, 2 July 2013

MSSQL - Query Templates - GROUP_CONCAT

Today I will show how to implement aggregation of strings in MSSQL  (similar to MySQL GROUP_CONCAT). My favorite way is with usage of FOR XML expression.

Imagine that we have, such a database:



We need to build a list of table fields so, that if a field with the same name is in multiple tables, then the second column in result set will contain alphabetically sorted and comma separated set of table names. For example, in our case, for all fields terminating by '_ID':



To solve this task, we will use the data from INFORMATION_SCHEMA tables.
In the first step, we will get a list of COLUMN-TABLE combinations:
  1. SELECT DISTINCT
  2. COLUMN_NAME,
  3. TABLE_NAME
  4. FROM
  5. INFORMATION_SCHEMA.COLUMNS
  6. ORDER BY
  7. COLUMN_NAME, TABLE_NAME

now the same, but as XML:
  1. SELECT DISTINCT
  2. COLUMN_NAME,
  3. TABLE_NAME
  4. FROM
  5. INFORMATION_SCHEMA.COLUMNS
  6. ORDER BY
  7. COLUMN_NAME, TABLE_NAME
  8. FOR XML RAW

Now for each field name we need get own list of tables. Result should be also in XML form.
For this we need first a list of unique field names:
  1. SELECT DISTINCT
  2. COLUMN_NAME
  3. FROM
  4. INFORMATION_SCHEMA.COLUMNS
  5. ORDER BY
  6. COLUMN_NAME

now combine this query with a query that returns list of COLUMN-TABLE combinations:
  1. SELECT DISTINCT
  2. [COLUMN] = C1.COLUMN_NAME,
  3. [TABLEs] = (
  4. SELECT C2.TABLE_NAME
  5. FROM INFORMATION_SCHEMA.COLUMNS C2
  6. WHERE
  7. C2.COLUMN_NAME = C1.COLUMN_NAME
  8. ORDER BY TABLE_NAME
  9. FOR XML RAW
  10. )
  11. FROM INFORMATION_SCHEMA.COLUMNS C1
  12. ORDER BY [COLUMN]

We obtain the following table:



It now remains:
- Filter out the unnecessary tables (sysdiagrams)
- Leaving only the fields that terminating by '_ID'
- Replace the start and end XML tags with delimiter (comma)
  1. SELECT DISTINCT
  2. [COLUMN] = C1.COLUMN_NAME,
  3. [TABLEs] = REPLACE(REPLACE(REPLACE(
  4. (
  5. SELECT T=C2.TABLE_SCHEMA + '.' + C2.TABLE_NAME
  6. FROM INFORMATION_SCHEMA.COLUMNS C2
  7. WHERE
  8. C2.TABLE_NAME != 'sysdiagrams'
  9. AND
  10. C2.COLUMN_NAME = C1.COLUMN_NAME
  11. ORDER BY 1
  12. FOR XML RAW
  13. ),
  14. '"/><row T="', ', '),
  15. '<row T="', ''),
  16. '"/>', '')
  17. FROM INFORMATION_SCHEMA.COLUMNS C1
  18. WHERE
  19. C1.TABLE_NAME != 'sysdiagrams'
  20. AND
  21. C1.COLUMN_NAME LIKE '%_ID'
  22. ORDER BY [COLUMN]

The result of the query:



It remains to convert all of this in the form of T-SQL template:
  1. USE <DataBaseName, sysname, tempdb>;
  2. SELECT DISTINCT
  3. [COLUMN] = C1.COLUMN_NAME,
  4. [TABLEs] = REPLACE(REPLACE(REPLACE(
  5. (
  6. SELECT T=CASE C2.TABLE_SCHEMA
  7. WHEN SCHEMA_NAME()
  8. THEN ''
  9. ELSE C2.TABLE_SCHEMA + '.'
  10. END + C2.TABLE_NAME
  11. FROM INFORMATION_SCHEMA.COLUMNS C2
  12. WHERE
  13. C2.TABLE_NAME != 'sysdiagrams'
  14. AND
  15. C2.COLUMN_NAME = C1.COLUMN_NAME
  16. ORDER BY T
  17. FOR XML RAW
  18. ),
  19. '"/><row T="', ', '),
  20. '<row T="', ''),
  21. '"/>', '')
  22. FROM INFORMATION_SCHEMA.COLUMNS C1
  23. WHERE
  24. C1.TABLE_NAME != 'sysdiagrams'
  25. AND
  26. C1.COLUMN_NAME LIKE <ColumnNamePattern, sysname, '%'>
  27. ORDER BY [COLUMN]

Save it in the list of SQL-templates.
Now we can use this template to find the fields of the same name in different tables:


About templates and usage of the system tables you can read here:
MSSQL - Query Templates
MSSQL - Query Templates - usage of system tables