Tuesday, 7 July 2015

MSSQL - Very fast method to get statistic (rows and bytes) for all database tables

Typical way to count all rows in the table ist to send such query:

  1. SELECT COUNT(*) FROM MyTable;

This method has some disadvantages:
  1. We can only get a rows count, but not a physical size of table
  2. This query works on one table, name of which we have to define explicitly
  3. If table has no PRIMARY KEY, then this query can be very slow

There is some other method - to use a statistic from system tables.
This method uses for example this predefined SSMS report:


We can start Profiler (Menu =&rt; Tools =&rt; SQL Server Profiler) and extract for own usage a background query of this report.
I have already extracted this query and reformated as standalone VIEW:

  1. CREATE VIEW dbo.disk_usage_by_table
  2. AS
  3. SELECT TOP(100) PERCENT
  4. Database_Name = DB_NAME(),
  5. Schema_Name = CAST(a3.name AS SYSNAME),
  6. Table_Name = CAST(a2.name AS SYSNAME),
  7. Records = a1.records,
  8. Reserved_KB = (a1.reserved + ISNULL(a4.reserved,0))* 8,
  9. Data_KB = a1.data * 8,
  10. Indexes_KB = CASE
  11. WHEN (a1.used + ISNULL(a4.used,0)) > a1.data
  12. THEN (a1.used + ISNULL(a4.used,0)) - a1.data
  13. ELSE 0
  14. END * 8,
  15. Unused_KB = CASE
  16. WHEN (a1.reserved + ISNULL(a4.reserved,0)) > a1.used
  17. THEN (a1.reserved + ISNULL(a4.reserved,0)) - a1.used
  18. ELSE 0
  19. END * 8
  20. FROM
  21. (
  22. SELECT
  23. ps.object_id,
  24. records = SUM(
  25. CASE
  26. WHEN (ps.index_id < 2)
  27. THEN row_count
  28. ELSE 0
  29. END
  30. ),
  31. reserved = SUM(ps.reserved_page_count),
  32. data = SUM(
  33. CASE
  34. WHEN ps.index_id < 2
  35. THEN ps.in_row_data_page_count
  36. + ps.lob_used_page_count
  37. + ps.row_overflow_used_page_count
  38. ELSE ps.lob_used_page_count
  39. + ps.row_overflow_used_page_count
  40. END
  41. ),
  42. used = SUM(ps.used_page_count)
  43. FROM sys.dm_db_partition_stats ps
  44. GROUP BY ps.object_id
  45. ) AS a1
  46. INNER JOIN sys.all_objects a2 ON ( a1.object_id = a2.object_id )
  47. INNER JOIN sys.schemas a3 ON (a2.schema_id = a3.schema_id)
  48. LEFT OUTER JOIN (
  49. SELECT
  50. it.parent_id,
  51. SUM(ps.reserved_page_count) AS reserved,
  52. SUM(ps.used_page_count) AS used
  53. FROM sys.dm_db_partition_stats ps
  54. INNER JOIN sys.internal_tables it ON (it.object_id = ps.object_id)
  55. WHERE it.internal_type IN (202,204)
  56. GROUP BY it.parent_id
  57. ) AS a4 ON (a4.parent_id = a1.object_id)
  58. WHERE a2.type <> N'S' and a2.type <> N'IT'
  59. ORDER BY Table_Name;

If we query this VIEW:

  1. SELECT * FROM dbo.disk_usage_by_table;

then we can see that [count rows], [reserved space], [space used for data], [space used for indexes] and [free space] for 106 tables takes less then 1 second:



We can also filter or join this view with other tables/views:

  1. SELECT Records FROM dbo.disk_usage_by_table WHERE Table_Name = 'dim_date';



This method has folowing advantages:
  1. Is very very fast
  2. Is independent of table names
  3. Shows
    • Count of records
    • Reserved space (in KBytes)
    • Space used by data (in KBytes)
    • Space used by indexes (in KBytes)
    • Unused space (in KBytes)

You can use this metod for example in ETL processes to count (outside of transformation) rows and space before and after load.

No comments:

Post a Comment

Note: only a member of this blog may post a comment.