🔰 SQL Server monitoring checklist 🔰
Last Updated: 2023-09-23- Item 1 - Are all of your SQL Server services running?
- Item 2 - Did all of your SQL Agent Jobs run successfully?
- Item 3 - Do you have a recent backup of all your SQL Server databases?
- Item 4 - Are there any errors in your SQL Server Error Log?
- Item 5 - Are you running out of space on any of your disks on your SQL Server?
- Item 6 - Are you running low on server memory for SQL Server?
- Item 7 - Are there any SQL Server statements in the cache that could use tuning?
- Item 8 - How many connections do you have to your SQL Server instance?
- Item 9 - How many requests is your SQL Server processing?
Problem
Every good SQL Server DBA goes through some list of checks when they get to the office in the morning to make sure all of their systems are running smoothly. This tip will highlight some of the more important items that every DBA should be monitoring either manually or by using some form of scheduled scripting.
Note: All of the following queries have been tested on SQL Server 2008 and may not work correctly with previous versions of SQL Server
Solution
Obviously once you connect to your instance you know that the database service is up and running but you can use the extended stored procedure xp_servicecontrol to check if any service is up and running. Below are a few examples of what you can check.
exec master.dbo.xp_servicecontrol 'QUERYSTATE', 'MSSQLServer'
exec master.dbo.xp_servicecontrol 'QUERYSTATE', 'SQLServerAgent'
exec master.dbo.xp_servicecontrol 'QUERYSTATE', 'SQLBrowser'
This item can be checked with a fairly straightforward query of the msdb database. The first part of the query checks for any failed job steps and the second part is only concerned with the overall job status. This is also checked because a step could be set to continue even on failure, but should probably still be looked at in the morning. Also, if you are using the SQL Server Agent to backup your databases then this is also a good way to check if any backup jobs failed.
use msdb
go
select 'FAILED' as Status, cast(sj.name as varchar(100)) as "Job Name",
cast(sjs.step_id as varchar(5)) as "Step ID",
cast(sjs.step_name as varchar(30)) as "Step Name",
cast(REPLACE(CONVERT(varchar,convert(datetime,convert(varchar,sjh.run_date)),102),'.','-')+' '+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),1,2)+':'+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),3,2)+':'+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),5,2) as varchar(30)) 'Start Date Time',
sjh.message as "Message"
from sysjobs sj
join sysjobsteps sjs
on sj.job_id = sjs.job_id
join sysjobhistory sjh
on sj.job_id = sjh.job_id and sjs.step_id = sjh.step_id
where sjh.run_status <> 1
and cast(sjh.run_date as float)*1000000+sjh.run_time >
cast(convert(varchar(8), getdate()-1, 112) as float)*1000000+70000 --yesterday at 7am
union
select 'FAILED',cast(sj.name as varchar(100)) as "Job Name",
'MAIN' as "Step ID",
'MAIN' as "Step Name",
cast(REPLACE(CONVERT(varchar,convert(datetime,convert(varchar,sjh.run_date)),102),'.','-')+' '+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),1,2)+':'+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),3,2)+':'+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),5,2) as varchar(30)) 'Start Date Time',
sjh.message as "Message"
from sysjobs sj
join sysjobhistory sjh
on sj.job_id = sjh.job_id
where sjh.run_status <> 1 and sjh.step_id=0
and cast(sjh.run_date as float)*1000000+sjh.run_time >
cast(convert(varchar(8), getdate()-1, 112) as float)*1000000+70000 --yesterday at 7am
The two queries below will list any database that either does not have any backup or has not been backed up in the last 24 hours. The first query checks your full backups and the second query checks your transaction log backups (only for those databases in full recovery mode).
SELECT d.name AS "Database",
ISNULL(CONVERT(VARCHAR,b.backupdate,120),'NEVER') AS "Last Full Backup"
FROM sys.databases d
LEFT JOIN (SELECT database_name,type,MAX(backup_finish_date) backupdate FROM backupset
WHERE type LIKE 'D'
GROUP BY database_name,type) b on d.name=b.database_name
WHERE (backupdate IS NULL OR backupdate < getdate()-1)
SELECT d.name AS "Database",
ISNULL(CONVERT(VARCHAR,b.backupdate,120),'NEVER') AS "Last Log Backup"
FROM sys.databases d
LEFT JOIN (SELECT database_name,type,MAX(backup_finish_date) backupdate FROM backupset
WHERE type LIKE 'L'
GROUP BY database_name,type) b on d.name=b.database_name
WHERE recovery_model = 1
AND (backupdate IS NULL OR backupdate < getdate()-1)
In order to check the SQL Server Error Log we are going to use the undocumented extended stored procedure, xp_readerrorlog. This query will look at the current log and go back a maximum of 2 days looking for any errors during that time frame.
declare @Time_Start datetime;
declare @Time_End datetime;
set @Time_Start=getdate()-2;
set @Time_End=getdate();
-- Create the temporary table
CREATE TABLE #ErrorLog (logdate datetime
, processinfo varchar(255)
, Message varchar(500))
-- Populate the temporary table
INSERT #ErrorLog (logdate, processinfo, Message)
EXEC master.dbo.xp_readerrorlog 0, 1, null, null , @Time_Start, @Time_End, N'desc';
-- Filter the temporary table
SELECT LogDate, Message FROM #ErrorLog
WHERE (Message LIKE '%error%' OR Message LIKE '%failed%') AND processinfo NOT LIKE 'logon'
ORDER BY logdate DESC
-- Drop the temporary table
DROP TABLE #ErrorLog
You can used the extended stored procedure xp_fixeddrives to get a quick look at the space left on your drives.
exec master.dbo.xp_fixeddrives
To check the memory on your server we can use the dynamic management view dm_os_sys_memory.
SELECT available_physical_memory_kb/1024 as "Total Memory MB",
available_physical_memory_kb/(total_physical_memory_kb*1.0)*100 AS "% Memory Free"
FROM sys.dm_os_sys_memory
The following query will identify any poor performing SQL statements. You can alter the "order by" clause depending on what you are most concerned with (IO vs. CPU vs. Elapsed Time).
SELECT top 10 text as "SQL Statement",
last_execution_time as "Last Execution Time",
(total_logical_reads+total_physical_reads+total_logical_writes)/execution_count as [Average IO],
(total_worker_time/execution_count)/1000000.0 as [Average CPU Time (sec)],
(total_elapsed_time/execution_count)/1000000.0 as [Average Elapsed Time (sec)],
execution_count as "Execution Count",
qp.query_plan as "Query Plan"
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
order by total_elapsed_time/execution_count desc
This query on its own does not provide too much information other than show you if there is some blocking in the system. However, once you get a baseline for your applications through running this query, you'll be able to see if you have a higher than normal number of connections. This can be an early sign that there may be a problem.
SELECT spid, kpid, blocked, d.name, open_tran, status, hostname,
cmd, login_time, loginame, net_library
FROM sys.sysprocesses p
INNER JOIN sys.databases d
on p.dbid=d.database_id
As with the previous query, checking the number of requests coming into your SQL Server does not tell you too much. If you capture this number during normal operation you can use it as a baseline for comparison later on. Generally speaking around 1000/sec is a busy SQL Server but this number depends a lot on the hardware you are running on. For others 100/sec may be too much for their instance to handle. Also, using the below query as a template, you can query other O/S performance counters related to SQL Server.
DECLARE @BRPS BIGINT
SELECT @BRPS=cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Batch Requests/sec%'
WAITFOR DELAY '000:00:10'
SELECT (cntr_value-@BRPS)/10.0 AS "Batch Requests/sec"
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Batch Requests/sec%'
Last Updated: 2023-09-23