新聞中心
創(chuàng)新互聯(lián)建站是一家專注于做網(wǎng)站、成都網(wǎng)站建設(shè)與策劃設(shè)計,北票網(wǎng)站建設(shè)哪家好?創(chuàng)新互聯(lián)建站做網(wǎng)站,專注于網(wǎng)站建設(shè)十載,網(wǎng)設(shè)計領(lǐng)域的專業(yè)建站公司;建站業(yè)務(wù)涵蓋:北票等地區(qū)。北票做網(wǎng)站價格咨詢:028-86922220
內(nèi)容介紹
本文詳細(xì)介紹了在客戶現(xiàn)場一次因為resmgr:cpu quantum等待事件而導(dǎo)致CPU大量被消耗,在確認(rèn)原因并將導(dǎo)致CPU沖高的Resource Manager資源管理器特性關(guān)閉后,后續(xù)導(dǎo)致了后臺alert日志中大量的自動定時JOB沒有執(zhí)行并報錯:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_255" ORA-29373: resource manager is not on |
客戶操作系統(tǒng):RHEL6
數(shù)據(jù)庫版本:11.2.0.3.11
以下我們將描述并說明Resource Manager資源管理器以及后續(xù)的解決處理排查等一系列的過程。
概念普及
Resource Manager資源管理器是oracle在10g推出,并在11g中得以完善的一項功能。資源管理器通過控制數(shù)據(jù)庫內(nèi)部的執(zhí)行調(diào)度任務(wù)控制資源在各個會話之間的分配。通過控制所要運行的會話以及會話運行的時間長度,數(shù)據(jù)庫資源管理器可以確保資源利用及分配和我們計劃中的配置一樣,有效利用資源,當(dāng)然,Resource Manager主要控制資源包括如下幾塊:
Oracle會話的CPU使用率
并行度
SQL語句操作執(zhí)行時間
會話空閑時間
活躍會話(session)數(shù)
UNDO管理
當(dāng)然,在真正的生產(chǎn)環(huán)境中,Resource Manager還沒有被大量的使用起來,但是在11g(11.1.0.6 to 11.1.0.7 and 11gR2)開始,oracle默認(rèn)開啟Resource Manager計劃,如下顯示oracle后臺日志:
Setting Resource Manager plan SCHEDULER[0x51B5]:DEFAULT_MAINTENANCE_PLAN via scheduler window Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter Thu Feb 05 22:00:03 2009 Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" Thu Feb 05 22:00:39 2009 End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" |
如下顯示在10g和11g中,oracle對Resource Manager采用的不同策略:
Subject | 10G | 11G |
Maintenance Window | 2 windows, WEEK and WEEKEND | Each day has its own window |
Resource manager | Not enabled per default | Default resource plan specified |
故障詳解
客戶新系統(tǒng)上線,在上線后,發(fā)現(xiàn)數(shù)據(jù)庫出現(xiàn)大量的等待事件:resmgr:cpu quantum,這個等待事件是在11g中一個比較常見的等待事件,消耗大量CPU 系統(tǒng)性能變差,在和客戶溝通后,決定對該特性進(jìn)行關(guān)閉,以下為關(guān)閉該特性步驟:
ALTER SYSTEM SET “_resource_manager_always_on”=FALSE SCOPE=SPFILE SID='*';
#1,Change the active windows to use the null resource manager plan (or other nonrestrictive plan) using:
execute dbms_scheduler.set_attribute('WEEKNIGHT_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('WEEKEND_WINDOW','RESOURCE_PLAN','');
#2,Since in 11g there are more Maintenance Windows, we should add them too:
execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN','');
#3. Then, for each window_name (WINDOW_NAME from DBA_SCHEDULER_WINDOWS), run:
execute
dbms_scheduler.set_attribute(' |
以上關(guān)掉Resource Manager后效果確實很明顯,系統(tǒng)運行穩(wěn)定,CPU處于正常水平。但是運行一段時間后后臺開始報錯:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_SQ_SQL_SW_186" ORA-29373: resource manager is not on |
從報錯上看,似乎和Resource Manager有一定的關(guān)系,從JOB的命名來看,這個JOB是"SQL Tuning Advisor Job",用于診斷和監(jiān)控高負(fù)載的SQL, 并為ADDM提供SQL優(yōu)化建議的JOB,第一次,我們對該報錯的JOB,也就是SQL Tuning Advisor進(jìn)行了關(guān)閉,關(guān)閉過程如下:
exec DBMS_AUTO_TASK_ADMIN.DISABLE('SQL TUNING ADVISOR',NULL, NULL); |
關(guān)閉完后又有相應(yīng)新的JOB開始報錯:
ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_255" ORA-29373: resource manager is not on |
我們檢查后臺最近的JOB執(zhí)行情況:
SQL> select client_name,window_name,job_name,job_status from dba_autotask_job_history ;
CLIENT_NAME WINDOW_NAME JOB_NAME JOB_STATUS ----------------------------- ------------------------------------- -------- auto optimizer stats collection WEDNESDAY_WINDOW ORA$AT_OS_OPT_SY_102 FAILED auto optimizer stats collection FRIDAY_WINDOW ORA$AT_OS_OPT_SY_105 FAILED auto optimizer stats collection TUESDAY_WINDOW ORA$AT_OS_OPT_SY_99 FAILED auto optimizer stats collection MONDAY_WINDOW ORA$AT_OS_OPT_SY_96 FAILED |
在以上的腳本輸出中,我們發(fā)現(xiàn)大量的后臺JOB報錯,而以上JOB報錯從告警日志上看,基本都是由于Resource Manager關(guān)閉造成。但是我們知道,Resource Manager的開啟關(guān)閉應(yīng)該不至于影響JOB的正常調(diào)度,而且在相同的數(shù)據(jù)庫版本下,我們在AIX上卻沒有發(fā)現(xiàn)問題,而在檢查過程中,我們發(fā)現(xiàn),雖然我們已經(jīng)關(guān)閉了Resource Manager,但是相關(guān)的DBRM進(jìn)程卻依舊存在,ORACLE的DBRM進(jìn)程進(jìn)程即為Resource Manager的管理進(jìn)程,在某種情況下,關(guān)閉了Resource Manager卻發(fā)現(xiàn)該進(jìn)程依舊存在,這本身就存在一定的問題。這從一定程度上給予我們一定的懷疑方向,可能存在resource并沒有完全關(guān)閉,而且從詳細(xì)的trace文件中的從call stack trace里來看, 當(dāng)前進(jìn)程也沒有沒有關(guān)于resource manager的函數(shù)調(diào)用,而是一直在向另外一個進(jìn)程post message。
所以,我們懷疑錯誤很有可能是由于DBRM沒有正常關(guān)閉造成。
#ps -ef|grep dbrm oracle 34920 1 0 Apr18 ? 00:00:59 ora_dbrm_nfdb1 |
在對問題進(jìn)行了詳細(xì)的跟蹤及分析后,我們向ORACLE提交了SR,ORACLE反饋給我們第二個隱含參數(shù): _resource_manager_always_on設(shè)置為FALSE
我們嘗試對該參數(shù)進(jìn)行設(shè)置,并重啟了數(shù)據(jù)庫生效該參數(shù)后,DBRM進(jìn)程消失:
# ps -ef | grep dbrm root 4650 3647 0 21:58 pts/2 00:00:00 grep dbrm |
且相關(guān)的JOB執(zhí)行正常。為了確認(rèn)問題,我們分別進(jìn)行了3個實驗,分別為:
1. 設(shè)置“_resource_manager_always_on隱含參數(shù),關(guān)閉resource manager windows調(diào)用計劃
2. 刪除隱含參數(shù),只設(shè)置resource manager windows調(diào)用計劃
3. 添加2個隱含參數(shù),關(guān)閉resource manager windows調(diào)用計劃
關(guān)閉數(shù)據(jù)庫,調(diào)整時間,設(shè)置_resource_manager_always_on隱含參數(shù),關(guān)閉windows計劃
關(guān)閉數(shù)據(jù)庫: SQL> shutdown immediate Database closed. Database dismounted. ORACLE instance shut down.
調(diào)整時間: [root@rhel6 ~]# date -s 2014/05/14 [root@rhel6 ~]# date -s 21:57:00 [root@rhel6 ~]# clock -w [root@rhel6 ~]# date Wed May 14 21:57:35 CST 2014
添加隱含參數(shù),啟動至open: SQL> ALTER SYSTEM SET "_resource_manager_always_on"=FALSE SCOPE=SPFILE; SQL> startup force(測試環(huán)境,直接force啟動,生產(chǎn)環(huán)境勿如此操作)
設(shè)置resource manager plan: execute dbms_scheduler.set_attribute('SATURDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('SUNDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('MONDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('TUESDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('WEDNESDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('THURSDAY_WINDOW','RESOURCE_PLAN',''); execute dbms_scheduler.set_attribute('FRIDAY_WINDOW','RESOURCE_PLAN',''); |
我們觀察22點的alert信息,確實開始報錯:
Wed May 14 22:00:03 2014 Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j003_4452.trc: ORA-12012: error on auto execute of job "SYS"."ORA$AT_OS_OPT_SY_27" ORA-29373: resource manager is not on Wed May 14 22:00:03 2014 Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j004_4454.trc: ORA-12012: error on auto execute of job "SYS"."ORA$AT_SA_SPC_SY_28" ORA-29373: resource manager is not on Wed May 14 22:00:03 2014 Errors in file /oracle/ora11g/base/diag/rdbms/ora11g/ora11g/trace/ora11g_j005_4456.trc: ORA-12012: error on auto execute of job "SYS"."ORA$AT_SQ_SQL_SW_29" ORA-29373: resource manager is not on Wed May 14 22:00:04 2014 XDB installed. XDB initialized. |
檢查DBRM進(jìn)程已經(jīng)存在:
[root@rhel6 ~]# ps -ef | grep dbrm ora11g 4346 1 0 21:57 ? 00:00:00 ora_dbrm_ora11g
|
檢查后臺JOB執(zhí)行記錄視圖:
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;
CLIENT_NAME WINDOW_NAME JOB_NAME JOB_STATUS JOB_START_TIME ----------------------------------- ------------------------------ ------------------------------ ------------------------------ ---------------------------------------- auto optimizer stats collection WEDNESDAY_WINDOW ORA$AT_OS_OPT_SY_27 FAILED 14-MAY-14 10.00.03.233570 PM PRC |
模擬過程與當(dāng)前環(huán)境產(chǎn)生效果一致
接下來我們進(jìn)行第二個模擬:
關(guān)閉數(shù)據(jù)庫,調(diào)整時間,去除隱含參數(shù),關(guān)閉windows計劃
關(guān)閉數(shù)據(jù)庫: SQL> shutdown immediate Database closed. Database dismounted. ORACLE instance shut down.
調(diào)整時間: [root@rhel6 ~]# date -s 2014/05/14 [root@rhel6 ~]# date -s 21:57:00 [root@rhel6 ~]# clock -w [root@rhel6 ~]# date Wed May 14 21:57:35 CST 2014
去除隱含參數(shù),啟動至open: 隱含參數(shù)去除采用create pfile from spfile; 刪除spfile,編輯pfile文件,刪除隱含參數(shù),以pfile啟動數(shù)據(jù)庫
設(shè)置resource manager plan:(由于前面已經(jīng)設(shè)置過,無需再設(shè)置) |
我們觀察22點的alert信息,發(fā)現(xiàn)沒有報錯
Tue May 13 22:00:03 2014 Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" Tue May 13 22:00:05 2014 XDB installed. XDB initialized.
|
檢查 DBRM進(jìn)程:
[root@rhel6 ~]# ps -ef | grep dbrm ora11g 3844 1 0 21:55 ? 00:00:00 ora_dbrm_ora11g |
說明:此時resource manager由于只是關(guān)閉了resource manager plan計劃,沒有真正關(guān)閉resource manager 因此該進(jìn)程依舊存在。
檢查后臺JOB執(zhí)行視圖信息:
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;
CLIENT_NAME WINDOW_NAME JOB_NAME JOB_STATUS JOB_START_TIME ---------------------------------------- ------------------------------ ------------------------- -------------------- ---------------------------------------- auto optimizer stats collection TUESDAY_WINDOW ORA$AT_OS_OPT_SY_24 SUCCEEDED 13-MAY-14 10.00.02.102741 PM PRC |
說明在隱含參數(shù)除掉的情況下,JOB可以正常執(zhí)行,后臺沒有報錯。
模擬隱含參數(shù)及resource manager plan均存在的情況
關(guān)閉數(shù)據(jù)庫,調(diào)整時間,添加隱含參數(shù),關(guān)閉windows計劃
關(guān)閉數(shù)據(jù)庫: SQL> shutdown immediate Database closed. Database dismounted. ORACLE instance shut down. 調(diào)整時間: [root@rhel6 ~]# date -s 2014/05/15 [root@rhel6 ~]# date -s 21:57:00 [root@rhel6 ~]# clock -w [root@rhel6 ~]# date Thu May 15 21:57:35 CST 2014 添加隱含參數(shù),啟動至open: SQL> alter system set "_resource_manager_always_off"=true scope=spfile; SQL> ALTER SYSTEM SET "_resource_manager_always_on"=FALSE SCOPE=SPFILE; SQL> startup force(測試環(huán)境,直接force啟動,生產(chǎn)環(huán)境勿如此操作) 設(shè)置resource manager plan:(由于前面已經(jīng)設(shè)置過,無需再設(shè)置) |
我們觀察22點的alert信息,發(fā)現(xiàn)沒有報錯
Thu May 15 22:00:03 2014 Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" Thu May 15 22:00:05 2014 XDB installed. XDB initialized. End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK" |
檢查 DBRM進(jìn)程:
[root@rhel6 ~]# ps -ef | grep dbrm root 4650 3647 0 21:58 pts/2 00:00:00 grep dbrm |
說明:此時DBRM進(jìn)程消失
檢查后臺JOB執(zhí)行視圖信息:
SQL> select CLIENT_NAME,WINDOW_NAME,JOB_NAME,JOB_STATUS,JOB_START_TIME from DBA_AUTOTASK_JOB_HISTORY where CLIENT_NAME='auto optimizer stats collection' order by JOB_START_TIME desc;
CLIENT_NAME WINDOW_NAME JOB_NAME JOB_STATUS JOB_START_TIME ----------------------------------- ------------------------------ ------------------------------ ------------------------------ ---------------------------------------- auto optimizer stats collection THURSDAY_WINDOW ORA$AT_OS_OPT_SY_30 SUCCEEDED 15-MAY-14 10.00.02.115232 PM PRC |
說明在隱含參數(shù)兩個都添加的情況下,完全屏蔽resource manager的的情況下,JOB可以正常執(zhí)行,后臺沒有報錯。
技術(shù)結(jié)論
以上測試分析結(jié)果證明,后臺報錯JOB執(zhí)行失敗原因應(yīng)該是DBRM進(jìn)程依舊活動,而DBRM進(jìn)程是管理Resource Manager當(dāng)去除"_resource_manager_always_off"=true及"_resource_manager_always_on"=FALSE
或者將兩個參數(shù)全部添加,均可避免該錯誤,統(tǒng)計信息自動收集也可以自動執(zhí)行
在對以為問題進(jìn)行分析及確認(rèn)后,我們向ORACLE提交了相關(guān)的SR,最終,ORACLE對此,確認(rèn)相關(guān)的一個BUG,BUG號為:Bug 18748456 : AUTO TASK JOBS FAILED WITH ORA-29373 ERROR------------------------------------------------------------------------------------
<版權(quán)所有,文章允許轉(zhuǎn)載,但必須以鏈接方式注明源地址,否則追究法律責(zé)任!>
原博客地址:http://blog.itpub.net/23732248/
原作者:應(yīng)以峰 (frank-ying)
-------------------------------------------------------------------------------------
當(dāng)前名稱:ResourceManager關(guān)閉引起的自動JOB無法調(diào)用
文章路徑:http://fisionsoft.com.cn/article/iejids.html