时间:2021-07-01 10:21:17 帮助过:23人阅读
客户数据库运行缓慢,告诉都在等待物理读事件,远程连接查看statspack报告 DB Name DB Id Instance Inst Num Release Cluster Host------------ ----------- ------------ -------- ----------- ------- ------------xxxx 259264653 xxxx 1 9.2.0.1.0 NO M S
客户数据库运行缓慢,告诉都在等待物理读事件,远程连接查看statspack报告
DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
xxxx 259264653 xxxx 1 9.2.0.1.0 NO M
Snap Id Snap TIME Sessions Curs/Sess Comment
------- ------------------ -------- --------- -------------------
BEGIN Snap: 18 04-12月-14 00:00:03 56 5.2
END Snap: 26 04-12月-14 08:00:06 69 9.1
Elapsed: 480.05 (mins)
Cache Sizes (END)
~~~~~~~~~~~~~~~~~
Buffer Cache: 256M Std Block SIZE: 8K
Shared Pool SIZE: 600M Log Buffer: 2,048K
LOAD Profile
~~~~~~~~~~~~ Per SECOND Per TRANSACTION
--------------- ---------------
Redo SIZE: 33,077.28 157,241.29
Logical reads: 46,175.56 219,507.30
Block changes: 12,514.78 59,492.17
Physical reads: 1,582.20 7,521.40
Physical writes: 79.65 378.62
USER calls: 41.86 199.01
Parses: 6.11 29.04
Hard parses: 0.02 0.07
Sorts: 9.51 45.23
Logons: 0.19 0.92
Executes: 6,285.99 29,882.05
Transactions: 0.21
这里发现Block changes 高达12000以上,也就是每秒修改的12000块以上。
Top SQL
逻辑读
Buffer Gets Executions Gets per EXEC %Total TIME (s) TIME (s) Hash VALUE
--------------- ------------ -------------- ------ -------- --------- ----------
802,758,097 15,772 50,897.7 60.4 ######## ######### 827999392
UPDATE "xxx" "A1" SET "col1" = :B1 WHERE "A1"
."col2"=:B2 AND "A1"."col1" IS NULL OR "A1"."col1" <> :B1
物理读
Physical Reads Executions Reads per EXEC %Total TIME (s) TIME (s) Hash VALUE
--------------- ------------ -------------- ------ -------- --------- ----------
21,011,968 15,772 1,332.2 46.1 ######## ######### 827999392
UPDATE "xxx" "A1" SET "col1" = :B1 WHERE "A1"
."col2"=:B2 AND "A1"."col1" IS NULL OR "A1"."col1"<>:B1
表空间IO问题
Tablespace
------------------------------
Av Av Av Av Buffer Av Buf
Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
DATA
11,721,822 407 1.6 3.2 196,380 7 281 8.7
UNDOTBS1
7,477,389 260 9.2 1.0 2,088,770 73 37,178 8.0
-------------------------------------------------------------
数据文件
Tablespace Filename
------------------------ ----------------------------------------------------
Av Av Av Av Buffer Av Buf
Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
DATA D:\ORACLE\ORA92\xxx\xxxxxx.ORA
5,839,496 203 1.9 3.2 118,428 4 46 8.7
D:\ORACLE\ORA92\xxx\xxxxx.ORA
5,882,326 204 1.4 3.3 77,952 3 235 8.7
UNDOTBS1 D:\ORACLE\ORA92\xxxx\UNDOTBS01.DBF
7,477,389 260 9.2 1.0 2,088,770 73 37,178 8.0
-------------------------------------------------------------
从上面可以看出Tablespace data、undotbs1的IO使用比较严重,严重了上面的update语句导致。
查看下undo的使用
ROLLBACK Segment Storage FOR DB: FSJZ Instance: fsjz Snaps: 18 -26
->Optimal SIZE should be larger than Avg Active
RBS No Segment SIZE Avg Active Optimal SIZE Maximum SIZE
------ --------------- --------------- --------------- ---------------
0 385,024 6,553 385,024
1 35,774,464 1,003,908 129,097,728
2 10,608,640 996,987 75,620,352
3 13,819,904 772,807 18,997,248
4 5,365,760 1,355,300 14,802,944
5 8,511,488 5,457,853 3,885,096,960
6 4,183,941,120 2,771,161,530 4,252,098,560
7 5,365,760 619,248 10,608,640
8 14,802,944 7,010,756 21,094,400
9 3,268,608 2,908,152 22,142,976
10 4,317,184 567,499 4,317,184
-------------------------------------------------------------
上面的信息可以得出6号undo段已经使用了4G。
查看事物开始时间
会话的信息
SELECT
--t.xid xid
s.sid sid
, t.inst_id
, p.spid
, s.STATUS
, t.start_time
, t.STATUS transaction_status
, t.SPACE
, t.recursive
, t.noundo
, t.ptx
, ROUND((t.used_ublk * p.VALUE)/1024/1024) used_undo
, t.log_io
, t.phy_io
, t.cr_get
, s.username
, s.osuser
--, s.sql_id
--, s.sql_child_number
, s.program
FROM gv$transaction t
INNER JOIN gv$session s ON t.inst_id = s.inst_id AND t.ses_addr = s.saddr
INNER JOIN gv$process p ON p.inst_id = s.inst_id AND p.addr = s.paddr
ORDER BY t.start_time
;
从上面来看这个事物的Undo使用达到了17G
查看Update 语句的执行计划
SQL> EXPLAIN plan FOR
2 UPDATE xxxxx."xxxxxxxxxxxxxxxxx" "A1"
3 SET "col2" = '1'
4 WHERE "A1" ."col2"=96
5 AND "A1"."col1" IS NULL
6 OR "A1"."col1"<>'0'
7 ;
已解释。
SQL> SELECT * FROM TABLE(dbms_xplan.display());
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
---------------------------------------------------------------------------
| Id | Operation | Name | ROWS | Bytes | Cost |
---------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | | | |
| 1 | UPDATE | xxxxxxxxxxxxxxxxx | | | |
|* 2 | TABLE ACCESS FULL | xxxxxxxxxxxxxxxxx | | | |
---------------------------------------------------------------------------
Predicate Information (IDENTIFIED BY operation id):
---------------------------------------------------
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
2 - FILTER("A1"."col2"=96 AND "A1"."col1" IS NULL OR "A
1"."col1"<>'0')
Note: rule based optimization
已选择16行。
这里得知SQL走的全表扫,仔细查看SQL 发现
《AND "A1"."col1" IS NULL OR "A1"."col1"<>'0'》
这两个条件应该增加括号,否则会oracle会找到col1不等于某个值的所有结果进行更新。
手工KILL掉这个进程后,数据库事物回滚异常缓慢
查看数据库参数
SQL> SHOW parameter cpu_count NAME TYPE VALUE ------------------------------------ ----------- ------------------------ cpu_count INTEGER 16 SQL> SHOW parameter fast_start NAME TYPE VALUE ------------------------------------ ----------- ------------------------ fast_start_parallel_rollback string LOW SQL> SHOW parameter parallel_max NAME TYPE VALUE ------------------------------------ ----------- ------------------------ parallel_max_servers INTEGER 5
fast_start_parallel_rollback参数是加快回滚速度,这个参数默认是low,即2倍cpu数的并发度,进行并发的回滚。这个参数可以设置成high,即为4倍cpu数的并发度进行回滚。
如果设置了此参数为high或low,也就是4倍或2倍的cpu数,也会受到另外一些参数的影响,如PARALLEL_MAX_SERVERS,这个才是真正最大的并发度设置。
PARALLEL_MAX_SERVERS这个参数的默认值为PARALLEL_THREADS_PER_CPU * CPU_COUNT * concurrent_parallel_users * 5。PARALLEL_THREADS_PER_CPU 和CPU_COUNT都是初始化参数。
在一般情况下,并发的回滚总是比串行的快,但是当并发的子进程之间存在资源冲突的情况。
在并发子进程之间需要的资源冲突时,往往此时smon的等待事件是长期处于
Wait for stopper event to be increased
而子进程的等待事件是较多出现
Wait for a undo record。
此时,就是子进程冲突了。并发的回滚速度反而不如串行的回滚速度。
查询数据库得知现在最大并行进程设置5,远远不够并行恢复需要的进程。查看smon进程等待事件时,竟然没有记录。
通过关闭并行进程恢复,提高恢复速度
ALTER system SET fast_start_parallel_rollback=FALSE;
查看恢复进度
SQL> DECLARE
2 l_start NUMBER;
3 l_end NUMBER;
4 BEGIN
5 SELECT ktuxesiz INTO l_start FROM x$ktuxe WHERE KTUXEUSN=6 AND KTUXESLT=46;
6 dbms_lock.sleep(60);
7 SELECT ktuxesiz INTO l_end FROM x$ktuxe WHERE KTUXEUSN=6 AND KTUXESLT=46;
8 dbms_output.put_line('time est Day:'|| round(l_end/(l_start -l_end)/60/24,2));
9 END;
10 /
TIME est DAY:.14
SQL> SELECT 0.14*24 FROM dual
2 ;
0.14*24
----------
3.36
大体评估事务回滚需要3.36个小时
原文地址:记一次故障诊断:事物回滚, 感谢原作者分享。