
Tuesday, February 24, 2009

Journey from 9.0.1 to 9.2.0.8: Part-II

We saw in the first part that serious attention was needed toward a few issues that were escalating day by day, namely:


  1. Blocking Locks, affecting user activities
  2. Serious Performance Degradation (Slow Performance, Session Hangs, etc.)
  3. System Crashes in Business Hours, causing downtime
After I joined the company, it took me a while to get the full picture by studying the logs and regularly interacting with everyone in the department, with the users, and with the relevant points of contact in the company. I remember drafting a 19-page system review report, which covered the following:


  • SERVER CONFIGURATIONS & USAGE STATISTICS
  • APPLICATION ARCHITECTURE & BACKUP STRATEGY
  • DATABASE ARCHITECTURE & CONFIGURATION
  • DATABASE BACKUP-RECOVERY STRATEGY
  • IMMEDIATE RISK FACTORS & AVAILABLE SOLUTIONS (w/ACTION PLAN)
  • LONG-TERM RISK FACTORS & AVAILABLE SOLUTIONS
  • SERVICE IMPROVEMENT PLANS
The database was hosted on a Pentium III 1 GHz dual-core IBM eServer, with 1.5 GB RAM and a 50 GB RAID 5 HDD (40 GB dedicated to Oracle). The operating system was Windows Server 2000 Standard Edition.
The database had been created with Oracle 9i Release 1 (9.0.1) software, with the default name Oracle gives (ORCL). The default block size (4K) was used. Archive logging was disabled. The combined size of all the datafiles was around 30 GB, and the total SGA was 930 MB.
The 930 MB SGA comprised a 320 MB shared pool, a 600 MB buffer cache, and a 512 KB log buffer; the sort area size was set to 1.5 MB.

The scheduled backups were schema-level export dumps, taken on alternate days and pushed to DDS tapes. No recovery testing had ever been carried out to check the validity of the backups, or even whether the data could be recovered from them at all. Testing the validity of the current backups was of grave importance, as was replacing them with Oracle RMAN or user-managed backups, or at least having a full database export carried out until RMAN could be implemented.

Enabling archive logging took some time due to the resource constraints on the server. Until that could be resolved, I scripted and scheduled the creation of a full database export dump. I periodically tested complete recovery from the full export dumps, just to assure database availability to the point of the last backup. This was really helpful, because it also let me test migration of the database to any version using the export dumps.
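
The export itself was nothing fancy; a minimal sketch of such a full export on Windows is shown below. The SYSTEM password, the connect string, and the file paths are placeholders, not what we actually used.

exp system/password@orcl FULL=Y CONSISTENT=Y FILE=D:\backup\orcl_full.dmp LOG=D:\backup\orcl_full.log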

Two major resource contentions that were evidently visible were:


  • The free RAM on the server would be between 150 and 200 MB at peak hours, indicating that the server and the database needed more memory for better performance.

We had a hard time finding RAM for the server as the model was nearly extinct. Eventually, over a period of about a year, we were able to upgrade the RAM to 4 GB. Until then, there was no option but to fix a few things using the existing resources.

  • The 40 GB partition on which the database resided had only 4 GB of free space left.
Out of the 30 GB of datafiles, the TEMP tablespace accounted for an unusual 12 GB, so I had to resize the temporary tablespace to free quite a lot of space.

The listener.log file had grown to more than 2 GB in size. After recreating the log file, I was able to reclaim another 2 GB of space.
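
Recreating the listener log is straightforward; the sketch below is roughly the sequence involved (the log path is a placeholder for your ORACLE_HOME):

lsnrctl set log_status off
rename C:\oracle\ora90\network\log\listener.log listener_old.log
lsnrctl set log_status on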

With the reclaimed space, I appropriately resized the tablespaces and datafiles that were short on available free space, so that there was more than 80-85% free space available in each tablespace.
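
A free-space query along the lines of the sketch below, followed by datafile resizes, is the sort of thing this involved; the datafile name and new size in the example are placeholders.

select t.tablespace_name,
       round(t.total_bytes/1024/1024) total_mb,
       round(nvl(f.free_bytes,0)/1024/1024) free_mb
from (select tablespace_name, sum(bytes) total_bytes
      from dba_data_files group by tablespace_name) t,
     (select tablespace_name, sum(bytes) free_bytes
      from dba_free_space group by tablespace_name) f
where t.tablespace_name = f.tablespace_name(+)
order by free_mb;

alter database datafile 'C:\ORACLE\ORADATA\ORCL\USERS01.DBF' resize 2000M;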

Statistics had not been gathered on any of the tables. A series of tests was carried out along with the developers on the behavior of the system in CBO mode with gathered statistics. One notable issue we faced during the transition was with some Oracle Reports that would occasionally show no data; these were fixed by adding NVL functions on the parameters referenced by Oracle Reports. Over and above that, we successfully migrated from the RBO to the CBO and finally had numbers to measure performance with (COST, CARDINALITY, etc.).
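
A minimal sketch of gathering statistics for one schema with DBMS_STATS is shown below; the schema name SCOTT and the 10 percent sample size are placeholders, not necessarily what we used.

begin
  dbms_stats.gather_schema_stats(
    ownname          => 'SCOTT',        -- placeholder schema name
    estimate_percent => 10,             -- placeholder sample size
    method_opt       => 'FOR ALL COLUMNS SIZE 1',
    cascade          => TRUE);          -- also gather index statistics
end;
/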

I configured Statspack to take snapshots at an interval of 15 minutes and analyzed the reports for activity during peak hours. Time and again, I revisited the top SQLs and, wherever possible, tuned them with the assistance of the developers. The ERP was poorly indexed. With the CBO in play, I was able to test the performance gains from new indexes on the tables referenced by the top disk-read and buffer-get queries, and then created them in production.
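
Scheduling the snapshots is a one-liner with DBMS_JOB; the sketch below (run as the PERFSTAT user) is essentially what Oracle's spauto.sql does, with the interval changed to 15 minutes.

variable jobno number;
begin
  dbms_job.submit(:jobno,
                  'statspack.snap;',
                  trunc(sysdate + 1/96, 'MI'),       -- first run in roughly 15 minutes
                  'trunc(SYSDATE + 1/96, ''MI'')');  -- repeat every 15 minutes
  commit;
end;
/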

I came across a lot of unindexed foreign keys. This was one of the main reasons for the blocking locks. Using one of the scripts I had, I created indexes for the unindexed foreign keys. I also segregated the indexes into separate tablespaces dedicated to indexes. Within a couple of days, we saw a drastic decrease in the number of user calls related to the blocking lock issue.
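
The index creation and segregation were plain DDL; the sketch below illustrates the idea with placeholder table, column, index, and tablespace names.

-- index a foreign key column in a dedicated index tablespace (all names are placeholders)
create index orders_cust_fk_i on orders (customer_id) tablespace indx;

-- move an existing index out of the data tablespace
alter index orders_pk rebuild tablespace indx;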

By now, I had gradually dealt with both the performance degradation issue and the blocking locks issue, yet I still had serious resource constraints to deal with until I could upgrade the RAM on the server or migrate the database to a new one.

In the next part, I will elaborate on the system crash that occurred a couple of weeks after I joined, and how I tackled and resolved it.

Monday, February 23, 2009

Blat:Win32 console utility to send mail

Blat is a small, efficient SMTP command line mailer for Windows.

Blat simplifies the command line by storing any or all of the following in the registry [HKEY_LOCAL_MACHINE \ SOFTWARE \ Public Domain \ Blat]:
SMTP Server Address
Sender's Address
Number of times to retry sending
Port number to use (i.e., if not the SMTP default of 25)
The -q switch, which "suppresses *all* output"

You need to use -install so that blat can recognize your SMTP server.
Blat -install smtphost.mymail.com test@mymail.com // Sets host and userid
Blat -install smtphost.mymail.com test // Sets host and userid
Blat -install smtphost.mymail.com // Sets host only

Now you can use Blat to send attachments via email from your command prompt. This is helpful when you want to email backup logs.
Blat C:\test.txt -to tested@mymail.com -server smtphost.mymail.com -f test@mymail.com

For more information on how to use Blat, check out www.blat.net

Sunday, February 22, 2009

Resizing an Over-Grown Temporary Tablespace

One fine day, while carrying out our daily health checks, we came across an over-grown temporary tablespace. It had grown to 12 GB, whereas normally our temporary tablespace had never exceeded 1 GB. A one-time, un-tuned script that ran the previous evening had caused the tablespace, which had autoextend set to unlimited, to grow to 12 GB. We needed to reduce the size of the temporary tablespace to free space on the server, as the server had resource constraints.

You cannot simply shrink a temporary tablespace, regardless of the percentage of free space it shows; you have to drop and re-create it. Let me show you how to regain the space occupied by the over-grown temporary tablespace.

Assuming the Default Temporary Tablespace is "TEMP", you create a new temporary tablespace "TEMP02".

SQL> CREATE
2 TEMPORARY TABLESPACE "TEMP02" TEMPFILE
3 'C:\ORACLE\ORADATA\ORCL\TEMP02.dbf' SIZE 100M REUSE
4 AUTOEXTEND ON
5 MAXSIZE 1024M EXTENT MANAGEMENT LOCAL;
Tablespace created.


Then, you set "TEMP02" as your default temporary tablespace.
SQL> ALTER DATABASE DEFAULT TEMPORARY TABLESPACE "TEMP02";
Database altered.

Then, you drop the over-grown temporary tablespace "TEMP" and recreate the "TEMP" temporary tablespace.
SQL> DROP TABLESPACE TEMP INCLUDING CONTENTS AND DATAFILES;
Tablespace dropped.

SQL> CREATE
2 TEMPORARY TABLESPACE "TEMP" TEMPFILE
3 'C:\ORACLE\ORADATA\ORCL\TEMP01.dbf' SIZE 500M REUSE
4 AUTOEXTEND ON
5 MAXSIZE 600M EXTENT MANAGEMENT LOCAL;
Tablespace created.

Then, you make "TEMP" as your default temporary tablespace, and drop the temporary tablespace "TEMP02".
SQL> ALTER DATABASE DEFAULT TEMPORARY TABLESPACE "TEMP";
Database altered.

SQL> DROP TABLESPACE temp02 INCLUDING CONTENTS AND DATAFILES;
Tablespace dropped.

Please ensure that you carry out this activity during your maintenance window; otherwise you may encounter the following errors while dropping a temporary tablespace that is in use:
Errors in file c:\oracle\admin\orcl\udump\orcl_ora_2300.trc:
ORA-01258: unable to delete temporary file C:\ORACLE\ORADATA\ORCL\TEMP01.DBF
ORA-27056: skgfrdel: could not delete file
OSD-04024: Unable to delete file.
O/S-Error: (OS 32) The process cannot access the file because it is being used by another process.
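
A quick way to check whether any session is still using the temporary tablespace before you drop it is a query like the sketch below; the tablespace name TEMP is just the example used in this post.

-- sessions still holding temp segments (v$sort_usage exists in 9i)
select s.sid, s.serial#, s.username, u.tablespace, u.blocks
from v$session s, v$sort_usage u
where s.saddr = u.session_addr
and u.tablespace = 'TEMP';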

Dealing with Unindexed Foreign Keys

A couple of months back, we were hitting a lot of deadlock issues. I found out that the deadlocks were occurring due to unindexed foreign keys, and I found the following scripts to locate unindexed foreign keys and generate index creation statements for them.

Using SQL*Plus, connect to the schema for which you need to find the unindexed foreign keys. Then execute the following scripts:

Script to find Unindexed Foreign Keys


column columns format a30 word_wrapped
column tablename format a15 word_wrapped
column constraint_name format a15 word_wrapped

select table_name, constraint_name,
cname1 || nvl2(cname2,','||cname2,null) ||
nvl2(cname3,','||cname3,null) || nvl2(cname4,','||cname4,null) ||
nvl2(cname5,','||cname5,null) || nvl2(cname6,','||cname6,null) ||
nvl2(cname7,','||cname7,null) || nvl2(cname8,','||cname8,null)
columns
from ( select b.table_name,
b.constraint_name,
max(decode( position, 1, column_name, null )) cname1,
max(decode( position, 2, column_name, null )) cname2,
max(decode( position, 3, column_name, null )) cname3,
max(decode( position, 4, column_name, null )) cname4,
max(decode( position, 5, column_name, null )) cname5,
max(decode( position, 6, column_name, null )) cname6,
max(decode( position, 7, column_name, null )) cname7,
max(decode( position, 8, column_name, null )) cname8,
count(*) col_cnt
from (select substr(table_name,1,30) table_name,
substr(constraint_name,1,30) constraint_name,
substr(column_name,1,30) column_name,
position
from user_cons_columns ) a,
user_constraints b
where a.constraint_name = b.constraint_name
and b.constraint_type = 'R'
group by b.table_name, b.constraint_name
) cons
where col_cnt > ALL
( select count(*)
from user_ind_columns i
where i.table_name = cons.table_name
and i.column_name in (cname1, cname2, cname3, cname4,
cname5, cname6, cname7, cname8 )
and i.column_position <= cons.col_cnt
group by i.index_name
);



Script to generate 'CREATE' statements for Unindexed Foreign Keys


create global temporary table temp_index
(commande varchar2(500))
on commit delete rows;

define Tablespace_name = &&Tablespace_name
set feedback off
set linesize 255
set serveroutput on
set verify off
set heading off;
declare
L_nom_colo varchar2(2000);

Cursor sel_cons is
Select constraint_name, table_name
from user_constraints
where constraint_type = 'R';

Cursor sel_colo(P_nom_cons user_cons_columns.constraint_name%type,
P_nom_tabl user_cons_columns.table_name%type) is
Select column_name
from user_cons_columns
where constraint_name = P_nom_cons
and table_name = P_nom_tabl
order by position;

begin

for liste_cons in sel_cons loop
L_nom_colo := null;
for liste_colo in sel_colo(liste_cons.constraint_name, liste_cons.table_name) loop
if L_nom_colo is not null and liste_colo.column_name is not null then
L_nom_colo := L_nom_colo || ',';
end if;
L_nom_colo := L_nom_colo || liste_colo.column_name;
End loop;

insert into temp_index values('Create index ' || liste_cons.constraint_name || ' on '
|| liste_cons.table_name || '(' || L_nom_colo || ') tablespace &Tablespace_name;');
end loop;
end;
/
spool &SCRIPT_NAME
select * from temp_index;
spool off
drop table temp_index;


A deadlock means that process X has a lock on resource 1 and is waiting for resource 2, while process Y has a lock on resource 2 and is waiting to acquire a lock on resource 1.

SQL Performance Diagnostics Scripts

I use the following SQL Scripts for finding top SQLs that may cause performance degradation.

Top SQL by Disk Reads
select substr(sql_text,1,500) "SQL",
(cpu_time/1000000) "CPU_Seconds",
disk_reads "Disk_Reads",
buffer_gets "Buffer_Gets",
executions "Executions",
case when rows_processed = 0 then null
else (buffer_gets/nvl(replace(rows_processed,0,1),1))
end "Buffer_gets/rows_proc",
(buffer_gets/nvl(replace(executions,0,1),1)) "Buffer_gets/executions",
(elapsed_time/1000000) "Elapsed_Seconds",
module "Module"
from v$sql s
order by disk_reads desc nulls last;

Top SQL by Buffer Gets
select substr(sql_text,1,500) "SQL",
(cpu_time/1000000) "CPU_Seconds",
disk_reads "Disk_Reads",
buffer_gets "Buffer_Gets",
executions "Executions",
case when rows_processed = 0 then null
else (buffer_gets/nvl(replace(rows_processed,0,1),1))
end "Buffer_gets/rows_proc",
(buffer_gets/nvl(replace(executions,0,1),1)) "Buffer_gets/executions",
(elapsed_time/1000000) "Elapsed_Seconds",
module "Module"
from v$sql s
order by buffer_gets desc nulls last;

Top SQL by CPU
select substr(sql_text,1,500) "SQL",
(cpu_time/1000000) "CPU_Seconds",
disk_reads "Disk_Reads",
buffer_gets "Buffer_Gets",
executions "Executions",
case when rows_processed = 0 then null
else (buffer_gets/nvl(replace(rows_processed,0,1),1))
end "Buffer_gets/rows_proc",
(buffer_gets/nvl(replace(executions,0,1),1)) "Buffer_gets/executions",
(elapsed_time/1000000) "Elapsed_Seconds",
module "Module"
from v$sql s
order by cpu_time desc nulls last;

Top SQL by Executions

select substr(sql_text,1,500) "SQL",
(cpu_time/1000000) "CPU_Seconds",
disk_reads "Disk_Reads",
buffer_gets "Buffer_Gets",
executions "Executions",
case when rows_processed = 0 then null
else (buffer_gets/nvl(replace(rows_processed,0,1),1))
end "Buffer_gets/rows_proc",
(buffer_gets/nvl(replace(executions,0,1),1)) "Buffer_gets/executions",
(elapsed_time/1000000) "Elapsed_Seconds",
module "Module"
from v$sql s
order by executions desc nulls last;

Wednesday, February 18, 2009

RMAN Full Backup Script

Though creating an RMAN full backup script is pretty simple, I would like to share my script and how to create it here.

First, invoke RMAN. Then connect to your recovery catalog and target database.
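
For example, from the command prompt (the passwords, the catalog owner rman, and the connect strings orcl and catdb below are placeholders):

C:\> rman target sys/password@orcl catalog rman/rman@catdb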

You can create an RMAN full backup script by running the following:

create script prod_full_bkp {
CROSSCHECK BACKUP;
CROSSCHECK ARCHIVELOG ALL;
SQL 'alter system archive log current';
BACKUP DATABASE;
BACKUP ARCHIVELOG ALL DELETE INPUT;
REPORT UNRECOVERABLE;
REPORT OBSOLETE ORPHAN;
}

The script will be stored in the recovery catalog. Here, REPORT OBSOLETE lists backups that are obsolete according to the retention policy configured for the target database, and DELETE INPUT removes the archived logs after they have been backed up.
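
If no retention policy has been set yet, it can be configured once per target, for example as below; the 2-day recovery window is only an illustration, so pick whatever suits your backups.

RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 2 DAYS;
RMAN> SHOW RETENTION POLICY;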

You can either run the script manually or schedule a job to run it, using the following:

spool log to '\prod_full_bkp.log';
run {execute script prod_full_bkp;}
list backup summary;
spool log off;

Tuesday, February 17, 2009

Journey from 9.0.1 to 9.2.0.8: Part-I

I would like to share my experience on how we migrated the production instance from Oracle 9iR1 (9.0.1) to Oracle 9iR2 (9.2.0.8) in August 2008.

To begin with, let me provide some background on the situation before I joined this company.

Our company's ERP is based on Oracle Forms/Reports 6i and Oracle Database 9i. Ever since the ERP was implemented, i.e. more than 5-6 years back, the database had never been looked at, as there was no DBA in the company and the vendors took the view "if it ain't broke, don't fix it". The production instance was configured on a Pentium III dual core (1 GHz, 1.5 GB RAM, 40 GB HDD with RAID 5 configuration) running Windows 2000 Server.

The developers were supporting the ERP, and newly developed modules were being added. Over time the database grew, the number of users increased, and the database started suffering from performance degradation and unexpected crashes in the middle of business hours.

In the event of a crash, the only thing the developers and IT admins did was restart the server. The IT admins delegated the responsibility to the developers as it was outside their purview, and the developers, having little or no database administration experience, had no clue why the crashes were happening in the first place.

Initially, the vendors analyzed the incidents and instructed the developers to increase the size of the undo tablespace (200M --> 500M), to add a datafile to any tablespace that exceeded 70% occupied space, to increase the SGA (600M --> 900M) and resize the buffer cache and shared pool, to reduce open cursors (2500 --> 500), to increase the redo log groups (3 --> 6), and to delete the statistics on the few tables that had them. The database was supposed to run in Rule Based Optimizer (RBO) mode, so the vendors advised against gathering statistics on any of the tables.

There were lots of issues related to blocking locks, for which the developers had to either ask the blocking session's user to save their work and exit, or kill the blocking sessions using Toad.

Users complained time and again about slow performance and session hangs. Some of their queries, mostly specific to Reports, that could potentially be addressed were rewritten using hints or tuning, but the effect was largely unpredictable under rule-based optimization.

Needless to say, the developers had to find ways to resolve incidents and issues for which they had limited knowledge and experience, and given the downtime implications, they had some tough times. Fortunately, apart from the crash incidents, there were no disaster recovery situations, and the business continued at the same pace.

In the next part, I will elaborate on my assessment of the above problems and the resolutions I applied after joining the company.

RMAN B/R Implementation on Production

We implemented an RMAN backup strategy on our production instance earlier this week. Previously, we had daily logical full database backups and weekly cold backups scheduled for our 9.2.0.8 production instance. The server did not have sufficient resources to be put into archivelog mode, as it is pretty old, so we arranged extra RAM and some space on the server to achieve this. Why we couldn't move the production database to a new server is another question that might arise in your minds.

It was only while carrying out the RMAN backup/recovery testing that we came to know we would be missing a few features without Oracle Database Enterprise Edition, one of them being cumulative/incremental backups. We had no option but to take daily full database hot backups. We configured a recovery window of 2 days for the production hot backups.

We planned the switch to archivelog mode on the production instance after office hours, as we had to bounce the instance. We now generate more than 1 GB of archives per day.
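
For anyone doing the same, a minimal sketch of switching a 9i database to archivelog mode is shown below, assuming an spfile is in use (edit init.ora instead if you still run from a pfile); the archive destination path is a placeholder.

SQL> ALTER SYSTEM SET log_archive_dest_1 = 'LOCATION=D:\oracle\archive' SCOPE=SPFILE;
SQL> ALTER SYSTEM SET log_archive_start = TRUE SCOPE=SPFILE;
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP MOUNT
SQL> ALTER DATABASE ARCHIVELOG;
SQL> ALTER DATABASE OPEN;
SQL> ARCHIVE LOG LIST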

We set up the recovery catalog in a separate database on another machine and registered the production instance in the recovery catalog.
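
For reference, the catalog setup looks roughly like the sketch below, assuming the catalog owner (here rman) already exists with the RECOVERY_CATALOG_OWNER role; the passwords and the connect strings catdb and orcl are placeholders.

RMAN> CONNECT CATALOG rman/rman@catdb
RMAN> CREATE CATALOG;
RMAN> CONNECT TARGET sys/password@orcl
RMAN> REGISTER DATABASE;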

I am thinking of setting up another recovery catalog, or using the target database's control file, to take periodic backups of the recovery catalog that holds the backup information for the production instance.

I will cover one of the features I like best about RMAN in a later post, with some examples.

My Intro

I am an Oracle DBA working in Kuwait. As I try to learn more from my day-to-day activities, I will be sharing my knowledge, findings, and experiences here for people to read and provide their valuable feedback.