Friday, October 27, 2006

When 'Exclusive' is not really exclusive...

One situation that can really annoy an IBM Informix DBA happens when you're trying to ALTER TABLE [...] and you get the error "Non exclusive access when executing a DDL...". You then think... "Oh... I'll lock it in exclusive mode"... And after killling some sessions, setting the lock mode to WAIT and opening a transaction you get the desired 'exclusive' lock. Then you try the ALTER TABLE again and what happens? The same error! How can it be you ask?! Read on...

First let me say that this same subject is referenced in one of the many technical support articles. The link is here. Why am I writing this then? Well, because I think there's a bit more than what the article says... But please understand that everyhing in that article is correct, and that it solves the problem in most cases.

Let's start by making a few points:
  1. When you set the exclusive lock on a table using "BEGIN WORK; LOCK TABLE IN EXCLUSIVE MODE you don't prevent sessions with DIRTY READ isolation level from accessing the table
  2. In order to do an ALTER TABLE on a table, you need exclusive access on a different level. The fact that no one (besides dirty readers) can't read data from the table is not enough. There must be no references to the table structure. This is on a lower level than the SQL layer. This means for example that there can be no open cursors, no prepared statements using the table, and there can be no sessions waiting for the lock you establish. If for example at time T0 you get the exclusive lock, and before you make the ALTER TABLE, at T1, a session in LOCK MODE WAIT tries to SELECT FROM THE TABLE, this session will have a reference to the table. At T2 when you give the ALTER command, you won't be able to execute it. This needs a very precise timming to happen, but it can be easy when you're dealing with an "hot" table

For the first situation, our friends at R&D have come up with a solution, documented in the already mentioned article. There is an otherwise undocumented variable called IFX_DIRRTY_WAIT which allows us to specify a maximum number of seconds that we will wait for DIRTY READERS to release the table structure. This will also prevent new sessions to attach to the table structure.
So, two things can happen. If the ammount of time specified in IFX_DIRTY_WAIT is enough to let all dirty readers to complete their jobs, your ALTER TABLE statement will wait for a while and successfuly execute after the last one ends their reads. Otherwise, if there are still any dirty reader after the timeout, you'll get the normal error. In this case you'll have to increase the value you put in IFX_DIRTY_WAIT or use the time to kill some sessions.

But this won't solve the second situation. For this you will have to find what sessions have references to the table and close them. And if you have a "wild" environment where you can't stop sessions from popping up, and many of this sessions make references to the table, you'll also need a way to stop this new sessions from using the table. I had cases where I checked which sessions were preventing me from doing the DDL and closed them. But when I retried the ALTER, there were already other sessions with references to the table. So, how can we find which session has a reference to a table?


> ideiafix@PacMan.domus.online.pt:informix-> onstat -g opn

IBM Informix Dynamic Server Version 10.00.UC5 -- On-Line -- Up 00:43:22 -- 87776 Kbytes
tid rstcb isfd op_mode op_flags partnum ucount ocount lockmode
39 0x477bf4c8 0 0x00000400 0x00000397 0x00100071 2 2 0
39 0x477bf4c8 1 0x00000002 0x00000003 0x00100071 2 2 0
39 0x477bf4c8 3 0x00000408 0x00000017 0x001000a3 1 1 0


Err... and now? Well, as you can see you have a column called "partnum", and another with the "rstcb". Knowing the tablename allows you to find out the partnum (and vice-versa) by looking at sysmaster:systabnames.

[informix@PacMan ~]$ dbaccess sysmaster -

Database selected.

> select hex(partnum), tabname, dbsname from systabnames where partnum = '0x001000a3';



(expression) 0x001000A3
tabname customer
dbsname stores_demo

1 row(s) retrieved.


And you can find the session ID from the "rstcb", using something like:

[informix@PacMan ~]$ onstat -u | grep 477bf4c8
477bf4c8 Y--P--- 21 informix 2 482d71a8 0 1 0 0
[informix@PacMan ~]$ onstat -g sql 21

IBM Informix Dynamic Server Version 10.00.UC5 -- On-Line -- Up 00:48:19 -- 87776 Kbytes

Sess SQL Current Iso Lock SQL ISAM F.E.
Id Stmt type Database Lvl Mode ERR ERR Vers Explain
21 SELECT stores_demo NL Not Wait 0 0 9.03 Off

Current statement name : slctcur

Current SQL statement :
select * from customer

As long as session 21 continues to reference the table structure you won't be able to issue any ALTER TABLE on customer table, even after being able to lock it:

[informix@PacMan ~]$ dbaccess stores_demo -

Database selected.

> begin work;

Started transaction.

> lock table customer in exclusive mode;

Table locked.

> alter table customer add (dummy_col char);

242: Could not open database table (informix.customer).

106: ISAM error: non-exclusive access.
Error in line 1
Near character position 40
>


So, this are the basics to find which sessions are referencing a specific table. But I really don't want to see anybody messing around with a lot of onstat commands, greps etc. I've already wrote a script which can do all this. It's called ixtableuse and it will be available soon.

So, by now, we have ways to handle dirty readers and to find out the sessions referencing the tables. We only need a way to prevent new sessions from getting references while we clear the ones we identified. How can we do this? Well... using a dirty trick... All the sessions trying to access a table somehow will have to read it's record from the systables table. Yes... that's it... If somehow we lock that record also, the sessions in LOCK MODE WAIT will get stuck on this systables record, before they open the table in question. How can we lock a record on systables? For example by GRANTing that table some dummy privilege.

So, let's see how to do some DDL on a very "hot" table in a "wild" environment.
In session 1 (the session where you want to ALTER TABLE):

> IFX_DIRTT_WAIT=300
> export IFX_DIRTY_WAIT
> #the above will take care of dirty readers...
> [informix@PacMan ~]$ dbaccess stores_demo -

Database selected.

> set lock mode to wait;

Lockmode set.

> begin work;

Started transaction.

> grant select on customer to dummy_user;

Permission granted.

> lock table customer in exclusive mode;

Table locked.
After this, if you try to access the table from another session in default isolation and lock mode you'll get:

ideiafix@PacMan.domus.online.pt:informix-> dbaccess stores_demo -

Database selected.

> select * from customer;

211: Cannot read system catalog (systables).

107: ISAM error: record is locked.
Error in line 1
Near character position 22


Note that the error refers to systables, not the table customer.


Back to session 1:

> alter table customer add (dummy_col char);

242: Could not open database table (informix.customer).

106: ISAM error: non-exclusive access.
Error in line 1
Near character position 40

Why?! Well, the initial session that ended the select * from customer statement is still referencing the table. You can see how many "users" are referencing it using another onstat:

> ideiafix@PacMan.domus.online.pt:informix-> onstat -t

IBM Informix Dynamic Server Version 10.00.UC5 -- On-Line -- Up 01:22:31 -- 87776 Kbytes

Tblspaces
n address flgs ucnt tblnum physaddr npages nused npdata nrows nextns
4 477f8d38 0 1 100001 1:14 250 250 0 0 1
30 47de1018 0 2 100071 1:126 16 11 5 65 2
80 47e6f018 0 2 1000a3 1:176 8 3 2 28 1
3 active, 108 total


This can be cleared by an onmode -z or a kill to the client process:

onmode -z 21

And again:

ideiafix@PacMan.domus.online.pt:informix-> onstat -t

IBM Informix Dynamic Server Version 10.00.UC5 -- On-Line -- Up 01:24:52 -- 87776 Kbytes

Tblspaces
n address flgs ucnt tblnum physaddr npages nused npdata nrows nextns
4 477f8d38 0 1 100001 1:14 250 250 0 0 1
30 47de1018 0 1 100071 1:126 16 11 5 65 2
80 47e6f018 0 1 1000a3 1:176 8 3 2 28 1

Only one ucnt for the table... We can now ALTER the table... but what happens when the user who got the error on systables retries with anothe LOCK MODE? Check it:

ideiafix@PacMan.domus.online.pt:informix-> dbaccess stores_demo -

Database selected.

> set lock mode to wait;

Lockmode set.

> select * from customer;

He gets stuck... where?:

ideiafix@PacMan.domus.online.pt:informix-> onstat -u

IBM Informix Dynamic Server Version 10.00.UC5 -- On-Line -- Up 01:30:03 -- 87776 Kbytes

Userthreads
address flags sessid user tty wait tout locks nreads nwrites
477be018 ---P--D 1 informix - 0 0 0 30 42
477be544 ---P--F 0 informix - 0 0 0 0 682
477bea70 ---P--- 7 informix - 0 0 0 0 0
477bef9c ---P--B 8 informix - 0 0 0 0 0
477bf9f4 Y--P--D 14 informix - 4407d574 0 0 0 0
477bff20 ---P--D 11 informix - 0 0 0 0 0
477c044c Y-BP--- 43 informix 4 47f706d0 0 10 0 0
477c0978 L--PR-- 45 informix 5 440d03c4 -1 1 0 0
8 active, 128 total, 18 maximum concurrent

ideiafix@PacMan.domus.online.pt:informix-> onstat -k

IBM Informix Dynamic Server Version 10.00.UC5 -- On-Line -- Up 01:30:06 -- 87776 Kbytes

Locks
address wtlist owner lklist type tblsnum rowid key#/bsiz
440d0314 0 477c0978 0 HDR+S 100002 204 0
440d036c 0 477c044c 440d057c HDR+IX 100074 0 0
440d03c4 477c0978 477c044c 440d04cc HDR+X 100071 80b 0
440d041c 0 477c044c 440d036c HDR+X 100074 911 0
440d0474 0 477c044c 440d041c HDR+X 100074 911 K- 1
440d04cc 0 477c044c 440d0524 HDR+IX 100071 0 0
440d0524 0 477c044c 440d0474 HDR+X 100074 911 K- 2
440d057c 0 477c044c 0 S 100002 204 0
440d05d4 0 477c044c 440d03c4 HDR+X 1000a3 0 0
440d062c 0 477c044c 440d05d4 HDR+U 100071 80c 0
440d0684 0 477c044c 440d062c HDR+U 100071 903 0
11 active, 10000 total, 16384 hash buckets, 0 lock table overflows
ideiafix@PacMan.domus.online.pt:informix-> onstat -u | grep 477c0978
477c0978 L--PR-- 45 informix 5 440d03c4 -1 1 0 0
ideiafix@PacMan.domus.online.pt:informix-> onstat -u | grep 477c044c
477c044c Y-BP--- 43 informix 4 47f706d0 0 10 0 0
ideiafix@PacMan.domus.online.pt:informix-> onstat -g sql 45

IBM Informix Dynamic Server Version 10.00.UC5 -- On-Line -- Up 01:30:44 -- 87776 Kbytes

Sess SQL Current Iso Lock SQL ISAM F.E.
Id Stmt type Database Lvl Mode ERR ERR Vers Explain
45 SELECT stores_demo CR Wait 0 0 9.03 Off

Current SQL statement :
select * from customer

Last parsed SQL statement :
select * from customer

ideiafix@PacMan.domus.online.pt:informix-> onstat -g sql 43

IBM Informix Dynamic Server Version 10.00.UC5 -- On-Line -- Up 01:30:47 -- 87776 Kbytes

Sess SQL Current Iso Lock SQL ISAM F.E.
Id Stmt type Database Lvl Mode ERR ERR Vers Explain
43 - stores_demo CR Wait 0 0 9.03 Off

Last parsed SQL statement :
alter table customer add (dummy_col char)

> ideiafix@PacMan.domus.online.pt:informix-> dbaccess sysmaster -

Database selected.

> select hex(partnum), tabname, dbsname from systabnames where partnum = '0x100071';



(expression) 0x00100071
tabname systables
dbsname stores_demo

1 row(s) retrieved.

He is waiting on the lock you got on systables, and this prevents his session from grabbing a reference to the table structure:

> ideiafix@PacMan.domus.online.pt:informix-> onstat -t

IBM Informix Dynamic Server Version 10.00.UC5 -- On-Line -- Up 01:34:12 -- 87776 Kbytes

Tblspaces
n address flgs ucnt tblnum physaddr npages nused npdata nrows nextns
4 477f8d38 0 1 100001 1:14 250 250 0 0 1
30 47de1018 0 2 100071 1:126 16 11 5 65 2
80 47e6f018 0 1 1000a3 1:176 8 3 2 28 1


so you can complete your work (back to session 1):

> alter table customer add (dummy_col char);

Table altered.

> revoke select on customer from dummy_user;

Permission revoked.

> commit work;

Data committed.


And after this, the waiting session will continue:

[...]

customer_num 128
fname Frank
lname Lessor
company Phoenix University
address1 Athletic Department
address2 1817 N. Thomas Road
city Phoenix
state AZ
zipcode 85008
phone 602-533-1817
dummy_col

28 row(s) retrieved.


Please, remember to include the REVOKE...


So, this ends the article. To summarize it:
When you need to alter some table structure you'll need more than exclusive lock on the table because you'll need that no more sessions are referencing the table structure. These other sessions can be of two types:

  • Dirty readers - for these set IFX_DIRTY_WAIT=300 in your environment (this will give you at most 300 seconds while you can kill the sessions or simply wait for them to finish)
  • Other sessions with holds on the table structure. These can be sessions in LOCK MODE WAIT [x], that are waiting for your lock (but that already have a reference to the table structure) or sessions with open cursors, prepared statements etc. referencing the table

    For these use:

    SET LOCK MODE TO WAIT;
    BEGIN WORK;
    LOCK TABLE IN EXCLUSIVE MODE;
    -- dummy grant to prevent new sessions from getting holds on the table structure
    GRANT SELECT ON FOR dummy_user;
    -- before continuing use onstat -g opn to identify the sessions with references
    -- and kill them...
    ALTER TABLE ;
    -- clean up the dummy GRANT
    REVOKE SELECT ON FROM dummy_user;
    COMMIT WORK;

No comments:

Post a Comment