Wow, talk about Mondays! I’ve been developing an ASP.NET Web service that connects to an Oracle backend. Up to today, we’ve only tested on an isolated network and guess what–the configuration of this environment didn’t exactly mirror the QA or the production environment. When we were testing on this isolated network, the application never had Oracle connectivity problems. Today, we go to the staging or QA environment and the Oracle connections would just die after 20 minutes or so. “ORA-; exceeded maximum idle time “. If you reload the erroneous page, you get a different error: “ORA-01012; not logged on“. Keep reloading the page but the app is stuck with the “not logged on” exception. After 30 minutes or so, the application can connect to the Oracle server again! So what happened here?
Here’s what happened. Apparently, due to security policies set forth by the organization, all Oracle connections must time-out (Oracle session status=”SNIPE”) if the connection has exceeded the “Idle Time” specification in the Oracle Profile. Without getting deep into Oracle, the server will kill the connections after the Idle Time limit. I actually had our Oracle DBA helped me diagnose the problem. The Oracle DBA set traces and we monitored the Oracle sessions. That’s when I saw that even if initially, the application will keep several connections open (connection pooling is turned on for the OracleConnection object and you will see INACTIVE status in the Oracle sessions–inactive is okay–it just wasn’t executing commands at that time), after several minutes (15 minutes is the set Idle Time), the Oracle sessions would disappear. In a few rare occassions, we actually saw the Oracle sessions for the app time-out. The sessions appear with a “SNIPE” status for a little bit and then the server eventually cleans them up.
Is this behavior reproducable? Absolutely! Create an Oracle Profile (get your DBA if you don’t have rights to do this) and set the Idle Time to a short interval like 5 minutes. Next, associate the Oracle user-name that the application will use to the profile that has the 5-minute Idle Time limit. Run the app and you should be able to connect to Oracle. Wait like 10 minutes; your next call will generate the Oracle errors. Another way to reproduce this is the run the app for the first time and then watch the Oracle sessions. Kill the sessions created by the app. You will get “ORA-00028: your session has been killed” instead of the “idle time exceeded error”. But the following Oracle commands after that error will now generate the “not logged on” error. The ODP.NET pool manager gave the application a connection that should have been discarded because it’s already dead!
The ODP.NET provider will keep threads in the pool even though the server has already killed them! What’s the fix? The fix is a parameter that you need to add to your connection string. On my Oracle connection string, I only specify Data Source, User Name, and Password; no other parameters. I added an additional parameter called Validate Connection and you have to set it equal to True. I googled all day to find what this does. You can learn more out about this at the Oracle site. When you say Validate Connection=True on the connnection string, the ODP.NET connection pool will test the connection first before it returns the connection to the requesting app. The connection string look something like:
<add key="connectionString" value="Data Source=ORCL; User ID=me; Password=myPassword; Validate Connection=True;" />
By putting Validate Connection=True on the connection string, connectionObject.Open() will get you a connection that is either:
- Valid open connection from the connection pool
- A fresh new connection the connection pool
You’ll stop getting crappy, dead connections from the pool. What’s downside? The pool manager will now ping the Oracle server using the connection first (to test if it’s still valid) before it returns to the application. A small price to pay I say to make the application stable.
I’ve tested using the Validation Connection=True method and it works. I can kill sessions in Oracle and the app will just re-open valid connections. The Oracle instance can be shutdown and restarted–when it restarts, the pool will simply create new database connections. Another possible approach is to set the Connection Lifetime parameter to a time value that is below the Idle Time limit in the server. In theory, this method will make the ODP.NET connection pool will discard the connections before the server kills it. My only problem with this latter strategy is that this doesn’t take into account Oracle server “bounce”, firewall disconnects, physical disconnects and the like. Just go with the Validate Connection parameter–it is still application pooling; just a bit busier.
NOTE: the stuff I wrote above is applicable to ODP.NET only. The System.Data.OracleClient namespace also has OracleConnection, OracleCommand, and adapter and reader objects as well. But they’re not the same. The System.Data.OracleClient, by default, does not use application pool. I tested this several times, again, by monitoring the Oracle sessions. If you put Pooling=True in your connection string, then the application will use connection pooling. What happens when the sessions are killed or sniped and you are using System.Data.OracleClient.OracleConnection in you app? First, you get “ORA-00028: your session has been killed” just like in ODP.NET. The difference between System.Data.OracleClient and ODP.NET OracleConnection is that in the former, it’s smart enough to not keep giving the cut-off connection back to application. You’ll get ORA-00028 error once and the succeeding calls should be fine. ODP.NET on the other hand, will keep that bad connection in the pool even though it’s been bad for a while; only when the ODP.NET connection has exceeded it’s TTL that it is removed from the pool.