Order of Operations in Oracle’s Coalesce

Validating the order of operations and speed implications in Oracle SQL

Jefferey Cave
Jun 13, 2023

There is often a question as to the order of operations in the case of a coalesce. This is an experiment to demonstrate that order of operations and some of its implications.

It was determined through experimentation that coalesce is an efficient means of avoiding expensive processing when a pre-existing value exists.

I’ve been bothered by some code I saw a year ago.

Our system never processes data fast enough, and anytime we can speed up a large query, it is something we are encouraged to investigate.

Last week, one of my processes ran over its time limit. My critical select statement took 40 minutes of its allotted 15, and when I looked, it was that same piece of code that had been bothering me for the last year.

coalesce(anon_id, standard_hash(analyst_id||first_name||last_name,'sha512') ) as anon_id

The intent here is to get a list of people that need to be processed, reusing the same anonymization value we used before, or creating a new value if necessary. This becomes our new dataset, thereby storing the newly generated anonymization token.

This is used frequently throughout our system: any time we receive information about a person, we want to anonymize that person for research purposes. We still want to be able to tell that a record belongs to a unique individual, but we do not want that individual’s identity to be known. So we run that pattern on almost every record of data we receive in our system, for every analyst doing research. That’s a lot of processing.

There is more to that anonymization token, but this trivialized example gives you an idea of the scale of processing involved.
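For context, the expression sits inside a query shaped roughly like the sketch below; the person_intake table and its columns are hypothetical stand-ins, not our real schema:

-- A minimal sketch of the pattern, assuming a hypothetical person_intake table
-- that carries any previously assigned anon_id
select
  analyst_id,
  first_name,
  last_name,
  coalesce(
    anon_id,                                                        -- reuse the existing token when present
    standard_hash(analyst_id || first_name || last_name, 'sha512')  -- otherwise mint a new one
  ) as anon_id
from person_intake
;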

Hashes require underlying random numbers, and I remember, from many years ago, a couple of points regarding generating random numbers:

As a general rule:

1. Generating random numbers is a processor-intensive operation
2. You should not waste randomness (it makes the result less random)

Given this, when we go to generate random ids for people, this function raises some questions:

coalesce(anon_id, standard_hash(analyst_id||first_name||last_name,'sha512') ) as anon_id

In this case, each user is looked up, and if they have an anon_id, we use it; otherwise, we generate a random one.

Considering the order of operations, there are two possible paths that could be followed (using procedural notation):

1. Is coalesce evaluated as a function, with the argument expressions evaluated first and their values passed in?
anon_id = coalesce(anon_id, standard_hash(...) );
= coalesce(null, 0x123456789abcdef );
= 0x123456789abcdef;

2. Or is it a function-pointer mechanism, in which a pointer to the function is passed, and the second argument is only evaluated on the discovery of a null?

function coalesce(a, b){
  if(a !== null){
    return a;
  }
  else{
    // note the evaluation of the function here
    return b();
  }
}

anon_id = coalesce(anon_id, standard_hash(...));

If Oracle uses the second method, it is a very efficient mechanism. If Oracle uses the first, we are making very inefficient use of resources, generating random-like numbers that are inevitably discarded.

anon_id = coalesce(anon_id, standard_hash(...) );
= coalesce( 0xfedcba0987654321, 0x1234567890abcdef );
= 0xfedcba0987654321;
// 0x1234567890abcdef was both expensive to generate, and never used

As a developer, I expect the first mechanism to be used. That is the order of operations most procedural languages follow when calling functions. However, based on some implications in the documentation, it is possible that coalesce is a keyword that is processed differently.

Demonstration

Given my recent slowdown, and the pervasiveness of this pattern, improving the efficiency of this calculation would be significant.

If there is an efficiency to be gained, it is possible to separate the underlying query into two (one for the null people and one for the non-null people) and calculate them separately. However, this is a significant and widespread change, and it makes things less readable… so it had better be worth it!
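If it did turn out to be worth it, the split might look roughly like the sketch below (again using the hypothetical person_intake table); only the rows missing a token would pay for the hash:

-- A rough sketch of the two-query alternative, not our real schema
select analyst_id, first_name, last_name, anon_id
from person_intake
where anon_id is not null
union all
select analyst_id, first_name, last_name,
       standard_hash(analyst_id || first_name || last_name, 'sha512') as anon_id
from person_intake
where anon_id is null
;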

The easiest way to settle the argument is to test it.

Creating a Control

The first step in any test like this is to cause the problem, but to cause it in such a way that we have control over the way it occurs.

Let’s replace the standard_hash function with a predictably slow function: one that simply sleeps for 10 seconds.

create or replace function slow_test(sleep_len int default 10)
return int
is
  v_sleep_len int := coalesce(sleep_len, 10);
begin
  dbms_session.sleep(v_sleep_len);
  return v_sleep_len;
end;
/

I think it’s worth a hat-tip to OMG Ponies on StackOverflow for offering a hack solution for environments where we may not have sufficient permissions to run elements of the dbms_session package.

create or replace function slow_test(sleep_len int default 10)
return int
is
  v_sleep_len int := coalesce(sleep_len, 10);
  v_sleep_until date;
begin
  select sysdate + v_sleep_len/86400
  into v_sleep_until
  from dual
  ;
  -- busy-wait until the target time has passed
  loop
    exit when v_sleep_until <= sysdate;
  end loop;
  return v_sleep_len;
end;
/

We now have a function that takes a noticeable 10 seconds, rather than one that takes an unknown number of millionths of a second (standard_hash). When the function is executed, it will take 10 seconds, giving us a clear way to tell whether it was called at all.
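Before scaling up, a quick single call (with an arbitrary, shorter sleep length) confirms the control behaves as expected:

select slow_test(2) as slept from dual;
-- returns after roughly 2 seconds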

Executing the Test

The simplest verification that we can run is

select level, slow_test()
from dual
connect by level <=10
;
-- 100 seconds

10 rows, at 10 seconds each, cost us 100 seconds … exactly as we would expect.

This brings us back around to our original question

Does coalesce run the function every time, or only when it encounters a null?

To check this, we need to run it a few times, in different scenarios. We alternate the scenarios to rule out effects from processing order and caching. Normally I would write a loop for this, but after seeing the initial results, I just couldn’t be bothered…
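Had I bothered, the loop might have looked something like the sketch below (timing with dbms_utility.get_time, which reports hundredths of a second); in practice, I just ran the four statements by hand:

-- A sketch only: alternates the two scenarios and prints the elapsed time of each run.
-- Requires serveroutput to be enabled to see the dbms_output lines.
declare
  v_start number;
  v_dummy int;
begin
  for i in 1 .. 4 loop
    v_start := dbms_utility.get_time;
    if mod(i, 2) = 1 then
      -- null first argument: the fallback has to be evaluated
      select max(coalesce(null, slow_test())) into v_dummy
      from dual connect by level <= 10;
    else
      -- non-null first argument: the fallback should be skippable
      select max(coalesce(1, slow_test())) into v_dummy
      from dual connect by level <= 10;
    end if;
    dbms_output.put_line('run ' || i || ': ' || (dbms_utility.get_time - v_start) / 100 || ' seconds');
  end loop;
end;
/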

select level, coalesce(null,slow_test()) from dual connect by level <=10;
-- 99.9 seconds
select level, coalesce(1,slow_test()) from dual connect by level <=10;
-- 0.0 seconds
select level, coalesce(null,slow_test()) from dual connect by level <=10 ;
-- 101.2 seconds
select level, coalesce(1,slow_test()) from dual connect by level <=10 ;
-- 0.0 seconds

I am genuinely surprised.

The coalesce function does not appear to operate as an ordinary function, but rather the way I would expect a keyword to operate. It short-circuits the implied order of operations and only evaluates the second parameter when necessary.

Conclusion

I’m surprised … but pleasantly (I guess).

coalesce is the more efficient mechanism to use in my queries.

This is good news from the perspective of using Oracle as a DB and processing engine. It follows the more efficient processing path.

This is bad news for me personally, since this means that I have not discovered an obvious way to speed up my processing. Now I’m stuck trying to figure out what I can do to speed things up, but on the other hand, the 15 minutes I spent running the test saved me hours of risky changes that would not have had any benefit.

Learn from my mistake, and always test your assumptions. I’ve been walking around with this assumption for months now and never took the time to validate it. Here I am with no way to make the improvements I thought were available to me.
