Sunday, December 14, 2008

Data-Race-Ful Lazy Initialization for Performance

I was asked a question about benign data races in Java this week, so I thought I would take the opportunity to discuss one of the (only) approved patterns for benign races. So, at the risk of encouraging bad behavior (don't use data races in your code!), I will discuss the canonical example of "benign races for performance improvement". Also, I'll put in another plug for Josh Bloch's new revision of Effective Java, which I continue to recommend.

As a reminder: a data race occurs when you have one (or more) writes to a memory location, and potentially some reads of it; they can happen at the same time; and there is nothing in the program to prevent it. This is different from a race condition, which is simply when you don't know the order in which two actions will occur. I've put more discussion of what a data race actually is at the bottom of this post.

A lot of people think that it is okay to have data races all over your code. By and large, it is a lot more dangerous than those people think it is. For example, the version of double-checked locking that doesn't use volatile has a data race in it. As I've discussed before, that version is broken.

So, having said that, when can you have a harmless data race in your code? Here's some code from Sun's implementation of java.lang.String, which is available under the GPL:

public final class String {
    /** The value is used for character storage. */
    private final char value[];

    /** The offset is the first index of the storage that is used. */
    private final int offset;

    /** The count is the number of characters in the String. */
    private final int count;

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    public int hashCode() {
        int h = hash;
        if (h == 0) {
            int off = offset;
            char val[] = value;
            int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }

    // ...
}
The principle here is pretty simple. The program calls hashCode(). If it hasn't calculated the hash code yet, then it does, and then stores the value for future use. The people who wrote String didn't bother to calculate the hash code in the constructor, because that would make construction (which is done all the time) more expensive. Because Strings are immutable, we don't need to worry about the hash code potentially changing in the future.

Do you see the potential data race? Here's some code that triggers it:

class BenignDataRace {
    static final String s = "blah";

    public static void main(String[] args) {
        class CalcHash implements Runnable {
            public void run() {
                int h = s.hashCode();
            }
        }
        Thread t = new Thread(new CalcHash());
        Thread t2 = new Thread(new CalcHash());

        t.start();
        t2.start();
    }
}
Two threads access s.hash without doing synchronization, and at least one of those threads is going to write to it. Item 71 in Effective Java calls this idiom racy single-check. In this case, though, it actually works. Why is this safe, when the double-checked locking example, which uses synchronization, isn't?

There are basically two parts to the answer, and they are both pretty simple:

  1. In this case, we don't care how many times we actually calculate the answer and assign it to the hash field. It just isn't a big deal if we happen to calculate it multiple times. Furthermore,

  2. Java has a special guarantee for fields that are 32 bits wide or narrower, and for object references — a thread will always see some value that was actually assigned to that field. As a result, a thread reading hash will see either 0 or the calculated value. Edited to add: I just noticed that this was misleading. An object reference will point to the right object, but the contents of that object aren't guaranteed to be correct unless the object is immutable. Please see my series on immutability.

That second item might sound obvious, but bear in mind that 64-bit values don't get that guarantee; this code would not work if hash were a long or a double.
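To make the idiom concrete, here is a sketch of racy single-check applied to a hypothetical immutable Point class (the class and its fields are my invention, not JDK code). The cached value is an int, so the guarantee in item 2 applies; a long cached this way could be seen half-written.

```java
// Hypothetical immutable class using the racy single-check idiom.
final class Point {
    private final int x;
    private final int y;

    // Cached hash; 0 means "not yet computed". Written without
    // synchronization -- benign only because int writes are atomic
    // and recomputing the value is harmless.
    private int hash;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int hashCode() {
        int h = hash;       // read the field exactly once
        if (h == 0) {
            h = 31 * x + y; // several threads may compute this
            hash = h;
        }
        return h;           // return the local; never re-read the field
    }
}
```

Note that, like String, this sketch recomputes forever in the unlucky case where the real hash value is 0.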

Now, let's break the code:

public int hashCode() {
    if (hash == 0) {
        int off = offset;
        char val[] = value;
        int len = count;

        int h = 0;
        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return hash;
}
All I did was make it so that the temporary variable was only used for calculating the hash code! How could that break the code?

What I've done here is to add an additional read: the second read of hash, before the return. As odd as it sounds, and as unlikely as it is to happen, the first read can return the correctly computed hash value, and the second read can return 0! This is allowed under the memory model because the model allows extensive reordering of operations. The second read can actually be moved, in your code, so that your processor does it before the first!
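Concretely, one legal transformation of the broken version hoists the final read of hash above the zero check. The sketch below is my reconstruction, with the String fields inlined into a small stand-in class; if another thread writes hash between the two reads, this method can return a stale 0.

```java
// Stand-in for String, showing the broken hashCode() after a legal
// compiler transformation: the return-value read of hash has been
// hoisted above the zero check. If another thread publishes the hash
// between the two reads of 'hash', this method returns a stale 0.
final class BrokenCache {
    private final char[] value;
    private int hash;

    BrokenCache(String s) {
        this.value = s.toCharArray();
    }

    public int hashCode() {
        int h = hash;            // hoisted read: may see 0...
        if (hash == 0) {         // ...while this read sees a non-zero value
            int computed = 0;
            for (char c : value) {
                computed = 31 * computed + c;
            }
            hash = computed;
            h = computed;
        }
        return h;                // can return the stale 0
    }
}
```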

(Side note: the first version is better anyway, because it performs fewer memory reads. If you really care that much about performance, that's the way to go.)
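For comparison, here is a sketch of the fully race-free alternative — the single-check idiom from Effective Java — where making the field volatile creates happens-before edges and removes the data race entirely, at the cost of a volatile read per call (the class name and structure are mine):

```java
// Sketch: single-check idiom. The volatile field removes the data race;
// a thread that sees the cached value gets a happens-before edge from
// the write that published it.
final class VolatileCache {
    private final char[] value;
    private volatile int hash;   // volatile: no data race on this field

    VolatileCache(String s) {
        this.value = s.toCharArray();
    }

    public int hashCode() {
        int h = hash;            // one volatile read in the common case
        if (h == 0) {
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h;            // volatile write publishes the result
        }
        return h;
    }
}
```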

What's the moral here? The moral is that this stuff is very confusing, and you have to work very hard to get it right. Make sure you understand exactly what your code does. Avoid data races wherever possible.




Here are the details on data races that I promised above:

A program has a data race in it when it is possible to have an execution of that program where:

  1. All of the instructions in each thread occur in the order they appear in the program (this is important for technical reasons),

  2. There are two accesses to the same field / array element,

  3. At least one of those accesses is a write, and

  4. There is no happens-before ordering between those actions.


I defined "happens-before ordering" in my last post. Basically, the absence of a happens-before ordering means that there is nothing establishing an order between the two actions: for example, there is no locking protecting the accesses, no intervening volatile writes and reads, and no thread starts and joins.
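To illustrate that last point, here is a minimal sketch (my own, not from the post) of using a volatile flag to establish a happens-before ordering between a plain write in one thread and a read in another: the write to data happens-before the volatile write to ready, which happens-before any volatile read that sees ready == true.

```java
// Sketch: safe publication through a volatile flag. The volatile
// write/read of 'ready' creates the happens-before edge that makes
// the plain write to 'data' visible to the reading thread.
final class Publisher {
    private int data;                  // plain field, race-free once published
    private volatile boolean ready;    // the happens-before hook

    void publish(int value) {
        data = value;   // ordinary write...
        ready = true;   // ...ordered before this volatile write
    }

    // Returns the published value, or -1 if nothing is published yet.
    int tryRead() {
        if (ready) {        // volatile read: sees publish()'s writes
            return data;
        }
        return -1;
    }
}
```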

31 comments:

Neil said...

Interestingly, there's another example of a race condition in the JDK which isn't completely deterministic as in the String hash code case: ConcurrentSkipListMap uses an internal random number generator that isn't synchronized. The latter is what I describe in my article on safe omission of synchronization, but thanks for the reminder about the String.hashCode() example: I should add that too!

Domingos Neto said...

Excellent!

I wasn't aware of the limitation for 64 bit values. I guess I should take the time at some point to read the language specification from top to bottom.

Good to know!

franci said...

Great, as always!!!

Note: in the last example 'h' is uninitialized. Yes!!! call me '-pedantic' :-)

serega said...

The JVM specification does not guarantee that 64-bit operations are atomic, but it also vaguely states that "VM implementors are encouraged to avoid splitting their 64-bit values where possible". How is it actually implemented in the 64-bit HotSpot?

stinkyminky said...

Jeremy -

Thanks for the post.

What is not clear to me is whether, for an immutable object, it is better to store a lazily calculated hash code in a 'volatile' or a non-volatile int.

Effective Java gives an example with 'volatile'. But 'String' code does not.

I am leaning toward 'String.hashcode()' approach but is there any good reason to use 'volatile'?

BTW - you mentioned the double read leading to a performance penalty. I believe the penalty would be larger if the variable were volatile. EJ mentions the 'local' variable version is about ~25% faster.

Thanks.

Buu said...

>> The second read can actually be moved, in your code, so that your processor does it before the first!
Do you mean the code may be rewritten by the processor as:

int h = hash;
if (hash == 0) {
...
}
return h;

Thanks for the post!

Paolo Giarrusso said...

@Buu: I think he does mean that.
@Jeremy: given that for programs without data races (I know this program has a data race) the JMM guarantees sequential consistency (isn't it?), I was wondering a bit about that reordering.

If I understand it correctly, that reordering is allowed because without a data race the programmer cannot notice the difference, isn't it?

Also, this can be proven because there is no possible destination of a happens-before edge, so for a data-race free program "hash" has to stay unchanged, right?

Jeremy Manson said...

@franci -- thanks!

@serega -- I doubt that it is a problem for 64-bit HotSpot, but I'm not 100% sure.

@stinky -- Effective Java 2 mentions the String example and calls it "racy single-check", so I'm not sure what you mean. In general, volatile reads don't have too much of a performance penalty. My suggestion is:

1) Make sure you understand what you are doing,
2) Profile before trying to optimize.

If you don't understand what you are doing, don't do it.

@Buu -- Yes, with two caveats:

1) That's not equivalent code; the equivalent code has another write to h before the end of the if statement.

2) The processor wouldn't do this kind of transformation, though, the compiler would. From a Java programmer's POV, this is 6 of one, half a dozen of the other.

@Paolo -- Sort of. It is true that if you have no data races in the code, you can't tell that a reordering occurred. That's the sequential consistency for data-race-free programs guarantee.

In these cases, I tend to reason backward. I thought about this example and realized that the Java memory model allowed the result in question. Then I thought some more and came up with a reordering that would actually trigger the weird result.

What's funny about that process is that it is often much easier to come up with weird results than it is to come up with plausible transformations! In this case, the transformation isn't terribly plausible. I could rejigger it so it was more plausible, but it would make the example more confusing, so I avoided doing that.

stinkyminky said...

Jeremy -

Thanks for pointing out racy-single check idiom. I totally missed it. Shame on me.

As for the performance penalty of the double read vs. using a local variable, EJ mentions it on page 284.

On double-checked idiom...

"What this variable does is to ensure that field is read only once in the common case where it's already initialized. While not strictly necessary, this may improve performance and is more elegant by the standards applied to low-level concurrent programming. On my machine, the method above is about 25 percent faster than the obvious version without a local variable."

I misunderstood the double-read performance penalty as being mainly due to volatile. I should not have assumed accessing a volatile field is slow.

BTW - for everyone, it is Item 71: Use lazy initialization judiciously.

stinkyminky said...

@Domingos Neto

For your info - Java Language Specification - Chapter 17.

http://java.sun.com/docs/books/jls/third_edition/html/memory.html

17.7 Non-atomic Treatment of double and long

Raffaello said...

Dear readers, may I direct you to a contribution I wrote some years ago about the subject? It illustrates the same technique on a different example. String.hashCode() is mentioned, too.

http://www-128.ibm.com/developerworks/java/library/j-immutability.html

The article also thanks Jeremy for some clarifications.

Jonathan said...

I don't follow the argument here. By definite assignment rules hash has an initial value of 0 when the object is constructed. After construction hash can only transition to a non-zero value; once the non-zero value is globally visible hash can never again be zero.

If on entry to hashCode() hash evaluates to 0 based on the current thread's view of the global memory state (which may very well lag behind an update performed by a thread running on a different processor) the body of the if block will be entered, and hash will be assigned the proper hash code. A second read from hash at the return statement must respect the write performed w/in the same thread -- this would have been true even in the original Java memory model. The memory model does not allow for temporal reordering w/in a linear control flow (nor, mind you, do any compiler or processor reordering).

Jeremy Manson said...

@Jonathan - When it breaks, the thread that breaks doesn't perform a write. It performs two reads. The first read returns a non-zero value and the second read returns a zero value. This is out of order with respect to the program order in which they were written, but is totally legal WRT the Java memory model.

I hope that clarifies it.

Jonathan said...

@Jeremy - Per JLS 3rd Ed. Sec. 17.4.5 there is a happens-before relationship between the two reads. Since the reads are to the same memory location it is not permissible to reorder them; doing so breaks "as-if-serial" semantics.

The memory model says nothing about when a write to hash by thread T1 becomes visible to thread T2. However, once the write is visible in T2 all subsequent reads (by program order) will see that new value.

Jeremy Manson said...

@Jonathan - A happens-before relationship between two actions doesn't prevent them from being reordered. If it did, then no two program actions could ever be reordered. The closest happens-before comes to what you are saying is that if one action is a write, and a subsequent (via happens-before) action is a read, then the read has to see the value written by the write (unless there is a data race).

Chapter and verse of that particular effect is the bit about when a read is "allowed to observe" a write, also in 17.4.5.

If you don't believe me that the guarantee you are proposing isn't in there, rather than going through line by line, I'm afraid I'll have to do a proof by authority, since I wrote chapter 17. If you want more detail, you can read my doctoral dissertation or the resulting PoPL paper, both of which have the relevant proofs.

Jonathan said...

@Jeremy -- needless to say I should have looked more carefully at who writes this blog before (incorrectly) quoting from the specs.

Unfortunately the links to your dissertation and PoPL paper from the JMM pages on umd.edu no longer appear to be live. The listing of concurrency testcases (under the unified proposal section) were very instructive, though -- in particular the justification for test case 16 made it clear to me how the model permits intra-thread reordering of reads to the same field.

It does worry me a bit that within a few minutes I was able to find two examples in the JDK 6 sources of hashCode methods that could return an incorrect result given this form of reordering (see java.util.UUID and javax.management.MBeanInfo).

If there ever were a compiler / processor combination that often reordered reads in this manner (and outside of IA64 that seems pretty improbable, as it's more likely than not that the compiler would store a single read of the field in a register or local) fixing up all of the problematic code could be a very arduous -- and possibly endless -- task.

jcsahnwaldt said...

Interesting. Like Jonathan, I was convinced that reads must not be reordered in this way. Now I know better, but I still find it a bit counterintuitive.

Maybe it's just a problematic choice of words. Here's a quote from JLS 3rd ed 17.4.5:

"If one action happens-before another, then the first is visible to and ordered before the second."

Let's put the hash example above into the language of the spec. Consider a read r1 of variable v that happens-before a read r2 of v, and a write w to v that has no happens-before relation to them. Then the following is possible: w is "seen" by r1 but not by r2, although r1 is "visible" to r2! Maybe the spec should have made it clearer that "visible to" only really applies to the relationship between read and write actions.

Trying to formalize my intuitive understanding of "visible to" with regard to different reads of the same variable, I came up with the following:

If a read r1 of a variable v happens-before a read r2 of v, and a write w1 to v happens-before a write w2 to v, then it must not be the case that w1 is seen by r2 and w2 is seen by r1.

But I guess such a rule would make some useful optimizations impossible...

Christopher

Jeremy Manson said...

@jcsahnwaldt - Basically, you have to follow the formal rules, not the informal ones.

Having said that:

a) When we talk about "visibility", we are talking about which writes are visible to which reads. It doesn't really make much sense to talk about which reads are visible to which reads.

b) More importantly: when we say two actions are "ordered", we mean that they are ordered with respect to each other, not with respect to other actions that may happen in a data race. Here's an example. Imagine that I have (initially, p == o, p.x == 0):

Thread 1:
r1 = o.x;
r2 = p.x;
r3 = o.x;

Thread 2:
o.x = 1;

Consider the case where the compiler does redundant read elimination, so that the two reads of o.x are combined. You can then have the two reads of o.x returning 0, and the read of p.x returning 1.

The read actions are still ordered with respect to each other, in the sense that the local effects of each one (the writes to r1, r2 and r3) don't interfere with the local effects of the next, and so on. They are *not* ordered with respect to the write to o.x, which can appear to be ordered before some of these reads and not before others, pretty much at random.

It is hard to make this clear in a comment, so this is probably worth a whole separate blog entry.

Jeremy Manson said...

... and you are correct that that rule would render optimizations impossible.

Jeremy Manson said...

Oh, and @Jonathan - fortunately, taking actual advantage of this is very unlikely.

jcsahnwaldt said...

Jeremy, thanks for the clarification. Your example involves aliasing, the hash example doesn't, which doesn't make a difference for the spec, of course, but I guess to my intuition it did. Anyway, thanks to your article, I've upgraded my intuition. :-)

Jeremy Manson said...

@jcsahnwaldt -- Happy to help. The example I gave was actually the very first motivating example for the new memory model, because the old memory model wouldn't allow that optimization. I agree, the aliasing sells it. :)

anony said...

I don't know why it is possible;
plz specify the re-ordered code

Jeremy Manson said...

@anony - I'm afraid I don't know what you want me to specify. Could you clarify your question?

anony said...

hmm...How to break thread-safety?
I can't understand(...) the difference between the original hashcode() and
your modification of hashcode().

ram said...

Hi Jeremy,

I still don't get the explanation for the hash example.


1: r1 = read hash
2: if r1 == 0 {
3: ....
4: write hash := r2
}
5: r3 = read hash
6: return r3

Now, you are saying that there is a particular reordering where r3 could be 0. Can you show me an example reordering of code which would be correct in a sequential setting, but somehow gets messed up in a concurrent setting?

I understand that the compiler (or processor) is free to reorder because there are no volatile variables or locks used, but I cannot come up a single correct reordering scenario in a uniprocessor setting.

Can you help spelling it out? Thanks.

Jeremy Manson said...

@ram - The transformation is as follows:

r1 = hash;
if (hash == 0) {
  r1 = hash = // calculate hash
}
return r1;

ram said...

Ah, of course!

This is a valid transformation for a uniprocessor, but is a problem in a multi-processor setting. (hash could be changed by another thread between "r1 = hash" and "if hash == 0")

Thanks, jeremy.

Vladimir said...

Hi Jeremy,
I'm sorry for bothering you at such an old post, but could you clarify something?
You wrote that valid transformation could be like this one:

r1 = hash;
if (hash == 0) {
r1 = hash = // calculate hash
}
return r1;

The quote from JMM:
"Each time the evaluation of thread t generates an inter-thread action, it must match the
inter-thread action a of t that comes next in program order."

For our case we have the following order of program inter-thread actions:
1)read hash;
2) write hash(inside if);
3) read hash (on return);

For the transformation you described we have the following order of inter-thread actions:

1)read hash;
2)read hash;
3)write hash(inside if);

So the order of inter-thread actions seems to be incorrect (in terms of JMM) for me.
Could you clarify this?

Jeremy Manson said...

@Vladimir (sorry about the latency, I was on vacation)

That sentence refers to the semantics of the evaluation of the thread, not the actual evaluation of the thread by the JVM.

More specifically, looking at a single thread in isolation, the actions have to appear to the user as if they have been evaluated in that order. What the reads can return is dictated by the memory model. So, let's say we had two threads:

Initially, x = 0

Thread 1:
r1 = x;
r2 = x;
r3 = x;

Thread 2:
x = 1;

If we look at thread 1 in isolation, we will see three reads of x in a row, so that's what we have to see. The memory model dictates that each of these three reads is allowed to return 1 or 0, regardless of what the previous reads saw. We are therefore allowed to see r1 = r3 = 0, r2 = 1: three reads in a row, each of which is allowed to see 1 or 0.

In practice, what this means is that the system reordered r3 and r2:

Thread 1:
r1 = x;
r3 = x;
r2 = x;

And that the write to x of 1 occurred in between r3 and r2.

I know it is confusing (it's hard to explain in a comment followup, certainly), but that's the general idea. Does that clarify it at all?