Sunday, December 14, 2008

Data-Race-Ful Lazy Initialization for Performance

I was asked a question about benign data races in Java this week, so I thought I would take the opportunity to discuss one of the (only) approved patterns for benign races. So, at the risk of encouraging bad behavior (don't use data races in your code!), I will discuss the canonical example of "benign races for performance improvement". Also, I'll put in another plug for Josh Bloch's new revision of Effective Java, which I continue to recommend.

As a reminder: a data race occurs when you have one (or more) writes to a memory location, and potentially some reads of it; they can happen at the same time; and there is nothing in the program to prevent it. This is different from a race condition, which is simply when you don't know the order in which two actions will occur. I've put more discussion of what a data race actually is at the bottom of this post.

A lot of people think that it is okay to have data races all over your code. By and large, it is a lot more dangerous than those people think it is. For example, the version of double-checked locking that doesn't use volatile has a data race in it. As I've discussed before, that version is broken.

So, having said that, when can you have a harmless data race in your code? Here's some code from Sun's implementation of java.lang.String, which is available under the GPL:

public final class String {
    /** The value is used for character storage. */
    private final char value[];

    /** The offset is the first index of the storage that is used. */
    private final int offset;

    /** The count is the number of characters in the String. */
    private final int count;

    /** Cache the hash code for the string */
    private int hash; // Default to 0

    public int hashCode() {
        int h = hash;
        if (h == 0) {
            int off = offset;
            char val[] = value;
            int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }

    // ...
}
The principle here is pretty simple. The program calls hashCode(). If it hasn't calculated the hash code yet, then it does, and then stores the value for future use. The people who wrote String didn't bother to calculate the hash code in the constructor, because that would make construction (which is done all the time) more expensive. Because Strings are immutable, we don't need to worry about the hash code potentially changing in the future.

Do you see the potential data race? Here's some code that triggers it:

class BenignDataRace {
    static final String s = "blah";

    public static void main(String[] args) {
        class CalcHash implements Runnable {
            public void run() {
                int h = s.hashCode();
            }
        }
        Thread t = new Thread(new CalcHash());
        Thread t2 = new Thread(new CalcHash());

        t.start();
        t2.start();
    }
}
Two threads access s.hash without doing synchronization, and at least one of those threads is going to write to it. Item 71 in Effective Java calls this idiom racy single-check. In this case, though, it actually works. Why is this safe, when the double-checked locking example, which uses synchronization, isn't?

There are basically two parts to the answer, and they are both pretty simple:

  1. In this case, we don't care how many times we actually calculate the answer and assign it to the hash field. It just isn't a big deal if we happen to calculate it multiple times. Furthermore,

  2. Java has a special guarantee for fields that are 32 bits wide or narrower, and for object references — a thread will always see some value that was actually assigned to that field. As a result, a thread reading hash will see either 0 or the calculated value. Edited to add: I just noticed that this was misleading. An object reference will point to the right object, but the contents of that object aren't guaranteed to be correct unless the object is immutable. Please see my series on immutability.

That second item might sound obvious, but bear in mind that 64-bit values don't get that guarantee; this code would not work if hash were a long or a double.
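To make the idiom concrete, here is a sketch of racy single-check applied to a hypothetical immutable Point class (the class and its fields are my invention, not JDK code). The cached value is an int, so the guarantee in item 2 applies; a long cached this way could be seen half-written.

```java
// Hypothetical immutable class using the racy single-check idiom.
final class Point {
    private final int x;
    private final int y;

    // Cached hash; 0 means "not yet computed". Written without
    // synchronization -- benign only because int writes are atomic
    // and recomputing the value is harmless.
    private int hash;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int hashCode() {
        int h = hash;       // read the field exactly once
        if (h == 0) {
            h = 31 * x + y; // several threads may compute this
            hash = h;
        }
        return h;           // return the local; never re-read the field
    }
}
```

Note that, like String, this sketch recomputes forever in the unlucky case where the real hash value is 0.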

Now, let's break the code:

public int hashCode() {
    if (hash == 0) {
        int off = offset;
        char val[] = value;
        int len = count;

        int h = 0;
        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return hash;
}
All I did was make it so that the temporary variable was only used for calculating the hash code! How could that break the code?

What I've done here is to add an additional read: the second read of hash, before the return. As odd as it sounds, and as unlikely as it is to happen, the first read can return the correctly computed hash value, and the second read can return 0! This is allowed under the memory model because the model allows extensive reordering of operations. The second read can actually be moved, in your code, so that your processor does it before the first!
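Concretely, one legal transformation of the broken version hoists the final read of hash above the zero check. The sketch below is my reconstruction, with the String fields inlined into a small stand-in class; if another thread writes hash between the two reads, this method can return a stale 0.

```java
// Stand-in for String, showing the broken hashCode() after a legal
// compiler transformation: the return-value read of hash has been
// hoisted above the zero check. If another thread publishes the hash
// between the two reads of 'hash', this method returns a stale 0.
final class BrokenCache {
    private final char[] value;
    private int hash;

    BrokenCache(String s) {
        this.value = s.toCharArray();
    }

    public int hashCode() {
        int h = hash;            // hoisted read: may see 0...
        if (hash == 0) {         // ...while this read sees a non-zero value
            int computed = 0;
            for (char c : value) {
                computed = 31 * computed + c;
            }
            hash = computed;
            h = computed;
        }
        return h;                // can return the stale 0
    }
}
```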

(Side note: the first version is better anyway, because it performs fewer memory reads. If you really care that much about performance, that's the way to go.)
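For comparison, here is a sketch of the fully race-free alternative — the single-check idiom from Effective Java — where making the field volatile creates happens-before edges and removes the data race entirely, at the cost of a volatile read per call (the class name and structure are mine):

```java
// Sketch: single-check idiom. The volatile field removes the data race;
// a thread that sees the cached value gets a happens-before edge from
// the write that published it.
final class VolatileCache {
    private final char[] value;
    private volatile int hash;   // volatile: no data race on this field

    VolatileCache(String s) {
        this.value = s.toCharArray();
    }

    public int hashCode() {
        int h = hash;            // one volatile read in the common case
        if (h == 0) {
            for (char c : value) {
                h = 31 * h + c;
            }
            hash = h;            // volatile write publishes the result
        }
        return h;
    }
}
```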

What's the moral here? The moral is that this stuff is very confusing, and you have to work very hard to get it right. Make sure you understand exactly what your code does. Avoid data races wherever possible.




Here are the details on data races that I promised above:

A program has a data race in it when it is possible to have an execution of that program where:

  1. All of the instructions in each thread occur in the order they appear in the program (this is important for technical reasons),

  2. There are two accesses to the same field / array element,

  3. At least one of those accesses is a write, and

  4. There is no happens-before ordering between those actions.


I defined "happens-before ordering" in my last post. Basically, the absence of a happens-before ordering means that there is nothing establishing an order between the two actions: for example, there is no locking protecting the accesses, no intervening volatile writes and reads, and no thread starts and joins.
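To illustrate that last point, here is a minimal sketch (my own, not from the post) of using a volatile flag to establish a happens-before ordering between a plain write in one thread and a read in another: the write to data happens-before the volatile write to ready, which happens-before any volatile read that sees ready == true.

```java
// Sketch: safe publication through a volatile flag. The volatile
// write/read of 'ready' creates the happens-before edge that makes
// the plain write to 'data' visible to the reading thread.
final class Publisher {
    private int data;                  // plain field, race-free once published
    private volatile boolean ready;    // the happens-before hook

    void publish(int value) {
        data = value;   // ordinary write...
        ready = true;   // ...ordered before this volatile write
    }

    // Returns the published value, or -1 if nothing is published yet.
    int tryRead() {
        if (ready) {        // volatile read: sees publish()'s writes
            return data;
        }
        return -1;
    }
}
```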

31 comments:

Neil said...

Interestingly, there's another example of a race condition in the JDK which isn't completely deterministic as in the String hash code case: ConcurrentSkipListMap uses an internal random number generator that isn't synchronized. The latter is what I describe in my article on safe omission of synchronization, but thanks for the reminder about the String.hashCode() example: I should add that too!

Domingos Neto said...

Excellent!

I wasn't aware of the limitation for 64 bit values. I guess I should take the time at some point to read the language specification from top to bottom.

Good to know!

franci said...

Great, as always!!!

Note: in the last example 'h' is uninitialized. Yes!!! call me '-pedantic' :-)

serega said...

The JVM specification does not guarantee that 64-bit operations are atomic, but it also vaguely states that "VM implementors are encouraged to avoid splitting their 64-bit values where possible". How is it actually implemented in the 64-bit HotSpot?

stinkyminky said...

Jeremy -

Thanks for the post.

What is not clear to me is whether, for an immutable object, it is better to store a lazily calculated hash code in a 'volatile' or a non-volatile int.

Effective Java gives an example with 'volatile'. But 'String' code does not.

I am leaning toward 'String.hashcode()' approach but is there any good reason to use 'volatile'?

BTW - you mentioned the double read leading to a performance penalty. I believe the penalty would be larger if the variable were volatile. EJ mentions the 'local' variable version is about ~25% faster.

Thanks.

Buu said...

>> The second read can actually be moved, in your code, so that your processor does it before the first!
Do you mean the code may be rewritten by the processor as:

int h = hash;
if (hash == 0) {
...
}
return h;

Thanks for the post!

Paolo Giarrusso said...

@Buu: I think he does mean that.
@Jeremy: given that for programs without data races (I know this program has a data race) the JMM guarantees sequential consistency (isn't it?), I was wondering a bit about that reordering.

If I understand it correctly, that reordering is allowed because without a data race the programmer cannot notice the difference, isn't it?

Also, this can be proven because there is no possible destination of a happens-before edge, so for a data-race free program "hash" has to stay unchanged, right?

Jeremy Manson said...

@franci -- thanks!

@serega -- I doubt that it is a problem for 64-bit HotSpot, but I'm not 100% sure.

@stinky -- Effective Java 2 mentions the String example and calls it "racy single-check", so I'm not sure what you mean. In general, volatile reads don't have too much of a performance penalty. My suggestion is:

1) Make sure you understand what you are doing,
2) Profile before trying to optimize.

If you don't understand what you are doing, don't do it.

@Buu -- Yes, with two caveats:

1) That's not equivalent code; the equivalent code has another write to h before the end of the if statement.

2) The processor wouldn't do this kind of transformation, though, the compiler would. From a Java programmer's POV, this is 6 of one, half a dozen of the other.

@Paolo -- Sort of. It is true that if you have no data races in the code, you can't tell that a reordering occurred. That's the sequential consistency for data-race-free programs guarantee.

In these cases, I tend to reason backward. I thought about this example and realized that the Java memory model allowed the result in question. Then I thought some more and came up with a reordering that would actually trigger the weird result.

What's funny about that process is that it is often much easier to come up with weird results than it is to come up with plausible transformations! In this case, the transformation isn't terribly plausible. I could rejigger it so it was more plausible, but it would make the example more confusing, so I avoided doing that.

stinkyminky said...

Jeremy -

Thanks for pointing out racy-single check idiom. I totally missed it. Shame on me.

As for the performance penalty of the double read vs. using a local variable, EJ mentions it on page 284.

On double-checked idiom...

"What this variable does is to ensure that field is read only once in the common case where it's already initialized. While not strictly necessary, this may improve performance and is more elegant by the standards applied to low-level concurrent programming. On my machine, the method above is about 25 percent faster than the obvious version without a local variable."

I misunderstood the double-read performance penalty as being mainly due to volatile. I should not have assumed accessing a volatile field is slow.

BTW - for everyone, it is Item 71: Use lazy initialization judiciously.

stinkyminky said...

@Domingos Neto

For your info - Java Language Specification - Chapter 17.

http://java.sun.com/docs/books/jls/third_edition/html/memory.html

17.7 Non-atomic Treatment of double and long

Raffaello said...

Dear readers, may I direct you to a contribution I wrote some years ago about the subject? It illustrates the same technique on a different example. String.hashCode() is mentioned, too.

http://www-128.ibm.com/developerworks/java/library/j-immutability.html

The article also thanks Jeremy for some clarifications.

Jonathan said...

I don't follow the argument here. By definite assignment rules hash has an initial value of 0 when the object is constructed. After construction hash can only transition to a non-zero value; once the non-zero value is globally visible hash can never again be zero.

If on entry to hashCode() hash evaluates to 0 based on the current thread's view of the global memory state (which may very well lag behind an update performed by a thread running on a different processor) the body of the if block will be entered, and hash will be assigned the proper hash code. A second read from hash at the return statement must respect the write performed w/in the same thread -- this would have been true even in the original Java memory model. The memory model does not allow for temporal reordering w/in a linear control flow (nor, mind you, do any compiler or processor reordering).

Jeremy Manson said...

@Jonathan - When it breaks, the thread that breaks doesn't perform a write. It performs two reads. The first read returns a non-zero value and the second read returns a zero value. This is out of order with respect to the program order in which they were written, but is totally legal WRT the Java memory model.

I hope that clarifies it.

Jonathan said...

@Jeremy - Per JLS 3rd Ed. Sec. 17.4.5 there is a happens-before relationship between the two reads. Since the reads are to the same memory location it is not permissible to reorder them; doing so breaks "as-if-serial" semantics.

The memory model says nothing about when a write to hash by thread T1 becomes visible to thread T2. However, once the write is visible in T2 all subsequent reads (by program order) will see that new value.

Jeremy Manson said...

@Jonathan - A happens-before relationship between two actions doesn't prevent them from being reordered. If it did, then no two program actions could ever be reordered. The closest happens-before comes to what you are saying is that if one action is a write, and a subsequent (via happens-before) action is a read, then the read has to see the value written by the write (unless there is a data race).

Chapter and verse of that particular effect is the bit about when a read is "allowed to observe" a write, also in 17.4.5.

If you don't believe me that the guarantee you are proposing isn't in there, rather than going through line by line, I'm afraid I'll have to do a proof by authority, since I wrote chapter 17. If you want more detail, you can read my doctoral dissertation or the resulting PoPL paper, both of which have the relevant proofs.

Jonathan said...

@Jeremy -- needless to say I should have looked more carefully at who writes this blog before (incorrectly) quoting from the specs.

Unfortunately the links to your dissertation and PoPL paper from the JMM pages on umd.edu no longer appear to be live. The listing of concurrency testcases (under the unified proposal section) were very instructive, though -- in particular the justification for test case 16 made it clear to me how the model permits intra-thread reordering of reads to the same field.

It does worry me a bit that within a few minutes I was able to find two examples in the JDK 6 sources of hashCode methods that could return an incorrect result given this form of reordering (see java.util.UUID and javax.management.MBeanInfo).

If there ever were a compiler / processor combination that often reordered reads in this manner (and outside of IA64 that seems pretty improbable, as it's more likely than not that the compiler would store a single read of the field in a register or local) fixing up all of the problematic code could be a very arduous -- and possibly endless -- task.

jcsahnwaldt said...

Interesting. Like Jonathan, I was convinced that reads must not be reordered in this way. Now I know better, but I still find it a bit counterintuitive.

Maybe it's just a problematic choice of words. Here's a quote from JLS 3rd ed 17.4.5:

"If one action happens-before another, then the first is visible to and ordered before the second."

Let's put the hash example above into the language of the spec. Consider a read r1 of variable v that happens-before a read r2 of v, and a write w to v that has no happens-before relation to them. Then the following is possible: w is "seen" by r1 but not by r2, although r1 is "visible" to r2! Maybe the spec should have made it clearer that "visible to" only really applies to the relationship between read and write actions.

Trying to formalize my intuitive understanding of "visible to" with regard to different reads of the same variable, I came up with the following:

If a read r1 of a variable v happens-before a read r2 of v, and a write w1 to v happens-before a write w2 to v, then it must not be the case that w1 is seen by r2 and w2 is seen by r1.

But I guess such a rule would make some useful optimizations impossible...

Christopher

Jeremy Manson said...

@jcsahnwaldt - Basically, you have to follow the formal rules, not the informal ones.

Having said that:

a) When we talk about "visibility", we are talking about which writes are visible to which reads. It doesn't really make much sense to talk about which reads are visible to which reads.

b) More importantly: when we say two actions are "ordered", we mean that they are ordered with respect to each other, not with respect to other actions that may happen in a data race. Here's an example. Imagine that I have (initially, p == o, p.x == 0):

Thread 1:
r1 = o.x;
r2 = p.x;
r3 = o.x;

Thread 2:
o.x = 1;

Consider the case where the compiler does redundant read elimination, so that the two reads of o.x are combined. You can then have the two reads of o.x returning 0, and the read of p.x returning 1.

The read actions are still ordered with respect to each other, in the sense that the local effects of each one (the writes to r1, r2 and r3) don't interfere with the local effects of the next, and so on. They are *not* ordered with respect to the write to o.x, which can appear to be ordered before some of these reads and not before others, pretty much at random.

It is hard to make this clear in a comment, so this is probably worth a whole separate blog entry.

Jeremy Manson said...

... and you are correct that that rule would render optimizations impossible.

Jeremy Manson said...

Oh, and @Jonathan - fortunately, taking actual advantage of this is very unlikely.

jcsahnwaldt said...

Jeremy, thanks for the clarification. Your example involves aliasing, the hash example doesn't, which doesn't make a difference for the spec, of course, but I guess to my intuition it did. Anyway, thanks to your article, I've upgraded my intuition. :-)

Jeremy Manson said...

@jcsahnwaldt -- Happy to help. The example I gave was actually the very first motivating example for the new memory model, because the old memory model wouldn't allow that optimization. I agree, the aliasing sells it. :)

anony said...

I don't know why it is possible;
plz specify the re-ordered code

Jeremy Manson said...

@anony - I'm afraid I don't know what you want me to specify. Could you clarify your question?

anony said...

hmm...How to break thread-safety?
I can't understand(...) the difference between the original hashcode() and
your modification of hashcode().

ram said...

Hi Jeremy,

I still don't get the explanation for the hash example.


1: r1 = read hash
2: if r1 == 0 {
3: ....
4: write hash := r2
}
5: r3 = read hash
6: return r3

Now, you are saying that there is a particular reordering where r3 could be 0. Can you show me an example reordering of code which would be correct in a sequential setting, but somehow gets messed up in a concurrent setting?

I understand that the compiler (or processor) is free to reorder because there are no volatile variables or locks used, but I cannot come up a single correct reordering scenario in a uniprocessor setting.

Can you help spelling it out? Thanks.

Jeremy Manson said...

@ram - The transformation is as follows:

r1 = hash;
if (hash == 0) {
  r1 = hash = // calculate hash
}
return r1;

ram said...

Ah, of course!

This is a valid transformation for a uniprocessor, but is a problem in a multi-processor setting. (hash could be changed by another thread between "r1 = hash" and "if hash == 0")

Thanks, jeremy.

Vladimir said...

Hi Jeremy,
I'm sorry for bothering you at such an old post, but could you clarify something?
You wrote that valid transformation could be like this one:

r1 = hash;
if (hash == 0) {
r1 = hash = // calculate hash
}
return r1;

The quote from JMM:
"Each time the evaluation of thread t generates an inter-thread action, it must match the
inter-thread action a of t that comes next in program order."

For our case we have the following order of program inter-thread actions:
1)read hash;
2) write hash(inside if);
3) read hash (on return);

For the transformation you described we have the following order of inter-thread actions:

1)read hash;
2)read hash;
3)write hash(inside if);

So the order of inter-thread actions seems to be incorrect (in terms of JMM) for me.
Could you clarify this?

Jeremy Manson said...

@Vladimir (sorry about the latency, I was on vacation)

That sentence refers to the semantics of the evaluation of the thread, not the actual evaluation of the thread by the JVM.

More specifically, looking at a single thread in isolation, the actions have to appear to the user as if they have been evaluated in that order. What the reads can return is dictated by the memory model. So, let's say we had two threads:

Initially, x = 0

Thread 1:
r1 = x;
r2 = x;
r3 = x;

Thread 2:
x = 1;

If we look at thread 1 in isolation, we will see three reads of x in a row, so that's what we have to see. The memory model dictates that each of these three reads is allowed to return 1 or 0, regardless of what the previous reads saw. We are therefore allowed to see r1 = r3 = 0, r2 = 1: three reads in a row, each of which is allowed to see 1 or 0.

In practice, what this means is that the system reordered r3 and r2:

Thread 1:
r1 = x;
r3 = x;
r2 = x;

And that the write to x of 1 occurred in between r3 and r2.

I know it is confusing (it's hard to explain in a comment followup, certainly), but that's the general idea. Does that clarify it at all?