Strings and intern

Strings, yawn, boring. They're kind of the bread and butter of our development, but let's face it, we can take 'em for granted sometimes.

So here are some neat tips from this blog post about Strings in Java. The most useful one: every string literal in your code ends up in a shared string table (the string pool), so each occurrence of the same literal refers to the same String object. The compiler is also smart enough to join constant strings together, a feature called constant folding. E.g. "a" + "aa" in code becomes the single literal "aaa", and any variable assigned it points at the same spot in the string table. Since variables holding literals all reference those premade strings, equals() gets faster: String#equals() checks reference equality first, before comparing lengths or going character by character.
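A small sketch of the folding described above (the class and method names are hypothetical, made up for this example). The literal "aaa" and the folded expression "a" + "aa" live in different methods, yet both resolve to the same pooled object:

```java
// Hypothetical demo class; names are illustrative only.
class ConstantFoldingDemo {

    // "a" + "aa" is a constant expression, so javac folds it into the
    // single literal "aaa" at compile time.
    static String folded() {
        return "a" + "aa";
    }

    // A separate occurrence of the same literal elsewhere in the code.
    static String literal() {
        return "aaa";
    }

    public static void main(String[] args) {
        // Both resolve to the same pooled String object, so == (reference
        // equality) is true, and equals() succeeds on its very first check.
        System.out.println(folded() == literal());      // true
        System.out.println(folded().equals(literal())); // true
    }
}
```

Using == here is deliberate, purely to show object identity; for real comparisons you'd still call equals().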

The catch is that to get the faster equals, the strings have to be known at compile time. If you have something like

String s1 = "a";
String s2 = "aa";
String s3 = s1 + s2;

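then comparing s1 or s2 against the same literal elsewhere in the code is a reference match, really fast, but s3 won't be the same object as the literal "aaa", since s1 + s2 is evaluated at runtime and produces a brand-new String. s3.equals("aaa") is still true; it just has to fall through to the character comparison.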
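Here's a sketch of that snippet side by side with a variant the post doesn't mention: if s1 and s2 are declared final with literal initializers, Java treats them as compile-time constants and folds s1 + s2 just like "a" + "aa". The class and method names are hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
class RuntimeConcatDemo {

    // Mirrors the snippet above: s1 and s2 are ordinary (non-final) locals,
    // so s1 + s2 is computed at runtime and yields a new String object.
    static String runtimeConcat() {
        String s1 = "a";
        String s2 = "aa";
        return s1 + s2;
    }

    // Declaring the locals final (with literal initializers) makes them
    // constant variables, so s1 + s2 is folded at compile time.
    static String finalConcat() {
        final String s1 = "a";
        final String s2 = "aa";
        return s1 + s2;
    }

    public static void main(String[] args) {
        String s3 = runtimeConcat();
        System.out.println(s3 == "aaa");            // false: different objects
        System.out.println(s3.equals("aaa"));       // true: same characters
        System.out.println(finalConcat() == "aaa"); // true: folded into the literal
    }
}
```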

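Thankfully, a simple call to the String's intern() method hands back the copy from the string table, adding the string to the table first if it isn't there yet. intern() doesn't change s3 in place, so assign the result: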

s3 = s3.intern();
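A minimal sketch of intern() in action, reusing the s1/s2/s3 example (the class and method names are hypothetical). Since literals are always in the pool, the interned result and the literal "aaa" end up as the very same object:

```java
// Hypothetical demo class; names are illustrative only.
class InternDemo {

    static String internedConcat() {
        String s1 = "a";
        String s2 = "aa";
        String s3 = s1 + s2;  // new object, built at runtime
        s3 = s3.intern();     // replace it with the canonical pooled copy
        return s3;
    }

    public static void main(String[] args) {
        System.out.println(internedConcat() == "aaa"); // true
    }
}
```

Keep in mind intern() isn't free, since each call does a lookup in the pool, so it's probably only worth it for strings you compare or hold onto many times.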

The rest of the blog covers common mistakes with String equality and understanding immutability. Scarily, there used to be a potential memory leak with the substring() method: in older JDKs (before Java 7 update 6) the substring kept a reference to the original string's internal value array, so even a tiny substring could keep a huge string from being garbage collected.
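The blog's own examples aren't reproduced here, but a sketch of two classic slip-ups in this area might look like this: comparing contents with ==, and forgetting that "modifying" String methods return a new String rather than changing the original. Class and method names are hypothetical.

```java
// Hypothetical demo class; names are illustrative only.
class EqualityPitfallsDemo {

    // Mistake #1: comparing contents with ==.
    // new String(...) always creates a fresh object outside the pool.
    static boolean sameInstance() {
        String built = new String("abc");
        return built == "abc";      // false: different objects
    }

    static boolean sameContents() {
        String built = new String("abc");
        return built.equals("abc"); // true: same characters
    }

    // Mistake #2: forgetting Strings are immutable.
    // concat() returns a new String; it never modifies the receiver.
    static String ignoredResult() {
        String s = "abc";
        s.concat("def");            // result thrown away
        return s;                   // still "abc"
    }

    public static void main(String[] args) {
        System.out.println(sameInstance());  // false
        System.out.println(sameContents());  // true
        System.out.println(ignoredResult()); // abc
    }
}
```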

More importantly, it talks about different options for comparing strings across locales, where multiple Unicode character combinations can refer to the same printed character, or different characters can be considered equal in a locale even though they aren't equal byte for byte. All things I was never aware of until reading this blog.
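I'm not sure which options the blog itself walks through, but the JDK ships two relevant tools: java.text.Normalizer, which converts equivalent Unicode sequences into one canonical form, and java.text.Collator, which does locale-aware comparison at a chosen strength. A sketch assuming those two classes (the demo class and method names are hypothetical):

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical demo class; names are illustrative only.
class LocaleCompareDemo {

    // "é" as a single code point vs. "e" followed by a combining acute accent.
    static final String PRECOMPOSED = "\u00e9";
    static final String DECOMPOSED = "e\u0301";

    // Plain equals() compares chars, so the two spellings differ.
    static boolean rawEquals() {
        return PRECOMPOSED.equals(DECOMPOSED);
    }

    // Normalizing both to NFC first makes them identical.
    static boolean normalizedEquals() {
        String a = Normalizer.normalize(PRECOMPOSED, Normalizer.Form.NFC);
        String b = Normalizer.normalize(DECOMPOSED, Normalizer.Form.NFC);
        return a.equals(b);
    }

    // At PRIMARY strength a US English collator ignores case differences.
    static boolean primaryEquals(String x, String y) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(Collator.PRIMARY);
        return collator.compare(x, y) == 0;
    }

    public static void main(String[] args) {
        System.out.println(rawEquals());                  // false
        System.out.println(normalizedEquals());           // true
        System.out.println(primaryEquals("abc", "ABC"));  // true
    }
}
```

Which strength and normalization form you want depends on the locale and the use case, so treat these settings as a starting point rather than a rule.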