Wednesday, October 03, 2007

Java 1, Intuition 0

Pop quiz: If you were given the following snippet of code:
"".split("\\s");
...what would you expect the result to be?

If it helps, the contract for String.split() is that it takes a regular expression (in this case \s, which matches a single whitespace character: a space, a tab, a newline, et cetera) and returns a String[].

My intuition was that it would return an empty String[]. (Author's Note: which would have been better than a null, but even that's not what I got.)

I mean, think about it: what does it mean to split the empty string, on any character? The empty string has length 0, so what can you split it into? Two strings of length 0? But then you could do that ad infinitum, which would be silly.

What Java actually does is return an array containing exactly one string: the empty string. That implies Java treats the beginning of a string (the regex \A, or ^ if you know the string you're regex-ing doesn't contain newlines) as significant, but the end of a string (\Z or $) as "whitespace". I guess you could make a case for this making sense, but it doesn't match up with my intuition.
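If you want to check the result yourself, here is a minimal sketch. The class name EmptySplitDemo and the helper splitOnWhitespace are made up for illustration; only String.split() and Arrays.toString() come from the JDK.

```java
import java.util.Arrays;

class EmptySplitDemo {
    // Hypothetical helper: splits on a single whitespace character, as in the quiz.
    static String[] splitOnWhitespace(String input) {
        return input.split("\\s");
    }

    public static void main(String[] args) {
        String[] parts = splitOnWhitespace("");

        System.out.println(parts.length);            // 1
        System.out.println(parts[0].isEmpty());      // true
        // Prints [] because the single element is itself empty,
        // so the output looks like an empty array even though it is not.
        System.out.println(Arrays.toString(parts));  // []
    }
}
```

Note that printing the array is misleading here, so checking the length is the safer way to see what came back.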

What do you think?

3 comments:

Unknown said...

To be honest, it makes sense to me. The result you got follows the same behavior as if you had written:

"abc".split("\\s");

In this case, the returned String[] contains one element, the original string "abc"; in your case, the original string happened to be empty.

I can understand the confusion if one were to assume that the "split" method unconditionally resulted in an array of substrings that were, in fact, split from the original -- for the same reasons you list: what does it mean to split the empty string?

However, if you think of split's behavior as, "Split the given string around any token matching a given pattern, if such a token is found," then the observed (and documented) behavior makes sense. I don't find that too unintuitive.
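A rough sketch of that reading of split(), for comparison. The class name SplitRuleDemo and the helper countParts are invented for this example; the behavior shown is what String.split(String) does with its default limit, where an input with no match comes back whole and trailing empty strings are dropped.

```java
class SplitRuleDemo {
    // Hypothetical helper: how many pieces does split("\\s") produce?
    static int countParts(String input) {
        return input.split("\\s").length;
    }

    public static void main(String[] args) {
        // No whitespace found: the original string is returned as the only element.
        System.out.println(countParts("abc"));  // 1
        System.out.println(countParts(""));     // 1

        // One match in the middle: two pieces, "a" and "b".
        System.out.println(countParts("a b"));  // 2

        // A match was found, and both pieces around it are empty.
        // Trailing empty strings are removed, so nothing is left.
        System.out.println(countParts(" "));    // 0

        // Trailing empty piece is removed, leading empty piece is kept.
        System.out.println(countParts("a "));   // 1
        System.out.println(countParts(" a"));   // 2
    }
}
```

So the empty string is not a special case: it is simply a string in which the pattern never matches. The input that actually yields an empty array is a lone whitespace character.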

David Rupp said...

Thanks, William. I agree that your explanation makes sense; thanks for the thoughtful and reasonable response. :-)

David said...

At least there is a split() method. Where the heck is join()? Why arrays and Lists cannot join their elements together is a question I've spent too much time considering. For that matter, why can't a List sort itself?
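For readers on Java 8 or later, both gaps have since been filled by String.join() and List.sort(). A minimal sketch, assuming Java 8+; the class name JoinAndSortDemo and the helpers joinWithComma and sortedCopy are hypothetical names for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class JoinAndSortDemo {
    // Hypothetical helper: joins the elements with a comma (String.join, Java 8+).
    static String joinWithComma(List<String> items) {
        return String.join(",", items);
    }

    // Hypothetical helper: returns a sorted copy, leaving the input untouched.
    static List<String> sortedCopy(List<String> items) {
        List<String> copy = new ArrayList<>(items);
        copy.sort(Comparator.naturalOrder());  // List.sort, Java 8+
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "apple", "fig");

        System.out.println(joinWithComma(words));              // pear,apple,fig
        System.out.println(joinWithComma(sortedCopy(words)));  // apple,fig,pear
    }
}
```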