Wednesday, June 27, 2007

On the Importance of Being Multi-Lingual

My day job is currently focused on porting an Eclipse RCP ("Rich Client Platform") Java application -- essentially a WORA desktop app -- to a Struts2/JSP/Hibernate webapp. We're using a fair amount of JavaScript (via Prototype and Scriptaculous, along with some home-grown code) for the things you'd normally use JavaScript for. Nothing out of the ordinary there.

One of the home-grown JavaScript bits that I've been working with recently is a text-box filter. You type in the text you want to filter on in a text-box, and the table rows (or li's, or what-have-you) that don't contain that text get hidden, leaving only those that pass the filter. Again, pretty standard stuff.

The fun started when I wanted to add the ability to filter not just on text, but on a time range chosen from a dropdown menu, like, say, "show me only the results that were posted within the last two hours". I wanted to re-use the existing filter code as much as possible, because a lot of it is exactly what I need, but I needed a way to specify comparing two times (the time of each post versus the time range selected) instead of the default behavior of checking for a match on the input text.

(Author's Note: Those of you who are familiar with Ruby probably already know exactly where I'm going with this. Bear with me while I build the suspense a little longer).

If the filter were written in Java (pre-closures Java, anyway <wink/>), I would have no choice but to jump through some serious refactoring hoops -- implement the new behavior, probably as a new set of methods, and then add in some way to designate which behavior I wanted either by introspecting on my arguments, or by an explicit switch, or...yuck. And if I were only aware of Java idioms, there would be nothing stopping me from taking the same approach in JavaScript.

Fortunately, I spent the bulk of last year doing some serious Ruby programming, which forced me to get used to the idea of blocks. Equally fortunately, I've taken some time to get familiar with idiomatic JavaScript programming, and learned that functions are first-class datatypes in JavaScript, which means they can work pretty much like blocks do in Ruby.

And so I was able to solve my problem with minimal impact to the existing codebase, by adding a parameter to the constructor (an optional block/function, defaulting to the existing "match" behavior), and making the code that does the actual filtering call that anonymous function rather than explicitly calling "match". One optional parameter and one (tiny) layer of indirection later, and I've got a solution that will never need extending again. The next time I come up against a type of filter that I haven't already implemented, I can do so on-the-fly, by building a function that implements the exact comparison behavior I want and passing it to the generic filter.
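The real filter is JavaScript and isn't reproduced here, so take the following as a rough sketch of the same idea in Ruby terms, where the optional function really is a block. The names RowFilter and Post, the two-argument comparison, and the sample data are all made up for illustration.

```ruby
# Hypothetical sketch: a generic filter whose comparison can be swapped out.
# RowFilter and Post are illustrative names, not the actual filter code.
Post = Struct.new(:title, :posted_at)

class RowFilter
  # With no block, fall back to the default behavior: a substring match
  # of the input text against the row's title.
  def initialize(&comparison)
    @comparison = comparison || ->(row, input) { row.title.include?(input) }
  end

  # Returns only the rows that pass the comparison for the given input.
  def apply(rows, input)
    rows.select { |row| @comparison.call(row, input) }
  end
end

now = Time.at(1_000_000) # fixed "current time" so the example is repeatable
posts = [
  Post.new("JRuby patch", now - 30 * 60),       # 30 minutes old
  Post.new("Closures in Java", now - 3 * 3600)  # 3 hours old
]

# Default text match.
RowFilter.new.apply(posts, "Java").map(&:title)      # => ["Closures in Java"]

# Same filter, different comparison: keep posts no older than the given age.
recent = RowFilter.new { |row, max_age| now - row.posted_at <= max_age }
recent.apply(posts, 2 * 3600).map(&:title)           # => ["JRuby patch"]
```

The filtering loop never changes; only the comparison handed to the constructor does.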

Do yourself a favor: stop hating on Ruby because you're irritated by all the attention it gets. Stop assuming that if you learn idiomatic Ruby you're going to have to turn in your "Java ROOLZ, Ruby DROOLZ" badge and your "I (heart) static typing" secret decoder ring. Free your mind, and the rest will follow. And whatever other inspirational pop lyric you care to apply here.

Learning another language (doesn't have to be Ruby; could be Python, or Lisp, or Erlang, or Scala, or...) can only increase your problem-solving capability. And, after all, isn't that what we're all about? Solving problems?

Monday, June 25, 2007

meme.lolcats.die.die.die

The title says it all, really. But I hate one-liner blog posts, so here are some other "memes" that I'd really, really like to see go away:

meme.edgy.job.posting.die.die.die

Whenever I see a job posting like this:
SELECT * FROM applicants WHERE knows_php=1 AND has_life=0
ORDER BY tattoo_count DESC;
I simply:
DELETE FROM jobs_i_am_interested_in WHERE employer=that_loser;
I mean think about it. If you're that darn clever, what do you need me for? And what happens if you do hire me and I turn out not to be clever enough for you? Why should I leave my current position (which is at a pretty awesome startup, btw) for the eventual:
INSERT INTO ranks_of_the_unemployed VALUES(that_chump_who_answered_our_lame_job_posting);
Same goes for any company looking for any kind of "programming god", "rock star", or "l33t hax0r". In fact, if any part of your job post is ultra-snarky, ultra-hip, or smacks of leetspeak, there's a good chance I will sprain my finger deleting it. I know you're trying to be, like, all Web-Too-Oh and everything, but who do you think you're fooling? In RL (author's note: "Real Life") you're in your late 40's, and the closest thing you have to "l33t cr3d" is the time you virtually hit on that brooding emo chick whose profile you happened across while prowling MySpace. (Which, by the way -- ewww).

However, we both know that at the end of the day, you're still going to be a clock-watching, interchangeable-human-resource-administrating, two-martini-lunch-taking, office-chair-warming executive type with a pathological distrust of technology. And I'm still going to be a code monkey. So let's just call a spade a spade, shall we?

meme.verbing.die.die.die

"Incentivize". "Monetize". "Reify".

Gagize me.
Verbing weirds language -- Calvin, of Calvin and Hobbes

meme.meme.tag.die.die.die

Look, if I really want you to know about my third lung or the time I gave CPR to that baby seal in Alaska, chances are you're my wife or my physician and I've already told you. Otherwise, buzz off.

Friday, June 22, 2007

Suggested Reading: "Rich Programmer Food"

Steve Yegge: "Rich Programmer Food", in which Steve explains why Compilers is the second most important course you can take as a computer science major, and should be required in any self-respecting computer science curriculum. I couldn't agree more.

I had to fight the temptation to label this "Required Reading", because I know that some of you who follow this link will be turned off by Steve's style, which I appreciate but which even I find a bit rambly at times. And I expect that his sense of humor, which resonates with mine and which I therefore appreciate, is not for everyone.

But I do consider this worth a read, primarily because I whole-heartedly agree with Steve's contention that knowing how compilers work is key to understanding computer science. It is also, not coincidentally, key to getting better as a programmer.

I used to think that those two concepts were largely orthogonal. I still believe it's possible to be a successful programmer in industry without a good understanding of computer science. I myself have managed nearly 20 years of gainful employment on the strength of a computer science degree, some native ability, and the credibility that naturally goes with that much experience.

But it's been only recently that I've really started to get it, to understand the whys and wherefores. Understanding state machines, and parse trees, and Big-O notation, and ... all that stuff I couldn't be bothered with as an undergrad ... understanding it as a grad student has made my life as a professional programmer a lot easier.

If you do follow the link, prepare to be challenged. And even if you don't make it all the way through the post, make sure to scroll down to the bottom for the punch line. Again -- I couldn't agree more.

Tuesday, June 19, 2007

Hear, Hear! (A Postscript to My Take on Closures in Java)

From Revised Report on the Algorithmic Language Scheme, Introduction, Paragraph 1, Sentence 1:
Programming languages should be designed not by piling
feature on top of feature, but by removing the weaknesses
and restrictions that make additional features appear necessary.

P.S.: This was written 'way back in 1998. Another fun quote:
Those who cannot remember the past are condemned to repeat it. -- Santayana

Wednesday, June 13, 2007

Hacking JRuby: More on Method Arguments

I closed my last post with this observation:
"...JRuby does not have an equivalent for rb_scan_args(), or at least not one that is called on a per-method basis."

In the comments, Ola Bini -- ThoughtWorker and JRuby committer -- graciously pointed out that there are, in fact, two static methods that we can use to simulate the function of rb_scan_args(), both found in the org.jruby.runtime.Arity class: checkArgumentCount(), and scanArgs().

checkArgumentCount() does just what it says. You pass it an array of argument values, along with the minimum and maximum number of arguments allowed, and it verifies that the actual number of arguments falls within that range (inclusive). So this provides some basic sanity checking of the number of arguments passed to the method you're implementing. If the number of arguments is valid, checkArgumentCount() returns the actual number of arguments passed. I guess you could argue that the return value isn't very useful, since you already know the number of arguments you expect, as well as the number given to you. But it is used in a number of places in the JRuby source code.

scanArgs() will do a little more work for you. It will do the same sanity checking (it actually calls checkArgumentCount() to do so), although you specify the numbers slightly differently, passing the number of required arguments along with the number of optional arguments. If the actual number of arguments passed is valid, scanArgs() then creates a new array of length required + optional, copies the values of any arguments passed in (all required arguments plus any provided optional arguments), sets the values of any non-provided optional arguments to nil, and returns the new array. This actually is useful, as it transforms a variable-length array of arguments into one of fixed length, where the fixed length is always equal to the maximum number of arguments your method will accept.

As Ola points out, this still is not quite the equivalent of MRI rb_scan_args(), in that it doesn't handle "rest" or "block" arguments. But it does offload from us some of the burden of handling variable length argument lists.
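To make the behavior described above concrete, here is a rough Ruby model of the two checks. It is not JRuby's code: the real methods are Java statics on org.jruby.runtime.Arity, and their exact signatures aren't shown in this post. The method names and the error message below are stand-ins of my own.

```ruby
# Illustrative model only: mimics the behavior described above, not the
# actual JRuby implementation.

# Verifies that args.length falls within min..max (inclusive) and returns
# the actual argument count.
def check_argument_count(args, min, max)
  unless args.length.between?(min, max)
    raise ArgumentError, "wrong number of arguments (#{args.length} for #{min})"
  end
  args.length
end

# Returns a fixed-length array with required + optional slots, filling any
# optional arguments that were not supplied with nil.
def scan_args(args, required, optional)
  check_argument_count(args, required, required + optional)
  Array.new(required + optional) { |i| args[i] }
end

scan_args([:first], 1, 1)          # => [:first, nil]
scan_args([:first, :second], 1, 1) # => [:first, :second]
```

The payoff is in the caller: after the call it can index straight into slot 1 without first checking how many arguments actually arrived.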

I do plan to use scanArgs() to finish up my implementation of BigDecimal.mode(), but in order to do so meaningfully I'm also going to have to change mode()'s method definition from taking a fixed number of arguments to taking a variable number of arguments (the first argument is required, the second is optional). That's going to involve digging into JRuby's mechanism for defining Ruby methods, which I've researched and found to be pretty cool, but it's going to take another few blog posts to sort out. Stay tuned...

Tuesday, June 12, 2007

Hacking JRuby: BigDecimal and Ruby Internals

I've submitted another patch for JRuby (viewable here), to implement the BigDecimal.mode() class method. In doing so, I learned quite a bit about JRuby's implementation strategy, as well as the internals of the C source code for MRI (Matz's Ruby Implementation) Ruby.

BigDecimal.mode() explained

BigDecimal.mode() is a funky little method in the BigDecimal module, which is not part of the core API but part of the standard library that ships with Ruby. It's kind of multi-variate -- what it does, exactly, depends on how it's called.

The first parameter to BigDecimal.mode() is required, and it must be a Fixnum representing either the constant BigDecimal::ROUND_MODE or the exception mode to be set (more on that later). If it's BigDecimal::ROUND_MODE and there is no second argument, then mode() just returns the current rounding mode. If a second argument is present, it must also be a Fixnum, and it must equate to one of the seven rounding modes Ruby recognizes (e.g., BigDecimal::ROUND_UP, BigDecimal::ROUND_FLOOR, etc.). In this case, mode() sets the rounding mode (for all BigDecimals, remember, since this is a class method) to the value of the second argument.

If the first argument is a Fixnum that is not equal to BigDecimal::ROUND_MODE, then it is expected to have one of its bits set to correspond to one of the known exception modes (e.g., BigDecimal::EXCEPTION_INFINITY). Again, if there is no second argument, mode() simply reports the current exception mode(s) (each bit in the returned value corresponds to a single exception mode set). If there is a second argument, it must be one of 'true' or 'false'. If 'true', mode() sets the mode passed in the first argument. If 'false', mode() unsets (i.e., turns off) the mode passed in the first argument.
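A short session shows both personalities of mode(). This is a sketch against the bigdecimal library that ships with MRI, which spells the rounding-mode selector BigDecimal::ROUND_MODE. The variable names are just for illustration.

```ruby
require 'bigdecimal'

# Rounding mode: one argument reads it, two arguments set it.
previous = BigDecimal.mode(BigDecimal::ROUND_MODE)
BigDecimal.mode(BigDecimal::ROUND_MODE, BigDecimal::ROUND_FLOOR)
current = BigDecimal.mode(BigDecimal::ROUND_MODE) # now BigDecimal::ROUND_FLOOR

# Exception mode: true turns the flag on, false turns it off, and the
# one-argument form reports the current flag bits.
BigDecimal.mode(BigDecimal::EXCEPTION_INFINITY, true)
flags_on = BigDecimal.mode(BigDecimal::EXCEPTION_INFINITY)
BigDecimal.mode(BigDecimal::EXCEPTION_INFINITY, false)
flags_off = BigDecimal.mode(BigDecimal::EXCEPTION_INFINITY)

# The settings are global to BigDecimal, so put the rounding mode back.
BigDecimal.mode(BigDecimal::ROUND_MODE, previous)
```

Because the settings apply to every BigDecimal in the process, restoring the previous value when you're done is a good habit.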

Simple, huh?

Not So Fast...

When I picked up this task, mode() was just a default stub that printed a message to the console and returned nil. Not a lot to go on there. So I turned to the MRI source code to figure out just what it was supposed to do.

Introducing: rb_scan_args()

One of the first things MRI does (in a lot of methods, as it turns out) is to call the function rb_scan_args(), which is implemented in the file class.c with the following signature:
int rb_scan_args(int argc, const VALUE *argv, const char *fmt, ...)
It takes the number of arguments passed, a pointer to a structure containing the values of those arguments, a format string of some sort, and...some other stuff. The number and values of the arguments are self-explanatory, but the format string and the trailing "other stuff" are decidedly not, so let's take a look at them.

The format string consists, minimally, of two digits. The first digit is the number of required arguments, the second is the number of optional arguments. rb_scan_args parses the format string to find these numbers, then it walks the list of argument values and stuffs each value into its corresponding reference (which is what the "other stuff" in the signature actually is: a group of references to store the values of the arguments in).

For example, BigDecimal.mode() makes this call to rb_scan_args:
if(rb_scan_args(argc,argv,"11",&which,&val)==1) val = Qnil;
In English:

  • get one required argument and store its value in the variable which

  • get the optional second argument if it exists and put its value in val

  • if rb_scan_args returned 1 (i.e., only one argument was provided), then set the value of the optional argument to its default of nil

So this is how MRI Ruby (as implemented in C) handles variable/optional arguments in a general way. There's more to it, of course, including an astonishing bit of hackery with C macros that actually implements putting the argument values in the right place for return. But I won't go into that until I understand it better. Also, the format string allows for the Ruby constructs of "rest args" (indicated by an '*' in the format string) and finally a "block arg" (indicated by an '&').
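Since the format string is the least obvious part, here is a small Ruby model of how it breaks down, assuming only the pieces described above: a required-count digit, an optional-count digit, then an optional '*' and an optional '&'. It is a sketch for reading format strings, not a transcription of class.c, and ScanFormat and parse_scan_format are names I invented.

```ruby
# Rough model of the format string described above. ScanFormat and
# parse_scan_format are hypothetical names, not part of MRI.
ScanFormat = Struct.new(:required, :optional, :rest, :block)

def parse_scan_format(fmt)
  match = /\A(\d)(\d)?(\*)?(&)?\z/.match(fmt)
  raise ArgumentError, "bad scan arg format: #{fmt}" unless match

  ScanFormat.new(
    match[1].to_i,  # number of required arguments
    match[2].to_i,  # number of optional arguments (0 when the digit is absent)
    !match[3].nil?, # true when '*' asks for the rest args
    !match[4].nil?  # true when '&' asks for the block arg
  )
end

mode_format = parse_scan_format("11")
mode_format.required # => 1
mode_format.optional # => 1
mode_format.rest     # => false
```

Read this way, the "11" in the BigDecimal.mode() call is just "one required, one optional, no rest, no block".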

Meanwhile, Back in JRuby...

This has gone on a bit long, so I'll just close by saying that JRuby does not have an equivalent for rb_scan_args(), or at least not one that is called on a per-method basis. The runtime is responsible for bundling arguments and calling the appropriate Java method based on the number of arguments actually present. This causes a bit of a problem right now for class methods that take optional arguments (as BigDecimal.mode() does), but that's a subject for another post.

Thursday, June 07, 2007

Umm, What Exactly Is Google Trying To Tell Me Here?

Dear Google,

Look, I know this is a weblog about programming and Java and stuff, and I know you're just trying to help out with the targeted, "relevant" ads. And I'm okay with that. Really, I am.

However...



I think the whole "help out the poor, socially inept Revenge of the Nerds rejects" vibe is a bit much, huh?

Love,
David

Kickin' it Old School: Inspecting $CLASSPATH with sed and grep

Here's a fun sed one-liner that I used today to break up the entries in my $CLASSPATH:

echo $CLASSPATH | sed 's/:/\<return>
/g'

Note that the <return> above means to actually hit the return key following the backslash. This bit of awkwardness is sed's way of specifying a literal newline as part of the substitution string (how literal can you get?). The net effect is to replace the colon characters with newlines, resulting in a display of my classpath with one entry per line.

I needed this in the context of figuring out a broken Ant build while testing some changes I'm making to JRuby. Unfortunately, even the pretty-printed version of my Ant classpath was too long to sift through with the naked eye, so I turned to grep to look for exactly what I needed:

echo $CLASSPATH | sed 's/:/\<return>
/g' | grep jruby.jar

That is, break up the classpath into one line per entry, and show me only the entries for jruby.jar. With this, I was able to determine that I had an older version of jruby.jar on my classpath that is incompatible with the current trunk. Problem solved!
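For comparison, and only as a sketch, the same inspection can be done from Ruby by splitting on the platform's path separator. The helper name classpath_entries is invented for this example.

```ruby
# Hypothetical helper: split a classpath string into its entries,
# optionally keeping only the entries that match a pattern.
def classpath_entries(classpath, pattern = nil)
  entries = classpath.to_s.split(File::PATH_SEPARATOR)
  pattern ? entries.grep(pattern) : entries
end

# One entry per line, like the sed one-liner.
puts classpath_entries(ENV["CLASSPATH"])

# Only the jruby.jar entries, like the sed-plus-grep version.
puts classpath_entries(ENV["CLASSPATH"], /jruby\.jar/)
```

File::PATH_SEPARATOR is ':' on Unix-like systems, so this splits on the same character the sed version does.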

I love a happy ending.

Tuesday, June 05, 2007

Rant: JSP + OGNL + Collections == Train Wreck

The Story So Far

In my day job I'm using Struts 2, with JSP as the view templating mechanism. I have a collection whose size I'd like to report on the page. Unfortunately for my sanity, I have recent experience with Ruby on Rails, in which such a thing is as simple as:

<%= @myCollection.size %>

But no.

Problem #1

Struts 2 exposes properties on the Action class via OGNL, which has a nice, clean, property-based syntax, much like RHTML (which is what Rails uses for its view templates). So I should be able to ask for something like this:

${myCollection.size}

Assuming, that is, that I have a method getMyCollection() defined on my action. Which I do.

The problem here is that I also have to have a method called getSize() defined on whatever getMyCollection() returns. Which is a java.util.Set. Which, for some reason, does not have a getSize() method. It has a size() method instead. Apparently the designers of the Java 2 Collections API were feeling a mite saucy when they went a-designin', and were daring their overlords to punish them for ignoring the JavaBeans method naming convention that, as it turns out, OGNL relies on heavily.

D'oh!

No problem, though. OGNL doesn't require method names to be bean-compliant, it just prefers it in its chain of figuring out what the heck you're asking for. You can invoke any method directly, as in:

${myCollection.size()}

Problem solved! Let's save everything and reload:

Struts Problem Report
Struts has detected an unhandled exception:
Messages:
view.jsp(40,109) The function size must be used with a prefix when a default namespace is not specified
org.apache.jasper.JasperException: view.jsp(40,109) The function size must be used with a prefix when a default namespace is not specified


What the...?! Oh. The JasperException must mean that the ${...} expression is being interpreted as JSP EL instead of OGNL. Bummer. Oh well. I'll just let the EL engine handle it.

Which brings me to:

Problem #2

The JSP EL engine can't handle it. Neither the property syntax nor the method syntax works. The property syntax wants to use the (non-existent) getSize() method too (go figure). And the method syntax doesn't exist.

So. Since it looks like JSP EL trumps OGNL when the page is rendered, I'll just turn off JSP EL evaluation (in web.xml) and let OGNL handle everything.

Or not. That results in a ton of errors from pages in my app that rely on JSP EL.

Solution

I'll try to wrap this up. It turns out that the answer is JSP functions. "All" I have to do to get the number of items in my collection is write a public function class and implement a static method -- which I get to name anything I want! Just like the Collections API designers! -- that takes the collection as a parameter and returns its size.

Oh, and I have to write a snippet of XML in the form of a .tld file that tells the JSP where to find this function.

Oh, and I have to declare the .tld as a taglib at the top of the JSP. And that's all I have to do.

Sheesh.

Conclusion

Fortunately, it turns out that the hardest part of this work has been done for me, in the form of the JSTL implementation (documented here) of several useful JSP functions, including length().

It doesn't help my mood any that in Ruby -- if the problem existed to begin with, which it doesn't -- I could have solved it simply by adding a getSize() method, aliased to the size() method of the Set class, and everyone would be happy.
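For the record, the Ruby fix alluded to above is about this long. It's a sketch using Ruby's own Set class as a stand-in for the collection; the getSize name is there only to mirror the bean-style accessor the expression language wanted.

```ruby
require 'set'

# Reopen Set and expose size under the bean-style name that a
# property-based expression language would look for.
class Set
  alias_method :getSize, :size
end

Set.new([1, 2, 3]).getSize # => 3
```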

Grumble.

Monday, June 04, 2007

Required Reading: Enforcing Strict Model-View Separation in Template Engines

So I've been reading Terence Parr's excellent addition to the Pragmatic Programmers catalog: The Definitive ANTLR Reference. This book is a lot of fun for me (I recommend it highly, btw), because it's gotten me thinking again about languages and language implementation issues in a way that I haven't had to since my first compilers class. Dr. Parr has a relaxed and informative writing style that can make you forget you're reading about some pretty hardcore computer science.

I wanted to do some more reading on StringTemplate, the templating engine ANTLR uses to emit the results of its translation, because I read that it can be (is) used to build websites as well. My searching led me to Dr. Parr's paper from 2004: Enforcing Strict Model-View Separation in Template Engines. In it, Parr explains -- and most importantly formalizes -- his ideas about the need for strict separation in view templating mechanisms, and how the vast majority of existing solutions (e.g., JSP, Velocity) fall far short, ironically because of the Turing-complete power they provide.

We Java programmers who have been around since the introduction of JSP have already been informally exposed to some of these concepts, in the form of the Model 1 -> Model 2 architecture evolution. One of Dr. Parr's key points, though, is that it's not enough simply to recommend avoiding using the full computational power available in (for example) JSP scriptlets. The temptation to use that power, even in seemingly innocuous ways, is simply too strong to be resisted for long. Parr's insight that having full access to the Turing-complete power inherent in JSP/Velocity/what-have-you actually gets in our way as developers reminds me of the 37Signals take on embracing constraint.

Mind you, I don't expect to be organizing a revolution around replacing my company's use of JSP with StringTemplate anytime soon. But I do plan to develop the personal discipline of following Dr. Parr's advice and embracing constraint by voluntarily avoiding the constructs that violate strict separation. That statement may seem to contradict the statement above about temptation being too strong, but I think that if enough individuals had the same resolve, we could improve the situation somewhat until a move to StringTemplate or its equivalent became politically feasible.

Closures in Java: Continued

In the comments section of my last post, user sanity makes the following astute observation, re: my closing remark about Java being a Turing complete language:

Well, pretty-much every programming language is Turing complete, so by that argument, why bother adding any feature to any language?

To which I reply: "Bravo!" and "Well said." I'll even take it one step further: anything that can be done in a high-level programming language can be done in assembler language, or even machine code. I know from hard experience, having implemented an encapsulation/data-hiding scheme using IBM mainframe assembler language, and having patched code using the raw machine opcodes in the same environment. Ah, the good old days!

So I'll go back to programming in assembler language, and the rest of you can either join me or take a well-earned break. Thanks for playing.

All right -- as much fun as the reductio ad absurdum game is, all it does is reduce the argument to an absurdity. It doesn't really point to a constructive resolution. Although, in this case, it does raise the question (note that I did not say, "begs the question," which is a solecism that I will reserve for another post) of the degree to which the effort involved in constructing a higher-level language offsets the effort involved in writing code in a low-level language.

I think we can all agree that programming in C or C++ is more pleasant and more efficient in terms of our time than programming in assembler language. You give up some flexibility (e.g., the ability to stuff a specific value in a specific register, direct access to all non-protected memory), but you gain recursion, stack management, register coloring -- all that good stuff.

Move up to Java, or Objective-C (version 2.0; coming soon!) and you get the added relief of automatic memory management (allocation and garbage collection). With Java and its ilk, you also get checked exceptions, which you may or may not appreciate.

Move up to Ruby (JRuby), or Python (Jython), or LISP, and you get dynamic typing and language/runtime-level support for first-class functions/blocks/closures.

So switching between these flavors of Turing-completeness makes sense. If you need close-to-the-metal performance with some measure of portability, use C. If you need rapid prototyping, use Ruby or Python. If you need to justify spending $300+ on IntelliJ IDEA, use Java. (Sorry, couldn't resist).

My point (and I do have one) is that the difference between Turing-complete-Java and Turing-complete-Java-now-with-closures! is just not worth the effort it looks like adding this nice-but-not-essential feature to the language will require.

Closures in Java will muddy an already-dense syntax even further. They will require more code butchery and internal gymnastics on behalf of the compiler. They will be devilishly hard to add in a way that makes sense, given Java's baked-in typing restrictions and flow control. And, in the end, they're going to satisfy the demands of a vocal minority. The masses will still take the path of least resistance, and will probably avoid closures like the plague. And those of us who make a point of being aware of non-Java programming idioms will appreciate the differences between the many available languages, and will continue to strive to use the best tool for the job at hand.