S-expressions are a classic means of storing trees of data. Part of the original Lisp specification, they have been part of software engineering since its very beginning. Other languages, notably Scheme and, more recently, Clojure, have helped keep this very simple means of representing data relevant.
In the spirit of Greenspun's tenth rule, we are going to attempt to make first-class s-expressions in Ruby. There is already a nice implementation of s-expressions in Ruby, for reference, but its s-expressions are not first-class, which makes them less than ideal. As such, we are going to see how far we can push (read: abuse) the dynamic nature of the Ruby language, and investigate just how much control we have over certain elements of its syntax.
For anyone more interested in the result than the journey, be warned that this attempt will not be successful, but some interesting results will be found along the road to ultimate failure.
The most natural way to store nested lists of data in Ruby is its Array class. We can skip most of the important details and simply need to write a mechanism for calling an array.
The following is a fairly trivial duck punch on the Array class that introduces a new Array#call method. This method treats the first item in the array as the operation to perform and the second as the implicit receiver of that message. If the receiver responds to the message, that method is called with the rest of the array as arguments; if not, the operation is called directly, with everything after it (receiver included) as arguments.
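A minimal sketch of such a duck punch, reconstructed from the description above rather than copied from the author's listing. Two details are our own assumptions: the operation is either a Symbol naming a method or an object that itself responds to call (such as a lambda), and "calling the operation directly" is taken to mean invoking it as a Kernel function.

```ruby
# A minimal sketch of Array#call, reconstructed from the prose description.
# Assumptions: the operation is a Symbol/String naming a method, or an object
# that itself responds to #call (e.g. a lambda).
class Array
  def call
    op, receiver, *rest = self

    # Callable operations receive everything after themselves as arguments.
    return op.call(*drop(1)) if op.respond_to?(:call)

    if receiver.respond_to?(op)
      # The receiver understands the message: send it the remaining items.
      receiver.public_send(op, *rest)
    else
      # Otherwise treat the operation as a Kernel function applied to the remainder.
      Kernel.send(op, *drop(1))
    end
  end
end

p([:+, 1, 2].call)                # => 3
p([:upcase, "hello"].call)        # => "HELLO"
p([:format, "%s-%s", 1, 2].call)  # => "1-2" (Strings have no public #format)
p([:+, 1, [:*, 2, 3].call].call)  # => 7 (the inner array is called by hand)
```

Note that the nested expression still has to be called by hand before the outer one can use its value.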
This is a terribly incomplete implementation, and we cannot recommend it for production software. Even if it were more correct, it would still not be advisable for any realistic purpose. Caveats aside, it actually produces some decent results.
Being required to invoke call on every array after instantiation, however, is incredibly suboptimal from the standpoint of both maintainability and legibility. Ideally, this method would be called automatically under some circumstances. To do that, we need to investigate how arrays get instantiated and see whether we can hook into that process to streamline our s-expressions.
The first thing to try is overriding Array#initialize, which should be a simple enough way to turn arrays into s-expressions (read: completely break their functionality). For this first pass, we will just add some debugging output to verify that this is the right place to hook in and abuse Ruby.
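A minimal sketch of that first pass, assuming we alias the built-in initializer so arrays still get built normally. The $initialize_calls counter and the __original_initialize alias are names of our own choosing.

```ruby
# Counts how often Array#initialize runs (a global, so it is visible inside Array).
$initialize_calls = 0

class Array
  # Keep the original initializer so arrays are still constructed correctly.
  alias_method :__original_initialize, :initialize

  def initialize(*args, &block)
    $initialize_calls += 1
    $stderr.puts "Array#initialize called with #{args.inspect}"
    __original_initialize(*args, &block)
  end
end

Array.new(3, 0)      # goes through initialize: counted and logged
literal = [1, 2, 3]  # literal syntax: initialize is never invoked
```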
For some reason, the initializer never gets called when using the array literal syntax. Luckily, we can also manipulate the Array::new method, so surely that must be the answer. Tapping into individual methods to check whether they get called will quickly become tedious, however, so first we should factor that process out into a function.
Now, looking into the call to Array::new, we should expect to see some debugging output.
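One way to factor the tapping out is a small helper that prepends an anonymous module and defers to the original method through super. The helper name log_calls and its arguments are hypothetical; Module#prepend and define_method are standard Ruby. Pointed at Array::new, it records explicit constructor calls, while the literal syntax leaves the log untouched.

```ruby
# Hypothetical helper (our own name, not a library API): prepends an anonymous
# module that records each call to +name+ on +target+, then defers to the
# original implementation through super.
def log_calls(target, name, log)
  target.prepend(Module.new do
    define_method(name) do |*args, &block|
      log << [target, name, args]
      super(*args, &block)
    end
  end)
end

calls = []

# Array::new is a class method, so the hook goes on Array's singleton class.
log_calls(Array.singleton_class, :new, calls)

Array.new(2, :x)     # recorded
literal = [1, 2, 3]  # not recorded: the literal never dispatches to Array::new
p calls              # => [[#<Class:Array>, :new, [2, :x]]]
```

Passing a plain class instead of a singleton class hooks an instance method in the same way.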
Surely Kernel#Array, Array::[], Array#to_a, or Array#to_ary must get called at some point while an array is instantiated from the literal syntax.
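The same prepend trick can be pointed at each of these candidates. In this sketch the $calls log is our own addition, and Kernel#Array is hooked through Object rather than Kernel: it is a private Kernel instance method, and prepending to Object sidesteps the fact that, before Ruby 3.0, prepending to an already-included module did not affect classes that had included it. Run as a plain Ruby script, the literal fires none of the hooks; they fire only on explicit calls.

```ruby
# Records which candidate methods fire (the $calls log is our own addition).
$calls = []

# Array::[] is a class method, so hook Array's singleton class.
Array.singleton_class.prepend(Module.new do
  def [](*args)
    $calls << :"Array.[]"
    super
  end
end)

# Array#to_a and Array#to_ary are instance methods.
Array.prepend(Module.new do
  def to_a
    $calls << :"Array#to_a"
    super
  end

  def to_ary
    $calls << :"Array#to_ary"
    super
  end
end)

# Kernel#Array is a private instance method; hook it via Object's ancestors.
Object.prepend(Module.new do
  private def Array(arg)
    $calls << :"Kernel#Array"
    super
  end
end)

$calls.clear
literal = [1, 2, 3]
p $calls  # => [] (the literal touched none of the hooks)

Array(nil)
Array[1, 2]
literal.to_a
p $calls
```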
This result, while not at all what we were seeking or expecting, is very interesting. We have finally found a method being invoked while an array is instantiated from the literal syntax, but the array involved contains all the characters from our line of input. Very strange indeed.
Desperate times call for desperate measures. Checking calls to specific methods is all well and good, but it does not give a holistic picture of what is actually happening. Perhaps we should observe every single method call on every class in the entire ObjectSpace, that is, on every instance of the Class class. Surely that will reveal something about array instantiation.
A not particularly surprising result. If we, however, limit it to the classes of live objects in the session, we do get slightly better results.
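The two candidate sets described above can be gathered with ObjectSpace.each_object. This sketch only enumerates them and leaves out the step of wrapping every method of every class. It assumes MRI (CRuby); some other Ruby implementations restrict ObjectSpace.each_object.

```ruby
# Every Class object the VM currently knows about: core, stdlib, anonymous.
all_classes = ObjectSpace.each_object(Class).to_a

# The narrower set: classes that have at least one live instance right now.
live_classes = ObjectSpace.each_object(Object).map(&:class).uniq

p all_classes.size
p live_classes.size
```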
This is mostly garbage, but there are some interesting snippets amid all the noise. Maybe we can abuse something in RubyToken to intercept (and own) the normal array instantiation process. Unfortunately, this has all been a red herring. The RubyToken module is part of irb, which, as we conveniently neglected to mention, is what we have been using to test the various examples. Trying the same hacks from within a Ruby script yields no useful output, and pry yields different output again. The one glimmer of hope was naught but the interactive shell processing our commands.
There are a couple more things we can try, but few avenues remain. We can look at what Ruby does with our input string using Ripper, but that is of no real value in this case.
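For the curious, here is roughly what that looks like. Fittingly, Ripper.sexp returns its parse tree as nested Ruby arrays, that is, as s-expressions. Comparing the literal with an explicit Array.new(3) suggests why no hook fires: the parser records an :array node for the literal, but an explicit method call for the constructor.

```ruby
require "ripper"

# Ripper.sexp returns the parser's S-expression for a snippet of source.
literal_tree = Ripper.sexp("[1, 2, 3]")
call_tree    = Ripper.sexp("Array.new(3)")

pp literal_tree  # the literal parses to an :array node
pp call_tree     # the constructor parses to a method call (:method_add_arg)
```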
A more insightful approach could be to use Kernel#set_trace_func to see exactly what happens internally during array instantiation.
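A sketch of such a trace, run as a plain Ruby script rather than in irb. The events array is our own addition; set_trace_func hands the proc the event name, file, line, method id, binding and class. The method still exists but is documented as obsolete, with TracePoint as its modern replacement.

```ruby
# Collect every event the VM reports while the literal is evaluated.
events = []
set_trace_func proc { |event, _file, line, id, _binding, klass| events << [event, line, id, klass] }

literal = [1, 2, 3]
set_trace_func(nil)

events.each { |e| p e }
```

The literal's line should contribute a single "line" event and no "c-call" events; the other entries belong to the set_trace_func calls around it.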
Here we see only a single line event, on line 5 where the array is instantiated, and no method calls at all. This is the final dead end.
With no other avenues to explore, we must concede defeat. At some point, Ruby really must be just magic. Realistically, of course, what appears to be happening is quite the opposite. We have found a situation where Ruby is not as flexible as we would like it to be (N.B. this is not pejorative). In the end, our little edifice is built atop C, and we cannot always manipulate all aspects of our Ruby code in ways that are stupid or downright dangerous. Fortunately, this exploration was not entirely in vain, for we learned about a few interesting features of Ruby and the point at which you can no longer bend the language.
26 Feb 2015