Cool Stuff I Learned on the Job

During the last week, I had the opportunity (after 5 years of Java) to return to writing some C code – yep, you read that right – not even C++, but straight C. I’m sure glad we have C99 and C11 now because I don’t know what I’d do if I couldn’t even declare variables within a for statement.

During this time, I realized that there are some really cool things I learned while working my first engineering job years ago at Novell. I was part of the Novell Directory Services team (later renamed eDirectory). I made a lot of friends in that job and a few of them taught me some really great techniques.

For example, I needed to use a singly linked list and (C being what it is, and all) there just aren’t any library routines for it, so you have to write it yourself. Yes, yes, there are myriad libraries out there at places like GitHub that supply all the code you could want – I’m talking about STANDARD library routines. About the most complex function the C standard library gives you is qsort. After working in the Java world for the last 7 years, I find it hard to go back to a language where you have to write your own data structures.

I needed a way to traverse the list and remove an element matching some search criteria. Now, this is an old problem – people have solved it already, right? So I headed over to my favorite best-solution site, StackOverflow. If you need to know how the majority of folks do stuff, that’s where you go. I found… nothing… well, nothing elegant anyway. There were any number of solutions involving multiple if statements checking various boundary conditions to handle everything perfectly. Then I remembered what my old mentor Dale Olds taught me about this topic – or, rather, I remembered MOST of it. It took me the better part of an hour to remember in enough detail to actually reimplement it.

This blog post will be a running post, updated as I remember little techniques like this. Hopefully, by the time I’m done in a few years, I’ll have a nice library to rely on.

Remove an Element from a Singly Linked List

Let’s consider what we have in a singly linked list. Well, we have a head pointer and nodes with a next pointer. Perhaps a tail pointer, but only if you’re using it like a FIFO queue or something, where you can append to the tail and remove from the head. Let’s just stick to the basics – a head pointer and a next pointer in each node. To traverse from the head to a node matching some search criteria and remove that node, you must keep track of your current node, and the previous node. Or do you? The actual fact is, you only need to keep track of the current node and the previous node’s next pointer and herein lies the elegance. Here’s how we did it back in the day:

static struct node_t *head;

struct node_t *remove_match(void)
{
    struct node_t **pprev, *cur;
    for (pprev = &head, cur = *pprev; cur; pprev = &cur->next, cur = *pprev)
        if (is_match(cur)) {
            *pprev = cur->next; /* unlink cur */
            return cur;
        }
    return NULL;
}

The really elegant thing about this technique is that head pointer maintenance is automatic. There’s no need to do anything special when the matching node is the one pointed to by head, because you’re only ever dealing with the address of the previous node’s next pointer – and head looks just like one of those when you’re dealing with its address rather than its value. You can treat it just like you would any other next pointer. Cool, huh?

Heads or Tails

This technique loses a bit of its elegance when you decide to maintain a tail pointer. Now you really do need to keep track of the address of the previous node; if the node you decide to remove is the current tail then you need to repoint the tail at the previous node. In short, the address of the previous node’s next pointer is no longer sufficient. Here’s a possible implementation:

static struct node_t *head, *tail;

struct node_t *remove_match(void)
{
    struct node_t **pprev, *prev, *cur;
    for (pprev = &head, prev = NULL, cur = *pprev; cur;
            pprev = &cur->next, prev = cur, cur = *pprev)
        if (is_match(cur)) {
            if (cur == tail)
                tail = prev;
            *pprev = cur->next; /* unlink cur */
            return cur;
        }
    return NULL;
}

In this case, we’re tracking all the same information, plus a pointer to the previous node (which is NULL when the node being removed is the first one).


Why Won’t my mp4 Play on Windows?!

If you’ve never asked this question before then you’ve never tried to convert a captured .avi file to a .mp4 and play it in Windows Media Player – it’s that simple.

I have some home videos I shot years ago using a Hitachi VM-E635LA 8mm camcorder. It was a nice camera in its day but its day is long past. I also have a Canopus ADVC 1394 720p capture card installed in my media center PC. It took me the better part of a Saturday googling and messing around to figure out how to get VirtualDub to capture raw video files from the camcorder without audio sync/speed problems in the resulting AVI (Audio/Video Interleave) file.

Sidebar: Capture Setup

  1. Open VirtualDub and select File | Capture AVI… from the menu.
  2. Select File | Set capture file… (F2) and give the output file a name.
  3. Select the device in the Device menu – the Canopus card uses a standard Windows capture card driver that shows up in this list – and in Device Manager – as “AVC Compliant DV Tape Recorder/Player (DirectShow)”.
  4. Select Video | Overlay from the menu (Preview doesn’t work with this driver).
  5. Select Audio | Enable audio capture and ensure the audio device is properly selected at the bottom of that menu – I used the capture card’s audio channel, so I selected “Capture device” (presumably meaning the video capture device).
  6. Select Capture | Timing… and here’s the trick to getting the audio to remain in sync with the video and not speed up and slow down as the video progresses – uncheck “Drop frames when captured frames are too close together” and “Insert null frames when captured frames are too far apart”. Also select “Do not resync between audio and video streams” in this dialog box.
  7. Press F6 to start the capture, while simultaneously pressing the play button on the camcorder. Let it run till the show is over. Then press ESC to exit the capture and finalize the output file.

See, here’s the thing – the camcorder is basically a VCR (when running in playback mode). VCRs are notorious for having (sometimes wildly) varying frame rates during playback. Hence, VirtualDub will drop frames and insert null frames – lots of them. Unfortunately, when it drops frames, it tries to speed up the audio to keep it in sync with the remaining frames, so you’re watching your captured video and suddenly your wife’s and children’s voices go up or down an octave.

On the other hand, if you don’t drop or insert frames and you don’t attempt to resync the audio to the video, the camcorder is pretty darn good at sending across exactly the sound that’s supposed to go with each frame of video captured. The moral? Don’t mess – just let it all happen at its own speed and it works just fine.

Why make this a sidebar? Because chances are good these instructions won’t work for you. You’d need a similar capture card, and Canopus is not even a company anymore (though you can get some of their higher-end products from Grass Valley – a subsidiary of Belden – who bought Canopus back in ’05). Popular USB capture devices today can be had on eBay for 5 bucks or less and are known by a variety of names such as “EasyCAP”. The configuration for using these devices is significantly different from using a PCIe capture card. You can still get PCIe capture cards but they target high-end broadcast media and are kinda pricey.

End of Sidebar.

Now, where was I – oh yeah – we were talking about why you can’t convert these AVI files to mpeg4 and play them on Windows Media Player (WMP). Now, don’t get me wrong – I don’t really think WMP is the ultimate home theater experience. But when my mother wants to watch my newly digitized home videos, I don’t want her to have to download VideoLAN or some other third-party player based on ffmpeg. I want her to be able to double-click the file and have it play, and on Windows, that’s WMP. (The same problem occurs on Mac OS, by the way, so wipe that smirk off your faces, QuickTime users.)

It’s not really strange to me that you might stumble across a combination of settings in ffmpeg that don’t work on WMP or QuickTime – there are about a bazillion settings – and that’s just the settings for ffmpeg’s h.264 plugin library. What’s really strange is that no one else seems to have these issues. Which is why I’m writing this post today.

I did eventually find a StackExchange (SuperUser) article that solved the issue for me. It just took a couple of hours of sifting through answers like, “WMP can’t play mp4’s fool – get a real media player like VLAN.” Well, that was news to me – according to Microsoft, WMP version 12 should be able to play them – and version 12 is available for Windows 7 and up. Everyone with automatic updates enabled probably has it.

WMP can play mp4 files – the problem, as it happens, is that Microsoft’s definition of mp4 is a lot stricter than libx264’s definition. In the ffmpeg world, there are myriad discussions about “legacy” and “outdated” players. Turns out this world considers the latest versions of WMP and QuickTime to be outdated players (and in fairness to Apple, QuickTime 9 is officially in end-of-life status). All this really means is that you have to use a few settings to ensure the video stream generated by ffmpeg is compatible with WMP.

Here’s my command line for using ffmpeg to convert a captured AVI file to an mp4 file that WMP and QuickTime can play:

c:\> ffmpeg -i captured.avi -vf yadif -c:v libx264 -crf 23 ^
 -pix_fmt yuv420p -preset medium -c:a aac -b:a 128k -ac 1 output.mp4


-i <input-file>    - input file
-vf yadif          - "yet another deinterlace filter"
-c:v libx264       - the video codec to use
-crf 23            - constant rate factor - 23 is the default
-pix_fmt yuv420p   - pixel format
-preset medium     - compression speed
-c:a aac           - audio codec to use
-b:a 128k          - audio bitrate
-ac 1              - audio channels

Many of these are self-explanatory, but there are a few odd ones, and at least one that makes all the difference with our favorite “legacy” players. Some of the more esoteric ones include the de-interlace filter. Let’s face it – any video captured from a VCR or camcorder is going to be interlaced. It’s so obvious you can literally see it in the content while it’s playing – and it’s annoying.

The other one that you might find strange is -ac 1. Why am I only allowing one audio channel? Because my camcorder only has one audio channel – even if you patch it into both left and right inputs on the capture card, all you’re getting is expensive mono. By telling ffmpeg to encode mono, you get the exact same effect in half the space.

The magic h.264 argument is -pix_fmt yuv420p. Most 8mm camcorders recorded data using a pixel format of yuv411 or 422 because the data fits better on the tape that way. After all, it was the camcorder designers that invented these formats. ffmpeg likes to keep things the same unless you tell it otherwise. And it seems WMP believes that any mp4 file containing video with a pixel format other than yuv420p is not worth playing. Strange attitude coming from a forward-thinking company like Microsoft… (I know that sounds totally facetious – and it was intended to, but… I’m kinda half serious, too.)

Maven code generation – ftw!

Have you ever wanted to generate some of the code you use in a project from a data file?

Recently, I’ve been working on a class that provides configuration options to the application. The options come in from a file, are hard-coded, or simply fall back to defaults, but it’s nice to be able to access configuration options from a single class with type-safe ‘getters’ for each option. The trouble starts when the set becomes larger than, say, a dozen or so options. At that point the class starts to become a maintenance nightmare. Such classes usually access the property name and type a dozen times or more within the class definition. This is the perfect chance to try out template-based code generation.


My first thought was to look for a maven plugin that provided such template-based, data-driven code generation. Hmmm – there is the replacer plugin, but the data source is the maven pom file. Well, ok, but I had in mind a more formal definition of the dictionary. I didn’t really want to bury my properties in some obscure pom file in the middle of a multi-module project. In fact, it would be nice if there existed some template-based file generator that would accept data from multiple sources, and even from multiple types of sources – xml, json, csv, whatever.

I’ve had some limited experience using Apache Velocity for this purpose in a past life. The trouble is my experience with it really was limited. I wasn’t the author of the system – just the consumer – all the work of integrating the Velocity engine into the maven build life cycle had already been done by someone else and I never bothered to see how they did it. A quick discussion with a friend who still worked there enlightened me to the fact that it took a LOT of custom code to get it working right.

Now it seems to me that this is not an uncommon thing to want to do – someone must have worked out all the kinks by now in a nicely integrated solution, right?


I looked at several code generation packages and finally landed on Apache FreeMarker (the Apache foundation seems to be the final resting place for a lot of projects – many of which completely overlap in functionality – once their originators tire of them). FreeMarker sports a powerful data-driven template language so it was a perfect fit for my needs, but – honestly – the main reason I went with it is because of the fmpp (FreeMarker pre-processor) project, which is a command-line front-end for the FreeMarker template engine. While I could have used the FreeMarker engine the same way – it’s also a jar – the engine requires data to be fed to it through its API, and the maven-exec-plugin is just not that sophisticated. No, I needed a command-line tool that would allow me to specify a data file as a command-line argument to the jar.


The fmpp command-line is also very powerful and functionally complete. Additionally, it’s written in java and, thus, comes packaged in a jar file which can be executed by the maven-exec-plugin‘s “java” goal:
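A sketch of what such a plugin definition might look like – the plugin and artifact coordinates are the real ones, but the version numbers, the fmpp configuration file path, and the directory arguments here are illustrative, so check them against your own project:

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <version>1.6.0</version>
  <executions>
    <execution>
      <!-- run during source generation, before compilation -->
      <phase>generate-sources</phase>
      <goals><goal>java</goal></goals>
    </execution>
  </executions>
  <configuration>
    <!-- without this, the plugin's dependencies below are NOT on the
         class path when the java goal runs -->
    <includePluginDependencies>true</includePluginDependencies>
    <mainClass>fmpp.tools.CommandLine</mainClass>
    <arguments>
      <argument>-C</argument><argument>src/main/templates/config.fmpp</argument>
      <argument>-S</argument><argument>src/main/templates</argument>
      <argument>-O</argument><argument>${project.build.directory}/generated-sources/fmpp</argument>
    </arguments>
  </configuration>
  <dependencies>
    <dependency>
      <groupId>net.sourceforge.fmpp</groupId>
      <artifactId>fmpp</artifactId>
      <version>0.9.16</version>
    </dependency>
    <!-- override the FreeMarker version fmpp pulls in -->
    <dependency>
      <groupId>org.freemarker</groupId>
      <artifactId>freemarker</artifactId>
      <version>2.3.31</version>
    </dependency>
  </dependencies>
</plugin>
```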


Important aspects of this snippet:

  1. I’m sure it’s obvious to every other maven user out there, but it’s never been obvious to me how you get the exec plugin to run as part of your build. There appear to be several ways of doing this, but the most succinct, and simplest, in my opinion, is to specify a life cycle phase in an execution. In my case, I wanted to generate source code to be used in the build process, so the “generate-sources” phase seemed appropriate.
  2. Being able to add a “dependencies” section to a plugin made sense to me. What did not make sense was having to explicitly tell the plugin to add those dependencies to the plugin’s class path. I mean – what would be the point of adding dependencies to the exec plugin unless you wanted those dependencies to be accessible by the plugin when you told it to run something?! Well, this is apparently not obvious to the authors of the plugin, because unless you set the “includePluginDependencies” tag to true, your attempts to execute any java code in the fmpp package will be in vain.
  3. Finally, I needed to upgrade the Freemarker engine to a newer version than the one that last shipped with fmpp. Adding that as a separate dependency allowed my specified version to override the one defined in the fmpp pom.

The fmpp command-line is driven by a combination of command-line arguments and a configuration file. You may use all command-line arguments, or all configuration file options, or any mix of the two. I chose to use a mix. My configuration file is minimal, just specifying the data source and where to find it, as well as what to do with the file names during translation:

removeExtensions: ftl
dataRoot: .
data: {
  properties: json(data.json, UTF-8)
}

The first line tells fmpp to remove the ftl extension from the templates as it generates source files. A template file might be named Configuration.java.ftl, while the corresponding generated source file would be Configuration.java.

The second line says where to look for the data file, relative to the configuration file (they’re in the same directory).

The third option – “data:” – tells fmpp where to look for data sources. I have one data source, a json file named data.json, and it uses UTF-8 text encoding. The term “properties” indicates the name of the root-level hash that will store the data in the json file. Here’s an example of some json data:

[
    {
        "type": "Integer",
        "section": "someSection",
        "name": "someProperty",
        "defaultValue": "1"
    }
]

With this data set, you can generate a java class from a template like this:

package com.example;

import ...

/**
 * NOTE: Generated code !! DO NOT EDIT !!
 * Reference:
 * See template: ${pp.sourceFile}
 */
public class Configuration implements Cloneable {
<#list properties as property>
    private final ${property.type} ${};
</#list>
}

The possibilities are endless, really. The Freemarker template language is very powerful, allowing you to transform the input data in any way you can think of.


I use an IDE – I love my IDE – it does half my work for me. I remember a math teacher from middle school who was upset at the thought of tools that might cause you to forget some of the basics. But, let’s face it, software developers have so much to remember these days, it’s nice when a tool can take some of the load off of us.

One thing we can agree on about maven is that directory structure matters. Maven is all about using reasonable defaults for the 10,000 configuration options in a build, allowing you to override the few that need to change for your special needs. It is, therefore, important to choose wisely where template files go and where generated sources are placed.

Referring back up to the pom file plugin definition, note that the fmpp “-O” command line option specifies where to place generated files. And maven has a place for them – in the target/generated-sources directory. IntelliJ will recognize these files as sources and index them properly. Additionally, since IntelliJ recognizes the generated-sources directory as the proper maven location for generated sources, it will also warn you when you try to edit one of these files.

The fmpp “-S” option specifies where the source templates can be found. These I placed in the src/main/templates directory. Now, the templates subdirectory of src/main is not a standard maven location but, not being able to find a better location, I figured it made sense to place them in a package directory structure parallel to the src/main/java directory. If you know of a better location for such templates, please feel free to comment.

The vSphere Web Client and Legacy “Script” Plugins

If you develop and maintain a vCenter server plug-in that extends the vSphere client to support management of your company’s ESX-based products then you have an uphill battle in front of you. I’ve (sadly) come to the conclusion that the only thing that keeps VMware in business today is market share. It’s certainly not great developer documentation and support.

To be clear, they have lots of developer documentation; unfortunately, much of it is less than truthful and it’s often too sparse to be useful. Nevertheless, most of us stumble along managing to make our plug-ins work, but never being really sure we’re doing exactly the right thing in any of the required steps.

VMware is soon to release ESXi 5.5 and surrounding vSphere infrastructure. This new version of vSphere deprecates the Windows thick client – prematurely, in my mind. I know why they’re doing it: They’ve taken a lot of abuse in the industry during the last 5 years about being too Windows-centric with respect to vSphere management tools.

They’ve needed to move to a web-based architecture. To this end, some time ago they began working on the vSphere Web Client – a management client much like the vSphere Windows thick client except, of course, that it’s rendered in a browser and is, thus, fairly portable. Unfortunately, the front-end framework they chose was Adobe Flash, and they jumped in with both feet before they realized that Flash was on its way out the door in favor of HTML5.

VMware specifies two types of “legacy” plug-ins – those designed to be used with the vSphere Windows client:

  1. C# client-side plug-ins. These actually run within the process context of the Windows client itself. They can be deployed by a server, so they’re available for download and installation at the click of a button within the client but they are written in C# and run within the client’s process address space on your Windows work station.
  2. So-called “script” plug-ins, where the term “script” refers to html or xml script, I suppose – it’s a terrible name. What it actually means is that you have a web server running in a VM somewhere providing content to be displayed in an embedded browser window within the client.

This browser embedding mentioned in 2 above is done solely by virtue of the fact that Microsoft Internet Explorer is mostly COM objects wrapped in a simple GUI application. These COM objects may be used by any process, and are documented by Microsoft so they may also be used within third-party applications – such as the vSphere Windows client. (In fact, the specific COM interface used is the one providing Internet Explorer 7 functionality, so to test your plug-in within a normal browser, you should probably enable IE 7 compatibility mode.)

Not all script plug-ins provide a GUI interface, but all of the extension points supported within the client operate on the principle of sending a URL off to a server somewhere and either rendering reply data, or expecting the receiving server to perform some remote operation based on the URL content (path, query string, fragments, etc.).

With the advent of the Web Client, a new type of plug-in is now supported (nay, expected!) by VMware. Essentially, these plug-ins are a combination of a back-end web service that contains business logic and a Flash application that runs within a Web Client Flash frame in the browser.

The Web Client supports the newer Flash-based plug-ins, of course, but it also supports legacy script plug-ins. Given that most people don’t have the time or inclination to rewrite yet another pre-destined-for-doom version of their plug-in, this article focuses on getting your existing legacy script plug-in to work in the Web Client.

VMware Documentation

Read the following VMware documentation pages on getting your legacy script plug-in to work in the Web Client:

Also read the three sub pages beneath this page: Enabling Script Plug-in Support…, Where Script Plug-in Extensions Appear…, and Known Issues…. When you’ve done this, come back and we’ll focus on explaining and debunking the content of these pages. (Chances are pretty good that if you’re here, you’ve already read these pages and are now looking for the real documentation.)

Make note of the extension point differences in the Where Script Plug-in Extensions Appear… page. Assure yourself that your plug-in will provide mission critical functionality in the Web Client before you go any further. Some of the legacy extension points are simply not supported and there’s nothing to be done about this except find a different way to provide your functionality.

Enabling Script Plug-in Support in the Web Client

The first thing to note is that script plug-in support is disabled by default in the Web Client. Explain to your customers that they’ll have to modify a configuration file on the server on which their web client service is running – usually the same as their vCenter server. (Don’t concern yourself overly with this requirement – various groups within VMware provide only legacy script plug-ins and themselves recommend enabling script plug-ins in the Web Client.) The documentation is accurate here – simply have them add the following line to their file, as documented:

  scriptPlugin.enabled = true

If your plug-in only supports insecure (http) traffic, you’ll also need to have your customers add the allowHttp = true line to the properties file. If you prefer secure plug-in traffic (and you probably should) then you won’t need this step, but you will want to read on, because the VMware documentation gets a bit murky from here on.

The second paragraph on the Enabling Script Plug-in Support… page is one of the fuzzy places. It indicates that you “must add the SHA1 thumbprint of the server where the scriptConfig.xml file is located.” Huh? What server? What’s a scriptConfig.xml file? The SHA1 thumbprint of what… a file… a certificate… a server… some data somewhere? Is a thumbprint the same as a fingerprint? Lots of questions, no answers…

At this point we opened a support ticket with VMware and were told to add the thumbprint string to the registration document you pass to the ExtensionManager when you register your plug-in with vCenter. A quick look in the Managed Object Browser (mob) showed that there is indeed a serverThumbprint field in the server sub-record of the registration document. It didn’t take too much effort to figure out where you assigned that value in the vim web services SDK. The problem still remained: What exactly is a server thumbprint?

I surmised from past experience that perhaps the thumbprint was the ssh server’s rsa fingerprint – the series of hexadecimal octets that shows up when you first attempt to ssh into a linux system – in this case the plug-in server. I queried and was told that was exactly correct. This bit of misinformation kept us busy for a week. A particularly sharp member of my team then considered other types of SHA1 fingerprints and discovered that if you use the following command line:

     openssl x509 -fingerprint -noout -in certificate-pem-file

you can get a SHA1 fingerprint value from any PEM certificate file. In this case, the certificate that makes the most sense is the one presented by your plug-in’s web service. And it turns out this is the value you must use. (See this great stackoverflow article for openssl C/C++ code that queries a certificate for its SHA1 fingerprint.)

Of course, this means that you will have to reregister your plug-in extension with the ExtensionManager any time you change your plug-in web service’s server certificate, but this doesn’t normally happen often.

Final Thoughts

On the Known Issues… page most of the content is clear, but note the second bullet point:

  • When using a script plug-in at a secure URL (HTTPS) in the Chrome or Firefox browsers, you must load the script plug-in page in an external tab at least once before it appears inside the vSphere Web Client.

The reason for this is that the Web Client doesn’t have very mature support for server certificates that are not signed by a well-known CA, or where the corporate CA hasn’t been imported into the browser’s trust store. In the Windows thick client, when you access a plug-in with a self-signed certificate, you see popups that warn you and ask if you’d like to continue. The Web Client simply fails to display the plug-in content if the plug-in server’s certificate is not signed by a CA in the browser’s trust store.

A quick work-around is to open another tab in the browser (or another instance of the same browser) and navigate to your plug-in server’s main page and accept the certificate using the browser’s built-in certificate management mechanism. Once this is done, the Web Client will accept the plug-in’s certificate because the browser has already accepted it.

Armikrog and The Neverhood

Several years ago I walked into my manager’s office around Christmas time and saw a PC game box lying on his desk. The cover was intriguing so I asked him about it. He lit up and asked me how I’d never heard of “The Neverhood”. That’s the way it is with some well-kept secrets. They were never intended to be secrets – they just weren’t advertised properly or widely enough, or something else that I’ve never been able to put my finger on. How could so few people know about this?

The Neverhood is a claymation game that came out in the ’90s. It’s humorous and mysterious, and the story line is well written and easy and fun to follow. I immediately ordered a copy from Amazon and put it under the tree when it came. My kids and I had more fun that Christmas than we’d had in a while – or since.

If you can find a copy and if you still have a system it will run on, I’d highly recommend you pick it up. (It was written for Windows XP, but I’ve tried it on Windows 7 and it seems to work – in itself, a testament to the quality of the software.) It’s why people my age bought video games in the first place – none of this stuff they seem to focus on today, where you get a basic shell for a game when you buy the box, and then find yourself hooked into periodic requests for your credit card number to purchase features as the game moves along, or monthly subscriptions to an online fantasy world shared by thousands of others. (Where’s the game in that? It’s not a game anymore, but a social media experience.) The Neverhood wasn’t particularly cheap – 65 bucks – but it was worth every penny.

Now, years later, the passionate folks at Pencil Test Studios have gotten together to create another claymation game in the spirit of The Neverhood – Armikrog – and they’re taking an interesting approach to funding the project. They’ve started a Kickstarter campaign for it and, unfortunately, they’ve stalled out a bit in reaching their funding goal. They’re asking for 900 thousand dollars. In the first three days, they made it nearly to the half-way point. It would be really sad if Armikrog – which, by all the tidbits they’ve dropped on the Kickstarter site, looks to be just as fun and amazing as The Neverhood was – didn’t make it out the door for the same reason The Neverhood wasn’t very popular: poor advertising.

I’ve donated, myself, but I’ve decided I can do more to help the project along, so I’ve written this quick note about it. Please take a minute if you’re a gamer, and donate if you think you might enjoy it. If you like this sort of puzzle/story line gaming experience, I can just about guarantee you’ll love Armikrog.

Hardware Tools: A PCI Odyssey

Prologue: Please forgive me for the “sales pitch” tone of this entry. It’s the nature of the items I’m describing.

In this industry, we hear a lot about software tools – compilers, syntax checkers, IDEs and editors, verification utilities, linkers, pre-processors, you name it. But the realm of hardware tools is often enshrouded in mysterious clouds of black magic.

About 6 months ago I changed jobs; I moved from a software company (Novell) to an up-and-coming hardware company in Salt Lake City. I still write software, but now I write driver and utility code designed to support our hardware devices. Recently, I found myself wishing for a way to add, swap, and remove PCI cards from my desktop Linux machine without having to power down the system.

I’m always updating the firmware in my test unit, so it would also be nice to reboot a PCI card without rebooting the entire system. Linux is a pretty fast boot, but it’s annoying to have to do it several times a day, even when the process is reasonably quick.

Finally, it would be nice to be able to plug a PCI device into my laptop so I can more effectively work at home. When I first started with my current employer, I didn’t think these goals were achievable, but I’ve since gained some insight indicating that my original assumptions were born of ignorance.

I had occasion to chat with one of our test lab staff members a couple of months ago for the first time. While there, I watched him plug my company test device into a bare card with a PCI slot sitting on his desk. A glance at the back side of this card revealed a cable that went into the back of his computer on the floor. The card was also powered by an external ATX power supply which he switched off before connecting or disconnecting the PCI device.

Cool! A way to hot-swap cards without rebooting. However, my hopes were dashed when he told me that the setup cost nearly 1000 dollars. Despite being the new kid in town, I probably had enough clout to ask for such a setup at my desk, but I just couldn’t justify it emotionally. My lab friend uses it all the time. In fact he swaps cards so often that he uses a disposable “slot saver” – a sort of cheap “extension cord” for his PCI slot to keep the base slot from wearing out.

Well, that was that – or was it? I opened up my browser and began to google for PCI external, extender, adapter, tools – whatever I could think of. No luck. I couldn’t even find the company that makes the tools we already use in our labs. It’s interesting how little advertising hardware companies do – I believe they probably rely on word of mouth more than anything else. Apparently that works for them, but it wasn’t working for me.

Then another friend sent me a link to a company in Shanghai called Shanghai BPlus Electronics Technology Ltd. BPlus (a bad name from an American culture perspective – why not APlus?) sells something similar to the card I saw in our lab, but it costs less than 100 dollars and it’s more functional.

They have 3 distributors: one in Canada, one in Japan, and one in Taiwan – the one I found. (Don’t omit the “www” unless you can read Chinese.) BPlus’s marketing literature indicates they have sales contracts with Intel, HP, DELL, and PLX, so they’re not likely going under anytime soon.

The configuration I bought was a PCI passive adapter – the PE4H. BPlus purposely designs modular components so they can be used in a mix-and-match fashion. This saves money and increases the usefulness of the devices they make.

The PE4H is sold in several configurations, one of which includes an ExpressCard adapter for your laptop. I have a Lenovo W510, which has an ExpressCard slot built into it. This option comes with the passive adapter – a card with a smooth bottom that sits flat on your desktop, an ExpressCard adapter, and a mini HDMI cable that connects the two parts. It also comes with an external 5V/12V power supply cable and a SWEX adapter – an ATX power supply power-on switch.

This is everything you need to connect a PCI card to your laptop. Here’s the best part: It’s 85 dollars US, plus about 20 dollars shipping from Taiwan – probably cheaper if you can figure out who the Canadian distributor is.

The ExpressCard interface only supports a single PCI link lane, so if your device requires more lanes then this configuration won’t work for you. However, if all you want is an external PCI adapter with hot-swap capability, then the PE4H will still work for you. You’ll just need to buy a different option. In this case, purchase the option with the PCIe 1X passive adapter, HP4A (115USD + shipping). This card has 4 mini HDMI connectors that allow you to connect up to 4 link lanes between your PC’s PCI bus and the PE4H card on your desktop. The cables that come with it are only 30cm in length, so I’d also buy 4 100cm cables (10USD each).

I mentioned above that the device can be powered using an external ATX power supply, but it can also be powered using a simple 15-20 VDC laptop adapter. In fact, its power requirements are very flexible: it can handle anything between 15 and 20 volts, AC or DC, at either polarity, as long as the supply provides at least 3 amps. I bought a 15 VDC, 6A power brick on eBay for 10 bucks. I had to replace the “Q” class coaxial power plug with an “N” class power plug I bought at Radio Shack. If you can find a power brick that supplies exactly 3 amps, you might find it already comes with the smaller “N” class plug.

You can purchase one of the option packages described above, but you can also purchase each component individually. I purchased the ExpressCard option package, and then the HP4A (and long cables) separately. Total bill: 215 dollars, and the frosting on the cake is that I can connect to my laptop also. Now that’s a price anyone’s manager would be okay with!

Managing a Dynamic Java Trust Store

This is the latest (and probably last) in my series of client-side Java key and trust store management articles, and a good summary article for the topic, I hope.

It’s clear from the design of SSLContext in the JSSE that Java key and trust stores are meant to contain static data. Yet browsers regularly display the standard security warning dialog when connecting to sites whose certificates have expired or whose administrators haven’t bothered to purchase a CA-signed certificate. This dialog generally offers you three choices:

  • Get me out of here!
  • I understand the risks: add certificate for this session only
  • I understand the risks: add certificate permanently

In this article, I’d like to elaborate on what it means to “add certificate” – either temporarily or permanently.

Let’s start with a simple Java http(s) client:

public byte[] getContentBytes(URI uri, SSLContext ctx)
    throws Exception {
  URL url = uri.toURL();
  URLConnection conn = url.openConnection();
  if (conn instanceof HttpsURLConnection && ctx != null) {
    ((HttpsURLConnection) conn)
        .setSSLSocketFactory(ctx.getSocketFactory());
  }
  InputStream is = conn.getInputStream();
  int bytesRead, bufsz = Math.max(is.available(), 4096);
  ByteArrayOutputStream os = new ByteArrayOutputStream(bufsz);
  byte[] buffer = new byte[bufsz];
  while ((bytesRead = > 0)
    os.write(buffer, 0, bytesRead);
  byte[] content = os.toByteArray();
  os.close(); is.close();
  return content;
}

This client opens a URLConnection, reads the input stream into a byte buffer, and then closes the connection. If the connection is https – that is, an instance of HttpsURLConnection – it applies the SocketFactory from the supplied SSLContext.

NOTE: I’m purposely ignoring exception management in this article to keep it short.

This code is simple and concise, but clearly there’s no way to affect what happens during application of the SSL certificates and keys at this level of the code. Certificate and key management is handled by the SSLContext so if we want to modify the behavior of the SocketFactory relative to key management, we’re going to have to do something with SSLContext before we pass it to the client. The simplest way to get an SSLContext is to call SSLContext.getDefault in this manner:

byte[] bytes = getContentBytes(uri, SSLContext.getDefault());

The default SSLContext is fairly limited in functionality. It uses either default key and trust store files (and passwords!) or else ones specified in system properties – often via the java command line in this manner:

$ java \ \ \ \ \ ...

In reality, there is no default keystore, which is fine for normal situations, as most websites don’t require X.509 client authentication (more commonly referred to as mutual auth). The default trust store is $JAVA_HOME/jre/lib/security/cacerts, and the default trust store password is changeit. The cacerts file contains several dozen certificate authority (CA) root certificates and will validate any server whose public key certificate is signed by one of these CAs.
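A quick way to confirm where the default trust store lives is to resolve it against the java.home system property. This is just a sketch: newer JDKs dropped the jre subdirectory, so it checks both layouts.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DefaultTrustStore {
  public static void main(String[] args) {
    // Older JREs keep cacerts under jre/lib/security; newer JDKs
    // dropped the jre directory. Check both relative to java.home.
    Path home = Paths.get(System.getProperty("java.home"));
    Path cacerts = Files.exists(home.resolve("jre/lib/security/cacerts"))
        ? home.resolve("jre/lib/security/cacerts")
        : home.resolve("lib/security/cacerts");
    System.out.println(Files.exists(cacerts));
  }
}
```

On a standard JDK installation this prints true, and the resolved path is the file that SSLContext.getDefault validates servers against.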

More importantly, however, the default SSLContext simply fails to connect to a server in the event that a trust certificate is missing from the default trust store. But that’s not what web browsers do. Instead, they display the aforementioned dialog presenting the user with options to handle the situation in the manner that suits him or her best.

Assume the simple client above is a part of a larger application that adds certificates to the trust store during execution of other code paths and then expects to be able to use this updated trust store later during the same session. This dynamic reload functionality requires some SSLContext customization.

Let’s explore. SSLContext is a great example of a composite design. It’s built from several other classes, each of which may be specified by the user when initializing a context object. This practically eliminates the need to sub-class SSLContext in order to define custom behavior. The default context is eschewed in favor of a user-initialized instance of SSLContext like this:

public SSLContext getSSLContext(String tspath) 
    throws Exception {
  TrustManager[] trustManagers = new TrustManager[] { 
    new ReloadableX509TrustManager(tspath) 
  };
  SSLContext sslContext = SSLContext.getInstance("SSL");
  sslContext.init(null, trustManagers, null);
  return sslContext;
}
At the heart of this method is the instantiation of a new ReloadableX509TrustManager. The init method of SSLContext accepts a reference to an array of TrustManager objects. Passing null tells the context to use the default trust manager array, which exhibits the default behavior mentioned above.

The init method also accepts two other parameters, to which I’ve passed null. The first parameter is a KeyManager array and the third is an implementation of SecureRandom. Passing null for any of these three parameters tells SSLContext to use the default. Here’s one implementation of ReloadableX509TrustManager:

class ReloadableX509TrustManager 
    implements X509TrustManager {
  private final String trustStorePath;
  private X509TrustManager trustManager;
  private List<Certificate> tempCertList 
      = new ArrayList<Certificate>();

  public ReloadableX509TrustManager(String tspath)
      throws Exception {
    this.trustStorePath = tspath;
    reloadTrustManager();
  }

  public void checkClientTrusted(X509Certificate[] chain, 
      String authType) throws CertificateException {
    trustManager.checkClientTrusted(chain, authType);
  }

  public void checkServerTrusted(X509Certificate[] chain, 
      String authType) throws CertificateException {
    try {
      trustManager.checkServerTrusted(chain, authType);
    } catch (CertificateException cx) {
      addServerCertAndReload(chain[0], true);
      trustManager.checkServerTrusted(chain, authType);
    }
  }

  public X509Certificate[] getAcceptedIssuers() {
    X509Certificate[] issuers 
        = trustManager.getAcceptedIssuers();
    return issuers;
  }

  private void reloadTrustManager() throws Exception {

    // load keystore from specified cert store (or default)
    KeyStore ts = KeyStore.getInstance(
        KeyStore.getDefaultType());
    InputStream in = new FileInputStream(trustStorePath);
    try { ts.load(in, null); }
    finally { in.close(); }

    // add all temporary certs to KeyStore (ts)
    for (Certificate cert : tempCertList) {
      ts.setCertificateEntry(UUID.randomUUID().toString(), cert);
    }

    // initialize a new TMF with the ts we just loaded
    TrustManagerFactory tmf 
        = TrustManagerFactory.getInstance(
            TrustManagerFactory.getDefaultAlgorithm());
    tmf.init(ts);

    // acquire X509 trust manager from factory
    TrustManager tms[] = tmf.getTrustManagers();
    for (int i = 0; i < tms.length; i++) {
      if (tms[i] instanceof X509TrustManager) {
        trustManager = (X509TrustManager)tms[i];
        return;
      }
    }

    throw new NoSuchAlgorithmException(
        "No X509TrustManager in TrustManagerFactory");
  }

  private void addServerCertAndReload(Certificate cert, 
      boolean permanent) {
    try {
      if (permanent) {
        // import the cert into file trust store
        // Google "java keytool source" or just ...
        Runtime.getRuntime().exec("keytool -importcert ...");
      } else {
        tempCertList.add(cert);
      }
      reloadTrustManager();
    } catch (Exception ex) { /* ... */ }
  }
}

NOTE: Trust stores often have passwords but for validation of credentials the password is not needed because public key certificates are publicly accessible in any key or trust store. If you supply a password, the KeyStore.load method will use it when loading the store but only to validate the integrity of non-public information during the load – never during actual use of public key certificates in the store. Thus, you may always pass null in the second argument to KeyStore.load. If you do so, only public information will be loaded from the store.
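As a quick illustration of the null-password behavior, KeyStore.load also accepts a null stream, which simply initializes a fresh, empty store – no password required in either case:

```java
import java.security.KeyStore;

public class PublicOnlyLoad {
  public static void main(String[] args) throws Exception {
    // Passing null for both stream and password initializes an
    // empty store; passing a real stream with a null password loads
    // only the public contents, skipping the integrity check.
    KeyStore ts = KeyStore.getInstance(KeyStore.getDefaultType());
    ts.load(null, null);
    System.out.println(ts.size());
  }
}
```

This prints 0: the store is empty but fully initialized, ready for setCertificateEntry calls like those in reloadTrustManager above.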

A full implementation of X509TrustManager is difficult and only sparsely documented but, thankfully, not necessary. What makes this implementation simple is that it delegates to the default trust manager. There are two key bits of functionality in this implementation: The first is that it loads a named trust store other than cacerts. If you want to use the default trust store, simply assign $JAVA_HOME/jre/lib/security/cacerts to trustStorePath.

The second bit of functionality is the call to addServerCertAndReload during the exception handler in the checkServerTrusted method. When a certificate presented by a server is not found in the trust manager’s in-memory database, ReloadableX509TrustManager assumes that the trust store has been updated on disk, reloads it, and then redelegates to the internal trust manager.

A more functional implementation might display a dialog box to the user before calling addServerCertAndReload. If the user selects Get me out of here!, the method would simply rethrow the exception instead of calling that routine. If the user selects It’s cool: add permanently, the method would add the certificate to the file-based trust store, reload from disk, and then reissue the delegated request. If the user selects I’ll bite: add temporarily, the certificate would be added to a list of temporary certificates in memory.

The way I’ve implemented the latter case is to add the certificate to a temporary list and then reload from disk. Strictly speaking, reloading from disk isn’t necessary in this case since no changes were made to the disk file but the KeyStore built from the disk image would have to be kept around for reloading into the trust manager (after the new cert was added to it), so some modifications would have to be made to avoid reloading from disk.

This same code could just as well be used in a server-side setting, but then the checkClientTrusted method would be modified rather than the checkServerTrusted method, as was done in this example.

Media Center Disappoints Again

My wife and I like to watch old movies, murder mysteries, and reruns of 80’s sitcoms. Our content of choice pretty much mandates that we watch streaming video; NetFlix is our provider of choice today. We watch on a regular basis between 9 and 10 at night because our kids won’t let us near the TV at any other time. And that’s okay. The point is, by the time we get around to settling down to a good movie, the kids are in bed and we can’t really use the home theater. We have to watch in our bedroom to keep the noise level down.

To facilitate this habit, I’ve strung a VGA cable and a mini-stereo-to-RCA cable from the back of the flat panel TV in our bedroom to a shelf a few feet away where I set the laptop when streaming NetFlix content. This works well, but short of paying for various accessories like an air mouse, I’ve had to jump up and hit the pause button whenever we’ve wanted to discuss a point in the movie – usually a mystery.

Yesterday I upgraded my HTPC (64-bit 2.84 GHz quad processor AMD) from Vista Ultimate to Windows 7 and found the new features of 7MC to be nothing less than wonderful. The user interface enhancements are exactly what you’d expect, given the UI enhancements between Vista and Windows 7. The new TV card management software in 7MC is much better than the Vista version. The new Guide actually represents all of the available channels now (plus a few I didn’t even know about!), as opposed to the previous 30 percent coverage provided by the Vista MC Guide. The Internet TV feature is now out of beta, and seems pretty nice. And lastly, and most importantly, the integrated NetFlix interface is nothing short of cool.

Or so I thought. But Microsoft seems to think that I’ll only ever want to watch streaming content on my PC. I just don’t get that mindset. Who do you know that sits at their desk and watches TV?! I’ll answer for you: no one. Neither NetFlix nor the new Internet TV features are supported on the extender interface. Thus, my only options for watching NetFlix on my TV remain as follows:

  • Connect my HTPC directly to my TV.
  • Continue to use my laptop as above.

As far as connecting my HTPC to my home theatre system goes, well, that was always the ultimate goal, but I’ve grown accustomed to having the unit in the play room, where I can mess with new features in comfort, with convenient access to its insides if necessary. I’ve sort of allowed myself to dream of the possibility that the XBox extender would just become better through the years until it finally did everything I wanted it to.

Recently, I read a comment on a blog somewhere that indicated that Microsoft was motivated (monetarily, of course) to NOT allow streaming content on the XBox extender. The rationale was that the XBox console was using NetFlix streaming content as a hook to get people to buy XBox Live Gold memberships. Well, I’ll be hanged if I’ll pay a subscription fee just to get extender support for a service I already pay NetFlix for.

Wake up Microsoft! These are two different market segments. Will gamers mind paying for an XBox Live Gold subscription? No, they’re already paying for a subscription anyway. Will they go out and buy an HTPC so they don’t have to have that subscription to stream NetFlix? No, they bought that subscription for other reasons (games).

On the other hand, will home theatre enthusiasts pay for XBox Live just so they can stream NetFlix content to their big screen or bedroom? Possibly a few, but most (like myself) will be too angry with the marketing tactics to play along. Will they buy a Gold subscription because they might want to play games too? Possibly a few will, but mostly, gamers are gamers, and home theatre enthusiasts are into movies.

So why emasculate Media Center and alienate your hardware partners by disallowing some of the most enticing reasons to get an extender? Again I say, wake up…please. I might buy an XBox for my bedroom just to use as an extender – if I had a good enough reason.

RESTful Authentication

My last post on RESTful transactions sure seemed to attract a lot of attention. There are a number of REST discussion topics that tend to get a lot of hand-waving by the REST community, but no real concrete answers seem to be forthcoming. I believe the most fundamental reasons for this include the fact that the existing answers are unpalatable – both to the web services world at large, and to REST purists. Once in a while when they do mention a possible solution to a tricky REST-based issue, the web services world responds violently – mostly because REST purists give answers like “just don’t do that” to questions like “How do I handle session management in a RESTful manner?”

I recently read an excellent treatise on the subject of melding RESTful web services concepts with enterprise web service needs. Benjamin Carlyle’s Sound Advice blog entry, entitled The REST Statelessness Constraint hits the mark dead center. Rather than try to persuade enterprise web service designers not to do non-RESTful things, Benjamin instead tries to convey the purposes behind REST constraints (in this case, specifically statelessness), allowing web service designers to make rational tradeoffs in REST purity for the sake of enterprise goals, functionality, and performance. Nice job Ben!

The fact is that the REST architectural style was designed with one primary goal in mind: to create web architectures that would scale well to the Internet. The Internet is large, representing literally billions of clients. To make a web service scale to a billion-client network, you have to make hard choices. For instance, http is connectionless. Connectionless protocols scale very well to large numbers of clients. Can you imagine a web server that had to manage 500,000 simultaneous long-term connections?

Server-side session data is a difficult concept to shoehorn into a RESTful architecture, and it’s the subject of this post. Lots of web services – I’d venture to say 99 percent of them – manage authentication using SSL/TLS and the HTTP “basic auth” authentication scheme. They use SSL/TLS to keep from exposing a user’s name and password over the wire, essentially in clear text. They use basic auth because it’s trivial. Even banking institutions use this mechanism because, for the most part, it’s secure. Those who try to go beyond SSL/TLS/basic auth often do so because they have special needs, such as identity federation of disparate services.

To use SSL/TLS effectively, however, these services try hard to use long-term TCP connections. HTTP 1.0 had no built-in mechanism for allowing long-term connections, but Netscape hacked in an add-on mechanism in the form of the “Connection: keep-alive” header, and most web browsers support it, even today. HTTP 1.1 specifies that connections remain open by default. If an HTTP 1.1 client sends the “Connection: close” header in a request then the server will close the connection after sending the response, but otherwise, the connection remains open.

This is a nice enhancement, because it allows underlying transport-level security mechanisms like SSL/TLS to optimize transport-level session management. Each new SSL/TLS connection has to be authenticated, and this process costs a few round-trips between client and server. By allowing multiple requests to occur over the same authenticated session, the cost of transport-level session management is amortized over several requests.

In fact, by using SSL/TLS mutual authentication as the primary authentication mechanism, no application state need be maintained by the server at all for authentication purposes. For any given request, the server need only ask the connection layer who the client is. If the service requires SSL/TLS mutual auth, and the client has made a request, then the server knows that the client is authenticated. Authorization (resource access control) must still be handled by the service, but authorization data is not session data, it’s service data.

However, SSL/TLS mutual auth has an inherent deployment problem: key management. No matter how you slice it, authentication requires that the server know something about the client in order to authenticate that client. For SSL/TLS mutual auth, that something is a public key certificate. Somehow, each client must create a public key certificate and install it on the server. Thus, mutual auth is often reserved for the enterprise, where key management is done by IT departments for the entire company. Even then, IT departments cringe at the thought of key management issues.

User name and password schemes are simpler, because often web services will provide users a way of creating their account and setting their user name and password in the process. Credential management done. Key management can be handled in the same way, but it’s not as simple. Some web services allow users to upload their public key certificate, which is the SSL/TLS mutual-auth equivalent of setting a password. But a user has to create a public/private key pair, and then generate a public key certificate from this key pair. Java keytool makes this process as painless as possible, but it’s still far from simple. No – user name and password is by far the simpler solution.

As I mentioned above, the predominant solution today is a combination of CA-based transport-layer certificate validation for server authentication, and HTTP basic auth for client authentication. The web service obtains a public/private key pair that’s been generated by a well-known Certificate Authority (CA). This is done by generating a certificate signing request using either openssl or the Java keytool utility (or by using less mainstream tools provided by the CA). Because most popular web browsers today ship well-known CA certificates in their truststores, and because clients implicitly trust services that provide certificates signed by these well-known CA’s, people tend to feel warm and fuzzy because no warning messages pop up on the screen when they connect to one of these services. Should they fear? Given the service verification process used by CAs like Entrust and Verisign, they probably should, but that problem is very difficult to solve, so most people just live with this stop-gap solution.

On the server side, the web service needs to know the identity of the client in order to know what service resources that client should have access to. If a client requests a protected resource, the server must be able to validate that client’s right to the resource. If the client hasn’t authenticated yet, the server challenges the client for credentials using a response header and a “401 Unauthorized” response code. Using the basic auth scheme, the client base64-encodes his user name and password and sends this string back in the Authorization request header. Now, base64 encoding is not encryption, so the client is essentially passing his user name and password in what amounts to clear text. This is why SSL/TLS is used. By the time the server issues the challenge, the SSL/TLS encrypted channel is already established, so the user’s credentials are protected from even non-casual snoopers.
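The encoding step is trivial. Here’s a minimal sketch of building the basic auth Authorization header value in Java (the user name and password are made up):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {
  public static void main(String[] args) {
    // Basic auth: base64-encode "user:password" and prefix "Basic "
    String creds = "alice:secret";
    String header = "Basic " + Base64.getEncoder()
        .encodeToString(creds.getBytes(StandardCharsets.UTF_8));
    System.out.println(header); // Basic YWxpY2U6c2VjcmV0
  }
}
```

The server side is symmetric: strip the “Basic ” prefix, run the rest through Base64.getDecoder, and split on the first colon. Note that nothing here is secret – which is exactly why the exchange must ride on an SSL/TLS channel.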

When the proper credentials arrive in the next attempt to request the protected resource, the server decodes the user name and password, verifies them against its user database, and either returns the requested resource, or fails the request with “401 Unauthorized” again, if the user doesn’t have the requisite rights to the requested resource.

If this was the extent of the matter, there would be nothing unRESTful about this protocol. Each subsequent request contains the user’s name and password in the Authorization header, so the server has the option of using this information on each request to ensure that only authorized users can access protected resources. No session state is managed by the server here. Session or application state is managed by the client, using a well-known protocol for passing client credentials on each request – basic auth.

But things don’t usually stop there. Web services want to provide a good session experience for the user – perhaps a shopping cart containing selected items. Servers typically implement shopping carts by keeping a session database, and associating collections of selected items with users in this database. How long should such session data be kept around? What if the user tires of shopping before she checks out, goes for coffee, and gets hit by a car? Most web services deal with such scenarios by timing out shopping carts after a fixed period – anywhere from an hour to a month. What if the session includes resource locks? For example, items in a shopping cart are sometimes made unavailable to others for selection – they’re locked. Companies like to offer good service to customers, but keeping items locked in your shopping cart for a month while you’re recovering in the hospital just isn’t good business.

REST principles dictate that keeping any sort of session data is not viable for Internet-scalable web services. One approach is to encode all session data in a cookie that’s passed back and forth between client and server. While this approach allows the server to be completely stateless with respect to the client, it has its flaws: even though the cookie contains application state data, it’s effectively owned by the server, not the client. Most clients don’t even try to interpret this data; they just hand it back to the server on each successive request. Application state, however, should be managed by the client, not the server.

There are no good answers to these questions yet. What it comes down to is that service design is a series of trade-offs. If you really need your web service to scale to billions of users, then you’d better find ways to make your architecture compliant with REST principles. If you’re only worried about servicing a few thousand users at a time, then perhaps you can relax the constraints a bit. The point is that you should understand the constraints, and then make informed design decisions.

RESTful Transactions

I was reading recently in RESTful Web Services (Leonard Richardson & Sam Ruby, O’Reilly, 2007) about how to implement transactional behavior in a RESTful web service. Most web services today do this with an overloaded POST operation, but the authors assert that this isn’t necessary.

Their example (in Chapter Eight) uses the classic bank account transaction scenario, where a customer wants to transfer 50 dollars from checking to savings. I’ll recap here for your benefit. Both accounts start with 200 dollars. So after a successful transaction, the checking account should contain 150 dollars and the savings account should contain 250 dollars. Let’s consider what happens when two clients operate on the same resources:

Client A -> Read account: 200 dollars
Client A -> Withdraw 50 dollars: 200 - 50 = 150 dollars
Client A -> Write account: 150 dollars

Client B -> Read account: 150 dollars
Client B -> Withdraw 50 dollars: 150 - 50 = 100 dollars
Client B -> Write account: 100 dollars

This is all well and good until you consider that the steps in these operations might not be atomic. Transactions protect against the following situation, wherein the separate steps of these two Clients’ operations are interleaved:

Client A -> Read account: 200 dollars
Client B -> Read account: 200 dollars
Client A -> Withdraw 50 dollars: 200 - 50 = 150 dollars
Client B -> Withdraw 50 dollars: 200 - 50 = 150 dollars
Client A -> Write account: 150 dollars
Client B -> Write account: 150 dollars

After both operations, the account should contain 100 dollars, but because no account locking was in effect during the two updates, the second withdrawal is lost. Thus 100 dollars was physically removed from the account, but the account balance reflects only a 50 dollar withdrawal. Transaction semantics would cause the following series of steps to occur:

Client A -> Begin transaction
Client A -> Read account: 200 dollars
Client B -> Begin Transaction (block)
Client A -> Withdraw 50 dollars: 200 - 50 = 150 dollars
Client A -> Write account: 150 dollars
Client A -> Commit transaction
Client B -> (unblock) Read account: 150 dollars
Client B -> Withdraw 50 dollars: 150 - 50 = 100 dollars
Client B -> Write account: 100 dollars
Client B -> Commit transaction
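The serialized steps above can be sketched with a simple monitor in Java. Account is a hypothetical class; a synchronized block plays the role of begin/commit, forcing Client B’s read-modify-write to wait until Client A’s completes.

```java
public class Account {
  private int balance = 200;

  // Begin-transaction/commit collapses to a synchronized method here:
  // only one client may read-modify-write at a time.
  public synchronized void withdraw(int amount) {
    int read = balance;        // Read account
    balance = read - amount;   // Withdraw and write back
  }

  public synchronized int balance() { return balance; }

  public static void main(String[] args) throws Exception {
    Account acct = new Account();
    Thread a = new Thread(() -> acct.withdraw(50)); // Client A
    Thread b = new Thread(() -> acct.withdraw(50)); // Client B
    a.start(); b.start();
    a.join(); b.join();
    System.out.println(acct.balance()); // 100: neither update lost
  }
}
```

Without the synchronized keyword, the interleaved schedule shown earlier becomes possible and one withdrawal can be lost.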

Web Transactions

The authors’ approach to RESTful web service transactions involves using POST against a “transaction factory” URL. In this case /transactions/account-transfer represents the transaction factory. The checking account is represented by /accounts/checking/11 and the savings account by /accounts/savings/55.

Now, if you recall from my October 2008 post, PUT or POST: The REST of the Story, POST is designed to be used to create new resources whose URL is not known in advance, whereas PUT is designed to update or create a resource at a specific URL. Thus, POSTing against a transaction factory should create a new transaction and return its URL in the Location response header.

A user might make the following series of web requests:

GET /accounts/checking/11 HTTP/1.1
200 Ok

GET /accounts/savings/55 HTTP/1.1
200 Ok


The fact that the client reads the account balances before beginning is implied by the text, rather than stated explicitly. At some later time (hopefully not much later) the transaction is started:

POST /transaction/account-transfer HTTP/1.1
201 Created
Location: /transaction/account-transfer/11a5
PUT /transaction/account-transfer/11a5/accounts/checking/11 HTTP/1.1

200 Ok
PUT /transaction/account-transfer/11a5/accounts/savings/55 HTTP/1.1

200 Ok
PUT /transaction/account-transfer/11a5 HTTP/1.1

200 Ok

At first glance, this appears to be a nice design, until you begin to consider the way such a system might be implemented on the back end. The authors elaborate on one approach. They state that documents PUT to resources within the transaction might be serialized during building of the transaction. When the transaction is committed the entire set of serialized operations could then be executed by the server within a server-side database transaction. The result of committing the transaction is then returned to the client as the result of the client’s commit on the web transaction.

However, this can’t work properly: the server would need the client’s view of the original account balances in order to ensure that no changes slipped in after the client read the accounts but before the transaction was committed (or even begun!). As it stands, a third party could modify the accounts before the new balances are written, and the server has no way to ensure those modifications are not overwritten by the outdated state replayed from the transaction log. Protecting a database against exactly this scenario is, after all, the entire purpose of a transaction.
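The lost-update problem described above is easy to demonstrate with a toy example (the balances and resource names here are invented for illustration):

```python
# A toy "database" of account balances.
accounts = {"checking/11": 100, "savings/55": 200}

# The client reads balances outside any transaction and computes
# new balances locally, then buffers them as a serialized log of
# PUT operations -- this mirrors the book's proposed back end.
checking = accounts["checking/11"]   # client sees 100
savings = accounts["savings/55"]     # client sees 200
log = [("checking/11", checking - 50),
       ("savings/55", savings + 50)]

# Meanwhile, a third party deposits 25 into checking...
accounts["checking/11"] += 25        # checking is now 125

# ...but replaying the serialized log blindly clobbers that deposit,
# because the server never validated the client's original reads.
for resource, new_balance in log:
    accounts[resource] = new_balance
```

After the replay, checking holds 50 — the third party’s 25 deposit has silently vanished, which is precisely the anomaly a real transaction exists to prevent.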

Fixing the Problem

One way to make this work is to include account balance read (GET) operations within the transaction, like this:

POST /transaction/account-transfer HTTP/1.1
201 Created
Location: /transaction/account-transfer/11a5

GET /transaction/account-transfer/11a5/accounts/checking/11 HTTP/1.1
200 OK

PUT /transaction/account-transfer/11a5/accounts/checking/11 HTTP/1.1
200 OK

GET /transaction/account-transfer/11a5/accounts/savings/55 HTTP/1.1
200 OK

PUT /transaction/account-transfer/11a5/accounts/savings/55 HTTP/1.1
200 OK

PUT /transaction/account-transfer/11a5 HTTP/1.1
200 OK

The GET operations would, of course, return real data in real time. But the fact that the accounts were read within the transaction gives the server a reference point for later comparison when it executes the back-end database transaction. If either account balance has been modified before the back-end transaction begins, the server must abort the transaction and the client must start over with a new one.
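A server implementing this validation might record the values seen by each GET within the transaction and re-check them at commit time before applying the buffered writes. Here is a minimal sketch under that assumption (class and method names are hypothetical, not from the book):

```python
class OptimisticTransaction:
    """Sketch: record each value read within the transaction, then
    verify those reads at commit time before applying the writes."""

    def __init__(self, store):
        self.store = store
        self.reads = {}    # resource -> value seen by the client's GET
        self.writes = {}   # resource -> value supplied by the client's PUT

    def get(self, resource):
        value = self.store[resource]
        self.reads[resource] = value  # remember what the client saw
        return value

    def put(self, resource, value):
        self.writes[resource] = value  # buffer until commit

    def commit(self):
        # Abort if any resource read within the transaction has changed.
        for resource, seen in self.reads.items():
            if self.store[resource] != seen:
                return False  # caller must retry the whole transaction
        self.store.update(self.writes)
        return True

# Happy path: no interference, so the commit succeeds.
store = {"checking/11": 100, "savings/55": 200}
txn = OptimisticTransaction(store)
txn.put("checking/11", txn.get("checking/11") - 50)
txn.put("savings/55", txn.get("savings/55") + 50)
committed = txn.commit()

# Conflict path: a concurrent writer touches checking mid-transaction.
txn2 = OptimisticTransaction(store)
txn2.put("checking/11", txn2.get("checking/11") - 10)
store["checking/11"] = 999  # third-party write sneaks in
conflicted = txn2.commit()  # validation fails; nothing is written
```

The key property: a conflicting third-party write causes the commit to fail cleanly rather than silently overwriting newer data.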

This mechanism is similar in operation to lock-free data structure semantics. Lock-free data structures show up in low-level systems programming on symmetric multi-processing (SMP) hardware; they allow multiple threads to make updates without the aid of concurrency locks such as mutexes and spinlocks. Essentially, the mechanism guarantees that an attempt to read, update, and write a data value will either succeed or fail in a transactional manner. The implementation usually revolves around a machine-level compare-and-swap operation. The would-be modifier reads the data element, updates its local copy, and then performs a conditional write, where the condition is that the stored value is still the value originally read. If the value has changed, the operation is aborted and retried. Even under high contention, the update will likely eventually occur.
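The shape of that retry loop looks like this (real lock-free code uses a hardware compare-and-swap instruction; this single-threaded Python sketch only illustrates the control flow):

```python
def compare_and_swap(cell, expected, new):
    # Stand-in for a hardware CAS instruction: write `new` only if
    # the cell still holds `expected`. Returns whether the write won.
    if cell[0] == expected:
        cell[0] = new
        return True
    return False

def lock_free_add(cell, delta):
    # Read, compute, conditionally write; retry if another writer
    # changed the value between our read and our write.
    while True:
        old = cell[0]
        if compare_and_swap(cell, old, old + delta):
            return

cell = [10]
lock_free_add(cell, 5)
```

The web-transaction analogue is the same loop at a much larger scale: the “read” is the GETs inside the transaction, the “conditional write” is the commit, and a failed commit sends the client back around the loop.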

How this applies to our web service transaction is simple: if either account is modified outside the web transaction before the back-end database transaction begins (at the time the commit=true document is PUT), then the server must abort the transaction (by returning “500 Internal server error” or something similar). The client must then retry the entire transaction. This pattern continues until the client is lucky enough to make all of its modifications within the transaction before anyone else touches any of the affected resources. That may sound nasty, but as we’ll see in a moment, the alternatives have less desirable effects.

Inline Transaction Processing

Another approach is to actually have the server begin a database transaction at the point where the transaction resource is created with the initial POST operation above. Again, the client must read the resources within the transaction. Now the server can guarantee atomicity — and data integrity.

As with the previous approach, this approach works whether the database uses global- or resource-level locking. All web transaction operations happen in real time within a database transaction, so reads return real data and writes happen during the write requests, but of course the writes aren’t visible to other readers until the transaction is committed.

A common problem with this approach is that the database transaction is now exposed as a “wire request”: a client that dies in the middle of the operation leaves the transaction outstanding, and such transactions have to be aborted once the server notices the client is gone. Since HTTP is a stateless protocol, it’s difficult for a server to tell when a client has died, so at the very least, database transactions begun by web clients should be timed out. Unfortunately, while a transaction is waiting to time out, no one else can write to the locked resources, which can be a real problem if the database uses global locking. Additional writers are blocked until the transaction is either committed or aborted. Locking a highly contended resource across a series of network requests can significantly impact scalability, as the lifetime of a given lock has just gone through the ceiling.
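A server might implement that timeout by tracking the last request seen on each open transaction and periodically reaping the idle ones. A minimal sketch (the 30-second timeout and all names are invented for illustration):

```python
import time

class TransactionTable:
    """Sketch: time out database transactions begun by web clients
    that have gone silent. The timeout value is an assumption."""

    TIMEOUT = 30.0  # seconds of inactivity before we abort a transaction

    def __init__(self):
        self.last_seen = {}  # transaction URL -> time of last request

    def touch(self, txn_url, now=None):
        # Called on every request that names this transaction.
        self.last_seen[txn_url] = time.monotonic() if now is None else now

    def reap(self, now=None):
        # Drop any transaction idle longer than TIMEOUT. A real server
        # would also roll back the underlying database transaction here,
        # releasing its locks so blocked writers can proceed.
        now = time.monotonic() if now is None else now
        dead = [url for url, t in self.last_seen.items()
                if now - t > self.TIMEOUT]
        for url in dead:
            del self.last_seen[url]
        return dead

table = TransactionTable()
table.touch("/transaction/account-transfer/11a5", now=0.0)
reaped = table.reap(now=31.0)  # 31s of silence exceeds the timeout
```

Note that even with a reaper, the locks are held for the full timeout window — which is exactly the scalability cost described above.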

It’s clear that creating proper RESTful transaction semantics is a tricky problem.