Complexity–or how hard is it to display a list of 3,000 items in a table on Mac OS X anyway?

The laptop on my desk can do around 2,403 MWIPS, around 100 times faster than the Cray X-MP/24 of yesteryear, and has easily more than 100 times the memory. (For reference, NASA clocked the Cray X-MP/24 at 25 MWIPS, and the system came with either 16 or 32 megabytes of main memory, in 1 megabyte banks.)

So how friggin’ hard is it for Outlook Express to display the summary lines of 3,000 e-mails on my MacBook Pro?

* * *

Here’s the thing. I suspect (though I don’t know, because I don’t work for Microsoft) that rather than sucking the headers out of an e-mail file into memory, storing those headers in a linked list, and then, as the user scrolls up or down, walking a pointer down the list with a simple “drawRow(…); ptr = ptr->next;” loop, the system is instead using something like SQLite for storage and running a query through the SQL engine on every scroll, something like “SELECT … ORDER BY … LIMIT (n-rows) OFFSET (scroll-amount)”.
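To make the first half of that contrast concrete, here is a minimal sketch of the linked-list approach in C. Everything in it (the Header fields, drawRow()) is a hypothetical stand-in, not anything out of a real mail client:

    #include <stdio.h>

    /* Hypothetical header record: one node per message summary. */
    typedef struct Header {
        const char *from;
        const char *subject;
        struct Header *next;
    } Header;

    /* Stand-in for whatever row-painting call the UI toolkit provides. */
    static void drawRow(int row, const Header *h) {
        printf("%3d  %-24s  %s\n", row, h->from, h->subject);
    }

    /* Skip to the first visible header, then paint one screenful. */
    static void drawVisibleRows(Header *head, int firstVisible, int rowsOnScreen) {
        Header *ptr = head;
        for (int i = 0; i < firstVisible && ptr != NULL; i++)
            ptr = ptr->next;                /* apply the scroll offset */
        for (int row = 0; row < rowsOnScreen && ptr != NULL; row++) {
            drawRow(row, ptr);              /* the whole inner loop:      */
            ptr = ptr->next;                /* drawRow(); ptr = ptr->next; */
        }
    }

    int main(void) {
        Header c = { "carol@example.com", "Re: lunch",     NULL };
        Header b = { "bob@example.com",   "Status report", &c   };
        Header a = { "alice@example.com", "Hello",         &b   };
        drawVisibleRows(&a, 1, 10);         /* scrolled down one row */
        return 0;
    }

At 3,000 nodes, both loops together chase a few thousand pointers already sitting in RAM, which is effectively free on modern hardware.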

Then, behind the scenes, that SQL statement is compiled into byte code for a SQL interpreter engine; the engine then executes the byte code one instruction at a time, with each instruction pulling pieces of the table (stored in a way optimized to reduce disk space as arbitrary records are added and removed) into an on-board cache, which sometimes misses because–well, we’re only interested in a dozen or so rows displayed in the summary table, right?
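For what it’s worth, here is roughly what that path looks like through SQLite’s public C API. Whether Outlook Express actually uses SQLite is, again, my speculation, and the “headers” table and its columns are invented; but the cycle below is exactly the machinery I mean: sqlite3_prepare_v2() is the compile-to-byte-code step, and each sqlite3_step() cranks the byte code interpreter, pulling B-tree pages through the page cache:

    #include <stdio.h>
    #include <sqlite3.h>

    /* Fetch one screenful of summary rows from a hypothetical "headers" table.
       prepare = compile SQL to byte code; step = run the byte code interpreter. */
    int show_page(sqlite3 *db, int rows, int offset) {
        sqlite3_stmt *stmt;
        const char *sql = "SELECT sender, subject FROM headers "
                          "ORDER BY date DESC LIMIT ?1 OFFSET ?2;";
        if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK)
            return -1;
        sqlite3_bind_int(stmt, 1, rows);
        sqlite3_bind_int(stmt, 2, offset);
        while (sqlite3_step(stmt) == SQLITE_ROW)    /* one row per crank */
            printf("%s  %s\n",
                   (const char *)sqlite3_column_text(stmt, 0),
                   (const char *)sqlite3_column_text(stmt, 1));
        return sqlite3_finalize(stmt);
    }

Note, too, that LIMIT/OFFSET paging makes the engine walk and throw away every skipped row on each repaint, so scrolling deep into the list gets progressively slower.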

The end result is that a computer 100 times faster than the fastest computer in the world from 30 years ago (which, honestly, was pretty blindingly fast), and with more than enough memory to hold four encoded movies in RAM, hiccups and stutters when I scroll through my e-mails.

Really?

* * *

I’d like to call this lazy, but I really can’t: wiring up the UI of Outlook to a back-end SQL engine, then maintaining the link between those systems in a way that allows you to page through a few thousand e-mails, is not a trivial task. Never mind that SQL engines were designed in part to keep their memory footprint small for large datasets that cannot be stored in memory, and we’re using one on an e-mail database which could be stored in 1/100th of the available RAM on my system.

Instead, I think it’s because software development today tends to be more about marrying off-the-shelf components without thinking about what those off-the-shelf components actually do–or even about whether (as in the case of tracking 3,000 e-mails on a system that, 20 years ago, would have been considered a super-computer whose computational power made it a Federal crime to export overseas) those off-the-shelf components are warranted at all.

Now if my e-mail box had, say, 1 million e-mail messages, and a total disk footprint of, say, 2 gigabytes–then I could see using a complicated relational database engine to manage those e-mails. But instead, in an effort to reduce the magnitude of the exponent in an O(n**e) operation, we forget that there is a constant C which can often dominate.

Which is why an e-mail system which simply parses all 3,000 e-mails into a linked list (and which would handle operations in O(n) time) would be far faster than a complicated system built on a relational database that may, in theory, run in O(log n) time, but whose constant C for each operation is measured in milliseconds, versus nanoseconds for the former.
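Putting hypothetical numbers on that (the constants below are illustrative guesses, not measurements): say touching a list node already in RAM costs ~100 nanoseconds, and a round trip through a SQL engine carries ~1 millisecond of fixed overhead per operation.

    #include <math.h>
    #include <stdio.h>

    /* Illustrative guesses, not measurements: ~100 ns per list node in RAM,
       ~1 ms of fixed overhead per trip through a SQL engine. */
    int main(void) {
        double n = 3000.0;
        double walk_everything = n * 100e-9;            /* O(n), tiny C    */
        double one_db_op = 1e-3 * (log(n) / log(2.0));  /* O(log n), big C */
        printf("walk all %.0f headers in RAM: %.1f microseconds\n",
               n, walk_everything * 1e6);
        printf("one O(log n) database operation: %.1f milliseconds\n",
               one_db_op * 1e3);
        return 0;
    }

With those guesses, traversing every single header costs a third of a millisecond, while one “asymptotically better” database operation costs around forty times that. At n = 3,000, the exponent never gets a chance to matter.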

* * *

I wish we would, as an industry, spend less time thinking about the tools and toys which we use to solve a problem, and more time thinking about the domain the problem resides in–and selecting the appropriate tools given that domain, rather than simply asserting that the latest and greatest gee-whiz gadget is the right answer regardless of the question.

I’ve seen tons of examples of this sort of thing. Adobe AIR used to build a user interface because, well, it’s pretty–without thinking for a second about whether it can even be deployed on the target devices used by people in the field. JavaScript/AJAX web sites built for people who travel and whose devices do not have a consistent or reliable Internet connection. Sending and receiving full queries across a slow EDGE network connection when on-board caching would reduce the queries to status updates (or even allow the system to run without any queries at all).
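That last example deserves a sketch. Everything here is invented for illustration (the names, the version-number protocol), but it shows the shape of what I mean by reducing queries to status updates: ask the server a few bytes’ worth of “anything new?”, and only pay for a full query when the answer is yes:

    #include <stdio.h>

    /* The cache holds the last full result set plus the version it synced at. */
    typedef struct {
        long version;
        /* ... cached rows would live here ... */
    } LocalCache;

    /* Stub standing in for a tiny "anything new?" probe over the slow link. */
    static long fetch_remote_version(void) { return 7; }

    /* Stub standing in for the expensive full query we want to avoid. */
    static void fetch_full_dataset(LocalCache *cache) {
        (void)cache;
        puts("(expensive full sync over EDGE)");
    }

    static void refresh(LocalCache *cache) {
        long v = fetch_remote_version();   /* a few bytes over the wire    */
        if (v != cache->version) {         /* stale: pay for the big query */
            fetch_full_dataset(cache);
            cache->version = v;
        }                                  /* fresh: serve from the cache  */
    }

    int main(void) {
        LocalCache cache = { 0 };
        refresh(&cache);   /* first call: version 0 != 7, full sync      */
        refresh(&cache);   /* second call: up to date, no network query  */
        return 0;
    }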

We don’t think of the domain of the problem or figure out the priorities of the components of that problem or pick the appropriate tools and technologies.

And so I’m stuck with a computer that is a marvel of advanced technology, a super-computer in a briefcase capable of running a simulation that models the solution to Schrödinger’s Equation fast enough to allow a physicist to fiddle with the inputs and see the outputs in real time, running like a fucking dog when scrolling through my morning’s e-mails.

1 thought on “Complexity–or how hard is it to display a list of 3,000 items in a table on Mac OS X anyway?”

  1. Meh, probably not.
    Most of those layers have very little cost on a modern CPU.

    It’s much more likely the case that the size of your inbox has exceeded the hard-wired size of some internal cache in the software and scrolling keeps triggering a cache re-fill.

    Probably because the cache was sized back when computers had 64M of RAM.

