Long time, no blog

I haven’t blogged in over a year. I’ve been busy working at, which I joined last March, and parenting.

I feel healthier because I’ve run 4-5 days a week since May (though I’ve slacked off slightly the past few weeks, since the belated arrival of winter weather disrupted my after-work run ritual). I highly recommend regular exercise to everyone as a boost to physical and mental health. The best way to exercise consistently is to make it a habit: pick a few specific times each week and commit to exercising then. Something is better than nothing, so just go out and sweat, but don’t sweat it if you’re not up for a full workout every day. Just work out consistently.

I’m proud my kids are doing quite well at chess. They’ve learned to win and lose gracefully. They’ve learned that success requires hard work. As I write this, Lia (6) is the 14th-rated girl age 7 and under, and Daryl is the 76th-rated 9-year-old. They’re also working hard to learn Chinese, and Daryl enjoys playing hockey.

If you’re a programmer, you might like my ever-growing list of tech resources. I’m eagerly learning Elm and Elixir/Erlang, but because I work long hours and spend much of my weekends with the kids, it has been hard to devote as much time to them as I’d like. I’m excited about functional programming; I believe it’s the future because functional code is much more maintainable and scalable than object-oriented code.

Posted by James on Jan 17, 2016

Getting started with Docker

A Hedgeye colleague who has suffered the same pain from Chef that I have (especially its unstable APIs) suggested we switch to Docker, and we’re now pursuing that. So far, we both like it, though we haven’t yet done anything complicated.

To help our future selves and our colleagues use Docker, I’m going to document some stuff here.

To install Docker on my Mac, I followed this nice guide from Chris Jones.

With Docker running, getting started is super easy. Just type:

docker run -it ubuntu /bin/bash

Docker will then pull down an Ubuntu image (unless it has already done so, in which case it will use the copy you already have), spin up a container running just a bash shell (the command you told it to run), and drop you into that shell session.
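If you’d rather launch the same container from a Python script, a minimal sketch is to build the argument list yourself and hand it to subprocess.run, assuming the docker CLI is installed and on your PATH. The helper names docker_run_argv and run_container are hypothetical. In the command above, -i keeps STDIN open and -t allocates a pseudo-terminal; the optional --rm flag deletes the container after it exits, since by default a stopped container sticks around until you remove it.

```python
import subprocess


def docker_run_argv(image, command, interactive=True, remove=False):
    """Build the argument list for `docker run` (hypothetical helper, not part of Docker)."""
    argv = ["docker", "run"]
    if remove:
        argv.append("--rm")  # delete the container once its initial process exits
    if interactive:
        argv.append("-it")  # -i keeps STDIN open; -t allocates a pseudo-terminal
    argv.append(image)      # e.g. "ubuntu"; pulled automatically if not present locally
    argv.extend(command)    # the container's single initial process, e.g. ["/bin/bash"]
    return argv


def run_container(image, command, remove=False):
    """Run the container attached to this terminal and return its exit code.

    Assumes the docker CLI is installed, on PATH, and the Docker daemon is running.
    """
    return subprocess.run(docker_run_argv(image, command, remove=remove)).returncode
```

Calling run_container("ubuntu", ["/bin/bash"]) runs exactly docker run -it ubuntu /bin/bash; passing the arguments as a list rather than one string avoids shell-quoting surprises.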

When you’re finished interacting with your container, type exit; Docker will log you out and then stop the container. It stops because Docker’s approach is to spin up a container running a single process and shut that container down when the initial process is no longer running. From that initial process you can, of course, spawn additional processes, but whenever the special initial process exits, your container dies.
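To make that lifecycle rule concrete, here is a toy Python model of it. ToyContainer is a hypothetical class that only simulates the bookkeeping and never talks to Docker. In real Docker, the initial process is PID 1 inside the container; when it exits, the container stops and any remaining processes are killed.

```python
class ToyContainer:
    """Toy model of Docker's lifecycle rule; it does not talk to Docker at all."""

    def __init__(self, command):
        self.processes = {1: command}  # PID 1: the command passed to `docker run`
        self._next_pid = 2

    @property
    def running(self):
        # The container is alive exactly as long as its initial process is.
        return 1 in self.processes

    def spawn(self, command):
        """Start an additional process inside the running container; return its PID."""
        if not self.running:
            raise RuntimeError("container has stopped")
        pid = self._next_pid
        self._next_pid += 1
        self.processes[pid] = command
        return pid

    def exit(self, pid):
        """Simulate process `pid` exiting."""
        if pid == 1:
            self.processes.clear()  # initial process gone: every other process dies too
        else:
            self.processes.pop(pid, None)  # other processes can come and go freely
```

Extra processes can start and stop without affecting the container, but exiting PID 1 (typing exit in the bash session) takes everything down with it.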

Here are some basic Docker commands.

[ More to come ]

Posted by James on Dec 03, 2014

Easily manage Python environments with Anaconda

Lately, I’ve been doing a lot of Python data analysis. Though Python has many strengths, package management has long been a nightmare.

Continuum Analytics has created several wonderful free, open-source projects, most notably Anaconda, which makes installing Python and its packages much, much easier. It’s especially helpful if you want to maintain multiple Python environments, which you probably do if you want to run both Python 2 and Python 3.

I’ve just hit on a workflow that enables me to keep my environment up to date without risk of breaking stuff.

I currently have two environments, the default 2.7 environment and a 3.4 environment I use most of the time:

→ conda info -e
# conda environments:
py34                  *  /Users/JLavin/Applications/Anaconda/anaconda/envs/py34
root                     /Users/JLavin/Applications/Anaconda/anaconda

I want to update many packages in py34 that have gone stale, but I’m afraid something might break. So I run:

conda list -n py34 --export > ~/Python/conda_packages_20140911

This creates a file that allows me to clone my current py34 environment with a simple command:

conda create --name oldpy34 --file ~/Python/conda_packages_20140911

Hopefully, I won’t need this, but it’s a super simple insurance policy in case anything goes awry.
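The export file is plain text: a few header comments starting with #, then one name=version=build line per package. That makes it easy to compare two snapshots and see exactly what an update changed. A minimal sketch, assuming that format; parse_export and diff_snapshots are hypothetical helper names.

```python
def parse_export(text):
    """Map package name -> (version, build) from `conda list --export` output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "#" header comments
        # Pad so a line missing its build string still unpacks into three fields.
        name, version, build = (line.split("=") + ["", ""])[:3]
        packages[name] = (version, build)
    return packages


def diff_snapshots(before, after):
    """Return (added, removed, changed) between two parsed snapshots."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    changed = {
        name: (before[name], after[name])
        for name in sorted(before.keys() & after.keys())
        if before[name] != after[name]
    }
    return added, removed, changed
```

For example, read ~/Python/conda_packages_20140911 with Path(...).expanduser().read_text() from pathlib, take a second export after updating (any filename you like), and pass both parsed results to diff_snapshots.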

Now, let’s try updating my current py34 environment:

→ conda update --all
Fetching package metadata: ..
Solving package specifications: .
Package plan for installation in environment /Users/JLavin/Applications/Anaconda/anaconda/envs/py34:

The following packages will be downloaded:

package                    |            build
astroid-1.2.1              |           py34_0         189 KB
astropy-0.4.1              |       np18py34_0         4.9 MB
bcolz-0.7.1                |       np18py34_0         324 KB
beautiful-soup-4.3.2       |           py34_0         114 KB
binstar-0.5.5              |           py34_0          68 KB
xlsxwriter-0.5.7           |           py34_0         165 KB
xz-5.0.5                   |                0         132 KB
                                       Total:       121.2 MB

The following NEW packages will be INSTALLED:

bcolz:             0.7.1-np18py34_0
cytoolz:           0.7.0-py34_0
decorator:         3.4.0-py34_0
toolz:             0.7.0-py34_0
xz:                5.0.5-0

The following packages will be UPDATED:

astroid:           1.1.1-py34_0        --> 1.2.1-py34_0
astropy:           0.3.2-np18py34_0    --> 0.4.1-np18py34_0
beautiful-soup:    4.3.1-py34_0        --> 4.3.2-py34_0
binstar:           0.5.3-py34_0        --> 0.5.5-py34_0
blaze:             0.5.0-np18py34_1    --> 0.6.3-np18py34_0
bokeh:             0.4.4-np18py34_1    --> 0.6.0-np18py34_0
colorama:          0.2.7-py34_0        --> 0.3.1-py34_0
configobj:         5.0.5-py34_0        --> 5.0.6-py34_0
cython:            0.20.1-py34_0       --> 0.21-py34_0
datashape:         0.2.0-np18py34_1    --> 0.3.0-np18py34_1
docutils:          0.11-py34_0         --> 0.12-py34_0
dynd-python:       0.6.2-np18py34_0    --> 0.6.5-np18py34_0
tornado:           3.2.1-py34_0        --> 4.0.1-py34_0
werkzeug:          0.9.6-py34_0        --> 0.9.6-py34_1
xlsxwriter:        0.5.5-py34_0        --> 0.5.7-py34_0

Proceed ([y]/n)? y

The update succeeded, so my environment is now totally up to date. Thanks, Continuum Analytics! But the update could have failed. Or it could have succeeded while one or more of the updated packages broke my applications in ways I don’t like, making me want to roll back to where I began and update more selectively.

Having a snapshot of my environment and the ability to instantly recreate it gives me peace of mind.
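If you do want to update more selectively, one rough approach is to scan the "packages will be UPDATED" section of conda’s plan for version jumps that look risky before answering y. The sketch below assumes the name: old-build --> new-build line format shown above and applies a semantic-versioning heuristic (a major-version bump, or a minor bump while still at 0.x); plenty of projects don’t follow semver, so treat the result as a hint, not a verdict. All function names are hypothetical.

```python
import re

# Matches lines such as "tornado:   3.2.1-py34_0   --> 4.0.1-py34_0".
UPDATE_LINE = re.compile(r"^(?P<name>[\w.-]+):\s+(?P<old>\S+)\s+-->\s+(?P<new>\S+)$")


def parse_updates(lines):
    """Yield (name, old_version, new_version) for each 'UPDATED' line in a conda plan."""
    for line in lines:
        match = UPDATE_LINE.match(line.strip())
        if match:
            # Drop the build string after the first "-" (e.g. "py34_0").
            old = match["old"].split("-", 1)[0]
            new = match["new"].split("-", 1)[0]
            yield match["name"], old, new


def release(version):
    """Leading integer of each dotted component: "0.20.1" -> [0, 20, 1]."""
    parts = []
    for piece in version.split("."):
        digits = re.match(r"\d+", piece)
        parts.append(int(digits.group()) if digits else 0)
    return parts


def looks_risky(old, new):
    """Semver-style heuristic: a major bump, or a minor bump while still at 0.x."""
    o, n = release(old), release(new)
    if o[0] != n[0]:
        return True
    return o[0] == 0 and o[1:2] != n[1:2]
```

Applied to the plan above, this would flag tornado (3.2.1 to 4.0.1) and several 0.x packages such as colorama, while treating build-only changes like werkzeug’s as safe.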

Posted by James on Sep 11, 2014

Reason #427 why I hate proprietary operating systems

At home, I run Linux machines, my wife is on a Mac, and my kids and in-laws are on Windows laptops. (I’ll transition my kids to Linux as they move into programming.)

Because of this heterogeneity, I like to format my external hard drives with multiple partitions, each for a different OS.

But this can be a huge pain. I formatted a 3 TB hard drive with a Windows partition, a Linux partition, and space for a Mac (HFS+) partition. But Disk Utility on my wife’s MacBook Pro refused to create an HFS+ partition in the third physical partition, complaining that the hard drive has a Master Boot Record. I wasn’t trying to create a bootable partition, but Mac OS didn’t care. Using my Linux machine (and the “hfsprogs” package), I managed to format the partition as HFS+. It shocked me that Linux could create a Mac-formatted partition where a Mac couldn’t.

My wife’s MacBook agreed it was a healthy HFS+ partition, but it still refused to let Time Machine back up to it because the partition was non-journaled, and GParted can’t create a journaled HFS+ partition.

I finally surrendered, threw away all my partitions, and let the MacBook Pro take the first physical spot on the hard drive. Time Machine is finally running. I won’t know whether I can use the remaining 2 TB of unformatted space for Windows or Linux until it finishes. Proprietary OSes are so annoying!
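For the curious: on an MBR-partitioned drive, the Master Boot Record is where the partition table lives, so every such drive has one whether or not anything boots from it; Macs normally use the newer GUID Partition Table (GPT) instead. A minimal Python sketch, assuming 512-byte logical sectors by default, tells the two apart from the first two sectors of a disk. The function name is hypothetical, and reading a real device such as /dev/sdX (a placeholder) on Linux normally requires root.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors; pass 4096 for 4K-sector drives


def partition_scheme(disk_start: bytes, sector_size: int = SECTOR_SIZE) -> str:
    """Classify a disk from its first two sectors: 'gpt', 'mbr', or 'unknown'."""
    # A GPT disk stores the header signature "EFI PART" at the start of LBA 1.
    if disk_start[sector_size:sector_size + 8] == b"EFI PART":
        return "gpt"
    # An MBR ends the first sector with the signature bytes 0x55 0xAA.
    # GPT disks also carry a "protective" MBR, which is why GPT is checked first.
    if disk_start[510:512] == b"\x55\xaa":
        return "mbr"
    return "unknown"
```

On Linux, something like opening the device in "rb" mode and passing f.read(2 * 512) to partition_scheme would report which scheme the drive uses.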

Posted by James on Mar 21, 2014

Database tuning: Triggers & materialized views

Had fun today at work tuning a Postgres database that has gotten very slow over the years as it has accumulated many gigabytes of data. The app that uses it is frequently rendered inoperable by just one or two users visiting its home page, which is obviously a bad situation. (Luckily, it’s an in-house tool used almost exclusively by a single user, which is why it hasn’t received more love until now.)

Over the past week, I’ve been recording the slowest queries, and today I started attacking them. The easiest to fix were those caused by missing indexes. Another problem I found was unnecessary overhead from two compound indexes that were indexing the same two columns in opposite orders; I turned one into a single-column index, which should produce similar read performance and better write performance.
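The reasoning behind the index change: a B-tree index on (a, b) can answer lookups on a alone, because a is its leftmost column, but not lookups on b alone. So a second index on (b, a) only adds value for b-only queries, and a plain index on b covers those more cheaply. The sketch below uses SQLite (via Python’s built-in sqlite3) as a stand-in for Postgres, which applies the same leftmost-column rule to multicolumn B-tree indexes; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, item_id INTEGER)")
# Compound index: serves WHERE user_id = ? and WHERE user_id = ? AND item_id = ?.
conn.execute("CREATE INDEX idx_user_item ON events (user_id, item_id)")
# Replaces a second compound index on (item_id, user_id): only item_id-only lookups
# still need help, and a single-column index is smaller and cheaper to keep updated.
conn.execute("CREATE INDEX idx_item ON events (item_id)")


def plan(sql, params=()):
    """Return the detail text (the last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


for query in (
    "SELECT * FROM events WHERE user_id = ?",
    "SELECT * FROM events WHERE item_id = ?",
):
    print(query, "->", plan(query, (1,)))
```

The printed plans should show the user_id query searching idx_user_item and the item_id query searching idx_item; in Postgres, EXPLAIN gives the equivalent view.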

A third fix I proposed was adding a column that stores the calculated value of md5(email). Some queries have been doing full-table searches on md5(email). I don’t understand why that’s necessary, but calculating md5() for every row in the table and then scanning the whole table is clearly inefficient. So I created a named function that calculates md5(email) and a trigger that calls the function whenever a table record is added or modified. Doing this at the database layer makes sense because Rails doesn’t need to know anything about md5(email).
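Here is a minimal sketch of the same idea, again using SQLite from Python as a stand-in rather than the actual Postgres code. SQLite has no built-in md5(), so the sketch registers one with the same lowercase-hex output as Postgres’s md5(). SQLite triggers also can’t assign to NEW, so these fill the column with an UPDATE after the write; in Postgres you would more typically write a PL/pgSQL function that sets NEW.email_md5 := md5(NEW.email) and attach it with a BEFORE INSERT OR UPDATE trigger. The users table, email_md5 column, and trigger names are hypothetical.

```python
import hashlib
import sqlite3


def md5_hex(text):
    """Same output format as Postgres md5(): lowercase hex digest of the text."""
    return None if text is None else hashlib.md5(text.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.create_function("md5", 1, md5_hex)  # make md5() callable from SQL and triggers

conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL,
    email_md5 TEXT                -- precomputed md5(email), kept current by the triggers
);
CREATE INDEX idx_users_email_md5 ON users (email_md5);

CREATE TRIGGER users_email_md5_insert AFTER INSERT ON users
BEGIN
    UPDATE users SET email_md5 = md5(NEW.email) WHERE id = NEW.id;
END;

-- Fires only when email itself changes, so the inner UPDATE does not re-trigger it.
CREATE TRIGGER users_email_md5_update AFTER UPDATE OF email ON users
BEGIN
    UPDATE users SET email_md5 = md5(NEW.email) WHERE id = NEW.id;
END;
""")


def find_by_email_hash(digest):
    """Indexed lookup instead of computing md5(email) for every row."""
    return conn.execute("SELECT id, email FROM users WHERE email_md5 = ?", (digest,)).fetchall()
```

The application never computes the hash itself; it just queries email_md5, and the database keeps that column in sync on every insert and email change.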

I also created my first Postgres materialized view today. Another query occasionally takes 40+ seconds on our server, although it normally runs orders of magnitude faster, so I’m not sure what causes such long delays. But it’s doing a join that involves calculating a count on a large table. My first thought was to add a counter cache, but that didn’t make sense once I looked at the table layout. I instead made a materialized view, which worked well on my static copy of the production database. But when I read the Postgres documentation, I discovered two flaws in Postgres 9.3’s materialized view implementation: 1) refreshing the materialized view is a manual process (REFRESH MATERIALIZED VIEW); and 2) refreshing it takes an exclusive lock on the view, so queries against it are blocked until the refresh finishes. So I’m not sure it’s worth pushing to production, but I’m glad to read that Postgres devs are already working to improve the implementation of materialized views.
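A materialized view is essentially a stored query result that you recompute on demand; in Postgres 9.3 that is CREATE MATERIALIZED VIEW name AS SELECT ... followed later by REFRESH MATERIALIZED VIEW name. SQLite has no materialized views, so the sketch below emulates one with an ordinary table and a manual refresh function, purely to show the stale-until-refreshed behavior. The authors, posts, and author_post_counts tables are hypothetical stand-ins for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER REFERENCES authors(id));
-- Stand-in for a materialized view: a real table holding the stored query result.
CREATE TABLE author_post_counts (author_id INTEGER PRIMARY KEY, post_count INTEGER NOT NULL);
""")


def refresh_author_post_counts():
    """Recompute the stored counts, like REFRESH MATERIALIZED VIEW; nothing calls this automatically."""
    with conn:  # one transaction: the delete and the re-insert commit together
        conn.execute("DELETE FROM author_post_counts")
        conn.execute("""
            INSERT INTO author_post_counts (author_id, post_count)
            SELECT a.id, COUNT(p.id)
            FROM authors a LEFT JOIN posts p ON p.author_id = a.id
            GROUP BY a.id
        """)


def post_count(author_id):
    """Cheap read from the precomputed table instead of counting on every request."""
    row = conn.execute(
        "SELECT post_count FROM author_post_counts WHERE author_id = ?", (author_id,)
    ).fetchone()
    return None if row is None else row[0]
```

Reads are fast because the expensive count happens only at refresh time, but new posts don’t show up in post_count until someone refreshes, which is the manual-refresh trade-off described above.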

Posted by James on Mar 21, 2014