A couple of weeks ago, I ran into a peculiar problem that I think might be useful to talk about. It took me a bunch of googling and hacking around to find a solution — hopefully it will save someone else some time in the future.
Here’s a simplistic version of the problem. Let’s say we want to store and display user profile information in our web application. The user profile has a list of locations they’ve lived in the past. A location comprises a city and country. In JSON notation, it would look like this:
When Ozzy Osbourne logs into the service, we’d like to show this information in a pretty manner. Let’s look at what’s needed to build this app. You can refer to my earlier post on the technologies I’ve chosen for my application, but very quickly, I use:
- PostgreSQL as my DB (in Heroku)
- Dropwizard to build my service
- JDBI to connect to the DB
- Jackson for going from Java objects to JSON and back
- Angular App that can consume JSON
The same user profile information lives in three forms (wire, memory and persistent) and we need a way to move between these representations as shown.
We’ve already discussed the JSON format. The most convenient Java form was an User object with a member variable capturing a list of Location objects.
The nice thing is that Jackson can automatically convert from our JSON format to Java objects as shown above. The problematic part was mapping the Java object to the DB.
When storing user profile in a database, we have two choices:
- Store location information inline with a user record
- Store location in a separate table with foreign key relationships
Let’s evaluate option 2 first. We can create an additional locations table. Where do we put the foreign key? The problem here is that location is a first class object — there is only one Birmingham, England. How do you capture the fact that multiple users are from Birmingham, England? The denormalized way of doing this is to use a third table as shown here.
Every time we want to retrieve the user profile, the DB would need to execute a three-way join! Seems kind of heavy weight. There are situations where this denormalized form is useful — but for our simple application, this is heavy-weight. In my application, this pattern was recurring often and implementing denormalized version wasn’t efficient.
Let’s now look at option 1. Most users would’ve lived in 1 or 2 locations and so it is likely to be efficient to store the location information in a field in the users table. I went down this path, but found that this path was a little rocky as well.
JDBI doesn’t have great support out of the box for nested objects. It is provides hooks to be able to implement the to and fro from Java objects to DB rows. To go from objects to DB, we need to implement a BinderFactory and to go from DB resultsets to objects, we need to implement a ResultSetMapperFactory. This isn’t well documented in the JDBI docs, but I found this github project very instructive.
In a nutshell, I map simple Java types to JDBC data types and non-simple types are converted into a JSON string using Jackson (I handle DateTime type in a special manner). One other thing I do is I convert camel case Java field names to lower underscore db field names. E.g. createdAt in Java becomes created_at in DB.
Thus, the list of locations gets serialized into a string and gets mapped to a varchar type in the DB. I simply do the inverse while going from DB to Java object. One thing I will investigate at some point in the future is to map these nested objects to JSON data type in PostgreSQL. If someone is interested in getting their hands on the actual code, please leave a comment and I’ll take the effort of putting it on github :)
One fundamental problem with storing the nested object as varchars is that its not easily queryable. If you’re interested in getting a list of users from Birmingham, this becomes problematic. PostgreSQL has functions that can convert varchar to JSON and apply JSON functions to it. This becomes a bruteforce search and the DB can’t employ indexes. The de-normalized form works better if this is a common pattern. In my case, it wasn’t.
As an aside I was curious if other ORM frameworks provide a better out of the box solution and I haven’t been able to find any. Hibernate would force you to implement option 2. Python Django also supports option 2 with custom serializers. Rails ActiveRecord also supports option 2 using nested attributes. Please let me know if I’ve missed anything.
Hope this post was useful. I definitely appreciate feedback / comments. Please connect with me on twitter and say hi.