We're incredibly excited to announce that Akiban has been acquired by FoundationDB. Together we are closing the gap between SQL and NoSQL in our quest to disrupt the database market.

Read more in our blog post →

Posted by .

This post introduces a fundamental concept here at Akiban: hKey’s. hKeys are the building block that will facilitate later posts on our operator framework and how to interpret Akiban explain plans.

First off, you will be wondering about the H in hKey. The H is shorthand for hierarchical and refers to how data is organized within the Akiban Server. The H organization of data interleaves rows according to group structure. Consider first the Customer, Orders, and Items schema that we know and love. In Akiban, we would create a Customer group based on this relationship. As mentioned in previous posts, this means the Akiban Server will actually store Customers, Orders, and Items interleaved; basically pre-joined. In this post I will explain how this ordering is realized within the Akiban Server using hKey’s. hKey’s are defined so that ordering by hKey produces the desired interleaving of rows.

For example, the root table in this group, Customer, has an hKey [C, [cid]], where C identifies the Customer table and cid is the value of the cid (PK) column for the row. Similarly, the Order hKey is [C, [cid], O, [oid]]. Finally, the hKey of Item is [C, [cid, O, [oid], I, [iid]]. In general, a table’s hKey includes the parent’s hKey as a prefix, guaranteeing that parents are ordered before their children. In practice, the H organization of data is implemented using a java B-Tree product named Persistit developed internally at Akiban in which the key is a hKey, and the value is the entire row.

Let’s now consider how this works with some sample data. Consider the following hKey’s and the actual rows corresponding to a hKey:

hKey table row
C1 Customer 1, ‘padraig’
C1O10 Order 10, 1, ‘08/31/2011’
C1O10I101 Item 101, 10, $10.00, ‘Astral Weeks’
C1O10I102 Item 102, 10, $10.00, ‘Moondance’
C2 Customer 2, ‘sarah’
C2O13 Order 13, 2, ‘09/06/2011’
C2O13I103 Item 103, 13, $10.00, ‘Abba’
C2O13I104 Item 104, 13, $59.99, ‘Madden NFL 12’
C3 Customer 3, ‘jeffrey’
C3O14 Order 14, 3, ‘09/07/2011’
C3O14I105 Item 105, 14, $10.00, ‘Dead Flowers’
C3O14I106 Item 106, 14, $10.00, ‘The Man In Me’

Now if we want to find all orders and items for customer 2, we can simply perform a scan of the B-Tree structure starting right after the [C, 2] hKey and finishing with the [C, 3] hKey.

As you may have noticed in the previous example, a hKey typically takes the form:

TABLE KEY_VALUE TABLE KEY_VALUE TABLE KEY_VALUE

In practice, we don’t use a letter to represent a table in the hKey; instead we use the concept of a table ordinal. A table ordinal is simply an integer that corresponds to a table within a group. Table ordinals are assigned incrementally when groups are created. In the COI example just discussed, the table ordinals assigned would be:

  • Customer – 1
  • Order – 2
  • Item – 3

Thus, the hKey for a row in this group would in practice look like [1, [cid], 2, [oid], 3, [iid]].

Now let’s expand the COI example a step further to demonstrate a different group structure. Consider the COI schema again but this time Customer’s have an Address associated with them. The group structure would look as follows:

In this case, you can see there are 2 branches in the group structure. In order to ensure rows are interleaved according to group structure, the table ordinals are:

  • Customer – 1
  • Order – 2
  • Item -3
  • Address – 4

 

Thus, hKey’s for this group will look like: [1, [cid], 2, [oid], 3, [iid]] and [1, [cid], 4, [aid]].

Now I’d like to use a real dataset to demonstrate the concept of hKey’s further. Consider the MusicBrainz schema which I discussed in a previous post. In that post, I showed that there are 8 core groups in Akiban for the entire schema. For this article, I’m going to just use one of those groups – the label group. Graphically, the label group looks as shown below.

You can see this group is similar in structure to the classic COI example discussed at the beginning of this article. However, the group is much larger with 12 tables at the leaf level of the group. The table ordinals assigned for this group within Akiban are:

  • label_name – 1
  • label – 2
  • l_label_label – 3
  • l_label_recording – 4
  • l_label_release – 5
  • l_label_release_group – 6
  • l_label_url – 7
  • l_label_work – 8
  • label_annotation – 9
  • label_gid_redirect – 10
  • label_meta – 11
  • label_rating_raw – 12
  • label_tag – 13
  • label_tag_raw – 14

I loaded the data provided by MusicBrainz into an Akiban database using the tools outlined in my previous article. Now I’d like to consider the data for 3 labels: 1) Component Records, 2) Toast Hawaii, and 3) Arab Sheep. These are just 3 of the 59291 labels in the MusicBrainz dataset. I want to use these labels to show what the hKey’s look like for a real dataset after it is loaded into Akiban. The hKey’s and actual data for these labels looks as shown in the table below.

 

hKey row
1, 12 12, “Component Records”
1, 12, 2, 5155 5155, “5d6345a3-22d8-4a3a-91f2-464693258be3″, 12, 12, 1999, NULL, NULL, NULL, NULL, NULL, 4, 222, NULL, NULL, 0, “2011-09-06 14:49:10″
1, 12, 2, 5155, 7, 10563 10563, 27037, 5155, 213321, 0, “2011-05-16 16:31:52″
1, 12, 2, 5155, 7, 10564 10564, 27063, 5155, 213322, 0, “2011-05-16 16:31:52″
1, 12, 2, 5155, 11 5155, NULL, NULL
1, 13 13, “Toast Hawaii”
1, 13, 2, 2123 2123, “f7698aed-8726-495b-9d71-589977c3d27b”, 13, 13, 2002, NULL, NULL, NULL, NULL, NULL, 4, 221, NULL, NULL, 0, “2011-09-06 14:49:10″
1, 13, 2, 2123, 7, 12601 12601, 27035, 2123, 202711, 0, “2011-05-16 16:31:52″
1, 13, 2, 2123, 7, 15560 15560, 27037, 2123, 192804, 0, “2011-05-16 16:31:52″
1, 13, 2, 2123, 9, 102095 2123, 102095
1, 13, 2, 2123, 11 2123, NULL, NULL
1, 14 14, “Arab Sheep”
1, 14, 2, 35495 35495, “bf26495a-02d2-4bed-85c8-1d1f2d3e85f9″, 14, 14, NULL, NULL, NULL, NULL, NULL, 4, 105, NULL, NULL, 0, “2009-04-11 13:57:17″
1, 14, 2, 35495, 7, 30709 30709, 27038, 35495, 918142, 0, “2011-06-20 02:00:05″
1, 14, 2, 35495, 11 35495, NULL, NULL

 

Now, I want to try and display visually what our B-Tree looks like for these 3 labels.

In practice, the B-Tree has a height of 3 when the data is loaded. However, to give a better graphical representation I limited the internal nodes within the B-Tree to only contain 3 pointers resulting in a B-Tree that is much higher. The diagram above only shows a subset of the B-Tree for the 3 labels I’ve already mentioned and does not show the root or all leaf nodes. Also worth noting about the diagram below is that while the leaf nodes only show the hKey within them, they also actually contain the data for that row (sometimes referred to as a clustered index in other database products).

In summary, hKey’s result in a number of benefits for our Akiban Server including:

  • they provide natural clustering for group objects once the hKey’s are assigned
  • each row is still independent and can be accessed completely independent
  • allows us to process streams of interleaved rows (which will be explained in my next post on our operator framework)

That’s it for this post on hKey’s. I hope it provided some useful information on Akiban works at a deeper level. Feel free to ask any questions or request further information in the comments. In my next post, I’ll discuss our operator framework and then in future posts dive in on what each individual operator does and how to read Akiban explain plans.