How (and why?) to order maps in Go.

How (and why?) to order maps in Go.
Photo by Aram Ramazyan / Unsplash

A while ago I've created a tiny library called go-omap (Go Ordered Map) and back then I had not project to use it immediately. I have encountered such cases earlier in time, but had no code to show.

GitHub - eli-l/go-omap: Ordered Map for Go (Golang).
Ordered Map for Go (Golang). . Contribute to eli-l/go-omap development by creating an account on GitHub.

Storing collections in Go

I bet you know sorting on database level is useful and often the fastest way to get entries in the specific order. Sometimes it is OK to sort the results. But no matter how you sort - you want to keep the sorted results the way they are as it really matters. It keeps information easy to understand and perceive for users.

In Go we have a slice (wrapper over array) and a map. Arrays keeps the order, but accessing the specific element takes O(n) time, where n is the length of the array.

On the other hand, we have are dealing with O(1) (map being a hash map) for inserts, lookup and delete. Running iterator over a map, however, does not provide any guarantees regarding the order.

Why to chase fast lookup

When dealing with Go we often prefer to create custom logic that takes more time to implement but less time to execute. Dealing with complex database entities, especially with many relations, usually requires some effort to optimise it.

Imagine the case, when there is a User, Article and Tag entities. User may have many Articles, each Article may have many Tags. We want to find out the top 3 popular Tags for each User.

Simple to implement, slow to execute

The simplest solution would be to:

  • get all Users
  • iterate over Users, getting Articles for the User
  • iterating over Articles to get all related Tags

This may be OK for fast prototyping, but this solution would not scale. Imagine 10 users having 10 Articles and Each Article having 3 tags attached. 10 * 10 * 3 = 300 queries. Your database won't be happy about it, and if there is a DBA in your team you might even feel the real pain of such a solution.

More optimal solution

A better solution would be:

  • get all Users
  • get Tags for all the Articles for Users (from previous step)

This approach provides all the data with just 2 db queries. But we want to show top 3 Tags for each user, remember? So getting all the articles and Tags is not enough.

This is where map becomes useful.

Data pools

We have 2 data pools so far: Users, Articles + Tags.

For simplicity let's use Integers as IDs (primary keys). So we create a map of the Users, where key is ID. We can access elements in a descriptive and straightforward way as soon as we know the ID.

Without getting deeper into details, let's assume we are not interested in any Article details, except the UserID (foreign key - FK) it belongs to, but it's pretty straightforward to JOIN Tags and keep the FK. At the same time we are advanced enough to sort Tags by it's popularity for each User.

Getting the end results

Now it's pretty straightforward: we iterate over Articles+Tags results, read the UserID it belongs to and propagate the proper field of User.

type User struct {
    ID: int
    Name: string,
    Tags: []string
}

type Tag struct {
    UserID: int
    Name: string
}


..
users := map[int]User{} 
// put users from DB results here
..

for _, tag := range tags {
    tags = users[tag.UserID].Tags
    tags = append(tgs, tag.Name)
    users[tag.UserID].Tags = tags
}

Looks pretty simple, huh? It is simple, until you return users. You will find out during 5-10 reloads Users changes the order. But we want to display Users alphabetically! Here is exactly where the problem occurs.

In order to keep Users in the proper order, we need to preserve the key order by keeping it in the array:

userOrder := []int
users := map[int]User{} 

for _, user := range userResults {
    users[user.ID] = user
    userOrder = append(userOrder, user.ID) 
}

results := []User{}

for _, uid := range userOrder {
    result = append(result, users[uid])
}

return results

As you see we have to introduce array, that keeps the keys of the map in the original (sorted) order and we have to return the array of Users, rather than a map.

Why don't use array (slice) only? Searching for the element takes O(n) times, so we might need to iterate over entire array to find the proper User, then iterate once again to put data back. This would lead to running to many iterations to enrich the original results with additional data (Tags in this example).

Summary

Ordered maps hides the ordered array within the omap structure. It can't provide exactly the same way of accessing the elements, so I had to introduce Add and Get methods. Sorting can be also done, using the value, or a simplified version will return keys for sorting and save new order to the sorted array, without touching the map. I must mention I won't die for this API, so let me know if think there should be more of it or there is another useful approach for sorting.

And the best thing: omap can be iterate with range, just like map or array, using Go iterators starting from 1.23+ (or with experimental flag in 1.22.*).

Obviously we love Go for it being a type-safe. Utilising generics, that being introduced a while ago makes all this type safe.

Code example

func getParameters(userID uint, time time.Time) []param{} // returns DB query results

...

parameters := getParameters(user.ID, parsed.Add(24*time.Hour))

var paramNames []string
for _, param := range parameters {
    if param.Rn == 1 {
        paramNames = append(paramNames, param.Name)
        current[param.Name] = param.Value
        units[param.Name] = param.Unit
    } else {
        old[param.Name] = param.Value
    }
}

sort.Strings(paramNames)

data := omap.NewOrderedMap[string, propertyValues]()
for _, k := range paramNames {
    // unrelated code

    data.Set(k, propertyValues{Current: val, Diff: diff, Unit: units[k]})
}

// Using the iterator
for key, value := range details.Iterator {
    line := fmt.Sprintf("%s: %v", key, value.Current)
    println(line)
}