
Metadata and You.

Published 14 November 2021 at Yours, Kewbish. 2,195 words. Subscribe via RSS.


Recently, I’ve started pondering the tools-for-thought phenomenon again, but perhaps in a more meta way. From what I’ve noticed on Twitter (at least for the short amount of time that I’ve been checking content from it), there’s been an explosion in popularity over the past couple of years of apps like Roam Research, Obsidian, Logseq, Notion, Workflowy, Anki, and the myriad of other apps that solve a major pain point for a lot of people: the management, and meta-management, of knowledge. Here, I’d like to focus on the meta-management aspect a bit. I’ve noticed that a lot of the tools that are rapidly gaining popularity and (appear to be) pushing the forefront of human-computer interfaces share a trait: they’re all, in essence, ways to manipulate metadata.

Metadata and knowledge can sometimes blend together, likely a result of metadata-deficient systems where a majority of the knowledge was spent on cross-labelling itself in lieu of proper metadata. When I reference metadata, I’m talking about not the juicy knowledge itself, but the trail of breadcrumbs leading to it - the labels, the categorizations, and the aliases. Things like ‘time last modified’, ‘time created’1, and other custom labels all fall under the umbrella of metadata. There’re also actions associated with that metadata, like advanced regex-based searching, filtering by tag or label, and viewing alternate ‘slices’ of data. The apps I’ve mentioned above have all extended, or significantly innovated on, features connected to metadata. Roam, likely one of the more popular examples, has its famous tag and linking system. Click on a little hashtagged bit of context, and you’re teleported to a page full of other mentions of the same topic - the very alternate ‘slicing’ of data I’ve just mentioned.
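To make the ‘slicing’ idea a bit more concrete, here’s a minimal Python sketch of a tag index. Everything in it (the `notes` list, the field names, `build_tag_index`) is made up for illustration, and it’s not how Roam or any of the other apps actually store things - it’s just the smallest version of ‘click a tag, see every mention’ I could think of.

```python
from collections import defaultdict

# Hypothetical notes: the text is the knowledge, the tags are the metadata.
notes = [
    {"text": "Roam links pages through hashtags.", "tags": {"roam", "linking"}},
    {"text": "Workflowy nests bullets infinitely.", "tags": {"workflowy", "hierarchy"}},
    {"text": "Backlinks surface related pages.", "tags": {"linking"}},
]


def build_tag_index(notes):
    """Map each tag to the texts of the notes that carry it."""
    index = defaultdict(list)
    for note in notes:
        for tag in note["tags"]:
            index[tag].append(note["text"])
    return index


index = build_tag_index(notes)

# 'Clicking' the linking tag gives an alternate slice of the same notes.
linking_slice = index["linking"]
```

The notes themselves never change here: the index is a second view over the same data, built entirely from the labels.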

Metadata is the apparent solution to the categorization problem, and it appeals powerfully to the part of the human brain that craves neat little boxes. The issue with software is that it’s generally not hyperpersonalizable. Settings can’t be tweaked in every single little way that all the different workflows and people using the app demand - core feature development would grind to a standstill, and devs would spend more time implementing personalizations than concrete tools. But people still want to create their own little classifications, and make the app theirs. Metadata here is the fix - at least, for most people. Some users might feel overwhelmed (look at the hours of ‘get started with Notion’ videos and the pages of questions about certain tweaks), but on average, I’d say that most people approve of this ability. Less personalization-focused people can simply overlook the metadata, or implement just the core basics. Power users, on the other hand, can use custom fields and metadata to their heart’s content - just look at the magic people pull off with Notion databases.

I’ve been thinking a lot recently about tools for thought, and their first principles. What are the majority of lauded apps rooted in? What trends and tendencies is society looking for in popular apps? In this post, I’d like to share some of my realizations (well, that’s a strong word) about the space and the role metadata plays in our knowledge interactions.

Manipulating the Meta

Apps do some combination of three key things with metadata: write it, connect it, and allow it to be synthesized, searched, and molded. For example, take something like Trello. You can tag and label your to-dos (writing metadata), link them to one another across boards (connecting it), and have high-level overviews and statistics about your project (synthesis). This last bit about super-synthesis is important, as it creates a new level of meta. With that, there’s a possibility to invest in linking up metadata about the metadata, and that’s where the potential for new tools and new innovations lies.

In a sense, the true innovation factor of these popular apps lies in the meta-levels that they provide. Roam and Notion and whatnot aren’t the power tools they are because of some revolutionary new way to interact with data itself: it’s the refreshing levels of interaction they offer with the metadata that boost the quality of life when working with the knowledge and data itself. Most tools offer a good baseline of data management, and some level of interconnectedness and tagging and such, but when software offers a new way to interact with the metadata, that’s when you get interesting workflows and environments. Roam, Notion, and other apps with a block or bullet model appeal to people because of their fine-grained connections, something that was lacking in the common sterilized, document-driven approach.

A good example of this is what inspired me to write this whole blog post: the Discourse Graph extension available in Roam. I don’t use Roam, nor plan to daily drive it in the near future, but there’s something about this extension in particular that really tempted me to switch. It works with your existing Roam graph to provide a formalized note model, composed of questions, evidence, claims, conclusions, and all the connections in between. It’s well suited for a research ecosystem, with tools to denote that this note ‘supports’ or ‘refutes’ or ‘weakly correlates’ with another. This in itself is a powerful metadata model - the codified structure takes away decision paralysis, and creates a very regimented and logical order of graphs. However, where this extension goes above and beyond is its playground feature, where you can query your model for ‘all evidence that supports X and refutes Y’, for example, and find ‘all claims that connect to such and such question’. That’s a truly new model of interaction with metadata, and a great example of how having a structure and tools to manipulate metadata can lead to connecting together key insights in your knowledge.
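I haven’t looked at how the extension actually implements its queries, so take this as a rough sketch of the idea rather than its real data model or API. It assumes notes are typed nodes and relations are `(source, relation, target)` triples; the ids, the `related` helper, and the relation names are all hypothetical.

```python
# Hypothetical typed notes: id -> kind of note.
nodes = {
    "E1": "evidence",
    "E2": "evidence",
    "E3": "evidence",
    "C1": "claim",
    "C2": "claim",
}

# Hypothetical typed relations, each stored as (source, relation, target).
edges = [
    ("E1", "supports", "C1"),
    ("E1", "refutes", "C2"),
    ("E2", "supports", "C1"),
    ("E3", "refutes", "C2"),
]


def related(relation, target, kind="evidence"):
    """Return ids of notes of the given kind that have `relation` to `target`."""
    return {
        source
        for source, rel, tgt in edges
        if rel == relation and tgt == target and nodes[source] == kind
    }


# 'All evidence that supports C1 and refutes C2' becomes a set intersection.
both = related("supports", "C1") & related("refutes", "C2")
```

Once the relations are typed, a question like ‘supports X and refutes Y’ stops being a manual hunt through pages and turns into combining two filtered sets.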

Data Dimensions

Besides having tools built into software to analyze metadata, it’s also important to allow users to implement their metadata to fit their workflow. People’ll always want more dimensions to categorize and subcategorize and fit all their data into - I remember when I was trying to implement Trello into my workflow, I was pretty frustrated at the lack of sub-bullets and sub-tasks, and had to make do with a very scuffed checklist model instead. The same goes for nesting - Workflowy’s popular for a lot of reasons, but one is the infinite hierarchy that it provides. That infinity is also something interesting to note: as long as something’s not unlimited (to a reasonable depth), a subset of users will feel like they’re hitting a wall with what the software can help them do. Whatever depth of metadata is available, unless unlimited, will feel like a constraint to at least some people, if not a significant portion. Unlimited depth creates essentially infinite ‘axes of belonging’, or categories with which to split data up. Axes of belonging are like possible views and queries into the data, and if the number of axes can be maximized, having all these new variables and possibilities makes software ostensibly more powerful for those power users.

But there’s still a fine balance to strike between too many dimensions and axes of data, which gets too complicated, and too few. Excel, arguably one of the original tools for thought and innovators in the metadata space, does this quite well. The spreadsheet and table interface is immediately intuitive for many, but hidden deep in menus and tabs that the average user will never explore are the gems that probably make someone’s workflow click. There are infinite ways to associate metadata to a cell through other cells or formatting or conditional formulas, and an infinite canvas full of columns and rows to stack all your data in. This is, I think, how people think connections work best - through categories, expressed here in table associations. Maybe it’s not the best and most efficient way to work, because sometimes constraints are what you need, after all. But with its many metadata dimensions and built-in ways to explore data, Excel feels so open, and that’s a trait in software that I think people are almost universally drawn to.

Standardizing Might (Will?) Fail

So far, I’ve discussed the two main elements of what makes metadata integration so appealing. However, to get multiple people on the same page, all benefiting from the same metadata innovations, tools will need to have some shared standard, or some sort of schema. Schemas are inherently limiting though, unless they’re explicitly un-limiting. But in that case, the point of standardization and extension is a bit moot. To some extent, I think you can get schemas to be flexible and agree, but there’s a fine line between too much structure and the sort of spec that goes ‘literally any field can be literally anything, go have fun’. How are new features proposed and added across so many apps? Will a central standard have to be patched? How can we make it backwards compatible without ‘de-dimensionalizing’ or flattening metadata?

There’s a relevant XKCD, as there always is, but there are also a couple of relevant tools that might work to bridge the gap between different apps. Cambria by Geoffrey Litt (and Peter van Hardenberg and Orion Henry) is a tool to move data between schemas. In essence, it’s a Google Translate for tools for thought, where ‘lenses’ can be defined to rename fields, convert between datatypes, and extract certain labels. It partially solves the issue of differing standards, as apps can define ways to translate and transmute metadata between each other. This does assume that software companies are willing to cooperate with each other (a stretch), or that there’ll be dedicated users who drive this system forward (definitely more likely). Something like this doesn’t solve the issue of how rich interactions and media based on metadata that’s not quite supported should resolve, but it’s a great step to linking all these tools and towards a more metadata-friendly future2.
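Here’s a toy version of the lens idea in Python, just to show the shape of it. To be clear, this is not Cambria’s actual API or lens format: `rename`, `convert`, and `apply_lens` are names I’ve made up, and the two ‘apps’ with their `labels` and `tags` fields are hypothetical. It only goes one way, from app A’s format to app B’s.

```python
def rename(old, new):
    """Lens step: move the value stored under `old` to the key `new`."""
    def step(record):
        out = dict(record)  # copy, so the original record is left untouched
        if old in out:
            out[new] = out.pop(old)
        return out
    return step


def convert(field, fn):
    """Lens step: apply `fn` to the value of `field`, if the field is present."""
    def step(record):
        out = dict(record)
        if field in out:
            out[field] = fn(out[field])
        return out
    return step


def apply_lens(lens, record):
    """Run each step of the lens over the record, in order."""
    for step in lens:
        record = step(record)
    return record


# Hypothetical: app A stores a comma-separated 'labels' string,
# while app B expects a 'tags' list.
a_to_b = [
    rename("labels", "tags"),
    convert("tags", lambda s: [t.strip() for t in s.split(",")]),
]

note_a = {"title": "Metadata and You", "labels": "meta, tools-for-thought"}
note_b = apply_lens(a_to_b, note_a)
```

The appeal is that the translation lives in a small, shareable list of steps rather than inside either app, which is roughly why dedicated users could plausibly maintain these without the companies cooperating.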


For software being developed today, I think there are some decently clear trends in users becoming more tech-savvy, and in power users craving even more functionality. In the new field of knowledge work, maximum efficiency is glamorized and lauded. While that has its own caveats that I won’t get into now, it’s key to consider that, in general, people seem to be trending towards wanting more options and more connectivity through the meta. It’s taken me this long to realize that I’ve essentially written an SEO boost for the company-that-must-not-be-named, but oh well. We’ll roll with it, because that appears to be what the zeitgeist is moving towards: a future of connectivity and linking3. Maybe in this case, I’m thinking of it more in terms of knowledge management than magical VR metaverses, but the point holds.

In other news, I’ve been up to more work with Full Stack Open, and I’ve also tried to start the CSES problem set while bodging my way through C++. It’s been a while since I’ve felt this beginner - it took me a solid hour and a half to write a basic Collatz sequence simulator. Granted, most of that was looking up syntax and tweaking a basic Vim setup, but it was certainly a knock to my Python tendencies. CSES is kind of one of those things that’s good for you, and somewhat fun, but only in small doses, not when trying to grind certain amounts of problems per session or whatever. It’s been a good brainteaser so far, so I think I might enjoy (very) small amounts of it in the future. I also feel like I’m starting to spread myself too thin - I’ve started doing some stuff at the UBC CSSS as well. I’m halfway certain it’s something to do with being immersed in an entirely new world of uni, and surrounding myself with people who all seem much more experienced and cooler than I am. It’s a good ego check, but it’s also been inspiring me to explore and look into a lot of new things. Halfway a good thing (exploring new areas of CS, doing fun things), halfway a not-so-great thing (occasional self-overwhelm and loads of imposter syndrome), but I think I’ll be able to manage, so we’ll see what happens.

  1. There’s also a lot to say about the fact that the most common, and most baseline, metadata is often that of time. It’s something that’s easy to connect to a physical moment, a decently useful bit of information for most contexts, and straightforward to record in server logs. It’s also a universal constant - people across different cultures can’t interpret time differently, nor can users argue over the best way to tweak it. It’s just time. (More on this maybe in another post soon - I’ve got some ideas brewing that I’d like to investigate further.) ↩︎

  2. It’d be interesting if something like this got centralized / listed into some graph. It’d then be a link full of linked ways to link (and manipulate) links in linking software! (Infinite metadata recursion, and it’s turtles all the way down.) ↩︎

  3. Besides the unintentional reference to Facebook, the title of this post is a cute nod to Thesephist’s project ‘a piano and you’. ↩︎
