The QoS of QS.

Published 25 February 2024 at Yours, Kewbish. 1,372 words.

Introduction

I came across this post by Eric Chiang recently. They downloaded their past ten years of Spotify listening data into a big ZIP and conducted their own version of Spotify Wrapped. They graphed their most listened-to artists, how those artists changed over time, their listening time distribution across the day, and their favourite albums from a particular year. It looks like a fun time capsule and nostalgia trip — the perfect application of some hacky scripting skills.

The post ends off with a call to explore your own data a little more, which got me thinking about the work Chiang did to retrieve all their data, write scripts to process it, and visualize it. All these steps are manual processes: if you want to do something like this, you’d better have some time on your hands and a hankering to wrangle with Matplotlib.
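To give a rough sense of what those scripts involve, here is a minimal sketch of two of the aggregations described above: most listened-to artists and the spread of plays across the day. It is not Chiang's code. The field names (`ts`, `ms_played`, `master_metadata_album_artist_name`) are my assumption about what Spotify's extended streaming history export uses, and the sample records are made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records. The field names are an assumption about
# Spotify's extended streaming history export; adjust them to your files.
SAMPLE_PLAYS = [
    {"ts": "2023-01-05T08:15:00Z", "ms_played": 210000,
     "master_metadata_album_artist_name": "Artist A"},
    {"ts": "2023-01-05T08:40:00Z", "ms_played": 180000,
     "master_metadata_album_artist_name": "Artist B"},
    {"ts": "2023-01-06T22:05:00Z", "ms_played": 240000,
     "master_metadata_album_artist_name": "Artist A"},
    # Rows without an artist (e.g. podcasts) are skipped for artist totals.
    {"ts": "2023-01-06T22:30:00Z", "ms_played": 600000,
     "master_metadata_album_artist_name": None},
    {"ts": "2023-01-07T13:00:00Z", "ms_played": 60000,
     "master_metadata_album_artist_name": "Artist C"},
]


def top_artists(plays, n=3):
    """Return the n most-played artists as (artist, minutes played) pairs."""
    totals = Counter()
    for play in plays:
        artist = play.get("master_metadata_album_artist_name")
        if artist:
            totals[artist] += play["ms_played"]
    return [(artist, ms / 60000) for artist, ms in totals.most_common(n)]


def plays_by_hour(plays):
    """Count plays per hour of day, using the timestamp as given (UTC here)."""
    hours = Counter()
    for play in plays:
        started = datetime.strptime(play["ts"], "%Y-%m-%dT%H:%M:%SZ")
        hours[started.hour] += 1
    return dict(sorted(hours.items()))


artists = top_artists(SAMPLE_PLAYS, n=2)
hourly = plays_by_hour(SAMPLE_PLAYS)
```

With real data, you would load the exported JSON files with `json.load` in place of `SAMPLE_PLAYS`, and hand `hourly` to Matplotlib for a bar chart.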

You should also have some idea of what exactly you want to find or visualize — otherwise, wading through the hundreds of thousands of raw data points without a goal in sight gets overwhelming. Clearly, as seen on the HN discussion, folks do have a lot of ideas for what they’d like to visualize (e.g. finding one-hit wonders or songs that are correlated with some other data dimension like the weather). What’s the barrier, then? Why aren’t we seeing everyone’s unique graphs?

Right now, extracting your data from cloud services like Spotify, YouTube, GoodReads, or others requires a lot of effort. Services like Spotify that provide a one-click download button are already making it easier, compared to having to roll your own scraper, but there’s still activation energy in seeking out your data and taking the time to visualize it. Because the bottleneck is in the data, I think people tend to end up collecting the data for a particular visualization in mind, rather than the other way around.

But I think that’s backwards — to me, it makes more sense to explore your data with visualizations to aid you, unless you have a particular research question in place. For folks who’re just trying to get a sense of their data without a specific goal,

This is a post about navigating your data, the Quantified Self community, and how we can make data spelunking more compelling.

Quantified Self

Quantified Self is a movement broadly centered around tracking. Large swathes of the community are focused on health tracking: one’s weight, body fat percentage, number of calories burned per day. However, other types of tracking are common: usage of time, mood, carbon footprint due to flights taken, and so on.

People interested in self-experimenting (or inspired by the pretty result graphs) tend to be drawn to QS: it presents some order in our lived experiences, and gives us levers to continue improving our lives. I recently wrote about checkpoints, and I think QS data collection is the extreme manifestation of checkpoints. Collecting tens or hundreds of data points on ourselves a day gives us something to look back on and helps visualize the deltas between timesteps.

QS took off with the adoption of IoT devices and trackers, like the popular Fitbit or Apple Watch. These made health data salient and readily available for the first time. With the advent of SaaS companies and their general trend towards walled gardens, though, our data is becoming more inaccessible and opaque.

Quality of Service

I’d say the user experience of quantified self tracking is generally very poor. Take the Spotify example: Chiang had to submit a request on a form probably created solely for compliance, then wait up to thirty days for their data to be processed and returned to them. Then, they had to comb through the data to identify points of interest and write a whole bunch of visualization scripts.

This was for a relatively open data faucet, and a single app at that, though. There are folks who’ve written whole tool libraries to process and collate every aspect of their lives. Karlicoss’s HPI package brings together their browser history, texts, health data, TODOs, and more into a nice Python interface. I particularly like their motivation for the project:

[O]nce the data is available as Python objects, I can easily plug it into existing tools, libraries and frameworks. It makes building new tools considerably easier and opens up new ways of interacting with the data.

They emphasize the ad-hoc, interactive quality that I think the QS experience should have. Our data are artifacts that we created, and I think we should be able to seamlessly pick and choose from what we’ve made to augment and inform what we’ll continue to do.

On the other hand, I understand that making straightforward ways to access raw user data probably isn’t the best use of engineering resources for a business, which would rather be building its core product. Few users are motivated enough to want their data, so to a business a rarely-used ‘Export my data’ form suffices.

If most apps won’t support the extensive data tracking and collation QS relies on, then, what will? The answer, in practice, is a spreadsheet. Julian Lehr wrote about their Airtable-based setup, and Felix Krause wrote about their Telegram bot → SQL database workflow. Both have automated aspects of their data collection based on publicly available APIs, but both also have some sort of manual update habit.

In some sense, bringing all your life’s data into spreadsheets is the opposite extreme of keeping it all locked away in some SaaS: it’s now extremely easy to process and pull into other visualizations or projects, like Karlicoss’s work. The user experience is intuitive, though decidedly less captivating than most aesthetic tracker-specific apps.
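As a small illustration of how little stands between a spreadsheet and the kind of Python objects Karlicoss describes, here is a sketch that reads a CSV export into typed records. The column names (`date`, `mood`, `sleep_hours`) and the values are hypothetical, not taken from anyone's actual setup, and this is not how HPI itself is implemented.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from statistics import mean

# Hypothetical CSV export from a tracking spreadsheet; columns are made up.
CSV_EXPORT = """date,mood,sleep_hours
2024-02-19,6,7.5
2024-02-20,4,5.0
2024-02-21,7,8.0
"""


@dataclass
class DayEntry:
    day: date
    mood: int
    sleep_hours: float


def load_entries(text):
    """Parse CSV text into a list of typed DayEntry objects."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        DayEntry(
            day=date.fromisoformat(row["date"]),
            mood=int(row["mood"]),
            sleep_hours=float(row["sleep_hours"]),
        )
        for row in reader
    ]


entries = load_entries(CSV_EXPORT)
average_mood = mean(entry.mood for entry in entries)
```

Once the rows are objects, filtering, averaging, or plotting them is ordinary Python, which is the appeal of keeping the data in a format this plain.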

In my ideal world, we wouldn’t have to be explicitly collecting all of our own data: it’d export itself, and we’d be able to access it anytime from some central, private store. There’d be visualizations automatically generated from data¹, reminders and ideas for trends that were detected, and more straightforward data connections between apps. But, for now, spreadsheets are a start.

Conclusion

To me, current QS tools feel clunky, clinical, and both overwhelming and underpowered. However, I think reframing QS as an inquisitive and explorative activity can help us design better interfaces and processes for it. As well, viewing it through a lens of nostalgia and rediscovery makes for some interesting ideas. I once built a tool to scan through your notes in a Git repository and resurface notes that you hadn’t edited in a while or reflected on, like spaced repetition for your thoughts. While the data I was working off of was less granular than daily health updates and relied on the heuristics available in Git history, this data was also collected ambiently, unlike today’s QS tooling.
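The resurfacing idea can be sketched in a few lines. This is not the tool I built, only a minimal version of the heuristic: given each note's last-edited date, pick the ones untouched for a while, oldest first. The paths, dates, and the 90-day threshold are hypothetical. In practice the dates might come from something like `git log -1 --format=%cI -- <path>`, but that part is left out here.

```python
from datetime import datetime, timedelta


def stale_notes(last_edited, now, min_age_days=90, limit=3):
    """Return up to `limit` note paths not edited in `min_age_days`, oldest first."""
    cutoff = now - timedelta(days=min_age_days)
    stale = [(path, edited) for path, edited in last_edited.items() if edited < cutoff]
    stale.sort(key=lambda item: item[1])  # oldest edit first
    return [path for path, _ in stale[:limit]]


# Hypothetical notes and last-edit dates, for illustration only.
LAST_EDITED = {
    "notes/checkpoints.md": datetime(2023, 3, 1),
    "notes/metadata.md": datetime(2023, 11, 20),
    "notes/qs.md": datetime(2024, 2, 10),
    "notes/pottery.md": datetime(2022, 8, 15),
}

resurfaced = stale_notes(LAST_EDITED, now=datetime(2024, 2, 25))
```

Passing `now` in explicitly keeps the function deterministic, which makes it easy to check against a known set of dates.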

QS ties into many of my thoughts on metadata and personalizable software, and it’s no surprise to me that the ideas that immediately come to mind revolve around richer interactive data formats and translating between data sources. As I mentioned in the introduction, self-tracking also speaks to me as a form of reflection on how I’ve changed, but in a quantitative format that’s harder to handwave. While I won’t be creating a full public life dashboard anytime soon, I’m inspired by that level of self-understanding and dedication to making some sense of the strange way of being we call life.

I used to create QS-level stats on my writing output on this blog, and in a now-unpublished post, even stats on the reading I’d done over a year. I don’t compute any metrics on my writing anymore, since I think it’d be fairly disappointing to see my few-and-far-between posts. I’d like to change that though: I think lately my perfectionism has been leading me to take this blog a little more seriously than I should. I’d like to start writing smaller, less polished posts more frequently, as opposed to forcing out 2k-word-long vomits whose core theses I get too attached to but can’t seem to express properly. Instead, here’s to the pottery parable for prolific work and Alice Maz’s recent reminder to me that I can just do stuff!

¹ Perhaps in the era of ChatGPT, automatically creating little scripts or visualizations like this might be feasible.
