Exploring the Chordonomicon with Node.js Worker Threads and Jupyter Notebooks

Recently, I stumbled across a Hacker News post discussing a massive dataset called Chordonomicon, which includes chords, key, genre, and other metadata for over 680k songs from Ultimate-Guitar. As someone who loves both music and code, I couldn’t resist digging in.

This turned into a fun project that touched on large data processing, worker threads in Node.js, and even Jupyter notebooks with Deno.

Why Bother?

The post I mentioned focused on keys and chords. I would need to parse those as well, but I specifically wanted to look at chord progressions. I don’t usually work with datasets this large, so processing it efficiently was part of the challenge and part of the learning experience.

“What if I multi-threaded this in TypeScript to make it faster?”

Spoiler: I’m not sure it did make it faster. But it was a great excuse to experiment with worker_threads in a tsx environment.

Setting the Stage: TSX Workers

Here’s a bit of code from my worker.js bootstrap file that made the whole tsx + worker_threads combination possible:

// worker.js
const path = require('path');
const { workerData } = require('worker_threads');

// Enable support for TypeScript in the worker
require('tsx');
require(path.resolve(__dirname, workerData.path));

Why it’s cool: Normally, using TypeScript with Node.js workers is a pain, because the worker entry point is a separate file that has to be runnable on its own. This approach uses tsx to compile TS files on the fly inside each worker thread, so I didn’t need to precompile and link them.

Then I spun up multiple workers to chunk and handle parts of the dataset:

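import { resolve } from 'node:path';
import { Worker } from 'node:worker_threads';
// worker.js (above) loads workerData.path, here worker.ts, through tsx
const worker = new Worker(resolve(__dirname, 'worker.js'), {
  workerData: { path: 'worker.ts', ...data },
});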
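The post does not show how the dataset was divided between workers. As a minimal sketch, assuming the rows are already in memory as an array, a hypothetical helper (splitIntoChunks is my name, not from the project) could hand each worker a contiguous slice of roughly equal size:

```typescript
// Hypothetical helper: split `items` into at most `workerCount` contiguous
// chunks whose sizes differ by at most one.
function splitIntoChunks<T>(items: T[], workerCount: number): T[][] {
  if (!Number.isInteger(workerCount) || workerCount < 1) {
    throw new RangeError('workerCount must be a positive integer');
  }
  const chunks: T[][] = [];
  const base = Math.floor(items.length / workerCount);
  const remainder = items.length % workerCount;
  let start = 0;
  for (let i = 0; i < workerCount && start < items.length; i++) {
    // The first `remainder` chunks take one extra item.
    const size = base + (i < remainder ? 1 : 0);
    chunks.push(items.slice(start, start + size));
    start += size;
  }
  return chunks;
}

// Ten placeholder rows shared among four workers.
const demoRows = Array.from({ length: 10 }, (_, i) => `song-${i}`);
const demoChunks = splitIntoChunks(demoRows, 4);
console.log(demoChunks.map((c) => c.length).join(',')); // 3,3,2,2
```

Each chunk could then be passed along in workerData when its worker is created. Whether that beats a single thread depends on how much time goes into copying data to the workers, which may be part of why the speedup was unclear.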

Normalizing Chords to Roman Numerals

One of the most critical steps in my analysis was translating every chord into a normalized, key-agnostic format: Roman numerals. This is a fundamental concept in music theory because it lets you describe chord functions (tonic, dominant, subdominant) independent of key. A I–IV–V progression in C major (C–F–G) looks the same as one in G major (G–C–D), which makes the notation perfect for large-scale harmonic analysis.

To pull this off, I used the excellent @tonaljs package. This library helps parse chords, detect qualities (major/minor/diminished/etc.), and a myriad of other music theory tasks.
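To make the idea concrete without depending on the library, here is a deliberately simplified, hand-rolled sketch. It is not how tonaljs works internally and not the project's code: it only handles plain major, minor, and diminished triads in a major key, and every name in it is hypothetical.

```typescript
// Simplified stand-in for what a music-theory library does. Handles only
// plain major, minor ("m"/"min") and diminished ("dim") triads.
const PITCH_CLASSES: Record<string, number> = {
  C: 0, 'C#': 1, Db: 1, D: 2, 'D#': 3, Eb: 3, E: 4, F: 5, 'F#': 6,
  Gb: 6, G: 7, 'G#': 8, Ab: 8, A: 9, 'A#': 10, Bb: 10, B: 11,
};

// Numeral for each semitone distance above the tonic (major-key spelling).
const DEGREES = ['I', 'bII', 'II', 'bIII', 'III', 'IV', '#IV', 'V', 'bVI', 'VI', 'bVII', 'VII'];

function toRomanNumeral(keyTonic: string, chord: string): string | null {
  const match = /^([A-G][#b]?)(.*)$/.exec(chord.trim());
  if (!match) return null;
  const [, root, quality] = match;
  const tonicPc = PITCH_CLASSES[keyTonic];
  const rootPc = PITCH_CLASSES[root];
  if (tonicPc === undefined || rootPc === undefined) return null;

  // Distance from the tonic in semitones picks the scale degree.
  const degree = DEGREES[(rootPc - tonicPc + 12) % 12];
  if (quality === '') return degree; // major triad: upper case
  if (quality === 'm' || quality === 'min') return degree.toLowerCase();
  if (quality === 'dim') return degree.toLowerCase() + '°';
  return null; // anything else is out of scope for this sketch
}

console.log(['C', 'F', 'G'].map((c) => toRomanNumeral('C', c)).join('-')); // I-IV-V
console.log(['G', 'C', 'D'].map((c) => toRomanNumeral('G', c)).join('-')); // I-IV-V
console.log(toRomanNumeral('C', 'Am')); // vi
```

Both progressions collapse to the same string, which is what makes counting them across keys possible. Real chord symbols (sevenths, slash chords, enharmonic spellings like E#) need far more care, which is where a library earns its keep.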

After normalizing and cleaning up the raw chord strings, which were often messy or inconsistent, I fed them through tonaljs's getRomanNumeral() to extract structured harmonic data. I then aggregated that data into usable forms: frequency counts by chord type, chord-step pairings, and full progressions.
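The aggregation step can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: I am reading "chord-step pairings" as "which chord appears at which position in a progression", and the function and key formats are made up for the example.

```typescript
type Progression = string[]; // e.g. ['I', 'IV', 'V', 'I']

// Increment a counter stored in a Map.
function bump(counts: Map<string, number>, key: string): void {
  counts.set(key, (counts.get(key) ?? 0) + 1);
}

function aggregate(progressions: Progression[]) {
  const chordCounts = new Map<string, number>(); // 'IV' -> occurrences
  const chordByStep = new Map<string, number>(); // '1:IV' -> occurrences at step 1
  const progressionCounts = new Map<string, number>(); // 'I-IV-V-I' -> occurrences
  for (const progression of progressions) {
    progression.forEach((chord, step) => {
      bump(chordCounts, chord);
      bump(chordByStep, `${step}:${chord}`);
    });
    bump(progressionCounts, progression.join('-'));
  }
  return { chordCounts, chordByStep, progressionCounts };
}

const demoAggregate = aggregate([
  ['I', 'IV', 'V', 'I'],
  ['I', 'V', 'vi'],
  ['I', 'IV', 'V', 'I'],
]);
console.log(demoAggregate.progressionCounts.get('I-IV-V-I')); // 2
console.log(demoAggregate.chordByStep.get('1:IV')); // 2
```

When the work is split across workers, each worker can build its own maps and the main thread can merge them by summing counts per key.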

Since the data was so large, I decided to focus only on the chords from the verses of each song.
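Pulling out just the verses might look like the sketch below. The input format is an assumption on my part, since the post does not describe it: I am assuming each song's chords arrive as one string with section tags such as <verse_1> or <chorus_1> in front of the chords they cover. The function name is hypothetical.

```typescript
// Assumed input format (not described in the post): section tags such as
// <verse_1> or <chorus_1>, each followed by space-separated chord symbols.
function extractVerseChords(tagged: string): string[][] {
  const verses: string[][] = [];
  let current: string[] | null = null; // chords of the verse being read, if any
  for (const token of tagged.trim().split(/\s+/)) {
    const tag = /^<([^_>]+)(?:_\d+)?>$/.exec(token);
    if (tag) {
      // A new section starts: collect it only when it is a verse.
      if (tag[1].toLowerCase() === 'verse') {
        const verse: string[] = [];
        verses.push(verse);
        current = verse;
      } else {
        current = null;
      }
    } else if (current && token !== '') {
      current.push(token);
    }
  }
  return verses;
}

const demoSong = '<intro_1> C <verse_1> F C E7 Amin <chorus_1> G C <verse_2> F C';
console.log(JSON.stringify(extractVerseChords(demoSong)));
// [["F","C","E7","Amin"],["F","C"]]
```

If the real data marks sections differently, only the tag pattern should need to change.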

Analysis with Deno and Jupyter

Instead of writing a typical web app or running a CLI script, I decided to do something different: I used a Jupyter notebook powered by a Deno kernel to explore the processed data.

This was my first time using a notebook from within VS Code with the Deno Jupyter kernel, and it was delightful. I could write TypeScript, run it interactively, and plot results all in one place.

Graphing with Vega-Lite

After trying several visualization libraries, I landed on Vega-Lite, which was lightweight and worked great with tabular JSON. My focus was mostly on:

- Key distribution
- Most common chords
- Chord progressions
- Heatmap of chords across steps
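For a sense of why tabular JSON and Vega-Lite fit together so well, here is a sketch of a heatmap spec for the last item. The row shape (step, chord, count) and the function name are assumptions for illustration; the spec fields themselves are standard Vega-Lite.

```typescript
// Hypothetical row shape for already-aggregated data.
interface ChordStepCount {
  step: number;
  chord: string;
  count: number;
}

// Build a Vega-Lite heatmap spec: one cell per (step, chord), colored by count.
function chordHeatmapSpec(values: ChordStepCount[]) {
  return {
    $schema: 'https://vega.github.io/schema/vega-lite/v5.json',
    data: { values },
    mark: 'rect',
    encoding: {
      x: { field: 'step', type: 'ordinal', title: 'Step in progression' },
      y: { field: 'chord', type: 'nominal', title: 'Chord' },
      color: { field: 'count', type: 'quantitative', title: 'Occurrences' },
    },
  };
}

const demoSpec = chordHeatmapSpec([
  { step: 0, chord: 'I', count: 3 },
  { step: 1, chord: 'IV', count: 2 },
  { step: 1, chord: 'V', count: 1 },
]);
console.log(JSON.stringify(demoSpec, null, 2));
```

The spec is plain data, so the same object can be handed to whatever renderer the notebook setup provides. Swapping mark to 'bar' and dropping the color channel gives the simpler bar charts for keys and chord counts.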

Progressions

Breaking down the most common chord progressions by their ending cadence (authentic ends V-I, plagal ends IV-I, deceptive ends V-vi) gave me a few lists:

Authentic
- I-V-I
- I-IV-V-I
- I-IV-I-V-I
- I-vi-IV-V-I
- I-IV-vi-V-I
- I-vi-ii-V-I
- I-vi-V-I
- I-V-IV-V-I
- I-ii-V-I

Plagal
- I-IV-I
- I-V-IV-I
- I-V-vi-IV-I
- I-V-I-IV-I
- I-vi-IV-I
- I-vi-V-IV-I
- I-V-ii-IV-I
- I-ii-IV-I
- I-IV-V-IV-I

Deceptive
- I-V-vi
- I-IV-V-vi
- I-IV-I-V-vi
- I-IV-V-I-IV-V-vi
- I-V-I-V-vi
- I-IV-I-IV-V-vi
- I-vi-IV-V-I-vi-IV-V-vi
- I-IV-I-IV-I-V-vi
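
The grouping above can be reproduced with a small classifier that looks only at the last two chords. This is a sketch based on the textbook cadence definitions, not necessarily how the notebook did it, and the names are hypothetical.

```typescript
type Cadence = 'authentic' | 'plagal' | 'deceptive' | 'other';

// Classify a dash-separated Roman numeral progression by its final two chords.
function classifyCadence(progression: string): Cadence {
  const chords = progression.split('-').map((c) => c.trim());
  if (chords.length < 2) return 'other';
  const ending = chords.slice(-2).join('-');
  switch (ending) {
    case 'V-I':
      return 'authentic';
    case 'IV-I':
      return 'plagal';
    case 'V-vi':
      return 'deceptive';
    default:
      return 'other'; // e.g. progressions that stop on V
  }
}

console.log(classifyCadence('I-vi-ii-V-I')); // authentic
console.log(classifyCadence('I-V-vi-IV-I')); // plagal
console.log(classifyCadence('I-IV-V-vi')); // deceptive
```

Anything that falls into 'other', such as a progression that simply stops on V, would need its own bucket in a fuller analysis.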