Still life with humblebrag, part one

"Writing is a way of talking without being interrupted."

Oct 28, 2024

The first time I remember writing in a journal was a Holy Week when I must’ve been in elementary school, when John J’s mom sat all us kids down and had us reflect. It was supposed to be about, like, sin. I wrote down facts about my day and she said it was too factual, not emotional enough, and didn’t mention the solemn occasion.

So then after that I didn’t want to journal because I didn’t have much emotional life to record. But eventually I would begin a journaling habit that I’ve continued until the present day.

The first entry that I still have is from February 7, 2015, when I was thirteen. The notebook begins with me testing my pen,[1] for the occasion this time was that I was getting into fountain pens, and needed a use for them, and journaling seemed as good a use as any.[2] That particular day my parents had taken me downtown and bought me a Pilot Metropolitan; the rest of the entry documents my concerns about how “at 13 years old, I really need to remember that the Internet is not always true”, how “I have a problem with starting hobbies and not doing them”, and how “I got an 83 on a math test and that makes me sad. So, I’m implementing some new study habits, which may or may not help me.” And the obvious next step: “I’m in the market for a fountain-pen friendly planner.” I also mention “interest rates being so high,” which they were certainly not.

Reading old journals is always a trip. When they’re, like, a year to three years old, it’s mostly cringe. When they’re much older than that, you’ve got enough distance from your Past Self to find him somewhat endearing, though you recognize hints of the disposition that you will come to love and loathe. My mom told me the other day that people don’t change, and these journals are evidence of it.

Journaling isn’t really worth it

I’ve journaled every day, off and on but mostly on, for the past four or five years. When I tell a certain kind of person this, they admire it. People aspire to journal, I know; I’ve been exposed to the same influencers too, and they did have some influence on the persistence of my habit. But these years and some million words later, I don’t think that it’s particularly helpful in any way. Some hypotheses about how it could “help” might be that it makes you a better writer, it makes you more emotionally sensitive, it makes you more mindful, or it makes you more grateful.

I don’t think any of these are true. The last three are linked, and it’s difficult to attribute to journaling any specific improvement in this area, since it’s impossible to tease apart correlation and causation. Are you sensitive because you journal, or do you journal because you’re sensitive?

And for the writing, in my opinion, journaling has made me a worse writer. Part of why I began to journal regularly was coming across the “morning pages” thing from The Artist’s Way, which is supported by a thesis that three pages of stream-of-consciousness writing will help you to slay your inner editor. It does. That’s part of the problem. Now my inner editor is fairly well slain and I can’t resuscitate him when he’s needed. Journaling has bred in me (or perhaps it’s just correlation again) a logorrhea, and while I can sit down at pretty much any time and bang out a thousand words about pretty much anything, it ain’t gonna be pretty.

But by this point I’m addicted.[3] Especially earlier on, I began to perceive the world in terms of what could make it into my journal, inevitably compressing the inexpressible. It of course takes time (a cost that I mitigated at the price of my pens). It might have made me a worse writer. But I think most pernicious is the sense that every day must be recorded to be valuable. Another reason that I started to journal is that I have a very bad memory, and I have generally positive experiences, and I thought it was a shame that I forgot so many of them. But when you miss a day, which generally happens on the most fun and busy days, like vacations, it seems irrevocably lost to Lethe. It seems like the day won’t exist unless you write it down. And again because writing is done after the fact, it feels as if you are not living unless you write. But writing is not living! Our tools shape us, and part of the way that this tool has shaped me is that I put even less of a burden on my memory than I naturally do, since for many things I can just grep my journal. So the days are recollected as streams of text, and not the experiences that text is meant to capture.

But as always I’m dissembling. Obviously my revealed preference is that I like journaling.

Eras

My journaling has passed through various phases. The first phase, until about the fall of 2020, was longhand on paper, with my still-beloved fountain pens. Then, at home during the spring 2021 semester, I came across one of those fateful offhand references on the internet that end up changing your life,[4] and this reference was to “shorthand”. I vaguely knew about the shorthand of journalists from before, but at this point I was struggling under the burden of scribbling my journal entries in longhand. I’d have to schedule forty-five minutes to journal, and I wouldn’t get nearly half of what I wanted to say on the page before the time was up. I needed something faster.

So, as I have done many times in the past and as I will unfortunately do again, I decided to learn a new writing system, compelled by a tenuous logic of utility and an overarching boredom.[5] I researched a few systems and alit on Orthic, an alphabetic system developed in 1911. (Alphabetic means that one atomic symbol usually represents a letter; it’s opposed to phonetic systems like Gregg that are theoretically faster but are harder to learn and slower to read.) Orthic’s relative ease of learning, its arabesque beauty, and the easy availability of online resources made me decide on it.

Here are two samples, from May 29 and August 14, 2021. In the first I’m using an italic nib and making an effort to write carefully and cleanly; in the second I’ve settled into a more natural rhythm.

As you can see, my particular manifestation is not particularly beautiful. I learned that it’s mostly better to write uncommon phrases (or common words with annoying forms, or acronyms) in the standard alphabet, since it’s far easier to read. Also, I think I learned some things slightly wrong, so it’s not really in spec, and I’ve invented a few abbreviations, but the nice thing about a private orthography is that none of that matters. (Except for my biographers, for whom my journal is really intended, but I trust they’ll be sufficiently motivated.)[6]

Around the spring of 2022, I switched to writing in French (still in Orthic), since I was preparing to study abroad in the fall. I kept this up during my summer internship, but with the time crunch in the morning, I was still falling behind. I decided then to abandon my fountain pens and abandon my Orthic and start typing. The horror. They remain neglected. I think I might’ve also switched to Colemak around this time? Don’t remember. I did get very fast at typing, though, and typing is a Pareto improvement. Most importantly, it’s legible to computers.[7]

I still type my journals, typically at night. (Used to be in the morning, but in the cruel real world, my mornings are thin.) I write in Doom Emacs (Emacs but with Vim keybinds), using org-journal, an extension to Org mode.[8] The entries, just UTF-8 text files, live in Dropbox, which I don’t even pay for; since it’s just text it takes up almost no space. I do want to move to iCloud probably for ~privacy~ but the last time I tried I was unsatisfied with the speed and reliability of the syncing.

Yay computers!

I told myself that eventually I would take advantage of this computational legibility. And now I have. Over the weekend I fired up a Jupyter notebook and fiddled around for a few hours doing some basic analysis on the text, and I’m excited to share it. Such statistics are a great way of feeling good about yourself.

I count, in total, 1,286,503 words on my computer, with an average daily word count from 2022 to 2024 of 1,657 and a median of 1,443. There are also twenty-one volumes of handwritten journals. As you can see, I haven’t even finished 2015 in my transcription. So I think it’s safe to bet there’s at least 300,000 words untranscribed, giving me over 1.5 million words written so far. (Honestly, this is a very conservative estimate: I wrote that 1.2 million from mid-2022 to late 2024, about two years, giving me about 600k words per year. If I handwrote one-third as much as I type, since I’m missing four years from 2018 to 2022, that would give me 800,000 words untranscribed.) WolframAlpha says that 1.2 million words would take 91 hours to read, and would make for a 2,000-page book.[9] The first five Game of Thrones books are about 1.7 million words, so I might be there.
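For the curious, here is a minimal sketch of how numbers like these can be computed. It is not the notebook code: it assumes a "word" is any whitespace-separated token, that entries arrive as a hypothetical {date: text} dictionary, and the helper names are invented for illustration. The last function just restates the back-of-envelope estimate in the parenthetical above.

```python
from statistics import mean, median


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def daily_stats(entries: dict[str, str]) -> dict[str, float]:
    """Total, mean, and median word counts over a {date: entry text} mapping.

    Raises statistics.StatisticsError if there are no entries.
    """
    counts = [count_words(body) for body in entries.values()]
    return {"total": sum(counts), "mean": mean(counts), "median": median(counts)}


def estimate_untranscribed(typed_per_year: float,
                           handwriting_ratio: float,
                           missing_years: float) -> float:
    """Words per typed year, scaled down for handwriting, times the missing years."""
    return typed_per_year * handwriting_ratio * missing_years


if __name__ == "__main__":
    # Toy entries, not real journal text.
    toy = {
        "2024-01-01": "I had lunch.",
        "2024-01-02": "Chatted with a friend about pens today.",
    }
    print(daily_stats(toy))
    # The estimate from the post: 600k words/year, one-third as much by hand, four years.
    print(round(estimate_untranscribed(600_000, 1 / 3, 4)))  # 800000
```

Splitting on whitespace overcounts slightly when Org markup or timestamps sit in the files, so a real count may want to strip those first.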

Here is the same chart but for only 2024, which is much more legible.

But the counts aren’t that interesting. Maybe I can analyze my relative word frequencies? Am I really as unique a writer as I think I am? The conclusion is no. I counted the frequencies of all the words in my corpus, and compared against the base frequencies in English, and the words I use more than normal are “yeah”, “chatted”, “sigh”, “stuff”, “lunch”, “texted”, “talked”, “gonna”, “mentor”, “okay”, “honestly”, “weird”, “video”, and then a smattering of obscenities. Not exactly what I wanted to see.
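One way to make "more than normal" concrete is a ratio of my relative frequency to a baseline relative frequency. The sketch below is an assumption about the approach, not the notebook code: the tokenizer is a plain regex, `baseline` is a hypothetical word-to-frequency table you would have to supply from some general English corpus, and `floor` is an invented guard against dividing by zero for words the baseline has never seen.

```python
import re
from collections import Counter

# Lowercase ASCII words, allowing internal apostrophes ("don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def relative_frequencies(text: str) -> tuple[Counter, dict[str, float]]:
    """Return raw counts and each word's share of all tokens."""
    counts = Counter(WORD_RE.findall(text.lower()))
    total = sum(counts.values())
    freqs = {word: n / total for word, n in counts.items()} if total else {}
    return counts, freqs


def overused_words(text: str,
                   baseline: dict[str, float],
                   min_count: int = 2,
                   floor: float = 1e-6) -> list[tuple[str, float]]:
    """Rank words by (my frequency / baseline frequency), highest first.

    Words seen fewer than min_count times are skipped so one-off typos
    do not dominate; unknown words fall back to the floor frequency.
    """
    counts, freqs = relative_frequencies(text)
    ratios = [
        (word, freqs[word] / max(baseline.get(word, 0.0), floor))
        for word, n in counts.items()
        if n >= min_count
    ]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)
```

With a toy baseline such as `{"the": 0.05, "yeah": 0.0001}` (made-up numbers, for illustration only), `overused_words("yeah yeah the", baseline, min_count=1)` ranks "yeah" first. A log ratio would behave the same way for ranking purposes.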

But also in there were some proper nouns. I thought maybe I could count them. So I did, using a sentence-level tokenizer from NLTK, some word lists from there as well, and some heuristics about capitalization.[10]

I’ve redacted the personal names. An extreme power-law distribution, which you’d expect. I’m glad God makes it up there. (If combined with “Lord”, even in second place. Is that centered enough?) In first place, you know who you are.

What about on a smaller scale? Here are the top proper nouns I’ve used this year (i.e., this isn’t the counter from the above plot, but a new counter based on just entries from this year), along with their frequencies per day.

(“Pro” is from Vision Pro, lol.)

Something more fun — Claude suggested emotion composition. So I tried it out:

This is a fairly old, lexicon-based model called VADER. Glad to know I’m more positive than negative; that seems true. I asked Claude and ChatGPT for other models, and they suggested RoBERTa, and I decided to try two models trained on emotion labels: roberta-base-go_emotions (go_emotions), which is RoBERTa-based, and distilbert-base-uncased-go-emotions-student (distil), which is DistilBERT-based rather than a RoBERTa variant. Here’s an analysis using go_emotions over the year to date.

Hard to read, and slightly misleading because gaps are papered over. I added gaps and some smoothing to the top graph. I tried out a few smoothing algorithms: Claude recommended an exponential moving average (EMA), LOWESS, and Gaussian smoothing. I found that good old EMA was the clearest.
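For reference, a minimal sketch of an EMA that respects gaps. This is an assumption about how such smoothing might be done, not the code Claude produced: missing days are represented as `None`, the smoothing restarts after every gap, and `alpha` is an illustrative default. Restarting means each segment is seeded with its first raw value, which is one plausible reason the start of a segment can look artificially high or low.

```python
from typing import Optional, Sequence


def ema_with_gaps(values: Sequence[Optional[float]],
                  alpha: float = 0.3) -> list[Optional[float]]:
    """Exponential moving average that restarts after each gap.

    None marks a day with no entry. Gaps stay None in the output, so a
    plotting library that breaks lines on missing data will show them.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")

    smoothed: list[Optional[float]] = []
    previous: Optional[float] = None  # last smoothed value in the current segment
    for value in values:
        if value is None:
            smoothed.append(None)
            previous = None  # forget history across the gap
        elif previous is None:
            previous = value  # seed a new segment with its first raw value
            smoothed.append(previous)
        else:
            previous = alpha * value + (1 - alpha) * previous
            smoothed.append(previous)
    return smoothed
```

For example, `ema_with_gaps([1.0, 3.0, None, 10.0, 0.0], alpha=0.5)` gives `[1.0, 2.0, None, 10.0, 5.0]`: the 10.0 after the gap passes through unsmoothed because nothing precedes it in its segment.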

Over the year, the stacked area chart is nearly impossible to read, and the line chart isn’t super helpful either, especially with those gaping gaps, which artificially elevate or depress the beginnings of each curve segment.

Here it is for distilbert:

So according to distil, I’m confused most of the time; according to go_emotions, I’m neutral most of the time. Probably those are both accurate. But also my journal entries are for the most part recitations of fact, and don’t use too much emotional language. And I do tend to write about things that confuse me. Maybe in the future I’ll dive into individual entries and see how the models are making such decisions.

I do think though that this sentiment analysis is really cool and could be helpful. There are some localized trends, which probably have some basis in fact.

That’s about it for this time. I have some other analysis that I’m excited to run, using LLMs much more (I think RoBERTa is an LLM; it’s at least transformer-based, but it’s kind of old). Very excited! Ugh I can’t keep it to myself. I want to do two things. First, I want to use a small LLM to go through each entry and try its best to mark each place name mentioned, and specify its location (e.g., “Costco” should be a specific Costco; “Paris” should be Paris, France and not Paris, Ohio). And then I want to make an animated map. Second, I want to train a small LLM on the journals, and explore that space: maybe I could prompt it with a date in the future and amuse myself with its prognostications of what I’ll do. We’ll see what my M2 MacBook can handle.

Finally, shoutout Claude, really. I barely know how to use matplotlib and numpy and whatever, and I don’t know at all how to use Hugging Face pipelines, but for nearly all the plots, its zero-shot accuracy was 100%: it gave me exactly what I wanted and the code ran first try. It did stumble a bit when I tried to get the gaps on the sentiment analysis, but two more tries fixed that up. It came up with the architecture for the smoothing algorithms, and the EMA and LOWESS worked well, but the Gaussian’s normalization was off. Claude couldn’t correct it. I gave it to o1-preview and it fixed it first try. This whole thing would’ve taken me like a week if not for Claude; it actually took four or five hours, and there was a smile on my face the whole time.

It’s not really about how much absolute time it saves you, it’s about how much willpower and frustration it saves you. Because frustration over time causes you to stop, and stopping is the worst thing.

[1] If you must know, the first test was a joke: “Knock knock. Who’s there. Dozen. Dozen who. Dozen anybody wanna let me in?” The second test was copying “The Road Not Taken” in a shaky cursive.

[2] I also signed up, via a subreddit connected with r/fountainpens, to get a pen pal. How romantic! I hope those places still exist. I was matched with some college student. He was nice and we talked about fountain pens for a couple months and then stopped writing. Ah, the innocence of the halcyon internet.

[3] I do mean addicted: as I’m fond of reminding my friends, the WHO definition of addiction requires not only dependency but also negative effects, and the negative effects are palpable.

[4] I think I also encountered fountain pens like this. And Emacs. And even philosophy. 99.9% of the internet is a waste of time, which is why I’m not on it anymore, but I do miss that high-upside 0.1% or 0.01%, those breadcrumbs carelessly dropped by travelers from worlds you will call home. Should I go back? It’s worse than it was. Non sum qualis eram either.

[5] In high school I learned Dvorak in a computer class in which I otherwise did nothing; I learned Colemak when I got annoyed by alternation; that spring I also attempted to learn stenographic typing, though that was a bridge too far even for me. But maybe I’ll try again…

[6] I’ve been asked a few times whether shorthand is “worth it”. Like almost everything else I do, no, it is almost certainly not. You’d get eighty percent of the benefits just by adopting some abbreviations for the most common words: “the”, “a”, “because”, and so on. There are even systems that work with keyboards: yash and speedline are two examples. (And yet especially for typing, if your wpm is less than like 90, you’d be better off just improving that, especially sustained speed.)

[7] I am in the slow process of digitizing my handwritten journals. Shorthand’s greatest shortcoming is how slow and painful it is to read, simply because you don’t read much of it. Sometimes I have better luck tracing forms instead of trying to read them; my hand knows more than my eye. And of course if I wrote in the normal alphabet then the inexorable progress of OCR would be able to read even my handwriting. But for this I’d have to train a model myself to read my Orthic, and even then it’ll be tough, because it’s fairly irregular and often I can’t even read it.

[8] I don’t use any of the fancy features. The only thing that I actually use the power of Emacs for is a function I wrote in Lisp to make my transcription a little less tedious, which begins an entry at the day after the current buffer’s day. Emacs is totally unnecessary for journaling, but I like it, and journaling, along with writing other prose, is the primary thing I use Emacs for now. I program in VS Code, because I once spent the first two weeks of an internship trying and failing to install Emacs and don’t want to repeat such an experience.

[9] My college carries over printing credits from semester to semester, and you get a bonus as a senior. In my last semester, I took advantage of this by writing a program that processed my Org journal entries into LaTeX and then into a huge PDF, which I didn’t end up printing because it was projected to take hours and people were using the printers for, you know, their theses.

[10] The heuristics involve seeing if it’s a one-character word (unlikely to be a proper noun), if it’s in ALL CAPS FOR EMPHASIS (which I rarely do anyway), if it’s a common English word that I might’ve sort of randomly capitalized to Get a Point Across, and then a whitelist of some words that happen to be common words but are actually proper nouns. I had to take particular care to remove common words because I have a habit of sometimes reporting speech without quotes, especially if I just remember the vibes but not the exact words. I might write: He said That’s it but then i replied No it’s not and he said Fine. (I also tend not to capitalize “i” in the middle of sentences.) And I had to remove some French words because of the time I wrote in French and I didn’t want to go out and get a French wordlist.
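Those heuristics can be sketched in a few lines. This is a stand-in, not the notebook code: it skips NLTK entirely, tokenizes with a regex instead of a sentence tokenizer, and takes `common_words` and `whitelist` as hypothetical inputs that would play the role of the NLTK word lists and the hand-built whitelist. Because common words are filtered regardless of position, sentence-initial capitals like "The" drop out without needing sentence boundaries.

```python
import re
from collections import Counter
from typing import Iterable

# Runs of Unicode letters; apostrophes split tokens, so "That's" -> "That", "s".
TOKEN_RE = re.compile(r"[^\W\d_]+")


def count_proper_nouns(text: str,
                       common_words: Iterable[str],
                       whitelist: Iterable[str] = ()) -> Counter:
    """Count capitalized tokens that survive the heuristics described above."""
    common = {word.lower() for word in common_words}
    keep = set(whitelist)
    counts: Counter = Counter()
    for token in TOKEN_RE.findall(text):
        if token in keep:
            counts[token] += 1  # common word that is really a proper noun
            continue
        if not token[0].isupper():
            continue  # lowercase words are never candidates
        if len(token) == 1:
            continue  # one-character words ("I", initials)
        if token.isupper():
            continue  # ALL CAPS FOR EMPHASIS
        if token.lower() in common:
            continue  # common word capitalized to Get a Point Across
        counts[token] += 1
    return counts
```

Run on the example sentence from this footnote with a suitable common-word set, "That", "No", and "Fine" are all discarded. One known cost of the ALL CAPS rule is that genuine acronyms are dropped too.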

Bot Vivant
