Upload support in mwbot-rs and the future of mwapi_errors

I landed file upload support in the mwapi (docs) and mwbot (docs) crates yesterday. Uploading files in MediaWiki is kind of complicated: there are multiple state machines to implement, multiple ways to upload files, and different options that come with each of them.

The mwapi crate contains most of the upload logic but it offers a very simple interface for uploading:

pub async fn upload<P: Into<Params>>(
    &self,
    filename: &str,
    path: PathBuf,
    chunk_size: usize,
    ignore_warnings: bool,
    params: P,
) -> Result<String>

This fits with the rest of the mwapi style of simple functions that try to provide the user with maximum flexibility.

On the other hand, mwbot has a full typed builder with reasonable defaults; I'll just link to the documentation instead of copying it all.

A decent amount of internal refactoring was required to make things that took key-value parameters now accept key-value parameters plus bytes that should be uploaded as multipart/form-data. Currently only uploading from a path on disk is supported. In the future I think we should be able to make it more generic and upload from anything that implements AsyncRead.
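To make the "upload from anything readable" idea a bit more concrete, here is a minimal sketch of splitting an arbitrary reader into chunks of at most chunk_size bytes, tracking the offset of each one. It is not how mwapi does it: it uses the synchronous std::io::Read trait as a stand-in for AsyncRead so that it runs with only the standard library, and the Chunk type and read_chunks function are hypothetical names made up for illustration.

```rust
use std::io::{self, Read};

/// One piece of a chunked upload: where it starts in the source, and its bytes.
/// (Hypothetical type, not part of mwapi.)
#[derive(Debug, PartialEq)]
struct Chunk {
    offset: u64,
    data: Vec<u8>,
}

/// Read `reader` to the end, splitting it into chunks of at most `chunk_size` bytes.
fn read_chunks<R: Read>(mut reader: R, chunk_size: usize) -> io::Result<Vec<Chunk>> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut chunks = Vec::new();
    let mut offset = 0u64;
    loop {
        let mut data = Vec::with_capacity(chunk_size);
        // `take` caps how much one chunk may read; `read_to_end` keeps reading
        // until that cap or EOF, so short reads don't produce short chunks.
        let n = reader
            .by_ref()
            .take(chunk_size as u64)
            .read_to_end(&mut data)?;
        if n == 0 {
            break; // nothing left to read
        }
        chunks.push(Chunk { offset, data });
        offset += n as u64;
    }
    Ok(chunks)
}

fn main() {
    // An in-memory buffer works just as well as a file on disk.
    let source = io::Cursor::new(b"hello world".to_vec());
    let chunks = read_chunks(source, 4).expect("reading from memory cannot fail");
    for chunk in &chunks {
        println!("offset={} len={}", chunk.offset, chunk.data.len());
    }
}
```

Because the function only asks for Read, a file, a network stream, or an in-memory buffer can all be passed in. A real implementation would probably send each chunk as it is read instead of collecting them all in memory first.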

Next steps for mwapi

This is the last set of functionality that I had on my initial list for mwapi. After the upload code gets some real-world usage, I'll feel comfortable calling it complete enough for a 1.0 stable release. There is still probably plenty of work to be done (like rest.php support maybe?), but from what I perceive a "low-level" MediaWiki API library should do, I think it's checked the boxes.

Except....

Future of mwapi_errors

It took me a while to get comfortable with error handling in Rust. There are a lot of different errors the MediaWiki API can raise, and they all can happen at the same time or different times! For example, editing a page could fail because of some HTTP-level error, you could be blocked, your edit might have tripped the spam filter, you got an edit conflict, etc. Some errors might be common to any request, some might be specific to a page or the text you're editing, and others might be temporary and totally safe to retry.

So I created one massive error type and the mwapi_errors crate was born, mapping all the various API error codes to the correct Rust type. The mwapi, parsoid, and mwbot crates all use the same mwapi_errors::Error type as their error type, which is super convenient, usually.

The problem is that they all need to use the exact same version of mwapi_errors, otherwise the Error type will be different and cause super confusing compilation errors. So if we need to make a breaking change to any error type, all 4 crates need to issue semver-breaking releases, even if they didn't use that functionality!

Before mwapi can get a 1.0 stable release, mwapi_errors would need to be stable too. But I am leaning in the direction of splitting up the errors crate and just giving each crate its own Error type, just like all the other crates out there do. And we'll use Into and From to convert around as needed.
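As a rough sketch of what per-crate error types with From conversions could look like: the two modules below stand in for a low-level crate and a higher-level one. All the names here (api, bot, fetch, page_text, and the enum variants) are hypothetical and are not the actual mwapi or mwbot types.

```rust
// Hypothetical stand-in for a low-level crate with its own error type.
mod api {
    use std::fmt;

    #[derive(Debug, PartialEq)]
    pub enum Error {
        Http(u16),
        ApiCode(String),
    }

    impl fmt::Display for Error {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                Error::Http(status) => write!(f, "HTTP error {}", status),
                Error::ApiCode(code) => write!(f, "API error: {}", code),
            }
        }
    }

    impl std::error::Error for Error {}

    // Pretend request: two magic titles trigger the two kinds of failure.
    pub fn fetch(title: &str) -> Result<String, Error> {
        match title {
            "" => Err(Error::ApiCode("invalidtitle".to_string())),
            "Offline" => Err(Error::Http(503)),
            _ => Ok(format!("text of {}", title)),
        }
    }
}

// Hypothetical stand-in for a higher-level crate that builds on `api`.
mod bot {
    use super::api;
    use std::fmt;

    #[derive(Debug, PartialEq)]
    pub enum Error {
        /// Anything the lower-level crate reported, wrapped as-is.
        Api(api::Error),
        /// An error that only makes sense at this level.
        Protected,
    }

    // This impl is what lets `?` convert api::Error into bot::Error.
    impl From<api::Error> for Error {
        fn from(err: api::Error) -> Self {
            Error::Api(err)
        }
    }

    impl fmt::Display for Error {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                Error::Api(err) => write!(f, "API layer: {}", err),
                Error::Protected => write!(f, "refusing to touch a protected page"),
            }
        }
    }

    impl std::error::Error for Error {}

    pub fn page_text(title: &str) -> Result<String, Error> {
        if title.starts_with("MediaWiki:") {
            return Err(Error::Protected);
        }
        // `?` calls From::from on the error automatically.
        Ok(api::fetch(title)?)
    }
}

fn main() {
    for title in ["Main Page", "", "Offline", "MediaWiki:Sidebar"] {
        match bot::page_text(title) {
            Ok(text) => println!("{:?}: {}", title, text),
            Err(err) => println!("{:?}: failed: {}", title, err),
        }
    }
}
```

With this shape, a breaking change to api::Error only forces a new release of the crates that actually match on its variants, instead of every crate that shares one big error type.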


Two steps forward, one step back for mwbot-rs

I was intending to write a pretty different blog post about progress on mwbot-rs but...ugh. The main dependency of the parsoid crate, kuchiki, was archived over the weekend. In reality it's been lightly/un-maintained for a while now, so this is just reflecting reality, but it does feel like a huge setback. Of course, I only have gratitude for Simon Sapin, the primary author and maintainer, for starting the project in the first place.

kuchiki was a crate that let you manipulate HTML as a tree, with various ways of iterating over and selecting specific DOM nodes. parsoid was really just a wrapper around that, allowing you to get a WikiLink node instead of a plain <a> tag node. Each "WikiNode" wrapped a kuchiki::NodeRef for convenient accessors/mutators, but still allowed you to get at the underlying node via Deref, so you could manipulate the HTML directly even if the parsoid crate didn't know about/support something yet.
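For anyone unfamiliar with the wrapper-plus-Deref pattern, here is a minimal sketch of the idea. The Node type is a made-up stand-in for kuchiki::NodeRef, and the target() accessor is hypothetical rather than the parsoid crate's real API. It assumes links carry an href of the form "./Page_title".

```rust
use std::ops::Deref;

// Made-up stand-in for a generic HTML node (the role kuchiki::NodeRef plays).
#[derive(Debug)]
struct Node {
    tag: String,
    attrs: Vec<(String, String)>,
}

impl Node {
    /// Generic attribute lookup, available on every node.
    fn attr(&self, name: &str) -> Option<&str> {
        self.attrs
            .iter()
            .find(|(key, _)| key.as_str() == name)
            .map(|(_, value)| value.as_str())
    }
}

// Typed wrapper that knows what a wiki link is.
struct WikiLink(Node);

impl WikiLink {
    /// Convenience accessor: the linked page title, assuming an href
    /// shaped like "./Page_title".
    fn target(&self) -> Option<String> {
        let href = self.0.attr("href")?;
        Some(href.trim_start_matches("./").replace('_', " "))
    }
}

// Deref exposes everything on Node directly through the wrapper.
impl Deref for WikiLink {
    type Target = Node;

    fn deref(&self) -> &Node {
        &self.0
    }
}

fn main() {
    let link = WikiLink(Node {
        tag: "a".to_string(),
        attrs: vec![
            ("rel".to_string(), "mw:WikiLink".to_string()),
            ("href".to_string(), "./Main_Page".to_string()),
        ],
    });
    // The typed accessor from the wrapper...
    println!("target: {:?}", link.target());
    // ...and the untyped escape hatch, reached via Deref.
    println!("tag: {}, rel: {:?}", link.tag, link.attr("rel"));
}
```

The point of the escape hatch is that callers are never blocked on the wrapper: anything it doesn't model yet is still reachable on the underlying node.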

This is not an emergency by any means, kuchiki is pretty stable, so in the short-term we'll be fine, but we do need to find something else and rewrite parsoid on top of that. Filed T327593 for that.

I am mostly disappointed because I have cool things in the pipeline that I wanted to focus on instead. The new toolforge-tunnel CLI is probably ready for a general announcement and was largely worked on by MilkyDefer. And I also have upload support mostly done, I'm just trying to see if I can avoid a breaking change in the underlying mwapi_errors crate.

In short: ugh.


Wiki streaks, and the person who's edited Wikipedia every day since 2007

This past weekend at Wikipedia Day I had a discussion with Enterprisey and some other folks about different ways edit counters (more on that in a different blog post) could visualize edits, and one of the things that came up was GitHub's scorecard and streaks. Then I saw a post from Jan Ainali with a SQL query showing the people who had made an edit for every single day of 2022 to Wikidata. That got me thinking, why stop at 1 year? Why not try to find out the longest active editing streak on Wikipedia?

Slight sidebar, I find streaks fascinating. They require a level of commitment, dedication, and a good amount of luck! And unlike sports where if you set a record, it sticks, wikis are constantly changing. If you make an edit, and months or years later the article gets deleted, your streak is retroactively broken. Streaks have become a part of wiki culture, with initiatives like 100wikidays, where people commit to creating a new article every day, for 100 days. There's a new initiative called 365 climate edits, I'm sure you can figure out the concept. Streaks can become unhealthy, so this all should be taken in good fun.

So... I adapted Jan's query to find users who had made at least one edit per day in the past 365 days, and then for each user, went backwards day-by-day to see when they missed an edit. The results are...unbelievable.

Johnny Au has made at least one edit every day since November 11, 2007! That's 15 years, 2 months, 9 days and counting. Au was profiled in the Toronto Star in 2015 for his work on the Toronto Blue Jays' page:

Au, 25, has the rare distinction of being the top editor for the Jays’ Wikipedia page. Though anyone can edit Wikipedia, few choose to do it as often, or regularly, as Au.

The edits are logged on the website but hidden from most readers. Au said he doesn’t want or need attention for his work.

“I prefer to be anonymous, doing things under the radar,” he said.

Au spends an average 10 to 14 hours a week ensuring the Blues Jays and other Toronto-focused Wikipedia entries are up to date and error-free. He’s made 492 edits to the Blue Jays page since he started in 2007, putting him squarely in the number one spot for most edits, and far beyond the second-placed editor, who has made 230 edits.

...

Au usually leaves big edits to other editors. Instead, he usually focuses on small things, like spelling and style errors.

“I’m more of a gatekeeper, doing the maintenance stuff,” he said.

Next, Bruce1ee (unrelated to Bruce Lee) has made at least one edit every day since September 6, 2011. That's 11 years, 4 months, 14 days and counting. Appropriately featured on their user page is a userbox that says: "This user doesn't sleep much".

It is mind blowing to me the level of consistency you need to edit Wikipedia every day, for this long. There are so many things that could happen to stop you from editing Wikipedia (internet goes out, you go on vacation, etc.) and they manage to continue editing regardless.

I also ran a variation of the query that only considered edits to articles. The winner there is AnomieBOT, a set of automated processes written and operated by Anomie. AnomieBOT last took a break from articles on August 6, 2016, and hasn't missed a day since.

You can see the full list of results on-wiki as part of the database reports project: Longest active user editing streaks and Longest active user article editing streaks. These will update weekly.

Hopefully by now you're wondering what your longest streak is. To go along with this project, I've created a new tool: Wiki streaks. Enter in a username and wiki and see all of your streaks (minimum 5 days), including your longest and current ones. It pulls all of the days you've edited, live, and then segments them into streaks. The source code (in Rust of course) is on Wikimedia GitLab, contributions welcome, especially to improve the visualization HTML/CSS/etc.
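The segmenting step can be sketched in a few lines. This is an illustration, not the tool's actual code: days are plain integers (say, days since some epoch) to avoid pulling in a date library, and the Streak type and find_streaks function are names I'm making up here.

```rust
/// A run of consecutive days, inclusive on both ends.
#[derive(Debug, PartialEq)]
struct Streak {
    start: i64,
    end: i64,
}

impl Streak {
    /// Number of days in the streak.
    fn length(&self) -> i64 {
        self.end - self.start + 1
    }
}

/// Group day numbers into runs of consecutive days, keeping only runs
/// that are at least `min_len` days long.
fn find_streaks(days: &[i64], min_len: i64) -> Vec<Streak> {
    // Several edits on one day count once, and input order doesn't matter.
    let mut days = days.to_vec();
    days.sort_unstable();
    days.dedup();

    let mut streaks: Vec<Streak> = Vec::new();
    for day in days {
        if let Some(last) = streaks.last_mut() {
            if last.end + 1 == day {
                // Directly follows the previous day: extend the current run.
                last.end = day;
                continue;
            }
        }
        // First day overall, or a gap: start a new run.
        streaks.push(Streak { start: day, end: day });
    }

    streaks.retain(|streak| streak.length() >= min_len);
    streaks
}

fn main() {
    let days = [1, 2, 3, 4, 5, 7, 10, 11, 12, 13, 14, 15];
    let streaks = find_streaks(&days, 5);
    for streak in &streaks {
        println!("{} to {} ({} days)", streak.start, streak.end, streak.length());
    }
    if let Some(longest) = streaks.iter().max_by_key(|streak| streak.length()) {
        println!("longest: {} days", longest.length());
    }
}
```

The current streak is then just the last run, provided it ends today (or yesterday, depending on how generous you want to be with timezones).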

I think there are a lot of interesting stats out there if we keep looking at streaks of Wikipedians. Maybe Wikipedians who've made an edit every week? Every month? It certainly seems reasonable that there are people out there who've made an edit at least once a month since Wikipedia started.

Of course, edits are just one way to measure contribution to Wikipedia. Logged actions (patrolling, deleting, blocking, etc.) are another, or going through specific processes, like getting articles promoted to "Good article" and "Featured article" status. For projects like Commons, we could look at file uploads instead of edits. And then what about historical streaks? I hope this inspires others to think of and look up other types of wiki streaks :-)


Wikipedia's new skin is a sad opportunity to reminisce what we could've had instead

By the time you read this, you'll probably have seen Wikipedia's new layout ("skin"), dubbed "Vector 2022". You can read about the changes it brings.

As with most design changes, some people will like it and some people won't. But me? I just feel sad because years ago we had a popular, volunteer-driven skin proposal that was shut down by arguments that today we now know were in bad faith and hypocritical.

Back in 2012, then-Wikimedia Foundation senior designer Brandon Harris aka Jorm pitched a new idea: "The Athena Project: being bold", outlining his vision for what Wikipedia should look like.

During the question-and-answer period, I was asked whether people should think of Athena as a skin, a project, or something else. I responded, "You should think of Athena as a kick in the head" – because that's exactly what it's supposed to be: a radical and bold re-examination of some of our sacred cows when it comes to the interface.

His proposal had some flaws, but it was ambitious, different and forced people to think about what the software could be like.

By 2013-2014, focus pivoted to "Winter", an actual prototype that people could play with and conduct user testing on. I was initially unable to find any screenshots or videos of the prototype, but you can play with the original prototype (thanks to Izno for pointing out it has been resurrected). Jorm would leave the WMF in 2015 and it seemed like the project had effectively died.

But later in 2015, Isarra (a volunteer, and a good friend of mine) unexpectedly dropped a mostly functional skin implementing the Winter prototype, named "Timeless". You can try it yourself on Wikipedia today. (I'll wait.)

By the end of 2016, there was a request for it to be deployed to Wikimedia sites. It underwent a security review, multiple rounds of developers poking at it, filing bugs and most importantly, fixing those bugs. The first set of French communities volunteered to test Timeless in February and March 2017. Finally in August 2017 it was deployed as an opt-in user preference to test.wikipedia.org, then iteratively deployed to wikis that requested it in the following weeks before being enabled everywhere in November.

I've been using Timeless ever since, on both my wide monitor and tiny (relatively) phone, it works great. I regularly show it to people as a better alternative to the current mobile interface and they're usually blown away. On my desktop, I can't imagine going back to a single-sidebar layout.

In January 2021, I interviewed Jorm for a Signpost story, and asked him about Timeless. He said, "I love Timeless and it absolutely should replace Vector. Vector is a terrible design and didn't actually solve any of the problems that it was trying to; at best it just swept them under rugs. I think the communities should switch to Timeless immediately."

What went wrong?

At the end of 2017, following Timeless being deployed everywhere as opt-in, Isarra applied for a grant to continue supporting and developing Timeless (I volunteered as one of the advisors). Despite overwhelming public support from community members and WMF staff, it was rejected for vague reasons that I'm comfortable describing as in bad faith. Eventually she applied once again and received approval midway through 2018. This time I provided some of the "official WMF feedback" publicly. But the constant delays and secret objections took a lot of steam out of the project.

Despite all of that, people were still enthusiastic about Timeless! In March 2019, the French Wiktionary requested Timeless to become their default skin. This is a much bigger deal than just allowing it as an opt-in choice, and led to discussion of whether Wikimedia wikis need to have a consistent brand identity, how much extra work developers would need to do to ensure they fully support the now-two default skins, and so on. You can read the full statement on why the task was declined - I largely don't disagree with most of it and the conclusion. If Timeless was going to become the default, it really needed to be the default for everyone.

Of course, this principle of consistency would be thrown out in the 2022 English Wikipedia discussion on whether to switch the default to the new "Vector 2022" skin: the English Wikipedia was going to be allowed to opt out of the interface everyone else was using if it voted against it.

Had the French Wiktionary been allowed to switch their default to Timeless, it would've continued to get more attention from users and developers, likely leading to more wikis asking for it to become the default.

You can skim through how Vector 2022 came about. Just imagine if even a fraction of those resources had gone toward moving forward with Timeless, backing a volunteer-driven project. It's just sad to think of it now.

So...

I started this story with Jorm's op-ed rather than a history of MediaWiki skins because I think he accurately captured that the skin is just a subset of the broader workflows that Wikipedians go through that desperately need improvement. Unfortunately that focus on workflows has been lost and it shows, we're all still using the same gadgets for critical workflows that we were 10 years ago. (I won't go into detail on the various Timeless features that make workflows easier rather than more difficult.)

Vector 2022, coming 12 years after the original Vector, is a rather narrow subset of fixes to the largest problems Vector had (lack of responsiveness, collapsed personal menu, sticky header, etc.). It's just not the bold change we need. Timeless, far from perfect, was certainly a lot closer.


Advent of Code 2022, in Rust

There's a yearly programming contest called Advent of Code (AoC). If you haven't heard about it, I'd recommend reading betaveros's post explaining what makes it unique.

This was my third attempt at AoC, previously trying it in 2019 (made it to day 5) and 2021 (day 6). This year I made it to... drumroll ...day 14! I had a good time this year, primarily because a group of friends (read: wiki folks on Mastodon) were doing it every day, so I'd be motivated to be able to compare my solution with their own.

Then on day 15 at midnight I looked at the puzzle and said "nope." and went to sleep.

AoC definitely messed with my sleep schedule, since being on EST meant starting the puzzles at midnight rather than at 9 p.m. as I would have back in PST. Once I finished each puzzle, it always took a while to calm down from the rush, and by then I was going to sleep at least an hour later than I should've been.

But since I was starting as soon as the puzzle came out on most days, the leaderboard accurately reflects how long it took me on those puzzles:

      --------Part 1---------   --------Part 2---------
Day       Time    Rank  Score       Time    Rank  Score
 14   00:35:44    2411      0   00:40:21    1977      0
 13   00:30:11    1920      0   00:38:08    1735      0
 12   23:09:41   34803      0   23:24:54   33874      0
 11   00:28:01    1435      0   01:01:03    2707      0
 10   00:15:40    2657      0   00:27:38    1841      0
  9   02:34:24   15092      0   02:56:58   11213      0
  8   00:36:38    6896      0       >24h   61768      0
  7   00:34:54    2671      0   00:45:38    2924      0
  6   00:08:31    5046      0   00:10:01    4555      0
  5   00:16:09    1720      0   00:17:34    1375      0
  4   00:08:33    3667      0   00:10:10    2539      0
  3   14:34:00   82418      0   22:00:31   92084      0
  2   14:27:16  100430      0   14:47:19   94770      0
  1   17:13:27  112294      0   17:16:09  107095      0

Day 5 was my best performance, I attribute that to the input format requiring a more-complex-than-usual parser, which I sidestepped by cleaning up the input in my editor first.

I posted a link to each day's solution and some commentary on a Mastodon thread. All of my solutions are available in a Git repo.

Overall I enjoyed doing the challenges in Rust. I feel that a good amount of the puzzles just required basic string/array manipulation, which is faster to do in a dynamically typed language like Python, but there were plenty of times I felt Rust's match statement (which Python now sort of has...) and sum types came in handy. Specifically with Rust's match statement, the compiler will complain if you don't cover every branch, which helped when e.g. implementing the rock-paper-scissors state machine.
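To show what that exhaustiveness checking looks like, here is a small sketch of a rock-paper-scissors match. It is written for this post and is not copied from my actual solution. The scoring values (1/2/3 for the shape, 0/3/6 for the outcome) are what I'm assuming the puzzle used.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Shape {
    Rock,
    Paper,
    Scissors,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Loss,
    Draw,
    Win,
}

/// Outcome of a round from my point of view.
fn play(mine: Shape, theirs: Shape) -> Outcome {
    use Shape::*;
    // All nine combinations are listed. Delete any one of them and the
    // compiler rejects the match as non-exhaustive.
    match (mine, theirs) {
        (Rock, Scissors) | (Paper, Rock) | (Scissors, Paper) => Outcome::Win,
        (Rock, Rock) | (Paper, Paper) | (Scissors, Scissors) => Outcome::Draw,
        (Rock, Paper) | (Paper, Scissors) | (Scissors, Rock) => Outcome::Loss,
    }
}

/// Score for one round: points for the shape played plus points for the
/// outcome (values assumed, see above).
fn score(mine: Shape, theirs: Shape) -> u32 {
    let shape_points = match mine {
        Shape::Rock => 1,
        Shape::Paper => 2,
        Shape::Scissors => 3,
    };
    let outcome_points = match play(mine, theirs) {
        Outcome::Loss => 0,
        Outcome::Draw => 3,
        Outcome::Win => 6,
    };
    shape_points + outcome_points
}

fn main() {
    let rounds = [
        (Shape::Paper, Shape::Rock),
        (Shape::Rock, Shape::Paper),
        (Shape::Scissors, Shape::Scissors),
    ];
    let total: u32 = rounds.iter().map(|&(mine, theirs)| score(mine, theirs)).sum();
    println!("total score: {}", total);
}
```

There is no catch-all `_` arm on purpose: adding a fourth shape to the enum would turn every match that forgot about it into a compile error rather than a silent bug.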

As far as learning goes, I picked up some CS concepts like Dijkstra's algorithm. I'm not sure I really learned any more Rust, just got more comfortable with the concepts I already knew and likely faster at applying them. For the past few months I feel like I'm now thinking in Rust, rather than thinking in Python and writing it in Rust.

Past puzzles are available indefinitely, so you can do them whenever you want. I don't plan on finishing the rest, I mostly lost the incentive now that it's no longer a daily thing. But I'll probably try again in December and see how far I go :-)


2022 goals, revisited

I set some goals for myself at the beginning of the year.

Here's how it went:

  • Move out of my parents' house.
    ✔️ I live in New York City now. This was definitely my biggest goal and accomplishment of the year.
  • Contribute something meaningful to SecureDrop.
    ✔️ I think so. I need to write up some of the stuff I worked on in the past year.
  • Contribute something meaningful to MediaWiki.
    ✔️ Slightly more mixed because I contributed a lot less this year than in the past, but I still consider myself having contributed in a meaningful way.
  • Not get COVID.
    ❌ Got it in June :(
  • Continue contributing to Mailman.
    ❌ Didn't really find the motivation this year. I'm hoping to spend more time on this in 2023, Wikimedia's Mailman install is showing that it needs more love.
  • Continue working on mwbot-rs, while having fun and learning more Rust.
    ✔️ I posted updates on the News page.
  • Get more stickers (lack of in-person meetups has really been hurting my sticker collecting).
    ✔️ Definitely. New stickers include FIRE, California poppies, Pacific Northwest, Qubes, HOPE 2022, Pinnacles National Park, a corgi, and a very nice yellow and blue bird that says, "There is more power in peace than in violence".
  • Port the rest of my wiki bots to Rust.
    ❌ Still running in good old PHP. I don't really think this is worth it anymore, people are too used to the current bugs that introducing a different set of bugs would be more disruptive than helpful.
  • Make progress on moving wiki.debian.org to MediaWiki.
    ❌ No real progress :(
  • Write at least one piece of recognized content (DYK/GA/FA) for Wikipedia.
    ✔️ I racked up 4 DYKs this year: List of United States Supreme Court leaks (May 2022), Eleanor Bellows Pillsbury (June 2022), 2022 University of California academic workers' strike (Dec. 2022), and Canadian Coalition for Firearm Rights (Dec. 2022). Hitting the DYK threshold feels pretty straightforward now, so I should probably aim for a GA!
  • Travel outside the US (COVID-permitting).
    ❌ I probably had the opportunity, but just didn't feel comfortable because of COVID. Planning at least two international trips in 2023!
  • Finish in the top half of our Fantasy Football league and Pick 'em pool. I did pretty well in 2020 and really regressed in 2021.
    ❓ Too early to say. Currently doing well in the Pick 'em pool, but not in Fantasy.
  • Keep track of TV show reviews/ratings. I've been pretty good about tracking movies I watch, but don't yet do the same for TV.
    ❌ Started, but didn't finish.

Publicly publishing a list of goals was nice, every few months I'd re-read the post to see if I was on track or not. But I don't intend to publish my 2023 goals, I expect they'll be more personal than these were.