Wiki streaks, and the person who's edited Wikipedia every day since 2007

This past weekend at Wikipedia Day I had a discussion with Enterprisey and some other folks about different ways edit counters (more on that in a different blog post) could visualize edits, and one of the things that came up was GitHub's scorecard and streaks. Then I saw a post from Jan Ainali with a SQL query showing the people who had made an edit to Wikidata every single day of 2022. That got me thinking: why stop at one year? Why not try to find the longest active editing streak on Wikipedia?

Slight sidebar: I find streaks fascinating. They require a level of commitment, dedication, and a good amount of luck! And unlike sports, where a record sticks once you set it, wikis are constantly changing. If you make an edit, and months or years later the article gets deleted, your streak is retroactively broken. Streaks have become a part of wiki culture, with initiatives like 100wikidays, where people commit to creating a new article every day for 100 days. There's a new initiative called 365 climate edits; I'm sure you can figure out the concept. Streaks can become unhealthy, so all of this should be taken in good fun.

So... I adapted Jan's query to find users who had made at least one edit per day in the past 365 days, and then, for each user, went backwards day by day to find the last day they didn't edit. The results are...unbelievable.
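The SQL query itself isn't reproduced here, but the "go backwards day by day" step can be sketched in Rust. This is a minimal sketch, not the code behind the database report: it assumes a user's edit days have already been fetched and converted into integer day numbers (for example, days since the Unix epoch), and `current_streak` is a hypothetical helper.

```rust
use std::collections::HashSet;

/// Number of consecutive days with edits ending on `today`, found by walking
/// backwards one day at a time until a day without edits turns up.
/// Days are plain integers (e.g. days since the Unix epoch), an assumption
/// made here to avoid pulling in a date crate.
fn current_streak(edit_days: &HashSet<i64>, today: i64) -> u64 {
    let mut length = 0;
    let mut day = today;
    while edit_days.contains(&day) {
        length += 1;
        day -= 1;
    }
    length
}

fn main() {
    // Hypothetical data: edits on days 1-3 and 5-9, and "today" is day 9.
    let edit_days: HashSet<i64> = [1, 2, 3, 5, 6, 7, 8, 9].into_iter().collect();
    // Prints 5: days 9, 8, 7, 6 and 5 all have edits, day 4 doesn't.
    println!("{}", current_streak(&edit_days, 9));
}
```

Using a `HashSet` keeps each "did they edit that day?" check constant-time, so the walk costs one lookup per day of the streak.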

Johnny Au has made at least one edit every day since November 11, 2007! That's 15 years, 2 months, 9 days and counting. Au was profiled in the Toronto Star in 2015 for his work on the Toronto Blue Jays' page:

Au, 25, has the rare distinction of being the top editor for the Jays’ Wikipedia page. Though anyone can edit Wikipedia, few choose to do it as often, or regularly, as Au.

The edits are logged on the website but hidden from most readers. Au said he doesn’t want or need attention for his work.

“I prefer to be anonymous, doing things under the radar,” he said.

Au spends an average 10 to 14 hours a week ensuring the Blues Jays and other Toronto-focused Wikipedia entries are up to date and error-free. He’s made 492 edits to the Blue Jays page since he started in 2007, putting him squarely in the number one spot for most edits, and far beyond the second-placed editor, who has made 230 edits.

...

Au usually leaves big edits to other editors. Instead, he usually focuses on small things, like spelling and style errors.

“I’m more of a gatekeeper, doing the maintenance stuff,” he said.

Next, Bruce1ee (unrelated to Bruce Lee) has made at least one edit every day since September 6, 2011. That's 11 years, 4 months, 14 days and counting. Appropriately featured on their user page is a userbox that says: "This user doesn't sleep much".

The level of consistency you need to edit Wikipedia every day for this long is mind-blowing to me. So many things could stop you from editing Wikipedia (your internet goes out, you go on vacation, etc.), and yet they keep editing regardless.

I also ran a variation of the query that only considered edits to articles. The winner there is AnomieBOT, a set of automated processes written and operated by Anomie. AnomieBOT last took a break from articles on August 6, 2016, and hasn't missed a day since.

You can see the full list of results on-wiki as part of the database reports project: Longest active user editing streaks and Longest active user article editing streaks. These will update weekly.

Hopefully by now you're wondering what your longest streak is. To go along with this project, I've created a new tool: Wiki streaks. Enter a username and wiki to see all of your streaks (minimum 5 days), including your longest and current ones. It fetches every day you've edited, live, and then segments those days into streaks. The source code (in Rust, of course) is on Wikimedia GitLab; contributions are welcome, especially to improve the visualization HTML/CSS/etc.
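The segmenting step might look something like the following. This is not the tool's actual source, just a minimal sketch assuming the edit days arrive as integer day numbers; `Streak` and `segment_streaks` are hypothetical names. It sorts and de-duplicates the days, extends the current run while each day follows the previous one, and starts a new run at any gap, keeping only runs of at least `min_len` days.

```rust
/// A run of consecutive edit days, inclusive on both ends.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Streak {
    start: i64,
    end: i64,
}

impl Streak {
    fn len(&self) -> i64 {
        self.end - self.start + 1
    }
}

/// Splits edit days into runs of consecutive days, keeping only runs of at
/// least `min_len` days. Input may be unsorted and contain duplicates.
fn segment_streaks(days: &[i64], min_len: i64) -> Vec<Streak> {
    let mut sorted = days.to_vec();
    sorted.sort_unstable();
    sorted.dedup();

    let mut streaks = Vec::new();
    let mut iter = sorted.into_iter();
    let Some(first) = iter.next() else {
        return streaks; // no edits at all
    };
    let mut current = Streak { start: first, end: first };
    for day in iter {
        if day == current.end + 1 {
            current.end = day; // still consecutive: extend the run
        } else {
            // Gap found: close the current run and start a new one.
            if current.len() >= min_len {
                streaks.push(current);
            }
            current = Streak { start: day, end: day };
        }
    }
    if current.len() >= min_len {
        streaks.push(current);
    }
    streaks
}

fn main() {
    // Hypothetical data: runs of 6, 2 and 5 days.
    let days: &[i64] = &[1, 2, 3, 4, 5, 6, 8, 9, 20, 21, 22, 23, 24];
    let streaks = segment_streaks(days, 5);
    // Prints [Streak { start: 1, end: 6 }, Streak { start: 20, end: 24 }]
    println!("{:?}", streaks);
    // Longest streak; on ties, max_by_key returns the last one.
    if let Some(longest) = streaks.iter().max_by_key(|s| s.len()) {
        println!("longest: {} days", longest.len()); // longest: 6 days
    }
}
```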

I think there are a lot of interesting stats out there if we keep looking at streaks of Wikipedians. Maybe Wikipedians who've made an edit every week? Every month? It certainly seems plausible that some people have made at least one edit a month since Wikipedia started.

Of course, edits are just one way to measure contribution to Wikipedia. Logged actions (patrolling, deleting, blocking, etc.) are another, or going through specific processes, like getting articles promoted to "Good article" and "Featured article" status. For projects like Commons, we could look at file uploads instead of edits. And then what about historical streaks? I hope this inspires others to think of and look up other types of wiki streaks :-)


Wikipedia's new skin is a sad opportunity to reminisce what we could've had instead

By the time you read this, you'll probably have seen Wikipedia's new layout ("skin"), dubbed "Vector 2022". You can read about the changes it brings.

As with most design changes, some people will like it and some people won't. But me? I just feel sad, because years ago we had a popular, volunteer-driven skin proposal that was shut down by arguments that we now know were made in bad faith and were hypocritical.

Back in 2012, then-Wikimedia Foundation senior designer Brandon Harris aka Jorm pitched a new idea: "The Athena Project: being bold", outlining his vision for what Wikipedia should look like.

During the question-and-answer period, I was asked whether people should think of Athena as a skin, a project, or something else. I responded, "You should think of Athena as a kick in the head" – because that's exactly what it's supposed to be: a radical and bold re-examination of some of our sacred cows when it comes to the interface.

His proposal had some flaws, but it was ambitious, different and forced people to think about what the software could be like.

By 2013-2014, focus pivoted to "Winter", an actual prototype that people could play with and conduct user testing on. You can still play with the original prototype (thanks to Izno for pointing out that it has been resurrected). Jorm would leave the WMF in 2015, and it seemed like the project had effectively died.

But later in 2015, Isarra (a volunteer, and a good friend of mine) unexpectedly dropped a mostly functional skin implementing the Winter prototype, named "Timeless". You can try it yourself on Wikipedia today. (I'll wait.)

By the end of 2016, there was a request for it to be deployed to Wikimedia sites. It underwent a security review, multiple rounds of developers poking at it, filing bugs and most importantly, fixing those bugs. The first set of French communities volunteered to test Timeless in February and March 2017. Finally in August 2017 it was deployed as an opt-in user preference to test.wikipedia.org, then iteratively deployed to wikis that requested it in the following weeks before being enabled everywhere in November.

I've been using Timeless ever since, and it works great on both my wide monitor and my (relatively) tiny phone. I regularly show it to people as a better alternative to the current mobile interface, and they're usually blown away. On my desktop, I can't imagine going back to a single-sidebar layout.

In January 2021, I interviewed Jorm for a Signpost story, and asked him about Timeless. He said, "I love Timeless and it absolutely should replace Vector. Vector is a terrible design and didn't actually solve any of the problems that it was trying to; at best it just swept them under rugs. I think the communities should switch to Timeless immediately."

What went wrong?

At the end of 2017, following Timeless being deployed everywhere as opt-in, Isarra applied for a grant to continue supporting and developing Timeless (I volunteered as one of the advisors). Despite overwhelming public support from community members and WMF staff, it was rejected for vague reasons that I'm comfortable describing as in bad faith. Eventually she applied again and received approval midway through 2018. This time I provided some of the "official WMF feedback" publicly. But the constant delays and secret objections took a lot of steam out of the project.

Despite all of that, people were still enthusiastic about Timeless! In March 2019, the French Wiktionary requested Timeless to become their default skin. This is a much bigger deal than just allowing it as an opt-in choice, and led to discussion of whether Wikimedia wikis need to have a consistent brand identity, how much extra work developers would need to do to ensure they fully support the now-two default skins, and so on. You can read the full statement on why the task was declined - I largely don't disagree with most of it and the conclusion. If Timeless was going to become the default, it really needed to be the default for everyone.

Of course, this principle of consistency would be thrown out in the 2022 English Wikipedia discussion on whether to switch the default to the new "Vector 2022" skin: English Wikipedia was going to be allowed to opt out of the interface everyone else was using if its editors voted against it.

Had the French Wiktionary been allowed to switch their default to Timeless, it would've continued to get more attention from users and developers, likely leading to more wikis asking for it to become the default.

You can skim through how Vector 2022 came about. Just imagine if even a fraction of those resources had gone toward moving forward with Timeless, backing a volunteer-driven project. It's just sad to think of it now.

So...

I started this story with Jorm's op-ed rather than a history of MediaWiki skins because I think he accurately captured that the skin is just a subset of the broader workflows that Wikipedians go through that desperately need improvement. Unfortunately, that focus on workflows has been lost, and it shows: we're all still using the same gadgets for critical workflows that we were using 10 years ago. (I won't go into detail on the various Timeless features that make workflows easier rather than more difficult.)

Vector 2022, coming 12 years after the original Vector, is a rather narrow subset of fixes to the largest problems Vector had (lack of responsiveness, collapsed personal menu, sticky header, etc.). It's just not the bold change we need. Timeless, far from perfect, was certainly a lot closer.


Advent of Code 2022, in Rust

There's a yearly programming contest called Advent of Code (AoC). If you haven't heard about it, I'd recommend reading betaveros's post explaining what makes it unique.

This was my third attempt at AoC; I previously tried it in 2019 (made it to day 5) and 2021 (day 6). This year I made it to... drumroll ...day 14! I had a good time this year, primarily because a group of friends (read: wiki folks on Mastodon) were doing it every day, which motivated me to keep going so I could compare my solutions with theirs.

Then on day 15 at midnight I looked at the puzzle and said "nope." and went to sleep.

Being on EST, AoC definitely messed with my sleep schedule, since the puzzles started at midnight rather than at 9 p.m. like they did back when I was on PST. Once I finished each puzzle, it always took a while to calm down from the rush, and by then I'd be going to sleep at least an hour later than I should have.

But since I was starting as soon as the puzzle came out on most days, the leaderboard accurately reflects how long it took me on those puzzles:

      --------Part 1---------   --------Part 2---------
Day       Time    Rank  Score       Time    Rank  Score
 14   00:35:44    2411      0   00:40:21    1977      0
 13   00:30:11    1920      0   00:38:08    1735      0
 12   23:09:41   34803      0   23:24:54   33874      0
 11   00:28:01    1435      0   01:01:03    2707      0
 10   00:15:40    2657      0   00:27:38    1841      0
  9   02:34:24   15092      0   02:56:58   11213      0
  8   00:36:38    6896      0       >24h   61768      0
  7   00:34:54    2671      0   00:45:38    2924      0
  6   00:08:31    5046      0   00:10:01    4555      0
  5   00:16:09    1720      0   00:17:34    1375      0
  4   00:08:33    3667      0   00:10:10    2539      0
  3   14:34:00   82418      0   22:00:31   92084      0
  2   14:27:16  100430      0   14:47:19   94770      0
  1   17:13:27  112294      0   17:16:09  107095      0

Day 5 was my best performance, and I attribute that to the input format: it required a more-complex-than-usual parser, which I sidestepped by cleaning up the input in my editor first.

I posted a link to each day's solution and some commentary on a Mastodon thread. All of my solutions are available in a Git repo.

Overall I enjoyed doing the challenges in Rust. I feel that a good number of the puzzles just required basic string/array manipulation, which is faster to write in a dynamically typed language like Python, but there were plenty of times when Rust's match statement (which Python now sort of has...) and sum types came in handy. Specifically, the Rust compiler will complain if a match statement doesn't handle every possible case, which helped when, e.g., implementing the rock-paper-scissors state machine.
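Here's a small sketch of what that exhaustiveness checking looks like for rock-paper-scissors. The types and function are hypothetical, not my actual day 2 solution. Because the match is on the pair of shapes with no `_` catch-all, all nine combinations have to be handled; delete any arm and the code stops compiling instead of silently falling through.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Shape {
    Rock,
    Paper,
    Scissors,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Win,
    Lose,
    Draw,
}

/// Result of a round from my side. The compiler verifies that every
/// (mine, theirs) combination appears in exactly the arms below.
fn play(mine: Shape, theirs: Shape) -> Outcome {
    use Outcome::*;
    use Shape::*;
    match (mine, theirs) {
        (Rock, Scissors) | (Paper, Rock) | (Scissors, Paper) => Win,
        (Rock, Paper) | (Paper, Scissors) | (Scissors, Rock) => Lose,
        (Rock, Rock) | (Paper, Paper) | (Scissors, Scissors) => Draw,
    }
}

fn main() {
    println!("{:?}", play(Shape::Paper, Shape::Rock)); // Win
}
```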

As far as learning goes, I picked up some CS concepts like Dijkstra's algorithm. I'm not sure I really learned much more Rust; I just got more comfortable with the concepts I already knew and likely faster at applying them. Over the past few months I've felt like I'm now thinking in Rust, rather than thinking in Python and writing it in Rust.
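For anyone who hasn't run into it, Dijkstra's algorithm finds the shortest distance from a starting node to every other node when all edge weights are non-negative. Below is a minimal textbook-style version, not code from my AoC repo: the graph is a hypothetical adjacency list, and the standard library's `BinaryHeap` (a max-heap) is wrapped in `Reverse` so the smallest tentative distance is always processed next.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Shortest distances from `start` in a graph given as an adjacency list of
/// (neighbor, non-negative edge weight). Unreachable nodes get `None`.
fn dijkstra(adj: &[Vec<(usize, u64)>], start: usize) -> Vec<Option<u64>> {
    let mut dist: Vec<Option<u64>> = vec![None; adj.len()];
    // Reverse turns the max-heap into a min-heap on (distance, node).
    let mut heap = BinaryHeap::new();
    dist[start] = Some(0);
    heap.push(Reverse((0u64, start)));
    while let Some(Reverse((d, node))) = heap.pop() {
        // Skip stale entries left behind after a shorter path was found.
        if dist[node].map_or(false, |best| d > best) {
            continue;
        }
        for &(next, weight) in &adj[node] {
            let candidate = d + weight;
            if dist[next].map_or(true, |best| candidate < best) {
                dist[next] = Some(candidate);
                heap.push(Reverse((candidate, next)));
            }
        }
    }
    dist
}

fn main() {
    // Hypothetical graph: 0->1 costs 4, 0->2 costs 1, 2->1 costs 2, 1->3 costs 1;
    // node 4 has no incoming edges.
    let adj: Vec<Vec<(usize, u64)>> = vec![
        vec![(1, 4), (2, 1)],
        vec![(3, 1)],
        vec![(1, 2)],
        vec![],
        vec![],
    ];
    // Prints [Some(0), Some(3), Some(1), Some(4), None]: 0->2->1 (1+2) beats 0->1 (4).
    println!("{:?}", dijkstra(&adj, 0));
}
```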

Past puzzles are available indefinitely, so you can do them whenever you want. I don't plan on finishing the rest; I mostly lost the incentive now that it's no longer a daily thing. But I'll probably try again in December and see how far I get :-)


2022 goals, revisited

I set some goals for myself at the beginning of the year.

Here's how it went:

  • Move out of my parents' house.
    ✔️ I live in New York City now. This was definitely my biggest goal and accomplishment of the year.
  • Contribute something meaningful to SecureDrop.
    ✔️ I think so. I need to write up some of the stuff I worked on in the past year.
  • Contribute something meaningful to MediaWiki.
    ✔️ Slightly more mixed because I contributed a lot less this year than in the past, but I still consider myself having contributed in a meaningful way.
  • Not get COVID.
    ❌ Got it in June :(
  • Continue contributing to Mailman.
    ❌ Didn't really find the motivation this year. I'm hoping to spend more time on this in 2023; Wikimedia's Mailman install is showing that it needs more love.
  • Continue working on mwbot-rs, while having fun and learning more Rust.
    ✔️ I posted updates on the News page.
  • Get more stickers (lack of in-person meetups has really been hurting my sticker collecting).
    ✔️ Definitely. New stickers include FIRE, California poppies, Pacific Northwest, Qubes, HOPE 2022, Pinnacles National Park, a corgi, and a very nice yellow and blue bird that says, "There is more power in peace than in violence".
  • Port the rest of my wiki bots to Rust.
    ❌ Still running in good old PHP. I don't really think this is worth it anymore: people are so used to the current bugs that introducing a different set of bugs would be more disruptive than helpful.
  • Make progress on moving wiki.debian.org to MediaWiki.
    ❌ No real progress :(
  • Write at least one piece of recognized content (DYK/GA/FA) for Wikipedia.
    ✔️ I racked up 4 DYKs this year: List of United States Supreme Court leaks (May 2022), Eleanor Bellows Pillsbury (June 2022), 2022 University of California academic workers' strike (Dec. 2022), and Canadian Coalition for Firearm Rights (Dec. 2022). Hitting the DYK threshold feels straightforward enough now that I should probably aim for a GA!
  • Travel outside the US (COVID-permitting).
    ❌ I probably had the opportunity, but just didn't feel comfortable because of COVID. Planning at least two international trips in 2023!
  • Finish in the top half of our Fantasy Football league and Pick 'em pool. I did pretty well in 2020 and really regressed in 2021.
    ❓ Too early to say. Currently doing well in the Pick 'em pool, but not in Fantasy.
  • Keep track of TV show reviews/ratings. I've been pretty good about tracking movies I watch, but don't yet do the same for TV.
    ❌ Started, but didn't finish.

Publishing a list of goals publicly was nice; every few months I'd re-read the post to see whether I was on track. But I don't intend to publish my 2023 goals, since I expect they'll be more personal than these were.


MySQL connection pooling in Rust for Toolforge

Toolforge is a free cloud computing platform designed for and used by the Wikimedia movement to host various tools and bots. One of the coolest parts of using Toolforge is that you get access to redacted copies of the MediaWiki MySQL database replicas, aka the wiki replicas. (Note that whenever I say "MySQL" in this post I actually mean "MariaDB".)

In web applications, it's pretty common to use a connection pool, which keeps a set of open connections ready so there's less overhead when a new request comes in. But the wiki replicas are a shared resource, and, more importantly, the database servers don't have enough connection slots for every tool that uses them to maintain idle connections. To quote from the Toolforge connection handling policy:

Usage of connection pools (maintaining open connections without them being in use), persistent connections, or any kind of connection pattern that maintains several connections open even if they are unused is not permitted on shared MySQL instances (Wiki Replicas and ToolsDB).

The memory and processing power available to the database servers is a finite resource. Each open connection to a database, even if inactive, consumes some of these resources. Given the number of potential users for the Wiki Replicas and ToolsDB, if even a relatively small percentage of users held open idle connections, the server would quickly run out of resources to allow new connections. Please close your connections as soon as you stop using them. Note that connecting interactively and being idle for a few minutes is not an issue—opening dozens of connections and maintaining them automatically open is.

But using a connection pool in code has benefits beyond just keeping idle connections open and ready to go. A connection pool manages the maximum number of open connections, so we can wait for a connection slot to become available rather than showing the user an error that our user's connection limit has already been reached. A pool also lets us reuse an open connection, instead of closing it, when we know something is waiting for one. (Both of those are real issues Enterprisey ran into with their new fast-ec tool: T325501, T325511; that's what finally got me to investigate this.)
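To make those two benefits concrete, here's a toy model of a pool built only from the standard library's `Mutex` and `Condvar`. It's a sketch, not how mysql_async works internally: `ToyPool` and `Conn` are hypothetical, and "opening" a connection just hands out an ID. Callers block instead of erroring when `max` connections are in use; a released connection is handed off for reuse when someone is blocked waiting, and otherwise it's closed right away, so nothing sits idle.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical stand-in for a database connection; no I/O happens.
struct Conn {
    id: usize,
}

struct State {
    open: usize,        // connections in use or being handed off
    waiting: usize,     // callers currently blocked in `get`
    handoff: Vec<Conn>, // released connections reserved for blocked callers
    next_id: usize,
}

/// Toy pool: at most `max` open connections, none kept open "just in case".
struct ToyPool {
    max: usize,
    state: Mutex<State>,
    freed: Condvar,
}

impl ToyPool {
    fn new(max: usize) -> Self {
        ToyPool {
            max,
            state: Mutex::new(State { open: 0, waiting: 0, handoff: Vec::new(), next_id: 0 }),
            freed: Condvar::new(),
        }
    }

    /// Returns a connection, blocking (instead of erroring) while `max` are in use.
    fn get(&self) -> Conn {
        let mut st = self.state.lock().unwrap();
        loop {
            if let Some(conn) = st.handoff.pop() {
                return conn; // reuse a connection someone just released
            }
            if st.open < self.max {
                st.open += 1;
                st.next_id += 1;
                return Conn { id: st.next_id }; // "open" a new connection
            }
            st.waiting += 1;
            st = self.freed.wait(st).unwrap(); // loop again: wakeups may be spurious
            st.waiting -= 1;
        }
    }

    /// Hands the connection to a blocked caller if one needs it; otherwise closes it.
    fn put(&self, conn: Conn) {
        let mut st = self.state.lock().unwrap();
        if st.handoff.len() < st.waiting {
            st.handoff.push(conn);
            self.freed.notify_one();
        } else {
            st.open -= 1; // `conn` is dropped here, modeling an immediate close
        }
    }

    fn open_count(&self) -> usize {
        self.state.lock().unwrap().open
    }
}

fn main() {
    let pool = ToyPool::new(2);
    let (a, b) = (pool.get(), pool.get());
    // Prints "got connections 1 and 2, open = 2"
    println!("got connections {} and {}, open = {}", a.id, b.id, pool.open_count());
    // Nobody is waiting, so each release closes its connection right away.
    pool.put(a);
    pool.put(b);
    println!("open = {}", pool.open_count()); // open = 0
}
```

Only handing off when `handoff.len() < waiting` is what keeps this from drifting back into an idle pool: every reserved connection has a blocked caller that will pick it up.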

With that in mind, let's set up a connection pool using the mysql_async crate that doesn't keep any idle connections open. You can pass pool options programmatically using a builder, or as part of the URL connection string. I was already using the connection string method, so that's the direction I went in, because it was trivial to tack on more options.

Here's the annotated Rust code I ended up with, from the toolforge crate (source code):

use std::fmt;

impl fmt::Display for DBConnectionInfo {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // pool_min=0 means the connection pool will hold 0 active connections at minimum
        // pool_max=? means the max number of connections the pool will hold (should be no more
        //            than the max_connections_limit for your user, default 10)
        // inactive_connection_ttl=0 means inactive connections will be dropped immediately
        // ttl_check_interval=30 means it will check for inactive connections every 30sec
        write!(
            f,
            "mysql://{}:{}@{}:3306/{}?pool_min=0&pool_max={}&inactive_connection_ttl=0&ttl_check_interval=30",
            self.user, self.password, self.host, self.database, self.pool_max
        )
    }
}

In the end, it was pretty simple to configure the pool to immediately close unused connections, while still getting us the other benefits! This was released as part of toolforge 5.3.0.

This is only half of the solution though, because this pool only works for connecting to a single database server. If your tool wants to support all the Wikimedia wikis, you're out of luck since the wikis are split across 8 different database servers ("slices").

Ideally our pool would automatically open connections on the correct database server, reusing them when appropriate. For example, the "enwiki" (English Wikipedia) database is on "s1", while "s2" has "fiwiki" (Finnish Wikipedia), "itwiki" (Italian Wikipedia), and a few more. There is a "meta_p" database that contains information about which wiki is on which server:

MariaDB [meta_p]> select dbname, url, slice from wiki where slice != "s3.labsdb" order by rand() limit 10;
+---------------+--------------------------------+-----------+
| dbname        | url                            | slice     |
+---------------+--------------------------------+-----------+
| mniwiktionary | https://mni.wiktionary.org     | s5.labsdb |
| labswiki      | https://wikitech.wikimedia.org | s6.labsdb |
| dewiki        | https://de.wikipedia.org       | s5.labsdb |
| igwiktionary  | https://ig.wiktionary.org      | s5.labsdb |
| viwiki        | https://vi.wikipedia.org       | s7.labsdb |
| cswiki        | https://cs.wikipedia.org       | s2.labsdb |
| enwiki        | https://en.wikipedia.org       | s1.labsdb |
| mniwiki       | https://mni.wikipedia.org      | s5.labsdb |
| wawikisource  | https://wa.wikisource.org      | s5.labsdb |
| fiwiki        | https://fi.wikipedia.org       | s2.labsdb |
+---------------+--------------------------------+-----------+
10 rows in set (0.006 sec)

(Most of the wikis are on s3, so I excluded it so we'd actually get some variety.)

Essentially we want 8 different connection pools, and then a way to route a connection request for a database to the server that contains the database. We can get the mapping of database to slice from the meta_p.wiki table.

This is what the new WikiPool type aims to do (again, in the toolforge crate). At construction, it loads the username/password from the my.cnf file. Then, when a new connection is requested, it lazily loads the mapping, opens a connection to the corresponding server, switches to the desired database, and returns the connection.
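The routing idea can be sketched without a database at all. This is not the WikiPool code: `Router`, `SlicePool`, and the `load_mapping` closure (standing in for a query against meta_p.wiki) are all hypothetical. The mapping is loaded on first use, one pool is created lazily per slice, and wikis on the same slice share that pool. It also assumes the replica naming convention where a wiki's database is its dbname plus a `_p` suffix (e.g. `enwiki_p`).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one per-slice connection pool; a real one would
/// hold a pool pointed at that slice's server.
struct SlicePool {
    slice: String,
}

/// Toy router: database name -> slice -> lazily created pool.
struct Router<F> {
    load_mapping: F,                          // stands in for querying meta_p.wiki
    mapping: Option<HashMap<String, String>>, // dbname -> slice, loaded on first use
    pools: HashMap<String, SlicePool>,        // slice -> pool, created on demand
}

impl<F: Fn() -> HashMap<String, String>> Router<F> {
    fn new(load_mapping: F) -> Self {
        Router { load_mapping, mapping: None, pools: HashMap::new() }
    }

    /// Picks the pool for `dbname` and returns its slice plus the statement
    /// that would switch databases after connecting; None for unknown wikis.
    fn route(&mut self, dbname: &str) -> Option<(String, String)> {
        if self.mapping.is_none() {
            self.mapping = Some((self.load_mapping)());
        }
        let slice = self.mapping.as_ref()?.get(dbname)?.clone();
        let pool = self
            .pools
            .entry(slice.clone())
            .or_insert_with(|| SlicePool { slice: slice.clone() });
        // Assumes the replica convention of a `_p` suffix, e.g. enwiki_p.
        Some((pool.slice.clone(), format!("USE {}_p", dbname)))
    }
}

fn main() {
    let mut router = Router::new(|| {
        // Hypothetical rows, copied from the meta_p sample output above.
        [("enwiki", "s1.labsdb"), ("fiwiki", "s2.labsdb"), ("cswiki", "s2.labsdb")]
            .into_iter()
            .map(|(db, slice)| (db.to_string(), slice.to_string()))
            .collect::<HashMap<String, String>>()
    });
    for db in ["enwiki", "fiwiki", "cswiki", "nosuchwiki"] {
        println!("{} -> {:?}", db, router.route(db));
    }
    // fiwiki and cswiki share the s2 pool, so this prints "pools: 2".
    println!("pools: {}", router.pools.len());
}
```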

I've done some limited local testing of this, mostly using ab to fire off a bunch of concurrent requests and watching SHOW PROCESSLIST in another tab to observe all connection slots being used with no idle connections staying open. But it's not in a state where I feel comfortable declaring the API stable, so it's currently behind an unstable-pool feature, with the understanding that breaking changes may be made in the future without a semver major bump. If you don't mind that, please try out toolforge 5.4.0 and provide feedback! T325951 tracks stabilizing this feature.

If this work interests you, the mwbot-rs project is always looking for more contributors; please reach out, either on-wiki or in the #wikimedia-rust:libera.chat room (Matrix or IRC).


How rich and famous people influence Wikipedia

There were two prominent stories this week about how rich and famous people tried to influence Wikipedia's coverage, and depending on your point of view, got their way. I think the coverage of both stories missed the mark so I'd like to dive into them a bit deeper.

But first, Canada is currently discussing enacting a new gun control law, known as Bill C-21. A prominent ice hockey player, Montreal Canadiens goalie Carey Price, spoke out in opposition to the bill, aligning himself with the Canadian Coalition for Firearm Rights. At the same time, the CCFR was under fire for creating an online coupon code, "POLY", which people assumed referred to the 1989 École Polytechnique massacre (the group denies this).

If you had wanted to look up the Canadian Coalition for Firearm Rights on Wikipedia prior to December 7, you wouldn't have found anything. You probably wouldn't have learned that in 2019 they asked members to file complaints against a doctor who called for a ban on assault rifles, or that their CEO shot his first firearm in...the United States.

I'm not very in tune with Canadian politics, so it's unclear to me how prominent this group actually is (it doesn't seem to be on the level of the NRA in the US). But Price put them on the map, and now there's a Wikipedia article that will educate people on its history. (It's even been approved to go on the Main Page, just pending scheduling.) 1 point for rich and famous people influencing Wikipedia's coverage for the better.

OK, so now onto author Emily St. John Mandel, who is divorced and wanted Wikipedia to stop falsely saying she was married. She posted on Twitter, "Friends, did you know that if you have a Wikipedia page and you get a divorce, the only way to update your Wikipedia is to say you’re divorced in an interview?"

She then did an interview in Slate, where she was specifically asked and answered that she was divorced.

The thing is, that probably wasn't necessary. Yes, Wikipedia strongly prefers independent, reliable sources as the "Wikipedia:Reliable sources" policy page goes into great detail about. But in certain cases, using the person themselves as a source is fine. In the section "Self-published and questionable sources as sources on themselves", the policy lists 5 criteria that should be met:

  1. The material is neither unduly self-serving nor an exceptional claim.
  2. It does not involve claims about third parties (such as people, organizations, or other entities).
  3. It does not involve claims about events not directly related to the subject.
  4. There is no reasonable doubt as to its authenticity.
  5. The Wikipedia article is not based primarily on such sources.

On top of this, Wikipedia has a strict policy regarding biographies of living persons (BLP), which would lend more weight to using the self-published source.

If Mandel had just tweeted, "I'm divorced now.", that would've been fine. In fact, the first person to update her article with a citation about her divorce used her tweet, not the Slate interview! In the past I've also used people's tweets to remove incorrect information from Wikipedia.

(That said, people do lie about their age, height, etc. So far the worst case I've ever run into was Taio Cruz, who reached the level of sending in a fake birth certificate. You can read the talk page, it's a giant mess.)

And then there's Elon Musk (sigh), who tweeted about how Wikipedia is biased, right after an "Articles for deletion" discussion was started on the Twitter Files article.

Vice covered it with: "We Are Watching Elon Musk and His Fans Create a Conspiracy Theory About Wikipedia in Real Time". It goes into good detail about the Wikipedia deletion process, but I don't fully agree with its conclusion that this is how the process is supposed to work, and how it usually works.

I cast a vote in the discussion, stating it was easily notable and an obvious keep. By the time it was closed, the tally was 73 keep votes, 27 delete votes, and 23 merge votes. Wikipedians will tell you that these discussions are not a vote, rather the conclusion is based on the strength of the arguments. But in this case, I want to focus on the direction of the discussion rather than the final result.

At the time Musk tweeted (Dec 6, 18:46 UTC), the vote count was 12 delete votes, 4 keep votes, 4 merge votes (I should say that I'm relying on Enterprisey's vote-history analysis for these numbers). The votes post-tweet were 69 keep, 15 delete, 19 merge. That's a pretty big shift!
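Turning those same counts into shares makes the size of the shift easier to see (my own arithmetic on the numbers above; the pre- and post-tweet counts also add up to the final tally):

```latex
\begin{aligned}
\text{before the tweet: } & \tfrac{12}{12+4+4} = \tfrac{12}{20} = 60\%\ \text{delete}, \quad \tfrac{4}{20} = 20\%\ \text{keep} \\
\text{after the tweet: } & \tfrac{69}{69+15+19} = \tfrac{69}{103} \approx 67\%\ \text{keep}, \quad \tfrac{15}{103} \approx 15\%\ \text{delete} \\
\text{final tally: } & 4+69 = 73\ \text{keep}, \quad 12+15 = 27\ \text{delete}, \quad 4+19 = 23\ \text{merge}
\end{aligned}
```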

I would like to think that Wikipedians would have reached the same (and IMO correct) conclusion regarding the existence of the Twitter Files article without Musk's "intervention", but it's hard to say that for sure.

But, as I've hopefully demonstrated, Musk is not alone in trying to influence Wikipedia. Rich and famous people do it all the time, for entirely different goals, and sometimes without even realizing it!