Kiwix returns in Debian Bullseye

(This is my belated #newindebianbullseye post.)

The latest version of the Debian distro, 11.0 aka Bullseye, was released last week and, after a long absence, includes Kiwix! Previously, in Debian 10/Buster, we only had the underlying C/C++ libraries available.

If you're not familiar with it, Kiwix is an offline content reader, providing Wikipedia, Gutenberg, TED talks, and more in ZIM (.zim) files that can be downloaded and viewed entirely offline. You can get the entire text of the English Wikipedia in less than 100GB.

apt install kiwix will get you a graphical desktop application that lets you download and read ZIMs. apt install kiwix-tools installs kiwix-serve (among other utilities), which serves ZIM files over HTTP.
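For example, serving a downloaded ZIM locally looks something like this (the filename here is just a placeholder for whichever ZIM you grabbed):

$ kiwix-serve --port=8080 wikipedia_en_all_nopic.zim

Then point your browser at http://localhost:8080/ to read it.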

Additionally, there are now tools in Debian that allow you to create your own ZIM files: zimwriterfs and the python3-libzim library.
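As a rough sketch of how that works (the paths and metadata here are made up, and the exact flags can vary between zimwriterfs versions):

$ zimwriterfs --welcome=index.html --favicon=favicon.png \
      --language=eng --title="My site" \
      --description="An offline copy of my site" \
      --creator="Me" --publisher="Me" \
      ./my-site/ my-site.zim

That takes a directory of self-contained HTML and packages it into a single ZIM file you can open in Kiwix.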

All of this would not have been possible without the support of the Kiwix developers, who made it a priority to properly support Debian. All of the Kiwix and openZIM repositories have a CI process that builds Debian packages for each pull request, and those builds need to pass before a change is accepted.

Ubuntu users can take advantage of our primary PPA or the bleeding-edge PPA. For Debian users, my goal is that unstable/sid will have the latest version within a few days of a release, and once it moves into testing, it'll be available in Debian Backports.

It is always a pleasure working with the Kiwix team, who make a point to send stickers and chocolate every year :)


Accidentally creating a server in the wrong datacenter

Yesterday I was working on upgrading the servers that power Wikimedia's Docker registry (see T272550). Since these are virtual machines, I was creating new ones and planning to delete the old ones later (because VMs are cattle, not pets).

We have a handy script to create new VMs, so I ran the following command:

legoktm@cumin1001:~$ sudo cookbook sre.ganeti.makevm --vcpus 2 --memory 4 --disk 20 codfw_B registry2004.eqiad.wmnet

In this command, codfw_B refers to the datacenter and row to create the VM in, and registry2004.eqiad.wmnet is the requested fully qualified domain name (FQDN).

If you're familiar with Wikimedia's datacenters, you'll notice that I created the VM in codfw (Dallas) and gave it a name as if it were in eqiad (Virginia). Oops. I only noticed right as the script finished creation. (Also the 2XXX numbering is for codfw. 1XXX servers are in eqiad.)

Normally we have a decommissioning script for deleting VMs, but when I tried running it, it failed because the VM hadn't fully been set up in Puppet yet!

Then I tried just adding it to Puppet and continuing enough of the VM setup that I could delete it, except our CI correctly rejected my attempt because the name was wrong! I was stuck with a half-created VM that I could neither use nor delete.

After a quick break (it was frustrating), I read through the decom script to see if I could just do the steps manually, and realized the error was probably just a bug, so I submitted a one-line fix to allow me to delete the VM. Once it was merged and deployed, I was able to delete the VM, and actually create what I wanted to: registry2004.codfw.wmnet.
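For the record, the command I should have run in the first place was:

legoktm@cumin1001:~$ sudo cookbook sre.ganeti.makevm --vcpus 2 --memory 4 --disk 20 codfw_B registry2004.codfw.wmnet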

Really, we should have been able to catch this when I entered the command, since I specified the datacenter even before the FQDN. After some discussion in Phabricator, I submitted a patch to prevent such a mismatch. Now the operator just needs to specify the hostname, registry2004, and it will build the FQDN using the datacenter and networking configuration. Plus, it'll prompt the user to confirm that it was built correctly. (For servers whose numbers encode the datacenter, like 2XXX for codfw, it'll check those too.)
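If I've got the new interface right, the same creation now looks something like this, with just the bare hostname at the end:

legoktm@cumin1001:~$ sudo cookbook sre.ganeti.makevm --vcpus 2 --memory 4 --disk 20 codfw_B registry2004

The cookbook expands that to registry2004.codfw.wmnet based on the codfw_B argument, and asks you to confirm before it proceeds.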

Once this is deployed, it should be impossible for someone to repeat my mistake. Hopefully.


That time I broke Wikipedia, but only for vandals

As one of the top contributors to MediaWiki, the free wiki engine that powers Wikipedia, I'm often asked how I got started. To celebrate Wikipedia's 20th birthday, here's that unfortunate story.

In late 2012, I was a bored college student who was spending most of his time editing Wikipedia. I reverted a lot of vandalism, and eventually began developing anti-vandalism IRC bots to allow patrollers like myself to respond to vandalism even faster than before.

I had filed a feature request asking for the events from our anti-abuse "edit filter" to be broadcast to the realtime IRC recent changes feed (at the time, the only way to get a continuous, live feed of edits). A few months later no one had implemented it, and I was annoyed.

After I complained to a few people about this, they suggested I fix it myself. The code is all open source and I knew how to program; what could go wrong?

It's at this point I should've told someone I didn't actually know PHP; I knew plenty of Python and had just learned Java in my intro to computer science class.

I really had no clue what I was doing, but I submitted a patch that kind of looked right. I asked my friend Ori to review it, and he promptly approved the change and deployed it on the real servers that power Wikipedia.

[Screenshot: my very broken patch]

I was pretty excited: my first-ever patch had been merged and deployed! The millions of people who visited Wikipedia every day would be served a page that included my code.

I then went to test the change and it did. not. work. I made a test edit that I knew would trigger a filter, except instead of getting a notification from the realtime feed, I saw the Wikimedia Error screen.

In fact, for about 30 minutes any wannabe vandal (and a few innocent users) who triggered a filter would see the error page:

[Screenshot: the old Wikimedia error page. This really wasn't a sustainable way to stop vandalism.]

I immediately told Ori that it was broken, and his reaction was along the lines of: "You didn't test it??" He had assumed I knew what I was doing and had tested my code before submitting it...oops. He very quickly fixed the issue for me, and then started teaching me how to properly test my patches.

[Screenshot: the one-line fix]

He introduced me to MediaWiki-Vagrant, a then-new project to automate setting up a development instance, which is now used by a majority of MediaWiki developers (I was user #2).

There were a lot of missing safeguards in this story, any of which should have caught this failure before it ended up on our servers. We didn't have any automated testing or static analysis to point out that my patch was obviously flawed. We didn't do a staged rollout to a few users before exposing all of Wikipedia to it.

This incident has stuck in my head ever since and I'm pretty confident it couldn't happen today because we've implemented those safeguards. I've spent a lot of time developing better static analysis tools (MediaWiki-CodeSniffer and phan especially) and building infrastructure to help us improve test coverage. We have proper canary deploys now, so these obvious errors should never make it to a full deployment.

It really sucked knowing that my patch had broken Wikipedia, but at the same time it was invigorating. Getting my code onto one of the biggest websites in the world was actually pretty straightforward and within reach. If I learned a bit more PHP and actually tested my code first, I could fix bugs on my own instead of waiting for someone else to do it.

I think this mentality really represents one of my favorite parts about Wikipedia: if something is broken, just fix it.


Starting a new job

Last week I officially joined the Site Reliability Engineering team at the Wikimedia Foundation. I'll be working with the Service Operations team, which "...takes care of public and 'user-visible' services."

I'm glad to be back at the WMF; I had originally started working there in 2013 but recently took a break to finish school. SRE will be my ninth distinct team at the WMF, and I'm looking forward to even more adventures.

As part of transitioning into my new role, I have unsubscribed myself from most MediaWiki bug mail and Gerrit notifications. Once I get more situated I'll put out a more detailed request for new maintainers for the components that need them. I'll continue taking care of maintenance as needed until then.

P.S.: I created a new userbox about Rust on mediawiki.org.


PGP key consolidation

Note: A signed version of this announcement can be found at https://legoktm.com/w/index.php?title=PGP/2020-12-14_key_consolidation.

I am consolidating my PGP keys to simplify key management. Previously I had a separate key for my wikimedia.org email; I have revoked that key and added that email as an identity to my main key.

I have revoked the key 6E33A1A67F4E2DF046736A0E766632234B56D2EC (legoktm at wikimedia dot org). I have pushed the revocation to the SKS Keyservers and additionally published it at https://legoktm.com/w/index.php?title=PGP/2020-12-14_revocation.

My main key, FA1E9F9A41E7F43502CA5D6352FC8E7BEDB7FCA2, now has a legoktm at wikimedia dot org identity. An updated version can be fetched from keys.openpgp.org, the SKS Keyservers, or https://legoktm.com/w/index.php?title=PGP. It should also be included in the next Debian keyring update. I took this opportunity to extend the expiry for another two years to 2022-12-14.
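For anyone who wants to do something similar, here's roughly what the GnuPG steps look like (a sketch assuming a reasonably modern gpg, 2.1 or later; swap in your own fingerprints and identity):

# Generate, import, and publish a revocation for the old key
$ gpg --gen-revoke 6E33A1A67F4E2DF046736A0E766632234B56D2EC > revoke.asc
$ gpg --import revoke.asc
$ gpg --keyserver keys.openpgp.org --send-keys 6E33A1A67F4E2DF046736A0E766632234B56D2EC

# Add the new identity to the main key and extend its expiry two years
$ gpg --quick-add-uid FA1E9F9A41E7F43502CA5D6352FC8E7BEDB7FCA2 "Kunal Mehta <legoktm@wikimedia.org>"
$ gpg --quick-set-expire FA1E9F9A41E7F43502CA5D6352FC8E7BEDB7FCA2 2y
$ gpg --keyserver keys.openpgp.org --send-keys FA1E9F9A41E7F43502CA5D6352FC8E7BEDB7FCA2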


Legoktm, B.S.

[Photo: me, wearing my cap and gown, in front of the San Jose State University sign. Photo by Jesus Tellitud and Blue Nguyen.]
The Trustees of the California State University
on recommendation of the faculty of the
College of Humanities and the Arts
have conferred upon

Kunal Mehta

the degree of

Bachelor of Science

Journalism

So this makes me a scientist now, right? I used to joke that I was putting in all this work for a piece of paper, but now I'm actually very proud of this piece of paper.

I want to thank all of my family, who really kept me going and supported me no matter what.

Thanks to my professors and teachers at De Anza and San Jose State for giving me the opportunity and platform to explore and grow my love for journalism. I'm proud to be an alum of La Voz, Update News and the Spartan Daily.

Of course, the real treasure was all the friends I made along the way. But seriously, I'm so glad I met all of you, and I will treasure our relationships.

Thanks to my colleagues at the Wikimedia Foundation and the Freedom of the Press Foundation for furthering my professional development, assistance with networking, and just constant support. Also for indulging my sticker addiction.

To the IRC cabal: thanks for being the group of people I can turn to for help, no matter the time of day or my location. And for all the huggles.

I hope to celebrate in person with you all ... soon.

P.S. Here are some more photos.