Improving quality and maintenance of election result boxes on Wikipedia

I've spent a decent amount of time reading Wikipedia articles about Senate, House, and state races this week. And...there were inconsistencies. Specifically in the election box result templates:

Before I updated the article

That's what it looked like before I updated the article to use my template instead. There are a few different issues. First, Mike Thompson isn't marked as the incumbent. Second, the total number of votes is wrong - if you do the math, it actually adds up to 292,091. There are also two cosmetic issues: 1) the % column should go to one decimal place, and 2) the empty turnout field should be hidden since we don't have that data.

The fixed version

That's the fixed version, using my template. So what's different? The main thing most editors will notice is how little wikitext it took to generate my version: {{election box US auto|California|2016|United States Representative District 5|Mike Thompson link=Mike Thompson (California politician)}}. Compare that to what was needed previously - I think it's a pretty big improvement. Oh, and if you set the year to a comma-separated list, like "2012,2014,2016", it'll generate all three boxes at once, so it becomes even easier to use.
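For example, generating the 2012, 2014, and 2016 boxes together would look something like {{election box US auto|California|2012,2014,2016|United States Representative District 5|Mike Thompson link=Mike Thompson (California politician)}} - one invocation, three boxes.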

This will also reduce the maintenance burden significantly. These boxes get copied to other articles, including Mike Thompson (California politician), which is using the wrong styles and is missing the 2016 general election entirely, and United States House of Representatives elections in California, 2016, which actually looks correct!

This is all being generated by a single Lua module, Module:Election box US auto, and a tabular data spreadsheet that's available on Commons. The Lua code isn't the cleanest, but it proves that we can replace things that were manually maintained with smarter templates that do most of the heavy lifting. To the best of my knowledge, this is the first use of the new tabular data system in English Wikipedia articles. So far I've updated the articles for California's 1st through 8th congressional districts to use the "auto" template.
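If you want to poke at the underlying numbers outside the wiki, tabular data pages on Commons are plain JSON, so a few lines of Python are enough to pull one down. A minimal sketch - the Data: page name below is a placeholder rather than the actual page the module reads, and it assumes action=raw returns the page's raw JSON (which it does for Data: pages as far as I know):

    import json
    import urllib.parse
    import urllib.request

    # Placeholder page name - substitute the real Data: page used by the module.
    PAGE = "Data:Example US House results.tab"
    URL = ("https://commons.wikimedia.org/w/index.php?title="
           + urllib.parse.quote(PAGE) + "&action=raw")

    req = urllib.request.Request(URL, headers={"User-Agent": "tabular-data-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        table = json.load(resp)

    # Tabular data pages keep column definitions under "schema" and rows under "data".
    columns = [field["name"] for field in table["schema"]["fields"]]
    for row in table["data"]:
        print(dict(zip(columns, row)))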

What's next? I'm going to import the MIT Election Data for the US House dataset to Commons so we can start using this in more articles outside of California soon (waiting on bot approval). The only thing missing from that dataset is incumbency - it doesn't indicate whether a candidate was running as an incumbent (so if you know of a good source for incumbency data, please let me know!). Once the House is in good shape, we can move on to the Senate and then state races. Aaaand I've even had someone ask me about expanding this to other countries, which should totally be doable - anything is possible with Lua + tabular data.


On Jim Acosta

Two weeks ago, CNN Chief White House Correspondent Jim Acosta came to my university and received the 2018 William Randolph Hearst Award for his work. I'm not really the biggest fan of CNN, but I've been impressed with Acosta's work, and getting to hear him speak was a real treat. He talked about what he endured at Trump rallies, and after hearing that, learning that his press pass was revoked honestly wasn't that surprising. The only way freedom of the press works is if journalists like Acosta ask the tough questions and hold those in power accountable.

Here are my favorite clips from the student interview with Acosta, which happened the day after pipe bombs were sent to CNN headquarters:

Jim Acosta discusses the October 2018 attempted bombing of CNN headquarters at SJSU
Full quality video available on Wikimedia Commons.

Jim Acosta discusses his trip back to Cuba at SJSU
Full quality video available on Wikimedia Commons.

Jim Acosta discusses why he became a journalist at SJSU
Full quality video available on Wikimedia Commons.

I also uploaded a full copy of the student interview, and a full copy of Acosta's speech (which I haven't had time to cut into smaller segments yet).

P.S.: If you ever get the chance to meet him, ask to see his socks, they're great.



Writing a new MediaWiki tarball release script

Last week's security release of MediaWiki 1.27.5 / 1.29.3 / 1.30.1 / 1.31.1 included a small hint that a new release script was used for this release. Chad came up with the concept and architecture of the new script, I wrote most of the code, and Reedy did the actual release, providing feedback on missing functionality and other feature requests.

Before I explain the new script, let me explain how the old script worked (source). First, the script would clone MediaWiki core, extensions, skins, and vendor for you. Except it wanted one directory per version being released, so if you were doing a security release for 4 MediaWiki versions, you'd need MediaWiki core cloned 4 times! Oh, but since we need to make patch files against the previous release, it would also need to recreate those tarballs (a separate problem), so now you have 8 clones of MediaWiki core. Ouch.

Then come the security patches. These patches are not yet published on Gerrit and currently exist only as git patch files. The old script required them to be in a patches directory, following a specific naming pattern so the script knew which branch each patch should be applied to. Mostly this just confused releasers and wasn't straightforward.

There were definitely other issues with the old script, but those two were the main motivation for me at least.

Enter makerelease2.py (initial commit). The theory behind this script is to simply archive whatever exists in git. We added the bundled extensions and skins plus vendor as submodules, so we no longer have to maintain separate configuration about which extensions should be bundled for which MediaWiki version. This also has the added benefit of making the build more reproducible, as each tag now points to specific extension commits instead of always using the tip of the release branch.

Excluded files can be maintained with .gitattributes rather than by the release script (yet another plus for reproducibility, maybe you can see a pattern :)).
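(For reference, the mechanism here is git's export-ignore attribute - a line like "/Gruntfile.js export-ignore" in .gitattributes keeps that file out of anything git-archive produces. That particular path is just an illustration; the real exclusion list lives in MediaWiki core's .gitattributes.)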

If you're not already familiar, there's a git-archive command, which creates tarballs (or zipballs) based on what is in your repository. Notably, GitHub uses this for their "Download tarball/zipball" feature.
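(The whole thing is basically a one-liner: something like git archive --format=tar.gz --prefix=mediawiki-1.31.1/ -o mediawiki-1.31.1.tar.gz 1.31.1 produces a tarball of exactly what that tag contains, minus anything marked export-ignore.)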

There's only one drawback - git-archive doesn't support submodules. Luckily other people have run into this limitation too, and Kentzo on GitHub wrote a library for it: git-archive-all. It respects .gitattributes and had nearly all the features we needed. The only thing missing was the ability to unset git attributes, so I submitted a pull request, which Kentzo fixed up and merged!
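For our purposes it's nearly a drop-in replacement - running something along the lines of git-archive-all mediawiki-1.31.1.tar.gz from inside the checkout walks the submodules for you and infers the archive format from the output filename (the exact flags are in the project's README).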

So, running the new script: ./makerelease2.py ~/path/to/mw-core 1.31.1. This will spit out two tarballs: the mediawiki-core variant (no extensions or skins bundled) and the full mediawiki tarball. You can create a tarball of whatever you want - a tag, a branch, a specific commit, etc. - and it'll run. Additional checks kick in if it's a tag, notably verifying that $wgVersion matches the tag you're trying to make.
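The tag check is conceptually tiny. Here's a rough Python sketch of the idea - not the actual makerelease2.py code - assuming $wgVersion still lives in includes/DefaultSettings.php (true for these branches):

    import re
    import subprocess

    def wg_version(repo, tag):
        """Read $wgVersion from DefaultSettings.php as it exists at the given tag."""
        php = subprocess.check_output(
            ["git", "-C", repo, "show", f"{tag}:includes/DefaultSettings.php"],
            text=True,
        )
        match = re.search(r"\$wgVersion\s*=\s*'([^']+)'", php)
        return match.group(1) if match else None

    def check_tag(repo, tag):
        """Refuse to build a tagged release whose $wgVersion doesn't match the tag."""
        version = wg_version(repo, tag)
        if version != tag:
            raise SystemExit(f"$wgVersion is {version}, but the tag is {tag}")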

To create a security release, you take a fresh clone of MediaWiki core, apply the security patches to the git tree, and create the new tags. Using the native git tools makes it straightforward to apply the patches, and then once the release has been announced, it can easily be pushed to Gerrit.
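(In practice that's just the normal git workflow - git am or git apply the patch files onto each release branch, git tag the new versions, and run the script against each tag; there's nothing release-specific to learn.)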

If you pass --previous 1.31.0, it will additionally create a patch file against the previous tarball you specified. However, instead of trying to recreate that tarball if it doesn't exist, we download it from releases.wikimedia.org. So regardless of any changes to the release scripts, the patch file is guaranteed to apply to the previous tarball that people actually downloaded (this wasn't true in the past).
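Conceptually, the --previous step boils down to something like this sketch. The releases.wikimedia.org URL layout is real; the diff flags, file names, and the assumption that the freshly built tarball is already sitting in the working directory are mine, not necessarily what makerelease2.py does:

    import subprocess
    import tarfile
    import urllib.request

    def make_patch(prev, new, workdir="."):
        """Diff a freshly built tarball against the previously *published* one,
        downloaded from releases.wikimedia.org rather than rebuilt locally."""
        major = ".".join(prev.split(".")[:2])  # "1.31.0" -> "1.31"
        url = f"https://releases.wikimedia.org/mediawiki/{major}/mediawiki-{prev}.tar.gz"
        urllib.request.urlretrieve(url, f"{workdir}/mediawiki-{prev}.tar.gz")

        # Both tarballs unpack to mediawiki-<version>/ directories.
        for version in (prev, new):
            with tarfile.open(f"{workdir}/mediawiki-{version}.tar.gz") as tf:
                tf.extractall(workdir)

        # diff exits 1 when the trees differ, so don't treat that as an error.
        result = subprocess.run(
            ["diff", "-Nru", f"mediawiki-{prev}", f"mediawiki-{new}"],
            cwd=workdir, capture_output=True, text=True,
        )
        with open(f"{workdir}/mediawiki-{new}.patch", "w") as fh:
            fh.write(result.stdout)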

A number of existing bugs were fixed by this rewrite as well.

What's next? This was really only step 1 in the "streamline MediaWiki releases" project. The next step (as outlined by Chad) is to continuously generate tarballs, and then to generate secret tarballs that also include the current security patches. I don't think any of this is especially hard technically; it will mostly require process improvements in how we handle and manage security patches.


Goodbye PHPStorm, hello Atom

I've been using the JetBrains IDE PHPStorm ever since I really got started in MediaWiki development in 2013. Its symbol analysis and autocomplete are fantastic, and the built-in inspections generally caught most coding issues while you were still writing the code.

But, it's also non-free software, which has always made me feel uncomfortable using it. I used to hope that they would one day make a free/libre community version, like they did with their Python IDE, PyCharm. But after five years of waiting, I think it's time to give up on that hope.

So, about a year ago I started playing with replacements. I evaluated NetBeans, Eclipse, and Atom. I quickly gave up on NetBeans and Eclipse because it took too long for me to figure out how to create a project to import my code into. Atom looked promising, but if I remember correctly, it didn't have the symbol analysis part working yet.

I gave Atom another try two weeks ago, since it looked like the PHP 7 language server was ready (spoiler: it isn't really). I like it. Here are my initial feelings:

  • The quick search bar (ctrl+t) has to re-index every time I open Atom, which means I can't use it right away. It only searches filenames, but that's not a huge issue since most MediaWiki class names now match their filenames.
  • Everything that is .gitignore'd is excluded from the editor. This is smart, but it also gets in the way when I have all MediaWiki extensions cloned to extensions/, which is .gitignore'd in core.
  • The theme needs more contrast; I'll need to create my own or look through other community ones.
  • The language server regularly takes up an entire CPU even when I'm not using the editor. I don't know what it's doing - definitely not providing good symbol analysis. It really can't do anything more advanced than resolving symbols in the same file. I'm much less concerned about this since phan tends to catch most of these errors anyway.
  • The PHPCS linter plugin doesn't work. I still need to spend some time understanding how it's supposed to work, because I think I'm using it wrong.

Overall I'm pretty happy with Atom. I think there are still some glaring places where it falls short, but now I have the power to actually fix those things. I'd estimate my productivity loss over the past two weeks started out around 20%, and it's now probably closer to 10-15%. And as time goes on, I expect I'll start making productivity gains, since I can customize my editor significantly more. Hooray for software freedom!


Day 19: The End

Part of a series on my journalism faculty-led program through Italy and Greece.

It's over. Tonight was our last night on the trip, and we head back to Athens tomorrow. I think I learned a lot more about myself than I did new skills and knowledge. I spent some time reflecting by talking to people in person, so I'm not going to write anything up tonight. Goodbye (for now!) ^.^

"Boyz of FLP"

P.S.: My team won in Jeopardy! tonight, $2,600 - $500 - $400 - $200. It was fun.