mwparserfromhell is now fully on wheels. Well...not those wheels - Python wheels!
If you're not familiar with it, mwparserfromhell is a powerful parser for MediaWiki's wikitext syntax with an API that's really convenient for bots to use. It is primarily developed and maintained by Earwig, who originally wrote it for their bot.
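If you haven't used it before, here's a quick taste of what that API looks like (a minimal example based on mwparserfromhell's documented interface; the wikitext itself is made up):

```python
# Parse some wikitext and poke at the templates and links it contains.
import mwparserfromhell

text = "{{Infobox person|name=Ada Lovelace}} She was a [[mathematician]]."
wikicode = mwparserfromhell.parse(text)

# filter_templates() and filter_wikilinks() return the matching nodes.
for template in wikicode.filter_templates():
    print(template.name, template.params)
for link in wikicode.filter_wikilinks():
    print(link.title)
```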
Nearly 7 years ago, I implemented opt-in support for using mwparserfromhell in Pywikibot, which is arguably the most used MediaWiki bot framework. About a year later, Merlijn van Deen added it as a formal dependency, so that most Pywikibot users would be installing it...which inadvertently was the start of some of our problems.
mwparserfromhell is written in pure Python with an optional C speedup, and to build that C extension, you need to have the appropriate compiler tools and development headers installed. On most Linux systems that's pretty straightforward, but not exactly for Windows users (especially not for non-technical users, which many Pywikibot users are).
This brings us to Python wheels, which allow for easily distributing built C code without requiring users to have all of the build tools installed. Starting with v0.4.1 (July 2015), Windows users could download wheels from PyPI so they didn't have to compile the extension themselves. This resolved most of the complaints (along with John Vandenberg's patch to gracefully fall back to the pure Python implementation if building the C extension fails).
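(As a rough sketch of what that fallback looks like from a user's perspective: per the project's documentation, the parser module exposes a use_c flag you can check to see which tokenizer you ended up with.)

```python
# Rough sketch: report whether the C tokenizer loaded, or whether we're on
# the pure-Python fallback (mwparserfromhell exposes this as parser.use_c).
import mwparserfromhell

if mwparserfromhell.parser.use_c:
    print("Using the C tokenizer")
else:
    print("Using the pure-Python tokenizer fallback")
```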
In November 2016, I filed a bug asking for Linux wheels, mostly because installs would be faster. I thought it would be just as straightforward as Windows, until I looked into it and found PEP 513, which specified that, essentially, the wheels needed to be built on CentOS 5 to be portable across most Linux systems.
With the new GitHub Actions, it's actually pretty straightforward to build these manylinux1 wheels, so a week ago I put together a pull request that does just that. On every push it builds the manylinux1 wheels (to verify we haven't broken manylinux1 compatibility), and on tag pushes it uploads those wheels to PyPI for everyone to use.
Yesterday I did the same for macOS because it was so straightforward. Yay.
So, starting with the 0.6.0 release (no date set yet), mwparserfromhell will have pre-built wheels for Windows, macOS and Linux users, giving everyone faster install times. And nearly everyone will now be able to use the faster C parser without needing to make any changes to their setup.
Originally posted on mastodon.technology.
The second installment of my Spartan Daily tech column, Binary Bombshells, is out! I discuss design flaws in Twitter that lead to harassment and how Mastodon addresses some of them: https://sjsunews.com/article/binary-bombshells-twitters-tools-help-online-harassers
After a pretty hectic last semester, I'm taking a much more backseat role on the Spartan Daily for what is hopefully my final semester at San Jose State. I'm going to be the new "Science & Tech Editor" - yes, I invented my own position. I'm currently planning a science & tech section every month as a special feature.
Every two weeks, though, I'm going to publish a column titled "Binary Bombshells" about the values imbued in technology: analyzing the values different technologies contain, explaining the effects they have on us, and suggesting avenues for improvement.
You can read the first installment of my column now: Values exist in all technologies.

It's been a little over 2 years since I announced MediaWiki codesearch, a fully free software tool that lets people make regex searches across all the MediaWiki-related code in Gerrit and much more. While I expected it to be useful to others, I didn't anticipate how popular it would become.
My goal was to replace the usage of the proprietary GitHub search that many MediaWiki developers were using due to lack of a free software alternative, but doing so meant that it needed to be a superior product. One of the biggest complaints about searching via GitHub was that it pulled in a lot of extraneous repositories, making it hard to search just MediaWiki extensions or skins.
codesearch is based on hound, a code search engine written in Go and originally maintained by Etsy. It took me all of 10 minutes to get an initial prototype working using the upstream Docker image, but I ran into an issue pretty quickly: the repository selector didn't scale to our then-500+ Git repositories (we're now at more like 900!), so it wouldn't really be possible to search just extensions easily.
After searching around for other upstream code search engines and not finding much I liked, I went back to hound and tried running multiple instances at once instead, which more or less worked. I wrote a small ~50 line Python proxy to wrap the different hound instances and provide a unified UI. The proxy was sketchy enough that I wrote "Please don't hurt me." in the commit message!
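To give a rough idea of the shape of that proxy, here's a minimal sketch: a tiny Flask app that maps a search profile in the URL to one of several local hound instances and forwards the request. The profile names and port numbers are invented for the example, and the real proxy certainly differs in its details.

```python
# Minimal sketch of a proxy fronting several hound instances, one per search
# profile. Profile names and ports below are hypothetical.
from flask import Flask, Response, request
import requests

app = Flask(__name__)

# Hypothetical mapping of search profiles to local hound instances.
BACKENDS = {
    "core": "http://localhost:6080",
    "extensions": "http://localhost:6081",
    "skins": "http://localhost:6082",
}


@app.route("/<profile>/<path:path>")
def proxy(profile, path):
    backend = BACKENDS.get(profile)
    if backend is None:
        return Response("Unknown search profile", status=404)
    # Forward the request (including its query string) to the hound instance
    # for this profile and relay the response back unchanged.
    resp = requests.get(f"{backend}/{path}", params=request.args)
    return Response(resp.content, status=resp.status_code,
                    content_type=resp.headers.get("Content-Type"))


if __name__ == "__main__":
    app.run(port=3002)
```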
But it seems to have held up surprisingly well over time. I attribute that to having systemd manage everything, and to the fact that hound is abandoned/unmaintained/dead upstream, which makes for a very stable platform, for better or worse. We've worked around most of the upstream bugs, so I usually pretend that's a feature. But if upstream doesn't get adopted by new maintainers sometime this year, I expect we'll create our own fork or adopt someone else's.
I recently used the anniversary to work on puppetizing codesearch so there would be even less manual maintenance work in the future. Shoutout to Daniel Zahn (mutante) for all of his help in reviewing, fixing up and merging all the Puppet patches. All of the package installation, systemd units and cron jobs are now declared in Puppet - it's really straightforward.
For those interested, I've documented the architecture of codesearch, and started writing more comprehensive docs on how to add a new search profile and how to add a new instance.
Here's to the next two years of MediaWiki codesearch.
Originally posted on mastodon.technology.
My accomplishment for this week is acing the Cybersecurity category, including the two triple stumpers that none of the #JeopardyGOAT contestants got right!
http://www.j-archive.com/showgame.php?game_id=6522
In March 2018, Facebook began automatically rewriting links to use HTTPS based on the HSTS preload list. Now all Wikimedia sites (including Wikipedia) do the same.
If you're not familiar with it, the HSTS preload list tells browsers (and other clients) that the website should only be visited over HTTPS, not the insecure HTTP.
However, not all browsers and clients support HSTS, and users stuck on old versions might have outdated copies of the preload list.
Following Facebook's lead, we first looked into the usefulness of adding such functionality to Wikimedia sites. My analysis from July 2018 indicated that 2.87% of links on the English Wikipedia would be rewritten to use HTTPS. I repeated the analysis in July 2019 for the German Wikipedia, which indicated 2.66% of links would be rewritten.
I developed the SecureLinkFixer MediaWiki extension (source code) to do just that in July 2018. It bundles a copy of the HSTS preload list (as PHP) and hooks into page rendering to rewrite a link to use HTTPS if its domain is on the list.
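To make that concrete, here's an illustrative sketch of the idea in Python rather than the extension's actual PHP; the preload entries and helper names are invented for the example.

```python
# Sketch: rewrite http:// external links to https:// when the host appears
# on a bundled snapshot of the HSTS preload list.
from urllib.parse import urlsplit, urlunsplit

# Hypothetical snapshot of the preload list: host -> includeSubDomains flag.
PRELOAD = {
    "facebook.com": True,
    "example.org": False,
}


def is_preloaded(host):
    """True if the host, or a parent domain with includeSubDomains, is listed."""
    if host in PRELOAD:
        return True
    parts = host.split(".")
    for i in range(1, len(parts) - 1):
        if PRELOAD.get(".".join(parts[i:])):  # only True with includeSubDomains
            return True
    return False


def rewrite(url):
    parts = urlsplit(url)
    if parts.scheme == "http" and is_preloaded(parts.hostname):
        return urlunsplit(("https",) + tuple(parts[1:]))
    return url


print(rewrite("http://www.facebook.com/wikipedia"))  # rewritten to https://
print(rewrite("http://news.example.org/"))           # left alone
```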
The HSTS preload list is pulled from mozilla-central (warning: giant page) weekly, and committed into the SecureLinkFixer repository. That update is deployed roughly every week to Wikimedia sites, where it'll take at worst a month to get through all of the caching layers.
(In the process we (thanks Michael) found a bug with Mozilla not updating the HSTS list...yay!)
By the end of July 2019, the extension was deployed to all Wikimedia sites - the delay was mostly because I didn't have time to follow up on it during the school year. Since then, things have looked relatively stable from a performance perspective.
Thank you to Ori & Cyde for bringing up the idea and Reedy, Krinkle, James F & ashley for their reviews.