Interview with Brooke Vibber (2021)

By

I had the pleasure of interviewing Brooke Vibber back in January 2021 for my Wikipedia Signpost story, "The people who built Wikipedia, technically", which looked back at the technical history of Wikipedia to celebrate its 20th birthday.

I was not able to use all of the responses Brooke sent me for the article, but after re-reading them in light of current events, I feel it's still worth publishing five years later. Hopefully after reading this, you'll celebrate Brooke Vibber day with us today. Dankon Brooke!


What was it like getting started? Today we have a process of signing up to get a Gerrit account, a vote to give out +2 powers, NDAs for server access... What was it like then? How did you convince the powers that be (was it Jimmy?) to give you server access?

It was all pretty loosey-goosey; if you showed up and put in good work helping out, you might well hear "yes" to getting some fairly direct access to things because that was the only way we were going to get anyone to do it!

In the earliest days I think Wikipedia ran on one of Bomis's servers, so we usually would partner up with one of Jimmy's employees as an intermediary. But once we grew to dedicated servers, and then to the Tampa-based hosting starting in 2003-2004, it was really a core group of a few of us, many of whom are senior employees at WMF today.

As far as commit access and code review, that was very different in the early days -- we had CVS (before moving to SVN in 2006 or so), which meant no built-in system for local commits and pull requests. So "commit" meant roughly "merge/+2", and everyone else had to e-mail patches, and Tim or I would personally review and apply them. A lot of work!

Then in SVN times we started giving out commit access like candy, and I had to create an on-wiki code review system to help manage post-commit review...

I'm not super fond of Gerrit's UI, but any flavor of pull request is a step up. ;)

My understanding is that you dropped out of film school to work on Wikipedia/MediaWiki full time. Is that a fair characterization? (Or would you prefer not to be called a dropout?)

That sounds about right! :D I never quite finished my degree, and instead of taking the few more classes I needed I pivoted into work for Wikipedia, which led me into working at WMF. So, score another one for the dropouts, but I never struck it rich because we're a non-profit. ;)

What kind of programming experience did you have beforehand?

I actually grew up around programming -- my father is a software engineer and made sure my brother and I had the tools at home to learn if we were interested (8-bit Ataris when we were little, graduating up into the PC DOS/Windows/Linux worlds later). My brother and I both took well to it... though we craved the arts in high school and college -- he did some years in theater, and I worked on that film degree for a while. ;)

So when I fell back into computers, it was a world I'd never really left, but I'd also never been formally trained in it. Working on a massively-scaling-up project like early Wikipedia was a crash course in algorithmic complexity, even if I didn't get all the flowery language of big-O notation from textbooks. ;)

In February 2002, you suggested adding a WYSIWYG editor. Did you ever imagine it would take as long as it did for us to get VisualEditor?

I always suspected it would be much harder to do full WYSIWYG while keeping complete compatibility with the old markup language, and I think I was proven right. :) That said, a lot of people still like the markup, so if we've succeeded at bringing it, hey! Nice.

Superprotect aside, should the rollout of MediaViewer have been as controversial as it was?

I think it should've been less controversial by far, but it was also a good example of a rollout where the messaging, and the communication between the developers, project managers, and users, was just not good. MediaViewer is great for many uses for casual readers -- its primary target audience -- but falls down in a particular way for editors who need to maintain images, not just look at them. This feedback could've been better dealt with and resolved, I think.

There's a trope that Wikimedians today are resistant to any software change. Can you talk about some of the large software changes from the early days? How was communication with non-technical editors handled? What would be the thinking on whether to revert or leave it in a buggy state?

Honestly I think our change-conservatism issues were in full swing even in the early 2000s. We didn't make many markup changes after 2003, and only a few before that. We're now paying for that by continuing to be conservative, refusing to change things that are weird, confusing, and inconsistent because we didn't quite think through those changes and now refuse to change them again at all.

I believe you, Magnus and Tim were all college students when you got involved with Wikipedia. Do you think that was a coincidence or is there something about Wikipedia that appeals to students?

I definitely think there's an affinity between the college experience -- getting into a medium-sized world outside your house but before the big wide world and learning new fascinating things in a community of peers who want to learn and share knowledge -- and getting into the world of Wikipedia as a casual (or hardcore) editor or other contributor. :)

Today the WMF is a very professional (for lack of a better term) organization, in terms of defined processes and policies. There's an on-call SRE team, etc. In the early days the servers were run by Bomis. Was it like that? What kind of support did they provide? Were there power struggles between editors/volunteers and Bomis?

Bomis was very hands-off; when we ran on Bomis servers we just had some limitations in what we could access directly, and eventually I think they started encouraging Jimmy to buy separate servers when they started to become a large part of capacity. ;)

In 2012 you told the Signpost that your initial paychecks came from building "content-feed support for some third-party indexers" - I assume this was the Wikimedia update feed service? What parallels, if any, do you think this kind of API and funding model have with the new Wikimedia Enterprise project?

I think there's a direct parallel, and I'm glad to see us getting explicitly back into the business of checking in with folks who reuse our data and making sure they pay us for the privilege of the data being reliably present. :)

With 20 years of hindsight, what would you have told Magnus when he announced he was going to write the new wiki software in PHP?

Honestly, PHP was a good choice at the time. A lot of people make fun of it for its quirky syntax, function naming, and some old security misfeatures from 20 years ago, but it's an easily-approachable language with an execution model that fits web servers reasonably well and has clear ways to scale horizontally to serve more requests.

I can't really tell from the mailing list archives, but did Magnus (and later Lee for phase 3) tell people they were planning major rewrites of the software or did they just show up with it one day?

These were announced and talked about on the mailing lists. I actually feel like we were more attuned as a technical subcommunity to what was going on because there were only two lists to pay attention to, wikipedia-l and wikitech-l. ;) Magnus's code was also tested on meta.wikipedia.com (remember .com?) as I recall, before we upgraded English Wikipedia itself.

Do you think having one person do a major rewrite on their own was necessary, or would incremental refactoring have been better?

I think of Lee's rewrite as a huge one-person refactor from Magnus's early prototype. It changed a lot but had recognizably similar bones.

Ever since then we've been on the incremental refactoring track, with big internal changes being managed much more carefully.

Some of the other technology decisions, like MySQL, Apache, memcached, and so on, have held up over time and are still actively in use today. Why do you think that is? What was the thought process when selecting those in the beginning?

These are tools that we adopted because they were tried and tested in the wild, and some of those have indeed stuck around while others have changed a bit. For instance, I think we now use a mix of several web servers as well as Apache to implement the caching and TLS proxies, memcached is supplemented with Redis, etc. But the protocols have stayed common, because these tools are so widely used that specialized, high-performance versions compatible with the originals exist.

And that's pretty cool!

Before you left for StatusNet, would you have considered yourself the BDFL of MediaWiki?

I'd say Tim Starling and I were kind of the dream team dual-BDFLs. :) I took a more hands-off role after I left for StatusNet and came back.

My feeling is that today neither you nor Tim (the other BDFL candidate) want that role - is that accurate? Why not? Do you view BDFLs as a good leadership model for a project like MediaWiki?

I think the BDFL role, and hero worship and cult-of-personality in general, are bad. I think communities of mutually respectful people, both developers and users, researchers and managers, editors and educators, should work together to manage their projects fairly.

Wikipedia or Wikimedia? Why?

Wikipedia has the name recognition. :)

For a piece of software named MediaWiki, it really doesn't do a great job of handling media. This is something you've worked on a ton, from ogv.js to maintaining TimedMediaHandler. Why is this an area of interest for you? Can you give a summary of your efforts? What do you think is the next area to explore for improving media?

Indeed, beyond handling image uploads we're not that great at media. ;)

I would love for us to do more work in general in this area, from audio to video to 3D models to interactive graphs and maps and diagrams of all kinds. We have some of these things, but none of them are resourced for additional work by the Wikimedia Foundation, except for whatever research or side time interested parties like me put in.

I do hope something makes it in the budget for 2021-2022!

The main improvements I've been working on with TimedMediaHandler are:

  • finish removing the old jQueryUI-based Kaltura-built frontend
  • finish fixing bugs in the new VideoJS-based frontend, which loads less JS and has fewer modules -> thus performance benefits on first load
  • cleanup on the ogv.js shim which lets us play WebM and Ogg files directly in Safari -> combined with the above, this'll mostly fix video playback on iPhone

The ogv.js layer itself is one of my proudest projects, but one that's designed to become obsolete -- eventually either Safari will adopt a format we're willing to encode, or we'll adopt a format Safari can play. ;)

Future plans that I can't guarantee time for include modernizing the subtitle editor and integrating trim controls into the VisualEditor media selection. But I hope to get to them too...

Was "Contingency plans" something you were involved in? What was it like planning for a hurricane?

I think others worked on that particular list, but it was certainly something we thought about.

We only really had one or two hurricanes that directly threatened Tampa, and we didn't have any outages, but it really wasn't the best place for servers. :) It just happened to be near where Jimmy lived at the time, so it was convenient for him to help load up the earliest servers. ;)

Are there any outage stories, or near outage stories that stick out to you as particularly funny/sad/interesting/memorable?

I think it was around Christmas 2003? We had just set up our new 64-bit database server, and had I think two or three other machines as web servers. This would've been at the Tampa data center, so all remote from where I was at the time in southern California.

Load kept going up on the db server... we weren't sure why, but it seemed to be a hardware failure... eventually it just choked up. We had to switch back to the replica on one of the lower-end boxes. We had partial service during the crash and switchover by using the file cache (is that feature still even there?) to serve out cached pages without hitting the DB.

The secondary machine had trouble too, and died a couple days later, making it a double outage.

On the plus side, this gave us lots of opportunity to push our donation link in error pages! ;)

https://en.wikipedia.org/wiki/Wikipedia:Milestones_2003#December_2003

What MediaWiki feature did you work on that you're most proud of and why?

Special:Export and the XML dump format. It's not pretty, but it gets the job done and helps tons of people build tools on top of it!

Who is someone from the early days of Wikipedia who doesn't get enough credit for their contributions?

Lee Daniel Crocker. Lee and I don't get along on some issues so we don't talk on Facebook anymore, but I learned a lot about programming from him and the work he put in in the early days of what became MediaWiki. And that's something I'll always appreciate.

A lot of old Wikipedia and foundation work and history is in the archives of lists.wikimedia.org. But today (public) mailing lists are mostly used for announcements and discussions (at least on wikitech-l, wikimedia-l and other top lists)...do you think the shift away from mailing lists makes sense?

I think it's not surprising, but I'm not sure I like it. Maybe it's just graybeard syndrome; it was easier to have an idea what's coming down the pipe "in the old days" but at the same time it was a smaller pipe, wasn't it? Maybe email vs phab vs gitlab vs discourse doesn't make a difference. Maybe it's just whether we can have thoughtful interactions and also steer the right people to the right discussions?

You quietly stepped down from TechCom last year - why?

Quite simply, the last couple years have been really rough on my mental health. I don't have the bandwidth or the ability to consistently concentrate on the many ongoing issues I'd need to keep up with for TechCom work. While I work on that, I'm concentrating on less directed research projects and under-resourced things like TimedMediaHandler that don't have strict deadlines.

Wikipedia is unique in that it allows users to contribute JS/CSS directly to the site rather than requiring browser-side user scripts (Greasemonkey, etc.). Do you think that was a good idea at the time? Now? I know you've also worked on exploring how to sandbox these. Can you talk a little bit about that?

a) I think it's SUPER GREAT
b) I think WE DID IT SO WRONG

;)

There are two problems with letting people run direct JS in the host app environment:

  1. if they run someone else's malicious code it can take over their account
  2. code may use internal data accessors and methods that aren't going to stay stable, potentially breaking over time

Both can be solved by using a sandboxed environment (probably a suitable iframe). I think there's a lot of cool stuff that can be built on top of this method, with full-on APIs for accessing an editor state as a plugin, or whatever.

So far I get a lot of "yes that sounds great" but not a lot of "yes I'll assign a PM and 2 engineers to it", so this remains on my research backlog. ;)

Why is VIBBER in all caps in your username?

It is a common convention in some language communities, including Esperanto speakers, to all-caps the family name to indicate which name portion is the family name and which is the personal name. I got into Wikipedia through discovering the Esperanto edition, so that's where I created my first user name according to the local convention!

If you could, what language would you rewrite MediaWiki in and why is it Rust?

There are so many good options. ;) Honestly, for its primary market (running Wikipedia), PHP scales in the right ways and has an OK combination of being easy to use and handling the complex paradigms you need in a big program.

Rust would be great for a micro-wiki, which runs as a small executable as a peer-to-peer service or some crazy thing. ;)