Building a less terrible URL shortener


The demise of goo.gl is a good opportunity to write about how we built a less terrible URL shortener for Wikimedia projects: w.wiki. (I actually started writing this blog post in 2016 and never got back to it, oops.)

URL shorteners are generally a bad idea for a few main reasons:

  1. They obfuscate the actual link destination, making it harder to figure out where a link will take you.
  2. If they disappear or are shut down, the link is broken, even if the destination is fully functional.
  3. They often collect extra tracking/analytics information.

But there are also legitimate reasons to want to shorten a URL, such as printed media, where it's easier for people to type a shorter URL, or places with restrictive character limits, like tweets and IRC topics. The latter often hits languages written in non-ASCII scripts even harder when limits are measured in bytes instead of Unicode characters.
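To make the bytes-versus-characters point concrete, here is a minimal sketch in Python. The two page titles are just illustrative examples, and it assumes the limit is applied to UTF-8 encoded text, which is common but not universal.

```python
def length_in_characters(text: str) -> int:
    """Count Unicode code points (what most people think of as characters)."""
    return len(text)


def length_in_bytes(text: str) -> int:
    """Count bytes after UTF-8 encoding (what a byte-based limit measures)."""
    return len(text.encode("utf-8"))


# Illustrative titles only: one pure ASCII, one in Japanese.
ASCII_TITLE = "Tokyo"
NON_ASCII_TITLE = "東京都"

for title in (ASCII_TITLE, NON_ASCII_TITLE):
    print(title, length_in_characters(title), length_in_bytes(title))
```

The ASCII title costs one byte per character, while each of the three Japanese characters takes three bytes in UTF-8, so the same byte budget buys far fewer characters.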

At the end of the day, there was still considerable demand for a URL shortener, so we figured we could provide one that was, well, less terrible. Following an RfC, we adopted Tim's proposal, and a plan to avoid the aforementioned flaws:

  1. Limit shortening to Wikimedia-controlled domains, so you have a general sense of where you'd end up. (Other generic URL shorteners are banned on Wikimedia sites because they bypass our domain-based spam blocking.)
  2. Proactively provide dumps as a guarantee that if the service ever disappeared, people could still map URLs to their targets. You can find them on dumps.wikimedia.org and they're mirrored to the Internet Archive.
  3. Intentionally avoid any extra tracking and metrics collection. Requests for short URLs still show up in Wikimedia's general webrequest logs, but there is no dedicated, extra tracking for short URLs besides what every request gets.
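The first point, restricting targets to known domains, can be sketched roughly like this. This is not how the UrlShortener extension implements it; the `ALLOWED_SUFFIXES` list and the `is_shortenable` function are hypothetical names made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration; the real list of approved
# domains is configured on the wiki and is not reproduced here.
ALLOWED_SUFFIXES = ("wikipedia.org", "wikimedia.org", "wikidata.org")


def is_shortenable(url: str) -> bool:
    """Return True if the URL points at an allowed domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    # Match the domain itself or a subdomain of it. Requiring the leading
    # dot keeps "wikipedia.org.evil.example" from passing the check.
    return any(
        host == suffix or host.endswith("." + suffix)
        for suffix in ALLOWED_SUFFIXES
    )
```

Checking the parsed hostname rather than searching the raw string matters here: a substring test would happily accept a lookalike domain that merely contains an allowed name.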

Anyone can create short URLs for any approved domain via a special page or the API, subject to some rate limits and anti-abuse mechanisms.

All of this is open source and usable by any MediaWiki wiki by installing the UrlShortener extension. (Since this launched, additional functionality was added to use multiple character sets and generate QR codes.)

The dumps are nice for other purposes too: I use them to provide basic statistics on how many URLs have been shortened.
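As a rough idea of what that kind of counting can look like, here is a sketch that tallies targets per hostname. It assumes each dump line has the form `shortcode|target URL`; that format, the `count_by_host` helper, and the sample lines are all assumptions for illustration, so check the actual dump files before relying on any of it.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit


def count_by_host(lines: Iterable[str]) -> Counter:
    """Tally short URL targets per hostname.

    Assumes one "shortcode|target" pair per line (an assumed format).
    """
    counts: Counter = Counter()
    for line in lines:
        _code, sep, target = line.rstrip("\n").partition("|")
        if not sep or not target:
            continue  # skip blank or malformed lines
        host = urlsplit(target).hostname or "(unknown)"
        counts[host] += 1
    return counts


# Made-up sample lines, not real dump data.
SAMPLE_LINES = [
    "aa|https://en.wikipedia.org/wiki/Example\n",
    "ab|https://en.wikipedia.org/wiki/Another_example\n",
    "ac|https://www.wikidata.org/wiki/Q42\n",
    "\n",
]

stats = count_by_host(SAMPLE_LINES)
print(sum(stats.values()), "short URLs in total")
for host, count in stats.most_common():
    print(host, count)
```

Since the function only needs an iterable of lines, the same code would work on an opened dump file as well as on the in-memory sample.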

I still tend to have a mildly negative opinion about people using our URL shortener, but hey, it could be worse, at least they're not using goo.gl.