Archiving Hell Gate's FYIs

Hell Gate is my favorite New York City-focused news outlet. The coverage is good and the writing tends to be exceptional.

When they redesigned their website in July last year, they added a new "FYI" section, which is usually one to two sentences with a link to a story about some current event. As I write this, the current FYI is:

New pope. Way too quick. Suspicious.

Maybe not the best example. Before that, it was:

In Randy Mastro's New York City, there will be no Pride concerts at Central Park if you support Palestine.

They usually update it every 3-4 days, though during major news events it might be more frequent.

Unfortunately it's not well advertised; on mobile, it's buried in a menu that you need to open before you even learn it exists. Plus they don't always post them on their social media and it's not in their newsletters.

So I've taken it upon myself to create an archive of them and provide an RSS feed. If you visit https://legoktm.com/hellgatenyc-fyi/, you'll see a (hopefully) complete archive of all their FYIs, using a layout and theme that tries to look like Hell Gate's website. It was a fun quick trip through the past year of NYC.

I also added a people filter, so you can see just the entries that mention Mayor Eric Adams, disgraced former Governor Andrew Cuomo, or current Governor Kathy Hochul. In a surprising-but-not-really-that-surprising twist, the former governor, who is also the leading mayoral candidate, has more entries than the current one.

How I built it

I started by scraping the Wayback Machine for all the old entries.

Getting the "FYI" out of the HTML was trivial: the script looked for the node that matched the CSS selector .fyi-section p. If the inner HTML was different from what was previously found, it was saved as a new entry. (As a weird contradiction, Hell Gate's website both adds ?ref=hellgatenyc.com to any URL and sets rel="noreferrer" 🙃.)

The Wayback Machine has some pretty aggressive rate limits, which was annoying for a bit, until I realized I could plug in urllib3's Retry utility and have it slowly retry everything until it succeeded. Some days the Wayback Machine had archived Hell Gate's homepage like every 10 minutes, so I ended up adding an optimization to skip snapshots that were within 3 hours of one I had already checked (hopefully it didn't miss anything).
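For illustration, here is a minimal Python sketch of both ideas, not the actual scraper. The function names, the retry budget, the backoff factor, and the list of retried status codes are all assumptions; the only things taken from the text are the 3-hour gap and the use of urllib3's Retry. It also assumes snapshots are identified by the Wayback Machine's 14-digit YYYYMMDDhhmmss timestamps.

```python
from datetime import datetime, timedelta


def parse_wayback_timestamp(ts: str) -> datetime:
    """Parse a 14-digit Wayback Machine timestamp (YYYYMMDDhhmmss)."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def select_snapshots(timestamps, min_gap=timedelta(hours=3)):
    """Keep only snapshots at least `min_gap` after the last one kept."""
    selected = []
    last_checked = None
    # 14-digit timestamps sort chronologically as plain strings.
    for ts in sorted(timestamps):
        when = parse_wayback_timestamp(ts)
        if last_checked is not None and when - last_checked < min_gap:
            continue  # too close to a snapshot we already checked
        selected.append(ts)
        last_checked = when
    return selected


def make_http():
    """Build a urllib3 pool that retries throttled requests with backoff."""
    # Imported lazily so the snapshot filter above runs without urllib3.
    import urllib3
    from urllib3.util.retry import Retry

    retry = Retry(
        total=10,                     # assumed retry budget
        backoff_factor=2,             # assumed; sleep grows between attempts
        status_forcelist=[429, 503],  # assumed rate-limit/overload responses
    )
    return urllib3.PoolManager(retries=retry)
```

A gap of exactly 3 hours is kept in this sketch; whether the real script treats that boundary the same way is unknown.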

Now that I had collected ~75 entries in a JSON database, I wrote a small Rust program to identify new entries and export an RSS feed on a 3-hour timer.
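The real exporter is written in Rust; the sketch below uses Python purely for illustration and shows roughly what turning stored entries into an RSS 2.0 document could look like. The entry fields (id, text, link, seen), the channel title, and the description are hypothetical, since the JSON schema isn't described here.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(entries, site_url):
    """Serialize entries (dicts with hypothetical keys) as an RSS 2.0 string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Hell Gate FYIs (unofficial)"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Archive of Hell Gate's FYI section"
    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["text"]
        ET.SubElement(item, "link").text = entry["link"]
        # The id is not a URL, so mark the guid as a non-permalink.
        ET.SubElement(item, "guid", isPermaLink="false").text = entry["id"]
        # RSS 2.0 expects RFC 822 style dates; `seen` must be timezone-aware.
        ET.SubElement(item, "pubDate").text = format_datetime(entry["seen"])
    return ET.tostring(rss, encoding="unicode")
```

Running something like this on a timer and overwriting the feed file each time would be one simple way to get the 3-hour refresh.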

When I started manually reviewing all the entries, I realized that some of them were just typos or other cosmetic changes. For example, back in August 2024:

- To truly understand Bryant Park, y<a href="https://hellgatenyc.com/bryant-park-frog-carousel-flaubert-mystery/">ou must wrestle with its large frog</a>
+ To truly understand Bryant Park, <a href="https://hellgatenyc.com/bryant-park-frog-carousel-flaubert-mystery/">you must wrestle with its large frog</a>

The "y" didn't get linked, and within a few hours they fixed it.

I applied two checks to detect this type of typo entry. First, seeing if the plain text version is the same, to detect links changing or issues like the one above. Then I added a check on the Levenshtein distance to detect other cases of minor changes.
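The two checks can be sketched in Python as follows. This is an illustration under assumptions, not the project's Rust code: the function names are made up, and the distance threshold of 5 is an arbitrary placeholder since the real cutoff isn't stated.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def plain_text(html):
    """Return the text content of an HTML fragment."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.chunks)


def levenshtein(a, b):
    """Edit distance via the classic two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def is_minor_edit(old_html, new_html, max_distance=5):
    """True if the change looks like a typo fix rather than a new FYI."""
    old_text, new_text = plain_text(old_html), plain_text(new_html)
    if old_text == new_text:
        return True  # only the markup changed, e.g. a link moved
    return levenshtein(old_text, new_text) <= max_distance  # assumed threshold
```

With the Bryant Park example above, both versions reduce to the same plain text, so the first check alone flags it as a cosmetic change.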

Even with those two checks, it's not perfect. Sometimes the edits are more substantial, like "George Santos has been sentenced to more than seven years...". The additional "more than" is more than a small typo fix, but still just a correction.

But then there are FYIs like "What are you doing on [December 5 → TONIGHT]? ..." Only two words being changed, but it feels like both merit independent entries. The ideal solution would be manual curation, but I don't think I can commit to that, so the current implementation is a reasonable compromise for now.

The last part of this project was creating an HTML browser for all of these, which would allow linking to old FYIs. I tried pretty hard to mimic the styling of the Hell Gate website, which was fun.

It's weird what you learn when you dig very deeply into a website's CSS. On the Hell Gate website, if you hover over an author link, after 2 seconds it turns purple. I never noticed!

Nearly everything draws from elements on the Hell Gate website, except I wasn't able to replicate the font used in the headlines because it's not freely licensed. They use Futura Passata; I looked for free equivalents to Futura and ended up with "League Spartan", which is not really close, but in the ballpark at least. The body text is correctly "Outfit".

I'm exceptionally pleased with how the people filter turned out. In the database, I wrote some code to tag entries based on who was mentioned. "Adams" maps to Eric Adams, unless it's Adrienne Adams (no relation); "Cuomo" maps to Andrew Cuomo unless it's Chris Cuomo (yes relation).

This was especially fun to implement since Rust's primary regex crate doesn't support negative lookaheads.
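One common way around missing lookarounds is to capture the word before the surname and reject unwanted first names in ordinary code. The sketch below shows that idea in Python for illustration; it is an assumption about the general approach, not the project's Rust implementation, and the PEOPLE table and tag_people function are hypothetical names.

```python
import re

# Surname -> first names that should NOT count as a mention (from the text above).
PEOPLE = {
    "Adams": {"Adrienne"},
    "Cuomo": {"Chris"},
    "Hochul": set(),
}


def tag_people(text):
    """Return the sorted surnames mentioned in `text`, minus excluded namesakes."""
    tags = set()
    for surname, excluded in PEOPLE.items():
        # Optionally capture the preceding word instead of using a lookbehind.
        pattern = re.compile(r"(?:\b(\w+)\s+)?\b" + re.escape(surname) + r"\b")
        for match in pattern.finditer(text):
            if match.group(1) not in excluded:
                tags.add(surname)
    return sorted(tags)
```

An entry that mentions both Adrienne Adams and Eric Adams is still tagged, because each occurrence of the surname is judged on its own.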

On the HTML side, it's a radio input element, so only one filter can be selected at a time. The actual filtering is implemented in pure CSS:

body:has(#filter-adams:checked) .entry:not(.person-Adams) {
    display: none;
}
body:has(#filter-cuomo:checked) .entry:not(.person-Cuomo) {
    display: none;
}
body:has(#filter-hochul:checked) .entry:not(.person-Hochul) {
    display: none;
}

The only issue I ran into is that Firefox helpfully remembers the radio button state you last used, which isn't what I wanted here. I ended up adding a few lines of JavaScript to take care of it for now:

document.addEventListener("DOMContentLoaded", () => {
    document.querySelectorAll('input[type="radio"]').forEach((elem) => {
        elem.checked = false;
    });
});

That's pretty much it. I've published the source code for those who want to peek.