<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>The Lego Mirror - Tech</title><id>https://blog.legoktm.com/</id><updated>2025-05-24T02:15:00+00:00</updated><link href="https://blog.legoktm.com/" rel="alternate"/><link href="https://blog.legoktm.com/feeds/tech.atom.xml" rel="self"/><entry><title>Creating IPv4-only and IPv6-only containers with podman</title><id>tag:blog.legoktm.com,2025-05-23:/2025/05/23/creating-ipv4-only-and-ipv6-only-containers-with-podman.html</id><updated>2025-05-24T02:15:00+00:00</updated><author><name>legoktm</name></author><category term="Tech"/><category term="podman"/><link href="https://blog.legoktm.com/2025/05/23/creating-ipv4-only-and-ipv6-only-containers-with-podman.html" rel="alternate"/><published>2025-05-24T02:15:00+00:00</published><summary type="html">By default, newer versions of podman run containers with a dual stack network that supports IPv4 and IPv6 (yay). But if you&apos;re doing something specific, you can set up IPv4-only and IPv6-only networks. (Note: I tested this all with rootless podman 5.5.0, the current version in Fedora 42.) I&apos;m primarily…</summary><content type="html">&lt;p&gt;By default, newer versions of podman run containers with a dual stack network that supports IPv4 and IPv6 (yay). But if you&apos;re doing something specific, you can set up IPv4-only and IPv6-only networks.&lt;/p&gt;
&lt;p&gt;(Note: I tested this all with rootless podman 5.5.0, the current version in Fedora 42.)&lt;/p&gt;
&lt;p&gt;I&apos;m primarily writing this because it took me a while to figure out: I got entirely tripped up by the &lt;code&gt;--ipv6&lt;/code&gt; option, which turned out not to be what I wanted, despite its name implying that it enables IPv6.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://docs.podman.io/en/v5.5.0/markdown/podman-network-create.1.html#ipv6&quot;&gt;The documentation&lt;/a&gt; for it is technically accurate, as it says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Enable IPv6 (Dual Stack) networking. If no subnets are given, it allocates an ipv4 and an ipv6 subnet.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The most important part is in parentheses: it enables a dual-stack network. That means passing &lt;code&gt;--ipv6&lt;/code&gt; when creating a network doesn&apos;t just enable IPv6, it &lt;em&gt;also&lt;/em&gt; enables IPv4!&lt;/p&gt;
&lt;h2 class=&quot;header&quot; id=&quot;real-ipv6-only-1b5c&quot;&gt;&lt;span class=&quot;header-text&quot;&gt;Real IPv6-only&lt;/span&gt;&lt;a href=&quot;/2025/05/23/creating-ipv4-only-and-ipv6-only-containers-with-podman.html#real-ipv6-only-1b5c&quot; class=&quot;header-link&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;What you actually want is to skip &lt;code&gt;--ipv6&lt;/code&gt; and give the network only an IPv6 subnet and gateway:&lt;/p&gt;
&lt;pre style=&quot;background-color:#e3eaf2;&quot;&gt;&lt;code&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;$ podman network create --subnet fd00::/64 --gateway fd00::1 ipv6-only
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can verify that IPv4 doesn&apos;t work with:&lt;/p&gt;
&lt;pre style=&quot;background-color:#e3eaf2;&quot;&gt;&lt;code&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;$ podman pull quay.io/curl/curl:latest
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;$ podman run --rm -it --net=ipv6-only curl -v4 https://en.wikipedia.org
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;* Host en.wikipedia.org:443 was resolved.
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;* IPv6: (none)
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;* IPv4: 208.80.154.224
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;*   Trying 208.80.154.224:443...
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;* Immediate connect fail for 208.80.154.224: Network unreachable
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;* Failed to connect to en.wikipedia.org port 443 after 13 ms: Could not connect to server
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;* closing connection #0
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;curl: (7) Failed to connect to en.wikipedia.org port 443 after 13 ms: Could not connect to server
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And that IPv6 works:&lt;/p&gt;
&lt;pre style=&quot;background-color:#e3eaf2;&quot;&gt;&lt;code&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;$ podman run --rm -it --net=ipv6-only curl -I6 https://en.wikipedia.org
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;HTTP/2 301 
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;date: Fri, 23 May 2025 00:29:41 GMT
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h2 class=&quot;header&quot; id=&quot;ipv4-only-1b5c&quot;&gt;&lt;span class=&quot;header-text&quot;&gt;IPv4-only&lt;/span&gt;&lt;a href=&quot;/2025/05/23/creating-ipv4-only-and-ipv6-only-containers-with-podman.html#ipv4-only-1b5c&quot; class=&quot;header-link&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;And now for IPv4, which is even simpler:&lt;/p&gt;
&lt;pre style=&quot;background-color:#e3eaf2;&quot;&gt;&lt;code&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;$ podman network create ipv4-only
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Yep, no options needed: you just need a network that has no IPv6 subnet and wasn&apos;t created with the &lt;code&gt;--ipv6&lt;/code&gt; flag.&lt;/p&gt;
&lt;h2 class=&quot;header&quot; id=&quot;final-notes-1b5c&quot;&gt;&lt;span class=&quot;header-text&quot;&gt;Final notes&lt;/span&gt;&lt;a href=&quot;/2025/05/23/creating-ipv4-only-and-ipv6-only-containers-with-podman.html#final-notes-1b5c&quot; class=&quot;header-link&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The default networking stack for rootless containers is &lt;a href=&quot;https://docs.podman.io/en/v5.5.0/markdown/podman-run.1.html#network-mode-net&quot;&gt;documented&lt;/a&gt; (under &amp;quot;pasta&amp;quot;) as &amp;quot;IPv4 and IPv6 addresses and routes, as well as the pod interface name, are copied from the host&amp;quot;. In my testing this is correct, but it is entirely separate from the &lt;code&gt;podman&lt;/code&gt; network that appears to exist by default, which is IPv4-only.&lt;/p&gt;
&lt;p&gt;I ended up figuring out the whole misleading &lt;code&gt;--ipv6&lt;/code&gt; thing thanks to &lt;a href=&quot;https://github.com/containers/podman/issues/7302#issuecomment-823056587&quot;&gt;a GitHub comment&lt;/a&gt;, which explicitly spelled out &amp;quot;The --ipv6 flags means dual-stack&amp;quot; and even explained the rationale: &amp;quot;this is fully compatible with docker ...&amp;quot;&lt;/p&gt;
&lt;p&gt;I shouldn&apos;t be too surprised that &lt;a href=&quot;https://gist.github.com/legoktm/a4a69603526a4f50ee5a7d0a0d98ab32&quot;&gt;Claude also got tripped up&lt;/a&gt; by the &lt;code&gt;--ipv6&lt;/code&gt; flag and gave me bad advice. ¯\_(ツ)_/¯&lt;/p&gt;
&lt;p&gt;Final final note: if you try a plain &lt;code&gt;podman run curl ...&lt;/code&gt; without first pulling the image, podman won&apos;t know which image you actually want, and none of the three candidates it prompts you with (registry.fedoraproject.org, registry.access.redhat.com, docker.io/library) is the &lt;a href=&quot;https://github.com/curl/curl-container&quot;&gt;official upstream image&lt;/a&gt;. I&apos;ve &lt;a href=&quot;https://github.com/containers/shortnames/pull/64&quot;&gt;submitted a PR&lt;/a&gt; to the &lt;code&gt;containers/shortnames&lt;/code&gt; repo to fix that, so that a plain &lt;code&gt;curl&lt;/code&gt; image name will automatically be aliased to the upstream image.&lt;/p&gt;
</content></entry><entry><title>Switching feed readers</title><id>tag:blog.legoktm.com,2025-04-08:/2025/04/08/switching-feed-readers.html</id><updated>2025-04-08T23:10:00+00:00</updated><author><name>legoktm</name></author><category term="Tech"/><category term="freedom"/><link href="https://blog.legoktm.com/2025/04/08/switching-feed-readers.html" rel="alternate"/><published>2025-04-08T23:10:00+00:00</published><summary type="html">I recently switched my RSS/Atom feed reader away from Tiny Tiny RSS and its companion Android app. I had fallen behind on upgrades on the server, and the app had been removed from F-Droid, so I needed to do something. I had set up Tiny Tiny RSS in 2017, so…</summary><content type="html">&lt;p&gt;I recently switched my RSS/Atom feed reader away from &lt;a href=&quot;https://tt-rss.org/&quot;&gt;Tiny Tiny RSS&lt;/a&gt; and its companion Android app. I had fallen behind on upgrades on the server, and the app had been removed from F-Droid, so I needed to do &lt;em&gt;something&lt;/em&gt;. I had set up Tiny Tiny RSS in 2017, so it was time for a fresh start.&lt;/p&gt;
&lt;p&gt;I wanted a similar setup: a server that would keep track of my subscriptions, articles and (un)read status, and then an Android app with a good UI and offline support. I do most of my feed reading on my phone these days, so I&apos;m a bit more picky on that side.&lt;/p&gt;
&lt;p&gt;Rachel (by the bay) &lt;a href=&quot;https://rachelbythebay.com/w/2025/03/11/xhtml/&quot;&gt;recently praised&lt;/a&gt; the &lt;a href=&quot;https://www.freshrss.org/&quot;&gt;FreshRSS&lt;/a&gt; developers, so that was the first thing I tried out. And it worked...fine? I set up a container with podman, hooked it up to a MariaDB database (also in a container), and imported all my feeds.&lt;/p&gt;
&lt;p&gt;Mostly for due diligence, I tried out &lt;a href=&quot;https://miniflux.app/&quot;&gt;Miniflux&lt;/a&gt; next. It only supports PostgreSQL, which knocked it down a peg for me since I prefer MariaDB. I was really impressed by its feature that fetches the content from the article&apos;s link, instead of using what&apos;s in the feed, to &amp;quot;fix&amp;quot; feeds that truncate content.&lt;/p&gt;
&lt;p&gt;Then I learned that FreshRSS has the same feature, just a little less automatic! FreshRSS requires you to manually enter a CSS selector, which so far has just been pretty trivial since it&apos;s either &lt;code&gt;article&lt;/code&gt; or &lt;code&gt;main&lt;/code&gt;. It also supports &lt;a href=&quot;https://danq.me/2022/09/27/freshrss-xpath/&quot;&gt;manual feed construction using XPath&lt;/a&gt;, which seems really cool, but I haven&apos;t gotten that far yet.&lt;/p&gt;
&lt;p&gt;On my phone, I evaluated multiple apps: &lt;a href=&quot;https://f-droid.org/en/packages/com.capyreader.app/&quot;&gt;Capy Reader&lt;/a&gt;, &lt;a href=&quot;https://f-droid.org/en/packages/com.readrops.app/&quot;&gt;Readrops&lt;/a&gt; and &lt;a href=&quot;https://f-droid.org/packages/de.circle_dev.flux_news/&quot;&gt;Flux News&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The main limitation I encountered with Readrops was not having a preview or snippet when viewing all the articles (&lt;a href=&quot;https://github.com/readrops/Readrops/blob/44d00e30691ea86440d624a4c09de4cf2e024b00/fastlane/metadata/android/en-US/images/phoneScreenshots/Screenshot_6.jpg&quot;&gt;example&lt;/a&gt;). I also found the &amp;quot;estimated reading time&amp;quot; indicators kind of patronizing with no way to disable them.&lt;/p&gt;
&lt;p&gt;As far as I could tell, Flux News didn&apos;t have any support for offline reading.&lt;/p&gt;
&lt;p&gt;I like Capy Reader: it has a clean UI that&apos;s similar enough to what I had with Tiny Tiny RSS. It worked with FreshRSS right away, but I couldn&apos;t get it to connect to Miniflux over the &lt;a href=&quot;https://miniflux.app/docs/google_reader.html&quot;&gt;Google Reader API&lt;/a&gt;. I&apos;m pretty sure it&apos;s a me problem, since others are using it with Miniflux just fine.&lt;/p&gt;
&lt;p&gt;In any case, I tried out FreshRSS+Capy Reader for a week, and while I still need a bit more time to get used to it, I think it will meet my feed reading needs. So yesterday, after taking a backup, I shut down my Tiny Tiny RSS server after more than 8 years in service 🫡. Here&apos;s to 8 more years of feed reading with FreshRSS and Capy Reader!&lt;/p&gt;
</content></entry><entry><title>A website for Boba Quest 2025</title><id>tag:blog.legoktm.com,2025-03-08:/2025/03/08/a-website-for-boba-quest-2025.html</id><updated>2025-03-08T18:00:00+00:00</updated><author><name>legoktm</name></author><category term="Tech"/><category term="boba"/><category term="bobaquest"/><category term="llm"/><link href="https://blog.legoktm.com/2025/03/08/a-website-for-boba-quest-2025.html" rel="alternate"/><published>2025-03-08T18:00:00+00:00</published><summary type="html">tl;dr: I&apos;ve set up a new website for Boba Quest 2025 🧋: https://legoktm.com/bobaquest/. Now that I have two months of reviews under my belt, I wanted to create a proper website to make it easy to discover them. Taking inspiration from Everything is Everything, I wanted a similar layout with…</summary><content type="html">&lt;p&gt;tl;dr: I&apos;ve set up a new website for Boba Quest 2025 🧋: &lt;a href=&quot;https://legoktm.com/bobaquest/&quot;&gt;https://legoktm.com/bobaquest/&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Now that I have two months of reviews under my belt, I wanted to create a proper website to make it easy to discover them.&lt;/p&gt;
&lt;p&gt;Taking inspiration from &lt;a href=&quot;https://everythingiseverything.nyc/&quot;&gt;Everything is Everything&lt;/a&gt;, I wanted a similar layout with a big map of all the reviews and then explanations of the project itself.&lt;/p&gt;
&lt;img src=&quot;/images/boba_quest_website.png&quot;&gt;
&lt;h2 class=&quot;header&quot; id=&quot;creating-a-static-site-aa89&quot;&gt;&lt;span class=&quot;header-text&quot;&gt;Creating a static site&lt;/span&gt;&lt;a href=&quot;/2025/03/08/a-website-for-boba-quest-2025.html#creating-a-static-site-aa89&quot; class=&quot;header-link&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I figured all of this could be built as a static site with a bit of JavaScript. I&apos;ve had my eye on &lt;a href=&quot;https://www.getzola.org/&quot;&gt;Zola&lt;/a&gt; for a while, since it&apos;s written in Rust and by the same author as &lt;a href=&quot;https://keats.github.io/tera/&quot;&gt;Tera&lt;/a&gt;, my preferred templating library.&lt;/p&gt;
&lt;p&gt;It ended up being a great fit, largely because it has built-in support for loading content from a TOML file via a &lt;a href=&quot;https://www.getzola.org/documentation/templates/overview/#load-data&quot;&gt;&lt;code&gt;load_data&lt;/code&gt;&lt;/a&gt; function. I store all of the review metadata in a &lt;a href=&quot;https://git.legoktm.com/legoktm/bobaquest/src/main/data.toml&quot;&gt;single file&lt;/a&gt;, which drives the entire site.&lt;/p&gt;
&lt;p&gt;My main criticism is that Zola doesn&apos;t support so-called &amp;quot;&lt;a href=&quot;https://github.com/getzola/zola/issues/840&quot;&gt;ugly URLs&lt;/a&gt;&amp;quot; like &lt;code&gt;foo.html&lt;/code&gt; (it only supports &lt;code&gt;foo/&lt;/code&gt;). I guess I&apos;m weird in that, as this blog demonstrates, I like URLs that end with &lt;code&gt;.html&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The last thing I did was set a &lt;a href=&quot;https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP&quot;&gt;Content-Security-Policy&lt;/a&gt;. It&apos;s relatively strict (I think), except for needing to allow &lt;code&gt;style-src-elem &apos;unsafe-inline&apos;&lt;/code&gt; because &lt;a href=&quot;https://github.com/Leaflet/Leaflet/issues/9168&quot;&gt;Leaflet&apos;s popups need it&lt;/a&gt;.&lt;/p&gt;
&lt;h2 class=&quot;header&quot; id=&quot;not-handwritten-aa89&quot;&gt;&lt;span class=&quot;header-text&quot;&gt;Not handwritten&lt;/span&gt;&lt;a href=&quot;/2025/03/08/a-website-for-boba-quest-2025.html#not-handwritten-aa89&quot; class=&quot;header-link&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;I started by asking Claude to generate such a website for me (&lt;a href=&quot;https://gist.github.com/legoktm/9612668914ff38df9e78cf75b3583d98&quot;&gt;chat transcript&lt;/a&gt;), and kept slowly refining it with prompts. Claude is normally really good at letting you preview HTML websites, but its own CSP prevented any of the map tiles from loading, so I had to copy the file locally and preview it that way.&lt;/p&gt;
&lt;p&gt;I&apos;m reasonably proficient at HTML/CSS/JS, but this was a much more fun way to design a website. Instead of having to focus on syntax, I could just tell it how I wanted things shaped, and make my own manual improvements here and there.&lt;/p&gt;
&lt;p&gt;People, often rightfully so, criticize AI-generated content as lacking creativity, but I feel that using Claude allowed me to design the website to look exactly how I, a human, wanted it to, without getting bogged down fighting syntax.&lt;/p&gt;
&lt;p&gt;I eventually hit Claude&apos;s output limit (since it kept repeating the whole HTML file) and had to start branching off into individual chats, like adding an &lt;a href=&quot;https://gist.github.com/legoktm/735c56a36d167b692a18df59a365a035&quot;&gt;escape key handler&lt;/a&gt; and learning about &lt;a href=&quot;https://gist.github.com/legoktm/2857432ad490626462f85d56146d9c7c&quot;&gt;&lt;code&gt;vh&lt;/code&gt; units in CSS&lt;/a&gt; (big TIL for me). I hit the limit again when I added the &lt;a href=&quot;https://gist.github.com/legoktm/c587eda268dcf0061df4b7a98e29bea4&quot;&gt;&amp;quot;Recent reviews&amp;quot; sidebar&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;That was roughly the point when I checked it into Git, so you can &lt;a href=&quot;https://git.legoktm.com/legoktm/bobaquest/src/3850ab6eec9d1cb3e4918782301f2d7f2320fa49&quot;&gt;browse the initial version&lt;/a&gt;. It&apos;s come quite a long way since then.&lt;/p&gt;
</content></entry><entry><title>Simplifying PDF imports on reMarkable</title><id>tag:blog.legoktm.com,2025-02-18:/2025/02/18/simplifying-pdf-imports-on-remarkable.html</id><updated>2025-02-18T23:00:00+00:00</updated><author><name>legoktm</name></author><category term="Tech"/><category term="llm"/><category term="remarkable"/><category term="rust"/><link href="https://blog.legoktm.com/2025/02/18/simplifying-pdf-imports-on-remarkable.html" rel="alternate"/><published>2025-02-18T23:00:00+00:00</published><summary type="html">I&apos;ve been using my reMarkable 2 a lot more recently, and have gotten started with actually hacking on it. It&apos;s a bit overdue, since the main reason I purchased it in the first place was that it is built on top of Linux and doesn&apos;t require any special jailbreaking/rooting. I…</summary><content type="html">&lt;p&gt;I&apos;ve been using my reMarkable 2 a lot more recently, and have gotten started with actually hacking on it. It&apos;s a bit overdue, since the main reason I purchased it in the first place was that it is built on top of Linux and doesn&apos;t require any special jailbreaking/rooting.&lt;/p&gt;
&lt;p&gt;I found &lt;a href=&quot;https://github.com/adaerr/reMarkableScripts/blob/master/pdf2remarkable.sh&quot;&gt;Adrian Daerr&apos;s script&lt;/a&gt; to import PDFs/EPUBs into a reMarkable, which is, surprisingly, not a straightforward operation.
You need to rename the file to use a UUID as its name, create corresponding &lt;code&gt;.metadata&lt;/code&gt; and &lt;code&gt;.content&lt;/code&gt; files, and then create empty &lt;code&gt;.cache&lt;/code&gt;, &lt;code&gt;.highlights&lt;/code&gt;, and &lt;code&gt;.thumbnails&lt;/code&gt; directories.&lt;/p&gt;
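&lt;p&gt;To make that layout concrete, here is a minimal Python sketch of the same steps. It is an illustration under assumptions, not Daerr&apos;s script or &lt;code&gt;rm-import&lt;/code&gt;: the function name &lt;code&gt;import_document&lt;/code&gt; is hypothetical, the JSON written to &lt;code&gt;.metadata&lt;/code&gt; and &lt;code&gt;.content&lt;/code&gt; is an assumed minimal subset of the real fields, and on the device the destination would presumably be xochitl&apos;s data directory (assumed here to be &lt;code&gt;/home/root/.local/share/remarkable/xochitl&lt;/code&gt;).&lt;/p&gt;
&lt;pre style=&quot;background-color:#e3eaf2;&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;import json
import shutil
import time
import uuid
from pathlib import Path


def import_document(src: Path, xochitl_dir: Path) -> str:
    """Hypothetical helper: lay out src the way xochitl expects; return the new UUID."""
    file_type = src.suffix.lstrip(".").lower()  # e.g. "pdf" or "epub"
    doc_id = str(uuid.uuid4())

    # 1. Copy the document itself, renamed to UUID.ext
    shutil.copyfile(src, xochitl_dir / f"{doc_id}.{file_type}")

    # 2. UUID.metadata: what the file browser shows (assumed minimal field set)
    metadata = {
        "visibleName": src.stem,
        "type": "DocumentType",
        "parent": "",  # assumed: empty string means the top-level folder
        "lastModified": str(int(time.time() * 1000)),  # epoch ms as a string (assumed format)
        "deleted": False,
        "version": 1,
    }
    (xochitl_dir / f"{doc_id}.metadata").write_text(json.dumps(metadata, indent=4))

    # 3. UUID.content: records the file type
    (xochitl_dir / f"{doc_id}.content").write_text(json.dumps({"fileType": file_type}))

    # 4. Empty companion directories
    for suffix in ("cache", "highlights", "thumbnails"):
        (xochitl_dir / f"{doc_id}.{suffix}").mkdir()

    return doc_id
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Keeping the destination directory as a parameter makes a sketch like this easy to try against a temporary directory on a laptop before pointing it at the device; afterwards, xochitl still needs a restart for the file to appear.&lt;/p&gt;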
&lt;p&gt;Daerr&apos;s bash script does all of this, but it runs on a different machine and then &lt;code&gt;scp&lt;/code&gt;s the files over to the reMarkable. Modifying it to run on the reMarkable itself didn&apos;t seem too complicated, but I try to avoid writing shell scripts as much as possible, so I took the opportunity to port it to Rust.&lt;/p&gt;
&lt;p&gt;I first asked Claude to port it to Rust, and, probably for the first time, I was disappointed by &lt;a href=&quot;https://gist.github.com/legoktm/483e0615634b56834434302ef4610883&quot;&gt;the result&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Specifically, it generated:&lt;/p&gt;
&lt;pre style=&quot;background-color:#e3eaf2;&quot;&gt;&lt;code class=&quot;language-rust&quot;&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;if&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt; args[&lt;/span&gt;&lt;span style=&quot;color:#755f00;&quot;&gt;1&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;] &lt;/span&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;== &lt;/span&gt;&lt;span style=&quot;color:#116b00;&quot;&gt;&amp;quot;-r&amp;quot; &lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;{
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;    restart_xochitl &lt;/span&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;= !&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;config.restart_xochitl_default;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;    files_start_index &lt;/span&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;, &lt;/span&gt;&lt;span style=&quot;color:#755f00;&quot;&gt;2&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I couldn&apos;t come up with a reason why Claude would insert a comma after the equals sign (it&apos;s not really a hallucination, I think?), but rust-analyzer flagged it as a syntax error right away.&lt;/p&gt;
&lt;p&gt;I deleted most of that code anyway, since I wanted to always restart xochitl, and I also switched to &lt;a href=&quot;https://lib.rs/crates/camino&quot;&gt;camino&lt;/a&gt; instead of &lt;code&gt;std::path&lt;/code&gt;. As an aside, &lt;code&gt;std::path&lt;/code&gt; represents paths with &lt;code&gt;OsString&lt;/code&gt;, which is incredibly inconvenient to use anywhere else, since most other code expects normal UTF-8 &lt;code&gt;String&lt;/code&gt;s. &lt;code&gt;camino&lt;/code&gt; only supports paths that are fully UTF-8 (i.e. backed by &lt;code&gt;String&lt;/code&gt; instead of &lt;code&gt;OsString&lt;/code&gt;), which should be fine for most projects, like this one, that don&apos;t need to support legacy filenames and encodings.&lt;/p&gt;
&lt;p&gt;Next, I had to cross-build it for the reMarkable&apos;s ARM v7 CPU. I&apos;ve done it before for Raspberry Pis, but since it&apos;s been a while, I wanted to try out the &lt;a href=&quot;https://github.com/cross-rs/cross&quot;&gt;&lt;code&gt;cross&lt;/code&gt;&lt;/a&gt; tool, which transparently builds in a container with the necessary toolchains. And if you set a &lt;a href=&quot;https://salsa.debian.org/legoktm/rm-import/-/commit/415d5e72b9cec843f493e789c6b60c7134a70128&quot;&gt;little bit of metadata&lt;/a&gt;, it&apos;s as simple as &lt;code&gt;cross build --release&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;To make the newly imported file actually show up in the reMarkable file listing, you apparently have to restart xochitl, the reMarkable&apos;s UI. That does work, but surely there&apos;s a better way to tell it to look for new files...&lt;/p&gt;
&lt;p&gt;Finally, to actually make use of it, I set up &lt;a href=&quot;https://rclone.org/&quot;&gt;rclone&lt;/a&gt; to automatically fetch a folder from my Nextcloud instance, and then run &lt;code&gt;rm-import&lt;/code&gt; over it. And now I can drop a PDF in a dedicated Nextcloud folder, and it&apos;ll end up on my reMarkable!&lt;/p&gt;
</content></entry><entry><title>Claude to Markdown browser extension</title><id>tag:blog.legoktm.com,2025-02-11:/2025/02/11/claude-to-markdown-browser-extension.html</id><updated>2025-02-11T05:37:00+00:00</updated><author><name>legoktm</name></author><category term="Tech"/><category term="claude"/><category term="firefox"/><category term="llm"/><link href="https://blog.legoktm.com/2025/02/11/claude-to-markdown-browser-extension.html" rel="alternate"/><published>2025-02-11T05:37:00+00:00</published><summary type="html">tl;dr: A new browser extension allows you to automatically export Claude transcripts to Markdown. You can install it for Firefox. Claude is the primary LLM I use these days, but they don&apos;t have a builtin way to share your sessions, which I find pretty useful to learn from. Simon Willison…</summary><content type="html">&lt;p&gt;tl;dr: A new browser extension allows you to automatically export Claude transcripts to Markdown. You can install it for &lt;a href=&quot;https://addons.mozilla.org/en-US/firefox/addon/claude-to-markdown/&quot;&gt;Firefox&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Claude is the primary LLM I use these days, but it doesn&apos;t have a built-in way to share your sessions, which I find pretty useful to learn from. &lt;a href=&quot;https://simonwillison.net/&quot;&gt;Simon Willison&lt;/a&gt; had previously posted a &lt;a href=&quot;https://observablehq.com/@simonw/convert-claude-json-to-markdown&quot;&gt;notebook&lt;/a&gt; with code to convert a JSON response to a nicely formatted Markdown transcript. But the process was pretty cumbersome: you need to open up the network console, find a specific request, and then copy the JSON out of it. On a technical level, though, that should be pretty straightforward to automate...right?&lt;/p&gt;
&lt;p&gt;I asked Claude, of course, and it created a WebExtension that automatically grabs the necessary JSON response and displays it to the user. I plugged in Simon&apos;s code to turn it into Markdown, and huzzah!&lt;/p&gt;
&lt;img src=&quot;/images/claude-to-markdown.png&quot;&gt;
&lt;p&gt;When you reload the tab with Claude in it, or select a different conversation, their frontend sends a request whose response contains the full conversation as JSON, which the extension intercepts and stores for display in the popup. It doesn&apos;t (yet) support live updating as you send new messages or replies come in.&lt;/p&gt;
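&lt;p&gt;Once the conversation JSON has been captured, turning it into Markdown is a small transformation. The sketch below is a rough Python illustration of that step, not Simon&apos;s notebook code or the extension&apos;s JavaScript: the function name is hypothetical, and it assumes the JSON has a &lt;code&gt;name&lt;/code&gt; field plus a &lt;code&gt;chat_messages&lt;/code&gt; list whose entries carry &lt;code&gt;sender&lt;/code&gt; (&lt;code&gt;human&lt;/code&gt; or &lt;code&gt;assistant&lt;/code&gt;) and &lt;code&gt;text&lt;/code&gt; fields, which may not match what Claude&apos;s frontend actually receives.&lt;/p&gt;
&lt;pre style=&quot;background-color:#e3eaf2;&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;import json


def claude_json_to_markdown(raw: str) -> str:
    """Hypothetical converter; the field names are assumptions, not a documented API."""
    convo = json.loads(raw)
    title = convo.get("name") or "Untitled conversation"
    lines = [f"# {title}", ""]
    for message in convo.get("chat_messages", []):
        # Assumed values: "human" for the user, anything else is treated as Claude
        speaker = "Human" if message.get("sender") == "human" else "Assistant"
        lines += [f"**{speaker}:**", "", message.get("text", "").strip(), ""]
    return "\n".join(lines)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Keeping the converter a pure string-to-string function means it can be tested separately from the browser-specific code that intercepts the request.&lt;/p&gt;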
&lt;p&gt;As a bonus, I had Claude add functionality to upload the transcript as a GitHub Gist. So you can see the &lt;a href=&quot;https://gist.github.com/legoktm/63f134c6d27308d69135cd987a7d026c&quot;&gt;full transcript of the conversation&lt;/a&gt; used to create the extension, uploaded by itself.&lt;/p&gt;
&lt;p&gt;This is not a super novel idea: there are other extensions in the Firefox and Chrome stores that purport to have similar functionality, but I didn&apos;t immediately find any open source ones. I also wanted to refresh my knowledge of WebExtensions; it&apos;s been seven years since I &lt;a href=&quot;/2018/04/14/firefox-add-on-to-skip-mobile-wikipedia-redirect.html&quot;&gt;last created one&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Overall I&apos;d estimate Claude turned something that would&apos;ve taken me ~5 hours to do from scratch into a 1-2 hour project, and even then most of that time was spent manually testing and verifying the functionality rather than actually writing code. The original version was not perfect; I had to make some changes like adding a content-security-policy and removing extraneous permissions. Unsurprisingly, it used &lt;code&gt;manifest_version&lt;/code&gt; 2 instead of the &lt;a href=&quot;https://extensionworkshop.com/documentation/develop/manifest-v3-migration-guide/&quot;&gt;newer v3&lt;/a&gt;, even though all the code it created was compatible with the v3 requirements.&lt;/p&gt;
&lt;p&gt;I hope this is useful for others! You can &lt;a href=&quot;https://addons.mozilla.org/en-US/firefox/addon/claude-to-markdown/&quot;&gt;install it in Firefox&lt;/a&gt; and &lt;a href=&quot;https://github.com/legoktm/claude-to-markdown&quot;&gt;browse the source code&lt;/a&gt;. In theory it should be compatible with other browsers, but I haven&apos;t tested it; the .zip bundle with the extension is uploaded as a &lt;a href=&quot;https://github.com/legoktm/claude-to-markdown/releases&quot;&gt;release&lt;/a&gt;.&lt;/p&gt;
</content></entry><entry><title>How many Rs are in the word strawberry?</title><id>tag:blog.legoktm.com,2024-08-10:/2024/08/10/how-many-rs-are-in-the-word-strawberry.html</id><updated>2024-08-10T04:08:12+00:00</updated><author><name>legoktm</name></author><category term="Tech"/><category term="chatgpt"/><category term="llm"/><link href="https://blog.legoktm.com/2024/08/10/how-many-rs-are-in-the-word-strawberry.html" rel="alternate"/><published>2024-08-10T04:08:12+00:00</published><summary type="html">There&apos;s a viral Mastodon post that demonstrates a supposed deficiency in large language models (LLMs), specifically ChatGPT, because it apparently can&apos;t count the number of letters in the English word &quot;strawberry&quot;. &gt; How many R&apos;s are in the word strawberry ChatGPT: There are two &quot;R&apos;s&quot; in the word &quot;strawberry.&quot; &gt;…</summary><content type="html">&lt;p&gt;There&apos;s a &lt;a href=&quot;https://mastodon.online/@larsmb/112921325458383423&quot;&gt;viral Mastodon post&lt;/a&gt;
that demonstrates a supposed deficiency in large language models (LLMs), specifically ChatGPT,
because it apparently can&apos;t count the number of letters in the English word
&amp;quot;strawberry&amp;quot;.&lt;/p&gt;
&lt;pre style=&quot;background-color:#e3eaf2;&quot;&gt;&lt;code&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;&amp;gt; How many R&amp;#39;s are in the word strawberry
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;ChatGPT: There are two &amp;quot;R&amp;#39;s&amp;quot; in the word &amp;quot;strawberry.&amp;quot;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;&amp;gt; Are you sure?
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;ChatGPT: Yes, I&amp;#39;m sure. There are two &amp;quot;R&amp;#39;s&amp;quot; in the word &amp;quot;strawberry.&amp;quot;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;[...]
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;&amp;gt; Count the Rs in the word straw and the word berry separately.
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;ChatGPT: Sure. Let&amp;#39;s break it down:
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;* The word &amp;quot;straw&amp;quot; has 1 &amp;quot;R.&amp;quot;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;* The word &amp;quot;berry&amp;quot; has 2 &amp;quot;R&amp;#39;s.&amp;quot;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;So, together they have 3 &amp;quot;R&amp;#39;s.&amp;quot; But in the combined word &amp;quot;strawberry,&amp;quot; there are still only 2 &amp;quot;R&amp;#39;s.&amp;quot;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At first glance, that&apos;s a big yikes. (I was able to reproduce the initial
result using ChatGPT 4o.)&lt;/p&gt;
&lt;p&gt;But I get annoyed by this kind of post for
two main reasons: 1) if you ask slightly differently, ChatGPT can easily get
this question right, and 2) it&apos;s a missed opportunity to explain how LLMs work,
specifically with regard to tokenization.&lt;/p&gt;
&lt;h2 class=&quot;header&quot; id=&quot;let&apos;s-try-again-29f2&quot;&gt;&lt;span class=&quot;header-text&quot;&gt;Let&apos;s try again&lt;/span&gt;&lt;a href=&quot;/2024/08/10/how-many-rs-are-in-the-word-strawberry.html#let&apos;s-try-again-29f2&quot; class=&quot;header-link&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Here&apos;s the prompt I used with ChatGPT 4o:&lt;/p&gt;
&lt;pre style=&quot;background-color:#e3eaf2;&quot;&gt;&lt;code&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;&amp;gt; Write and then run a Python program to determine how many R&amp;#39;s are in the word strawberry
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;ChatGPT: The word &amp;quot;strawberry&amp;quot; contains 3 occurrences of the letter &amp;#39;R&amp;#39;. 
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It&apos;s not widely known that ChatGPT has a feature to &lt;a href=&quot;https://simonwillison.net/tags/code-interpreter/&quot;&gt;execute Python code&lt;/a&gt;
in a sandbox and use the result in its answers.&lt;/p&gt;
&lt;p&gt;The Python code it generated was:&lt;/p&gt;
&lt;pre style=&quot;background-color:#e3eaf2;&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span style=&quot;color:#3c526d;&quot;&gt;# Python program to count the number of &amp;#39;R&amp;#39;s in the word &amp;#39;strawberry&amp;#39;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;word &lt;/span&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;= &lt;/span&gt;&lt;span style=&quot;color:#116b00;&quot;&gt;&amp;quot;strawberry&amp;quot;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;count_Rs &lt;/span&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;= &lt;/span&gt;&lt;span style=&quot;color:#7c00aa;&quot;&gt;word&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;.&lt;/span&gt;&lt;span style=&quot;color:#7c00aa;&quot;&gt;lower&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;().&lt;/span&gt;&lt;span style=&quot;color:#7c00aa;&quot;&gt;count&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color:#116b00;&quot;&gt;&amp;#39;r&amp;#39;&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;)  &lt;/span&gt;&lt;span style=&quot;color:#3c526d;&quot;&gt;# Convert the word to lowercase and count &amp;#39;r&amp;#39;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;count_Rs
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The code is pretty simple, and I think we&apos;d all agree that this is
a place where Python code can answer this question exactly. Having
the LLM generate the code and execute it nicely sidesteps the underlying issue.&lt;/p&gt;
&lt;p&gt;Of course, the problem is you have to prompt it to use Python in the first
place, knowing that this is a subject where an LLM without code execution
would fail. Which brings me to...&lt;/p&gt;
&lt;h2 class=&quot;header&quot; id=&quot;tokenization-29f2&quot;&gt;&lt;span class=&quot;header-text&quot;&gt;Tokenization&lt;/span&gt;&lt;a href=&quot;/2024/08/10/how-many-rs-are-in-the-word-strawberry.html#tokenization-29f2&quot; class=&quot;header-link&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;When you provide a prompt to an LLM, the text is first broken down into tokens.
OpenAI provides a &lt;a href=&quot;https://platform.openai.com/tokenizer&quot;&gt;visualization tool&lt;/a&gt;
for this; you can see that using the GPT-4 tokenizer, the string &amp;quot;strawberry&amp;quot;
is broken down into &amp;quot;str&amp;quot;, &amp;quot;aw&amp;quot; and &amp;quot;berry&amp;quot;, which get transformed to
numerical values of &lt;code&gt;[496, 675, 15717]&lt;/code&gt;. In other words, the LLM doesn&apos;t
have access to individual characters; it just sees the tokens.&lt;/p&gt;
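&lt;p&gt;Here&apos;s a toy tokenizer in Python that makes this concrete. It is only an illustration. It does greedy longest-match against a tiny, made-up vocabulary. Real GPT-4 tokenization instead uses byte-pair encoding over a vocabulary of roughly 100k entries. Only the three token IDs come from OpenAI&apos;s tool. &lt;code&gt;TOY_VOCAB&lt;/code&gt; and &lt;code&gt;toy_tokenize&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
# Toy tokenizer: greedy longest-match against a tiny, hypothetical vocabulary.
# Real GPT-4 tokenization uses byte-pair encoding (BPE) over ~100k tokens;
# only these three IDs are taken from OpenAI's tokenizer tool.
TOY_VOCAB = {"str": 496, "aw": 675, "berry": 15717}


def toy_tokenize(text: str) -> list[int]:
    """Split text into token IDs by repeatedly taking the longest vocab match."""
    ids = []
    pos = 0
    while pos != len(text):
        # Try the longest candidate piece first, shrinking one char at a time.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token covers {text[pos:]!r}")
    return ids


ids = toy_tokenize("strawberry")
print(ids)  # [496, 675, 15717]
```

&lt;p&gt;The model only receives that list of integers. Nothing in it spells out the letter &amp;quot;r&amp;quot;, so the model cannot simply count letters.&lt;/p&gt;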
&lt;p&gt;In this case, you can try to bypass the tokenizer by using spaces, so each
letter is now its own word and tokenized separately:&lt;/p&gt;
&lt;pre style=&quot;background-color:#e3eaf2;&quot;&gt;&lt;code class=&quot;language-email&quot;&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;&amp;gt; How many r characters are in the letters s t r a w b e r r y
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;ChatGPT: The word &amp;quot;strawberry&amp;quot; contains 3 instances of the letter &amp;quot;r.&amp;quot;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Success, though I&apos;ve had mixed results with this approach in the past.&lt;/p&gt;
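&lt;p&gt;This rough Python sketch shows why spacing the letters out helps. It assumes that splitting on whitespace is a fair stand-in for the tokenizer&apos;s pre-tokenization step. Real GPT-style tokenizers keep the leading space attached (&amp;quot; r&amp;quot;), but each spaced-out letter still typically ends up in a piece of its own. The helper names are hypothetical.&lt;/p&gt;

```python
def spaced_prompt(word: str) -> str:
    """Put a space between every letter, e.g. 'abc' -> 'a b c'."""
    return " ".join(word)


def letter_pieces(prompt: str) -> list[str]:
    # Rough stand-in for pre-tokenization: split on spaces. Real GPT-style
    # tokenizers keep the leading space attached (" r"), but each letter
    # still lands in its own piece.
    return prompt.split()


prompt = spaced_prompt("strawberry")
pieces = letter_pieces(prompt)
print(prompt)            # s t r a w b e r r y
print(len(pieces))       # 10
print(pieces.count("r"))  # 3
```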
&lt;h2 class=&quot;header&quot; id=&quot;closing-thoughts-29f2&quot;&gt;&lt;span class=&quot;header-text&quot;&gt;Closing thoughts&lt;/span&gt;&lt;a href=&quot;/2024/08/10/how-many-rs-are-in-the-word-strawberry.html#closing-thoughts-29f2&quot; class=&quot;header-link&quot;&gt;#&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;There are plenty of problems, both technically and ethically, with LLMs and
I don&apos;t want to be (or even appear to be) an apologist for them. But I think
it&apos;s important that we all understand how LLMs work to actually tackle and
address those problems.&lt;/p&gt;
</content></entry><entry><title>Side quest: creating a &quot;main&quot; tool</title><id>tag:blog.legoktm.com,2024-07-18:/2024/07/18/side-quest-creating-a-main-tool.html</id><updated>2024-07-18T00:57:56+00:00</updated><author><name>legoktm</name></author><category term="Tech"/><category term="claude"/><category term="git"/><category term="llm"/><category term="rust"/><link href="https://blog.legoktm.com/2024/07/18/side-quest-creating-a-main-tool.html" rel="alternate"/><published>2024-07-18T00:57:56+00:00</published><summary type="html">I like Simon Willison&apos;s framing of using large language models (aka LLMs, aka &quot;AI&quot;) to enable side quests of things you wouldn&apos;t normally do. Could I have done this without LLM assistance? Yes, but not nearly as quickly. And this was not a task on my critical path for the…</summary><content type="html">&lt;p&gt;I like Simon Willison&apos;s framing of using large language models (aka LLMs, aka &amp;quot;AI&amp;quot;)
to enable &lt;a href=&quot;https://simonwillison.net/2024/Mar/22/claude-and-chatgpt-case-study/&quot;&gt;side quests&lt;/a&gt;
of things you wouldn&apos;t normally do.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Could I have done this without LLM assistance? Yes, but not nearly as quickly. And this was not a task on my critical path for the day—it was a sidequest at best and honestly more of a distraction.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;So, yesterday&apos;s side quest: writing a tool that checks out the
default branch of a Git repository, regardless of what it&apos;s named.&lt;/p&gt;
&lt;p&gt;Context: most of my work these days happens on GitHub and involves creating
PRs off the main branch, so I&apos;m frequently going back to it via
&lt;code&gt;git checkout main&lt;/code&gt;, usually followed by a &lt;code&gt;git pull&lt;/code&gt; to fast-forward the branch.&lt;/p&gt;
&lt;p&gt;But just to make things a little more interesting, the &lt;a href=&quot;https://github.com/freedomofpress/securedrop&quot;&gt;SecureDrop server&lt;/a&gt;
Git repository&apos;s main branch is named &lt;code&gt;develop&lt;/code&gt;, which entirely screws with
muscle memory and autocomplete. Not to mention all the older projects that still use a &lt;code&gt;master&lt;/code&gt; branch.&lt;/p&gt;
&lt;p&gt;For a while now I&apos;ve wanted a tool that just checks out the main branch,
regardless of what it&apos;s actually named, and optionally pulls it and stashes
pending changes.&lt;/p&gt;
&lt;p&gt;I asked &lt;a href=&quot;https://claude.ai/&quot;&gt;Claude&lt;/a&gt; 3.5 Sonnet for exactly that:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I want a Rust program named &amp;quot;main&amp;quot; that primarily checks out the main branch of a Git repository (or master if it&apos;s called that).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;I want to invoke it as:&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;main&lt;/code&gt; - just checkout the main branch&lt;/li&gt;
&lt;li&gt;&lt;code&gt;main stash&lt;/code&gt; - stash changes, then checkout main, then pop the stash&lt;/li&gt;
&lt;li&gt;&lt;code&gt;main pull&lt;/code&gt; - checkout main and then git pull&lt;/li&gt;
&lt;li&gt;&lt;code&gt;main stash pull&lt;/code&gt; or &lt;code&gt;main pull stash&lt;/code&gt; - stash changes, checkout main, then pull, then pop the stash&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
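&lt;p&gt;Here&apos;s a Python sketch of the command sequence those four invocations boil down to. It is not the Rust code Claude produced. &lt;code&gt;plan_commands&lt;/code&gt; is a hypothetical name, and the default branch is passed in rather than looked up.&lt;/p&gt;

```python
def plan_commands(args: list[str], default_branch: str) -> list[list[str]]:
    """Return the git commands for `main [stash] [pull]`, in order.

    Hypothetical sketch: argument order doesn't matter, so
    `main stash pull` and `main pull stash` produce the same plan.
    """
    unknown = set(args) - {"stash", "pull"}
    if unknown:
        raise ValueError(f"unknown argument(s): {sorted(unknown)}")
    stash = "stash" in args
    pull = "pull" in args

    commands = []
    if stash:
        commands.append(["git", "stash"])
    commands.append(["git", "checkout", default_branch])
    if pull:
        commands.append(["git", "pull"])
    if stash:
        commands.append(["git", "stash", "pop"])
    return commands
```

&lt;p&gt;For example, &lt;code&gt;plan_commands([&amp;quot;pull&amp;quot;, &amp;quot;stash&amp;quot;], &amp;quot;develop&amp;quot;)&lt;/code&gt; gives stash, checkout, pull, then stash pop. This naive version has one caveat. If there are no local changes, &lt;code&gt;git stash&lt;/code&gt; creates no entry, so the unconditional &lt;code&gt;git stash pop&lt;/code&gt; would pop an older, unrelated stash.&lt;/p&gt;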
&lt;p&gt;It was mostly there, except it hardcoded the &lt;code&gt;main&lt;/code&gt; and &lt;code&gt;master&lt;/code&gt; branches
instead of looking it up via Git. I asked:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Is there a smarter way to determine the main branch? What if it&apos;s called something other than main or master?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And it adjusted to checking &lt;code&gt;git symbolic-ref refs/remotes/origin/HEAD&lt;/code&gt;, which
I didn&apos;t know about.&lt;/p&gt;
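&lt;p&gt;Here&apos;s a minimal Python sketch of that lookup. It assumes the remote is called &lt;code&gt;origin&lt;/code&gt;. &lt;code&gt;git symbolic-ref refs/remotes/origin/HEAD&lt;/code&gt; prints something like &lt;code&gt;refs/remotes/origin/develop&lt;/code&gt;, so the branch name is whatever follows that prefix. The function names are hypothetical.&lt;/p&gt;

```python
import subprocess

PREFIX = "refs/remotes/origin/"


def branch_from_symref(symref: str) -> str:
    """Turn 'refs/remotes/origin/develop' into 'develop'."""
    symref = symref.strip()
    if not symref.startswith(PREFIX):
        raise ValueError(f"unexpected symbolic ref: {symref!r}")
    return symref[len(PREFIX):]


def default_branch(repo: str = ".") -> str:
    """Ask git which branch origin/HEAD points at (raises if it's unset)."""
    out = subprocess.run(
        ["git", "-C", repo, "symbolic-ref", "refs/remotes/origin/HEAD"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return branch_from_symref(out)
```

&lt;p&gt;&lt;code&gt;git clone&lt;/code&gt; normally sets &lt;code&gt;origin/HEAD&lt;/code&gt;. It may be missing in a repository where the remote was added by hand, and &lt;code&gt;git remote set-head origin --auto&lt;/code&gt; should fill it in.&lt;/p&gt;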
&lt;p&gt;I cleaned up the argument handling a little bit, added &lt;code&gt;--version&lt;/code&gt; and published
it on &lt;a href=&quot;https://salsa.debian.org/legoktm/main&quot;&gt;Salsa&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It took me about 5-10 minutes for this whole process, which according to &lt;a href=&quot;https://xkcd.com/1205/&quot;&gt;xkcd&lt;/a&gt;
comes out ahead over 5 years (it saves about 1 second, but I do it ~5 times a day).&lt;/p&gt;
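&lt;p&gt;Here&apos;s the back-of-the-envelope arithmetic in Python. It takes the upper end of the 5-10 minute estimate and ignores leap days.&lt;/p&gt;

```python
# xkcd "Is It Worth the Time?" check, using the numbers from this post.
seconds_saved_per_use = 1
uses_per_day = 5
days = 5 * 365  # five years, ignoring leap days

total_saved_min = seconds_saved_per_use * uses_per_day * days / 60
minutes_spent = 10  # upper end of the 5-10 minute estimate

print(f"saved ~{total_saved_min:.0f} min vs. {minutes_spent} min spent")
# saved ~152 min vs. 10 min spent
```

&lt;p&gt;That&apos;s roughly 2.5 hours saved, comfortably more than the time spent.&lt;/p&gt;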
&lt;p&gt;It probably would&apos;ve taken me 2-3x as long without using an LLM, but honestly,
I&apos;m not sure I would&apos;ve ever overcome the laziness to write something so small.&lt;/p&gt;
&lt;p&gt;Anyways, I haven&apos;t really gotten around to writing about my experiences with and feelings
about LLMs yet, so here&apos;s literally the smallest piece of work to kick that off.&lt;/p&gt;
</content></entry><entry><title>Running the ArchiveTeam Warrior under Podman</title><id>tag:blog.legoktm.com,2024-07-08:/2024/07/08/running-the-archiveteam-warrior-under-podman.html</id><updated>2024-07-08T03:48:43+00:00</updated><author><name>legoktm</name></author><category term="Tech"/><category term="archiveteam"/><category term="podman"/><category term="systemd"/><category term="warrior"/><link href="https://blog.legoktm.com/2024/07/08/running-the-archiveteam-warrior-under-podman.html" rel="alternate"/><published>2024-07-08T03:48:43+00:00</published><summary type="html">I&apos;m finally back on an unlimited internet connection, so I&apos;ve started running the ArchiveTeam Warrior once again. The Warrior is a software application for archiving websites in a crowdsourced manner, especially when there&apos;s a time crunch when a website announces that it&apos;s closing or planning to delete things. Currently the…</summary><content type="html">&lt;p&gt;I&apos;m finally back on an unlimited internet connection, so I&apos;ve started
running the &lt;a href=&quot;https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior&quot;&gt;ArchiveTeam Warrior&lt;/a&gt; once again.&lt;/p&gt;
&lt;p&gt;The Warrior is a software application for archiving websites in a crowdsourced
manner, especially when there&apos;s a time crunch because a website has announced that
it&apos;s closing or planning to delete things. Currently the default project is to archive
&lt;a href=&quot;https://wiki.archiveteam.org/index.php/Telegram&quot;&gt;public Telegram channels&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Historically the Warrior was distributed as a VirtualBox appliance, which was
a bit annoying to run headlessly and was unnecessarily resource-intensive because
it required full virtualization. But they now have a &lt;a href=&quot;https://wiki.archiveteam.org/index.php/Running_Archive_Team_Projects_with_Docker&quot;&gt;containerized version&lt;/a&gt;
that is pretty trivial to set up.&lt;/p&gt;
&lt;p&gt;Relatedly, I&apos;ve recently been playing with Podman&apos;s &amp;quot;&lt;a href=&quot;https://blogs.gnome.org/alexl/2021/10/12/quadlet-an-easier-way-to-run-system-containers/&quot;&gt;Quadlet&lt;/a&gt;&amp;quot; functionality, which I really, really like. Instead of needing
to create a systemd service to wrap running a container, you can specify what you want to run in a basically systemd-native way:&lt;/p&gt;
&lt;pre style=&quot;background-color:#e3eaf2;&quot;&gt;&lt;code class=&quot;language-ini&quot;&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;[Unit]
&lt;/span&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;Description&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#116b00;&quot;&gt;warrior
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;[Container]
&lt;/span&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;Image&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#116b00;&quot;&gt;atdr.meo.ws/archiveteam/warrior-dockerfile
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;PublishPort&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#116b00;&quot;&gt;8001:8001
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;Environment&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#116b00;&quot;&gt;DOWNLOADER=&amp;lt;your name&amp;gt;
&lt;/span&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;Environment&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#116b00;&quot;&gt;SELECTED_PROJECT=auto
&lt;/span&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;Environment&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#116b00;&quot;&gt;CONCURRENT_ITEMS=4
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;AutoUpdate&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#116b00;&quot;&gt;registry
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;[Service]
&lt;/span&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;Restart&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#116b00;&quot;&gt;on-failure
&lt;/span&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;RestartSec&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#116b00;&quot;&gt;30
&lt;/span&gt;&lt;span style=&quot;color:#3c526d;&quot;&gt;# Extend Timeout to allow time to pull the image
&lt;/span&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;TimeoutStartSec&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#116b00;&quot;&gt;180
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;
&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;[Install]
&lt;/span&gt;&lt;span style=&quot;color:#3c526d;&quot;&gt;# Start by default on boot
&lt;/span&gt;&lt;span style=&quot;color:#a04900;&quot;&gt;WantedBy&lt;/span&gt;&lt;span style=&quot;color:#111b27;&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#116b00;&quot;&gt;multi-user.target default.target
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I substituted in my username and dropped this into &lt;code&gt;~/.config/containers/systemd/warrior.container&lt;/code&gt;, ran
&lt;code&gt;systemctl --user daemon-reload&lt;/code&gt; and &lt;code&gt;systemctl --user start warrior&lt;/code&gt; and
it immediately started archiving! Visiting &lt;code&gt;localhost:8001&lt;/code&gt; should bring
up the web interface.&lt;/p&gt;
&lt;p&gt;You can then run &lt;code&gt;systemctl --user cat warrior&lt;/code&gt; to see what the generated
&lt;code&gt;.service&lt;/code&gt; file looks like.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;AutoUpdate=registry&lt;/code&gt; line tells &lt;a href=&quot;https://docs.podman.io/en/latest/markdown/podman-auto-update.1.html&quot;&gt;&lt;code&gt;podman-auto-update&lt;/code&gt;&lt;/a&gt; to automatically fetch
image updates and restart the running container. You&apos;ll likely need to enable and start the timer for this, with &lt;code&gt;systemctl --user enable --now podman-auto-update.timer&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The one thing I haven&apos;t figured out yet is gracefully shutting down, which is
important to avoid losing unfinished data. I suspect the &lt;code&gt;Restart=always&lt;/code&gt; is harmful here,
since I do want to explicitly shut down in some cases.&lt;/p&gt;
&lt;p&gt;P.S. I also have an infrequently updated &lt;a href=&quot;https://legoktm.com/view/Free_bandwidth&quot;&gt;Free bandwidth&lt;/a&gt;
wiki page that contains other suggestions for how to use your internet connection for good.&lt;/p&gt;
&lt;p&gt;Update (2024-07-14): I changed the restart options to &lt;code&gt;Restart=on-failure&lt;/code&gt; and &lt;code&gt;RestartSec=30&lt;/code&gt;, which stops it from restarting immediately
after a graceful shutdown and makes it restart correctly if it starts up before networking is ready.&lt;/p&gt;
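&lt;p&gt;Here&apos;s a simplified Python model of why &lt;code&gt;on-failure&lt;/code&gt; behaves better here, based on the &lt;code&gt;Restart=&lt;/code&gt; table in &lt;code&gt;systemd.service(5)&lt;/code&gt;. It only covers a process that exits on its own, via an exit code or a signal. It ignores timeouts, the watchdog, and &lt;code&gt;SuccessExitStatus=&lt;/code&gt;. &lt;code&gt;should_restart&lt;/code&gt; is a hypothetical helper, not a systemd API.&lt;/p&gt;

```python
import signal

# Signals systemd treats as a "clean" exit by default (see systemd.service(5)).
CLEAN_SIGNALS = {signal.SIGHUP, signal.SIGINT, signal.SIGTERM, signal.SIGPIPE}


def is_clean(exit_code=None, killed_by=None) -> bool:
    """Clean means exit status 0, or termination by one of CLEAN_SIGNALS."""
    if killed_by is not None:
        return killed_by in CLEAN_SIGNALS
    return exit_code == 0


def should_restart(policy: str, exit_code=None, killed_by=None) -> bool:
    """Simplified Restart= decision for a service process that exited by itself.

    Ignores timeouts, the watchdog, SuccessExitStatus= and
    RestartPreventExitStatus=. A unit stopped via `systemctl stop` is never
    restarted, whatever the policy.
    """
    clean = is_clean(exit_code, killed_by)
    if policy == "no":
        return False
    if policy == "always":
        return True
    if policy == "on-success":
        return clean
    if policy == "on-failure":
        return not clean
    raise ValueError(f"policy not modelled in this sketch: {policy}")
```

&lt;p&gt;A graceful shutdown that exits with status 0 counts as clean, so &lt;code&gt;on-failure&lt;/code&gt; leaves the container stopped. A start before networking is ready presumably exits non-zero, so it gets retried after &lt;code&gt;RestartSec=30&lt;/code&gt;.&lt;/p&gt;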
</content></entry><entry><title>Basic anti-abuse monitoring for Mastodon</title><id>tag:blog.legoktm.com,2024-06-26:/2024/06/26/basic-anti-abuse-monitoring-for-mastodon.html</id><updated>2024-06-26T00:12:52+00:00</updated><author><name>legoktm</name></author><category term="Tech"/><category term="fediverse"/><category term="mastodon"/><category term="wikisworld"/><link href="https://blog.legoktm.com/2024/06/26/basic-anti-abuse-monitoring-for-mastodon.html" rel="alternate"/><published>2024-06-26T00:12:52+00:00</published><summary type="html">Back in February, Mastodon and the connected Fediverse faced a spam attack caused by unattended instances with an open signup policy. Bots quickly registered accounts and then sent spammy messages that were relayed through the Fediverse. It was annoying and the normal moderation tool of limiting or blocking entire instances…</summary><content type="html">&lt;p&gt;Back in February, Mastodon and the connected Fediverse faced a &lt;a href=&quot;https://tedium.co/2024/02/20/mastodon-spam-maintenance-problem/&quot;&gt;spam attack&lt;/a&gt;
caused by unattended instances with an open signup policy. Bots quickly registered
accounts and then sent spammy messages that were relayed through the Fediverse.&lt;/p&gt;
&lt;p&gt;It was annoying, and the normal moderation tool of limiting or blocking entire instances wasn&apos;t effective since the attackers were coming from a wide set of places. Since then, people have developed
shared blocklists that you can subscribe to, but that has its own downsides.&lt;/p&gt;
&lt;p&gt;So here&apos;s the tool I developed that we used for wikis.world: &lt;a href=&quot;https://gitlab.wikimedia.org/legoktm/masto-monitor&quot;&gt;masto-monitor&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The workflow is straightforward:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Poll the federated timeline for all public posts&lt;/li&gt;
&lt;li&gt;Check them against a manually curated list of patterns&lt;/li&gt;
&lt;li&gt;If there&apos;s a match, report it using the API&lt;/li&gt;
&lt;/ul&gt;
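&lt;p&gt;Here&apos;s a rough Python sketch of that loop. It is not masto-monitor&apos;s actual code. It reads the federated timeline via &lt;code&gt;GET /api/v1/timelines/public&lt;/code&gt; and files reports via &lt;code&gt;POST /api/v1/reports&lt;/code&gt;. &lt;code&gt;INSTANCE&lt;/code&gt;, &lt;code&gt;TOKEN&lt;/code&gt;, and the example &lt;code&gt;PATTERNS&lt;/code&gt; are placeholders, and the token is assumed to have the &lt;code&gt;read:statuses&lt;/code&gt; and &lt;code&gt;write:reports&lt;/code&gt; scopes.&lt;/p&gt;

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Placeholders: your instance URL and an access token with the
# read:statuses and write:reports scopes.
INSTANCE = "https://example.social"
TOKEN = "REPLACE_ME"

# Manually curated, lowercase substrings (made-up examples).
PATTERNS = ["buy followers", "free crypto giveaway"]


def matching_patterns(content: str, patterns: list[str]) -> list[str]:
    """Return every pattern found as a case-insensitive substring of content.

    Mastodon returns post content as HTML, so this matches against raw HTML.
    """
    lowered = content.lower()
    return [p for p in patterns if p in lowered]


def api(method: str, path: str, params: dict):
    """Call the Mastodon REST API with form-encoded parameters."""
    query = urllib.parse.urlencode(params, doseq=True)
    url = INSTANCE + path
    body = None
    if method == "GET":
        url += "?" + query
    else:
        body = query.encode()
    req = urllib.request.Request(url, data=body, method=method)
    req.add_header("Authorization", "Bearer " + TOKEN)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def poll_once(since_id: Optional[str]) -> Optional[str]:
    """Check new federated posts, report matches, return the newest ID seen."""
    params = {"limit": 40}
    if since_id:
        params["since_id"] = since_id
    statuses = api("GET", "/api/v1/timelines/public", params)  # newest first
    for status in statuses:
        hits = matching_patterns(status["content"], PATTERNS)
        if hits:
            # Only file a report; a human moderator makes the actual decision.
            api("POST", "/api/v1/reports", {
                "account_id": status["account"]["id"],
                "status_ids[]": [status["id"]],
                "comment": "Automated match: " + ", ".join(hits),
            })
    return statuses[0]["id"] if statuses else since_id
```

&lt;p&gt;You would call &lt;code&gt;poll_once&lt;/code&gt; in a loop with a sleep, passing back the ID it returns. With &lt;code&gt;since_id&lt;/code&gt;, some posts may be skipped if more than 40 arrive between polls. A real deployment would want to paginate or poll more often.&lt;/p&gt;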
&lt;p&gt;This allows us to have an automated process checking all incoming posts while still enabling
humans to make any moderation decisions.&lt;/p&gt;
&lt;p&gt;The code itself is straightforward enough that it doesn&apos;t really merit much explanation. The matching logic is very, very basic: it just looks for substring matches. I think the approach is worth developing further, allowing people to write more expressive rules/filters that trigger automated reports.&lt;/p&gt;
&lt;p&gt;But I&apos;m not planning to do so myself since we don&apos;t currently need it, so people are welcome to fork it and enhance it.&lt;/p&gt;
</content></entry><entry><title>Implementing search for my blog in WebAssembly</title><id>tag:blog.legoktm.com,2024-06-08:/2024/06/08/implementing-search-for-my-blog-in-webassembly.html</id><updated>2024-06-08T05:18:24+00:00</updated><author><name>legoktm</name></author><category term="Tech"/><category term="b2"/><category term="rust"/><category term="wasm"/><link href="https://blog.legoktm.com/2024/06/08/implementing-search-for-my-blog-in-webassembly.html" rel="alternate"/><published>2024-06-08T05:18:24+00:00</published><summary type="html">If you visit my blog (most likely what you&apos;re reading now) and have JavaScript enabled, you should see a magnifying glass in the top right, next to the feed icon. Clicking it should open up a search box that lets you perform a very rudimentary full-text search of all of…</summary><content type="html">&lt;p&gt;If you visit my blog (most likely what you&apos;re reading now) and have JavaScript
enabled, you should see a magnifying glass in the top right, next to the feed
icon. Clicking it should open up a search box that lets you perform a very
rudimentary full-text search of all of my blog posts.&lt;/p&gt;
&lt;p&gt;It&apos;s implemented fully client-side using Rust compiled to WebAssembly (WASM);
here&apos;s all the &lt;a href=&quot;https://git.legoktm.com/legoktm/b2/commit/00c3c19a14f8471493a04ceb3d750238ea27282c&quot;&gt;code I added&lt;/a&gt;
to implement it.&lt;/p&gt;
&lt;p&gt;At a high level, it:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Splits all blog posts into individual words, counts them, and dumps it
into a search index that is a JSON blob.&lt;/li&gt;
&lt;li&gt;Installs a click handler (using JavaScript) that displays the search bar and
lazy-loads the rest of the WASM code and search index.&lt;/li&gt;
&lt;li&gt;Installs an input handler (using WASM) that takes the user&apos;s input, searches
through the index, and returns up to 10 matching articles.&lt;/li&gt;
&lt;/ol&gt;
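&lt;p&gt;Step 1 could look something like this Python sketch. The real implementation is in Rust, and its JSON layout isn&apos;t described here. The field names (&lt;code&gt;counts&lt;/code&gt;, &lt;code&gt;title_words&lt;/code&gt;, and so on) and the word-splitting regex are assumptions.&lt;/p&gt;

```python
import json
import re
from collections import Counter


def words(text: str) -> list[str]:
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def build_index(posts: list[dict]) -> str:
    """Count the words in each post and serialize everything as one JSON blob."""
    entries = []
    for post in posts:
        entries.append({
            "title": post["title"],
            "url": post["url"],
            "published": post["published"],  # ISO 8601 date, sorts as text
            "tags": [tag.lower() for tag in post["tags"]],
            "title_words": sorted(set(words(post["title"]))),
            "counts": dict(Counter(words(post["body"]))),
        })
    # Compact separators keep the blob a little smaller.
    return json.dumps(entries, separators=(",", ":"))
```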
&lt;p&gt;The search algorithm is pretty basic: it gives one point per word occurrence
in the blog post, and 5 points if the word is in the title or a tag. Then it
sorts by score, breaking ties by most recently published.&lt;/p&gt;
&lt;p&gt;There&apos;s no stemming or language processing; the only normalization that
happens is treating everything as lowercase.&lt;/p&gt;
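&lt;p&gt;Here&apos;s a Python sketch of that scoring, run against a tiny made-up index. The entry shape is an assumption, not the real index format. Because there is no stemming, a query for &amp;quot;search&amp;quot; does not match the title word &amp;quot;searching&amp;quot;.&lt;/p&gt;

```python
# Two hypothetical index entries; real entries would come from the build step.
SAMPLE_INDEX = [
    {"title": "Searching with WASM", "published": "2024-06-08",
     "tags": ["rust", "wasm"], "title_words": ["searching", "wasm", "with"],
     "counts": {"search": 12, "rust": 3}},
    {"title": "A small Git tool", "published": "2024-07-18",
     "tags": ["rust", "git"], "title_words": ["a", "git", "small", "tool"],
     "counts": {"rust": 3, "git": 9}},
]


def search(index: list[dict], query: str, limit: int = 10) -> list[str]:
    """Score posts for a query and return up to `limit` titles, best first."""
    terms = query.lower().split()
    scored = []
    for entry in index:
        score = 0
        for term in terms:
            # One point per occurrence of the word in the post body...
            score += entry["counts"].get(term, 0)
            # ...plus 5 if it appears in the title or as a tag.
            if term in entry["title_words"] or term in entry["tags"]:
                score += 5
        if score:
            scored.append((score, entry["published"], entry["title"]))
    # Highest score first; ties go to the most recently published post.
    scored.sort(reverse=True)
    return [title for _, _, title in scored[:limit]]
```

&lt;p&gt;For instance, &lt;code&gt;search(SAMPLE_INDEX, &amp;quot;rust&amp;quot;)&lt;/code&gt; scores both posts 8 (3 occurrences plus 5 for the tag), so the more recent one comes first.&lt;/p&gt;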
&lt;p&gt;I&apos;ve played with WASM before but this is the first time I&apos;ve actually
deployed something using it. As much as I enjoyed writing it in Rust,
the experience left something to be desired. I had to use a separate
tool (&lt;code&gt;wasm-bindgen&lt;/code&gt;) and load a pre-built JavaScript file first that
then let me initialize the WASM code.&lt;/p&gt;
&lt;p&gt;The payload is also ...heavy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;search.js&lt;/code&gt;: 5.53kB (23.63kB before gzip)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;search_bg.wasm&lt;/code&gt;: 53.78kB (122.82kB before gzip)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;search_index.json&lt;/code&gt;: 323.13kB (322.76kB before gzip)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&apos;m not sure why the index compresses so poorly with Apache; locally it goes
down to 100kB. (I had briefly considered using a binary encoding like
&lt;a href=&quot;https://en.wikipedia.org/wiki/MessagePack&quot;&gt;MessagePack&lt;/a&gt; but thought it
wouldn&apos;t be more efficient than JSON after compression.) And of course,
the more I write, the bigger the index gets, so it&apos;ll need to be addressed
sooner rather than later. I think any pure-JavaScript code would be much,
much smaller than the WASM bundle.&lt;/p&gt;
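&lt;p&gt;Here&apos;s a quick Python sketch for comparing against the server by measuring a file&apos;s raw and gzipped sizes locally. Compression level 6 is an assumption, chosen because it&apos;s the &lt;code&gt;gzip&lt;/code&gt; command-line default. Python&apos;s own &lt;code&gt;gzip.compress&lt;/code&gt; defaults to 9.&lt;/p&gt;

```python
import gzip
from pathlib import Path


def sizes_kb(data: bytes, level: int = 6) -> tuple[float, float]:
    """Return (raw, gzipped) sizes in kB, where 1 kB = 1000 bytes."""
    return len(data) / 1000, len(gzip.compress(data, compresslevel=level)) / 1000


def report(path: str) -> str:
    """Format sizes the same way as the list above."""
    raw, packed = sizes_kb(Path(path).read_bytes())
    return f"{Path(path).name}: {packed:.2f}kB ({raw:.2f}kB before gzip)"
```

&lt;p&gt;Running &lt;code&gt;report(&amp;quot;search_index.json&amp;quot;)&lt;/code&gt; next to the built site prints a line in the same format as the list above.&lt;/p&gt;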
</content></entry></feed>