Running the ArchiveTeam Warrior under Podman

I'm finally back on an unlimited internet connection, so I've started running the ArchiveTeam Warrior once again.

The Warrior is a software application for archiving websites in a crowdsourced manner. It is especially useful when time is short, such as when a website announces that it's closing or planning to delete things. Currently the default project is to archive public Telegram channels.

Historically the Warrior was distributed as a VirtualBox appliance, which was a bit annoying to run headlessly and was unnecessarily resource-intensive because it required full virtualization. But ArchiveTeam now has a containerized version that is pretty trivial to set up.

Relatedly, I've recently been playing with Podman's "Quadlet" functionality, which I really, really like. Instead of needing to create a systemd service to wrap running a container, you can specify what you want to run in a basically systemd-native way:

[Unit]
Description=warrior

[Container]
Image=atdr.meo.ws/archiveteam/warrior-dockerfile

PublishPort=8001:8001

Environment=DOWNLOADER=<your name>
Environment=SELECTED_PROJECT=auto
Environment=CONCURRENT_ITEMS=4

AutoUpdate=registry

[Service]
Restart=on-failure
RestartSec=30
# Extend Timeout to allow time to pull the image
TimeoutStartSec=180

[Install]
# Start by default on boot
WantedBy=multi-user.target default.target

I substituted in my username, dropped this into ~/.config/containers/systemd/warrior.container, and ran systemctl --user daemon-reload followed by systemctl --user start warrior. It immediately started archiving! Visiting localhost:8001 should bring up the web interface.
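For reference, here is a rough sketch of those steps as a small shell script. The function names are made up for illustration, and it assumes you saved the unit file above as a local file called warrior.container. It also assumes Quadlet's rootless search path follows XDG_CONFIG_HOME when that variable is set, falling back to ~/.config otherwise.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Podman or the Warrior.

# Print the directory where rootless Quadlet unit files are expected to live.
quadlet_dir() {
    printf '%s\n' "${XDG_CONFIG_HOME:-$HOME/.config}/containers/systemd"
}

# Copy a local .container file into place as warrior.container.
# $1: path to your edited copy of the unit file (assumed to exist)
install_warrior_unit() {
    src="$1"
    dest_dir="$(quadlet_dir)"
    mkdir -p "$dest_dir" && cp "$src" "$dest_dir/warrior.container"
}

# Regenerate the service from the .container file, start it, and show its state.
start_warrior() {
    systemctl --user daemon-reload
    systemctl --user start warrior
    systemctl --user --no-pager status warrior
}
```

After sourcing the script, running install_warrior_unit ./warrior.container followed by start_warrior should be equivalent to the manual steps described above.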

You can then run systemctl --user cat warrior to see what the generated .service file looks like.

The AutoUpdate=registry line tells podman-auto-update to automatically fetch image updates and restart the running container. You'll likely need to enable and start the timer for this, with systemctl --user enable --now podman-auto-update.timer (plain enable only takes effect from the next login).
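If you want to check that auto-updates are actually wired up, something like the following might help. This is a sketch with made-up function names. It assumes a reasonably recent Podman that supports podman auto-update --dry-run, and it uses --now so the timer is started right away as well as enabled.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Enable the user-level auto-update timer and show when it will next fire.
enable_warrior_auto_update() {
    # --now enables the timer for future sessions and also starts it immediately.
    systemctl --user enable --now podman-auto-update.timer
    systemctl --user --no-pager list-timers podman-auto-update.timer
}

# List containers that have a pending image update, without pulling or restarting.
preview_auto_update() {
    podman auto-update --dry-run
}
```

Running preview_auto_update before relying on the timer is a cheap way to confirm that the container carries the auto-update label that AutoUpdate=registry is meant to set.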

The one thing I haven't figured out yet is gracefully shutting down, which is important to avoid losing unfinished data. I suspect the Restart=always setting I started with is harmful here, since I do want to explicitly shut down in some cases.

P.S. I also have an infrequently updated Free bandwidth wiki page that contains other suggestions for how to use your internet connection for good.

Update (2024-07-14): I changed the restart options to Restart=on-failure and RestartSec=30. This stops the container from restarting immediately after a graceful shutdown, while still restarting it correctly if it starts up before networking is ready.