colin

So it's been about ten years since I last posted on this thing, guess I should give it another shot 😛

My #web server hosts a bunch of websites, and I've been watching crawlers scan it for a while. Some of them are pretty mysterious (“Thinkbot” indeed? And it suggests you block its IP while using a decently large IP range), others are obvious (ChatGPT, Google's non-search user agents).

I certainly don't want to encourage LLMs and “AI” (and I hate calling it that, it's nothing of the sort), so I made a very basic attempt at blocking them, using a blog post by someone else doing the exact same thing as a starting point.

At this point I'll give a small warning – the list I started with included common scripting language user agents like the ones used by Python and Go. At first glance this makes sense, since some bots will indeed be written with those and not bother setting a custom one, but on the other hand there are plenty of legitimate apps that do the same. The one it took me three months to notice was Bluesky, which uses Go. My PDS-based account still “worked”; it's just that all the posts I made during that time disappeared into the ether (they're still in the PDS's database, but you'll never see them when visiting bsky.app). So don't do that!

So I have a bunch of user agents blocked, and nginx responds to them with 418 I'm a teapot. I add more user agents as I see them appear (e.g. the aforementioned Thinkbot), but I've noticed that the crawlers hugely prefer an issue tracker I have installed on one site. My assumption is that they've been programmed to prefer forum-like web apps in the hope that they'll have more “human” content with less LLM pollution.
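As a rough illustration of the matching logic described above (written in Python rather than as nginx configuration), the sketch below answers 418 to user agents containing a known crawler token and deliberately leaves default library user agents alone. The token lists and the function names `status_for` and `is_generic_library` are assumptions made up for this example, not the actual blocklist; only Thinkbot, ChatGPT, Google, Python and Go are named in the post.

```python
# Illustrative tokens only (an assumption, not the real blocklist).
# Matching is case-insensitive substring matching on the User-Agent header.
BLOCKED_TOKENS = ("gptbot", "chatgpt-user", "thinkbot", "googleother")

# Default user agents sent by common HTTP libraries. These are deliberately
# NOT blocked: legitimate apps (Bluesky, in the post) send them too.
GENERIC_LIBRARY_TOKENS = ("go-http-client", "python-requests", "python-urllib")


def status_for(user_agent: str) -> int:
    """Return 418 for a user agent containing a blocked token, else 200."""
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_TOKENS):
        return 418
    return 200


def is_generic_library(user_agent: str) -> bool:
    """True if the user agent looks like an unmodified library default."""
    ua = user_agent.lower()
    return any(token in ua for token in GENERIC_LIBRARY_TOKENS)
```

Keeping the generic library check separate from the blocklist makes it possible to log those requests and look at them by hand before deciding anything, instead of silently dropping traffic from real apps.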

To support this theory, I've noticed that a number of smaller but active forums, still using more oldschool forum software (better, as it's more reserved with things like DHTML infinite scrolling), all started using Cloudflare around the same time. It regularly challenges users, to the point that posts were often getting lost, as Cloudflare apparently has no provision to save POSTs it decides to interrupt. It's interesting, since some of these forums are very anti-big-#internet and looking for a web 1.0 experience, and so are pretty anti-Cloudflare apart from needing the anti-bot protection.

Sadly, I'm not really sure what the solution is. Since they're targeting such specific subsets, it does feel like there might be an opportunity to poison the well, but it'd need to be done in a way that's transparent to the real users.

When I was young, my dad encouraged me to learn to #program. He sat me down with various computers, which had BASIC (Sinclair Research BASIC on the ZX Spectrum is quite an experience to program – its keywords are tokenised, so a single keystroke can type an entire command, with a bunch of modifier keys) or maybe some ancient version of Turbo Pascal on an IBM XT.

While trying to make platform games, point and click adventure games, text adventure games and simple windowing systems, and trying to get some sounds out of Sound Blaster-compatible cards on my old 286, I was always interested in the idea of developing an actual OS. My dad had told me about how #DOS was a layer between the hardware and the software, and had shown me Sun SPARCstations running the X Window System at his work, as well as Windows 3.1, so I was aware of the existence of all these different systems, even though I couldn't get to use many of them.

In addition to this, my dad worked as an electronics engineer for some time, so I was also exposed to 8-bit #microcontrollers, random hardware and a general interest in electronics. This made things like low-level drivers even more interesting, so it's no surprise that I'd spend time thinking about what my own OS might look like, sketching random little windows and things. Of course, 12-year-old me didn't really appreciate the complexity of a modern OS!

Saying that, hobby OS development has become a popular pastime amongst software engineers – the existence of websites like osdev.org, /r/osdev, the educational xv6 and the many OS projects on GitHub shows that it's something plenty of people are interested in.

So, as the years have gone by, I've still had the notion, but didn't really do much. The more I learned about OS development and software in general, the more I knew was required, whilst still not really getting started. I had particular concerns about things like virtual memory, where you need to be able to manipulate your page tables from within the linear world of memory as defined by those same page tables, which is just mind-bending when you're first starting out. After a while of thinking about it though, I found things started to fall into place, and if you take it one step at a time, things start to work.
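One well-known way out of that page table loop on 32-bit x86 (without PAE) is recursive mapping: one page directory entry points back at the page directory itself, so every page table shows up at a predictable virtual address. The post doesn't say which approach this OS uses, so the sketch below is only an illustration of the address arithmetic. The slot number 1023 and the function names are assumptions chosen for the example.

```python
PAGE_SHIFT = 12        # 4 KiB pages
ENTRY_SIZE = 4         # bytes per page directory / page table entry
RECURSIVE_SLOT = 1023  # assumption: the last directory entry maps the directory itself

# Every page table becomes visible inside this 4 MiB window.
PT_WINDOW = RECURSIVE_SLOT << 22                        # 0xFFC00000
# The page directory itself appears as the last page of that window.
PD_WINDOW = PT_WINDOW | (RECURSIVE_SLOT << PAGE_SHIFT)  # 0xFFFFF000


def pde_address(vaddr: int) -> int:
    """Virtual address of the page directory entry covering vaddr."""
    return PD_WINDOW + (vaddr >> 22) * ENTRY_SIZE


def pte_address(vaddr: int) -> int:
    """Virtual address of the page table entry covering vaddr."""
    return PT_WINDOW + (vaddr >> PAGE_SHIFT) * ENTRY_SIZE
```

For example, `hex(pte_address(0x00401000))` gives `'0xffc01004'`: entry 1 of the page table for directory slot 1, reached through ordinary virtual memory without ever touching a physical address directly.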

For my OS, I have some ideas I want to play with:

  • Avoid process-oriented computing: Most operating systems bundle applications together, which generally include everything from the OS interface level up to their UI. I want a data and/or task-oriented OS, where smaller components are tied together to perform everyday, and unusual, tasks. This should result in more code reuse and more powerful problem solving for end users even without programming, and allow tasks to be distributed over multiple machines on the network.
  • Micro-kernel as much as possible: These components can be drivers or pure userspace tasks, but from an API perspective, they should look the same. As much as possible will be in userspace to ensure maximum stability – and if a task dies, components can attempt to recover by restarting the task (and in the case of network distributed processing, moving those tasks onto another machine).
  • No “compatibility layers”: Though it'll probably put off new users, I don't intend to have any sort of standard libraries to make file, device or UI access work. Though these are useful for running existing applications, I want my OS to be something different, with APIs to match. The only exceptions will be to allow self-hosting, so there will be shims, part of the toolset rather than the OS, to allow things like gcc to work.
  • Encrypted, versioned file system: Speaks for itself! Everyone deserves privacy, and to be able to recover their files.
  • Built in shell is an object-oriented BASIC system: This ties in to my belief that computing is for everyone – anybody should be able to write a little app, a script or whatever they need to perform their basic tasks at home and work. In the '80s most home computers had built in BASIC, allowing people to tinker, but computing has been slowly moving away from this. I'd like my OS to bring this flexibility back, allowing people to use the computer to its full potential and self educate.
I have other ideas, but those are the “big” ones for now!

So, I've decided to start a blog. I might never update it, much like my LiveJournal, but I felt the site needed some refreshing since I haven't had time to update it in recent years.

To begin with, I've uploaded my original internal 360 controller driver source to GitHub. A few people have asked for this over the years, but I already had it in internal source control (it's old enough that it was still in Subversion even though I'd moved over to git) and I was always afraid of actually getting around to it. However, I bit the bullet and there it goes! I do still intend to work on it, but it's hard to find time.

Other than that, I always have some random project I'm thinking about, so perhaps I can blog about that. Recently I've been playing with hobby OS development, so there's plenty of drama there.