Even with things like Python, CGI is pretty fast these days. If your CGI script takes a generous 400 milliseconds of CPU to start up and your server has 64 cores, you can serve 160 requests per second, which is 14 million hits per day per server. That's a high-traffic site.
That is, if your web service struggles to handle single-digit millions of requests per day, not counting static "assets", CGI process startup is not the bottleneck.
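For concreteness, here's that back-of-the-envelope arithmetic as a quick Python sketch, using the same assumed numbers:

    cores = 64
    startup_cpu_s = 0.4                   # generous per-request startup cost
    req_per_s = cores / startup_cpu_s     # 160 requests/second
    print(f"{req_per_s * 86_400:,.0f} hits/day")   # 13,824,000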
A few years ago I would have said, "and of course it's boring technology that's been supported in the Python standard library forever," but apparently the remaining Python maintainers are the ones who think that code stability and backwards compatibility with boring technology are actively harmful things, so they've been removing modules from the standard library if they are too boring and stable. I swear I am not making this up. The cgi module is removed in 3.13.
I'm still in the habit of using Python for prototyping, since I've been using it daily for most of the past 25 years, but now I regret that. I'm kind of torn between JS and Lua.
Amusingly that links to https://peps.python.org/pep-0206/ from 14th July 2000 (25 years ago!) which, even back then, described the cgi package as "designed poorly and are now near-impossible to fix".
That fails pretty hard at providing a rationale. Basically it says that CGI is an inefficient interface because it involves creating a new process! Even if that were true, "You shouldn't want to do such an inefficient thing" is very, very rarely a reasonable answer to a technical question like "How do I write a CGI script in Python?" or "How do I parse a CSV file in Python?"
There are certainly some suboptimal design choices in the cgi module's calling interface, things you did a much better job of in Django, but what made them "near-impossible to fix" was that at the time everyone reading and writing PEPs considered backwards compatibility to be not a bad thing, or even a mildly good thing, but an essential thing that was worth putting up with pain for. Fixing a badly designed interface is easy if you know what it should look like and aren't constrained by backwards compatibility.
pjmlp 11 hours ago [-]
Not to mention that if efficiency is a goal, Python probably isn't the right language anyway, so it is a very strange argument coming from Python developers.
kragen 11 hours ago [-]
It would have been a less strange argument 25 years ago, before the manycore era, when using Python involved less of a performance sacrifice. And there are still cases where Python is acceptably performant. However, the argument is from only 6 years ago, which makes it ridiculous.
riedel 12 hours ago [-]
Moving stuff out of the standard library seems reasonable. However, I think this all is a weird mix of arguments. IMHO new process spawning is a feature and not a bug in the use cases where CGI is used. Most of the stuff is low-traffic config interfaces or remotely invocable scripts. There was this trend to move stuff to FastCGI. We had tons of cases of memory leaks in long-running but seldom-used stuff like mailing list servers. To me CGI is the poor man's alternative to serverless. However, I also do not completely understand why a standard library has to support it. I have bash scripts running using Apache's CGI mod.
kragen 12 hours ago [-]
I would have phrased it, "serverless is a marketing term for CGI scripts."
I have bash CGI scripts too, though Shellshock and bash's general bug-proneness make me doubt that this was wise.
There are some advantages of having the CGI protocol implemented in a library. There are common input-handling bugs the library can avoid, it means that simple CGI programs can be really simple, and it lets you switch away from CGI when desired.
That said, XSS was a huge problem with really simple CGI programs, and an HTML output library that avoids that by default is more important than the input parsing—another thing absent from Python's standard library but done right by Django.
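To illustrate escape-by-default output, a minimal sketch in the spirit of what Django templates do; html.escape is stdlib, but the tag() helper here is hypothetical:

    from html import escape

    def tag(name, content):
        # hypothetical helper: content is escaped unless explicitly marked safe
        return f"<{name}>{escape(str(content))}</{name}>"

    print(tag("p", "<script>alert(1)</script>"))
    # <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>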
Right. Isn't that insane? If I hadn't found it by means of having modules removed that my code depended on, I would have thought it was satire.
That policy, and the heinous character assassination the PSF carried out against Tim Peters, mean I can no longer recommend in good conscience that anyone adopt Python.
peter422 2 hours ago [-]
If you rely on ‘cgi’ in your Python application, you are probably fine using 3.12 until mid-2028 when it stops being maintained (and probably beyond).
You guys are all really getting worked up over very little.
vodou 12 hours ago [-]
I really understand your frustration. Everyone developing in Python for a long time has felt it a bit too often when breaking changes (even between minor version updates) once again ruin the day.
But I also understand that the world is not perfect. We all need to prioritize all the time. As they write in the rationale: "The team has limited resources, reduced maintenance cost frees development time for other improvements". And the cgi module is apparently even unmaintained.
I guess reality sooner or later catches up with a "batteries included" philosophy.
What do you mean by "character assassination" carried out against Tim Peters? Not anything in the linked article I presume?
3036e4 6 hours ago [-]
Maintenance costs... that only exist because other parts of Python do not keep a stable and backwards-compatible API? It's the same problem as everywhere else, but particularly silly when different parts of the same organization are ruining things for each other internally. Not that I think it is ever defensible: a small cost-saving in one place that causes much more work in many other places.
graemep 3 hours ago [-]
On top of that, backward incompatibility creates a cost for everyone using Python. I would prefer a slower rate of change and fewer breaking changes.
It does make me wonder whether Python is still the best choice for what I use it for, and whether I should be moving to something else.
Kwpolska 10 hours ago [-]
He was banned for 3 months for opposing a change to the PSF bylaws that would allow the board to remove members with a simple majority vote.
Alright. Another case where "codes of conduct" trump manners or actually being a grownup. It really is a shame. It happened to a friend of mine on a rather big technical mailing list, just for arguing for something that some people disagreed with. It would be nice to get back to a system based on manners and respect. That system worked for years.
bgwalter 4 hours ago [-]
They have limited resources because the inner circle chased away most active people in order to secure their own corporate positions (which hilariously failed since companies caught on and fired some of them anyway).
So the remaining people periodically launch some deprecation PEPs or other bureaucratic things in order to give the appearance of active development.
js2 2 hours ago [-]
> Everyone developing in Python for a long time has felt it a bit too often when breaking changes (even between minor version updates) once again ruins the day
No, not everyone. I've been using Python as my primary language since 2000 (that's 1.5.2 days). It has been the least troublesome language that I work with, and I work with (or have worked with) a bunch (shell, perl, python, ruby, lua, tcl, c, objective-c, swift, java, javascript, groovy, go and probably others I'm forgetting).
Even all the complaints about the Python packaging ecosystem over the years... I just don't get it. Like, have you ever tried working with CPAN or Maven or Gradle or even, FFS, Ruby Gems/bundler? The Python virtual environment concept is easy to understand and pip mostly does its job just fine, and these days, uv makes all that even faster and easier.
Anywho, just dropping a contrarian comment here because maybe I'm part of the generally silent majority that is just able to use Python day in and day out to get their job done.
kragen 12 hours ago [-]
No, it was an unrelated scandal. I don't have my bookmarks handy at the moment, so hopefully you can find a link.
As for prioritizing, I think the right choice is to deprioritize Python.
sitkack 1 hours ago [-]
Python is for everyone, not just the PSF Cabal. Like the Democratic party, there is a huge need for new leadership. We have all seen what a little brigading can do.
jiggawatts 12 hours ago [-]
I don't even use Python, and even I've read Tim Peters' works and think highly of him! To have him so unceremoniously booted for upsetting a committee is absolutely insane.
This is a bit like Apple firing Steve Jobs for wearing sneakers to work because it violates some dress code.
hot_gril 13 hours ago [-]
Idk about the internal affairs, I just really don't like Python for web backend kind of things. It's taking them way too long to sort out parallelism and packaging, while NodeJS got both right from the start and gracefully upgraded (no 2->3 mess).
Also I used Python way before JS, and I still like JS's syntax better. Especially not using whitespace for scope, which makes even less sense in a scripting language since it's hard to type that into a REPL.
eurleif 10 hours ago [-]
Node.js actually had no parallelism at the start, other than the ability to manually spawn new processes. Worker threads were only added in 2018 with v10.5.0, and only stabilized in 2019 with v12.
What Node.js had from the start was concurrency via asynchronous IO. And before Node.js was around to be JavaScript's async IO framework, there was a robust async IO framework for Python called Twisted. Node.js was influenced by Twisted[0], and this is particularly evident in the design of its Promise abstraction (which you had to use directly when it was first released, because JavaScript didn't add the async/await keywords until years later).
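For anyone who hasn't seen Twisted, its Deferred is recognizably a proto-Promise; a minimal sketch:

    from twisted.internet import defer

    d = defer.Deferred()
    d.addCallback(lambda n: n + 1)     # like promise.then(...)
    d.addErrback(lambda failure: 0)    # like promise.catch(...)
    d.callback(41)                     # like resolving the promise with 41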
Yeah, generally I feel like the indentation sensitivity was the right idea (the alternative evidently being worse compiler error messages, bugs like the `goto fail` vulnerability, and greater verbosity) but it causes real difficulties with the REPL, as well as with shell one-liners.
Jupyter fixes the REPL problem, and it's a major advance in REPLs in a number of other ways, but it has real problems of its own.
stefan_ 5 hours ago [-]
It seems Python has firmly reached its "Wikipedia notability" era, with busybodies that code little but discuss much dominating and ruining all progress. They make up stuff that reads insane to anyone doing actual work, like the "maintenance burden" of the cgi module:
Turns out most of the maintenance this thing received was the various attempts to remove it.
mjw1007 11 hours ago [-]
The Python maintainers are removing the module _named_ cgi, but they're not removing the support for implementing CGI scripts, which is CGIHTTPRequestHandler in the http.server module.
All that was in the cgi module was a few functions for parsing HTML form data.
kragen 11 hours ago [-]
It would be very difficult indeed to make it impossible to implement CGI scripts in Python; you'd have to remove its ability to either read environment variables or perform stdio, crippling it for many other purposes, so I didn't think they had done that. Even if they removed the whole http package, you could just copy its contents into your codebase. It's not about making Python less powerful.
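For example, here's a complete CGI script that needs nothing beyond os.environ and stdout (a minimal sketch, not using the cgi module at all):

    #!/usr/bin/env python3
    import os
    from urllib.parse import parse_qs

    params = parse_qs(os.environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]

    print("Content-Type: text/plain")
    print()                            # blank line ends the CGI headers
    print(f"hello, {name}")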
As a side note, though, CGIHTTPRequestHandler is for launching CGI programs (perhaps written in Rust) from a Python web server, not for writing CGI programs in Python, which is what the cgi module is for. And CGIHTTPRequestHandler is slated for removal in Python 3.15.
The problem is gratuitous changes that break existing code, so you have to debug your code base and fix the new problems introduced by each new Python release. It's usually fairly straightforward and quick, but it means you can't ship the code to someone who has Python installed but doesn't know it (they're dependent on you for continued fixes), and you can't count on being able to run code you wrote yourself on an earlier Python version without a half-hour interruption to fix it. Which may break it on the older Python version.
mjw1007 9 hours ago [-]
My mistake.
The support for writing CGI programs in Python is in wsgiref.handlers.CGIHandler .
andrewflnr 3 hours ago [-]
I can't wait to live in the world where openly admitting your mistakes is considered evidence of disingenuousness.
With the rise of Docker, I assume the current generation of programmers just assumes you're going to containerise your solutions. How many projects are you actively upgrading Python for?
oblio 7 hours ago [-]
On the other hand, you can't carry old stuff to infinity.
sitkack 60 minutes ago [-]
You can't carry anything to infinity, by definition. Please argue constructively.
girvo 6 hours ago [-]
Why not? Was it broken? If it was, is it easily fixable?
rvnx 6 hours ago [-]
It takes time, and this means that instead of working on something else, their time is locked on this.
LtWorf 4 hours ago [-]
The CGI standard hasn't changed… what changes did the module need?
oblio 6 hours ago [-]
Well, for one, security concerns, especially for an internet oriented component.
Secondly, you have to find a reliable maintainer or several.
A lot of people want stuff to be maintained indefinitely for them by unspecified "others".
rvnx 6 hours ago [-]
You don't have to find a maintainer.
Not updating the system is usually a solution to such problems.
At best there is an nginx or an API in front that acts as a reverse proxy to clean up/normalize the incoming requests and prevent directly exposing the service.
Example: banks, airlines, hospitals, air traffic controllers, electricity companies, etc
All critical services that nobody wants to touch, as it works +/-
oblio 3 hours ago [-]
Guess what, all those places can just use Python 3.12 for as long as it's maintained and if they REALLY can't update, they can:
a) make the system air gapped
b) pay a Python consulting company to back port security fixes
c) hire a Python core dev to do the system, directly
OOOOR, they can just update to Python 3.13 and migrate to the equivalent Python package that's not part of the core. For sure they already use other Python packages.
We're making a mountain out of a molehill, also on behalf of places that have plenty of money to spend if push comes to shove.
Levitating 2 hours ago [-]
> If your CGI script takes a generous 400 milliseconds of CPU to start up
then that endpoint will have at least 400ms response times, not great
WD-42 14 hours ago [-]
I don't get it. Having a complaint about Python removing CGI from the stdlib is well and fine. But then you say you'd rather consider JS, which doesn't even have a std lib? Lua doesn't have a CGI module in stdlib either.
sitkack 58 minutes ago [-]
The argument isn't about who has the standard lib; what I think Kragen is saying is that the Python leadership has no qualms about removing functionality that people rely on and making up lame reasons to do so.
kragen 13 hours ago [-]
I think it's fine to not have functionality in the standard library if it can be implemented by including some code in my project. It's not fine to have a standard library that stuff disappears from over time.
getdoneist 11 hours ago [-]
Ruby has been removing stuff from stdlib for some time now. But "moving" is the correct word, because it is simply moved to a stand-alone gem, and with the packaging situation in Ruby being so good, it feels completely seamless.
spockz 12 hours ago [-]
Whenever code is removed from the Java standard library it is announced ages ahead of time and then typically it becomes available in a separate artefact so you can still use it if you depended on it.
kragen 12 hours ago [-]
I wrote an online securities trading system in Java, with a little Jython. Java is reasonably good at stability, but I find it unappealing for other reasons, especially for prototyping. Kotlin might be okay.
Jython no longer works with basically any current Python libraries because it never made the leap to Python 3, and the Python community stigmatizes maintaining Python 2 compatibility in your libraries. This basically killed Jython, and from my point of view, Jython was one of the best things about Java.
mike_hearn 8 hours ago [-]
Note that Jython was replaced by GraalPy, which does target Python 3
kragen 4 hours ago [-]
I hadn't heard about GraalPy! Thanks for the note.
WD-42 6 hours ago [-]
It is fine though. CGI for python is one pip install away, as it is for the other languages you listed.
Most rational people are ok with code being removed that 99.99% of users have absolutely no use for, especially if it is unmaintained, a burden, or potentially contains security issues. If you are serious about cgi you’ll probably be looking at 3rd party modules anyway.
LtWorf 2 hours ago [-]
I don't think it's reasonable to expect that most python installs use pip
WD-42 42 minutes ago [-]
You must be living in a different reality.
LtWorf 24 minutes ago [-]
Yes mine. At work I don't use pip. We have thousands of servers all using python and none using pip or modules obtained via pip before imaging.
Personally… I don't use pip. Why? apt is there.
simonw 2 hours ago [-]
Why not?
cb321 1 hours ago [-]
I'm not sure what @LtWorf means, exactly, but one reason I can think of is that on Linux (Gentoo & Debian at least), the system package managers are not putting pip in place by default with python itself, the way they used to. The rationale, I believe, is to steer users towards using the system package manager or only doing ~/.local style things.
Because it's much easier to use apt and install the module system-wide, and then I don't need to mess around with venv, requirements.txt, and there's someone that will fix any CVEs or malicious backdoors that get found out, automatically. Unlike what happens with venv.
rollcat 10 hours ago [-]
> Lua doesn't have a CGI module in stdlib either.
Lua barely has any stdlib to speak of, most notably in terms of OS interfaces. I'm not even talking about chmod or sockets; there's no setenv or readdir.
You have to install C modules for any of that, which kinda kills it for having a simple language for CGI or scripting.
Don't get me wrong, I love Lua, but you won't get far without scaffolding.
kragen 10 hours ago [-]
Right, you need something more specific than Lua to actually write most complete programs in. The LuaJIT REPL does provide a C FFI by default, for example, so you don't need to install C modules.
But my concern is mostly not about needing to bring my own batteries; it's about instability of interfaces resulting from evaporating batteries.
rollcat 9 hours ago [-]
Honestly, I'm not worried about the batteries. Thanks to FFI, you can just talk to libc, and vendor any native Lua code you need. (That's my approach for LÖVE.)
LuaJIT, release-wise, has been stuck in a very weird spot for a long time, before officially announcing it's now a "rolling release" - which was making a lot of package maintainers anxious about shipping newer versions.
It also seems like it's going to be forever stuck on the 5.1 revision of the language, while continuing to pick a few cherries from 5.2 and 5.3. It's nice to have a "boring" language, but most distros (certainly Alpine, Debian, NixOS) just ship each release branch between 5.1 and 5.4 anyway. No "whatever was master 3 years ago" lottery.
bravesoul2 13 hours ago [-]
Node.js provides the de facto standard lib for JS backends, and it's got a good feature set.
That said these days I'd rather use Go.
kragen 13 hours ago [-]
Golang seems pretty comfortable from the stuff I've done in it, but it's not as oriented toward prototyping. It's more oriented toward writing code that's long-term maintainable even if that makes it more verbose, which is bad for throwaway code. And it's not clear how you'd use Golang to do the kind of open-ended exploration you can do in a Jupyter notebook, for example. How would you load new code into a Golang program that's already running?
Admittedly Python is not great at this either (reload has interacted buggily with isinstance since the beginning), but it does attempt it.
bravesoul2 12 hours ago [-]
There are live reload utilities for Go but yeah it comes down to recompiling and restarting - which is quick but any state is gone.
I agree it's not a rapid-prototyping kind of language. AI assistance can help though.
oblio 7 hours ago [-]
Can you load new code into a Node.js program that's running?
rvnx 6 hours ago [-]
`eval()` to run new code and `delete require.cache` (to reload a module)
oblio 3 hours ago [-]
Hmmm... Can you eval entire files?
kragen 1 hours ago [-]
Yes.
pjmlp 11 hours ago [-]
I'd rather stick with PHP or JS, due to having a JIT in the box for such cases.
Since I learnt Python starting in version 1.6, it has mostly been for OS scripting stuff.
Too many hard learnt lessons with using Tcl in Apache and IIS modules, continuously rewriting modules in C, back in 1999 - 2003.
fulafel 3 hours ago [-]
Python got a JIT recently (an experimental / off by default feature in 3.13).
pjmlp 2 hours ago [-]
Finally, though it's still far away from the competition.
Also, let's see the impact of Microsoft's Python team layoffs on it, given that CPython developers only started caring about performance due to Facebook and Microsoft; so far the JITs in Python have been largely ignored by the community.
int0x29 10 hours ago [-]
I don't think the JIT will help that much as each request will need to be JITed again. Unless Node and PHP are caching JIT output
ChocolateGod 10 hours ago [-]
Yes, both do.
int0x29 2 hours ago [-]
On the file system? CGI is a whole new process per request
edit: Looks like yes for Node JS. I can't tell for PHP, as I keep getting results for opcache, which is different and in-memory.
vanviegen 35 minutes ago [-]
Not sure what the current situation is, but PHP used to ship with APC, a shared memory cache, that enables this exactly. Even without filesystem overhead!
I still miss PHP's simple deployment, execution and parallelization model, in these over-engineered asyncy JavaScripty days.
pjmlp 2 hours ago [-]
My point wasn't about using them with CGI, rather that they have a JIT in the box: Node.js with V8, PHP with Zend and its own JIT since Facebook's efforts with HipHop and later Hack (HHVM), both with server support as well, so no need for the CGI approach.
ksec 6 hours ago [-]
Zen 6c is about to get 256 cores per socket, 512 vCPUs/threads, or 1024 vCPUs in a dual-socket system. That is 2,560 requests per second (or page views), and this doesn't even include caching.
If I remember correctly that is about half of what StackExchange served on daily average over 8 servers. I am sure using Go or Crystal would scale this at least 10x if not 20x.
The problem I see is that memory cost isn't dropping, which means somewhere along the graph the memory cost per process will outweigh whatever advantage this has.
Still, sounds like a fun thing to do. At least for those of us who lived through CGI-Bin and Perl era.
diordiderot 1 hours ago [-]
People use crystal?
bravesoul2 13 hours ago [-]
Yeah, high-performance web used to be an art. Now it's: find the stupidly wasteful thing you did to ship fast, and stop doing that thing.
Your app could add almost no latency beyond storage if you try.
mathiaspoint 7 hours ago [-]
The scary thing about CGI for me is more the shell insanity than the forking. Although I think after the shell shock RCE most servers probably switched to directly execing the CGI process.
gsliepen 7 hours ago [-]
Indeed. There is no reason why CGI would need shells or scripting languages though, you can just write them in any programming language. It's not that hard; I wrote this pastebin clone in C: https://github.com/gsliepen/cbin/
mathiaspoint 7 hours ago [-]
It's not an issue with the actual CGI program. It's hard to make exec alone work the way people expect without doing something like exec('sh', '-c',...) so a lot of servers were doing that.
chasil 5 hours ago [-]
The problem child for classic CGI is Windows, where process creation is 100x slower than on any POSIX implementation.
You can measure this easily, get a copy of Windows busybox and write a shell script that forks off a process a few thousand times. The performance difference is stark.
kragen 4 hours ago [-]
That's interesting. I hadn't thought about that. Still, fork() (plus _exit() and wait()) takes 0.7ms in Termux on my phone for small processes, as measured by http://canonical.org/~kragen/sw/dev3/forkovh.c.
Are you really saying that it takes 70ms on Microsoft Windows? I don't have an installed copy here to test.
Even if it does, that would still be about 15% of the time required for `python3 -m cgi`, so it seems unlikely to be an overriding concern for CGI programs written in Python, at least on manycore servers serving less than tens of millions of hits per day. Or does it also fail to scale across cores?
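A rough Python analogue of that C measurement, if you want to try your own machine (POSIX only; a sketch, not a careful benchmark):

    import os, time

    N = 1000
    t0 = time.perf_counter()
    for _ in range(N):
        pid = os.fork()
        if pid == 0:
            os._exit(0)                # child exits immediately
        os.waitpid(pid, 0)             # parent reaps it
    dt = time.perf_counter() - t0
    print(f"{dt / N * 1000:.3f} ms per fork/exit/wait")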
layer8 4 hours ago [-]
That’s more an argument against using Windows for a web server than against using CGI.
kqr 13 hours ago [-]
Consider Perl. It's not quite as batteries-included as Python, but it is preinstalled almost everywhere and certainly more stable than JS and Lua. (And Python.)
At the time Perl was the thing I used in the way I use Python now. I spent a couple of years after that working on a mod_perl codebase using an in-house ORM. I still occasionally reach for Perl for shell one-liners. So, it's not that I haven't considered it.
Lua is in a sense absolutely stable unless your C compiler changes under it, because projects just bundle whatever version of Lua they use. That's because new versions of Lua don't attempt backwards compatibility at all. But there isn't the kind of public shaming problem that the Python community has where people criticize you for using an old version.
JS is mostly very good at backwards compatibility, retaining compatibility with even very bad ideas like dynamically-typed `with` statements. I don't know if that will continue; browser vendors also seem to think that backwards compatibility with boring technology like FTP is harmful.
cb321 3 hours ago [-]
That article is a pretty good overview of the state of things at the time.
Only one benchmark on one system, but over in day before yesterday's HN thread on this (https://news.ycombinator.com/item?id=44464272), I report a rather significant slowdown in Perl start up overhead: https://news.ycombinator.com/item?id=44467268 . Of course, at least for me, Python3 is worse than Python2 by an even larger factor and Python2 worse than Perl today by an even larger factor.
FWIW, in Nim, you can get a CGI that probably runs faster than the Go of this article with simply:
    import std/cgi                    # By default both ..
    for (key, val) in decodeData():   # .. $QUERY_STRING & POST
      if key == "something":
        do_something(val)
I don't know of a `cgitb` equivalent even in the Nimbleverse. Some of the many web frameworks in Nim like jester seem to have that kind of thing built into them, though I realize a framework is not the same as CGI (and that's one of the charms of CGI).
kqr 13 hours ago [-]
Ha, fun bit of history! Many of the listed problems with Perl can be configured away these days. I don't have time for a full list, but as two early examples:
- `perl -de 0` provides a REPL. With a readline wrapper, it gives you history and command editing. (I use comint-mode for this, but there are other alternatives.)
- syscalls can automatically raise exceptions if you `use autodie`.
Why is this not the default? Because Perl maintainers value backward compatibility. Improvements will always sit behind a line of config, preventing your scripts from breaking if you accidentally rely on functionality that later turns out to be a mistake.
That's a great read, thanks! And I didn't know about autodie, though I do use perl -de1 from time to time.
Perl feels clumsy and bug-prone to me these days. I do miss things like autovivification from time to time, but it's definitely bug-prone, and there are a lot of DWIM features in Perl that usually do the wrong thing, and then I waste time debugging a bug that would have been automatically detected in Python. If the default Python traceback doesn't make the problem obvious, I use cgitb.enable(format='text') to get a verbose stack dump, which does. cgitb is being removed from the Python standard library, though, because the maintainers don't know it can do that.
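That non-CGI use of cgitb looks like this (still in the stdlib through 3.12):

    import cgitb
    cgitb.enable(format="text")        # plain-text dump instead of HTML

    def buggy(x):
        return 1 / x

    buggy(0)   # traceback now includes each frame's locals, e.g. x = 0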
Three years ago, a friend told me that a Perl CGI script I wrote last millennium was broken: http://canonical.org/~kragen/sw/rfc-index.cgi. I hadn't looked at the code in, I think, 20 years. I forget what the problem was, but in half an hour I fixed it and updated its parser to be able to use the updated format IETF uses for its source file. I was surprised that it was so easy, because I was worse at writing maintainable code then.
Maybe we could do a better job of designing a prototyping language today than Larry did in 01994, though? We have an additional 31 years of experience with Perl, Python, JS, Lua, Java, C#, R, Excel, Haskell, OCaml, TensorFlow, Tcl, Groovy, and HTML to draw lessons from.
kqr 11 hours ago [-]
We can definitely do better than Perl. The easy proof is that modern Perl projects are supposed to start with a bunch of config to make Perl more sane, and many of them also include the same third-party libraries that e.g. improve exception handling and tweak the datetime functionality in the standard library.
One benefit Perl had that I think not many of the other languages do was being designed by a linguist. That makes it different -- hard to understand at first glance -- but also unusually suitable for prototyping.
kragen 10 hours ago [-]
What do you think a better design would look like?
rollcat 10 hours ago [-]
What's the landscape like, for when you need to scale your project up? As in: your project needs more structure, third-party integrations, atomic deployments, etc - not necessarily more performance.
Python has Werkzeug, Flask, or at the heavier end Django. With Werkzeug, you can translate your CGI business logic one small step at a time - it's pretty close to speaking raw HTTP, but has optional components like a router or debugger.
tonyhart7 12 hours ago [-]
Yeah, but 400ms is unacceptable these days
kragen 12 hours ago [-]
That's intended as an unreasonably high upper bound. On my cellphone, in Termux, python3 -m cgi takes 430–480ms. On my laptop it takes 90–150ms. On your server it probably takes less.
I agree that tens of milliseconds of latency is significant to the user experience, but it's not always the single most important consideration. My ping time to news.ycombinator.com is 162–164ms because I'm in Argentina, and I do unfortunately regularly have the experience of web page loads taking 10 seconds or more because of client-side JS.
cenamus 12 hours ago [-]
If the whole site takes 5 seconds to fully hydrate and load its 20megs of JS I'll gladly take a server side rendered page that has finished loading in a second.
IshKebab 6 hours ago [-]
I would rather take a server side rendered page that finishes loading in 100ms than 400ms.
ksec 6 hours ago [-]
On a site like Total Real Returns [1], running on Crystal, response time could be sub-10ms, but you are fundamentally limited by the latency between you and the server, which could be 150ms if you visit a US server from the Valeriepieris circle, where 50% of the world's population lives.
Average website loading time nowadays is measured in seconds rather than milliseconds...
rvnx 6 hours ago [-]
Using OP's solution, the JS-loaded website takes 5.4s instead of 5s: an 8% slowdown that the users have to pay, and that will increase server costs.
lucb1e 2 hours ago [-]
> If your CGI script takes a generous 400 milliseconds of CPU to start up and your server has 64 cores, you can serve 160 requests per second
...and then you're wasting a 64-core server at 100% CPU load on just starting up and tearing down script instances, not yet doing any useful work. This is doing 160 startups per second, not requests per second
It would be a pretty big program to require 400ms on a fast CPU, though; but the Python interpreter is big, and if you have one slow import you're probably already not far off.
refulgentis 2 hours ago [-]
Surprised this is the top comment 12 hours in, should be intuitive to HN that 160 req/s on 64 cores is...not great!...and that's before the errors it takes to get that # (ex. all we're doing is starting the per-request executable)
nurettin 12 hours ago [-]
<PHP> I would like to have a word.
lucb1e 2 hours ago [-]
The PHP interpreter (+mods) don't take anywhere near 400ms to start up, even on very old CPUs, though? Not sure what you mean
genghisjahn 5 hours ago [-]
I recently ran a test where I had a $350 mini server using a golang binary, rabbitmq, redis and MySQL (all hosted on the same mini server) handle 5000 reqs/s sustained. That translates to over 400 million reqs in a 24-hour day.
I’m amazed at how great the tools are these days that are free and yet we pay so much to cloud providers. I know it’s not an apples to apples comparison but it was so great to develop all that and fine tune it on a box in my basement.
stavros 4 hours ago [-]
Yep, and I hear everyone deploying some Kubernetes microservices monstrosity that slows development down tenfold because they don't know that servers can do more than 1 req/sec.
It's crazy to me that we keep paying all these overheads for no reason other than "it's what Google does". I really need to write my article about my modular monolith architecture up, it's worked really well for us.
papichulo2023 4 hours ago [-]
I kinda blame the mentality that the only way to use containers is with K8s. Docker Compose and basic scripts can take you wayyy far.
delduca 2 hours ago [-]
I agree — Docker Compose is more than enough in most cases. At work, the only reason we migrated to Kubernetes was to enable zero-downtime deployments (blue/green).
That said, to be honest, I don’t find Kubernetes complex — but that might be because I’ve been using it for quite some time.
papichulo2023 1 hours ago [-]
The problem is not using it but maintaining it: keeping a bare-metal k8s cluster updated takes a lot of effort, which is why most people recommend using a managed one. It is like the entry pill to cloud lock-in.
beala 4 hours ago [-]
I've thought about hosting my side projects in my basement, but then I'd be susceptible to power outages and ISP downtime. If a drive fails while I'm traveling I'm SOL. If I accidentally lock myself out, there's no serial terminal to fall back on. I could go the route of those folks on r/homelab, but then it's no longer clear I'm saving money, especially if you factor in my time. My conclusion is that cloud providers are actually a great deal and benefit from huge economies of scale.
lucb1e 2 hours ago [-]
> If a drive fails while I'm traveling I'm SOL [...] My conclusion is that cloud providers are actually a great deal
You realise you can always still restore that backup onto someone else's server, since you need to restore from backup either way? I don't really see why one would pre-emptively pay for it.
> If I accidentally lock myself out, there's no serial terminal to fall back on.
Why not? That sounds like a choice you can make. The hardware I hosted on either had a KVM built in, or I could just attach a USB keyboard and a VGA (or nowadays HDMI) display.
Power outages aren't common in my area, and otherwise a UPS is not that expensive (compared to if you pay a third party to set up redundant power for your hobby system)
You can choose to pre-emptively pay the cloud premium and give them access to your server so you can also social engineer yourself back in via customer support (after all, if you aren't expecting to lose the password and thus don't need to convince a human to let you into your hosting account, then you could also hold onto your own server's password). It just all seems very opposed from the self-hosting spirit where you're self-reliant, which apparently you value since you were considering whether to self host?
beala 16 minutes ago [-]
> You realise you can always still restore that backup onto someone else's server? When you need to restore from backup either way. I don't really see why one would pre-emptively pay for it
Huh? If a drive goes out in my home server, the entire thing is offline until amazon delivers a new one (don't tell me I now need to keep a stockpile of spares). If a drive goes out in S3, I never know about it because AWS takes care of it. You don't understand why someone would want to "preemptively" pay for that?
> It just all seems very opposed from the self-hosting spirit where you're self-reliant, which apparently you value since you were considering whether to self host?
My self hosting is not ideological at all. I couldn't care less about "self reliance." The reason I considered hosting locally was to save money, and I concluded I was actually getting a lot of value for the money I was spending on AWS.
genghisjahn 3 hours ago [-]
I've thought through this as well. I'm not saying your conclusion is wrong, but if I get to where I really need backups for all this, there are solutions available. And I've had to learn a lot more about logging and alerting to make sure my services stay up. But that time spent is repaid many times over with solutions that I understand. There is a time trade-off, but it's usually a few hours of me learning "oh, this is how you solve this" and then I don't worry about it anymore.
In my experience, cloud at scale has ALWAYS required someone with a pager willing/paid to get up at 3am on Christmas Eve. So someone’s time is being used no matter.
varispeed 4 hours ago [-]
You don't have to pay cloud providers. You can rent a dedicated server from a hosting provider and go to town with it (very much limited by your bandwidth or transfer limit).
Cloud providers are used because there is a lot of vested interest - e.g. VC and investors also having shares in cloud companies, then there is fear their investment might not survive imaginary surge of traffic that will never happen in reality. Cloud sales people are masters of playing at investors insecurities.
x_may 3 hours ago [-]
I think it's also largely driven by the apparent cheapness of turning the CapEx of server buying into the OpEx of cloud renting. There's less up-front investment, and auditing/access controls for SOC 2 compliance are so much easier.
lucb1e 2 hours ago [-]
> and yet we pay so much to cloud providers
Not all of us we don't
jgalt212 4 hours ago [-]
Indeed. When I look at our VM spend, it's high because we need servers with a lot of local disk--not a lot of compute. For some reason, you cannot get one without the other.
Can you imagine the size of the business / service you could run with 4 attached 20TB drives, and a modest CPU? Good luck getting such from a cloud provider.
zepolen 4 hours ago [-]
You can get 4x20TB + a decent CPU for under $200/month.
rvnx 4 hours ago [-]
Where is this?
the8472 3 hours ago [-]
hetzner SX65
rvnx 3 hours ago [-]
thank you
reidrac 8 hours ago [-]
If the cgi bin needs DB access, every time the process starts it needs to open a connection. Having the code in memory, for example using fastcgi, is not only to avoid the startup time penalty; you can also have a DB connection pool or at least a persistent DB connection per thread.
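A sketch of the difference, assuming the third-party flup (FastCGI) and psycopg2 packages: the pool is created once per worker process and outlives individual requests, instead of reconnecting on every hit the way a CGI script must (the DSN is hypothetical):

    from flup.server.fcgi import WSGIServer
    from psycopg2.pool import SimpleConnectionPool

    # Created once, at process start -- not once per request.
    pool = SimpleConnectionPool(1, 5, dsn="dbname=app")  # hypothetical DSN

    def app(environ, start_response):
        conn = pool.getconn()          # reused connection: no TCP/auth handshake
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT 1")
                row = cur.fetchone()
        finally:
            pool.putconn(conn)
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [f"{row[0]}\n".encode()]

    WSGIServer(app).run()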
procaryote 5 hours ago [-]
Do it at scale and your database will be sad about the number of connections
At least that was the case when I did the "python is single threaded, let's run many of them" + "python is slow, let's run many of them" dance
At scale you end up using shared connection pools outside of python (like pgbouncer) and a lot of tuning to make it serve the load while not killing the database
Of course, then we reimplemented in a multithreaded somewhat performant language and it became dead simple again
Twirrim 5 hours ago [-]
There were standard ways to handle that, such as hosting a separate daemon that acts effectively as your proxy. Using Unix sockets instead of TCP/IP makes connecting to it relatively cheap.
pjmlp 4 hours ago [-]
That is why CGI eventually evolved into a model that would keep some of the stuff around between requests.
Iwan-Zotow 4 hours ago [-]
Use udp
Tractor8626 13 hours ago [-]
2400 rps on this hardware for a hello-world application - isn't that kinda bad?
And what are we trading performance for, exactly? The code certainly didn't become any simpler.
kqr 13 hours ago [-]
It's not great, but it is enough for many use cases. Should even handle a HN hug of death.
procaryote 5 hours ago [-]
The margin for error becomes tiny though... A performance regression making some requests slow? Suddenly you don't handle even those 2k requests. And a denial-of-service attack doesn't even have to try hard.
kqr 3 hours ago [-]
I'm not convinced the run-time performance always comes from the same pool as start-up overhead. If the regression is waiting-based, because of resource contention or whatnot (very common), then it won't make the OS start up new processes more slowly, for example.
Sure, there are regressions that will make start-up overhead worse, but I mean, there will be pathological regressions in any configuration.
Tractor8626 12 hours ago [-]
But why? What advantages are we getting?
kqr 11 hours ago [-]
Hypothetically, strong modularisation, ease of deployment and maintenance, testability, compatibility with virtually any programming language.
In practice I'm not convinced -- but I would love to be. Reverse proxying a library-specific server or fiddling with FastCGI and alternatives always feels unnecessarily difficult to me.
slyall 13 hours ago [-]
It's only bad if you need to get more than 2000 rps
Which is only a small proportion of sites out there.
YmiYugy 7 hours ago [-]
Yes, but it's running on pretty powerful hardware. Try this with 1 vCPU, 512MB RAM, and a website that makes a lot of requests for a single page visit.
Until recently I used to maintain some legacy B2B software, where each customer got their own container with very strict resource limits. A single page visit could cause 20-50 requests. Removing CGI was a significant performance win, even with a single user loading just one page.
Tractor8626 12 hours ago [-]
If there are no other advantages - it is just bad.
gred 11 hours ago [-]
I'd rather not pay for 8 cores / 16 threads, though...
withinboredom 11 hours ago [-]
Depends on where you are shopping. I pay €211 every month for 96 threads and 384 GB of RAM (clustered) -- disks are small (around 1 TB each), but I'm still nowhere near 50% utilization there.
gred 11 hours ago [-]
Yeah, I pay $400/month to not be bothered with any installation or upgrade drama, ever. The problem is that I only get 16 threads.
withinboredom 9 hours ago [-]
Drama is pretty much non-existent; but when it happens, it can be a day or two where things are in a not-great state. Backups help a lot here, to easily get into a known working state -- also practicing restoring from those backups is a good exercise, so I don't mind it too much. There's nothing like learning your backup was missing some component or something, especially when the risks aren't high.
I think the worst drama ever was a partial disk failure. Things kinda hobbled along for awhile before things actually started failing, and at that point things were getting corrupted. That poofed a weekend out of my life. Now I have better monitoring and alerting.
gred 5 hours ago [-]
Cool, I could see doing this for some projects. Thanks for going into detail a bit!
johnisgood 10 hours ago [-]
I have installed Arch Linux on many servers; they have been online for decades. There is no fuss. The only downtime was when I issued a reboot, but it was back in 5 seconds if not less.
So you really do not have to be bothered by installation or anything along those lines. You install once and you are fine. You should check out the Arch Linux wiki pages, for example. It is pretty straightforward. As for upgrades, Arch Linux NEVER broke. Not on my servers, and not on my desktop.
That said, to each their own.
indigodaddy 8 hours ago [-]
Arch on servers is completely insane, unless you never update or are a maniac and update every few days
johnisgood 6 hours ago [-]
Why would you think that it is insane? It works well, and had no issues for decades. Any personal experiences you have that suggest otherwise? I would love to hear.
I will give you the benefit of the doubt that you are not regurgitating what other people have been saying (IMO wrongfully), which is: "Arch Linux for servers? Eww. Bleeding edge. Not suitable for servers.". All that said, please, do share. It will not negate those decades of no issues, however.
As I said, I maintain quite a lot of Arch Linux servers with loads of services without any issues, for decades.
indigodaddy 4 hours ago [-]
I used Arch for a few years on desktop (granted that was over ten years ago), and if I didn't update frequently enough, updates would routinely break. I would never use it on a server because of that. RedHat and Debian exist for a reason.
oblio 7 hours ago [-]
Software is so advanced these days that tech SMBs can probably run Windows XP in production.
hagbard_c 8 hours ago [-]
I paid about that amount once for an ex-lease server which has been in use since 2017. A DL 380 G7 (24 threads, 128 GB) giving me all the freedom I want. A large solar array on a barn roof gives us negative energy bills so power use is a non-issue. If you have the space for the hardware and the possibility to offset power use using solar or just dirt-cheap electricity this might be a solution for you as well. There's plenty of off-lease hardware on the market which can run for many years without problems - in the intervening 8 years I have replaced one power supply (€20), that's it.
masklinn 11 hours ago [-]
> It's only bad if you need to get more than 2000 rps
Or if you don't want to pay for an 8/16 for the sort of throughput you can get on a VPS with half a core.
kqr 13 hours ago [-]
I'd argue it's bad even if you get more than 1000 Bq of requests. You never want to approach 100 % utilisation, and I'd aim to stay clear of 50 %.
We're still serving a cgi-bin directory at work for the occasional quick and dirty internal web app. The ergonomics are great as long as you keep it simple. The fact that it's cgi doesn't mean you have to print http/1.0 to stdout manually. For example, in python the builtin wsgiref.handlers.CGIHandler lets you run any wsgi app as a cgi script:
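    #!/webapps/cgi-bin/venv/bin/python3
    # a minimal sketch; any WSGI app (Flask, Django, ...) can sit where `app` is
    from wsgiref.handlers import CGIHandler

    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"hello from WSGI-over-CGI\n"]

    CGIHandler().run(app)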
The way we run the scripts is with uwsgi and its cgi plugin[1]. I find it simpler and more flexible than running apache or lighttpd just for mod_cgi. Since uwsgi runs as a systemd unit, we also have all of systemd's hardening and sandboxing capabilities at our disposal. Something very convenient in uwsgi's cgi handling that's missing from mod_cgi, is the ability to set the interpreter for a given file type:
cgi = /cgi-bin=/webapps/cgi-bin/src
cgi-allowed-ext = .py
cgi-helper = .py=/webapps/cgi-bin/venv/bin/python3 # all dependencies go here
Time to first byte is 250-350ms, which is acceptable for our use case.
I've also thought about this more as part of a workflow for quickly prototyping stuff. At least for a lot of the modern JIT languages, I believe their startup times will be dominated by your imports unless you go with a FastCGI model. This came up as I started adopting the h2o web server for local scripts, since it has clean and quick-to-write config files with mruby and FastCGI handlers and is also crazy fast: https://h2o.examp1e.net/configure/fastcgi_directives.html
Another place this can be useful is for allowing customers to extend a local software with their own custom code. So instead of having to use say MCP to extend your AI tool they can just implement a certain request structure via CGI.
dolmen 14 hours ago [-]
An MCP frontend to CGI programs would not be a bad idea for an end-user environment.
This makes me wonder if an MCP service couldn't also be implemented as CGI: an MCP framework might expose its features as a program that supports both execution modes. I have to dig into the specs.
zokier 9 hours ago [-]
Fastcgi kinda loses all the benefits of cgi though.
jarofgreen 12 hours ago [-]
Had a similar chat with someone recently, after I used Apache for a side project in part because of its .htaccess feature.
One big reason to avoid them was performance; it required extra disk access on every request and it was always better to put the configuration in the main config file if possible.
But now? When most servers have an SSD and probably spare RAM that Linux will use to cache the file system?
Ok, performance is still slightly worse, as Apache has to parse the config on every request as opposed to once, but again, now that most servers have more powerful CPUs? In many use cases you can live with that.
> I'm not a real programmer. I throw together things until it works then I move on. The real programmers will say "Yeah it works but you're leaking memory everywhere. Perhaps we should fix that." I’ll just restart Apache every 10 requests.
PHP got a very long way since then, but a huge part of that was correcting the early mistakes.
> PHP 8 is significantly better because it contains a lot less of my code.
jarofgreen 9 hours ago [-]
I'm not sure in which spirit you mean that, so I'm going to choose "approvingly" :-)
I do have thoughts for later about modes which could take all the config from .htaccess files and build them into the main config so then you avoid any performance issues - however you have to do that carefully to make sure people don't include any bad config that crashes the whole server. One of the nice things about using .htaccess files as intended is Apache has the Nonfatal flag on AllowOverride so you can avoid that. https://httpd.apache.org/docs/2.4/mod/core.html#allowoverrid...
rollcat 9 hours ago [-]
I mean honestly - the "classic" Apache model of throwing things into the www root is very strong for rapid development. Hot code reloading is sometimes finicky, you can end up with unexpected hidden state and lose sanity over a stupid heisenbug. Trust me.
IMO you don't need to compensate for bad configs if you're using a proper staging environment and push-button deployments (which is good practice regardless of your development model). In prod, you can offset its main issue (atomic deployments) by swapping a symlink. In that scenario, having a separate .htaccess file actually helps - you don't want to restart Apache if you can avoid it, and again - hot reloading can hide state.
My main issue is that this is all a very different model from what most languages, frameworks, and runtimes have been doing for almost 20 years now. If you're a sysop dealing with heterogenous environments, it's honestly just annoying to have separate tooling and processes.
Personally, ca 10 years ago, this was the tipping point at which I've demanded from our PHP devs that they start using Docker - I've been reluctant about it until that moment. And then, whether it was .htaccess or the main config, no longer mattered - Apache lived in a container. When I needed to make sure things performed well, I used Locust <https://locust.io/>. Just measure, then optimise.
So in practice, yes, spiritually I'm doing what PHP8 did to PHP3. Whether that's "approvingly" is up to your interpretation ;)
malwrar 6 hours ago [-]
I don't understand why Apache wouldn't just watch the filesystem; this choice means 99.99% of HTTP requests are going to be slowed down by unnecessary disk reads.
0xbadcafebee 3 hours ago [-]
Even in the early 2000s CGI was really old tech. People were moving to more modern systems that solved the problems with CGI. What's funny is, today's systems actually have exactly the same problems as some of those newer solutions did, yet now they're just ignored/accepted.
In the mid-2000s I worked on a very-large-scale website using Apache2 w/mod_perl. Our high-traffic peaks were something like 25k RPS (for dynamic content; total RPS was >250k). Even at that time it was a bit old hat, but the design scaled very well. You'd have a fleet of mod_perl servers that would handle dynamic content requests, and a fleet of Apache2 servers that served static content and reverse-proxied back to the mod_perl fleet for dynamic requests. In front of the static servers were load balancers. They'd all keep connection pools open and the load balancers avoided the "maximum connection limit" of typical TCP/IP software, so there was no real connection limit, it was just network, memory, and cpu limits.
The big benefit of Apache2 w/mod_perl or mod_php was that you combined the pluggability and features of a scalable and feature-filled web server with the resident memory and cache of an interpreter that didn't need to keep exiting and starting. Yes you had to do more work to integrate with it, but you have to do that today with any framework.
The big downside was bugs. If you had a bug, you might have to debug both Apache and your application at the same time. There was not as much memory to be had, so memory leaks were a MUCH bigger problem than they are today. We worked around it with stupid fixes like stopping interpreters after taking 1000 requests or something. The high-level programmers (Perl, PHP) didn't really know C or systems programming so they didn't really know how to debug Apache or the larger OS problems, which it turns out has not changed in 20 years...
FastCGI and later systems had the benefit that you could run the same architecture without being tied to a webserver and dealing with its bugs on top of your own. But it also had the downside of (in some cases) not multiplexing connections, and you didn't get tight integration with the web server so that made some things more difficult.
Ultimately every backend web technology is just a rehashing of CGI, in a format incompatible with everything else. There were technical reasons why things like FastCGI, WSGI, etc exist, but today they are unnecessary now that we have HTTP/2 and HTTP/3. If you can multiplex HTTP connections and serve HTTP responses, you don't need anything else. I really hope future devs will stop reinventing the wheel and go back to actual standards that work outside your own single application/language/framework.
johnisgood 9 hours ago [-]
I remember when I used a C program with CGI. It was quite fast, and this was decades ago. There were no 100+ core machines, and RAM was not abundant either; it was at best 1 GB. It was doable then, and pretty sure it is even more doable today.
PaulDavisThe1st 33 minutes ago [-]
Amazon in 1995 was a C++ executable invoked via CGI (*). That couldn't scale once load-balancing was required, but it worked pretty well up until that point.
(*) technically, two of them: one handled the front (customer-visible) end, one handled the back-office side.
petesergeant 17 hours ago [-]
> The nascent web community quickly learned that this was a bad idea, and invented technologies like PHP
Well ackshually ... the technology here that was important was mod_php; PHP itself was no different to Perl in how it was run, but the design choice of mod_php as compared to mod_perl was why PHP scripts could just be dumped on the server and run fast, where you needed a small amount of thinking and magic to get mod_perl working.
fcatalan 13 hours ago [-]
At that time I was developing with a friend what later was called a Learning Management System: It had content management, assignment uploads, event calendar, grade management, real time chat, forums... It was all plain C via CGI and it was hell to work with.
What almost brought us to tears the day we learned about PHP was how everything we had been painstakingly programming ourselves from scratch reading RFCs or reverse engineering HTTP was just a simple function call in PHP. No more debugging our scuffed urlencode implementation or losing a day to a stray carriage return in an HTTP header...
simonw 17 hours ago [-]
Right, but mod_php was an early addition to the PHP ecosystem and quickly became the default way of deploying it - I believe the first version of the Apache module was for PHP/FI Version 2.0 in 1996: https://www.php.net/manual/phpfi2.php#module
petesergeant 17 hours ago [-]
It was indeed, and I spent much time wailing and gnashing my teeth as a Perl programmer that nothing similar existed in Perl.
AdieuToLogic 16 hours ago [-]
> It was indeed, and I spent much time wailing and gnashing my teeth as a Perl programmer that nothing similar existed in Perl.
mod_perl2[0] provides the ability to incorporate Perl logic within Apache httpd, if not other web servers. I believe this is functionally equivalent to the cited PHP Apache module documentation:
> Running PHP/FI as an Apache module is the most efficient way of using the package. Running it as a module means that the PHP/FI functionality is combined with the Apache server's functionality in a single program.
The key difference was that you had to adapt your Perl to work with mod_perl, where mod_php "just worked" in the same way CGIs did -- you could throw your .php scripts up over FTP and they'd benefit from mod_php being installed. This was a massive difference in practice.
0 - https://perl.apache.org/docs/2.0/index.html
EDIT: I have managed to dig out slides from a talk I gave about this a million years ago ("A brief, incomplete and largely inaccurate history of dynamic webpages") with a good section that walks through the history of how all this worked, CGIs, mod_perl, PSGI etc, for anyone who wants a brief history lesson: https://www.slideshare.net/slideshow/psgi-and-plack-from-fir...
pests 15 hours ago [-]
Just a story of my past.
I got into web dev in the tail end of perl and cgi-bin. I remember my first couple scripts which were just copy/paste from tutorials and what not, everyone knows how it goes. It was very magical to me how this "cgi-bin" worked. There was a "script kiddy hacking tool" I think named subseven (or similar) written partially in perl that you would trick your friends into running or you'd upload on filesharing. The perl part gave you your web based C&C to mess with people or open chats or whatever. I really got into programming trying to figure out how this all worked. I soon switched over to PHP and in my inexperience never realized the deployment model was so similar.
I do think this model of running the script once per request and then exiting really messed with my internal mental model of how programs and scripts worked. Being exposed later to long-running programs that could maintain state, keep their own internal data structures, and handle individual requests in a loop was a real shock, and it took me a while to conceptualize.
simonw 14 hours ago [-]
I had the same experience! I started out with Perl and CGI, then moved to PHP. Switching to a world where the web application kept running across multiple requests took me quite a bit of effort to get used to.
pests 11 hours ago [-]
It is so odd how request-centric my worldview was back then. I literally couldn’t fathom how “old legacy crusty languages” like c/c++ could possibly be behind a website. Learning google was built this way blew my mind.
It’s strange thinking back to the days where persisting information as simple as a view counter required persisting data to a flatfile* or something involving a database.
These days with node and our modern languages like go and rust it’s immediately obvious how it’s done.
I think it’s both a mix of me learning and growing and the industry evolving and growing, which I think all of us experience over time.
* for years using flat files was viewed as bad practice or amateurish. fun to learn years later that is how many databases work.
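For the curious, the old-school flat-file view counter being described was roughly the following - a sketch, assuming a Unix host where fcntl locking is available, with the counter.txt filename made up for illustration:

    import fcntl

    # Each CGI request runs this in a fresh process; the advisory lock
    # keeps concurrent requests from losing increments.
    with open("counter.txt", "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        f.seek(0)
        count = int(f.read() or 0) + 1
        f.seek(0)
        f.truncate()
        f.write(str(count))   # append mode writes at offset 0 after truncate

    print("Content-Type: text/plain")
    print()
    print(count)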
cousin_it 4 hours ago [-]
Persisting information should be done using a database, though. Otherwise your view counter will reset to zero on server restart. Overall I still think PHP's request-centric model is the best fit for the web.
tribby 8 hours ago [-]
sub7 was a windows binary (client and server), but it’s possible there was an unofficial perl interface for it or something similar. the perl era definitely saw a lot of precursors to modern C2 dashboards
xnx 17 hours ago [-]
Indeed. Perl was better in many ways, but not in the one that mattered to its continued viability.
ivovk 11 hours ago [-]
It stops working when you need to connect to any external resource. Databases, HTTP clients, etc. maintain connection pools to skip the initial connection phase, which can be costly. That's why you usually need a long-running web application process.
ben0x539 10 hours ago [-]
At my last job, a lot of our web services also benefited immensely from in-process caches and batching (to be fair, some of them were the cache for downstream services), and their scaling requirements pretty much dominated our budget.
I can totally see how the cgi-bin process-per-request model is viable in a lot of places, but when it isn't, the difference can be vast. I don't think we'd have benefited from the easier concurrency either, but that's probably just because it was all golang to begin with.
8organicbits 10 hours ago [-]
You can solve that with a sidecar: a dedicated process (or container) that pools connections for you. PgBouncer is one example.
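For illustration, a minimal PgBouncer sidecar config might look like this (names and paths are made up; pool_mode and sizes depend on your workload). Short-lived CGI processes then connect to port 6432 and reuse the pooled server connections:

    [databases]
    ; clients connect to "appdb" here; pgbouncer holds the real connections
    appdb = host=127.0.0.1 port=5432 dbname=appdb

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    ; transaction pooling lets many short-lived clients share few connections
    pool_mode = transaction
    max_client_conn = 200
    default_pool_size = 20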
rvnx 6 hours ago [-]
Great, additional things to maintain that can break, all of that to work around the original sandcastle instead of fixing the root issue
indigodaddy 8 hours ago [-]
Speaking of old stuff like newspro, anyone remember php scripts like CuteNews and sNews (single file CMS) ?
doublerabbit 53 minutes ago [-]
Yes, CuteNews was clever. PHPNuke, e107, all those happy days downloading source files on 56K and uploading them to my webhost to play with.
indigodaddy 42 minutes ago [-]
Some super fun times for sure and the scene is not easily replicated by anything today
aaronblohowiak 17 hours ago [-]
Ehhhhhhh. I believe but cannot cite that fork() got a lot cheaper over the last 30 years as well (independent of machine specs, I believe the Linux implementation is inherently cheaper now, but I can't remember the details). cgi-bin works really well if you don't have to pay for SSL or TCP connections to databases or other services, but you can maybe run something like Istio if you need that. I have long thought that (Fast)CGI is a better model than proprietary "lambda"/"FaaS", but the languages du jour and vendor lock-in didn't favor a standards-based approach here.
AdieuToLogic 16 hours ago [-]
> I believe but cannot cite that fork() got a lot cheaper over the last 30 years as well ...
The fork[0] system call has been a relatively quick operation for the entirety of its existence. Where latency is introduced is in the canonical use of the execve[1] equivalent in the newly created child process.
> ... cgi bin works really well if you don’t have to pay for ssl or tcp connections to databases or other services, but you can maybe run something like istio if you need that.
Istio[2] is specific to Kubernetes and thus unrelated to CGI.
0 - https://man.freebsd.org/cgi/man.cgi?query=fork&apropos=0&sek...
1 - https://man.freebsd.org/cgi/man.cgi?query=execve&sektion=2&a...
2 - https://istio.io/
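The fork-vs-exec distinction is easy to measure yourself; here's a rough sketch (POSIX only, and the numbers will vary wildly by machine):

    import os, subprocess, time

    N = 200

    # Cost of fork() alone: the child exits immediately without exec'ing.
    t0 = time.perf_counter()
    for _ in range(N):
        pid = os.fork()
        if pid == 0:
            os._exit(0)
        os.waitpid(pid, 0)
    fork_ms = (time.perf_counter() - t0) / N * 1000

    # Cost of fork() + execve() of a tiny binary.
    t0 = time.perf_counter()
    for _ in range(N):
        subprocess.run(["/bin/true"])
    exec_ms = (time.perf_counter() - t0) / N * 1000

    print(f"fork only: {fork_ms:.2f} ms   fork+exec: {exec_ms:.2f} ms")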
You should check out OpenFaaS. It uses a very CGI-inspired architecture for self hosting "functions" (in the lambda sense) on more conventional infra.
slashdave 14 hours ago [-]
Maybe my memory is bad, but I don't remember people jumping ship from CGI-bin because of performance. I do remember a lot of security problems.
kragen 14 hours ago [-]
I remember performance being the main reason people jumped ship from CGI in the period 01995–02002. The switch didn't solve security problems itself (except Shellshock, if you wrote CGI scripts in bash, but Shellshock wasn't publicly known until much later) but it sometimes came with a less slapdash approach to building web services which did solve security problems. On the other hand, it often instead came with a move to PHP, which had just unbelievable levels of security problems.
It's possible that your experience with people switching was later, when performance was no longer such a pressing concern.
sramsay 1 hours ago [-]
I remember megatons of Java Servlet(tm) hype convincing us that real programmers do OO.
Java remains the only programming language I've ever heard covered in a feature story for NPR.
Combined with the dot-com boom "general hype", I'm sure a lot of managers pushed heavyweight solutions where lightweight would have sufficed. Well, that may be an eternal problem, but maybe more succeeded in pushing them with a lot of hype. :-)
Not enough people I guess saw this as Sun trying to be the new Microsoft (which was the new IBM, which still has MVS & Cobol!), namely the company in control of The Platform, where here "The" just means the hip new thing kids learn in school and want to continue doing before they become expensive old timers.
sitharus 13 hours ago [-]
The main security issue I recall from CGI was caused by the web server having to execute the binary. This meant either executing as www-data, running the web server as root so it can call setuid, or using setuid binaries which have their own issues.
These were real issues on multi-user hosts, but as most of the time we don’t use shared hosting like that anymore it’s not an issue.
There were also some problems with libraries parsing the environment variables with the request data wrong, but that's no different from a badly implemented HTTP stack these days. I vaguely recall some issues with excessively long requests overflowing environment variables, but I can't remember if that was a security problem or just a DoS.
But why? A lot of modern stacks aren’t just about performance but development speed and stuff
WhereIsTheTruth 11 hours ago [-]
"Just pump more HW at the problem bro"
RedShift1 11 hours ago [-]
I would gladly spend a little more on hardware if it makes the code much simpler and easier to debug.
BirAdam 6 hours ago [-]
Simpler and easier to debug would also mean less expensive to maintain… thus, more hardware is likely cheaper long term.
eyberg 18 hours ago [-]
Outside of nostalgia there's no engineering reason to do this - definitely not for performance.
That same go program can easily go over 10k reqs/sec without having to spawn a process for each incoming request.
CGI is insanely slow and insanely insecure.
simonw 18 hours ago [-]
What makes it insecure? It's a pretty simple protocol - is there anything in there that makes it insecure, beyond naive mistakes that could be avoided with a well designed library?
I agree that there's probably not much of an argument to switch to it from the well established alternative mechanisms we are using already.
The one thing in its favor is that it makes it easier to have a polyglot web app, with different languages used for different paths. You can get the same thing using a proxy server though.
eyberg 18 hours ago [-]
CGI has a very long history of security issues stemming primarily from input validation or the lack thereof.
AdieuToLogic 16 hours ago [-]
> CGI has a very long history of security issues stemming primarily from input validation or the lack thereof.
And a Go program reading from a network connection is immune from the same concerns how?
simonw 18 hours ago [-]
Right, but anything relating to input validation can be avoided by using a well designed library rather than implementing the protocol directly.
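As a concrete illustration of "use a library rather than hand-rolling the protocol", here is a minimal Python CGI sketch that leans on stdlib helpers for both input parsing and output escaping (the "name" parameter is just an example):

    #!/usr/bin/env python3
    import html
    import os
    import sys
    from urllib.parse import parse_qs

    # parse_qs handles percent-decoding, repeated keys, and '&' splitting,
    # the classic places where hand-rolled CGI parsers went wrong.
    params = parse_qs(os.environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]

    sys.stdout.write("Content-Type: text/html; charset=utf-8\r\n\r\n")
    # html.escape guards against reflected XSS from the query string.
    sys.stdout.write(f"<p>Hello, {html.escape(name)}!</p>\n")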
BirAdam 18 hours ago [-]
The language in use often has input validation libraries. The failure of the programmer to use them is not the fault of CGI. Further, proper administration of the machine can mitigate file injection, database injection, etc. Again, that people fail to do this isn’t the fault of CGI.
cranberryturkey 17 hours ago [-]
That's like saying forks and knives are vulnerable because you could stab someone with them.
tonyedgecombe 14 hours ago [-]
>EDIT: Looks like the way CGI works made it vulnerable to Shellshock in 2014: https://en.m.wikipedia.org/wiki/Shellshock_(software_bug)
From your linked article: If the handler is a Bash script, or if it executes Bash...
But we are talking about Python not Bash.
kragen 13 hours ago [-]
Yes, Shellshock is kind of a marginal case, but it probably does qualify as a security hole due in part to CGI itself, even though it doesn't affect Python programs (unless they spawn a shell). I don't know of any other examples of security problems caused by CGI, even partly. It's a very thin layer over HTTP.
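The CGI connection, for anyone who missed it: CGI copies request headers verbatim into environment variables, and pre-patch bash would execute trailing commands in any environment variable that looked like an exported function. You can still reproduce the mechanics locally against an old bash (this is the canonical CVE-2014-6271 test, adapted to a CGI-style variable name):

    # A CGI server would set HTTP_USER_AGENT from the User-Agent header;
    # this simulates that with a hostile value. A patched bash prints only
    # "request handled"; a vulnerable one also prints PWNED.
    env 'HTTP_USER_AGENT=() { :;}; echo PWNED' bash -c 'echo request handled'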
mike_hearn 8 hours ago [-]
IIRC the main issue was finding ways to convince a CGI script to write something to disk, at which point you could sometimes make it be treated as another CGI script. More of an issue on Windows than UNIX.
slashdave 14 hours ago [-]
Because... it can execute an arbitrary executable? In the old days, it also ran as root.
kragen 13 hours ago [-]
No, on all the servers I have any experience with, it can only execute executables the server administrator configures as CGI programs, not executables supplied by an attacker, and they never ran as root. Apache in particular is universally run as a non-root user, since the very first release, and its suEXEC mechanism (used for running CGI programs as their owners for shared web hosting services) refuses to run any CGI program as root. I've never seen a web server on a Unix system running as root: not CERN httpd, not NCSA httpd, not Apache, not nginx, not python -m http.server, not any of the various web servers I've written myself.
I hesitate to suggest that you might be misremembering things that happened 30 years ago, but possibly you were using a very nonstandard setup?
rvnx 6 hours ago [-]
BusyBox runs as root by default, and it's used by hundreds of millions of devices.
For embedded devices (routers, security cameras, etc), it's very common to run CGI scripts as root.
So it is not even 30 years ago, it's still today, because of bad practices of the past.
kragen 3 hours ago [-]
Oh, that's a good point. In those cases the web server actually needs root, in the sense that it has to be able to upgrade the firmware and reconfigure the network interface.
sitharus 13 hours ago [-]
It definitely can’t. Either you had to put your script in cgi-bin, use an extension like .cgi in a directory with that feature explicitly enabled, or a magic sticky bit on the file if that was enabled.
You could configure the server to be insecure by, eg, allowing cgi execution from a directory where uploaded files are stored.
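For reference, the "explicitly enabled" setups being described look something like this in Apache (a sketch of the stock mod_cgi configuration; the paths are examples):

    # Classic cgi-bin: anything under this directory is executed as CGI.
    ScriptAlias "/cgi-bin/" "/usr/lib/cgi-bin/"

    # Or: opt in per directory, executing only files with a .cgi extension.
    <Directory "/var/www/app">
        Options +ExecCGI
        AddHandler cgi-script .cgi
    </Directory>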
cperciva 18 hours ago [-]
Hey, it could be worse. Some people launch entire VMs to service individual requests.
klysm 18 hours ago [-]
That’s cloud-native!
jasonvorhe 8 hours ago [-]
How is that done?
cperciva 3 hours ago [-]
I was making a joke about AWS Lambda. It doesn't necessarily start up a new VM for each request, though; it can launch a new VM but it will reuse an existing VM if the same CGI-bin (oops I mean Lambda function) has been executed recently.
giantrobot 15 hours ago [-]
You mean "Web Scale".
__float 18 hours ago [-]
Not everything needs 10k RPS, and in some sense there are benefits to a new process – how many security incidents have been caused by accidental cross-request state sharing?
And in a similar vein, Postgres (which is generally well liked!) uses a new backend process per connection. (Of course this has limitations, and sometimes necessitates pgbouncer, but not always.)
reddec 14 hours ago [-]
A few years ago I felt the same and created trusted-cgi.
However, through the years I learned:
- yes, forks and in general processes are fast
- yes, it saves memory and CPU on low load sites
- yes, it's a simple protocol and can be used even from a shell
However,
- splitting functions (to mimic serverless) into different binaries/scripts creates a mess of cross-script communication
- deployment is not that simple
- security-wise, you need to run the manager as root and use unique users for each script, or use cgroups (or at least chroot). At that point the main question is why not use containers as-is
Also, compute-wise, even a huge Go app with hundreds of endpoints can fit in just a few megabytes of RAM - there is not much sense in saving so little memory.
At worst - just create a single binary and run it on demand for different endpoints
Tractor8626 15 hours ago [-]
Even without pgbouncer, postgres uses long-lived connections (and long-lived processes), so it's a bad example.
Uber famously switched from pg to mysql because their SWEs couldn't properly manage connections
anonzzzies 13 hours ago [-]
Was that the only reason? In our last testing (2021), on the same hardware and for our specific case (a billions-of-records database with many tables and specific workloads), mysql consistently left postgres in the dust performance-wise. Internal and external devs pointed out that postgres (or rather, our table structures and indexes) could probably be tweaked, with quite a lot of work, to be faster, but mysql performed well (for our purpose) even with some very naive options. I guess it depends on the case, but I cannot justify spending 1 cent (let alone far more, as we have 100k+ tables) on something while something else is fast enough (by quite a margin) to begin with...
phplovesong 10 hours ago [-]
The way "old" stacks like PHP work also makes it impossible to do stateful stuff, like websockets.
There are workarounds, but usually it's a better idea to ditch PHP for a technology better suited to the modern web.
johnisgood 10 hours ago [-]
Better technology? Please, do tell.
And for your information, you can have stateful whatnots in PHP. Hell, you can have it in CSS as I have demonstrated in my earlier comments.
Looks like the https://github.com/jackrosenthal/legacy-cgi package provides a drop-in replacement for the standard library module.
That said, XSS was a huge problem with really simple CGI programs, and an HTML output library that avoids that by default is more important than the input parsing—another thing absent from Python's standard library but done right by Django.
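The "escapes by default" property is simple to sketch, if you want the flavor of it without Django (Safe and render here are hypothetical names, not a real API):

    import html

    class Safe(str):
        """Marker for strings the caller promises are already safe HTML."""

    def render(template, **values):
        # Everything is escaped unless explicitly marked Safe, so forgetting
        # to escape fails closed instead of open.
        escaped = {k: v if isinstance(v, Safe) else html.escape(str(v))
                   for k, v in values.items()}
        return template.format_map(escaped)

    print(render("<p>Hi {name}</p>", name="<script>alert(1)</script>"))
    # -> <p>Hi &lt;script&gt;alert(1)&lt;/script&gt;</p>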
That policy, and the heinous character assassination the PSF carried out against Tim Peters, mean I can no longer recommend in good conscience that anyone adopt Python.
You guys are all really getting worked up over very little.
But I also understand that the world is not perfect. We all need to prioritize all the time. As they write in the rationale: "The team has limited resources, reduced maintenance cost frees development time for other improvements". And the cgi module is apparently even unmaintained.
I guess a "batteries included" philosophy sooner or later is caught up by reality.
What do you mean by "character assassination" carried out against Tim Peters? Not anything in the linked article I presume?
It does make me wonder whether Python is still the best choice for what I use it for, and whether I should be moving to something else.
https://www.theregister.com/2024/08/09/core_python_developer...
https://tim-one.github.io/psf/ban
https://chrismcdonough.substack.com/p/the-shameful-defenestr...
So the remaining people periodically launch some deprecation PEPs or other bureaucratic things in order to give the appearance of active development.
No, not everyone. I've been using Python as my primary language since 2000 (that's 1.5.2 days). It has been the least troublesome language that I work with, and I work with (or have worked with) a bunch (shell, perl, python, ruby, lua, tcl, c, objective-c, swift, java, javascript, groovy, go and probably others I'm forgetting).
Even all the complaints about the Python packaging ecosystem over the years... I just don't get it. Like, have you ever tried working with CPAN or Maven or Gradle or even, FFS, Ruby Gems/bundler? The Python virtual environment concept is easy to understand and pip mostly does its job just fine, and these days, uv makes all that even faster and easier.
Anywho, just dropping a contrarian comment here because maybe I'm part of the generally silent majority that is just able to use Python day in and day out to get their job done.
As for prioritizing, I think the right choice is to deprioritize Python.
This is a bit like Apple firing Steve Jobs for wearing sneakers to work because it violates some dress code.
Also I used Python way before JS, and I still like JS's syntax better. Especially not using whitespace for scope, which makes even less sense in a scripting language since it's hard to type that into a REPL.
What Node.js had from the start was concurrency via asynchronous IO. And before Node.js was around to be JavaScript's async IO framework, there was a robust async IO framework for Python called Twisted. Node.js was influenced by Twisted[0], and this is particularly evident in the design of its Promise abstraction (which you had to use directly when it was first released, because JavaScript didn't add the async/await keywords until years later).
[0] https://nodejs.org/en/about
Jupyter fixes the REPL problem, and it's a major advance in REPLs in a number of other ways, but it has real problems of its own.
https://github.com/python/cpython/commits/3.12/Lib/cgi.py
Turns out most of the maintenance this thing received was the various attempts at removing it.
All that was in the cgi module was a few functions for parsing HTML form data.
As a side note, though, CGIHTTPRequestHandler is for launching CGI programs (perhaps written in Rust) from a Python web server, not for writing CGI programs in Python, which is what the cgi module is for. And CGIHTTPRequestHandler is slated for removal in Python 3.15.
The problem is gratuitous changes that break existing code, so you have to debug your code base and fix the new problems introduced by each new Python release. It's usually fairly straightforward and quick, but it means you can't ship the code to someone who has Python installed but doesn't know it (they're dependent on you for continued fixes), and you can't count on being able to run code you wrote yourself on an earlier Python version without a half-hour interruption to fix it. Which may break it on the older Python version.
The support for writing CGI programs in Python is in wsgiref.handlers.CGIHandler.
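That is, the supported path after 3.13 looks roughly like this - write a WSGI app and let the stdlib speak CGI for you:

    #!/usr/bin/env python3
    from wsgiref.handlers import CGIHandler

    def app(environ, start_response):
        start_response("200 OK",
                       [("Content-Type", "text/plain; charset=utf-8")])
        return [b"hello from WSGI over CGI\n"]

    if __name__ == "__main__":
        # Reads the CGI environment variables and stdin, writes the response
        # (headers included) to stdout, as the CGI protocol requires.
        CGIHandler().run(app)

The same app object can later be mounted under any WSGI server, which is the "switch away from CGI when desired" escape hatch.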
Secondly, you have to find a reliable maintainer or several.
A lot of people want stuff to be maintained indefinitely for them by unspecified "others".
Not updating the system is usually a solution to such problems.
At best there is an nginx or an API in front that acts as a reverse proxy to clean up/normalize the incoming requests and prevent directly exposing the service.
Example: banks, airlines, hospitals, air traffic controllers, electricity companies, etc
All critical services that nobody wants to touch, as it works +/-
a) make the system air gapped
b) pay a Python consulting company to back port security fixes
c) hire a Python core dev to do the system, directly
OOOOR, they can just update to Python 3.13 and migrate to the equivalent Python package that's not part of the core. They surely use other Python packages already.
We're making a mountain out of a molehill, also on behalf of places that have plenty of money to spend if push comes to shove.
then that endpoint will have at least 400ms response times, not great
Jython no longer works with basically any current Python libraries because it never made the leap to Python 3, and the Python community stigmatizes maintaining Python 2 compatibility in your libraries. This basically killed Jython, and from my point of view, Jython was one of the best things about Java.
Most rational people are ok with code being removed that 99.99% of users have absolutely no use for, especially if it is unmaintained, a burden, or potentially contains security issues. If you are serious about cgi you’ll probably be looking at 3rd party modules anyway.
Personally… I don't use pip. Why? apt is there.
EDIT: So, you get threads like this https://stackoverflow.com/questions/65651040/what-is-the-rec... and so on
Lua barely has any stdlib to speak of, most notably in terms of OS interfaces. I'm not even talking about chmod or sockets; there's no setenv or readdir.
You have to install C modules for any of that, which kinda kills it for having a simple language for CGI or scripting.
Don't get me wrong, I love Lua, but you won't get far without scaffolding.
But my concern is mostly not about needing to bring my own batteries; it's about instability of interfaces resulting from evaporating batteries.
LuaJIT, release-wise, has been stuck in a very weird spot for a long time, before officially announcing it's now a "rolling release" - which was making a lot of package maintainers anxious about shipping newer versions.
It also seems like it's going to be forever stuck on the 5.1 revision of the language, while continuing to pick a few cherries from 5.2 and 5.3. It's nice to have a "boring" language, but most distros (certainly Alpine, Debian, NixOS) just ship each release branch between 5.1 and 5.4 anyway. No "whatever was master 3 years ago" lottery.
That said these days I'd rather use Go.
Admittedly Python is not great at this either (reload has interacted buggily with isinstance since the beginning), but it does attempt it.
I agree it's not a rapid prototyping kind of language. AI assistance can help though.
Since I learnt Python starting in version 1.6, it has mostly been for OS scripting stuff.
Too many hard learnt lessons with using Tcl in Apache and IIS modules, continuously rewriting modules in C, back in 1999 - 2003.
Also, let's see the impact of Microsoft's Python team layoffs on it, given that CPython developers only started caring about performance due to Facebook and Microsoft; so far the JITs in Python have been largely ignored by the community.
edit: Looks like yes for Node.js. I can't tell for PHP, as I keep getting results for OPcache, which is different and in-memory.
I still miss PHP's simple deployment, execution and parallelization model, in these over-engineered asyncy JavaScripty days.
If I remember correctly that is about half of what StackExchange served on daily average over 8 servers. I am sure using Go or Crystal would have scaled this at least 10x if not 20x.
The problem I see is that memory cost isn't dropping, which means somewhere along the graph the per-process memory cost will outweigh whatever advantage this has.
Still, sounds like a fun thing to do. At least for those of us who lived through CGI-Bin and Perl era.
Your app could add almost no latency beyond storage if you try.
You can measure this easily, get a copy of Windows busybox and write a shell script that forks off a process a few thousand times. The performance difference is stark.
Are you really saying that it takes 70ms on Microsoft Windows? I don't have an installed copy here to test.
Even if it does, that would still be about 15% of the time required for `python3 -m cgi`, so it seems unlikely to be an overriding concern for CGI programs written in Python, at least on manycore servers serving less than tens of millions of hits per day. Or does it also fail to scale across cores?
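If you want to check your own platform, a crude sketch (it times a no-op binary vs a bare interpreter start; /bin/true won't exist on Windows, so substitute any tiny executable):

    import subprocess, sys, time

    def avg_ms(cmd, n=50):
        t0 = time.perf_counter()
        for _ in range(n):
            subprocess.run(cmd)
        return (time.perf_counter() - t0) / n * 1000

    # Process spawn alone vs spawn plus Python interpreter startup.
    print(f"/bin/true:    {avg_ms(['/bin/true']):.1f} ms")
    print(f"python -c '': {avg_ms([sys.executable, '-c', '']):.1f} ms")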
At the time Perl was the thing I used in the way I use Python now. I spent a couple of years after that working on a mod_perl codebase using an in-house ORM. I still occasionally reach for Perl for shell one-liners. So, it's not that I haven't considered it.
Lua is in a sense absolutely stable unless your C compiler changes under it, because projects just bundle whatever version of Lua they use. That's because new versions of Lua don't attempt backwards compatibility at all. But there isn't the kind of public shaming problem that the Python community has where people criticize you for using an old version.
JS is mostly very good at backwards compatibility, retaining compatibility with even very bad ideas like dynamically-typed `with` statements. I don't know if that will continue; browser vendors also seem to think that backwards compatibility with boring technology like FTP is harmful.
Only one benchmark on one system, but over in day before yesterday's HN thread on this (https://news.ycombinator.com/item?id=44464272), I report a rather significant slowdown in Perl start up overhead: https://news.ycombinator.com/item?id=44467268 . Of course, at least for me, Python3 is worse than Python2 by an even larger factor and Python2 worse than Perl today by an even larger factor.
FWIW, in Nim, you can get a CGI program that probably runs faster than the Go of this article with just a few lines of its standard cgi module.
I don't know of a `cgitb` equivalent even in the Nimbleverse. Some of the many web frameworks in Nim like jester seem to have that kind of thing built into them, though I realize a framework is not the same as CGI (and that's one of the charms of CGI).
- `perl -de 0` provides a REPL. With a readline wrapper, it gives you history and command editing. (I use comint-mode for this, but there are other alternatives.)
- syscalls can automatically raise exceptions if you `use autodie`.
Why is this not the default? Because Perl maintainers value backward compatibility. Improvements will always sit behind a line of config, preventing your scripts from breaking if you accidentally rely on functionality that later turns out to be a mistake.
https://entropicthoughts.com/you-want-technology-with-warts
https://entropicthoughts.com/why-perl
Perl feels clumsy and bug-prone to me these days. I do miss things like autovivification from time to time, but it's definitely bug-prone, and there are a lot of DWIM features in Perl that usually do the wrong thing, and then I waste time debugging a bug that would have been automatically detected in Python. If the default Python traceback doesn't make the problem obvious, I use cgitb.enable(format='text') to get a verbose stack dump, which does. cgitb is being removed from the Python standard library, though, because the maintainers don't know it can do that.
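The trick mentioned here, for as long as the module lasts (the parse function is just an example to trigger an error):

    import cgitb
    # Despite the name, this isn't CGI-specific: format='text' installs an
    # excepthook that prints a verbose stack dump, including the values of
    # local variables at every frame.
    cgitb.enable(format='text')

    def parse(line):
        return int(line.split(":")[1])

    parse("oops")   # the dump shows line's actual value, not just a traceback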
Three years ago, a friend told me that a Perl CGI script I wrote last millennium was broken: http://canonical.org/~kragen/sw/rfc-index.cgi. I hadn't looked at the code in, I think, 20 years. I forget what the problem was, but in half an hour I fixed it and updated its parser to be able to use the updated format IETF uses for its source file. I was surprised that it was so easy, because I was worse at writing maintainable code then.
Maybe we could do a better job of designing a prototyping language today than Larry did in 01994, though? We have an additional 31 years of experience with Perl, Python, JS, Lua, Java, C#, R, Excel, Haskell, OCaml, TensorFlow, Tcl, Groovy, and HTML to draw lessons from.
One benefit Perl had that I think not many of the other languages do was being designed by a linguist. That makes it different -- hard to understand at first glance -- but also unusually suitable for prototyping.
Python has Werkzeug, Flask, or at the heavier end Django. With Werkzeug, you can translate your CGI business logic one small step at a time - it's pretty close to speaking raw HTTP, but has optional components like a router or debugger.
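A sketch of that first small step with Werkzeug (the business logic stays a plain request-to-response function, close to what the CGI script already did):

    from werkzeug.wrappers import Request, Response

    @Request.application
    def app(request):
        # Same shape as a CGI script: read params, return a body.
        name = request.args.get("name", "world")
        return Response(f"Hello, {name}!", mimetype="text/plain")

    if __name__ == "__main__":
        from werkzeug.serving import run_simple
        run_simple("127.0.0.1", 8000, app)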
I agree that tens of milliseconds of latency is significant to the user experience, but it's not always the single most important consideration. My ping time to news.ycombinator.com is 162–164ms because I'm in Argentina, and I do unfortunately regularly have the experience of web page loads taking 10 seconds or more because of client-side JS.
...and then you're wasting a 64-core server at 100% CPU load on just starting up and tearing down script instances, not yet doing any useful work. This is doing 160 startups per second, not requests per second
It would have to be a pretty big program to require 400 ms on a fast CPU, though; but the Python interpreter is big, and if you have one slow import you're probably already not far off.
I’m amazed at how great the tools are these days that are free and yet we pay so much to cloud providers. I know it’s not an apples to apples comparison but it was so great to develop all that and fine tune it on a box in my basement.
It's crazy to me that we keep paying all these overheads for no reason other than "it's what Google does". I really need to write my article about my modular monolith architecture up, it's worked really well for us.
That said, to be honest, I don’t find Kubernetes complex — but that might be because I’ve been using it for quite some time.
You realise you can always still restore that backup onto someone else's server when you need to restore from backup anyway? I don't really see why one would pre-emptively pay for it.
> If I accidentally lock myself out, there's no serial terminal to fall back on.
Why not? That sounds like a choice you can make. The hardware I hosted on either had a KVM built in, or I could just attach a USB keyboard and a VGA (or nowadays HDMI) display.
Power outages aren't common in my area, and otherwise a UPS is not that expensive (compared to if you pay a third party to set up redundant power for your hobby system)
You can choose to pre-emptively pay the cloud premium and give them access to your server so you can also social engineer yourself back in via customer support (after all, if you aren't expecting to lose the password and thus don't need to convince a human to let you into your hosting account, then you could also hold onto your own server's password). It just all seems very opposed from the self-hosting spirit where you're self-reliant, which apparently you value since you were considering whether to self host?
Huh? If a drive goes out in my home server, the entire thing is offline until amazon delivers a new one (don't tell me I now need to keep a stockpile of spares). If a drive goes out in S3, I never know about it because AWS takes care of it. You don't understand why someone would want to "preemptively" pay for that?
> It just all seems very opposed from the self-hosting spirit where you're self-reliant, which apparently you value since you were considering whether to self host?
My self hosting is not ideological at all. I couldn't care less about "self reliance." The reason I considered hosting locally was to save money, and I concluded I was actually getting a lot of value for the money I was spending on AWS.
In my experience, cloud at scale has ALWAYS required someone with a pager willing/paid to get up at 3am on Christmas Eve. So someone’s time is being used no matter.
Cloud providers are used because there is a lot of vested interest - e.g. VCs and investors also having shares in cloud companies; then there is fear their investment might not survive an imaginary surge of traffic that will never happen in reality. Cloud sales people are masters of playing on investors' insecurities.
Not all of us we don't
Can you imagine the size of the business / service you could run with 4 attached 20TB drives, and a modest CPU? Good luck getting such from a cloud provider.
At least that was the case when I did the "python is single threaded, let's run many of them" + "python is slow, let's run many of them" dance
At scale you end up using shared connection pools outside of python (like pgbouncer) and a lot of tuning to make it serve the load while not killing the database
Of course, then we reimplemented in a multithreaded somewhat performant language and it became dead simple again
And we were trading performance for what, exactly? The code certainly didn't become any simpler.
Sure, there are regressions that will make start-up overhead worse, but I mean, there will be pathological regressions in any configuration.
In practise I'm not convinced -- but I would love to be. Reverse proxying a library-specific server or fiddling with FastCGI and alternatives always feels unnecessarily difficult to me.
Which is only a small proportion of sites out there.
I think the worst drama ever was a partial disk failure. Things kinda hobbled along for awhile before things actually started failing, and at that point things were getting corrupted. That poofed a weekend out of my life. Now I have better monitoring and alerting.
So you really do not have to be bothered by installation or anything of these lines. You install once and you are fine. You should check out the Wiki pages of Arch Linux, for example. It is pretty straightforward. As for upgrades, Arch Linux NEVER broke. Not on my servers, and not on my desktop.
That said, to each their own.
I will give you the benefit of the doubt that you are not regurgitating what other people have been saying (IMO wrongfully), which is: "Arch Linux for servers? Eww. Bleeding edge. Not suitable for servers.". All that said, please, do share. It will not negate those decades of no issues, however.
As I said, I maintain quite a lot of Arch Linux servers with loads of services without any issues, for decades.
Or if you don't want to pay for an 8/16 for the sort of throughput you can get on a VPS with half a core.
Another place this can be useful is for allowing customers to extend a local software with their own custom code. So instead of having to use, say, MCP to extend your AI tool, they can just implement a certain request structure via CGI.
This makes me wonder if an MCP service couldn't also be implemented as CGI: an MCP framework might expose its features as a program that supports both execution modes. I have to dig into the specs.
This lets you drop .htaccess files anywhere, and Apache will load them on each request for additional server config. https://httpd.apache.org/docs/2.4/howto/htaccess.html
One big reason to avoid them was performance; it required extra disk access on every request and it was always better to put the configuration in the main config file if possible.
But now? When most servers have an SSD and probably spare RAM that Linux will use to cache the file system?
Ok, performance is still slightly worse, as Apache has to parse the config on every request as opposed to once, but again, now that most servers have more powerful CPUs? In many use cases you can live with that.
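For anyone who never used them, a typical .htaccess is a few lines dropped into the content directory (this sketch assumes the main server config grants AllowOverride for these directives, and mod_rewrite for the last two lines):

    # .htaccess - re-read by Apache on every request to this directory
    Options +ExecCGI
    AddHandler cgi-script .cgi
    DirectoryIndex index.cgi
    # Classic per-directory pretty-URL rewriting
    RewriteEngine On
    RewriteRule ^article/([0-9]+)$ show.cgi?id=$1 [L]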
[ Side project is very early version but I'm already using it: https://github.com/StaticPatch/StaticPatch/tree/main ]
> I'm not a real programmer. I throw together things until it works then I move on. The real programmers will say "Yeah it works but you're leaking memory everywhere. Perhaps we should fix that." I’ll just restart Apache every 10 requests.
PHP got a very long way since then, but a huge part of that was correcting the early mistakes.
> PHP 8 is significantly better because it contains a lot less of my code.
I do have thoughts for later about modes which could take all the config from .htaccess files and build them into the main config so then you avoid any performance issues - however you have to do that carefully to make sure people don't include any bad config that crashes the whole server. One of the nice things about using .htaccess files as intended is Apache has the Nonfatal flag on AllowOverride so you can avoid that. https://httpd.apache.org/docs/2.4/mod/core.html#allowoverrid...
IMO you don't need to compensate for bad configs if you're using a proper staging environment and push-button deployments (which is good practice regardless of your development model). In prod, you can offset its main issue (atomic deployments) by swapping a symlink. In that scenario, having a separate .htaccess file actually helps - you don't want to restart Apache if you can avoid it, and again - hot reloading can hide state.
My main issue is that this is all a very different model from what most languages, frameworks, and runtimes have been doing for almost 20 years now. If you're a sysop dealing with heterogenous environments, it's honestly just annoying to have separate tooling and processes.
Personally, ca 10 years ago, this was the tipping point at which I've demanded from our PHP devs that they start using Docker - I've been reluctant about it until that moment. And then, whether it was .htaccess or the main config, no longer mattered - Apache lived in a container. When I needed to make sure things performed well, I used Locust <https://locust.io/>. Just measure, then optimise.
So in practice, yes, spiritually I'm doing what PHP8 did to PHP3. Whether that's "approvingly" is up to your interpretation ;)
In the mid-2000s I worked on a very-large-scale website using Apache2 w/mod_perl. Our high-traffic peaks were something like 25k RPS (for dynamic content; total RPS was >250k). Even at that time it was a bit old hat, but the design scaled very well. You'd have a fleet of mod_perl servers that would handle dynamic content requests, and a fleet of Apache2 servers that served static content and reverse-proxied back to the mod_perl fleet for dynamic requests. In front of the static servers were load balancers. They'd all keep connection pools open and the load balancers avoided the "maximum connection limit" of typical TCP/IP software, so there was no real connection limit, it was just network, memory, and cpu limits.
The big benefit of Apache2 w/mod_perl or mod_php was that you combined the pluggability and features of a scalable and feature-filled web server with the resident memory and cache of an interpreter that didn't need to keep exiting and starting. Yes you had to do more work to integrate with it, but you have to do that today with any framework.