The current distribution of Arc Lisp can be found at https://arclanguage.org. It ships with a forum that isn't up to date, nor does it have feature parity with HN, which is kind of its own black box because Real Money is involved, and Real Money ruins everything.
There is also a public fork of Arc Lisp at https://arclanguage.org/anarki, which is not guaranteed to be stable, and ships with a forum which definitely does not have feature parity with HN.
Occasional page load failure while logged in? Pretty common. Complete or nearly complete failure for logged-in users? Happens sometimes. Completely down such that not even a logged-out user can read it? I'm not sure I've ever seen that.
I wonder if Reddit is ever going to fix their "you broke reddit" thing. I probably see that page a couple dozen times a year, and it's been going on since, what, 2014? Earlier maybe? It's like the Twitter fail whale. Except Twitter fixed their site.
In fact, restarting the server is or was the only way to fix at least one bug on HN.[1] Pretty sure I've seen it as late as last year, and dang confirmed the fix was still the same.
(I know, restarting the server != rebooting the machine.)
This reminds me of my old workplace: a government-owned company where everything ran on MS Excel, SAP, fax machines [0] and prayers from the IT operations office.
I had to go to one of the more "exotic" server rooms, located in a basement, and saw a very old desktop stuffed at the bottom of a rack. Covered with dust, open side panels, plastic turned yellow, you know how old, forgotten hardware looks. It had a new-ish post-it with the message: "this runs all of the fax infrastructure. you turn it off, you explain it to the CEO".
I worked for a company that had acquired a B-list, top-20 MMORPG. Its database was MSSQL and had been running 24/7 for 5 years, with no way to back it up, no safety net, and an increasingly pissed-off player community suffering degrading service. Everyone was terrified to reboot.
They started bouncing paychecks before I got to see the drama come to a head, but ever since I've been obsessive about proper levels of redundancy. Don't build fragility into a system, and at least leave yourself a means to fail gracefully.
There is something comical about this: HN receives an absurd amount of traffic, more than startups and corporations with millions of customers and millions if not billions of dollars behind them, and yet it runs on what is effectively an Arduino Nano compared to what they are using. HN serves all its content significantly faster than any of them, at a fraction of the cost. Let's take my old job for instance: the company website had almost nothing on it, a list of products, a jobs page, and that's it. That website ran on 12 (yes, twelve) Dell servers, each with a 4-socket motherboard, each socket occupied by a 12-core Xeon, and each of those servers had 384 GB of RAM. Sure, it tracked every click and whatnot, but the data that was tracked was never actually used, and the all-beloved bundle.min.js was 15 megabytes. So much for low carbon footprint and all that...
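For scale, here's a quick back-of-the-envelope in Python using only the figures above:

    # Totals for the 12-server setup described above.
    servers = 12
    sockets_per_server = 4
    cores_per_socket = 12
    ram_per_server_gb = 384

    total_cores = servers * sockets_per_server * cores_per_socket   # 576 cores
    total_ram_gb = servers * ram_per_server_gb                       # 4,608 GB of RAM
    print(total_cores, total_ram_gb)

All of that to serve a product list and a jobs page.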
Not sure what platforms you've worked on, but "6M requests/day" is peanuts in my neighborhood. (No offense to dang and team, who do a fantastic job, but that's not high throughput compared to almost any platform I've worked on over the past decade.)
That's definitely less than I'd expected and now it doesn't feel all that magical that it's running from a single box.
I have a site with ~2.5M req/day running on an m3.medium instance that rarely spikes above 10% CPU. NGINX is the frontline server, probably handling a good percentage of those requests directly before passing the others on via proxy.
Also you have to imagine the bulk of the traffic is going to be the same small bits of data (the homepage, top stories comment pages, etc).
It’s not trivial to take advantage of that, but it is also much easier to handle than QPS distributed across your entire dataset (i.e. Facebook or Twitter, where everybody is seeing different things).
Much, much larger, albeit distributed across multiple datacenters and completely isolated environments. The site I was giving as an example was possibly serving around 10-20M requests/day, and it isn't an isolated case: I know plenty of companies that allocate such resources for something like that.
But that's all part of the overall achievement -- 6M requests/day is a key part of the optimisation, and how much real-world "traffic" that corresponds to.
On the other hand, it is not much content to begin with.
Traffic is a bit of text, extremely basic CSS, no images, no tracking, no bullshit means that load times are probably instantaneous even on a 56k modem.
Content is a bit of plain text => easy rendering; the entire database probably would fit into memory, so there's not much in terms of I/O going on.
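As a rough sanity check on the 56k claim (the ~30 KB page weight is an assumption, not a measured figure for HN):

    # Time to pull a small text-only page over a 56k modem.
    page_bytes = 30 * 1024        # assumed page weight: HTML + CSS, no images
    modem_bps = 56_000            # 56 kbit/s line rate, ignoring overhead and latency
    seconds = page_bytes * 8 / modem_bps
    print(f"{seconds:.1f} s")     # ~4.4 s -- not literally instantaneous, but perfectly usable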
> Traffic is a bit of text, extremely basic CSS, no images, no tracking, no bullshit means that load times are probably instantaneous even on a 56k modem.
In other words: they are serving exactly what the people who come here are looking for, and little else. Don't get me wrong, there are cases where images and moderately more complex CSS improve a website. On the other hand, many websites go to an extreme that degrades the user experience.
Only on the database side, and most people will be logged in, making rendering caches much less useful.
What I have utterly no idea about is how the karma system works. Updating karma counts and comment sorting only on new submissions would help with performance, but on the other hand karma "decays" over time, so there needs to be some sort of external cron?
You could probably use a weight system. Yesterday's upvote is worth a value of 1, today's upvote a value of 2, etc. Then store the absolute count in another column/field if you need that.
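A minimal sketch of that idea in Python (the names and the exact decay curve are illustrative, not HN's actual scheme):

    import time

    DAY = 86_400

    def weighted_score(vote_timestamps, now=None):
        # Recency-weighted score: a newer vote counts for more than an older one.
        # Here a vote is worth 1.0 today, ~0.5 a day later, ~0.33 after two days, ...
        now = now or time.time()
        return sum(1.0 / (1.0 + (now - ts) / DAY) for ts in vote_timestamps)

    def absolute_count(vote_timestamps):
        # Keep the raw total separately if the absolute count is still needed.
        return len(vote_timestamps)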
6M requests a day. Assuming a "US East Coast dominates" traffic pattern which is typical for this kind of site, that means their rps probably spikes north of 10K, but I'd be surprised if it's over 50K.
I disagree that startups with millions of customers handle less traffic. There are probably some exceptions that prove the rule, but once you're into millions of customers you are most likely over 50K rps at peak times.
Most startups with "millions of customers" can also trivially shard their traffic so it can effectively run on a bunch of largely-independent boxes, with some added redundancy for reliability's sake. No need for anything more complex.
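A minimal sketch of that kind of per-user sharding (the pool and routing function are assumptions, not any particular company's setup):

    import hashlib

    SERVERS = ["app1", "app2", "app3", "app4"]   # hypothetical pool of largely-independent boxes

    def shard_for(user_id: str) -> str:
        # Route each user to one box so their working set stays local.
        # Plain modulo hashing; real setups often use consistent hashing
        # so adding a box doesn't reshuffle everyone.
        h = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
        return SERVERS[h % len(SERVERS)]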
Math on peak RPS doesn't work that way; it's extremely non-uniform. I agree that 10K rps wouldn't be sustained for an hour, but I bet they hit it at points, or get close to it. 1K rps is likely something they exceed for tens of minutes every day.
At any rate they should be able to serve that easily off one box, so the architecture holds up.
6M per day is an average of 70rps. It would be highly unusual to have a 100x variation from the mean on such a large traffic volume.
Let's imagine that all the traffic occurs in a single hour (6M per hour) and is Poisson distributed. Even then to reach 10,000 rps would require a second with 6x the average traffic volume for that hour. Admittedly this could theoretically occur when some correlated event causes everyone to browse HN, but historically the site goes down during those.
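Putting numbers on that (assuming requests in that hypothetical hour arrive as a Poisson process):

    from math import sqrt
    from statistics import NormalDist

    requests = 6_000_000
    lam = requests / 3_600            # ~1,667 req/s average if all daily traffic fell in one hour

    # Chance of any single second seeing >= 10,000 requests, via a normal
    # approximation to the Poisson(lam) distribution:
    z = (10_000 - lam) / sqrt(lam)    # ~204 standard deviations above the mean
    p = 1 - NormalDist().cdf(z)       # effectively 0 at double precision
    print(f"{lam:.0f} req/s average, z = {z:.0f}, tail probability ~ {p}")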
For comparison our service (also largely US-centric) served 30M views yesterday with a peak volume of ~1000rps.
I'm going to assume, since you refer to exponential distributions in one part of your response, that you're familiar with this topic, even though you refer to averages in another part. Averages are, as you likely know, completely irrelevant when it comes to predicting peak load.
Yes, 10K is likely an overestimate by some factor, but it's in the realm of the possible. We can argue how much of an overestimate, but I don't think this takes away from the main point I was trying to make above, which is that they are very unlikely to exceed the serving capacity of a single host (I don't know how fast Arc is, but a well-tuned serving system should be able to handle 40K rps for simple-to-cache content, which HN surely qualifies as).
Look at okws.org sometime. It gave okcupid a significant cost advantage over competitors by letting them run on a lot less hardware. I've never used it but have always admired it.
HN is a niche internet community. Niche internet communities have almost always existed despite the "quality" of the tools they run on. It's not really fair to compare to any sort of consumer/business product that's in an extremely competitive marketplace.
HN can do an incredible amount of easy caching. Besides a small box in the top right with user profile info, the entire page is completely identical for everyone.
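A toy sketch of that whole-page caching pattern (HN itself is written in Arc; this Python, and names like render_page and user_box, are purely illustrative):

    import time

    CACHE_TTL = 30          # seconds; illustrative, not HN's actual policy
    _page_cache = {}        # path -> (rendered_html, timestamp)

    def get_page(path, render_page, user=None):
        # Serve one cached rendering to everyone, then splice in the small
        # per-user bits (username, karma) at request time.
        html, ts = _page_cache.get(path, (None, 0.0))
        if html is None or time.time() - ts > CACHE_TTL:
            html = render_page(path)                  # the expensive part, done once per TTL
            _page_cache[path] = (html, time.time())
        if user is not None:
            html = html.replace("<!--userbox-->", user_box(user))
        return html

    def user_box(user):
        return f'<span class="user">{user["name"]} ({user["karma"]})</span>'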
It looks like HN gets about 1-2 submissions per minute and ~10s of comments per minute, although I imagine the daily Joe Rogan argument attracts more than the average. Then there’s the votes. I would ballpark the average mutation rate at ~1/s. Who would doubt it is possible to host this on one box?
> Who would doubt it is possible to host this on one box?
People who've never stood up a "bare-metal" server in their life. Who weren't around to see the kind of traffic one shitty late-90s commodity-parts server with slow memory and maybe two processors if you're very lucky and spinning-rust drives could handle when tuned properly, without horrendously bloated software, and without being subjected to a pile of bad DB queries written by people who haven't the first clue what they're doing.
Hear, hear! As much as I adore setting up and managing systems that scale well, not everything needs massive clusters of hardware run by software that's designed to scale to the millions. Skilled system administration goes a long way.
It probably didn't come across in my post (I see on re-reading it) but I wasn't trying to shit on people who haven't had that kind of experience. I think VM/container performance in modern software ecosystems (and especially on shared hardware, which is typical "in the cloud") is so bad that lots of people just genuinely don't realize what kind of workload the hardware itself is capable of serving, and so are consistently surprised when a single server—much faster than a typical server circa, say, 1999—is happily serving a not-even-that-large workload. It's not their fault, just a difference in experience and some... very good marketing, I'd say.
>People who've never stood up a "bare-metal" server in their life. Who weren't around to see the kind of traffic one shitty late-90s commodity-parts server with slow memory and maybe two processors if you're very lucky and spinning-rust drives
While crossing the Atlantic Ocean there were only a few domains whitelisted for our Iridium Go satellite modem. Email, obviously. Predictwind, for weather, and Hacker News.
Bandwidth maxed out at 2400 bps (yes, really) and HN was the only news site I trusted not to overload my dripping faucet of an internet connection. The modem's built-in browser stripped HN down to the bare minimum to be legible.
I really appreciate the thought that goes into making a lightweight, information heavy site.
PSA: Pi 4 + M.2 SATA SSD + USB3 adapter is blisteringly fast. No micro SD card needed any more. Runs Ubuntu x64 and Docker.
Even better with a heat-dissipating case with an integrated M.2 SATA port in the bottom, like the Argon ONE M.2.
I use mine to run multiple .NET 6 microservices, and it's replaced an entire HP Microserver. The entire Pi setup will have paid for itself in under a year in energy savings alone.
Are there any "gotchas" in this setup? I remember trying to set up my Pi 4 to boot from an external SSD a while back, and there were a few annoying manual steps to deal with, so I just decided to use an SD card instead. But nowadays I'm running so much on my Pi (pi.hole, a Minecraft server, a Jellyfin instance, a VPN for all my devices) that I'm getting worried about SD wear. Curious if you have an SSD recommendation (I'd love to get 2TB of storage or so on there) as well.
Key to that is "x64", though I assume you meant Arm 64.
I recently had an important infrastructure service die a dumpster fire of a death because it uses Mongo, which quietly disables its journal on 32-bit operating systems. (Restored from backup, fwiw.)
I would love to contribute night mode to HN. It's my least favorite thing about reading the fantastic discussions here. Anyone have ideas on how I can go about doing that?
While not built-in to HN, I've been reading it in dark mode for so long I didn't even realize until your comment. Try the Dark Reader extension. Fantastic at transforming webpages into dark mode most of the time, without making it look like a mess. Open source. Available on Firefox and Chrome.
I used to enjoy userscripts + Greasemonkey until the extension was removed from the Chrome browser store. Tampermonkey on Firefox fell out of favor once Dark Reader became my default, but I didn't know uBlock Origin could be used this way.
On Android, I've been using Kiwi Browser [0]. It's Chrome but supports extensions from the webstore like desktop chrome, and has a dark mode toggle in the menu that applies custom CSS to webpages. Also has devtools, bottom navigation bar, and a setting to disable tab grouping.
I am still baffled by how many people just assume that running everything on serverless/Kubernetes/edge nodes etc. is always best for performance... because it's 2022. For example, people talk about deploying things on Vercel because of the performance benefits. I found Vercel was waaay slower than just hosting my Next.js app (with its backend API) on a VPS. (Even after getting past the cold starts on Vercel.) Just throwing it on a single server (even if it's far away) made the app feel so incredibly snappy and fast.
There's both a performance cost and a complexity cost. Yes, if you are Google/Amazon/FB scale you basically have to pay the complexity cost because of how many users you have, how much money downtime costs, etc. Most businesses are not nearly on that scale. Last I knew, StackOverflow was running on 2 web servers and one DB server or the like. Most companies are not even at SO's scale.
I thought the advantages of those things are deployment and reproducibility. Otherwise it's easy to find yourself in a situation like my last job, where unit tests only passed on the CEO's laptop and nobody knew why.
Well yes, that's a part of it. (I solved that problem by deploying with Docker, personally.) But I think beyond that there's also this idea that it will be faster on serverless.
I'm curious about the database tech behind HN. I've heard before that it's Firebase, but that struck me as odd. Anyone have some enlightening comments from the past? Not sure what to type in the Algolia search.
There is no database, except in the very general sense. Hacker News stores data as Arc Lisp tables in flat files or in memory; in fact, the forum is meant to serve as an MVP for the language itself.
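The general pattern looks something like this (a toy Python sketch; the real thing is written in Arc, and the file layout and names here are made up):

    import json, os

    DATA_DIR = "items"     # hypothetical layout: one small flat file per item
    items = {}             # in-memory table: id -> item dict

    def save_item(item):
        # Write-through: update the in-memory table and persist to disk,
        # so a process restart can reload everything from the flat files.
        items[item["id"]] = item
        with open(os.path.join(DATA_DIR, str(item["id"])), "w") as f:
            json.dump(item, f)

    def load_all():
        # On startup, rebuild the in-memory table from disk.
        for name in os.listdir(DATA_DIR):
            with open(os.path.join(DATA_DIR, name)) as f:
                item = json.load(f)
                items[item["id"]] = item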
See my previous comment in this thread or search "Arc Lisp" and you can find the current distribution of the code and forum software HN is based on. But bear in mind Hacker News itself is somewhat proprietary for business reasons.
I assume not, because every comment made would not be visible until the cache expired. Unless they have a way to expire the cache (which is not dependable on most CDNs?).
It is now synthesized on global platforms to seize the viral e-markets and aggregate brand schemas on the bleeding edge despite all the challenges in the supply chain.
edit: sorry, was joking. I don't know anything about anything.
https://news.ycombinator.com/item?id=28478379
Also, it's still using the Arc programming language
https://en.wikipedia.org/wiki/Arc_(programming_language)
https://news.ycombinator.com/item?id=23483639