HN ran on a single box in 2018. Has anything changed? (news.ycombinator.com)
155 points by ColinWright 1 day ago | 101 comments





dang confirmed a few months ago it's still the same:

https://news.ycombinator.com/item?id=28478379

Also, it's still using the Arc programming language:

https://en.wikipedia.org/wiki/Arc_(programming_language)

https://news.ycombinator.com/item?id=23483639


Pft. They have two boxes, a master & standby. Real gamers play on permadeath.

Actually HN is served from Sam Altman's fitbit now.

The current distribution of Arc Lisp can be found at https://arclanguage.org. It ships with a forum that isn't up to date and doesn't have feature parity with HN, which is something of a black box of its own because Real Money is involved, and Real Money ruins everything.

Also, there is a public fork of Arc Lisp at https://arclanguage.org/anarki, which is not guaranteed to be stable and ships with a forum that is definitely not at feature parity with HN.


With that in mind I wonder what the uptime % of HN is, and if the master/standby setup involves automatic failover.

Occasional page load failure while logged in? Pretty common. Complete or nearly complete failure for logged-in users? Happens sometimes. Completely down such that not even a logged-out user can read it? I'm not sure I've ever seen that.

Has it bothered anyone? Probably not.

In all my years, HN has been down just once.

I wonder if Reddit is ever going to fix their "you broke reddit" thing. I probably see that page a couple dozen times a year, and it's been going on since, what, 2014? Earlier maybe? It's like the Twitter fail whale. Except Twitter fixed their site.

Don't you mean a couple dozen times per day?

I notice short-term outages every couple of months. They usually don't last long.

I use HN a lot and it's very rare for it to be down, and then only for a short period of time. (Not a very exact metric.)

I do get an occasional non-response from Firebase, but my guess is that this is really down to my ISP. Again, that is also quite rare.


I figured HN probably ran on an old desktop tower under somebody's desk, with a post-it note on it that says "DO NOT REBOOT!!!!"

In fact, restarting the server is or was the only way to fix at least one bug on HN.[1] Pretty sure I've seen it as late as last year, and dang confirmed the fix was still the same.

(I know, restarting the server != rebooting the machine.)

[1] https://news.ycombinator.com/item?id=23228692


This reminds me of my old workplace: a government-owned company where everything ran on MS Excel, SAP, fax machines [0], and prayers from the IT operations office.

I had to go into one of the more "exotic" server rooms, down in a basement, and saw a very old desktop stuffed at the bottom of a rack. Covered with dust, open side panels, plastic turned yellow; you know what old, forgotten hardware looks like. It had a new-ish post-it with the message: "this runs all of the fax infrastructure. you turn it off, you explain it to the CEO".

[0] https://en.wikipedia.org/wiki/Fax


I worked for a company that had acquired a B-list, top-20 MMORPG. Its database was MSSQL and had been running 24/7 for five years, with no way to back it up, no safety net, and an increasingly pissed-off player community suffering degrading service. Everyone was terrified to reboot.

They started bouncing paychecks before I got to see the drama come to a head, but ever since I've been obsessive about proper levels of redundancy. Don't build fragility into a system, and at least leave yourself a means to fail gracefully.



Lol, yes. Maybe we’ve worked at the same places.

There is something comical about this: HN receives an absurd amount of traffic, more than startups and corporations with millions of customers and millions if not billions of dollars behind them, and yet it runs on what is effectively an Arduino Nano compared to what they are using. HN serves all of its content significantly faster than any of them, at a fraction of the cost. Take my old job, for instance: the company website had almost nothing on it, a list of products, a jobs page, and that's it. That website ran on 12 (yes, twelve) Dell servers, each with a four-socket motherboard, each socket occupied by a 12-core Xeon, and each server fitted with 384 GB of RAM. Sure, it tracked every click and whatnot, but the tracked data was never actually used, and the all-beloved bundle.min.js was 15 megabytes. So much for low carbon footprint and all that...

> absurd amount of traffic

Not sure what platforms you've worked on, but "6M requests/day" is peanuts in my neighborhood. (No offense to dang and team, who do a fantastic job, but that's not high throughput compared to almost any platform I've worked on over the past decade.)


I’d measure by users and not requests. HN does in 2 requests what other sites do in 50, so it’s easier to serve more users.

That's definitely less than I'd expected, and now it doesn't feel all that magical that it's running from a single box.

I have a site with ~2.5M req/day running on an m3.medium instance that rarely spikes above 10% CPU. NGINX is the frontline server, probably handling a good percentage of those requests directly before passing the rest on via proxy.


Also you have to imagine the bulk of the traffic is for the same small bits of data (the homepage, the top stories' comment pages, etc.).

It’s not trivial to take advantage of that, but it is still much easier to handle than QPS distributed across your whole dataset (i.e. Facebook or Twitter, where everybody sees something different).
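To make that concrete, here is a minimal sketch (Python, with invented names; not HN's actual code) of the kind of hot-page cache that works when nearly everyone requests the same handful of pages:

    import time

    # Hypothetical hot-page cache: most traffic hits the same few pages,
    # so caching their rendered HTML and purging on write goes a long way.
    class HotPageCache:
        def __init__(self, ttl_seconds=30):
            self.ttl = ttl_seconds
            self.store = {}  # page key -> (rendered_html, cached_at)

        def get(self, key, render_fn):
            hit = self.store.get(key)
            if hit is not None:
                html, cached_at = hit
                if time.time() - cached_at < self.ttl:
                    return html
            html = render_fn()  # miss: render once, serve many times
            self.store[key] = (html, time.time())
            return html

        def invalidate(self, key):
            # Called when a new comment or vote changes a page, so readers
            # see it immediately instead of waiting for the TTL to lapse.
            self.store.pop(key, None)

    cache = HotPageCache()
    html = cache.get("/news", lambda: "<html>rendered front page</html>")

Purging on write is what keeps new comments visible without giving up the cache entirely.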


You can't count all the requests wasted on assets when comparing the requests/day metric.

Much, much larger, albeit distributed across multiple datacenters and completely isolated environments. The site I gave as an example was possibly serving around 10-20M/day, and it isn't an isolated case: I know plenty of companies that allocate resources like that for something similar.

But that's all part of the overall achievement -- 6M requests/day is a key part of the optimisation, as is how much real-world "traffic" it corresponds to.

On the other hand, it is not much content to begin with.

Traffic is a bit of text and extremely basic CSS; no images, no tracking, no bullshit means that load times are probably near-instantaneous even on a 56k modem.

Content is a bit of plain text => easy rendering, and the entire database would probably fit into memory, so there's not much going on in terms of I/O.
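Back-of-envelope on the fits-into-memory claim (assumed numbers, not official ones):

    # Rough estimate (assumptions: ~30M item IDs as of early 2022,
    # and a typical story/comment well under a kilobyte of text).
    items = 30_000_000
    avg_bytes_per_item = 500
    total_gb = items * avg_bytes_per_item / 1e9
    print(f"~{total_gb:.0f} GB")  # ~15 GB: fits in RAM on one ordinary box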


> Traffic is a bit of text and extremely basic CSS; no images, no tracking, no bullshit means that load times are probably near-instantaneous even on a 56k modem.

In other words: they are serving exactly what the people who come here are looking for, and little else. Don't get me wrong, there are cases where images and moderately more complex CSS improve a website. On the other hand, many websites go to an extreme that degrades the user experience.


Meh, I'd kill for a better HN experience.

It's also gotta be super read-heavy, so traditional caching methods would work well.

Only on the database side; most people here will be logged in, which makes rendered-page caches much less useful.

What I have utterly no idea about is how the karma system works. Updating karma counts and comment sorting only on new submissions would help with performance, but on the other hand karma "decays" over time, so does there need to be some sort of external cron?


You could probably use a weight system: yesterday's upvote is worth a value of 1, today's upvote a value of 2, and so on. Then store the absolute count in another column/field if you need that.

I would expect fragment caching to exist.

Maybe the decay is coded into the render logic for the homepage, with (now() - postedTime) kind of logic.
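That would match the ranking formula pg published years ago: nothing stored ever decays; the rank is recomputed from the item's age at render time, so no cron is needed. A sketch, assuming the classic published constants:

    import time

    # Sketch of render-time decay (the gravity formula pg published long ago;
    # the real code reportedly layers extra penalties on top of this).
    def rank(points, posted_at_epoch, gravity=1.8):
        age_hours = (time.time() - posted_at_epoch) / 3600
        return (points - 1) / (age_hours + 2) ** gravity

    # Sorting the front page by rank() at render time makes older stories
    # sink on their own, with no external job rewriting stored scores.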

Database?

It also fails gracefully. Logged-in users may find the site unusable while it still loads for logged-out readers just fine (from cache).

6M requests a day. Assuming a "US East Coast dominates" traffic pattern which is typical for this kind of site, that means their rps probably spikes north of 10K, but I'd be surprised if it's over 50K.

I disagree that startups with millions of customers handle less traffic. There are probably some exceptions that prove the rule, but once you're into millions of customers you are most likely over 50K rps at peak times.


Most startups with "millions of customers" can also trivially shard their traffic so it can effectively run on a bunch of largely-independent boxes, with some added redundancy for reliability's sake. No need for anything more complex.

Let's say "trivially shard most of their traffic" and I can agree.

Product managers are constantly coming up with cool ideas that are hard to shard.


10k rps would be 36M requests in an hour, so the peak is likely far lower than that, probably closer to 1k/s.

Math on peak RPS doesn't work that way; it's extremely non-uniform. I agree that 10K rps wouldn't be sustained for an hour, but I bet they hit it at points, or get close to it. 1K rps is likely something they exceed for tens of minutes every day.

At any rate they should be able to serve that easily off one box, so the architecture holds up.


6M per day is an average of 70rps. It would be highly unusual to have a 100x variation from the mean on such a large traffic volume.

Let's imagine that all the traffic occurs in a single hour (6M per hour) and is Poisson distributed. Even then to reach 10,000 rps would require a second with 6x the average traffic volume for that hour. Admittedly this could theoretically occur when some correlated event causes everyone to browse HN, but historically the site goes down during those.
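Putting a number on that (a quick sketch with scipy; the all-traffic-in-one-hour figure is the deliberately extreme assumption above):

    from scipy.stats import poisson

    # Extreme assumption from above: all 6M daily requests land in one hour.
    mean_rps = 6_000_000 / 3600            # ~1667 requests/second on average
    p_spike = poisson.sf(9_999, mean_rps)  # P(a given second sees >= 10,000)
    print(p_spike)                         # effectively 0 for independent arrivals

For independent arrivals the tail is unreachably far out; correlated bursts, where everyone refreshes at once, are exactly where the Poisson assumption breaks, as noted.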

For comparison our service (also largely US-centric) served 30M views yesterday with a peak volume of ~1000rps.


I'm going to assume, since you refer to exponential distributions in one part of your response, that you're familiar with this topic, even though you refer to averages in another part. Averages are, as you likely know, completely irrelevant when it comes to predicting peak load.

Yes, 10K is likely an overestimate by some factor, but it's in the realm of the possible. We can argue about how much of an overestimate, but I don't think this takes away from the main point I was trying to make above, which is that they are very unlikely to exceed the serving capacity of a single host. (I don't know how fast Arc is, but a well-tuned serving system should be able to handle 40K rps for simple-to-cache content, which HN surely is.)


Look at okws.org sometime. It gave OkCupid a significant cost advantage over competitors by letting them run on a lot less hardware. I've never used it, but I've always admired it.

HN is a niche internet community. Niche internet communities have almost always existed despite the "quality" of the tools they run on. It's not really fair to compare to any sort of consumer/business product that's in an extremely competitive marketplace.

HN can do an incredible amount of easy caching. Besides a small box in the top right with user profile info, the entire page is completely identical for everyone.


Only for logged-out users. Logged-in users need to show votes, reply links, etc.

Mark Twain would make fun of us, but luckily he's not around.

Yet the audience brags about absurd complexity in the name of scalability, availability, metrics, etc. generation after generation.

It looks like HN gets about 1-2 submissions per minute and tens of comments per minute, although I imagine the daily Joe Rogan argument attracts more than the average. Then there are the votes. I would ballpark the average mutation rate at ~1/s. Who would doubt it is possible to host this on one box?
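Back-of-envelope on that ballpark (all of the per-minute inputs are guesses):

    # Rough inputs (assumptions, not measurements):
    submissions_per_min = 2
    comments_per_min = 30    # "tens of comments per minute"
    votes_per_min = 60       # guess: votes outnumber comments a few to one
    mutations_per_sec = (submissions_per_min + comments_per_min + votes_per_min) / 60
    print(f"~{mutations_per_sec:.1f} mutations/second")  # ~1.5/s, i.e. order 1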

> Who would doubt it is possible to host this on one box?

People who've never stood up a "bare-metal" server in their life. Who weren't around to see the kind of traffic one shitty late-90s commodity-parts server, with slow memory, maybe two processors if you were very lucky, and spinning-rust drives, could handle when tuned properly, without horrendously bloated software, and without being subjected to a pile of bad DB queries written by people who haven't the first clue what they're doing.


Hear, hear! As much as I adore setting up and managing systems that scale well, not everything needs massive clusters of hardware run by software that's designed to scale to the millions. Skilled system administration goes a long way.

It probably didn't come across in my post (I see that on re-reading it), but I wasn't trying to shit on people who haven't had that kind of experience. I think VM/container performance in modern software ecosystems (and especially on shared hardware, which is typical "in the cloud") is so bad that lots of people just genuinely don't realize what kind of workload the hardware itself is capable of serving, and so are consistently surprised when a single server, much faster than a typical server circa, say, 1999, is happily serving a not-even-that-large workload. It's not their fault, just a difference in experience and some... very good marketing, I'd say.

> People who've never stood up a "bare-metal" server in their life. Who weren't around to see the kind of traffic one shitty late-90s commodity-parts server, with slow memory, maybe two processors if you were very lucky, and spinning-rust drives

For example, what ibm.com ran on in 1998 (<https://en.wikipedia.org/wiki/File:IBM_RS6000_AIX_Servers_IB...>)


I think it is more than that. On Feb 4 (CST) I counted 14,796 submissions (comments and stories), so it is more like 10 per minute.

While crossing the Atlantic Ocean, there were only a few domains whitelisted on our Iridium Go satellite modem: email, obviously; PredictWind, for weather; and Hacker News.

Bandwidth maxed out at 2400 bps (yes, really), and HN was the only news site I trusted not to overload my dripping faucet of an internet connection. The modem's built-in browser stripped HN down to the bare minimum needed to be legible.

I really appreciate the thought that goes into making a lightweight, information heavy site.


You should try http://lite.cnn.com/

Have you checked out anything like gopher or the competing modern protocols (whose names escape me)?

They might serve you well if you find some sites that interest you.


YC has fallen on hard times. HN now runs on a Raspberry Pi.

Let's pool up funds and buy YC a flash drive to backup the SD card in case of a file system corruption.

PSA: Pi 4 + M.2 SATA SSD + USB3 adapter is blisteringly fast. No micro SD card needed any more. Runs Ubuntu x64 and Docker.

Even better with a heat-dissipating case with an integrated M.2 SATA port in the bottom, like the Argon ONE M.2.

I use mine to run multiple .NET 6 microservices, and it's replaced an entire HP Microserver. The entire Pi setup will have paid for itself in under a year in energy savings alone.


Are there any "gotchas" in this setup? I remember trying to set up my Pi 4 to boot from an external SSD a while back and there were a few annoying manual processes to deal with so I just decided to use a SD card instead. But nowadays I'm running so much on my Pi (pi.hole, minecraft server, jellyfin instance, vpn for all my devices) that I'm getting worried about SD wear. Curious if you have an SSD recommendation (I'd love to get 2TB of storage or so on there) as well.

Key to that is "x64", though I assume you meant Arm 64.

I recently had an important infrastructure service die a dumpster fire of a death because it uses Mongo, which quietly disables its journal on 32-bit operating systems. (Restored from backup, FWIW.)


Raspberry Pi now supports 64-bit (though not amd64).

What version of MongoDB are you running? The mainstream MongoDB storage engine (WiredTiger) only supports 64-bit systems.

How'd they manage to get a hold of one?!?

I heard it was a thumbdrive.

It'd be more appropriate to fall back to a toaster.

I would love to contribute night mode to HN. It's my least favorite thing about reading the fantastic discussions here. Anyone have ideas on how I can go about doing that?

Here's a lengthy discussion on HN dark mode, with the site admins chipping in:

https://news.ycombinator.com/item?id=23197966


While not built-in to HN, I've been reading it in dark mode for so long I didn't even realize until your comment. Try the Dark Reader extension. Fantastic at transforming webpages into dark mode most of the time, without making it look like a mess. Open source. Available on Firefox and Chrome.

Bonus points: available in Firefox mobile, too.


Add one of the following to uBlock Origin's 'My Filters':

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...


Whoa, this is helpful.

I used to enjoy userscripts + Greasemonkey until the extension was removed from the Chrome web store. Tampermonkey for Firefox fell out of favor once Dark Reader became my default, but I didn't know uBlock Origin could be used this way.


Check this one out to stop youtube embeds from loading until clicked:

||youtube.com^$3p,frame,redirect=click2load.html

||youtube-nocookie.com^$3p,frame,redirect=click2load.html

Got this from gorhill, uBlock Origin's developer. He sometimes puts out tips and tricks on his Twitter:

https://twitter.com/gorhill

https://nitter.net/gorhill


Perhaps email dang? He's very responsive and helpful.

I'd love this also, but it's outside of my domain to attempt to pitch in.


Dark theme is left as an exercise for the reader.

Browser plugin which applies custom CSS?

Plugins tend to make browsers more unique.

What about on mobile?

On Android, I've been using Kiwi Browser [0]. It's Chrome but supports extensions from the webstore like desktop chrome, and has a dark mode toggle in the menu that applies custom CSS to webpages. Also has devtools, bottom navigation bar, and a setting to disable tab grouping.

[0] https://play.google.com/store/apps/details?id=com.kiwibrowse...


Use Firefox with uBlock Origin and add one of the following to its 'My Filters' section:

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...


On Android Firefox, "Dark Reader" supports HN.

Firefox Mobile supports plugins. You save a ton of battery life just by running uBlock Origin with it.

Firefox on Android has some plugins that can do this.

On iOS, I use Smart Invert.

There are a number of HN readers:

* https://hckrnews.com/ -> my favorite thread browser

* https://hackerweb.app/ -> a mobile focused app w/ dark mode


I am still baffled by how many people assume that running everything serverless/Kubernetes/edge nodes etc. is always best for performance... because it's 2022. For example, people talk about deploying things on Vercel because of the performance benefits. I found Vercel was waaay slower than just hosting my Next.js app (with its backend API) on a VPS, even after getting past the cold starts on Vercel. Just throwing it on a single server (even one far away) made the app feel incredibly snappy and fast.

Both performance and complexity have costs. Yes, if you are at Google/Amazon/FB scale you basically have to pay the complexity cost, because of how many users you have and how much money downtime costs. Most businesses are nowhere near that scale. Last I knew, Stack Overflow was running on two web servers and one DB server, or thereabouts. Most companies are not even at SO's scale.

I thought the advantages of those things were deployment and reproducibility. Otherwise it's easy to find yourself in a situation like at my last job, where unit tests only passed on the CEO's laptop and nobody knew why.

Well yes, that's part of it. (I solved that problem by deploying with Docker, personally.) But I think beyond that there's also this idea that it will be faster on serverless.

I'm curious to find out as well.

I feel most websites can run on a single instance with decent caching. But owners want reliability and that adds complexity, cost, and technical debt.


> But owners want reliability and that adds complexity, cost, and technical debt.

Which reduces reliability


So you find an SRE to manage your enormously complex, responsive site.

I'm curious about the database tech behind HN. I've heard before that it's Firebase, but that struck me as odd. Anyone have some enlightening comments from the past? Not sure what to type in the Algolia search.

There is no database, except in the very general sense. Hacker News stores data as Arc Lisp tables in flat files or in memory; in fact, the forum was meant to serve as an MVP for the language itself.

See my previous comment in this thread or search for "Arc Lisp" and you can find the current distribution of the code and the forum software HN is based on. But bear in mind that Hacker News itself is somewhat proprietary, for business reasons.



That’s just the API. I no longer work on HN, but as of 2016 it was still flat files (s-exprs) on a ZFS dataset.
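A toy sketch of that storage model (Python standing in for Arc, with a file layout invented for illustration): each item is an s-expression in its own flat file, and the working set is served from an in-memory table:

    import os

    DATA_DIR = "items"  # invented layout: one flat file per item
    os.makedirs(DATA_DIR, exist_ok=True)

    def save_item(item_id, fields):
        # Serialize a dict as a trivial s-expression,
        # e.g. (("by" "pg") ("title" "Y Combinator"))
        sexpr = "(" + " ".join(f'("{k}" "{v}")' for k, v in fields.items()) + ")"
        with open(os.path.join(DATA_DIR, str(item_id)), "w") as f:
            f.write(sexpr)

    save_item(1, {"by": "pg", "title": "Y Combinator"})
    # On startup everything is read back into an in-memory table and served
    # from RAM; the flat files are just the durable copy (on ZFS, per above).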

It's now deployed to fly.io because edge nodes are important. /s

They should deploy on us! Dan! Call me!

Would that be less work than running one server?

Does HN use a heavy cache layer?

I assume not, because then a new comment would not be visible until the cache expired. Unless they have a way to purge the cache (which is not dependable on most CDNs?).


HN was upgraded incrementally over the years.

It is now synthesized on global platforms to seize the viral e-markets and aggregate brand schemas on the bleeding edge despite all the challenges in the supply chain.

edit: sorry, was joking. I don't know anything about anything.



