- 10 BTC sounds like a lot, but it's peanuts for such a large data set.
- 750k rows of sample data are large enough to count as a leak by themselves; many on reddit/twitter/fediverse have already started exploring the data set for gender ratio, age composition, frequency of rape cases, etc.
Apparently there was a "blogpost" of a developer showing off their code, where they accidentally leaked access tokens in a piece of commented-out code: https://archive.ph/mP3bh
This is completely unverified though, so take it with a grain of salt.
Starting today, this will be known as "Shanghai'd credentials" and be reason #1 why we use ephemeral credentials (e.g. AWS STS/SSO) rather than static credentials (e.g. IAM Users)
We got rid of all IAM users used by applications and moved to role-based access. Nowhere in the application do you need to enter AWS credentials. AWS SDK will attempt to discover short-lived credentials for you and will assume the role specified at the infrastructure layer, e.g. in a task definition.
One of the major benefits of ephemeral tokens is that they become less attractive to put into the code, and more attractive to put in a config file/vault that's easier to update and keep secret. This in itself is useful because it makes it less likely that the token ends up in some source file someone shows off, or gets pushed to a remote repo whose permissions at some point let other people see it.
Yes, but credentials should either be long lived with (very) limited scope _or_ short lived with required scope.
For example, for AWS you can create long lived credentials for users which are scoped to only allow one operation, namely obtaining a short lived token (with the aid of a hardware token such as a Yubikey) with scope to perform other operations.
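A minimal sketch of the two policy documents this pattern implies, written as Python dicts so they can be dumped to JSON. The account ID, user name and role name are placeholders, not values from the thread. The sketch assumes the MFA requirement lives in the role's trust policy via the `aws:MultiFactorAuthPresent` condition key.

```python
import json

ACCOUNT_ID = "123456789012"  # placeholder account ID (assumption)
USER_ARN = f"arn:aws:iam::{ACCOUNT_ID}:user/alice"  # hypothetical user
ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/deployer"  # hypothetical role

# Identity policy attached to the long-lived IAM user: the only thing
# the static key is allowed to do is ask STS for the role.
user_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "sts:AssumeRole", "Resource": ROLE_ARN}
    ],
}

# Trust policy on the role: only that user may assume it, and only
# when the AssumeRole call was made with MFA.
role_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": USER_ARN},
            "Action": "sts:AssumeRole",
            "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
        }
    ],
}

print(json.dumps(user_policy, indent=2))
print(json.dumps(role_trust_policy, indent=2))
```

The permissions needed for real work would then be attached to the role, not to the user.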
You add the long-lived IAM user API key/secret to aws-vault, and it stores it in password-protected storage (macOS Keychain or similar).
Then you invoke aws-vault with an IAM role and command, and it will handle obtaining short-lived credentials scoped to that role (including TOTP 2-factor code auth), and then run the command with those temporary credentials as env vars.
With the right AWS permissions on your user, it can also automatically rotate the IAM user API keys for you.
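The last step of that flow, running a command with temporary credentials as env vars, can be sketched roughly as follows. This is not aws-vault's actual implementation; `run_with_temp_credentials` is a hypothetical helper, and the credential values are fakes standing in for what an STS AssumeRole call would return.

```python
import os
import subprocess
import sys


def run_with_temp_credentials(creds, command):
    """Run `command` with short-lived AWS credentials set only for the child."""
    child_env = dict(os.environ)
    child_env.update(
        {
            "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
            "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
            "AWS_SESSION_TOKEN": creds["SessionToken"],
        }
    )
    # The parent's own environment is left untouched.
    return subprocess.run(
        command, env=child_env, capture_output=True, text=True, check=True
    )


# Fake values in the shape of an STS "Credentials" response (assumption).
fake_creds = {
    "AccessKeyId": "ASIAEXAMPLE",
    "SecretAccessKey": "example-secret",
    "SessionToken": "example-token",
}

# Stand-in for the real command: a child process that prints the token it sees.
result = run_with_temp_credentials(
    fake_creds,
    [sys.executable, "-c", "import os; print(os.environ['AWS_SESSION_TOKEN'])"],
)
print(result.stdout.strip())
```

Because the credentials expire, a leaked copy of the child's environment is only useful for a short window.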
It can either use a secret injected into an env var to bootstrap rotating ephemeral/refresh tokens or use a role provided by the environment (which can also provide short lived tokens), depending on your runtime environment and use case (on prem, cloud, k8s, etc).
Static, long lived secrets with limited governance that have no conditional access guards are weapons of mass self destruction.
Keeping secrets in environment variables has always seemed dodgy to me. Unless specifically cleared, they get inherited by all child processes. Maybe there are never any child processes in your application, or that could be desired behavior in some circumstances, but generally it seems like asking for trouble.
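A small sketch of the inheritance issue and one way to scrub it, using only the standard library. `DEMO_API_TOKEN` is a made-up variable name used for illustration.

```python
import os
import subprocess
import sys

SECRET_NAME = "DEMO_API_TOKEN"  # hypothetical variable name
os.environ[SECRET_NAME] = "s3cr3t"

# Child process that reports whether it can see the secret.
probe = [
    sys.executable,
    "-c",
    f"import os; print(os.environ.get({SECRET_NAME!r}, 'absent'))",
]

# Default behaviour: the child receives a copy of the parent's environment.
inherited = subprocess.run(
    probe, capture_output=True, text=True, check=True
).stdout.strip()

# Scrubbed: pass an explicit environment with the secret removed.
clean_env = {k: v for k, v in os.environ.items() if k != SECRET_NAME}
scrubbed = subprocess.run(
    probe, env=clean_env, capture_output=True, text=True, check=True
).stdout.strip()

print(inherited, scrubbed)
```

The scrubbing has to be done at every place a child is spawned, which is part of why the default feels risky.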
There's also the reverse issue - if they change after your process is started.
Refreshing an environment variable that has changed is (for me) a line I won't cross. Time to write the app a different way, once that becomes a concern.
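One "different way" is to read the secret from a file that the platform or a sidecar rewrites on rotation, and re-read it whenever the file changes. A minimal sketch, assuming a hypothetical `FileSecret` helper and a temp file standing in for the mounted secret:

```python
import os
import tempfile


class FileSecret:
    """Reads a secret from a file and re-reads it when the file changes."""

    def __init__(self, path):
        self._path = path
        self._stamp = None
        self._value = None

    def get(self):
        st = os.stat(self._path)
        stamp = (st.st_mtime_ns, st.st_size)
        if stamp != self._stamp:
            # File is new or has changed since the last read: reload it.
            with open(self._path, encoding="utf-8") as fh:
                self._value = fh.read().strip()
            self._stamp = stamp
        return self._value


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "api_token")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("token-v1\n")

    secret = FileSecret(path)
    first = secret.get()

    # Simulate a rotation while the process keeps running.
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("token-v2-rotated\n")
    second = secret.get()

print(first, second)
```

The change check here relies on modification time and size, so a rotation that keeps both identical would be missed; a stricter version could hash the contents.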
You may also set up federated (trusted) relationships. For example, a GitHub Workflow can be trusted to assume an IAM role. In that scenario, there's no long-lived secret in scope.
The OIDC subject claim encodes the GitHub org, repo, and branch or environment, which the IAM role's trust policy can match or filter on.
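As a rough illustration, a trust policy that only lets workflows from one branch of one repo assume the role might look like this when built in Python. The account ID, org, repo and branch are placeholders, and `github_trust_policy` is a hypothetical helper.

```python
import json

ACCOUNT_ID = "123456789012"  # placeholder account ID (assumption)
OIDC_HOST = "token.actions.githubusercontent.com"


def github_trust_policy(org, repo, branch):
    """Build a role trust policy pinned to one GitHub org/repo/branch."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{ACCOUNT_ID}:oidc-provider/{OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{OIDC_HOST}:aud": "sts.amazonaws.com",
                        # The subject claim carries the org, repo and ref.
                        f"{OIDC_HOST}:sub": f"repo:{org}/{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


policy = github_trust_policy("example-org", "example-repo", "main")
print(json.dumps(policy, indent=2))
```

Swapping `StringEquals` for `StringLike` with a wildcard in the subject would loosen the match to, say, every branch of the repo.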
This is not at all the takeaway from this. It's "this shitty developer should not have had access to this data in the first place". With a nuance of "this database probably shouldn't exist in this form in one place to begin with".
I don't believe this comment is made in good faith, there is nothing wrong with the "right" and it's senselessly adding fuel to our political division.
There is something deeply wrong with the authoritarian politics of the right and its casual use of racism to further political control.
> it's senselessly adding fuel to our political division.
This comment, whether you realize it or not, is coming from a place of extreme social privilege.
Remember that for the majority of people, politics is not a game. It is serious. People lose their rights to live the life they want all the time. Sometimes those politics turn violent and people lose everything.
I wonder if you could make a Luhn-like check that would require an additional approval step to post if it comes back positive. Something like "It looks like you may be posting a secret *****. Do you wish to continue?"
If vendors agreed to a common prefix on all secret key values then it'd be easy for everyone to add checks, to everything. Something like "_SECRET88_".
Of course, then your secret key checker would need to build that string by concatenating so that it wouldn't set off itself.
More and more providers have been adding unique prefixes to their tokens and access keys which makes detection much easier. Ex, GitLab adds `glpat-` to their PAT.
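A minimal sketch of what a prefix-based pre-post check could look like. The `_SECRET88_` prefix is the hypothetical one from the comment above, and the token lengths and character sets in the patterns are assumptions for illustration, not any vendor's documented format.

```python
import re

# Hypothetical vendor-wide prefix, built by concatenation so this
# checker does not flag its own source.
COMMON_PREFIX = "_SECRET" + "88_"

PATTERNS = {
    # GitLab personal access tokens start with "glpat-"; the length and
    # character set used here are assumptions.
    "gitlab-pat": re.compile(r"glpat-[0-9A-Za-z_-]{20,}"),
    "common-prefix": re.compile(re.escape(COMMON_PREFIX) + r"[0-9A-Za-z]{16,}"),
}


def find_secrets(text):
    """Return (rule_name, matched_text) pairs for every prefixed token."""
    return [
        (name, match.group(0))
        for name, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]


def needs_confirmation(text):
    """True if a post should trigger the 'do you wish to continue?' prompt."""
    return bool(find_secrets(text))


# Sample post with a made-up token, also built by concatenation.
post = 'client = Client(token="' + "glpat-" + "x" * 20 + '")'
hits = find_secrets(post)
print(hits, needs_confirmation(post))
```

Prefix rules like these are cheap and precise, but they only help for providers that actually use a recognisable prefix.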
A project I maintain, Gitleaks, can easily detect "unique" secrets and does a pretty good job at detecting "generic" secrets too. In this case, the generic gitleaks rule would have caught the secrets [1]. You can see the full rule definition here [2] and how the rule is constructed here [3].
I was thinking about that too, but it's actually tricky. Even in the example given, they use the var `accessId`. You could filter for that, and for all the standard names, but you couldn't have enough confidence in the check: if someone posted with a typo or a random var name, they would think "Okay, no warning, so it must be okay".
Something like giving false confidence to the user. Not the best idea.
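To make the false-confidence problem concrete, here is a rough sketch of a "generic" heuristic: look for assignments to suspicious-sounding names and flag values with high Shannon entropy. This is not the actual gitleaks rule; the keyword list, minimum length and threshold are all assumptions, and the sample values are made up.

```python
import math
import re
from collections import Counter


def shannon_entropy(s):
    """Bits of entropy per character, based on character frequencies."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# name containing a suspicious keyword, then = or :, then a quoted value
ASSIGNMENT = re.compile(
    r"""(?i)\b\w*(?:key|secret|token|access|passw)\w*\s*[:=]\s*["']([^"']{12,})["']"""
)


def suspicious_assignments(source, threshold=3.5):
    """Return quoted values that look random and sit behind a suspicious name."""
    return [
        m.group(1)
        for m in ASSIGNMENT.finditer(source)
        if shannon_entropy(m.group(1)) >= threshold
    ]


sample = "\n".join(
    [
        'accessId = "Zk3vR9pQ2mX7bL5tY8wN"',  # flagged: keyword + high entropy
        'password = "aaaaaaaaaaaaaaaa"',  # not flagged: low entropy
        'x = "Hq7Lm2Vx9Rc4Td6Wb1Ns"',  # missed: random value, innocent name
    ]
)
flagged = suspicious_assignments(sample)
print(flagged)
```

The third line is exactly the case described above: a real-looking secret behind a random variable name slips through, and the absence of a warning says nothing.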
When you do this, is there a way to completely get rid of the information? Usually you can go back and look at the edit history to see the original post.
Wouldn't matter. Tons of bots are scraping every inch of the internet all the time, and if something has been online for five seconds, it has been cached/stored somewhere. Always assume that anything you've put up on the internet can forever be accessed by someone.
The only thing you can do is rotate the token/secret.
Assuming this unverified version of the story is true, the danger of accidentally leaking credentials in code is enormous, and it is one of the reasons I continue to maintain and develop gitleaks. Those credentials [1] would have been caught by gitleaks' generic rule [2].
The consensus in the Chinese community is that while this is likely how the token got leaked, this alone isn't enough. To reach a private Alibaba Cloud instance you can't just use some random IP. It's isolated from the Internet in some way.
Karen Hao (WSJ): "I downloaded the sample the hacker provided and called dozens of people listed. Nine picked up & confirmed exactly what the data said."
I made a webapp home icon from my Firefox and picked out the app-bait popover with uBlock.
Basically just about every app (YouTube, Reddit, Facebook, ...) is better this way. I.e., no ads, erase-able elements, less spyware, defaults to no notification and sometimes even gets better functionality. For instance, it (browsers) gets rid of "hearts" in Duolingo for whatever damn reason, so you can practice however much you'd like in a day.
The downsides I've found are that you seemingly can't Chromecast from it, and it often creates new tabs instead of reusing existing ones or making its own app instance, so you gotta close all tabs every so often.
It seems the majority of people on the planet now have had some of their data leaked. Or are becoming ever more entangled with government and corporate systems which control and peddle their information as they see fit.
Is it ultimately a big nothing burger, or is this some singularity we are passing through?
I was thinking - if I had this, what could I do with the personal records of a billion Chinese people?
And I must conclude - absolutely nothing. It's of no interest to me.
Now, I probably lack sufficient criminal imagination, but the point is stuff like this is hard to fence because there's a very small market of buyers. In an article I wrote for Routledge about the markets for stolen digital data (specifically movie and album releases) I suggested that the underlying problem is there's symbiosis between leakers and buyers.
If you want to do anything, target the buyers. There are fewer of them. Don't try to secure inherently insecure massively centralised systems (Blotto + Dolev-Yao problem). Or chase leakers. Or blame users. Or fire the CIO. Find out who wants this stuff and take down the show from the demand side.
But hold on! Guess who the buyers are. And guess what sincere will exists within "law enforcement" to tackle this sort of "cybercrime".
I suppose you could go the other direction. You could be an international human rights organization, and treat the database like a billion claim checks.
Having a definitive record of people's existence would make it more difficult for the authorities to skimp on natural disaster rescue efforts and then lie about casualty numbers, treat citizens as cannon fodder for military purposes, or simply wipe out individuals who have grievances with the government or powerful functionaries.
At this point I've basically accepted that all my info will be found on sites like fastpeoplesearch.com and that anything I tell any company (or I guess in this case, govt too) will eventually be leaked, correlated, and used against me.
LinkedIn doesn't have my Social Security number. It doesn't have a list of my bank accounts and credit cards. So, more people, but less damaging information.
It is both. It is huge; I'd say it's absolutely the latter. But I can't think of a single thing anyone can do about any of it at this point, which also makes it the former.
At several places I've seen they keep certain data such as phone, address, etc as a bullshit "business need" to "prevent abuse" and "prevent promo reuse" and keep forever even through CCPA.
Also they keep the record of the delete request, which contains the PII you ask to remove.
Not quite the same, but the US used census records that were supposed to be protected to round up the west coast Japanese for their internment during WWII.
They were "protected". That is, they didn't leak out of the government into private hands. But that still turned out pretty badly.
In fact, information in the government's hands is the most dangerous, because they have more power than anyone else to use it against you.
(On the other hand, as others have said about Denmark and Netherlands, data that was not in government hands became in government hands, and was used against people. So it's not "safer" if it's in private hands, except to the degree that the government has to go through the extra step of getting it.)
There were also the "pink lists" tracking gay men [1] (link to German Wiki, sorry), which the Nazis also greatly appreciated. Although to be fair^blunt they were collected exactly for reasons of prosecution, so not that far off from their use by the Nazis.
IIRC there was a central registry of religion in the Netherlands that had the same effect. Can't find anything on that now, though (it's mentioned in Wikipedia in an unsourced paragraph; I think I first read about it on HN, actually).
-----
Tangent: the info pages on the Anne Frank House site have sections cycling through different pastel background colours.[0] I've wondered before whether something like that would help the brain acquire context in a long page, making comprehension more like that of a physical book. Seeing it implemented, it doesn't seem to help. I think being able to easily flip to a previous page and back was one of the advantages of printed paper, so maybe a sticky TOC with the same colours or a minimap scrollbar would allow that? Actually, why not have that standard in browsers?
Hmm, the concept of coloured sections was known in 2013 already.[1]
"Fun" fact: It was IBM who helped tabulate data from the 1933 national census, which was then used to identify hundreds of thousands more Jews than would have been found by the Nazi party without their efforts.
"Machine-tabulated census data greatly expanded the estimated number of Jews in Germany by identifying individuals with only one or a few Jewish ancestors. Previous estimates of 400,000 to 600,000 were abandoned for a new estimate of 2 million Jews."
These days you'd just go to a data broker, who would also tell you what toothpaste they preferred and whether they managed to finish bingewatching The Sopranos.
Antisemitism was not really about religion. Many Jews had actually converted to Christianity for generations. The Nazis still considered them to be Jews.
Ahh...well there is the famous saying, "I decide who is a Jew." It was used on the head of the German Manhattan Project and a Jewish head (like a headmaster some shit) of a concentration camp, forget which one. And that's why we say "German Manhattan Project" stedda "Americaner Atomwaffenunternehmen" (I made that word up, it is correct in German to make words up, that means atom weapon undertaking), because German antisemitism amounted to forfeiting the bomb.
That was the price, the defeat of their last hope against the Allies. All of the Great Jews that slapped those firecrackers together were exiled due to antisemitism: Fermi, Szilárd, Einstein (to get the president to read the letter to get the Los Alamos show on the road in the first place, get Roosevelt to read top to bottom left to right, no easy task), von Neumann (spesh because of his schizophrenia, no concentration camp for him, he would have been experimented on to then do that same sin to everybody in the camps, Schizophrenic Jews were at the absolute bottom of the Nazi world order).
Fermi was originally a fascist, it basically made sense to him as a way of organizing a country.
Only non-Jew in the top desks of Los Alamos. Why? Only when the racial laws targeted his Jewish wife and children did he pack his shit and leave for America.
I would say, impossible to compare. Digital changes the cost of acting upon this information, for good or bad purposes.
Obvious comparisons to e.g. the Netherlands' famous over-registering of religion and how the Nazis abused that. But I feel this is long term potentially worse than that. Not in the level of horribleness, but in the effect on society moving forward.
All you can do (in the USA) is freeze your credit and sign up for one of the free (or paid) credit monitoring services. That only protects you from financial ruin though. Not sure about people using your credentials to commit fraud, fake birth certificates, etc.
Well, if you look at (global) society as a dynamical system it seems to me that there are two stable basins or attractors, call them "Star Trek" and "North Korea".
In the "Star Trek" future the people in charge are themselves also subject to the panopticon, and the world is ruled fairly and humanely. (The other name I use for this is the "Tyranny of Mrs. Grundy".)
In the "North Korea" future there are (human or AI or hybrid) masters and brain-chipped cyborg slaves, and rule is absolute and enforced with digital precision.
(Of course, this is all predicated on the idea that we can't put the genie back in the bottle in re: ubiquitous surveillance. I think that's likely the case (although I do not like it) but I'm not going to make the argument here unless someone asks.)
Given the above the thing to do is work to make politicians subject to 24/7 total surveillance (ASAP, before everybody else) so we can keep an eye on them. This policy would also presumably weed out the crazies and corrupt, eh?
> Well, if you look at (global) society as a dynamical system it seems to me that there are two stable basins or attractors, call them "Star Trek" and "North Korea".
Nice analogy. Do you really believe that us being on a utopian trajectory is realistic?
I would counter that, although it could, some groups will be able to evade it, effectively maintaining their advantage/power. Effectively averaging out the position of middle and lower classes, and lowering their chances of moving up the social ladder?
The standard "leak" of names and addresses of people is totally meaningless, though HN "privacy" obsessives blow it out of proportion all the time. It's basically public information; we used to have everyone in phone books in the US and almost no one cared.
Cell phone number is a riskier one because of the opportunity for 2FA hacks. It's not hard to get people's cell phone numbers as it is (you can buy direct marketing lists for pennies per person in the US) but it's not good to make it easy for hackers.
However this leak in particular appears to go much deeper so it is insidious. Police records are named and who knows what else. That is a genuine privacy issue and sucks for those involved.
Names and addresses can absolutely be used to stalk and harass people, and there are password reset flows that involve physically mailing secrets to people. Perhaps almost no one cared about phone books, but if you thought about the differences between phone books and a website for a moment, you'd see that these are different technologies that have different implications, and that it is entirely reasonable for people to have a different reaction.
You've chosen some arbitrary amount of information where you begin to care and become interested, and decided everyone with a different cutoff is an absolutist you don't need to listen to. But it's really just that your situation permits you to leak that information without fear, and you haven't deigned to imagine that other people are in a different situation.
The Shanghai police has a unique role in China and abroad. For example the Shanghai police is tasked with spreading pro-CCP propaganda globally on platforms like twitter and Facebook.
Someone posted a comment explaining a little more about Shanghai's special relationship with the CCP/PLA:
>Shanghai is a city with a unique role in the progression of the CCP and its global efforts. Also PLA Unit 61398 is in Pudong, the shanghai district mentioned in the article. Overall there's a lot of CCP/PLA-adjacent tech talent in the area, and of course the local police still ultimately report to the CCP.
So I'm guessing that database would have quite a few activists listed in it and other anti-government people. Might even give someone a much-needed warning if they find themselves there.
I wouldn't call being able to check if your dissident handle is in a hacked leaked database a warning, parent. A warning is something like someone literally telling you out of band that now is the time to go encrypt all the things[1], bury the drive in the woods, then go home and wipe your personal machine followed by filling the hard drive with milquetoast DVDrips to ensure everything is written over.
I thought China was very strict, this sounds similar to what the Russia hacking scene allegedly used to be -- you can hack a foreign hospital for profit, but if you deface one little wordpress they'll throw you in a literal gulag with the rest of the pesky faggots and feminists. (The state's words, not mine[2])
In 2018 I saw a local branch office was using Windows XP and an old Internet Explorer. You cannot expect that to be secure. This does not surprise me at all.
A lot of those are actually pirated/modified installs of Windows. I think it's called Tomato Windows or something like that? I forget, but it's incredibly prevalent in China.
it's in US ones too, it's an industry wide issue in the aviation sector, don't hack the airport, people will come for you and if you are lucky they will be carrying badges
Surprise, it's 2022, and XP is still a de-facto standard Windows version, with hacked Win7 slowly gaining.
Why? Tons of software was written for XP and then abandoned without any support. Much of that stuff is in the government sector. A lot of online banking clients outright say "only works on XP," and the copyright year reads 2006.
This is similar to how Android 7+ support was almost nuked in China for nearly a year because Tencent didn't want to port WeChat to newer APIs cuz "nobody uses Android newer than 4.X in China"
That was not why they refused to port it to newer APIs though. It was because Google changed the permissions API to be more granular and request permissions at runtime, which would have meant Tencent would have to request tons of permissions to gather user data (presumably users would not be inclined to grant so many permissions).
Kinda interesting that The Register does not even speculate about steps which China's higher-level security services might take in response, to "memorably demonstrate their displeasure" at the theft. (A certain cynical attitude is usually part of The Register's stock-in-trade.)
The leaked screenshot of the data's metadata looks like the output of Elasticsearch's /_cat command. Someone probably left port 9200 open to the public, or stored the index on a public cloud but somehow leaked its keys, either on a GitHub-like service or in some discussion forum -- a typical mistake that engineers make.
"Looks genuine" from my Chinese friends. Also this might be leaked through a hardcoded token in some code posted on CSDN (sort of blog for programmers).
The US government might buy it to help them find good candidates to recruit as spies and saboteurs, or to note if current spies and saboteurs are under suspicion or have been discovered.
If the records are digital and non-air-gapped in any system of any country, you can assume that the US government has access to those records already. The exceptions to this assumption are exceedingly rare.
making money is not the motive for some. this database will be very useful going forward. imagine the leverage you could have over business dealings.
some guys at the top of the game are probably already doing this and have figured out how to both insulate themselves and launder/hide data they hoard.
I don't know how it works in China but where I am a person's criminal record is not public but not exactly private either. In the sense that an employer can ask for your criminal record and you have the choice of giving a printout of it or not having your job. Making it kind of hard to see how the knowledge of a criminal record could be used to blackmail someone.
As for "data brokers. Advertisement, financial credibility, trustworthiness of business partners etc.". Maybe. But these companies would turn themselves into criminals by using or purchasing this information.
It is likely that this DB contains more information than what a formal printout gives.
"But these companies would turn themselves into criminals by using or purchasing this information."
Which is why they probably would not deal with the information gathering directly, but use the services of a data analyst company. When those do something illegal, nobody who contracted them ever knew anything. I think this game is played in China as well.
I am guessing he means that it highlights the incompetence or even just the consequences of centralizing power.
Personally I don't expect this to bear true. Historically in China, government failures have been cited as evidence for further centralizing the power of the federal government. And this argument is bought hook-line-and-sinker by the people. I don't think that will change until there is serious economic hardship.
Anyone talking calmly about this should check themselves as to why. And if you are making apps that collect private data for governments or corporations you should quit in protest. This is what we have been screaming about for the last five years, but no one listens.
Governments have been collecting (and poorly securing) this sort of information and more for most of recorded history. It's not to say that I like it, or would work for somewhere like Meta or the like, but plenty of these major data leaks have been from places that used to collect and store physical databases of this stuff since before most of us were alive.
I'm talking calmly about this because people have been screaming in my ear about it for 20 years, and I listened. And then I lived my life around the fact that this was going to be happening whether you scream yourself hoarse or not, at least for now.
For anyone wondering what that is, English uses short-scale, i.e. 1 billion = 1000 million, some other languages / countries use long-scale i.e. 1 billion = 1 million million.
It was a joke. But it made me realize, thanks to the comment above, that Earth's population is around 8 thousand millions, and not 8 billion as I'd come to believe.