A new count-distinct algorithm:
We present a simple, intuitive, sampling-based space-efficient algorithm whose description and the proof are accessible to undergraduates with the knowledge of basic probability theory.
Donald Knuth likes it: https://www-cs-faculty.stanford.edu/~knuth/papers/cvm-note.pdf
Their algorithm is not only interesting, it is extremely simple.
Furthermore, it’s wonderfully suited to teaching students who are learning the basics of computer science.
I’m pretty sure that something like this will eventually become a standard textbook topic.
He also produced a CWEB implementation: cvm-estimates.w (archive.org)
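To make the "extremely simple" claim concrete, here is a minimal Python sketch of the sampling idea as I understand it from the paper. It is not a port of Knuth's CWEB program. The names `cvm_estimate`, `thresh` and `rng` are my own, and the buffer size is left to the caller (the paper derives it from the target accuracy and failure probability).

```python
import random


def cvm_estimate(stream, thresh, rng=None):
    """Estimate the number of distinct items in `stream`.

    Illustrative sketch only: `thresh` is the maximum sample size,
    `rng` is an optional random.Random used for reproducibility.
    """
    rng = rng or random.Random()
    p = 1.0          # current sampling probability
    sample = set()   # items currently kept

    for item in stream:
        # Forget any earlier decision about this item, then keep it
        # with probability p, so only its last occurrence matters.
        sample.discard(item)
        if rng.random() < p:
            sample.add(item)

        if len(sample) >= thresh:
            # Buffer full: drop each kept item with probability 1/2
            # and halve the sampling probability to match.
            sample = {x for x in sample if rng.random() < 0.5}
            p /= 2
            if len(sample) >= thresh:
                # Extremely unlikely; the paper treats this as failure.
                raise RuntimeError("sample did not shrink")

    # Each kept item stands in for 1/p distinct items.
    return len(sample) / p
```

If `thresh` is larger than the true number of distinct items, `p` never drops below 1 and the result is exact, e.g. `cvm_estimate([1, 2, 3, 2, 1, 4] * 10, thresh=100)` returns `4.0`. With a smaller `thresh` the result is a randomized estimate that varies from run to run.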
Source: https://jmason.ie/2024/05/21/165901a.html
Interesting HackerNews comments: https://news.ycombinator.com/item?id=40379175
tl;dr: User countermeasures:
- Noreply-Email-Address: Every GitHub user should either use a dedicated commit email address or GitHub’s noreply-email-address service, also enabling the option to block accidental command line pushes.
- 2-Factor-Authentication: Every GitHub user should have 2-Factor-Authentication enabled.
- Raise Awareness: it’s the duty of developers aware of this issue to inform their colleagues about it.
# Swap the real address for the GitHub noreply address in each repository's config under /opt
sed -i "s/$real_email/$github_email/" /opt/*/.git/config