Daily Shaarli
September 3, 2024
This is one of the things I genuinely appreciate about Hacker News. Most fields have a problem with ‘ghost knowledge’, hard-won practical understanding that is mostly passed on verbally between practitioners and not written down anywhere public. At least in programming some chunk of it makes it into forum posts. It’s normally hidden in the depths of big threads, but that’s better than nothing.
A new count-distinct algorithm:
We present a simple, intuitive, sampling-based space-efficient algorithm whose description and the proof are accessible to undergraduates with the knowledge of basic probability theory.
Donald Knuth likes it: https://www-cs-faculty.stanford.edu/~knuth/papers/cvm-note.pdf
Their algorithm is not only interesting, it is extremely simple.
Furthermore, it’s wonderfully suited to teaching students who are learning the basics of computer science.
I’m pretty sure that something like this will eventually become a standard textbook topic.
Knuth also produced a CWEB implementation: cvm-estimates.w (archive.org)
Source: https://jmason.ie/2024/05/21/165901a.html
Interesting Hacker News comments: https://news.ycombinator.com/item?id=40379175
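To give a feel for how simple the count-distinct algorithm is, here is a minimal Python sketch of the sampling idea as I understand it from the paper. It is not Knuth's CWEB code. The names `cvm_estimate`, `capacity`, and `rng` are my own, and the buffer size is picked arbitrarily rather than derived from the paper's error bounds.

```python
import random


def cvm_estimate(stream, capacity, rng=None):
    """Estimate the number of distinct items in `stream`.

    `capacity` is the maximum size of the in-memory sample (an assumed,
    hand-picked parameter here; the paper derives it from the desired
    accuracy and confidence).
    """
    rng = rng or random.Random()
    p = 1.0          # current sampling probability
    sample = set()   # items currently kept

    for item in stream:
        # Forget any earlier decision about this item, then re-decide:
        # only the most recent occurrence determines membership.
        sample.discard(item)
        if rng.random() < p:
            sample.add(item)

        if len(sample) >= capacity:
            # Buffer is full: keep each item with probability 1/2
            # and halve the sampling probability to match.
            sample = {x for x in sample if rng.random() < 0.5}
            p /= 2
            if len(sample) >= capacity:
                # Astronomically unlikely for a reasonable capacity.
                raise RuntimeError("sample did not shrink; retry")

    # Each surviving item stands in for 1/p distinct items.
    return len(sample) / p


if __name__ == "__main__":
    data = [i % 10_000 for i in range(50_000)]  # 10,000 distinct values
    print(cvm_estimate(data, capacity=1_000, rng=random.Random(1)))
```

If the stream has fewer distinct items than `capacity`, the sample never fills up, `p` stays at 1, and the result is the exact count. Otherwise memory stays bounded by `capacity` and the answer is an estimate whose accuracy improves as the buffer grows.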
Have you ever wondered how Shazam works? I asked myself this question a few years ago and I read a research article written by Avery Li-Chun Wang, the co-founder of Shazam, to understand the magic behind Shazam. The quick answer is audio fingerprinting, which leads to another question: what is audio fingerprinting?
This article is a summary of the search I did to understand Shazam.