Recreating the CVM algorithm for estimating distinct elements gives problems
Briefly

It did not work! I got silly numbers. I could download Hamlet, split it into words (around 32,000), do len(set(words)) to get the exact number of distinct words (around 7,000), then run it through the algorithm and get a stupid result with tens of digits for the estimated number of distinct words.
I tried it on arbitrary bits of data and the result was a reasonable (but not exact) estimate of the number of distinct items. Then I wrote a segment of JavaScript to compare the AI and the algorithm, and I found the probable source of my problem. Hardly rocket science! However, at one or two points the algorithm was explained in English (or rather pseudocode).
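For reference, here is a minimal Python sketch of the CVM estimator as it is commonly described. It is not the code from this post, and it does not try to reproduce the bug. The names cvm_estimate, thresh and rng are my own choices for illustration. The idea is to keep a bounded set of sampled items: each incoming item is first dropped from the set, then re-added with the current probability p; whenever the set fills up, roughly half of it is thrown away and p is halved. The estimate is the final set size divided by p.

```python
import random


def cvm_estimate(stream, thresh, rng=None):
    """Estimate the number of distinct items in stream (CVM-style sketch).

    thresh is the maximum number of items held in memory (hypothetical name).
    rng is an optional random.Random instance, for repeatable runs.
    """
    rng = rng or random.Random()
    p = 1.0        # current sampling probability
    kept = set()   # the bounded sample of items

    for item in stream:
        # Forget any earlier decision about this item, then re-decide
        # with the current probability p.
        kept.discard(item)
        if rng.random() < p:
            kept.add(item)

        if len(kept) >= thresh:
            # Sample is full: keep each item with probability 1/2
            # and halve p to match.
            kept = {x for x in kept if rng.random() < 0.5}
            p /= 2
            if len(kept) >= thresh:
                # Extremely unlikely unless thresh is tiny.
                raise RuntimeError("sample still full after halving")

    return len(kept) / p


# Small usage example: with thresh larger than the number of distinct
# words, p never changes and the estimate equals len(set(words)).
words = "to be or not to be that is the question".split()
print(len(set(words)), cvm_estimate(words, thresh=100))
```

With a threshold much smaller than the true distinct count the result is only an estimate, so on something like the words of Hamlet it should land near len(set(words)) rather than match it exactly.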
Read at Paddy3118