|
| QP9: More Related to @MODE |
 |
Mon, 14 Jan 2008 15:56:23 +000 |
I'm now using the fix I said I would implement - rounding each number in
the column to 14 decimal places before using @MODE. However I now find a
need to know the second most frequent number in the column. Is there any
function with the same relationship to @MODE that @LARGEST has to @MAX?
If not, does anyone have any suggestions for a quick way to calculate
what I want? It has to be something that doesn't require run-time manual
intervention, because it needs to be calculated about halfway through a
macro.
Thanks in advance for any help,
NigelH
|
| Post Reply
|
| Re: QP9: More Related to @MODE |
 |
Tue, 15 Jan 2008 17:04:11 -070 |
NigelH wrote:
> I'm now using the fix I said I would implement - rounding each number in
> the column to 14 decimal places before using @MODE. However I now find a
> need to know the second most frequent number in the column.
Nigel, I think you're just trying to be difficult :-)
> Is there any function with the same relationship to @MODE that @LARGEST has
to @MAX?
That sure would be useful -- alas, the short answer is no. My guess is that
@MODE probably has the necessary info assembled in its internal algorithm in
order to answer the question "what's the second/third/fourth most frequent
value", because in order to identify the most frequent value you would have
to
know the frequency of all others as well. Thus, @MODE probably already knows
the lower frequencies INTERNALLY, it just doesn't bother to make those values
accessible via output options. So I think that an advanced version of @MODE
should not be too difficult to implement (perhaps in a future upgrade), using
the syntax @MODE(range,n) where n would be the nth most frequent occurrence.
> If not, does anyone have any suggestions for a quick way to calculate
> what I want? It has to be something that doesn't require run-time manual
> intervention, because it needs to be calculated about halfway through a
> macro.
A quick way ... no, but it presents an interesting challenge nevertheless.
@FREQDIST provides the most obvious vehicle for a solution, but it is hampered
by the requirement for an interval array in ascending order, and thus not only
involves additional computational overhead, but more importantly, a profligate
imposition on spreadsheet "real estate." Therefore, the most
efficient solution
I can think of requires one helper column. Assuming your data is in A1..A10,
enter in B1 (and copy down to B10):
@IF(A1=@MODE(A$1..A$10),"-",A1)
Then, the 2nd most frequent value: @MODE(B1..B10)
You could extend this concept to the third most frequent value by calculating in
column C (C1, then copy down):
@IF(B1=@MODE(B$1..B$10),"-",B1)
Then, the 3rd most frequent value: @MODE(C1..C10)
Please keep in mind that in a multimodal distribution, the "second most
frequent" value thus identified would actually be a multimodal root
relegated to
"2nd class status," even though this so-called "second most
frequent" value
would be on par with the value identified as the "most frequent" one.
For
example, @MODE(1,2,2,2,4,4,4,5,5,6) will return the value 2 as the mode, even
though there are two equally suited candidates. The above described method of
eliminating higher frequencies to find the next most frequent value would then
identify 4 as the "second most frequent" value, even though 4 occurs
as
frequently as 2. The bottom line is to be aware that @MODE is inherently
handicapped in "telling the full story" when you have a multimodal
distribution;
it's really only suited for scenarios involving a unimodal distribution.
Cheers,
Uli
|
| Post Reply
|
| Re: QP9: More Related to @MODE |
 |
Wed, 16 Jan 2008 14:39:38 -070 |
NigelH wrote:
> Uli wrote:
>> You could extend this concept to the third most frequent value by
>> calculating in column C (C1, then copy down):
>>
>> @IF(B1=@MODE(B$1..B$10),"-",B1)
>>
>> Then, the 3rd most frequent value: @MODE(C1..C10)
>
> This kills two birds with one stone. I wasn't sure what happened in the
> event of multi-modes (the application requires a single mode value). I
> did some experimenting and it seemed to me that @MODE returned the value
> closest to the mean, but your example shows that the lowest is returned.
FWIW, there's an even more efficient method of knocking multiple birds out of
the sky (I have nothing against fauna of the ornithic variety, it's just a
figure of speech). At any rate, there's a way to have one helper column take
care of ALL frequencies, rather than one helper column per frequency, thusly:
In B1 (beware wordwrap), then copy down to B10:
@IF(@MATCH(A1,A$1..A$10,0)=(@ROWS(A$1..A1)-1),@COUNTIF(A$1..A$10,A1)+@ROWS(A$1..
A1)/100000,"-")
Note: If your sample size exceeds 100,000 data points, you need to increase the
magnitude of the divisor (100000 in the example above) accordingly. This will
preclude the integer portion of the frequency count from being altered, while
allowing assignment of a unique "frequency value" to each unique value
in the
sample. I.e., column B now holds the frequency count with a fractional
"signature", rather than a new sample with a previously calculated
mode eliminated.
Then the most frequent value is obtained by:
@INDEX(A1..A10,0,@MATCH(@LARGEST(B1..B10,1),B1..B10,0))
Second most frequent (or, if it exists, duplicate most frequent):
@INDEX(A1..A10,0,@MATCH(@LARGEST(B1..B10,2),B1..B10,0))
.... etc.
And, finally, the number of most frequent values (i.e., a multimodal
indicator):
@SUM(@INT(B$1..B$10)=@INT(@MAX(B$1..B$10)))
Cheers,
Uli
|
| Post Reply
|
| Re: QP9: More Related to @MODE |
 |
Wed, 16 Jan 2008 14:47:55 GMT |
Uli:
> Therefore, the most efficient solution
> I can think of requires one helper column.
>
Bookmarked! <s>
--
Good wishes!
Roy Lewis
C_Tech volunteer
(UK)
|
| Post Reply
|
| Re: QP9: More Related to @MODE |
 |
Wed, 16 Jan 2008 16:11:41 +000 |
Uli wrote:
> NigelH wrote:
>> If not, does anyone have any suggestions for a quick way to calculate
>> what I want? It has to be something that doesn't require run-time
>> manual intervention, because it needs to be calculated about halfway
>> through a macro.
>
>
> A quick way ... no, but it presents an interesting challenge
> nevertheless. @FREQDIST provides the most obvious vehicle for a
> solution, but it is hampered by the requirement for an interval array in
> ascending order, and thus not only involves additional computational
> overhead, but more importantly, a profligate imposition on spreadsheet
> "real estate." Therefore, the most efficient solution I can
think of
> requires one helper column. Assuming your data is in A1..A10, enter in
> B1 (and copy down to B10):
>
> @IF(A1=@MODE(A$1..A$10),"-",A1)
>
Thank you! So simple and so obvious when you think about it :)
I keep an archive of spreadsheet tips, and this is joining them.
> Then, the 2nd most frequent value: @MODE(B1..B10)
>
> You could extend this concept to the third most frequent value by
> calculating in column C (C1, then copy down):
>
> @IF(B1=@MODE(B$1..B$10),"-",B1)
>
> Then, the 3rd most frequent value: @MODE(C1..C10)
>
>
> Please keep in mind that in a multimodal distribution, the "second
most
> frequent" value thus identified would actually be a multimodal root
> relegated to "2nd class status," even though this so-called
"second most
> frequent" value would be on par with the value identified as the
"most
> frequent" one. For example, @MODE(1,2,2,2,4,4,4,5,5,6) will return
the
> value 2 as the mode, even though there are two equally suited
> candidates. The above described method of eliminating higher
> frequencies to find the next most frequent value would then identify 4
> as the "second most frequent" value, even though 4 occurs as
frequently
> as 2. The bottom line is to be aware that @MODE is inherently
> handicapped in "telling the full story" when you have a
multimodal
> distribution; it's really only suited for scenarios involving a unimodal
> distribution.
This kills two birds with one stone. I wasn't sure what happened in the
event of multi-modes (the application requires a single mode value). I
did some experimenting and it seemed to me that @MODE returned the value
closest to the mean, but your example shows that the lowest is returned.
Finding the frequency of the 'second' mode and comparing it to the
frequency of the primary mode means I can knock out the unwanted values.
Many thanks,
NigelH
|
| Post Reply
|
|
|