Groups > WordPerfect Office > Quattro Pro older versions > Re: QP9: More Related to @MODE




QP9: More Related to @MODE

QP9: More Related to @MODE
Mon, 14 Jan 2008 15:56:23 +000
I'm now using the fix I said I would implement - rounding each number in 
the column to 14 decimal places before using @MODE. However I now find a 
need to know the second most frequent number in the column. Is there any 
function with the same relationship to @MODE that @LARGEST has to @MAX?
If not, does anyone have any suggestions for a quick way to calculate 
what I want? It has to be something that doesn't require run-time manual 
intervention, because it needs to be calculated about halfway through a 
macro.

Thanks in advance for any help,

NigelH

Post Reply
Re: QP9: More Related to @MODE
Tue, 15 Jan 2008 17:04:11 -070
NigelH wrote:
> I'm now using the fix I said I would implement - rounding each number in 
> the column to 14 decimal places before using @MODE. However I now find a 
> need to know the second most frequent number in the column.

Nigel, I think you're just trying to be difficult  :-)

> Is there any function with the same relationship to @MODE that @LARGEST has
to @MAX?

That sure would be useful -- alas, the short answer is no.  My guess is that 
@MODE probably has the necessary info assembled in its internal algorithm in 
order to answer the question "what's the second/third/fourth most frequent

value", because in order to identify the most frequent value you would have
to 
know the frequency of all others as well.  Thus, @MODE probably already knows 
the lower frequencies INTERNALLY, it just doesn't bother to make those values 
accessible via output options.  So I think that an advanced version of @MODE 
should not be too difficult to implement (perhaps in a future upgrade), using 
the syntax @MODE(range,n) where n would be the nth most frequent occurrence.

> If not, does anyone have any suggestions for a quick way to calculate 
> what I want? It has to be something that doesn't require run-time manual 
> intervention, because it needs to be calculated about halfway through a 
> macro.

A quick way ... no, but it presents an interesting challenge nevertheless. 
@FREQDIST provides the most obvious vehicle for a solution, but it is hampered 
by the requirement for an interval array in ascending order, and thus not only 
involves additional computational overhead, but more importantly, a profligate 
imposition on spreadsheet "real estate."  Therefore, the most
efficient solution 
I can think of requires one helper column.  Assuming your data is in A1..A10, 
enter in B1 (and copy down to B10):

   @IF(A1=@MODE(A$1..A$10),"-",A1)

Then, the 2nd most frequent value:  @MODE(B1..B10)

You could extend this concept to the third most frequent value by calculating in

column C (C1, then copy down):

   @IF(B1=@MODE(B$1..B$10),"-",B1)

Then, the 3rd most frequent value:  @MODE(C1..C10)


Please keep in mind that in a multimodal distribution, the "second most 
frequent" value thus identified would actually be a multimodal root
relegated to 
"2nd class status," even though this so-called "second most
frequent" value 
would be on par with the value identified as the "most frequent" one. 
For 
example, @MODE(1,2,2,2,4,4,4,5,5,6) will return the value 2 as the mode, even 
though there are two equally suited candidates.  The above described method of 
eliminating higher frequencies to find the next most frequent value would then 
identify 4 as the "second most frequent" value, even though 4 occurs
as 
frequently as 2.  The bottom line is to be aware that @MODE is inherently 
handicapped in "telling the full story" when you have a multimodal
distribution; 
it's really only suited for scenarios involving a unimodal distribution.

Cheers,
Uli
Post Reply
Re: QP9: More Related to @MODE
Wed, 16 Jan 2008 14:39:38 -070
NigelH wrote:
> Uli wrote:
>> You could extend this concept to the third most frequent value by 
>> calculating in column C (C1, then copy down):
>>
>>   @IF(B1=@MODE(B$1..B$10),"-",B1)
>>
>> Then, the 3rd most frequent value:  @MODE(C1..C10)
> 
> This kills two birds with one stone. I wasn't sure what happened in the 
> event of multi-modes (the application requires a single mode value). I 
> did some experimenting and it seemed to me that @MODE returned the value 
> closest to the mean, but your example shows that the lowest is returned.

FWIW, there's an even more efficient method of knocking multiple birds out of 
the sky (I have nothing against fauna of the ornithic variety, it's just a 
figure of speech).  At any rate, there's a way to have one helper column take 
care of ALL frequencies, rather than one helper column per frequency, thusly:

In B1 (beware wordwrap), then copy down to B10:
@IF(@MATCH(A1,A$1..A$10,0)=(@ROWS(A$1..A1)-1),@COUNTIF(A$1..A$10,A1)+@ROWS(A$1..
A1)/100000,"-")

Note:  If your sample size exceeds 100,000 data points, you need to increase the

magnitude of the divisor (100000 in the example above) accordingly.  This will 
preclude the integer portion of the frequency count from being altered, while 
allowing assignment of a unique "frequency value" to each unique value
in the 
sample.  I.e., column B now holds the frequency count with a fractional 
"signature", rather than a new sample with a previously calculated
mode eliminated.

Then the most frequent value is obtained by:
@INDEX(A1..A10,0,@MATCH(@LARGEST(B1..B10,1),B1..B10,0))

Second most frequent (or, if it exists, duplicate most frequent):
@INDEX(A1..A10,0,@MATCH(@LARGEST(B1..B10,2),B1..B10,0))

.... etc.

And, finally, the number of most frequent values (i.e., a multimodal
indicator):
@SUM(@INT(B$1..B$10)=@INT(@MAX(B$1..B$10)))

Cheers,
Uli
Post Reply
Re: QP9: More Related to @MODE
Wed, 16 Jan 2008 14:47:55 GMT
Uli:
> Therefore, the most efficient solution 
> I can think of requires one helper column.
>
Bookmarked! <s>
-- 
Good wishes!
Roy Lewis
C_Tech volunteer
(UK)

Post Reply
Re: QP9: More Related to @MODE
Wed, 16 Jan 2008 16:11:41 +000
Uli wrote:

> NigelH wrote:

>> If not, does anyone have any suggestions for a quick way to calculate 
>> what I want? It has to be something that doesn't require run-time 
>> manual intervention, because it needs to be calculated about halfway 
>> through a macro.
> 
> 
> A quick way ... no, but it presents an interesting challenge 
> nevertheless. @FREQDIST provides the most obvious vehicle for a 
> solution, but it is hampered by the requirement for an interval array in 
> ascending order, and thus not only involves additional computational 
> overhead, but more importantly, a profligate imposition on spreadsheet 
> "real estate."  Therefore, the most efficient solution I can
think of 
> requires one helper column.  Assuming your data is in A1..A10, enter in 
> B1 (and copy down to B10):
> 
>   @IF(A1=@MODE(A$1..A$10),"-",A1)
> 

Thank you! So simple and so obvious when you think about it :)

I keep an archive of spreadsheet tips, and this is joining them.


> Then, the 2nd most frequent value:  @MODE(B1..B10)
> 
> You could extend this concept to the third most frequent value by 
> calculating in column C (C1, then copy down):
> 
>   @IF(B1=@MODE(B$1..B$10),"-",B1)
> 
> Then, the 3rd most frequent value:  @MODE(C1..C10)
> 
> 
> Please keep in mind that in a multimodal distribution, the "second
most 
> frequent" value thus identified would actually be a multimodal root 
> relegated to "2nd class status," even though this so-called
"second most 
> frequent" value would be on par with the value identified as the
"most 
> frequent" one.  For example, @MODE(1,2,2,2,4,4,4,5,5,6) will return
the 
> value 2 as the mode, even though there are two equally suited 
> candidates.  The above described method of eliminating higher 
> frequencies to find the next most frequent value would then identify 4 
> as the "second most frequent" value, even though 4 occurs as
frequently 
> as 2.  The bottom line is to be aware that @MODE is inherently 
> handicapped in "telling the full story" when you have a
multimodal 
> distribution; it's really only suited for scenarios involving a unimodal 
> distribution.

This kills two birds with one stone. I wasn't sure what happened in the 
event of multi-modes (the application requires a single mode value). I 
did some experimenting and it seemed to me that @MODE returned the value 
closest to the mean, but your example shows that the lowest is returned.

Finding the frequency of the 'second' mode and comparing it to the 
frequency of the primary mode means I can knock out the unwanted values.

Many thanks,

NigelH

Post Reply
about | contact