A collection of learnings and opinions.

Thursday, September 25, 2008

CountDistinct - How About SumDistinct - Or Just AggregateDistinct?

Making a report in Microsoft SQL Server 2005 Reporting Services today I needed to sum up several levels in a table. However, the numbers where available only at an intermediate aggregated level, so I needed to sum up only the distinct sublevel-items. Got that? Maybe it's easier if I show an example:


Let's concoct an example here. I'm extracting data in the following groups: Firm, ProductType, Product, Account, Item. Say I'm summing up one measure, BalanceAmount, on products, but my dataset returns other measures on the lowest level, Item.


Here's an example DataSet






















































































Firm ProductType Product Account Item BalanceAmount ItemMeasure
Acme Explosive Nitroglyserin C-001 Bottle $200 $2.14
Acme Explosive Nitroglyserin C-001 Mug $200 $4.87
Acme Explosive TNT C-001 Stick $200 $0.79
Acme Explosive Plastique C-001 Wad $200 $7.64
Acme Prop Fake rock C-001 Crate $200 $4.55
Acme Explosive TNT C-124 Stick $89 $0.79
Acme Explosive TNT C-124 Half-stick $89 $0.40
Acme Explosive Nitroglyserin C-089 Stick $40 $0.79


In the report I need to show this as a drill-down table, with the measures down on the lowest level (Item) summed up on every level, and the aggregated measure (BalanceAmount) summed up on every level where it makes sense.


The problem is that if you use the common SUM function the report would sum up all BalanceAmounts on the Account level resulting in a BalanceAmount of $1000 for Account C-001, not $200 which is the correct value (that would be $200, Einstein).


What you really need is something like CountDistinct, but just for Sum. In fact, this is a specific case of a more general problem where you need to conduct some aggregation on some scope. How do I do something like that?


It's not really hard. What I need is some way to make just the first instance of a value in whatever scope needed be the counting one. To get this I enter the
following into the Code tab of you Report properties:



    Dim scopes As System.Collections.Hashtable
Function getDistinct(ByVal checkMe As Object, ByVal scope As String) As Boolean
Dim firstTimeSeen As Boolean

If (scopes Is Nothing) Then
scopes = New System.Collections.Hashtable
End If

If (Not scopes.Contains(scope)) Then
scopes.Add(scope, New System.Collections.ArrayList)
End If

If (Not CType(scopes(scope), System.Collections.ArrayList).Contains(checkMe)) Then
firstTimeSeen = True
CType(scopes(scope), System.Collections.ArrayList).Add(checkMe)
Else
firstTimeSeen = False
End If

Return firstTimeSeen
End Function

The usage pattern to get SumDistinct in this case is:


=SUM(IIF(Code.getDistinct(Fields!account.Value, "scopeId"), Fields!BalanceAmount.Value, 0))

The "scopeId" string identifies the current scope, so it will change from group to group. E.g. in the Firm group you'd enter "firm", in the ProductType group you'd enter "prodType", etc.


So what am I doing here? We're remembering the values of the checkMe parameter in the context of the scope, and returning True in only the first instance. Thus we can just sum the first of every checkMe value. You could use this to do your own CountDistinct by entering the following expression (but this would be a bit roundabout for that use-case):


=COUNT(IIF(Code.getDistinct(Fields!account.Value, "scopeId"), Fields!BalanceAmount.Value, 0))

I hope you get some use out of this. This kind of code is easy to crank out, but even easier to just copy-paste.

1 comment:

Bruno Valente said...

You are big, MAN. Thanks for the script. Work's great !!