Illustrating the aggregation functions
Written by Jason on July 20, 2011A common question is how to use (and think about) the aggregation functions. While we have a comprehensive reference guide, we thought it would be useful to illustrate them graphically, especially since they aren't single-step functions in a spreadsheet.
We're using the same data for each of these charts - you may have seen it before in an earlier post on natural gas pricing in various currencies. We'll use the minimum version of each function set for consistency, but you can also find maximums, means, sums, counts, standard deviations and variances.
We'll begin with the all_ and cond_ series. First, we'll show the chart, and then explain how each function works.
Looking at the legend in the top-left, the first (green) line is the monthly price of US city gate natural gas, from 2000 through April 2011.
The second (blue) line is the minimum price found over the entire time period, and was calculated with the formula shown in the legend.
The third (red) line is the minimum price found in the results where the year is 2005. This formula finds all of the results where the year of the Date field is 2005, and then returns the minimum value of NGCitygatePrice.
The fourth (dark yellow) line is the minimum price found in the results where the year is 1998. However, since there are no results earlier than 2000 (as determined by our filter), there are no values that satisfy the cond_min criteria, and the yellow line is at the zero mark.
The fifth (purple) line is the minimum price found in the entire dataset where the year is 1998 (note the 1 as the second parameter, as opposed to 0 in the prior yellow line).
Note that both of the all_ and cond_ functions return the same value for all rows in the result set. This is not the case for the rollup_ series.
As before, the first (green) line is the monthly price of US city gate natural gas.
The other (purple) line is the minimum price for each individual year and displayed over the entire time period. We find it useful to think of this function by rolling up the data by the consolidation fields (the year(Date)
portion of the legend), and then applying the operation to each consolidated bucket.
Keep reading to see the back_ and forward_ function series demonstrated in chart form.
The operation of the back_ and forward_ functions aren't as obvious when looking at the chart, but they are among the most useful that Tuhunga offers.
We'll start with the back_min line (blue). It represents the lowest point in the prior 180 days of natural gas prices.
The forward_min line (purple) represents the minimum price of natural gas in the following 180 days.
Note that the result of both of these functions is constrained to the data that met the filter criteria. If the third parameter in either were 1 instead of 0 (as seen in the legend), all of the data in the dataset would be used to calculate the forward and back minimum price.
In this particular instance, it wouldn't have any effect except in the first half of 2000 for the back_min function, but this isn't always the case, especially when the filter returns discontinuous data.