SQLite Tutorial  Hot PDF Print E-mail
Tag it:
Delicious
Furl it!
Digg
NewsVine
Reddit
YahooMyWeb
Technorati
Articles Reviews Structured Query Language
Written by Mike Chirico   
Monday, 27 November 2006
Article Index
SQLite Tutorial  Hot
Logging All Inserts, Updates, and Deletes
UTC and Localtime
The ATTACH Command
The Power of the Sign Function
Creating a Permanent Sign Function
Spreadsheet Format to Normalized Data
C and C API
A C Program -- Building a Class to Do the Work
Defining SQLite User Functions
Aggregate Functions
Reading Images (Blob data)


Aggregate Functions

A good place to look for ideas on creating functions is the ./src/func.c file in the sqlite3 source. Suppose you would like to create a new sum function call S. It will create the aggregate sum of the rows.

The following data is used to explain this function.

$ ./myfuncpp DATABASE "create table t(a integer, b integer, c integer)"
$ ./myfuncpp DATABASE "insert into t (a,b,c) values (1,-1,2)"
$ ./myfuncpp DATABASE "insert into t (a,b,c) values (2,-2,4)"
$ ./myfuncpp DATABASE "insert into t (a,b,c) values (3,-3,8)"
$ ./myfuncpp DATABASE "insert into t (a,b,c) values (4,-4,16)"
$ ./myfuncpp DATABASE "select * from t"
a = 1 b = -1 c = 2
a = 2 b = -2 c = 4
a = 3 b = -3 c = 8
a = 4 b = -4 c = 16
a = 4 b = -4 c = 16

Now for how the function S will create a list of the sums. Unlike the standard aggregate sum() function, a list is returned.

$ ./myfuncpp DATABASE "select S(a),S(b),S(c) from t"
S(a) = (1,3,6,10,14) S(b) = (-1,-3,-6,-10,-14) S(c) = (2,6,14,30,46)

Note that a column which contains the values in the table (1,2,3,4,5) shows the cumulative sum (1,1+2=3,1+2+3=6,1+..) in a list. This is different from any function defined in ./src/func.c, since the data must be in a string.

To view this example and all other examples, see the download. Since a list is returned, this example will use the C++ std::stringstream, since this is fast and well suited for all types of data, integer, double, and text.

Since S is an aggregate function, there are two functions, "SStep" and "SFinalize". Aggregate functions always have a "Step" and a "Finalize". "Step" is called for each row, and after the last row, the "Finalize" function is called.

Both the Step and the Finalize can make use of a structure for holding the cumulated data collected from each row. For this function, the structure SCtx is defined below. std::stringstream is global. I would not advise putting an additional variable in SCtx, "char *ss". You may think that this could be dynamically increased with realloc, which will work. However, the problem is freeing the memory hanging off the structure. There's a bit of confusion here. As the sqlite documentation correctly points out, the structure SCtx will be freed; but again, in my testing, any additional memory allocated off members in the structure will not. On the other hand, an array of std:stringstrings "BS" will have to be kept for when this function is called more than once in the same select "select S(a),S(b),...S(100th)" The overhead appears minimal.

#define MAXSSC 100

typedef struct SCtx SCtx;
struct SCtx {
double sum; /* Sum of terms */
int cnt; /* Number of elements summed */
int sscnt; /* Keeps counts for ss */
};

std::stringstream ss[MAXSSC];
int sscnt=0;

Below is the step function. p gets initialized the first time through SStep. On the first pass, all the values in the SCtx structure will be zeroed. This is a feature of "sqlite3_aggregate_context". Since std::stringstream s0 is defined as a global variable, care will have to be taken to ensure that when S is called in the same select "Select S(a),S(b)...", S(a) does not use S(b)'s stringstream.

static void SStep(sqlite3_context *context, int argc, sqlite3_value **argv){
SCtx *p=NULL;
int i;

std::string d;
if( argc<1 ) return;
p = (SCtx *) sqlite3_aggregate_context(context, sizeof(*p));
if( p->cnt == 0) /* When zero first time through */
{
if ( sscnt >= MAXSSC )
{ fprintf(stderr,"MAXSSC needs to increasen");
exit(1);
}
p->sscnt=sscnt;
sscnt++;
ss[p->sscnt].str("");
ss[p->sscnt] << "(";
d="";
} else {
d=",";

}

p->sum += sqlite3_value_double(argv[0]);
p->cnt++;
ss[p->sscnt] << d << p->sum ;


/*
* If the simple function is not used, this
* comes into play.
*/
if (p->cnt == 1)
{
for(i=1; i< argc; ++i) {
p->cnt++;
p->sum+=sqlite3_value_double(argv[i]);
ss[p->sscnt] << "," << p->sum ;
}

}

}

The line:

p = (SCtx *) sqlite3_aggregate_context(context, sizeof(*p));

will initialize p->sum, p->cnt, and p->sscnt to zero on the first entry into this function. On each successive entry, the old values be passed back. Although the std:stringstream ss variable is global, S(a) called in the select uses ss[0], S(b) will use ss[1], etc. 

Also note the comment "If the simple function is not used, this comes into play". Below that statement, i walks through the argument count. It is possible to have a function name "S", in this case defined as both an aggregate function and a simple function. The distinction is made with the number of arguments in the calling function.

This is set in sqlite3_create_function. For example, a name could be assigned to a simple function and an aggregate function. Normally, this is set up so that the simple function takes two or more arguments max(1,2,3,4,5) and the aggregate function just takes one argument max(a). Take a look at max in ./src/func.c.

Here is the Finalize function:

static void SFinalize(sqlite3_context *context){
SCtx *p=NULL;
char *buf=NULL;
p = (SCtx *) sqlite3_aggregate_context(context, sizeof(*p));

ss[p->sscnt] << ")";
buf = (char *) malloc (sizeof(char)*(ss[p->sscnt].str().size()+1));
if (buf == NULL)
fprintf(stderr,"malloc error in SNFinalize, bufn");



snprintf(buf,ss[p->sscnt].str().size(),"%s",ss[p->sscnt].str().c_str());
sqlite3_result_text(context,buf,ss[p->sscnt].str().size()+1,free );
sscnt--; /* reclaim this stream */

}

After all the rows in select have gone through the "SStep" function, the "SFinalize" function is called. The last value for the SCTx structure is assigned to p in the statement "p = (SCtx *) sqlite3_aggregate_context(context, sizeof(*p));". Note that p->sscnt is needed for indexing the correct ss. The proper memory size is allocated using +1 in ss[p->sscnt].str().size()+1, to allow for the NUL character. sqlite3_result_text takes care of freeing the memory allocated for buf.

The user functions "SStep" and "SFinalize" need to be added to the SQL language interpreter. This is done with the "sqlite3_create_function":

if (sqlite3_create_function(db, "S", 1, SQLITE_UTF8, NULL, NULL, &SStep,
&SFinalize) != 0)
fprintf(stderr,"Problem with S using SStep and SFinalizen");

Note the 1 for the third argument. This aggregate function is used when one argument is passed. To have it both ways, to have "S" defined as both an aggregate and a simple function, an SFunc would have to be created. That could handle 2 to N variables. Once this function is created, the additional "sqlite3_create_function" would be defined in main as follows:

... still in main
if (sqlite3_create_function(db, "S", -1, SQLITE_UTF8, NULL, &SFunc, NULL,
NULL) != 0)
fprintf(stderr,"Problem with S using SFunc -- simple functionn");

Here is an example SFunc function:

static void SFunc(sqlite3_context *context, int argc, sqlite3_value **argv){
std::stringstream s;
std::string d;
double sum=0;
char *buf=NULL;
int i;

s.str("");

s << "(";
d="";
for(i=0; i < argc; i++)
{
switch( sqlite3_value_type(argv[i]) ){
case SQLITE_INTEGER: {
sum+=(double) sqlite3_value_int64(argv[i]);
s << d << sum;
d=",";
break;
}
case SQLITE_NULL: {
s << d << "()";
d=",";
break;
}
default: {
sum+=sqlite3_value_int64(argv[i]);
s << d << sum;
d=",";
break;
}
}

}

s << ")";
buf = (char *) malloc (sizeof(char)*(s.str().size()+2));
if (buf == NULL)
fprintf(stderr,"malloc error in SNFunc, bufn");
snprintf(buf,s.str().size()+1,"%s",s.str().c_str());
sqlite3_result_text(context,buf,s.str().size()+1,free );

}


Now, S works as both a simple function and an aggregate function. The simple function can go in any expression, but the aggregate only works after the select. Hence, this goes back to the power of the sign function, which is a simple function.

./myfuncpp DATABASE 'select S(1,2,3,4)'
S(1,2,3,4) = (1,3,6,10)

For a few more examples, take a look at myfuncpp.cc in the download. There are some interesting functions there. For instance, there is an I or index function that works as follows:

$ ./myfuncpp DATABASE "select S(1,2,3)"
S(1,2,3) = (1,3,6)

$ ./myfuncpp DATABASE "select I(S(1,2,3),0)"
I(S(1,2,3),0) = 1

$ ./myfuncpp DATABASE "select I(S(1,2,3),1)"
I(S(1,2,3),1) = 3
, which takes the index in the list. The first index starts at zero.



Last Updated ( Sunday, 06 January 2008 )
 
< Prev   Next >