.\" XXX standard disclaimer belongs here....
.\" $Header: /private/postgres/ref/postquel/RCS/defineFunction,v 1.6 1992/07/14 05:54:17 ptong Exp $
.SP "DEFINE FUNCTION" COMMANDS 7/9/92
.XA 2 "Define Function"
.uh "NAME"
.lp
define function \*- define a new function
.uh "SYNOPSIS"
.lp
.(l
\fBdefine function\fP function_name \fB(\fP
          \fBlanguage =\fP {"c" | "postquel"}\fB,\fP
          \fBreturntype =\fP type-r
          [ \fB, percall_cpu =\fP \fB"costly\fR{\fB!\fR*}\fB"\fR ]
          [ \fB, perbyte_cpu =\fP \fB"costly\fR{\fB!\fR*}\fB"\fR ]
          [ \fB, outin_ratio = \fP percentage]
          [ \fB, byte_pct = \fP percentage]
        \fB)\fP
    \fBarg is (\fP type-1  { \fB,\fP type-n } \fB)\fP
    \fBas\fP {"/full/path/filename.o" | "list-of-postquel-queries"}
.)l
.uh "DESCRIPTION"
.lp
With this command, a \*(PP user can register a function with \*(PP.
Subsequently, this user is treated as the owner of the function.
.lp
When defining the function, the input data types,
.i type-1 ,
.i type-2 ,
\&...,
.i type-n ,
and the return data type,
.i type-r
must be specified,
along with the language, which may be
.i 
"c"
.r
or
.i
"postquel".
.r
The input types may be base or complex types.  The output type may be
specified as a base type, complex type, or 
.i "setof <type>".
The 
.i setof
modifier indicates that the function will return a set of items,
rather than a single item.
The
.i as
clause of the command is treated differently for C and \*(PQ
functions, as explained below.
.lp
.uh "C FUNCTIONS"
.lp
Functions written in C can be defined to \*(PP, which will
dynamically load them into its address space.  The loading happens
either via the \fBload\fP command, or automatically the
first time the function is necessary for execution. Repeated execution
of a function will cause negligible additional overhead, as the
function will remain in a main memory cache.
.lp
The 
.i percall_cpu,
.i perbyte_cpu,
.i outin_ratio,
and
.i byte_pct
flags are provided for C functions to give a rough estimate of the
function's running time, allowing the query optimizer to postpone applying
expensive functions used in a query's
.i where
clause.  The 
.i percall_cpu
flag captures the overhead of the function's invocation (regardless of
input size), while the 
.i perbyte_cpu
flag captures the sensitivity of the function's running time to the
size of its inputs.  The magnitude of these two parameters is
determined by the number of exclamation points appearing after the
word
.i costly: 
specifically, each exclamation point can be thought of as another
order of magnitude in cost, i.e.,
.i "cost = 10\s-2\unumber-of-exclamation-points\d\s0."
The default value for 
.i percall_cpu
and
.i perbyte_cpu
is 0.  Examples of reasonable cost values may be found in the system catalog
.i pg_proc;
most simple functions on base types have costs of 0.
.lp
The
.i outin_ratio
is provided for functions which return variable-length types, such as
.i text
or
.i bytea.
It should be set to the size of the function's output as a percentage
of the size of the input.  For example, a function which compresses
its operands by 2 should have
.i "outin_ratio = 50."
The default value is 100.
.lp
The
.i byte_pct
flag should be set to the percentage of the bytes of the arguments
that actually need to be examined in order to compute the function.
This flag is particularly useful for functions which generally take a
large object as an argument, but only examine a small fixed portion of
the object.  The default value is 100.
.lp
The body of a C function following 
.i as
should be the FULL PATH of the object code (.o file) for the function,
bracketed by quotation marks.  (\*(PP will not compile a function
automatically -- it must be compiled before it is used in a define
function command.)
.lp
C functions with base type arguments can be written in a
straightforward fashion.  The C equivalents of built-in \*(PP types
are accessible in a C file if 
.ft C
$POSTGRESHOME/src/lib/H/utils/builtins.h
.ft
is included as a header file.  This can be achieved by having
.(l
.ft C
#include <utils/builtins.h>
.ft
.)l
at the top of the C source file and by compiling all C files with the
following include options:
.(l
.ft C
 -I$POSTGRESHOME/src/lib/H
 -I$POSTGRESHOME/src/port/$PORTNAME
 -I$POSTGRESHOME/O/lib/H
.ft
.)l
before any ".c" programs in the "cc" command line, e.g.:
.(l
.ft C
cc -I$POSTGRESHOME/src/lib/H \\
        -I$POSTGRESHOME/src/port/$PORTNAME \\
        -I$POSTGRESHOME/O/lib/H \\
        -c progname.c
.ft
.)l
The directory
.ft C
$POSTGRESHOME/O/lib/H
.ft
contains "tags.h", which is generated in the
build process.  The directory
.ft C
$POSTGRESHOME/src/port/$PORTNAME
.ft
contains
"machine.h".  Typical values for PORTNAME are sunos4 and ultrix4.
.lp
The convention for passing arguments to and from the user's C functions is
to use pass-by-value for data types that are 32 bits (4 bytes) or
smaller, and pass-by-reference for data types that require more than 32 bits.
The following table gives the C type required for parameters in the C
functions that will be loaded into \*(PP.  The "Defined In" column
gives the actual header file (in the
.ft C
$POSTGRESHOME/src/lib/H
.ft
directory) that the equivalent
C type is defined.  However, if you include "utils/builtins.h", these
files will automatically be included.
.uh "Equivalent C Types for Built-In \*(PP Types"
.lp
.TS
center;
l l l
l l l.
\fBBuilt-In Type\fP	\fBC Type\fP	\fBDefined In\fP
_
abstime 	AbsoluteTime	utils/nabstime.h
bool	bool	tmp/c.h
box	(BOX *) 	utils/geo-decls.h
bytea	(bytea *)	tmp/postgres.h
char	char	N/A
char16	Char16 or (char16 *)	tmp/postgres.h
cid	CID	tmp/postgres.h
int2	int2	tmp/postgres.h
int28	(int28 *)	tmp/postgres.h
int4	int4	tmp/postgres.h
float4	float32 or (float4 *)	tmp/c.h or tmp/postgres.h
float8	float64 or (float8 *)	tmp/c.h or tmp/postgres.h
lseg	(LSEG *)	tmp/geo-decls.h
oid	oid	tmp/postgres.h
oid8	(oid8 *)	tmp/postgres.h
path	(PATH *)	utils/geo-decls.h
point	(POINT *)	utils/geo-decls.h
regproc 	regproc or REGPROC	tmp/postgres.h
reltime 	RELTIME 	tmp/postgres.h
text	(text *)	tmp/postgres.h
tid	ItemPointer	storage/itemptr.h
tinterval	TimeInterval	tmp/nabstime.h
uint2	uint16	tmp/c.h
uint4	uint32	tmp/c.h
xid	(XID *) 	tmp/postgres.h
.TE
.lp

Complex arguments to C functions are passed into the C function as a
special C type, TUPLE, defined in
.ft C
$POSTGRESHOME/src/lib/H/tmp/libpq-fe.h.
.ft
Given a variable t of this
type, the C function may extract attributes from the function
using the function call:
.(l
.ft C
GetAttributeByName(t, "fieldname", &isnull)
.ft
.)l
where 
.i isnull
is a pointer to a bool, which the function sets to \fItrue\fP if the
field is null.  The result of this function should be cast
appropriately as shown in the examples below.
.lp
.uh "POSTQUEL FUNCTIONS"
.lp
\*(PQ functions execute an arbitrary list of \*(PQ queries,
returning the results of the last query in the list.  \*(PQ
functions in general return sets.  If their returntype is not
specified as a \fIsetof\fP, then an arbitrary element of the last
query's result will be returned.  The expensive function parameters
.i percall_cpu,
.i perbyte_cpu,
.i outin_ratio,
and
.i byte_pct
are not used for \*(PQ functions; their costs are determined
dynamically by the planner.
.lp
The body of a \*(PQ function following
.i as
should be a list of queries separated by
whitespace characters and bracketed within quotation marks.  Note that
quotation marks used in the queries must be escaped, by preceding them
with two backslashes (i.e. \\").
.lp
Arguments to the \*(PQ function may be referenced in the queries
using a $n syntax: $1 refers to the first argument, $2 to the second,
and so on.  If an argument is complex, then a "dot" notation may be used to
access attributes of the argument (e.g. $1.emp), or to invoke
functions via a nested dot syntax.  
.lp
.uh "EXAMPLES: C Functions"
.lp
The following command defines a C function, overpaid, of two basetype
arguments.
.(l
.ft C
define function overpaid
        (language = "c", returntype = bool)
        arg is (float8, int4)
        as "/usr/postgres/src/adt/overpaid.o"
.ft
.)l
The C file "overpaid.c" might look something like:
.(l
.ft C
#include <utils/builtins.h>

bool overpaid(salary, age)
        float8 *salary;
        int4    age;
{
        if (*salary > 200000.00)
                return(TRUE);
        if ((age < 30) && (*salary > 100000.00))
                return(TRUE);
        return(FALSE)
}
.ft
.)l
The overpaid function can be used in a query, e.g:
.(l
.ft C
retrieve (EMP.name)
    where overpaid(EMP.salary, EMP.age)
.ft
.)l
.lp
One can also write this as a function of a single argument of type
EMP:
.(l
.ft C
define function overpaid_2
        (language = "c", returntype = bool)
        arg is (EMP)
        as "/usr/postgres/src/adt/overpaid_2.o"
.ft
.)l
The following query is now accepted:
.(l
.ft C
retrieve (EMP.name) where overpaid_2(EMP) 
.ft
.)l
In this case, in the body of the overpaid_2 function, the fields in the EMP
record must be extracted.  The C file "overpaid_2.c" might look
something like:
.(l
.ft C
#include <utils/builtins.h>
#include <tmp/libpq-fe.h>

bool overpaid_2(t)
TUPLE t;
{
    float8 *salary;
    int4    age;
    bool    salnull, agenull;

    salary = (float8 *)GetAttributeByName(t, "salary",
                                          &salnull);
    age = (int4)GetAttributeByName(t, "age", &agenull);
    if (!salnull && *salary > 200000.00)
        return(TRUE);
    if (!agenull && (age<30) && (*salary > 100000.00))
        return(TRUE);
    return(FALSE)
}
.ft
.)l
.lp
.uh "EXAMPLES: \*(PQ Functions"
.lp
To illustrate a simple \*(PQ function, consider the following,
which might be used to debit a bank account:
.(l
.ft C
define function TP1
        (language = "postquel", returntype = int4)
        arg is (int4, float8)
        as "replace BANK (balance = BANK.balance - $2)
              where BANK.accountno = $1
            retrieve(x = 1)"
.ft
.)l
A user could execute this function to debit account 17 by $100.00 as follows:
.(l
.ft C
retrieve (x = TP1( 17,100.0))
.ft
.)l
.lp
The following more interesting examples take a single argument of type
EMP, and retrieve multiple results:
.(l
.ft C
define function hobbies
    (language = "postquel", returntype = setof HOBBIES)
    arg is (EMP)
    as "retrieve (HOBBIES.all)
          where $1.name = HOBBIES.person"

define function children
    (language = "postquel", returntype = setof KIDS)
    arg is (EMP)
    as "retrieve (KIDS.all)
          where $1.name = KIDS.dad
             or $1.name = KIDS.mom"
.ft
.)l
.lp
Then the following query retrieves overpaid employees, their hobbies, and
their children:
.lp
.(l
.ft C
retrieve (name=name(EMP), hobby=name(hobbies(EMP)),
	kid=name(children(EMP)))
    where overpaid_2(EMP)
.ft
.)l
Note that attributes can be projected using function syntax (e.g.
name(EMP)), as well as the traditional dot syntax (e.g. EMP.name).
.lp
An equivalent expression of the previous query is:
.(l
.ft C
retrieve (EMP.name, hobby=EMP.hobbies.name,
	kid=EMP.children.name)
    where overpaid_2(EMP)
.ft
.)l
This "nested dot" notation for functions can be used to cascade
functions of single arguments.  Note that the function after a dot
must have only one argument, of the type returned by the function
before the dot.
.lp
\*(PP 
.i flattens
the target list of the queries above.  That is, it produces the cross-product
of the hobbies and the children of the employees.  For example, given
the schema:
.(l
.ft C
create BANK (accountno = int4, balance = float8) 
append BANK (accountno = 17,
             balance = "10000.00"::float8)
create EMP (name = char16, salary = float8,
            dept = char16, age = int4)
create HOBBIES (name = char16, person = char16) 
create KIDS (name = char16, dad = char16, mom = char16) 
append EMP (name = "joey", salary = "100000.01"::float8,
            dept = "toy", age = 24) 
append EMP (name = "jeff", salary = "100000.01"::float8,
            dept = "shoe", age = 23)
append EMP (name = "wei", salary = "100000"::float8,
            dept = "tv", age = 30) 
append EMP (name = "mike", salary = "500000"::float8,
            dept = "appliances", age = 30) 
append HOBBIES (name = "biking", person = "jeff" ) 
append HOBBIES (name = "jamming", person = "joey" ) 
append HOBBIES (name = "basketball", person = "wei") 
append HOBBIES (name = "swimming", person = "mike") 
append HOBBIES (name = "philately", person = "mike") 
append KIDS (name = "matthew", dad = "mike",
             mom = "teresa")
append KIDS (name = "calvin", dad = "mike",
             mom = "teresa")
.ft
.)l
.sp
the query above returns
.lp
.ft C
.TS
center;
l l l
l l l.
name	hobby	kid
_
jeff	biking	(null)
joey	jamming	(null)
mike	swimming	matthew
mike	philately	matthew
mike	swimming	calvin
mike	philately	calvin
.TE
.ft
.lp
Note that flattening preserves the name and hobby fields even when the kid
field is null.
.lp
.uh "SEE ALSO"
.lp
information(unix), load(commands), remove function(commands).
.lp
.uh "NOTES"
.lp
The 
.i percall_cpu
and
.i perbyte_cpu
flags can take integers surrounded by quotes instead of the 
.i """costly{!*}"""
syntax
described above.  This allows a finer grain of distinction between
function costs, but is not encouraged since such distinctions are
difficult to estimate accurately.
.lp
On Ultrix, all .o files that \*(PP is expected to load dynamically must
be compiled under \f2cc\fP with the
.ft C
"-G 0"
.ft
option turned on.
.uh "RESTRICTIONS"
.lp
The name of the C function must be a legal C function name, and the name of
the function in C code must be exactly the same as the name used in
define function.
.lp
.uh "BUGS"
.lp
Function names must be unique per database, except for the fact that
there may be attributes of the same name as a function.  In the case
that a there is an ambiguity between a function on a complex type and
an attribute of the complex type, the attribute will always be used.
.lp
C functions cannot return a set of values.
.lp
The dynamic loader for DECstation Ultrix has exceedingly bad performance.
