10 Sep 2010 
Support Center » Knowledgebase » SQL Template Queries
 SQL Template Queries
Solution SQL Template Queries are a new powerful feature, which would allow definition of custom queries for ACL's, WBL's and even other more: autowhitelist / autoblacklist. This feature would use MySQL engines defined in configuration file. At this point, SQL Template Query defintions cannot be done from MPP GUI, but they should added manually in configuration file: mppd.conf.xml

Introduction

MySQL Engine allows to defined a connection to a MySQL database and to associate one or  more  queries  of  arbitrary  form  with  the  connection.   Then  queries  can  be  used  by  other MPP   parts   (Users)   to   utilize   database   functionality.   It   is   possible   to   specify   that   during execution  of   queries   different   variables   (sender,   recipient,   client   ip,   etc)   must   be  inserted into   query   text   before   sending   it   to   the   database.   Basic   means   of   processing   results   of queries   and   presenting   them  to   Users   are   provided.   MySQL  Engine   implements   Generic Engine interface and can be used in any context that uses this interface.

Define MySQL Engine

An  instance  of  MySQL  Engine  is  defined  in  mppd.conf.xml  file  under  <mppd><engines> node with node name <mysql>. ID of the engine is defined with “id” attribute of <mysql> node.  Multiple MySQL  Engines with  different  ID's  can  be defined but  ID  must  be  unique with <engines>. MySQL Node Structure is described further in “Structural View” section. There   should   be   specified   a   connection   and   one  or   more   queries.   Queries   will   share specified connection.

Define Queries

One   or   more   queries   that   will   be   associated   with   MySQL   Engine   and   will   share   its connection are defined under <mysql> node with a series of <query> nodes. ID of a query is   defined   with   “id”   attribute   of   <query>   node   and   must   be   unique   within   the   engine.  Query consists from Query Template and a series of Query Results.

Specify Query Template

Query   Template   is   specified   under   <query>   node   using   <template>   node.   During execution   query   text   is   made   from   Query   Template   by   substituting   different   runtime variables. 

Specify Query Results

One  or  more  Query  Results   are  specified  under  <query>  node  using  <result>  node.   ID  of result is defined with “id” attribute of <result> node and must be unique within the query. Query  Result  is a  string  that  is deduced  from  result  table  using a set of  conditions,  setters and rules for table rows traversal.

Use MySQL Engine and Queries

MySQL  Engine  is   used  though  its   Queries.   Query  is   a  Generic  Engine  that   is   referenced by  ID  that   is   composed  from  ID  of   MySQL  Engine  and  ID  of   a  Query  separated  by  dot (like   “mysql_engine_id.query_id”).   Query   can   be   referenced   in   any   context   that   accepts Generic Engines.

Use Query Results

Query  Results  are  used  in two ways.  First  of  all  different  contexts where  queries  are used may   impose   requirements   on   Query   Results   and   use   this   results   in   specific   manner. Second   way   is   usage   of   results   in   templates,   conditions   and   setters   of   other   queries   that goes   next   in   processing   chain.   In   that   way   result   of   a   query   is   referenced   using   dot­
separated   composition   of   word   “engines”,   engine   ID,   query   ID   and   result   ID   (like “engines.mysql_engine_id.query_id.result_id”).

Handle Multi­Value Variables

Some  variables   withing  processing  context   may  have  multiple   values.   For   example  there may   be   multiple   recipients   for   a   message   so   recipient   is   considered   as   multi­value variable. To avoid introducing of loops operators MySQL Engine does implicit looping,  i.e.   it   repeats   corresponding   action   for   each  value   of   a  variable.   Consequently   if   there   are two  multi­value  variables   then  action  is   repeated  for   each  combination or the two and  so on. Results of such queries become multi­value variables too.

MySQL Node Structure

MySQL  Node   defines   a   configuration   of   MySQL   Engine   and   its   queries.   The   following  example is intended to provide quick inside into its structure:
<mppd>
 <common>
 <passwords>/usr/local/MPP/passwords</passwords>
....
 </common>
 <engines>

  <mysql id=”mpp”>
   <connection>
   <host>example.com</host>
   <port>2222</port>
   <database>mpp</database>
   <unix_socket>/usr/local/mysql/var/mysql.sock</unix_socket>
   <defaults_file>/etc/my.cnf</defaults_file> 
   <concurrency>4</concurrency>
   <timeout>
    <connect>60</connect>
   </timeout>
   <user>mpp_database_user</user>
   <password_id>password_id_from_passwords_file</password_id>
 </connection>
 <query id=”query_1”>
  ...
 </query>
 <query id=”query_2”>
  ...
 </query>
  ...
 </mysql>
</engines>


The passwords file would contain a list of password_id and password pairs, separated by space:
password_id_1 password1
password_id_2 password2

Query  Nodes   will   be  specified  in  next   sections.   All   options   and  attributes  are  optional.   If any   is   omitted   then   default   value   is   used.   Minimum   survival   specification   that   defines engine with ID “mysql” and single query with ID “query” can look like this:

<engines>
<mysql>
<connection>
<host>example.com</host>
<port>2222</port>
<database>mpp</database>
</connection>
<query>
...
</query>
</mysql>
</engines>

Description of node values and attributes is provided bellow:
<mysql>
Specifies: MySQL Engine
Attributes:
id
Specified: id if the engine
Value: string
Default: “mysql” (i.e. name of the node)
Value: breaks further on sub­nodes

<connection>
Specifies: Properties for connection to database
Value: breaks further on sub­nodes

<host>

Specifies: remote host where database server is running
Value: string (DNS name or IP of a host)
Default: empty

<port>
Specifies: remote port that database server listens for
Value: integer
Default: 3306

<database>
Specifies: remote database to query in
Value: string
Default: empty

<unix_socket>
Specifies: local socket to connect to local server
Value: string (path to a socket)
Default: empty

<defaults_file>
Specifies: path to file with defaults options for mysql client library
Value: string (path to file)
Default: empty

<concurrency>
Specifies: number of concurrently running connections
Value: integer
Default: equal to the number of processing threads

<timeout>
Specifies: timeouts for different operations
Value: breaks further on sub­nodes

<connect>
Specifies: timeout for connecting to server
Value: integer (seconds)
Default: 60

<query>
See next sections

Query Node Structure

Query   Node   specifies   configuration   of   a   query   and   its   results   within   MySQL   engine. Multiple <query> nodes may be specified under <mysql> node.  The following example is intended to provide quick inside into its structure:

<mppd>
 <engines>
  <mysql id=”mpp”>
   ...
   <query id=”wbl”>
    <template>...</template>
    <result id=”result_1”>
      ...
    </result>
    <result id=”result_2”>
     ...
    </result>
   <query>
  </mysql>
 </engines>
</mppd>

Query Node contains query template and one or more results. Results specifies how result  table   will   be   mapped   to   string   results   that   then   exposed   as   results  to   query   users. Description of node values and attributes is provided bellow:

<query>
Specifies: single query
Attributes:
id
Specifies: id of the query
Value: string
Default: “query” (i. e. name of the node)
Value: breaks further on sub­nodes

<template>
Specifies: query template
Value: string (format is described further in Query Template section)
Default: empty (no query is sent at all)

<result>
Specifies: how result table is converted to string result of the query
Attributes:
id
Specifies: id of the result
Value: string
Default: “result”
Value: breaks further on sub­nodes (see Query Result section)

Query Template

 Query  Template   is   a  string   with   macro  substitutions.   For   example  template  “here  comes  $recipient   of   a   message”   during   runtime   will   produce   the   following   query   string:   “here comes   recipient@example.com  of   a  message”.   In   this   example  “here  comes   “  and  “  of   a  message” are a plain text pieces and $recipient is macro substitution. Plain   text   pieces   are   just   'constant'   text.  
  There   are   a   few  special   characters   that   must   be escaped within it. First of all it is XML escapes:
char     escape
  &        &amp;
  <        &lt;
  >        &gt;

Second group of escapes is connected with template syntax itself and is the following:
char    escape
   \          \\
   {         \{
   }         \}
   $         \$

Macro   substitution   starts   from  unescaped   '$'   (dollar)   character.   Macro   consists   of   macro name and depending on macro type – macro arguments.  Macro name may consists of any of alphabetic letters (a­zA­Z), '.' (dot) and '_' (underscope).
For macro that doesn't expect arguments (simple macro) there are two possible forms:
$macroName
${macroName}

In first form macro name is everything starting immediately after '$' and ending at first not allowed  for   macro  name  character   (usually  space).   In  second  form  macro  name  is  quoted in  curly  braces  '{}'.   This  form  is  useful   if  immediately  after  macro  should  go  a  plain  text character that otherwise can be considered as part of macro name.

 Examples:
SELECT `id` FROM `contacts` WHERE `address`=$sender INSERT IGNORE INTO `contacts` (`address`) VALUES (${sender}with_suffix)

For macro that expects one or more arguments there is the following form:
${macroName macroArgument1 macroArgument2}

Macro   arguments   are   templates   itself   that   may   recursively   consists   of   plain   text   and macros.   Macro   arguments   are   separated   with   spaces   so   it   is   required   that   plain   text   in arguments at highest level contain no space. Space inside recursive macros are allowed. If there  is   a  need  for   space  in  argument   then  entire  argument   must   be  quoted  with  '{}'.   For example:
${wrap { OR `users`.`address`='$#'} ${escape $sender.component}}

Here you may see 'wrap' macro with two arguments. First argument must contain space in plaint text around 'OR' so entire argument is quoted with '{}'. Quotation characters are not appeared in result string generated   from  macro.   You may notice nonconforming macro name '$#'. This is special feature of 'wrap' macro that will be described further. Last macro name is 'sender.component'.   Dot   '.'   is   legal   character   for   macro   name and generally expresses access to smaller property of a bigger object.

The following simple macros are available:
$sender
Specifies: SMTP envelope sender of a message
Dimensions: zero dimension variable

$sender.local
Specifies: local part of SMTP envelope sender
Dimensions: zero dimension variable

$sender.domain
Specifies: domain part of SMTP envelope sender
Dimensions: zero dimensional variable

$sender.component
Specifies: list of components of SMTP envelope sender. Components are full email itself and email domains starting from most specific to less specific. For example address sender@domain.example.com will have the following components:
 sender@domain.example.com, domain.example.com, example.com and com.
Dimensions: one dimensional variable

$recipient
Specifies: SMTP envelope recipient of a message
Dimensions: one dimensional variable

$recipient.local
Specifies: local part of SMTP envelope recipient of a message
Dimensions: one dimensional variable

$recipient.domain
Specifies: domain part of SMTP envelope recipient of a message
Dimensions: one dimensional variable

$recipient.component
Specifies: list of components of SMTP envelope recipient. Components are full email itself and email domains starting from most specific to less specific. For example address recipient@domain.example.com will have the following components:
  recipient@domain.example.com, domain.example.com, example.com and com.
Dimensions: two dimensional variable

$ip
Specifies: IP address of client host as string. If IP is not available this is empty string.
Dimensions: zero dimensional variable

$ip.dec
Specifies: IP address of client host as integer. If IP is not available this is zero.
Dimensions: zero dimensional variable

$ip.hex
Specifies: IP address of client host as hexadecimal integer. If IP is not available this is zero.
Dimensions: zero dimensional variable

$host
Specifies: DNS name of client host. If DNS name is not available this is empty string.
Dimensions: zero dimensional variable

$group
Specifies: name of MPP group that executes a query.
Dimensions: zero dimensional variable
Next   macros   are   functions   that   preprocess   their   arguments   before   inserting   them   into result string. Dimensions of a function is a union of dimensions of its arguments.

${escape argument}
Evaluates its argument and does MySQL specific escaping of character that are special to MySQL syntax.  It is strongly recommended that all strings coming from outside world   ($sender,   $recipient,   ...)   are   escaped   and   quoted   when   used   in   query templates.  Otherwise  it may  appear  runtime error  or even security  threat  if those strings   contain   special   characters   or   commands.   For   example   one   of   previously appeared statements should be refined as following:
SELECT `id` FROM `contacts` WHERE `address`='${escape $sender}'
INSERT IGNORE INTO `contacts` (`address`) VALUES ('${escape $sender}with_suffix')
Note   that   it   is   also   added   MySQL   quotation   around   '${escape   $sender}'   to   prevent interpreting of strings as command.
${wrap template argument2}
This is a special tool for concatenating multi­value variable into single string. It creates 'written'  lists. The tool effectively 'wraps' last dimension of its last argument.  That is why it is called 'wrap'. Lets start explanation of this tool from an example. Imagine that there   is   a   message   with   two   recipients:   rcpt@example.com   and   other@domain.net.
Then the following template:
SELECT `id` FROM `contacts` WHERE `address`='${escape $recipient}'
will lead to generation and execution of two separate queries:
SELECT `id` FROM `contacts` WHERE `address`='rcpt@example.com'
SELECT `id` FROM `contacts` WHERE `address`='other@domain.net'

Result of the query will have same dimension as $recipient variable and thus will be multi­value too. From the other hand another template:
SELECT `id` FROM `contacts` WHERE ${wrap `address`='$#'{ OR } ${escape $recipient}}
will lead to single query with list of recipients separated with ' OR ':
SELECT `id` FROM `contacts` WHERE address`='rcpt@example.com' OR address`='other@domain.net'
Result   of   last   query   will   have   zero   dimension   even   that   its   template   depends   on  $recipient.

Wrap macro imposes special syntax and semantics on its first argument (which is called wrap template):

* Argument consists from main part and optional suffix. Main part generates list entries and suffix generate list separator. Important  property of separator (suffix) that it will not be put after last item in list. Suffix is placed at the end of main part and   separated   from   it   with   '{}'   quotation.   In   the   example   above   main   part   is “`address`='$#'”  and  separator   (suffix)   is  “  OR  “.   If   main  part's   plain   text   must contain   spaces   the   entire   argument   including   suffix   must   be   quoted   with   '{}'   as  follow: ${wrap {plain text $# with spaces {suffix}} argument2}.

*  Special   macro  $#  is   substituted   with   value   of   second   argument   each   time   list entry  is   generated  during  iteration  through  last   dimension  of   the  argument.   This macro is called placeholder.
Placeholder must not appear inside other macro so it is illegal to write '${escape $#}'.  Main   purpose   of   wrap   macro   is   to   generate   lists   of   email   components.   Lets   again imagine   that   there   is   a   message   with   two   recipients:   rcpt@example.com   and other@domain.net.   The   following   template   will   produce   two   queries   (one   for   each recipient) that will contain lists of recipient email components:
SELECT `id` FROM `contacts`  WHERE ${wrap `address`='$#'{ OR } ${escape $recipient.component}}
Will produce:
SELECT `id' FROM `contacts` WHERE `address`='rcpt@example.com' OR `address`='example.com' OR `address`='com'
SELECT `id' FROM `contacts` WHERE `address`='other@domain.net' OR `address`='domain.net' OR `address`='net'
Consequently query result will be one dimensional with dimension same as $recipient variable.

Next macros can be used only when evaluating conditions of result cases and generating result strings after query is generated and executed. It is a runtime error when this macros are used for query generation.
${field name}
Returns  a  field  value  for   field  with  name  'name'  for  current   row.   Should  be  used  with SELECT  queries.   Operates  on  field  aliases  if  specified.   For  example  if  query  template is:
SELECT `id` FROM `contacts` WHERE `address`='${escape $recipient}'
then  expression  ${field   id}  will   evaluate   to   selected   value  of  `id`  field.  
Example with alias:
SELECT COUNT(*) AS n FROM `contacts` WHERE `address`='${escape $recipient}'
In this case ${field n} will evaluate to calculated value of COUNT(*). Field  macro  should  not   be  used  for   fields   that   may  contain  binary  data  with  '\0'   (zero) characters. Otherwise unexpected results may be produced.

$insert_id
Returns value of auto­increment column for last INSERT query.

Query Result

Query Result defines how result table is mapped to a result string that is exposed to engine users.  Multiple results can be defined for each query. The following example is intended to provide quick inside into structure of Query Result node:
<mppd>
<engines>
<mysql id=”mpp”>
...
<query id=”wbl”>
...
                 <result id="result">
                     <if_empty_table>
                         <result>none</result>
                     </if_empty_table>
                     <if_filled_table>
                         <row_to_case_relation>all­to­one</row_to_case_relation>
                         <case>
                             <condition>${field wb} $EQ W $OR ${field wb} $EQ Y</condition>
                             <result>spam_whitelist</result>
                         </case>
                         <case>
                             <condition>${field wb} $EQ B $OR ${field wb} $EQ N</condition>
                             <result>blacklist</result>
                         </case>
                         <result>none</result>
                    </if_filled_table>
                 </result>
</query>
...
</mysql>
</engines>
</mppd>

Each result has an id assigned to it that will be used to access the result by users. Result is branched   for   empty   result   table   (or   for   not   performed   query   due   to   unspecified   query template   or   other   reason)  <if_empty_table>  and   for   filled   result   table  <if_filled_table>. Each  branch   contains   zero   or   more  <case>  and  mandatory  default   <result>  setter.   Case consists   from  <condition>  and  <result>  setter   that   will   be   used   if   corresponding condition   is   met.   Additional   option  <row_to_case_relation>  that   have   effect   in <if_filled_table> branch. It tunes the way rows and cases are iterated and matched.
Processing flow is the following.
After template is generated and query is executed result table is obtained. If table is empty (query was not actually performed due to unspecified query template or other reason) then <if_empty_table> branch is executed. Otherwise <if_filled_table> branch is executed. Branch  <if_empty_table>  is   executed  in  the  following  way.   First   each  <case>  condition
is   matched   until   match   is   found.   If   it   is   found   corresponding  <result>  template   is evaluated  and  final result  is set  equal to  obtained  value.  If matched condition is not found
or  there  are  no  <case>  nodes  at   all   then  default  <result>  template  is  evaluated  and  final result   is   set.   Because   the   branch   is   for   empty   table   macro  ${field   ...}  is   not   allowed   in
<condition>'s and <results>'s of the branch.
Execution   of  <if_filled_table>  branch   is   a   bit   complexer.   Because   in   general   case   there may be multiple rows in result table and multiple <case>'s  specified final matching result
of rows against <case> <condition>'s may depend on algorithm of traversal through both sets.   Option  <row_to_case_relation>  provides   a   user   with   ability   to   select   predefined algorithm  suitable   for   his   needs.     The   option   may  have   the   following   values:  all­to­one, one­to­all  and  first­to­all.   Algorithm  all­to­one  matches   all   rows   to   one   first   (one)   case, then   to   second   (one)   case   and   so   on   until   match   is   found.   Algorithm  one­to­all  matches first   (one)   row   to   all   cases,   then   second   (one)   row  to   all   case   and   so   on   until   match   is found. Algorithm firsttoall matches only first row to all cases until match is found (other rows   are   not   matched   at   all   and   thus   ignored).   If   match   '<case>   <condition>  is   found then  corresponding  <case>  <result>  is   evaluated  and  is   taken  as   final   result.   Otherwise default  <result>.   Macro  ${field  ...}  can  be  used  in  <case>  <condition>'s  and  <result>'s to   access   any   field   of   current   row   by   name.   This   macro   can   also   be   used   in   default <result>  but its meaning depends on <row_to_case_relation>: for  all­to­one  and one­to­all it is applied to last row but for first­to­all it is applied to first row.
Option  <condition>  is   a   string   that   has   a   format   similar   to   Query   Templates   but   with   a few   additional   features   needed   for   making   comparisons   and   boolean   expressions.   The
format is described further in Case Condition Format section.
Option <result> is a string that has the same format as Query Template. One deference is in   the  <if_filled_table>  branch   macro  ${field   ...}  is   available.   In   both   contexts ${insert_id} is available but it should be used for INSERT's which do not return results so actually <if_empty_table> branch will always be executed.

Case Condition Format

Case   Condition   string   is   a   boolean   expression   string   that   consists   from   arguments, comparisons   and   boolean   expressions.   Arguments   have   the   same   format   as   Query Templates.   Then   arguments   are   compared   with   comparison   operators.   Then   if   there   are more   then   one   comparison   needed   comparisons   may   form   boolean   expression.   For example here is a simple comparison: ${field condition} $EQ 0. Field condition is checked for   equality   with   zero.   And   here   is   two   comparisons   that   forms   boolean   expression:
${field wb} $EQ W $OR ${field wb} $EQ Y. Field wb is compared for equality with 'W' or 'Y' strings.

The following comparison operator are available:
Operator           Meaning           Equivalent in C/C++
 $EQ                    equal                ==
 $NE                  not equal            !=
 $GT                   greater               >
 $LT                    lower                <
 $GE               greater or equal    >=
 $LE               lower or equal      <=
 
The following boolean operators are available:
Operator            Meaning                  Equivalent in C/C++
 $NOT          unary logical negate              !
 $AND         binary logical AND             &&
 $OR            binary logical OR                  ||

Arguments should not contain spaces at top most plain text. However if they must they should be quoted with {...}: ${field some} $EQ {here comes text with spaces}. Empty strings are expressed as {}: $ip $EQ {}.
Comparison operators have greater precedence over boolean operator. Boolean operators have precedence over each other as they appear in table above. To explicitly specify in that precedence boolean operator should be applied operator may be bracketed with {...}.
For example: $NOT {${field wb} $EQ W $OR ${field wb} $EQ Y}. Here $NOT applied after $OR because $OR is bracketed. Otherwise $NOT would be applied to first comparison and the $OR would be applied.
When comparison is performed first of all string values for corresponding arguments are generated. If both arguments can be converted to integers the they are converted and compared as integers (signed 64 bit). If not they converted to reals (signed 64 bit). If not they compared as string.



Article Details
Article ID: 41
Created On: 03 Mar 2009 09:58 PM

 This answer was helpful  This answer was not helpful

 Back
 Login [Lost Password] 
Email:
Password:
Remember Me:
 
 Search
 Article Options
Home | Register | Submit a Ticket | Knowledgebase | Downloads
Language: