Stats Jam SQL help

From StatsJam

Jump to: navigation, search

[edit] Writing Stats Jam Queries

Writing a statsjam query is easy. What we call a 'statsjam query' is really two things: an SQL query that extracts and formulates the data you want, and the type of visualisation you want for the data.

To write a statsjam query, you must enclose both the SQL query and visualisation directives in a <statsjam> tag, like so:

 <statsjam>
   ...statsjam query goes here...
 </statsjam>

Inside the <statsjam> tag, the SQL query must itself appear inside a <query> tag. The visualisation is declared by using a <view> tag (much more on this in a moment). For example:

 <statsjam>
   <query>
     select log_num, sin(log_num) 
     from interaction_log 
     where log_num &gt; 10 
     limit 300;
   </query>
   <view type="plot"/>
 </statsjam>

Here's what that statsjam query looks like:

Note: The text inside the <statsjam> tags is XML, and so must obey the XML formatting rules. In particular, if you need to use '<' or '>' characters in your SQL or view tag, then you must encode them as &lt; and &gt;, respectively.

[edit] Stats Jam Toolbar

As you can see, accompanying a statsjam query on a wiki page is a small, textual toolbar that appears as hyperlinks.

The link to this result link provides a direct link to the result of a statsjam query. For textual views (like 'table') this is a link to complete webpage; for image views this is a link directly to the image (that is, it can be used as the 'src' attribute of an HTML image tag.

The show/hide query link toggles the display of the underlying statsjam query. This enables the wiki reader to see exactly what query was being run, and how it is being visualised, without having to switch into the edit view of the wiki page.

The show/hide playground link provides an environment for quickly editing and rendering statsjam query, without having to go through the edit-preview cycle of the entire wiki page. It also allows readers of the wiki to experiment with existing statsjam queries while browsing the page. Showing the playground reveals a textbox and a button labelled "Render query". The textbox initially contains the original statsjam query. The user can edit the query, and then click "Render query" to see the results immediately.

To see this in action, try showing the playground for statsjam query above, and edit the query to change the limit to 50 data points.

Note: When viewing a page, changes to the statsjam query made through the playground are NOT saved! The playground is only a place to experiment.

To change the statsjam query the wiki user must first switch into the edit view of the wiki page, and edit the statsjam query from there. In this mode, when previewing edits to a page, the playground provides an additional button "Update wiki text below". On clicking this, an attempt is made to update the wiki text with the editted query.

[edit] View types

Stats Jam has several different visualisations, called 'views'. Every <statsjam> query has a <view> tag that indicates which view to use to show the results of the query. The following describes each of the different views available.

Table

To view the results of a query as a table use the table view:

  <view type="table"/>

The column names for the table are taken from the column names for the query returned from the database. Thus, you may use the SQL 'AS' modifier to supply a custom name for a column. See the example for details.

The 'table' view has the following options:

caption string - A caption given to the table

Trig. Functions
xsin(x)xcos(x)xtan(x)
0.10.09983341664680.10.9950041652780.10.100334672085
0.20.1986693307950.20.9800665778410.20.202710035509
0.30.2955202066610.30.9553364891260.30.30933624961
0.40.3894183423090.40.9210609940030.40.422793218738
0.50.4794255386040.50.877582561890.50.546302489844
0.60.5646424733950.60.825335614910.60.684136808342
0.70.6442176872380.70.7648421872840.70.842288380463
0.80.71735609090.80.6967067093470.81.02963855705
0.90.7833269096270.90.6216099682710.91.26015821755
1.00.8414709848081.00.5403023058681.01.55740772465

Line Plot

A query that returns only numeric data can be used to produce a line plot:

 <view type="plot">

The query for a 'plot' view must return an even number of columns, and each column must contain numeric data only. The columns are taken in pairs with each pair supplying the x-y coordinates for a line in the plot. That is, the first column in a pair is used as the x coordinates, the second as y coordinates.

The 'plot' view has the following options:

title string - A title for the plot
caption string - A caption for the plot (appears below the plot)
legend boolean - 'yes' or 'no' indicating whether to display a legend for the plot
xlabel string - a label for the x-axis. If not given then the x column name for the first xy pair is used.
ylabel string - a label for the y-axis. If not given then the y column name for the first xy pair is used.
fmt string comma delimited list of matplotlib-style format strings, one for each pair of query columns
alpha number - the alpha value for all plot lines
animated boolean
antialiased boolean
color string
dash_capstyle string - 'butt' or 'round' or 'projecting'
dash_joinstyle string - 'miter' or 'round' or 'bevel'
label string
linestyle ' or 'steps' or 'None' or ' '
linewidth number
lod boolean
marker string - '+' or ',' or '.' or '1' or '2' or '3' or '4'
markeredgecolor string
markeredgewidth number
markerfacecolor string
markersize number
solid_capstyle string - 'butt' or 'round' or 'projecting'
solid_joinstyle string - 'miter' or 'round' or 'bevel'

The 'plot' view is based on the matplotlib plot() function. See the matplotlib documentation for more details.

Trig. Functions

Date Plot

Queries that return date columns can be plotted using the 'plot_date' view:

  <view type="plot_date"/>

The query must return an two columns which are treated as an xy pair. The column types can be either date or numeric, or both. The first column is used as the x-coordinates, and the second column is used as the y-coordinates.

The 'plot_date' view has the same options as the 'plot' view.

The 'plot_date' view is based on the matplotlib plot_date() function. See the matplotlib documentation for more details.

Date Test

Histogram View

A query that returns a single, numeric column can be viewed as a histogram:

  <view type="hist"/>

The 'hist' view has the following options:

bins number - number of bins to use in the histogram. Default 10
bottom number
align string - 'edge' or 'center'
orientation string - 'vertical' or 'horizontal'
width number
log boolean - use log scale?
alpha number
animated boolean
antialiased boolean
edgecolor string
facecolor string
fill boolean
label string
linewidth number
lod boolean

The 'hist' view is based on the matplotlib hist() function. See the matplotlib documentation for more details.

Bi-histogram plot

The 'bihist' view allows a query that return two columns of numeric data to be viewed as two histograms on the same plot.

  <view type="bihist"/>

The options for the 'bihist' view are the same as for the 'hist' view, except that 'orientation' cannot be specified.

The 'bihist' view is based on the matplotlib hist() function. See the matplotlib documentation for more details.

Scatter plot

A query that returns two numeric columns can be viewed as a scatter plot using the 'scatter' view:

 <view type="scatter"/>

The first column is taken as the x-values, and the second as the y-values.

The 'scatter' view has the following options:

s number - size of points
c string
marker string
alpha number
animated boolean
color string
edgecolor string
facecolor string
label string
linewidth number
lod boolean
lw number

The 'scatter' view is based on the matplotlib scatter() function. See the matplotlib documentation for more details.

Sin scatter plot

Personal tools