Icon

This is the documentation for an older version of Qube. The latest version of the documentation can be found here: Qube

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3


The pfx_dw schema is a fairly textbook "star schema", with things like jobs and worker usage in fact tables, and the various things that you might use in a SQL WHERE clause in dimension tables.

The job dimension table, a typical "fact" table.

For example, to get reports about various jobs over time, you'll be querying the pfx_dw.job_fact table:

Code Block
languagesql
mysql> SHOW CREATE TABLE pfx_dw.job_fact\G
*************************** 1. row ***************************
Table: job_fact
Create Table: CREATE TABLE `job_fact` (
`time_sk` int(11) NOT NULL,
`job_id` int(11) DEFAULT NULL,
`job_name` varchar(255) DEFAULT NULL,
`job_priority` int(11) DEFAULT NULL,
`job_timesubmit` int(11) DEFAULT NULL,
`job_timestart` int(11) DEFAULT NULL,
`job_timecomplete` int(11) DEFAULT NULL,
`jobstatus_sk` int(11) DEFAULT NULL,
`user_sk` smallint(5) unsigned DEFAULT NULL,
`cluster_sk` smallint(5) unsigned DEFAULT NULL,
`account_sk` smallint(5) unsigned DEFAULT NULL,
`kind_sk` smallint(5) unsigned DEFAULT NULL,
`prototype_sk` smallint(5) unsigned DEFAULT NULL,
`prod_show_sk` smallint(5) unsigned DEFAULT NULL,
`prod_shot_sk` smallint(5) unsigned DEFAULT NULL,
`prod_seq_sk` smallint(5) unsigned DEFAULT NULL,
`prod_client_sk` smallint(5) unsigned DEFAULT NULL,
`prod_dept_sk` smallint(5) unsigned DEFAULT NULL,
`prod_custom1_sk` smallint(5) unsigned DEFAULT NULL,
`prod_custom2_sk` smallint(5) unsigned DEFAULT NULL,
`prod_custom3_sk` smallint(5) unsigned DEFAULT NULL,
`prod_custom4_sk` smallint(5) unsigned DEFAULT NULL,
`prod_custom5_sk` smallint(5) unsigned DEFAULT NULL,
`cpu_count` int(10) unsigned NOT NULL,
`cpu_seconds` int(10) unsigned NOT NULL,
`work_count` int(10) unsigned NOT NULL,
`worktime_min` int(10) unsigned NOT NULL,
`worktime_max` int(10) unsigned NOT NULL,
`worktime_avg` int(10) unsigned NOT NULL,
`mem_max` bigint(20) unsigned DEFAULT NULL,
`mem_avg` bigint(20) unsigned DEFAULT NULL,
`mem_stddev` bigint(20) unsigned DEFAULT NULL,
UNIQUE KEY `job_id` (`job_id`),
KEY `time_sk` (`time_sk`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

 

"*_sk" columns can be used to do INNER JOINs to a similarly named dimension table

Any column that is named with an _sk suffix is a Synthetic Key that points to a corresponding dimension table, named with the part of the column before the _sk; the dimension table will have a _dim suffix in the name. This way, it's easy to write the JOIN's, the column name is a clue to the dimension table, which will have a column of the same name. Almost every dimension table will consist of a *_sk PRIMARY KEY and a name column.

 

A typical dimension table, the "user_dim" table

For example, the user_sk column can be used to do a SQL INNER JOIN to the user_dim table.

Code Block
languagesql
mysql> SHOW CREATE TABLE pfx_dw.user_dim\G
*************************** 1. row ***************************
Table: user_dim
Create Table: CREATE TABLE `user_dim` (
`user_sk` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`user_sk`),
UNIQUE KEY `name` (`name`)
) ENGINE=MyISAM AUTO_INCREMENT=10 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
 
mysql> SELECT name FROM user_dim;
+---------------+
| name          |
+---------------+
| Administrator |
| bar           |
| foo           |
| foobar        |
| fubar         |
| jburk         |
| perforce      |
| root          |
| shinya        |
+---------------+

 

Get a count of all jobs for a particular user:
Code Block
languagesql
mysql> SELECT
	COUNT(*) 
FROM 
	job_fact AS fact
INNER JOIN
	user_dim AS user
ON
	fact.user_sk = user.user_sk
WHERE
	user.name = 'jburk'
;
+----------+
| COUNT(*) |
+----------+
| 5868     |
+----------+

The time dimension table

The pfx_dw.time_dim table is provided so that you don't have to perform date/time operations on every row in a fact table (since they can run into the 100's of millions of rows), instead you do a SQL INNER JOIN to it and use the values in the time_dim table in your WHERE clause. The time_sk column in every fact table has an identical value in the time_dim table which has a single row with a primary key time_sk.  The time_sk value is actually the unix epoch time in seconds:

Code Block
languagesql
mysql> SELECT 
	* 
FROM 
	time_dim 
ORDER BY 
	time_sk 
DESC LIMIT 1
;
+------------+---------------------+------+------------+-----+------------+-------+------+
| time_sk    | date_time           | hour | date       | dow | month_name | month | year |
+------------+---------------------+------+------------+-----+------------+-------+------+
| 1392339600 | 2014-02-13 17:00:00 | 17   | 2014-02-13 | 5   | February   | 2     | 2014 |
+------------+---------------------+------+------------+-----+------------+-------+------+
1 row in set (0.00 sec)

mysql> SELECT 
	FROM_UNIXTIME(time_sk)
	, date_time 
FROM 
	time_dim 
ORDER BY 
	time_sk 
DESC LIMIT 1
;
+------------------------+---------------------+
| FROM_UNIXTIME(time_sk) | date_time           |
+------------------------+---------------------+
| 2014-02-13 17:00:00    | 2014-02-13 17:00:00 |
+------------------------+---------------------+

The "job status" dimension table

The pfx_dw.jobstatus_dim table is one of the few exceptions to the normal dimension table structure; it provides a mapping between the integer and human-readable status values.

Code Block
languagesql
mysql> SELECT * FROM jobstatus_dim;
+--------------+------------+-------------+----------------+-------------+
| jobstatus_sk | status_int | status_char | effective_date | expiry_date |
+--------------+------------+-------------+----------------+-------------+
| 1            | 16         | complete    | 1999-12-31     | 9999-12-31  |
| 2            | 32         | failed      | 1999-12-31     | 9999-12-31  |
| 3            | 48         | killed      | 1999-12-31     | 9999-12-31  |
| 4            | 272        | blocked     | 1999-12-31     | 9999-12-31  |
| 5            | 288        | waiting     | 1999-12-31     | 9999-12-31  |
| 6            | 304        | suspended   | 1999-12-31     | 9999-12-31  |
| 7            | 320        | pending     | 1999-12-31     | 9999-12-31  |
| 8            | 323        | waiting     | 1999-12-31     | 9999-12-31  |
| 9            | 325        | badlogin    | 1999-12-31     | 9999-12-31  |
| 10           | 336        | running     | 1999-12-31     | 9999-12-31  |
| 11           | 261        | dying       | 1999-12-31     | 9999-12-31  |
+--------------+------------+-------------+----------------+-------------+

 

Get a count of all jobs for a particular user for January, 2014:
Code Block
languagesql
mysql> SELECT
	COUNT(*) 
FROM 
	job_fact AS fact
INNER JOIN
	user_dim AS user
ON
	fact.user_sk = user.user_sk
INNER JOIN
	time_dim AS time
ON
	fact.time_sk=time.time_sk
WHERE
	user.name = 'jburk'
AND
	time.month = 1
AND 
	time.year = 2014
;
+----------+
| COUNT(*) |
+----------+
| 83       |
+----------+
Get a count of all jobs for each user for all of 2013:
Code Block
languagesql
mysql> SELECT
	user.name
	, time.month_name
	, COUNT(*) as "job count" 
FROM 
	job_fact AS fact
INNER JOIN
	user_dim AS user
ON
	fact.user_sk = user.user_sk
INNER JOIN
	time_dim AS time
ON
	fact.time_sk=time.time_sk
WHERE
	time.year = 2013
GROUP BY
	user.name
	, time.month
ORDER BY
	user.name
	, time.month
;
+---------------+------------+-----------+
| name          | month_name | job count |
+---------------+------------+-----------+
| Administrator | January    | 4         |
| bar           | July 		 | 1  	 	 |
| foo           | July 		 | 2 	 	 |
| foobar        | July 		 | 1 		 |
| foobar        | September  | 2 		 |
| fubar         | July		 | 1 		 |
| jburk         | March		 | 1 		 |
| jburk         | May		 | 123 		 |
| jburk         | June		 | 24 		 |
| jburk         | July		 | 220 		 |
| jburk         | August	 | 140 		 |
| jburk         | September	 | 68 		 |
| jburk         | October	 | 560 		 |
| jburk         | November	 | 59 		 |
| jburk         | December	 | 4467 	 |
| perforce		| January	 | 128 		 |
| perforce		| February	 | 4 		 |
| root 			| January	 | 31 		 |
| root 			| February	 | 37 		 |
| root 			| March		 | 7 		 |
| root 			| April		 | 17 		 |
| root 			| May		 | 7 		 |
| root 			| June		 | 1 		 |
| root 			| July		 | 11 		 |
| root 			| September	 | 2 		 |
| root 			| December	 | 2 		 | 
| shinya 		| January	 | 23 		 |
| shinya 		| February	 | 69 		 |
| shinya 		| March		 | 7 		 |
| shinya 		| April		 | 20 		 |
| shinya 		| May		 | 2 		 |
+---------------+------------+-----------+

Get a count of all jobs for each user for all of 2013, broken down by month and the job's final status:
Code Block
languagesql
mysql> SELECT
	user.name
	, time.year
	, time.month_name
	, status.status_char
	, COUNT(*) as "job count" 
FROM 
	job_fact AS fact
INNER JOIN
	user_dim AS user
ON
	fact.user_sk=user.user_sk
INNER JOIN
	time_dim AS time
ON
	fact.time_sk=time.time_sk
INNER JOIN
	jobstatus_dim AS status
ON 
	fact.jobstatus_sk=status.jobstatus_sk
WHERE
	time.year = 2013
GROUP BY
	user.name
	, time.month
	, status.status_int
ORDER BY
	user.name
	, time.month
	,status.status_char
;
+---------------+------+------------+-------------+-----------+
| name 			| year | month_name | status_char | job count |
+---------------+------+------------+-------------+-----------+
| Administrator | 2013 | January 	| failed	  | 4		  |
| bar 			| 2013 | July 		| complete	  | 1		  |
| foo 			| 2013 | July 		| complete	  | 2		  |
| foobar 		| 2013 | July 		| complete	  | 1 		  |
| foobar 		| 2013 | September 	| complete	  | 2 		  |
| fubar 		| 2013 | July 		| complete	  | 1		  |
| jburk 		| 2013 | March 		| complete	  | 1		  |
| jburk 		| 2013 | May 		| complete	  | 100		  |
| jburk 		| 2013 | May 		| failed	  | 11		  |
| jburk 		| 2013 | May 		| killed	  | 12		  |
| jburk 		| 2013 | June 		| complete	  | 4		  |
| jburk 		| 2013 | June 		| failed	  | 20		  |
| jburk 		| 2013 | July 		| complete	  | 134		  |
| jburk 		| 2013 | July 		| failed	  | 33		  |
| jburk 		| 2013 | July 		| killed	  | 53		  |
| jburk 		| 2013 | August 	| complete	  | 67		  |
| jburk 		| 2013 | August 	| failed	  | 25		  |
| jburk 		| 2013 | August 	| killed	  | 48		  |
| jburk 		| 2013 | September 	| complete	  | 38		  |
| jburk 		| 2013 | September 	| failed	  | 17  	  |
| jburk 		| 2013 | September 	| killed	  | 13  	  |
| jburk 		| 2013 | October 	| complete	  | 450		  |
| jburk 		| 2013 | October 	| failed	  | 61		  |
| jburk			| 2013 | October 	| killed	  | 49		  |
| jburk 		| 2013 | November 	| complete	  | 38		  |
| jburk 		| 2013 | November 	| failed	  | 12		  |
| jburk 		| 2013 | November 	| killed	  | 9		  |
| jburk 		| 2013 | December 	| complete	  | 3812	  |
| jburk 		| 2013 | December 	| failed	  | 627		  |
| jburk 		| 2013 | December 	| killed	  | 28		  |
| perforce 		| 2013 | January 	| failed	  | 46		  |
| perforce 		| 2013 | January 	| killed	  | 82		  |
| perforce 		| 2013 | February 	| complete	  | 3		  |
| perforce 		| 2013 | February 	| killed	  | 1		  |
| root 			| 2013 | January 	| complete	  | 24		  |
| root 			| 2013 | January 	| failed	  | 6		  |
| root 			| 2013 | January 	| killed	  | 1		  |
| root 			| 2013 | February 	| complete	  | 34 		  |
| root 			| 2013 | February 	| killed	  | 3 		  |
| root 			| 2013 | March 		| complete	  | 7		  |
| root 			| 2013 | April 		| complete	  | 9		  |
| root 			| 2013 | April 		| failed	  | 4		  |
| root 			| 2013 | April 		| killed	  | 4		  |
| root 			| 2013 | May 		| complete	  | 6		  |
| root 			| 2013 | May 		| killed	  | 1		  |
| root 			| 2013 | June 		| complete	  | 1		  |
| root 			| 2013 | July 		| complete	  | 7		  |
| root 			| 2013 | July 		| failed	  | 1		  |
| root 			| 2013 | July 		| killed	  | 3		  |
| root 			| 2013 | September 	| complete	  | 1		  |
| root 			| 2013 | September 	| failed	  | 1		  |
| root 			| 2013 | December 	| complete	  | 2		  |
| shinya 		| 2013 | January 	| complete	  | 4		  |
| shinya 		| 2013 | January 	| failed	  | 7 		  |
| shinya 		| 2013 | January 	| killed	  | 12		  |
| shinya 		| 2013 | February 	| complete	  | 60		  |
| shinya 		| 2013 | February 	| failed	  | 2		  |
| shinya 		| 2013 | February 	| killed	  | 7		  |
| shinya 		| 2013 | March 		| complete	  | 2		  |
| shinya 		| 2013 | March 		| failed	  | 2		  |
| shinya 		| 2013 | March 		| killed	  | 3		  |
| shinya 		| 2013 | April 		| complete	  | 16		  |
| shinya 		| 2013 | April 		| killed	  | 4		  |
| shinya 		| 2013 | May 		| killed	  | 2		  |
+---------------+------+------------+-------------+-----------+

Get the sum total of cpu_seconds used for each user for the last 7 days, broken down by user, date, and the job's final status:
Code Block
languagesql
SELECT
    user.name
    , time.date
    , status.status_char
    , SUM(fact.cpu_seconds) as "cpu_time"
FROM
    job_fact AS fact
INNER JOIN
    user_dim AS user
ON
    fact.user_sk=user.user_sk
INNER JOIN
    time_dim AS time
ON
    fact.time_sk=time.time_sk
INNER JOIN
    jobstatus_dim AS status
ON
    fact.jobstatus_sk=status.jobstatus_sk
WHERE
    DATEDIFF(CURDATE(), time.date_time) < 7
GROUP BY
    user.name
    , time.date
    , status.status_int
ORDER BY
    time.date
	, cpu_time DESC
    , status.status_char
;
 
+--------+------------+-------------+----------+
| name   | date       | status_char | cpu_time |
+--------+------------+-------------+----------+
          << snipped >>
| jburk  | 2014-07-14 | complete    |   351036 |
| jburk  | 2014-07-14 | killed      |    60029 |
| jburk  | 2014-07-14 | failed      |      139 |
| coxj   | 2014-07-14 | killed      |       98 |
| garza  | 2014-07-14 | killed      |        0 |
| jburk  | 2014-07-15 | complete    |    28910 |
| fubar  | 2014-07-15 | complete    |    18610 |
| foobar | 2014-07-15 | complete    |    18561 |
| jburk  | 2014-07-15 | killed      |    16967 |
| jburk  | 2014-07-15 | failed      |       27 |
| jburk  | 2014-07-16 | complete    |    46797 |
| jburk  | 2014-07-16 | killed      |    17136 |
| jburk  | 2014-07-16 | failed      |        2 |
          << snipped >>
+--------+------------+-------------+----------+