Icon

    This is the documentation for for an older version of Qube.

    Documentation for the latest release is available here.

    You are viewing an old version of this page. View the current version.

    Compare with Current View Page History

    « Previous Version 2 Current »


    The pfx_dw schema is a fairly textbook star schema, with things like jobs and worker usage in fact tables, and the various things that you might use in a SQL WHERE clause in dimension tables.

    The job dimension table, a typical "fact" table.

    For example, to get reports about various jobs over time, you'll be querying the pfx_dw.job_fact table:

    The "job_time*" columns are stored in UNIX epoch time

    Icon

     

    To convert them to human-readable format use the MySQL FROM_UNIXTIME() function.

     

    mysql> SELECT job_id, job_timesubmit, FROM_UNIXTIME(job_timesubmit) FROM pfx_dw.job_fact WHERE job_id=98269;
    +--------+----------------+-------------------------------+
    | job_id | job_timesubmit | FROM_UNIXTIME(job_timesubmit) |
    +--------+----------------+-------------------------------+
    |  98269 |     1414709214 2014-10-30 15:46:54           |
    +--------+----------------+-------------------------------+

     

    "*_sk" columns can be used to do INNER JOINs to a similarly named dimension table

    Any column that is named with an _sk suffix is a Synthetic Key that points to a corresponding dimension table, named with the part of the column before the _sk; the dimension table will have a _dim suffix in the name. This way, it's easy to write the JOIN's, the column name is a clue to the dimension table, which will have a column of the same name. Almost every dimension table will consist of a *_sk PRIMARY KEY and a name column.

     

    A typical dimension table, the "user_dim" table

    For example, the user_sk column can be used to do a SQL INNER JOIN to the user_dim table.

     

    Get a count of all jobs for a particular user:

    The time dimension table

    The pfx_dw.time_dim table is provided so that you don't have to perform date/time operations on every row in a fact table (since they can run into the 100's of millions of rows), instead you do a SQL INNER JOIN to it and use the values in the time_dim table in your WHERE clause. The time_sk column in every fact table has an identical value in the time_dim table which has a single row with a primary key time_sk.  The time_sk value is actually the unix epoch time in seconds:

    The "job status" dimension table

    The pfx_dw.jobstatus_dim table is one of the few exceptions to the normal dimension table structure; it provides a mapping between the integer and human-readable status values.

     

    Get a count of all jobs for a particular user for January, 2014:
    Get a count of all jobs for each user for all of 2013:

    Get a count of all jobs for each user for all of 2013, broken down by month and the job's final status:
    Get the sum total of cpu_seconds used for each user for the last 7 days, broken down by user, date, and the job's final status:

     

    SELECT
        user.name
        time.date
        , status.status_char
        SUM(fact.cpu_seconds) as "cpu_time"
    FROM
        job_fact AS fact
    INNER JOIN
        user_dim AS user
    ON
        fact.user_sk=user.user_sk
    INNER JOIN
        time_dim AS time
    ON
        fact.time_sk=time.time_sk
    INNER JOIN
        jobstatus_dim AS status
    ON
        fact.jobstatus_sk=status.jobstatus_sk
    WHERE
        DATEDIFF(CURDATE(), time.date_time) < 7
    GROUP BY
        user.name
        time.date
        , status.status_int
    ORDER BY
        time.date
        , cpu_time DESC
        , status.status_char
    ;
     
    +--------+------------+-------------+----------+
    name   date       | status_char | cpu_time |
    +--------+------------+-------------+----------+
              << snipped >>
    | jburk  | 2014-07-14 | complete    |   351036 |
    | jburk  | 2014-07-14 | killed      |    60029 |
    | jburk  | 2014-07-14 | failed      |      139 |
    | coxj   | 2014-07-14 | killed      |       98 |
    | garza  | 2014-07-14 | killed      |        0 |
    | jburk  | 2014-07-15 | complete    |    28910 |
    | fubar  | 2014-07-15 | complete    |    18610 |
    | foobar | 2014-07-15 | complete    |    18561 |
    | jburk  | 2014-07-15 | killed      |    16967 |
    | jburk  | 2014-07-15 | failed      |       27 |
    | jburk  | 2014-07-16 | complete    |    46797 |
    | jburk  | 2014-07-16 | killed      |    17136 |
    | jburk  | 2014-07-16 | failed      |        2 |
              << snipped >>
    +--------+------------+-------------+----------+

     

     

     

    • No labels