    This is the quickest and most effective way to decrease the load on your supervisor.  If you only change one thing on your farm from the defaults, make this change.

    Windows users: This is a bit tricky to set up with a Windows supervisor and workers. You need to ensure that the worker and supervisor services can access the shared filesystem, which usually means having the shared log directory near the top of a share, and having the log directory itself set to Everyone [Full Control]. The paths supplied in *_logpath must also be UNC paths, as drive letters are not visible to Windows system services.
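
The UNC requirement can be checked before a path goes into a *_logpath entry. The following is a minimal sketch, not part of Qube itself: the helper name `is_unc_path` and the server name in the usage note are hypothetical. It relies only on Python's standard `pathlib`, which reports a UNC share (rather than a drive letter) as the "drive" of a Windows path.

```python
from pathlib import PureWindowsPath


def is_unc_path(path):
    """Return True if `path` starts with a UNC share (\\\\server\\share).

    PureWindowsPath parses Windows paths on any OS, so this check can be
    run from a Linux or OS X admin machine as well.
    """
    drive = PureWindowsPath(path).drive
    # A mapped drive yields "Z:"; a UNC path yields "\\\\server\\share".
    return drive.startswith("\\\\")
```

For example, `is_unc_path(r"\\fileserver\qube\logs")` returns True, while `is_unc_path(r"Z:\qube\logs")` returns False, which is the case that a Windows system service would not be able to resolve.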

    By default, job log information is handled by remote log transmission, which works as follows:

    1. the job logs are first stored locally on the Worker
    2. then transmitted from the Worker to the Supervisor
    3. then finally written locally on the Supervisor's filesystem.

    However, the most efficient way is for both the Supervisor and the Worker to share the job log files directly on a common file server mounted by both the Supervisor host and the Worker hosts. In either case, the Supervisor will need to have access to the entire job log directory structure. 

    Similarly, the Client should read the job log files directly from disk instead of having the Supervisor transmit the files to it.

    On the Supervisor, job logs will be located in <supervisor_logpath>/job. On the Worker, job logs will be located in <worker_logpath>/job. Both of these directories should point to the same location on a shared filesystem.

    The permissions on the shared log directory must be world-writable, which on Linux and OS X means drwxrwxrwx or mode 0777, and on Windows means Everyone [Full Control].
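
On Linux and OS X, the mode requirement can be verified with a few lines of Python. This is a sketch under the assumption that it runs on a host where the shared directory is already mounted; the function name `is_world_writable` is hypothetical. It does not cover Windows ACLs.

```python
import os
import stat


def is_world_writable(path):
    """Return True if owner, group, and other all have rwx on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Mask off setuid/setgid/sticky bits and compare against 0777.
    return (mode & 0o777) == 0o777
```

If this returns False for the shared log directory, `chmod 0777` on that directory (run by its owner or by root) brings it in line with the requirement above.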

    Steps to set the job log directory on the Supervisor, Worker, and Client

    For the supervisor:

    Set the Supervisor job log directory to control where the supervisor writes the job logs by modifying the supervisor_logpath entry in the supervisor's qb.conf:

    supervisor_logpath = <put shared directory here>

    then restart the supervisor service for the change to take effect.

    For the workers:

    Set the Worker job log directory to control where the worker writes the job logs by modifying both the worker_logpath and worker_logmode entries in either the qbwrk.conf (recommended) or each worker's qb.conf:

    worker_logmode = mounted

    worker_logpath = <put shared directory here>

    If you make the changes in the qbwrk.conf on the supervisor, push the changes out with "qbadmin w --reconfigure". (See: Centralized Worker Configuration - Manually editing the qbwrk.conf.) If you edited each worker's qb.conf instead, you will need to restart the worker service for the change to take effect.
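
Before pushing or restarting, it can help to confirm that both worker entries are actually present in the file you edited. The sketch below assumes the configuration file consists of simple `key = value` lines, as in the entries shown above; treating lines that start with `#` as comments and skipping lines without `=` (such as section headers) are assumptions about the file format, and both function names are hypothetical.

```python
def parse_conf(text):
    """Parse `key = value` lines into a dict (assumed file format)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, assumed '#' comments, and anything without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def check_worker_settings(settings):
    """Return a list of problems with the worker log settings."""
    problems = []
    if settings.get("worker_logmode") != "mounted":
        problems.append("worker_logmode is not 'mounted'")
    if not settings.get("worker_logpath"):
        problems.append("worker_logpath is not set")
    return problems
```

Reading the file with `open(path).read()` and passing the result through `parse_conf` and then `check_worker_settings` yields an empty list when both entries are set as described above.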

    For all the clients:

    Set the Client job log directory. Modify the client_logpath entry in each client machine's qb.conf so the client machines will directly access the job log files from disk instead of going through the Supervisor:

    client_logpath = <put shared directory here> 

    To test:

    • submit a new job that is very simple, perhaps one that only runs the "set" command.  You just want a job that starts, prints a few lines, and exits.
    • verify the job log directory is being created in the expected location
      • if not, the supervisor is not set correctly.  Verify and correct, restart the supervisor service, and re-submit to test.
    • verify that the job log directory contains at minimum a .qja and .xja file.  These are written by the supervisor.
    • once the job is complete, verify that the job log directory contains at least a .out file (there will be 1 per job instance).  There should probably also be a .err file.  These are written by the worker.
      • if no .out or .err files exist, or the .out does not contain anything that looks like it came from the job itself, then the workers' logmode and logpath are not set correctly.  Verify and correct, then re-submit to test.
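
The file checks in the test steps above can be scripted. This is a rough sketch, not a Qube tool: the function name `check_job_log_dir` is hypothetical, and because the exact layout inside a job's log directory is not described here, it searches the directory recursively. It only checks that the .out file is non-empty, so whether the contents look like job output still needs a manual look.

```python
import os

SUPERVISOR_EXTS = {".qja", ".xja"}  # written by the supervisor


def check_job_log_dir(job_dir):
    """Return a list of problems found in one job's log directory."""
    found = set()
    nonempty_out = False
    for root, _dirs, files in os.walk(job_dir):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            found.add(ext)
            full = os.path.join(root, name)
            if ext == ".out" and os.path.getsize(full) > 0:
                nonempty_out = True

    problems = []
    missing = sorted(SUPERVISOR_EXTS - found)
    if missing:
        problems.append(
            "missing supervisor files: " + ", ".join(missing)
            + " (check supervisor_logpath)"
        )
    if ".out" not in found:
        problems.append(
            "no .out file (check worker_logmode and worker_logpath)"
        )
    elif not nonempty_out:
        problems.append("all .out files are empty")
    return problems
```

Run it against the completed test job's directory under <supervisor_logpath>/job; an empty list means the supervisor and worker files described above were all found.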

     
