Qube 7.5 Complete Release Notes

##############################################################################
#
# Qube Release Notes
#
##############################################################################

##############################################################################
@RELEASE: 7.5-2

==== CL 23290 ====
@CHANGE: support Apple-provided Python 3.8, instead of the Homebrew version

==== CL 23262 ====
@CHANGE: postgresql startup scripts no longer control where the logs go (now it's specified in postgresql.conf)

JIRA: QUBE-3888

==== CL 23261 ====
@NEW: Add Qube-custom logging parameters to postgresql.conf. Now PGSQL logs are written to DATADIR/pg_log/pgsql.log on ALL platforms.

Formerly, the log wasn't written to a file on Windows.

JIRA: QUBE-3888

==== CL 23260 ====
@FIX: some parsing issues in configure_postgresql_conf.py script

* bug where commented out parameters would end up in the end section of the postgresql.conf file, unless the commented out value exactly matches our value

* fixed regex to match better (had issues with params with empty value '', and with whitespace)
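
As an illustration of the matching problem described above, here is a minimal sketch of a pattern that accepts commented-out parameters, empty quoted values (''), and arbitrary whitespace. It is not the regex used in configure_postgresql_conf.py; PARAM_RE and parse_param are hypothetical names, and the sketch assumes every parameter line contains an '='.

```python
import re

# Hypothetical pattern (not the one in configure_postgresql_conf.py).
PARAM_RE = re.compile(
    r"^\s*(?P<commented>#)?\s*"             # optional leading '#'
    r"(?P<name>[A-Za-z_][\w.]*)"            # parameter name
    r"\s*=\s*"                              # '=' with optional whitespace
    r"(?P<value>'(?:[^']|'')*'|[^\s#']*)"   # quoted (possibly empty) or bare value
    r"\s*(?:#.*)?$"                         # optional trailing comment
)


def parse_param(line):
    """Return (name, value, is_commented), or None if the line is not a parameter."""
    m = PARAM_RE.match(line)
    if m is None:
        return None
    return m.group("name"), m.group("value"), m.group("commented") is not None
```

A commented-out line such as "#log_filename = ''" is recognized as the log_filename parameter, so a script built on a pattern like this could rewrite it in place rather than appending a duplicate at the end of the file.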

==== CL 23259 ====
@UPDATE: add signal handler for segfault in QbPreForkDaemon to print stack trace upon the said signal

==== CL 23258 ====
@FIX: a couple of issues with python scripts that export MySQL data and import it into PostgreSQL

* export script: quoting issue with the path to MySQL executable
* import script: jobid data must be imported before job data to satisfy a constraint added since 7.5-0

ZD: 21118

==== CL 23253 ====
@FIX: binarySort Python 3 compatibility

==== CL 23227 ====
@FIX: proxy program (proxy.exe) crashing under certain Windows environments

ZD: 21090

==== CL 23203 ====
@UPDATE: Edits for database_checks.py

==== CL 23201 ====
@FIX: Encoding xdrlib integers 128 and greater
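
The note does not say what the failure was. For reference, XDR encodes a signed integer as 4 big-endian bytes, and 128 is the first value whose low byte falls outside 7-bit ASCII, a common source of bytes-vs-str mistakes under Python 3; whether that was the cause here is an assumption. A minimal sketch using the standard struct module (xdr_pack_int and xdr_unpack_int are hypothetical helper names, not part of the Qube API):

```python
import struct


def xdr_pack_int(value):
    """Encode a signed 32-bit integer the XDR way: 4 bytes, big-endian."""
    return struct.pack(">i", value)


def xdr_unpack_int(data):
    """Decode 4 big-endian bytes back into a signed 32-bit integer."""
    return struct.unpack(">i", data)[0]
```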

==== CL 23190 ====
@NEW: Add Media Encoder jobtype and serverBackend base class

==== CL 23163 ====
@CHANGE: make sure that the "pfx" db is created with its Encoding set to 'UTF8'

JIRA: QUBE-3865

==== CL 23160 ====
@NEW: Adding back the slotcount_fact table as well as the cached "data subset" tables to the Data Warehouse (dwh)

JIRA: QUBE-3867
ZD: 20996

* Add back, to dwh, the slotcount_fact table and all related .sql scripts:
create_hostState_dim.sql
create_slotCount_fact.sql
create_worker_dim.sql
populate_slotCount.sql
regular_slotCount.sql

* Also add upgrade_scripts/upgrade_v2.sql, and updated datawarehouse_version to 2.

In upgrade_v2.sql, reimplemented the PFX_CREATE_DATASUBSET_TABLE function (that was available in our MySQL dwh), and also added commands to source the create*.sql and populate_slotCount.sql files mentioned above to build the necessary tables.

* Add back cron jobs (Linux, macOS) and scheduledTask (Windows) that periodically run the regular_slotCount.sql collector, and build the "data subset" tables that contain data for a limited time-range (12hr, 36hr, 7day, 3wk, and 3mo) for faster preset queries for charting:

Linux: qube/etc/cron.d/com.pipelinefx.DataWarehouse.cron

macOS: add back qube/datawh/data_collectors/osx*.sh scripts in the data_collectors subdir, and their corresponding macOS cron drivers, qube/etc/com.pipelinefx.DataWarehouse.*.plist, qube/pkg/supepkg.pl and qubepkg.pm to package them up into the supervisor installer

Windows: changes made to qubemsi.pm and rrd_tables.bat to install and enable the scheduledTasks when the supervisor MSI installer is run

==== CL 23146 ====
@FIX: Python API: qb.rangepartition(), and consequently qb.genpartitions(), broken with python3, returning an empty list for valid input.

ZD: 20992
JIRA: QUBE-3863

##############################################################################

##############################################################################
#
# Qube Release Notes
#
##############################################################################

##############################################################################
@RELEASE: 7.5-1

This is a patch release of the qube-core (Linux only) and qube-supervisor
packages, to fix a few key issues found in the 7.5-0 release.

If you use Python-based callbacks on a Linux or macOS supervisor, or the DRA
(required for Metered Licensing and the Online System Metrics) on Linux,
you'll want to upgrade.

==== CL 23052 ====
@FIX: include file path to the python execution module in the call to exec() so that qb.backend.utils.getModulePath() works properly

Also modified internals of getModulePath() in qb/backend/utils.py and qb/util/__init__.py to be more compatible across python versions

ZD: 20941

==== CL 23034 ====
@FIX: wildcard and regular expression (regex) filtering features in tools such as "qbjobs" broken since 7.0

The filtering features now work as described in doc at:
http://docs.pipelinefx.com/display/QUBE/Using+Wildcards+and+Regular+Expressions

JIRA: QUBE-3844

==== CL 23022 ====
@FIX: Python-based jobtype backends make undesirable calls to qb.reportjob(), rendering instance-based postflight return values useless

The pyCmdrange, pyCmdline, appFinder, and pyCmdrangeGPU jobtypes called qb.reportjob() at the end of their execute.py scripts, which made the return values of instance-based postflights meaningless. For example, a job instance could "complete", but the postflight that runs right after it could return non-zero, indicating that the instance should be marked "failed". Because the job instance had already reported "complete" via its own call to qb.reportjob() before the postflight ran, the status amended by the postflight never took effect (it was too late to reach the supe).
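
The ordering issue can be sketched as follows. This is an illustration only, not the jobtype code: run_instance and its three callables are hypothetical stand-ins, and the real status reporting happens elsewhere in Qube. The point is that the final status is reported once, after the postflight has run.

```python
def run_instance(work, postflight, report):
    """Run the work, then the postflight, and report a single final status."""
    status = "complete" if work() == 0 else "failed"
    # A non-zero postflight return value overrides a "complete" result.
    if postflight() != 0:
        status = "failed"
    report(status)  # reported only after the postflight has run
    return status
```

If report() were called before postflight(), as the old execute.py scripts effectively did, the "failed" amendment would arrive too late.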

JIRA: QUBE-3836

==== CL 22906 ====
@NEW: Add VRED 2021 support

==== CL 22873 ====
@FIX: additional fixes for "DataWH collectors aren't run by cron on Ubuntu"

JIRA: QUBE-3846

==== CL 22872 ====
@FIX: DataWH collectors aren't run by cron on Ubuntu

JIRA: QUBE-3846

==== CL 22845 ====
@FIX: add openssl 1.1.1h lib files to qube core package, which are needed by the Qt 5.14.2 network module used by the DRA.

JIRA: QUBE-3833

==== CL 22828 ====
@FIX: supervisor's embedded python3 interpreter fails running callbacks with error: ModuleNotFoundError: No module named '_struct'

ZD: 20850
JIRA: QUBE-3832

==== CL 22748 ====
@FIX: issue where some files/folders won't be deleted upon MSI uninstall.

==== CL 22735 ====
@FIX: Uninstalling supervisor on Windows now removes the postgresql software (but preserves the data directory).

@FIX: Windows: the 7zip self-extraction of PostgreSQL software throws an "ERROR: can not delete output file" error when upgrading supervisor

@CHANGE: Installation of supervisor on Windows now first makes sure that the previous installation of postgresql is removed.

JIRA: QUBE-3819, QUBE-3817

ZD: 20811, 20816

##############################################################################
#
# Qube Release Notes
#
##############################################################################

##############################################################################
@RELEASE: 7.5-0

==== CL 22602 ====
@NEW: add one more file for initialization of the central preferences database, qubedb_prep_0054.sql, which creates the "prefs" DB user.

JIRA: QUBE-3795

==== CL 22597 ====
@CHANGE: implemented code to make changes to postgresql.conf and pg_hba.conf files on installation, needed to support the new central preferences feature.

JIRA: QUBE-3795

==== CL 22596 ====
@CHANGE: "pfx" account's default password changed to a longer one (for new installations, except on Linux)

JIRA: QUBE-3667

==== CL 22593 ====
@NEW: add SQL to initialize the central preferences database.

JIRA: QUBE-3795

==== CL 22592 ====
@FIX: init_supe_db.py: make all calls to the "psql" command using the DB owner

==== CL 22458 ====
@CHANGE: point PYTHONPATH to $QBDIR/lib/python3.8 before supervisor service is started, for its embedded python interpreter (Linux, macOS)

==== CL 22288 ====
@NEW: add "disable_central_prefs" flag to supervisor_flags

JIRA: QUBE-3778

==== CL 22230 ====
@FIX: more fixes to make python-based backends work properly with python3 while maintaining python2 compatibility.

Now if a python-based jobtype's job.conf specifies "execute_binding = Python" or "execute_binding = Python3", python3 will be used. If "execute_binding = Python2", python2 is used.
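
The selection rule above amounts to the following mapping. This is a sketch for illustration; interpreter_for and BINDING_TO_INTERPRETER are hypothetical names, and how Qube actually resolves the binding (including case handling) is not described in this note.

```python
# execute_binding value from job.conf -> interpreter family, per the note above.
BINDING_TO_INTERPRETER = {
    "Python": "python3",
    "Python3": "python3",
    "Python2": "python2",
}


def interpreter_for(execute_binding):
    """Return which Python major version a given execute_binding selects."""
    try:
        return BINDING_TO_INTERPRETER[execute_binding]
    except KeyError:
        raise ValueError("not a Python execute_binding: %r" % execute_binding)
```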

JIRA: QUBE-3747

==== CL 22190 ====
@CHANGE: Switch supervisor's embedded Python interpreter to python3.8.

JIRA: QUBE-2762, QUBE-3749

==== CL 22179 ====
@CHANGE: made Linux installation (RPM and DEB for CentOS/RHEL and Ubuntu, respectively) require "python3"

JIRA: QUBE-3767

==== CL 22177 ====
@CHANGE: convert python-based jobtypes (appFinder, pyCmdline, pyCmdrange) in qube/types/ from python2 to python3

JIRA: QUBE-3747

==== CL 22176 ====
@CHANGE: convert example python scripts in qube/examples/python from python2 to python3

JIRA: QUBE-3746

==== CL 22175 ====
@CHANGE: convert python scripts in qube/scripts from python2 to python3

JIRA: QUBE-3745

==== CL 22174 ====
@CHANGE: convert python scripts in qube/utils from python2 to python3

JIRA: QUBE-3745

==== CL 22081 ====
@FIX: "pfx" user to be created w/o a home directory now, in the install_supervisor script, which is used to do some initialization on DEB-based Linux platforms (i.e. Ubuntu).

Was previously set up to create a home dir, causing the DEB qube-supe
installation to exit prematurely when the root user doesn't have write
permissions to create "/home/pfx" (e.g. NFS-mounted /home).

Now the "useradd" command points to "/var/tmp" as the pfx user's homedir.

==== CL 22080 ====
@NEW: add perl 5.30 support (for Ubuntu 20.04 LTS)

JIRA: QUBE-3721

==== CL 22077 ====
@FIX: "pfx" user to be created w/o a home directory now, in the install_supervisor script, which is used to do some initialization on RPM-based Linux platforms.

Was previously set up to create a home dir, causing the RPM qube-supe
installation to exit prematurely when the root user doesn't have write
permissions to create "/home/pfx" (e.g. NFS-mounted /home).

Now the "useradd" command points to "/usr/tmp" as the pfx user's homedir.

==== CL 22057 ====
@NEW: Ubuntu 18.04 and 20.04 support

JIRA: QUBE-3720, QUBE-3721

==== CL 22039 ====
@NEW: Add Python 3.7 (standard) and 3.8 (homebrew) API support on macOS

==== CL 22035 ====
@NEW: Add Python 3.6, 3.7, and 3.8 API support for Windows.

JIRA: QUBE-2762

==== CL 22030 ====
@NEW: Add python 3.6, 3.7, and 3.8 Qube API support on Linux.

==== CL 21802 ====
@CHANGE: macOS to build Qube core with Qt 5.14.2

JIRA: QUBE-3688

==== CL 21801 ====
@CHANGE: remove python 2.6 support from all platforms

JIRA: QUBE-3691

==== CL 21800 ====
@CHANGE: Linux to build with Qt5.14.2

JIRA: QUBE-3688

==== CL 21769 ====
@NEW: add Python 3.8 compatibility to the main Qube Python API, including its supporting .py scripts.

JIRA: QUBE-2762

==== CL 21731 ====
@FIX: Fix crash when no options are given. Now usage message is printed when no args are present.
@CHANGE: Made the "checks" arguments instead of options.
@INTERNAL: refactored a bunch of stuff.

JIRA: QUBE-3206

==== CL 21644 ====
@FIX: child processes of job instances sometimes not all dying properly when the parent thread dies

ZD: 20225

==== CL 21641 ====
@FIX: child processes of job instances sometimes not all dying properly when the parent thread dies

ZD: 20225

==== CL 21556 ====
@FIX: fixed problem where a job can get stuck in "dying" state due to a timing-related issue.

This was causing, among other things, global resources to not be released properly.

ZD: 20307

==== CL 21432 ====
@CHANGE: install_supervisor script: install Data W/H DB by calling $QBDIR/datawh/install_datawarehouse_db.sh

==== CL 21385 ====
@TWEAK: Don't give up on the first error in enableRequiredPrivileges(), but try enabling all privileges. Also print number of errors.
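
The "keep going and count the errors" pattern described above can be sketched as follows. This is an illustration of the pattern only; the real enableRequiredPrivileges() is Windows-specific supervisor/worker code, and enable_all and its enable callable are hypothetical.

```python
def enable_all(privileges, enable):
    """Try to enable every privilege; return how many could not be enabled."""
    errors = 0
    for name in privileges:
        try:
            enable(name)
        except OSError as exc:
            # Record the failure and move on instead of returning early.
            errors += 1
            print("WARNING: could not enable %s: %s" % (name, exc))
    print("%d error(s) enabling privileges" % errors)
    return errors
```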

==== CL 21360 ====
@FIX: Aggressively preempted frames can get missed and left in "pending" while instances all finish

ZD: 20177

==== CL 21351 ====
@CHANGE: add SE_DEBUG_NAME to list of privileges to be enabled; also add more info to print to workerlog

* add SE_DEBUG_NAME to the list of privileges to be enabled
* print WARNING when OpenProcess() fails in cleanup(), and the reason
* add instance name to print in more output lines when available/applicable

==== CL 21242 ====
@FIX: On host reboot, supervisor needs to start after postgresql is started and ready

* added code to check the DB connection at supervisor's boot time, and retry after 10 seconds, up to 6 attempts (1 minute),
effectively delaying the supervisor boot until after the DB is ready.
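
The retry behaviour can be sketched as below. This is not the supervisor's code: wait_for_db and its connect callable are hypothetical, and whether the supervisor also sleeps after the final failed attempt is an assumption (this sketch does not).

```python
import time


def wait_for_db(connect, attempts=6, delay=10, sleep=time.sleep):
    """Return True once connect() succeeds, or False after all attempts fail."""
    for attempt in range(1, attempts + 1):
        try:
            connect()
            return True
        except Exception:
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts:
                sleep(delay)
    return False
```

Passing sleep as a parameter keeps the loop easy to exercise without real delays.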

JIRA: QUBE-3637

==== CL 21239 ====
@NEW: add a way to tell qbjobinfo() API routine to only query for and pull selective job data (aka "columns" or "fields").

Developers using the Qube C++ and/or Python API can now tell qbjobinfo() routine (qb.jobinfo() for Python) to only query
for and pull selective job data (aka "columns" or "fields"), for leaner, meaner, more economical queries.

* Add support for explicitly specifying needed fields in C++ API's qbjobinfo().
* Add support for explicitly specifying needed fields in Python API's qb.jobinfo(), a la 6.10's direct query API.
* Also add "-fields" option to qbjobs
* qbjobs now makes leaner queries by default (unless an option to display details is specified, like "-long" or "-notes")

[Examples]

C++:
QbString query_fields_str = "id,username,status";
QbStringList query_fields;
QbExpression::split(query_fields_str, query_fields);
QbQuery query;
for(int i = 0; i < query_fields.length(); i++) {
QbField *f = new QbField(*query_fields.get(i));
query.fields().push(f);
}
QbJobList jobs;
qbjobinfo(query, jobs);

Python:
jobs = qb.jobinfo(fields = ['id','username','status'])

JIRA: QUBE-3623
ZD: 19955

==== CL 21234 ====
@FIX: Add timeout for agenda-based jobs stuck in "running" status, in a "waiting" loop.

TL;DR:
Sometimes, agenda-based job instances can get stuck in the "running" state, in a "waiting" loop. A timeout, currently hardcoded to 60 seconds, has been added to force those jobs to break out of the loop.

Details:
Sometimes, agenda-based job instances can get stuck in a "waiting" loop, with messages like the following repeating indefinitely in the job's stdout:

[Dec 20, 2019 18:23:46] HOSTNAME[47572]: requesting work for: 424805.0
[Dec 20, 2019 18:23:46] HOSTNAME[47572]: got work: -1: - waiting
[Dec 20, 2019 18:23:46] HOSTNAME[47572]: INFO: informing worker[127.0.0.1]
INFO: told to wait & retry from supe-- sleeping for [7] seconds

A job instance stuck in this state can tie up a worker's job slot(s) until someone manually intervenes (kills it, migrates it, etc.), or
until it hits its "subjob timeout" (assuming the job was set up with one).

This issue, newly introduced in 7.x, has been found to happen due to race conditions.

It is particularly likely to occur when the following conditions are met:
* jobs have the migrate_on_frame_retry job flag set AND they use retrywork/retrysubjob
* job instances fail quickly (i.e. job process/renderer crashes and exits quickly)
* there are idle workers

(There are other scenarios in which this can also happen, such as when aggressive preemption is done
rapidly, but there are normally not many idle workers when preemptions happen, so it's less likely.)

In a nutshell:
* instance fails on a worker
* supe detects the failure, migrates and starts the instance on a new worker
* the new worker reports the instance now "running"
* the first worker finishes cleaning up and reports that the instance is now "pending"
* instance gets stuck in a "waiting" loop on the new worker.

A timeout, currently hardcoded to 60 seconds, has been added to force those jobs to break out of the infinite loop.
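
A minimal sketch of such a bounded wait loop follows. It is an illustration, not the worker's implementation: request_work_until_timeout and the "waiting" reply string are hypothetical, the 7-second retry delay is taken from the log excerpt above, and what the instance does after the timeout expires is not described in this note.

```python
import time

WAIT_TIMEOUT = 60  # seconds, the hardcoded value mentioned above


def request_work_until_timeout(request_work, retry_delay=7,
                               timeout=WAIT_TIMEOUT,
                               clock=time.monotonic, sleep=time.sleep):
    """Poll for work; stop once "waiting" replies have lasted `timeout` seconds.

    Returns the work item, or None if the timeout expired first.
    """
    started = clock()
    while True:
        work = request_work()
        if work != "waiting":
            return work
        if clock() - started >= timeout:
            return None  # break out instead of looping forever
        sleep(retry_delay)
```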

ZD: 19977, 20094, 19967
JIRA: QUBE-3638

==== CL 21217 ====
@FIX: Add timeout for agenda-based jobs stuck in "running" status, in a "waiting" loop.