Manager should detect starvation due to blacklisting #59491

Closed
opened 2018-12-17 13:57:17 +01:00 by Sybren A. Stüvel · 3 comments

The solution for #50981 does NOT include starvation detection. In other words, if a job has tasks that every available worker has failed (so all workers are blacklisted for this job & task type), nothing detects that this happened. As a result, the job stays stuck in 'active' status with no chance of ever finishing.

We should have the Manager regularly inspect queued tasks, to see if there is still at least one worker that is not blacklisted and able to execute them. If not, we can mark those tasks as 'failed' to reflect the actual failure on each worker.
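
The check described above could be sketched roughly as follows. This is only an illustration, not the Manager's actual code: the `Task` and `Worker` classes, the `(worker_id, job_id, task_type)` blacklist tuples, and the function names are all hypothetical, and the assumption is that a blacklist entry applies per worker, per job, per task type.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical task record; field names are assumptions for this sketch.
    task_id: str
    job_id: str
    task_type: str
    status: str = "queued"


@dataclass
class Worker:
    # Hypothetical worker record with the task types it can execute.
    worker_id: str
    supported_task_types: set = field(default_factory=set)


def find_starved_tasks(tasks, workers, blacklist):
    """Return queued tasks that no remaining worker can execute.

    `blacklist` is assumed to be a set of (worker_id, job_id, task_type)
    tuples, one per blacklisted combination.
    """
    starved = []
    for task in tasks:
        if task.status != "queued":
            continue
        # A task is still runnable if at least one worker supports its
        # type and is not blacklisted for this job & task type.
        runnable = any(
            task.task_type in worker.supported_task_types
            and (worker.worker_id, task.job_id, task.task_type) not in blacklist
            for worker in workers
        )
        if not runnable:
            starved.append(task)
    return starved


def fail_starved_tasks(tasks, workers, blacklist):
    """Mark every starved task as 'failed' and return the affected tasks."""
    starved = find_starved_tasks(tasks, workers, blacklist)
    for task in starved:
        task.status = "failed"
    return starved
```

A periodic job could call `fail_starved_tasks()` with the current queue, worker list, and blacklist. Note that in this sketch an empty worker list makes every queued task count as starved; a real check would probably want to treat "no workers known at all" differently from "all workers blacklisted".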

Sybren A. Stüvel self-assigned this 2018-12-17 13:57:17 +01:00

Added subscriber: @dr.sybren

This issue was referenced by archive/flamenco-manager@72c46706ea801c7400cf5659d888dcf9bb05a6f0

Changed status from 'Open' to: 'Resolved'

Reference: studio/flamenco#59491