
Task: Graph Flow Task Scheduling
Closed, Public

Authored by Jeroen Bakker (jbakker) on Apr 30 2020, 5:19 PM.

Details

Summary

Add TBB::flow graph scheduling to BLI_task.

Using flow graphs, a graph of nodes (tasks) and links can be defined.
Work can flow through the graph. During this process the execution of the nodes will be
scheduled among the available threads.

We are planning to use this to improve the threading in the draw manager.

The implemented API is still limited: it only supports sequential flows. Joins and buffers
are not supported. We could eventually support them as part of a C++ API. These features
rely on compile-time templates, which makes it hard to design a clean C API for them.
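
For illustration, a minimal usage sketch based on the function names in the diff below (the print task and its string data are made up for the example, not part of the patch):

    #include <stdio.h>
    #include "BLI_task.h"

    /* Hypothetical node callback: prints its task data. */
    static void print_exec(void *__restrict task_data)
    {
      printf("%s\n", (const char *)task_data);
    }

    static void example(void)
    {
      struct TaskGraph *task_graph = BLI_task_graph_create();
      struct TaskNode *first = BLI_task_graph_node_create(task_graph, print_exec, (void *)"first", NULL);
      struct TaskNode *second = BLI_task_graph_node_create(task_graph, print_exec, (void *)"second", NULL);

      /* Sequential flow only: `second` is scheduled after `first` finishes. */
      BLI_task_graph_edge_create(first, second);

      BLI_task_graph_node_push_work(first); /* Trigger the root. */
      BLI_task_graph_work_and_wait(task_graph);
      BLI_task_graph_free(task_graph);
    }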

Diff Detail

Repository
rB Blender

Event Timeline

Jeroen Bakker (jbakker) retitled this revision from [WIP] Task: Graph Flow Task Scheduling to Task: Graph Flow Task Scheduling. May 1 2020, 11:59 AM
Jeroen Bakker (jbakker) edited the summary of this revision.
Jeroen Bakker (jbakker) edited the summary of this revision. May 1 2020, 12:51 PM
Jeroen Bakker (jbakker) edited the summary of this revision. May 1 2020, 3:02 PM

Added a free func for the node user data

Brecht Van Lommel (brecht) requested changes to this revision. May 1 2020, 4:38 PM (edited)

This is using function_node, which takes an input and produces an output for the next node(s) that depend on it. But what is passed along seems to be a pointer to memory that can be written to by multiple tasks at the same time? And that doesn't seem thread-safe.

If passing along messages is the intent, then I think this would need callbacks to copy and free messages, which all gets rather complicated and would probably be easier with the TBB C++ API directly.

If the intent is just to execute some tasks that don't pass along messages, then a continue_node could be used instead.
https://www.threadingbuildingblocks.org/docs/help/reference/flow_graph/dependency_flow_graph_example.html

As far as I know the current dependency graph and draw manager don't pass along messages, but I haven't checked.
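
For reference, the dependency flow graph example from the linked TBB documentation boils down to this shape (lightly adapted from the docs; only execution order flows along the edge, no message payload):

    #include <iostream>
    #include "tbb/flow_graph.h"

    using namespace tbb::flow;

    int main()
    {
      graph g;
      continue_node<continue_msg> hello(g, [](const continue_msg &) { std::cout << "Hello"; });
      continue_node<continue_msg> world(g, [](const continue_msg &) { std::cout << " World\n"; });

      /* `world` runs only after `hello` has finished. */
      make_edge(hello, world);
      hello.try_put(continue_msg());
      g.wait_for_all();
      return 0;
    }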

source/blender/blenlib/intern/task_graph.cc
110

This memory allocation can be avoided by making tbb_node not a pointer.
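
For example (a sketch only; the real TaskNode has more fields than shown here):

    /* Sketch: embed the TBB node by value so no extra heap allocation is needed. */
    struct TaskNode {
      tbb::flow::continue_node<tbb::flow::continue_msg> tbb_node;

      TaskNode(tbb::flow::graph &graph)
          : tbb_node(graph, [this](const tbb::flow::continue_msg &) { run(); })
      {
      }

      void run();
    };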

This revision now requires changes to proceed. May 1 2020, 4:38 PM
Jeroen Bakker (jbakker) updated this revision to Diff 24293. May 1 2020, 5:17 PM (edited)
Jeroen Bakker (jbakker) marked an inline comment as done.
  • removed the TaskMessage
  • use continue_node
  • optimized allocation of the tbb_node

@Brecht Van Lommel (brecht) I was looking at the draw manager and there it isn't needed; I was passing around data that made no sense to pass. I think we should start small with this.

It has also been a long time since I did any C++ coding, so any feedback on C++ style is welcome.

diff --git a/source/blender/draw/intern/draw_cache_extract_mesh.c b/source/blender/draw/intern/draw_cache_extract_mesh.c
index 401f5c76a02..919fc3084b9 100644
--- a/source/blender/draw/intern/draw_cache_extract_mesh.c
+++ b/source/blender/draw/intern/draw_cache_extract_mesh.c
@@ -126,15 +126,14 @@ typedef struct MeshRenderData {
 float (*poly_normals)[3];
 int *lverts, *ledges;
 } MeshRenderData;
-
 static MeshRenderData *mesh_render_data_create(Mesh *me,
 const bool is_editmode,
 const bool is_paint_mode,
 const float obmat[4][4],
 const bool do_final,
 const bool do_uvedit,
- const eMRIterType iter_type,
- const eMRDataType data_flag,
+ const eMRIterType UNUSED(iter_type),
+ const eMRDataType UNUSED(data_flag),
 const DRW_MeshCDMask *UNUSED(cd_used),
 const ToolSettings *ts)
 {
@@ -144,9 +143,6 @@ static MeshRenderData *mesh_render_data_create(Mesh *me,

 copy_m4_m4(mr->obmat, obmat);

- const bool is_auto_smooth = (me->flag & ME_AUTOSMOOTH) != 0;
- const float split_angle = is_auto_smooth ? me->smoothresh : (float)M_PI;
-
 if (is_editmode) {
 BLI_assert(me->edit_mesh->mesh_eval_cage && me->edit_mesh->mesh_eval_final);
 mr->bm = me->edit_mesh->bm;
@@ -221,7 +217,32 @@ static MeshRenderData *mesh_render_data_create(Mesh *me,
 mr->v_origindex = CustomData_get_layer(&mr->me->vdata, CD_ORIGINDEX);
 mr->e_origindex = CustomData_get_layer(&mr->me->edata, CD_ORIGINDEX);
 mr->p_origindex = CustomData_get_layer(&mr->me->pdata, CD_ORIGINDEX);
+ }
+ else {
+ /* BMesh */
+ BMesh *bm = mr->bm;
+
+ mr->vert_len = bm->totvert;
+ mr->edge_len = bm->totedge;
+ mr->loop_len = bm->totloop;
+ mr->poly_len = bm->totface;
+ mr->tri_len = poly_to_tri_count(mr->poly_len, mr->loop_len);
+ }
+
+ return mr;
+}

+/* Part of the creation of the MeshRenderData that happens in a thread. */
+static void mesh_render_data_update(MeshRenderData *mr,
+ const eMRIterType iter_type,
+ const eMRDataType data_flag)
+{
+ Mesh *me = mr->me;
+ const bool is_auto_smooth = (me->flag & ME_AUTOSMOOTH) != 0;
+ const float split_angle = is_auto_smooth ? me->smoothresh : (float)M_PI;
+
+ if (mr->extract_type != MR_EXTRACT_BMESH) {
+ /* Mesh */
 if (data_flag & (MR_DATA_POLY_NOR | MR_DATA_LOOP_NOR | MR_DATA_TAN_LOOP_NOR)) {
 mr->poly_normals = MEM_mallocN(sizeof(*mr->poly_normals) * mr->poly_len, __func__);
 BKE_mesh_calc_normals_poly((MVert *)mr->mvert,
@@ -300,13 +321,6 @@ static MeshRenderData *mesh_render_data_create(Mesh *me,
 else {
 /* BMesh */
 BMesh *bm = mr->bm;
-
- mr->vert_len = bm->totvert;
- mr->edge_len = bm->totedge;
- mr->loop_len = bm->totloop;
- mr->poly_len = bm->totface;
- mr->tri_len = poly_to_tri_count(mr->poly_len, mr->loop_len);
-
 if (data_flag & MR_DATA_POLY_NOR) {
 /* Use bmface->no instead. */
 }
@@ -360,7 +374,6 @@ static MeshRenderData *mesh_render_data_create(Mesh *me,
 mr->loop_loose_len = mr->vert_loose_len + mr->edge_loose_len * 2;
 }
 }
- return mr;
 }

 static void mesh_render_data_free(MeshRenderData *mr)
@@ -672,45 +685,13 @@ static const MeshExtract extract_lines = {
 0,
 false,
 };
-
 /** \} */

 /* ---------------------------------------------------------------------- */
-/** \name Extract Loose Edges Indices
+/** \name Extract Loose Edges Sub Buffer
 * \{ */

-static void *extract_lines_loose_init(const MeshRenderData *UNUSED(mr), void *UNUSED(buf))
-{
- return NULL;
-}
-
-static void extract_lines_loose_ledge_mesh(const MeshRenderData *UNUSED(mr),
- int UNUSED(e),
- const MEdge *UNUSED(medge),
- void *UNUSED(elb))
-{
- /* This function is intentionally empty. The existence of this functions ensures that
- * `iter_type` `MR_ITER_LVERT` is set when initializing the `MeshRenderData` (See
- * `mesh_extract_iter_type`). This flag ensures that `mr->edge_loose_len` field is filled. This
- * field we use in the `extract_lines_loose_finish` function to create a subrange from the
- * `ibo.lines`. */
-}
-
-static void extract_lines_loose_ledge_bmesh(const MeshRenderData *UNUSED(mr),
- int UNUSED(e),
- BMEdge *UNUSED(eed),
- void *UNUSED(elb))
-{
- /* This function is intentionally empty. The existence of this functions ensures that
- * `iter_type` `MR_ITER_LVERT` is set when initializing the `MeshRenderData` (See
- * `mesh_extract_iter_type`). This flag ensures that `mr->edge_loose_len` field is filled. This
- * field we use in the `extract_lines_loose_finish` function to create a subrange from the
- * `ibo.lines`. */
-}
-
-static void extract_lines_loose_finish(const MeshRenderData *mr,
- void *UNUSED(ibo),
- void *UNUSED(elb))
+static void extract_lines_loose_subbuffer(const MeshRenderData *mr)
 {
 /* Multiply by 2 because these are edges indices. */
 const int start = mr->edge_len * 2;
@@ -720,17 +701,24 @@ static void extract_lines_loose_finish(const MeshRenderData *mr,
 mr->cache->no_loose_wire = (len == 0);
 }

-static const MeshExtract extract_lines_loose = {
- extract_lines_loose_init,
- NULL,
- NULL,
+static void extract_lines_with_lines_loose_finish(const MeshRenderData *mr, void *ibo, void *elb)
+{
+ GPU_indexbuf_build_in_place(elb, ibo);
+ extract_lines_loose_subbuffer(mr);
+ MEM_freeN(elb);
+}
+
+static const MeshExtract extract_lines_with_lines_loose = {
+ extract_lines_init,
 NULL,
 NULL,
- extract_lines_loose_ledge_bmesh,
- extract_lines_loose_ledge_mesh,
+ extract_lines_loop_bmesh,
+ extract_lines_loop_mesh,
+ extract_lines_ledge_bmesh,
+ extract_lines_ledge_mesh,
 NULL,
 NULL,
- extract_lines_loose_finish,
+ extract_lines_with_lines_loose_finish,
 0,
 false,
 };
@@ -1649,8 +1637,8 @@ static void extract_lnor_hq_loop_mesh(
 }

 /* Flag for paint mode overlay.
- * Only use MR_EXTRACT_MAPPED in edit mode where it is used to display the edge-normals. In paint
- * mode it will use the unmapped data to draw the wireframe. */
+ * Only use MR_EXTRACT_MAPPED in edit mode where it is used to display the edge-normals. In
+ * paint mode it will use the unmapped data to draw the wireframe. */
 if (mpoly->flag & ME_HIDE ||
 (mr->edit_bmesh && mr->extract_type == MR_EXTRACT_MAPPED && (mr->v_origindex) &&
 mr->v_origindex[mloop->v] == ORIGINDEX_NONE)) {
@@ -1728,8 +1716,8 @@ static void extract_lnor_loop_mesh(
 }

 /* Flag for paint mode overlay.
- * Only use MR_EXTRACT_MAPPED in edit mode where it is used to display the edge-normals. In paint
- * mode it will use the unmapped data to draw the wireframe. */
+ * Only use MR_EXTRACT_MAPPED in edit mode where it is used to display the edge-normals. In
+ * paint mode it will use the unmapped data to draw the wireframe. */
 if (mpoly->flag & ME_HIDE ||
 (mr->edit_bmesh && mr->extract_type == MR_EXTRACT_MAPPED && (mr->v_origindex) &&
 mr->v_origindex[mloop->v] == ORIGINDEX_NONE)) {
@@ -4419,10 +4407,14 @@ static const MeshExtract extract_fdot_idx = {
 /** \} */

 /* ---------------------------------------------------------------------- */
-/** \name Extract Loop
+/** \name ExtractTaskData
 * \{ */
+typedef struct ExtractUserData {
+ void *user_data;
+} ExtractUserData;

 typedef struct ExtractTaskData {
+ void *next, *prev;
 const MeshRenderData *mr;
 const MeshExtract *extract;
 eMRIterType iter_type;
@@ -4430,9 +4422,15 @@ typedef struct ExtractTaskData {
 /** Decremented each time a task is finished. */
 int32_t *task_counter;
 void *buf;
- void *user_data;
+ ExtractUserData *user_data;
 } ExtractTaskData;

+static void extract_task_data_free(ExtractTaskData *data)
+{
+ MEM_SAFE_FREE(data->user_data);
+ MEM_freeN(data);
+}
+
 BLI_INLINE void mesh_extract_iter(const MeshRenderData *mr,
 const eMRIterType iter_type,
 int start,
@@ -4509,31 +4507,181 @@ BLI_INLINE void mesh_extract_iter(const MeshRenderData *mr,
 }
 }

-static void extract_run(TaskPool *__restrict UNUSED(pool), void *taskdata)
+static void extract_init(ExtractTaskData *data)
+{
+ data->user_data->user_data = data->extract->init(data->mr, data->buf);
+}
+
+static void extract_run(void *__restrict taskdata)
 {
- ExtractTaskData *data = taskdata;
- mesh_extract_iter(
- data->mr, data->iter_type, data->start, data->end, data->extract, data->user_data);
+ ExtractTaskData *data = (ExtractTaskData *)taskdata;
+ mesh_extract_iter(data->mr,
+ data->iter_type,
+ data->start,
+ data->end,
+ data->extract,
+ data->user_data->user_data);

 /* If this is the last task, we do the finish function. */
 int remainin_tasks = atomic_sub_and_fetch_int32(data->task_counter, 1);
 if (remainin_tasks == 0 && data->extract->finish != NULL) {
- data->extract->finish(data->mr, data->buf, data->user_data);
+ data->extract->finish(data->mr, data->buf, data->user_data->user_data);
+ }
+}
+
+static void extract_init_and_run(void *__restrict taskdata)
+{
+ extract_init((ExtractTaskData *)taskdata);
+ extract_run(taskdata);
+}
+
+/** \} */
+
+/* ---------------------------------------------------------------------- */
+/** \name Task Node - Update Mesh Render Data
+ * \{ */
+typedef struct MeshRenderDataUpdateTaskData {
+ MeshRenderData *mr;
+ eMRIterType iter_type;
+ eMRDataType data_flag;
+} MeshRenderDataUpdateTaskData;
+
+static void mesh_render_data_update_task_data_free(MeshRenderDataUpdateTaskData *taskdata)
+{
+ BLI_assert(taskdata);
+ mesh_render_data_free(taskdata->mr);
+ MEM_freeN(taskdata);
+}
+
+static void mesh_extract_render_data_node_exec(void *__restrict task_data)
+{
+ MeshRenderDataUpdateTaskData *update_task_data = task_data;
+ mesh_render_data_update(
+ update_task_data->mr, update_task_data->iter_type, update_task_data->data_flag);
+}
+
+static struct TaskNode *mesh_extract_render_data_node_create(struct TaskGraph *task_graph,
+ MeshRenderData *mr,
+ const eMRIterType iter_type,
+ const eMRDataType data_flag)
+{
+ MeshRenderDataUpdateTaskData *task_data = MEM_mallocN(sizeof(MeshRenderDataUpdateTaskData),
+ __func__);
+ task_data->mr = mr;
+ task_data->iter_type = iter_type;
+ task_data->data_flag = data_flag;
+
+ struct TaskNode *task_node = BLI_task_graph_node_create(
+ task_graph,
+ mesh_extract_render_data_node_exec,
+ task_data,
+ (TaskGraphNodeFreeFunction)mesh_render_data_update_task_data_free);
+ return task_node;
+}
+
+/** \} */
+
+/* ---------------------------------------------------------------------- */
+/** \name Task Node - Extract Single Threaded
+ * \{ */
+typedef struct ExtractSingleThreadedTaskData {
+ ListBase task_datas;
+} ExtractSingleThreadedTaskData;
+
+static void extract_single_threaded_task_data_free(ExtractSingleThreadedTaskData *taskdata)
+{
+ BLI_assert(taskdata);
+ LISTBASE_FOREACH_MUTABLE (ExtractTaskData *, td, &taskdata->task_datas) {
+ extract_task_data_free(td);
 }
+ BLI_listbase_clear(&taskdata->task_datas);
+ MEM_freeN(taskdata);
 }

-static void extract_range_task_create(
- TaskPool *task_pool, ExtractTaskData *taskdata, const eMRIterType type, int start, int length)
+static void extract_single_threaded_task_node_exec(void *__restrict task_data)
+{
+ ExtractSingleThreadedTaskData *extract_task_data = task_data;
+ LISTBASE_FOREACH (ExtractTaskData *, td, &extract_task_data->task_datas) {
+ extract_init_and_run(td);
+ }
+}
+
+static struct TaskNode *extract_single_threaded_task_node_create(
+ struct TaskGraph *task_graph, ExtractSingleThreadedTaskData *task_data)
+{
+ struct TaskNode *task_node = BLI_task_graph_node_create(
+ task_graph,
+ extract_single_threaded_task_node_exec,
+ task_data,
+ (TaskGraphNodeFreeFunction)extract_single_threaded_task_data_free);
+ return task_node;
+}
+
+/** \} */
+
+/* ---------------------------------------------------------------------- */
+/** \name Task Node - UserData Initializer
+ * \{ */
+typedef struct UserDataInitTaskData {
+ ListBase task_datas;
+} UserDataInitTaskData;
+
+static void user_data_init_task_data_free(UserDataInitTaskData *taskdata)
+{
+ BLI_assert(taskdata);
+ LISTBASE_FOREACH_MUTABLE (ExtractTaskData *, td, &taskdata->task_datas) {
+ extract_task_data_free(td);
+ }
+ BLI_listbase_clear(&taskdata->task_datas);
+ MEM_freeN(taskdata);
+}
+
+static void user_data_init_task_data_exec(void *__restrict task_data)
+{
+ UserDataInitTaskData *extract_task_data = task_data;
+ LISTBASE_FOREACH (ExtractTaskData *, td, &extract_task_data->task_datas) {
+ extract_init(td);
+ }
+}
+
+static struct TaskNode *user_data_init_task_node_create(struct TaskGraph *task_graph,
+ UserDataInitTaskData *task_data)
+{
+ struct TaskNode *task_node = BLI_task_graph_node_create(
+ task_graph,
+ user_data_init_task_data_exec,
+ task_data,
+ (TaskGraphNodeFreeFunction)user_data_init_task_data_free);
+ return task_node;
+}
+
+/** \} */
+/* ---------------------------------------------------------------------- */
+/** \name Extract Loop
+ * \{ */
+
+static void extract_range_task_create(struct TaskGraph *task_graph,
+ struct TaskNode *task_node_user_data_init,
+ ExtractTaskData *taskdata,
+ const eMRIterType type,
+ int start,
+ int length)
 {
 taskdata = MEM_dupallocN(taskdata);
 atomic_add_and_fetch_int32(taskdata->task_counter, 1);
 taskdata->iter_type = type;
 taskdata->start = start;
 taskdata->end = start + length;
- BLI_task_pool_push(task_pool, extract_run, taskdata, true, NULL);
+ struct TaskNode *task_node = BLI_task_graph_node_create(
+ task_graph, extract_run, taskdata, MEM_freeN);
+ BLI_task_graph_edge_create(task_node_user_data_init, task_node);
 }

-static void extract_task_create(TaskPool *task_pool,
+static void extract_task_create(struct TaskGraph *task_graph,
+ struct TaskNode *task_node_mesh_render_data,
+ struct TaskNode *task_node_user_data_init,
+ ListBase *single_threaded_task_datas,
+ ListBase *user_data_init_task_datas,
 const Scene *scene,
 const MeshRenderData *mr,
 const MeshExtract *extract,
@@ -4551,52 +4699,60 @@ static void extract_task_create(TaskPool *task_pool,

 /* Divide extraction of the VBO/IBO into sensible chunks of works. */
 ExtractTaskData *taskdata = MEM_mallocN(sizeof(*taskdata), "ExtractTaskData");
+ taskdata->next = NULL;
+ taskdata->prev = NULL;
 taskdata->mr = mr;
 taskdata->extract = extract;
 taskdata->buf = buf;
- taskdata->user_data = extract->init(mr, buf);
+ taskdata->user_data = MEM_callocN(sizeof(ExtractUserData), __func__);
 taskdata->iter_type = mesh_extract_iter_type(extract);
 taskdata->task_counter = task_counter;
 taskdata->start = 0;
 taskdata->end = INT_MAX;

 /* Simple heuristic. */
- const bool use_thread = (mr->loop_len + mr->loop_loose_len) > 8192;
+ const int chunk_size = 8192;
+ const bool use_thread = (mr->loop_len + mr->loop_loose_len) > chunk_size;
 if (use_thread && extract->use_threading) {
+
 /* Divide task into sensible chunks. */
- const int chunk_size = 8192;
 if (taskdata->iter_type & MR_ITER_LOOPTRI) {
 for (int i = 0; i < mr->tri_len; i += chunk_size) {
- extract_range_task_create(task_pool, taskdata, MR_ITER_LOOPTRI, i, chunk_size);
+ extract_range_task_create(
+ task_graph, task_node_user_data_init, taskdata, MR_ITER_LOOPTRI, i, chunk_size);
 }
 }
 if (taskdata->iter_type & MR_ITER_LOOP) {
 for (int i = 0; i < mr->poly_len; i += chunk_size) {
- extract_range_task_create(task_pool, taskdata, MR_ITER_LOOP, i, chunk_size);
+ extract_range_task_create(
+ task_graph, task_node_user_data_init, taskdata, MR_ITER_LOOP, i, chunk_size);
 }
 }
 if (taskdata->iter_type & MR_ITER_LEDGE) {
 for (int i = 0; i < mr->edge_loose_len; i += chunk_size) {
- extract_range_task_create(task_pool, taskdata, MR_ITER_LEDGE, i, chunk_size);
+ extract_range_task_create(
+ task_graph, task_node_user_data_init, taskdata, MR_ITER_LEDGE, i, chunk_size);
 }
 }
 if (taskdata->iter_type & MR_ITER_LVERT) {
 for (int i = 0; i < mr->vert_loose_len; i += chunk_size) {
- extract_range_task_create(task_pool, taskdata, MR_ITER_LVERT, i, chunk_size);
+ extract_range_task_create(
+ task_graph, task_node_user_data_init, taskdata, MR_ITER_LVERT, i, chunk_size);
 }
 }
- MEM_freeN(taskdata);
+ BLI_addtail(user_data_init_task_datas, taskdata);
 }
 else if (use_thread) {
 /* One task for the whole VBO. */
 (*task_counter)++;
- BLI_task_pool_push(task_pool, extract_run, taskdata, true, NULL);
+ struct TaskNode *one_task = BLI_task_graph_node_create(
+ task_graph, extract_init_and_run, taskdata, extract_task_data_free);
+ BLI_task_graph_edge_create(task_node_mesh_render_data, one_task);
 }
 else {
 /* Single threaded extraction. */
 (*task_counter)++;
- extract_run(NULL, taskdata);
- MEM_freeN(taskdata);
+ BLI_addtail(single_threaded_task_datas, taskdata);
 }
 }

@@ -4617,6 +4773,8 @@ void mesh_buffer_cache_create_requested(MeshBatchCache *cache,
 eMRIterType iter_flag = 0;
 eMRDataType data_flag = 0;

+ const bool do_lines_loose_subbuffer = mbc.ibo.lines_loose;
+
 #define TEST_ASSIGN(type, type_lowercase, name) \
 do { \
 if (DRW_TEST_ASSIGN_##type(mbc.type_lowercase.name)) { \
@@ -4654,7 +4812,6 @@ void mesh_buffer_cache_create_requested(MeshBatchCache *cache,
 TEST_ASSIGN(IBO, ibo, fdots);
 TEST_ASSIGN(IBO, ibo, lines_paint_mask);
 TEST_ASSIGN(IBO, ibo, lines_adjacency);
- TEST_ASSIGN(IBO, ibo, lines_loose);
 TEST_ASSIGN(IBO, ibo, edituv_tris);
 TEST_ASSIGN(IBO, ibo, edituv_lines);
 TEST_ASSIGN(IBO, ibo, edituv_points);
@@ -4685,11 +4842,16 @@ void mesh_buffer_cache_create_requested(MeshBatchCache *cache,
 double rdata_end = PIL_check_seconds_timer();
 #endif

- TaskPool *task_pool;
-
- /* Create a suspended pool as the finalize method could be called too early.
- * See `extract_run`. */
- task_pool = BLI_task_pool_create_suspended(NULL, TASK_PRIORITY_HIGH);
+ /* TODO: Move to draw_manager.c */
+ struct TaskGraph *task_graph = BLI_task_graph_create();
+ struct TaskNode *task_node_mesh_render_data = mesh_extract_render_data_node_create(
+ task_graph, mr, iter_flag, data_flag);
+ ExtractSingleThreadedTaskData *single_threaded_task_data = MEM_callocN(
+ sizeof(ExtractSingleThreadedTaskData), __func__);
+ UserDataInitTaskData *user_data_init_task_data = MEM_callocN(sizeof(UserDataInitTaskData),
+ __func__);
+ struct TaskNode *task_node_user_data_init = user_data_init_task_node_create(
+ task_graph, user_data_init_task_data);

 size_t counters_size = (sizeof(mbc) / sizeof(void *)) * sizeof(int32_t);
 int32_t *task_counters = MEM_callocN(counters_size, __func__);
@@ -4697,8 +4859,16 @@ void mesh_buffer_cache_create_requested(MeshBatchCache *cache,

 #define EXTRACT(buf, name) \
 if (mbc.buf.name) { \
- extract_task_create( \
- task_pool, scene, mr, &extract_##name, mbc.buf.name, &task_counters[counter_used++]); \
+ extract_task_create(task_graph, \
+ task_node_mesh_render_data, \
+ task_node_user_data_init, \
+ &single_threaded_task_data->task_datas, \
+ &user_data_init_task_data->task_datas, \
+ scene, \
+ mr, \
+ &extract_##name, \
+ mbc.buf.name, \
+ &task_counters[counter_used++]); \
 } \
 ((void)0)

@@ -4726,7 +4896,30 @@ void mesh_buffer_cache_create_requested(MeshBatchCache *cache,
 EXTRACT(vbo, skin_roots);

 EXTRACT(ibo, tris);
- EXTRACT(ibo, lines);
+ if (mbc.ibo.lines) {
+ /* Both lines and lines loose are requested.
+ * Schedule lines extraction that also extracts the lines_loose sub buffer.
+ * This saves creating a new task related to the lines task. */
+ const MeshExtract *lines_extractor = do_lines_loose_subbuffer ?
+ &extract_lines_with_lines_loose :
+ &extract_lines;
+ extract_task_create(task_graph,
+ task_node_mesh_render_data,
+ task_node_user_data_init,
+ &single_threaded_task_data->task_datas,
+ &user_data_init_task_data->task_datas,
+ scene,
+ mr,
+ lines_extractor,
+ mbc.ibo.lines,
+ &task_counters[counter_used++]);
+ }
+ else {
+ if (do_lines_loose_subbuffer) {
+ /* When only the lines_loose is requested we create the subbuffer on the go. */
+ extract_lines_loose_subbuffer(mr);
+ }
+ }
 EXTRACT(ibo, points);
 EXTRACT(ibo, fdots);
 EXTRACT(ibo, lines_paint_mask);
@@ -4736,26 +4929,34 @@ void mesh_buffer_cache_create_requested(MeshBatchCache *cache,
 EXTRACT(ibo, edituv_points);
 EXTRACT(ibo, edituv_fdots);

+ /* Micro optimalization: Only create the edge when there are user datas that needs to be
+ * inited.
+ * The task is still part of the graph so the task_data will be freed when the graph is freed.
+ */
+ if (!BLI_listbase_is_empty(&user_data_init_task_data->task_datas)) {
+ BLI_task_graph_edge_create(task_node_mesh_render_data, task_node_user_data_init);
+ }
+
+ if (!BLI_listbase_is_empty(&single_threaded_task_data->task_datas)) {
+ struct TaskNode *task_node = extract_single_threaded_task_node_create(
+ task_graph, single_threaded_task_data);
+ BLI_task_graph_edge_create(task_node_mesh_render_data, task_node);
+ }
+ else {
+ extract_single_threaded_task_data_free(single_threaded_task_data);
+ }
+
+ BLI_task_graph_node_push_work(task_node_mesh_render_data);
 /* TODO(fclem) Ideally, we should have one global pool for all
 * objects and wait for finish only before drawing when buffers
 * need to be ready. */
- BLI_task_pool_work_and_wait(task_pool);
-
- /* The next task(s) rely on the result of the tasks above. */
-
- /* The `lines_loose` is a sub buffer from `ibo.lines`.
- * We schedule it here due to potential synchronization issues.*/
- EXTRACT(ibo, lines_loose);
-
- BLI_task_pool_work_and_wait(task_pool);
+ BLI_task_graph_work_and_wait(task_graph);

 #undef EXTRACT

- BLI_task_pool_free(task_pool);
+ BLI_task_graph_free(task_graph);
 MEM_freeN(task_counters);

- mesh_render_data_free(mr);
-
 #ifdef DEBUG_TIME
 double end = PIL_check_seconds_timer();


The diff above can be used as an actual example. The paste isn't complete; it reimplements the current logic. On my Ryzen 1700 the framerate went from 11.40 to 11.50 (Spring/020_02A.anim.blend).

The only benefit this implementation has right now is that the whole extraction, including the preparation steps, is done in tasks. The speedup will be larger when the graph handles multiple meshes at the same time (removing the wait in the current implementation).

Jeroen Bakker (jbakker) planned changes to this revision. May 5 2020, 6:48 PM
Jeroen Bakker (jbakker) edited the summary of this revision.
  • Fallback implementation
  • Test cases for task_data
  • Added test case for freeing task data
  • Check that a tree is created
  • Support for WITH_TBB (see the sketch below)
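
A sketch of what the WITH_TBB guard could look like (illustrative only; run_serial is a made-up name for the sequential fallback, not code from this patch):

    #ifdef WITH_TBB
      /* Let TBB schedule this node among the available threads. */
      tbb_node.try_put(tbb::flow::continue_msg());
    #else
      /* Fallback: execute the node, then its children, on the current thread. */
      run_serial();
    #endif
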
Sergey Sharybin (sergey) requested changes to this revision. May 7 2020, 10:55 AM
Sergey Sharybin (sergey) added inline comments.
source/blender/blenlib/BLI_task.h
241

It is always clearer to put a usage example into the comment, which visualizes the idea, rather than pointing to specific code that achieves some specific case.

You can use https://developer.blender.org/diffusion/B/browse/master/extern/ceres/include/ceres/cost_function_to_functor.h$31 as inspiration.

261–262

Please pay attention when typing comments; make them add value that helps with understanding the intended use of the function. Ideally with some simple real-world example.

263–264

Is it the data or the execution flow that is forwarded to the to_node?

source/blender/blenlib/intern/task_graph.cc
32

System headers are to be included using angle brackets.

42–43

Since this is C++ code, get the full benefit from it.

Having something like List<TaskNode> nodes; makes the type explicit and avoids the possibility of a developer casting it to a wrong type.

It also avoids use of LISTBASE_FOREACH() by allowing you to use regular C++ range-based loops.

Not sure whether @Jacques Lucke (JacquesLucke) has List already. If not, use std::list for now; quite sure that by the time this patch is fully reviewed Jacques will have committed a native List ;)

69–70

This is unhelpful over-documentation, which only adds noise.

Generally, a comment should clarify things, not state something you can already deduce.

Here it would be helpful to know who owns this data, and how and when it is freed.

75–76

Not sure how knowing the number of incoming links allows you to reliably detect invalid graphs.

Use of "tree" is at the very least confusing, or at worst wrong, here. In all definitions I am aware of, a "tree" has a single connected component. Whether it is directed or not varies from author to author. And last but not least, quite commonly it is a requirement to have only a single path from node A to node B (which wouldn't be the case in the dependency graph, for example).

So what is it you are actually supporting? Directed graphs without cycles? A single directed tree? A forest of directed trees?

97–98

In C++ code use nullptr.

129

If you must cast, use static_cast or reinterpret_cast.

Here you could have avoided the cast by doing LISTBASE_FOREACH (TaskNode *, link, &outgoing_edges) {.

If you use a proper C++ List then this becomes for (TaskNode *node : outgoing_edges).

207

Not quite sure how this is intended to be used.

I would imagine that from the API point of view the flow goes as:

  1. Create graph
  2. Create nodes
  3. Create edges
  4. Run the graph

Having a function which immediately runs the code does not fit this mindset.

tests/gtests/blenlib/BLI_task_graph_test.cc
69

answer is an ambiguous name.

Use expected_value / observed_value terminology.

This revision now requires changes to proceed. May 7 2020, 10:55 AM
source/blender/blenlib/BLI_task.h
251

Should these comments be moved to the actual implementation? The style guide says "Keep comments close to code." Also, they should end with a period.

source/blender/blenlib/intern/task_graph.cc
42–43

There is no C++ linked list implementation in blenlib yet. I could add one for sure. For now std::list is definitely better than ListBase.

However, I'd like it much more if people would stop using linked lists in many cases. Often there is an alternative that expresses the intent better and has better performance. I quickly checked where nodes is used in this file and it seems you only call BLI_addtail and BLI_pophead on it. So that is a FIFO data structure; maybe std::queue works as well.

This can be simplified even more by noticing that you call BLI_pophead only in the destructor and you only do that to iterate over all nodes. You could also iterate over all nodes without calling BLI_pophead. Therefore, instead of using a queue, you could also just use BLI::Vector<TaskNode *>.

I'm not sure how many nodes are expected, but if the number is large, you could also use BLI::LinearAllocator to construct them in larger memory chunks. For that, simply add BLI::LinearAllocator<> allocator to TaskGraph. Then in add_node do TaskNode *task_node = allocator.construct<TaskNode>(this, run, task_data, free_func). Remember that you cannot call delete on it anymore, because delete is destruction plus deallocation, and the deallocation is handled by the linear allocator. Therefore, you have to call the destructor explicitly with task_node->~TaskNode().
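
A condensed sketch of that suggestion (BLI::LinearAllocator and BLI::Vector used as described above; the TaskGraphNodeRunFunction typedef name is assumed, only TaskGraphNodeFreeFunction appears in the patch):

    struct TaskGraph {
      BLI::LinearAllocator<> allocator;
      BLI::Vector<TaskNode *> nodes;

      TaskNode *add_node(TaskGraphNodeRunFunction run,
                         void *task_data,
                         TaskGraphNodeFreeFunction free_func)
      {
        /* Construct in allocator-owned chunks; no per-node heap allocation. */
        TaskNode *task_node = allocator.construct<TaskNode>(this, run, task_data, free_func);
        nodes.append(task_node);
        return task_node;
      }

      ~TaskGraph()
      {
        for (TaskNode *node : nodes) {
          /* Destructor only: deallocation is handled by the linear allocator. */
          node->~TaskNode();
        }
      }
    };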

source/blender/blenlib/BLI_task.h
251

"Comments in header are optional but keep brief."

So this is all fine and helpful.

"Keep comments close to code."

I actually wouldn't mind altering this one, since it forces you to constantly switch between code and header when looking for the exact description, which is rather annoying.

One of the main reasons this policy exists is the legacy state of headers, with a lot of functions, and the fear was that finding the exact function of interest would be more complicated with an elaborate description.

But this is actually beyond the scope of this patch.

Jeroen Bakker (jbakker) marked 10 inline comments as done.

Code review

  • Better usage documentation
  • Use more C++
  • Don't use TBB when running single-threaded
  • Added test cases
Jeroen Bakker (jbakker) edited the summary of this revision. May 7 2020, 1:55 PM
source/blender/blenlib/intern/task_graph.cc
42–43

I changed nodes to a queue, but kept siblings as a list due to the reference_wrapper it uses.

207

I documented this in the header.

The concept is that the graph is actually a forest of directed trees. The building and triggering can be done at a lower level than the graph. For example, see D7618: DrawManager: Graph Task Scheduling, where the graph is owned by the draw manager, but any mesh can register its tree and trigger the execution of its part. We could isolate this more clearly in the API.

There is also support for starting execution somewhere in the middle of a tree. I haven't found any cases where we would use this, though.
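
In other words, something along these lines (a sketch; the exec callbacks, task data, and child nodes are placeholders):

    /* Two independent trees registered in one graph, e.g. one per mesh.
     * Each tree is triggered through its own root; one wait covers both. */
    struct TaskNode *root_a = BLI_task_graph_node_create(task_graph, exec_a, data_a, NULL);
    struct TaskNode *root_b = BLI_task_graph_node_create(task_graph, exec_b, data_b, NULL);
    BLI_task_graph_edge_create(root_a, child_a);
    BLI_task_graph_edge_create(root_b, child_b);

    BLI_task_graph_node_push_work(root_a);
    BLI_task_graph_node_push_work(root_b);
    BLI_task_graph_work_and_wait(task_graph);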

source/blender/blenlib/intern/task_graph.cc
137

This calls the implicit copy constructor of TaskNode. So every time add_node is called, it actually allocates two TaskNodes. However, when the TaskGraph is destructed, only the copy of the actual TaskNode is freed. Therefore, this is a memory leak.

source/blender/blenlib/intern/task_graph.cc
42–43

I'm not sure why one would use a queue when a vector works as well. Or do you intend to add other functionality that requires it to be a queue?

Jeroen Bakker (jbakker) marked 2 inline comments as done.
  • Changed queue<TaskNode> to vector<unique_ptr<TaskNode>> (see the sketch below)
  • Fixed memory leak

Use vector for siblings
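
Roughly, the adopted ownership might look like this (a sketch; the surrounding add_node signature is abbreviated and member names may differ):

    /* Inside TaskGraph: the vector element is the single owner of each node,
     * so add_node no longer makes an implicit copy, and nothing leaks. */
    std::vector<std::unique_ptr<TaskNode>> nodes;

    /* In add_node: */
    nodes.push_back(std::make_unique<TaskNode>(this, run, task_data, free_func));
    TaskNode *task_node = nodes.back().get();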

source/blender/blenlib/BLI_task.h
245–249

It doesn't seem like you have multiple roots and trees in this example. Call them root and node_{1, 2, 3, 4}; that makes it easier to follow (including indices in the root and tree names makes you expect multiple roots and trees).

You can also add something like this to help visualize the graph:

    +---- [root] ----+
    |                |
    v                v
[node_1]    +---- [node_2] ----+
            |                  |
            v                  v
         [node_3]           [node_4]
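
Building that example with the C API would then look roughly like this (a sketch; exec stands for whatever node callbacks are used, and the task data is left NULL):

    struct TaskNode *root = BLI_task_graph_node_create(task_graph, exec, NULL, NULL);
    struct TaskNode *node_1 = BLI_task_graph_node_create(task_graph, exec, NULL, NULL);
    struct TaskNode *node_2 = BLI_task_graph_node_create(task_graph, exec, NULL, NULL);
    struct TaskNode *node_3 = BLI_task_graph_node_create(task_graph, exec, NULL, NULL);
    struct TaskNode *node_4 = BLI_task_graph_node_create(task_graph, exec, NULL, NULL);

    BLI_task_graph_edge_create(root, node_1);
    BLI_task_graph_edge_create(root, node_2);
    BLI_task_graph_edge_create(node_2, node_3);
    BLI_task_graph_edge_create(node_2, node_4);
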
source/blender/blenlib/intern/task_graph.cc
192–193

What is the reasoning behind this? (Add it to the comment. ;)

And specify: is it a limitation for now, something to work on in the future, or a hard design limitation (caused by ...)?

Jeroen Bakker (jbakker) marked 6 inline comments as done.

Removed the num_incoming_links check and described the actual behavior.
Renamed to children and child.

Jeroen Bakker (jbakker) edited the summary of this revision. May 8 2020, 7:52 AM
Jeroen Bakker (jbakker) added inline comments.
source/blender/blenlib/intern/task_graph.cc
192–193

Having multiple incoming edges resulted in the task being triggered once for each finished parent. As there is no data-forwarding/messaging system in place, this is not that useful. I added the check with this in mind, but better not to make any assumptions about the usage yet.

TBB supports joining the execution of multiple parents before continuing to execute their common children. This is done via task inputs and heavily relies on C++ templates, where the function parameters are created dynamically based on the number of inputs, again at compile time. I propose that when we need to support joining tasks, we make it a C++-only feature. I do see the value of having it, but would rather add it in a separate patch.

I removed the check of num_incoming_links and added documentation of the behavior in BLI_task.h.
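
For reference, this is the kind of TBB joining being referred to; the joined input types are baked in at compile time, which is what makes a clean C wrapper hard. A sketch with invented payload types:

    #include <cstdio>
    #include <tuple>
    #include "tbb/flow_graph.h"

    int main()
    {
      using namespace tbb::flow;
      graph g;

      /* Two producers with different output types. */
      function_node<int, int> produce_int(g, unlimited, [](int v) { return v + 1; });
      function_node<int, float> produce_float(g, unlimited, [](int v) { return v * 0.5f; });

      /* The tuple type of the join is fixed at compile time. */
      join_node<std::tuple<int, float>> join(g);

      function_node<std::tuple<int, float>, continue_msg> consume(
          g, unlimited, [](const std::tuple<int, float> &t) {
            std::printf("%d %f\n", std::get<0>(t), std::get<1>(t));
            return continue_msg();
          });

      make_edge(produce_int, input_port<0>(join));
      make_edge(produce_float, input_port<1>(join));
      make_edge(join, consume);

      produce_int.try_put(1);
      produce_float.try_put(2);
      g.wait_for_all();
      return 0;
    }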

Jeroen Bakker (jbakker) edited the summary of this revision. May 8 2020, 7:55 AM
  • Rebase with latest master
  • Added task isolation
source/blender/blenlib/BLI_task.h
243

Remove trailing space.

268

Remove trailing space.

source/blender/blenlib/intern/task_graph.cc
28

Remove unused include.

63

Outdated comment, this is not forwarded.

145

Remove unnecessary NULL pointer checks.

183

Avoid double mem alloc overhead when using TBB.

tests/gtests/blenlib/BLI_task_graph_test.cc
8

Remove unnecessary extern "C"

Fix own nitpicky comments.

This revision is now accepted and ready to land. Mon, May 25, 11:49 AM
This revision was automatically updated to reflect the committed changes.