Suggestion to improve "Top 50 devices" list #73322

Closed
opened 2020-01-22 18:38:13 +01:00 by Francesco Siddi · 23 comments

This is open for discussion and comes from a conversation about giving visibility to Nvidia devices using both CUDA and OptiX, which are currently penalized by averaging the scores.

The simplest solution would be to query and display the fastest result for a device, instead of taking the median over all the results. But that may not be accurate enough, depending on how stable the results are. Another potential option would be to group results by both device name and device type (CUDA, OptiX, OpenCL, …), rather than just device name (here: https://developer.blender.org/source/blender-open-data/browse/master/website/opendata_main/views/home.py$138), calculate the median for each and then select the fastest.
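
For illustration, a minimal sketch of that second option (group the results by device name and device type, take the median per group, and keep the fastest), assuming the `opendata_main_benchmark` table and columns used by the homepage query:

```
-- Median render time per (device, device type, scene); DISTINCT ON then
-- keeps only the fastest device type for each (device, scene) pair.
SELECT DISTINCT ON (device_name, benchmark)
       device_name,
       device_type,
       benchmark,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY render_time) AS median_render_time
FROM opendata_main_benchmark
GROUP BY device_name, device_type, benchmark
ORDER BY device_name, benchmark, median_render_time;
```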

Sem Mulder was assigned by Francesco Siddi 2020-01-22 18:38:13 +01:00

Changed status from 'Needs Triage' to: 'Confirmed'


Added subscriber: @fsiddi


#74214 was marked as duplicate of this issue


Added subscriber: @pmoursnv


> The simplest solution would be to query and display the fastest result for a device, instead of taking the median over all the results. But that may not be accurate enough, depending on how stable the results are.

We could take e.g. the 10th percentile instead, which is also my preferred solution. The problem with the other approach is that it makes the results too hard to interpret/verify.
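
As a sketch, that would only change the fraction passed to the `percentile_cont` aggregate already used for the median, e.g.:

```
-- 10th percentile per (device, scene): close to a device's best results,
-- but less sensitive to a single lucky outlier than taking MIN() would be.
SELECT device_name,
       benchmark,
       percentile_cont(0.1) WITHIN GROUP (ORDER BY render_time) AS p10_render_time
FROM opendata_main_benchmark
GROUP BY device_name, benchmark;
```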


Another approach might be to limit the results on the homepage to recent results (e.g. one year).
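
A sketch of such a cutoff, assuming the table has a submission timestamp column (called `created_at` here, which is an assumption about the schema):

```
-- Only consider results submitted within the last year.
SELECT device_name,
       benchmark,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY render_time) AS median_render_time
FROM opendata_main_benchmark
WHERE created_at >= now() - interval '1 year'
GROUP BY device_name, benchmark;
```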


You can still send in results for CUDA on RTX GPUs right now though, in addition to OptiX. It's just that those results ideally should not affect the overall rank, since CUDA is the slower option.


There is some weird behavior with the results currently displayed on http://opendata.blender.org:
![Screenshot.JPG](https://archive.blender.org/developer/F8327485/Screenshot.JPG)
It shows the GeForce RTX 2080 SUPER as being the fastest GPU, even though e.g. the GeForce RTX 2080 Ti has better results. Looks like this is because it uses the OptiX results for the SUPER, but the CUDA results for all other RTX GPUs to determine rank.

GeForce RTX 2080 Ti: https://opendata.blender.org/benchmarks/query/?device_name=GeForce%20RTX%202080%20Ti&benchmark=bmw27&group_by=device_type:
CUDA: 39.894, OptiX: 21.02255
GeForce RTX 2080 SUPER: https://opendata.blender.org/benchmarks/query/?device_name=GeForce%20RTX%202080%20SUPER&benchmark=bmw27&group_by=device_type:
CUDA: 52.1603, OptiX: 27.4187

Yet on the home page, the GeForce RTX 2080 Ti shows 38.63 for the bmw27 scene (which is close to the CUDA number) and the GeForce RTX 2080 SUPER shows 28.94 (which is close to the OptiX number).

Technically the GeForce RTX 2080 Ti is the faster GPU of the two and the benchmark results prove that, but it is not being displayed correctly in either the Top 50 or the Fastest GPUs list.


@SemMulder mind having a look at the last remark here?


> Yet on the home page, the GeForce RTX 2080 Ti shows 38.63 for the bmw27 scene (which is close to the CUDA number) and the GeForce RTX 2080 SUPER shows 28.94 (which is close to the OptiX number).

This is because the number on the home page is the median of all benchmarks of that (device, scene) combo. Since the Ti has more CUDA than OptiX results the final number is closer to the CUDA result.
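
A purely hypothetical illustration of that effect (the numbers below are made up):

```
-- Five CUDA-like results around 40s and two OptiX-like results around 21s
-- for the same (device, scene); the combined median lands in the CUDA cluster.
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY t) AS combined_median
FROM (VALUES (39.1), (39.8), (40.2), (40.5), (41.0), (20.9), (21.1)) AS samples(t);
-- combined_median = 39.8
```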


@SemMulder How about this for the Top 50 query:

```
-- Overall rank: devices ordered by the sum of their per-scene median render times.
SELECT rank() OVER (ORDER BY sum(per_benchmark.median_render_time)) AS rank,
       per_benchmark.device_name                                    AS device_name,
       json_object_agg(per_benchmark.benchmark,
                       per_benchmark.median_render_time)            AS median_render_times,
       sum(per_benchmark.number_of_samples)                         AS number_of_samples
FROM (
         -- For each (device, scene), rank the device types by median render time,
         -- so the fastest device type gets rank 1.
         SELECT rank() OVER (PARTITION BY per_device_type.device_name, per_device_type.benchmark ORDER BY per_device_type.median_render_time) AS rank, *
         FROM (
                  -- Median render time and sample count per (device, device type, scene).
                  SELECT percentile_cont(0.5)
                         WITHIN GROUP (ORDER BY opendata_main_benchmark.render_time) AS median_render_time,
                         opendata_main_benchmark.device_name                         AS device_name,
                         opendata_main_benchmark.device_type                         AS device_type,
                         opendata_main_benchmark.benchmark                           AS benchmark,
                         count(*)                                                    AS number_of_samples
                  FROM opendata_main_benchmark
                  WHERE opendata_main_benchmark.benchmark IN %(benchmarks)s AND opendata_main_benchmark.device_name NOT IN %(device_name_blacklist)s
                  GROUP BY opendata_main_benchmark.device_name, opendata_main_benchmark.device_type, opendata_main_benchmark.benchmark
                  HAVING count(*) >= %(minimum_number_of_samples_per_benchmark)s
              ) per_device_type
     ) per_benchmark
-- Keep only the fastest device type per (device, scene)...
WHERE per_benchmark.rank = 1
GROUP BY per_benchmark.device_name
-- ...and only devices that have qualifying results for every scene.
HAVING count(per_benchmark.benchmark) = %(number_of_benchmarks)s
ORDER BY rank
LIMIT 50;
```

This adds one additional step to the query which selects the better device type for a device (see the extra SELECT rank() in there, the rest is pretty much identical to the old query). It will only have an effect on RTX GPUs currently, which support two device types (CUDA and OptiX). But it ensures the median render time is only calculated for the better option instead of calculating it over all device types.


Ping =)


Added subscribers: @Walles, @SemMulder


@pmoursnv The actual SQL is a bit beyond me, but I like the idea and I think it is an improvement on the current tables.

Some issues still though:

  • It is possible to game this by not having data for anything but bmw27, since the rank is based on the sum of the per-benchmark median render times, and avoiding the expensive ones will keep that sum low. Not sure how much of a problem that is.
  • For CPUs, num_cpu_threads might have to be taken into account as well. If you have two $FAST_CPUs in a box, that will be twice as fast as having just one, but both results will count towards the ranking in this table.
  • Before Blender 2.82 (I think) Windows renders were slower than Linux / macOS renders. Not sure (how) that should be taken into account.
  • Older Blender versions are slower than newer. Older devices might be unfairly penalized in this table (or is that what the percentiles are for?).

Thanks for thinking about this, I like top lists! :)


> It is possible to game this by not having data for anything but bmw27, since the rank is based on the sum of the per-benchmark median render times, and avoiding the expensive ones will keep that sum low. Not sure how much of a problem that is.

This is false :). A device needs at least `minimum_number_of_samples_per_benchmark` results for each of the hardcoded scenes in `benchmarks` to show up.

> For CPUs, num_cpu_threads might have to be taken into account as well. If you have two $FAST_CPUs in a box, that will be twice as fast as having just one, but both results will count towards the ranking in this table.

This is also not entirely accurate, since we put `num_cpu_sockets` in `device_name`.

> Before Blender 2.82 (I think) Windows renders were slower than Linux / macOS renders. Not sure (how) that should be taken into account.
> Older Blender versions are slower than newer. Older devices might be unfairly penalized in this table (or is that what the percentiles are for?).

Maybe we should just hardcode the Blender version to a recent one, and pick the minimum median render time over all configurations (i.e. the minimum over (device_type, os)).
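
A rough sketch of that, assuming the table also has `os` and `blender_version` columns (both column names are assumptions) and using 2.82 purely as an example version:

```
-- Per (device, scene): the fastest median over all (device_type, os)
-- configurations, for one hardcoded Blender version.
SELECT device_name,
       benchmark,
       min(median_render_time) AS best_median_render_time
FROM (
         SELECT device_name,
                device_type,
                os,          -- assumed column name
                benchmark,
                percentile_cont(0.5) WITHIN GROUP (ORDER BY render_time) AS median_render_time
         FROM opendata_main_benchmark
         WHERE blender_version = '2.82'  -- assumed column name, example version
         GROUP BY device_name, device_type, os, benchmark
     ) per_configuration
GROUP BY device_name, benchmark;
```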


If we want to not hardcode the Blender version, we could go for the Blender version that has the highest number of samples over the last couple of months / the last 10000 samples / whatever makes sense.
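
A sketch of selecting that version, again assuming `blender_version` and a submission timestamp column (`created_at`) exist:

```
-- The Blender version with the most samples submitted in the last three months.
SELECT blender_version
FROM opendata_main_benchmark
WHERE created_at >= now() - interval '3 months'
GROUP BY blender_version
ORDER BY count(*) DESC
LIMIT 1;
```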


I'd like to emphasize the importance of the original issue again. The information about RTX GPUs has been broken for weeks now, which is really bad for the affected party (considering that Open Data does influence decisions as to what hardware to buy for Blender).

@SemMulder You haven't commented on the fix I proposed above. Are there objections? It seems a simple enough change that doesn't affect most of the data. Do you want me to create a Differential request or could you commit something along those lines?


> You haven't commented on the fix I proposed above. Are there objections?

Sorry about this, we had an internal discussion about this going on. I should have at least posted a message that we were evaluating our options.

I see you created a Differential, thank you for that. I have a few other improvements I'd like to make as well, such as taking the fastest median time over all configurations, where a configuration is the tuple `(device_type, os, blender_version)`. I will implement and discuss this internally, after which we can hopefully resolve this issue.


Awesome, thank you! Let me know if there is anything else you need from me.
I'm hoping this can be resolved as soon as possible. If you expect it to take more than a week, a rough estimate of when would be helpful, so that I have something to answer the inevitable press questions we receive here on the matter.


The changes are in 39c9dc48a6. I am in the process of deploying it as we speak.


Changed status from 'Confirmed' to: 'Resolved'


The changes are deployed. I'm closing this ticket now, but I would still like to know what you think. Note that we opted to remove the table in favor of improving the chart, since it was hard to convey the required information in the form of a table.


Great job! The scene breakdown is especially useful.
Long term it might be useful to add a graph (or a link to one) that compares all devices (CPU, GPU), but for now the two graphs work well.
