
Suggestion to improve "Top 50 devices" list
Closed, Resolved, Public

Description

This is open for discussion and comes from a conversation about giving visibility to Nvidia devices using both CUDA and OptiX which are currently penalized by averaging the scores.

The simplest solution would be to query and display the fastest result for a device, instead of taking the median over all the results. But that may not be accurate enough, depending on how stable the results are. Another potential option would be to group results by both device name and device type (CUDA, OptiX, OpenCL, …) rather than just device name (see https://developer.blender.org/source/blender-open-data/browse/master/website/opendata_main/views/home.py$138), calculate the median for each group, and then select the fastest.
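That second idea could be sketched like this (hypothetical sample data and device-type labels; the real query lives in home.py):

```python
from statistics import median

# Hypothetical render times in seconds, keyed by (device_name, device_type).
results = {
    ("GeForce RTX 2080 Ti", "CUDA"):  [40.1, 39.9, 39.7],
    ("GeForce RTX 2080 Ti", "OPTIX"): [21.0, 21.1, 20.9],
}

def best_median(device_name, results):
    """Median per (device, device_type) group, then the fastest group wins."""
    medians = {
        device_type: median(times)
        for (name, device_type), times in results.items()
        if name == device_name
    }
    best = min(medians, key=medians.get)
    return best, medians[best]

print(best_median("GeForce RTX 2080 Ti", results))  # ('OPTIX', 21.0)
```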

Event Timeline

Francesco Siddi (fsiddi) changed the task status from Needs Triage to Confirmed. Jan 22 2020, 6:38 PM
Francesco Siddi (fsiddi) created this task.

> The simplest solution would be to query and display the fastest result for a device, instead of taking the median over all the results. But that may not be accurate enough, depending on how stable the results are.

We could take e.g. the 10th percentile instead, which is also my preferred solution. The problem with the other approach is that it makes the results too hard to interpret/verify.
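A low percentile like that is cheap to compute; a minimal sketch with made-up render times for one (device, device_type) group, using linear interpolation between samples:

```python
def percentile(times, p):
    """Linearly interpolated p-th percentile (0-100) of a list of render times."""
    xs = sorted(times)
    k = (len(xs) - 1) * p / 100
    lo, hi = int(k), min(int(k) + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)

# Mostly stable results with a couple of slow outliers.
times = [21.3, 20.9, 21.0, 24.8, 21.1, 35.2, 21.2, 21.0, 21.4, 22.0]
print(percentile(times, 10))  # close to the fastest stable result
print(percentile(times, 50))  # the median, pulled up less by outliers than the mean
```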

Another approach might be to limit the results on the homepage to recent results (e.g. one year).

You can still submit results for CUDA on RTX GPUs right now, in addition to OptiX. It's just that those ideally should not affect the overall rank, since CUDA is the slower option there.

There is some weird behavior with the results currently displayed on http://opendata.blender.org:


It shows the GeForce RTX 2080 SUPER as being the fastest GPU, even though e.g. the GeForce RTX 2080 Ti has better results. Looks like this is because it uses the OptiX results for the SUPER, but the CUDA results for all other RTX GPUs to determine rank.

GeForce RTX 2080 Ti: https://opendata.blender.org/benchmarks/query/?device_name=GeForce%20RTX%202080%20Ti&benchmark=bmw27&group_by=device_type:
CUDA: 39.894, OptiX: 21.02255
GeForce RTX 2080 SUPER: https://opendata.blender.org/benchmarks/query/?device_name=GeForce%20RTX%202080%20SUPER&benchmark=bmw27&group_by=device_type:
CUDA: 52.1603, OptiX: 27.4187

Yet on the home page, the GeForce RTX 2080 Ti shows 38.63 for the bmw27 scene (which is close to the CUDA number) and the GeForce RTX 2080 SUPER shows 28.94 (which is close to the OptiX number).

Technically the GeForce RTX 2080 Ti is the faster GPU of the two and the benchmark results prove that, but it is not displayed correctly in either the Top 50 or the Fastest GPUs list.

@Sem Mulder (SemMulder) mind having a look at the last remark here?

> Yet on the home page, the GeForce RTX 2080 Ti shows 38.63 for the bmw27 scene (which is close to the CUDA number) and the GeForce RTX 2080 SUPER shows 28.94 (which is close to the OptiX number).

This is because the number on the home page is the median of all benchmarks of that (device, scene) combo. Since the Ti has more CUDA than OptiX results the final number is closer to the CUDA result.
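A quick illustration of that effect with made-up numbers: when CUDA samples outnumber OptiX samples for a (device, scene) combo, the combined median lands among the CUDA values.

```python
from statistics import median

# Hypothetical samples for one (device, scene) combo.
cuda_times = [39.9] * 8   # more CUDA submissions...
optix_times = [21.0] * 3  # ...than OptiX submissions

combined = cuda_times + optix_times
print(median(combined))  # 39.9 -- the CUDA side dominates the combined median
print(median(cuda_times), median(optix_times))  # 39.9 21.0
```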

@Sem Mulder (SemMulder) How about this for the Top 50 query:

SELECT rank() OVER (ORDER BY sum(per_benchmark.median_render_time)) AS rank,
       per_benchmark.device_name                                    AS device_name,
       json_object_agg(per_benchmark.benchmark,
                       per_benchmark.median_render_time)            AS median_render_times,
       sum(per_benchmark.number_of_samples)                         AS number_of_samples
FROM (
         SELECT rank() OVER (PARTITION BY per_device_type.device_name, per_device_type.benchmark ORDER BY per_device_type.median_render_time) AS rank, *
         FROM (
                  SELECT percentile_cont(0.5)
                         WITHIN GROUP (ORDER BY opendata_main_benchmark.render_time) AS median_render_time,
                         opendata_main_benchmark.device_name                         AS device_name,
                         opendata_main_benchmark.device_type                         AS device_type,
                         opendata_main_benchmark.benchmark                           AS benchmark,
                         count(*)                                                    AS number_of_samples
                  FROM opendata_main_benchmark
                  WHERE opendata_main_benchmark.benchmark IN %(benchmarks)s AND opendata_main_benchmark.device_name NOT IN %(device_name_blacklist)s
                  GROUP BY opendata_main_benchmark.device_name, opendata_main_benchmark.device_type, opendata_main_benchmark.benchmark
                  HAVING count(*) >= %(minimum_number_of_samples_per_benchmark)s
              ) per_device_type
     ) per_benchmark
WHERE per_benchmark.rank = 1
GROUP BY per_benchmark.device_name
HAVING count(per_benchmark.benchmark) = %(number_of_benchmarks)s
ORDER BY rank
LIMIT 50;

This adds one additional step to the query, which selects the better device type for a device (see the extra SELECT rank() in there; the rest is pretty much identical to the old query). Currently it only affects RTX GPUs, which support two device types (CUDA and OptiX). But it ensures the median render time is calculated only for the better option instead of over all device types.
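In Python terms, the query above does roughly the following (a sketch with hypothetical rows; the authoritative version is the SQL itself):

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (device_name, device_type, benchmark, render_time).
rows = [
    ("RTX 2080 Ti", "CUDA",  "bmw27", 39.9),
    ("RTX 2080 Ti", "CUDA",  "bmw27", 40.1),
    ("RTX 2080 Ti", "OPTIX", "bmw27", 21.0),
    ("RTX 2080 Ti", "OPTIX", "bmw27", 21.1),
    ("RTX 2080 Ti", "CUDA",  "classroom", 120.5),
    ("RTX 2080 Ti", "OPTIX", "classroom", 60.2),
    ("RTX 2080 Ti", "OPTIX", "classroom", 60.4),
]

# Step 1: median per (device_name, device_type, benchmark).
groups = defaultdict(list)
for name, dtype, bench, t in rows:
    groups[(name, dtype, bench)].append(t)
medians = {key: median(ts) for key, ts in groups.items()}

# Step 2: per (device_name, benchmark), keep only the fastest device type
# (the extra "rank = 1" step the proposed query adds).
best = {}
for (name, dtype, bench), m in medians.items():
    key = (name, bench)
    if key not in best or m < best[key]:
        best[key] = m

# Step 3: rank devices by the sum of their per-benchmark medians.
totals = defaultdict(float)
for (name, bench), m in best.items():
    totals[name] += m
ranking = sorted(totals.items(), key=lambda kv: kv[1])
print(ranking)
```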

@Patrick Mours (pmoursnv) the actual SQL is a bit beyond me, but I like the idea and I think it is an improvement over the current tables.

Some issues still though:

  • It is possible to game this by not having data for anything but bmw27, since the rank is based on the sum of the per-benchmark median render times, and avoiding the expensive ones will keep that sum low. Not sure how much of a problem that is.
  • For CPUs, num_cpu_threads might have to be taken into account as well. If you have two $FAST_CPUs in a box, that will be twice as fast as having just one, but both results will count towards the ranking in this table.
  • Before Blender 2.82 (I think) Windows renders were slower than Linux / macOS renders. Not sure (how) that should be taken into account.
  • Older Blender versions are slower than newer. Older devices might be unfairly penalized in this table (or is that what the percentiles are for?).

Thanks for thinking about this, I like top lists! :)

> It is possible to game this by not having data for anything but bmw27, since the rank is based on the sum of the per-benchmark median render times, and avoiding the expensive ones will keep that sum low. Not sure how much of a problem that is.

This is false :). A device needs at least minimum_number_of_samples_per_benchmark results for each of the hardcoded scenes in benchmarks to show up.

> For CPUs, num_cpu_threads might have to be taken into account as well. If you have two $FAST_CPUs in a box, that will be twice as fast as having just one, but both results will count towards the ranking in this table.

This is also not entirely accurate, since we put num_cpu_sockets in device_name.

> Before Blender 2.82 (I think) Windows renders were slower than Linux / macOS renders. Not sure (how) that should be taken into account.
> Older Blender versions are slower than newer. Older devices might be unfairly penalized in this table (or is that what the percentiles are for?).

Maybe we should just hardcode the Blender version to a recent one, and pick the minimum median render time over all configurations (i.e. the minimum over (device_type, os)).

If we want to not hardcode the Blender version, we could go for the Blender version that has the highest number of samples over the last couple of months / the last 10000 samples / whatever makes sense.
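Picking the version that way could be as simple as (sketch, hypothetical data):

```python
from collections import Counter

# Hypothetical recent samples: the blender_version of each submitted result.
recent_versions = ["2.81", "2.82", "2.82", "2.82", "2.81", "2.82", "2.80"]

# The version with the most samples in the window wins.
chosen = Counter(recent_versions).most_common(1)[0][0]
print(chosen)  # 2.82
```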

I'd like to emphasize the importance of the original issue again. The information about RTX GPUs has been broken for weeks now, which is really bad for the affected party (considering that Open Data does influence decisions as to what hardware to buy for Blender).

@Sem Mulder (SemMulder) You haven't commented on the fix I proposed above. Are there objections? It seems a simple enough change that doesn't affect most of the data. Do you want me to create a Differential request or could you commit something along those lines?

> You haven't commented on the fix I proposed above. Are there objections?

Sorry about this, we had an internal discussion going on about this. I should have at least posted a message that we were evaluating our options.

I see you created a Differential, thank you for that. I have a few other improvements I'd like to make as well. Such as taking the fastest median time over all configurations where a configuration is the tuple (device_type, os, blender_version). I will implement and discuss this internally, after which we can hopefully resolve this issue.

Awesome, thank you! Let me know if there is anything else you need from me.
I'm hoping this can be resolved as soon as possible. If you expect it to take more than a week, a rough time estimate would be helpful, so that I have an answer ready for the inevitable press questions we receive about this here.

The changes are in rBOD39c9dc48a6db975e9b4389d2d06be705151e962e. I am in the process of deploying it as we speak.

Sem Mulder (SemMulder) closed this task as Resolved. Feb 28 2020, 12:26 PM

The changes are deployed. I'm closing this ticket now, but I'd still like to know what you think. Note that we opted to remove the table in favor of improving the chart, since it was hard to convey the required information in the form of a table.

Great job! The scene breakdown is especially useful.
Long term it might be useful to add a graph (or link to one) that compares all devices (CPU, GPU), but for now the two graphs work well.