Benchmarks should be Samples #56304

Open
opened 2018-08-11 01:48:13 +02:00 by Francesco Siddi · 3 comments

The Open Data platform aims to host more than just benchmark results. For this reason we should look into introducing a more generic data structure, called 'sample'.

Sample schema on My Data

| **field** | **type** | **description** |
| -- | -- | -- |
| id | int | the PK |
| aid | str | Alphanumeric representation of the PK |
| manage_token | str | Token issued to the data owner to update/remove the data from Open Data |
| raw_data | json | The sample as submitted by the client |
| date_created | datetime | Submission date |
| is_redacted | bool | (to discuss - optionally specify which fields to redact?) |
| weight | int | (to discuss) |
| user | int | FK to the User |
| serie | int | FK to the Serie - e.g. benchmark, blender_org downloads, telemetry, etc |
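
To make the field list concrete, here is a minimal sketch of the My Data sample as a Python dataclass. The class name, the defaults, and the example values are assumptions for illustration only; the issue does not say how the model is implemented, and the semantics of `is_redacted` and `weight` are still open.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class MyDataSample:
    """Hypothetical in-memory mirror of the proposed My Data sample row."""

    id: int                    # the PK
    aid: str                   # alphanumeric representation of the PK
    manage_token: str          # lets the data owner update/remove the data
    raw_data: dict[str, Any]   # the sample as submitted by the client
    user: int                  # FK to the User
    serie: int                 # FK to the Serie
    # Submission date; defaulting to "now" in UTC is an assumption.
    date_created: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    is_redacted: bool = False      # semantics still to be discussed
    weight: Optional[int] = None   # semantics still to be discussed


# Example values are invented for illustration.
sample = MyDataSample(
    id=1,
    aid="000001",
    manage_token="example-token",
    raw_data={"device": "example-gpu", "render_time": 12.5},
    user=42,
    serie=1,
)
```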

A similar schema could be adopted by Open Data

| **field** | **type** | **description** |
| -- | -- | -- |
| id | int | the PK |
| aid | str | Alphanumeric representation of the PK |
| manage_token | str | Token issued to the data owner to update/remove the data from Open Data |
| data | json | The (redacted) data that should be indexed |
| date_created | datetime | Submission date |
| weight | int | (to discuss) |
| serie | str | The Serie - e.g. benchmark, blender_org downloads, telemetry, etc |
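
As a rough illustration of how the two schemas relate, the sketch below derives an Open Data record from a My Data record. The function name, the `SERIE_NAMES` mapping from integer FK to serie name, and the set-of-keys redaction are all assumptions; the issue leaves the redaction mechanism open. The `id` is left out on the assumption that Open Data assigns its own PK.

```python
from typing import Any

# Hypothetical mapping from the My Data Serie FK to the Open Data serie name.
SERIE_NAMES = {1: "benchmark", 2: "blender_org downloads", 3: "telemetry"}


def to_open_data(sample: dict[str, Any], redact: set[str]) -> dict[str, Any]:
    """Build an Open Data record from a My Data record.

    Keys listed in `redact` are dropped from the submitted data, and the
    `user` FK is not copied, so only redacted data reaches the index.
    """
    data = {k: v for k, v in sample["raw_data"].items() if k not in redact}
    return {
        "aid": sample["aid"],
        "manage_token": sample["manage_token"],
        "data": data,
        "date_created": sample["date_created"],
        "weight": sample["weight"],
        "serie": SERIE_NAMES[sample["serie"]],
    }


# Example values are invented for illustration.
my_data_sample = {
    "id": 1,
    "aid": "000001",
    "manage_token": "example-token",
    "raw_data": {"hostname": "my-laptop", "render_time": 12.5},
    "date_created": "2018-01-01T00:00:00",
    "is_redacted": True,
    "weight": None,
    "user": 42,
    "serie": 1,
}
record = to_open_data(my_data_sample, redact={"hostname"})
```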

## AlphaID vs UUID

While it would be fantastic to be able to reference a sample with a 6-character alphanumeric string, built from the My Data sample id, we understand that the Open Data portal will receive data from various sources, so such an id alone may not be unique across sources. This issue can be solved by providing a 'serie' name with the sample.
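
One possible way to build such an id is a fixed-width base-36 encoding of the integer PK, combined with the serie name to form a reference. This is only a sketch: the alphabet, the zero padding, and the `serie/aid` reference format are assumptions, not something the issue specifies.

```python
import string

# Assumption: digits plus lowercase letters (base 36), fixed width of 6.
ALPHABET = string.digits + string.ascii_lowercase
WIDTH = 6


def to_alpha_id(sample_id: int) -> str:
    """Encode an integer id as a zero-padded 6-character base-36 string."""
    if not 0 <= sample_id < len(ALPHABET) ** WIDTH:
        raise ValueError("id does not fit in 6 characters")
    chars = []
    for _ in range(WIDTH):
        sample_id, remainder = divmod(sample_id, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def reference(serie: str, sample_id: int) -> str:
    """Qualify the alpha id with the serie name (hypothetical format)."""
    return f"{serie}/{to_alpha_id(sample_id)}"
```

With this encoding, six characters cover ids from 0 up to 36**6 - 1 (about 2.18 billion), and two samples with the same integer id in different series still get distinct references.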


Added subscriber: @fsiddi

Added subscriber: @SemMulder

> more generic data structure

I think a better approach would be to create different specific (i.e. non-generic) data structures for different kinds of samples. I believe this to be a better option because statically known data structures are easier to reason about than dynamic ones (especially if combined with type checking), resulting in more robust code.
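
A minimal sketch of what this could look like, assuming one frozen dataclass per kind of sample and a parser that dispatches on the serie name. The class names and their fields are hypothetical; the actual contents of each kind of sample are not defined in this issue.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class BenchmarkSample:
    # Hypothetical fields for a benchmark result.
    device_name: str
    render_time_seconds: float


@dataclass(frozen=True)
class DownloadSample:
    # Hypothetical fields for a blender_org download record.
    blender_version: str
    platform: str


Sample = Union[BenchmarkSample, DownloadSample]


def parse_sample(serie: str, raw: dict[str, Any]) -> Sample:
    """Turn submitted JSON into the specific structure for its serie.

    A missing key raises KeyError and an unknown serie raises ValueError,
    so malformed input is rejected at the boundary instead of later on.
    """
    if serie == "benchmark":
        return BenchmarkSample(
            device_name=str(raw["device_name"]),
            render_time_seconds=float(raw["render_time_seconds"]),
        )
    if serie == "blender_org downloads":
        return DownloadSample(
            blender_version=str(raw["blender_version"]),
            platform=str(raw["platform"]),
        )
    raise ValueError(f"unknown serie: {serie!r}")
```

Code that consumes a `BenchmarkSample` can then rely on `render_time_seconds` being a float, and a type checker can flag access to fields that do not exist on a given kind of sample.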

Reference: infrastructure/blender-open-data#56304