GHash Improvements #43766

New Issue

Bastien Montagne · 2015-02-21T15:21:03+01:00

Bastien Montagne commented

2015-02-21 15:21:03 +01:00

This task is related to work done on our GHash in temp-ghash-experiments branch.

Current Branch vs Master

This is a simple comparison between new code in branch and master one, using as close a possible same tests. Both are done using modulo, and no pre-reservation of buckets. Master uses load 3.0, new branch, load 0.75. New branch also adds hash stored in entries.

	Master		Branch
	Insert ms	Lookup ms	Insert ms	Lookup ms
--------------------------------	---------	---------	---------	---------
Strings (1M words, 122M file)	8.287163	5.990606	2.646282	2.613921
100M first integers	18.988747	23.433619	18.983423	12.533957
50M random integers	8.690880	10.420476	8.778986	5.496768
50M random integers (no hash)	8.495241	9.679239	8.603362	5.333819
20M random 4D integers vectors	6.970162	5.530896	3.979335	2.019020

Notes:

Switching from a load 3.0 to 0.75 implies between one and two additional resizes in branch code, which means using pre-reservation gives even better speedup.
integer hashing uses 'uint in pointer' (i.e. value of the void pointer itself). We can see how this is efficient (no dereference at all), storing hashes in entries gives nearly no gain in this case during insert - even the super-quick 'modulo only' option (no hash in table above) gives nearly no speedup.
Storing hashes helps a lot with expansive hashing (strings…), and in all cases during lookup (because it avoids having to call cmpfp on all entries sharing a same bucket until we find the good one).

Reducing Load

Here are the results of some tests, ran using this corpus (1 million of words, about 122MB of text), from corpora.informatik.uni-leipzig.de.

Inserted keys were: the whole text, all its lines, and all its words, leading to 2 917 605 entries.

Pre-reservation of buckets	Hash	LoadT	Type	Insert ms	Lookup ms	Quality	Variance	Load %	Empty %	Overloaded %
Strings (1M words, 122M file)	GH	0.75	Mask	2.556978	3.036947	1.002262	0.349653	34.78	70.68	4.84 (7)
Strings (1M words, 122M file)	GH	0.75	Mod	2.741920	3.156516	1.002311	0.349693	34.78	70.68	4.85 (7)
Strings (1M words, 122M file)	GH	3	Mask	3.567847	4.707220	1.002759	1.404238	139.12	25.03	5.36 (10)
Strings (1M words, 122M file)	GH	3	Mod	4.116788	5.141498	1.001088	1.396343	139.12	24.94	5.29 (11)
Strings (1M words, 122M file)	MM2	0.75	Mask	2.836161	3.226247	1.000052	0.347848	34.78	70.63	4.82 (6)
Strings (1M words, 122M file)	MM2	0.75	Mod	3.083521	3.520704	0.999960	0.347772	34.78	70.62	4.81 (7)
Strings (1M words, 122M file)	MM2	3	Mask	3.807093	4.904332	0.999995	1.391200	139.12	24.89	5.27 (11)
Strings (1M words, 122M file)	MM2	3	Mod	4.198665	5.395439	0.999922	1.390842	139.12	24.89	2.28 (9)
No pre-reservation of buckets	Hash	LoadT	Type	Insert ms	Lookup ms	Quality	Variance	Load %	Empty %	Overloaded %
--------------------------------	----	-----	----	------	------	-------	--------	------	-------	------------
Strings (1M words, 122M file)	GH	0.75	Mask	6.177619	3.579811	1.001687	0.698775	69.56	49.95	15.45 (8)
Strings (1M words, 122M file)	GH	0.75	Mod	6.670569	4.010712	1.002150	0.699641	69.56	49.96	15.43 (8)
Strings (1M words, 122M file)	GH	3	Mask	7.459112	5.181792	1.002515	2.815917	278.24	6.29	30.41 (13)
Strings (1M words, 122M file)	GH	3	Mod	7.634238	5.366527	1.000709	2.791862	278.24	6.23	30.45 (14)
Strings (1M words, 122M file)	MM2	0.75	Mask	5.133478	3.935534	1.000095	0.695790	69.56	49.88	15.43 (9)
Strings (1M words, 122M file)	MM2	0.75	Mod	5.434215	4.090286	0.999919	0.695456	69.56	49.88	15.44 (8)
Strings (1M words, 122M file)	MM2	3	Mask	6.287116	5.402290	1.000289	2.786288	278.24	6.20	30.43 (15)
Strings (1M words, 122M file)	MM2	3	Mod	6.577213	5.520325	0.999844	2.780355	278.24	6.17	30.47 (16)

Notes:

GH stands for standard GHash hashing, MM2 for Murmur2 method
LoadT is the 'load threshold', i.e. the maximum average amount of entries per bucket.
Overloaded % is the percentage of buckets holding more entries than max(1, LoadT) (i.e. two or more for 0.75, four or more for 3).
Did same tests using first 100M integers, and 10M random vectors of four integers, as keys. Results are more or less similar as with strings.

From that we can conclude (at least for now) that:

Our current load threshold (3) is really an issue! We are talking of several tens of percent slower than with a threshold of 0.75, in insertion and lookup - memory saving seems really not worth it here!
Murmur2 hashing is slightly better than GHash one, but tradeoff in performances is not worth it usually (only exception would be big-enough strings, where it becomes better than our char-looping current hashing).
Modulo is even more slightly better than masking, but again, noticeably slower.

Also, we can see how important it is to pre-allocate (reserve) GHash buckets whenever possible. Resizing buckets is a very expansive operation. We can also see how interesting it can be to over-reserve buckets for better lookup times (see differences in lookup times between different % of load).

Storing Hashes

And now, here are some results with hashes stored in entries:

Pre-reservation of buckets	Hash	LoadT	Type	Insert ms	Lookup ms	Quality	Variance	Load %	Empty %	Overloaded %
Strings (1M words, 122M file)	GH	0.75	Mask	2.182862	2.383782	1.002262	0.349653	34.78	70.68	4.84 (7)
Strings (1M words, 122M file)	GH	0.75	Mod	2.438983	2.601138	1.002311	0.349693	34.78	70.68	4.85 (7)
Strings (1M words, 122M file)	MM2	0.75	Mask	2.439041	2.674447	1.000052	0.347848	34.78	70.63	4.82 (6)
Strings (1M words, 122M file)	MM2	0.75	Mod	2.624448	2.830084	0.999960	0.347772	34.78	70.62	4.81 (7)
No pre-reservation of buckets	Hash	LoadT	Type	Insert ms	Lookup ms	Quality	Variance	Load %	Empty %	Overloaded %
--------------------------------	----	-----	----	------	------	-------	--------	------	-------	------------
Strings (1M words, 122M file)	GH	0.75	Mask	2.475735	2.459534	1.001687	0.698775	69.56	49.95	15.45 (8)
Strings (1M words, 122M file)	GH	0.75	Mod	2.743342	2.621367	1.002150	0.699641	69.56	49.96	15.43 (8)
Strings (1M words, 122M file)	MM2	0.75	Mask	2.825784	2.770305	1.000095	0.695790	69.56	49.88	15.43 (9)
Strings (1M words, 122M file)	MM2	0.75	Mod	3.047072a	2.975727	0.999919	0.695456	69.56	49.88	15.44 (8)

Benefits are obvious, non-reserved times are now quite close to reserve times even!

When Using Non-hashing Hash Functions

Yes, we do that in Blender - quit a bit actually!

Pre-reservation of buckets	Hash	LoadT	Type	Insert ms	Lookup ms	Quality	Variance	Load %	Empty %	Overloaded %
20M pointers from an array	None	0.75	Mask	0.870727	0.500824	4.068809	5.344611	59.60	93.75	6.25 (10)
20M pointers from an array	None	0.75	Mod	0.840899	0.320517	0.770402	0.240775	59.60	40.40	0.00 (1)

Here we store 20M entries using raw pointers addresses as keys (from an array, so keys are consecutive addresses). No surprise here, a single modulo division shines, because (due to load < 1) it can perfectly spread all entries, while masking (since addresses are not small values) fails quite heavily. Note that even in this hyper-worst-case scenario, performances of masking are not that catastrophic.

In any other tested cases (consecutive ints starting from zero, and random ints), masking and modulo get very similar results.

So I think either we decide to stick to modulo, accepting the (more or less) 10% loss of speed, or we decide to switch to masking, which involves a careful check of all cases where we are currently using raw integers as hashes (e.g. ghashutil_bmelem_indexhash(), imagecache_hashhash(), …). First solution looks simpler for now.

About memory

Here are some (rough) estimations for 100M of entries:

Memory usage for 100M entries	GHash			GSet
Memory (in MB)	Buckets	Entries	Total	Buckets	Entries	Total
-------------------------------	-------	-------	-----	-------	-------	-----
Current Master (load: 3)	268	2400	2668	268	1600	1868
Load: 0.75	1074	2400	3474	1074	1600	2674
Load: 0.75, storing hashes	1074	3200	4274	1074	2400	3474

In other words, all proposed changes approximately increase the memory footprint of GHash of 60%, and of GSet of 86%.
But our typical usage in Blender nearly never involves more than a few millions entries (usually much less actually), so I would not consider that an issue?

This task is related to work done on our GHash in [temp-ghash-experiments](https://developer.blender.org/diffusion/B/browse/temp-ghash-experiments/) branch. Current Branch vs Master **** This is a simple comparison between new code in branch and master one, using as close a possible same tests. Both are done using modulo, and no pre-reservation of buckets. Master uses load 3.0, new branch, load 0.75. New branch also adds hash stored in entries. | | Master | | Branch | | -------------------------------- | --------- | --------- | --------- | --------- | | Insert ms | Lookup ms | Insert ms | Lookup ms | -------------------------------- | --------- | --------- | --------- | --------- | Strings (1M words, 122M file) | 8.287163 | 5.990606 | 2.646282 | 2.613921 | 100M first integers | 18.988747 | 23.433619 | 18.983423 | 12.533957 | 50M random integers | 8.690880 | 10.420476 | 8.778986 | 5.496768 | 50M random integers (no hash) | 8.495241 | 9.679239 | 8.603362 | 5.333819 | 20M random 4D integers vectors | 6.970162 | 5.530896 | 3.979335 | 2.019020 Notes: * Switching from a load 3.0 to 0.75 implies between one and two additional resizes in branch code, which means using pre-reservation gives even better speedup. * integer hashing uses 'uint in pointer' (i.e. value of the void pointer itself). We can see how this is efficient (no dereference at all), storing hashes in entries gives nearly no gain in this case during insert - even the super-quick 'modulo only' option (`no hash` in table above) gives nearly no speedup. * Storing hashes helps a lot with expansive hashing (strings…), and in all cases during lookup (because it avoids having to call cmpfp on all entries sharing a same bucket until we find the good one). Reducing Load **** Here are the results of some tests, ran using [this corpus](http:*corpora.uni-leipzig.de/downloads/eng_wikipedia_2010_1M-text.tar.gz) (1 million of words, about 122MB of text), from [corpora.informatik.uni-leipzig.de](http:*corpora.informatik.uni-leipzig.de/download.html). Inserted keys were: the whole text, all its lines, and all its words, leading to 2 917 605 entries. | Pre-reservation of buckets | Hash | LoadT | Type | Insert ms| Lookup ms | Quality | Variance | Load % | Empty % | Overloaded % | -------------------------------- | ---- | ----- | ---- | ------ | ------ | ------- | -------- | ------ | ------- | ------------ | Strings (1M words, 122M file) | GH | 0.75 | Mask | 2.556978 | 3.036947 | 1.002262 | 0.349653 | 34.78 | 70.68 | 4.84 (7) | Strings (1M words, 122M file) | GH | 0.75 | Mod | 2.741920 | 3.156516 | 1.002311 | 0.349693 | 34.78 | 70.68 | 4.85 (7) | Strings (1M words, 122M file) | GH | 3 | Mask | 3.567847 | 4.707220 | 1.002759 | 1.404238 | 139.12 | 25.03 | 5.36 (10) | Strings (1M words, 122M file) | GH | 3 | Mod | 4.116788 | 5.141498 | 1.001088 | 1.396343 | 139.12 | 24.94 | 5.29 (11) | Strings (1M words, 122M file) | MM2 | 0.75 | Mask | 2.836161 | 3.226247 | 1.000052 | 0.347848 | 34.78 | 70.63 | 4.82 (6) | Strings (1M words, 122M file) | MM2 | 0.75 | Mod | 3.083521 | 3.520704 | 0.999960 | 0.347772 | 34.78 | 70.62 | 4.81 (7) | Strings (1M words, 122M file) | MM2 | 3 | Mask | 3.807093 | 4.904332 | 0.999995 | 1.391200 | 139.12 | 24.89 | 5.27 (11) | Strings (1M words, 122M file) | MM2 | 3 | Mod | 4.198665 | 5.395439 | 0.999922 | 1.390842 | 139.12 | 24.89 | 2.28 (9) | No pre-reservation of buckets | Hash | LoadT | Type | Insert ms| Lookup ms | Quality | Variance | Load % | Empty % | Overloaded % | -------------------------------- | ---- | ----- | ---- | ------ | ------ | ------- | -------- | ------ | ------- | ------------ | Strings (1M words, 122M file) | GH | 0.75 | Mask | 6.177619 | 3.579811 | 1.001687 | 0.698775 | 69.56 | 49.95 | 15.45 (8) | Strings (1M words, 122M file) | GH | 0.75 | Mod | 6.670569 | 4.010712 | 1.002150 | 0.699641 | 69.56 | 49.96 | 15.43 (8) | Strings (1M words, 122M file) | GH | 3 | Mask | 7.459112 | 5.181792 | 1.002515 | 2.815917 | 278.24 | 6.29 | 30.41 (13) | Strings (1M words, 122M file) | GH | 3 | Mod | 7.634238 | 5.366527 | 1.000709 | 2.791862 | 278.24 | 6.23 | 30.45 (14) | Strings (1M words, 122M file) | MM2 | 0.75 | Mask | 5.133478 | 3.935534 | 1.000095 | 0.695790 | 69.56 | 49.88 | 15.43 (9) | Strings (1M words, 122M file) | MM2 | 0.75 | Mod | 5.434215 | 4.090286 | 0.999919 | 0.695456 | 69.56 | 49.88 | 15.44 (8) | Strings (1M words, 122M file) | MM2 | 3 | Mask | 6.287116 | 5.402290 | 1.000289 | 2.786288 | 278.24 | 6.20 | 30.43 (15) | Strings (1M words, 122M file) | MM2 | 3 | Mod | 6.577213 | 5.520325 | 0.999844 | 2.780355 | 278.24 | 6.17 | 30.47 (16) Notes: * *GH* stands for standard GHash hashing, *MM2* for Murmur2 method * *LoadT* is the 'load threshold', i.e. the maximum average amount of entries per bucket. * *Overloaded %* is the percentage of buckets holding more entries than max(1, LoadT) (i.e. two or more for 0.75, four or more for 3). * Did same tests using first 100M integers, and 10M random vectors of four integers, as keys. Results are more or less similar as with strings. From that we can conclude (at least for now) that: * Our current load threshold (3) is really an issue! We are talking of several tens of percent slower than with a threshold of 0.75, in insertion **and** lookup - memory saving seems really not worth it here! * Murmur2 hashing is *slightly* better than GHash one, but tradeoff in performances is not worth it usually (only exception would be big-enough strings, where it becomes better than our char-looping current hashing). * Modulo is even more slightly better than masking, but again, noticeably slower. Also, we can see how important it is to pre-allocate (reserve) GHash buckets whenever possible. Resizing buckets is a very expansive operation. We can also see how interesting it can be to over-reserve buckets for better lookup times (see differences in lookup times between different % of load). ---------- Storing Hashes **** And now, here are some results with hashes stored in entries: | Pre-reservation of buckets | Hash | LoadT | Type | Insert ms| Lookup ms | Quality | Variance | Load % | Empty % | Overloaded % | -------------------------------- | ---- | ----- | ---- | ------ | ------ | ------- | -------- | ------ | ------- | ------------ | Strings (1M words, 122M file) | GH | 0.75 | Mask | 2.182862 | 2.383782 | 1.002262 | 0.349653 | 34.78 | 70.68 | 4.84 (7) | Strings (1M words, 122M file) | GH | 0.75 | Mod | 2.438983 | 2.601138 | 1.002311 | 0.349693 | 34.78 | 70.68 | 4.85 (7) | Strings (1M words, 122M file) | MM2 | 0.75 | Mask | 2.439041 | 2.674447 | 1.000052 | 0.347848 | 34.78 | 70.63 | 4.82 (6) | Strings (1M words, 122M file) | MM2 | 0.75 | Mod | 2.624448 | 2.830084 | 0.999960 | 0.347772 | 34.78 | 70.62 | 4.81 (7) | No pre-reservation of buckets | Hash | LoadT | Type | Insert ms| Lookup ms | Quality | Variance | Load % | Empty % | Overloaded % | -------------------------------- | ---- | ----- | ---- | ------ | ------ | ------- | -------- | ------ | ------- | ------------ | Strings (1M words, 122M file) | GH | 0.75 | Mask | 2.475735 | 2.459534 | 1.001687 | 0.698775 | 69.56 | 49.95 | 15.45 (8) | Strings (1M words, 122M file) | GH | 0.75 | Mod | 2.743342 | 2.621367 | 1.002150 | 0.699641 | 69.56 | 49.96 | 15.43 (8) | Strings (1M words, 122M file) | MM2 | 0.75 | Mask | 2.825784 | 2.770305 | 1.000095 | 0.695790 | 69.56 | 49.88 | 15.43 (9) | Strings (1M words, 122M file) | MM2 | 0.75 | Mod | 3.047072a | 2.975727 | 0.999919 | 0.695456 | 69.56 | 49.88 | 15.44 (8) Benefits are obvious, non-reserved times are now quite close to reserve times even! ---------- When Using Non-hashing Hash Functions **** Yes, we do that in Blender - quit a bit actually! | Pre-reservation of buckets | Hash | LoadT | Type | Insert ms| Lookup ms | Quality | Variance | Load % | Empty % | Overloaded % | -------------------------------- | ---- | ----- | ---- | ------ | ------ | ------- | -------- | ------ | ------- | ------------ | 20M pointers from an array | None | 0.75 | Mask | 0.870727 | 0.500824 | 4.068809 | 5.344611 | 59.60 | 93.75 | 6.25 (10) | 20M pointers from an array | None | 0.75 | Mod | 0.840899 | 0.320517 | 0.770402 | 0.240775 | 59.60 | 40.40 | 0.00 (1) Here we store 20M entries using raw pointers addresses as keys (from an array, so keys are consecutive addresses). No surprise here, a single modulo division shines, because (due to load < 1) it can perfectly spread all entries, while masking (since addresses are not small values) fails quite heavily. Note that even in this hyper-worst-case scenario, performances of masking are not that catastrophic. In any other tested cases (consecutive ints starting from zero, and random ints), masking and modulo get very similar results. So I think either we decide to stick to modulo, accepting the (more or less) 10% loss of speed, or we decide to switch to masking, which involves a careful check of all cases where we are currently using raw integers as hashes (e.g. `ghashutil_bmelem_indexhash()`, `imagecache_hashhash()`, …). First solution looks simpler for now. ---------- About memory **** Here are some (rough) estimations for 100M of entries: | Memory usage for 100M entries | GHash | | |GSet | | | ------------------------------- | ------- | ------- | ----- | ------- | ------- | ----- | Memory (in MB) | Buckets | Entries | Total | Buckets | Entries | Total | ------------------------------- | ------- | ------- | ----- | ------- | ------- | ----- | Current Master (load: 3) | 268 | 2400 | 2668 | 268 | 1600 | 1868 | Load: 0.75 | 1074 | 2400 | 3474 | 1074 | 1600 | 2674 | Load: 0.75, storing hashes | 1074 | 3200 | 4274 | 1074 | 2400 | 3474 In other words, all proposed changes approximately increase the memory footprint of GHash of 60%, and of GSet of 86%. But our typical usage in Blender nearly never involves more than a few millions entries (usually much less actually), so I would not consider that an issue?

Bastien Montagne commented

2015-02-21 15:21:03 +01:00

Changed status to: 'Open'

Bastien Montagne commented

2015-02-21 15:21:03 +01:00

Added subscribers: @mont29, @ideasman42

Bastien Montagne commented

2015-02-21 15:34:30 +01:00

Next things to check:

Storing hash in Entry struct (could help a bit during resizes).
Try to be more drastic when increasing our buckets size, maybe incrementing by two (i.e. ~four times more buckets) instead of one (i.e. ~two times more buckets), at least for smaller sizes (maybe up to 2^16).

Next things to check: * Storing hash in Entry struct (could help a bit during resizes). * Try to be more drastic when increasing our buckets size, maybe incrementing by two (i.e. ~four times more buckets) instead of one (i.e. ~two times more buckets), at least for smaller sizes (maybe up to 2^16).

Bastien Montagne commented

2015-02-22 11:16:18 +01:00

Added hashes to entries, this gives another nice boost in resizing, but also lookup (due to the fact comparing the hash avoids us many comparison on actual key value, which can be expansive e.g. with strings, I guess).

Also tried to increment sizes by two (instead of one) for small ones (below 16), but this did not showed any benefit when inserting into a non-reserved ghash, times were actually even worse (not sure why).

Added hashes to entries, this gives another nice boost in resizing, but also lookup (due to the fact comparing the hash avoids us many comparison on actual key value, which can be expansive e.g. with strings, I guess). Also tried to increment sizes by two (instead of one) for small ones (below 16), but this did not showed any benefit when inserting into a non-reserved ghash, times were actually even worse (not sure why).

Bastien Montagne self-assigned this 2015-04-29 16:24:17 +02:00

Bastien Montagne commented

2019-03-14 10:10:23 +01:00

Changed status from 'Open' to: 'Archived'

Bastien Montagne closed this issue

2019-03-14 10:10:23 +01:00

Sign in to join this conversation.

No Label

Download

What's New

Blender Studio

Manual

Developers Blog

Documentation

Benchmark

Blender Conference

Development Fund

One-time Donations

GHash Improvements #43766