A few days ago my colleague Yves mentioned a perplexing finding to me: it was significantly faster to call find on a std::unordered_map and then insert an item only if find indicated it was not present, than to just attempt to insert the item directly with an insert call.

This is strange, because according to the semantics of std::unordered_map::insert, if an item with a matching key is already in the map, the existing item is not updated, and insert returns a std::pair<iterator, bool> whose bool member is false to indicate that nothing was inserted.
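For reference, here is a small sketch of those semantics. The function name is made up for illustration; it simply inserts the same key twice and reads the value back, showing that the second insert reports failure and leaves the stored value alone.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>

// Hypothetical helper: inserts key 1 twice with different values and
// returns whatever value ends up stored for key 1.
int insert_twice_and_read_back()
{
    std::unordered_map<int, int> m;

    // Key 1 is not present yet, so this inserts and .second is true.
    auto first = m.insert(std::make_pair(1, 10));
    assert(first.second);

    // Key 1 is already present: nothing is inserted, .second is false,
    // and .first points at the existing element, whose value is unchanged.
    auto second = m.insert(std::make_pair(1, 20));
    assert(!second.second);
    assert(second.first->second == 10);

    return m.at(1); // still 10, not 20
}
```

Because insert already reports whether the key was present, a separate find beforehand is redundant as far as semantics are concerned.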

Calling find and then insert effectively does the lookup twice when the item is not already in the map (and once when it is).

Calling insert directly should only be doing the lookup once, and hence should be of comparable speed or faster.
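One rough way to see the "two lookups versus one" argument is to count calls to the hash function, since each lookup has to hash the key. The sketch below uses a hypothetical CountingHash hasher and helper functions that are not part of the original tests. Exact counts are an implementation detail that the standard does not specify, so treat the numbers as illustrative; the expectation is simply that find-then-insert on a new key hashes more often than a plain insert does.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical instrumentation: counts how often the hasher is invoked,
// used here as a rough proxy for the number of lookups.
static std::size_t g_hash_calls = 0;

struct CountingHash
{
    std::size_t operator()(int key) const
    {
        ++g_hash_calls;
        return std::hash<int>()(key);
    }
};

using CountedMap = std::unordered_map<int, int, CountingHash>;

// Builds a non-empty map; reserve() avoids a rehash during the
// measurements below, which could otherwise add extra hash calls.
static CountedMap make_prefilled_map()
{
    CountedMap m;
    m.reserve(1024);
    for (int k = 0; k < 100; ++k)
        m.insert(std::make_pair(k, k));
    return m;
}

// Hash calls made by find-then-insert for a key that is not yet present.
std::size_t hash_calls_find_then_insert(int new_key)
{
    CountedMap m = make_prefilled_map();
    g_hash_calls = 0;
    if (m.find(new_key) == m.end())
        m.insert(std::make_pair(new_key, new_key));
    return g_hash_calls;
}

// Hash calls made by a plain insert for a key that is not yet present.
std::size_t hash_calls_insert_only(int new_key)
{
    CountedMap m = make_prefilled_map();
    g_hash_calls = 0;
    m.insert(std::make_pair(new_key, new_key));
    return g_hash_calls;
}
```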

Here is some code demonstrating the problem:

const int N = 1000000; // Num insertions to attempt
const int num_unique_keys = 65536;

std::unordered_map<int, int> m;
for(int i=0; i<N; ++i)
{
	const int x = i % num_unique_keys;
	if(m.find(x) == m.end()) // If no such entry with given key:
		m.insert(std::make_pair(x, x));
}
is significantly faster than just calling insert:
std::unordered_map<int, int> m;
for(int i=0; i<N; ++i)
{
	const int x = i % num_unique_keys;
	m.insert(std::make_pair(x, x));
}
Both code snippets result in a map with the same entries. Performance results for Visual Studio 2012, Release build:
std::unordered_map find then insert: 0.013670 s, final size: 65536
std::unordered_map just insert:      0.099995 s, final size: 65536
std::map find then insert:           0.061898 s, final size: 65536
std::map just insert:                0.131953 s, final size: 65536

Note that for std::unordered_map, 'just inserting' is about seven times slower than calling find then insert (0.099995 s vs 0.013670 s), when it should be of comparable speed or faster.

std::map also appears to suffer from this problem, although to a lesser degree: 'just inserting' is roughly twice as slow as find then insert.

Looking into this, it quickly became apparent that insert is slow, and in particular that it is slow even when it inserts nothing because the key is already in the map.

Stepping through the code in the debugger reveals what seems to be the underlying problem: insert unconditionally allocates memory for a new node (apparently for the bucket's linked list), and then deallocates the node if it was not actually inserted! Heap allocations in C++ are comparatively slow and should be avoided unless needed.
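One way to check this without a debugger is to count allocations with a custom allocator. The sketch below is an assumption-laden illustration: CountingAllocator and allocations_for_duplicate_pass are hypothetical helpers, not from the original test code. It fills a map, then makes a second pass in which every key is already present. The find-then-insert pass should report zero allocations on any implementation, since find never allocates. What the just-insert pass reports depends on the standard library; going by the debugger session above, one would expect the VS2012 library to allocate once per call, while other libraries may report zero.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <unordered_map>
#include <utility>

// Hypothetical instrumentation: counts every call to allocate().
static std::size_t g_allocations = 0;

template <typename T>
struct CountingAllocator
{
    using value_type = T;

    CountingAllocator() = default;
    // Converting constructor so the container can rebind to its node type.
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n)
    {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

using CountedMap = std::unordered_map<int, int, std::hash<int>, std::equal_to<int>,
                                      CountingAllocator<std::pair<const int, int>>>;

// Returns the number of allocations made by a pass over keys that are all
// already present, using either find-then-insert or a plain insert.
std::size_t allocations_for_duplicate_pass(bool use_find_first)
{
    CountedMap m;
    for (int k = 0; k < 1000; ++k)
        m.insert(std::make_pair(k, k)); // first pass: every key is new

    g_allocations = 0;
    for (int k = 0; k < 1000; ++k)      // second pass: every key is a duplicate
    {
        if (use_find_first)
        {
            if (m.find(k) == m.end())
                m.insert(std::make_pair(k, k));
        }
        else
        {
            m.insert(std::make_pair(k, k));
        }
    }
    assert(m.size() == 1000);
    return g_allocations;
}
```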

This seems to be what is responsible for the insert code being roughly seven times slower than it should be.

The obvious solution is to allocate memory for the new node only once it has been determined that it is needed, i.e. only when no item with a matching key is already in the map.
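That fix has to happen inside the library, but calling code can also express "insert only if absent" as a single call. A minimal sketch, assuming a C++17 compiler: std::unordered_map::try_emplace has no effect when the key already exists, and otherwise constructs the value from the remaining arguments. The standard only says the call has no effect in that case; whether a given implementation avoids allocating a node first is worth measuring on the library you care about. The function name build_with_try_emplace is hypothetical.

```cpp
#include <cassert>
#include <unordered_map>

// Same loop as the post's snippets, but using C++17 try_emplace.
std::unordered_map<int, int> build_with_try_emplace(int n, int num_unique_keys)
{
    std::unordered_map<int, int> m;
    for (int i = 0; i < n; ++i)
    {
        const int x = i % num_unique_keys;
        // Key x, mapped value x; a no-op if x is already present.
        const auto result = m.try_emplace(x, x);
        // In this loop a key is new exactly during the first num_unique_keys iterations.
        assert(result.second == (i < num_unique_keys));
        (void)result; // avoid an unused-variable warning when NDEBUG is defined
    }
    return m;
}
```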

(There is a related question of why std::unordered_map uses chaining at all, as opposed to open addressing with linear probing, but I will leave that for another blog post.)

The performance of std::map and std::unordered_map can be critical in certain algorithms, so an unnecessary slowdown of this size could be significantly slowing down a lot of code that uses std::unordered_map.

More results on other platforms

The problem is not fixed in VS2015:
std::unordered_map find then insert: 0.015878 s, final size: 65536
std::unordered_map just insert:      0.096205 s, final size: 65536
std::map find then insert:           0.060897 s, final size: 65536
std::map just insert:                0.128556 s, final size: 65536
Xcode on MacOS Sierra (Apple LLVM version 7.3.0 (clang-703.0.31)):
std::unordered_map find then insert: 0.020444 s, final size: 65536
std::unordered_map just insert:      0.069502 s, final size: 65536
std::map find then insert:           0.053847 s, final size: 65536
std::map just insert:                0.094894 s, final size: 65536
So the Clang standard library (libc++) on Sierra seems to suffer from the same problem, although the std::unordered_map slowdown is smaller, at roughly 3.4 times.

Clang 3.6, which I built on Linux (clang version 3.6.0 (trunk 225608)):

std::unordered_map find then insert: 0.013862 s, final size: 65536
std::unordered_map just insert:      0.015043 s, final size: 65536
std::map find then insert:           0.076331 s, final size: 65536
std::map just insert:                0.092790 s, final size: 65536
The results for this Clang are much more reasonable: for std::unordered_map, just inserting is less than 10% slower than find then insert.

I haven't tried GCC yet.

If you want to try it yourself, the source code for the tests is available here: map_insert_test.cpp. You will have to replace some code to get it to compile, though.
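As a rough, self-contained stand-in (this is not the original map_insert_test.cpp), here is a minimal harness sketch that times the same four variants with std::chrono::steady_clock and prints lines in the same format as the results above. The function names are hypothetical, and the timing here excludes destruction of the map, which may differ from how the original tests measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <map>
#include <unordered_map>
#include <utility>

// Times n insertion attempts over num_unique_keys distinct keys, using either
// find-then-insert or a plain insert. Works for std::map and std::unordered_map.
template <typename Map>
double time_insertions(bool find_first, int n, int num_unique_keys, std::size_t& final_size)
{
    const auto start = std::chrono::steady_clock::now();
    Map m;
    for (int i = 0; i < n; ++i)
    {
        const int x = i % num_unique_keys;
        if (find_first)
        {
            if (m.find(x) == m.end()) // insert only if the key is absent
                m.insert(std::make_pair(x, x));
        }
        else
        {
            m.insert(std::make_pair(x, x)); // no-op if the key is present
        }
    }
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    final_size = m.size();
    return elapsed.count(); // seconds; the map is destroyed after this point
}

// Prints one line per variant, mirroring the layout of the results in the post.
void print_results(int n, int num_unique_keys)
{
    std::size_t size = 0;
    double t = time_insertions<std::unordered_map<int, int>>(true, n, num_unique_keys, size);
    std::printf("std::unordered_map find then insert: %f s, final size: %zu\n", t, size);
    t = time_insertions<std::unordered_map<int, int>>(false, n, num_unique_keys, size);
    std::printf("std::unordered_map just insert:      %f s, final size: %zu\n", t, size);
    t = time_insertions<std::map<int, int>>(true, n, num_unique_keys, size);
    std::printf("std::map find then insert:           %f s, final size: %zu\n", t, size);
    t = time_insertions<std::map<int, int>>(false, n, num_unique_keys, size);
    std::printf("std::map just insert:                %f s, final size: %zu\n", t, size);
}
```

Calling print_results(1000000, 65536) from main matches the N and num_unique_keys used in the post; build with optimizations enabled, as the numbers above come from Release builds.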

Bug report filed with MS here.

Edit: Interesting discussion on reddit here and in the c++ subreddit here.