Unordered Associative

Insertion Hints

Here is how the hinting works in the libstdc++ implementation of unordered containers, and the rationale behind this behavior.

In the following text, the phrase equivalent to refer to the result of the invocation of the equal predicate imposed on the container by its key_equal object, which defaults to (basically) ==.

Unordered containers can be seen as a std::vector of std::forward_list. The std::vector represents the buckets and each std::forward_list is the list of nodes belonging to the same bucket. When inserting an element in such a data structure we first need to compute the element hash code to find the bucket to insert the element to, the second step depends on the uniqueness of elements in the container.

In the case of std::unordered_set and std::unordered_map you need to look through all bucket's elements for an equivalent one. If there is none the insertion can be achieved, otherwise the insertion fails. As we always need to loop though all bucket's elements, the hint doesn't tell us if the element is already present, and we don't have any constraint on where the new element is to be inserted, the hint won't be of any help and will then be ignored.

In the case of std::unordered_multiset and std::unordered_multimap equivalent elements must be linked together so that the equal_range(const key_type&) can return the range of iterators pointing to all equivalent elements. This is where hinting can be used to point to another equivalent element already part of the container and so skip all non equivalent elements of the bucket. So to be useful the hint shall point to an element equivalent to the one being inserted. The new element will be then inserted right after the hint. Note that because of an implementation detail inserting after a node can require updating the bucket of the following node. To check if the next bucket is to be modified we need to compute the following node's hash code. So if you want your hint to be really efficient it should be followed by another equivalent element, the implementation will detect this equivalence and won't compute next element hash code.

It is highly advised to start using unordered containers hints only if you have a benchmark that will demonstrate the benefit of it. If you don't then do not use hints, it might do more harm than good.

Hash Code

Hash Code Caching Policy

The unordered containers in libstdc++ may cache the hash code for each element alongside the element itself. In some cases not recalculating the hash code every time it's needed can improve performance, but the additional memory overhead can also reduce performance, so whether an unordered associative container caches the hash code or not depends on a number of factors. The caching policy for GCC 4.8 is described below.

The C++ standard requires that erase and swap operations must not throw exceptions. Those operations might need an element's hash code, but cannot use the hash function if it could throw. This means the hash codes will be cached unless the hash function has a non-throwing exception specification such as noexcept or throw().

Secondly, libstdc++ also needs the hash code in the implementation of local_iterator and const_local_iterator in order to know when the iterator has reached the end of the bucket. This means that the local iterator types will embed a copy of the hash function when possible. Because the local iterator types must be DefaultConstructible and CopyAssignable, if the hash function type does not model those concepts then it cannot be embedded and so the hash code must be cached. Note that a hash function might not be safe to use when default-constructed (e.g if it a function pointer) so a hash function that is contained in a local iterator won't be used until the iterator is valid, so the hash function has been copied from a correctly-initialized object.

If the hash function is non-throwing, DefaultConstructible and CopyAssignable then libstdc++ doesn't need to cache the hash code for correctness, but might still do so for performance if computing a hash code is an expensive operation, as it may be for arbitrarily long strings. As an extension libstdc++ provides a trait type to describe whether a hash function is fast. By default hash functions are assumed to be fast unless the trait is specialized for the hash function and the trait's value is false, in which case the hash code will always be cached. The trait can be specialized for user-defined hash functions like so:

      #include <unordered_set>

      struct hasher
      {
        std::size_t operator()(int val) const noexcept
        {
          // Some very slow computation of a hash code from an int !
          ...
        }
      }

      namespace std
      {
        template<>
          struct __is_fast_hash<hasher> : std::false_type
          { };
      }