py-hamt

Overview

This library is a Python implementation of the https://github.com/rvagg/iamap library, which is written in JavaScript. As mentioned in the py-hamt README, this Python version is adapted from rvagg's code; there are one-to-one mappings between the functions and classes in this repo and those in the JS one, so the JS code can serve as a canonical guide to the implementation and functionality. IAMap is the underlying algorithm behind https://github.com/rvagg/js-ipld-hashmap, the JavaScript interface that wraps IAMap. As a result, all HAMTs created with py-hamt are fully cross-compatible with js-ipld-hashmap: a HAMT created with the py-hamt Python library can be read by the JavaScript js-ipld-hashmap, which is exactly how dClimate reads data on its frontend and what the JS SDK does.

See https://ipld.io/specs/advanced-data-layouts/hamt/spec/ for information on the concept of a HAMT and how it fits into the IPLD/IPFS ecosystem.

Motivation

dClimate uses HAMTs as key/value stores that can be distributed across multiple nodes and used without loading the whole data structure into memory. This is extremely useful in the context of Zarrs, where the metadata mapping coordinates to the chunks containing the actual data can stretch into the tens or even hundreds of megabytes. Because IPFS imposes a limit on the size of blocks that can be transferred from peer to peer, it is not feasible to store all this metadata in a single IPFS object. Instead, a HAMT can provide efficient lookups in a data structure distributed across many IPFS objects, with only the parts of the HAMT needed for a lookup ever being accessed. Additionally, since much climate data work and tooling is in Python, creating this library was a natural solution.

See ipldstore for an example of this HAMT implementation in action, specifically here, where the HAMT is wrapped by the Python client. Another quick start, working in a REPL:

>>> import hashlib
>>> import ipldstore
>>> from py_hamt.hamt import Hamt, load

# Connect to a local IPFS node and load the HAMT's root block
>>> hamt_memory_store = ipldstore.hamt_wrapper.HamtMemoryStore("http://localhost:5001")
>>> hamt_memory_store.load("bafyreighgv6x6etoxqyyxw474q5arjvjwtqdxnk7vas43pobbifjqm2paq")

# Register sha2-256 (multicodec 0x12, 32-byte digest) as the hash function
>>> Hamt.register_hasher(0x12, 32, lambda x: hashlib.sha256(x).digest())
>>> hamt = load(hamt_memory_store, "bafyreighgv6x6etoxqyyxw474q5arjvjwtqdxnk7vas43pobbifjqm2paq")

# Both hamt.size() and list(hamt.keys()) pull the entire HAMT into memory and
# will take a long time over the IPFS network. HAMTs here are roughly 100-200 MB
# depending on the size of the Zarr they index, spread across many small IPLD
# nodes (around 80,000), which adds overhead compared to a straight download of
# a single 100 MB file.
# To see the HAMT in action, stream keys lazily instead:

>>> keys_generator = hamt.keys()
>>> print(next(keys_generator))

# OR

>>> for key in hamt.keys():
...     print(key)
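
# A single lookup only touches the IPLD nodes along that key's hash path, so it
# stays fast even over the network. The snippet below is a minimal sketch: the
# get method is assumed here from the one-to-one mapping with IAMap's get
# described above, and some_key is just an illustrative variable.

>>> some_key = next(hamt.keys())  # any key already stored in the HAMT
>>> value = hamt.get(some_key)    # assumed API: fetches only the nodes on this key's path
>>> print(value)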

Repo: https://github.com/dClimate/py-hamt