On-chain Streams

BeidouChain streams enable a blockchain to be used as a general-purpose append-only database, with the blockchain providing timestamping, notarization and immutability.

Streams provide a natural abstraction for blockchain use cases which focus on general data retrieval, timestamping and archiving, rather than the transfer of assets between participants. Streams can be used to implement three different types of databases on a chain:

  1. A key-value database or document store, in the style of NoSQL.
  2. A time series database, which focuses on the ordering of entries.
  3. An identity-driven database where entries are classified according to their author.

These can be considered as the ‘what’, ‘when’ and ‘who’ of a shared database.

 

Streams basics

Any number of streams can be created in a BeidouChain blockchain, and each stream acts as an independent append-only collection of items. Each item in a stream has the following characteristics:

  • One or more publishers who have digitally signed that item.
  • An optional key for convenient later retrieval.
  • Some data, which can range from a small piece of text to many megabytes of raw binary.
  • A timestamp, which is taken from the header of the block in which the item is confirmed.
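
As a rough illustration, the four characteristics above map onto a simple record. The sketch below is a hypothetical in-memory model in Python; the class name and field names are assumptions made for illustration and do not describe BeidouChain's actual data format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StreamItem:
    """Hypothetical model of one stream item (not BeidouChain's real format)."""
    publishers: Tuple[str, ...]      # addresses that signed the item (one or more)
    data: bytes                      # payload: short text up to megabytes of binary
    key: Optional[str] = None        # optional key for later retrieval
    blocktime: Optional[int] = None  # block header timestamp; None until confirmed

    def __post_init__(self):
        # Every item must carry at least one publisher signature.
        if not self.publishers:
            raise ValueError("an item needs at least one publisher")


item = StreamItem(publishers=("addr-alice",), data=b"hello", key="greeting")
```

The item is frozen to mirror the append-only nature of a stream: once written, an item is never modified in place.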

Behind the scenes, each item in a stream is represented by a blockchain transaction, but developers can read and write streams with no awareness of this underlying mechanism. (More advanced users can use raw transactions to write to multiple streams, issue or transfer assets and/or assign permissions in a single atomic transaction.)

Streams integrate with BeidouChain’s permissions system in a number of ways. First, streams can only be created by those who have permission to do so, in the same way that assets can only be issued by certain addresses. Second, each stream is designated as either open or closed when it is created. Open streams are writeable by anybody who has permission to send a blockchain transaction, while closed streams are restricted to a changeable list of permitted addresses. In the latter case, each stream has one or more administrators who can change those write permissions over time.
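
The open/closed rule can be sketched as a small permission check. This is a hypothetical model, not BeidouChain's implementation: the class, method names and addresses are invented for illustration, and it assumes that writing to a closed stream also requires permission to send transactions, since every item is a transaction.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Stream:
    """Hypothetical model of a stream's write permissions."""
    name: str
    is_open: bool
    admins: Set[str] = field(default_factory=set)
    writers: Set[str] = field(default_factory=set)

    def can_write(self, address: str, can_send: bool) -> bool:
        # Assumption: no address can write without send permission.
        if not can_send:
            return False
        # Open streams accept any sender; closed streams check the writer list.
        return self.is_open or address in self.writers

    def grant_write(self, admin: str, address: str) -> None:
        # Only a stream administrator may change the list of writers.
        if admin not in self.admins:
            raise PermissionError(f"{admin} is not an administrator of {self.name}")
        self.writers.add(address)

    def revoke_write(self, admin: str, address: str) -> None:
        if admin not in self.admins:
            raise PermissionError(f"{admin} is not an administrator of {self.name}")
        self.writers.discard(address)


closed = Stream("audit-log", is_open=False, admins={"addr-admin"})
closed.grant_write("addr-admin", "addr-bob")
```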

Each blockchain has an optional ‘root’ stream, which is defined in its parameters and exists from the moment the chain is created. This enables a blockchain to be used immediately for storing and retrieving data, without waiting for a stream to be explicitly created.

Confidentiality is the biggest challenge in a large number of blockchain use cases. This is because each node in a blockchain sees a full copy of the entire chain’s contents. Streams provide a natural way to support encrypted data on a blockchain, as follows:

  1. One stream is used by participants to distribute their public keys for any public-key cryptography scheme.
  2. A second stream is used to publish data, where each piece of data is encrypted using symmetric cryptography with a unique key.
  3. A third stream provides data access. For each participant who should see a piece of data, a stream entry is created which contains that data’s secret key, encrypted using that participant’s public key.

This provides an efficient way to archive data on a blockchain, while making it visible only to certain participants.
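
The three-stream scheme can be sketched end to end as follows. Everything here is an assumption made for illustration: the streams are plain Python lists, the function and key names are hypothetical, and the two cipher functions are toy stand-ins (textbook RSA over small known primes and a hash-based XOR keystream) that are not secure. A real deployment would use a vetted cryptography library in their place.

```python
import hashlib
import secrets


# Toy stand-ins for real cryptography. NOT secure; illustration only.
def toy_keypair(p, q, e=65537):
    """Textbook RSA from two primes: returns (public, private) key tuples."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, e), (n, d)


def toy_symmetric(key, data):
    """XOR data with a SHA-256 keystream; applying it twice restores the data."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Three hypothetical streams, modelled as lists of keyed items.
pubkey_stream, data_stream, access_stream = [], [], []


def find(stream, key):
    """Return the data of the first item with this key (StopIteration if absent)."""
    return next(item["data"] for item in stream if item["key"] == key)


def publish_encrypted(name, plaintext, readers):
    secret = secrets.token_bytes(16)  # unique symmetric key for this piece of data
    data_stream.append({"key": name, "data": toy_symmetric(secret, plaintext)})
    for reader in readers:
        n, e = find(pubkey_stream, reader)
        # Encrypt the symmetric key with the reader's public key.
        wrapped = pow(int.from_bytes(secret, "big"), e, n)
        access_stream.append({"key": f"{name}/{reader}", "data": wrapped})


def read_encrypted(name, reader, private_key):
    n, d = private_key
    wrapped = find(access_stream, f"{name}/{reader}")
    secret = pow(wrapped, d, n).to_bytes(16, "big")
    return toy_symmetric(secret, find(data_stream, name))


# Step 1: participants publish their public keys.
alice_pub, alice_priv = toy_keypair(2**127 - 1, 2**89 - 1)
bob_pub, bob_priv = toy_keypair(2**107 - 1, 2**61 - 1)
pubkey_stream.append({"key": "alice", "data": alice_pub})
pubkey_stream.append({"key": "bob", "data": bob_pub})

# Steps 2 and 3: publish data that only alice is given access to.
publish_encrypted("doc1", b"quarterly figures", readers=["alice"])
recovered = read_encrypted("doc1", "alice", alice_priv)
```

Every node still stores the ciphertext and the wrapped keys, but only a participant holding a matching private key can recover the symmetric key and therefore the data.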

 

Retrieving from streams

The core value of streams is in indexing and retrieval. Each node can choose which streams to subscribe to, with the blockchain guaranteeing that all nodes which subscribe to a particular stream will see the same items within. (A node can also be configured to automatically subscribe to every new stream created.)

If a node is subscribed to a stream, information can be retrieved from that stream in a number of ways:

  • Retrieving items from the stream in order.
  • Retrieving items with a particular key.
  • Retrieving items signed by a particular publisher.
  • Listing the keys used in a stream, with item counts for each key.
  • Listing the publishers in a stream, with item counts.
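
To make the five retrieval methods concrete, the sketch below runs them against an in-memory list. The item layout and function names are hypothetical and are not BeidouChain API calls; a real node would answer these queries from its own indexes.

```python
from collections import Counter

# Hypothetical stream contents, listed in stream order.
items = [
    {"key": "invoice-1", "publishers": ["alice"], "data": "draft"},
    {"key": "invoice-2", "publishers": ["bob"], "data": "draft"},
    {"key": "invoice-1", "publishers": ["alice", "bob"], "data": "final"},
]


def list_items(items):
    """All items, in order."""
    return list(items)


def items_by_key(items, key):
    """Items carrying a particular key."""
    return [i for i in items if i["key"] == key]


def items_by_publisher(items, publisher):
    """Items signed by a particular publisher."""
    return [i for i in items if publisher in i["publishers"]]


def list_keys(items):
    """Each key used in the stream, with its item count."""
    return dict(Counter(i["key"] for i in items))


def list_publishers(items):
    """Each publisher in the stream, with its item count."""
    return dict(Counter(p for i in items for p in i["publishers"]))
```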

As mentioned at the start, these methods of retrieval allow streams to be used for key-value databases, time series databases and identity-driven databases. All retrieval APIs offer start and count parameters, allowing subsections of long lists to be efficiently retrieved (like a LIMIT clause in SQL). Negative values for start allow the most recent items to be retrieved.
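
The start and count parameters can be pictured as a window over a list. The helper below is a minimal sketch with a hypothetical name; it assumes that a negative start counts back from the end of the list and that an out-of-range negative start is clamped to the beginning, which may not match BeidouChain's exact behavior.

```python
def window(items, start, count):
    """Return up to `count` items beginning at position `start`.

    Assumption: a negative start counts back from the end of the list.
    """
    if count < 0:
        raise ValueError("count must not be negative")
    if start < 0:
        start = max(len(items) + start, 0)
    return items[start:start + count]


entries = list(range(10))          # stand-in for ten stream items
first_page = window(entries, 0, 3)   # the three oldest items
latest = window(entries, -3, 3)      # the three most recent items
```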

Streams can contain multiple items with the same key, and this naturally solves the tension between blockchain immutability and the need to update a database. Each effective database ‘entry’ should be assigned a unique key in your application, with each update to that entry represented by a new stream item with the same key. BeidouChain’s stream retrieval APIs can then be used to: (a) retrieve the first or last version of a given entry, (b) retrieve a full version history for an entry, (c) retrieve information about multiple entries, including the first and last versions of each.
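
A minimal sketch of this versioning pattern, using an in-memory list in place of a stream. The helper names and the entry keys are hypothetical; the point is only that an ‘update’ is an append, and the current value is the last item for a key.

```python
# Hypothetical stream: each update to an entry is a new item with the same key.
log = [
    {"key": "user:1", "data": "v1"},
    {"key": "user:2", "data": "a"},
    {"key": "user:1", "data": "v2"},
]


def history(items, key):
    """(b) Full version history of one entry, oldest first."""
    return [i["data"] for i in items if i["key"] == key]


def first_version(items, key):
    """(a) Oldest version of an entry, or None if the key is unused."""
    versions = history(items, key)
    return versions[0] if versions else None


def last_version(items, key):
    """(a) Newest version of an entry, or None if the key is unused."""
    versions = history(items, key)
    return versions[-1] if versions else None


def summarize(items):
    """(c) First version, last version and item count for every entry."""
    summary = {}
    for item in items:
        entry = summary.setdefault(item["key"], {"first": item["data"], "count": 0})
        entry["last"] = item["data"]
        entry["count"] += 1
    return summary
```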

Note that because of a blockchain’s peer-to-peer architecture, items in a stream may arrive at different nodes in different orders, and BeidouChain allows items to be retrieved before they are ‘confirmed’ in a block. As a result, all retrieval APIs offer a choice between global (the default) or local ordering. Global ordering guarantees that, once the chain has reached consensus, all nodes receive the same responses from the same API calls. Local ordering guarantees that, for any particular node, the ordering of a stream’s items will never change between API calls. Each application can make the appropriate choice for its needs.
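
The difference between the two orderings can be sketched with two nodes that received the same items in a different sequence. The model below is an assumption for illustration, not BeidouChain's actual algorithm: local ordering is taken to be arrival order at the node, and global ordering is taken to be position in the chain, with unconfirmed items placed last.

```python
def local_order(node_items):
    """Order in which this node first saw each item."""
    return [i["id"] for i in sorted(node_items, key=lambda i: i["arrived"])]


def global_order(node_items):
    """Confirmed items by (block height, position), then unconfirmed items."""
    confirmed = [i for i in node_items if i["block"] is not None]
    pending = [i for i in node_items if i["block"] is None]
    confirmed.sort(key=lambda i: i["block"])
    pending.sort(key=lambda i: i["arrived"])
    return [i["id"] for i in confirmed + pending]


# Hypothetical view of the same two items on two different nodes.
# "block" is (height, position in block); None would mean unconfirmed.
node_a = [
    {"id": "a", "arrived": 1, "block": (5, 0)},
    {"id": "b", "arrived": 2, "block": (5, 1)},
]
node_b = [
    {"id": "b", "arrived": 1, "block": (5, 1)},
    {"id": "a", "arrived": 2, "block": (5, 0)},
]
```

Under these assumptions both nodes agree on the global order once the items are confirmed, while each node's local order reflects only its own arrival sequence.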