Oracle usage patterns in Corda (and other blockchains)
The role of oracles in Blockchain
In many cases, a transaction’s validity depends on some external piece of data that all parties need to agree on, such as the current exchange rate. However, if we were to let each participant evaluate the transaction’s validity based on their own view of this external data, the contract’s execution would be non-deterministic, because different parties would call the data source at different times and get different values. (This is especially the case for constantly changing data such as exchange rates, stock prices, temperature readings, etc.) Some signers would then consider the transaction valid, while others would consider it invalid. As a result, disagreements would arise over the true state of the ledger.
We must find a way to ensure a single value is reported in a verifiable manner, especially since the outcome of the transaction is irreversible once settled. For that purpose, we need oracles.
An oracle is an entity which signs claims about the state of the world.
The assumption is that all nodes in the network trust the Oracle to provide accurate data and be fair to all. Oracles in turn get paid for providing this service, hence are incentivized to stay honest. (Oracles might still collude with a participant without getting detected; an option is to deploy multiple (decentralized) oracle nodes and use the aggregated information to validate a transaction.)
A note before we go further
Usage of the word “oracle” here is not to be confused with Oracle Corporation (which you will find many hits for when you include “blockchain” and “oracle” in your search phrase). This may sound silly to people who already know the concept, but new learners might get confused by terms such as “oracle node” or “oracle data provider” (which I will clarify further below).
Also, The Matrix references are not required for understanding any of this, but I couldn’t let an opportunity like this pass without sprinkling some classics!
Now that we have that out of the way...
Using Oracles in Corda
PS: Though we discuss this example in the context of Corda (and use a bit of Corda jargon), this pattern can be used in any blockchain.
Some definitions of the terms used below
- oracle/oracle data provider — The organization that provides the required data. Members of the blockchain (business) network generally pay a subscription or usage based fee in return for services
- oracle node — A blockchain node hosted by the oracle data provider
- oracle service — The program/data source used by the oracle to fetch data and return to the requester
- oracle flow — A flow is a sequence of steps that tells a node how to perform a specific operation — in this case: getting the oracle data
Let’s see how this concept works in Corda by looking at Corda’s own oracle example.
The read-me gives a clear overview of the application:
This CorDapp implements an oracle service that allows nodes to:
- Request the Nth prime number from an oracle node
- Request the oracle’s signature to prove that the number included in their transaction is actually the Nth prime number
Whilst the functionality is superfluous (as primes can be verified deterministically via the contract code), this CorDapp is a simple example of how to structure an oracle service that provides querying and signing abilities. In the real world, oracles would instead provide and sign statements about stock prices, exchange rates, and other data.
Let’s see how the flow works with a sequence diagram.
The participants of the transaction can be sure that the value the oracle specified is included in the transaction by doing two checks in the contract:
- The value of the prime in the command and the state is the same
- The command has been signed by the Oracle node
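To make the two checks concrete, here is a minimal sketch in plain Java. It is not the sample CorDapp's contract: PrimeState, PrimeCommand and the string key identifiers are simplified stand-ins I made up so the example runs without Corda on the classpath. A real contract would work with Corda's state, command and public key types.

```java
import java.util.Set;

// Simplified, hypothetical stand-ins for a Corda state, command and contract.
final class PrimeContractSketch {

    static final class PrimeState {
        final int n;
        final long nthPrime;
        PrimeState(int n, long nthPrime) { this.n = n; this.nthPrime = nthPrime; }
    }

    static final class PrimeCommand {
        final int n;
        final long nthPrime;
        PrimeCommand(int n, long nthPrime) { this.n = n; this.nthPrime = nthPrime; }
    }

    /** Throws IllegalArgumentException when either check fails; returns normally otherwise. */
    static void verify(PrimeState output, PrimeCommand command,
                       Set<String> commandSigners, String oracleKey) {
        // Check 1: the value in the command and the value in the state are the same.
        if (output.n != command.n || output.nthPrime != command.nthPrime) {
            throw new IllegalArgumentException("The prime in the state does not match the command");
        }
        // Check 2: the oracle is one of the signers of the command.
        if (!commandSigners.contains(oracleKey)) {
            throw new IllegalArgumentException("The command is not signed by the oracle");
        }
    }
}
```

Neither check is enough on its own: the oracle only vouches for the command, so the contract has to tie the state to that command.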
All good? As with all tutorials, they can only show you the door. You have to make the journey. (This is the last reference, I promise)
External (Synchronous) requests
In the example above, the query service on the oracle node simply runs a function in the oracle service that computes the Nth prime. The caller (the requesting node) is then expected to take this value, include it in the transaction and sign it.
This transaction (or a filtered transaction¹ ²) is then sent back to the oracle node, which runs its sign service to confirm that the value it originally provided has been included in the transaction. The entire sequence happens in a single flow.
So there are two phases here: Querying, and signing.
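The two phases can be sketched in plain Java as follows. This is only an illustration of the idea, not the sample CorDapp's code: the class name, the method names and the choice of ECDSA over a simple byte encoding are all my own assumptions, and a real oracle node would sign the (filtered) transaction rather than a bare pair of numbers.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical oracle with a query phase and a sign phase.
final class PrimeOracleSketch {
    private final KeyPair keyPair;

    PrimeOracleSketch() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("EC");
            generator.initialize(256);
            this.keyPair = generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    PublicKey publicKey() { return keyPair.getPublic(); }

    /** Query phase: returns the Nth prime, counting from query(1) == 2. */
    long query(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be at least 1");
        int found = 0;
        long candidate = 1;
        while (found < n) {
            candidate++;
            if (isPrime(candidate)) found++;
        }
        return candidate;
    }

    /** Sign phase: signs only if the claimed value is what the oracle itself would answer. */
    byte[] sign(int n, long claimedPrime) {
        if (query(n) != claimedPrime) {
            throw new IllegalArgumentException("Claimed value is not the Nth prime");
        }
        try {
            Signature signer = Signature.getInstance("SHA256withECDSA");
            signer.initSign(keyPair.getPrivate());
            signer.update(encode(n, claimedPrime));
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** What any other party can do with the oracle's public key. */
    static boolean verify(PublicKey oracleKey, int n, long prime, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance("SHA256withECDSA");
            verifier.initVerify(oracleKey);
            verifier.update(encode(n, prime));
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private static byte[] encode(int n, long prime) {
        return ByteBuffer.allocate(Integer.BYTES + Long.BYTES).putInt(n).putLong(prime).array();
    }

    private static boolean isPrime(long x) {
        if (x < 2) return false;
        for (long d = 2; d * d <= x; d++) {
            if (x % d == 0) return false;
        }
        return true;
    }
}
```

Notice that sign() recomputes the answer instead of trusting the caller. That works here because the computation is deterministic, which is exactly the assumption that breaks in the next scenario.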
Now let’s think about the case where the oracle cannot do a local computation, but has to call an API to fetch data that changes regularly (e.g. a stock price). If we simply call the API once during the query phase and once again during the signing phase, there’s a very high likelihood of getting different results.
The obvious solution to this is to cache the result of the query call in memory or store it in a database, then query this database during the signing phase.
This is an established pattern, and it works fine when the API call that the oracle makes during the query phase is synchronous and the result is available in a predictable (and reasonable) amount of time.
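A rough sketch of the caching idea, in plain Java. Everything here is an assumption for illustration: the Supplier stands in for whatever external API the oracle calls, the request ID is just a random UUID, and the in-memory map could equally be a database table.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical oracle service that remembers what it answered during the query phase.
final class CachingOracleSketch {

    static final class Quote {
        final String requestId;
        final BigDecimal value;
        Quote(String requestId, BigDecimal value) { this.requestId = requestId; this.value = value; }
    }

    private final Supplier<BigDecimal> externalApi; // stand-in for the real API call
    private final Map<String, BigDecimal> cache = new ConcurrentHashMap<>();

    CachingOracleSketch(Supplier<BigDecimal> externalApi) {
        this.externalApi = externalApi;
    }

    /** Query phase: call the external API exactly once and remember the answer. */
    Quote query() {
        BigDecimal value = externalApi.get();
        String requestId = UUID.randomUUID().toString();
        cache.put(requestId, value);
        return new Quote(requestId, value);
    }

    /** Sign phase: compare against the cached answer instead of calling the API again. */
    boolean shouldSign(String requestId, BigDecimal claimedValue) {
        BigDecimal cached = cache.get(requestId);
        return cached != null && cached.compareTo(claimedValue) == 0;
    }
}
```

The requesting node would carry the request ID along with the value, so the oracle can look up exactly what it quoted. A production version would also need to expire old entries, which I have left out.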
Buuuut there’s another case that we need to cover.
External (Asynchronous) requests
We have discussed two scenarios, one in which the oracle node does a local computation, and one in which the oracle node makes a synchronous API call to another service.
Now what if the API call is asynchronous?
By asynchronous, I mean that the service being queried receives a request but doesn’t respond immediately. This may be because the API is performing a long-running operation, or because it wants to avoid tying up threads. An asynchronous API initially returns only a promise (e.g. a “token”) that, when the data becomes available, it will trigger a so-called “continuation” to process that data, for example by calling a callback you provide or emitting an event you can listen to.
To reduce this to specific cases, you can receive the answer in the following ways:
- You send a callback URL to the external service, which the external service triggers once it has computed an answer. The callback will contain the token amongst other things, which can be matched against the token returned in the initial response
- The requesting service (oracle node, in our case) polls the external service to check if a response has arrived.
In both of these cases, you don’t have the answer available immediately. The first one is especially tricky because your oracle flow is still in progress and has simply received a token. The external service later calls back some URL with the response data, but how does your running oracle flow get hold of this response? One option is to split the oracle’s actions into separate flows: one that receives the request, and a second that is initiated once the response has arrived (if you didn’t understand this, don’t worry). It’s simply downright messy.
The second case is more manageable: the oracle flow can keep polling the external source for a response, so it still reads like a typical synchronous call. But imagine this oracle node servicing hundreds of requests; you will run out of (flow) threads very soon³.
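For what it’s worth, the usual way to match late answers to waiting requests without blocking a thread per request is a map of pending futures keyed by token. Here is a minimal plain-Java sketch of that idea. The class and method names are mine, and it ignores timeouts and the race where a callback arrives before the token is registered.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry that correlates asynchronous answers with the tokens they belong to.
final class PendingRequestsSketch {
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Called once the external service has acknowledged a request with a token. */
    CompletableFuture<String> register(String token) {
        return pending.computeIfAbsent(token, t -> new CompletableFuture<>());
    }

    /**
     * Called by the callback endpoint (or a single polling loop) when an answer arrives.
     * Returns false if nobody is waiting for this token.
     */
    boolean complete(String token, String payload) {
        CompletableFuture<String> future = pending.remove(token);
        return future != null && future.complete(payload);
    }

    int waitingCount() {
        return pending.size();
    }
}
```

Whoever made the request holds on to the future and attaches a continuation to it (for example with thenAccept), so nothing sits in a loop waiting. How that future gets wired back into a running flow is the messy part described above.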
In a specific case that I worked on, we faced a THIRD (worst) version of the problem. (sigh)
The service to be queried required us to establish a TCP socket connection with it and send the request over this connection; the response would arrive on the same connection after some time.
Now you can certainly do TCP sends and receives in the oracle node’s flow, but we had limitations. The specification that described how to connect to this external service required that a single TCP socket connection be established and that all communication happen on this channel.
So now we need to ensure that the oracle node uses a single TCP connection with the external service (initialized on startup). Every time a query request comes in, the flow sends that data over this channel and then has to wait for an unspecified amount of time for the result to return. In the meanwhile, we might also be receiving responses to earlier requests, multiple flows are calling this service in parallel, and the participant’s side of the flow is suspended.
Let’s see how we can reduce the complexity, as well as the time spent in the oracle node’s flow.
Getting around the problem
This workaround required support from the oracle data provider to make a few changes.
Instead of the oracle node making the request to the service, the participant node’s client will implement:
- A wrapper API (REST) which internally establishes a TCP connection with the oracle service where it sends requests and listens for responses.
- A callback API which is triggered by the wrapper API when a response has come for any of the data requests on the TCP connection
Note: This will require the oracle data provider to provision access to the TCP server for the client to connect to (hence the change). But with this, we have moved the problem out of the oracle flow and to the client (where there is more freedom to design & customize your application).
The client application will not start the transaction flow until the async response has been received.
On the other side, when the response is computed by the oracle service, it writes this result to a local database/cache with some unique key, before returning the result to the participant node’s client.
The participant node client now has all the required data and can now call its Corda flow with this input.
Now, instead of having both a query phase and a signing phase with the oracle node, we only have the signing phase. Also, because the query result is cached in a local DB/cache, we skip any blocking operations during the flow itself.
Let’s see in a flow chart how this sequence is different (now assuming that calculating the Nth prime number is an async/long-running operation).
Since the query phase is done in advance, the time spent in the flows is considerably smaller and more deterministic.
There is an even better* way!
There is a way to optimize this flow further. (If you can get the now friendly oracle provider to work with you.)
So the key observation to make here is that the Oracle node is now only doing the work of signing the transaction sent by the participant node if it included the correct data, and this signature is then verified by the contract.
Why not have the original response sent to the client in the query phase be signed? Instead of just returning a plaintext response, the oracle service sends along a signature over this data. This signature is then included in the transaction proposal.
Now all parties need to verify this signature in the contract, so that they can confirm that the initiating node got the information from a trusted source.
To do this, the public key corresponding to the oracle provider’s signing key must be encoded into the Corda (smart) contract. Here* comes the first caveat: the public key has to be hardcoded into the contract and confirmed by each party to be correct. Secondly, if the oracle provider’s service changes its signing key for any reason, all parties must upgrade the contract to use the new key for verification. This will lead to downtime whenever such a situation arises, which may or may not be acceptable depending on the use case. Pfft, trade-offs.
With this pattern, we have removed the oracle node itself from the network, so it’s more cost-efficient for all parties involved.
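Here is roughly what that could look like, again as a plain-Java sketch rather than real contract code. The payload format, the Base64 encoding of the pinned key and the choice of ECDSA are all assumptions on my part. In a real contract the pinned key would be a constant in the contract class.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.X509EncodedKeySpec;
import java.util.Base64;

// Hypothetical helpers: the oracle service signs its response, the contract verifies it.
final class SignedResponseSketch {

    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("EC");
            generator.initialize(256);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The form in which the oracle's public key would be pinned in the contract. */
    static String encodePublicKey(PublicKey key) {
        return Base64.getEncoder().encodeToString(key.getEncoded());
    }

    /** Oracle service side: sign the response before returning it to the client. */
    static byte[] sign(PrivateKey key, String payload) {
        try {
            Signature signer = Signature.getInstance("SHA256withECDSA");
            signer.initSign(key);
            signer.update(payload.getBytes(StandardCharsets.UTF_8));
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Contract side: check the signature against the pinned public key. */
    static boolean verify(String pinnedKeyBase64, String payload, byte[] signature) {
        try {
            byte[] encoded = Base64.getDecoder().decode(pinnedKeyBase64);
            PublicKey key = KeyFactory.getInstance("EC")
                    .generatePublic(new X509EncodedKeySpec(encoded));
            Signature verifier = Signature.getInstance("SHA256withECDSA");
            verifier.initVerify(key);
            verifier.update(payload.getBytes(StandardCharsets.UTF_8));
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The key rotation caveat shows up directly: a response signed with a new key fails verify() against the old pinned key until every party has upgraded the contract.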
That’s all folks! This is one of my longer articles, so it may take some time to digest. And there might be even better ways to do this. If you have any questions or suggestions, leave a comment below, or message me on Twitter or LinkedIn.
¹ Filtered transactions can be created in Corda by selectively removing components from the Merkle tree structure of the transaction, so that only the relevant data is revealed to the oracle, for privacy reasons.
² As pointed out by a reader, make a note of this important caveat while using filtered transactions.
³ From Corda 4.4 onwards, you can define long asynchronous calls for flows and have the flow release the flow thread while waiting for those external operations to complete, allowing other flows to use the flow thread. More info HERE.