The First Synchronization of Ethereum-Based Blockchains

Traditionally, when syncing an Ethereum blockchain, your client would download and validate every block and every transaction since the very start—i.e., from the genesis block.

While it is possible to fully sync the blockchain this way, this type of sync takes a very long time and has high resource requirements: it needs much more RAM, and it will be slower still if you don’t have fast storage.

Many Ethereum-based blockchains were victims of denial-of-service (DoS) attacks in late 2016. Affected blockchains tend to sync slowly when doing a full sync.

For example, on Ethereum, a new client will make rapid progress until it reaches block 2,283,397. This block was mined on September 18, 2016, and marks the beginning of the DoS attacks. From this block to block 2,700,031 (November 26, 2016), the validation of transactions becomes extremely slow, memory intensive, and I/O intensive. This results in validation times exceeding 1 minute per block. Ethereum implemented a series of upgrades, using hard forks, to address the underlying vulnerabilities that were exploited in the DoS attacks. These upgrades also cleaned up the blockchain by removing some 20 million empty accounts created by spam transactions.

If you are syncing with full validation, your client will slow down and may take several days, or perhaps even longer, to validate the blocks affected by the DoS attacks.

Fortunately, most Ethereum clients by default now perform a “fast” synchronization that skips the full validation of transactions until it has synced to the tip of the blockchain, then resumes full validation.

Geth performs fast synchronization by default for Ethereum. For other Ethereum-based chains, you may need to refer to the specific instructions for the chain you have chosen.

Parity also does fast synchronization by default.

Note

Geth can only perform fast synchronization when starting with an empty block database. If you have already started syncing without fast mode, Geth cannot switch. It is faster to delete the blockchain data directory and start fast syncing from the beginning than to continue syncing with full validation. Be careful not to delete any wallets when deleting the blockchain data!

Running Geth or Parity

Now that you understand the challenges of the “first sync,” you’re ready to start an Ethereum client and sync the blockchain. For both Geth and Parity, you can use the --help option to see all the configuration parameters. The default settings are usually sensible and appropriate for most uses. Set any optional parameters to suit your needs, then start Geth or Parity to sync the chain. Then wait…

Tip

Syncing the Ethereum blockchain will take anywhere from half a day on a very fast system with lots of RAM, to several days on a slower system.

The JSON-RPC Interface

Ethereum clients offer an application programming interface and a set of Remote Procedure Call (RPC) commands, which are encoded as JavaScript Object Notation (JSON). You will see this referred to as the JSON-RPC API. Essentially, the JSON-RPC API is an interface that allows us to write programs that use an Ethereum client as a gateway to an Ethereum network and blockchain.

Usually, the RPC interface is offered as an HTTP service on port 8545. For security reasons it is restricted, by default, to only accept connections from localhost (the IP address of your own computer, which is 127.0.0.1).

To access the JSON-RPC API, you can use a specialized library (written in the programming language of your choice) that provides “stub” function calls corresponding to each available RPC command, or you can manually construct HTTP requests and send/receive JSON-encoded requests. You can even use a generic command-line HTTP client, like curl, to call the RPC interface. Let’s try that. First, ensure that you have Geth up and running, configured with --rpc to allow HTTP access to the RPC interface (later Geth releases renamed this flag --http). Then switch to a new terminal window (e.g., with Ctrl-Shift-N, or Ctrl-Shift-T in an existing terminal window) as shown here:

  $ curl -X POST -H "Content-Type: application/json" --data \
  '{"jsonrpc":"2.0","method":"web3_clientVersion","params":[],"id":1}' \
  http://localhost:8545
  {"jsonrpc":"2.0","id":1,
  "result":"Geth/v1.9.11-unstable-0b284f6c-20200123/linux-amd64/go1.13.4"}

In this example, we use curl to make an HTTP connection to the address http://localhost:8545. We are already running Geth, which offers the JSON-RPC API as an HTTP service on port 8545. We instruct curl to use the HTTP POST command and to identify the content as type application/json. Finally, we pass a JSON-encoded request as the data component of our HTTP request. Most of our command line is just setting up curl to make the HTTP connection correctly. The interesting part is the actual JSON-RPC command we issue:

  {"jsonrpc":"2.0","method":"web3_clientVersion","params":[],"id":1}

The JSON-RPC request is formatted according to the JSON-RPC 2.0 specification. Each request contains four elements:

jsonrpc

Version of the JSON-RPC protocol. This MUST be exactly “2.0”.

method

The name of the method to be invoked.

params

A structured value that holds the parameter values to be used during the invocation of the method. This member MAY be omitted.

id

An identifier established by the client that MUST contain a String, Number, or NULL value if included. The server MUST reply with the same value in the response object if included. This member is used to correlate the context between the two objects.
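The same request can be sent from a program instead of curl. The following is a minimal sketch using only Python’s standard library, assuming Geth is serving the JSON-RPC API over HTTP at http://localhost:8545 as above. The helper names make_request and call are our own illustrative choices, not part of any Ethereum library.

```python
import json
import urllib.request


def make_request(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request object with the four members described above."""
    return {
        "jsonrpc": "2.0",  # must be exactly "2.0"
        "method": method,
        # The spec allows params to be omitted; we send [] to match the curl example.
        "params": [] if params is None else list(params),
        "id": request_id,  # echoed back by the server in its response
    }


def call(url, method, params=None, request_id=1, timeout=10):
    """POST one request (as the curl command does) and return the "result" member."""
    body = json.dumps(make_request(method, params, request_id)).encode("utf-8")
    http_request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(http_request, timeout=timeout) as response:
        reply = json.loads(response.read().decode("utf-8"))
    if "error" in reply:
        # JSON-RPC 2.0 reports failures in an "error" member instead of "result"
        raise RuntimeError(f"JSON-RPC error: {reply['error']}")
    if reply.get("id") != request_id:
        raise ValueError("response id does not match request id")
    return reply["result"]
```

With a node running, call("http://localhost:8545", "web3_clientVersion") should return the same version string that curl printed.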

Tip

The id parameter is used primarily when you are making multiple requests in a single JSON-RPC call, a practice called batching. Batching is used to avoid the overhead of a new HTTP and TCP connection for every request. In the Ethereum context, for example, we would use batching if we wanted to retrieve thousands of transactions over one HTTP connection. When batching, you set a different id for each request and then match it to the id in each response from the JSON-RPC server. The easiest way to implement this is to maintain a counter and increment the value for each request.
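To make the batching idea concrete, here is a minimal sketch in Python (standard library only; build_batch and match_responses are hypothetical helper names). It draws ids from a single shared counter, as the tip suggests, and pairs responses with requests by id, because the JSON-RPC 2.0 specification allows a server to return batch responses in any order. The batch list would be serialized with json.dumps and sent in one HTTP POST, just like a single request object.

```python
import itertools


def build_batch(calls, counter):
    """Turn (method, params) pairs into a list of requests, each with a fresh id."""
    return [
        {"jsonrpc": "2.0", "method": method, "params": list(params), "id": next(counter)}
        for method, params in calls
    ]


def match_responses(batch, responses):
    """Pair every request with the response carrying the same id (None if missing)."""
    by_id = {response.get("id"): response for response in responses}
    return [(request, by_id.get(request["id"])) for request in batch]


# One counter for the lifetime of the client keeps ids unique across batches.
request_ids = itertools.count(1)
batch = build_batch([("web3_clientVersion", []), ("eth_gasPrice", [])], request_ids)
```

Keeping the counter outside build_batch means a second batch continues numbering where the first left off, so a late or duplicated response can never be mistaken for an answer to a different request.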

The response we receive is:

  {"jsonrpc":"2.0","id":1,
  "result":"Geth/v1.9.11-unstable-0b284f6c-20200123/linux-amd64/go1.13.4"}

This tells us that the JSON-RPC API is being served by Geth client version 1.9.11 (an unstable development build) for linux-amd64, compiled with Go 1.13.4.
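The result is a slash-separated string, and it is easy to misread which field is the client release and which is the compiler. A small sketch that splits it into named fields, assuming the four-field layout of this particular Geth response (other clients and builds may format the string differently); parse_client_version is a hypothetical helper:

```python
def parse_client_version(version_string):
    """Split a Geth-style web3_clientVersion string into named fields.

    Assumes the layout name/version/platform/compiler seen in the response above.
    """
    fields = version_string.split("/")
    if len(fields) != 4:
        raise ValueError(f"unexpected client version layout: {version_string!r}")
    name, version, platform, compiler = fields
    return {
        "name": name,
        "version": version,
        # Release number without build suffixes such as "-unstable-<commit>-<date>"
        "release": version.split("-")[0],
        "platform": platform,
        "compiler": compiler,
    }


info = parse_client_version(
    "Geth/v1.9.11-unstable-0b284f6c-20200123/linux-amd64/go1.13.4"
)
print(info["release"], info["compiler"])  # v1.9.11 go1.13.4
```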

Let’s try something a bit more interesting. In the next example, we ask the JSON-RPC API for the current price of gas in wei:

  $ curl -X POST -H "Content-Type: application/json" --data \
  '{"jsonrpc":"2.0","method":"eth_gasPrice","params":[],"id":4213}' \
  http://localhost:8545
  {"jsonrpc":"2.0","id":4213,"result":"0x430e23400"}

The response, 0x430e23400, tells us that the current gas price is 18 gwei (gigawei or billion wei). If, like us, you don’t think in hexadecimal, you can convert it to decimal on the command line with a little bash-fu:

  $ echo $((0x430e23400))
  18000000000
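Programs reading JSON-RPC responses have to do the same decoding, since numeric quantities arrive as 0x-prefixed hex strings. A minimal Python sketch, assuming the quantity format returned above; the helper names are illustrative. Decimal keeps the wei-to-gwei division exact instead of introducing floating-point rounding.

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9  # 1 gwei = 1,000,000,000 wei


def hex_quantity_to_int(quantity):
    """Decode a 0x-prefixed hex quantity from a JSON-RPC response."""
    if not quantity.startswith("0x"):
        raise ValueError(f"expected a 0x-prefixed quantity, got {quantity!r}")
    return int(quantity, 16)


def wei_to_gwei(wei):
    """Convert an amount in wei to gwei without floating-point rounding."""
    return Decimal(wei) / WEI_PER_GWEI


gas_price_wei = hex_quantity_to_int("0x430e23400")
print(gas_price_wei, wei_to_gwei(gas_price_wei))  # 18000000000 18
```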

The full JSON-RPC API is documented on the Ethereum wiki.

Parity’s Geth compatibility mode

Parity has a special “Geth compatibility mode,” where it offers a JSON-RPC API that is identical to that offered by Geth. To run Parity in this mode, use the --geth switch:

  $ parity --geth