Thursday, June 27, 2019

Connectivity and port forwarding for Kafka clients


When I started with Kafka, I had some questions about connectivity, and ran into seemingly inexplicable issues when trying to set up port forwarding for clients that have no direct access to the brokers or zookeeper. During my search I noticed that I wasn’t the only one struggling with this, so once I had figured it out I decided to dedicate a short blog post to the topic.

The picture below shows a small Kafka cluster of 2 brokers and a zookeeper, all with default port settings: both brokers listen on port 9092 and zookeeper on port 2181. We also have a client that wishes to connect to the cluster.


Connectivity-wise you should be aware that:

-    brokers need to be able to access each other, not just zookeeper.
-    clients need to be able to access all brokers, not just the one they bootstrap from.
-    clients do not have to use zookeeper (at least not since API release 0.9.0).

If you run Kafka in an environment that is heavily firewalled, this picture should tell you which connectivity to arrange.
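If you want to verify which of these connections are actually open from a given machine, a plain TCP check is a reasonable first step. The sketch below is only that: it assumes the host names broker1, broker2 and zookeeper from the picture resolve on the machine you run it from, the helper names are made up for this post, and a successful TCP connect says nothing about whether the Kafka or SASL handshake will succeed afterwards.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused connections, timeouts and DNS failures
        return False


def unreachable(endpoints, timeout=3.0):
    """Return the (host, port) pairs that could not be reached."""
    return [(host, port) for host, port in endpoints
            if not can_connect(host, port, timeout)]
```

On a client you could call unreachable([("broker1", 9092), ("broker2", 9092)]); on broker1 you would check broker2 and zookeeper instead. An empty list means every endpoint accepted the connection.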

So what if a client has no direct access to the cluster? This may arise when you want to use some Kafka GUI or the kafkacat command line tool on your laptop from an office network. If you are allowed access by means of a bastion server (jump host, stepping stone, etc.) you can forward a local port to a remote one over a secure SSH connection.

However, from the picture it should be obvious that this will never work when both brokers are running on port 9092 (on different servers). After all, you cannot just forward one local port to two remote destinations. And even if you could (with a load balancer), the client would have no way to specify which broker to connect to, and a Kafka client needs to reach specific brokers, such as the one leading the partition it reads from or writes to. Incidentally, this is why you should not use a load balancer to access a Kafka cluster.

One solution that may work for you is to put the brokers on different ports. This is shown in the picture below:


If – for instance – you tell broker2 to listen on port 9093, you can set up your port-forwarding so that local port 9092 relays to broker1:9092 and local port 9093 to broker2:9093, avoiding a conflict.
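As a sketch of what that forwarding could look like, the snippet below assembles an ssh command with one -L option per broker. The bastion address user@bastion is a placeholder and the helper name is made up for this post; the broker names and ports are the ones from the example above.

```python
import shlex


def ssh_forward_command(bastion, forwards):
    """Build an ssh argument list with one -L option per forwarded port.

    forwards maps a local port to a (remote_host, remote_port) pair.
    """
    args = ["ssh", "-N"]  # -N: only forward ports, do not run a remote command
    for local_port, (host, port) in sorted(forwards.items()):
        args += ["-L", f"{local_port}:{host}:{port}"]
    args.append(bastion)
    return args


# "user@bastion" is a hypothetical jump host; replace it with your own.
command = ssh_forward_command(
    "user@bastion",
    {9092: ("broker1", 9092), 9093: ("broker2", 9093)},
)
print(shlex.join(command))
```

The printed line can be pasted into a terminal as is. Keeping the mapping in a dictionary makes it easy to add a third entry, for example for zookeeper on port 2181.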

This can be arranged simply by changing just one property in the server.properties of broker2 (this example uses SASL over SSL, but your mileage may vary):

listeners=SASL_SSL://:9093

With this set-up, your local client can connect to either localhost:9092 or localhost:9093 or even localhost:2181 if you are port-forwarding to zookeeper as well. Remember that broker1 needs connectivity to broker2 on port 9093, otherwise the whole thing won’t work.

Note:
  • you will need to modify your local hosts file so that broker1 and broker2 both resolve to localhost (127.0.0.1). This is because, by default, brokers advertise their own hostnames rather than localhost, and after bootstrapping the client connects to whatever the brokers advertise.
  • clients with direct access need to use the correct bootstrapping endpoint (or use zookeeper) to obtain a valid broker endpoint. So they must be configured to use broker1:9092 and/or broker2:9093 (or zookeeper:2181).

Since this trick involves changing the hosts file (which may be OK on your laptop but not elsewhere), it is more of a hack than a solution.
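To see why the hosts file matters, here is a toy model of what a forwarded client goes through after bootstrapping. It is not Kafka client code; it only mimics name resolution with a dictionary, and all names in it are made up for illustration. The advertised endpoints are the ones from this first scenario.

```python
# Endpoints the brokers advertise in the first scenario.
ADVERTISED = [("broker1", 9092), ("broker2", 9093)]

# Local hosts-file override: both broker names point at the loopback address.
HOSTS_OVERRIDE = {"broker1": "127.0.0.1", "broker2": "127.0.0.1"}

# Local ports on which the SSH tunnel is listening.
FORWARDED_LOCAL_PORTS = {9092, 9093}


def reachable_via_tunnel(host, port, hosts, forwarded_ports):
    """Return True if the laptop client would end up on a forwarded port."""
    address = hosts.get(host)  # None models "name does not resolve locally"
    return address == "127.0.0.1" and port in forwarded_ports


def check(advertised, hosts, forwarded_ports):
    """Map each advertised endpoint to whether the laptop client can reach it."""
    return {
        f"{host}:{port}": reachable_via_tunnel(host, port, hosts, forwarded_ports)
        for host, port in advertised
    }


with_override = check(ADVERTISED, HOSTS_OVERRIDE, FORWARDED_LOCAL_PORTS)
without_override = check(ADVERTISED, {}, FORWARDED_LOCAL_PORTS)
print(with_override)
print(without_override)
```

With the override both endpoints land on a forwarded port. Without it, the client bootstraps fine through localhost:9092 but then fails on the advertised names, which matches the seemingly inexplicable behaviour you get when the hosts file is left alone.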

There is an alternative that uses a standard feature of the broker to advertise its listeners. In this scenario, you keep the default port 9092 for clients with direct access and for inter-broker communication, and you configure additional ports for clients that use port forwarding.

This can be arranged in the server.properties of broker1 (the changes are the PORTFWD entries):

listeners=SASL_SSL://:9092,PORTFWD://:9093

advertised.listeners=SASL_SSL://:9092,PORTFWD://localhost:9093


listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL,PORTFWD:SASL_SSL


Make similar changes for broker2, but use port 9094 instead of 9093. Afterwards, your set-up will look like this:

  
Arrange your port-forwarding so that local port 9093 relays to broker1:9093 and local port 9094 to broker2:9094. Your forwarded clients can connect to either localhost:9093 or localhost:9094, while clients with direct access can still use broker1:9092 or broker2:9092. This time, keep in mind that it is the bastion server that needs connectivity to the brokers on ports 9093 and 9094; inter-broker traffic stays on port 9092, provided the brokers are configured to use the SASL_SSL listener for it.
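To make the "similar changes for broker2" concrete: the three properties differ per broker only in the forwarded port. The sketch below renders them for any port. The function name is made up for this post, and the listener name PORTFWD and the SASL_SSL protocol are simply carried over from the broker1 example, so adjust them to your own set-up.

```python
def portfwd_properties(forward_port, direct_port=9092):
    """Render the listener-related server.properties lines for one broker."""
    lines = [
        f"listeners=SASL_SSL://:{direct_port},PORTFWD://:{forward_port}",
        f"advertised.listeners=SASL_SSL://:{direct_port},"
        f"PORTFWD://localhost:{forward_port}",
        "listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,"
        "SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL,PORTFWD:SASL_SSL",
    ]
    return "\n".join(lines)


broker1_properties = portfwd_properties(9093)
broker2_properties = portfwd_properties(9094)
print(broker2_properties)
```

The output for broker2 is the same as the broker1 snippet with 9093 replaced by 9094 in both listeners and advertised.listeners.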

The upside of this scenario is that you do not need to modify your hosts file, because ports 9093 and 9094 are advertised on localhost by the brokers.

The downside is that zookeeper will dutifully report all broker endpoints (listener groups), but the clients may have no way of knowing which one to use. So you will probably lose the option to bootstrap a connection through zookeeper.

If you have remarks or find a mistake, let me know!