The following error occurred in a Docker environment that runs Apache Kafka and ZooKeeper from the Confluent (confluentinc) images:
o.apache.kafka.clients.NetworkClient - [Producer clientId=producer-2] Connection to node -1 could not be established. Broker may not be available.
Finding the solution was messy. First, we checked the source code and the behaviour of the existing Akka actors, but it became clear that this would not lead to the solution.
Another idea was to understand what the confluentinc Docker images were really doing. After researching the default settings used to start up the Apache Kafka and ZooKeeper instances, it was clear that the crux lay there.
We modified the docker-compose.yml by adding one parameter to the Apache Kafka configuration. The final docker-compose.yml file looks as follows:
version: '3'
services:
  my-service-postgres-test-s:
    image: postgres:9.6.6-alpine
    container_name: my-service-postgres-test-c
    env_file:
      - ./db/env_files/env_test.list
    ports:
      - "36822:5432"
    volumes:
      - my-service-postgres-test-data-v:/var/lib/postgresql/data
    networks:
      - "my-service-n"
  my-service-s:
    image: URI_TO_THE_IMAGE
    container_name: my-service-c
    ports:
      - "22822:22822"
      - "22825:22825"
    depends_on:
      - my-service-postgres-test-s
    volumes:
      - ./config:/opt/docker/conf
    networks:
      - "my-service-n"
  my-service-zk4kafka-s:
    image: confluentinc/cp-zookeeper:4.0.0
    container_name: my-service-zk4kafka-c
    environment:
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_CLIENT_PORT: 33272
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 10
      ZOOKEEPER_SYNC_LIMIT: 5
      ZOOKEEPER_AUTOPURGE_SNAPRETAINCOUNT: 5
      ZOOKEEPER_AUTOPURGE_PURGEINTERVAL: 2
      ZOOKEEPER_SERVERS: 127.0.0.1:33277:33278
      ZOOKEEPER_LOG4J_ROOT_LOGLEVEL: INFO
      KAFKA_JMX_HOSTNAME: 127.0.0.1
      KAFKA_JMX_PORT: 33279
    network_mode: host
    # ports:
    #   - "33272:33272"
    #   - "33277:33277"
    #   - "33278:33278"
    #   - "33279:33279"
    volumes:
      - my-service-zk4kafka-data-v:/var/lib/zookeeper/data
      - my-service-zk4kafka-log-v:/var/lib/zookeeper/log
  my-service-kafka-s:
    image: confluentinc/cp-kafka:4.0.0
    container_name: my-service-kafka-c
    depends_on:
      - my-service-zk4kafka-s
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 127.0.0.1:33272
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://127.0.0.1:33222
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'true'
      KAFKA_COMPRESSION_TYPE: 'lz4'
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_NUM_PARTITIONS: 1
      KAFKA_UNCLEAN_LEADER_ELECTION_ENABLE: 'false'
      KAFKA_LOG_CLEANER_ENABLE: 'false'
      KAFKA_LOG_RETENTION_MS: 315360000000
      KAFKA_OFFSETS_RETENTION_MINUTES: 6912000
      KAFKA_MAX_CONNECTIONS_PER_IP: 100
      KAFKA_DELETE_TOPIC_ENABLE: 'true'
      KAFKA_LOG4J_ROOT_LOGLEVEL: DEBUG
      KAFKA_LOG4J_LOGGERS: "kafka.controller=DEBUG,state.change.logger=DEBUG"
      KAFKA_TOOLS_LOG4J_LOGLEVEL: DEBUG
      KAFKA_JMX_HOSTNAME: 127.0.0.1
      KAFKA_JMX_PORT: 33229
    network_mode: host
    # ports:
    #   - "33222:33222"
    #   - "33229:33229"
    volumes:
      - my-service-kafka-data-v:/var/lib/kafka/data
volumes:
  my-service-postgres-test-data-v:
  my-service-zk4kafka-data-v:
  my-service-zk4kafka-log-v:
  my-service-kafka-data-v:
networks:
  my-service-n:
The important addition was the following line:
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
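This overrides the default value set by the Confluent image and keeps it in sync with the number of partitions (KAFKA_NUM_PARTITIONS: 1). With only a single broker running, the offsets topic cannot be replicated more than once anyway, because a replication factor cannot exceed the number of available brokers.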
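To make the reasoning concrete, here is a minimal Python sketch that checks whether the offsets-topic replication factor fits the number of brokers. The function name and the way the environment is passed in are hypothetical. The fallback value of 3 is Apache Kafka's own default for offsets.topic.replication.factor, and the sketch assumes the Confluent image falls back to it when the variable is not set.

```python
# Apache Kafka's documented default for offsets.topic.replication.factor.
# Assumption: the image uses this default when the variable is missing.
DEFAULT_OFFSETS_REPLICATION_FACTOR = 3


def offsets_replication_problem(environment, broker_count):
    """Return a warning string if the offsets topic needs more replicas
    than there are brokers, otherwise None.

    `environment` is the `environment:` mapping of the Kafka service
    taken from docker-compose.yml (hypothetical helper input).
    """
    factor = int(
        environment.get(
            "KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR",
            DEFAULT_OFFSETS_REPLICATION_FACTOR,
        )
    )
    if factor > broker_count:
        return "offsets topic replication factor %d exceeds %d broker(s)" % (
            factor,
            broker_count,
        )
    return None


if __name__ == "__main__":
    # Without the extra line: the default does not fit a single broker.
    print(offsets_replication_problem({"KAFKA_BROKER_ID": 1}, 1))
    # With the extra line from the compose file: no problem reported.
    print(
        offsets_replication_problem(
            {"KAFKA_BROKER_ID": 1, "KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR": 1}, 1
        )
    )
```

A check like this could run before `docker-compose up` in a test pipeline, so that a missing setting shows up as a readable message rather than as a connection error in the client log.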
Important: You have to delete the previously created topics, because they were created with the default offsets replication factor determined by the Confluent image.
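One possible way to script the cleanup is sketched below in Python. It only builds and runs `docker exec` calls to the `kafka-topics` command line tool that ships in the cp-kafka image. The helper names are hypothetical; the container name and ZooKeeper address are taken from the compose file above. This is a sketch under those assumptions, not a tested procedure.

```python
import subprocess


def kafka_topics_cmd(container, zookeeper, *args):
    """Build the argv for running kafka-topics inside a container."""
    return [
        "docker", "exec", container,
        "kafka-topics", "--zookeeper", zookeeper,
        *args,
    ]


def delete_topic_cmd(container, zookeeper, topic):
    """Build the argv that marks one topic for deletion."""
    return kafka_topics_cmd(container, zookeeper, "--delete", "--topic", topic)


def delete_topics(container, zookeeper, topics):
    """Run the delete command for every topic name in `topics`.

    Needs a running container; raises CalledProcessError on failure.
    """
    for topic in topics:
        subprocess.run(delete_topic_cmd(container, zookeeper, topic), check=True)
```

A call such as `delete_topics("my-service-kafka-c", "127.0.0.1:33272", ["my-topic"])` would then remove a topic named `my-topic` (a placeholder name). Deletion only takes effect because KAFKA_DELETE_TOPIC_ENABLE is set to 'true' in the compose file. The tool may refuse to delete internal topics such as `__consumer_offsets`; in a throwaway test environment, removing the Kafka and ZooKeeper volumes is an alternative.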