Real Estate marketplace

A friend of mine created his own project; it is in essence a real estate marketplace. I assisted with some issues related to Spring, Hibernate, and Amazon S3 during development.

The main difference from other marketplaces, among other things, is that there are no real estate agency fees or commissions – sellers, buyers, and tenants can contact each other directly (buyers can save thousands of dollars on fees).
Pretty good design and usability!


Apache Kafka notes

Apache Kafka Java example, topic, producer, consumer:
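
A minimal sketch against the kafka-clients Java API; the broker address, topic name, and group id are placeholders:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KafkaDemo {
        public static void main(String[] args) {
            // Producer: send one message to the topic.
            Properties p = new Properties();
            p.put("bootstrap.servers", "localhost:9092");
            p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
                producer.send(new ProducerRecord<>("demo-topic", "key-1", "hello"));
            }

            // Consumer: subscribe to the topic as part of a consumer group and poll.
            Properties c = new Properties();
            c.put("bootstrap.servers", "localhost:9092");
            c.put("group.id", "demo-group");
            c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
                consumer.subscribe(Collections.singletonList("demo-topic"));
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }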

Apache Kafka and Zookeeper clusters in Docker:

Main purposes of Kafka:

  • messaging systems
  • streaming data

Message – data unit

Batch – a collection of messages flushed to disk together for performance optimisation (after a certain number of messages is collected, or a certain time has passed).

Schema – defines message serialization by the producer and deserialization by the consumer; a popular choice is Avro. Schemas can be stored within Kafka separately from the actual messages.

Topic – a group of partitions (the default number of partitions and the default replication factor are both 1).

Partition – messages are appended at the end and deleted from the beginning. Message ordering is preserved only within a partition.

Producers send messages to Kafka. If no partition key is specified, messages are evenly distributed over the partitions of the specified topic. If a message needs to be written to a specific partition, a message key should be specified, or a custom partitioner can be written.
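
A sketch of a custom partitioner implementing the Partitioner interface; the class name and routing rule are hypothetical (it would be registered via the partitioner.class producer setting):

    import java.util.Map;
    import org.apache.kafka.clients.producer.Partitioner;
    import org.apache.kafka.common.Cluster;

    // Hypothetical rule: route "vip-" keys to partition 0, hash the rest.
    public class VipPartitioner implements Partitioner {

        @Override
        public int partition(String topic, Object key, byte[] keyBytes,
                             Object value, byte[] valueBytes, Cluster cluster) {
            int numPartitions = cluster.partitionCountForTopic(topic);
            if (key != null && key.toString().startsWith("vip-")) {
                return 0;
            }
            int hash = (key != null ? key : value).hashCode();
            return Math.floorMod(hash, numPartitions);
        }

        @Override
        public void configure(Map<String, ?> configs) {}

        @Override
        public void close() {}
    }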

Consumers – consume messages from a topic, as part of a consumer group. Several consumer groups can subscribe to a topic. Only one consumer from a group can read messages from a specific partition.

Kafka server – one or more brokers.

Controller – one elected broker from the cluster, responsible for administrative tasks like assigning partitions to brokers and monitoring broker failures.

Each partition is read/written through a single broker, called the leader for this partition (for consistency). A partition can be assigned to multiple brokers, but all brokers besides the leader just replicate it for failover purposes.

Messages within the partitions of a topic are preserved (retention) based on:

  • time (default is 7 days)
  • partition size (default 1 GB)

MirrorMaker replicates a cluster between datacenters, for failover purposes.

Broker important settings:

  • id
  • port
  • zookeeper
  • log directory

Topic important settings:

  • number of partitions created automatically
  • retention time
  • retention size

Brokers are stateless regarding the number of consumed messages; the most recent offset should be stored and provided by the consumer.

Log compaction – applies to messages with the same key: the most recent message is preserved, while previous ones can be deleted, as they are of less interest to consumers.

Message compression – a group of messages (a batch) is compressed:

  • on the producer, to minimize network traffic
  • on the leader, it is decompressed, an offset is assigned to each message, and the messages are compressed again to minimize storage size and speed up potential cross-cluster replication.

Replication – copying messages from the leader to followers, and notifying the producer about success/failure:

  • sync – the producer is acknowledged by the leader about a successful write after all followers have written the message to disk.
  • async – the producer is acknowledged by the leader about a successful write after the message is flushed to disk on the leader node only, before the followers have written it.

Writing producers:

  • a producer connects to a random broker from the cluster, gets information about leaders, and then connects a second time to send the actual messages to the leader broker for the specified topic.
  • messages can be sent:
    • synchronously, per message
    • asynchronously, per message or in batches (flushed by the count of messages collected, or by a time limit for collecting)
    • fire-and-forget

Producer important settings:

  • list of brokers
  • key and value serializers
  • type (sync, async)
  • acks count
  • partitioner class
  • compression codec (gzip, snappy, none)
  • batch size or time to wait for async batching mode
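
For the Java producer, these map roughly onto the following configuration keys; the values are illustrative, and the custom partitioner class is the hypothetical one sketched earlier:

    import java.util.Properties;

    public class ProducerSettings {
        static Properties producerProps() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092,broker2:9092");  // list of brokers
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("acks", "all");                                     // 0, 1, or all
            props.put("partitioner.class", "com.example.VipPartitioner"); // hypothetical custom partitioner
            props.put("compression.type", "gzip");                        // gzip, snappy, or none
            props.put("batch.size", "16384");                             // batch size in bytes
            props.put("linger.ms", "5");                                  // time to wait while batching
            return props;
        }
    }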

Reading consumers:

  • a consumer connects to a random broker from the cluster, gets information about leaders, and then connects a second time to read the actual messages from the leader broker for the specified topic.
  • high- and low-level APIs

Consumer groups:

  • the name is global across the cluster
  • adding a new consumer to a group triggers a rebalance (partitions are reassigned among the group's consumers), which can cause message delivery inconsistency. To mitigate: shut down all consumers and start them up again.

Consumed but uncommitted messages can be consumed once again (by another consumer) after rebalancing, if:

  • the consumer that consumed some messages crashed before committing
  • rebalancing happened for another reason, like adding a new consumer to the group, altering the cluster, adding brokers, or changing the replication factor

Committing offsets:

  • auto (default: every 5 seconds): the next poll() commits the offsets of the previous poll()
  • manually: commitSync() or commitAsync() commits all messages from the last poll() call
  • manually, at a particular offset (sync or async)
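
A sketch of the manual variants, assuming enable.auto.commit is set to false; process() is a hypothetical handler:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class CommitModes {
        static void consume(KafkaConsumer<String, String> consumer) {
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                process(record);
                // Committing a particular offset: commit the offset of the *next*
                // message to read, hence record.offset() + 1.
                Map<TopicPartition, OffsetAndMetadata> offsets = Collections.singletonMap(
                        new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1));
                consumer.commitAsync(offsets, null);
            }
            consumer.commitSync();  // commit everything from the last poll(), blocking
        }

        static void process(ConsumerRecord<String, String> record) {
            System.out.println(record.value());  // hypothetical handler
        }
    }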

Rebalancing can have special logic associated with it, by implementing the ConsumerRebalanceListener interface.
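
A sketch under the assumption that offsets should be committed before partitions are revoked:

    import java.util.Collection;
    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class CommitOnRebalance implements ConsumerRebalanceListener {
        private final KafkaConsumer<String, String> consumer;

        public CommitOnRebalance(KafkaConsumer<String, String> consumer) {
            this.consumer = consumer;
        }

        @Override
        public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
            consumer.commitSync();  // avoid re-consuming these messages after the rebalance
        }

        @Override
        public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
            // e.g. seek() to offsets stored externally
        }
    }

The listener is registered on subscription: consumer.subscribe(topics, new CommitOnRebalance(consumer)).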

Exiting the consumer loop: consumer.wakeup()
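
A sketch of the usual shutdown pattern: wakeup(), called from another thread (e.g. a shutdown hook), makes a blocked poll() throw WakeupException:

    import java.time.Duration;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.errors.WakeupException;

    public class ConsumerLoop {
        static void run(KafkaConsumer<String, String> consumer) {
            try {
                while (true) {
                    consumer.poll(Duration.ofMillis(100))
                            .forEach(r -> System.out.println(r.value()));
                }
            } catch (WakeupException e) {
                // expected on shutdown; fall through to close()
            } finally {
                consumer.close();
            }
        }
        // From a shutdown hook on another thread: consumer.wakeup();
    }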

High-level consumer API:

  • the current offset is stored within Zookeeper; this is handled automatically underneath.

Low-level (simple) consumer API:

  • the offset is stored by the consumer itself
  • the consumer contacts any broker to find the leader broker for the topic of interest

Multithreaded consumers should map: one partition – one thread – one consumer.

A consumer can subscribe to a topic or be assigned specific partitions.

Consumer important settings:

  • group id
  • id
  • zookeeper, with timeouts
  • client id (Kafka client identification)
  • offset autocommit interval
  • initial offset (auto.offset.reset: latest or earliest; older clients use largest/smallest)
  • time interval to wait for incoming messages before an exception is thrown

Administration, tools:

  • a tool for graceful shutdown of a broker (a new leader is selected proactively, or the shutdown operation fails if there is no candidate, to keep downtime minimal – milliseconds – and avoid data loss)
  • a tool for rebalancing leaders between available brokers
  • tools for populating a newly added broker with partitions (or moving existing partitions off a broker planned for decommission), since a broker is empty right after being added
  • a tool for increasing the number of partitions of a topic
  • a tool to list all topics, partitions, and other details of a specific cluster

Apache Zookeeper notes

Zookeeper handles coordination tasks between multiple activities (cooperation and contention), sharing state for a distributed application. Previously, the same problems were typically solved via a primary database.

It is essentially a remote tree of so-called znodes. Znodes can have data associated with them, in binary format. No partial writes are allowed; the data can only be added/replaced/deleted as a whole.

Types of tasks to solve:

  • master election
  • master/worker crash detection
  • membership/clustering
  • common data storage (small amounts)

Operations allowed:

  • create znode (with optional data)
  • delete znode
  • check if znode exists
  • get/set data on existing node
  • get znode’s children
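
A minimal sketch with the ZooKeeper Java client; the connection string, path, and payloads are placeholders:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class ZkDemo {
        public static void main(String[] args) throws Exception {
            // Connect to a standalone server or ensemble (3 s session timeout);
            // the lambda is the default watcher for connection events.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, event -> {});

            // Create a persistent znode with data (whole-value writes only).
            zk.create("/demo", "payload".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            // Check existence, read data, list children.
            Stat stat = zk.exists("/demo", false);
            byte[] data = zk.getData("/demo", false, stat);
            System.out.println(new String(data) + " " + zk.getChildren("/", false));

            // Conditional update (optimistic lock): fails if the version changed.
            Stat updated = zk.setData("/demo", "updated".getBytes(), stat.getVersion());

            // Conditional delete; passing -1 would skip the version check.
            zk.delete("/demo", updated.getVersion());
            zk.close();
        }
    }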

Kinds of znodes:

  • persistent – can be deleted via the delete operation
  • ephemeral – can be deleted via the delete operation, or automatically if the client that created the node crashed or simply closed its connection. No children are allowed here.
  • sequential – assigned a unique increasing integer.

Tracking znode state changes:

  • polling
  • notification: set a “watch” – a one-time trigger that reacts to a change of some znode. It should be reset each time it fires if the znode should be monitored constantly.

A znode has a version associated with it, which allows conditional operations (delete and setData) – i.e., optimistic locking.

Zookeeper operates as a standalone server or as a multi-server quorum (an ensemble); clients establish and keep a connection session with a server.

Quorum (an odd number) – the number of running servers needed to run an ensemble, and the number of servers that must store the data (to avoid the split-brain problem).

A session can be moved to another server within the ensemble, transparently to the client. A client’s commands within the same session are executed on the server in strict FIFO order.


Zookeeper can be monitored via JMX.

Java Spring Cloud notes


Main features:

  • configurations
  • service registration
  • service discovery
  • load balancing
  • circuit breakers
  • messaging

Configuration changes are not pulled by clients automatically, but polling can be turned on (@Scheduled, @RefreshScope annotations), or Spring Cloud Bus can be used (EnvironmentChangeEvent).
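
A minimal sketch of a refresh-scoped bean; the property name and endpoint are hypothetical:

    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.cloud.context.config.annotation.RefreshScope;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RestController;

    // Beans in @RefreshScope are re-created on a refresh event, so the
    // updated property value from the config server is picked up.
    @RefreshScope
    @RestController
    public class GreetingController {

        @Value("${app.greeting:hello}")  // hypothetical property, with a default
        private String greeting;

        @GetMapping("/greeting")
        public String greeting() {
            return greeting;
        }
    }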

Ribbon – client-side load balancing; @LoadBalanced for RestTemplate and for WebClient. Retry logic can be enabled.
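
A sketch of a load-balanced RestTemplate; the "user-service" id in the usage comment is a hypothetical name from the service registry:

    import org.springframework.cloud.client.loadbalancer.LoadBalanced;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.web.client.RestTemplate;

    @Configuration
    public class LoadBalancingConfig {

        // @LoadBalanced makes this RestTemplate resolve logical service names
        // from the discovery registry instead of raw host names.
        @Bean
        @LoadBalanced
        public RestTemplate restTemplate() {
            return new RestTemplate();
        }
    }

    // Usage: restTemplate.getForObject("http://user-service/users/1", String.class)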

Configuration is usually stored in Git (or another SCM); for debugging purposes it can be stored locally as a file. A Vault backend can be used, as well as a JDBC backend for configuration storage. Config client retry is available.

The discovery server tracks registered clients via heartbeats.

Discovery Client – @EnableDiscoveryClient: for Eureka, Consul, Zookeeper

Hystrix – circuit breaker
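
A sketch with the javanica @HystrixCommand annotation (assuming @EnableCircuitBreaker is present on the application); the downstream service name is hypothetical:

    import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
    import org.springframework.stereotype.Service;
    import org.springframework.web.client.RestTemplate;

    @Service
    public class RatingService {

        private final RestTemplate restTemplate;  // e.g. the load-balanced one above

        public RatingService(RestTemplate restTemplate) {
            this.restTemplate = restTemplate;
        }

        // On failure or timeout the circuit breaker invokes the fallback instead
        // of propagating the error; repeated failures open the circuit.
        @HystrixCommand(fallbackMethod = "defaultRating")
        public String rating(long id) {
            return restTemplate.getForObject("http://rating-service/ratings/" + id, String.class);
        }

        // The fallback must match the original method's signature.
        public String defaultRating(long id) {
            return "unrated";
        }
    }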

Hystrix Dashboard – allows tracking Hystrix clients individually, or via an aggregated stream from multiple Hystrix clients using Turbine.

Feign – REST client
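
A declarative client sketch with the OpenFeign starter (assuming @EnableFeignClients); the service id and endpoint are hypothetical:

    import org.springframework.cloud.openfeign.FeignClient;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;

    // Spring generates the HTTP implementation at runtime; "user-service"
    // is resolved through service discovery / load balancing.
    @FeignClient("user-service")
    public interface UserClient {

        @GetMapping("/users/{id}")
        String userById(@PathVariable("id") long id);
    }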

Archaius – external configurations

Spectator, Servo, Atlas – metrics

Cloud Stream – an application communicates with the external world through input and output channels.

Binders exist for Kafka and RabbitMQ.

Consumer group (like AWS’s target group) – a set of competing members; only one of them is given a particular message. But all groups subscribed to the same source of messages get a copy of the data. If no group is specified, each service is considered a member of an anonymous single-member consumer group.
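
A sketch with the annotation-based Cloud Stream model; the destination and group would come from configuration (e.g. the spring.cloud.stream.bindings.input.destination and ...input.group properties):

    import org.springframework.cloud.stream.annotation.EnableBinding;
    import org.springframework.cloud.stream.annotation.StreamListener;
    import org.springframework.cloud.stream.messaging.Sink;

    // Binds the "input" channel to a broker destination (a Kafka topic or
    // a RabbitMQ exchange, depending on the configured binder).
    @EnableBinding(Sink.class)
    public class EventListener {

        @StreamListener(Sink.INPUT)
        public void handle(String payload) {
            System.out.println("received: " + payload);
        }
    }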

Reactive streams are supported as well: RxJava, Reactor.

Aggregation – connecting inputs and outputs of applications directly together, to avoid load on the broker.

Binder – connection to a particular broker (RabbitMQ or Kafka)

Schema-based message converters (out of the box, only Avro is supported for the moment); the schema registry stores the schemas.

Spring Cloud Bus – delivers configuration changes or infrastructure management instructions to microservices (over RabbitMQ or Kafka).

Sleuth – distributed tracing. Span – a request + response pair (usually an HTTP request/response, but spans can also be created manually, e.g. around a transaction, or via an annotation on a Runnable). A set of spans – a span and all its children – forms a tree-like structure (a trace). Zipkin – trace visualisation.

Consul – Service Discovery, Control Bus and Configuration.

Spring Cloud Contract – like customer-written acceptance tests, to make sure that a service fits its microservice environment.

Spring Cloud Vault Config – stores configs and secrets for microservice application (HashiCorp Vault).


Java Spring Boot notes

Important Maven artifacts:

  • spring-boot-starter-parent (versions)
  • spring-boot-maven-plugin (executable jar)


Main features:

  • BOM, bill of materials (spring-boot-starter-parent or spring-boot-dependencies)
  • autoconfiguration (@EnableAutoConfiguration, @ComponentScan, @Configuration – or @SpringBootApplication combining them, see the sketch after this list), with the ability to exclude and redefine configuration, via annotation or YAML config
  • creating own configurations
  • embedded servlet (3.1) containers: Tomcat, Jetty, Undertow (plus Netty for the reactive stack)
  • developer tools (automatically removed when running the fully packaged application): caching, automatic restart/reload, remote automatic restart/reload
  • starters
  • actuators
  • CommandLineRunner
  • custom health checks and info
  • custom metrics
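
A minimal application sketch showing the composed annotation and a CommandLineRunner:

    import org.springframework.boot.CommandLineRunner;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.context.annotation.Bean;

    // @SpringBootApplication = @Configuration + @EnableAutoConfiguration + @ComponentScan.
    @SpringBootApplication
    public class DemoApplication {

        public static void main(String[] args) {
            SpringApplication.run(DemoApplication.class, args);
        }

        // CommandLineRunner beans are invoked once the context is ready.
        @Bean
        public CommandLineRunner startupReport() {
            return args -> System.out.println("application started");
        }
    }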


Auto-configuration for:

  • caching – function return values are cached (it is possible to update or evict entries); see the sketch after this list
  • messaging
  • WebClient is preferable to RestTemplate (at least it is reactive)
  • bean validation
  • metrics
  • HTTP tracing (last 100 calls)
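
A sketch of the caching support mentioned above; the cache name and service are hypothetical, and a cache provider is picked up by auto-configuration:

    import org.springframework.cache.annotation.CacheEvict;
    import org.springframework.cache.annotation.Cacheable;
    import org.springframework.cache.annotation.EnableCaching;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.stereotype.Service;

    @Configuration
    @EnableCaching  // activates the caching auto-configuration
    class CacheConfig {}

    @Service
    class PriceService {

        // The first call computes the value; later calls with the same
        // argument return the cached result.
        @Cacheable("prices")
        public double priceFor(String sku) {
            return expensiveLookup(sku);
        }

        // Removes the entry so the next call recomputes it.
        @CacheEvict("prices")
        public void invalidate(String sku) {}

        private double expensiveLookup(String sku) {
            return 42.0;  // placeholder for a slow call
        }
    }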

Java framework: Spring Core

Each bean has an ID in the scope of its context (by default it is a modified class name; it can be specified explicitly).

Container implementations:

  • bean factories (simple)
  • application context (advanced)
    • internationalization
    • event publication
    • resource management
    • life-cycle events

Popular application contexts:

  • AnnotationConfigApplicationContext – Java-based application context
  • AnnotationConfigWebApplicationContext – Java-based web context
  • ClassPathXmlApplicationContext
  • FileSystemXmlApplicationContext
  • XmlWebApplicationContext

Bean lifecycle:

  • instantiate
  • set properties
  • setBeanName()
  • setBeanFactory()
  • setApplicationContext()
  • postProcessBeforeInitialization() (from BeanPostProcessor interface)
  • afterPropertiesSet() (from InitializingBean interface)
  • custom init method
  • postProcessAfterInitialization() (from BeanPostProcessor interface)
  • destroy() (from DisposableBean interface)
  • custom destroy method
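
A sketch of a bean observing a few of these callbacks:

    import org.springframework.beans.factory.BeanNameAware;
    import org.springframework.beans.factory.DisposableBean;
    import org.springframework.beans.factory.InitializingBean;

    public class AuditedBean implements BeanNameAware, InitializingBean, DisposableBean {

        private String beanName;

        @Override
        public void setBeanName(String name) {  // called after properties are set
            this.beanName = name;
        }

        @Override
        public void afterPropertiesSet() {      // runs before any custom init method
            System.out.println(beanName + " initialized");
        }

        @Override
        public void destroy() {                 // runs on context shutdown
            System.out.println(beanName + " destroyed");
        }
    }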

Bean configuration:

  • explicit XML
  • explicit Java config
  • implicit (component scanning and autowiring)


Key annotations:

  • @Component/@Named – marks a bean (more specific: @Service, @Repository, @Controller)
  • @ComponentScan – turns on component scanning (XML: <context:component-scan>)
  • @Configuration – marks a Java configuration class – the current package and subpackages will be scanned for @Component-marked classes
  • @Profile/@ActiveProfiles/@IfProfileValue
  • @Conditional – if condition is true – bean gets created
  • @Autowired/@Inject – marks a destination for bean wiring (if no bean can be wired, an exception is thrown by default, or a null reference can be left – this must be enabled explicitly; if more than one matching bean is found, an exception is thrown)
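
A sketch combining these annotations with the AnnotationConfigApplicationContext mentioned earlier:

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.context.annotation.AnnotationConfigApplicationContext;
    import org.springframework.context.annotation.ComponentScan;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.stereotype.Component;

    @Configuration
    @ComponentScan  // scans the current package and subpackages
    class AppConfig {}

    @Component      // bean id defaults to "engine"
    class Engine {}

    @Component
    class Car {
        final Engine engine;

        @Autowired  // constructor injection
        Car(Engine engine) {
            this.engine = engine;
        }
    }

    class Bootstrap {
        public static void main(String[] args) {
            AnnotationConfigApplicationContext ctx =
                    new AnnotationConfigApplicationContext(AppConfig.class);
            Car car = ctx.getBean(Car.class);  // fully wired bean
            ctx.close();
        }
    }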

Destination for wired beans:

  • constructor
  • setter
  • method

What can be wired:

  • references on other objects/beans
  • literals
  • collections (list, map, set)

Mitigating ambiguity:

  • @Primary – marked bean will have precedence over other matching beans
  • @Qualifier – an additional level of matching between @Component and @Autowired

Scoping – the @Scope bean annotation, defines the instantiation mode:

  • singleton (default) – only one instance is created
  • prototype – each time new instance
  • session – WEB specific
  • request – WEB specific
  • global session – for portlet-based WEB application
  • thread
  • custom

WEB-specific beans are wired through a proxy (a Java dynamic proxy if an interface is available) or through inheritance (CGLib).

Spring contexts can be nested to properly separate logic domains; beans from a child context can refer to beans from the parent context.

A Spring bean’s method can be replaced (via XML configuration) by another bean’s method.

Beans can have multiple names (but a single ID), and aliases to names/IDs.

SBT extract


  1. Short, concise DSL, can be extended by pure Scala code
  2. Interactivity
  3. Background execution
  4. Parallel execution by default (restrictions on CPU, network, and disk usage can be specified)
  5. Scala REPL integration
  6. Incremental compilation
  7. Default folder structure (can be adjusted)
  8. Defined workflow (can be adjusted or redefined)
  9. Type safety
  10. Direct dataflow between tasks is supported
  11. Simple script entity hierarchy – just tasks and settings; some are already defined, but it is easy to add custom ones
  12. Cross-building (for several Scala versions in parallel)
  13. Extensible via plugins

Folder structure:

  • <root>/project/plugins.sbt
  • <root>/project/
  • <root>/build.sbt

SBT tasks, the executable items, can depend on other tasks (using another task’s return value inside the body) and can accept user input.

  • Declare a key: val keyName = taskKey[keyType]("key description")
  • Assign a value: keyName := …
  • Get a value: keyName.value

SBT setting – just a named value; it can depend only on a literal or on the value of another setting. The exact value is determined at script startup. It cannot depend on a task’s return value.

  • settingName := settingValue – assignment (redefines, if already defined)
  • settingName += settingValue – appends a single value to a Seq
  • settingName ++= settingValue – appends a Seq to a Seq


Scope axes for keys:

  • project
  • configurations – namespaces for keys (defaults: Compile, Test, Runtime, IntegrationTest)
  • task
  • global – the default, if not specified

Multi-project builds – can be declared in a single sbt file or in multiple files (one per project). An abstract parent project can contain common settings, which are added to or redefined by concrete child projects. dependsOn – defines a dependency between projects.

Sources (compile/test configurations):

  • location settings: javaSource, resourceDirectory, scalaSource.
  • filtering: includeFilter, excludeFilter.
  • Managed: autogenerated by SBT or added explicitly into the build.
  • Unmanaged: created outside of SBT, written by the coder.

Dependencies (compile/test/runtime):

  • internal (between projects) or external (on some lib outside – maven / ivy)
  • external can be: managed (maven / ivy) or unmanaged (jars from lib folder)
  • resolvers – a setting that can be extended with additional external Maven/Ivy repositories.

Dependency format: ModuleID – "groupID/organisation" % (or %%, which appends the Scala version) "artifactID/product" % "version" (optional: "test", "provided")

  • exclude – the specified transitive dependency will be omitted (additional rules can be applied)
  • classifier – additional parameters, like JDK version
  • intransitive or notTransitive – do not load transitive dependencies
  • withSources
  • withJavadoc
  • externalPom
  • externalIvy

Forking – executing Test or Run in a separate JVM; custom settings can be applied.

Session – in-memory SBT configuration; it is lost after a reload, but can be saved into the SBT file.

SBT script troubleshooting: streams.value.log

Extending SBT: commands and plugins

Publishing artifact: publishTo