A note about complexity

I came across this very interesting reading about complexity: https://iveybusinessjournal.com/publication/coping-with-complexity/

What was the most useful advice that I’ve found there was the idea of improvisation. So far in the context of professional work I had rather negative connotation of improvisation. I’ve seen it as a way of hiding lack of preparation or knowledge that was rather degrading expected quality.

But I was wrong. Improvisation turns out to be a great tool that anyone can start using relatively easy to deal with complexity. The inspiration is taken from theater improvisation and music jam sessions where the play is based on “yes, and…” rule.

Basically the rule says that whoever enters the show, has to build his part based on what others have already said or played. Not trying to negate or start something irrelevant. Participants are expected to continue and extend the plot that was already created.

I find this rule very useful when working on the complex projects. I can recall many situations in projects when there seem to be so many options with so much uncertainty that it seemed impossible to progress in the right direction. Those situations can be unblocked by improvisation, where we are allowed to progress based on knowledge that is limited. And in VUCA world we all have limited understanding about any subject that is non-trivial. The key is to identify the minimum set of facts required to progress and create progress based on your own expertise on top of what was already created. The facts from which to start, are identified by the skill of listening to others, not focusing solely on your own part.

The rule of not negating others work is the key factor here. You are allowed to suggest turns into left or right but it should still be a progress of the same journey. We should not start from a completely new point on our own, as it creates even more VUCA.

By using this method we can progress even when we are not sure where to go (like in machine learning). We can use joint force to explore and move faster. While we move on, we will make mistakes but also crate chances for victories. Moving on is the key. Staying in palce paralyzed by hard decision is something that may kill the project. And negating or ignoring what was already said and done, does not create progress.

In VUCA world, being certain that we are on the optimal path is impossible. What is possible, is exploration. If we are focused and making every small step based on competent knowledge, then we can expect that partial results will be achieved on daily basis and eventually also bigger goals are very likely to be met. Probably in a way that was not initially expected.

What software architecture is about?

Martin Fowler has assembled great material explaining what is software architecture: https://martinfowler.com/architecture/

My key takeaway from this reading, is that software architecture is about making sure that adding functionality to software will not become more and more expensive as the time goes.

In the other words, the development effort that is done today should also support future innovation. We often hear that motto in business to focus on strengths and build on top of that. One of the strengths that an organization has, may be its software. So to follow the motto in the software teams, it should be much easier to build new products and services on top of the existing codebase and infrastructure, rather than starting from the scratch. Existing codebase should be a competitive advantage, not a burden.

But is it always like that? Organizations in some cases come to the conclusion that it makes more sense to start a product from the scratch or rewrite existing software to support new functionalities. Does it mean that the old systems has wrong software architecture?

Yes and no. Old architecture could be great and efficient to support the business goals of the past, but may be not suitable in the context of new reality and new business goals. Sometimes there must be a brave decision made to switch to the new architecture, to be able to stay on top. Otherwise new competitors that do not have to carry the burden of historical decisions, may grow much faster in the new reality and eventually take over the market. Proper timing of such technological shifts may be crucial for organizations to perform well.

Changing environment does not have to mean the change of the business model or market, but may also mean availability of new technologies or organizational changes.

Nevertheless, those kind of radical architecture changes should happen as seldom as possible as there is always a huge cost and complexity behind such shifts. Architecture should aim to predict likely scenarios and aim to be open for changes. There is always at least a few options that are supported by current constraints. Options more open for change and extensibility should be always preferred if the cost is comparable.

Hangfire.io and .NET Expressions

I was troubleshooting an interesting bug recently thanks to which I’ve learned a bit more about Hangfire.io and expressions in .NET.

The situation was that Hangfire dashboard looked correctly, we had all jobs registered as expected. But what was actually executed by the scheduler for each job was same logic, which was supposed to be executed only for the last job. Nasty bug. We were not yet on production with hangfire.io, but still it was quite an unexpected behavior to see.

Reason was that we were wrapping each job in a class called JobRunner. This class was adding some generic functionality to update UI progress bars when jobs are running. Our code looked like that:

JobRunner runner = new JobRunner(myJobClass);
RecurringJob.AddOrUpdate(myJobClass.JobId, () => runner.Execute(), myJobClass.CronExpression);

Crucial thing to understand about Hangfire is that the what we pass to AddOrUpdate method is not a function to execute but an Expression describing the function to be executed. See this thread for difference between Expression<> and Func<>.

runner instance is not kept in memory or serialized. When Hangfire executes the job, it needs to create the instance by calling the constructor of given type. Constructor arguments are resolved from IoC container. In our case constructor argument was of type IJob. This interface was providing properties like JobId or CronExpression. So what was happening when EVERY job was running, was firsts implementation of IJob found in the container injected into a JobRunner. For each job same implementation of IJob was injected. And here we are – all jobs magically are executing same logic…

Now it seems quite obvious but it was necessary to learn couple of rules along the way to understand that behavior. It seems to be a common misunderstanding as there is even a comment about people making that mistake in hangfire.io source code, see Job.cs .

I hope this case study will help someone to avoid similar traps.

Lambda architcture λ

I’ve been doing some research recently about architectures for large scale data analysis systems. An idea that appears quite often when discussing this problem is lambda architecture.

Data aggregation

The idea is quite simple. Intuitive approach to analytics is to gather data as it comes and then aggregate data for better performance of analytic queries. E.g. when users do reports by date range, pre-aggregate all sales/usage numbers per day and then produce the result for given date range by making the sum of aggregates for each data in the range. If you have let’s say 10k transactions per day, that approach will create only 1 record per day. Of course in reality you’d probably need many aggregates for different dimensions, to enable data filtering, but still you will probably have much less dimensions than the number of aggregated rows.

Aggregation is not the only way to increase query performance. It could be any kind of pre-computing like batch processing, indexing, caching etc. This layer in lambda architecture is called “serving layer” as it’s goal is to server for the analytical queries as a fast source of information.

Speed layer

This approach has a significant downside – aggregated results are available after a delay. In the example above the aggregated data required to perform analytical queries, will be available on the next day. Lambda architecture mitigates that problem by introducing so called speed-layer. In our example that would be the layer keeping data for current day. The amount of data for a single day is relatively small and probably does not require aggregates to be queried or can fit into a relatively small number of fast and more expensive machines (e.g. using in-memory computing)

Queries

Analytical queries combine results from 2 sources: aggregates and speed layer. The speed layer can be also used to crate aggregates for the next day. Once data is aggregated it can be removed from speed layer to free the resources.

Master data

Let’s do not forget that besides speed layer and aggregates, there is also a so called master data that contains all raw, not aggregated records. This dataset in lambda architecture should be append-only

Technology

This architecture is technology-agnostic. For example you can build all the layers on top of SQL servers. But typically a distributed file system like  HDFS would be used as master data. MapReduce pattern would be used for batch processing the maser data. Technologies like Apache HBase or ElephantDb would be used to query the serving layer. And Apache Storm would be used for the speed layer. Those choices could be quite common in the industry but technology stack can vary a lot from project to project or company to company.

Sources

http://lambda-architecture.net/
https://amplitude.com/blog/2015/08/25/scaling-analytics-at-amplitude
https://databricks.com/glossary/lambda-architecture

 

Learning from work experience vs self-studying

I’ve share in one of my articles that is is estimated that only 10% of the learning happens a the formal training. The remaining 90% comes from everyday-tasks and learning from coworkers. Formal training includes things like self-studying from online resources which I would like to emphasis in this article. Since it’s only 10%, can it be treated with low priority?

Self study – 10% of the time

I do not know how those numbers were calculated. I’ve seen those numbers in on of the managers training. In my opinion those numbers can be quite accurate when we think about the time that we are able to spend on learning. Most of the time we spend at work and this is where we have the biggest opportunity to learn. Work builds real experience and practice. The knowledge becomes not just theoretical but also tested in real-life. We are able to come up with our own use-cases, examples and experience practical challenges.

Studying vs practicing vs teaching

When we think about the levels of knowledge it may be illustrated as: student → practitioner → teacher. Self-study brings you only to a student level. Good courses include hands-on labs, so that you can experience also some practice. But training exercises are always simplified and cover only simple “happy paths”. They do not include production-level challenges. It’s like fight with a shadow vs fight with a real opponent in martial arts.

Self-study – the impact

But does it mean that self-study can be ignored as it contributes only 10% to your learning? Absolutely not. It’s 10% in terms of the time but can be much more in terms of the impact. We do not always have the comfort to learn new things at work. Especially when you are an architect or in general technical lead, your company expects that you are the one who teaches others, who knows the new trends and who is up to date with latest technologies. You have to do a lot of self-study so that the whole company does not settle down with old solutions.

Online resources for self-studying

I was recently writing about DNA program which is a great example of self studying materials for software architects. Recently I’ve also singed-up for Google Cloud Platform Architecture training with Coursera. After doing the first module and getting this certificate I can definitely recommend Coursera GCP trainings. The best thing is that the training include a lot of hands-on labs with real GCP resources provisioned via Qwiklabs platform.

I have to admit that the online training possibilities that we have now are amazing. For a small price you can have access to resources that are often of much better quality than (unfortunately) some lectures at stationary universities.

Summary

So, go and self-study! Then use it at work and build a better world 🙂

Monorepo in Azure DevOps

When working with projects using microservices architecture I opt for monorepo pattern. A single repository containing source code of all the services. It facilitates knowledge sharing between teams and encourages more unified programming style. Microservices give a lot of technological freedom, but this freedom should be used wisely. Common standards across the organization are still important.

Working with a single repository forces all the programmers to at least have a glance at the list of commits from other teams when making git pull. This channel of knowledge sharing can be very beneficial.

Despite having a single repository, we still need separate pipelines and policies for separate directories. Azure DevOps facilitates it by allowing directory path filter in crucial places:

  • path filter in build trigger settings
  • branch policies
  • required reviewers

It allows to setup a clear ownership of different parts of the repository and apply different pipelines to different parts of the repo while still having all the benefits of monorepo pattern.

Kubernetes basics for Docker users

The aim of this article is to build a high-level mind-map and understanding of concepts like Kubernetes, Helm and cloud-native applications. The assumption is that you have worked already with Docker.

Docker

So you know Docker. Docker container is like a virtual machine but lighter. Why lighter? It does not have operating system inside. It relies on host operating system. Docker adds only applications layer on top. You can run many Docker containers in a single server / virtual machine.

Docker Compose

You may have also heard about Docker Compose. For example when your application consists of .NET Core web server, MySQL database, and Redis cache – you can define 3 separate containers for it. To run all of them in a virtual network you define a docker-compose.yml file. Then 3 of them can be run with a single docker-compose up command.

Scaling the application for production

Now let’s imagine you want to scale your application. You introduce a load balancer and 2 additional web server containers. You are also adding a RabbitMq container and an instance of a background processing worker. There are also other requirements for production environment:

  •  containers need to be distributed across many servers
  • containers which do not need much resources can be run together in same server to use provisioned servers in a cost-effective way
  • when a container is not responding it should be restarted
  • when connectivity with container is lost it should be replaced with a new instance
  • number of containers should autoscale
  • number of servers should autoscale
  • new containers added to this environment should be auto-discovered
  • it should be possible to mount and share storage volumes in a flexible way

Things are getting complicated. To achieve all that requirements we must write a lot of code to monitor and manage infrastructure. This is called containers orchestration. Or… we can use Kubernetes (k8s) which has all that features and more built-in!

Kubernetes concepts

Pod

Abstraction of a single app. It can have one ore more containers. If containers are tightly coupled they may be placed in same pod. All containers inside a pod share storage volumes. A pod is an unit of deployment and scalability. Each pod has IP address assigned, so there is no need to care about port conflicts.

Node

This is how k8s names physical servers or virtual machines hosting the containers.

Cluster

Set of nodes available for k8s. Example cluster would be 4 nodes and 20 pods running on them, managed dynamically by k8s.

Namespace

All objects within a cluster can have a namespace. It allows to create many virtual clusters inside single cluster. It is useful for example to model many independent environments for staging in a single k8s environment.

Service

Since each pod has it’s own IP and pods can be started and shut down, it would be not easy for other pods to keep track of constantly changing IPs. That’s why we have Services in k8s. Service groups a set of pods by given labels. Pods may come and go, but as long as labels criteria are matching, all matching pods are automatically tracked. Service has a logical name assigned, so that other pods can use this name to communicate with the pods behind the service. Service routes and load-balances the traffic dynamically to relevant pods.

Services can be also used to point traffic to an endpoint outside k8s cluster. In this case instead of defining a service by providing pod labels selector, it is necessary to define an IP of the service backend.

Ingress

Ingress is used to expose services to outside word via http(s). It can also terminate SSL.

Deployment

Deployment specifies the pod and the number of its replicas that should be run. Deployments controller is responsible for rolling out updated pods (e.g. with updated container image). It starts new pods, shuts down old pods and then keeps monitoring them to make sure that desired number of replicas is run.

StatefulSet

StatefulSets are used to manage containers which contain data. Containers that have data cannot be just removed and replaced as we cannot loose its data. Pods in StatefulSets have sticky identities and persistence storage assigned. Persistent storage is not deleted when pod is deleted.

Worth to mention that in many scenarios managing persistence would be simpler outside Kubernetes cluster. Many cloud providers have sql an noSql as-a-service offerings which usually takes care about things like backups, availability and replication.

Monitoring containers

Each container has a liveness and readiness probe defined. Typically those are HTTP endpoints called by Kubernetes to check if container is healthy. K8s calls the endpoints periodically, e.g. every 10 seconds depending on configuration. When liveness probe fails container is restarted. When readiness probe fails traffic is not anymore routed to this instance. Health check endpoints must be implemented in every service. Simple liveness endpoint could be just returning status code 200. Readiness endpoint could additionally check things like database connection, cache readiness or amount of currently used resources to check if service is really ready to process new requests. When readiness endpoint detects a problem that could be solved by restarting the container, it could potentially switch a variable to force liveness endpoint to fail causing a restart.

Helm

An application targeting Kubernetes is configured in a set of yaml files for deployments, services etc. Those sets of yaml files can grow pretty complex. We also need some versioning tool for them. A common approach is to package all k8s files into a Helm package called a chart. What is being deployed to k8s cluster is a chart.

Cloud native applications

Kubernetes is an operating system for cloud native applications. Cloud native applications are usually designed in microservices architecture and aim to be cloud-agnostic – can run in many public clouds or in a private/hybrid cloud. There is already a lot of predefined Helm charts available in public repositories if you’d like to pull scalable k8s setup e.g. for Cassandra or Prometheus in your cloud-native setup.

Sources

DNA – week one

What is DNA?

DNA comes from Polish name “Droga Nowoczesnego Architekta” – it means: “The road of modern architect”. English abbreviation would be “TROMA”. This one of very little examples where Polish is simpler than English 😉

This is a 19-weeks course crated by 3 experienced architects who are also trainers. They are supported by popular blogger Maciej Aniserowicz who is the publisher of the course. He made the program well known thanks to the broad IT audience in Poland following his devstyle.pl blog and social media channels.

Course is dedicated to senior software developers and architects. The goal is to propagate modern patterns in software development. Presented theory is backed up by cases from real-life. It is meant to be the “killing feature” of the course. In addition to that there are also practical exercises after each week. The idea is  that participants get not only theory but also examples from real life and exercises to practice.

Visit droga.dev to learn more about DNA.

First impressions

I was following more ore less what was published by DNA mentors as “teasers” in recent weeks before full program became available. I was prepared to expect good content and I was not disappointed. The content is solid and also the way of presenting it is fully professional.

Good content was not a surprise for me after buying access to the course. I believed that those guys will do great stuff. I knew what I am buying. When you go to good restaurant you expect to get good food, no excitement here. But there was one thing that was a bonus that I was not expecting to be so meaningful: the community.

Joining DNA community on Slack was pure fresh excitement. This is a closed group for all course members. Joining this group was like joining tens of new teams in different companies in one day. People share and discuss how they approach various topics in their projects. This is great, tons of information and different points of view! You can read about an useful tool, online resource or interesting approach to specific problem. I expect that the value of content generated by the community may exceed the value of the course content itself as the time goes. Of course it’s Slack so no structure is present here and it will be impossible to find anything after couple of weeks 😉 But among all possible distractions you can have turned on, this Slack community is the beneficial one and can really broaden your mind

Eye opener #1

I like to search for analogies between constructions and software industry. In general there is a lot of analogy, but one of them seemed harmful to me: software developer is a creative role whereas constructions developer most often is doing a repetitive job based on detailed project. DNA explained that discrepancy – software developer should not be treated as construction worker but as constructor.

Developers write code which is a kind of design. In software the role of worker is taken by compiler and build pipeline. Having code written, we can then start many instances of running program created almost automatically. How beautiful it would be if we had “compilers” for building construction projects that would automatically execute design documents to create a real building?

Disclaimer to all construction workers who read that: if you don’t have specs from architect/designer/constructor, then you are also a creative worker creating something out of nothing 🙂

Looking forward for more eye openers when continuing with the course 🙂

Divide, eliminate and conquer

Ancient Romans were saying “divide and conquer”. I really like this rule both in system design and project management. To adjust it to contemporary times which are full of distractions I would also add “eliminate” phase. It helps to avoid conquering false targets and allows to focus only on biggest value goals.

Please find below some examples of what this rule means in practice.

Divide

  • Split complex problem into a number of smaller problems
  • Isolate application modules
  • Assign well defined responsibilities to teams and individuals
  • Break up project into phases
  • Define sub-tasks for stories to implement
  • Set clear SMART goals

Eliminate

  • Identify problems that can be ignored
  • Reduce scope
  • Remove distractions
  • Ask “why?”
  • Focus on goals, not procedures

Conquer

  • Execute goals with deep believe
  • Don’t give up
  • Improve
  • Know when is enough
  • Celebrate success

“Divide, eliminate and conquer” approach often helped me when I was feeling overwhelmed by a difficult problem or had to choose which path to go. I believe it is useful in wide range of situations starting from fulfilling New Year’s resolutions up to setting and executing enterprise strategy.

Pros and cons of microservices architecture

Cons

Microservices bring many problems that do not exist in monolithic applications or are much easier to solve. What challenges will you have?

  • Data consistency is difficult to achieve. Distributed transactions result in poor performance. The trade-off is eventual consistency but asynchronous communications is often not easy to accept
  • Error handling becomes more complicated because of distributed nature of the system. Failure in one service may require to trigger undo operation in another service
  • Synchronous dependencies between services reduce performance and availability. Calling service waits until called service will do its stuff. In case of synchronous communication SLA of the system is multiplication of its micro services (0.99 SLA in service one x 0.99 SLA in service 2 = 0.9801 SLA of the system composed of 2 services)
  • Remote calls between services are over network, usually HTTP, so data serialization is required what reduces performance
  • Bigger effort must be put into integration tests
  • Fault tolerance needs a lot of attention
  • Fully automated build and release pipelines are a must because of large number of deployment units. Manual management would be too time-consuming and too error-prone
  • Application monitoring, metrics, alerts and self-healing mechanisms must be implemented to successfully manage production environments. It is absolutely counter-productive to manually search logs in tens or hundreds of running application nodes

Pros

Why to take the burden then? When it is beneficial to choose micro-services over monolithic app?

  • Divide and conquer principle. System split into smaller highly independent parts is easier to manage and maintain. It is like splitting large complicated problem into many smaller and easier problems
  • Teams can be smaller and work more independently. Ownership is bigger because teams feel responsible for their parts. In large monolithic app responsibility is often diffused
  • Failures affect smaller parts of the system, other components may be still functional
  • Each system part can be planned, developed and deployed quite independenty.
  • Different parts of the system may use different technology what allows to update technology stack more often and use different tools for different problems
  • Each part of the system can scale independently. It is easier to identify bottlenecks
  • With mature dev ops culture it is easy to spin up a new service and provision new computing resources. It is often much more effective and less risky than modifying existing service
  • Microservices can be a medicine pill for tired teams. Sometimes teams just need to start from the scratch, leave old buggy system aside for a while and do something smaller that they can be proud of and have fun with

Summary

Microservices are technically harder to develop but in some situations it pays off to take that extra effort. This architecture would be a more natural direction for larger organizations with proven business model and mature teams who are committed to devops culture. For smaller products and start-ups which are still discovering their business model the extra burden brought with microservices may not help but be counter-productive.