March 2019 - Tech Lead

A story about running a software house

The beginning

In 2013 I’ve co-founded Epicode software house. Before starting the company we were doing after-hours projects with my business partner. He was handling sales and design. I was handling development. This is also how we split the duties in Epicode. He became CEO and I was CTO. At the beginning, it was quite an easy role in a company of 2 employees.

Working as we liked

We would probably not start Epicode as a full-time job if not the big customer open for cooperation: Grupa Pracuj where we both worked before. Grupa Pracuj was just starting a new startup: emplo.com and needed to build a development team for it. They become our biggest customer and shareholder. It was a good deal. We had financial security and independence. We could do the software business as we liked. Our customers including Grupa Pracuj cared only about results.

Quick start and early success

I’ve relocated back from Ireland to Poland to start working full time at Epicode. In July 2013 there was me, my partner, emplo.com project starting and 3 other small ecommerce customers we had as a result of our part-time moonlight work before Epicode. Good start to grow. Soon we started hiring and rented an office. After 2 years we were over 20 people: developers, testers and designers.

Growing challenges

Next 3 years has shown us exactly the challenges of growing company over 30 employees. For that you need to delegate and to have middle-management. We were not ready for that. As local “superheros” we felt that we must be directly involved in every project to be sure that it will succeed. We were focusing on project-related work, not on growing the company. There was also a constant feeling that if we hire more people we are not sure if we will manage to get enough customers. On the one hand we always had a big workload, but we were never sure if this workload is stable enough to hire more people.

When running a software house, the biggest challenge is to have proper balance between new projects coming-in and number of hired people. The only cost that matters are salaries. If you hire too many people you can quickly produce lost. Crucial part is to have predictable sales pipeline so that you can plan at least couple of months ahead.

Learning sales

We did an effort to hire a sales person but it has only shown how unprepared we were to scale. That person reached out to over 500 potential customers but didn’t even manage to setup a single meeting. It turned out that sales person did not understand what we were really selling. For us it was obvious: we were listening what problems customer wants to solve and then we were proposing how we would approach that by providing appropriate software solution. It usually worked. Our sales meetings were not about sales, it was about free consultancy after which the customer simply wanted to work with us. But this approach did not scale without proper staff training, marketing content and well-prepared case studies. With an ad-hoc sales person concentrated solely on rates that we offer and deadlines that we can meet, the effect was terrible. We were treated like spammers.

Goal bigger than sales – do what you like doing

But the reality was that we were not convinced if we would really like to focus on marketing, sales and growing the company. Once we have got a project we were sinking into delivering it, forgetting about searching the next thing to do after current project will be done. When I was working on non-technical stuff I had a feeling of loosing time and energy on something that is not really my pair of shoes. Using social media for promotion? Blah… We are hackers, developers, creators, not marketers. If somebody does not want to work with us, it’s them who should regret. I was not realising how arrogant that way of thinking was.

Our company started to drift into games direction and developing our own game. Many people we had onboard wanted to work on own product which would give much more freedom on how to work. Not to relay solely on B2B sales, contracts and timelines, especially that we were not mastering that processes. Developing a game had a chance to become a high-margin business – with higher risk but also with higher gratitude if it will turn out to be successful.

But for me personally it was not something I wanted to do. I was excited about solving real-life problems that companies or end-users have. I didn’t want to close myself in a dark room coding a virtual world. My preference was working with people and business processes.

There is no progress without change

After 5 years at Epicode I decided to go my own path. We had great time at Epicode with plenty of success stories and delivered projects. But I needed a fresh context and new fuel. Now I am excited to be a part of scale-up process at Mash.com. Scaling is something that failed for us in Epicode, that’s why I am so enthusiastic about being a part of it at Mash.

Technology and business

I’d like to share 2 sentences which came to my mind this evening…

1. Business is easy to understand but hard to master

So many business coaches, so many books and success stories. They say: just do it! But in practice it’s so hard to build your own product which would earn 1 little dollar a day.

2. Technology is hard to understand but easy to master

Technology is often abstract or complicated. A lot of jargon around it. It’s not easy to understand that jargon without being a part of it. On the other hand the rules are simple. You have the specs, the docs, just learn it and it’s almost guaranteed that it will work if you follow the instructions carefully. Everything is predictable and well defined – totally not like in business.

Where do I see myself?

I understand business, but I do not master it. I understand technology and master significant areas of it. In my professional journey I am looking for environments when I can work closely with business and people who master it. My role is to add value by managing and building technical solutions supporting given business model.

Actor model programming in Orleans framework

I’ve spent some time recently playing around with Orleans Framework. It’s an alternative to Akka.NET offering similar actor-based architecture.

What is actor model?

Actors are called Grains in Orleans. Actor or Grain is a class representing some business entity. It can be instance of a Player, Trader, Customer or Account depending on domain topic. In practice Grains are implemented like normal C# classes. A constraint that we have is that all methods have to be asynchronous to keep all the communication between grains non-blocking.

Local state

Grains have local state. It means that they live in memory between requests to system. It gives big performance benefits compared to creating entity instance based on data from database for each request. State can be persisted to database to avoid loosing data on system restarts. As programmer you can invoke saving to storage any time when state has changed.

Horizontal scalability

Orleans can run in cluster using many servers. Framework can place grains in all nodes across the cluster, so that each grain is located only in a single node. There can be some exceptions from that rule: when node crashes framework may be not sure when exactly grain has finished its processing. This problem is in general called a split-brain in computing. But this is an edge-case which falls into error handling strategies, overall assumption is that each grain is activated only once.

Grains are exchanging massages between each other. That messages use super-fast .NET binary serialization. Messages can go over network if 2 grains are on separate nodes. So it is important to make grains not too chatty if you care about performance, and you probably care if you are interested in frameworks like Orleans 🙂

Possibility to run Orleans in a cluster gives beautiful linear scalability.

What problems is actor-model good for?

Actor model is suitable when you have a lot of objects communicating with each other. Example use cases:

Real-time trading systems
Multiplayer games
IoT applications connected to many devices

Grain activations should be distributed randomly and decentralized. Actor-model is not suitable for batch processing or centralized design where some entities have to process most of the requests (so called hot-spots).

Event sourcing

Actors are a good fit to match with event sourcing pattern. Grain supports that pattern by JournaledGrains. But here comes a disappointment. Available storage mechanisms for event log persistence are poor. The only built in storage provider saves event log for given grain as collection serialized into single state object, so the whole event log needs to be read before recreating grain state. Other built in storage saves only state snapshot without saving event log. Good thing is that there is flexible extensibility point allowing to write your own provider by implementing just 2 methods for reading and writing events. There is also a community contribution available which integrates Orleans with Event Store but this database is not my favorite. Probably I’m complaining too much and should instead contribute by implementing events log storage based on Cassandra or CosmosDB, it does not look like a hard task, but the next topic is much harder – distributed transactions.

Distributed transactions

Creators of Orleans framework did a great job to formally describe frameworks semantics. You can have a look at how they implemented distributed transactions: https://www.microsoft.com/en-us/research/publication/transactions-distributed-actors-cloud-2/

The algorithm is very interesting but from practical point if view, what I miss is lack of support for transactional communication between JournaledGrains. Again, support for event sourcing pattern seems to have been not a top priority in Orleans so far.

I you would like to jump deeper into other theoretical aspects of actor-based architecture, you may be interested in other Microsoft Research materials:
https://www.microsoft.com/en-us/research/project/orleans-virtual-actors/

Message delivery

Orleans can give you one of the guarantees:

message will be delivered at most once
message will be delivered at least once

There is not guarantee to deliver the message exactly once. We are in distributed system and this problem is not easy to solve without sacrificing performance. This is something to be aware of. It’s up to you how to introduce fault tolerance.

Orleans and microservices

You can think of Orleans as of microservices framework. The services are really micro. Each grain is a service. You probably cannot go more micro with microservices than in actor-based architecture. If you are building a microservices-based system, please have a look at Orleans docs and ask yourself an honest question: have you thought about all that problems that Orleans addresses and solves when building your microservices solution? We often make shortcuts through mud and bush because we do not even know that there is a better way. Please have a look at this presentation to illustrate some examples:

Distributed Transactions are dead, long live distributed transaction! from J On The Beach

Summary

I’m very grateful for all contributors who put Orleans into existence because it provides decent ground for building well-defined actor based architecture. Even if this model is not suitable for your needs, Orleans is very educational. Making a deep dive into its architecture and implementation can broaden architectural horizons a lot.

But on the other hand in my opinion you have to be prepared to make quite a lot of custom extensions / contributions on a framework level to build production-class system. There is an interesting initiative called Microdot framework which adds to Orleans many must-have features when building real system. But even with Microdot, this ecosystem looks more like an academical research rather than a shiny framework ready to ship to production out-of-the box. For everyone looking for something more mature with bigger support I would recommend to look at Azure Service Fabric.

But forgetting about production and enterprise readiness, programming model in Orleans is sweet. APIs are well designed and framework offers many extensions points to play with. Worth trying before signing-up for a cloud solution.