Low-code solution with Azure Logic Apps and Power BI

I’ve recently been working on a small project to support a new business process. Time to market was critical for the customer to be able to capture an emerging business opportunity. The budget was also tight, so as not to over-invest before the business case was validated.

The customer had a strong preference to do the whole data management via Excel to “keep it simple”. Not a surprising preference when you talk to salespeople or, as in this case, the CEO 🙂 There was also a need to enrich the data with information from 3rd party systems and to provide a number of reports.

The high-level architecture of this small system looks like this:

The goal was not to avoid coding entirely, but to take a low-code approach to save time.

The biggest saving came from avoiding custom UI development completely while still keeping the solution highly interactive from the users’ perspective. Below is a description of how that was achieved.

Online sign-up form

For the online sign-up form, https://webflow.com/ was used. This tool allows you to create websites without writing any code. The only piece of JavaScript that had to be written made an AJAX request to a custom API to pass on the form data.

“CRM” via OneDrive and Excel

All the accounts were managed via Excel files, one file per partner company. This kind of approach has many benefits out of the box. Let’s mention a few:

  • Intuitive and flexible data management via Excel
  • Access management and sharing capabilities provided by OneDrive
  • Online collaboration and change tracking built-in

Azure Logic Apps – the glue

The core business logic was developed as a custom service implemented in .NET Core and C#. This service also had its own database. Data edited in the Excel files needed to be synced with the database in various ways:

  • changes made via the Excel files needed to be reflected in the central database
  • when data was modified by the business logic (for example a status change or data generated as a result of the business flow), the changes needed to be reflected back in Excel to keep a consistent view
  • when a new account was registered in the system, a new Excel file to manage it was automatically created in OneDrive

All of these use cases were implemented via Azure Logic Apps. A Logic App is composed of ready-to-use building blocks. Here’s the execution log of a single run of an example Logic App:

In this case, any time an Excel file is modified in OneDrive, a request is made to a custom API to upload the file so the updates can be processed. Before the request, an access token is obtained. The processed file is saved for audit purposes, and in case of an error an email alert is sent.
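The custom API itself is not shown in this article, but a minimal ASP.NET Core sketch of such an upload endpoint could look like the snippet below (ExcelSyncController and IExcelSyncService are hypothetical names, not the actual implementation):

using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

public interface IExcelSyncService
{
    // parses the uploaded workbook and applies the changes to the central database
    Task ProcessAsync(Stream workbook, string fileName);
}

[ApiController]
[Route("api/excel-sync")]
public class ExcelSyncController : ControllerBase
{
    private readonly IExcelSyncService _syncService;

    public ExcelSyncController(IExcelSyncService syncService) => _syncService = syncService;

    // called by the Logic App, using the access token it obtained before the request
    [HttpPost("upload")]
    public async Task<IActionResult> Upload(IFormFile file)
    {
        await _syncService.ProcessAsync(file.OpenReadStream(), file.FileName);
        return Ok();
    }
}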

Under the hood, a Logic App is defined as a JSON file, so its definition can be stored in the code repository and deployed to Azure via ARM.

Power BI to provide data insights

Reporting was the ultimate goal of that project. Business needed to know about the performance of particular sales agents and internal employees for things like commission reporting and follow-up calls.

Compared to developing a custom reporting solution, Power BI makes it super easy to create a UI to browse, filter and export data. Once the connection to the database is established, a data model can be defined to create interesting visualisations with extensive filtering options. All these features are available for $9.99/month/user.

If you know SQL and relational data modelling but are new to Power BI, I can recommend this tutorial to get up to speed with Power BI:

Summary

Thanks to low-code and no-code tools like Azure Logic Apps, Power BI and Webflow, it was possible to deliver an end-to-end solution that users could interact with, without writing any custom UI code. If the project had also included UI development and the related backend to support it, it would have taken several times longer to provide similar capabilities. We could imagine a simpler UI built with less effort, but it would not come even close to the rich capabilities provided by Power BI and Excel out of the box.

Happy low-coding! 🙂

Azure Service Bus with MassTransit

The MassTransit project is a well-known abstraction layer for .NET over the most popular message brokers, covering providers like RabbitMQ and Azure Service Bus. It not only configures the underlying message broker via a friendly API, but also addresses issues like error handling and concurrency. It also provides a clean implementation of the saga pattern and a couple of other patterns useful in distributed systems.

This article is a brief introduction to MassTransit for Azure Service Bus users.

Sending a message

When sending a message we must specify a so-called “send endpoint” name. Under the hood, the send endpoint is an Azure Service Bus queue name. When the Send method is called, the queue is created automatically.

ISendEndpointProvider needs to be injected to call the Send method. See the producers docs for other ways to send a message.

The queue name can be specified by creating a mapping from the message type to the queue name: EndpointConvention.Map<IMyCommand>(new Uri("queue:my-command"));
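Putting it together, a minimal sketch of a sender class could look like this (IMyCommand and its OrderId property are just illustrative):

using System;
using System.Threading.Tasks;
using MassTransit;

public interface IMyCommand
{
    string OrderId { get; }
}

public class MyCommandSender
{
    private readonly ISendEndpointProvider _sendEndpointProvider;

    public MyCommandSender(ISendEndpointProvider sendEndpointProvider) =>
        _sendEndpointProvider = sendEndpointProvider;

    public async Task SendCommand(string orderId)
    {
        // resolve the send endpoint (the underlying queue) and send the command;
        // the queue is created automatically on first use
        var endpoint = await _sendEndpointProvider.GetSendEndpoint(new Uri("queue:my-command"));
        await endpoint.Send<IMyCommand>(new { OrderId = orderId });
    }
}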

Publishing an event

When publishing an event, we do not have to specify an endpoint name to which the message is sent. MassTransit, by convention, creates a topic corresponding to the published message’s full type name (including the namespace). So under the hood there is a concrete topic, just as there is a concrete queue when sending a message, but in the case of events we do not specify that topic explicitly. I find it a bit inconsistent, but I understand the idea: conceptually, at the abstraction level, publishing an event does not have a target receiver. It is up to subscribers to subscribe to it. The publisher just throws the event into the air.

To publish an event we simply call the Publish method on an injected IPublishEndpoint:

await _publishEndpoint.Publish<IMyEventDone>(new {
  MyProperty = "some value"
});

Event subscribers

It is important to understand how topics and subscriptions work in Azure Service Bus. Each subscription is essentially a queue. When a topic does not have any subscriptions, events published to it are simply lost. This is by-design behaviour in the pub/sub pattern.

Consumers connect to subscriptions to process the messages. If there are no active consumers, messages will be left in the subscription queue until a consumer appears or until they expire. A subscription is a persistent entity, but consumers are dynamic processes. There may be multiple competing consumers for a given subscription to scale out message processing. Usually, different subscriptions are created by different services/sub-systems interested in given events.

// cfg and context come from the bus configuration, e.g. UsingAzureServiceBus((context, cfg) => { ... })
cfg.SubscriptionEndpoint<IMyEventDone>("my-subscription", c => {
  c.ConfigureConsumer<Consumer>(context);
});

It is worth mentioning that if we use MassTransit and connect to a subscription endpoint but do not register any consumers, messages sent to this endpoint will be automatically moved to a _skipped queue created for the IMyEventDone type.

There is also an alternative way of creating subscriptions, which uses an additional queue to which messages from the subscription are auto-forwarded; see the docs for details.

Anonymous types for messages

The MassTransit author recommends using interfaces for message contracts. MassTransit comes with a useful Roslyn analyzers package which simplifies using anonymous types as interface implementations. After installing the analyzers (Install-Package MassTransit.Analyzers) we can automatically add missing properties with Alt+Enter:
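For illustration, here is a hypothetical contract and the kind of initializer the analyzer checks (the CorrelationId property is an assumption added for the example, and the fragment reuses the _publishEndpoint field from the earlier snippet):

public interface IMyEventDone
{
    Guid CorrelationId { get; }
    string MyProperty { get; }
}

// the analyzer flags this call because the anonymous type is missing CorrelationId,
// and the accompanying code fix can add the property automatically
await _publishEndpoint.Publish<IMyEventDone>(new {
    MyProperty = "some value"
});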

Blazor Web Assembly large app size

There is a gotcha when creating a Blazor application from the template that includes a service worker (PWA support enabled). Notice that the service worker pre-fetches all the static assets (files like JavaScript, CSS, images). See this code on GitHub as an example of the code generated by the default dotnet template for WebAssembly with PWA.

If you are using a bootstrap template like https://adminlte.io/, the web assets folder will include all the plugins that the template comes with. You may use just 1% of its functionality, but the default service worker will pre-fetch all the unused assets.

That can easily make the initial size of the app loaded in the browser around 40 MB (in release mode). Do not assume that all those files are necessary .NET framework libraries loaded as WebAssembly. When looking into the network tab you’ll notice that most of the resources are js/css files that you have probably never used in your app.

A normal WebAssembly application created with Blazor, even on top of a bootstrap template or a components library like https://blazorise.com/ (or both), should be less than 2 MB of network resources (excluding any images you may have).

So, please watch out for the default PWA service worker. It can make the initial app loading time unnecessarily long. If you are not providing any offline functionality, the easiest solution is to remove the service worker from your project by removing its registration from the index.html file. Another option is to craft the include path patterns so that only assets that are really used are pre-fetched.

Hangfire.io and .NET Expressions

I was troubleshooting an interesting bug recently, thanks to which I’ve learned a bit more about Hangfire.io and expressions in .NET.

The situation was that the Hangfire dashboard looked correct and all jobs were registered as expected. But what the scheduler actually executed for each job was the same logic, which was supposed to be executed only for the last job. A nasty bug. We were not yet in production with Hangfire.io, but it was still quite unexpected behavior to see.

The reason was that we were wrapping each job in a class called JobRunner. This class added some generic functionality to update UI progress bars while jobs were running. Our code looked like this:

// for each job definition: wrap it in a runner and register a recurring job
JobRunner runner = new JobRunner(myJobClass);
RecurringJob.AddOrUpdate(myJobClass.JobId, () => runner.Execute(), myJobClass.CronExpression);

The crucial thing to understand about Hangfire is that what we pass to the AddOrUpdate method is not a function to execute but an Expression describing the function to be executed. See this thread for the difference between Expression<> and Func<>.
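As a small standalone illustration of that difference (JobRunner here is just a stub):

using System;
using System.Linq.Expressions;

public class JobRunner
{
    public void Execute() { /* job logic */ }
}

public static class ExpressionDemo
{
    public static void Main()
    {
        var runner = new JobRunner();

        // a delegate is compiled code: invoking it uses the captured runner instance
        Action asDelegate = () => runner.Execute();
        asDelegate();

        // an expression is a data structure describing the call; Hangfire inspects it
        // to get the declaring type and method, then re-creates the instance itself
        Expression<Action> asExpression = () => runner.Execute();
        var call = (MethodCallExpression)asExpression.Body;
        Console.WriteLine(call.Method.DeclaringType); // JobRunner
        Console.WriteLine(call.Method.Name);          // Execute
    }
}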

The runner instance is not kept in memory or serialized. When Hangfire executes the job, it creates the instance by calling the constructor of the given type. Constructor arguments are resolved from the IoC container. In our case the constructor argument was of type IJob, an interface providing properties like JobId or CronExpression. So whenever ANY job was running, the first implementation of IJob found in the container was injected into the JobRunner. The same implementation of IJob was injected for every job. And here we are: all jobs were magically executing the same logic…
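A hypothetical reconstruction of the wrapper (the exact member names and the Run method on IJob are assumptions) shows where the wrong implementation gets injected:

public interface IJob
{
    string JobId { get; }
    string CronExpression { get; }
    void Run();
}

public class JobRunner
{
    private readonly IJob _job;

    // when Hangfire re-creates a JobRunner at execution time, this argument is
    // resolved from the IoC container, which returns the same (first registered)
    // IJob implementation for every scheduled job
    public JobRunner(IJob job) => _job = job;

    public void Execute()
    {
        // generic progress-bar bookkeeping would go here
        _job.Run();
    }
}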

Now it seems quite obvious, but it was necessary to learn a couple of rules along the way to understand that behavior. It seems to be a common misunderstanding, as there is even a comment about people making this mistake in the Hangfire.io source code, see Job.cs.

I hope this case study will help someone avoid similar traps.

C# snippets in Azure APIM policies

It was an interesting finding this week: I was not aware that it is possible to use multi-line C# code snippets inside Azure API Management policies.

For example, if you’d like to calculate an expression (e.g. a backend service query parameter) based on a request header value, you could use a snippet similar to this one:

@{ 
  /* here you can place multi-line code */ 
  var dict = new Dictionary<string, string>() {
    {"a", "1"}, 
    {"b", "2"}
  };
  var val = context.Request.Headers.GetValueOrDefault("test", "a");
  return dict.ContainsKey(val) ? dict[val] : "1";
}

Details about the expression syntax can be found here: https://docs.microsoft.com/en-us/azure/api-management/api-management-policy-expressions

One gotcha was moving that kind of policy definition to Terraform. It is necessary to replace the characters “<” and “>” with the entities &#60; and &#62; respectively. Otherwise Terraform could not apply the changes, although the same policy worked directly in the Azure portal.

It is worth noting that you could achieve the same thing with control flow policies, but this example is only an illustration; you can have more complex snippets, e.g. for request verification or for composing/manipulating the response body.

Lambda architecture λ

I’ve been doing some research recently about architectures for large-scale data analysis systems. An idea that appears quite often when discussing this problem is the lambda architecture.

Data aggregation

The idea is quite simple. The intuitive approach to analytics is to gather data as it comes and then aggregate it for better performance of analytical queries. E.g. when users run reports by date range, pre-aggregate all sales/usage numbers per day and then produce the result for a given date range by summing the aggregates for each day in the range. If you have, let’s say, 10k transactions per day, that approach creates only 1 record per day. Of course, in reality you’d probably need many aggregates for different dimensions to enable data filtering, but the number of aggregate rows will still be far smaller than the number of raw records.
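A tiny sketch of that per-day pre-aggregation idea (the types and names are illustrative):

using System;
using System.Collections.Generic;
using System.Linq;

public record Transaction(DateTime Timestamp, decimal Amount);
public record DailyAggregate(DateTime Day, decimal Total, int Count);

public static class Aggregation
{
    // collapse raw transactions (e.g. 10k per day) into one record per day
    public static List<DailyAggregate> PerDay(IEnumerable<Transaction> transactions) =>
        transactions
            .GroupBy(t => t.Timestamp.Date)
            .Select(g => new DailyAggregate(g.Key, g.Sum(t => t.Amount), g.Count()))
            .ToList();

    // a date-range report then only sums the daily aggregates instead of scanning raw rows
    public static decimal TotalForRange(IEnumerable<DailyAggregate> aggregates, DateTime from, DateTime to) =>
        aggregates.Where(a => a.Day >= from.Date && a.Day <= to.Date).Sum(a => a.Total);
}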

Aggregation is not the only way to increase query performance. It could be any kind of pre-computation, such as batch processing, indexing, caching etc. This layer in the lambda architecture is called the “serving layer”, as its goal is to serve analytical queries as a fast source of information.

Speed layer

This approach has a significant downside: aggregated results are available only after a delay. In the example above, the aggregated data required to perform analytical queries will be available only on the next day. The lambda architecture mitigates that problem by introducing the so-called speed layer. In our example that would be the layer keeping the data for the current day. The amount of data for a single day is relatively small; it probably does not require aggregates to be queried, or it can fit into a relatively small number of fast, more expensive machines (e.g. using in-memory computing).

Queries

Analytical queries combine results from two sources: the aggregates and the speed layer. The speed layer can also be used to create the aggregates for the next day. Once data is aggregated, it can be removed from the speed layer to free up resources.
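A minimal sketch of how a query could combine the two sources, assuming hypothetical IServingLayer and ISpeedLayer abstractions (not from any specific framework):

using System;

public interface IServingLayer { decimal SumDailyAggregates(DateTime fromDay, DateTime toDayInclusive); }
public interface ISpeedLayer { decimal SumRawEvents(DateTime fromTime, DateTime toTime); }

public class SalesReport
{
    private readonly IServingLayer _serving;
    private readonly ISpeedLayer _speed;

    public SalesReport(IServingLayer serving, ISpeedLayer speed)
    {
        _serving = serving;
        _speed = speed;
    }

    // pre-aggregated history covers everything up to yesterday;
    // today's raw events are read from the speed layer
    public decimal TotalSales(DateTime from, DateTime to)
    {
        var today = DateTime.UtcNow.Date;
        var batchEnd = to.Date < today ? to.Date : today.AddDays(-1);
        var batchPart = from.Date <= batchEnd ? _serving.SumDailyAggregates(from.Date, batchEnd) : 0m;
        var speedPart = to.Date >= today ? _speed.SumRawEvents(today, to) : 0m;
        return batchPart + speedPart;
    }
}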

Master data

Let’s not forget that besides the speed layer and the aggregates, there is also the so-called master data that contains all raw, non-aggregated records. In the lambda architecture this dataset should be append-only.

Technology

This architecture is technology-agnostic. For example, you can build all the layers on top of SQL servers. Typically, though, a distributed file system like HDFS would be used for the master data, the MapReduce pattern would be used for batch-processing it, technologies like Apache HBase or ElephantDB would be used to query the serving layer, and Apache Storm would be used for the speed layer. Those choices are quite common in the industry, but the technology stack can vary a lot from project to project or company to company.

Sources

http://lambda-architecture.net/
https://amplitude.com/blog/2015/08/25/scaling-analytics-at-amplitude
https://databricks.com/glossary/lambda-architecture

 

Why focused domain models are better than generic ones

In software we like things to be generic and reusable. When designing data models we try to predict the future and the possible scenarios into which the system can evolve. We often forget that to be reusable, a system must first be usable.

I like generic mechanisms. Generic search engine, generic RPC library, generic audit, generic authorization framework, generic UI components. Reusability on technical level is great. You can quickly build new things from existing blocks.

But a generic business model? It is much easier to work with a business domain model when it is focused and very specific. There should be no properties, classes, database tables or relationships kept “just in case” for the future. Every domain class and field must have a specific, well-defined meaning and purpose. There should be a real-life story behind each domain element and no room for misinterpretation. Source code is often the most reliable source of truth about how the whole organization works. Source code and its data must not lie about what the system can really do.

I like to design systems by starting with business use cases, end-to-end user flows and data flows. Once you have those dynamic elements of the system defined, the data model to support the processes should be straightforward. It is the business process that defines the domain model, not the other way around.

Development teams tend to focus too much on generalization to support unknown processes instead of pushing stakeholders to define concrete ones. Let’s not pretend that we are prepared for all possible future scenarios, both in business and in code. Let’s focus on building something that works and is well defined. This is why we adopt agile methodologies: to be able to evolve our models in iterations. This is why we adopt microservices: to be able to retire outdated processes easily and build new ideas independently.

To conclude: avoiding generalization at the business domain level is good. It is not an anti-pattern. Generalization is often a distraction from the real goals; it makes them vaguer and less achievable. Let’s be focused and specific.

Learning from work experience vs self-studying

I’ve shared in one of my articles that it is estimated that only 10% of learning happens at formal training. The remaining 90% comes from everyday tasks and learning from coworkers. Formal training includes things like self-studying from online resources, which I would like to focus on in this article. Since it’s only 10%, can it be treated as a low priority?

Self-study – 10% of the time

I do not know how those numbers were calculated; I saw them in one of the manager trainings. In my opinion they can be quite accurate when we think about the time we are able to spend on learning. We spend most of our time at work, and this is where we have the biggest opportunity to learn. Work builds real experience and practice. The knowledge becomes not just theoretical but also tested in real life. We are able to come up with our own use cases and examples, and to experience practical challenges.

Studying vs practicing vs teaching

When we think about the levels of knowledge, it may be illustrated as: student → practitioner → teacher. Self-study brings you only to the student level. Good courses include hands-on labs, so that you also get some practice. But training exercises are always simplified and cover only simple “happy paths”; they do not include production-level challenges. It’s like shadow-boxing versus fighting a real opponent in martial arts.

Self-study – the impact

But does it mean that self-study can be ignored because it contributes only 10% to your learning? Absolutely not. It’s 10% in terms of time, but it can be much more in terms of impact. We do not always have the comfort of learning new things at work. Especially when you are an architect or, more generally, a technical lead, your company expects you to be the one who teaches others, who knows the new trends and who is up to date with the latest technologies. You have to do a lot of self-study so that the whole company does not settle for old solutions.

Online resources for self-studying

I recently wrote about the DNA program, which is a great example of self-study material for software architects. I’ve also recently signed up for the Google Cloud Platform Architecture training on Coursera. After completing the first module and getting this certificate, I can definitely recommend the Coursera GCP trainings. The best thing is that they include a lot of hands-on labs with real GCP resources provisioned via the Qwiklabs platform.

I have to admit that the online training possibilities we have now are amazing. For a small price you can get access to resources that are often of much better quality than (unfortunately) some lectures at traditional universities.

Summary

So, go and self-study! Then use it at work and build a better world 🙂

Software developer 2020 vs 2010

I started my career in software development in 2008. We are now in 2020. What seemed to be true in 2010 but sounds funny in 2020?

  • jQuery is the best thing that could have happened to JavaScript and the best library ever
  • The Internet is not usable on a phone. You need a desktop browser
  • You need to be a Linux or Windows guru to set up a production-ready server
  • An SQL database is the most important component of the system. You cannot just store data in files
  • We do not have any unit tests and are not worried at all
  • Agile methodologies are a new thing. We have read about them and maybe will try them in a new project after the specs are ready
  • Big data is a new thing and it’s about creating OLAP cubes
  • Machine learning is a niche academic subject, not usable in practice
  • Functional programming is a niche, good only for mathematicians
  • XML is nice, you can use cool XSLT transformations with it
  • You cannot be hired as a programmer after 3-weeks course
  • You can use cookies and users do not have to know

In 2010 you could say those things and be promoted to team leader as a very wise person. How does that sound in 2020? 🙂

To get even more sentimental, you can also check this article as a nice summary of the last decade in software development: https://blog.pragmaticengineer.com/the-decade-in-review-in-software-development/

Monorepo in Azure DevOps

When working on projects using a microservices architecture, I opt for the monorepo pattern: a single repository containing the source code of all the services. It facilitates knowledge sharing between teams and encourages a more unified programming style. Microservices give a lot of technological freedom, but this freedom should be used wisely. Common standards across the organization are still important.

Working with a single repository forces all programmers to at least glance at the list of commits from other teams when doing a git pull. This channel of knowledge sharing can be very beneficial.

Despite having a single repository, we still need separate pipelines and policies for separate directories. Azure DevOps facilitates this by allowing directory path filters in crucial places:

  • path filter in build trigger settings
  • branch policies
  • required reviewers

This allows you to set up clear ownership of different parts of the repository and apply different pipelines to different parts of the repo, while still having all the benefits of the monorepo pattern.