With the rise of microservices and cloud architecture, distributed software systems are evolving rapidly and growing more complex over time. Even when all of the individual services in a distributed system are functioning properly, the interactions between those services can cause unpredictable outcomes. Unpredictable outcomes, compounded by rare but disruptive real-world events that affect production environments, make these distributed systems inherently chaotic.
Chaos engineering is a response to the chaotic nature of production environments that leads to outages: it is the practice of running controlled experiments to uncover weaknesses and help instil confidence in the system’s resiliency.
Principles of Chaos Engineering
From the Principles of Chaos Engineering:
“Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production”.
Software testing commonly involves implementing and automating unit tests, integration tests and end-to-end tests. Although these tests are critical, they do not encompass the broader spectrum of disruptions possible in a distributed system.
Chaos engineering is not meant to replace unit and integration tests. The two are meant to work together to deliver high availability and durability, which means fewer outages and therefore a better customer experience.
So, what is the process involved in Chaos Engineering?
The process is carried out in phases, and a few concepts support each of them.
Putting the Application to a “Steady State”
The steady state is the regular behaviour of the service, measured through its business metric.
A business metric is a metric that shows the real experience your end users have with your system. Find the features of the application that users rely on most and track their behaviour week after week until a regular usage pattern emerges. That’s the steady state.
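As a rough illustration of the idea, the sketch below derives a steady-state band from historical samples of a business metric. The metric name, the sample values and the choice of "mean plus or minus three standard deviations" are all assumptions made for this example, not something prescribed by chaos engineering itself.

```python
from statistics import mean, stdev


def steady_state_band(samples, tolerance=3.0):
    """Return (low, high) bounds for a metric based on historical samples.

    The band is mean +/- tolerance * standard deviation (an assumed rule).
    """
    centre = mean(samples)
    spread = stdev(samples)
    return centre - tolerance * spread, centre + tolerance * spread


def in_steady_state(value, band):
    """True when a newly observed value falls inside the band."""
    low, high = band
    return low <= value <= high


# Hypothetical weekly averages of a business metric, e.g. orders per minute.
weekly_orders_per_minute = [120, 118, 125, 122, 119, 121]
band = steady_state_band(weekly_orders_per_minute)
```

A value such as 121 falls inside this band, while a drop to 60 would fall outside it and count as a deviation from the steady state.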
Building a Hypothesis case
Once the business metrics are defined and the application’s steady state is established, the next step is to build hypothetical use cases for which we don’t know the exact outcome. For example:
Database stops working?
Requests increase unexpectedly?
Latency increases by 100%?
A container stops working?
A port becomes inaccessible?
Designing the Experiment
Now that we have built hypothetical use cases for our application for which we don’t know the outcome, it’s time to start experimenting. Best practices include:
Start small (failure injections)
Spin up a customer/production-like environment
Set up the groups – experimental and control group(s)
Control group: e.g. a small set of users operating under the same conditions as the steady state
Experiment group: the same size as the control group, but where the chaos will be injected
Minimize Blast radius – This means that you should minimise the number of potential users affected by the experiment. Always have an emergency stop so that any unintended consequences can be halted or rolled back.
Option to immediately stop the experiment
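The emergency stop can be thought of as a guardrail that is checked on every observation. The sketch below is a minimal model of that idea: the function name, the error-rate metric and the 5% threshold are hypothetical choices for this example.

```python
def run_experiment(observations, control_error_rate, max_excess=0.05):
    """Walk through error rates observed in the experiment group.

    Stops as soon as the experiment group's error rate exceeds the control
    group's rate by more than max_excess (an assumed guardrail).
    """
    steps_run = 0
    for rate in observations:
        steps_run += 1
        if rate - control_error_rate > max_excess:
            # Guardrail breached: halt here so the fault can be rolled back.
            return {"aborted": True, "steps_run": steps_run}
    return {"aborted": False, "steps_run": steps_run}


# Hypothetical error rates sampled from the experiment group over time.
result = run_experiment([0.01, 0.02, 0.10, 0.02], control_error_rate=0.01)
```

In this run the third observation breaches the guardrail, so the experiment halts before the fourth sample is ever taken.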
The findings of a chaos experiment come from analysing the differences between the control and experimental groups, measured against the application’s steady state. These differences are good indicators of how the system behaved under chaotic conditions. Useful observations include:
Time to detect the failure
Time to get notified by alarm systems
Time taken for graceful degradation
Time taken for partial/full recovery
Time taken for the system to return to steady state
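These observations can be captured as elapsed time since the failure was injected. The sketch below assumes each milestone is recorded as a timestamp; the event names and times are invented for illustration.

```python
from datetime import datetime


def experiment_timings(events):
    """Return seconds elapsed from failure injection to each later event."""
    start = events["injected"]
    return {
        name: (timestamp - start).total_seconds()
        for name, timestamp in events.items()
        if name != "injected"
    }


# Hypothetical timeline of a single chaos experiment.
events = {
    "injected": datetime(2024, 1, 1, 12, 0, 0),
    "detected": datetime(2024, 1, 1, 12, 0, 40),
    "alerted": datetime(2024, 1, 1, 12, 1, 10),
    "recovered": datetime(2024, 1, 1, 12, 5, 0),
    "steady_state": datetime(2024, 1, 1, 12, 8, 0),
}
timings = experiment_timings(events)
```

Tracking these numbers across repeated experiments shows whether detection and recovery are improving over time.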
Continuous Chaos Experiments
Focus on automating the entire chaos experiment process so that the same experiments can be run repeatedly, and set up similar chaos experiments in both non-production (customer/production-like) and actual production environments.
The problems found during chaos experiments need to be fixed, and the fixes verified again through all phases of the chaos experiment.
The importance of observability
As systems evolve, they tend to become more complex and so have more chance of failing. The more complex a system is, the more difficult it is to debug. Focus on the principles of observability: you need to become more data-driven when debugging systems and use the feedback to improve the software. Use your chaos engineering experiments to build on this observability, so that you can pre-emptively uncover issues and bugs in the system.
“Failure is a success if we learn from it.”
Learning from failures makes the system more resilient and increases the confidence in the system’s capabilities.
To achieve this, chaos engineering could be a game changer in the industry. Tech giants like Netflix, Facebook, Google and LinkedIn use it to ensure their systems can withstand breakdowns, by fixing the issues exposed during chaos experiments.
Author: Srinidhi Murthy, Senior Test Automation Engineer | Ammeon
Containerization is the process of bundling an application together with all of its required libraries, frameworks, configuration files and dependencies so that it can be run efficiently in different computing environments. Containers are isolated but run on a single host and access the same operating system kernel. Since they encapsulate an application together with its own user-space environment, they enable developers to work with identical environments.
How it started?
Containerization was born in 1979 with Unix version 7 and then lay dormant for twenty years. In the early 2000s FreeBSD introduced “jails”, which are partitions of a computer; multiple partitions can exist on the same system. This idea was developed further over the next six years, with resource partitioning in Linux VServer in 2001, Linux namespaces in 2002 and control groups (cgroups) in 2006. At first, cgroups were used to isolate the usage of resources like CPU or memory, but they were developed further in Linux Containers (LXC) in 2008. LXC became the most stable container technology of its time running on a Linux kernel. Because of its reliability, other technologies started building on top of LXC, one of them being Docker in 2013.
The Golden Age
Docker introduced an easy way to use and maintain containers, which was a great success and the start of a golden age. Soon after, in 2014, rkt (pronounced Rocket) came onto the market on the back of Docker’s success, offering an alternative engine. It tried to solve some of Docker’s problems by introducing stricter security requirements when creating containers. Docker has since introduced the use of containerd, addressing what rkt set out to solve. Both Docker and rkt were allowed to run on Windows computers in 2016 using a native Hyper-V hypervisor. With the introduction of Kubernetes, a highly effective orchestration technology, and its adoption by cloud providers, using an orchestrator with container technology became an industry standard. The momentum continues to this day with further improvements introduced by the community.
Containerization vs Virtualization
Virtualization is the process of emulating a computer on top of a real computer using hypervisor software. A hypervisor, also known as a virtual machine monitor, manages the execution of guest operating systems (OS) on the physical hardware. Thus, in a single powerful machine, there can be multiple virtual machines, each with its own space for processing, network interface, kernel, file systems and all the other things a fully functional operating system entails.
Both technologies have their advantages and disadvantages. If you require an OS’s full functionality with a long life cycle, or want to deploy multiple applications on a server, then virtual machines will be better suited to your needs. However, if you want to use fewer resources and your application will have a short life cycle, then it is better to use containers. They are designed to have what an application needs and nothing more, making them well suited to a single task, e.g. a microservice. In terms of development, virtual machines tend to have a more complex life cycle because of the growing number of virtual copies and the resources they require, so more thought needs to be put into design and implementation. Even though containers are small, fast to execute in most cases and use fewer resources, they are more exposed to threats because they all run on the same operating system kernel. This also means that each container on a particular host machine needs to be designed to run on the same kind of operating system, whereas virtual machines can run a different operating system from the underlying OS.
Development life cycle with containers
From my experience, working with containers and microservices on a day-to-day basis is a lot faster than working with monolithic-style systems. The code is much easier to understand in a microservice running inside a container because it is typically designed to perform a single task. It requires less implementation to perform a specific task, which means that debugging the code can be done rapidly. This shortens the feature request, development and bug fixing life cycles considerably.
Containers are lightweight and can be moved between different systems, which makes them very versatile. A container takes very little time to create and can be used as soon as its image has been pushed to a repository. It is easy to set up your own repository similar to Docker’s default one, Docker Hub, where millions of images are freely available to download and build upon. Containers can be used in conjunction with an orchestrator like Kubernetes to create the containers holding my piece of functionality very quickly.
In conclusion, containers are revolutionizing the way we develop and deliver applications. They are very portable since they have a write once, run anywhere structure. The same container can be used locally, in a cloud or in a development environment, thus eliminating the “works on my machine” problem that is heard a lot in the industry. There are a few security issues involved; however, the technology is constantly evolving and these will probably be addressed in the future. Using orchestrators like Kubernetes can help greatly through their install, upgrade and rollback functionality.
The strange-sounding name Kubernetes comes from the Greek word for helmsman and is an etymological root of the word cybernetics. This envisions Kubernetes as the platform that steers the ship of container-based cloud-native applications. The project was originally designed by Google in 2014; it started as a C++ project and was heavily influenced by the Google project Borg. The first stable release of the project saw it rewritten in Go and adopting the Greek moniker. The project has since been made open-source in a partnership between Google and the Cloud Native Computing Foundation.
What is Kubernetes?
Kubernetes is widely configurable in terms of its deployment model; however, it is always made up of Master and Minion nodes, following the primary/replica architecture. The role of the Master node in a deployment is to provide features such as etcd, a key-value data store; an API server, which allows the passing of JSON over HTTP; and a Scheduler, which acts like an OS process manager by slotting Kubernetes pod objects into available ‘slots’ within the Minion nodes, amongst many others.
The Minion nodes in the Kubernetes deployment run the containerized software in ‘Pod’ objects. These pods can consist of one or more containers which are co-located on the node and can share resources. Each pod is given a unique identifier and communicates across the Kubernetes cluster by use of a Service object, with service discovery using environment variables or Kubernetes DNS.
These objects are just the tip of the iceberg of the functionality available within the Kubernetes ecosystem. Users can also define their own Custom Resource Definition objects for Kubernetes to act upon, allowing complete flexibility of the platform.
K8s versus Swarm
So why Kubernetes and not Docker Swarm or Apache Mesos? To explain why Ammeon chose to lead with Kubernetes over other container orchestration platforms, we need to look at the main differences between these competitors. While all three might be able to fulfil the needs of orchestration, each is a better fit for certain kinds of project. Mesos is the best fit for data centre management. It does this by letting container management frameworks such as Kubernetes run as a layer on top of it.
For those looking to build a local deployment for a personal project, Swarm would be a good fit. It’s quick and easy to set up, but in its simplicity it lacks some of the industry-grade features provided by Kubernetes, including auto-scaling, a wide array of storage options and external load-balancing.
Advantages of K8s
The first major advantage of adding Kubernetes to a project is scalability. Kubernetes makes it easy to scale services horizontally across the cluster, either by using DaemonSets to deploy a pod on each node within the cluster or by using a Deployment object to deploy multiple pods in a single namespace. Incoming requests to the cluster are automatically load-balanced amongst these pods. Kubernetes also allows for vertical scalability through its auto-scaling feature, which frees users from having to put specific CPU and memory values on the running containers.
Advantage 2 – Availability. This is provided to the running services through liveness and readiness probes. These probes detect when a pod has shifted from a healthy state into a broken state, for example by executing a command against the service container.
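To make the probes more concrete, the sketch below builds the container section of a pod spec as a Python dictionary and serialises it to JSON, which Kubernetes accepts alongside YAML. The field names (livenessProbe, readinessProbe, exec, httpGet, initialDelaySeconds, periodSeconds) are standard Kubernetes fields; the service name, image, file path, endpoint and timings are placeholders chosen for this example.

```python
import json


def container_with_probes(name, image, port):
    """Build a container spec with one exec-based and one HTTP-based probe."""
    return {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Liveness: run a command inside the container; if it keeps failing,
        # the container is restarted.
        "livenessProbe": {
            "exec": {"command": ["cat", "/tmp/healthy"]},
            "initialDelaySeconds": 5,
            "periodSeconds": 10,
        },
        # Readiness: HTTP check; while it fails, the pod receives no traffic
        # from Services.
        "readinessProbe": {
            "httpGet": {"path": "/ready", "port": port},
            "periodSeconds": 5,
        },
    }


# Hypothetical service name and image.
container = container_with_probes("demo-service", "example.org/demo-service:1.0", 8080)
manifest_json = json.dumps(container, indent=2)
```

The resulting fragment would sit under spec.template.spec.containers in a Deployment manifest.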
Advantage 3 – Container configuration, through ConfigMap objects. A ConfigMap supplies a range of environment variables as well as other configuration options to the underlying pod. These configuration changes can be made dynamic by configuring the Deployment object to restart when the ConfigMap object changes. Kubernetes also provides declarative management of its objects using configuration files. Each Kubernetes object can be stored as a YAML file, which allows for easy input to and output from the system. Kubernetes also stores a ‘last-applied configuration’ field as part of the object description, which is helpful to see what has changed between states.
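One common way to have a Deployment restart when its ConfigMap changes is to put a checksum of the configuration into the pod template as an annotation, so that any change to the data also changes the template and triggers a new rollout. The sketch below illustrates that pattern. It is an assumption that this matches the setup described above, and the annotation key, names and values are hypothetical.

```python
import hashlib
import json


def config_map(name, data):
    """Build a ConfigMap manifest as a dictionary."""
    return {"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": name}, "data": data}


def config_checksum(data):
    """Hash the config data in a key-order-independent way."""
    canonical = json.dumps(data, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def pod_template(image, config):
    """Pod template that loads the ConfigMap as environment variables.

    The checksum annotation (hypothetical key) changes whenever the
    ConfigMap data changes, which alters the template itself.
    """
    return {
        "metadata": {"annotations": {"example.org/config-checksum": config_checksum(config["data"])}},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": image,
                    "envFrom": [{"configMapRef": {"name": config["metadata"]["name"]}}],
                }
            ]
        },
    }


before = pod_template("example.org/app:1.0", config_map("app-config", {"LOG_LEVEL": "info"}))
after = pod_template("example.org/app:1.0", config_map("app-config", {"LOG_LEVEL": "debug"}))
```

Because the two templates differ only in the annotation, applying the second one would be treated as a change to the Deployment and rolled out to the pods.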
Monolith to Microservice – slimline containers
I came to my current work with Kubernetes from a monolithic-style project, and the differences in my day-to-day are immense. My work is focused on creating slimline containers with microservices running inside. These containers are then hosted in Kubernetes namespaces, where they can communicate and process the traffic coming into the cluster. This brings a level of flexibility to the development process I hadn’t experienced before. It minimises the lead time needed to change a service, build a new container and test it on a live system.
This, as a developer, speeds up the feedback time for development, ensuring quicker feature request and bug fixing lifecycles. These services are also easily versioned when deployed on the cluster, which allows easy traceability for the underlying code. This is a vital tool in terms of debugging issues which occur during production.
Kubernetes also offers a handy way to test new versions of containers: the canary deployment. This involves deploying an in-test version of a service next to a stable release. Using the built-in ingress of Kubernetes, we can then monitor whether the new service handles traffic without disrupting the running system. Based on this test, the canary version can be removed from the cluster or promoted to replace the stable release.
Another handy mechanism, similar to canary deployments, is the rolling update. Rolling updates allow updates to Deployment objects to take place with zero downtime. This happens by incrementally replacing Pod objects with new ones as the new pods pass their readiness probes. This means that when I change a container image, environment variable or port of the container, these changes are rolled out across all the pods for the deployment on the cluster.
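The incremental nature of a rolling update can be shown with a toy model. This is a simplified sketch for intuition only, not the algorithm the Kubernetes Deployment controller actually uses: pods are replaced in batches, and the rollout stops as soon as a health check on the new version fails. The function and parameter names are invented for this example.

```python
def rolling_update(pods, new_version, is_ready, max_unavailable=1):
    """Replace pods batch by batch, recording the cluster state after each step.

    Returns (history, completed). The rollout halts, leaving the remaining
    old pods in place, if is_ready(new_version) reports failure.
    """
    pods = list(pods)
    history = [list(pods)]
    for start in range(0, len(pods), max_unavailable):
        end = min(start + max_unavailable, len(pods))
        if not is_ready(new_version):
            return history, False
        for index in range(start, end):
            pods[index] = new_version
        history.append(list(pods))
    return history, True


history, completed = rolling_update(["v1", "v1", "v1"], "v2", is_ready=lambda version: True)
```

At every step in the recorded history at least two of the three pods are serving traffic, which is where the zero-downtime property comes from.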
Eliminating the “works on my machine” problem
I can also mount my local workspace onto the cluster by using one of the provided Kubernetes storage options. This allows me as a developer to use the same container as someone else while still having my local code available. This eliminates the “works on my machine” problem: all the service code is containerized, and those containers are no longer run on local machines with a Docker daemon but are centralized on the Kubernetes nodes. This ensures that all users of the system have access to the same services and pods running within the deployment. This greatly reduces the number of invalid bugs opened due to environmental issues or differences in OS architecture.
Metrics server Deployment
Kubernetes itself provides the metrics server deployment as standard. This allows us to retrieve certain data about the health of the entire cluster, including cluster-wide CPU and memory utilization and the resources used by the containers running on the cluster. Kubernetes takes these metrics into consideration when applying the horizontal or vertical pod autoscaler. For more in-depth metrics from the cluster, Kubernetes provides pain-free integration with monitoring solutions such as Prometheus. As a platform, Kubernetes, with its tie-ins to CI/CD, has boosted the development lifecycle in all the projects I’ve seen it used in. It integrates easily with Jenkins, Spinnaker, Drone and many other existing applications. This gives existing CI applications a reliable backing and brings natural scalability to the CI world, which in turn provides better infrastructure on which to develop and deliver code.
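Returning to the autoscaler mentioned above: the Kubernetes documentation describes the horizontal pod autoscaler’s core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). The sketch below implements just that formula. The real controller applies further rules, such as stabilisation windows and replica limits, which are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Core horizontal pod autoscaler formula, without the controller's extra rules."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count unchanged.
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical example: 4 pods averaging 90% CPU against a 60% target.
scaled_up = desired_replicas(4, current_metric=90, target_metric=60)
```

With these numbers the deployment would grow from 4 to 6 pods, and at 30% average CPU it would shrink to 2.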
Network Function Virtualisation
OSM Release SEVEN, from the European Telecommunications Standards Institute project, adds support for deploying and operating cloud-native applications in Network Function Virtualization (NFV) deployments and, with this, access to over 20,000 production-ready Kubernetes applications. This includes support for Kubernetes-based Virtual Network Functions (VNFs). To facilitate Kubernetes-based VNFs (KNFs), Kubernetes should preferably be located in a virtual infrastructure manager as opposed to a bare-metal installation. After the initial deployment of the cluster, KNFs can be onboarded using Helm charts or Juju bundles. This drive from VNFs towards Container-Native Network Functions (CNFs) gives a new level of portability to these applications regardless of the underlying infrastructure choices, so that containers can run atop public cloud, private cloud, bare metal, VMware, or even a local deployment. CNFs also bring a lightweight footprint due to their small container images, a rapid deployment time, a lower resource overhead and, in unison with Kubernetes autoscaling, better resilience. The proposed view of the NFV space is that the shift towards this new CNF model will increase year on year, with a projection that 70% of all NFV deployments will follow this model by the end of 2021.
In summary, Kubernetes is a useful technology that provides benefits all across the software industry. It is a constantly evolving technology with more than 500 companies contributing to Kubernetes-related projects, so the benefits are constantly increasing. It provides stability and uniformity to companies, projects and developers alike. It is a great fit for every project moving towards the cloud-native application area.
This is the first part in a series of posts aiming to illustrate the opportunities of leveraging modern software in different sectors in the market. By understanding what is going on “under the hood” in these sectors, interesting differences and commonalities emerge.
Software is eating the world!
– Marc Andreessen, Wall Street Journal, 2011
At the risk of stating the obvious, our claim is that modern software is key to successful Digital Transformation. But we’re mindful that (1) this is only possible with highly compelling hardware and cloud platforms and (2) it needs to be accompanied by other components such as building a good plan, developing skills and experience, and driving positive change with senior management support.
For us, at Ammeon, it is natural to start this sector review with telecoms. Collectively we have thousands of years of experience delivering software into this space.
Telco & software – Isn’t it all about cell towers and racks of hardware?
From the outside, the telecom sector may seem slow, given its regulatory requirements and standardization. But looking under the hood, you realize how this sector keeps breaking ground by creating enabling technologies with critical scale, performance, quality, reliability and security characteristics. As consumers, we have come to expect that our connectivity must work. Always. And standardization has ensured interoperability and thereby scale, as well as significant cost efficiencies, which in turn have enabled wide-scale adoption. The iPhone would not be what it is today without the innovations in telecoms providing such a high-performance network platform. Thus, as an industry, telecom has seen tremendous innovation for decades, to the extent that a wireless connection now offers hundreds of megabits per second of speed. Accessing the internet is often faster without a cable than with a cable!
Increasingly software is key to progress in this sector
Mobile Radio Network – moving to 5G
The mobile network has traditionally been heavily performance and cost-optimized given the requirements of large volume deployments of physical sites for coverage.
New developments in 4G and 5G bring virtualization technologies to the mobile network. Smart software makes better use of spectrum and ensures hyper-efficient network performance and network operations.
More recently the software-based radio network enables new deployments such as 5G for industries and manufacturing. The pace of innovation is increasing with the introduction of vRAN / virtual RAN (Radio Access Network).
Mobile Core Network – moving to 5G
The mobile core network has been at the forefront of embracing virtualization, with early consolidation of software functions in data centres and large-scale deployments using cloud technologies like Open Stack. The core network is now developing towards a “cloud-native” paradigm with unprecedented flexibility in development and deployment, powering further innovation. Software-based “network slicing” promises to offer even better Quality of Service (QoS) for critical enterprise use cases such as automotive and healthcare.
Fixed Network – deploying fiber and embracing SDN
The ‘fixed network’ has developed with step-function improvements over the past years. New last-mile technologies over copper and fiber now deliver gigabits of speed. Switching and routing networks have embraced increasingly sophisticated Ethernet, IP/MPLS and Security technologies in software on high-performance silicon. More recently software-defined network (SDN) technologies have become ready for commercial deployments. Software is instrumental in integrating and managing these networks.
At the far end of the fixed network, whether at home or at the office, we often find a Wi-Fi wireless network where advanced software ensures high performance and great user experience even when we’re in a congested, high ‘noise’ environment (Wi-Fi operates in shared spectrum environments). Some Wi-Fi deployments now experiment with advanced software technologies for indoor location, security and analytics in enterprise use-cases.
Across these networks that deliver high-performance connectivity, operators and enterprises require a software backend that delivers business support functions such as network monitoring and management, analytics and optimization, as well as customer-facing functions such as billing. As operators merge across geographical boundaries and converge mobile and fixed networks, the need to consolidate data and back-end platforms increases to ensure efficient operations. Manual network management based on statistics and KPIs is replaced by sophisticated algorithms for self-healing and self-optimizing networks. The use of “AI” in network operations ensures performance and significantly reduces OPEX and, to some degree, CAPEX.
With all these legacy systems, new technologies, different vendors and services, the need for “orchestration” arises. Several proprietary as well as open-source initiatives aim to deliver seamless configuration and automation of disparate network infrastructure elements and services. One example is ONAP, which is going through a maturing phase similar to the early days of OpenStack. The jury is still out on which open platform will succeed beyond each major vendor’s network management system. But the consensus view is still that a high degree of software-based configuration (“programmability”) and automation is key to efficient telco network operations.
Services and Internet of Things (IoT)
Our networks have moved from providing basic connectivity services to offering more advanced services. NMT and AMPS had voice calls as their killer application. GSM and CDMA introduced support for SMS messaging. 3G introduced compelling data services, and 4G/5G essentially gives you ‘fiber in the air’ with throughput and latency suitable for anything from Netflix to remote surgery. These services are defined through the standardization of software and integration between providers, or enabled by new service providers.
Adding to the complexity, some operators, equipment providers and cloud providers like Telenor, Vodafone, Ericsson, AWS and Microsoft offer compelling IoT platform services to facilitate end-to-end solutions for enterprises to connect devices to the cloud, be it cars, cattle or chainsaws. Beyond connectivity, the smarts of the solution offer significant value-add for companies and consumers in every niche, but software and solution integration are key.
OTT – Over The Top
The incredible power of the telco network platform(s) has enabled a large swath of Over The Top (OTT) services and applications. We now take Facebook, Twitter, Instagram, TikTok, Google Maps, Uber, mobile bank apps, Spotify and Podcasts for granted… But the reality is, none of these would reach us were it not for the ubiquity and reliability of the network. Most of these applications “live in the cloud”.
Apple iPhone and iOS. Google and Android. ’nuff said?
Common to all the above is a drive towards more modular software development, with defined integrations and APIs. And the aspiration to leverage cloud-native software design methodologies and tools. With software automation for integration and delivery from lab to live operations. Whether deployed on “bare metal” hardware optimized environments or to cloud environments. With the purpose of attaining speed and a better end-user experience.
In the next post, we will take a closer look at ‘financial services’. A sector which has been building software for decades. But where competitive pressures put new requirements on efficient delivery of much better customer experience. Where manual ‘paper’ based processes and the ‘outsourcing to India’ worked for some time to manage costs, but now hamper innovation and speed as financial institutions recognize tech is core and not just outsourced scope of work.
As per Wikipedia, a penetration test, colloquially known as a “pen test”, pentest or ethical hacking, is an authorized simulated cyberattack on a computer system, performed to evaluate the security of the system. It is not to be confused with a vulnerability assessment, which is the process of identifying, quantifying, and prioritizing the vulnerabilities in a system.
Pentest activities, performed either from outside the target system (external) or from within it (internal), typically result in a Pentest Report, which consolidates the findings and recommendations around the efficiency of the existing security controls and defence mechanisms of the targeted system.
DevOps is all about efficient and speedy completion of development processes for faster delivery of products and services. Avoiding or missing security considerations in a DevOps cycle may lead to serious quality issues in the final deliverables. Security vulnerabilities not discovered and fixed in time typically lead to sizable technical debt, which in the end becomes very costly to resolve and usually carries the baggage of “credibility loss” for the software vendor.
In order to ensure that security is embedded into DevOps, pentesting should be performed on an ongoing basis to keep up with continuous development. Obviously, performing it manually can be a burden, as it might slow down the development process to the point where it is of no value at all. It’s a no-brainer to state that it has to be automated as much as possible.
To do this, you need to start by knowing your development methodology and environment exactly. An Agile-developed, cloud-hosted system would have security challenges very different from those of a system “hidden” from the internet behind a set of firewalls and segregated VLANs. Such an understanding of circumstances and associated risks will define the scope of your pentesting, and you have to be very careful in choosing the methods that will be the most effective, i.e. giving you the most value from pentesting while fully respecting the speed at which DevOps has to work. Think about the network exposure, connected interfaces, data flows, access control etc. as well as your internal company security requirements.
Once the scope is defined, look out for the best possible tool you can use. Sometimes a fully automated one may not be the best choice. Since your requirements could be specific, it is best to go for a tool which can take in customized input (e.g. scripts) and follow your definition of severity levels. Sometimes the very basic can work (e.g. the CIS-CAT benchmark); you need to invest time in understanding your own needs and benefits.
All this makes the “planning” of pentesting in DevOps critical for the success of the investment.
Even though you can gate development progress on some types of automated pentesting results, I’m afraid that offline, educated analysis of the results cannot be avoided and has to be done with care. The loop has to be closed back to both your code changes and, in some cases, your pentest automation. Engaging with the development teams is essential to make sure security becomes part of their daily code development “thinking”.
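A gate of this kind can be as simple as a severity threshold. The sketch below is one hypothetical way to express it: the severity scale, the finding format and the default threshold are all assumptions for illustration and do not come from any particular pentest tool. Findings below the threshold do not block the pipeline but are kept for the offline analysis described above.

```python
SEVERITY_ORDER = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def gate(findings, block_at="high"):
    """Split findings into those that block the pipeline and those left for review."""
    threshold = SEVERITY_ORDER[block_at]
    blocking = [f for f in findings if SEVERITY_ORDER[f["severity"]] >= threshold]
    for_review = [f for f in findings if SEVERITY_ORDER[f["severity"]] < threshold]
    return {"passed": not blocking, "blocking": blocking, "for_review": for_review}


# Hypothetical findings from an automated scan.
findings = [
    {"id": "F-1", "severity": "low"},
    {"id": "F-2", "severity": "critical"},
    {"id": "F-3", "severity": "medium"},
]
result = gate(findings)
```

Here the single critical finding fails the gate, while the low and medium findings are queued for manual review.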
Pentesting as such adds massive value to the quality of your software and also the credibility of your organization.
When embedded in your DevOps cycle, it has to be automated to a large extent and planned carefully in terms of methodology and tooling, so that it is as effective as possible without slowing down your development cycle.
Analyze results carefully, discuss them, and bring the design organization with you in fixing them and continuously improving your code and DevOps cycle.
If your organisation has security issues, the worst possible way to find out about them is via a headline on a major tech blog. Conversely, the best possible way to find these types of issues is from your CI pipeline before code even gets merged to master. DevSecOps aims to take DevOps principles, such as shift-left testing, fast feedback and automation, and apply them to security.
If you’ve begun to implement DevOps practices and are starting to see your pace of delivery accelerate, you may find that security practices which were developed with a “Big Bang” release model in mind can’t keep up. This post aims to explore some of the ways one might go about implementing DevSecOps in their organisation to ensure confidence in security at scale.
Static Application Security Testing
This is a form of white-box security testing. In much the same way as code may be scanned for maintainability purposes, by a linter such as PyLint, code can be scanned without execution for security vulnerabilities. Issues like password fields not being hidden or insecure connections being initialised can be caught in an automated manner. A static scan can be configured to run on every code push with analysis tools like Fortify, or even earlier in your workflow with IDE plugins such as Cigital SecureAssist.
Dynamic Application Security Testing
Dynamic Application Security Testing (DAST) is a black box technique that can be used once your code is deployed and running. One approach is to trigger a tool like Netsparker or Veracode as soon as your changes have been deployed to staging, blocking promotion to production until your dynamic scanner has completed its work and marked your latest deployment as secure.
Docker Image Scanning
If you’re working with Docker, you need to make sure your images are secure. You’ll find container scanning capability built into many modern DevOps tools, from GitLab’s Container Scanning functionality and JFrog’s Xray to Docker’s own Docker Trusted Registry, which comes with many other nice features such as RBAC for your images and Notary to sign and verify known good images. Under the hood, each layer from which your image is built will be scanned and an aggregate security rating generated, meaning you get confidence in not only your own artefacts but also any third-party dependencies your images may have.
Speaking of third-party dependencies…
Many large attacks in recent years have worked by exploiting third-party software utilised within projects. Using third-party software is unavoidable: there’s no point in every organisation having to reinvent the wheel before they can start building their own products. However, external dependencies often expose a massive attack surface, with some libraries depending on tens or even hundreds of other libraries.
To make matters worse, these requirements change constantly between versions. Manually working through dependency trees every time a version changes is completely unfeasible in a modern software house, but luckily there are tools that take the pain out of this important task. The OWASP Foundation has a dependency checking tool that can be run from the command line, added as a Maven Goal or triggered via a Jenkins Plugin, letting you check dependencies dynamically as part of your build process. Another approach is to use built-in dependency checkers provided by some SCM tools, such as GitHub or GitLab.
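To illustrate why working through dependency trees by hand does not scale, here is a minimal sketch that walks a dependency graph and counts everything a project pulls in indirectly. The package names and the `DEPENDENCIES` mapping are made up for illustration; a real checker such as the OWASP tool reads this information from your build files and matches it against vulnerability data.

```python
from collections import deque

# Hypothetical dependency graph: package name -> its direct requirements.
DEPENDENCIES = {
    "my-app": ["web-framework", "http-client"],
    "web-framework": ["template-engine", "http-client"],
    "http-client": ["tls-lib"],
    "template-engine": [],
    "tls-lib": [],
}


def transitive_dependencies(graph, root):
    """Return every package reachable from root, excluding root itself."""
    seen = {root}
    queue = deque(graph.get(root, []))
    while queue:
        package = queue.popleft()
        if package in seen:
            continue  # already visited (shared or circular dependency)
        seen.add(package)
        queue.extend(graph.get(package, []))
    return seen - {root}


direct = DEPENDENCIES["my-app"]
everything = transitive_dependencies(DEPENDENCIES, "my-app")
print(f"{len(direct)} direct, {len(everything)} total dependencies to check")
```

Even in this toy graph the number of packages to review is double the number declared directly, and the set changes whenever any one of them releases a new version.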
Security is an important and complex part of modern software development and one we at Ammeon are well familiar with. Whether you’re integrating current security checks with new DevOps practices or looking to build out your security capabilities we can help you ensure confidence all without sacrificing delivery speed.
The Covid-19 situation took everyone by surprise, with the lockdown forcing everyone (yes, including IT and technical support) into working remotely without enough advance notice. It has completely changed the way companies operate. We saw a lot of companies having trouble with thousands of people having to work over their VPN and no infrastructure in place to support that.
Buying and providing laptops, supplying equipment, and even furniture to help staff work from home as best as they can really is a serious job. Having employees work from home means businesses face challenges when it comes to maintaining security while keeping critical business functions going. But when you put infrastructures in front of security you can have bigger problems.
Common Cyberthreats During Covid-19
Cybercriminals are aware of the situation and are ready to exploit it. So, here are some of the most common threats in this situation and what to do to make sure your assets and information are secure.
A denial-of-service attack is a cyber-attack in which the perpetrator seeks to make a machine or network resource unavailable to its intended users by temporarily or indefinitely disrupting services of a host connected to the Internet. Taking advantage of already overloaded networks, a (Distributed) Denial of Service attack is highly effective and can take down whole networks, disrupting many services, sometimes for several hours, and impacting employee work, client data and services.
Remote access for your staff on servers and machines is a common practice but is an easy target for cybercriminals to try and get access to your network, especially when it allows connection over the internet.
Lack of antivirus and malware protection, use of personal machines and personal USB drives, and phishing emails are the easiest ways to pick up viruses, worms and ransomware and compromise your data. Since companies are overwhelmed with the health crisis and cannot afford to be locked out of their systems, the criminals believe they are likely to be paid a ransom.
Phishing is probably the most common threat and maybe the most dangerous one right now. Taking advantage of our thirst for information, cybercriminals are exploiting it with spam/phishing emails regarding Covid-19, government benefits, fake news and more, trying to get hold of personal or company information. Sending emails that pretend to come from important people within the company and request payments, taking advantage of the lack of communication within the company, giving false information and trying to redirect users to fake websites are some of the ways they go about it.
Tips To Tighten Up Security
After understanding the threats and identifying the risks your company faces, it’s time to mitigate them. To do so, you need to know the defence lines available to you and how to best make use of them. They usually are:
Make sure your firewall has the latest stable firmware and updates, that you have disabled unused features and you are only allowing the strictly necessary services (specific IPs, ports, networks). Both Network and OS firewalls are important to complement each other. UTM firewalls are the best option nowadays.
A VPN is extremely important to allow users to access resources in your network. Always use strong encryption and MFA, and make resources, where possible, only available over the VPN instead of the internet.
An enterprise-grade, always up-to-date antivirus is essential to avoid malicious files, connections and websites, not only on end-user machines but also on your servers.
Counting on users’ common sense isn’t enough, and having an antispam solution is very important to stop malicious emails going to your users. Blocking them before they arrive in your users’ inboxes will drastically lower the chances that they fall for a phishing email.
An IDS/IPS is a very important piece of your defence in depth strategy, helping to detect anomalies in the network and stop them. Always keep your IDS/IPS databases up to date to protect against new threats.
Keeping logs, real-time monitoring, correlating events and notifications make SIEM a powerful tool, helping you to identify attacks and threats and prevent them as soon as possible.
Encryption should always be used to protect your network traffic from malicious people and applications trying to steal your data in transit. Always use HTTPS on your web servers, preferably with TLS 1.2 or later where possible. Don’t forget to encrypt your emails, attachments, hard disks and USB drives: not only laptops but server disks and desktops as well.
Patches and Updates
Always keep your operating system, applications and firmware up to date. This will prevent known security issues being exploited. Having a patch management system will facilitate the automation and management of those.
Multi-Factor Authentication adds an extra layer of protection to your systems. Where possible, always enforce its use. Even in the event that a password gets compromised, cybercriminals would still need the token (software or hardware) to get into your systems. This is especially essential on your VPN.
Good password complexity, reasonable password expiration and Single Sign-On will help your systems to be more secure.
As cool as it sounds to allow users to use their own devices to do their work, this can have a great impact on your security. Not having any control of the security on it, what’s installed, which antivirus is being used, encryption etc is just a recipe for disaster. Avoid the use of personal devices on your network at all costs (including personal storage like external disks and USB drives).
Policies and Procedures
Clear and concise policies and procedures will help your staff know what they can and cannot do with company equipment, network, internet etc. These will also let them know what can happen if they do things they shouldn’t be doing, and how to proceed if they experience issues or find potential threats in your network. Staff can be crucial in helping to monitor and report suspicious activity within systems and networks, and even on-premises.
Lastly, and probably one of the most important defence lines, is Training. Training your staff to know the threats they face, how to recognise them and how to act upon encountering them is vital. Trained staff should know how to check for a fake domain in an email or website, check the sources of their information and the people that are contacting them.
Security and Productivity Balance
As important as security is, so is productivity, and having a fine balance between them is very important. Keeping security very tight will impact your staff’s productivity, while allowing them to be more productive at the cost of security can be challenging. A few examples and tips are:
Don’t force password changes too often. Requiring long passwords that are hard to remember will only make users write them down somewhere or make silly changes like changing one character at the end. 90 to 180 days is an acceptable amount of time. Providing a password management system for your users will help them to keep strong passwords without having to memorise everything.
The rekey of VPN should cover an entire shift. 8 to 10 hours can easily achieve that. You don’t want users losing work and complaining that their VPN dropped in the middle of something. If you have strong encryption on your VPN there’s no real reason to have a rekey every couple of hours.
MFA enabled with Single Sign-On on your systems will help users not have to remember many passwords or type them every time. Additionally, it will help you to lock down all access with ease if needed.
Routing all traffic from users’ machines when working from home over the network can help you monitor their activities but can also have a huge impact on the performance of your network. Especially when users at home want to watch videos online or listen to music for example. You don’t want hundreds or thousands of users accessing them over your broadband.
Remember that everyone has different needs and each company has its own way of working and should always evaluate the risks, costs and try to understand what impact it will have on business before doing anything. The idea is to find an optimal balance between security and productivity that suits your needs, focusing on minimizing the negative impact on productivity and maximizing security processes.
Author Adonis Tarcio Senior System Administrator | Ammeon
During this uncertain time, office-based workers have been presented with new and unexpected challenges. At Ammeon, like many other companies, we have had to adapt and scale. Luckily our journey to the cloud had begun in 2018. In the next series of posts, we are sharing our experiences to help you and your teams.
In this post, we will discuss what tools and systems we implemented to assist with our internal communication and HR processes.
As a medium-sized company, we couldn’t afford to spend a fortune on systems. So, we didn’t. We researched widely and picked wisely. Here is the stack we chose:
Group communication is vital in times of crisis. In the current emergency, our business couldn’t function without the communication tools we have tested and implemented over the past 18 months.
Ramping from zero to full adoption isn’t going to happen overnight. Tools like Slack take a bit of getting used to because of the asynchronous nature of the chat. Zoom is click and go, but bear with it and you will reap the rewards.
Communicating broadly across sites is challenging at best. We landed on the collaboration tool Slack, which allows direct communication with employees and allows for asynchronous communication via personal IMs, groups and channels. It takes a while to get critical mass and adoption but by insisting on its sole use we have successfully created an ecosystem where we can broadcast information as well as receive feedback.
You can integrate with simple productivity tools and integrate with third-party systems. You can also share files and it’s available on mobile. What’s interesting is once usage begins, it takes on a life of its own. Banter channels and special interests like gaming or hobbies start to spring up. You don’t need a moderator: set a few simple rules and trust your people. Ensure people are conscious of security and you are away.
Zoom is our tool of choice for setting up conference and video calls. The quality of the calls, in terms of picture and sound was better than anything else we tried. It’s really made the transition to home working very straightforward.
Slack’s quality is outstanding, but you can’t send links outside the organisation. Our team love Zoom for our 3:00 pm virtual coffee meetings, where we sit and chat about informal topics for 15 minutes. For those with kids stuck at home during the Covid crisis, you may have noticed that Zoom is being commonly used by teachers to connect kids and their pals. The company must be booming right now.
Having a set of HR systems which are not only functionally complete but also usable and accessible remotely has allowed our People Operations team to scale a shared services function with a relatively small team. Consistency, templates and automation in all the tools have reduced admin, improved employee self-service and allowed HR to move up the value chain. Here’s what we picked:
After road-testing a heap of different HR tools, we were excited to choose PeopleHR as our tool of choice. There are masses of HR tools out there. But for a company of our size (currently around 200 people), it’s an ideal choice.
Not only does it do all the basic HR admin (storage, timesheets, holiday planners) but it also integrates with a rake of other systems. The bulk upload of data was easy, and the user interface is modern and easy to use. It has powerful reporting capability and a mobile app. We even worked with finance to integrate the expenses module, which has totally streamlined the user experience of submitting expenses from a mobile phone.
The killer app though is process automation. Auto-generation of letters, reports and processes is fab. It doesn’t have all the things that Workday has, but you don’t need half of that functionality in a smaller enterprise. What’s more, the output from PeopleHR can be extracted into our SaaS accounting tool Xero, which keeps our finance people up to date with changes to employee data.
As a services organisation, we handle a huge number of job applications across a number of different roles. Greenhouse is the crème de la crème of candidate management systems.
The tool allows recruiters to push out to our website and to various third-party recruitment sites. Some API integration is required but your website provider can help you with this. The tool allows you to post multiple roles and manage workflow as candidates move from stage to stage in a fully configurable workflow. The system automates responses to potential candidates. Reporting allows managers to see what is in the candidate funnel and help identify where in the process candidates fall out. API integration with PeopleHR simplifies the hiring processes by ensuring that when a candidate is hired data is pushed into the HR system, so we don’t have to reenter the information. Sweet!
We use a heap of other online tools for group whiteboards, Kanban boards and so on. Follow the guidelines above and you will be up and running in no time. And maybe after this crisis is over, remote working may continue to be a desirable and productive way of working for your company.
At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behaviour accordingly. This is one of the principles behind the Agile Manifesto, stating that we need to constantly adapt and improve.
But how do we know if we are improving if we don’t keep metrics?
We need to measure in order to tell if we are improving. Flow efficiency is a great metric that can be used to improve the lead time of any request, in any industry. It measures the percentage of time spent actively working on a request. Flow efficiency examines the two basic components that make up your lead time: work and wait time.
Lead Time is a term that comes from the Toyota Production System (TPS), where it is defined as the length of time between a customer placing an order and receiving the product ordered. Translated to the software domain, lead time can be described as the length of time between the identification of a requirement and its completion.
When we are asked for a prediction on when a new request will be completed, we often look at lead times to give an answer. Looking at lead times over a period of time gives us a high confidence level in setting delivery dates.
When we look at what we can focus on to improve lead times, we normally choose active work on requests, such as test automation and continuous delivery pipelines. However, we should focus on reducing the time we spend NOT working on requests.
Work in progress isn’t always actually work in progress. Flow efficiency shows us how often that is true.
To calculate flow efficiency, you need two pieces of information:
1. Active Work Time – Work time
2. Overall Lead time – Work + Wait time
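As a minimal sketch of the calculation, flow efficiency is the active work time divided by the overall lead time, expressed as a percentage. The function name and the example figures below are illustrative only.

```python
def flow_efficiency(work_time, wait_time):
    """Percentage of the lead time spent actively working on a request.

    work_time and wait_time must use the same unit (days or hours).
    """
    lead_time = work_time + wait_time  # overall lead time = work + wait
    if lead_time <= 0:
        raise ValueError("lead time must be positive")
    return 100.0 * work_time / lead_time


# Illustrative request: 3 days of active work and 17 days of waiting.
print(flow_efficiency(work_time=3, wait_time=17))
```

For this made-up request the lead time is 20 days, of which only 3 were spent working, giving a flow efficiency of 15%.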
Normal Flow Efficiency is the term given to teams who generally aren’t paying attention to this concept, and this Normal Flow Efficiency is about 15%. That means that 85% of the lifecycle was spent waiting on something! The lowest recorded flow efficiency is 2% while the highest is 40%. The highest is achieved with a lot of work and effort and should be what teams are aiming for.
There are many reasons for wait times such as the following:
1. Unplanned work and lack of focus
All the little tasks add up
2. Waterfall effect
Teams should not depend on other parts of the organisation to get work done
3. Not testing (enough)
The more code that is written without testing, the more paths you have to check for errors
We can keep on a straight path with proper unit testing
4. Manual testing
Why is there manual testing?
Why can’t it be automated?
5. Endless environment configuration
6. System Architecture
Processes that belong to waterfall are not equipped to deal with changes quickly
You can measure flow efficiency for a single request, but it is much more useful to measure the flow efficiency of all your requests over a specific time period such as one sprint.
Tracking Flow Efficiency
Decide whether to track in days or hours. Either is fine; it’s about the right balance for your team. But remember, creating too much overhead will lead to the tracking being abandoned. If you decide to track in hours, assume an 8-hour day.
Use project tracking tools, such as Jira or Trello, to track the status of the request.
Keep the request updated with the time where work was ongoing, and the time where work was waiting.
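The tracking steps above can be sketched as follows, assuming the work and wait hours for each request have already been exported from your tracking tool. The request names and figures are hypothetical, and no particular Jira or Trello export format is implied.

```python
HOURS_PER_DAY = 8  # the 8-hour working day assumed when tracking in hours

# Hypothetical requests from one sprint: (name, hours worked, hours waiting).
SPRINT_REQUESTS = [
    ("REQ-1", 16, 64),
    ("REQ-2", 8, 72),
    ("REQ-3", 24, 56),
]


def sprint_flow_efficiency(requests):
    """Flow efficiency (%) across all requests in the period."""
    total_work = sum(work for _, work, _ in requests)
    total_lead = sum(work + wait for _, work, wait in requests)
    return 100.0 * total_work / total_lead


def hours_to_days(hours):
    """Convert tracked hours into working days."""
    return hours / HOURS_PER_DAY


total_lead_hours = sum(work + wait for _, work, wait in SPRINT_REQUESTS)
print(f"Lead time: {hours_to_days(total_lead_hours)} working days")
print(f"Sprint flow efficiency: {sprint_flow_efficiency(SPRINT_REQUESTS)}%")
```

Summing work and lead time across the sprint before dividing weights long-running requests more heavily than averaging the per-request percentages would; either choice is defensible as long as the team applies it consistently.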
Improving Flow Efficiency
1. Focus primarily on wait time.
There is a greater benefit in reducing waste than optimising the performance of activities that add value. For example, reducing the approval of new software from two weeks to one week is quite quick and cheap to implement. However, refactoring test cases to improve the time it takes them to run would take a lot more time, effort and money to do.
Identify the root cause of the wait time so that actions can be put in place to change it.
2. Make small changes at a time.
Allow the team to suggest what changes might work. They will be more enthusiastic to implement the changes if the suggestions come from them.
3. Design experiments to reduce wait and increase flow efficiency.
Organisations which have executed an Agile Transformation are often disappointed with the results. They typically adopt the Scrum Framework and create cross-functional teams with all the skills required to do the job.
Agile coaches and Scrum Masters outline the three pillars of Scrum (Transparency, Inspection and Adaptation) and help everyone understand the various Scrum events, such as the Sprint, the Daily Scrum, Sprint Planning, Sprint Review and Sprint Retrospective.
The teams are mentored on the importance of continuous improvement and it is explained to them that each Scrum ceremony is an opportunity to Inspect and Adapt.
Despite all of these efforts, the delivery of business value seems to take much longer than expected and the organisation is not reaping the rewards or promises of the Agile Transformation.
The Scrum + CI/CD + Test Automation Combination
It is important to point out that Scrum is a management framework. It does not outline software engineering practices. A software organisation needs to combine the Scrum framework with software engineering best practices in order to achieve high performance.
This leads to a critical point regarding Agile Software Development. The team can only Inspect and Adapt in the Daily Scrum if they receive feedback regarding the previous day’s work. This is where Continuous Integration and Continuous Delivery, CI/CD, and DevOps come into the picture.
The basic idea of CI/CD is to automate test cases as part of User Story development and include these test cases in the CI/CD pipeline. This enables fast and reliable delivery of all future software updates by executing fully automated testing of all updates. In other words, we fully automate the path to production and remove the need for manual approval and bureaucracy. This greatly improves the flow of business value from the development team to production and also reduces the feedback time.
No Continuous Integration or Continuous Delivery
The diagram below shows what happens within an organisation lacking CI/CD. This organisation is inherently anti-Agile as it places zero trust in the people who do the work.
There is extensive bureaucracy and approval required at every stage of the software delivery cycle.
Over time the code delivery process takes longer and longer as extra layers of approval are added. This labyrinth of bureaucracy is stifling to the creativity of development teams.
Eventually, organisations face the stark realisation that engaging in the delivery process is only worth the effort when delivering a large update. Tragically, this is the exact opposite of the regular delivery of small software updates which Agile promotes.
Typically organisations that work in this way struggle to retain the best employees and are extremely slow to deliver value to customers, even for small software changes.
CI but no CD
The next example outlines an organisation which has implemented a CI pipeline, but still suffers from the black hole of manual testing and approval processes.
I state black hole as it is a good analogy for what happens to software delivery. Any business value/software entering the black hole will be very lucky to escape and achieve delivery in the end product or service.
Ultimately, this organisation still suffers severely from the lack of a Continuous Delivery pipeline and finds it very difficult to deliver value to Customers.
Full CI/CD Pipeline
The final example shows a simplified CI/CD pipeline which facilitates:
the fast flow of business value/software from the developer to the end product,
the fast feedback from automated testing performed in the pipeline to the developer,
a culture of learning, as fast flow and fast feedback provide the environment to support experimentation.
Ideally, the complete pipeline would be as short as possible in terms of execution time. A pipeline which takes longer than 24 hours will negatively impact on team performance as the opportunity to Inspect and Adapt will be delayed.
Any issues due to the new software will result in a visible failure somewhere along the pipeline and automatically prevent delivery of the software. This allows the developer to perform rapid code changes and get super fast feedback.
Testing should be shifted left as much as possible within the pipeline. This means that the testing should occur as early as technically possible. This is done for two reasons. Firstly, it enables the fastest possible feedback in case of a failure. Secondly, the execution time of test cases generally increases along the pipeline from left to right and the Deployments used become larger. For that reason, these scarce resources should only be used if all earlier stages of testing have passed.
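The shift-left ordering can be sketched as a pipeline that runs its stages from cheapest to most expensive and stops at the first failure, so the larger deployments are only used when everything earlier has passed. The stage names and the simulated results are assumptions for illustration; a real pipeline would be defined in your CI/CD tool rather than in a script like this.

```python
def run_pipeline(stages):
    """Run stages in order and stop at the first failure.

    stages is a list of (name, check) pairs, where check() returns True
    on success. Returns (passed, names_of_stages_that_ran).
    """
    executed = []
    for name, check in stages:
        executed.append(name)
        if not check():
            return False, executed  # later, costlier stages never start
    return True, executed


# Hypothetical stages, ordered fastest and cheapest first.
STAGES = [
    ("unit tests", lambda: True),
    ("integration tests", lambda: False),  # simulated failure
    ("end-to-end tests on a full deployment", lambda: True),
]

passed, executed = run_pipeline(STAGES)
print("passed:", passed)
print("stages run:", executed)
```

In this run the simulated integration failure stops the pipeline, so the end-to-end stage and its full deployment are never used and the developer gets feedback after only two stages.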
But we already do all that!
At the time of writing this blog, the DevOps movement, which includes CI/CD and Test automation, has become standard across the software industry.
This has led to a new and strange phenomenon by which organisations often genuinely believe they have a full CI/CD pipeline, but in reality, they have only implemented the CI pipeline with no CD or only partial CD.
One product component may have a full CI/CD pipeline while another component only has a CI pipeline. As you can probably guess, the product component with a CI/CD pipeline is typically of a higher quality and supports fast delivery of new features, bug fixes and code refactoring, while the product component with only a CI pipeline typically has quality issues and is difficult to update.
Summary – why continuous integration is important for agile
Agile is built on the expectation that developers can deliver small incremental updates to a product or service. In practice, this will only be achieved if an organisation has put the time and effort into developing and maintaining a CI/CD Pipeline for their products and services.
Anything less will result in extensive delays and a lack of Agility.
The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations. Authors: Gene Kim, Patrick Debois, John Willis, Jez Humble.
How many e-mails have you received this week talking about the “digital transformation” going on everywhere around us? Businesses in any sector grow increasingly paranoid over digitally competent competitors, and/or excited about the possibilities. Google “digital transformation” and you’ll find as many definitions as search hits. The most respectable consultancy firms will be happy to help define your strategy (and take a lot of your money).
If there has ever been a good time to talk to the geeks about where to take the business, now may be it.
Ammeon just celebrated its 16th birthday, and our heritage is software development. We have been early adopters of test automation. We have been early adopters of modern Continuous Integration, Delivery methods and tools. We build awesome Agile teams. We architect and build Software Defined Networks, and we build software for the Cloud. We have worked side by side with our customers on “Automation” for years, learning and solving problems together. And we are proud of that.
As I mentioned in my previous blog, Ammeon has a story that still needs to be told. As we’re developing the strategy and plan for this next chapter of our company’s history, we asked ourselves the question: what is Ammeon’s role in digital transformation?
In plain English, we think it is Better Software Faster.
Do you need help with adopting Agile ways of working to deliver more with the budget and resources you have?
Do you need help with breaking that large, monolithic legacy software into modern software based on containers and micro-services destined for the Cloud?
Do you need help with releasing software with customer value more frequently? Like every day as opposed to every few months? Our experience in software Automation can help your team test, integrate and release software features with quality. All the time. Every time.
Some of our customers need basic solutions to get started. Some of our customers have the most complex and demanding requirements for a modern “software factory”.
Whatever your starting point and need, we can help you deliver Better Software Faster.
The “Snake on the wall” technique is one that has been used by many Agile and Lean teams in various forms for many years. In its simplest form, the scrum master draws a snake’s head on a post-it on the wall, and as team members run into distractions, impediments and other frictions during the course of their work, they note it on another post-it, and join it to the head … further post-its attach to the previous, and so on, forming a snake body.
The length of the snake’s body gives an indication of how many problems there are at any moment. The scrum master collapses the snake down as the impediments are resolved.
A variant of this we tried in Ammeon was the 8 Lean Wastes Snake. Here, the Snake is drawn on a large poster in the team area. The snake is divided into 8 sections, one for each of the 8 Lean Wastes (the original 7 Lean Wastes + “Skills”).
As team members run into impediments, they place post-its on the appropriate section of the snake. The scrum master keeps an eye on any new issues appearing and attempts to resolve as appropriate; perhaps also presents back to the team at the retro how many issues of each type were logged this sprint, how many resolved, and how
Another benefit of the lean waste snake is that it can provoke interesting team discussion around addressing waste. I have used it to spur discussion of what types of waste our team encounters that would fit into the 8 categories, and to challenge the team to think of examples (hypothetical or real) for each category. I found this very useful to help the team identify, and put a label on, the various friction and pain points they encounter; it also works as a “safety valve”.
Recently I had the opportunity to try introducing this technique to a non-software team who were at the beginning of their agile journey. They found the waste snake intriguing and worth discussing but ultimately a bit hypothetical, as they could not easily identify what section of the snake they should stick their post-its to. Also, they found the amount of space afforded to be quite limiting. For these reasons engagement with the Snake was slow and difficult.
So we decided to iterate towards something simpler and more user-friendly, replacing the snake with a 2×4 grid: one box for each Waste, with written examples of that category of waste in each box which, crucially, the team helped contribute themselves, as a reminder. Now we have lots more space for post-its, along with some written reminders of grounded examples relevant to them.
While engagement is still growing with the new Wastes Grid, going through the exercise of capturing the team’s own examples and a few reminders during the course of the sprint helps capture and crucially visualise the current friction
The HR congress was on this week. I couldn’t attend, but once again AgileHR was on everyone’s lips. In my last blog, I spoke about how adapting Agile concepts to HR can bring about some significant changes in the productivity and engagement of HR organisations. It sounds like Agile is great, but why are more people not adopting it? Having spoken to some of my colleagues in the HR community, it seems that the difficulty in adopting agile is in three parts:
The Language of Agile. Because of its origins in software development, the language of Agile can seem inaccessible to non-technical people. It refers to scrums, retros, ceremonies and sprints. It’s a lot to get through to even begin to understand what it means for teams.
Fear of change. Having met with a number of HR colleagues in training, in forums and at conferences, it’s clear that as a discipline we have a habit of hiding behind convention and compliance in order to keep things going as they are. One of the reasons for this is that many HR organisations hold fast to the Ulrich Model of HR, the 4 roles etc. But guess what? That model was built in 1995! Guess what else happened in 1995? Amazon sold its first book; Netscape was the first commercial web browser; we were still more than 10 years away from the iPhone! Look how much the world has changed since then, and we are still relying on a decades-old methodology for HR. Let’s evolve!
Where to start. HR folk by necessity are both smart and determined. So even if they wade through all the language, it’s hard to know where to start.
With that in mind, I am going to give you a jargon-free kick start to Agile which you can start using today. I am going to look at two primary concepts which will help you get off the ground: the first is about silos in and around HR and the second is the importance of a brief daily meeting.
Tear down the walls
The first thing to do is to break down as many silos as you can. One of the big issues found in software development was that software was being developed in one area and “thrown over the wall” to be tested. The testers had no idea what the code was or how it worked, and were expected to make sure it worked after a quick handover at the end of the development process. Since the advent of Agile and more recently DevSecOps, Developers, Testers, Security and Operations co-create software products and solutions, with everyone being involved throughout the process to ensure everyone knows what’s going on. Clearly, the same involvement is important in HR functions. A common silo exists between HR and recruitment, with recruited candidates being “thrown over the wall” to HR to onboard, sometimes with little knowledge of who the candidates are.
HR and recruitment (and in my teams, I pull in admin) work more closely together to ensure joined-up thinking and action in day to day activities. Not only does this lead to better candidate experience and better onboarding but a broader understanding of the roles of other departments. It leads to a greater understanding of the broader business context; an understanding of talent and recruitment challenges; cross-training and better teamwork. Okay so now you’ve kicked these silos into touch what next?
Daily cross-functional meetings
One of the critical aspects of Agile is the daily stand-up meeting (so-called because it’s daily and you stand up for the meeting to keep things brief!). The success of the meeting is its fiendish simplicity. One person is designated as the lead for the meeting to ensure the meeting happens and to keep it moving along by ensuring detailed topics are taken offline. This person also ensures items are tracked. Each team member speaks very briefly. They say: 1. What they did yesterday (a brief update that also ensures people keep their commitments). 2. Plans for the day (what they are committing to do today). 3. Blockers or help needed. In this quick meeting (15-20 minutes depending on the size of the team) everyone learns what other team members are focusing on, can see how they can help if required, make suggestions and bring others up to speed with their focus areas. What I have found since I was introduced to the concept of the stand-up meeting is that it is a very quick way to exchange a lot of information as quick headlines rather than having to go through slides for updates. The meetings stop duplication of effort, which can sometimes happen in HR helpdesk environments. They also provide a broader context for the team about the workload of each team member and allow those with a reduced workload to step up and help. They reduce the need for lots of other meetings. For HR teams, I insist on doing stand-ups in an office, on a commitment from participants that absolutely nothing confidential is mentioned outside the stand-up space, and that employee personal matters are not discussed in this format with the broader team.
I really think your teams will enjoy the format, and if you add pastries to the occasional meeting you will get additional brownie points. As a manager you can check in on stress levels, happiness index or whatever your temperature check mechanism is. I can’t predict what will happen in your teams, but each time I have implemented this I have seen less stress, a greater feeling of “being on top of things” and, from teams, a sense that they are at once empowered by managers and supported by colleagues.
Why not give those ideas a go and let me know how you get on? Next time I will talk about some common digital tools that will help with organising your team workload and improve team communication both with HR and the customer organisations you support.
Lately I’ve been working with OpenShift and its source-to-image capabilities and I must say I am impressed. How impressed you ask? Impressed enough to want to write my own “Hello World” app for it. Currently, there is a simple Python app available which is used for most demos/education material. Even though the app does the job, I think that it doesn’t fully demonstrate some of the more powerful capabilities of OpenShift and OpenShift S2I. And then of course there is the result. Let’s face it, no one gets excited by seeing a “Hello World” message on a white canvas.
I spent some time thinking
how to enrich the experience and came up with five criteria for the app.
The app has to be simple, and not by means of importing 100 packages.
It needs dynamic content – “Hello World” is boring.
It should connect to other APIs – that’s what you’d do in real life, isn’t it?
It should be configured – how about injecting secrets? If this fails, you’ll know – no smoke and no mirrors.
It is not production, it’s a demo – best practices for secret handling, security and high availability are not part of this project.
With these specs in mind, and
given my past experience with Twitter from building Chirpsum, a Twitter news
aggregator, I chose it as the source of dynamic content. Consuming the Twitter
API requires configuring secrets so two more ticks. To cover the remaining two
criteria, I chose Python and Flask.
I basically built a search
app for Twitter which returns you the top 10 tweets related to a word or a phrase
and also the top 4 words/mentions that are used along with your search query.
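The “top 4 words/mentions” idea can be sketched with a simple word count. This is a minimal illustration and not the app's actual code: the function name `top_related_terms`, the tokenising pattern and the sample tweets are all assumptions, and a real app would probably also filter out common stop words.

```python
import re
from collections import Counter


def top_related_terms(tweets, query, n=4):
    """Return the n most common words/mentions that appear alongside the query.

    Hypothetical helper: the real app may tokenise and rank differently.
    """
    query_words = set(query.lower().split())
    counts = Counter()
    for text in tweets:
        # Keep plain words, @mentions and #hashtags; lower-case before counting.
        for token in re.findall(r"[@#]?\w+", text.lower()):
            if token not in query_words:
                counts[token] += 1
    return [term for term, _ in counts.most_common(n)]


# Made-up sample tweets, purely for illustration.
sample = [
    "OpenShift makes deploys easy @redhat",
    "Trying OpenShift with @redhat today",
    "OpenShift and Kubernetes",
]
print(top_related_terms(sample, "openshift", n=1))
```

The search term itself is excluded from the count, otherwise it would always be the top result.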
Want to try it out yourself?
What follows is a step-by-step walk-through of deploying the app. If you’re
here just to see results, skip that section – although the deployment of the
app is the real magic.
Deploying the app
This post assumes you have an
understanding of container technologies and know what
Kubernetes, OpenShift and Docker are. If you don’t, that’s fine, you don’t
really need to. It’s just that you won’t be able to fully appreciate the
effort and time saved in DevOps. In any case, try to think of your current
path to a cloud deployment and compare it to the following.
What is OpenShift’s Source-to-Image (S2I)?
Simply put, it is a tool that
automatically builds a Docker image from your source code. Now the real
beauty of it is that for specific languages (Python being one of them)
OpenShift will pick a builder image and build the Docker image for you without
you having to define a Dockerfile. You just need to provide some configuration
details in the expected format.
Basic knowledge of OpenShift and Kubernetes (here’s a start)
A preconfigured OpenShift environment (you can use one of the trials if you don’t have one)
A Twitter account with a registered phone number
Create a Twitter app and obtain the secrets for it
Go to apps.twitter.com and create a new app
Once created, switch to the Keys and Access Tokens tab and take note of the following: ◦ Consumer Key (API Key) ◦ Consumer Secret (API Secret) ◦ Access Token ◦ Access Token Secret
Go to Permissions and change to Read only
Deploy the app in OpenShift
Log in to the OpenShift webconsole
Create a New Project: ◦ Name: twitter-search ◦ Display Name: Twitter Search
Select Python under languages (this might differ depending on the configuration of your OpenShift environment)
Select Python 2.7
Click on Show advanced routing, build, deployment and source options and fill out the form ◦ Name: twitter-search ◦ Git Repository URL: https://gitlab.com/ammeon-public/twitter-search-s2i.git ◦ Routing: tick the Create a route to the application box ◦ Deployment Configuration: add the following environment variables (you need your Twitter secrets now – if in production you should use a more secure way of injecting secrets) • OAUTH_TOKEN=Access Token • OAUTH_TOKEN_SECRET=Access Token Secret • CONSUMER_KEY=Consumer Key (API Key) • CONSUMER_SECRET=Consumer Secret (API Secret) These environment variables will be exported in the container in which the Python app runs. The Python app then reads these variables and uses them to authenticate with the Twitter API. Warning! Your browser might save these values. Make sure to either delete the app at the end, use an incognito window or clear the Auto-fill form data from your browser.
Scaling: Set the number of Replicas to 2 (this is to avoid downtime during code updates and also increase the availability of the app – these concepts are not covered in this demo)
Click Continue to overview
and wait (~2mins) for the app to build and deploy.
Note: the first build will be
considerably slower than subsequent ones as OpenShift has to build
the image and get the required Python packages. On subsequent builds, the
base image is reused changing only the source code (unless significant
configuration/requirement changes are made).
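For the curious, this is roughly how a Python app can pick up the four environment variables set in the deployment configuration. It is a minimal sketch rather than the app's actual code; the function name `load_twitter_credentials` is hypothetical, and only the variable names come from the form above.

```python
import os

# Variable names as entered in the Deployment Configuration step.
REQUIRED_VARS = ("OAUTH_TOKEN", "OAUTH_TOKEN_SECRET", "CONSUMER_KEY", "CONSUMER_SECRET")


def load_twitter_credentials(environ=os.environ):
    """Collect the four Twitter secrets, failing loudly if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Failing at start-up when a secret is missing fits the “no smoke and no mirrors” criterion: if the injection did not work, you find out immediately.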
Hello World, Twitter Style
That’s it, you’re ready to
greet the world! Just enter a word and click on the Search button.
After you finish demonstrating
the app, it’s a good idea to clean up. To do so, follow these steps:
Log in to the OpenShift webconsole ◦ Click on the project drop-down on the top-left ◦ Select view all projects ◦ Click the bin icon on the right of the project’s name (“Delete Project”)
Go to apps.twitter.com and select your app ◦ Scroll to the bottom and select “Delete Application”
To Wrap it up
OpenShift’s source-to-image capability makes cloud deployment and DevOps extremely easy. For production environments of big enterprises with complex software that needs to be optimized at a Docker or OS level, S2I might not be optimal. But for building and deploying simple apps it saves you the hassle of defining a Dockerfile and the necessary deployment scripts and files (think yaml).
It just streamlines the
experience and allows the developer to focus on building the best app they can.
Thanks for reading! I hope you’ll enjoy playing around with the app and perhaps use it as your default demo app. Please do open pull requests if you want to contribute and of course, follow me on Twitter @YGeorgiou.
Ammeon Enables Cathx Ocean To Deliver Faster Through Agile-Lean Consulting
Cathx Ocean design, manufacture and supply advanced
subsea imaging and measurement systems. Software and hardware
R&D projects in Cathx were challenged by lengthy development cycles
and a lack of project visibility. In addition, Cathx was developing a business-critical
product and needed help to set up new processes to plan and execute its
delivery. Cathx selected Ammeon to help overcome these challenges.
Ammeon recommended that Cathx undertake an Agile-Lean Start programme. Over a 6-week period, Ammeon reviewed development processes and established Agile-Lean work methods. Key improvements achieved at Cathx Ocean during their participation in Agile-Lean Start include:
Replaced multiple processes with a single
standardised workflow and trained Cathx teams in its use.
A 79% reduction in the work
backlog in the first week through the use of visual management systems and
closer in-team collaboration.
A pilot project was brought from ideation to
delivery in less than 5 days.
“The Agile-Lean Start has been a huge leap forward for us in adopting Agile practices,” said Marie Flynn, COO, Cathx Ocean.
“Planning and prioritisation of Research and Development work with the new processes and workflows is much simpler and more efficient. We now need to apply these practices to other areas and embed them in the company.”
I was delighted to have been asked to talk about my journey into AgileHR at the Agile Lean Ireland event in Croke Park in Dublin in April. Despite overwhelming imposter syndrome talking about Agile to a room full of Product Owners and Scrum Masters, I was happy to see a connection being made with these experts by using language and terms with which they were familiar. What was sad from my perspective was how few HR people were at the conference despite the HR focus. For those of us from HR who were there, it was a wake-up call.
As a long term HR professional I sat in awe as the giants of the Agile world such as Barry O’Reilly and Mary Poppendieck talked about applying agile principles to massive technology projects – and that these incredible achievements were less about the tools but the product of a people-centric mindset: teamwork, coaching, innovation, leadership, mentoring and facilitation. Surely, I thought to myself, this list of skills belongs on HR’s turf!
Well, I’m sorry fellow HR folks but while we have
been hiding behind compliance, admin and working on yet another iteration of
annual performance reviews the technology world has moved on without us,
developing a set of people principles which is driving development in
everything from AI to space rockets.
So how do we get back in the game? Have we missed
the boat? Well, no. The good news is there is a quiet revolution going on in
the HR world. The agile mindset is being applied to HR and is being driven by
thought leaders in this area: the great Kevin Empey, Pia-Maria Thoren and Fabiola Eyholzer are evangelising the AgileHR message to
enthusiastic audiences worldwide. It’s clear that despite being slow off the
blocks, HR professionals have all the skills and competencies to be more than
just bit players in the future of work – we can utilise our natural skills to
bring strategic value to our businesses.
Over the next few articles, I am going to be talking about how you can quickly and easily begin to introduce Agile practices to supercharge both your team’s performance and the perception of the people function within your organisation. Sounds good? Stay tuned.
The good folks at 3scale gave us access
to the first beta version of the on-premise API Gateway application. This
presented us with an exciting opportunity to test its applicability for a proof
of concept IoT application we’re building.
and IoT Application
The 3scale API Gateway lets us manage access to our concept IoT application in terms of who can access the API (through an application key for registered users) and the rate at which API calls can be made.
The IoT application is a web server exposing a REST API to retrieve information from IoT devices in the field. After developing and testing locally, we realised that running the webserver on the OpenShift instance where 3Scale was running made everything simpler. This enabled all team members to access the webserver all of the time, instead of just when the development machine was connected to the company network.
The diagram below shows the stack where both 3Scale and the IoT proof of concept are deployed to OpenShift.
Build Process in OpenShift
The on-premise installation of 3Scale is an OpenShift application that we deployed from a template. For the IoT application, we created a new OpenShift application from first principles. A search of the OpenShift website returned this article, which correlated closely with what we wanted to do. We had already written the webserver using Python, specifically Flask.
The article describes how to deploy a
basic Flask application onto OpenShift from a github repository. This is a
basic use of the S2I build process in OpenShift. The S2I build is a framework
that allows a developer to use a “builder” container and source code directly
to produce a runnable application image. It is described in detail here.
After following the article on Python application deployment and understanding the process, we forked the repo on github, cloned it locally and changed it to reflect our existing code. We cloned the repo instead of creating a new one because the Getting Started article, referenced above, used gunicorn rather than the built-in python webserver and had the configuration already in place.
Running through the process with our own
repository included the following steps:
Selected the Add to Project option from the OpenShift web console
Selected a Python builder image and set the version to 2.7
Gave the application the name webserver and pointed to the previous git URL
When the builder started, we selected Continue to Overview and watched it complete.
Using the S2I process we could
easily and repeatedly deploy a web server with more functionality than a basic
“Hello World” display.
All of the API methods were merely stubs
that returned fixed values. What we needed was a database for live data.
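To make “stubs that returned fixed values” concrete, here is a minimal sketch of such a stub. It is written as a plain WSGI callable using only the standard library rather than Flask, so it is not our actual code; the `/devices` path and the payload are hypothetical. gunicorn can serve any WSGI callable, and a Flask application object is one too.

```python
import json

# Hypothetical fixed payload standing in for real device data.
STUB_DEVICES = [{"id": 1, "type": "co-sensor", "reading": 0.4}]


def application(environ, start_response):
    """WSGI callable exposing a single stubbed GET /devices method."""
    if environ.get("REQUEST_METHOD") == "GET" and environ.get("PATH_INFO") == "/devices":
        status, payload = "200 OK", STUB_DEVICES
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Every call returns the same data, which is enough to exercise the build and routing but is exactly why a database was the next step.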
We developed the database functionality
locally with a MySQL DB running on the local machine. When it came to deploying
this onto OpenShift, we wanted the same environment. We knew that there was an
OpenShift container for MySQL and it was straightforward to spin it up in the
Storage in OpenShift
The nature of containers is that, by default, they have only ephemeral storage (temporary, tied to the life of the container). We wanted the database to persist over potential container failures or shutdowns. This required attaching storage, known as a persistent volume, to the container. OpenShift supports different types of persistent volumes, including:
AWS Elastic Block Stores
RBD (Ceph Block Device)
To progress quickly, we chose NFS storage and created an NFS share. This NFS share was then provisioned in OpenShift. This involves creating a file defining the properties of the share and running the command:
oc create -f nfs_vol5.yml
The file contents are shown as follows:
Behind the scenes, the database application creates a “claim” for a storage volume. A claim is a request for a storage volume of a certain size from the OpenShift platform. If the platform has available storage that meets the size criteria, it is assigned to the claim. If no volume meets the exact size requirements, but a larger volume exists, the larger volume will be assigned. The NFS storage we defined in OpenShift met the criteria for this claim and was assigned to the application.
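The matching rule described above can be modelled in a few lines. This is a simplified sketch of the idea, not the platform's real binding logic, which also takes other properties such as access modes into account. The function name and the volume names are hypothetical, and choosing the smallest sufficient volume is an assumption of this sketch.

```python
def match_claim(requested_gib, available_volumes):
    """Pick a volume for a claim: the smallest available one that is large enough.

    available_volumes maps a volume name to its capacity in GiB.
    Returns the chosen volume name, or None if nothing is big enough.
    """
    candidates = [
        (size, name)
        for name, size in available_volumes.items()
        if size >= requested_gib
    ]
    if not candidates:
        return None  # no volume can satisfy this claim
    return min(candidates)[1]


# Hypothetical volumes, sizes in GiB.
volumes = {"nfs-vol1": 1, "nfs-vol5": 5, "nfs-vol10": 10}
print(match_claim(5, volumes))   # an exact match exists
print(match_claim(6, volumes))   # no exact match, so a larger volume is used
print(match_claim(20, volumes))  # nothing is large enough
```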
After the persistent volume was added to the application, we used the console tab of the container to edit the schema. We set the schema as required but we faced a new issue: connecting from the application to the database.
Connecting the Application to the Database
To connect from the application to the
database requires the database-specific variables set in the database container
to be exposed in the application container. This is achieved by adding the
variables into the deployment configuration. This causes the application to
redeploy picking up the new environment variables.
Specifically, the database container is
deployed with the following environment variables:
MySQL database name
These environment variables can be set
as part of the initial configuration of the container but if they aren’t,
default values are provided.
The environment variables are set
in the application container using the following command:
oc env dc phpdatabase -e MYSQL_USER=myuser -e MYSQL_PASSWORD=mypassword -e MYSQL_DATABASE=mydatabase
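On the application side, those variables can then be read at start-up. The following is a minimal sketch and not our actual code: the function name `mysql_settings` is hypothetical, and the `MYSQL_HOST` and `MYSQL_PORT` names and their defaults are assumptions added for illustration (only the three variables in the command above come from our setup).

```python
import os


def mysql_settings(environ=os.environ):
    """Build database connection settings from variables injected into the container."""
    return {
        "user": environ["MYSQL_USER"],
        "password": environ["MYSQL_PASSWORD"],
        "database": environ["MYSQL_DATABASE"],
        # Assumed names for this sketch, with local defaults if they are unset.
        "host": environ.get("MYSQL_HOST", "127.0.0.1"),
        "port": int(environ.get("MYSQL_PORT", "3306")),
    }
```

The resulting dictionary can be passed to whichever MySQL client library the application uses.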
The seven wastes of Lean, when translated from the original Japanese of Taiichi Ohno, are Transport, Inventory, Motion, Waiting, Overproduction, Overprocessing and Defects. The 8th waste, which was added later, is “under used skills” and is the least mechanical and most human of all the wastes. Often it is the most overlooked and, in my experience, the most important waste.
Under used skills deliver no value
A few years ago, I performed an analysis for a lean project at the Localisation
division of a major international software vendor. At the time, the
standard process used was to receive the English version of the software,
translate the strings into 26 languages, test and then release. The
process to translate took over six weeks to complete and required translators,
testers and linguists. As I examined the workflow, I discovered that the
product had zero active users in one of the languages. On
further investigation, it turned out that the company had stopped all sales and
distribution in that regional market several years previously, but sales had
failed to inform Localisation. It was a difficult day when I had to
explain to the translators and linguists that not only was their work no longer
needed, they had not added any value to the product for almost
half a decade. Thankfully, these employees were reassigned to other contracts
within the company where they were able to use their skills and experience to
add real value.
On another occasion, I discovered that a team of 10 people were performing eight
hours of post-release testing on a piece of software that they had previously
tested pre-release. These tests existed because at one point a failure in the
release process had caused a corruption on a client site. The failure had been
fixed but because no-one could be sure a separate failure might not appear,
these tests remained and were dreaded by the testers because the work was
boring and almost always pointless.
In this case, our solution was to develop new automated tests to provide the same
function as the manual testing. The automated tests could be triggered
immediately after the release process instead of the next working day. They also
had a run time of less than 80 minutes, which was much less than
the 80 hours needed to manually run the tests. The new process
made the team happier as they could focus on more interesting work and, as part
of handover, two of the testers were trained in how to maintain and further
improve the tool.
Independence and objectiveness
At Ammeon we offer an initial assessment of your workflows for free. We
believe that it is really important to have a regular independent objective
review of processes to identify waste.
Most of the time our analysis will show that your problems can be solved with improved tools, improved processes and adapting your culture to drive toward continuous innovation. Often this will lead to a recommendation of further training or a supported change through a Bootcamp! If this article has inspired you to address inefficient work practices in your IT organisation, request your free assessment by clicking here.
T. Ohno. Toyota Production System, Productivity
J. Liker. The Toyota Way: 14 Management Principles from the World’s Greatest Manufacturer, New York, McGraw-Hill, 2004
The management of Application
Programming Interfaces (APIs) is a hot topic. Discussions usually include
mention of phrases like ‘exposure of your customer data’, ‘monetizing your
underlying assets’ or ‘offering value add services to your customer base’.
If you have ‘assets’ or data which you
think may be useful to other third parties or end customers, or if you are
being driven by regulatory changes or market pressure, then an API Gateway has
to form part of your solution strategy.
An API gateway allows an organisation to
expose their internal APIs as external public facing APIs so that application
developers can access the organisation’s data and systems functions. The
capabilities of an API gateway include: management of the APIs, access control,
developer portal and monetization functionality.
There are a number of offerings in the market and in this post we focus on the 3Scale offering, one of the latest entrants into the on-premise space. 3Scale, who were acquired by Red Hat, have had an API management offering as a Software as a Service for several years and have now taken this offering and packaged it for use inside the enterprise.
The good folks at 3Scale gave us access to the first beta version and we gave it a detailed examination. In our evaluation, we look at how to install it, how it works with Red Hat OpenShift and we describe some of the interesting use cases it enables. We also share some insights and top-tips.
How easy is it to Install?
The 3Scale platform comes with several deployment options, one of which is an on-premise option. For this deployment, 3Scale utilises the Red Hat OpenShift environment. The ease of integration between the 3Scale platform and OpenShift demonstrates that Red Hat have put a lot of work into getting the API Gateway working in a containerised environment. The 3Scale platform itself is deployed within OpenShift’s containers and proved relatively easy to install and run.
The architecture of the OpenShift cluster we used was a simple single master and a single minion node, as shown below.
The servers, which came configured with
Red Hat Enterprise Linux (RHEL) 7.3 installed, have their own domain and the
API endpoints and portals are contained within it.
Top Tip: When copying the ssh key, make sure it is copied to the host that generated the key.
Otherwise, it can't ssh to itself and the installer notes an error.
With the installation complete, the next step was to get access to the console. This required us to edit the master configuration file (/etc/origin/master/master-config.yaml). For our purposes (and since we are an extremely trusting bunch), we used the AllowAll policy detailed here.
Following the edit, restart the master
systemctl restart atomic-openshift-master.service
The OpenShift console is available at https://vm-ip-address(1):8443/console/.
To administer OpenShift from the command
line, simply login to the OpenShift master node as root. Again, the
AllowAll policy means that you can log in with any name and password
combination but to keep things consistent you should use the same username all
You can then create a project for the
3Scale deployment. After this, 3Scale can be deployed within the containers
allocated by OpenShift.
(1) This is the IP Address of the master
3Scale has the following prerequisites:
A working installation of OpenShift 3.3 or 3.4 with persistent storage
An up to date version of the
A domain, preferably wildcarded, that resolves to the OpenShift cluster
Access to the Red Hat
(Optionally) a working SMTP server for email functionality.
Persistent volumes are required by the 3Scale deployment and therefore should be allocated within OpenShift prior to deploying 3Scale. For our deployment, the persistent volumes were configured using NFS shares.
Once the persistent storage was set up
for 3Scale, the deployment was straightforward. We were supplied a
template file for the application that just required us to provide the domain
as a parameter.
After about 25 minutes the application was up and running and we were able to login.
A final setup step was to configure the SMTP server; this was a simple matter of defining and exporting variables into the OpenShift configuration.
API Gateway: sample use case
In order to exercise the platform we
needed a use case to implement and an API to expose. We decided that an
Internet of Things (IoT) use case made a lot of sense, not least because it’s
such a hot topic right now!
So with that in mind, allow yourself to be transported on a journey through time and space, to a world where air pollution is actively monitored everywhere and something is actually done about it. And consider the following scenario:
There are a number of IoT devices monitoring Air Quality and Pollution levels throughout a given geographic area.
There may be a number of different makes and types of devices monitoring, for example, carbon monoxide levels, nitrogen dioxide, particulates etc.
A micro-service architecture could be deployed on the Beta IoT platform. Each micro-service could then process its own specific API.
The 3Scale API Gateway would then be responsible for offering these APIs out to public third parties via the Internet to consume.
The 3Scale API Gateway would also be responsible for managing the external access to these micro-service APIs and providing authentication and authorisation, as well as injecting policies. Auto-scaling of micro-service resources could also be provided by the 3Scale platform in conjunction with the OpenShift environment.
The third party applications, which consume the public APIs, could then use this information to provide, for example, a Web Dashboard of pollution levels or a mobile application for users’ smartphones.
SmartCity IoT Use Case
In keeping with this, we wrote a number of user stories and scenarios around the use of the API. To implement our IoT API, which the 3Scale platform was going to ‘front’ to the outside world, we developed a basic web service written as a Python Flask application. The application was stored in GitHub to ease deployment to OpenShift. Rather than create a completely new project, the OpenShift example Python app was forked and changed. This is the GitHub repo we used.
What do you get for your Rates?
The 3Scale platform allows you to configure rate limits for your API. The 3Scale documentation on rate limits is here.
Rate limits are set up on a per plan per
application basis. This means that each application has a set of plans and each
plan has the capability of setting multiple limits on each method. This is done
using the Admin Portal, where specific rates can be configured. The rates work
by essentially counting the number of calls made on an API method over a time
period and then measuring this against the limit configured on the portal for a
given application. It is possible to stop a method from being used in a plan.
Examples of rate limits are:
10 method calls every minute
1000 method calls every day
20 method calls every minute with a max of 100 in an hour
When a limit is exceeded, the method returns a 403 Forbidden response and, if
selected, generates alerts via email and the 3Scale dashboard.
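The counting approach described above can be sketched as a small fixed-window limiter. This is an illustration of the general idea only: the class name is hypothetical, and whether 3Scale uses fixed windows internally is an assumption. Only the 403 response on an exceeded limit comes from our observations.

```python
import time


class FixedWindowLimiter:
    """Count calls per (application, method) in fixed time windows."""

    def __init__(self, limit, period_seconds, clock=time.time):
        self.limit = limit
        self.period = period_seconds
        self.clock = clock  # injectable so the behaviour can be tested without waiting
        self.counts = {}    # (app_key, method, window index) -> calls seen so far

    def check(self, app_key, method):
        """Return an HTTP-style status: 200 if allowed, 403 once the limit is exceeded."""
        window = int(self.clock() // self.period)
        key = (app_key, method, window)
        self.counts[key] = self.counts.get(key, 0) + 1
        return 200 if self.counts[key] <= self.limit else 403


# "10 method calls every minute" for one application and method.
limiter = FixedWindowLimiter(limit=10, period_seconds=60)
statuses = [limiter.check("my-app-key", "get_readings") for _ in range(10)]
```

A combined rule such as “20 per minute with a max of 100 in an hour” could be modelled by checking two limiters and rejecting the call if either one returns 403. This sketch never discards old windows, which a long-running service would need to do.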
The API Gateway can be configured to use
either username/password or OAuth v2.0 authentication of applications. The
username/password configuration is pretty simple but OAuth authentication is a
little more tricky.
You can reuse the existing system-redis service and set either REDIS_HOST or REDIS_URL in the APIcast deployment configuration (see reference). If the gateway is deployed in a project different from where the 3Scale AMP platform is, you will need to use a route.
The Analytics feature of the platform allows you to configure metrics for each of your API method calls. Configuration is done via the platform dashboard. The platform will graph and show the following information:
number of hits to your API; hits can be calls to the API or broken out into individual methods on the API
quantity in MB or GB of data uploaded and downloaded via the API
compute time associated with calls to the API
count of the number of records or data objects being returned or total disk storage in use by an
The Developer Portal allows users to
customise the look and feel of the entire Developer Portal in order to match
any specific branding. You have access to nearly every element of the portal so
you can basically modify to suit your own environment.
It has to be said that the documentation
around how the portal is customised could be better, but if you are an
experienced web developer it will probably be straightforward.
Integrating with Swagger
Swagger is a standard, language-agnostic
interface to REST APIs that allows both humans and computers to discover and
understand the capabilities of the service without access to source code or
documentation. 3Scale allows Swagger documentation to be uploaded for the
APIs that are to be exposed.
We used this online editor to create the Swagger documentation for the IOT API. The specification can be viewed here. The files above are the basic documentation for the API, but they need updating before they can be used in 3Scale, and you need to reference this documentation. The host needs to change, as does the schemes setting, and the user key needs to be added. To add the file, follow the instructions here.
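The three changes described above (host, schemes and user key) can be sketched in a few lines. This is a minimal illustration rather than the procedure we followed: it assumes a Swagger 2.0 document already loaded as a dictionary, and the gateway host `api.example.com`, the query parameter name `user_key` and the tiny `original` specification are placeholders, not values from our deployment.

```python
import copy
import json


def adapt_for_gateway(spec, gateway_host, scheme="https", key_name="user_key"):
    """Return a copy of a Swagger 2.0 spec pointed at the gateway.

    gateway_host and key_name are placeholders; use the values from your own setup.
    """
    adapted = copy.deepcopy(spec)
    adapted["host"] = gateway_host
    adapted["schemes"] = [scheme]
    key_param = {
        "name": key_name,
        "in": "query",
        "type": "string",
        "required": True,
        "description": "Application user key",
    }
    http_methods = {"get", "put", "post", "delete", "options", "head", "patch"}
    for path_item in adapted.get("paths", {}).values():
        for method, operation in path_item.items():
            if method not in http_methods:
                continue  # skip path-level keys such as "parameters"
            params = operation.setdefault("parameters", [])
            if not any(p.get("name") == key_name for p in params):
                params.append(dict(key_param))
    return adapted


# Hypothetical stand-in for the IOT API specification.
original = {
    "swagger": "2.0",
    "info": {"title": "IOT API", "version": "1.0"},
    "host": "localhost:8080",
    "schemes": ["http"],
    "paths": {"/devices": {"get": {"responses": {"200": {"description": "OK"}}}}},
}

print(json.dumps(adapt_for_gateway(original, "api.example.com"), indent=2))
```

Working on a deep copy keeps the original specification untouched, so the same source file can still be used against a local backend.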
Top Tip: One of our findings from working with the Swagger documentation was that valid
security certificates need to be installed for the 3Scale platform. When they weren't,
the Swagger-generated curl requests returned an error.
Some documentation improvements could be made. For example, to provide an overall context or architecture overview. This would be a benefit as a starting point in order to provide the user with a better understanding of the different components ‘under the hood’. Some of the specifics which need to be modified (such as the Developer Portal web pages) could be explained better, with examples being provided for the most common tasks. Given that we were working on the first Beta version of the product, we’re going to assume that the documentation improvements will be in place prior to general availability.
What we didn’t do
Due to time and project pressures, we
didn’t perform any stability, HA or performance tests, so we can only go on
what has been published elsewhere. 3Scale have stated that they carry out
performance tests in order to provide benchmark data to size the infrastructure
for given API rates. The billing mechanism wasn’t available to test so we
weren’t able to set up any customer billing plans. Therefore we weren’t able to
test monetization options for our fictional API.
Our experience of using both the
OpenShift platform and the 3Scale API Gateway was positive and informative. It
is relatively straightforward to install both OpenShift and 3Scale and get a
simple API up and running. There are a lot of ‘out of the box’ features which are
useful (and perhaps essential) if you are going to deploy an API Gateway on
your own premises. There’s also a good degree of flexibility within the platform
to set rates, integrate with backends and customise your portals.
Overall, a good experience and a good
addition to the world of API management!
The Scrum Guide, the official definition of Scrum, created and described the role of Product Owner (PO). The role is described as “responsible for maximizing the value of the product and the work of the Development Team”. It’s a challenging role, as it requires someone with technical ability, business analysis ability and the authority to make decisions and deal with the consequences. It is often considered to be the most difficult role in Scrum [2, 3, 4].
There are a number of tools
available that can help the Product Owner be successful. This post describes
one such tool, called the PICK chart, which can be used to aid planning and
prioritisation between the development team and the stakeholders in the
PICK your Stories (and Battles)
The Scrum Guide describes the Product Owner’s responsibility as “the sole person responsible for managing the Product Backlog”. Commonly this is interpreted to mean that the product backlog is a one-dimensional list of tickets ranked by business value. This is a bad idea. By ordering in this simplistic manner, some low-value stories remain untouched by the team for a very long time (sometimes years). The stakeholders who requested this work are effectively placed at the end of a queue only to see others skipping in front of them.
At the top of the product
backlog, a one-dimensional list also causes problems for the unwitting Product
Owner. This is because some valuable tasks are straightforward to implement and
others are complicated or have very high levels of uncertainty.
The INVEST and MoSCoW
techniques can help improve story refinement and prioritisation. INVEST ensures
that each story satisfies certain criteria, but it doesn’t rank the stories
that meet them. MoSCoW provides a method for
managing project scope and identifying out-of-scope and at-risk items. It tends
to be subjective, and why an item is in one of its four categories rather than
another can be contentious.
A PICK chart, similar to the
one shown below, is a useful method of addressing the weaknesses of existing
methods while also meeting the needs of the team and stakeholders:
This two-dimensional chart
shows both the potential value and the likely difficulty of each story. The
y-axis shows the value (“payoff”) from delivering a story, with the highest
value at the top. This is similar to the usual lists used to display product
backlogs. The x-axis shows the effort required to deliver a story. Stories on
the left have lower risk as the effort required to deliver is less than those
further to the right.
The PICK chart can be used
effectively during sprint planning to help the Development Team select stories
that can be implemented in the next sprint as well as identify work that needs
further investigation. This helps ensure the sprint does not consist entirely
of high-effort, high-risk stories.
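As a rough illustration of how stories land on the chart, the sketch below sorts a backlog into the four quadrants conventionally used on a PICK chart (Possible, Implement, Challenge, Kill). The 1 to 10 scales, the cut-off of 5 and the example stories are assumptions made for the example; in practice the placement comes out of discussion with the team rather than a formula.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    payoff: int  # relative business value, 1 (low) to 10 (high)
    effort: int  # relative delivery effort, 1 (low) to 10 (high)


def pick_quadrant(story, payoff_cut=5, effort_cut=5):
    """Map a story to a PICK quadrant using hypothetical cut-off values."""
    high_payoff = story.payoff > payoff_cut
    high_effort = story.effort > effort_cut
    if high_payoff:
        return "Challenge" if high_effort else "Implement"
    return "Kill" if high_effort else "Possible"


# Hypothetical backlog used only to exercise each quadrant.
backlog = [
    Story("Export report as CSV", payoff=8, effort=2),
    Story("Rewrite billing engine", payoff=9, effort=9),
    Story("Change footer colour", payoff=2, effort=1),
    Story("Support legacy browser", payoff=1, effort=8),
]

for story in backlog:
    print(f"{pick_quadrant(story):<10} {story.title}")
```

A sprint drawn mostly from "Implement", with at most a few "Challenge" stories, avoids the all high-effort, high-risk sprint described above.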
The chart can also be used as a visual tool to remove stories from the bottom of the backlog because they are both low-value and technically challenging. Involving some or all of the team in backlog grooming gives them a degree of empowerment over the work they will and won’t take on in coming sprints. The outcome of this analysis simplifies the conversation with stakeholders who need to be told their idea will never be worked on. Instead of it being your opinion versus theirs, there is business and technical justification.
The PICK chart is a powerful
tool in any Product Owner’s arsenal. By eliminating long wait times for
features that will never be delivered, it ensures that internal stakeholders
don’t waste time on false hopes. By ensuring work is delivered in each sprint,
the team are seen to be continuously reducing risk and adding value to the
product. Its visual nature and relative ranking in business and technical
dimensions mean there are fewer heated arguments between teams, stakeholders
and the Product Owner. It makes “the most difficult role” that little bit
The 8 Wastes Snake is a continuous
process improvement tool. It is a poster-sized sheet that allows people working
on a process to record any perceived wastes and annoyances when they occur
during process execution. This record can then be reviewed periodically by the
teams and management to identify changes that improve the process and the conditions of
the people working on it.
The Purpose Of The Waste Snake
The purpose of the waste snake is to embed a culture of continuous improvement. Allowing individuals to express their frustrations with the processes they are working on provides better information for managers to identify and eliminate wasted time, effort and money. In turn, continuously resolving those frustrations should reduce staff dissatisfaction, increase morale and improve staff retention.
The 8 Wastes Snake is a fusion of Schlabach’s “Snake on the wall” with the “8 wastes of lean”. The 8 wastes of lean is, in turn, an extension of Ohno’s original 7 wastes (“7 muda”). The “snake on the wall” concept allowed teams to record wasted time in an immediate visual fashion so that repeated wasteful activities could be identified and reduced or eliminated. However, it considered only lost productivity to be a waste. The 8 wastes uses the mnemonic “TIM WOODS” to consider various types of waste but does not provide an actionable tool to record when each type of waste is encountered. This technique seeks to build and improve on the older techniques.
How to use it
The 8 Wastes Snake can be used as a brainstorming tool for people to record perceived wastes in a process. However, its primary purpose is to record wastes actually experienced during process execution for review at a later stage. How the tool is used depends on whether the team using the snake is normally co-located or distributed across one or more remote sites. It is important that the snake belongs to a single process owner (for Scrum teams, this could be the Scrum Master, for example).
For teams that usually work in the same
Hang the poster close to the work area so that it is visible and accessible to the team.
If they do not have access to them already, provide post-its to the team.
Provide an introduction to the purpose of the snake (to identify and eliminate waste) and an overview of each type of waste.
Agree on a date for the first review (for Scrum teams, this could be part of a retrospective)
Encourage the team to record any wastes (such as time waiting for a process to complete) on a post-it stuck on the snake.
Review with the team and identify actions for improvement and actions for escalation.
Remove any wastes that have been reduced/eliminated and add “dot-votes” for ones that have been witnessed repeatedly.
Distributed / Remote teams
For teams that don’t usually work in the
Use a virtual tool to create a virtual poster where others can submit. A team wiki or a free virtual tool like Realtimeboard can provide this functionality
Provide an introduction to the purpose of the snake (to identify and eliminate waste) and an overview of each type of waste.
Agree on a date for the first review (for Scrum teams, this could be part of a retrospective)
Encourage the team to record any wastes (such as time waiting for a process to complete) on the same page as the snake.
Review with the team and identify actions for improvement and actions for escalation.
Remove any wastes that have been reduced/eliminated and add “dot-votes” for ones that have been witnessed repeatedly.
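For a virtual snake, the review step can be supported by a simple tally of what has been recorded. The sketch below is one possible way to do that, assuming each note is stored as a pair of waste type and description; the `notes` data and the `review_summary` helper are hypothetical, and the waste types are the eight behind the “TIM WOODS” mnemonic.

```python
from collections import Counter

# The eight waste types behind the "TIM WOODS" mnemonic.
WASTE_TYPES = (
    "Transport", "Inventory", "Motion", "Waiting",
    "Overproduction", "Overprocessing", "Defects", "Skills",
)


def review_summary(notes):
    """Count notes per waste type, most frequent first.

    Each note is a (waste_type, description) pair, like a post-it on the snake.
    """
    unknown = {waste for waste, _ in notes} - set(WASTE_TYPES)
    if unknown:
        raise ValueError(f"Unknown waste types: {sorted(unknown)}")
    return Counter(waste for waste, _ in notes).most_common()


# Hypothetical notes collected between two reviews.
notes = [
    ("Waiting", "Build queue took 40 minutes"),
    ("Defects", "Release rolled back after failed smoke test"),
    ("Waiting", "Blocked on environment access request"),
]

for waste, count in review_summary(notes):
    print(f"{waste}: {count}")
```

The counts play the same role as dot-votes on the poster: the wastes witnessed most often rise to the top of the review.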
The great promise of DevOps is that organisations can reap the benefits of an automated workflow that takes a developer’s commit from test to production with deliberate speed. Inevitably, problems arise in the process of setting up and continuously improving a DevOps workflow. Some problems are cultural or organisational in nature, but some are technical.
This post outlines three
patterns that can be applied when debugging difficult failure modes in a DevOps
You Don’t Need a Debugger for Debugging
While IDEs provide developers with the convenience of an environment that integrates code and debugging tools, there’s nothing that says you can’t inspect running code in a staging or production environment. While deploying an IDE, or a heavy developer-centric package on a live environment can be difficult (or impossible) for operational reasons, there are lightweight CLI tools you can use to aid in the diagnosis of issues, such as a hanging process on a staging system. Tools such as ltrace and SystemTap/DTrace, even plain old lsof can reveal a lot about what’s actually happening. If more visibility into what’s in memory is needed, you can use gcore to cause a running process to generate a core dump without killing it so that it can be subsequently analysed with gdb offline.
In the Java world, tools such as jvmtop leverage the instrumentation capability built inside the virtual machine (VM) to offer a view of the VM’s threads; while jmap and VisualVM can be used to generate and analyse a heap dump, respectively.
While it is frequently useful to practice rubber duck debugging, some failure modes do not lend themselves to a dialectic approach. This is particularly true of intermittent failures seen on a live system the state of which is not fully known. If you find yourself thinking “this shouldn’t happen, but it does”, consider a different approach: aggressive quantification. A spreadsheet program can, in fact, be a debugging tool!
Gather timings, file descriptor counts, event counts, request timestamps, etc. on a variety of environments – not just where the problem is seen. This can be achieved by adding logging or instrumentation to your code or tooling, or by more passive means such as running tshark or polling the information in procfs for certain processes. Once acquired, transform the data into CSV files, import it and plot it as time series and/or as a statistical distribution. Pay attention to the outliers. What else was happening when that blip occurred? And how does that fit in with your working hypothesis regarding the nature of the issue?
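As one example of the passive approach, the sketch below polls procfs for a process's open file descriptor count and writes the samples as CSV, ready for a spreadsheet. It assumes a Linux-style `/proc/<pid>/fd` directory; the function names, the sample count and the interval are illustrative choices, and reading another user's process usually needs elevated privileges.

```python
import csv
import os
import sys
import time


def count_open_fds(pid, proc_root="/proc"):
    """Count the entries in <proc_root>/<pid>/fd (Linux procfs layout)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def sample_fd_counts(pid, samples, interval, out, proc_root="/proc"):
    """Write `samples` rows of (unix_time, open_fds) as CSV to `out`."""
    writer = csv.writer(out)
    writer.writerow(["unix_time", "open_fds"])
    for i in range(samples):
        writer.writerow([f"{time.time():.3f}", count_open_fds(pid, proc_root)])
        if i < samples - 1:
            time.sleep(interval)


# Sample this process itself, but only where procfs is available.
if os.path.isdir(os.path.join("/proc", str(os.getpid()), "fd")):
    sample_fd_counts(os.getpid(), samples=3, interval=0.1, out=sys.stdout)
```

Run against the same service on a healthy and an affected environment, the two CSV files can be plotted side by side to see whether the descriptor count drifts before the failure.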
When All You Have Is a Hammer
Got a tricky issue that only
occurs very intermittently? A suspicion that there is some sort of
race condition between multiple streams of code execution, possibly in
different processes or on different systems, that results in things going
wrong, but only sometimes? If so, it’s hammer time! Inducing
extreme load, or “hammering the system” is an effective way to reproduce these
bugs. Try increasing various factors by an order of magnitude or more above
what is typically seen in regular integration testing environments. This can artificially
increase the period of time during which certain conditions are true, to which
other threads or programs might be sensitive. For instance, by repeatedly
serialising ten or a hundred times as many objects to/from a backing database,
you’ll increase the time during which other DB clients have to wait for their
transactions to run, possibly revealing pathological behaviours in the process.
Applying this debugging pattern goes against the natural inclinations of both developers and operations folks, as both would rather see code run at a scale that is supported! That’s precisely what makes it valuable, as it can reveal unconscious assumptions made about the expected behaviour of systems and environments.
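A hammer does not need to be sophisticated. The sketch below is a minimal harness, assuming the operation under suspicion can be called from Python: it runs the operation many times across many threads, collects every exception, and leaves you to check an invariant afterwards. The `Inventory` class is a hypothetical system under test, and the call and worker counts are arbitrary values chosen to be well above normal load.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def hammer(operation, calls, workers):
    """Run `operation` `calls` times across `workers` threads; return raised exceptions."""
    failures = []
    failures_lock = threading.Lock()

    def attempt(_):
        try:
            operation()
        except Exception as exc:  # record the failure and keep hammering
            with failures_lock:
                failures.append(exc)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(attempt, range(calls)))
    return failures


class Inventory:
    """Hypothetical system under test, with a check-then-act sequence."""

    def __init__(self, stock):
        self.stock = stock
        self.lock = threading.Lock()

    def reserve(self):
        with self.lock:  # the lock keeps the check and the decrement atomic
            if self.stock <= 0:
                raise RuntimeError("out of stock")
            self.stock -= 1


inventory = Inventory(stock=1000)
failures = hammer(inventory.reserve, calls=10_000, workers=64)
print(f"{len(failures)} failures, {inventory.stock} left in stock")
```

The invariant here is that the stock never goes negative and exactly 1000 reservations succeed. If the lock is removed, a run under this kind of load may break that invariant, although a race like this is not guaranteed to show up on every run.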
In a recent post, I wrote about how traditional companies are being disrupted by more nimble competitors. Banking is one sector taking an interesting approach to this challenge.
Like their counterparts in other sectors (including telecoms, networking, media and life sciences), banks are turning to digital transformations to speed delivery and fend off competitive threats. Digital transformations are about DevOps adoption and infrastructure automation using public and private cloud.
Banks are also
adopting two other winning strategies:
Spinning up digitally-focused ‘start ups’ that are free from old
processes, oppressive corporate culture and technical debt.
Working closely with the industry’s new Fintech players.
Although not as
high-profile as the UK, Netherlands, Germany and the Nordics as a
place for Financial innovation, Ireland boasts a lot of Fintech clout.
“It makes sense that Ireland’s Fintech community would be diverse and successful”
Since the IFSC was set up in
Dublin’s docklands in 1987, the Irish government has
supported the growth potential of the financial services sector. The sector is
supported by a well-developed financial and communications infrastructure, a
young, well-educated workforce, access to European markets and good transport
links to the UK and the US. Coupled with the boom in Ireland’s tech scene as a whole,
it makes sense that Ireland’s Fintech community would be diverse and successful.
Ireland’s Fintech Map
In spite of this,
much of Ireland’s Fintech talent has gone largely unnoticed.
“Get a handle on the ecosystem”
I discovered this when I established my own Fintech start-up and was trying to find local partners to connect with. I struggled to get a handle on the ecosystem and to identify the key players in each of the areas. So I started something called Ireland’s Fintech Map.
A labour of love
(and sometimes just a labour!), the Irish Fintech Map provides a snapshot of
this ever-changing, dynamic landscape.
Ireland’s Fintech Map has been gaining traction. Recent sightings have been in an Enterprise Ireland presentation to a group of Nordic Banks, in a meeting of the Business Ireland Kenya network and it has been spotted doing the rounds in Hong Kong. It has been shared countless times on LinkedIn and other social media. I also used it as part of a presentation of the Irish Fintech scene to a contingent from Skandiabanken who were eager to learn from these bright newcomers.
There are too many
companies featured on the map to mention, but I would like to give a shout out
CR2: A tech company whose solution stack enables banks to have omni-channel relationships with their customers. CR2 and its BankWorld platform were recently recommended by analyst firm Ovum. And in November CR2 announced a deal with Jordanian Bank Al Etihad. Wishing every success to new CEO Fintan Byrne.
Leveris: Who provide a turn-key banking platform and are raising €15 Million in Series A funding. Good luck to CEO Conor Fennelly and team!
Rockall Technologies: Enabling banks to better control their banking book collateral, Rockall Technologies are ranked among the world’s top 100 risk technology companies. This is ‘one to watch’ as it is now steered by CEO Richard Bryce who has a great track record of driving company growth through innovation.
candy for Fintech fans!
To keep Ireland’s Fintech Map project manageable, I have had to employ some fairly tight parameters and exclude the following:
Companies founded outside of Ireland, even if they have an office in Ireland or the founders are born and raised in Ireland (sorry Stripe!)
Companies that are “Coming Soon”, in “beta” – or are no longer operating.
Companies that service several industries or sectors, even if Financial Services is part of their market.
Companies that provide consulting, managed services or people-based solutions.
In this 2-day Business
Agility Workshop, you will use simulations to engage the intuitive as well
as the rational brain while learning about the foundations of business agility.
Date: 21st & 22nd
Learning Centre, O’Connell Bridge House, Dublin 2
It is more than
scaling agile (development) practices. It is more than the sum of different
organisational units that each implement their own chosen agile method on their
own little island, constrained by a traditional management system. It is a
different way of thinking about agility.
WORKSHOP WILL GIVE YOU:
– New ways of
teaching and coaching agility that engages and mobilizes all levels of the
organisation (including decision makers) across all functions (not just IT or
– A different way of thinking about change where agility spreads virally through the (informal) network.
– Agile thinking at scale to develop unique capabilities to thrive in an ever-changing and highly competitive business landscape.
ENTERPRISE FLOW that balances supply with demand from team to portfolio level,
NETWORKED COLLABORATION where highly engaged teams work, decide and learn together in a network,
ORGANISATIONAL LEARNING where experience and experiment complement each other,
DAY 1 – CORE AGILE
Explore flow, collaboration and learning as core agile capabilities as
opposed to methods and practices that do not scale
Learn how to use simulation as a way of teaching and coaching the core
agile capabilities in a way that inspires action not just talk
Experience how to teach agility at all levels and across all functions
(not just IT) in the organisation
DAY 2 – SCALED
Learn about enterprise flow, networked collaboration and organisational
learning as the core capabilities for business agility
Use simulation to explore scaling problems including cross-team
dependencies and balancing demand with supply in end-to-end flow
Explore the use of Customer Kanban to manage capacity downstream, and
triage and order points to shape demand upstream. Apply what has been learned
in the context of agile portfolio management.
Who the workshop is
The workshop is
intended for agile coaches and practitioners who want to engage business as
well as IT (including decision makers) in their agile initiatives. It is a
must for coaches and trainers who want to use Okaloa Flowlab in their own training and
Patrick Steyaert of Okaloa, a principal lean agile coach with extensive experience in practicing Kanban, will be your workshop facilitator. Patrick received a Brickell Key Award in 2015 in recognition of his contributions to the Kanban community. Patrick was one of the first Lean Kanban University accredited trainers, and is the only Belgian one. He is also a regular speaker at Kanban and agile conferences.
Installation and upgrade
processes can be problematic and frustrating. Manual steps carried out on
many different servers introduce the potential for human error – adding time,
cost and risk. Ammeon Workflow Engine (AWE) is a collection of services designed
to address the complexity of end-to-end installations and upgrades. Our
workflow engine emphasizes reducing cost and effort by removing complexity and
The core functionality of AWE is to automate the manual steps involved
in any software application installation or package upgrade (for example,
supporting upgrade automation of deployments to so-called ‘brown-field’
environments). AWE is highly configurable and extensible and can be applied to
a variety of different use cases, including:
Automation of application installs
Automation of application upgrades
Checks on running systems
Our customers have seen dramatic results from
adopting AWE, including:
>75% reduction in upgrade steps
Up to ⅓ wiped off completion time
More than a 50% reduction in effort
AWE implements a multi-phase
‘Prepare for’ phase – preparing the target ‘To’ state
Verification phase – check target systems readiness
Execution phase – script execution
Continuous health checks and reporting at each phase
You can pause and resume
upgrade jobs as required. You can also configure dependencies between steps, enabling
the workflow engine to automatically ignore, pause or stop when errors are
AWE contains a Python-based
engine at its core. This engine takes inputs from multiple XML files:
Workflow Steps – describes order and dependencies and is used to determine the ‘To’ state.
Deployment Descriptor – describes the physical/virtual deployment including parameters such as machine names and IP addresses.
Master State – saves the execution state in the event that the workflow is paused or encounters an error.
The engine orchestrates the execution of the steps, continually monitoring progress through to completion. AWE is capable of executing Python and Bash scripts, enabling you to reuse your existing scripts as part of the upgrade workflow.
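The general idea of ordering steps by their dependencies and saving state so that a run can be resumed can be sketched briefly. The example below is not AWE’s implementation or file format: it is an illustrative sketch that uses in-memory dictionaries in place of the XML inputs, and every name in it (`run_workflow`, the step names, the `state` layout) is hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def run_workflow(steps, dependencies, state):
    """Run steps in dependency order, recording progress in `state`.

    steps: step name -> callable
    dependencies: step name -> set of step names that must finish first
    state: step name -> "done" or "failed"; pass it back in to resume
    """
    graph = {name: dependencies.get(name, set()) for name in steps}
    for name in TopologicalSorter(graph).static_order():
        if state.get(name) == "done":
            continue  # completed in an earlier run
        try:
            steps[name]()
        except Exception:
            state[name] = "failed"
            return False  # stop here; later steps may depend on this one
        state[name] = "done"
    return True


log = []
attempts = {"upgrade": 0}


def upgrade():
    """Hypothetical step that fails once, then succeeds on the retry."""
    attempts["upgrade"] += 1
    if attempts["upgrade"] == 1:
        raise RuntimeError("simulated failure on first attempt")
    log.append("upgrade")


steps = {
    "verify": lambda: log.append("verify"),
    "upgrade": upgrade,
    "health_check": lambda: log.append("health_check"),
}
dependencies = {"upgrade": {"verify"}, "health_check": {"upgrade"}}

state = {}
print(run_workflow(steps, dependencies, state), state)  # first run stops at the failed step
print(run_workflow(steps, dependencies, state), state)  # second run resumes from it
```

Because completed steps are skipped on the second call, the verification step is not repeated after the failure, which is the behaviour a saved execution state is meant to provide.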
Process step reduction and simplification
Significant reduction in time to upgrade
Ability to execute installations within narrow maintenance windows
Reduction in errors, failures and documentation
Capability to reuse existing scripts and mix & match automated and manual steps