Jan 5, 2023
by Alexei Falco

The Integration Failure That Celadon Developers Were Fixing at 5 A.M.

Nothing demonstrates our expertise better than our failures and the ways we found out of them. That’s why we decided to write a series of articles sharing our least successful case studies and what we did to improve. Throughout our journey as a software development company, we have encountered three types of failures:

  • Cost estimation failures;
  • Misdirected expectations;
  • Technical failures.

The previous article covered our failure in choosing between a fixed-price contract and Time & Materials, which helped us improve our approach to cost estimation.

Today, we’re going to talk about a technical failure that happened in one of Celadon’s early projects.

A Few Words About the Project

Our client was a huge fan of horse racing. He owned a horse club and wanted to develop an IoT-based mobile app featuring horse riding events. As you may know, a special GPS tracking device is used in racing to track the whereabouts of a horse. We needed to connect the tracking software to the app we were developing. The concept was the following: the device captures the horse's location and transmits the data to the app. The goal was for the app to process and present sophisticated race data in real time in a way that was intuitively understandable.

The app was supposed to run on both iOS and Android. The client also set a very tight deadline for us to deliver a full cycle of services: consulting, technical and graphical design, project management, frontend & backend development, and QA.

What the Technical Failure Was About

The app was delivered on time. The first launch coincided with the start of a race. The developers uploaded the build of the app. What could go wrong? Well, surprisingly, the app crashed on the very first attempt to run it. Because the development team and the client were in different time zones, the team had to fix it on Saturday at 5 AM. It couldn’t wait: the race was under way in the UAE, and urgent measures were required. Celadon could not let the customer down. This is Celadon’s core principle: live up to the customer’s trust. However, before fixing bugs, you have to find them. The developers got down to work, and here’s what they found: the integration with the app that provided the timing data caused the crash.

What Was Wrong With the Integration?

The team started with a code review, which revealed no errors. Then we tested the product and went into the nitty-gritty of all the functions and integrations. Finally, we traced the problem to the malfunctioning CSV file integration.

The client had software that provided the timing data for the race. Once a horse reached a gate, the data was written to a separate CSV file. We then parsed the file to receive the data. This wasn’t the best solution in this case, and we offered other options for a lower-risk integration. However, the customer expressed no desire to fine-tune their timing software.

So, by default, the CSV file was overwritten every time a horse went through a gate. In testing, it worked as it was supposed to. However, when the race took place, the app crashed in production. For some reason, the timing software began to append data to the CSV file instead of overwriting it, increasing its volume by a factor of 160 (the number of participants in the race) as the horses passed a gate.
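
To illustrate the difference between the two behaviors, here is a minimal sketch of a parser that gives the same result whether the file is overwritten or appended to. It is not the code from the project: the column names `horse_id`, `gate`, and `time_s` and the function `latest_gate_times` are assumptions made up for this example, since the real file layout is not described here.

```python
import csv
import io


def latest_gate_times(csv_text):
    """Return the most recent time per (horse_id, gate).

    Works the same whether the timing file holds a single row
    (overwrite mode) or an ever-growing history (append mode).
    Column names are hypothetical.
    """
    latest = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        # Skip incomplete rows instead of crashing on them.
        if not row.get("horse_id") or not row.get("gate") or not row.get("time_s"):
            continue
        try:
            time_s = float(row["time_s"])
        except ValueError:
            # Also skips a header line repeated in the middle of the file.
            continue
        # Later rows replace earlier ones for the same horse and gate.
        latest[(row["horse_id"], row["gate"])] = time_s
    return latest


overwritten = "horse_id,gate,time_s\n7,1,12.5\n"
appended = (
    "horse_id,gate,time_s\n"
    "7,1,12.5\n"
    "8,1,12.9\n"
    "7,2,25.1\n"
    "7,1,12.5\n"
)

print(latest_gate_times(overwritten))
print(latest_gate_times(appended))
```

Keying the result by horse and gate means duplicated or accumulated rows cannot inflate what the app has to process, which is the assumption that broke in production.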

As we said, we had to wake the development team to fix it. Eventually, the bug was fixed and the app was brought back up to full functionality. However, we’re here to tell you about the causes and what we did to address them.

Why the Failure Happened and What QA App Testing Has to Do With It

We don't have many failures in our experience, but we would be lying if we said there were none at all. When things go wrong, our team always analyzes the situation and identifies the causes of the incident to prevent it from recurring.

The bug occurred for two reasons. The first is related to testing. The team had no chance to test the app with horses actually running before the launch. The first test in a realistic environment took place when the race started. The second reason is the integration itself. The client’s software wasn’t working the way it was supposed to, which led to the malfunction and the app failure. The tracking software needed to be adjusted for the integration so that it could operate properly together with the app we developed.

What are the lessons learned? First, when existing software can lead to a crash, it’s better to insist on refining it and to explain what the consequences could be. The second lesson is simple: conduct more thorough tests by creating a dedicated testing environment. We also improved the roadmap of our QA services. It now comprises several stages:

  • Business requirements analysis relying on the Guide to the Business Analysis Body of Knowledge;
  • Drawing up a test plan indicating all the tests to be carried out;
  • Preparing the test environment required for the complete QA;
  • Compiling the test design based on the company’s pre-developed QA checklist;
  • Manual and automated testing;
  • Reporting the QA results.
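
As an illustration of what preparing a test environment can mean for an integration like this one, the sketch below simulates a timing feed in both overwrite and append mode and records how many rows the consumer sees after each gate crossing. Everything in it is hypothetical: the file name `timing.csv`, the column names in `FIELDS`, and the helpers `write_event`, `read_events`, and `simulate_race` are assumptions for the example, not the client's software or our actual test suite.

```python
import csv
import os
import tempfile

FIELDS = ["horse_id", "gate", "time_s"]  # hypothetical column names


def write_event(path, event, append):
    """Simulate the timing software writing one gate event."""
    new_file = not os.path.exists(path)
    with open(path, "a" if append else "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        # Overwrite mode rewrites the header every time;
        # append mode writes it only when the file is created.
        if not append or new_file:
            writer.writeheader()
        writer.writerow(event)


def read_events(path):
    """Read the file the way a naive consumer would: every row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def simulate_race(append, horses=160):
    """Send one gate crossing per horse; return row counts after each write."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "timing.csv")
        counts = []
        for horse in range(1, horses + 1):
            event = {"horse_id": horse, "gate": 1, "time_s": 10.0 + horse}
            write_event(path, event, append)
            counts.append(len(read_events(path)))
        return counts


print("overwrite mode, rows at the end:", simulate_race(append=False)[-1])
print("append mode, rows at the end:", simulate_race(append=True)[-1])
```

Running both modes side by side before launch makes the consumer's hidden assumption visible: in overwrite mode the row count stays at one, while in append mode it grows with every horse.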

A stronger QA process contributes to high-quality and smoothly running apps, satisfied clients, and the development team's sound sleep.

On a Final Note

The best way to describe how we work is through our failures. We don't have many of them, but they are part of our journey and growth. Analyzing the causes of our failures helps us become stronger and reinforce the quality of our services. We show our work from different angles so that you can put aside your fears about outsourcing software development and better understand the level of our development team’s professionalism and expertise.

Thinking of ordering software development from Celadon? Click here and we'll get back to you as soon as possible!
