
How I Got Stung by a Stringly Typed API

LAST UPDATED: July 3, 2020

Barisere Jonathan

At about 9 AM on Friday, 27 March 2020, we received a troubling email from SendGrid. It read, “This is a friendly notification that you have exceeded 90% of your monthly email limit.”

Several questions flooded my mind: was this a marketing ploy to trick us into upgrading our plan? Did Twilio change SendGrid plans after their acquisition, but forget to update their documentation? If this was real, how many emails had been sent? Was I hallucinating?

As with most incidents, I found the answer scattered through space and time. We shall talk about time briefly, after we go back in time to understand how I found myself staring at that disturbing email.

A Requirement Came In

A week earlier (20 March 2020), a requirement came in. A zombie had woken up: a project in maintenance needed a quick update; you know, like bolting a new leg on your horse to make it go faster. The requirement was that we periodically send notifications of subscriptions that were about to expire. A quick one it would be. I scheduled the work and got to it the next week. Time and scheduling; we shall come to those.

The project was built using NestJS. NestJS has a package for task scheduling, so it was trivial to implement the requirement in the main application process. Great. A few hours of work it took; actually, two days (dependency upgrades and meetings), and the requirement was live on Thursday, 26 March.

Quenching the Fire

From the requirement a week earlier, to sitting on my bed on a Friday morning, staring at that email, there was only one possibility: I screwed up the requirement. After all, that was the only part of the application in which we used SendGrid. The first thing to do was to quench the fire. I disabled the API key our application was using: every second that passed could mean another email in some inbox. Confident that no more emails would be sent, I headed out to work.

The Postmortem

Sitting at my desk at work, I started to piece together what must have happened. Every programmer knows that the answer is always in the logs, somewhere in there. I opened Stackdriver Logging and jumped to 8 AM UTC. That was the time the mailer task was scheduled to run, so it made sense to start there. I found nothing. Actually, I found something different: the only error logs were request rejections from SendGrid, and they showed up much later than I expected. That meant that earlier requests must have succeeded.

I went over to the SendGrid dashboard to confirm my fear; yes, over 300 requests had been acknowledged by SendGrid. How did this happen?

8 AM Every Other Day

Batch jobs that run periodically are not a new thing in programming. Most UNIX systems have a utility called cron that can be used to schedule tasks. Systemd, the new service manager for Linux, has timer units that implement task scheduling. Most cloud platforms have services that implement task scheduling: App Engine’s cron service, Kubernetes’ CronJob, Google Cloud Platform’s Cloud Scheduler, Heroku’s Scheduler add-on, etc. I chose cron, but not the Linux utility. I chose the @nestjs/schedule package’s in-process task scheduler, which supports declaring tasks using expressions similar to crontab.

How Many Stars Have We Got?

* * 8 */2 * *

That, certainly, is not a movie rating. It is part of a crontab expression. Notice that it has six fields separated by spaces. Each of those fields determines, in part, the schedule of the task it is defined for. Those familiar with the original crontab specification (man 5 crontab) know that a line in the system crontab has seven parts, but only five of those parts determine the scheduling of the task.

From the crontab manual page:

“Each line has five time and date fields, followed by a user name if this is the system crontab file, followed by a command.”

Why then do we have six fields determining the schedule here?

The original cron utility ran every minute, executing tasks that have been scheduled for that minute. This makes sense, considering that starting processes every second could be expensive. But the @nestjs/schedule package supports an optional sixth field, one that determines the second a task should run. This is possible because it schedules functions that run in a process, not processes for the operating system to run.

The crontab expression correctly specifies that whatever task it is defined for should run at 8 AM every other day, but it also specifies “every second”. So, for every second between 8 AM and 9 AM, the function ran, sending emails.
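To make the arithmetic concrete, here is a minimal sketch that counts how many times an expression fires on a day it matches. The helpers expandField and runsPerMatchingDay are hypothetical names written for this illustration; they are not part of @nestjs/schedule, and they handle only `*`, single values, ranges, steps, and comma lists. The sketch assumes that a six-field expression leads with the seconds field and that a five-field expression implies second 0.

```typescript
// Hypothetical helper (not part of @nestjs/schedule): expands one cron field
// into the list of values it matches within [min, max].
function expandField(field: string, min: number, max: number): number[] {
  const values: number[] = [];
  for (const part of field.split(",")) {
    // Split an optional "/step" suffix from the range part.
    const slash = part.indexOf("/");
    const rangePart = slash === -1 ? part : part.slice(0, slash);
    const step = slash === -1 ? 1 : Number(part.slice(slash + 1));

    let start: number;
    let end: number;
    const dash = rangePart.indexOf("-");
    if (rangePart === "*") {
      start = min;
      end = max;
    } else if (dash !== -1) {
      start = Number(rangePart.slice(0, dash));
      end = Number(rangePart.slice(dash + 1));
    } else {
      start = Number(rangePart);
      end = start;
    }

    if (isNaN(start) || isNaN(end) || !(step > 0)) {
      throw new Error(`Unsupported cron field: ${field}`);
    }
    for (let v = start; v <= end; v += step) {
      if (values.indexOf(v) === -1) values.push(v);
    }
  }
  return values.sort((a, b) => a - b);
}

// Counts how many times the schedule fires on a day that matches the
// day-of-month, month, and day-of-week fields.
function runsPerMatchingDay(expression: string): number {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 5 && fields.length !== 6) {
    throw new Error(`Expected 5 or 6 fields, got ${fields.length}`);
  }
  // Assumption: with five fields, the task runs at second 0.
  const withSeconds = fields.length === 6 ? fields : ["0"].concat(fields);
  const seconds = expandField(withSeconds[0], 0, 59);
  const minutes = expandField(withSeconds[1], 0, 59);
  const hours = expandField(withSeconds[2], 0, 23);
  return seconds.length * minutes.length * hours.length;
}

console.log(runsPerMatchingDay("* * 8 */2 * *")); // 3600
console.log(runsPerMatchingDay("0 0 8 */2 * *")); // 1
```

Pinning the first two fields to 0, as in `0 0 8 */2 * *`, brings the count down to a single run on each matching day.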

Can I Have a String?

In programming communities, there have been strong opinions about weak type systems, weak opinions about strong type systems, and every other combination of opinions and type systems. But one thing has become more evident recently: type systems are important for both correctness and comprehension.

It is good practice to design programs and APIs in ways that use the type system to prevent programmer errors. Stringly typed APIs have not fared well in these debates. Crontab expressions are one such API. They are succinct and fine when used with the cron utility on Linux, but they make a bad API in a programming language, because they are string-only.

Stringly typed: A riff on strongly typed. Used to describe an implementation that needlessly relies on strings when programmer & refactor friendly options are available.
See item 7 on New Programming Jargon

Let us define the type of a crontab expression as follows.

type NumberRange = {
  start: number;
  end: number;
};

type RangeExpr = { range: "Every" | NumberRange; step?: number };

type ScheduleExpr = number | RangeExpr;

type CronExpression = {
  second?: ScheduleExpr;
  minute: ScheduleExpr;
  hour: ScheduleExpr;
  dayOfMonth: ScheduleExpr;
  month: ScheduleExpr;
  dayOfWeek: ScheduleExpr;
};

Now consider the crontab expression written according to our new type CronExpression.

const everyOtherDayAt8AM: CronExpression = {
  second: { range: "Every" },
  minute: { range: "Every" },
  hour: 8,
  dayOfMonth: { range: "Every", step: 2 },
  month: { range: "Every" },
  dayOfWeek: { range: "Every" }
};

It is easy to see some mistakes that were harder to see in the original expression. We can see that we specified that the task for which the expression is defined should run every second and every minute; we actually want it to run only once. We can correct it as follows.

const everyOtherDayAt8AM: CronExpression = {
  minute: 0,
  hour: 8,
  dayOfMonth: { range: "Every", step: 2 },
  month: { range: "Every" },
  dayOfWeek: { range: "Every" }
};

We omit the field “second” because it is not needed. By making the type system aware of our intention, we get assistance from the type system when defining cron expressions. We also use the structure of the type to make our intention clear, thereby reducing the cognitive overhead in understanding the expression.
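A typed expression still has to reach a scheduler that accepts strings. One way to bridge the two, sketched below, is to render the structure into a crontab string at the boundary. The functions renderField and toCronString are hypothetical helpers for this illustration, not an API of any scheduling package, and the sketch assumes that the scheduler accepts a five-field expression when the second is omitted.

```typescript
type NumberRange = { start: number; end: number };
type RangeExpr = { range: "Every" | NumberRange; step?: number };
type ScheduleExpr = number | RangeExpr;

type CronExpression = {
  second?: ScheduleExpr;
  minute: ScheduleExpr;
  hour: ScheduleExpr;
  dayOfMonth: ScheduleExpr;
  month: ScheduleExpr;
  dayOfWeek: ScheduleExpr;
};

// Hypothetical helper: renders a single field, e.g. 8, "*", "1-5", or "*/2".
function renderField(expr: ScheduleExpr): string {
  if (typeof expr === "number") return String(expr);
  const range = expr.range;
  const base = range === "Every" ? "*" : `${range.start}-${range.end}`;
  return expr.step === undefined ? base : `${base}/${expr.step}`;
}

// Hypothetical helper: renders the whole expression. The seconds field is
// emitted only when it is present (assumption: five fields are accepted).
function toCronString(cron: CronExpression): string {
  const fields = [
    cron.minute,
    cron.hour,
    cron.dayOfMonth,
    cron.month,
    cron.dayOfWeek,
  ].map(renderField);
  if (cron.second !== undefined) {
    fields.unshift(renderField(cron.second));
  }
  return fields.join(" ");
}

const correctedSchedule: CronExpression = {
  minute: 0,
  hour: 8,
  dayOfMonth: { range: "Every", step: 2 },
  month: { range: "Every" },
  dayOfWeek: { range: "Every" },
};

console.log(toCronString(correctedSchedule)); // 0 8 */2 * *
```

With this shape, the string only exists at the edge of the program, and the mistake of leaving minute unspecified becomes a compile-time error because the field is required.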

The problem with stringly typed APIs is that they encode both intent and instruction in strings. Types should encode intent; arguments should encode data/instruction.

Aftereffect

I sat at my computer that Friday morning, certain that the client would be totally disappointed. I even entertained some fear that they would come in with an axe and make me pay for the annoyance of over 300 emails in their inbox. I thought to reach out first and offer an explanation, but I could not put words together. Then I went back to the SendGrid dashboard. There I saw an arithmetic of unfortunate events. Our emails had been bounced by Zoho Mail. That meant that the client did not receive them in their inbox, so they would remain a happy client. Surely, −1 × −1 = 1.

Although I resolved the problem, I did not design a better task scheduling API to save myself in the future. I walked away with a gnawing feeling that, someday, another stringly typed API will get me, and it shall be my ruin.
