Practical Data is an occasional series that examines data concepts that we encounter in our everyday lives, sometimes without even knowing it.

It is my pleasure to introduce you to the Sigmoid Function. I can’t help but think of it as the “Sigmund Function” after the main character in the 1970s Sid and Marty Krofft Saturday morning television program, Sigmund and the Sea Monsters. No, it’s not a short, green, seaweed-covered creature with a single snaggletooth, but a class of functions that describes a surprising variety of phenomena in the natural world and human experience. Unless you’re an electrical engineer, sound designer, mathematician, or neural network developer you’ve probably not encountered it. Until now.

Generally speaking, a Sigmoid Function is any s-shaped curve. On one side, it rises slowly away from a minimum asymptote. (An asymptote is a value that a function gets infinitely close to but never reaches.) The function rises quickly in the middle, then flattens out, approaching a maximum asymptote on the other side. A simple form, the Logistic Curve, is shown below.
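To make the shape concrete, here is a minimal Python sketch (my own illustration, not part of the original article) of the standard logistic curve. Notice how the values hug the asymptotes at the extremes and change fastest near the middle:

```python
import math

def logistic(x):
    """Standard logistic function: an s-shaped curve rising from 0 toward 1."""
    return 1.0 / (1.0 + math.exp(-x))

# Far left: near the minimum asymptote (0).
# Middle: steep rise through 0.5.
# Far right: near the maximum asymptote (1).
for x in (-6, -2, 0, 2, 6):
    print(f"logistic({x:+d}) = {logistic(x):.4f}")
```

Running this shows the slow–fast–slow progression the prose describes: the first and last values are within a fraction of a percent of the asymptotes, while the middle values change rapidly.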

I first encountered Sigmoids in my graduate research, where they were commonly used as the activation function for neural network nodes. Each node accumulates a weighted sum over a set of inputs, then passes that sum through a sigmoidal activation function to determine whether the node “lights up” or not. In recent years, considerable research has gone into designing and optimizing activation functions. Although a few other functions have become more popular, Sigmoids are still commonly used.
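A single such node can be sketched in a few lines of Python. This is a hedged illustration of the general idea, not code from my research; the input values, weights, and bias below are made up for the example:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def node_output(inputs, weights, bias):
    """One neural-network node: weighted sum of inputs, then a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)  # near 1 means the node "lights up"; near 0 means it stays dark

# Hypothetical inputs, weights, and bias chosen for illustration.
activation = node_output([0.5, -1.0, 2.0], [0.8, 0.3, 1.1], bias=-0.5)
```

The sigmoid squashes the unbounded weighted sum into the range (0, 1), which is what lets the output be read as a soft on/off decision.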

I was thinking recently about the function itself and how much in life it describes (yes, goofy stuff pops into my head on occasion and so I share). Do any of these sound familiar? 

At various times I have tried teaching or learning something new and it just doesn’t stick. It can become frustrating, sometimes to the point of giving up. Fortunately, perseverance is more often than not rewarded with that “Eureka!” moment when all of a sudden it becomes clear. Success begets success. One breakthrough leads to another. 

Thomas Edison and his team tested thousands of materials trying to find a suitable filament for his electric light bulb. He tried every material he could get his hands on, and finally discovered that a particular bamboo from Japan satisfied his requirements. He famously said, “I have not failed. I’ve just found 10,000 ways that won’t work.” (If you ever have the opportunity to visit the re-creation of Edison’s Menlo Park laboratory at Greenfield Village in Dearborn, Michigan, do it!)

My own dissertation research followed a similar path. I wanted to use genetic algorithms to evolve fuzzy logic controllers, but my initial approach didn’t work. Neither did every subsequent approach for the next several months. I was scheduled to present at a conference in the fall, and while I could cobble together enough successes for a serviceable presentation, the overarching methodology still eluded me. Two weeks before the conference, just when I was about to contact the organizers and tell them that I would not be able to attend, I had a breakthrough. The idea worked. I refined the idea. That worked. I got on a roll. The conference presentation was a success (whew), and a few months later that methodology became the centerpiece of my dissertation.  

This pattern also shows up strikingly within the results of my dissertation research. Each simulation started with a population of random controllers attempting to accomplish some task. Not surprisingly, none of the random controllers could accomplish the task. Through several generations of evolution, they continued to perform poorly. Reproduction favored those instances that performed least poorly and failed the least completely. Eventually, a small number of controllers were minimally successful. Once that happened, within one or two generations every single controller was fully successful. The population went from uniformly failing to uniformly succeeding. 

Interestingly, the behavior of my computer simulations mirrored the progress of the evolution of life on Earth. Until about 530 million years ago, nearly all life consisted of single-celled or simple multicellular organisms. Then, suddenly, an enormous diversity of complex organisms appeared in what is referred to as the Cambrian explosion. (This, of course, assumes you can call 20 million years “sudden” which, when you’re talking geological and evolutionary time scales, you can.) Simple to complex seemingly all at once.

Most career paths follow that trajectory. You start out slowly. Pay your dues. You get an opportunity and you take advantage of it. Progress accelerates. Eventually, promotions slow and ultimately flatten out. After all, there’s only so much room for advancement and there’s more competition for fewer opportunities.

Never forget that the Beatles (and, earlier, the Quarrymen) played hundreds of gigs in small, sometimes seedy clubs over more than five years before becoming what many perceived to be an “overnight” sensation.

Sigmoid Functions model the progression of a nuclear chain reaction through critical mass and the pace of technology adoption. Can you think of others?

If you’re interested in the math, the most common Sigmoid Functions are the Logistic Function, the Hyperbolic Tangent (tanh), and the Inverse Tangent (arctan). The standard Logistic Function, which varies from 0 to +1 and is often referred to simply as the Sigmoid Function, is:

$$f(x) = \frac{1}{1 + e^{-x}}$$

Depending upon the amount of flexibility you want in shaping the Logistic Function, you can parameterize the minimum value of the curve (A), the maximum value of the curve (D), the right/left location of the inflection point (C), the slope of the curve at the inflection point (B), and the symmetry of the curve (S).

$$f(x) = A + \frac{D - A}{\left(1 + \left(\dfrac{x}{C}\right)^{B}\right)^{S}}$$
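This five-parameter form transcribes directly into Python. The sketch below uses the parameter names A, B, C, D, and S from the description above; the default values are my own choices for illustration (with a negative B so the curve rises, this form is defined for x > 0):

```python
def general_logistic(x, A=0.0, D=1.0, C=1.0, B=-4.0, S=1.0):
    """Parameterized logistic curve.

    A: minimum asymptote        D: maximum asymptote
    C: inflection-point location (x > 0 assumed)
    B: slope at the inflection point (negative B gives a rising curve)
    S: symmetry of the curve (S = 1 is symmetric)
    """
    return A + (D - A) / (1.0 + (x / C) ** B) ** S

# With the defaults, the curve rises from near 0 (small x) to near 1 (large x),
# passing through the midpoint (A + D) / 2 at x = C.
for x in (0.1, 0.5, 1.0, 2.0, 10.0):
    print(f"f({x:5.1f}) = {general_logistic(x):.4f}")
```

Setting A and D stretches the curve vertically, C slides the inflection point left or right, B controls steepness, and S skews the curve so it approaches one asymptote faster than the other.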

Unlike the step function used in the original neural network perceptrons, sigmoids are differentiable across their entire domain. This makes them suitable for calculating gradients and updating neural network parameters using the backpropagation algorithm.
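The derivative has a particularly convenient closed form, f′(x) = f(x)(1 − f(x)), expressed entirely in terms of the function’s own output; that is part of what makes sigmoid gradients cheap to compute during backpropagation. A quick numerical check (my own sketch) confirms it against a finite-difference approximation:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # derivative written in terms of the output itself

# Verify against a centered finite difference at several points.
h = 1e-6
for x in (-2.0, 0.0, 1.5):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    assert abs(numeric - sigmoid_deriv(x)) < 1e-6
```

The derivative peaks at 0.25 at the inflection point and shrinks toward zero at the tails, which is also the root of the well-known “vanishing gradient” issue in deep networks.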