Welcome!

Java IoT Authors: Elizabeth White, Sematext Blog, Janakiram MSV, Mike Raia, Liz McMillan

Related Topics: Java IoT

Java IoT: Article

Genetic Algorithms in Java

Genetic Algorithms in Java

Starting about 3.5 billion years ago with bacteria, nature em- barked on the grandest of all algorithms: the evolution of highly complex and dynamic machines capable of interacting with and adapting to their environments in order to solve problems. We know these machines as plants and animals.

One look at the genetic code of even the simplest living organism reveals a structure that's enormously complex and efficiently tuned, ensuring the survival of the organism in its environment. We might even use the terms fault-tolerant, highly parallel, high performance, and ubiquitous. Don't forget that nature accomplished this extraordinary programming feat without a single developer coding an exhaustive list of if-then rules and switch statements to account for all possible scenarios. It was simply based on a random set of interactions with the fittest organisms surviving to replicate their genetic code into the next generation.

With the advent of the internet over the past decade, an entirely digital world has arisen in which web sites and applications are the organisms fighting for survival in a highly complex, internetworked environment replete with computer viruses, server crashes, and the like - an environment in which only the fittest will survive. As such, it's my belief that more sophisticated means of software development are needed to build web applications capable of interacting with and adapting to the complexities of the new digital world thriving within our computers.

One simple, yet extremely powerful, technique that will likely play a role in the evolution of the internet (and the web applications that live within it) borrows several concepts from the biological world and transforms them into bits and bytes with the goal of building adaptive software systems.

This article is the first of a two-part series that examines a technique from the AI community called genetic algorithms, which borrows concepts from biology to solve complex and often nonlinear problems encountered in the world of computer science. This article will introduce you to the concepts of genetic algorithms and discuss why Java is well suited to their implementation. The next installment will investigate the details of implementing these algorithms in Java. It's my hope that after reading these articles, you'll think a little differently about software development and its future. Genetic algorithms provide a problem-solving technique that's too powerful to ignore.

Genetic Algorithms
First a little history. Genetic algorithms were born out of the idea of evolutionary programming introduced by I. Rechenberg in the 1960s. John Holland, a professor at the University of Michigan at the time, is credited with the invention of genetic algorithms following the publication of his 1975 book Adaptation in Natural and Artificial Systems. In his book Holland formulated the basics of genetic algorithms as models of machine learning that derive their behavior from concepts of biology's theory of evolution. It was one of Holland's students, David Goldberg, who popularized the use of genetic algorithms when he was able to solve a difficult problem involving gas-pipeline transmission for his dissertation in 1989.

That said, what exactly is a genetic algorithm? What are they used for? What are the benefits over traditional programming techniques? How does Java fit into this? I'll attempt to answer these questions so you'll have the foundation needed to start implementing genetic algorithms (see Figure 1).

Darwin in Your Computer
A genetic algorithm can be thought of as a model for machine learning in which a population of randomly created individuals goes through a simulated process of evolution - a digital survival of the fittest where each individual represents a point in the problem's solution search space. Using correct terminology, an individual is represented by a chromosome, which consists of several genes. Genes are essentially the parameters of the problem to be solved. A collection of chromosomes is considered a population and is the fundamental unit on which a genetic algorithm operates. Once the algorithm is set into motion, individuals are selected from a population and combined in a process called crossover to create a set of children. The children are randomly mutated to create a new set of chromosomes to be reinserted into the population. Once enough children chromosomes have been created to replace a population, a generation is said to have passed.

With each generation, all the chromosomes are evaluated according to some fitness criterion that's a measure of the strength of the chromosome compared to the rest of the population. Only the fittest chromosomes survive into the next generation where the selection, crossover, and mutate process begins anew. After a number of generations have elapsed, the best chromosome is selected from the population and represents the optimal solution to the problem being solved. Essentially what's happening is that a random set of solutions to a problem within a given search space is created and evolved over an amount of time to find an optimal solution. A concrete example will help clarify the concepts described above.

The Traveling Salesman
The traveling salesman problem (TSP) is a classic computer science problem in which a salesman must traverse a number of cities, visiting each only once, while minimizing the distance traveled. For the case of 20 cities, an exhaustive search method that examines all possible routes dictates a search through over 2.4 billion billion (20!) permutations which, if evaluated at a rate of 500 million per second, would take over 150 years to complete.

Employing a gen- etic algorithm reduces the amount of time to seconds (or a fraction thereof, de- pending on the computing power available) and produces the optimum solution in some cases and a near optimal solution in most others. The representation of this problem in the genetic algorithm domain consists of cities with their x and y coordinates serving as individual genes. A chromosome is a list of cities, in order, that represent one possible solution to the traveling salesman problem. The fitness of the chromosome is then the Cartesian distance between the cities when traversed in order, with the fittest chromosomes being those with the shortest overall distance (see Figure 2).

Typically, genetic algorithms have been utilized in solving complex optimization problems when traditional programming techniques (such as exhaustive search, analytic optimization, and line minimization) fail to arrive at a solution in a reasonable amount of time. genetic algorithms confer the following advantages:

  • They evaluate several solutions simultaneously, covering a large search space.
  • They work well in parallel implementation.
  • They optimize parameters with very complex cost functions.
  • They create a list of optimal solutions, not just a single solution.
  • They work with various data types.
This leads to the next question: Why use Java?

Why Java?
As you can see, genetic algorithms can become computationally expensive depending on a number of parameters (including the size of the population, the complexity of the fitness function, the size of the chromosome, and the time to converge on an optimal solution. Thus, in choosing a language for implementation, weighing the benefits of using Java versus using a compiled language such as C or C++ is essential. For Java to be a viable language for genetic algorithm implementation, it must present significant advantages to make up for its degraded performance as compared to other compiled languages. And it does! The advantages of Java are particularly evident in the area of distributed computing.

Simple and Object-Oriented
Given the dynamic memory requirements for a genetic algorithm, Java's garbage collector relieves us from having to allocate and deallocate memory for chromosomes in each generation. This allows us to focus specifically on coding the problem at hand and not worrying about memory management details. Also, the use of objects allows us to create an endless number of problem encodings and still use the genetic algorithm framework. This means that once the basic algorithm structure is developed, implementing a genetic algorithm to solve new problems becomes a matter of defining the problem and its encoding. Next month we'll take an in-depth look at what this means during implementation .

Robust and Secure
Java was designed for creating software that's highly reliable and capable of operating in distributed environments. As developers start to move genetic algorithms from a single CPU to a network of parallel and distributed CPUs, robustness and security are essential. Think of partitioning a genetic algorithm into a number of populations and letting them evolve separately in parallel, frequently distributing the most fit from each population into all the populations. JavaSpaces presents itself as an excellent candidate for moving genetic algorithms into a distributed environment.

Architecture-Neutral and Portable
As referenced above, the real power of genetic algorithms can be obtained in parallel and distributed environments. With Java's platform-neutrality, populations and the algorithm to evolve them can be distributed among a network of computers for processing, provided that a JVM is available. Don't worry about the implementations for different operating systems and CPUs. Think of the [email protected] project that utilized over two million PCs connected to the internet to churn through radar data in the search for extraterrestrial intelligence. Genetic algorithms are ideal candidates for use in such a distributed environment, with java being the obvious language of choice given its portability. Computing power is no longer an issue; there will be more than enough to go around.

Building Blocks
Now that we've briefly examined the nature of genetic algorithms and why Java makes sense as the development language of choice, let's take a more detailed look at the fundamental components that make up a genetic algorithm. For the sake of simplicity, we'll cover the most basic implementations of genetic algorithms and introduce the essential core concepts. I highly recommend further research and study if genetic algorithms spark a deeper curiosity. A number of resources are available on the web for such study.

Genes
A gene can be defined as the encoding of a single parameter in a genetic algorithm. A gene can take many forms depending on the problem definition. For the traveling salesman problem, a gene represents a city and its longitude and latitude coordinates. However, when solving a high-order, nonlinear, partial differential equation, a gene can represent one of the variables to solve for and its range of acceptable values.

This highlights the two main flavors of genetic algorithms: permutation-encoded versus real-parameter. In the former version, the goal is to find the optimal ordering of a set of genes such as in the TSP. As for the latter, an example of a real-parameter genetic algorithm is finding x and y such that the following function is minimized: f(x, y) = 2x * sin(3 * y) + 4y * cos (5 * x).

Historically, genes were represented as sequences of 1's and 0's. However, this approach has not been shown to yield better performance and introduces a layer of complexity as a translation is needed between the actual values of parameters and their binary representation. In addition, handling genes as objects in Java makes the implementation more intuitive and can be extended to make them reusable across different genetic algorithm implementations. (More on this in next month's article.)

Gene Pool
Much like its biological equivalent, the gene pool for a genetic algorithm is a collection of all the available genes. From the gene pool, chromosomes are created at the beginning of a genetic algorithm by randomly drawing genes from the gene pool and assembling them to build a chromosome that represents one solution for the search space defined for the genetic algorithm.

Returning to the examples mentioned above, the gene pool for solving the traveling salesman problem consists of one gene per city to be traversed. For the case of 20 cities, there will be 20 genes in the gene pool from which random chromosomes will be created. For real parameter genetic algorithms, such as minimizing the function f(x, y), the gene pool will consist of two genes, one representing the variable x and the other representing the variable y.

Chromosomes
Continuing with definitions, a chromosome is a collection of genes representing a single point in the solution search space. The fitness of a chromosome is determined by a cost function determined prior to the execution of the genetic algorithm. Again, returning to the traveling salesman problem, the fitness of a given chromosome is the sum of the distances between the cities when traversed in the order specified by the chromosome. for the real parameter chromosome (f(x, y)), the fitness is the result of substituting the x and y values back into the original function and performing the calculation. Note that the fitness of a chromosome tells you nothing about its strength relative to other chromosomes; rather, it's a raw evaluation of the chromosome's fitness. It's at a higher level that fitnesses are compared and selection proceeds according to the rules of a genetic algorithm. This higher level is the population.

Population
A population is a collection of all the chromosomes being evolved in a genetic algorithm. As new chromosomes are created and reinserted into the population, less fit chromosomes are replaced and only the most fit survive into the next generation. As mentioned previously, it's here that the process of digital evolution occurs, as the fitness of the competing chromosomes is compared in order to select parent chromosomes to reproduce.

Depending on the search space for a given problem, population size can range from a few dozen chromosomes to several hundred, several thousand, or more. Given the fact that a chromosome represents a single point in the solution search space, for problems with extremely large search spaces (such as the 20-city TSP), it makes sense that a large population size is needed to cover as much of the space as possible. Otherwise, the genetic algorithm may approach a local minimum and converge toward it, rather than the global minimum. Convergence is a core issue in genetic algorithm implementation, and I highly recommend further examination outside of this article to gain additional insight.

Genetic Algorithm Operations
Now that we've discussed the requisite components of a genetic algorithm, it's essential to understand how a genetic algorithm operates on each of the components to create a simulated evolutionary environment that combs a search space for an optimal solution. There are five elementary genetic algorithm operations:

  • Fitness evaluation: With the examination of a chromosome and its role within a population, we talked briefly about fitness evaluation and its importance. The proper definition and evaluation of a fitness function is critical to the success of the genetic algorithm. It's the means by which chromosomes are compared to one another to determine the most fit individuals. The primary goal here is differentiation between the more fit chromosomes and the less fit chromosomes. Remember, it's survival of the fittest.
  • Selection: This is the method by which chromosomes are chosen to reproduce in order to create children for the next generation. The goal of selection is to choose individuals that, on average, are more fit than others to pass on their genes to the next generation while, at the same time, maintaining genetic diversity. If a population consists of identical individuals, genetic diversity is lost and it's difficult for the genetic algorithm to explore different regions of a search space.

    Several different methods are available for genetic algorithm selection, but for the sake of simplicity and brevity I'll focus on a technique labeled tournament selection. With this technique, a group of individuals is selected at random and the two most fit are selected for reproduction (i.e., they win the tournament). Keeping the tournament size small (4-8 chromosomes) ensures genetic diversity as the group is small, and what appears to be the most fit within the group may actually be a weak chromosome when compared with the entire population.

  • Crossover: Once two parent chromosomes are selected, they reproduce two child chromosomes via the crossover operation. One of the parameters of a genetic algorithm is the crossover probability (typically 75-90%) that represents the statistical chance that two given chromosomes will cross over. For each potential crossover, a random number between 0.0 and 1.0 is generated. If the number is greater than the crossover rate, then crossover doesn't occur and the children chromosomes are exact replicas of their parents. If crossover does occur, then the parents randomly exchange genes to create new chromosomes.

    There are three types of crossover covering a wide range of problem encodings:
    - Permutation encoding with unique genes: In this case, a gene can appear only once within a chromosome. One example is the TSP. Each city may appear only a single time within the chromosome.
    - Crossover operating on the permutation encoding, with the exception that genes don't have to be unique: Let's imagine that we have a genetic algorithm that's evolving a musical piece within the key of C. All the notes in the key of C are viable and can be repeated indefinitely up to the size of the chromosome.
    - Real parameter chromosome crossover: In a real parameter chromosome, each gene will represent a parameter to be applied to a given cost function. Building on the function, f(x, y) described earlier, two parent chromosomes will have genes for the x variable, both representing different values. A method for crossing over the two genes might involve creating a new gene for the x variable with the value being the average of the two parent genes.

Crossover is another essential genetic algorithm operator that ensures genetic diversity within a population. The conceptual goal of crossover is, over time, to combine the good portions of chromosomes into newer and better chromosomes. For a better understanding, see Figure 3. I highly recommend further exploration of the crossover operator before attempting to implement your own genetic algorithm.

  • Mutation: Similar to crossover in that it randomly modifies chromosomes, it operates on only a single chromosome at a time (see Figure 4). As with crossover, there's a probability associated with the occurrence of mutations, albeit a small one (typically 5-25%). Yet again, returning to the TSP, a typical mutation can include randomly selecting two endpoints within a chromosome and reversing the order of the genes. Several mutation techniques that can be utilized depending on the problem encoding won't be discussed here. It's important to remember that mutation is a fundamental operator for ensuring genetic diversity within a population, which translates into a better coverage of the search space.

  • Insertion: This is the final algorithmic step to conclude a generation in a genetic algorithm. Insertion is the process of introducing children chromosomes into a population and removing the less fit chromosomes. One common technique for insertion utilizes a technique called elitism in which the n best chromosomes of a population are kept for the next generation and the rest are replaced with new children. This ensures that the most fit chromosomes survive into the following generation and have the opportunity to reproduce again.

Genetic Algorithm Considerations
By now you should have a basic understanding of what a genetic algorithm is and how it works. Let's now quickly look at some considerations when implementing a genetic algorithm.

Convergence
The goal of implementing any genetic algorithm is convergence on an optimal solution for a given search space. Convergence will be affected by numerous factors associated with the implementation of the genetic algorithm, such as parameter encoding, population size, crossover and mutation rates, and selection technique. Depending on the problem being solved, these factors are usually determined only by experience working with genetic algorithms of all flavors. My recommendation is to start coding!

Performance
Performance is an issue that has constantly plagued genetic algorithms due to their heavy-duty processing power requirements. with the combination of Moore's Law and the increased availability of highly parallel, distributed computing power, I don't think performance will be an issue in the near future.

Real-World Applications
Here's the number one barrier to acceptance of genetic algorithms as a practical programming technique: real-world applications. Genetic algorithms have resided primarily in academia solving classic computer science problems. Their use in business and commercial environments is highly unproven. As computing power becomes more readily available, I think we'll see an increase in adaptive software systems with genetic algorithms at their core.

One particular area of work that may break down the wall is security. Researchers have begun to develop operating systems modeled after the immune system of animals. As new viruses invade the system, strategies are evolved to combat the virus, remove it from the operating system, and identify similar attacks in the future. And with the proliferation of highly sophisticated attacks on internet sites, such an "immune" system offers a much better (and quicker) response than waiting for a human to recognize the attack and code a patch to fix it or close ports on a firewall to deny it.

Another interesting outbranching of genetic algorithms is the field of genetic programming pioneered by John Koza. Without getting into the details, genetic programming is essentially using genetic algorithms with the genes representing programmatic constructs (e.g., AND, OR, IF, THEN, +, and -). What's evolved are chromosomes representing computer programs. It's an exciting field that's worth a deeper look.

Conclusions
The goal of this article wasn't to encourage you to implement genetic algorithms in your code tomorrow, but rather to inform and educate you about one technique for building software capable of adaptation. As the Internet continues to grow at a furious pace, a new digital world is being created that operates in the language of 0's and 1's. The organisms fighting for survival are the web sites that you and I create on a daily basis. Whether fighting for survival in the sense of attracting new customers or warding off the latest computer hacker, adaptability will be crucial to survival in the complex digital world. Hopefully this article has sparked a newfound interest in software development and its future. If so, stay tuned for the next issue of JDJ, in which I'll demonstrate a simple implementation of a genetic algorithm.

More Stories By Michael Lacy

Michael Lacy is an
engineer for the platform development group at Shutterfly, an online photo service, where he
develops Web-based
solutions for digital image printing, enhancing, and sharing. He's also a
certified Java programmer and developer.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@ThingsExpo Stories
With so much going on in this space you could be forgiven for thinking you were always working with yesterday’s technologies. So much change, so quickly. What do you do if you have to build a solution from the ground up that is expected to live in the field for at least 5-10 years? This is the challenge we faced when we looked to refresh our existing 10-year-old custom hardware stack to measure the fullness of trash cans and compactors.
The emerging Internet of Everything creates tremendous new opportunities for customer engagement and business model innovation. However, enterprises must overcome a number of critical challenges to bring these new solutions to market. In his session at @ThingsExpo, Michael Martin, CTO/CIO at nfrastructure, outlined these key challenges and recommended approaches for overcoming them to achieve speed and agility in the design, development and implementation of Internet of Everything solutions wi...
Cloud computing is being adopted in one form or another by 94% of enterprises today. Tens of billions of new devices are being connected to The Internet of Things. And Big Data is driving this bus. An exponential increase is expected in the amount of information being processed, managed, analyzed, and acted upon by enterprise IT. This amazing is not part of some distant future - it is happening today. One report shows a 650% increase in enterprise data by 2020. Other estimates are even higher....
Today we can collect lots and lots of performance data. We build beautiful dashboards and even have fancy query languages to access and transform the data. Still performance data is a secret language only a couple of people understand. The more business becomes digital the more stakeholders are interested in this data including how it relates to business. Some of these people have never used a monitoring tool before. They have a question on their mind like “How is my application doing” but no id...
SYS-CON Events announced today that 910Telecom will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Housed in the classic Denver Gas & Electric Building, 910 15th St., 910Telecom is a carrier-neutral telecom hotel located in the heart of Denver. Adjacent to CenturyLink, AT&T, and Denver Main, 910Telecom offers connectivity to all major carriers, Internet service providers, Internet backbones and ...
SYS-CON Events announced today Telecom Reseller has been named “Media Sponsor” of SYS-CON's 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Telecom Reseller reports on Unified Communications, UCaaS, BPaaS for enterprise and SMBs. They report extensively on both customer premises based solutions such as IP-PBX as well as cloud based and hosted platforms.
Smart Cities are here to stay, but for their promise to be delivered, the data they produce must not be put in new siloes. In his session at @ThingsExpo, Mathias Herberts, Co-founder and CTO of Cityzen Data, will deep dive into best practices that will ensure a successful smart city journey.
DevOps at Cloud Expo, taking place Nov 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 19th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long dev...
The 19th International Cloud Expo has announced that its Call for Papers is open. Cloud Expo, to be held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, brings together Cloud Computing, Big Data, Internet of Things, DevOps, Digital Transformation, Microservices and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportuni...
There is growing need for data-driven applications and the need for digital platforms to build these apps. In his session at 19th Cloud Expo, Muddu Sudhakar, VP and GM of Security & IoT at Splunk, will cover different PaaS solutions and Big Data platforms that are available to build applications. In addition, AI and machine learning are creating new requirements that developers need in the building of next-gen apps. The next-generation digital platforms have some of the past platform needs a...
Pulzze Systems was happy to participate in such a premier event and thankful to be receiving the winning investment and global network support from G-Startup Worldwide. It is an exciting time for Pulzze to showcase the effectiveness of innovative technologies and enable them to make the world smarter and better. The reputable contest is held to identify promising startups around the globe that are assured to change the world through their innovative products and disruptive technologies. There w...
Internet of @ThingsExpo, taking place November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 19th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devices - comp...
Data is the fuel that drives the machine learning algorithmic engines and ultimately provides the business value. In his session at Cloud Expo, Ed Featherston, a director and senior enterprise architect at Collaborative Consulting, will discuss the key considerations around quality, volume, timeliness, and pedigree that must be dealt with in order to properly fuel that engine.
19th Cloud Expo, taking place November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Meanwhile, 94% of enterpri...
Personalization has long been the holy grail of marketing. Simply stated, communicate the most relevant offer to the right person and you will increase sales. To achieve this, you must understand the individual. Consequently, digital marketers developed many ways to gather and leverage customer information to deliver targeted experiences. In his session at @ThingsExpo, Lou Casal, Founder and Principal Consultant at Practicala, discussed how the Internet of Things (IoT) has accelerated our abil...
I wanted to gather all of my Internet of Things (IOT) blogs into a single blog (that I could later use with my University of San Francisco (USF) Big Data “MBA” course). However as I started to pull these blogs together, I realized that my IOT discussion lacked a vision; it lacked an end point towards which an organization could drive their IOT envisioning, proof of value, app dev, data engineering and data science efforts. And I think that the IOT end point is really quite simple…
Identity is in everything and customers are looking to their providers to ensure the security of their identities, transactions and data. With the increased reliance on cloud-based services, service providers must build security and trust into their offerings, adding value to customers and improving the user experience. Making identity, security and privacy easy for customers provides a unique advantage over the competition.
Is the ongoing quest for agility in the data center forcing you to evaluate how to be a part of infrastructure automation efforts? As organizations evolve toward bimodal IT operations, they are embracing new service delivery models and leveraging virtualization to increase infrastructure agility. Therefore, the network must evolve in parallel to become equally agile. Read this essential piece of Gartner research for recommendations on achieving greater agility.
SYS-CON Events announced today that Venafi, the Immune System for the Internet™ and the leading provider of Next Generation Trust Protection, will exhibit at @DevOpsSummit at 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Venafi is the Immune System for the Internet™ that protects the foundation of all cybersecurity – cryptographic keys and digital certificates – so they can’t be misused by bad guys in attacks...
For basic one-to-one voice or video calling solutions, WebRTC has proven to be a very powerful technology. Although WebRTC’s core functionality is to provide secure, real-time p2p media streaming, leveraging native platform features and server-side components brings up new communication capabilities for web and native mobile applications, allowing for advanced multi-user use cases such as video broadcasting, conferencing, and media recording.