Can we mix PSE and NSGA2 in a single workflow?

leonard · 2017-11-30 10:18:19 UTC

Hello, today I have a question regarding what I can achieve with Open Mole. I have a specific job to do, I’m translating it in a genetic friendly problem here
I can be resumed by : is it possible to mix PSE and NSGA2 algorithm in a workflow ?

Here is what I want to do :
Let’s say we have two elements : an animal specie in a natural environment. They are both represented with different parameters, for example :
The animals have a reproduction rate and an indicator of individual sociability, respectively parameters A and B.
The environment is represented by a temperature and a ratio water/land, parameters C and D.

The model output, given A, B, C and D, the probability of not survival of the specie, named P, it has to be minimized.
The point is, I doesn’t make sense to run a PSE on A, B, C and D, I need to run a PSE on C and D, and a NSGA2 on A and B.
It means that I want:
1 - Given an environment C-D, find the best parameters A,B for the animals to survive.
2 - Explore the environment configurations given by C-D in order to see if there is a single super optimized specie able to survive in all environment or several genetic variations each adapted to a specific environment configuration.

The expected result is that only a few genetic variations (A, B) of the specie is needed in order for it to survive in any environment. But I need to find as much quadruplets (A, B, C, D) as I can.

Conclusion : I hope to run a NSGA2 with input (A, B) and output P and a PSE with input (C, D) and output the optimized (A, B), does it seem possible ? It may want to do something like :

nsga2 =
NSGA2(
mu = 100,
genome = Seq(A in (0.0, 1.0), B in (0.0, 1.0)),
objectives = Seq§
),

pse =
PSE (
genome = Seq(C in (0.0, 1.0), D in (0.0, 1.0)),
objectives = Seq(A in (0.0, 1.0), B in (0.0, 1.0)),
stochastic = Stochastic(seed = seed)
)

val optimization =
SteadyStateEvolution(
algorithm = nsga2,
evaluation = task,
parallelism = 10,
termination = 100
)

val evolution =
SteadyStateEvolution(
algorithm = PSE,
evaluation = optimization,
parallelism = 10,
termination = 100
)

Please let me know if all of this make no sense
Léonard

reuillon · 2017-11-30 10:28:14 UTC

I Leonard, this is super interesting. What you need it a niching optimisation algorithm with niches defined as discrete values taken along C and D. This would be super useful to provide that in OpenMOLE and also a new way of exploring models.

In OpenMOLE we rely on MGO for the genetic algorithm part: https://github.com/openmole/mgo

If you like to give it a try to implement such a niching algorithm in MGO I would be delighted to help and then to integrate it in OpenMOLE. Ping me if you need tips to start with MGO. The architecture is quite simple but it is implemented using a lot a monads and it can be confusing if you are not familiar with the concept.

reuillon · 2017-11-30 10:35:15 UTC

Also, you are more than welcome to come work a few days at the ISC-PIF with us to develop this method.

leonard · 2017-11-30 11:16:01 UTC

Hi Romain,
I’m glad you like the idea I’ll go through MGO documentation for a few days and come back to you with some insight of how we can achieve this (I may over estimate myself here) !

I have a first interrogation, I’m no expert of genetic algorithms but if I understand well nsga2 will provide a set of nearly optimal solutions and none of them can be said to be better than the others, this diversity is enforced by the fact that the model on which nsga2 is running is stochastic.

The point here is how do we include this diversity given by nsga2 genome optimization (output of nsga2 : a set of (A, B)) in the genome tuning algorithm of PSE ? Is it ok for PSE to have, given a genome (C, D) a set of objectives (A, B) ? Does it seem doable or the PSE won’t tolerate several potentially very different output for a given input ?

Hope this is clear, !

gcherel · 2017-11-30 14:39:16 UTC

Hi Léonard,

I’m joining the discussion. This is a interesting and difficult problem that you have! But I don’t think the solution that you propose would help you to answer your question. The way you put it, PSE will try to find different values of (a,b) by trying different (c,d), and from the design of your nsga2 task, those value will happen to be those that minimize p for a given (c,d). This is a headache inducing sentence, so let’s try and see what it means in terms of results.

The result of pse will be a set of points {(a_0, b_0),...,(a_n, b_n)} in the unit square (since a and b vary in [0,1]). If the points don’t cover the whole square, it will mean that the values of a and b which minimize p are never in those empty regions, whatever the values of c and d. All you can say from this is that such values of a and b are never the fittest values, whatever the environment (c,d). (In fact, if you want to be rigourous, you need to be more cautious and replace “whatever the environment (c,d)” with “in all environments (c,d) explored by the algorithm”. It is not rigorously possible to rule out that such environments (c,d) exist and that pse just couldn’ find them.)

If the points do cover the whole square, you can conclude that for any (a,b), you can find an environment (c,d) such that (a,b) are the fittest parameters. But “the fittest” don’t necessarily mean “fit”. It may be the species with the lowest probability of not survival, but still have a very high one.

And last, I don’t see how any of those results would allow you to say anything about wether or not there is a super species that would survive anywhere.

I’ve been thinking about a proper way to answer your original question about a super organism or a few genetic variations able to survive. I think your question needs to be formalized a bit more. For example, what is a “super species able to survive in all environment”? Is this a couple (a*,b*) such that those values are the optimal solution with respect to p whatever (c,d)? Or is it (a*, b*) such that p is lower than a certain fixed value for all (c,d)? Or are you trying to find the smallest set of genetic variations {(a_1, b_1),...} such that there is one that survive in any (c,d)? Those are all different question that probably call for different approaches.

Like Romain said, I think it’s an interesting problem and I’d be happy to help you with this, don’t hesitate to come visit us at ISCPIF if you want to talk about it further!

leonard · 2017-11-30 17:57:01 UTC

Hello Guillaume,
Thank you for your post ! Let me bring you more details. I can’t make this post answer to address all your comments, I’m sorry about that, I’ll try to be as concise and clear as possible.

Fit and fittest problem
First you are absolutely right about the distinction between a specie fit this environment and a specie is the fittest for this environment, but I’m only looking for the the fittest specie. When I will have the genome of the fittest specie for a given area of the environment, the answer to the question Does it fit enough? will be asked and answered on the basis of what we expect this specie to achieve in its life time.

Covering the unit square of species
What I’m interested in is to see if the cover of (A,B) is sparse and if so, where exactly are the points on this surface and what is their fitness value. An exemple : for a monkey genotype, how many mutations of its genotype and which mutations make it able to survive in every environment it can access and for these pairs (mutation, environment), how likely are they to survive ? My interest is in the ability of these species to fit (as much as they can) all the possible environment. The exploration of species genotype with nsga2 is done only in order to find the more polyvalent of them without sacrificing fitness and what I’m really interested in is the match between a genotype (a,b) and a surface in (C, D).

Here are three insights of what kind of results the algorithm can give in my opinion and I will explain what these results mean to me.

If polyvalent species exist, the square (A, B) will have a sparse cover and all explored environments will be well fitted. This will raise the question In what kind of environment these polyvalent species don’t fit ? so the environment model will have to be refined.
2)a) If polyvalent species don’t exist, there will be on the map (A, B) a lot of points when the algorithm is done. If this is the case, I will rely on what I said before, once the fittest species are found I have to decide if they fit enough. If they don’t, they are discarded and so we know that we need to build a complete new model of specie in order to fit the whole environment.
2)b) Finally if the square (A,B) is crowded of species and if they all fit very well a small surface of the environment space, I’ll also have to make a complete new model of specie, more polyvalent.

The super specie
A super specie is indeed a specie that is, among all the species the nsga2 has produce, the fittest for all the environments the PSE has explore.
So as you said the super specie is described with a couple (a*, b*) and this couple is such that :
For all explored (c, d), there is no (a-, b-) in the species discovered by nsga2 such that (a*, b*) is dominated by (a-, b-)
This concept of super specie was more a way of explaining what I’m trying to do than a real objective, for complex problems its existence is very unlikely.

Formalization
I formalize my question as follow :
Given the set of all possible species S = { (a1, b1), (a2, b2), …, (an, bn) }, the set of all environments E = { (c1, d1), (c2, d2), …, (cm, dm) }, and a fit function f(a, b, c, d) in R (R means real here) defined for all possible (a,b,c,d). What is the subset Sfittest of S such that, for any environment (ci, di) in E, the specie (ai, bi) which maximize f(ai, bi, ci, di) is in Sfittest. From this, I say that if |Sfittest| = 1, there is super specie (for the model of specie and environment and interaction between these two we took, this is not a general result).

I hope that you found some clarification in this post, in particular about the question whether mixing nsga2 and pse can help me to answer my question. I can come at ISCPIF on monday afternoon if it’s ok for Romain and you ? I would gladly accept all possible help

gcherel · 2017-12-01 13:47:19 UTC

A quick answer about monday: you can come, Romain wont be here but I will. I will read the rest of your post in depth later.

leonard · 2017-12-01 17:50:51 UTC

Ok, I will come monday around 2pm at the ISC and be there for the afternoon, and I will come back to talk with Romain later,
Have a nice week end,
Léonard

reuillon · 2017-12-02 01:30:55 UTC

I have been flying for 12 hours and I had a little time to think about your pb. I think what you want might be a generalisation of the profile algorithm. It would support multi-objective optimisation and you would be able to define what are your niches (not only a discretisation of a single parameter, but more something like the current PSE). I’ve look into it, and the implem should be pretty straightforward starting from the current NoisyProfile algorithm in MGO.

leonard · 2017-12-04 12:32:55 UTC

Hi Romain,
I have been thinking of how tu use the profile algorithm on my problem lately, I’m not sure that what I understood matches what you have in mind so I’m writing my thoughts here and if you have some time this week I can swing by of the ISC to discuss this.

I’m using here the same presentation of the problem I gave earlier, with the same variable names. Quick reminder :
A and B are the parameters of the species
C and D are the parameters of the environment
P is the output probability of the model. Let’s say the model is a function f(A,B,C,D) = P

Indeed it’s very important for me to have the niches of the parameters, but only for A and B. The parameters of the environment (C, D) are explored and what I’m expecting on them is more something like a surface and a link between this surface and a set of niches of A and B, this bipartition of the parameters opens to a wide variety of profiling in my opinion.

To expose one of them I will try to stick as much as possible to what is in the article about the CP algorithm (http://jasss.soc.surrey.ac.uk/18/1/12.html).
In §2.3 you defined the domain of the parameters vectors, I think we now need two domains, one for calibration (the same as the one you defined) and one for exploration (for the PSE).
Given this 2.4 shall be modified to take into account these two domains (straight forward)

§2.5 is not that trivial to modify, it may be the point I’m struggling on. hk(x) must be extended to take into account the fact that the calibration is done in the same time as the exploration of the environment. hk should bears the surface of the environment on which the free parameters have been calibrated and also, I’m not certain if the calibration should accept several environment or not. If yes, it means that it would be possible to have the profile of x in several environment domain determined by the calibration of the free parameters. To reuse the notations of 2.9, it would be possible to have the profile of x in intervals I1,…,Im for a corresponding domain of environment and the profile in Im+1,…, IIk for another domain of environment.

After the profiling of all the species parameters, it would be possible to determine if some of them do not impact the behaviour of the model and if so in which environment this happen. Then for a given parameter, I could extract the calibration of the free parameters on its accepting range.

Am I close to what you imagined in your last post ?

reuillon · 2017-12-05 04:09:00 UTC

Hi leonard,

very interesting, however I am not in France this week, and we have a full week OpenMOLE coding session next week. Lets meet the week after.

Romain

leonard · 2017-12-05 15:48:06 UTC

Hi Romain, ok for next week I’ll contact you by mail on your iscpif mail adress to set up the details

Regarding my problem, after a talk with Guillaume it turns out that mixing PSE and NSGA2 is not what I need and doesn’t do what I expected.
So the problem is still valid but the answer is not in a mix of these two algorithms (as far as I know today).
I’m marking this post as accepted answer to close the thread, and I thank all of you for taking part in it.