Server with multiple breakdowns

AGarcia · October 3, 2012

I am trying to model a server with several failure types, where each failure may have a different cause and therefore a different pattern.

One obvious approach is to subclass Server, define additional elements (Failure, Timers, etc.), and modify the processes so that several failures can cause the server's downtime.

However, I am not comfortable with the scalability of this approach. If a server has three types of failure, I will have to add two new Failure elements and 8 timers (4 for each Failure), and so on.

If at some point I need to add another failure, I would have to do all of that again, and so on.

I am considering two different approaches for this, but I do not know if these are feasible or if there is a far better approach.

Approach 1.

I would like to subclass the server and create a Repeat Group property so that the user, by means of adding rows, may characterize as many failures as desired.

The question is: is it possible to modify the processes so that they are general enough to handle any number of failures (each with its own characterization)?

Approach 2.

I was thinking of creating a new object, let us call it objFailure, which contains definitions and processes to reproduce the behaviour of a failure type.

The idea would be that this objFailure has a property which would contain the number of an element within the project, so that when objFailure fails, the corresponding object fails.

This way, modelling a server with several types of failures would consist of creating that server and as many objFailure objects as required, which would cause the server to fail as they themselves fail.

The issue with this approach is that it may not be easy to represent interference among failures. For example, if objFailure1 is a count-based event failure and objFailure2 is a Processing Time based failure, then when objFailure1 occurs the server stops, and the time during which it is down should not be counted by objFailure2. This may be very complex to program effectively and efficiently.
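To make the interference concrete, here is a minimal Python sketch (not Simio code) of one way to think about it: each failure keeps its own progress counter in its own basis, and only server work advances the counters, so downtime is excluded automatically. The `Failure` class, the `run` function and the basis labels are hypothetical names for illustration, and the sketch assumes failures are only checked when a part finishes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Failure:
    name: str
    basis: str         # "count" or "processing_time" (hypothetical labels)
    threshold: float   # parts processed, or busy time, between failures
    repair: float      # downtime caused by each failure
    progress: float = 0.0


def run(failures: List[Failure], service_times: List[float]) -> Tuple[float, List[Tuple[float, str]]]:
    """Process parts one by one; return the final clock and the failure log."""
    clock = 0.0
    events = []
    for service in service_times:
        clock += service  # server is busy: wall clock and counters both advance
        for f in failures:
            f.progress += 1.0 if f.basis == "count" else service
        for f in failures:
            if f.progress >= f.threshold:
                events.append((clock, f.name))
                clock += f.repair  # downtime: only the wall clock advances
                f.progress = 0.0
    return clock, events


failures = [
    Failure("objFailure1", "count", threshold=3, repair=5.0),
    Failure("objFailure2", "processing_time", threshold=4.0, repair=2.0),
]
end_time, events = run(failures, [1.0] * 6)
```

In this run objFailure2 fires at clock time 9 after exactly 4 units of busy time; the 5 units of downtime caused by objFailure1 never enter its counter.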

ASagan · October 3, 2012

I would abandon Simio's built-in elements and do this all in a process with repeat groups plus delay steps and custom token states, as you describe in Approach 1. You could use the search step to spawn a new token for each row in the repeat group, which provides the scalability. Each token then gets a delay time randomized according to the input distribution for its row of the repeat group, and at the end of the delay it triggers a failure. These tokens effectively become your timers instead of using timer elements.

The failure could suspend this 'timer' process so that the other delays stop counting during a failure. The token that fired would then loop back on itself and draw a new delay.
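The logic above can be sketched outside Simio to check the idea. The following is a rough Python illustration, not Simio process steps: one "token" per repeat-group row holds its remaining up-time, all tokens count down together while the server is up, and they are frozen during a repair. `FailureRow`, `simulate` and the example distributions are assumptions made for this sketch.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FailureRow:
    """One row of the (hypothetical) repeat group."""
    name: str
    uptime: Callable[[random.Random], float]  # up-time until this failure fires
    repair: Callable[[random.Random], float]  # repair duration


def simulate(rows: List[FailureRow], horizon: float, seed: int = 0) -> List[Tuple[float, str, float]]:
    """Return a log of (failure time, row name, downtime) up to the horizon."""
    rng = random.Random(seed)
    # One token per row: remaining up-time before that row's failure fires.
    remaining = [row.uptime(rng) for row in rows]
    clock = 0.0
    log = []
    while True:
        i = min(range(len(rows)), key=lambda k: remaining[k])
        step = remaining[i]
        if clock + step > horizon:
            break
        clock += step
        # While the server is up, every token counts down by the same amount.
        remaining = [r - step for r in remaining]
        down = rows[i].repair(rng)
        log.append((clock, rows[i].name, down))
        # During the repair the other tokens are suspended: only the clock moves.
        clock += down
        # The token that fired loops back with a fresh delay.
        remaining[i] = rows[i].uptime(rng)
    return log


example_rows = [
    FailureRow("jam", lambda r: r.expovariate(1 / 50.0), lambda r: r.uniform(1, 3)),
    FailureRow("wear", lambda r: r.weibullvariate(200.0, 2.0), lambda r: r.triangular(5, 15, 8)),
]
example_log = simulate(example_rows, horizon=1000.0)
```

Adding another failure type is then just another row in `example_rows`; the loop itself does not change.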

However, my conclusion (not implemented yet) was that I would use the 'non-scalable' approach. The reason is that adding more than 2-3 failure distributions gets overly complex and detailed, isn't likely to add value, and could even subtract from it! I've also found that when I combine different types of events (in reality, not modelled) they almost always result in some fairly good Erlang/exponential/log-normal distributions. These only need to be split out to remove multiple peaks rather than to get at the exact causes.

I've also often found that failure data (not models) is recorded very poorly and is often mis-reported. For example, a 'slow operation' is recorded as a failure. Alternatively, short failures are not recorded at all because the operators are busy fixing the problem. Because of this, going into the level of detail you've described goes beyond the quality of the data.

If you actually trust your failure data, by all means go to town with the scalable approach. If you were part of my firm, I would give you feedback along the lines of "never trust failure data" :wink: However, a colleague of mine does feel differently so this is not a hard and fast rule -- just my experience.
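One special case where pooling is exact rather than approximate: if the failure modes are independent and each has exponentially distributed time to failure, the time to the first failure of any mode is again exponential, with a rate equal to the sum of the individual rates. The short Python check below illustrates that case only; the rates are made up for the example and say nothing about real failure data.

```python
import random
import statistics


def pooled_time_to_failure(rates, n, seed=1):
    """Sample the time to the first failure among independent exponential modes."""
    rng = random.Random(seed)
    return [min(rng.expovariate(lam) for lam in rates) for _ in range(n)]


rates = [1 / 100.0, 1 / 250.0, 1 / 400.0]  # hypothetical failure rates per mode
samples = pooled_time_to_failure(rates, n=50_000)

pooled_mean = statistics.fmean(samples)
pooled_cv = statistics.pstdev(samples) / pooled_mean  # close to 1 for an exponential
expected_mean = 1.0 / sum(rates)                      # mean of the pooled exponential
```

Comparing `pooled_mean` with `expected_mean`, and `pooled_cv` with 1, gives a quick check on whether a single pooled distribution is a reasonable stand-in for the separate modes.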

AGarcia · October 4, 2012

Many thanks for your comments and your advice and for the quick response.

I am new to Simio and this exercise is a way to learn. Your hint lets me focus on the right approach, or at least gives me some idea of the advantages and disadvantages of each one.

I understand your point, though.
