Notes from Kathy's Desk
Guidance, Ruminations, Thoughts, Miscellaneous Points and Suggestions


Welcome. This is a space for us to share with you some of the interesting questions that come across our desks. There are general articles about optimization (Linear Programming, MIP, SLP, recursion) as well as specific tips and best practice suggestions for GRTMPS. I hope there will be something here that will help you make a success of your Successive Linear Programming models and, occasionally at least, to have a bit of number fun.
3rd February 2022: The second most widely read desk note (more than 10,000 hits since 2017) is #4, How Distributed Recursion Solves the Pooling Problem. While there are a few equations, the explanation is largely underpinned by considering what is being represented in the modelling and the solution. I finished by saying "If you wanted to come at it from the maths, rather than the refining viewpoint, distributed recursion can be shown to be a special derivation of the more general first-order Taylor expansion linear approximation of a nonlinear equation in which the formula for the average pool quality is used in the specification row. I will do an article about this at some point." As a reader recently pointed out, I never did, until now. Here is the one with all the equations. Enjoy.
Kathy
P.S. In case you are wondering, the note with the most hits is #10 – Does Convergence Matter? The answer (spoiler alert if you haven’t read it yet) is Yes.
New entries will appear monthly or thereabouts. Index to Previous Notes by Topic
Use the feedback form to ask to subscribe to the mailing list if you want to be notified when a note has been posted.
Comments, suggestions gratefully received via the usual email addresses or here.
(Who is Kathy?)
I have previously described how distributed recursion solves the pooling problem (Desk Note #4), focussing on it as an issue of quality balancing. I mentioned that this method also has some mathematical underpinnings which I would write about later. After being reminded by a reader, here is an explanation of how distributed recursion can be derived from a Taylor expansion, and why the extra transformation that gives us distribution factors is useful for stable optimization.
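As a rough illustration of the starting point for that derivation, the sketch below linearizes the average pool quality with a first-order Taylor expansion around a base blend. It is a generic expansion only, not the distributed recursion formulation from the note, and the function names and numbers are made up for the example.

```python
def pool_quality(volumes, qualities):
    """Exact (nonlinear) volume-weighted average quality of a pool."""
    total = sum(volumes)
    return sum(v * q for v, q in zip(volumes, qualities)) / total


def linearized_pool_quality(volumes, base_volumes, qualities):
    """First-order Taylor approximation of the pool quality around base_volumes.

    The partial derivative of the average with respect to component i,
    evaluated at the base point, is (q_i - Q0) / V0, where Q0 is the base
    pool quality and V0 is the base pool volume.
    """
    base_total = sum(base_volumes)
    base_quality = pool_quality(base_volumes, qualities)
    return base_quality + sum(
        (q - base_quality) / base_total * (v - v0)
        for v, v0, q in zip(volumes, base_volumes, qualities)
    )


# Hypothetical two-component pool: the qualities could be sulphur in wt%.
qualities = [1.0, 3.0]
base = [60.0, 40.0]
trial = [70.0, 40.0]

exact = pool_quality(trial, qualities)
approx = linearized_pool_quality(trial, base, qualities)
```

The approximation matches the exact value at the base point and drifts away from it as the trial blend moves further from the base, which is why the estimate has to be refreshed on each pass.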
Distribution planning models usually include targets for product quantities to be delivered, reflecting commitments that have already been made. Customer N is due to receive X m3 of gasoline. Ideally, your refineries will be able to supply product to fulfil all the demands, but plans don’t always work out and you might find yourself having to manage a shortfall. Contracts sometimes include penalty clauses requiring compensation to be paid if all the contracted product is not delivered. These need to be represented in the model so it can give guidance on how to allocate the product that is available.
If you have multiple distillation towers at your refinery, you may find that the model allocates different crude slates to each. If operational constraints mean that crudes will be mixed in advance and run as a single feed, then the model is over-optimizing by directing them to the towers in different proportions. This is exactly the issue that is solved by pooling.
If you have multiple towers at your refinery, you may find that the model allocates different crude slates to each. If you have the flexibility to achieve that, then this is exactly the kind of solution you want. But if your operating options are more restricted and such plans cannot be implemented, the model is over-optimizing and you need to add some constraints on the blends to guide it to more realistic plans.
The SSI system that is used to exchange data between model databases and Excel workbooks is useful for many tasks, and even more so when the additional capabilities of the MultiSSI panel and the Workflow Tool for handling multiple imports are taken into account.
Process unit representations in planning models need to include capacity controls so that the optimization takes into account limits on how much material can be processed. The obvious and easy one to set up is a count of the feed.
Process vectors are usually written per unit of feed, so if you put a 1 as a loading factor on the capacity control and give the maximum, the unit has a size. Historically, weight models were set up with weight-based limits because that was easy and linear, but as it is almost certain that physical constraints on the actual plant are volumetric flows, volume capacity controls would be more accurate.
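To see why the choice matters, here is a small sketch of a volumetric capacity check in a weight-based model. It assumes each feed is loaded with 1/density (m3 per tonne) instead of 1; the densities and feed rates are invented for illustration.

```python
def volume_loading_factor(density_t_per_m3):
    """Volume (m3) of capacity consumed by one tonne of feed."""
    return 1.0 / density_t_per_m3


def volume_used(feed_tonnes, densities_t_per_m3):
    """Total volumetric throughput for a set of feeds given in tonnes."""
    return sum(
        tonnes * volume_loading_factor(density)
        for tonnes, density in zip(feed_tonnes, densities_t_per_m3)
    )


# The same 100 tonnes takes up more room as a light feed than as a heavy one.
light = volume_used([100.0], [0.80])   # hypothetical light crude
heavy = volume_used([100.0], [0.95])   # hypothetical heavy crude
```

With a weight-based limit both feeds would count as 100 units, although the light one needs noticeably more volumetric capacity.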
Does your model have a lot of old clutter in it? Cleaning up your model can make it easier to understand, reduce its run time and improve stability. Unused pool qualities are a good decluttering target.
In GRTMPS you can define “generic” operations on distillation units that automatically expand for whichever set of crudes is available in the case. This is more convenient than having to write out operating modes for every crude. But if you have multiple towers or modes and some crudes that cannot be processed in all of them, you need to exclude those feeds from the expansion.
Planning models are usually run so that the optimization maximizes profit margin with no limits set on what can be spent to achieve it. But what if credit is tight? The global economy and political landscape have had quite a few upsets so far this century. For some companies that has certainly meant operating with limited access to finance. Feedstock evaluation changes from being an assessment of how valuable each option is to a question of what is the best combination that can be obtained for the money available.

Crude oil evaluation is one of the most common optimization applications in the oil industry. The predicted profit margin/bbl for each grade relative to the others gives a “pecking order” of preferred feedstocks for a refinery. Even a difference of a few cents per barrel translates to a large sum of money, so it is very worrying when a crude that looked good drops from the top to the bottom of the list from one assessment to the next. Does it mean that the original evaluation was wrong?

Help! My data is in rows, but I need columns. How can I get from one layout to the other?
Where in the reports can you see the margins on all the process units?

Blending Scaling?
Choose 1. 
How does a planning model decide how much process unit capacity to use?
Have you ever right-clicked on a scroll bar?
Case generator is a flexible tool for setting up loads of cases to go at the press of a button – and a 181 file of initial estimates can be very helpful in improving recursion behaviour. So how, asks a client, can I use one with the other?
Spreadsheet or Database? Let’s be honest about our preferences. Given a pile of data to work with, your first click will be the one that opens Excel. Databases are great for validation but they are lousy at maths, so we in the GRTMPS team try to offer you the best of both: you can use the input database to protect yourself from wasting time over undefined codes, but use a spreadsheet to manipulate data. The tool for passing data in and out of the model database via Excel is called SSI, for SpreadSheet Import.
What do cleaning a fish, climbing a mountain peak, and LP models have in common?
There should be mass balance across a refinery – every molecule of feed stock that comes in comes back out again (eventually) – and we hope, mostly as useful products. But the reality is that when inputs and outputs are compared there will be some discrepancy. How should this be handled in our planning models?

If you have ever sat and stared at a screen absolutely puzzled by the behaviour of your code, model, simulator, etc., you will appreciate the aptness of Dawkin's observation that...
*W* Data for recursed qualities missing for components of pool S1# at RR. Values assumed ZERO.
Systematically varying an operating parameter such as severity is a fairly common task for the user of a refinery planning model. Such exercises help us to confirm that a model is appropriately responsive to changes in conditions, allowing you to gain some understanding of the economic impacts as well as to see how the overall system adapts.
It is very useful for the refinery industry that many of the stream qualities that we need to predict blend linearly. But many don't. Vapour pressure, octane, cloud point and viscosity are examples of properties where a simple proportional average of the measured values of the components will not tell you what value the mixture will have. Many such properties can be handled linearly after all, however, if a blending index is applied.
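A minimal sketch of the idea, assuming a power-law index for vapour pressure with the commonly quoted exponent of 1.25 (the index actually used in any given model may differ): each component value is converted to an index, the indices are blended linearly by volume, and the result is converted back.

```python
def to_index(rvp, exponent=1.25):
    """Convert a vapour pressure to a linearly blending index (assumed power law)."""
    return rvp ** exponent


def from_index(index, exponent=1.25):
    """Convert a blended index back to a vapour pressure."""
    return index ** (1.0 / exponent)


def blend_rvp(volumes, rvps, exponent=1.25):
    """Volume-weighted blend carried out in index space."""
    total = sum(volumes)
    blended_index = sum(
        v * to_index(r, exponent) for v, r in zip(volumes, rvps)
    ) / total
    return from_index(blended_index, exponent)


# Hypothetical 50/50 blend of a low and a high vapour pressure component.
blended = blend_rvp([50.0, 50.0], [4.0, 12.0])
simple_average = (4.0 + 12.0) / 2.0
```

Because the index is a convex function of the property, the indexed blend comes out higher than the simple proportional average, which is the discrepancy the index is there to capture.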
Clearing out a pile of old magazines while tidying up my desk, I came across an old review of "Unknown Quantity: A Real and Imagined History of Algebra" by John Derbyshire (2006). Since I basically earn my living messing about with simultaneous equations and I like a clever title, I thought I would get hold of a copy and have a look.
“I have a multi-refinery model and it reports the crude marginal values at each site. Where can I find the overall marginal value for a crude?”
When writing a nonlinear process unit representation, most of the effort goes into building the Process Simulator Interface (PSI) spreadsheet itself. However, once that has been completed, it is necessary to link each calculated output PUP with the variables that it depends on in the Adherent Recursion panel to connect it to the LP model. Here is how to save some time using the Spreadsheet Import (SSI) and PSI Analyzer tools.
If you have the multicore extension to GRTMPS so you can run multiple optimizations simultaneously, the Queue Manager will default Max Jobs to the number of cores. On my i7 laptops that would be 8, but I normally set it to….
If the price of a crude is less than the marginal value, why doesn’t it buy more?
When there is a Process Simulator Interface (PSI) workbook connected to your GRTMPS model, the solution values that are used as inputs are passed into the calculations on each recursion pass. These aren’t left behind in the InputValue section of the workbook after the run though, so if you want to set up the simulator to match a particular LP solution, you need to put them in yourself. The Haverly Excel Add-in includes a tool for this that you will find very useful for analysing and debugging.
Do most use a single “transfer price” for gas and diesel and other products or do they use a series of pricing levels…..?
The gap between not doing something at all and its maximum level often includes a range where the quantity is too small to be practical. A minimum would force the option to be active, while it might be more profitable to leave it at zero. You can incorporate this kind of condition into an LP model using semi-continuous bounds.
GRTMPS offers several ways to temporarily deactivate data records that are not currently needed. But sometimes it is just more useful, for example when combining independent models to make a larger multisite one, to be able to obliterate a whole set of information. This can easily be done with an OMNI:DELETE TABLE command.
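Returning to the semi-continuous bounds described just above, the following sketch shows what such a bound means for a single activity: the level is either zero or somewhere between a practical minimum and the maximum. The helper names are hypothetical, and the comment on the binary formulation describes the standard textbook approach rather than how any particular solver handles it internally.

```python
def is_semicontinuous_feasible(x, lower, upper, tol=1e-9):
    """True if x is zero or lies within [lower, upper]."""
    return abs(x) <= tol or (lower - tol <= x <= upper + tol)


def best_level(unit_margin, lower, upper):
    """Most profitable level for one semi-continuous activity with a linear margin.

    With a linear margin the optimum is always at zero or at one end of the
    allowed range. In a full model the same logic is usually written with a
    binary variable y:  lower * y <= x <= upper * y.
    """
    candidates = [0.0, lower, upper]
    return max(candidates, key=lambda level: unit_margin * level)


# A loss-making option stays off instead of being forced to its minimum.
off = best_level(unit_margin=-2.0, lower=5.0, upper=20.0)
on = best_level(unit_margin=3.0, lower=5.0, upper=20.0)
```

A plain minimum bound of 5 would have forced the loss-making option to run; the semi-continuous bound lets it drop to zero.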
Where does negative sulphur come from?
Swing cuts are a well-established method for representing flexibility in cutpoints on refinery distillation towers. One of the first tech support questions I had this year was about reporting and controlling the cut point temperatures on a unit using that method, given that they are determined by how much of the swing has blended up or down into the adjacent core cuts.
Do you think "Dilbert" is a documentary? To mark the end of another busy year and give you something to play with over the holiday season, I have been inspired by modern digital encryption to create a new coding system, “semiprimary”, to challenge your cracking skills.
Did you know that very small numbers are bad for the stability of linear optimization? If you see a zero value for some yield or property in a solution, you will usually be right in assuming that it is actually zero, or so small that it has rounded to it in the report. However, just occasionally, if you calculate out the value using your input data and the solution activities, you will find that something should be there. It might be very small, but it should not actually be zero. Here are some suggestions for improving the scaling of process unit representations.
Oil refinery and other process industry optimization problems are largely covered by Linear Programming models. Most variables represent continuous quantities, such as the amount of a component to mix into a blend, that are allowed to take on any real number value between a minimum and maximum. However, there are some constraints that are best handled with discrete variables. A model that contains both continuous and discrete variables is an example of Mixed Integer Programming (MIP) and is traditionally solved using a Branch and Bound algorithm.
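To give a feel for how Branch and Bound works, here is a toy sketch on a 0/1 knapsack problem rather than a real MIP: each item is either taken or not, the continuous relaxation gives an optimistic bound, and any branch whose bound cannot beat the best solution found so far is pruned. A production solver bounds with full LP relaxations; the function and data below are purely illustrative and assume positive weights.

```python
def knapsack_branch_and_bound(values, weights, capacity):
    """Return (best value, chosen item indices) for a 0/1 knapsack."""
    # Sort by value density so the relaxation is a simple greedy fill.
    order = sorted(
        range(len(values)), key=lambda i: values[i] / weights[i], reverse=True
    )
    best = {"value": 0.0, "chosen": []}

    def relaxation_bound(k, value, room):
        """Optimistic bound: allow a fraction of the first item that does not fit."""
        for i in order[k:]:
            if weights[i] <= room:
                room -= weights[i]
                value += values[i]
            else:
                return value + values[i] * room / weights[i]
        return value

    def branch(k, value, room, chosen):
        if value > best["value"]:
            best["value"], best["chosen"] = value, sorted(chosen)
        # Prune: no items left, or the bound cannot beat the incumbent.
        if k == len(order) or relaxation_bound(k, value, room) <= best["value"]:
            return
        i = order[k]
        if weights[i] <= room:
            branch(k + 1, value + values[i], room - weights[i], chosen + [i])
        branch(k + 1, value, room, chosen)

    branch(0, 0.0, capacity, [])
    return best["value"], best["chosen"]


best_value, chosen_items = knapsack_branch_and_bound(
    values=[60, 100, 120], weights=[10, 20, 30], capacity=50
)
```

The greedy fill would take the first two items; the search finds that leaving out the densest item gives a better whole-item solution.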
In GRTMPS, nonlinear equations can be connected to the model directly using Adherent Recursion. Below we’ll present a simple way to fit process unit data into a polynomial function and then use that in a model to drive the linear approximations that are needed for each optimization pass.
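The second half of that idea, turning a polynomial into a linear approximation at the current operating point, can be sketched as follows. The coefficients are hypothetical stand-ins for a fitted curve, and this is only the generic tangent-line calculation, not a description of what Adherent Recursion does internally.

```python
def poly_value(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... (coefficients in ascending order)."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def poly_slope(coeffs, x):
    """First derivative of the polynomial at x."""
    return sum(i * c * x ** (i - 1) for i, c in enumerate(coeffs) if i > 0)


def tangent_line(coeffs, x0):
    """Slope and intercept of the linear approximation around x0."""
    slope = poly_slope(coeffs, x0)
    intercept = poly_value(coeffs, x0) - slope * x0
    return slope, intercept


# Hypothetical fitted curve: yield as a quadratic function of severity.
coeffs = [2.0, 0.5, -0.01]
slope, intercept = tangent_line(coeffs, x0=10.0)
```

On each pass the slope and intercept would be recalculated at the latest solution value, so the straight line always touches the curve at the current point.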
A critical requirement for anyone making blended products like gasoline or diesel is that the properties of the final mixtures (such as density, sulphur, octane, cloud point) are within the legally required specifications. Refinery optimization models obviously need to have equations that represent these constraints.
I travel a lot. My colleagues in Haverly travel a lot. Amongst us we have probably experienced every possible reason for delay that you can imagine. Some distilled wisdom is offered here from our collective experience of air travel and business trips, particularly those dreaded long-haul flights that land you in a different time zone.
Why is the marginal value on the total blend different from the marginal value of the component?
Are you paranoid enough about backing up your work? How many hours would you lose if your computer just wouldn’t boot or if that core document, spreadsheet or database you have just spent a week on was corrupted?
Haverly’s Matrix Analyzer is a very useful tool for combining the matrix structure with the solution values to see how the equations that make up your model are working, all in Excel so you can tinker around with it.
As an example of how it can be used, this note walks through a look at how block operation affects process unit limits.
If you constrained an operating parameter to take a specific value, forcing it away from the optimal value, you would expect to see an incentive on the limit. But what if it always came out blank, no matter what value you fixed it to? What could be going on?
Installing GRTMPS on another computer? You can copy across all your g5 preferences and run history as long as you are installing the same version.

Marginal values – the additional profit to be made if a constraint is relaxed – are one of the benefits of optimizing planning problems with Linear Programming as they can help us understand the economic drivers of the solution. Refinery planning models are normally written with a balance row for each hydrocarbon material being tracked. As equality rows they are always constraining and so we can see a marginal value for each stream – but what exactly do they mean? 
What makes this case different from the base? It worked yesterday, what’s changed? Why don’t we get the same answer? I have previously written about how the GRTMPS compare tool can be used to identify differences between spreadsheets that contain OMNI format input tables. Here's a tool included in MS Office 2016 that can identify the differences between any pair of Excel workbooks, to help you when you have spreadsheets containing other data formats.
This is a linear programming problem written in MPS format. Can you make sense of it?
What makes this case different from the base? It worked yesterday, what’s changed? Why don’t we get the same answer?
When did you last take a good look at your initial pool property estimates? A good first approximation should put your optimization on a good path and save you some recursion passes. Recursion Monitor is a useful tool for checking out your starting qualities; you can compare first to last pass values and check them for internal consistency.
Would you expect the objective function of an integrated refinery model (two or more sites) to normally be lower or higher than the sum of the value of the individual models?
How did I get here? The recursion monitor is a useful tool as it provides an easy way to see what is going on with the recursed parts of your model across each recursion pass. Looking at the final reports only ever shows you where you arrived, not how you got there. Taking a look at what is going on during earlier passes will give you insights into a model’s solution path and help you resolve problems with instability and infeasibility.
Fancy an infinite amount of money? UNBD as a solution status is offering you just that, but unbounded solutions are not very likely to prove true out in the real world, so they are no better a basis for a plan than an infeasibility.
The Matrix is the question and the Solution Print is the answer.
Have you ever tried to track a stream with multiple uses through a solution?
Fancy yourself as a cryptographer? Here's a little code cracking challenge to keep the brain active over the year end holidays.
It's a bit of a waste to make the solution MDB on every run if you aren't necessarily going to use it, but even more of a bother if you want it and don't have it. Well you can just…..
Many regulatory regimes include product specifications that cover not just individual batches of a particular grade, but also the overall average which is exported from the refinery over multiple grades. These regulations sometimes include incentives for doing even better than the legal requirement – effectively paying you to blend with giveaway. Including such an incentive in your LP model is easier than you might think.
Does it make any difference if you use a FIX limit or equal MIN and MAX constraints?
Do you have the Haverly Add-in activated? Originally for indexing GRTMPS input data spreadsheets, it now has some useful functions for working with your PSI workbooks and for analysing solutions.
When a model doesn’t converge, one possible cause is that some of the adherent recursion PUPs don’t converge. To address this problem, you can use the adherent recursion slope damping to reduce the variation of PUPs between recursion passes and so help the model converge.
Maths boring? Never! Here are some books and movies that illustrate the dramatic (and even comic) potential of a story that revolves around mathematics and computing.
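On the damping idea mentioned in the convergence note above, the sketch below shows generic damping (under-relaxation) of a value between passes: only part of the change suggested by each new calculation is accepted. This is an assumption-laden illustration of the general principle. Exactly what the GRTMPS slope damping option adjusts is not stated here, and the update rule and numbers are invented.

```python
def damp(previous, calculated, factor):
    """Move only part of the way from the previous value to the new one."""
    if not 0.0 < factor <= 1.0:
        raise ValueError("factor must be in (0, 1]")
    return previous + factor * (calculated - previous)


def run_passes(start, calculate, factor, passes):
    """Apply a damped update repeatedly and return the value after each pass."""
    value = start
    history = [value]
    for _ in range(passes):
        value = damp(value, calculate(value), factor)
        history.append(value)
    return history


def overshooting_update(value):
    """Hypothetical calculation that overshoots; its fixed point is 4.0."""
    return 10.0 - 1.5 * value


undamped = run_passes(3.0, overshooting_update, factor=1.0, passes=10)
damped = run_passes(3.0, overshooting_update, factor=0.5, passes=10)
```

Without damping the value swings further from 4.0 on every pass, while with a factor of 0.5 the same calculation settles down quickly.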
This is a guide to the hierarchy of data types in the GRTMPS input database: Model, Case, and Base/Alternate.

A butterfly flaps its wings in the Andes and …. 
Do you ever find yourself grumbling because you got the scaling wrong on a set of numbers, or reversed your positives and negatives? The Multiply and Divide options in Paste Special allow you to sort the problem out with just a few clicks.
If you are working with GRTMPS database models and entering your data via the interface panels, the older GRTMPS data table system may be something of a mystery to you. However, since the database information is exported into these tables before being processed, it can be very helpful to be able to recognize the connections between panels and tables. Debugging tools such as data check, run time messages and file compare all refer to data in the internal tables, so knowing how the names work will make you more efficient. Here is a brief guide.
Every variable and equation in an LP matrix generated via GRTMPS has a unique name built from user assigned codes and internal elements. Understanding them will help in debugging model problems, such as infeasibilities.

It is time for a new computer! What should I get?
We are often asked for computer hardware recommendations to reduce the time required to solve a GRTMPS planning case. Let's answer the hardware question and throw in some additional suggestions that can reduce run times.

If you’ve ever wasted a few hours debugging a broken Excel workbook, then you will be as excited as I was when I learned about Go To Special last year. This does a lot of really interesting things, including finding all the cells in a worksheet with errors in them. 

If you have a model with multiple locations and / or periods, you may want to create limits that control subtotals across all or some of these places and times. If you can buy a stream at 3 locations in 3 periods, how many constraints would you need to cover every possible subset? How do you put them in the model?
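A quick way to get a feel for the numbers is to enumerate the subsets. The sketch below counts them under two readings of "every possible subset", both of which are assumptions about what the question means: subtotals over a group of locations combined with a group of periods, and subtotals over any arbitrary selection of the nine location/period combinations. The codes are placeholders.

```python
from itertools import combinations, product

locations = ["L1", "L2", "L3"]
periods = ["P1", "P2", "P3"]


def nonempty_subsets(items):
    """All subsets of items that contain at least one element."""
    return [
        subset
        for size in range(1, len(items) + 1)
        for subset in combinations(items, size)
    ]


# Reading A: one limit per (group of locations, group of periods) pair.
block_subtotals = list(
    product(nonempty_subsets(locations), nonempty_subsets(periods))
)
block_count = len(block_subtotals)

# Reading B: one limit per arbitrary selection of location/period cells.
cells = list(product(locations, periods))
arbitrary_count = 2 ** len(cells) - 1
```

Either way the count grows quickly with the number of locations and periods, which is why writing such limits out by hand soon becomes impractical.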
Are you using the “spooling” option when you submit runs? It might save you some run time.
Have you ever entered a formula into Excel only to have it treat it as text and just sit there displaying what you typed without resolving it?
When you are setting up the case data for your monthly plan, what price should you use for the crude or other materials that you have already bought? Quite a few people I have asked thought the answer was obvious, but they did not all come up with the same answer.
Distributed Recursion SLP* models require initial estimates for the pool properties that are being optimized and the pool’s distribution factors. By default the pool property values are taken from the blending data and the error distribution is an even division over all the ways the pool can be used. However, you can override these numbers by using a “181 file” as an additional input to the model. Sometimes this helps the optimization converge sooner and may find you a better value.
Sorting your stream list can help you manage your data and make your reports easier to read. In most full database models, this is quite a long list and it can be challenging to use and maintain as it is unlikely to fit on one screen. (Even if you use a table-based model, you probably have a list of crude streams here, so read on.)
The trend towards larger and larger models works against our desire for fast run times. If you are adding many crudes, periods and / or locations, adjusting the OMNI settings for your GRTMPS model might help speed things up again. It might even be essential to keep it running.
This is an introduction to the fundamental issue that brought recursion into refinery planning models and how this approximation allows us to optimize the qualities of products where some of the components are themselves mixtures of varying qualities.
HOW DO YOU SEARCH FOR "*", "?" and "~" ?
If you are working with GRTMPS data in a spreadsheet (data tables, SSIs, etc.) you probably have some cells that contain asterisks (*) and question marks (?), since GRTMPS uses these as wild cards, replacing them with specific period, location and crude codes when the data is processed. The challenge in doing a Find for a specific entry with one of these characters is that Excel also uses them as wild cards in search terms: “?” for any single character and “*” for any group of characters (as does the find in Windows Explorer and many editors).
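Excel's own answer is to put a tilde in front of the character you want treated literally. If you are generating search terms in a script, a small helper like the hypothetical one below does the escaping for you; the sample codes are invented.

```python
def escape_excel_wildcards(text):
    """Prefix ~, * and ? with a tilde so Excel's Find treats them literally."""
    return "".join("~" + ch if ch in "~*?" else ch for ch in text)


# Search terms to type (or paste) into the Find box.
literal_star = escape_excel_wildcards("CR*")
literal_mixed = escape_excel_wildcards("A?B~")
```

For example, searching for CR~* finds cells containing the text CR* itself, rather than everything that starts with CR.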
WHAT DO YOU DO WHEN H/XPRESS FREEZES OVER?
Have you had a run that appears to freeze in the optimize step? It might well have already done some recursion passes, but now it’s just sitting there in the Queue Manager like it is never going to finish.
Have you ever wondered why pool qualities sometimes have marginal values, even when there is no specification?