[lmi] Toward a 7702A testing strategy

Greg Chicares
Later this year, we'll rewrite 'ihs_irc7702a.?pp' to reflect new
specifications and address known inaccuracies in its implementation
of section 7702A of the US tax code. It would be unwise to commence
that work without a testing strategy. The statute prescribes a
bright-line test, and an error of one cent in the wrong direction
can lead to tens of thousands of dollars in fees and penalties.

Here's an advocacy piece I wrote last year, slightly retouched.

I. Validate 7702A calculations with a unit test, not a system test.

Let's use these definitions from the IEEE Standard Computer Dictionary:

| System testing: testing conducted on a complete, integrated system
| to evaluate the system's compliance with its specified requirements
| Unit testing: testing of individual hardware or software units or
| groups of related units

7702A calculations are one "unit" among many in a life-insurance admin
or illustration system. One unit test might look like this:

- original state
  non-MEC = status
        3 = 7702A "contract year"
   100000 = account value
  1000000 = specified amount
  1000000 = 7702A "lowest death benefit"

- transaction
   500000 = specified-amount decrease

- resulting state (what the test validates)
  non-MEC = status
        3 = 7702A "contract year"
   100000 = account value
   500000 = specified amount
        0 = seven-pay premium

That's simplified (I omitted age, NSP, and many other things) only
for presentation of the concept. A real-world unit test is as simple
as it can be (but no simpler). It isolates the inputs and outputs we
care about. What it isolates is small enough to calculate by hand.
If you repeat a unit test and find a discrepancy, you immediately
know exactly which calculation must be broken, and writing a defect
report is trivial.
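
As a rough sketch of how such a test might be automated (Python is
used only for brevity; the field names and the 'diff_states' helper
are hypothetical, and the expected values are just those listed
above), the comparison can be made field by field, so that any
discrepancy points straight at the broken calculation:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class State:
    # Field names are illustrative only; a real test would also carry
    # age, NSP, and the other items omitted from the example.
    is_mec: bool
    contract_year: int
    account_value: int
    specified_amount: int
    seven_pay_premium: int

def diff_states(expected: State, actual: State) -> dict:
    """Map each differing field to its (expected, actual) pair."""
    exp, act = asdict(expected), asdict(actual)
    return {k: (exp[k], act[k]) for k in exp if exp[k] != act[k]}

# Expected resulting state, copied from the example.
expected = State(False, 3, 100000, 500000, 0)

# Made-up stand-in for what a defective implementation might return.
actual = State(False, 3, 100000, 500000, 1234)

print(diff_states(expected, actual))
# {'seven_pay_premium': (0, 1234)}
```

An empty result means the test passed; anything else is already most
of the defect report.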

A system test, on the other hand, might look like this:

- setup
  1000000 = specified amount
       45 = issue age
    40000 = annual premium to be paid every year
  context: one particular product

- outcome that we really care about
    50000 = original seven-pay premium
        9 = year it becomes a MEC

- what we actually would wind up testing
    zillions of values

That's extremely simplified, of course. The output is all daily or
monthly values--product mechanics are commingled with what we really
care about. We can extract some subset like ninth-year 7702A values,
but we can hardly know that the benchmark results we originally stored
for comparison are correct--we'd have to calculate all values by hand,
and that's infeasibly laborious. If some detail of monthiversary
processing is later changed, the benchmark becomes invalid. Therefore,
if we find a discrepancy when we repeat the tests later, we don't know
whether it has anything to do with 7702A. System testing isn't the
appropriate way to test a unit. Unit testing is.

II. Automate all tests. Use all tests as regression tests.

| Regression testing: selective retesting of a system or component to
| verify that modifications have not caused unintended effects and
| that the system or component still complies with its specified
| requirements

Unit tests can be regression tests. System tests can be regression
tests. Any test you run repeatedly is a regression test. (Originally,
the term meant only tests added as a result of fixing a defect; it
ensured that behavior didn't "regress" to its previous defective
state. But it's evolved to encompass any test that guards against
deviation from an original correct behavior.)

"Automate all tests" may be seen as a polemical statement:

| Manual vs. Automated
| Some writers believe that test automation is so expensive relative
| to its value that it should be used sparingly. Others, such as
| advocates of agile development, recommend automating 100% of all
| tests. A challenge with automation is that automated testing
| requires automated test oracles (an oracle is a mechanism or
| principle by which a problem in the software can be recognized).

"Agile development" advocates respond: tests must run quickly, so
design them with speed in mind; and manual is the opposite of fast, so
design an automated testing framework up front. Tests that take a day
to run are tests a vendor would ask us not to run often. Manual tests
suffer from a resource constraint on our end: "it'd take three person-
days to run the whole test suite, and we can spare only half a person,
but we have to release this tomorrow".

"Agile" developers would also quarrel with the word "selective" in the
definition above. If the tests are fast enough, then even thinking
about being "selective" takes more time than running the whole suite:

| we have UnitTests that check whether (a) class X works, and (b) all
| other classes work in the context of class X. The UnitTests run in
| under 5 minutes, checking everything. (We are only testing around
| 1000 classes, with only about 20,000 individual checks, but, well, we
| all know Smalltalk is slow. C++ would probably be a lot faster. ;-> )

Rounding is crucial in lmi. Its unit tests number 8216, and they all run
in thirteen thousandths of a second. Maybe we could do with fewer tests,
but why spend any thought on that? More complicated calculations would
take a little more time; we test about forty other units, and the
whole suite runs in less than ten seconds. Our system tests take eight
minutes for 1327 tests, of which 295 are intended specifically to test
7702A--though none of the 295 has been verified. It's interesting to
compare these two approaches (numbers rounded):

  approach:                   all system tests   rounding unit tests
  number of tests:            1300               8000
  time:                       500 s              0.013 s
  time per test:              0.38 s             0.0000016 s
  number of values:           13000000           8000
  time per value:             0.000038 s         0.0000016 s
  % of values verified:       about 0%           100%
  is it right?                uh, dunno          yes, assuredly
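
The per-test and per-value figures in that table follow from simple
division. A quick check, using the rounded counts and times from the
table (the variable names are mine):

```python
# Rounded inputs taken from the comparison table.
system_tests, system_seconds, system_values = 1300, 500.0, 13_000_000
unit_tests, unit_seconds, unit_values = 8000, 0.013, 8000

per_system_test = system_seconds / system_tests
per_system_value = system_seconds / system_values
per_unit_test = unit_seconds / unit_tests
per_unit_value = unit_seconds / unit_values  # one value per rounding test

print(f"{per_system_test:.2f} s per system test")        # 0.38
print(f"{per_system_value:.6f} s per system-test value")  # 0.000038
print(f"{per_unit_test:.7f} s per unit test")             # 0.0000016
```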

Spending a fraction of a second to test a few thousand values, each
of which has been carefully validated, is sane. Spending several
minutes to test over ten million values, few or none of which we
really know to be correct, is a different matter. My main regrets are
that we don't have enough unit tests yet, and that we tried to write
system tests where unit tests would have been more powerful as well
as faster--notably, for 7702 and 7702A. I intend not to repeat that
mistake.

Unit tests are the fastest tests, so prefer them for testing units
like 7702A. System tests are also necessary, but don't try to use
them where unit tests are more appropriate. Some testing is going to
remain manual ("they changed the webpage to blue on green, so it's
inaccessible to the colorblind"), but prefer automation wherever it's
feasible. Automation, of course, gives the lowest lifetime costs,
because computers check output more cheaply than people can.

III. 7702A testing

Consider the above points in the 7702A context. The lmi system tests
for 7702A are documented like this (small sample):

  ben incr [8] with unnec prem [8] (opt A) nonmec [defect?]
  ben incr [8] with unnec prem [8] (opt B) nonmec [defect?]
  ben incr [8] with unnec prem [8] (opt ROP) nonmec [defect?]
  opt B, w/d, but not below init DB - nonmec, cvat
  opt ROP, w/d, not below init DB - non-mec, cvat
  -50% SA rate, nec prem pmt [8] (AV < DCV), cvat
  12% cred rate, nec prem pmt [8] (DCV < AV), cvat

It is interesting to inquire what happens when, I guess, the full
necessary premium is paid in year eight, and to consider how that
might differ depending on whether the account value is higher or lower
than the deemed cash value. And it's clever to test the latter
condition by manipulating the separate- and general-account rates.
Unfortunately, forcing 7702A into system testing requires that sort
of cleverness. So much cleverness was expended frivolously that not
enough remained to look into the suspected defects.

Each of the 295 7702A system tests produces about ten thousand
year-end values, few of which are relevant to 7702A. Monthly detail
would be about twelve times as long and take twelve times as much
time, so we skipped it to keep the tests fast--even though 7702A
events can occur on any monthiversary in an illustration, and even
though tracking down a discrepancy in regression testing probably
requires generating and studying monthly detail.

Since this was originally written six months ago, we've found that
off-anniversary MEC testing is incorrect, in a very general way that
could easily have been detected by a simple unit test. The scope of
the system test was too vast (295 * ~10000 = about 2950000 values),
yet at the same time too narrow (zero valid off-anniversary tests).
And there's another grave problem: none of the 2950000 values is
known to be correct. Some are thought to have been validated to some
degree by hand, but which, and how, are questions with no documented
answers. These are mistakes to learn from, not to repeat.

We haven't yet made the time to do 7702A unit tests the right way,
but there's a sketch here:


Let me translate the 'test02' function you'll find there:

  first month of first year
    100000 specified amount
      1000 payment
  * test: it shouldn't be a MEC

  second month of first year
     99999 payment
  * test: it should be a MEC

There are good reasons why the code is much more verbose than that,
but that's all it really says.

This is independent of any product. It just uses dummy values, like
  NSP: .1, .2, .3 for the first three years
which have the great virtue of simplifying hand calculations.
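
To give the flavor without the verbosity, here is a much-reduced
sketch of that scenario in Python. It is not the lmi code: the class
and its names are hypothetical, the seven-pay rate of .05 per dollar
of specified amount is a dummy value I made up for the illustration,
and material changes, necessary premium, and everything else that
makes the real thing hard are ignored.

```python
from dataclasses import dataclass

@dataclass
class SevenPayState:
    specified_amount: int
    seven_pay_rate: float       # dummy rate per dollar of benefit (assumed)
    cum_amounts_paid: int = 0
    is_mec: bool = False

    def pay(self, amount: int, contract_year: int = 1) -> None:
        """Apply a payment, then compare cumulative amounts paid with
        cumulative seven-pay premium through the given contract year."""
        self.cum_amounts_paid += amount
        cum_7pp = self.seven_pay_rate * self.specified_amount * contract_year
        if self.cum_amounts_paid > cum_7pp:
            self.is_mec = True  # MEC status, once acquired, is permanent

policy = SevenPayState(specified_amount=100000, seven_pay_rate=0.05)

policy.pay(1000)     # first month of first year
assert not policy.is_mec

policy.pay(99999)    # second month of first year
assert policy.is_mec
```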

There are only seven such tests; they take about two thousandths of a
second each. The system tests mentioned above take a third of a
second each--more than two orders of magnitude slower. A complete
7702A unit-test suite might have a thousand tests and take half a
second to run. It would take a month to write and validate that many
tests at a rate of ten minutes apiece, but that's the total lifetime
cost unless the tax law changes, because half a second of computer
time costs nothing even if you spend it every day. Can't spend a
month? Then spend what you can afford, knowing that you're spending
it in the most effective way possible.

We can test lmi this way, because lmi is under our control. Other
systems (e.g., for administration) can and should use the same tests.
For this to be feasible, we need an interface to the target system's
calculations. We need to be able to say "jam in these values at this
point in time" and "spit back these values after a 7702A cycle", and
those have to be machine rather than human instructions--no rekeying.
And we need a program on this end to send those instructions, receive
the results, and compare the results against values known to be
correct. These are not exotic demands that contemporary technology
can't meet. There's a term for it--client-server--and that's a core
technology in any company's systems strategy today. Insurers should
make this a nonnegotiable requirement for any vendor system.
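
A minimal sketch of the program "on this end" might look like the
following. Everything in it is an assumption made for illustration:
the 'TargetSystem' interface, its method names, and the dictionary
layout of a test case are hypothetical, and 'EchoSystem' is a trivial
stand-in that performs no tax calculation at all--it exists only so
that the harness itself can be exercised.

```python
from typing import Protocol

class TargetSystem(Protocol):
    """Hypothetical machine interface a vendor system would expose."""
    def set_state(self, values: dict) -> None: ...
    def run_7702a_cycle(self, transaction: dict) -> dict: ...

def run_case(system: TargetSystem, case: dict) -> dict:
    """Jam in the initial values, run one cycle, and return every
    mismatch as {field: (expected, actual)}; empty means it passed."""
    system.set_state(case["initial"])
    actual = system.run_7702a_cycle(case["transaction"])
    return {
        field: (want, actual.get(field))
        for field, want in case["expected"].items()
        if actual.get(field) != want
    }

class EchoSystem:
    """In-memory stand-in: applies the decrease it is told about."""
    def set_state(self, values: dict) -> None:
        self.values = dict(values)

    def run_7702a_cycle(self, transaction: dict) -> dict:
        decrease = transaction.get("specified_amount_decrease", 0)
        self.values["specified_amount"] -= decrease
        return dict(self.values)

case = {
    "initial": {"specified_amount": 1000000, "account_value": 100000},
    "transaction": {"specified_amount_decrease": 500000},
    "expected": {"specified_amount": 500000, "account_value": 100000},
}
print(run_case(EchoSystem(), case))  # {}
```

The same 'case' data could then be sent to any system that implements
the interface, which is the whole point.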

Of course, once the test suite is established, it's not hard to
apply it to every system. And once there's an automated test suite
for every system, it's not hard to make them all match closely.

Here, I've restricted the scope to abstract 7702A transactions.
Tabular values are easily checked by other means that can easily be
demonstrated. If things like NSPs are to be calculated from first
principles, lmi has some unit tests for that already. Deemed cash
value is a chimera that deserves separate consideration, but is
beyond the scope of this posting.

lmi mailing list

Re: Toward a 7702A testing strategy

Greg Chicares-2
On 2006-01-06 14:13Z, Greg Chicares wrote:
> A real-world unit test [...] isolates the inputs and outputs we
> care about. What it isolates is small enough to calculate by hand.

Class 'mec_input' embodies the necessary inputs fairly well, though
of course we may refine it. Now let's consider what outputs would be
helpful so that we can establish a suite of '.mec' unit tests (which
can replace 'irc7702a_test.cpp'). Recording only the most important
results--MEC status, 7PP, LDB, and perhaps DCV--is not enough:
 - they might all be correct for a particular test case, even though
   there's an error in some intermediate step--so that a defect that
   might have been detected escapes notice; and
 - if they seem to be incorrect, then the reason remains obscure--so
   that an excessive amount of spelunking is required.

This matrix seems to capture most 7702A calculation details [0] for
a group of transactions applied on the same day (which is exactly
what a '.mec' file tests):

           or 1035  incr  decr  nec_prem  MC unnec_prem
  bft          +      +     +       -      -      -        3
  LDB          -      -     +       -      +      -        2
  amts_pd      -      -     -       +      -      +        2
  MC           -      +     -       -      -      +        2
  DCV          +      -     -       +      +      +        4
  7PP          +      -     +       -      +      -        3
  MEC          +      -     +       +      +      +        5
                                                          21 / 42

Columns are events that trigger a calculation. Rows are values that
may need to be recalculated due to various triggers. Intersections
are marked '+' where a value might change, and '-' otherwise; at the
right is a total of '+' signs. Calculations generally flow from left
to right, and within a column from top to bottom.
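
The totals at the right are easy to get wrong by hand, so here is a
small Python check that transcribes the matrix and recounts them. The
encoding (one string of marks per row) and the 'may_change' helper
are my own; only the marks come from the matrix.

```python
COLUMNS = ["or 1035", "incr", "decr", "nec_prem", "MC", "unnec_prem"]

# One character per column, in the order given by COLUMNS.
MATRIX = {
    "bft":     "+++---",
    "LDB":     "--+-+-",
    "amts_pd": "---+-+",
    "MC":      "-+---+",
    "DCV":     "+--+++",
    "7PP":     "+-+-+-",
    "MEC":     "+-++++",
}

row_totals = {name: marks.count("+") for name, marks in MATRIX.items()}
total_plus = sum(row_totals.values())
total_cells = len(COLUMNS) * len(MATRIX)

def may_change(trigger: str) -> list:
    """Rows marked '+' in the column for the given trigger."""
    i = COLUMNS.index(trigger)
    return [name for name, marks in MATRIX.items() if marks[i] == "+"]

print(row_totals)
print(f"{total_plus} / {total_cells}")  # 21 / 42
print(may_change("decr"))               # ['bft', 'LDB', '7PP', 'MEC']
```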

I'm inclined to include all of that matrix's forty-two data in an
output file for testing, without even bothering to compress out the
half that AFAICT cannot bear any actual information. Symmetry and
clarity seem more important than saving a few bytes.

The output file should also record certain scalar values that aren't
affected in different ways by different triggers--including these
values that are looked up or deduced directly from input:
  policy year
  contract year
  7PP rate
  NSP rate
  target premium
  target premium-load rate
  excess premium-load rate
as well as these intermediate calculated values:
  net 1035 amount
  NSP (dollar amount)
  gross and net maximum necessary premium
  cumulative 7PP and cumulative amounts paid
and these results that an admin or illustration system would either
use or store for subsequent use:
  maximum non-MEC premium
  last MC date
  cash value as of contract duration zero (for later decreases)

I'll probably use xml for the output file.
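
Nothing about the layout is settled. Purely as a sketch, here is one
shape such a file could take, written with Python's standard
'xml.etree.ElementTree'. The element and attribute names are
hypothetical, and the values in the usage lines are placeholders, not
results of any calculation.

```python
import xml.etree.ElementTree as ET

def write_mec_output(scalars: dict, matrix: dict) -> str:
    """Serialize scalar values and the trigger-by-value matrix."""
    root = ET.Element("mec_output")
    scalar_node = ET.SubElement(root, "scalars")
    for name, value in scalars.items():
        ET.SubElement(scalar_node, "scalar", name=name).text = str(value)
    matrix_node = ET.SubElement(root, "matrix")
    for row, cells in matrix.items():
        row_node = ET.SubElement(matrix_node, "row", name=row)
        for trigger, value in cells.items():
            ET.SubElement(row_node, "cell", trigger=trigger).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Placeholder data, for illustration only.
text = write_mec_output(
    {"policy_year": 1, "contract_year": 1},
    {"7PP": {"incr": 0, "decr": 0}},
)
print(text)
```

Writing every cell, even those that cannot carry information, keeps
the reader of the file from having to know which half was omitted.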


[0] "This matrix seems to capture most 7702A calculation details"

Here are some notes on its rows and columns.

The first column represents the initial state determined by input
parameters on the as-of date. It's new business if that date equals
the effective date; otherwise, it's inforce. A 1035 exchange is
permitted only as of the issue date.

The increase and decrease columns might have been combined. I've
kept them separate because their effects are very different.

Amounts paid are separated into necessary and unnecessary portions
because material changes are recognized after the necessary portion
has been applied.

The amounts-paid row represents premiums less any nontaxable
distributions. The benefits row represents either death benefit or
specified amount, depending on the insurer's 7702A interpretation.

Material change appears both as a column and as a row. It's a column
because it triggers changes in other values. It's a row because its
value is an important part of the state of the contract. The column
shows the effects of processing a material change; the row indicates
that it may or will be necessary to process a material change.

A material change is processed before any unnecessary premium is
applied. Payment of unnecessary premium is one possible trigger for
recognizing a material change. The apparent circularity is resolved
by observing that the presence of pending unnecessary premium can be
ascertained before its application.


Re: Toward a 7702A testing strategy

Greg Chicares-2
"The methods on which the mathematician is content to hang his reputation
are generally those which he fancies will save him and all who come after
him the labour of thinking about what has cost himself so much thought."
  -- James Clerk Maxwell

- Dataflow analysis of §7702 and §7702A -

Why is it so hard to implement §7702 and §7702A correctly? Traditional
documentation, including our own ( http://www.nongnu.org/lmi/7702.html ),
spells out exact formulas, and identifies the progression of program
steps clearly enough, because we think of formulas and flowcharts as
normal tools for technical writing. Studying the mistakes in lmi's tax
implementation leads me to the conclusion that I have too often used the
right data as of the wrong moment, because the flow of data has not been
made explicit. This note addresses that shortcoming.

Even though "dataflow" is not typically mentioned in our specifications,
it's a familiar concept. It's simply what a spreadsheet does. Suppose
cumulative 7PP is in column C, and it's updated in cells C3, C5, and C8.
If a cell in column F references C5 when it should use C8, that's a
dataflow problem. A spreadsheet does what a flowchart cannot: it shows
progressive changes to the small handful of quantities that matter for
taxation. Pinning down when and why those values change is tricky unless
we work through it systematically up front.

To analyze the data flow, we must first isolate the data. It will be
clearest to imagine a server that handles all taxation on behalf of a
client admin or illustration system. Even if we don't actually write a
client-server application, this notion forces a modular design that keeps
tax data together with tax code, and separate from policy data. Each day's
transactions are captured and handed to the server--which handles them in
an order appropriate for taxation (which need not be identical to the
client system's order), and notifies the client of consequences.

- Motivating example -

Recently I began to sketch out a new GPT implementation. I started with
the order of operations specified in this "flowchart":
  adjustment event --> forceout --> payment
That looks right: an adjustment event can cause a forceout, and it seems
reasonable enough to force money out if necessary before trying to put
any more in. But consider this case:

  pay GSP at issue; one year later, take a withdrawal

  ----client---  ---server---
    spec   acct         prems
     amt  value    GSP   paid
  100000  26000  25000  25000   before withdrawal
   90000  16000                 after withdrawal
                 22000          adjustment event
          13000?        -3000?  forceout
            ???        -10000?  payment decrement due to withdrawal

Taxation should certainly be a standalone module with exclusive ownership
of its data. The admin or illustration system in which the GPT module is
embedded shouldn't modify that data directly. How should the withdrawal
flow through to affect premiums paid? It seemed reasonable to treat a
withdrawal as a payment that happens to be negative, but doing so caused
a spurious forceout. Swapping the last two rows seems to make the problem
go away:

          16000        -10000   payment decrement due to withdrawal
          16000             0   forceout

Should we conclude that the original flowchart
  adjustment event --> forceout --> payment
was wrong, and reversing the order of the last two operations
  adjustment event --> payment --> forceout
is correct?
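
The arithmetic of the two orderings can be reproduced in a few lines
of Python. This is only a sketch of the example's numbers: the
function names are mine, cumulative GLP is not given in the example
and is taken as zero so that GSP alone is the limit, and a forceout
is floored at zero on the assumption that nothing is forced out
unless premiums paid exceed the limit.

```python
def forceout(prems_paid: int, gsp: int, cum_glp: int = 0) -> int:
    """Excess, if any, of premiums paid over max(GSP, cum GLP)."""
    return max(0, prems_paid - max(gsp, cum_glp))

def forceout_then_decrement(prems_paid: int, gsp: int, withdrawal: int):
    """adjustment event --> forceout --> (negative) payment"""
    forced = forceout(prems_paid, gsp)
    return forced, prems_paid - forced - withdrawal

def decrement_then_forceout(prems_paid: int, gsp: int, withdrawal: int):
    """adjustment event --> (negative) payment --> forceout"""
    prems_paid -= withdrawal
    forced = forceout(prems_paid, gsp)
    return forced, prems_paid - forced

# GSP has already been adjusted from 25000 to 22000 in both cases.
# Each result is (amount forced out, premiums paid afterward).
print(forceout_then_decrement(25000, 22000, 10000))  # (3000, 12000)
print(decrement_then_forceout(25000, 22000, 10000))  # (0, 15000)
```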

- Analyzing the difficulties -

GPT and §7702A calculations pose two principal difficulties. One is the
complexity of the life-contingencies formulas for calculating premiums.
They involve a lot of little pieces--q's and i's--but those pieces are
all similar, and that homogeneity makes formulas testable. For example,
there's only one path through a subroutine that calculates seven-pay
premiums, so once it's validated for a couple of inputs (including one
with less than seven years to maturity), it's likely to be correct for
any other input. A well-written formula can be grasped at a glance, so
that its correctness may be judged by inspection; and ours have been
confirmed to exactly reproduce values published in the actuarial
literature. Calculating premiums from formulas isn't our problem.

The other major difficulty is interweaving those premiums with policy
transactions, adjusting and enforcing statutory limits as needed. This
requires only a little simple arithmetic and an occasional premium
recalculation, but the interdependencies among steps become intricate.
A program that's correct for one input scenario can easily be incorrect
for another. Rarely is every path through the program tested. Even the
number of points that should be tested is not typically known, so a test
suite's coverage cannot be measured.

The root of the problem is that we've sliced the task into pieces that
can be understood separately, but not together. We know very well how
to slice out a GPT adjustment:
  A + B - C
or a forceout:
  cum premiums paid - max(GSP, cum GLP)
but a few simple rules like those are usually embedded in sprawling
narrative commentary that's often incomplete or ambiguous. Tax values
like cumulative premiums paid change from one step to the next, and it's
hard to be sure we're always using the right value at the right time.

- Tabulating the data flow -

Whenever we work through a problem like the motivating example above, we
naturally draw a table whose rows are the steps in a flowchart, and whose
columns are these intermediate tax values. That is, we resort to dataflow
analysis when we encounter a difficulty. Could we avoid difficulties in
the first place by using this technique up front? We might start by
noting where each quantity is read or written:

                GLP     GSP     paid  ...
  issue        write   write   write
  withdrawal  ignore  ignore   write
  adj evt      write   write  ignore
  forceout      read    read    read
  payment        ...

That immediately sheds light on the motivating example. The real issue
is not whether forceouts precede payments, but rather when the withdrawal
decreases premiums paid.

              GSP    paid
  issue      25000  25000
  WD            --  15000   <-- decrement premiums paid before GPT limit
  adj evt    22000     --
  forceout      --     --   <-- then these calculations...
  payment       --     --   <-- ...work correctly

In effect, that says that withdrawals are atomic: their side effects
occur at the same moment--so that, in the motivating example, premiums
paid is decremented before an adjustment event is processed. However, we
don't want the client to change some GPT values directly and defer to the
server to handle others: the server should have exclusive ownership of
all tax values. Rather, the client should notify the server that premiums
paid are to be decremented, and let the server change its datum. Thus:

                            GSP     paid
  withdrawal              hidden  notify  send notification from client
  decrease premiums paid  ignore   write  process notification on server
  adj evt                  write  ignore
  forceout                  read    read  now we compare the right data
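
In code, that division of labor might be sketched as follows (Python;
the class and method names are hypothetical, cumulative GLP is left
out so that GSP alone is the limit, and the recalculated GSP is
simply handed in rather than computed). The client may send its
notifications in any order; the server applies them in its own.

```python
class GptServer:
    """Sketch of a server with exclusive ownership of GPT data."""

    def __init__(self, gsp: int, prems_paid: int):
        self.gsp = gsp
        self.prems_paid = prems_paid
        self._pending_decrement = 0
        self._pending_gsp = None

    # Notifications only queue work; they change no tax value.
    def notify_decrease_prems_paid(self, amount: int) -> None:
        self._pending_decrement += amount

    def notify_adjustment_event(self, new_gsp: int) -> None:
        self._pending_gsp = new_gsp

    def process_day(self) -> int:
        """Apply queued actions in the server's order; return forceout."""
        self.prems_paid -= self._pending_decrement    # write paid
        self._pending_decrement = 0
        if self._pending_gsp is not None:             # write GSP
            self.gsp = self._pending_gsp
            self._pending_gsp = None
        forced = max(0, self.prems_paid - self.gsp)   # read both
        self.prems_paid -= forced
        return forced

server = GptServer(gsp=25000, prems_paid=25000)
server.notify_adjustment_event(22000)      # sent first by the client...
server.notify_decrease_prems_paid(10000)   # ...but processed first
print(server.process_day())                # 0
print(server.gsp, server.prems_paid)       # 22000 15000
```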

Furthermore, certain transactions must be combined together, as specified
in section 5.8 of our specifications:

| All adjustment events and material changes that occur on the same date
| are combined together and processed as one single change. This is done
| as soon as all transactions that potentially create adjustment events
| have been applied, and must be done before any new premium is accepted
| because an adjustment can affect the guideline premium limit.

Let's tabulate the complete data flow, along with queued actions that the
server must handle based on client notifications:

                             --- Tableau GPT-1 ---

            GPT: data and actions, by cause (rows) and effect (columns)

                 decr             queue                         amt                  cum
                prems spec          adj     cum       queue  forced queue rejected prems
                 paid  amt CSV DB event GLP GLP GSP forceout    out  pmt     pmt    paid
  initialization   -    i   i   i   -    i   i   i      -       -     -       -       i
  non-1035 issue   -    -   -   -   -    -   -   -      -       -     -       -       -
  1035     issue   -    -   u   w   -    -   -   -      -       -     t       -       -
  dbo     change   -    w   -   w   t    -   -   -      -       -     -       -       -
  specamt change   -    w   -   w   t    -   -   -      -       -     -       -       -
  withdrawal       t    w   u   w   t    -   -   -      -       -     -       -       -
  decr prems paid  -    -   -   -   -    -   -   -      -       -     -       -       u
  adj evt          -    r   -   r   -    u   u   u      t       -     -       -       -
  march of time    -    -   -   -   -    r   u   -      t       -     -       -       -
  forceout         -    -  (u) (w)  -    -   r   r      -       w     -       -       u
  new premium      -    -  (u) (w)  -    -   r   r      -       -     -       w       u

    i  initialize
    r  read
    w  write  (replace, ignoring prior value)
    u  update (increment or decrement prior value)
    t  trigger
    -  no effect possible
    () effect performed on client after server returns

Effects shown are possible, but not certain: for example, an adjustment
event may or may not cause a forceout. Only direct effects are indicated:
thus, a DBO change triggers an adjustment event, which in turn affects
GLP, but the DBO change itself has no immediate direct effect on GLP.

The meanings of most rows and columns will be clear enough to anyone well
versed in §7702. Positive and negative payments are separated for reasons
already discussed; the "march of time" increments cumulative GLP on each
policy anniversary; and "initialization", which precedes everything else,
sets up policy-specific data (mortality, interest, benefit amount, etc.)
required for premium calculations, and sets premiums appropriately for
inforce or new business.

Some items are both rows and columns, for synchronization: for example,
a withdrawal queues up a decrement to premiums paid. The row for that
operation can also handle exogenous events that decrease premiums paid,
such as a payment returned to preserve a non-MEC. The effect on CSV and
DB is completed beforehand, on the client; this row serves only to
synchronize the premiums-paid datum owned by the GPT code. Similarly, a
DBO change and a benefit reduction are combined and treated as a single
adjustment event.

The rows are in chronological order. This order is not necessarily unique,
particularly because some rows are mutually exclusive. For example, the
order of forceouts and positive payments is undefined because a premium
cannot be accepted under conditions that require a forceout, so swapping
those two rows would have no effect. (That's the real answer to the
question posed in the motivating example.)

Columns are nearly chronological, but can't be strictly so. For instance,
the forceout calculation depends on premiums paid, but the amount forced
out must be removed from premiums paid.

The tableau is "reversible": it can be used as a tool to look backwards.
For example, if a forceout occurs, then it must have resulted either from
updating the guideline premium limit with a negative guideline on a policy
anniversary, or from an adjustment event off anniversary. No other cause
is possible--and in that sense the '-' symbols serve the positive purpose
of ruling out other causes.
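
That backward lookup is mechanical enough to automate. As a sketch,
Tableau GPT-1 can be transcribed into a Python table and queried by
column and symbol; the encoding and the 'causes' helper are my own
invention, and only the marks come from the tableau.

```python
COLUMNS = [
    "decr prems paid", "spec amt", "CSV", "DB", "queue adj event",
    "GLP", "cum GLP", "GSP", "queue forceout", "amt forced out",
    "queue pmt", "rejected pmt", "cum prems paid",
]

# One whitespace-separated mark per column, transcribed row by row.
TABLEAU = {
    "initialization":  "- i i i - i i i - - - - i",
    "non-1035 issue":  "- - - - - - - - - - - - -",
    "1035 issue":      "- - u w - - - - - - t - -",
    "dbo change":      "- w - w t - - - - - - - -",
    "specamt change":  "- w - w t - - - - - - - -",
    "withdrawal":      "t w u w t - - - - - - - -",
    "decr prems paid": "- - - - - - - - - - - - u",
    "adj evt":         "- r - r - u u u t - - - -",
    "march of time":   "- - - - - r u - t - - - -",
    "forceout":        "- - (u) (w) - - r r - w - - u",
    "new premium":     "- - (u) (w) - - r r - - - w u",
}

CELLS = {row: dict(zip(COLUMNS, marks.split()))
         for row, marks in TABLEAU.items()}

def causes(column: str, symbol: str = "t") -> list:
    """Rows whose mark in the given column equals the given symbol."""
    return [row for row, cells in CELLS.items() if cells[column] == symbol]

print(causes("queue forceout"))   # ['adj evt', 'march of time']
print(causes("queue adj event"))
# ['dbo change', 'specamt change', 'withdrawal']
```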

- Extension and refactoring -

The tableau doesn't encompass every client transaction of interest, yet
it is readily extensible. For example, QAB changes could be treated by
adding a QAB column that triggers an adjustment event. For the purpose
of writing the server, though, that's a needless complication. The server
must be able to process adjustment events triggered by exogenous causes
whose nature is known only to the client. As long as the tableau has a
row to handle every aspect of GPT calculations, the server needn't care
why an event was triggered.

Similarly, it needn't care what caused a forceout. Testing for a forceout
at the right moment, unconditionally, will always produce the correct
outcome, so we can use our knowledge of the data flow to simplify the
logic by eliminating the triggering column.

The CSV column isn't read, so it can be deleted. The specified-amount
and DB columns could be combined--guideline premiums reflect whichever
one we choose to regard as the §7702(f)(3) benefit. Even better, those
two columns can be removed, and the current and prior benefit amounts
shown as a footnote, because GPT calculations don't change those values.

We can also group all "queue" columns together. Their order doesn't
matter because they have no immediate effect--they just raise flags to
notify the server of actions that it must take, in its own order.

It will also be clearer if we move "decrement premiums paid" down, next
to the forceout row, because both have the effect of decreasing premiums
paid. Our dataflow analysis proves that this is safe. "Initialization"
can likewise be moved down.

These simplifications and rearrangements produce rectangles at lower left
and upper right that can be ignored because they contain only '-':

                 ---- triggers ---- | -------------- data ---------------
                 queue  queue queue |                                 cum
                 prems adjust  pos  |     cum              rejected prems
                 paid-  event  pmt  | GLP GLP GSP forceout    pmt    paid
  non-1035 issue    -     -     -   |  -   -   -      -        -       -
  1035     issue    -     -     t   |  -   -   -      -        -       -
  dbo     change    -     t     -   |  -   -   -      -        -       -
  specamt change    -     t     -   |  -   -   -      -        -       -
  withdrawal        t     t     -   |  -   -   -      -        -       -
  initialization    -     -     -   |  i   i   i      -        -       i
  GPT adjustment    -     -     -   |  u   u   u      -        -       -
  march of time     -     -     -   |  r   u   -      -        -       -
  decr prems paid   -     -     -   |  -   -   -      -        -       u
  forceout          -     -     -   |  -   r   r      w        -       u
  new premium       -     -     -   |  -   r   r      -        w       u

Thus, the tableau has been factored into these separate components:

-control flow-|------------------------- data flow ------------------------
              | admin or illustration client | GPT server
      name    | consolidate transactions and |
       of     | queue consequent GPT actions |
      each    |------------------------------------------------------------
   subroutine |                              | initialize: pass parameters
     called   |                              | from client to server
       in     |                              | - - - - - - - - - - - - - -
   sequential |                              | guideline premium formulas
      order   |                              | involving life contingencies
  constitutes |                              | - - - - - - - - - - - - - -
       the    |                              | simple arithmetic to enforce
    flowchart |                              | guideline premium limit

The upper-left portion specifies the interface from client to server.
All adjustment events are combined into one that's fully characterized by
benefit values before and after the adjustment (including QABs if we wish).
Decrements to premiums paid are similarly combined. And the only positive
payment indicated comes from a 1035 exchange, which occurs as of the issue
date, on which no adjustment event is allowed; if it exceeds the guideline
limit, that's an error that prevents issue. Thus, this submatrix is just a
machine that consolidates adjustment events and negative payments into
summary data for the server to process (along with new premium).
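
The consolidating "machine" might be sketched as follows. This is only
an illustration, not lmi code: the TRIGGERS table is transcribed from
the upper-left submatrix above, but the names Benefits, Transaction,
Summary, and consolidate, and the record layouts, are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import NamedTuple, Optional

# Upper-left submatrix: which queue each client transaction triggers
# (decrement to prems paid, adjustment event, positive payment).
TRIGGERS = {
    "non-1035 issue": (False, False, False),
    "1035 issue":     (False, False, True),
    "dbo change":     (False, True,  False),
    "specamt change": (False, True,  False),
    "withdrawal":     (True,  True,  False),
}

class Benefits(NamedTuple):      # hypothetical: whatever characterizes benefits
    specamt: Decimal
    dbo: int

class Transaction(NamedTuple):   # hypothetical record layout
    kind: str
    amount: Decimal = Decimal("0")
    benefits_after: Optional[Benefits] = None

@dataclass
class Summary:                   # what the client hands to the server
    benefits_before: Benefits
    benefits_after: Benefits
    adjustment_event: bool = False
    prems_paid_decrement: Decimal = Decimal("0")
    positive_payment: Decimal = Decimal("0")

def consolidate(benefits_before, transactions):
    """Fold one day's transactions into a single summary."""
    s = Summary(benefits_before, benefits_before)
    for t in transactions:
        decrement, adjust, positive = TRIGGERS[t.kind]
        if decrement:
            s.prems_paid_decrement += t.amount
        if adjust:
            # Only the benefits before the first event and after the
            # last one matter to the server.
            s.adjustment_event = True
            if t.benefits_after is not None:
                s.benefits_after = t.benefits_after
        if positive:
            s.positive_payment += t.amount
    return s
```

A specified-amount change followed by a withdrawal on the same day would
thus reach the server as one adjustment event plus one decrement.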

The lower-right portion specifies the server's operation, in a 6 by 6
subset of the 11 by 13 initial tableau:

                 --- Tableau GPT-2 ---
                      cum              rejected prems
                  GLP GLP GSP forceout    pmt    paid
  initialization   i   i   i      -        -       i
  GPT adjustment   u   u   u      -        -       -
  march of time    r   u   -      -        -       -
  decr prems paid  -   -   -      -        -       u
  forceout         -   r   r      w        -       u
  new premium      -   r   r      -        w       u

- A template for testing -

This tableau, combined with current and prior benefit values and details
of guideline premium calculations, is exactly what a knowledgeable tester
needs for validating the effect of a day's transactions on an admin system
that has its own GPT implementation. Therefore, it suggests a template for
the output of a GUI tool to support acceptance testing--which may also be
handy for answering what-if questions.

For instance, consider the guideline-negative example on page 101 of the
SOA §7702 textbook (Table V-4):

  reduction from $100000 to $50000 on twenty-fifth anniversary
  assume DBO 1 and $2000 annual premium
  '-' cells are blank; others show values written, or '...' if unchanged

                                 cum                     rejected    prems
                       GLP       GLP        GSP forceout    pmt       paid
  initialization   2035.42  50885.50   23883.74                   50000.00
  GPT adjustment  -1804.87     ...     -5067.35
  march of time             49080.63
  decr prems paid                                                    ...
  forceout                                        919.37          49080.63
  new premium                                             2000.00    ...

(If the decrease had occurred off anniversary, then a pro-rata adjustment
to cumulative GLP would have reduced the premium limit immediately.)
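
The arithmetic in that table can be replayed with a few lines of
code. This is a minimal sketch under stated assumptions: the function
names and the dictionary layout are invented for illustration, the
guideline premium limit is taken as the greater of GSP and cumulative
GLP, the GLP and GSP figures are simply copied from the table rather
than calculated, and partial acceptance of a premium that only partly
fits under the limit is my assumption.

```python
from decimal import Decimal as D

def guideline_limit(s):
    # Greater of the GSP and the cumulative GLP.
    return max(s["gsp"], s["cum_glp"])

def gpt_adjustment(s, glp, gsp):
    # On an anniversary only GLP and GSP are rewritten; the pro-rata
    # off-anniversary change to cum GLP is deliberately omitted.
    s["glp"], s["gsp"] = glp, gsp

def march_of_time(s):
    s["cum_glp"] += s["glp"]          # reads GLP, updates cum GLP

def decr_prems_paid(s, decrement):
    s["prems_paid"] -= decrement      # e.g. consolidated withdrawals

def forceout(s):
    amount = max(D("0"), s["prems_paid"] - guideline_limit(s))
    s["prems_paid"] -= amount
    return amount

def new_premium(s, payment):
    room = max(D("0"), guideline_limit(s) - s["prems_paid"])
    accepted = min(payment, room)     # assumption: accept what fits
    s["prems_paid"] += accepted
    return payment - accepted         # rejected portion

# "initialization" row of the example
state = {"glp": D("2035.42"), "cum_glp": D("50885.50"),
         "gsp": D("23883.74"), "prems_paid": D("50000.00")}

gpt_adjustment(state, D("-1804.87"), D("-5067.35"))
march_of_time(state)                  # cum GLP becomes 49080.63
decr_prems_paid(state, D("0"))        # no withdrawal in this example
forced = forceout(state)              # 919.37
rejected = new_premium(state, D("2000.00"))   # 2000.00, all of it
```

Decimal arithmetic is used so that the comparison is exact to the cent.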

This table shows how the relevant GPT quantities change at each step, in
a uniform way that works well for any combination of transactions. If you
step through the calculations for any scenario, you'll find yourself
populating at least a portion of this table. Alternatively, if a GPT
server is programmed to print an optional trace of its calculations at
each step, this is the output it should produce.

Tabulating the dataflow renders the operation of the GPT server readily
intelligible: it can all be seen at a glance. The tableau guides the
design of unit and acceptance tests that ensure full testing coverage.
The server can be verified exhaustively by checking every cell in each
unit test's tableau (GLP and GSP calculations being tested separately).
And the GUI tool mentioned above can be used to calibrate expected
acceptance test results for an admin system's GPT implementation.
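
Checking every cell lends itself to automation. The following is a
sketch only, assuming (hypothetically) that both the expected tableau
and the server's trace are available as nested dictionaries keyed by
row and then column, with blank and unchanged cells simply omitted;
the function name is mine.

```python
from decimal import Decimal as D

def mismatches(expected, actual):
    """Return (row, column, expected, actual) for each differing cell."""
    problems = []
    for row, cells in expected.items():
        for column, want in cells.items():
            got = actual.get(row, {}).get(column)
            if got != want:
                problems.append((row, column, want, got))
    # Also report cells the server wrote that should have stayed blank.
    for row, cells in actual.items():
        for column, got in cells.items():
            if column not in expected.get(row, {}):
                problems.append((row, column, None, got))
    return problems

# Expected values taken from the guideline-negative example above.
expected = {
    "GPT adjustment": {"GLP": D("-1804.87"), "GSP": D("-5067.35")},
    "march of time":  {"cum GLP": D("49080.63")},
    "forceout":       {"forceout": D("919.37"),
                       "prems paid": D("49080.63")},
    "new premium":    {"rejected pmt": D("2000.00")},
}
```

An empty result means every cell agrees; a discrepancy of one cent in a
single cell is reported with its row and column, which is most of what
a defect report needs.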

- MEC testing -

Similar tableaux can be devised for MEC testing:

                                --- Tableau MEC-1 ---
                   decr                    queue                max             cum
                   amts spec        queue    mat                nec        cum amts
                   paid  amt CSV DB reduc change DCV CSV0 bft0 prem CY 7PP 7PP paid MEC
  initialization     -    i   i   i   -      -    i    i    i    -   i  i   i    i   i
  non-1035 issue     -    r   -   -   -      -    -    -    -    -   -  -   -    -   -
  1035     issue     -    r   u   w   -      t    -    -    -    -   -  -   -    -   w
  GPT adj evt        -    w   -   w   t      t    u    -    -    -   -  -   -    -   -
  CVAT increase      -    w   -   w   -      t    u    -    -    -   -  -   -    -   -
  CVAT decrease      -    w   -   w   t      -    u    -    -    -   -  -   -    -   -
  withdrawal         t    w   u   w   t      -    u    -    -    -   -  -   -    -   -
  march of time      -    -   -   -   -      -    -    -    -    -   u  r   u    -   -
  decr amts paid     -    -   -   -   -      -    -    -    -    -   -  -   -    u   -
  reduction rule     -    r   -   r   -      -    -    r    w    -   r  w  wr    r   w
  nec prem test      -    -   r   -   -      t    r    -    r    w   -  -   -    -   -
  pay necessary      -    -   u   w   -      -    u    -    -    r   r  -   r   ur   w
  mat change         -    r   r   r   -      -    w    w    w    -   w  w   w    w   -
  pay unnecessary    -    -  (u) (w)  -      -   (u)   -    -    r   r  -   r   ur   w

Here, the upper-right submatrix isn't entirely empty because it indicates
that a 1035 from a MEC is a MEC, but we needn't fret over that "impurity"
as long as it's covered by a unit test.

Obviously MEC testing is more complicated than GPT testing. Reductions and
material changes use the same 7PP formula, but with different parameters
and different effects. While the GPT simply limits premium, §7702A also
monitors cash-value growth through its necessary premium test, which
introduces a DCV that must largely be calculated by the client. Time
marches to the pulse of §7702A's peculiar "contract year", and CSV and
benefit amount as of contract year zero must be stored.

CSV and DB change between the necessary-premium and material-change rows.
Therefore, in a true client-server architecture, the server would be
consulted more than once for a single day's transactions. The GUI tool
described above probably ought to update these quantities itself.

Largely because the benefit and CSV columns must be retained, extracting
the lower-right submatrix eliminates fewer columns than it did for the
GPT. The "queue material change" column can be removed, much as the GPT
"queue forceout" column was removed above.

                       --- Tableau MEC-2 ---
                                         max                   cum
                                         nec contract     cum amts
                  CSV DB  DCV CSV0 bft0 prem     year 7PP 7PP paid MEC
  initialization   i   i   i    i    i    -      i     i   i    i   i
  march of time    -   -   -    -    -    -      u     r   u    -   -
  decr amts paid   -   -   -    -    -    -      -     -   -    u   -
  reduction rule   -   r   -    r    w    -      r     w  wr    r   w
  nec prem test    r   -   r    -    r    w      -     -   -    -   -
  pay necessary    u   w   u    -    -    r      r     -   r   ur   w
  mat change       r   r   w    w    w    -      w     w   w    w   -
  pay unnecessary (u) (w) (u)   -    -    r      r     -   r   ur   w

We can compare dataflow complexity by comparing tableaux. Counting cells
that are modified (because they contain 'u' or 'w', parenthesized entries
excluded):

   tableau  rows columns cells modified
   -------  ---- ------- ----- --------
   GPT-2      6      6     36      9
   MEC-2      8     11     88     22

suggests that MEC testing is more than twice as complex as GPT testing by
this measure (22/9 is about 2.4), although the life-contingencies formulas
are more complicated for the GPT.
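
The counts can be reproduced mechanically. In this sketch the cell
codes are transcribed from Tableaux GPT-2 and MEC-2; the function names
are mine, and leaving out the parenthesized entries is an assumption
that I made because it is what yields 22 for MEC-2.

```python
GPT_2 = {
    "initialization":  "i i i - - i",
    "GPT adjustment":  "u u u - - -",
    "march of time":   "r u - - - -",
    "decr prems paid": "- - - - - u",
    "forceout":        "- r r w - u",
    "new premium":     "- r r - w u",
}

MEC_2 = {
    "initialization":  "i i i i i - i i i i i",
    "march of time":   "- - - - - - u r u - -",
    "decr amts paid":  "- - - - - - - - - u -",
    "reduction rule":  "- r - r w - r w wr r w",
    "nec prem test":   "r - r - r w - - - - -",
    "pay necessary":   "u w u - - r r - r ur w",
    "mat change":      "r r w w w - w w w w -",
    "pay unnecessary": "(u) (w) (u) - - r r - r ur w",
}

def is_modified(cell):
    # A cell counts if it updates or writes: 'u', 'w', 'ur', 'wr'.
    # Parenthesized entries are excluded (assumption, see above).
    return not cell.startswith("(") and ("u" in cell or "w" in cell)

def summarize(tableau):
    """Return (rows, columns, cells, modified) for a tableau."""
    rows = [cells.split() for cells in tableau.values()]
    columns = len(rows[0])
    assert all(len(r) == columns for r in rows), "ragged tableau"
    modified = sum(is_modified(c) for r in rows for c in r)
    return len(rows), columns, len(rows) * columns, modified

for name, tableau in (("GPT-2", GPT_2), ("MEC-2", MEC_2)):
    print(name, *summarize(tableau))
```

Counting the three parenthesized cells as well would raise the MEC-2
figure from 22 to 25.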

- Adaptability and insight -

Is the §7702(f)(3) "death benefit" specified amount, or death benefit?
To conform to new administration systems, lmi must soon be adapted to
support both. At first, I was horrified at the prospect, figuring that
I'd need to comb through many, many pages of narrative documentation in
order to identify every required change. The problem becomes tractable
because of this dataflow analysis.

Consolidating the dataflow so that it can all be seen at a glance also
affords insight. For example, a new administration system defines a §7702A
material change as payment of unnecessary premium. Seven 7PPs accumulate
to exactly the NSP, so DCV can't exceed NSP in a seven-pay period without
producing a MEC; that is, all non-MEC premium is necessary. Therefore, no
material change can be recognized in the first seven years: the initial
7PP is locked in despite any coverage increase. We suspect that the new
admin system behaves in this way, which was not necessarily foreseen; it
occurred to me only as I was using Tableau MEC-2 to create test cases.
