Cumulative curves

Updated 2025-12-31, for Game Server 8.048

The script analyze-transcripts-curves.sh draws cumulative error curves for human players sharing a particular experience. It pulls the data for a group of players from the data store (the SQL database and transcript files) and produces SVG images displaying the individual players' cumulative error curves and their median curve.

This script is to be used in much the same way as analyze-transcripts-mwh.sh. It takes the same arguments related to choosing the players (e.g. by specifying an experiment plan or supplying a list of players) and identifying distinct experiences (-precMode) as that script.

Support for the -export and -import options has the same form as well, although the exported CSV file now has an additional column containing a summary of each series' transcript (the sequence of "e[j]:p0[j]" pairs for all moves in the series). Additionally, several options related to drawing curves can be provided.

Specifying the curve plotting modes

The option -curveMode yVar:xVar specifies what you actually want to plot. The possible values are as follows.

In all formulas, m is the move count (more precisely, the number of successful and failed moves and failed picks) since the beginning of the series. (Successful picks are entirely ignored, as if we deem them to be merely "slips of the fingers", or actions of a player who does not understand what he needs to do). q is the number of game pieces removed so far in the series, i.e. the number of successful moves.

yVar specifies the variable plotted on the vertical axis. The possible values are:

W = Raw error count, W(m) = Sum_m(e_m). One would expect W(m) for a random player to increase roughly linearly with m, and for a successful learner, to stabilize at some value (the total number of errors the player makes before he fully learns the rules).
AAI = Error count normalized by the error probability, AAI = Sum_m(e_m)/Sum_m(1-p0(m)). One would expect the average AAI(m) for a group of random players to stabilize around 1.0, and for a successful learner, to decrease as const/m.
AAIB = AAI * m. One would expect the average AAIB(m) for a large group of random players to be equal to m, and for a successful learner, to stabilize at some value (which is the total number of errors the player makes before he fully learns the rules, divided by the "average error rate of a random player").
AAIC = A different type of error count normalization, AAIC = AAI * C, where C is the number of good moves so far (i.e. the number of game pieces that have been removed). This can be viewed as a "regularization" of Paul's AAID metric. An average AAIC(C) curve for a group of random players behaves much like y=x.
AAID = Paul's proposal, a bit more complicated than AAIC. (See below).
AAIE = like AAIC, but with the y value "frozen" for the duration of each correct-move streak
AAIG, AAIH: see "C-based metrics" below.
ALL = produce multiple plots, one for each of the above modes
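
The definitions of W, AAI, AAIB and AAIC above can be illustrated with a short Python sketch. This is not the tool's actual code; the function name cumulative_metrics and the input format (a list of (e, p0) pairs, mirroring the "e[j]:p0[j]" pairs of the exported transcript summary) are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

def cumulative_metrics(moves: List[Tuple[int, float]]) -> List[Dict[str, float]]:
    """Compute W, AAI, AAIB and AAIC after every counted move.

    moves: (e, p0) pairs, where e is 1 for a failed move or pick and 0 for a
    successful move, and p0 is the random player's success probability.
    Successful picks are assumed to have been dropped by the caller.
    """
    rows = []
    w = 0        # W: cumulative error count, Sum_m(e_m)
    c = 0        # C: number of successful moves (removed pieces)
    denom = 0.0  # Sum_m(1 - p0(m)): errors expected from a random player
    for m, (e, p0) in enumerate(moves, start=1):
        w += e
        c += 1 - e
        denom += 1.0 - p0
        aai = w / denom if denom > 0 else 0.0
        rows.append({"M": m, "C": c, "W": w,
                     "AAI": aai, "AAIB": aai * m, "AAIC": aai * c})
    return rows

# Hypothetical series: three errors, then three correct moves, p0 = 0.25.
series = [(1, 0.25)] * 3 + [(0, 0.25)] * 3
for row in cumulative_metrics(series):
    print(row)
```

In this made-up series the W and AAIB values stop changing once the errors stop, while AAI keeps decreasing and AAIC keeps growing with C.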

AAID (proposed by Paul on 2025-11-19): Can there be a "normalized to start of good runs" curve for each different player that is on the C vs. W plot?

sumW=0;
sumE=0;
sumC=0;
C=0;
plot(0,0);
at each move that is either correct or wrong (not ignored)
  if wrong:
    sumW+=1
    sumE+=(1-p)
    no point on plot
  if correct:
    sumC+=1
    if preceding move was correct
      plot(sumC,y)
    if preceding move was wrong
      y=(sumW/sumE)*sumC
      plot(sumC,y)
    if this is the first move, plot (1,0).

I think if the player moves randomly, this plot will be around the 45 degree line, whether the rule is hard or easy.
If the player starts a run of correct moves, the y-value will become flat, and will stay flat until a wrong move.

If the player makes a wrong move, the trace of errors will be smoothed by averaging against the whole run, from the start. I think, but am not sure, that this approach will not produce strange dips at the end of a run of correct moves.

AAIE is very similar to AAID, except that sumE includes (1-p) for all moves, not just the incorrect ones. Thus,

sumW=0;
sumE=0;
sumC=0;
C=0;
plot(0,0);
at each move that is either correct or wrong (not ignored)
  sumE+=(1-p)
  if wrong:
    sumW+=1
    no point on plot
  if correct:
    sumC+=1
    if preceding move was correct
      plot(sumC,y)
    if preceding move was wrong
      y=(sumW/sumE)*sumC
      plot(sumC,y)
    if this is the first move, plot (1,0).
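
The two pseudocode listings above (AAID and AAIE) differ only in which moves contribute to sumE, so they can be sketched as one Python function with a switch. The function name frozen_curve, the flag name, and the (wrong, p) input format are hypothetical; this is a reading of the pseudocode, not the tool's implementation.

```python
from typing import List, Tuple

def frozen_curve(moves: List[Tuple[bool, float]],
                 sum_e_over_all_moves: bool) -> List[Tuple[int, float]]:
    """Return the plotted (sumC, y) points for one series.

    moves: (wrong, p) pairs for every counted move, where p is the random
    player's success probability for that move.
    sum_e_over_all_moves: False gives AAID (sumE grows on wrong moves only),
    True gives AAIE (sumE grows on every move).
    """
    sum_w = 0
    sum_e = 0.0
    sum_c = 0
    y = 0.0                 # frozen value; stays 0 until the first error
    prev_wrong = False
    points = [(0, 0.0)]     # plot(0,0)
    for wrong, p in moves:
        if wrong or sum_e_over_all_moves:
            sum_e += 1.0 - p
        if wrong:
            sum_w += 1      # no point on plot
        else:
            sum_c += 1
            # Refresh y only right after an error; otherwise keep it frozen.
            if prev_wrong and sum_e > 0:
                y = (sum_w / sum_e) * sum_c
            points.append((sum_c, y))
        prev_wrong = wrong
    return points

# Hypothetical series: wrong, wrong, correct, correct, wrong, correct.
demo = [(True, 0.25), (True, 0.25), (False, 0.25),
        (False, 0.25), (True, 0.25), (False, 0.25)]
print("AAID:", frozen_curve(demo, sum_e_over_all_moves=False))
print("AAIE:", frozen_curve(demo, sum_e_over_all_moves=True))
```

Starting with y=0 makes the special case "if this is the first move, plot (1,0)" fall out of the general rule, since no error precedes a correct first move.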

xVar specifies the variable plotted on the horizontal axis. The possible values are:

M = m, the move count (more precisely, the number of successful and failed moves and failed picks) since the beginning of the series.
C = c, the number of game pieces removed by the player since the beginning of the series. (This is also equal to the number of successful ["correct"] moves).
ALL = produce multiple plots, one for each of the above modes

The default value of curveMode is AAIC:C, meaning plotting AAIC as a function of C.

An advantage of using C instead of M as the variable on the horizontal axis is that in a game where most players clear all boards presented to them (i.e. no incentive scheme, and no "give up" option), all curves end at roughly the same value of C, so the median is much better defined than it would be in a plot against M.

If -curveMode has been specified as ALL.ALL (or just as all, for short), the tool will plot the curves for all supported yVar.xVar combinations, placing each family of plots in an appropriately named subdirectory of the out directory. E.g. the plots of AAI vs. C will be in the subdirectory AAI_C.

C-based metrics

Background. I understand that the properties we'd like to see in a metric useful for comparing different rule sets (or, generally, "experiences") on a given human player population are as follows:

(1) Monotonicity: the cumulative curve for any player is monotonically non-decreasing (as the raw W count is, with respect to any argument, be it M or C).

(2) Flatlining: It would be nice for a cumulative curve for a "learner" (someone who learns to play without making any more errors after some point) to end in a horizontal segment, so it would be easy to extrapolate it by a horizontal line, or to describe it by a single number (such as W, whose final value can be called Wstar). AAIB, for example, is designed this way.

(3) Comparability: In order for different rule sets' data to be comparable on the same graph, it would be desirable for the metric to be defined in such a way that the curve for an "average random player", however defined, behaves in the same way (e.g. as y=x) for all rule sets. This was the idea for AAIB_M, or AAIC_C.

Again on AAIC. The AAIC_C curve satisfies (1) and (3), but does not quite meet (2): a "learner's" AAIC curve would have a horizontal asymptote, but the curve itself (in AAIC_C coordinates) will be a hyperbola approaching that asymptote.

Additionally, the mapping of the height of the cumulative curve in "raw" W_C coordinates to that in AAIC_C coordinates is very non-linear. That is, curves that are very different in the W space (you can easily say that rule set R1 is much more difficult than R2, when one compares the ratio W(human players)/W(random players)) may look much less different in the AAIC_C coordinates; on the AAIC_C plot, all these curves are much closer to the y=x line, and don't look very different from each other.

The idea of a C-based metric. What if, instead of focusing on moves, we concentrate on removed pieces? That is, one can ask:

for a given board state, what is the average number of move attempts mu that a random player would end up making in order to remove a piece from this board? (This is easily computable).
for a given board state, what is the number of move attempts that a given human player ended up making in order to remove a piece from this board? (This is obtainable from the logs).
can we use some kind of a ratio of the latter to the former to measure how much better our human players did on a given rule set than random players would have done?

AAIG is pretty similar to Paul's original AAIB, but conceptually based on summing over removed pieces (i.e. over good moves) rather than over all moves. The metric is the ratio of two cumulative values, that is
AAIG(C) = ( W / Sum_b( mu(b)-1) ) * C
Here, the sum is over the C intermediate board states (over the C removed pieces), and each term mu(b)-1 estimates the number of errors that a random player would make before changing that board state (i.e. before removing the piece). By this design, AAIG_C for random players is meant to approximate C, thus satisfying (3). For a game where mu is constant (e.g., for shape matching or color matching or quadNearby, mu=4 in the COMPLETELY_RANDOM player model), AAIG(C) = W / (mu-1). This obviously satisfies (2) (learners flatlining), although for games with varying mu (like ordL1, or various "one color at a time" games), this flatlining is only approximate. Also, condition (1) (monotonicity) only holds for games with constant mu (e.g. http://action.rutgers.edu/tmp/out-all-ignore-COMPLETELY_RANDOM/pairs/AAIG_C/FDCL/basic/buckets_2130-quadMixed1.svg ); for variable-mu games (like ordL1) you get a lot of somewhat unsightly artifacts, since the C/Sum(mu-1) ratio varies during an episode, even as W (for a learner after passing mStar) stays constant. (E.g. http://action.rutgers.edu/tmp/out-all-ignore-COMPLETELY_RANDOM/AAIG_C/FDCL/basic/allOfColOrd_BRKY.svg http://action.rutgers.edu/tmp/out-all-ignore-COMPLETELY_RANDOM/pairs/AAIG_C/FDCL/basic/allOfShaOrd_qcts-shaOrdL1_csqt.svg )
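
The behavior of AAIG described above (flat for constant mu, artifacts for variable mu) can be reproduced with a small Python sketch of the formula AAIG(C) = ( W / Sum_b(mu(b)-1) ) * C. The function name aaig_curve, the per-piece (w, mu) input format, and the mu values in the examples are all made up for illustration; with constant mu the formula reduces to W/(mu-1).

```python
from typing import List, Tuple

def aaig_curve(pieces: List[Tuple[int, float]]) -> List[float]:
    """Return [AAIG(1), ..., AAIG(C)] for one series.

    pieces: one (w, mu) pair per removed piece, where w is the number of
    wrong attempts the player made before removing it, and mu is the average
    number of attempts a random player would need on that board state.
    """
    curve = []
    total_w = 0                   # W so far
    expected_errors = 0.0         # Sum_b(mu(b) - 1) so far
    for c, (w, mu) in enumerate(pieces, start=1):
        total_w += w
        expected_errors += mu - 1.0
        value = total_w / expected_errors * c if expected_errors > 0 else 0.0
        curve.append(value)
    return curve

# Constant mu = 4: the learner's curve flatlines at W/(mu-1) = 5/3.
print(aaig_curve([(3, 4), (2, 4), (0, 4), (0, 4)]))

# Hypothetical variable-mu game (mu = 3, 2, 1 on each of two boards):
# W stays at 2 after the first piece, yet the curve moves up and down.
print(aaig_curve([(2, 3), (0, 2), (0, 1), (0, 3), (0, 2), (0, 1)]))
```

The second series is a learner who makes no errors after the first piece, so any movement in its curve comes purely from the changing C/Sum(mu-1) ratio.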
AAIH. This is an enhancement on AAIG, involving two ideas: computing ratios for individual moves, and removing "trivial moves" from consideration.
AAIH(C') = Sum_{b over C'} ( w(b) / (mu(b)-1) )
Here, w(b) is the number of wrong attempts the player made when removing a piece from a given board state, and mu(b) is, again, the average number of attempts a random player would need to remove a piece from that state (so that mu(b)-1 is the average number of errors). The summation is over the board states, excluding "trivial moves" (those with mu=1, i.e. where a wrong move is impossible, such as the first piece of cw or ccw, or the last piece of ordL1). C' is likewise defined as counting only "non-trivial" moves (thus, for cw or ordL1 we have C'=24, rather than 27, for our 3-board experiment; for rule sets such as allOfShaOrd_qcts, it is a variable number, always <= 24, since some players may end the game with just several squares on the board, and no thinking needed). http://action.rutgers.edu/tmp/out-all-ignore-COMPLETELY_RANDOM/pairs/AAIH_C/FDCL/basic/allOfShaOrd_qcts-shaOrdL1_csqt.svg
This formula guarantees monotonicity (since it is a simple non-decreasing sum), flatlining (since no non-zero terms are added once the player makes no more errors), and it is meant to approximate AAIH(C')=C' well for random players. Besides, the metric is easy to explain in terms of the ratio of the number of errors made by Prolific players to the number made by random players. So I think it's the best of our formulas so far.
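
As an illustration of the AAIH formula, here is a minimal Python sketch. The function name aaih_curve and the per-piece (w, mu) input format are assumptions, and the mu values are invented. Pieces with mu=1 are skipped, since their term w/(mu-1) would divide by zero and no wrong move is possible there anyway.

```python
from typing import List, Tuple

def aaih_curve(pieces: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Return the (C', AAIH(C')) points for one series.

    pieces: one (w, mu) pair per removed piece, where w is the number of
    wrong attempts before the piece was removed and mu is the average number
    of attempts a random player would need on that board state.
    """
    points = []
    c_prime = 0     # counts non-trivial pieces only
    total = 0.0     # running Sum_b( w(b) / (mu(b) - 1) )
    for w, mu in pieces:
        if mu <= 1.0:
            continue            # trivial move: excluded from sum and from C'
        c_prime += 1
        total += w / (mu - 1.0)
        points.append((c_prime, total))
    return points

# Hypothetical learner on a variable-mu game (mu = 3, 2, 1 on two boards),
# making 2 errors on the first piece and none afterwards.
print(aaih_curve([(2, 3), (0, 2), (0, 1), (0, 3), (0, 2), (0, 1)]))
```

For a random player the expected value of each term is 1 (on average w(b) = mu(b)-1), which is why the curve is expected to follow AAIH(C')=C'.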
The final C' for AAIH. One small complication with this formula is that when we look not only at the learners, but at all players (e.g. to see if a bimodal distribution appears, or to compare the distributions for 2 rule sets), the question arises: where (at which C') exactly should the end point be placed (for the error bars, or the end of the shaded area)? For constant-mu rules, like sm/cm/quadNearby, this is not an issue, since every non-learner who does not give up reaches C'=C=27. For a rule that always has the same number of trivial moves (e.g. cw/ccw/ordL1), we also have the same C' (24) for all non-learners. But for a rule set such as allOfShaOrd_qcts, C' varies slightly from player to player, since some got easier boards than others. Now, proactively, this could be handled by ensuring that all players are given boards with the same number of trivial moves (e.g. all boards in allOfShaOrd_qcts have exactly 2 squares). But to handle already collected data, I have imposed a simple heuristic, which places a final bar at the point where some non-learners' curves start disappearing. That could be as low as C'=19 or 20 in some rule sets.

Output

The tool creates the directory out and, in it, subdirectories corresponding to display modes (W_M etc). Within each of them, plots are written as individual SVG files, each file name corresponding to the identifier of the "experience" reflected therein. Depending on the -precMode, that may be just the name of the rule set, or a combination of strings including the preceding rule sets. In the latter case, the components of the file names are separated by dots (because using semicolons would not be appropriate in file names).

Identifying learners. Behavior of learners' curves. Extrapolating

The plotting tool uses the same criterion as the MWH tool (-targetR or -targetStreak) to identify "learners", i.e. players who appear to have "mastered the rules" and have demonstrated this mastery by making a sequence of correct moves. The two options (-targetR or -targetStreak) correspond to Paul's two incentive schemes (LIKELIHOOD or DOUBLING, respectively); if the data you analyze have come from an experiment in which one of these two incentive schemes was employed, you may want to use the corresponding option, with the appropriate value (as per the experiment's parameter sets), so that the tool's identification of the players as learners or non-learners matches the identification made in real time by the Game Server.

W, AAIB

In W and AAIB plots, the curves for the identified "learners" are easy to recognize, because on the graph the curve is extrapolated, by a solid horizontal line, beyond the point where the player stopped playing.
The solid line shows how the curve would hypothetically continue if the player kept playing, without making any more errors than Wstar, the number of errors he had made by the point of reaching mastery. The position of the solid line can be theoretically computed as W=Wstar, or (for a game with a constant p for all moves), AAIB=Wstar/(1-p).
In the AAI plots, there are no extrapolations, because I did not bother drawing hyperbolas (for AAI(m)=const/m).

AAID, AAIE

In AAID and AAIE plots, the curve for a learner becomes a horizontal line after the last error has been made, because of the design of these metrics: they "freeze" the metric value during an error-free stretch. (Of course, should even a single error happen later, the metric will make a huge jump forward, more or less toward the hyperbola of AAIC). Thus on the plots we also draw horizontal extrapolation lines.

AAIC

For AAIC, the AAIC_C curve of a learner, after the player has mastered the rule, is either a hyperbola (for a game with constant p for all moves), or an approximation of such a hyperbola. Consider a game with a constant p, e.g. quadNearby, where the probability of a good move for a COMPLETELY_RANDOM player is p=0.25. One can show that for a perfect learner the post-mastery curve is
AAIC(C) = (1/(1-p))*W*C/(C+W) = (W/(1-p)) * (1 - W/(W+C)),
where W is Wstar, i.e. the number of wrong moves the player had made before achieving mastery. This is a section of a hyperbola that passes through (0,0) and runs upward, approaching the horizontal asymptote AAIC = Wstar/(1-p), which for quadNearby (and, very nearly, for cw or ccw) is (4/3)*Wstar. (Incidentally, this is also the value which AAIB has for a learner once he has achieved mastery).
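
The closed form above follows from the definition AAIC = AAI * C with M = W + C and a constant p, and it can be checked numerically. The Python sketch below is only such a check; the function names are hypothetical, and Wstar=6, p=0.25 are made-up example values.

```python
def aaic_from_definition(w_star: int, c: int, p: float) -> float:
    """AAIC = W / Sum_m(1 - p0(m)) * C, with M = W + C moves, all with p0 = p."""
    m = w_star + c
    return w_star / ((1.0 - p) * m) * c

def aaic_closed_form(w_star: int, c: int, p: float) -> float:
    """The post-mastery curve (W/(1-p)) * (1 - W/(W+C))."""
    return (w_star / (1.0 - p)) * (1.0 - w_star / (w_star + c))

w_star, p = 6, 0.25
asymptote = w_star / (1.0 - p)       # (4/3) * Wstar for p = 0.25
for c in (1, 5, 27, 100, 1000):
    direct = aaic_from_definition(w_star, c, p)
    closed = aaic_closed_form(w_star, c, p)
    print(c, direct, closed, asymptote - closed)
```

The last printed column is the remaining gap to the asymptote, which shrinks as C grows but never reaches zero; this is why AAIC does not quite meet the flatlining property.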
For a game where p varies from one move to the next, the "future behavior" of a curve for a perfect learner (after he has stopped actually playing) is, in general, not perfectly predictable, since for some more complicated games their randomly generated boards may come with different distributions of p. Still, it stands to reason that if the player kept playing (without making more errors), his AAIC curve would

Specifying the median plotting mode

-median Real -- the default mode. For every segment [m,m+1] (or [q,q+1], as the case may be), the median curve is drawn as the median of the actually recorded values of the plotted variable at m and m+1 (or q and q+1).
-median Extra -- For every segment [m,m+1] (or [q,q+1], as the case may be), the median curve is drawn as the median of the actually recorded or extrapolated values of the plotted variable at m and m+1 (or q and q+1). (Extrapolated values only exist for extrapolated curves, i.e. those for "learners").

The median curve so constructed typically has discontinuities at the points where some of the participating players' curves end. (For example, if we have records of 10 players up to m=30, but of only 9 players at m=31, then the median curve will likely have a discontinuity at m=30, since to the left of that point the median is constructed as the median of 10 functions, and to the right of that point, as the median of 9 functions).
You probably only want to use the -median Extra mode when you're working with a data set that includes only learners, i.e. one where all curves are extrapolated. Otherwise, you'll see that the median curve changes in an "unnatural" way once the "non-learners'" curves end and only extrapolated "learners'" curves remain.
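
The difference between the two median modes can be sketched in Python as follows. This is a simplification: it takes the median point by point rather than per segment, it assumes horizontal extrapolation of learners' curves, and the function name median_curve and its argument names are hypothetical.

```python
from statistics import median
from typing import List

def median_curve(curves: List[List[float]], is_learner: List[bool],
                 mode: str = "Real") -> List[float]:
    """Pointwise median of several players' curves.

    curves[i][x] is the value of player i's curve at x = 0, 1, 2, ...
    In "Real" mode only recorded values are used. In "Extra" mode a
    learner's curve is extended horizontally past its last recorded point.
    """
    longest = max(len(curve) for curve in curves)
    result = []
    for x in range(longest):
        values = []
        for curve, learner in zip(curves, is_learner):
            if x < len(curve):
                values.append(curve[x])      # actually recorded value
            elif mode == "Extra" and learner:
                values.append(curve[-1])     # extrapolated value
        result.append(median(values))
    return result

# Made-up data: two learners (first and third) and one non-learner.
curves = [[0, 1, 2, 2, 2], [0, 1, 2, 3], [0, 1, 1]]
flags = [True, False, True]
print(median_curve(curves, flags, mode="Real"))
print(median_curve(curves, flags, mode="Extra"))
```

In this toy data set the Extra-mode median drops at the last point, where the non-learner's curve has ended and only the learners' curves remain, which is the "unnatural" change described above.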
Example

This example can be found in the script /home/vmenkov/curves/curves-ignore.sh
#!/bin/csh
rm -rf out
#-- Extract all records for players who played plan "FDCL/basic". Save them to a CSV file, and also produce the default set of curves
/home/vmenkov/w2020/game/scripts/analyze-transcripts-curves.sh -precMode Ignore -targetR 1000000 -export all-ignore.csv FDCL/basic > tmp.log
mv out out-all-ignore
#--select players who have "learned"
head -1 all-ignore.csv > learned-ignore.csv
awk -F ',' '$7 == "true"' all-ignore.csv >> learned-ignore.csv
#--draw curves for learners,
/home/vmenkov/w2020/game/scripts/analyze-transcripts-curves.sh -precMode Ignore -import learned-ignore.csv -median Extra > tmp-2.log
mv out out-learned-ignore
#-- list the directories with SVG files
du out-all-ignore
du out-learned-ignore

I have copied the output directories (for all players, and for learners only) to http://action.rutgers.edu/tmp/out-learned-ignore/ http://action.rutgers.edu/tmp/out-all-ignore/
See also curves-every.sh in the same directory. Its output has been copied to http://action.rutgers.edu/tmp/out-learned-every/ http://action.rutgers.edu/tmp/out-all-every/

Random players

As Paul requested,
"(d) it would be good to show the purely random line in some uniform way because scales may change."
The "purely random line" was interpreted in a slightly broader sense, as the one that would describe the median curve of a large population of players described by the same random player model that is used to compute p0 in this particular series of plots. (At present, the Game Server and its analysis tools support two such models, COMPLETELY_RANDOM and MCP1. The former is, I think, a fine candidate for producing what you describe as a "purely random line").
Naturally, the behavior of the "purely random line" will depend, in general, not only on the rule set, but also on the mix of the initial boards being played. (The rule set ordL1 is a good example here: http://action.rutgers.edu/tmp/out-all-ignore-COMPLETELY_RANDOM/AAIB_C/FDCL/basic/ordL1.svg ; the random line is the blue short-dashed one). Therefore, an effort was made for the "purely random" line to be based on the same boards on which the actual players played. For every "series" (rule set + player) combination in the analysis, several (3) random players were created, and each one was made to play the same sequence of boards that the real player did. That created a sufficiently large population of random players whose curves can be meaningfully averaged (or, actually, "medianized").
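
The replay idea described above can be sketched in Python under strong simplifying assumptions: each board state is reduced to a single success probability p0 for the random player, and p0 does not change while the player keeps failing on the same state. The function names, the piece_probs representation, and the example probabilities are hypothetical; the real tool replays actual boards with its own random player models.

```python
import random
from typing import List, Tuple

def simulate_random_series(piece_probs: List[float],
                           rng: random.Random) -> List[Tuple[int, float]]:
    """Simulate one random player; return the (e, p0) pair of every move.

    piece_probs: the random player's success probability p0 (must be > 0)
    for each successive board state met while clearing the boards.
    """
    transcript = []
    for p0 in piece_probs:
        while True:                         # keep trying until a piece is removed
            success = rng.random() < p0
            transcript.append((0 if success else 1, p0))
            if success:
                break
    return transcript

def simulate_population(piece_probs: List[float], players_per_series: int = 3,
                        seed: int = 0) -> List[List[Tuple[int, float]]]:
    """Create several random players who all face the same board states."""
    rng = random.Random(seed)
    return [simulate_random_series(piece_probs, rng)
            for _ in range(players_per_series)]

# Hypothetical 9-piece board with p0 = 0.25 for every piece.
for transcript in simulate_population([0.25] * 9):
    errors = sum(e for e, _ in transcript)
    print(len(transcript), "moves,", errors, "errors")
```

Each simulated transcript has the same (e, p0) shape as a real player's series, so the same metric computations and the same median construction can be applied to the simulated population.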
One can see such sample curves for the FDCL/basic rule sets in
http://action.rutgers.edu/tmp/out-all-ignore-COMPLETELY_RANDOM/ http://action.rutgers.edu/tmp/out-all-ignore-MCP1/ http://action.rutgers.edu/tmp/out-learned-ignore-COMPLETELY_RANDOM/ http://action.rutgers.edu/tmp/out-learned-ignore-MCP1/
(In these runs, precMode=Ignore, i.e. the experiences of all players who played a particular rule set are put together, regardless of what rule sets they may have played previously).

Theoretical discussion of random lines:

W_M (wrong moves vs all moves). This is easy to theoretically analyze for many examples; e.g. for quad match (Nearby or Mixed), cw/ccw, color or shape match you'd expect W = 0.75*M, and this is indeed displayed. For ordL1, to remove 9 pieces, one would need, on average, approximately 9 + 8 + ... + 2 + 1 = 45 moves (summing from the last move backward), thus giving the W/M ratio of (45-9)/45 = 0.80, and indeed W = 0.8*M is more or less what's displayed.
W_C (wrong moves vs good moves). For quad match / sm / cm / ccw etc, we have W = 3*C. For ordL1, this produces a sequence of upside-down parabolas, since all boards are of the same size, and each successive piece is easier to remove.
AAIB_M: theoretically, one should expect the average of AAIB = M for a random player (since the same random player model is used in the p0 computation, and in the random play simulation!) on all rule sets, and the displayed curves are indeed fairly close to that. In some cases the displayed curve is a bit higher than expected, and I would not mind looking deeper into that.
AAIB_C: theoretically, that should approximate the curve of M vs C (all moves against correct moves). Since M = C+W, this is basically the addition of the curve in W_C and the y=x curve. E.g., for cm/sm/ccw etc, we get M=4*C, and that's what the random curves show. For ordL1 and its derivative rule sets, we have the expected sequence of upside-down parabolas.
AAIC_C: very good approximation of the AAIC=C straight line
AAID_C, AAIE_C: a bit of corruption of the above, primarily due to the step-wise character of each player's curve.
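
The arithmetic behind the W_M and W_C estimates above can be written out as a small Python sketch. The function name random_player_expectation is hypothetical, and the input is simply the list of per-piece averages mu (the average number of attempts a random player needs to remove each piece), taken from the two examples in the text.

```python
from typing import List, Tuple

def random_player_expectation(mus: List[float]) -> Tuple[float, float, float, float]:
    """Return (expected M, expected W, W/M, W/C) for a random player.

    mus: average number of attempts needed for each of the C removed pieces.
    Every piece costs mu attempts on average, of which exactly one is correct.
    """
    c = len(mus)
    expected_m = float(sum(mus))     # all moves
    expected_w = expected_m - c      # wrong moves
    return expected_m, expected_w, expected_w / expected_m, expected_w / c

# Constant mu = 4 (quad match, cw/ccw, color or shape match), 9 pieces.
print(random_player_expectation([4] * 9))

# ordL1 on a 9-piece board: mu = 9, 8, ..., 1.
print(random_player_expectation(list(range(9, 0, -1))))
```

The first call gives W/M = 0.75 and W/C = 3, and the second gives M = 45 and W/M = 0.8, matching the figures quoted for the W_M and W_C plots.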