The Rule Game server and the suite of tools that come with it offer a variety of tools to access and process the accumulated game data. Depending on one's needs and inclinations, one can simply use the data as they are, by reading the transcript CSV files and sending queries to the SQL database; one can copy the entire data set (CSV files and SQL tables) to your own computer; one can divide transcript files into sections associated with different rules sets, and add to them informaiton about the statistical difficulty of correct moves.
This document describes one of the simplest tools, the script export-transcripts.sh. Its primary purpose is to select the transcript files from the participants who satisfy your criteria (e.g. "everybody listed in study summary files on Prolific"; "everybody who participated in experiment plans FDCL/basic, and whose player ID started with prolific- or RU-", etc). Additionally, it will also extract the selected players' rows from the tables in the MySQL database, and will export them in CSV format. All files so selected or produced will be put into a new directory, so that you can ZIP it and then transfer the ZIP file to another computer for further analysis.
The script export-transcripts.sh takes most of the same command line arguments as does analyze-transcripts.sh, allowing selecting records based on experiment plan name, player IDs, user ID (for repeat players), etc. This section discusses a few examples that are likely to occur in your practice.
If you run an "experiment" on Prolific, Prolific stores the list of participants for each "study" included in the experiment. You can download that list by going to the study's detail page, opening the "Actions" pop-up menu, choosing "Download summary", and saving the CSV file offered to you.
The download summary file is in CSV format with many columns; it looks like this:
% more prolific_export_671b8ae84509b9a624b95e13.csv Submission id,Participant id,Status,Custom study tncs accepted at,... 671b8e95b17f88050648883e,5e9088a8a4b6ed018b450000,APPROVED,Not Applicable,.... 671b8e98fd64b1b65fa1a726,5c655e3c42faef0001280000,APPROVED,Not Applicable,.... ....The name of the file contains the Prolific study ID, while the second column of the file contains the participants' Prolific PID. (Note that it only includes the real Prolific participants, to whom you pay via the Prolific platform; it does not have an entry for yourself doing "preview as participant" of the study).
When the Prolific participants play our games, their Prolific PIDs are converted to our Game Server player IDs, which combine the study ID and the Prolific PID, e.g. prolific-671b8ae84509b9a624b95e13-5e9088a8a4b6ed018b450000.
Once you have downloaded the study summary files for your experiment to your PC, you need to upload them to the Game Server host (action.rutgers.edu) using sftp or scp. Put them into the same directory under your home directory, e.g. fdcl-01.
Log in to the Game Server host using ssh, go to the directory where you have put the summary files, and run the export transcripts script:
cd fdcl-01 /home/vmenkov/w2020/game/scripts/export-transcripts.sh -pid -file prolific_export_671*.csv
If the script runs successfully, you will find that it has created a directory named tmp, with the following content:
% ls -l tmp total 8 drwxr-xr-x 75 vmenkov staff 2400 Nov 6 19:54 detailed-transcripts/ -rw-r--r-- 1 vmenkov staff 928 Nov 9 15:22 pid.csv drwxr-xr-x 4 vmenkov staff 128 Nov 9 15:22 sql/ drwxr-xr-x 75 vmenkov staff 2400 Nov 6 19:54 transcripts/
Here, the subdirectories transcripts and detailed-transcripts contain copies of the relevant CSV files from Game Server's saved-files directories of the same names, with any duplicate episode records removed. The file pid.csv contains the list of the Game Server player IDs, and the subdirectory sql contains relevant extracts from the SQL server's tables as well as the scoring date (see the section "Score data" below)
You can now make a ZIP file containing the entire content of the directory tmp:
zip -r tmp.zip tmpand then transfer tmp.zip to your personal computer, or any another host, for further analysis. After that, you can remove the directory tmp the Rutgers server, or rename it somehow, so that its content would not be confusing for any future run of the export transcript tool at the same location.
Suppose you ran an experiment without using Prolific, by directly sending an invitation URL to a group of students. You have ensured that all participants' player IDs match a certain pattern, e.g. they start with RU-FDCL-basic-. In this example we also assume that you want to include all players with a matching ID into your analysis, because you have never used that exact prefix for your preview. In this case, you can use the following command line:
/home/vmenkov/w2020/game/scripts/export-transcripts.sh -pid 'RU-FDCL-basic-%'(Note the single quotes and the percent sign, rather than asterisk, in the player ID pattern. This is the syntax needed to instruct the MySQL server to carry out "pattern matching", which is similar to wildcard expansion for file names in a UNIX shell, but uses different special characters.)
The rest of the processing is the same as in Example 1.
What if all player IDs that you're interested in all start with the same prefix... but not all player IDs that start with that prefix are relevant? In other words, you want to exclude some player IDs from your analysis, for example because they are associated with your own testing, rather than with recruited players?
If that is the case, you may need to hand-edit the player list. First, prepare the list of all player IDs matching the pattern. In this example, you'll start by putting that list into the file my-pids.csv in your home directory:
cd /opt/w2020/saved/transcripts /bin/ls RU-FDCL-basic-*.transcript.csv > ~/my-pids.csv
After that, go to your home directory and edit the player ID list file, removing the IDs you want to exclude. You can use your favorite text editor (emacs, vi, vim, nano... whatever!)
cd emacs my-pids.csv(All lines in this file end with the suffix "transcript.csv", but that's OK; the export script will ignore this suffix, and look only at the player ID itself).
After that, you can feed the CSV file to the export transcript tool, much like in Example 1:
/home/vmenkov/w2020/game/scripts/export-transcripts.sh -pid -file my-pids.csv
The subsequent processing is the same as in Example 1.
What if an experiment involved both players recruited from Prolific (listed in Prolific study summary files) as well as locally recruited players (identified by their player ID prefix)? In this case, to analyze all players' data together, you can combine the approaches of Examples 1 and 3.
/home/vmenkov/w2020/game/scripts/export-transcripts.sh -pid -file prolific_export_671*.csv my-pids.csv
The subdirectory sql contains relevant extracts from the SQL server's tables, as well as 2 files with computed numbers, playerStats.csv and scores.csv:
% ls -l total 436 -rw-r--r-- 1 vmenkov users 374594 Nov 29 23:47 Episode.csv -rw-r--r-- 1 vmenkov users 40133 Nov 29 23:47 PlayerInfo.csv -rw-r--r-- 1 vmenkov users 21470 Nov 29 23:47 playerStats.csv -rw-r--r-- 1 vmenkov users 743 Nov 29 23:47 scores.csv
This file contains some processed information about each player's interaction with the first rule set of his trial list (i.e. the rule set for which he was a "naive player"). The content looks like this:
#playerId,trialListId,seriesNo,rule,maxToRemove,episodes,attempts,removed,couldAlsoAttempt,couldAlsoRemove,finishCode,xFactor prolific-671b8ae84509b9a624b95e13-5e9088a8a4b6ed018b45ba7b,basic-02-A,0,quadNearby,27,2,12,11,16,16,5,4 prolific-671b8ae84509b9a624b95e13-5e4d01bbca55661437cd235b,basic-03-A,0,quadNearby,27,3,46,27,0,0,1,0 prolific-671b8ae84509b9a624b95e13-5c655e3c42faef0001283e77,basic-03-B,0,quadMixed1,27,3,93,27,0,0,1,0 prolific-671b8ae84509b9a624b95e13-5e64e9aae18d1524c7743b5d,basic-04-A,0,allOfColOrd_BRKY,27,3,37,27,0,0,1,0 prolific-671b8ae84509b9a624b95e13-613018599782583980fbf406,basic-04-B,0,allOfShaOrd_qcts,27,3,36,27,0,0,1,2 ...
The columns' values are as follows:
Aggregate information about the naive player's performance on each rule set.
sql> more scores.csv #rule,n,avgAttempts,score,scoreTimesObjectCnt,x allOfColOrd_BRKY,10,59.1000,2.2074,59.6000,0.30000000 allOfShaOrd_qcts,11,51.1818,1.9327,52.1818,0.50000000 ccw,10,47.8000,1.9333,52.2000,0.45000000 cm_RBKY,9,36.1111,1.4897,40.2222,0.77777778 col1OrdBuck_BRKY0213,12,117.6667,4.3580,117.6667,0.20833333 col1Ord_BRKY,10,54.4000,2.0985,56.6591,0.35000000 colOrdL1_BRKY,10,95.4000,3.6565,98.7252,0.15000000 cw,11,56.1818,2.2803,61.5675,0.50000000 ordL1,22,65.6364,2.6875,72.5625,0.56818182 ordL1_Nearby,10,84.5000,3.2741,88.4000,0.65000000 ordRevOfL1,10,70.8000,2.7000,72.9000,0.55000000 ordRevOfL1_Remotest,11,113.0000,4.4607,120.4393,0.40909091 quadMixed1,10,81.7000,3.1303,84.5172,0E-8 quadNearby,21,52.5238,2.0970,56.6190,0.54761905 sha1OrdBuck_qcts0213,12,97.1667,3.9731,107.2727,0.58333333 sha1Ord_qcts,11,54.0909,2.1246,57.3636,0.40909091 shaOrdL1_qcts,10,80.5000,3.1074,83.9000,0.40000000 sm_csqt,10,47.4000,1.8815,50.8000,0.65000000
The semantics of the columns is as follows:
The corrected score for a rule set is computed as
score = sum( attempts(P) + couldAlsoAttempt(P) ) / sum( removed(P) + couldAlsoRemove(P)),with the summation over all players P who played that rule set as the first rule set in their trial list.
The values being summed are defined as
In other words, for each "early winner" (somebody who played so well that we ended his game) we have notionally "extended" his transcript as if, instead of being stopped, he continued removing all game pieces in all episodes of the series without making any additional errors. (This may be a bit over-optimistic, of course, but not as over-optimistic as simply counting that player's moves without regard to how early he was stopped). For other players, we take their actual transcripts (which may include fewer than maxToRemove (=27 in FDCL/basic)) removed pieces, if they formally gave up or just walked away). Then we compute the weighted average of attemptedMoves/piecesRemoved over all n players.
The value of this "corrected score" would be 1.0 if nobody made any errors, and is proportionally higher as people make more errors. If every player had played all 3 episode to completion, thus removing exactly maxToRemove=27 game pieces, then scoreTimesObjectCnt=score*maxToRemove would be exactly equal to avgAttempts (which, I think, is pretty much the score used by Jacob).