Rapid Eval estimates both efficacy and engagement during early playtests. It is quicker and less expensive than traditional pilot testing, though less accurate.
Goal: reduce rejection sensitivity in boys 10-16. Specifically:
- Desensitize: feel rejection in game events, and get used to handling it well.
- Internalize a new mental model of rejection and recovery from game mechanics
- Consciously notice that strategies from our game can help in real life (Gamer Self)
- Seed parent/teen conversations about these themes.
- Real-world rejection via “lost connection” features
Timeline: a 6-month dev/test timeline whose key milestones were:
- build an early prototype
- hold a workshop
- iterate on prototypes with early evaluation
- build the final prototype and evaluate it in a pilot study
September workshop (& shift in direction)
As the Workshop Report in the Appendix describes, the two-day workshop/game jam in September changed the project design significantly. We targeted PC rather than mobile first; reduced scope from the five points above to points 1 and 4 only; shifted focus to father/son play as the best way to generate maximum effect sizes; and reset the prototype design toward more multiplayer interaction and less story-driven design.
At the end of the game jam, we had created "v3", a prototype of the battle mechanic that proved highly engaging with teens in the playtest, along with a four-step process that embodied rejection (not shown).
At the core of the new vision was starting dad/son conversations about rejection. For example, we hoped future dads would say this about the game: “It’s a game about mastering multiplayer politics. There’s RPG mechanics and my son is good at those, but he often gets booted “for no reason.” …but he missed something or did something that pissed off the other players – I can find strategies to keep them onside, and together we get through the level. But it’s tough too – one level he got rejected over and over, and he got pretty mad – and we ended up talking about this girl that never called him back. At this age, anything that opens him up like that, I love.”
Execution of “Rapid Eval”
From October 2015 to January 2016, core team members (Josh, Kristen, and Marek), with regular support and guidance from psychologists (Isabela and Anouk) and NMV staff, built and tested the vision and prototype from the workshop, using the "Rapid Eval" method described below.
This work culminated in “Scrollquest v11”, a prototype of a 2-4 player game for PCs.
Player experience tutorial:
https://www.youtube.com/watch?v=OzPJQ23SS6M
Pilot Study
The project concluded with a traditional pilot study of the final prototype. The pilot study consisted of N=10 evaluations of the final v11 prototype from December 26 to January 6 (see appendix for details).
Development Team
The project was led by Josh Whitkin and Isabela Granic. Dr. Whitkin is a designer/researcher with 20 years of commercial and academic experience building and studying video games for good. Dr. Granic is a psychologist with clinical and research background who has designed acclaimed video games for mental health. See appendix for more complete bios.
During Rapid Eval, the core development team was led by Josh, with regular guidance from Isabela and, later, psychology graduate student Anouk. Playtesting was done by RA Kristen Barta, with programmers Marek Vyzamel, Dayvid Jones, and Mathieu Allaert, and occasional advice from Valve advisors Robin Walker and programmer Brian Jacobsen. Robin Walker was a key influence on the design. A senior game designer at Valve with credits on Portal 2 and many other world-famous, million-selling video game titles, Robin is one of the world's premier game designers.
Core team role (FTE during Rapid Eval)
- lead design / production: Josh (75%)
- playtest recruiting, testing: Kristen (50%)
- Programmer: (50%-75%)
- Psychologist: Isabela (5%)
Support Team: The project also enjoyed support from a multidisciplinary team of consultants plus volunteer advisors from Valve and U of W.
Specific Goals
Within the broad project goal of "rapidly and affordably generate and evaluate a variety of mobile games that build children's social emotional skills and resilience", we pursued these specific goals:
- Build and Test the “Rapid Eval” development method
- Prototype a video game-based product that is:
- Engaging: Will it attract and retain users?
- Effective: Will it be effective in its prosocial aims[1]?
- Marketable: Would it be financially self-sustaining?
Method: “Rapid Eval”
Rapid Eval estimates both efficacy and engagement during early playtests. It trades accuracy for speed compared with traditional pilot testing, as shown in this chart:
| | Traditional Pilot | Rapid Eval |
| --- | --- | --- |
| Purpose | Reliable findings for all stakeholders | Rough estimate of efficacy, for developers only |
| Intervention design | Fixed during pilot | Constantly evolving |
| Precision of outcomes | Reliable enough for publication | "Order of magnitude" estimate |
| Iteration speed | 1 – 3 months | 1 – 2 weeks |
| Scope | Business & scientific | Scientific |
| Leadership/Team | Subject matter expert | Game designer/PM |
The project conducted both the Rapid Eval study and a traditional pilot study so that we could compare and confirm findings. The pilot study consisted of 10 playtests conducted in two countries using the final v11 prototype from December 26 to January 6 (see appendix for details), and it largely confirmed what the Rapid Eval method found.
During Rapid Eval, we iteratively built and tested 12 distinct prototypes (v4.0 through v11). We conducted 3.7 playtests per week for 11 weeks from October 2015 through January 2016, 47 playtests in total. During each week, we:
- reviewed past week’s playtest
- developed feature requirements
- built those features
- modified playtest protocol
- conducted new playtests
At times, our weekly cycle was stretched to two weeks to accommodate major feature changes.
This chart details the features added in each version, and the reason for the change (Purpose column):
| date | build version | key changes | purpose |
| --- | --- | --- | --- |
| 8/10/2015 | v1 | tech test: thought dialog test, dating | tech demo |
| 9/10/2015 | v2 | thought dialog test, bidding (never shipped properly) | tech demo |
| 9/11/2015 | v3 | version from game jam: battle (no effects expected) | engagement |
| 9/17/2015 | v3.1 | game jam, plus Josh's small fixes: chests healed (no effects expected) | engagement |
| 10/22/2015 | v4.0 | first release: networking, 4 scenes | tech test |
| 10/22/2015 | v4.1 | second release: minor fixes to networking | any effects via mechanics |
| 10/26/2015 | v4.5 | tuning: reduce starting gold to 10, added gold animation | any effects via mechanics |
| 10/29/2015 | v5.0 | added rejection mechanics: co-op healing, instructions, less easy | deepen effects via iteration |
| 11/5/2015 | v5.2 | add depth for iteration: harder monsters, tools for confederate play, more levels | deepen effects via iteration |
| 11/15/2015 | v5.5 | text chat, ghost wall | nuanced communication in response to rejection event; engage unskilled players earlier |
| 11/24/2015 | v6.2 | rewards for rejection puzzle solutions | new conversation topics |
| 12/2/2015 | v6.5 | whisper before battle, no reroll – two-confederate protocol | amplify rejection experience |
| 12/14/2015 | v10 | first playable version of modeless world | amplify rejection experience |
| 12/18/2015 | v10.5 | add selectable traits; remove random attributes | amplify rejection experience |
| 12/26/2015 | v11 | "Feature complete" | fix bugs |
In "Rapid Eval", we measured efficacy by conducting weekly 1-hour playtests and structured interviews. Two coders independently coded observations of teens' and parents' play behavior and verbal statements. We evolved our protocol along with the product, as detailed in the chart below.
Tutorial, explaining player experience of prototype:
https://www.youtube.com/watch?v=OzPJQ23SS6M
A sample evaluation is shown in a short video: https://www.youtube.com/watch?v=lGnsZcI1FbI. Note that this was recorded mid-project, with a scope of aims different from that of the final pilot test.
Findings
Rapid Eval method is Valuable
Compared to our pilot study, we found the "Rapid Eval" method to be beneficial as predicted, though unexpected limitations were also found. Key findings:
- We discovered efficacy problems earlier than the traditional pilot did
- e.g., at v4 we had built a playable prototype that revealed a key problem with efficacy: players were dismissing the rejection event as "just a game". From v4 to v11, we iterated on that key problem. Without Rapid Eval, we might not have detected the problem until the final pilot.
- Our efficacy measurement method was not accurate enough.
- We needed larger sample sizes. We had 3.7 playtests per week but needed 20 per week. This caused problems; e.g., we incorrectly thought we saw "Promising Signs" at v7, but later findings and the final pilot suggest those signs were a statistically insignificant observation.
- Short, single-session playtests were a limitation; multiple plays over several weeks may be needed
- Online playtesting is a good fit for Rapid Eval
- We got a more representative sample of the national population than local 'bubble' recruiting would have provided
- Online testing was a convenient format for team and participants alike
- We got benefit of ‘in home’ settings at low cost (normally very expensive to travel to homes)
- Low cost – no need for equipment, research space; cost of recruitment very low
- Recruiting was convenient (mTurk ads)
- We could change recruiting language in ads quickly
Prototype: Engaging and Marketable
We saw very strong signs of engagement from the early phases. We did not conduct a detailed marketability study, choosing instead to focus on addressing the lack of efficacy.
We conducted continuous surveys that support the points above. We asked marketing-related questions during every playtest, both formally (self-report ratings) and in discussion (see interviews).
From the workshop prototype onward, we consistently observed strongly positive responses to the fun and engagement of the game itself. In the pilot, after playing, both parent and son rated the experience on a 1-5 scale, with 1 being "not at all", 3 being "a little" and 5 being "a lot".
In the pilot, we asked “Does a game that prompts you and your son to tackle puzzles around social mechanics sound appealing?”. Answers mostly support the finding, as these samples show:
- Yes, esp. a dual-player game with challenging puzzles. "I'm really excited about the fact that kids like Liam, who have a whole lot of trouble, might have an easy way to figure things out," and a platform like gaming seems an ideal space.
- No: "if it happens to have a social message, that's fine, but I don't want to know about it"
- Yes: "an interesting premise". Choices might be a personality test on one hand and a fantasy escape/chance to play as someone else on the other.
- Yes, very much so. He found the personality and puzzle aspects of the game fun.
These findings were similar to validation we saw during Rapid Eval playtests.
For limitations and concerns in marketing, a secondary source of data (in addition to interviews) was the response to our mTurk recruiting. We saw 18% conversion among paid participants: of 40 people who fit our criteria (had a son 10-17) and agreed to fill out a survey for a payment under $1, 7 agreed to a 1-hour playtest for a payment of $15. Among the 33 mTurk users who declined, we found a variety of reasons (chart not shown).
We roughly estimate the total addressable market as 14 million, using these criteria:
| criteria | adjustment | change (m) | running total (m) | source/reasoning |
| --- | --- | --- | --- | --- |
| all English-speaking 1st world | | | 509.4 | Wikipedia |
| is age 11-16 | 8% | | 40 | 7.9% of US pop is teen 11-16 (census) |
| not living with 1+ bio parent | -10% | -4 | 36 | estimated from US census data |
| is below lower-middle class | -25% | -9 | 27 | estimated from US census data |
| is not a gamer | -16% | -4 | 23 | from ESA; fits Pew study |
| is not a Steam active user | -40% | -9 | 14 | estimated from ESA's stat: 62% of gamers play on PC and 150m active Steam users |
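As a rough cross-check of the arithmetic in the table above, here is a minimal Python sketch of the same funnel. The population figure and percentages come from the table; the keep/drop interpretation of each row is an assumption made for this sketch, not part of the report's analysis.

```python
# Rough reconstruction of the total-addressable-market funnel in the table above.
# Figures are the report's estimates; keep/drop handling is an assumption.
base = 509.4  # million English-speaking, first-world people (Wikipedia)

steps = [
    ("is age 11-16",                  0.08, "keep"),  # keep ~8% of the base
    ("not living with 1+ bio parent", 0.10, "drop"),  # drop 10%
    ("is below lower-middle class",   0.25, "drop"),  # drop 25%
    ("is not a gamer",                0.16, "drop"),  # drop 16%
    ("is not a Steam active user",    0.40, "drop"),  # drop 40%
]

remaining = base
for label, fraction, mode in steps:
    remaining *= fraction if mode == "keep" else (1 - fraction)
    print(f"{label:32s} -> {remaining:5.1f}M remain")
# Ends at roughly 14M, matching the report's estimate.
```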
Discussion
Project-Level Discussion
This project contributed significantly to NMV’s capacity to rapidly and affordably generate and evaluate a variety of mobile games that build children’s social emotional skills and resilience. Specifically, it demonstrates that NMV can design, build, test (and discard if ineffective) experimental “blue sky” game-based products.
Perhaps the most valuable outcome of this project, for future NMV projects, is the development of the Rapid Eval method. We found it successful overall, despite a clear need for improvement.
| | Rapid Eval | Success |
| --- | --- | --- |
| Purpose | Rough estimate of efficacy, for developers | partial success |
| Intervention design | Constantly changing | partial success |
| Precision of outcomes | "order of magnitude" estimate | fail |
| Iteration speed | 1 – 2 weeks | success |
| Scope | scientific | success |
| Leadership/Team | Game Designer/PM | partial success |
We feel we successfully demonstrated that commercial-style rapid prototype iteration can be combined with efficacy assessment from psychology. We believe a 2-week iteration cycle is achievable and appropriate for a variety of games for health, and that we could address the specific shortcomings mentioned below within that cycle.
Future Rapid Eval projects should address the specific shortcomings of this project:
- Precision of outcomes was insufficient to reliably guide development. Sample size and protocol need improvement; important design decisions were made on faulty data.
- The project needed more hours from the subject matter expert (a research psychologist, in this case) than it had. While many academics contributed their time generously, these donations were insufficient for the pace of a Rapid Eval project. Many of the Rapid Eval failures may have been caused by delays and the lack of time to investigate and experiment that a paid psychologist consultant with game development experience could have provided.
- We improved the measurement methods and protocol as we went, but were perhaps not ambitious enough in changing the intervention itself. We could have addressed the "key problem" noted in early findings by trying multi-session protocols and other intervention designs. We mainly made changes to the product, but kept the way the product was used with teens and parents unchanged (a single 30-minute playtest plus interview).
Discussion of Prototype Findings
The Scrollquest prototype aimed to be effective, engaging, and marketable. It achieved “two out of three.”
We found reason to believe the final game implied by the prototype could be highly engaging and marketable:
- parents want a prosocial game played cooperatively between parents and sons
- this prototype's core mechanic could form the basis of a commercially successful video game
These are our key concerns regarding possible barriers to commercial success inherent in this design:
- Discovery. As with any indie game, it is unclear if the cost of marketing required to raise awareness among targeted users could be recovered from retail sales. We recommend seeking partnership with an entity that has capacity to promote the game among its target users (e.g. Valve, via Steam promotion) to address this concern.
- Parent time: Most parents seemed to find the game fun enough to play for a while, but it is unclear how long they would enjoy playing. If they do not enjoy it for several hours, it is unclear whether that is enough time to produce the prosocial effects in a natural (home) environment.
- Parent uptake: Many parents are familiar with RPGs generally, but found our prototype's design difficult to learn in 15 minutes. Given the limited time and interest most parents have in playing video games, it is possible a commercial version could be created that is simple enough for parents to learn, but this prototype does not suggest a particular solution. Further prototyping work is needed.
[confidential information redacted]
Conclusion / Next Steps
- Use CfC’s existing networks and contacts among schools to build partnerships that enable recruiting NMV’s targeted audiences.
- Improve the Rapid Eval method and use it for evaluating future “blue sky” game-based product concepts
- Regarding the Scrollquest prototype, additional research is necessary to find an effective mechanic. This research must address the fundamental problem that virtually all boys dismissed the rejection as "just a game", i.e., the experience does not transfer to real life. We believe this problem probably requires a very different design, not further iteration, so we do not recommend continued development of the design shown in the Scrollquest v11 prototype; instead we suggest testing some of the basic design assumptions (parent/son, traditional RPG format).
End of Final Report.
Appendices
Appendix: Final Prototype (Scrollquest v11)
Video of game features and functionality is here: https://www.youtube.com/watch?v=OzPJQ23SS6M
v11 dad/son playtest videos are here:
[confidential information redacted]
Appendix: Recruiting Detail
We experimented with a variety of recruiting methods.
Recruiting Method
- Phase A – Contact owners of MeetUps; local parent-attended youth sports events
- Phase B – Post advertisements in gamer forums and a local community online forum (NextDoor)
- Phase C – Customer Discovery Ninja (CDN) (nationwide recruitment via phone interview)
- Phase D – Mechanical Turk; personal contacts via teacher connections
- Phase E – Improved mTurk recruiting; any kids, not just shy ones
- Phase F – Personal; tabling recruitment; 10-year-olds OK
- Phase G – Expanded criteria: include moms; personal contacts; Dutch participants
These experiments continued throughout Rapid Eval, and mapped to specific versions as this chart illustrates:
| date | build version | evaluation | audience | recruiting |
| --- | --- | --- | --- | --- |
| 8/10/2015 | v1 | exploratory | teen | (internal) |
| 9/10/2015 | v2 | exploratory | teen | (internal) |
| 9/11/2015 | v3 | exploratory | teen | (internal) |
| 9/17/2015 | v3.1 | exploratory | father/son | Phase A |
| 10/22/2015 | v4.0 | exploratory, slight focus | father/son | Phase A/B |
| 10/22/2015 | v4.1 | exploratory, focused | father/son | Phase C/D |
| 10/26/2015 | v4.5 | coding, with exploration | father/son | Phase C/D/E/F |
| 10/29/2015 | v5.0 | coding, with exploration | father/son | Phase C/D/E/F |
| 11/5/2015 | v5.2 | coding, with exploration | parent/son | Phase F |
| 11/15/2015 | v5.5 | coding, with exploration | parent/son | Phase G |
| 11/24/2015 | v6.2 | coding, with exploration | parent/son | Phase E/F/G |
| 12/2/2015 | v6.5 | coding, with exploration | parent/son | Phase E/F/G |
| 12/14/2015 | v10 | improved coding | parent/son | Phase E/F/G |
| 12/18/2015 | v10.5 | improved coding | parent/son | Phase E/F/G |
| 12/26/2015 | v11 | improved coding | parent/son | Phase E/F/G |
Appendix: Pilot Study
The pilot study protocol document (including confederate instructions, the RA step-by-step procedure, etc.):
[confidential information redacted]
The pilot data is collected in the Google Sheets document "CfC Scrollquest v11 Pilot Study Data"
[confidential information redacted]
This data includes:
- Five items administered prior to the playtest, used to label participants "Rejection Sensitive" (RS) or not.
- RA observations during playtests, used to assess in-game behaviors.
- Results from the interview immediately after playtests. Data from both father and son are captured.
Coding:
RAs were trained to code the following specific behaviors during each playtest:
To what extent did the son show the following in-game behaviors after the rejection:
- Withdrawal (e.g. no attacking (monsters or other players), no trading, moves little, no interaction with other teens, doesn’t use chat )
- Angry/aggressive behavior (e.g. attacks other teens, doesn’t trade with other players, doesn’t cooperate, steals money from other teens, uses negative chat options)
- Prosocial behavior (e.g. doesn’t attack other teens, trades with other teens, heals other teens, helps other teens kill monsters, uses positive chat options)
To what extent did the son and father…
- Ruminate. (e.g., talk about the same rejection episode in only one way, focus on negative emotions/appraisals of rejection, rehash same solutions).
- Actively problem-solve (e.g., look for or suggest alternative reasons for rejection, experiment/try out different responses to discover reasons for rejection, ask the other for help thinking about reasons and different approaches/moves in response)
- Accept (e.g., “let’s just move on”, “oh well, he’s just a crazy player”).
- Reappraise (this may fall into #2 a lot, but you can try as separate code; e.g., change initial negative appraisal of rejection to more positive or banal appraisal)
- Internal reasons for rejection (e.g., attribute rejection to own self; "I'm always the one left out", "I guess I just can't play this game well", "I'm too slow for these good players")
- External reasons for rejection (e.g., attribute rejection to circumstances outside the self; “That guy has no idea how to play coop games”, “Maybe he’s playing as his “selfish” personality”, “He’s just trying to get more money because that’s his strength”)
Coders rated each of the items above on a 1-5 Likert scale, with 1 being "never happened", 3 being "sometimes happened", and 5 being "happened a lot".
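To make the coding scheme concrete, here is a minimal Python sketch of one way the two independent coders' 1-5 ratings could be combined for a playtest. The category names, averaging rule, and disagreement threshold are illustrative assumptions, not the project's actual analysis code.

```python
# Illustrative sketch only: combine two coders' 1-5 Likert ratings per playtest.
# Category names are shorthand for the behaviors listed above.
IN_GAME_CODES = ["withdrawal", "angry_aggressive", "prosocial"]
CONVERSATION_CODES = ["ruminate", "problem_solve", "accept",
                      "reappraise", "internal_reasons", "external_reasons"]

def combine_coders(coder_a: dict, coder_b: dict) -> dict:
    """Average the two coders' ratings and flag disagreements larger than 1 point."""
    combined = {}
    for code, a in coder_a.items():
        b = coder_b[code]
        combined[code] = {"mean": (a + b) / 2, "disagreement": abs(a - b) > 1}
    return combined

# Hypothetical ratings for one son's in-game behavior after the rejection event.
print(combine_coders(
    {"withdrawal": 1, "angry_aggressive": 4, "prosocial": 2},
    {"withdrawal": 2, "angry_aggressive": 5, "prosocial": 2},
))
```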
Researchers recorded unstructured observations (Key Findings column) for each playtest, e.g.: "Jan did not like the gameplay. He compared it a lot to other games and mentioned a few times that there are many games that have better gameplay and look better. When he thought of it as a prototype he was more mild in his judgement. Kees mentioned the game is different from what he normally plays. He didn't like the fact that walking and attacking didn't go fluently. Both did like the multiplayer aspect of the game a lot. Kees doesn't seem to be our target teen (though a little shy, but this could be because of not knowing me). When Jan suspects the rejection was scripted I told him he was right."
Criteria for RS / non-RS children:
Before every pilot playtest, each parent scored their son on a 1-5 scale for five statements:
- My teen son feels sad.
- My teen son feels bad about himself.
- My teen son has trouble sleeping.
- My teen son has trouble making friends.
- My teen son is withdrawn.
Assuming that RS children would score high on the above statements, the criterion for classifying a child as RS was a score of three or higher on these items.
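A minimal sketch of the screening rule described above follows. The report does not state whether "three or higher" applies to each item or to an aggregate, so this sketch assumes the mean of the five parent ratings must reach 3; that rule is a guess for illustration only.

```python
# Minimal sketch of the RS screening rule described above (assumed rule:
# mean of the five 1-5 parent ratings >= 3).
SCREENING_ITEMS = [
    "My teen son feels sad.",
    "My teen son feels bad about himself.",
    "My teen son has trouble sleeping.",
    "My teen son has trouble making friends.",
    "My teen son is withdrawn.",
]

def is_rejection_sensitive(ratings):
    """ratings: one 1-5 parent rating per screening item, in the order above."""
    assert len(ratings) == len(SCREENING_ITEMS)
    return sum(ratings) / len(ratings) >= 3

print(is_rejection_sensitive([3, 4, 2, 3, 3]))  # True under the assumed rule
```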
Expectations of in-game behavior:
RS children will start play trying to avoid rejection. They will help the others in an effort to be liked and will use positive or neutral chat options. Non-RS children might be more competitive, but will also show prosocial behavior.
Following a clear rejection, RS children will either withdraw or show angry/aggressive behavior. Non-RS children will try to fix the relationship, or accept it and move on.
- Withdrawal: e.g. doesn’t speak anymore (no chats), moves little, makes no choices.
- Angry/aggressive: e.g. attacks the confederate, doesn’t cooperate, rejects the confederate in non-fight scenes (votes against confederate, doesn’t give confederate money)
In-game behavior coding:
Following the above expectations, we code for three different in-game behaviors shown by the son after the rejection:
- Withdrawal (e.g. no attacking (monsters or other players), no trading, moves little, no interaction with other teens, doesn’t use chat ),
- Angry/aggressive behavior (e.g. attacks other teens, doesn’t trade with other players, doesn’t cooperate, steals money from other teens, uses negative chat options) and
- Prosocial behavior (e.g. doesn’t attack other teens, trades with other teens, heals other teens, helps other teens kill monsters, uses positive chat options)
Conversation coding:
If there is any conversation regarding the rejection, the conversation is coded for both the child and parent for the following behaviors; for the child the question is to what extent he showed the behavior, and for the parent, to what extent he/she modelled the behavior.
- Ruminate (e.g., talk about the same rejection episode in only one way, focus on negative emotions/appraisals of rejection, rehash same solutions).
- Actively problem-solve (e.g., look for or suggest alternative reasons for rejection, experiment/try out different responses to discover reasons for rejection, ask the other for help thinking about reasons and different approaches/moves in response)
- Accept (e.g., “let’s just move on”, “oh well, he’s just a crazy player”).
- Reappraise (e.g., change initial negative appraisal of rejection to more positive or banal appraisal)
- Internal reasons for rejection (e.g., attribute rejection to own self; "I'm always the one left out", "I guess I just can't play this game well", "I'm too slow for these good players")
- External reasons for rejection (e.g., attribute rejection to circumstances outside the self; “That guy has no idea how to play coop games”, “Maybe he’s playing as his “selfish” personality”, “He’s just trying to get more money because that’s his strength”)
Pilot Findings:
Conversation:
In general there was little conversation between parent and son after the rejection events. Often the parent did not notice the attacks because he/she was paying little attention to other players (parents were usually still trying to master the controls). The voting results were noticed by parents, but the conversation did not go further than a mention of the rejection.
Most of the sons responded to the rejection by commenting on it (e.g. "He's attacking me"; "I think he just hit me"). Two sons seemed to ruminate a little. When asked afterward how they felt, one responded that he thought it was a pity, because it caused them to focus on each other instead of looking together for the scroll, but he didn't feel really bad about it. The other son admitted that he did not like being attacked, because he felt like he lost the game. The mother later confirmed that he always wants to win games. One of the sons commented on the attacks and showed acceptance in his words. These words did not seem directed at anyone in particular, so no further conversation happened between father and son.
Since most of the parents did not notice the rejection events, there was little modeling. One father showed some reappraisal ("They like your avatar the most" (laughing), thereby changing it to a banal appraisal). This was in response to some ruminating behavior by the son. One mother and son started a small conversation regarding the rejection, with the son trying to think of alternative reasons for the rejection (problem-solving) and the mother modeling acceptance.
In-game behavior:
There was no difference in in-game behavior between RS and non-RS kids. Most sons responded to the rejection by showing angry/aggressive behavior (e.g. attacking Player 1 or 2, voting for 1 or 2, using negative chat options). Some showed prosocial behavior (e.g. trading with Player 1 or 2, helping Player 1 or 2 kill monsters, voting for themselves instead of another player).
Interview question regarding rejection:
After playing, we asked both sons and parents to discuss the reasons the confederate players attacked them.
Some sons and parents stated they didn't understand why the son was attacked by the other player. Others tried to think of reasons for the rejection (e.g. "Because they were more experienced"; "Because I attacked/voted for them first" (revenge); "Because I was strong" (e.g. had a lot of gold); "Because he really doesn't like me"; "Because he felt threatened"; "I think they were just annoyed. They played the game before and I'm just a noob"; "He could've hit the wrong key or wanted to see how I reacted to it"; "Isn't that what teenage boys do when they play games?"). Most sons and parents said that this is part of gaming. Almost all sons had experienced being attacked in online games before and did not find it very upsetting. A few parents said they could imagine their son having a stronger reaction to the rejection when he is more invested in the game.
Marketing Related Pilot Findings
In addition to efficacy measurement, we collected data on marketing potential and engagement. After playing, we asked parents and sons the following questions, each on a 1-5 scale with 1 being "not at all", 3 being "a little" and 5 being "a lot":
- How much did you like the game? (= Enjoyment)
- How much did your character in the game resemble your own personality? (= Resemblance)
- How fun do you think this game would be to play with your son? (= Fun for Co-play)
- How appealing does a game sound that prompts you and your son to tackle puzzles around social mechanics? (= Interest in Game / Social Puzzle Game Appealing)
- Do you think you would choose to play this game with your son over other alternative games? (= Value over other games)
Note that we saw no significant differences between boys sensitive to rejection and their peers.
[1] The prosocial aims of our final prototype were to reduce rejection sensitivity in boys age 10-17 by activating boys' feelings of rejection via the play experience and creating boy-parent conversations about those feelings of rejection.