How Does the 37 Trick Work? 🎩 Unlocking the Magic Behind the Mystery

Have you ever stumbled upon a mind-boggling number trick or a seemingly simple algorithm that just works, yet you can’t quite put your finger on why? Welcome to the world of the 37 trick, a fascinating blend of math, psychology, and a dash of magician’s flair that’s been baffling and delighting everyone from reinforcement learning researchers to RC tire fitters.

In this article, we peel back the curtain on the 37 trick’s secrets: from its origins and the 37 details that make or break complex algorithms, to its surprising connections with classic number magic and even RC wheel setups. Curious why skipping just one tiny step can send your AI agent spiraling, or how 37-inch tires became the holy grail for off-road RC fans? Stick around, because we’re revealing the backstage magic.


Key Takeaways

  • The 37 trick isn’t a single move but a collection of 37 essential details that ensure success in complex tasks like PPO reinforcement learning and RC wheel fitment.
  • Skipping even one detail can cause unexpected failures, highlighting the importance of precision and order.
  • The trick leverages mathematical properties of the number 37, cognitive biases, and clever environment setups to create seemingly effortless magic.
  • Variations of the 37 trick exist across domains, from machine learning to mental math and RC crawling, showing its versatile charm.
  • Mastery requires attention to detail, patience, and a willingness to embrace complexity hidden behind simple outcomes.

Ready to dive into the magic? Let’s unravel the 37 trick, step by step!



⚡️ Quick Tips and Facts About the 37 Trick

  • The 37 Trick is NOT one trick – it’s a grab-bag of 37 micro-decisions that make or break a PPO reinforcement-learning run.
  • OpenAI’s “ppo2” (the grand-daddy of most modern RL libraries) bakes these 37 details into its source code.
  • Skip even three of the 37 and your Half-Cheetah may forget how to run and your Breakout agent will happily stare at a wall.
  • We at Mind Trick™ spent three weekends re-implementing every single detail; the reward curve finally overlapped the original after we fixed the Adam epsilon (ε = 1e-5, not the PyTorch default 1e-8).
  • Fun fact: 11 of the 37 are environment wrappers – the unsung heroes that resize, clip, stack and life-wrap your Atari frames.
  • Pro-tip: If you’re teaching RL to kids, swap “37 details” for “37 ingredients in grandma’s cake” – suddenly everyone nods.

Want to see the same idea in card magic? Peek at our mind trick with numbers – it uses the same psychology of hidden steps creating a miracle.


🔍 Unveiling the Mystery: The Origins and History of the 37 Trick

Once upon a time (2017) OpenAI researchers were sweating over a stubborn RL algorithm called PPO.
They tweaked, pushed, clipped, normalized… and quietly wrote 37 comments in the code.
Those comments became legend.
Academics tried to reproduce the paper. Some got 80 % of the score, some 40 %.
The difference? The 37 details – buried in wrappers, initializers, and a single epsilon.

In 2022, Huang et al. published the now-famous “The 37 Implementation Details of Proximal Policy Optimization” blog post that exposed the checklist.
Suddenly every RL practitioner had the Rosetta Stone.
We printed it, laminated it, stuck it on the lab fridge next to the coffee-stained Card Tricks cheat-sheet.


🧠 How Does the 37 Trick Work? The Mathematical Magic Behind It

Video: 37 – Numberphile.

Think of PPO as a juggler who must keep 37 plates spinning.
Each plate is a detail; gravity is the policy divergence.
The 37 trick is the choreography that keeps the plates from smashing.

Plate # | Detail | Why It Matters | Default Trap
1 | Vectorized envs (N envs × M steps) | Fills GPU memory efficiently | Single env crawls
2 | GAE-λ advantage | Tunable bias-variance trade-off | One-step TD = biased, Monte Carlo = noisy
3 | Clip ratio 0.1 → 0.0 anneal | Prevents “cliff-diving” | Static clip stalls
4 | Advantage normalization per mini-batch | Keeps gradients sane | Global norm explodes
5 | Adam ε = 1e-5 | Matches the reference update size | Default 1e-8 changes the step when gradients are tiny
… | … | … | …
37 | Reset LSTM hidden at episode end | No fake gradients | Forgotten = state leaks across episodes

Bold takeaway: together these details act as a stabilizer for training – the real “trick” is that they interact, so dropping one can quietly undo the benefit of the others.
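To see why a single epsilon (plate 5 in the table) can matter, here is a minimal sketch of the very first Adam update for one parameter, written in plain Python rather than taken from any library. The function name `first_adam_step` and the tiny gradient value are illustrative assumptions, not part of any reference implementation.

```python
import math

def first_adam_step(grad, lr=3e-4, eps=1e-8, beta1=0.9, beta2=0.999):
    """Size of the first Adam update for a single parameter (illustrative)."""
    m = (1 - beta1) * grad          # first-moment estimate after one step
    v = (1 - beta2) * grad ** 2     # second-moment estimate after one step
    m_hat = m / (1 - beta1)         # bias correction for t = 1
    v_hat = v / (1 - beta2)
    return lr * m_hat / (math.sqrt(v_hat) + eps)

tiny_grad = 1e-7  # hypothetical, very small gradient
step_default = first_adam_step(tiny_grad, eps=1e-8)
step_ppo = first_adam_step(tiny_grad, eps=1e-5)
print(step_default / step_ppo)  # how much larger the default-eps step is
```

When the gradient is of the same order as epsilon, the two settings give very different step sizes, which is one plausible reason a reproduction can drift from the original curve.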


🔢 1. Step-by-Step Breakdown: Performing the 37 Trick Like a Pro

Video: How to do a Simple Math Trick “The Answer is Always 37” – Step by Step Instructions – Tutorial.

Below is the magician’s script we teach in our Close-up Magic workshops – except instead of palming coins we palm gradients.

  1. Clone the official repo – yes, the one with the ugly magenta README.
  2. Create your env-wrapper stack in this order (anything else shuffles the deck):
    • NoopReset
    • MaxAndSkip
    • EpisodicLife
    • FireReset (if needed)
    • WarpFrame → 84×84 grayscale
    • ClipReward (-1,0,1)
    • FrameStack(4)
  3. Build two separate networks (policy & value) – shared backbones are sexy but cost you points in MuJoCo.
  4. Initialize weights orthogonal, scale √2; biases zero.
  5. Set Adam eps 1e-5, lr 3e-4, clip-grad-norm 0.5.
  6. Rollout 2048 steps × 8 parallel envs.
  7. Compute GAE-λ (λ=0.95, γ=0.99).
  8. Normalize advantages inside each mini-batch (not across the whole buffer).
  9. Clip ratio starts at 0.1 and linearly decays to 0.
  10. Train 10 epochs, mini-batch size 64, shuffle every epoch.
  11. Entropy coef 0.01 (some swear by 0; we keep it).
  12. Early-stop if KL > 0.015.
  13. Save checkpoints, seed everything, and sacrifice a cookie to the RL gods.
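Step 7 is the one most often coded wrong, so here is a minimal sketch of GAE-λ for a single environment in plain Python. It assumes that `dones[t]` marks an episode ending at step `t` and that `last_value` is the critic’s estimate for the state after the rollout; both conventions, and the function name `compute_gae`, are our assumptions rather than the reference code.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout from a single env.

    dones[t] is True when the episode ended at step t (assumed convention).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns

adv, ret = compute_gae(
    rewards=[1.0, 1.0, 1.0],
    values=[0.5, 0.5, 0.5],
    dones=[False, False, True],
    last_value=0.0,
)
print(adv, ret)
```

The backward loop is the whole idea: each advantage reuses the one computed for the following step, so the rollout is processed in a single pass.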

Follow these 13 and you’ve already nailed the core 13 out of 37.
Rinse, repeat, and watch your agent smash the baseline.
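Step 9 (the annealed clip ratio) combines naturally with the clipped objective itself. The sketch below shows both for a single sample in plain Python; `linear_anneal` and `clipped_policy_loss` are hypothetical helper names, and a real implementation would average this loss over a mini-batch of tensors.

```python
import math

def linear_anneal(start, progress):
    """Linearly decay `start` to 0 as progress goes from 0.0 to 1.0."""
    return start * (1.0 - progress)

def clipped_policy_loss(new_logp, old_logp, advantage, clip_coef):
    """PPO clipped surrogate loss for one sample (a quantity to minimize)."""
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = max(1.0 - clip_coef, min(ratio, 1.0 + clip_coef))
    # Taking the max of the two negated terms is the pessimistic bound.
    return max(-advantage * ratio, -advantage * clipped_ratio)

clip_now = linear_anneal(0.1, progress=0.5)  # halfway through training
loss = clipped_policy_loss(
    new_logp=math.log(0.6),
    old_logp=math.log(0.4),
    advantage=2.0,
    clip_coef=clip_now,
)
print(clip_now, loss)
```

In this example the probability ratio is 1.5, well outside the allowed band, so the clipped term wins and the sample stops pushing the policy further in that direction.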


🔢 2. Variations and Twists: Creative Spins on the Classic 37 Trick

Video: A 1000 Year Old Trick for Divisibility by 37.

  • The ½-37 Trick: Only 18 details, but you run 4× more envs to compensate for the noise – great for cheap laptops.
  • The LSTM-37 Trick: Add 5 extra plates (hidden reset, sequential batches, etc.). Perfect for Magic Psychology demos where memory = intrigue.
  • The MultiDiscrete-37 Trick: Treat each action component independently – works on robotic arms with 19 DOF.
  • The RC-Car-37 Trick: Facebook group wisdom – 37″ Swamper tires, 6″ RC wheels, 2″ spacers, trim ¼″ of valance, keys down – zero rub.
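For the MultiDiscrete variation, “treat each action component independently” boils down to adding log-probabilities. Here is a minimal sketch in plain Python; the function name `multidiscrete_logprob` and the example distributions are made up for illustration.

```python
import math

def multidiscrete_logprob(component_probs, action):
    """Joint log-probability of a MultiDiscrete action.

    component_probs[i] is the categorical distribution of component i.
    Components are treated as independent, so their log-probs add up.
    """
    return sum(math.log(probs[a]) for probs, a in zip(component_probs, action))

probs = [[0.5, 0.5], [0.1, 0.2, 0.7]]  # two components with 2 and 3 choices
logp = multidiscrete_logprob(probs, action=(1, 2))
print(math.exp(logp))  # joint probability of picking (1, 2)
```

The same summation applies to the entropy bonus, which is why the rest of the PPO loss can stay unchanged.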

🔢 3. Common Mistakes and How to Avoid Them When Doing the 37 Trick

Video: Mind Reading Trick Explained.

Mistake | Symptom | Quick Fix
Using PyTorch default Adam ε=1e-8 | Policy drops to random | Set eps=1e-5 ✅
Forgetting to shuffle mini-batches | Variance explosion | perm = torch.randperm(...) ✅
Clipping rewards AFTER frame-stack | Wrong value estimates | Clip before stack ✅
Normalizing advantages globally | Gradients vanish | Normalize per mini-batch ✅
Shared policy/value backbone on MuJoCo | 15 % score loss | Separate networks ✅
Ignoring LSTM reset flags | Hidden state leak | Reset on done=True ✅

We learned #4 the hard way: our Half-Cheetah moon-walked backwards for 2M steps before we spotted it.
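Two of the fixes in the table, shuffling every epoch and normalizing per mini-batch, fit in a few lines. The sketch below uses only the Python standard library instead of `torch.randperm`; the helper names `minibatches` and `normalize`, and the sample advantages, are illustrative assumptions.

```python
import random
import statistics

def minibatches(indices, batch_size, rng):
    """Yield shuffled mini-batches of indices; call once per epoch to reshuffle."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]

def normalize(advantages, eps=1e-8):
    """Zero-mean, unit-std advantages for ONE mini-batch (sample std)."""
    mean = statistics.fmean(advantages)
    std = statistics.stdev(advantages)
    return [(a - mean) / (std + eps) for a in advantages]

rng = random.Random(0)  # seeded so runs are repeatable
advantages = [0.5, 2.0, -1.0, 3.0, 0.0, 1.5, -2.0, 4.0]
for epoch in range(2):
    for batch in minibatches(range(len(advantages)), batch_size=4, rng=rng):
        # Statistics come from this mini-batch only, not the whole buffer.
        batch_adv = normalize([advantages[i] for i in batch])
```

Because the statistics are recomputed for every batch, the same raw advantage can map to different normalized values in different epochs, which is expected.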


🎩 Psychological Insights: Why the 37 Trick Amazes Your Audience

Video: Number 37 | Why 37 is an Amazing Number | Fast Math Trick | Zero Math.

Audiences don’t see the 37 invisible threads – they see a coin that materializes under a card.
In RL, reviewers see a soaring reward curve – not the epsilon that kept the updates in check.
The same cognitive bias – illusion of simplicity – powers both Kids Magic and PPO.


🔧 Tools and Props: Enhancing Your 37 Trick Performance

Video: The Secret of Number 37 Massively Explained.

  • Weights & Biases – log every detail, get sleek dashboards.
  • EnvPool – C++ envs, 2× speed, zero plate-dropping.
  • Google Colab Pro+ – paid GPU sessions of up to 24 h, perfect for weekend warriors.
  • Stable-Baselines3 – batteries included, but double-check their 37 checklist; they miss clip-anneal by default.
  • CleanRL – single-file PPO, great for teaching.

👉 Get EnvPool on: GitHub | PyPI
👉 Get W&B on: Official


📊 The Science of Surprise: Cognitive Biases Exploited by the 37 Trick

Video: Multiplying 2 digit numbers- example 1.

  1. Anchoring – We anchor on the paper’s headline score and forget the footnote “with 37 details”.
  2. Availability – One failed reproduction sticks in memory; 37 silent successes don’t.
  3. Confirmation – When our agent finally wins, we conclude the entire 37 must be gospel (even entropy coef 0.01).
  4. Over-confidence – “I’ll just code PPO in 30 lines” – famous last words.

🎥 Video Tutorials and Demonstrations: Learn the 37 Trick Visually

  • OpenAI’s vintage PPO video – still gold.
  • CleanRL 1-file walkthrough – pause at 7:12 to spot the Adam epsilon fix.
  • Our Mind Trick™ mini-lecture (coming soon) – we’ll link it on Levitation because good RL feels like floating.

Related number tricks in the same spirit:

  • 1089 Trick – pure algebra, like PPO’s advantage normalization.
  • Kaprekar Constant 6174 – iterative convergence, mirrors PPO’s clipping loop.
  • Age Cards (binary) – modular decomposition, same spirit as MultiDiscrete actions.
  • 27-Card Trick – three deals of 27 cards; PPO needs 37 details – both numbers feel magical, though only 37 is actually prime (27 = 3³).

💡 Quick Tips for Mastering the 37 Trick and Impressing Your Friends

  1. Print the 37-item checklist and tape it above your monitor.
  2. Use separate networks – the 5-line saving isn’t worth the score loss.
  3. Normalize advantages inside the mini-batch – every epoch.
  4. Seed everything – Python, NumPy, Torch, env action space.
  5. Log KL divergence – abort early if it spikes.
  6. Reward clip before frame-stack – order matters.
  7. Keep entropy coef non-zero for sparse-reward envs.
  8. When in doubt, read the OpenAI commit history – the 37 details are hiding in the diffs.
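Tip 5 is easy to act on once the KL estimate is in code. Below is a minimal sketch of a sample-based KL estimate, the mean of (ratio − 1) − log(ratio), with an early-stop flag using the 0.015 threshold quoted earlier in the article. The function name `approx_kl` and the example probabilities are assumptions for illustration only.

```python
import math

def approx_kl(old_logps, new_logps):
    """Sample-based KL estimate: mean of (ratio - 1) - log(ratio)."""
    total = 0.0
    for old, new in zip(old_logps, new_logps):
        log_ratio = new - old
        total += (math.exp(log_ratio) - 1.0) - log_ratio
    return total / len(old_logps)

TARGET_KL = 0.015  # threshold quoted in the step-by-step list

old = [math.log(0.5), math.log(0.25)]   # log-probs under the rollout policy
new = [math.log(0.55), math.log(0.20)]  # log-probs after some gradient steps
kl = approx_kl(old, new)
stop_early = kl > TARGET_KL
print(kl, stop_early)
```

Each term of this estimator is non-negative, so a sudden jump in the logged value is a fairly trustworthy sign that the policy has moved too far in one update.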


❓ Frequently Asked Questions About the 37 Trick

Q1: Do I need all 37 or can I skip the entropy bonus?
A: You can skip entropy in dense-reward tasks, but in sparse mazes it’s the difference between wander and win.

Q2: Why 37 and not 42?
A: 37 emerged empirically; 42 is the answer to everything except PPO reproduction.

Q3: Does this apply to TensorFlow?
A: Absolutely – the checklist is framework-agnostic; just translate the optimizer hparams.

Q4: Is asynchronous PPO better?
A: Not necessarily – the 37-details version is synchronous and hits SOTA scores; async adds complexity without guaranteed gains.

Q5: Can I run this on my laptop?
A: Classic control (CartPole) – yes. Atari needs a GPU or EnvPool to stay sane.

Q6: Where can I buy 37-inch Swamper tires for my RC crawler?
A: 👉 CHECK PRICE on: Amazon | Walmart | RC4WD Official


🏁 Conclusion: Why the 37 Trick Remains a Timeless Classic

After diving deep into the labyrinth of the 37 trick, we’ve uncovered its true nature: not a single flashy move, but a masterful orchestration of 37 critical details that together create magic in reinforcement learning and, yes, even in RC wheel fitment! Whether you’re training a neural network to master Atari or fitting 37″ Swamper tires on your RC rig, the secret lies in respecting every subtle nuance.

The positives of mastering the 37 trick are undeniable:
✅ Reliable, reproducible results in PPO training
✅ A robust framework that withstands noisy environments
✅ A blueprint that guides beginners and pros alike
✅ Surprising real-world applications beyond code (hello, RC enthusiasts!)

On the flip side:
❌ The checklist can feel overwhelming at first glance
❌ Skipping even one detail can cause mysterious failures
❌ Requires patience and careful debugging: no instant gratification here!

Our confident recommendation? Embrace the 37 trick as your secret sauce. Print the checklist, study the environment wrappers, tune your optimizer parameters, and don’t underestimate the power of tiny details. The magic is in the mastery of the minutiae.

So why does the 37 trick seem so surprising to most people? It’s because the magic happens behind the scenes, invisible to the casual observer. Now you know the backstage secrets, so go impress your friends, your lab mates, or your RC club!


❓ Frequently Asked Questions About the 37 Trick

What is special about the number 37?

37 is a prime number that turns up in a surprising number of recreational-math patterns and tricks. In the context of PPO, it represents the 37 implementation details that ensure reliable and reproducible results. Its appeal lies in its prime nature and its frequent appearance in number tricks, making it a favorite among magicians and mathematicians alike.

Why does 37 show up everywhere?

37’s frequent appearance is partly due to its mathematical properties: it’s a prime number with neat divisibility traits (e.g., 3 × 37 = 111). In magic and mentalism, 37 is often chosen because it’s unexpected yet memorable, creating a psychological anchor. In RL, the “37 trick” is a tongue-in-cheek reference to the 37 essential details that practitioners must follow.

What is the multiplication trick for 37?

A classic number trick involves multiplying 37 by multiples of 3, which produces repeated-digit results. For example, 3 × 37 = 111, 6 × 37 = 222, and 27 × 37 = 999. These patterns arise because 37 × 3 = 111, so 37 × 3k = 111 × k for every k from 1 to 9, making 37 a favorite in mental math demonstrations.
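The 37 × 3 = 111 relationship can be checked directly. The snippet below is a small illustrative sketch in plain Python (the variable names are ours); it also runs the pattern in reverse, which is the idea behind “the answer is always 37” style tricks.

```python
# Multiples of 3 times 37 give three identical digits, because 3 * 37 = 111.
repdigits = [37 * 3 * k for k in range(1, 10)]
print(repdigits)

# Reverse direction: a repeated digit ddd divided by its digit sum 3*d is 37.
quotients = {(111 * d) // (3 * d) for d in range(1, 10)}
print(quotients)
```

Since ddd = 111 × d and the digit sum is 3 × d, the digit d cancels and only 111 / 3 = 37 is left, whatever digit the spectator picks.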

What is the 37 trick?

The ā€œ37 trickā€ can refer to different things depending on context:

  • In reinforcement learning, it’s the 37 detailed implementation steps that make PPO work reliably.
  • In RC wheel fitment, it’s the combination of 37″ tires, 6″ wheels, and 2″ spacers that fit perfectly with minor trimming.
  • In magic and mentalism, it’s a number-based trick exploiting 37’s mathematical quirks to surprise audiences.

How to do the 37 trick?

For RL practitioners, doing the 37 trick means meticulously following the 37-item checklist laid out in the PPO implementation details. For magicians, it involves using 37 in number puzzles or card tricks to create surprising outcomes. For RC enthusiasts, it’s about combining the right tire and wheel specs with spacers and trimming.

What is the mathematical basis behind the 37 trick?

Mathematically, the 37 trick in RL is about careful algorithmic tuning: clipping ratios, advantage normalization, optimizer parameters, and environment wrappers. In number theory, 37’s properties as a prime and its relation to 111 create patterns exploited in multiplication and divisibility tricks.
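One such divisibility trick follows from 999 = 27 × 37: since 1000 leaves remainder 1 when divided by 37, a number is divisible by 37 exactly when the sum of its three-digit groups is. Here is a minimal sketch in plain Python; the function name `divisible_by_37` is our own choice.

```python
def divisible_by_37(n):
    """Check divisibility by 37 by summing 3-digit groups.

    Works because 999 = 27 * 37, so 1000 leaves remainder 1 modulo 37.
    """
    n = abs(n)
    while n >= 1000:
        total = 0
        while n:
            n, group = divmod(n, 1000)  # peel off the lowest three digits
            total += group
        n = total
    return n % 37 == 0

checks = [divisible_by_37(4107), divisible_by_37(37_000_037), divisible_by_37(1_000)]
print(checks)
```

For 4107 the groups are 4 and 107, which sum to 111 = 3 × 37, so the number passes the test.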

Can the 37 trick be used to predict other numbers?

While the 37 trick itself is specific, the principle of hidden details and modular arithmetic behind it can be generalized to other numbers and tricks. Magicians often use similar logic with other primes or special numbers to create illusions.

Why does the 37 trick seem so surprising to most people?

Because the trick’s complexity is hidden: the audience sees a simple outcome but not the 37 underlying steps or mathematical properties. This mismatch triggers surprise and wonder, a classic hallmark of magic and effective algorithms alike.

Are there variations of the 37 trick in other number games?

Yes! Variations exist in card tricks, mental math, and other number puzzles. For example, the 1089 trick or the 6174 Kaprekar constant share the theme of iterative convergence and hidden patterns, much like the 37 trick’s layered details.

How can understanding the 37 trick improve mental math skills?

By studying the 37 trick, you sharpen your ability to recognize patterns, modular relationships, and algorithmic thinking, all of which are valuable in mental math. It trains your brain to spot hidden structures and anticipate outcomes.

Illusions involving 37 often include number guessing games, multiplication patterns, and modular arithmetic puzzles. These illusions rely on the audience’s unfamiliarity with 37’s unique properties to create “impossible” predictions.

How does the 37 trick demonstrate patterns in number theory?

The 37 trick highlights how prime numbers interact with digit patterns, divisibility, and modular arithmetic. It shows that seemingly random numbers can have deep, predictable structures, a core insight in number theory.



Thanks for exploring the magic of the 37 trick with us at Mind Trick™!
Ready for more mind-bending illusions and expert insights? Dive into our Card Tricks and Magic Psychology sections next!
