I've designed an ESP32-based device on a custom circuit board (using bare ESP32-D0WDQ6 chips, rev 1), and unfortunately I'm facing some instability issues that are very hard to reproduce.
The issue is that, while the device is in use, at some point it seems to reset into the bootloader or something similar (GPIO5 ends up weakly pulled up, judging by the LED I have on that pin, which glows dimly when the issue happens). I've set the strapping pins to the right values for my application: GPIO12/MTDI is pulled down through 360k (my flash is a 3.3V part), GPIO15 is not tied to anything (relying on the internal weak pull-up), GPIO2 is likewise left unconnected, and GPIO0 is pulled up through 10k to 3.3V. The CHIP_PU pin is pulled up (10k to 3.3V) and has a 100nF cap to ground.
The ESP is powered at 3.3V and uses a Winbond W25Q64JVSSIQ 3.3V flash.
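To make the strapping reasoning above explicit, here is a minimal sketch of how I understand the ESP32 datasheet's strapping table: GPIO0 and GPIO2 select the boot mode, and MTDI selects the VDD_SDIO voltage. The function and type names are hypothetical, it is plain C that runs on a host rather than on the chip, and it ignores any eFuse override of the flash voltage.

```c
#include <stdbool.h>

/* Hypothetical names, used only for illustration. */
typedef enum { BOOT_SPI_FLASH, BOOT_DOWNLOAD, BOOT_UNDOCUMENTED } boot_mode_t;

/* Boot mode as latched from the strapping pins at reset.
 * GPIO0 high selects normal SPI flash boot regardless of GPIO2;
 * GPIO0 low with GPIO2 low selects download boot.
 * GPIO0 low with GPIO2 high is not a combination listed in the datasheet. */
boot_mode_t strap_boot_mode(bool gpio0, bool gpio2)
{
    if (gpio0) {
        return BOOT_SPI_FLASH;
    }
    return gpio2 ? BOOT_UNDOCUMENTED : BOOT_DOWNLOAD;
}

/* VDD_SDIO voltage selected by MTDI (GPIO12) at reset, in millivolts:
 * low selects 3.3V, high selects 1.8V. */
int strap_flash_millivolts(bool mtdi)
{
    return mtdi ? 1800 : 3300;
}
```

With my straps (GPIO0 high, MTDI low) this gives SPI flash boot at 3.3V. If a glitch pulled GPIO0 low while the chip was being reset, the same table would give download boot, which might look like what I'm seeing.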
The application is automotive (the device transmits moto-racing telemetry, so it experiences some accelerations and has interfaces for external power and sensors). I know this can be a demanding application, but I've ruled out most of the things that can be the culprit:
- External power is not an issue, since the input has the usual automotive load-dump protection, and there are devices running on battery only that are still flaky.
- Temperature is also not an issue; the devices regularly crash as described at 20°C.
- For the external sensors, they are all interfaced by other MCUs so interference from them is almost ruled out. The ESP32 only communicates through I²C, SPI and UART to the other MCUs and internal sensors.
- Vibration is likely not an issue, as we've placed devices on a vibration platform, and have thrown them to the ground, to no ill effect.
- The issue happens to ~15 out of the 25 "beta" devices, so it's not just a single flaky board.
- Power comes from a Li-Ion 18650 battery and a 3.3V LDO, with the recommended capacitance on the LDO, 10µF + a bunch of 100nF caps right next to the ESP on the respective power pins, as recommended in the "hardware design guidelines".
- It's not a software crash (I've verified this explicitly)
- The devices don't crash in a "lab" setting (e.g. in the office, no matter how long I leave them powered on), only "in the field" (after a few minutes of working fine, but this varies and is quite random).
What I suspect:
- PCB composition and layout. For cost reasons, I've designed a 2-layer PCB (I was encouraged by the fact that there are development boards, like the ESP32 Thing, that are 2-layer as well). Some inadequate routing could be letting interference in;
- Some problem related to the varying WiFi power requirements as the vehicles are moving away or closer to the WiFi APs;
- Floating input pins. Some of the input-only pins are floating (some don't have any traces, some have traces spanning the entire board though);
- Floating VDD_SDIO pin (my flash is a 3V3 part, and is powered through the 3V3 rail directly). I've left the VDD_SDIO pin unconnected to anything;
- Something else completely?
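On the "not a software crash" point, a minimal sketch of how the reset cause could be logged at boot to separate firmware resets (panic, watchdog) from electrical ones (brownout, power-on). It assumes ESP-IDF's `esp_reset_reason()` from `esp_system.h`; the helper names are hypothetical, and the enum in the `#else` branch is only a host-side stand-in so the classifier can be compiled off-target. As far as I understand the ESP-IDF docs, `ESP_RST_EXT` is not reported on the ESP32, so a glitch on CHIP_PU would probably show up as a power-on reset.

```c
#include <stdbool.h>
#include <stdio.h>

#ifdef ESP_PLATFORM
#include "esp_system.h"  /* real esp_reset_reason_t and esp_reset_reason() */
#else
/* Host-side stand-in (assumption): same names as ESP-IDF's enum,
 * present only so this file compiles without the SDK. */
typedef enum {
    ESP_RST_UNKNOWN, ESP_RST_POWERON, ESP_RST_EXT, ESP_RST_SW,
    ESP_RST_PANIC, ESP_RST_INT_WDT, ESP_RST_TASK_WDT, ESP_RST_WDT,
    ESP_RST_DEEPSLEEP, ESP_RST_BROWNOUT, ESP_RST_SDIO
} esp_reset_reason_t;
#endif

/* Hypothetical helper: short label for the log line. */
const char *reset_reason_name(esp_reset_reason_t r)
{
    switch (r) {
    case ESP_RST_POWERON:   return "power-on";
    case ESP_RST_EXT:       return "external pin";
    case ESP_RST_SW:        return "software restart";
    case ESP_RST_PANIC:     return "panic";
    case ESP_RST_INT_WDT:   return "interrupt watchdog";
    case ESP_RST_TASK_WDT:  return "task watchdog";
    case ESP_RST_WDT:       return "other watchdog";
    case ESP_RST_DEEPSLEEP: return "deep sleep wake";
    case ESP_RST_BROWNOUT:  return "brownout";
    case ESP_RST_SDIO:      return "SDIO";
    default:                return "unknown";
    }
}

/* Hypothetical helper: true if the reset points at the supply or the
 * reset line rather than at the firmware. */
bool reset_looks_electrical(esp_reset_reason_t r)
{
    return r == ESP_RST_POWERON || r == ESP_RST_EXT || r == ESP_RST_BROWNOUT;
}

#ifdef ESP_PLATFORM
/* Call early in app_main() so every boot records why it happened. */
void log_reset_reason(void)
{
    esp_reset_reason_t r = esp_reset_reason();
    printf("reset reason: %s (%s)\n", reset_reason_name(r),
           reset_looks_electrical(r) ? "electrical" : "firmware or other");
}
#endif
```

This only helps on boots where the application actually starts again; if the chip stays stuck in a bootloader-like state, nothing gets logged until the next successful boot.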
The whole thing is developed under NDA, so I can't freely share the design for verification.
Best regards!