ESP32S3 Memory problem [solved]

lesept
Posts: 38
Joined: Wed Jun 27, 2018 10:30 pm

ESP32S3 Memory problem [solved]

Postby lesept » Tue Jul 30, 2024 6:32 pm

Hi
I'm running an AI code on an ESP32S3 board equipped with 8MB PSRAM and 16MB Flash. The code is doing the inference, and all the parameters and the input data are in separate .h files.
The AI network is made of several layers, and the code works correctly and gives the correct results up to the last layer. I mean that when I comment the call of the function for the last layer, everything is fine. But when I uncomment it, there is nothing printed on the serial monitor, although I print many memory and text information.
This is why I think it is a memory problem.

Here is the function runningthe inference, called by the setup of the Arduino IDE code:

Code: Select all

#include <string.h>
#include <stdlib.h>


#include <stdint.h>

#include "../include/network/rescaling.hpp"

// Layer & memory configurations
#include "../include/kernels/convolution.hpp"
#include "../layers/feature_feature_0_Conv_output_0.h"
#include "../include/kernels/activation.hpp"
#include "../layers/feature_feature_1_Relu.h"
#include "../include/kernels/pooling.hpp"
#include "../layers/feature_feature_2_MaxPool_output_0.h"
#include "../include/kernels/convolution.hpp"
#include "../layers/feature_feature_3_Conv_output_0.h"
#include "../include/kernels/activation.hpp"
#include "../layers/feature_feature_4_Relu.h"
#include "../include/kernels/pooling.hpp"
#include "../layers/feature_feature_5_MaxPool_output_0.h"
#include "../include/kernels/fullyconnected.hpp"
#include "../layers/classifier_classifier_1_Gemm_output_0.h"
#include "../include/kernels/activation.hpp"
#include "../layers/classifier_classifier_2_Relu.h"
#include "../include/kernels/fullyconnected.hpp"
#include "../layers/classifier_classifier_3_Gemm_output_0.h"
#include "../include/kernels/activation.hpp"
#include "../layers/classifier_classifier_4_Relu.h"
#include "../include/kernels/fullyconnected.hpp"
#include "../layers/output.h"
#include "../memory/mem_info.h"

#include "esp_heap_caps.h"

void model_forward(float* result)
{
	int dim;
	float* input;
	float* output;
	float* weights;
	float* bias;

	input  = (float*) malloc (FEATURE_FEATURE_0_CONV_OUTPUT_0_NB_CHANNELS * FEATURE_FEATURE_0_CONV_OUTPUT_0_CHANNELS_HEIGHT * FEATURE_FEATURE_0_CONV_OUTPUT_0_CHANNELS_WIDTH * sizeof(float));
	output = (float*) malloc (FEATURE_FEATURE_0_CONV_OUTPUT_0_NB_OUTPUTS * FEATURE_FEATURE_0_CONV_OUTPUT_0_OUTPUTS_HEIGHT * FEATURE_FEATURE_0_CONV_OUTPUT_0_OUTPUTS_WIDTH * sizeof(float));
 	printf("Input + output %d\n", 4*(FEATURE_FEATURE_0_CONV_OUTPUT_0_NB_CHANNELS * FEATURE_FEATURE_0_CONV_OUTPUT_0_CHANNELS_HEIGHT * FEATURE_FEATURE_0_CONV_OUTPUT_0_CHANNELS_WIDTH+ FEATURE_FEATURE_0_CONV_OUTPUT_0_NB_OUTPUTS * FEATURE_FEATURE_0_CONV_OUTPUT_0_OUTPUTS_HEIGHT * FEATURE_FEATURE_0_CONV_OUTPUT_0_OUTPUTS_WIDTH));
	{
		#include "../../inputs.h"
		memcpy(input, inputs, FEATURE_FEATURE_0_CONV_OUTPUT_0_NB_CHANNELS * FEATURE_FEATURE_0_CONV_OUTPUT_0_CHANNELS_HEIGHT * FEATURE_FEATURE_0_CONV_OUTPUT_0_CHANNELS_WIDTH * sizeof(float));
		#include "../parameters/feature_0_weight.h"
		#include "../parameters/feature_0_bias.h"
		convolution_forward<FEATURE_FEATURE_0_CONV_OUTPUT_0_NB_CHANNELS,
			FEATURE_FEATURE_0_CONV_OUTPUT_0_CHANNELS_HEIGHT,
			FEATURE_FEATURE_0_CONV_OUTPUT_0_CHANNELS_WIDTH,
			FEATURE_FEATURE_0_CONV_OUTPUT_0_NB_OUTPUTS,
			FEATURE_FEATURE_0_CONV_OUTPUT_0_OUTPUTS_HEIGHT,
			FEATURE_FEATURE_0_CONV_OUTPUT_0_OUTPUTS_WIDTH,
			FEATURE_FEATURE_0_CONV_OUTPUT_0_PADDING_Y,
			FEATURE_FEATURE_0_CONV_OUTPUT_0_PADDING_X,
			FEATURE_FEATURE_0_CONV_OUTPUT_0_STRIDE_Y,
			FEATURE_FEATURE_0_CONV_OUTPUT_0_STRIDE_X,
			FEATURE_FEATURE_0_CONV_OUTPUT_0_DILATION_Y,
			FEATURE_FEATURE_0_CONV_OUTPUT_0_DILATION_X,
			FEATURE_FEATURE_0_CONV_OUTPUT_0_KERNEL_HEIGHT,
			FEATURE_FEATURE_0_CONV_OUTPUT_0_KERNEL_WIDTH,
			FEATURE_FEATURE_0_CONV_OUTPUT_0_ACTIVATION>
			(input, output, weights, bias, FEATURE_FEATURE_0_CONV_OUTPUT_0_RESCALING);
	}
printf("Consumed heap: %d\n", 8555404-heap_caps_get_free_size(MALLOC_CAP_8BIT));
//
//
	input = (float*) realloc(input, FEATURE_FEATURE_0_CONV_OUTPUT_0_NB_OUTPUTS * FEATURE_FEATURE_0_CONV_OUTPUT_0_OUTPUTS_HEIGHT * FEATURE_FEATURE_0_CONV_OUTPUT_0_OUTPUTS_WIDTH * sizeof(float));
 	printf("Input + output %d\n", 8*FEATURE_FEATURE_0_CONV_OUTPUT_0_NB_OUTPUTS * FEATURE_FEATURE_0_CONV_OUTPUT_0_OUTPUTS_HEIGHT * FEATURE_FEATURE_0_CONV_OUTPUT_0_OUTPUTS_WIDTH);
	memcpy(input, output, FEATURE_FEATURE_0_CONV_OUTPUT_0_NB_OUTPUTS * FEATURE_FEATURE_0_CONV_OUTPUT_0_OUTPUTS_HEIGHT * FEATURE_FEATURE_0_CONV_OUTPUT_0_OUTPUTS_WIDTH * sizeof(float));
	activation_forward<FEATURE_FEATURE_1_RELU_NB_DATA,
		FEATURE_FEATURE_1_RELU_ACTIVATION>
		(input, output, FEATURE_FEATURE_0_CONV_OUTPUT_0_RESCALING);
printf("Consumed heap: %d\n", 8555404-heap_caps_get_free_size(MALLOC_CAP_8BIT));

//
//
	input = (float*) realloc(input, FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_NB_CHANNELS * FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_CHANNELS_HEIGHT * FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_CHANNELS_WIDTH * sizeof(float));
	memcpy(input, output, FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_NB_CHANNELS * FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_CHANNELS_HEIGHT * FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_CHANNELS_WIDTH * sizeof(float));
	output = (float*) realloc(output, FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_NB_OUTPUTS * FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_OUTPUTS_HEIGHT * FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_OUTPUTS_WIDTH * sizeof(float));
 	printf("Input + output %d\n", 4*(FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_NB_CHANNELS * FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_CHANNELS_HEIGHT * FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_CHANNELS_WIDTH+ FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_NB_OUTPUTS*FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_OUTPUTS_HEIGHT*FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_OUTPUTS_WIDTH));
	pooling_forward<FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_NB_CHANNELS,
		FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_CHANNELS_HEIGHT,
		FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_CHANNELS_WIDTH,
		FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_NB_OUTPUTS,
		FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_OUTPUTS_HEIGHT,
		FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_OUTPUTS_WIDTH,
		FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_PADDING_Y,
		FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_PADDING_X,
		FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_STRIDE_Y,
		FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_STRIDE_X,
		FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_KERNEL_HEIGHT,
		FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_KERNEL_WIDTH,
		FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_POOLING_TYPE,
		FEATURE_FEATURE_2_MAXPOOL_OUTPUT_0_ACTIVATION>
		(input, output);
printf("Consumed heap: %d\n", 8555404-heap_caps_get_free_size(MALLOC_CAP_8BIT));

//
//
	input = (float*) realloc(input, FEATURE_FEATURE_3_CONV_OUTPUT_0_NB_CHANNELS * FEATURE_FEATURE_3_CONV_OUTPUT_0_CHANNELS_HEIGHT * FEATURE_FEATURE_3_CONV_OUTPUT_0_CHANNELS_WIDTH * sizeof(float));
	memcpy(input, output, FEATURE_FEATURE_3_CONV_OUTPUT_0_NB_CHANNELS * FEATURE_FEATURE_3_CONV_OUTPUT_0_CHANNELS_HEIGHT * FEATURE_FEATURE_3_CONV_OUTPUT_0_CHANNELS_WIDTH * sizeof(float));
	output = (float*) realloc(output, FEATURE_FEATURE_3_CONV_OUTPUT_0_NB_OUTPUTS * FEATURE_FEATURE_3_CONV_OUTPUT_0_OUTPUTS_HEIGHT * FEATURE_FEATURE_3_CONV_OUTPUT_0_OUTPUTS_WIDTH * sizeof(float));
 	printf("Input + output %d\n", 4*(FEATURE_FEATURE_3_CONV_OUTPUT_0_NB_CHANNELS * FEATURE_FEATURE_3_CONV_OUTPUT_0_CHANNELS_HEIGHT * FEATURE_FEATURE_3_CONV_OUTPUT_0_CHANNELS_WIDTH+ FEATURE_FEATURE_3_CONV_OUTPUT_0_NB_OUTPUTS * FEATURE_FEATURE_3_CONV_OUTPUT_0_OUTPUTS_HEIGHT * FEATURE_FEATURE_3_CONV_OUTPUT_0_OUTPUTS_WIDTH));
	{
		#include "../parameters/feature_3_weight.h"
		#include "../parameters/feature_3_bias.h"
		convolution_forward<FEATURE_FEATURE_3_CONV_OUTPUT_0_NB_CHANNELS,
			FEATURE_FEATURE_3_CONV_OUTPUT_0_CHANNELS_HEIGHT,
			FEATURE_FEATURE_3_CONV_OUTPUT_0_CHANNELS_WIDTH,
			FEATURE_FEATURE_3_CONV_OUTPUT_0_NB_OUTPUTS,
			FEATURE_FEATURE_3_CONV_OUTPUT_0_OUTPUTS_HEIGHT,
			FEATURE_FEATURE_3_CONV_OUTPUT_0_OUTPUTS_WIDTH,
			FEATURE_FEATURE_3_CONV_OUTPUT_0_PADDING_Y,
			FEATURE_FEATURE_3_CONV_OUTPUT_0_PADDING_X,
			FEATURE_FEATURE_3_CONV_OUTPUT_0_STRIDE_Y,
			FEATURE_FEATURE_3_CONV_OUTPUT_0_STRIDE_X,
			FEATURE_FEATURE_3_CONV_OUTPUT_0_DILATION_Y,
			FEATURE_FEATURE_3_CONV_OUTPUT_0_DILATION_X,
			FEATURE_FEATURE_3_CONV_OUTPUT_0_KERNEL_HEIGHT,
			FEATURE_FEATURE_3_CONV_OUTPUT_0_KERNEL_WIDTH,
			FEATURE_FEATURE_3_CONV_OUTPUT_0_ACTIVATION>
			(input, output, weights, bias, FEATURE_FEATURE_3_CONV_OUTPUT_0_RESCALING);
	}
printf("Consumed heap: %d\n", 8555404-heap_caps_get_free_size(MALLOC_CAP_8BIT));

//
//
	input = (float*) realloc(input, FEATURE_FEATURE_3_CONV_OUTPUT_0_NB_OUTPUTS * FEATURE_FEATURE_3_CONV_OUTPUT_0_OUTPUTS_HEIGHT * FEATURE_FEATURE_3_CONV_OUTPUT_0_OUTPUTS_WIDTH * sizeof(float));
	memcpy(input, output, FEATURE_FEATURE_3_CONV_OUTPUT_0_NB_OUTPUTS * FEATURE_FEATURE_3_CONV_OUTPUT_0_OUTPUTS_HEIGHT * FEATURE_FEATURE_3_CONV_OUTPUT_0_OUTPUTS_WIDTH * sizeof(float));
 	printf("Input + output %d\n", 8*FEATURE_FEATURE_3_CONV_OUTPUT_0_NB_OUTPUTS * FEATURE_FEATURE_3_CONV_OUTPUT_0_OUTPUTS_HEIGHT * FEATURE_FEATURE_3_CONV_OUTPUT_0_OUTPUTS_WIDTH);
	activation_forward<FEATURE_FEATURE_4_RELU_NB_DATA,
		FEATURE_FEATURE_4_RELU_ACTIVATION>
		(input, output, FEATURE_FEATURE_3_CONV_OUTPUT_0_RESCALING);
printf("Consumed heap: %d\n", 8555404-heap_caps_get_free_size(MALLOC_CAP_8BIT));

//
//
	input = (float*) realloc(input, FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_NB_CHANNELS * FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_CHANNELS_HEIGHT * FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_CHANNELS_WIDTH * sizeof(float));
	memcpy(input, output, FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_NB_CHANNELS * FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_CHANNELS_HEIGHT * FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_CHANNELS_WIDTH * sizeof(float));
	output = (float*) realloc(output, FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_NB_OUTPUTS * FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_OUTPUTS_HEIGHT * FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_OUTPUTS_WIDTH * sizeof(float));
 	printf("Input + output %d\n", 4*(FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_NB_CHANNELS * FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_CHANNELS_HEIGHT * FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_CHANNELS_WIDTH+ FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_NB_OUTPUTS * FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_OUTPUTS_HEIGHT * FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_OUTPUTS_WIDTH));
	pooling_forward<FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_NB_CHANNELS,
		FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_CHANNELS_HEIGHT,
		FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_CHANNELS_WIDTH,
		FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_NB_OUTPUTS,
		FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_OUTPUTS_HEIGHT,
		FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_OUTPUTS_WIDTH,
		FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_PADDING_Y,
		FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_PADDING_X,
		FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_STRIDE_Y,
		FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_STRIDE_X,
		FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_KERNEL_HEIGHT,
		FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_KERNEL_WIDTH,
		FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_POOLING_TYPE,
		FEATURE_FEATURE_5_MAXPOOL_OUTPUT_0_ACTIVATION>
		(input, output);
printf("Consumed heap: %d\n", 8555404-heap_caps_get_free_size(MALLOC_CAP_8BIT));

//
//
	input = (float*) realloc(input, CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_NB_CHANNELS * CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_CHANNELS_HEIGHT * CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_CHANNELS_WIDTH * sizeof(float));
	memcpy(input, output, CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_NB_CHANNELS * CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_CHANNELS_HEIGHT * CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_CHANNELS_WIDTH * sizeof(float));
	output = (float*) realloc(output, CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_NB_OUTPUTS * CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_OUTPUTS_HEIGHT * CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_OUTPUTS_WIDTH * sizeof(float));
 	printf("Input + output %d\n", 4*(CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_NB_CHANNELS * CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_CHANNELS_HEIGHT * CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_CHANNELS_WIDTH+ CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_NB_OUTPUTS * CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_OUTPUTS_HEIGHT * CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_OUTPUTS_WIDTH));
	{
		#include "../parameters/classifier_1_weight.h"
		#include "../parameters/classifier_1_bias.h"
		fullyconnected_forward<CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_NB_CHANNELS,
			CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_CHANNELS_HEIGHT,
			CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_CHANNELS_WIDTH,
			CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_NB_OUTPUTS,
			CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_OUTPUTS_HEIGHT,
			CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_OUTPUTS_WIDTH,
			CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_ACTIVATION>
			(input, output, weights, bias, CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_RESCALING);
printf("Consumed heap FC: %d\n", 8555404-heap_caps_get_free_size(MALLOC_CAP_8BIT));
	}
printf("Consumed heap: %d\n", 8555404-heap_caps_get_free_size(MALLOC_CAP_8BIT));

//
//
	input = (float*) realloc(input, CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_NB_OUTPUTS * CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_OUTPUTS_HEIGHT * CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_OUTPUTS_WIDTH * sizeof(float));
	memcpy(input, output, CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_NB_OUTPUTS * CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_OUTPUTS_HEIGHT * CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_OUTPUTS_WIDTH * sizeof(float));
 	printf("Input + output %d\n", 8*CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_NB_OUTPUTS * CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_OUTPUTS_HEIGHT * CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_OUTPUTS_WIDTH);
	activation_forward<CLASSIFIER_CLASSIFIER_2_RELU_NB_DATA,
		CLASSIFIER_CLASSIFIER_2_RELU_ACTIVATION>
		(input, output, CLASSIFIER_CLASSIFIER_1_GEMM_OUTPUT_0_RESCALING);
printf("Consumed heap: %d\n", 8555404-heap_caps_get_free_size(MALLOC_CAP_8BIT));

//
//
	input = (float*) realloc(input, CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_NB_CHANNELS * CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_CHANNELS_HEIGHT * CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_CHANNELS_WIDTH * sizeof(float));
	memcpy(input, output, CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_NB_CHANNELS * CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_CHANNELS_HEIGHT * CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_CHANNELS_WIDTH * sizeof(float));
	output = (float*) realloc(output, CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_NB_OUTPUTS * CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_OUTPUTS_HEIGHT * CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_OUTPUTS_WIDTH * sizeof(float));
 	printf("Input + output %d\n", 4*(CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_NB_CHANNELS * CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_CHANNELS_HEIGHT * CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_CHANNELS_WIDTH+ CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_NB_OUTPUTS * CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_OUTPUTS_HEIGHT * CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_OUTPUTS_WIDTH));
	{
		#include "../parameters/classifier_3_weight.h"
		#include "../parameters/classifier_3_bias.h"
		fullyconnected_forward<CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_NB_CHANNELS,
			CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_CHANNELS_HEIGHT,
			CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_CHANNELS_WIDTH,
			CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_NB_OUTPUTS,
			CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_OUTPUTS_HEIGHT,
			CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_OUTPUTS_WIDTH,
			CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_ACTIVATION>
			(input, output, weights, bias, CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_RESCALING);
	}
printf("Consumed heap: %d\n", 8555404-heap_caps_get_free_size(MALLOC_CAP_8BIT));

//
//
	input = (float*) realloc(input, CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_NB_OUTPUTS * CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_OUTPUTS_HEIGHT * CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_OUTPUTS_WIDTH * sizeof(float));
	memcpy(input, output, CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_NB_OUTPUTS * CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_OUTPUTS_HEIGHT * CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_OUTPUTS_WIDTH * sizeof(float));
 	printf("Input + output %d\n", 8*CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_NB_OUTPUTS * CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_OUTPUTS_HEIGHT * CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_OUTPUTS_WIDTH);
	activation_forward<CLASSIFIER_CLASSIFIER_4_RELU_NB_DATA,
		CLASSIFIER_CLASSIFIER_4_RELU_ACTIVATION>
		(input, output, CLASSIFIER_CLASSIFIER_3_GEMM_OUTPUT_0_RESCALING);
printf("Consumed heap: %d\n", 8555404-heap_caps_get_free_size(MALLOC_CAP_8BIT));

//
//
	input = (float*) realloc(input, OUTPUT_NB_CHANNELS * OUTPUT_CHANNELS_HEIGHT * OUTPUT_CHANNELS_WIDTH * sizeof(float));
	memcpy(input, output, OUTPUT_NB_CHANNELS * OUTPUT_CHANNELS_HEIGHT * OUTPUT_CHANNELS_WIDTH * sizeof(float));
	output = (float*) realloc(output, OUTPUT_NB_OUTPUTS * OUTPUT_OUTPUTS_HEIGHT * OUTPUT_OUTPUTS_WIDTH * sizeof(float));
 	printf("Input + output %d\n", 4*(OUTPUT_NB_CHANNELS * OUTPUT_CHANNELS_HEIGHT * OUTPUT_CHANNELS_WIDTH+ OUTPUT_NB_OUTPUTS * OUTPUT_OUTPUTS_HEIGHT * OUTPUT_OUTPUTS_WIDTH));
printf("Consumed heap: %d\n", 8555404-heap_caps_get_free_size(MALLOC_CAP_8BIT));
	{
		#include "../parameters/classifier_5_weight.h"
		#include "../parameters/classifier_5_bias.h"
/*
		fullyconnected_forward<OUTPUT_NB_CHANNELS,
			OUTPUT_CHANNELS_HEIGHT,
			OUTPUT_CHANNELS_WIDTH,
			OUTPUT_NB_OUTPUTS,
			OUTPUT_OUTPUTS_HEIGHT,
			OUTPUT_OUTPUTS_WIDTH,
			OUTPUT_ACTIVATION>
			(input, output, weights, bias, OUTPUT_RESCALING);
*/
	}
//
//
	memcpy(result, output, OUTPUT_NB_OUTPUTS * OUTPUT_OUTPUTS_HEIGHT * OUTPUT_OUTPUTS_WIDTH * sizeof(float));
}

/*

Layers statistics:
convolution: Input: 784, Output: 4704, Weights: 150, Biases: 6, Total: 5644, RAM: 22576 B
activation: Input: 4704, Output: 4704, Total: 9408, RAM: 37632 B
pooling: Input: 4704, Output: 1176, Total: 5880, RAM: 23520 B
convolution: Input: 1176, Output: 1600, Weights: 2400, Biases: 16, Total: 5192, RAM: 20768 B
activation: Input: 1600, Output: 1600, Total: 3200, RAM: 12800 B
pooling: Input: 1600, Output: 400, Total: 2000, RAM: 8000 B
fullyconnected: Input: 400, Output: 120, Weights: 48000, Biases: 120, Total: 48640, RAM: 194560 B
activation: Input: 120, Output: 120, Total: 240, RAM: 960 B
fullyconnected: Input: 120, Output: 84, Weights: 10080, Biases: 84, Total: 10368, RAM: 41472 B
activation: Input: 84, Output: 84, Total: 168, RAM: 672 B
fullyconnected: Input: 84, Output: 10, Weights: 840, Biases: 10, Total: 944, RAM: 3776 B
Maximum RAM required: 194560 B (191 kB)

*/
It's quite complex because all the parameters are in other .h files.

Here is the output when the last layer is commented out:
Executing CPP export
Initial heap: 8555404
Input + output 21952
Consumed heap: 22184
Input + output 37632
Consumed heap: 37864
Input + output 23520
Consumed heap: 23752
Input + output 11104
Consumed heap: 11336
Input + output 12800
Consumed heap: 13032
Input + output 8000
Consumed heap: 8232
Input + output 2080
Consumed heap FC: 2312
Consumed heap: 2312
Input + output 960
Consumed heap: 1192
Input + output 816
Consumed heap: 1048
Input + output 672
Consumed heap: 904
Input + output 376
Consumed heap: 608
0: 0
1: 8.55611
2: 0
3: 0
4: 0
5: 0
6: 0
7: 2.1693
8: 0
9: 0
Execution time: 57 ms
And there is nothing on the monitor when it is uncommented!!
When I print the free heap using

Code: Select all

heap_caps_get_free_size(MALLOC_CAP_8BIT)
it seems that the RAM is not overloaded.
Can anyone help or provide some advice to find the source of the problem?

Thanks a lot.
Last edited by lesept on Fri Aug 02, 2024 11:05 am, edited 1 time in total.

lesept
Posts: 38
Joined: Wed Jun 27, 2018 10:30 pm

Re: ESP32S3 Memory problem

Postby lesept » Tue Jul 30, 2024 6:37 pm

Some additional information:

Code: Select all

Chip Info:
------------------------------------------
  Model             : ESP32-S3
  Package           : 0
  Revision          : 2
  Cores             : 2
  CPU Frequency     : 240 MHz
  XTAL Frequency    : 40 MHz
  Embedded Flash    : No
  Embedded PSRAM    : No
  2.4GHz WiFi       : Yes
  Classic BT        : No
  BT Low Energy     : Yes
  IEEE 802.15.4     : No
------------------------------------------
INTERNAL Memory Info:
------------------------------------------
  Total Size        :   395756 B ( 386.5 KB)
  Free Bytes        :   171792 B ( 167.8 KB)
  Allocated Bytes   :   219220 B ( 214.1 KB)
  Minimum Free Bytes:   166648 B ( 162.7 KB)
  Largest Free Block:   124916 B ( 122.0 KB)
------------------------------------------
SPIRAM Memory Info:
------------------------------------------
  Total Size        :  8388608 B (8192.0 KB)
  Free Bytes        :  8385672 B (8189.1 KB)
  Allocated Bytes   :      576 B (   0.6 KB)
  Minimum Free Bytes:  8385672 B (8189.1 KB)
  Largest Free Block:  8257524 B (8064.0 KB)
  Bus Mode          : OPI
------------------------------------------
Flash Info:
------------------------------------------
  Chip Size         : 16777216 B (16 MB)
  Block Size        :    65536 B (  64.0 KB)
  Sector Size       :     4096 B (   4.0 KB)
  Page Size         :      256 B (   0.2 KB)
  Bus Speed         : 80 MHz
  Bus Mode          : QIO
------------------------------------------
Partitions Info:
------------------------------------------
                nvs : addr: 0x00009000, size:    20.0 KB, type: DATA, subtype: NVS
            otadata : addr: 0x0000E000, size:     8.0 KB, type: DATA, subtype: OTA
               app0 : addr: 0x00010000, size: 16256.0 KB, type:  APP, subtype: OTA_0
           coredump : addr: 0x00FF0000, size:    64.0 KB, type: DATA, subtype: COREDUMP
I made a custom partitions file, but it doesn't seem to be related, as I still have the same problem with other partitions setups as well.

Also, I added this in the setup, to increase the memory allocated to the task:

Code: Select all

SET_LOOP_TASK_STACK_SIZE (200 * 1024); // Increase Stack size
Changing the value 200 to other values (from 32 to 512) did not solve the problem.

MicroController
Posts: 1734
Joined: Mon Oct 17, 2022 7:38 pm
Location: Europe, Germany

Re: ESP32S3 Memory problem

Postby MicroController » Tue Jul 30, 2024 7:55 pm

You may want/have to explicitly allocate buffers in the external RAM (ps_malloc()..., see e.g. https://thingpulse.com/esp32-how-to-use-psram/, or directly via the IDF's heap_caps_malloc()).

And 200kB of stack space for a single task is way too much, given that the stack resides in internal RAM and the full space reserved for it is unavailable for other things even when only a fraction of it is actually used.

I don't know the framework you're using; it may or may not internally try and fail to allocate some RAM. Do the functions you call return any results which would indicate an error?

lesept
Posts: 38
Joined: Wed Jun 27, 2018 10:30 pm

Re: ESP32S3 Memory problem

Postby lesept » Wed Jul 31, 2024 8:27 am

Hi, thank you for your answer.
There is no specific framework, only C functions, and they do not return error results.

Using PSRAM would be my next step. I think that if my arrays are too big for the SRAM, I would store them in PSRAM. But so far, this example does not seem to use arrays that do not fit into SRAM. I did the computation and added it as a comment at the end of the function call.

Code: Select all

Layers statistics:
convolution: Input: 784, Output: 4704, Weights: 150, Biases: 6, Total: 5644, RAM: 22576 B
activation: Input: 4704, Output: 4704, Total: 9408, RAM: 37632 B
pooling: Input: 4704, Output: 1176, Total: 5880, RAM: 23520 B
convolution: Input: 1176, Output: 1600, Weights: 2400, Biases: 16, Total: 5192, RAM: 20768 B
activation: Input: 1600, Output: 1600, Total: 3200, RAM: 12800 B
pooling: Input: 1600, Output: 400, Total: 2000, RAM: 8000 B
fullyconnected: Input: 400, Output: 120, Weights: 48000, Biases: 120, Total: 48640, RAM: 194560 B
activation: Input: 120, Output: 120, Total: 240, RAM: 960 B
fullyconnected: Input: 120, Output: 84, Weights: 10080, Biases: 84, Total: 10368, RAM: 41472 B
activation: Input: 84, Output: 84, Total: 168, RAM: 672 B
fullyconnected: Input: 84, Output: 10, Weights: 840, Biases: 10, Total: 944, RAM: 3776 B
Maximum RAM required: 194560 B (191 kB)
For each layer, I add the sizes of the arrays used (input, output, weights and biases), and multiply by 4 as they are floats.
The maximum value is 194560 bytes, which should be ok.

Each layer is enclosed into a block (using { } ) so only the input and output arrays are propagated from one layer to the next. The other arrays are deleted at the end of the block.

Actually, I thought that I needed to do the SET_LOOP_TASK_STACK_SIZE (200 * 1024); because of this.
Do you mean that these arrays are not stored / copied in the space allocated to the stack?

I don't know much about RAM usage: where are these arrays stored? In stack, heap, else?

lesept
Posts: 38
Joined: Wed Jun 27, 2018 10:30 pm

Re: ESP32S3 Memory problem

Postby lesept » Wed Jul 31, 2024 10:44 am

Here is the only output on the serial monitor when I uncomment the last layer:

Code: Select all

Build:Mar 27 2021
rst:0x1 (POWERON),boot:0x8 (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3818,len:0x508
load:0x403c9700,len:0x4
load:0x403c9704,len:0xad0
load:0x403cc700,len:0x29e4
entry 0x403c9880
No backtrace which could help debug...

lesept
Posts: 38
Joined: Wed Jun 27, 2018 10:30 pm

Re: ESP32S3 Memory problem

Postby lesept » Fri Aug 02, 2024 11:04 am

I finally found the solution, related to some compiler attriubtes not taken into account by the Arduino IDE.

Who is online

Users browsing this forum: No registered users and 88 guests