Unstable partition API flashing functions

imtiaz
Posts: 106
Joined: Wed Oct 26, 2016 1:34 am


Post by imtiaz » Mon Mar 20, 2017 2:22 am

Hi Espressif experts, @ESP_Angus,

I have been struggling to get reliable behaviour from the SPI flash functions. I am using the partition API both for OTA updates of the ESP32 and for storing a binary file for another processor. I am transferring the file over a WiFi TCP socket, with the ESP32 acting as both the AP and the socket server.

If I just download the file without writing it to flash, there are no issues. However, as soon as I enable writing the file (which means erasing the partition, writing the file, and then reading it back to verify), the program becomes unstable.

While doing OTA, the socket receiving thread sometimes just hangs after OTA_Init() with no error messages; sometimes it hangs halfway through receiving the file; other times it crashes as follows:

Code:

erasing partition
I (27748) flashops:   1
Guru Meditation Error of type IllegalInstruction occurred on core  0. Exception was unhandled.
Register dump:
PC      : 0x4011b49e  PS      : 0x00060f33  A0      : 0x80046686  A1      : 0x3ffc0500
A2      : 0x00000000  A3      : 0x4011b49c  A4      : 0x00000000  A5      : 0x0000000c
A6      : 0x3ffb93cc  A7      : 0x3ffb8360  A8      : 0x80019fb8  A9      : 0x000044c4
A10     : 0x00000000  A11     : 0x00000000  A12     : 0x00060d21  A13     : 0x00000022
A14     : 0x000023ec  A15     : 0x3ffc0640  SAR     : 0x00000017  EXCCAUSE: 0x00000000
EXCVADDR: 0x00000000  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffffc

Backtrace: 0x4011b49e:0x3ffc0500 0x40046686:0x3ffc0530 0x40047518:0x3ffc0550 0x40048536:0x3ffc0570 0x40048675:0x3ffc0590 0x40054ed9:0x3ffc05b0 0x40082482:0x3ffc05d0 0x400810e8:0x3ffc0600

Entering gdb stub now.
Guru Meditation Error of type IllegalInstruction occurred on core  0. Exception was unhandled.
Register dump:
PC      : 0x400d416a  PS      : 0x00060033  A0      : 0x80085060  A1      : 0x3ffc03a0
A2      : 0x3ffc0440  A3      : 0x3ff5f000  A4      : 0x383fc000  A5      : 0x3ff60000
A6      : 0x00000000  A7      : 0x383fc000  A8      : 0x80084e8f  A9      : 0x3ffc0380
A10     : 0x3ffc0600  A11     : 0x3ffc0600  A12     : 0x00000009  A13     : 0x3ffc0400
A14     : 0x3ffc1953  A15     : 0x3ffc195c  SAR     : 0x00000017  EXCCAUSE: 0x00000000
EXCVADDR: 0x00000000  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0x00000000

Backtrace: 0x400d416a:0x3ffc03a0 0x40085060:0x3ffc0420 0x40080e8d:0x3ffc0440 0x4011b49e:0x3ffc0500 0x4011b49e:0x3ffc0530 0x40047518:0x3ffc0550 0x40048536:0x3ffc0570 0x40048675:0x3ffc0590 0x40054ed9:0x3ffc05b0 0x40082482:0x3ffc05d0 0x400810e8:0x3ffc0600
I can share a sample of the code if you like. I am using an IDF version from a few days ago, but I have had the same issues with previous versions as well.

Also, I believe there is an issue with the partition erase function if you call it with a "NULL" size, which is supposed to erase the whole partition.
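For reference, the ESP-IDF partition API documents that both the start offset and the size passed to `esp_partition_erase_range()` must be multiples of the 4 KB flash sector size. Rather than relying on a special size value to mean "whole partition", one option is to erase only the sectors the incoming file will occupy. The sketch below assumes the file size is known before the transfer starts (as `pFR->fileSize` is in the code later in this thread); `SECTOR_SIZE`, `erase_len_for()` and `erase_len_within()` are hypothetical host-side helpers, not ESP-IDF functions.

```c
#include <assert.h>
#include <stdint.h>

/* Flash erase granularity. ESP-IDF calls this SPI_FLASH_SEC_SIZE (4096);
 * it is redefined here (hypothetical name) so the sketch builds on a host. */
#define SECTOR_SIZE 4096u

/* Round a byte count up to a whole number of sectors. */
uint32_t erase_len_for(uint32_t file_size)
{
    return (file_size + SECTOR_SIZE - 1u) / SECTOR_SIZE * SECTOR_SIZE;
}

/* How much of a partition to erase before storing a file of file_size bytes.
 * Returns 0 if the file is empty or does not fit; otherwise a sector-aligned
 * length that never exceeds partition_size. */
uint32_t erase_len_within(uint32_t partition_size, uint32_t file_size)
{
    /* Partitions are expected to start and end on sector boundaries. */
    assert(partition_size % SECTOR_SIZE == 0u);
    if (file_size == 0u || file_size > partition_size) {
        return 0u;
    }
    return erase_len_for(file_size);
}
```

On the device, a nonzero result would then be used as the size argument, e.g. `esp_partition_erase_range(p, 0, len)`. Erasing fewer sectors also shortens the time the flash is busy, which may be relevant to the WiFi symptoms reported further down the thread.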

Thank you
Imtiaz

aaquilina
Posts: 43
Joined: Fri Jan 20, 2017 3:10 pm

Re: Unstable partition API flashing functions

Post by aaquilina » Mon Mar 20, 2017 2:08 pm

Something I found when reading from flash is that unless you pin the associated task to a single core, the processor tends to crash (at a non-repeatable location). I believe there's a bug in the spi_flash API.

ESP_igrr
Posts: 2072
Joined: Tue Dec 01, 2015 8:37 am

Re: Unstable partition API flashing functions

Post by ESP_igrr » Mon Mar 20, 2017 4:59 pm

Hi imtiaz,
If you could share a code sample which exhibits the issue, that would be great.

imtiaz
Posts: 106
Joined: Wed Oct 26, 2016 1:34 am

Re: Unstable partition API flashing functions

Post by imtiaz » Mon Mar 20, 2017 8:04 pm

Code:

static void fwUpdate_thread(void *arg)
{
    FWDWNLD_THREAD_ARGS* MyArgs = (FWDWNLD_THREAD_ARGS*) arg;
    struct sockaddr_in clientAddress;
	struct sockaddr_in serverAddress;
	TRACE_D("Firmware Update Server Socket Starting .......");
	TRACE_D(" Port = %d", MyArgs->PortNumber);
	// Create a socket that we will listen upon.
	int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (sock < 0)
	{
		TRACE_E("socket: %d %s \n", sock, strerror(errno));
		while(1);
	}
	// Bind our server socket to a port.
	serverAddress.sin_family = AF_INET;
	serverAddress.sin_addr.s_addr = htonl(INADDR_ANY);
	serverAddress.sin_port = htons(MyArgs->PortNumber);
	int rc  = bind(sock, (struct sockaddr *)&serverAddress, sizeof(serverAddress));
	if (rc < 0)
	{
		TRACE_E("socket: %d %s \n", sock, strerror(errno));
		while(1);
	}

	// Flag the socket as listening for new connections
	rc = listen(sock, 1);
	if (rc < 0)
	{
		TRACE_E("listen: %d %s \n", rc, strerror(errno));
		while(1);
	}
	BOOL Done = False;
	while(!Done)
	{
	    socklen_t clientAddressLength = sizeof(clientAddress);
	    TRACE_D("Waiting for new Connection \n");
		int clientSock = accept(sock, (struct sockaddr *)&clientAddress, &clientAddressLength); //blocks until new connection available
		if (clientSock < 0)
		{
			TRACE_E("accept error: %d %s", clientSock, strerror(errno));
		}
		else
		{
		    TRACE_D("new accept: %d %s\n", clientSock, strerror(errno));
		    uint8_t* RxBuf  = malloc(1024); //our local Rx Buffer on the heap
		    const esp_partition_t *p = esp_partition_find_first(ESP_PARTITION_TYPE_DATA, 0x80, NULL);
		    if(p == NULL)
		    {
		    	TRACE_E("ST Code Partition not found\n");
		    	Done = True;
		    	while(1);
		    }
		    else
		    {
		    	TRACE_D("ST Partition Found : %d bytes\n",p->size);
		    	//vTaskDelay(1000 / portTICK_RATE_MS);
		    }
		    uint32_t FileSizeRecived = 0;
		    while(1)
		    {
		    	ssize_t sizeRead = recv(clientSock, RxBuf, 1024, 0);
		    	if(sizeRead > 0)
		    	{
		    		if(esp_partition_write(p, FileSizeRecived , RxBuf, sizeRead) == ESP_OK)
		    		{
		    			 TRACE_D("%d : %d\n", sizeRead,FileSizeRecived);
		    		}
		    		else
		    		{
		    			TRACE_E("ST partition write error : %d\n",sizeRead);
		    			while(1);
		    		}
		    		FileSizeRecived+=sizeRead;
		    	}
		    	else if(sizeRead == 0)
		    	{
		    		TRACE_D("File Size : %d\n", FileSizeRecived);
		    		break;
		    	}
		    	else
		    	{
		    		TRACE_E("Socket return error\n");
		    		break;
		    	}
		    }
		    free(RxBuf);
		    if(VerifyBinFileFromFlash(p,FileSizeRecived ,(unsigned char*)&MyArgs->FR.md5Hash ))
		    {
		    	memcpy(&dwnldFileRequestCopy , &MyArgs->FR , sizeof(ID_DWNLD_FILE_REQUEST_TYPE));
		    }
		    else
		    {
		    	memset(&dwnldFileRequestCopy,0, sizeof(ID_DWNLD_FILE_REQUEST_TYPE));
		    }

		    close(clientSock);
		    close(sock);
		    Done = True;
		}
	}
	TRACE_D("Exiting update thread\n");

	vTaskDelete(NULL);
}
/************************************************************
@Func:
@Inputs:
@Outputs:
*************************************************************/
static esp_err_t EraseSTPartition(void)
{
	const esp_partition_t *p = esp_partition_find_first(ESP_PARTITION_TYPE_DATA, 0x80, NULL);

	if(p)
	{
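		// Offset and size must be multiples of the 4 KB sector size; this leaves the last sector of the partition unerased
		if(esp_partition_erase_range(p, 0, p->size-4096) == ESP_OK)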
		{
				TRACE_D("erase OK : %d kbytes\n",p->size/1024);
				return ESP_OK;
		}
	}
	else
	{
		TRACE_E("ST Code Partition not found\n");
	}

	return ESP_FAIL;
}
/************************************************************
@Func:
@Inputs:
@Outputs:
*************************************************************/
void syrp_dwnld_bin_Start(uint16_t portNumber , ID_DWNLD_FILE_REQUEST_TYPE* pFR)
{
	static FWDWNLD_THREAD_ARGS args;
	args.PortNumber = portNumber;
	memcpy(&args.FR , pFR , sizeof(ID_DWNLD_FILE_REQUEST_TYPE));
	uint8_t error = 0;

	if(pFR->fileSize < GetMaxFileSize(pFR->hwID)) // step 1 : check the file size does not exceed max file size for that h/w module
	{
		// step 2 : erase the memory where the file will be temporarily stored if core module / if espmodule than start OTA process
		TRACE_D("erasing partition\n");
		if(EraseSTPartition() == ESP_OK)
		{
			//sys_thread_new("fwUpdate_thread", fwUpdate_thread, &args, 2048*4, 5);
			xTaskCreate(fwUpdate_thread, "fwUpdate_thread", 8192, &args, 6, NULL);
		}
		else
		{
			TRACE_E("Fatal error with flash erase\n");
			error = 2;
		}
	}
	else
	{
		TRACE_E("File size too large\n");
		error = 1;
	}
	if(error)
	{
		Send_ID_ERROR(ID_DWNLD_FILE_REQUEST , error, 0 , 0 ); //fixme the port and client are 0
	}
	else
	{
		Send_ID_DWNLD_FILE_REQUEST_REPLY(0 , 0 , pFR->hwID ,portNumber);
	}
}
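The first post notes that the transfer is stable when the file is not written to flash. One way to keep that comparison easy is to factor the receive loop in `fwUpdate_thread` behind two callbacks, so the same loop can run against a real socket and partition, or against in-memory test doubles on a host. This is a sketch under that assumption: `read_fn`, `write_fn`, `copy_stream()` and the `mem_*` doubles are hypothetical names. On the ESP32 the callbacks would presumably wrap `recv()` and `esp_partition_write()`. It also checks that each chunk fits in the destination region before writing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical callback types: read_fn would wrap recv() on the device,
 * write_fn would wrap esp_partition_write() and return 0 on success. */
typedef ssize_t (*read_fn)(void *ctx, uint8_t *buf, size_t len);
typedef int (*write_fn)(void *ctx, uint32_t offset, const uint8_t *buf, size_t len);

/* Copy a byte stream into a region of `capacity` bytes, writing each chunk
 * at the running offset. Returns the number of bytes stored, or -1 on a
 * read error, a write error, or data that would not fit in the region. */
long copy_stream(read_fn rd, void *rd_ctx, write_fn wr, void *wr_ctx,
                 uint32_t capacity, uint8_t *buf, size_t buf_len)
{
    uint32_t offset = 0;
    for (;;) {
        ssize_t n = rd(rd_ctx, buf, buf_len);
        if (n == 0) {
            return (long)offset;           /* peer closed: transfer complete */
        }
        if (n < 0) {
            return -1;                     /* socket error */
        }
        if ((uint32_t)n > capacity - offset) {
            return -1;                     /* file larger than the region */
        }
        if (wr(wr_ctx, offset, buf, (size_t)n) != 0) {
            return -1;                     /* flash write failed */
        }
        offset += (uint32_t)n;
    }
}

/* Host-side test doubles: an in-memory "socket" and "partition". */
struct mem_reader { const uint8_t *data; size_t len; size_t pos; };
struct mem_writer { uint8_t *mem; size_t size; };

ssize_t mem_read(void *ctx, uint8_t *buf, size_t len)
{
    struct mem_reader *r = ctx;
    size_t left = r->len - r->pos;
    size_t n = left < len ? left : len;    /* at most len bytes, like recv() */
    memcpy(buf, r->data + r->pos, n);
    r->pos += n;
    return (ssize_t)n;
}

int mem_write(void *ctx, uint32_t offset, const uint8_t *buf, size_t len)
{
    struct mem_writer *w = ctx;
    assert((size_t)offset + len <= w->size);  /* copy_stream must never overrun */
    memcpy(w->mem + offset, buf, len);
    return 0;
}
```

The ESP-IDF documentation for `esp_partition_write()` lists an error return for writes that would go past the end of the partition, so the explicit capacity check mainly makes the overflow case visible in the loop itself rather than being required for safety.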

imtiaz
Posts: 106
Joined: Wed Oct 26, 2016 1:34 am

Re: Unstable partition API flashing functions

Post by imtiaz » Mon Mar 20, 2017 8:19 pm

I can see on a WiFi analyser that the access point very often drops off soon after a flash erase. But I am not 100 percent sure whether it's due to the flash erase or to some other event happening at the same time, like starting a TCP server.

imtiaz
Posts: 106
Joined: Wed Oct 26, 2016 1:34 am

Re: Unstable partition API flashing functions

Post by imtiaz » Mon Mar 20, 2017 10:06 pm

I can confirm that I have isolated the problem. Calling

Code:

esp_partition_erase_range(p, 0, p->size);
or

Code:

esp_ota_begin( partition, OTA_SIZE_UNKNOWN, &out_handle);
causes the wifi access point to drop off.
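If the drop-outs come from a single long erase (Angus's reply in this thread suggests a large partition or a timing-sensitive AP as possible factors), one application-level workaround is to erase the range in smaller pieces and yield between them. As far as I understand the ESP-IDF SPI flash documentation, code that is not in IRAM cannot run on either core while a flash operation is in progress, so one long erase stalls most other tasks, including parts of the WiFi stack. This is an assumption-based sketch, not the fix in the branch linked later in the thread: `erase_fn`, `yield_fn`, `erase_in_chunks()` and the counting doubles are hypothetical. On the ESP32 the erase callback would presumably wrap `esp_partition_erase_range()` and the yield callback something like `vTaskDelay(1)`.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 4096u  /* flash erase granularity (hypothetical name) */

/* Hypothetical callbacks: erase returns 0 on success; yield gives other
 * tasks a chance to run between erase pieces. */
typedef int (*erase_fn)(void *ctx, uint32_t offset, uint32_t len);
typedef void (*yield_fn)(void *ctx);

/* Erase [start, start + len) in pieces of at most `chunk` bytes, yielding
 * between pieces. start, len and chunk must be sector multiples.
 * Returns 0, or the first nonzero error reported by erase. */
int erase_in_chunks(erase_fn erase, yield_fn yield, void *ctx,
                    uint32_t start, uint32_t len, uint32_t chunk)
{
    assert(start % SECTOR_SIZE == 0u && len % SECTOR_SIZE == 0u);
    assert(chunk != 0u && chunk % SECTOR_SIZE == 0u);
    uint32_t done = 0;
    while (done < len) {
        uint32_t n = (len - done < chunk) ? len - done : chunk;
        int err = erase(ctx, start + done, n);
        if (err != 0) {
            return err;
        }
        done += n;
        if (done < len) {
            yield(ctx);  /* let other tasks run before the next piece */
        }
    }
    return 0;
}

/* Host-side doubles that just count calls and erased bytes. */
struct erase_log { uint32_t calls, yields, bytes; };

int log_erase(void *ctx, uint32_t offset, uint32_t len)
{
    struct erase_log *l = ctx;
    (void)offset;
    l->calls++;
    l->bytes += len;
    return 0;
}

void log_yield(void *ctx)
{
    struct erase_log *l = ctx;
    l->yields++;
}
```

This would not help directly with `esp_ota_begin(..., OTA_SIZE_UNKNOWN, ...)`, which the ESP-IDF OTA documentation describes as erasing the entire target partition; passing the real image size there, when it is known, limits that erase to the sectors the image needs.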

This has caused a lot of delay, confusion and frustration, so please look at it as a high priority.

Thanks
Imtiaz

imtiaz
Posts: 106
Joined: Wed Oct 26, 2016 1:34 am

Re: Unstable partition API flashing functions

Post by imtiaz » Tue Mar 21, 2017 11:40 pm

@ESP_igrr @ESP_Sprite @ESP_Angus

Hi guys,

I know you are busy, but a response would be appreciated :)

Thanks
Imtiaz

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Unstable partition API flashing functions

Post by ESP_Angus » Wed Mar 22, 2017 2:28 am

Hi Imtiaz,

We had a quick discussion about how to solve this yesterday. I'm going to try and reproduce & solve today (it may be that reproducing is easier on certain APs or ambient WiFi traffic loads).

Will keep you in the loop.

Angus

ESP_Angus
Posts: 2344
Joined: Sun May 08, 2016 4:11 am

Re: Unstable partition API flashing functions

Post by ESP_Angus » Wed Mar 22, 2017 6:39 am

Hi Imtiaz,

I haven't managed to reproduce this issue; even under high network load, the ESP32 stays associated. I'm guessing that either the partition you're erasing is particularly large, the SPI flash is slow, or your AP is more sensitive to timeouts (or maybe some combination of these factors).

However, there is a probable fix in this branch:
https://github.com/espressif/esp-idf/tr ... ock_period

Can you please try it out and let us know if it fixes the problem?

Angus

imtiaz
Posts: 106
Joined: Wed Oct 26, 2016 1:34 am

Re: Unstable partition API flashing functions

Post by imtiaz » Wed Mar 22, 2017 7:48 pm

Hi Angus @ESP_Angus,

From your wording, it seems you are testing with the ESP32 as a station. I am talking about the ESP32 as a WiFi access point. When you perform SPI flash operations, the AP just drops off, and the ESP32 doesn't seem to reset.
Also, the partition I am erasing is either 1 MByte or 1.8 MByte, and it is on your standard dev kit and module. The flash isn't slow: when it does work, I can send and write the whole 1 MByte partition in a few seconds.

Also, please enable Bluetooth in your testing as well.
