Clone running firmware over mesh

Snedig
Posts: 24
Joined: Sat Apr 04, 2020 3:18 pm

Postby Snedig » Thu May 14, 2020 5:48 am

Hi,

I am developing a device that communicates over mesh, where only the root device has a cellular card for communicating with the internet. I am currently trying to implement OTA updates so that the root device performs OTA for itself via cellular, then turns off the cellular connection and sends its new software over mesh to the other devices. As I do not have enough flash available to store the binary separately, I thought I would just read it off the running partition, since I read the contents out and saw that they were identical to the binary produced by the compiler.

However, this only nearly works: a few seemingly random changes always sneak in during a ~1MB OTA.

Any tips would be extremely welcome, as I cannot understand why it continually fails. I get these error messages:

Code:

I (94992) OTA: Finished receiving OTA
I (94992) esp_image: segment 0: paddr=0x00190020 vaddr=0x3f400020 size=0x2ba14 (178708) map
I (95052) esp_image: segment 1: paddr=0x001bba3c vaddr=0x3ffb0000 size=0x03e78 ( 15992)
I (95062) esp_image: segment 2: paddr=0x001bf8bc vaddr=0x40080000 size=0x00404 (  1028)
I (95062) esp_image: segment 3: paddr=0x001bfcc8 vaddr=0x40080404 size=0x00350 (   848)
I (95072) esp_image: segment 4: paddr=0x001c0020 vaddr=0x400d0020 size=0xbfc00 (785408) map
E (95342) esp_image: invalid segment length 0x11b81aa8
I (95342) OTA: err:0x1503 (ESP_ERR_OTA_VALIDATE_FAILED)
I (95672) pppos_example: OTA partition SHA-256: c8cefb3f000000000000000080b0fc3fba49fd3f41cefb3f3ccffb3f00000000

I (604502) OTA: Writing OTA, chunk size:784, no:1042, tot:816928, ota_size:1067792, part_size:1572864, offset:816928, final:(yes)
I (604532) OTA: Finished receiving OTA
I (604532) esp_image: segment 0: paddr=0x00190020 vaddr=0x3f400020 size=0x2ba24 (178724) map
E (604592) esp_image: invalid segment length 0x66800006
I (604592) OTA: err:0x1503 (ESP_ERR_OTA_VALIDATE_FAILED)
However, it varies which image segment gets an illegal length.
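
The chunk numbers in the second log can be sanity-checked with a little arithmetic. The sketch below is plain C with hypothetical helper names (chunk_count, chunk_offset, chunk_length), not code from this thread. It assumes 1024-byte chunks, which is not what the code posted below uses (1380) but is consistent with the logged final chunk: no:1042 with size 784 for an ota_size of 1067792.

```c
#include <assert.h>
#include <stddef.h>

/* Number of chunks needed to carry total_size bytes (hypothetical helper). */
static size_t chunk_count(size_t total_size, size_t chunk_size)
{
    assert(chunk_size > 0);
    return (total_size + chunk_size - 1) / chunk_size;
}

/* Byte offset of chunk chunk_no inside the image. */
static size_t chunk_offset(size_t chunk_size, size_t chunk_no)
{
    return chunk_size * chunk_no;
}

/* Payload length of chunk chunk_no. Only the last chunk may be short;
 * a chunk number past the end of the image yields 0. */
static size_t chunk_length(size_t total_size, size_t chunk_size, size_t chunk_no)
{
    size_t offset = chunk_offset(chunk_size, chunk_no);
    if (offset >= total_size) {
        return 0;
    }
    size_t remaining = total_size - offset;
    return remaining < chunk_size ? remaining : chunk_size;
}
```

For these inputs, chunk_count(1067792, 1024) is 1043 and chunk_length(1067792, 1024, 1042) is 784, at byte offset 1067008. Deriving the write offset from the chunk number, rather than from how many chunks happened to arrive, keeps a lost packet from shifting everything after it.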

The relevant code looks like this (the root will invoke send_ota_to_node, the nodes invoke get_message):

Code:

static uint8_t tx_buf[TX_SIZE] = { 0, };
static uint8_t rx_buf[RX_SIZE] = { 0, };

#define ota_chunk_max_size  (1400)
static const uint16_t ota_chunk_size = 1380;
const esp_partition_t *current_partition;
const esp_partition_t *ota_partition;

typedef struct {
    uint8_t prog_data[ota_chunk_max_size + 1];
    uint16_t chunk_no;
    size_t chunk_size;
    size_t ota_size;
    uint8_t ota_version;
    bool final_chunk;
    uint16_t crc16;
    uint8_t token_id;
    uint16_t token_value;
} mesh_ota_update_t;

mesh_ota_update_t mesh_ota_update = {
    .prog_data = {0},
    .chunk_no = 0,
    .chunk_size = 0,
    .ota_size = 0,
    .ota_version = ota_version,
    .final_chunk = false,
    .crc16 = 0,
    .token_id = MESH_TOKEN_ID,
    .token_value = MESH_TOKEN_VALUE,
};

void send_ota_to_node(void *pvParameter) {
  mesh_addr_t route_table[100];
  mesh_data_t data;
  esp_err_t err;
  int i;
  bool all_data_sent;
  int flag = MESH_DATA_P2P;
  int route_table_size = 0;
  esp_mesh_set_self_organized(1, 0);
  
  mesh_ota_update.ota_version = ota_version;
  mesh_ota_update.ota_size = (ota_data.image_len + 63) & ~(size_t)63; // Rounds the sent image size up to a multiple of 64 bytes (necessary?)
  esp_mesh_get_routing_table(route_table, sizeof(route_table), &route_table_size); // length is in bytes and must not exceed the buffer
  uint8_t *chunk_buffer = (uint8_t *) malloc(ota_chunk_size + 1);
  
  while(route_table_size < 2) {
    esp_mesh_get_routing_table(route_table, sizeof(route_table), &route_table_size);
    vTaskDelay(1000 / portTICK_PERIOD_MS);
    ESP_LOGI(TAG, "OTA waiting...");
  }

  ESP_LOGI(TAG, "OTA proceeding");
  
  // Send the whole image to every node in the routing table except ourselves.
  for (i = 0; i < route_table_size; i++) {
    if (memcmp(route_table[i].addr, mesh_self_addr.addr, sizeof(mesh_self_addr.addr)) != 0) {
      all_data_sent = false;
      mesh_ota_update.chunk_no = 0;
      while (!all_data_sent) {
        send_count++;
        ESP_LOGI(TAG, "Upping send_count");
        tx_buf[25] = (send_count >> 24) & 0xff;
        tx_buf[24] = (send_count >> 16) & 0xff;
        tx_buf[23] = (send_count >> 8) & 0xff;
        tx_buf[22] = (send_count >> 0) & 0xff;
        if(mesh_ota_update.ota_size > (ota_chunk_size * (mesh_ota_update.chunk_no + 1))) {
          //ESP_LOGI(TAG, "Partition size is larger than the end of this chunk");
          mesh_ota_update.chunk_size = ota_chunk_size;
          mesh_ota_update.final_chunk = false;
        } else {
          ESP_LOGI(TAG, "Partition size is not larger than the end of this chunk (%u)", ota_chunk_size * mesh_ota_update.chunk_no);
          mesh_ota_update.chunk_size = mesh_ota_update.ota_size - (ota_chunk_size * mesh_ota_update.chunk_no);

          mesh_ota_update.final_chunk = true;
          all_data_sent = true;
        }
        ESP_LOGI(TAG, "Read %u bytes from partition @ %u", mesh_ota_update.chunk_size, mesh_ota_update.chunk_no * ota_chunk_size);
        ESP_ERROR_CHECK(esp_partition_read(current_partition, mesh_ota_update.chunk_no * ota_chunk_size, chunk_buffer, mesh_ota_update.chunk_size));
        memcpy(mesh_ota_update.prog_data, chunk_buffer, mesh_ota_update.chunk_size);
        mesh_ota_update.crc16 = crc16(mesh_ota_update.prog_data, mesh_ota_update.chunk_size);
        memcpy(tx_buf, (uint8_t *)&mesh_ota_update, sizeof(mesh_ota_update_t));
        data.data = tx_buf;
        data.size = sizeof(mesh_ota_update_t);
        data.proto = MESH_PROTO_BIN;
        err = esp_mesh_send(&route_table[i], &data, flag, NULL, 0);
        ESP_LOGW(MESH_TAG, "Sending OTA to: "MACSTR", chunk no:%u chunk size:%u struct size: %u err:0x%x (%s)", MAC2STR(route_table[i].addr), mesh_ota_update.chunk_no, mesh_ota_update.chunk_size, sizeof(mesh_ota_update_t), err, esp_err_to_name(err));
        mesh_ota_update.chunk_no++;
      }
      ESP_LOGI(MESH_TAG, "Finished OTA to "MACSTR", waiting for next", MAC2STR(route_table[i].addr));
    }
  }
  ESP_LOGI(MESH_TAG, "Finished OTA, ending");
  free(chunk_buffer); // release the read buffer before this task deletes itself
  xTaskCreate(&send_message_to_node, "send_message_to_node", 4096, NULL, 5, NULL);
  vTaskDelete(NULL);
}
  
void get_message(void *pvParameter)
{
  mesh_addr_t from;
  esp_err_t err;
  int send_count = 0;
  int recv_count = 0;
  int flag = 0;
  mesh_data_t data;
  data.data = rx_buf;
  size_t chunk_size = 0;
  
  while(1) {
    data.size = RX_SIZE;
    err = esp_mesh_recv(&from, &data, portMAX_DELAY, &flag, NULL, 0);
    if (err != ESP_OK || !data.size) {
      ESP_LOGI(MESH_TAG, "err:0x%x (%s), size:%d", err, esp_err_to_name(err), data.size);
      continue;
    }
    
    if (data.size >= sizeof(send_count)) {
      send_count = (data.data[25] << 24) | (data.data[24] << 16) | (data.data[23] << 8) | data.data[22];
    }
    
    recv_count++;

    if(data.proto == MESH_PROTO_BIN) {
      mesh_ota_update_t *in = (mesh_ota_update_t *) data.data;
      ESP_LOGI(MESH_TAG, "Receive OTA from "MACSTR", size:%d, heap:%d, chunk:%u, ver:%u [err:0x%x, proto:%d, tos:%d]", MAC2STR(from.addr), data.size, esp_get_free_heap_size(), in->chunk_no, in->ota_version, err, data.proto, data.tos);
      if(in->chunk_no == 0) {
        ota_in_progress = true;
        ESP_LOGI("OTA", "Began receiving OTA, size:%d", in->ota_size);
        ESP_ERROR_CHECK(esp_ota_begin(ota_partition, OTA_SIZE_UNKNOWN, &ota_handle));
        chunk_size = in->chunk_size;
        memcpy(&new_app_info, &in->prog_data[sizeof(esp_image_header_t) + sizeof(esp_image_segment_header_t)], sizeof(esp_app_desc_t));
      }

      ESP_LOGI("OTA", "Writing OTA, chunk size:%d, no:%d, tot:%d, ota_size:%d, part_size:%d, offset:%d, crc16:0x%04X, final:%s", in->chunk_size, in->chunk_no, in->chunk_size * in->chunk_no, in->ota_size, ota_partition->size, write_offset, in->crc16, in->final_chunk ? "(yes)":"(no)");
      uint16_t data_crc16 = crc16(in->prog_data, in->chunk_size);
      if (data_crc16 == in->crc16) {
        ESP_ERROR_CHECK(esp_ota_write(ota_handle, (const void *)in->prog_data, in->chunk_size));
      } else {
        // esp_ota_write() appends sequentially, so a chunk skipped here shifts all later data in flash.
        ESP_LOGE("OTA", "CRC mismatch! Received crc16:0x%04X, but data has crc16:0x%04X", in->crc16, data_crc16);
      }
      if(in->final_chunk) {
        ESP_LOGI("OTA", "Finished receiving OTA");
        err = esp_ota_end(ota_handle);
        if (err != ESP_OK) {
          ESP_LOGI("OTA", "err:0x%x (%s)", err, esp_err_to_name(err));
          uint8_t ota_sha256[32]; // a SHA-256 digest is 32 bytes, so no heap allocation is needed
          err = esp_partition_get_sha256(ota_partition, ota_sha256);
          print_sha256(ota_sha256, "OTA partition SHA-256");
        } else {
          ESP_LOGI("OTA", "Image verified, rebooting");
          ESP_ERROR_CHECK(esp_ota_set_boot_partition(ota_partition));
          esp_restart();
        }
      }
    }
  }
  vTaskDelete(NULL);
}
Best regards,
Preben
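
The crc16() helper called in the code above is not shown in the post, so its exact variant is unknown. As a stand-in for experimenting off-target, here is a minimal bitwise CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF). The function name crc16_ccitt_false and the choice of variant are assumptions, not the author's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, no reflection, no final XOR.
 * Hypothetical stand-in for the crc16() used in the post. */
static uint16_t crc16_ccitt_false(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);   /* feed the next byte, MSB first */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000) {
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}
```

The standard check input "123456789" gives 0x29B1 for this variant. Sender and receiver must use the same variant and compute it over exactly chunk_size bytes of prog_data, otherwise every chunk will look corrupt.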

Snedig
Posts: 24
Joined: Sat Apr 04, 2020 3:18 pm

Re: Clone running firmware over mesh

Postby Snedig » Fri May 15, 2020 5:01 pm

After some more checking, it seems some of my packets are being dropped in the mesh, so once I sort that out the problem might resolve itself. Any tips here are welcome :)

willemmerson
Posts: 40
Joined: Mon Mar 18, 2019 12:34 pm

Re: Clone running firmware over mesh

Postby willemmerson » Tue May 19, 2020 8:20 am


Snedig
Posts: 24
Joined: Sat Apr 04, 2020 3:18 pm

Re: Clone running firmware over mesh

Postby Snedig » Wed May 20, 2020 4:03 pm

I can't use ESP-MDF as it doesn't have chain topology support, and I need to be able to handle up to 100 ESPs in a chain.

That being said, mupgrade is pretty well explained, so that is the route I'm working on at the moment: send everything over and use partition_write, then check the result and request any missing pieces.
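
The "check and request any missing pieces" step can be sketched in plain C as below. This is not the mupgrade implementation and not the author's code; chunk_tracker_t, tracker_init, tracker_mark, tracker_missing and the MAX_CHUNKS bound are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHUNKS 2048   /* hypothetical upper bound on chunks per image */

typedef struct {
    bool   received[MAX_CHUNKS];
    size_t total_chunks;
} chunk_tracker_t;

/* Start a transfer of total_chunks chunks with nothing received yet. */
static void tracker_init(chunk_tracker_t *t, size_t total_chunks)
{
    assert(total_chunks <= MAX_CHUNKS);
    t->total_chunks = total_chunks;
    for (size_t i = 0; i < MAX_CHUNKS; i++) {
        t->received[i] = false;
    }
}

/* Record a chunk that arrived and passed its CRC check.
 * Duplicates are harmless; out-of-range numbers are ignored. */
static void tracker_mark(chunk_tracker_t *t, size_t chunk_no)
{
    if (chunk_no < t->total_chunks) {
        t->received[chunk_no] = true;
    }
}

/* Copy up to max_out missing chunk numbers into out and return the count,
 * so the node can ask the root to resend exactly those chunks. */
static size_t tracker_missing(const chunk_tracker_t *t, uint16_t *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < t->total_chunks && n < max_out; i++) {
        if (!t->received[i]) {
            out[n++] = (uint16_t)i;
        }
    }
    return n;
}
```

With this kind of bookkeeping each chunk would be written at its own offset (chunk number times chunk size) rather than appended, so chunks may arrive in any order and a dropped packet only leaves a hole to be filled later.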

Snedig
Posts: 24
Joined: Sat Apr 04, 2020 3:18 pm

Re: Clone running firmware over mesh

Postby Snedig » Sat Sep 12, 2020 1:13 pm

I figured this out. It turns out my problems were bad packets that had to be resent, and since I was using a chunk size that differed from the alignment size of the storage, the following packet was being left in a corrupt state.

I changed the code to use 1024 as the firmware chunk size, and made sure to redownload all 4 chunks belonging to a sector on a corrupt packet, so now it works :)
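
One reason a whole sector has to be redone: NOR flash programming can only clear bits, and erasing happens in 4096-byte sectors. The toy model below is plain C that runs on a PC, not ESP-IDF code. The names sim_flash, sim_write and retry_chunk are hypothetical, and whether this is exactly the failure seen above is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 4096u
#define CHUNK_SIZE  1024u

/* Toy model of a single NOR flash sector. */
static uint8_t sim_flash[SECTOR_SIZE];

/* Erase sets every bit in the sector to 1. */
static void sim_erase_sector(void)
{
    memset(sim_flash, 0xFF, sizeof(sim_flash));
}

/* Programming can only turn 1 bits into 0 bits, modeled as a bitwise AND. */
static void sim_write(size_t offset, const uint8_t *src, size_t len)
{
    assert(offset + len <= SECTOR_SIZE);
    for (size_t i = 0; i < len; i++) {
        sim_flash[offset + i] &= src[i];
    }
}

/* Scenario: chunk 1 of the sector is first written with damaged data, then
 * the good data arrives. Returns 1 if flash ends up holding the good data. */
static int retry_chunk(int erase_first)
{
    uint8_t good[CHUNK_SIZE];
    uint8_t bad[CHUNK_SIZE];
    memset(good, 0xA5, sizeof(good));
    memset(bad, 0x5A, sizeof(bad));      /* stands in for a corrupt packet */

    sim_erase_sector();
    sim_write(1 * CHUNK_SIZE, bad, sizeof(bad));

    if (erase_first) {
        sim_erase_sector();              /* also wipes the other three chunks */
    }
    sim_write(1 * CHUNK_SIZE, good, sizeof(good));
    return memcmp(&sim_flash[1 * CHUNK_SIZE], good, sizeof(good)) == 0;
}
```

In this model retry_chunk(0) returns 0, because writing over the damaged bytes just ANDs the two patterns together, while retry_chunk(1) returns 1. The price of the erase is that the other three 1024-byte chunks in the sector are gone as well and have to be fetched again.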

phille
Posts: 4
Joined: Thu Feb 25, 2021 8:08 pm

Re: Clone running firmware over mesh

Postby phille » Sun Feb 28, 2021 7:31 pm

Hi Preben,
I'm looking for a similar solution for my ESP-IDF mesh: cloning the root firmware to the other nodes. I have not managed it yet, and then I came across your post here :)
How did you solve the "and made sure to redownload all 4 chunks belonging to a sector on a corrupt packet" issue?
Or is there any way you can share all your code for the OTA part? I have been struggling with this for some time now.

Snedig
Posts: 24
Joined: Sat Apr 04, 2020 3:18 pm

Re: Clone running firmware over mesh

Postby Snedig » Wed Apr 28, 2021 12:12 pm

Hi,

Sorry about the late reply, I hadn't checked the forum for a while.

My code for the checking is this:

Code:

for (i = 0; i <= last_chunk_no; i++) {
  if (mesh_ota_log[i].written) {
    // Read the chunk back from flash and compare its CRC with the one logged when it was written
    ESP_ERROR_CHECK_WITHOUT_ABORT(esp_partition_read(ota_partition, OTA_CHUNK_SIZE * i, &check_ota_buffer, mesh_ota_log[i].chunk_size));
    read_crc = crc((unsigned char *)check_ota_buffer, mesh_ota_log[i].chunk_size);
    if (read_crc != mesh_ota_log[i].crc) {
      // First chunk of the 4096-byte flash sector that contains chunk i
      uint16_t sector_first_chunk = i - (((i * OTA_CHUNK_SIZE) % 4096) / OTA_CHUNK_SIZE);
      ESP_LOGW(MESH_OTA_TAG, "CRC error, chunk %d (%d) (0x%04X) not matching written (0x%04X), chunk %d-%d marked missing", i, mesh_ota_log[i].chunk_size, mesh_ota_log[i].crc, read_crc, sector_first_chunk, sector_first_chunk + 3);
      // Erase the whole sector, then mark all 4 chunks in it as missing so they are requested again
      erase_sector_start = ((i * OTA_CHUNK_SIZE) / 4096) * 4096;
      //ESP_LOGW("OTA", "Missing chunk %d, erasing sector %d @ %d - %d", i, erase_sector_start / 4096, erase_sector_start, erase_sector_end);
      esp_partition_erase_range(ota_partition, erase_sector_start, 4096);
      for (uint16_t j = sector_first_chunk; j < sector_first_chunk + 4; j++) {
        ESP_LOGI(MESH_OTA_TAG, "Marking chunk %d missing..", j);
        mesh_ota_log[j].written = false;
        mesh_ota_log[j].chunk_size = 0;
        mesh_ota_log[j].crc = 0;
      }
    }
  }
}
I am using a struct array called mesh_ota_log to keep track of received chunks, but the salient points are these two lines:
erase_sector_start = ((i * OTA_CHUNK_SIZE) / 4096) * 4096;
esp_partition_erase_range(ota_partition, erase_sector_start, 4096);
They make sure that a 4096-byte-aligned sector is erased if there is an error in a received packet. I then request the 4 erased chunks (OTA_CHUNK_SIZE is 1024) for redownload.
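
The sector arithmetic can also be isolated into a small helper that is easy to test off-target. This is a sketch with hypothetical names (sector_span_t, sector_span_for_chunk), not the code I run. Unlike the loop above, it clamps the range to the last chunk of the image, on the assumption that the log array has no entries beyond last_chunk_no.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t erase_offset;  /* byte offset of the sector to erase */
    size_t first_chunk;   /* first chunk stored in that sector */
    size_t last_chunk;    /* last chunk in that sector, inclusive and clamped */
} sector_span_t;

/* For a corrupt chunk, work out which sector to erase and which chunks to
 * request again. The sector size must be a multiple of the chunk size. */
static sector_span_t sector_span_for_chunk(size_t chunk_no, size_t chunk_size,
                                           size_t sector_size, size_t last_chunk_no)
{
    assert(chunk_size > 0 && sector_size % chunk_size == 0);
    size_t chunks_per_sector = sector_size / chunk_size;

    sector_span_t span;
    span.first_chunk  = (chunk_no / chunks_per_sector) * chunks_per_sector;
    span.erase_offset = span.first_chunk * chunk_size;
    span.last_chunk   = span.first_chunk + chunks_per_sector - 1;
    if (span.last_chunk > last_chunk_no) {
        span.last_chunk = last_chunk_no;   /* the image may end mid-sector */
    }
    return span;
}
```

For chunk 5 with 1024-byte chunks and 4096-byte sectors this gives erase offset 4096 and chunks 4 to 7, matching the two lines above. For a final chunk such as 1042 it gives chunks 1040 to 1042 instead of running one past the end.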


Preben
