How to implement continuous speech recognition

inkfish321
Posts: 14
Joined: Thu Apr 22, 2021 6:23 am

How to implement continuous speech recognition

Postby inkfish321 » Fri Feb 03, 2023 8:38 am

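I implemented basic speech recognition based on the example esp-adf\examples\speech_recognition\wwe,
but continuous speech recognition does not work: I have to say the wake word first every time. I then followed this description: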

Code:

////////////////////////////////////// Use case 4: Multi dialog after voice wakeup ////////////////////////
                         +                                                        + wakeup time +
                         |                                                        +-------------+
                         |                                                        |             |
                         |                                                        |             |
Wakeup Time  +-----------+                                                        |             +---------+
                +--------+         +----------+      +-------+   +---------+      |             |
                | wakeup |         |  voice   |      | music |   |  voice  |      |             |
                |   word |         |          |      |       |   |         |      |             |
Voice level  +--+--------+---------+          +------+       +---+         +------------------------------+
                         |                    |                  |         |      |             |
                         |                    +----------------+ |         |      |             |
                         |       SUSPEND ON \ |   SUSPEND OFF\ | |         |      |             |
                         |                   \|               \| |         |      |             |
SUSPEND ON   +--------------------------------+                +------------------------------------------+
                         |                    |                  |         |      |             |
                         |         +----------+                  |         +------+             |
                         |         |          |                  |         |      |             |
                         |         |          |                  |         |      |             |
EVENT        +---------------------+          +-----------------------------------------------------------+
                         |\        |\         |\                /|         |      |\            |\
                         | \       | \        | \              / |         | vad  | \           | \
                         |  WAKEUP |  VAD     |  VAD       VAD   |         | off  | VAD         |  WAKEUP
                         +  START  +  START   +  STOP      START +         + time + STOP        +  END

///////////////////////////////////////////////////////////////////////////////////////////////////////////

Each time an interactive voice prompt is played, I first call the SUSPEND ON API, and once playback has finished I call SUSPEND OFF:

audio_rec_cfg_t cfg = AUDIO_RECORDER_DEFAULT_CFG();
cfg.pinned_core = RECORD_INPUT_TASK_CPU_CORE;
cfg.read = (recorder_data_read_t)&input_cb_for_afe;
cfg.sr_handle = recorder_sr_create(&recorder_sr_cfg, &cfg.sr_iface);

srHandle = cfg.sr_handle;
srIface = cfg.sr_iface;

// SUSPEND ON
srIface->afe_suspend(srHandle, true);

// Wait until the AFE reports that it is suspended
do {
    srIface->base.get_state(srHandle, &sr_st);
    if (sr_st.afe_state == SUSPENDED) {
        ESP_LOGI(TAG, "sr_st.afe_state SUSPENDED");
        break;
    }
} while (1);

// Play the interactive voice prompt
if (app_quest_cmd_process(void_cmd_id) == true) {
    mIfStop = false;
}

// SUSPEND OFF
srIface->afe_suspend(srHandle, false);

// Wait until the AFE is no longer suspended
do {
    srIface->base.get_state(srHandle, &sr_st);
    if (sr_st.afe_state != SUSPENDED) {
        ESP_LOGI(TAG, "sr_st.afe_state RUNNING");
        break;
    }
} while (1);

Could you provide a complete example?
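
The two `do { ... } while (1)` loops in the post spin without yielding and never give up if the state does not change. A minimal sketch of a bounded polling helper is shown below. It is plain C with no ESP-ADF dependency: `poll_until`, `poll_check_fn`, `poll_delay_fn` and the `fake_*` stand-ins are hypothetical names made up for this illustration, and on FreeRTOS the delay hook could wrap `vTaskDelay(pdMS_TO_TICKS(ms))`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: returns true once the awaited condition holds. */
typedef bool (*poll_check_fn)(void *ctx);
/* Hypothetical delay hook; on FreeRTOS it could wrap vTaskDelay(pdMS_TO_TICKS(ms)). */
typedef void (*poll_delay_fn)(int ms);

/* Poll `check` every `poll_ms` until it returns true or `timeout_ms` has elapsed.
 * Returns true on success and false on timeout. */
static bool poll_until(poll_check_fn check, void *ctx,
                       poll_delay_fn delay, int poll_ms, int timeout_ms)
{
    assert(check != NULL && delay != NULL && poll_ms > 0);
    for (int waited = 0; ; waited += poll_ms) {
        if (check(ctx)) {
            return true;
        }
        if (waited >= timeout_ms) {
            return false;
        }
        delay(poll_ms);   /* yield instead of busy-waiting */
    }
}

/* Stand-ins for trying the helper off-target: a fake state that
 * reports "suspended" only after a given number of polls. */
typedef struct {
    int polls_until_ready;
    int polls_seen;
} fake_afe_t;

static bool fake_is_suspended(void *ctx)
{
    fake_afe_t *afe = (fake_afe_t *)ctx;
    afe->polls_seen++;
    return afe->polls_seen > afe->polls_until_ready;
}

static int fake_delay_total_ms;

static void fake_delay(int ms)
{
    fake_delay_total_ms += ms;
}
```

In the posted code, the check callback would call `srIface->base.get_state(srHandle, &sr_st)` and compare `sr_st.afe_state` against `SUSPENDED`; a `false` return from the helper then signals that the suspend request never took effect within the timeout.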

Re: How to implement continuous speech recognition

Postby inkfish321 » Mon Feb 06, 2023 8:19 am

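I took a closer look at the code and changed it to

audio_recorder_trigger_stop(recorder);

// Play the interactive voice prompt
.........

audio_recorder_trigger_start(recorder);

but it still does not work.

The log output is as follows: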
I (15005) APP_RECORDER: rec_engine_cb - AUDIO_REC_COMMAND_DECT
W (15006) APP_RECORDER: command 16
I (15006) AUDIO_RECORDER: RECORDER_CMD_TRIGGER_STOP [state 5]
I (15006) APP_RECORDER: rec_engine_cb - REC_EVENT_WAKEUP_END
I (15007) uart_events: app_player_tone_resume
I (15008) uart_events: app_player_tone_resume
I (15009) uart_events: app_player_quest 1 0
I (15009) AUDIO_RECORDER: RECORDER_CMD_TRIGGER_START
I (15010) AUDIO_RECORDER: Recorder update state, cur 0, event 2
I (15011) APP_RECORDER: rec_engine_cb - REC_EVENT_WAKEUP_START
I (15011) AUDIO_RECORDER: Recorder update state, cur 1, event 0
I (15035) APP_RECORDER: voice read begin
W (15036) AUDIO_RECORDER: Not in speeching, return 0
W (15036) APP_RECORDER: audio recorder read finished 0
I (15094) uart_events: mPlayerStateRecord = 1
I (15094) uart_events: mPlayerVolRecord = 80
I (15240) AUDIO_RECORDER: Recorder update state, cur 1, event 1
I (15401) AUDIO_RECORDER: Recorder update state, cur 2, event 4
I (15401) APP_RECORDER: rec_engine_cb - REC_EVENT_VAD_START 1
I (15401) APP_RECORDER: rec_engine_cb - REC_EVENT_VAD_START 2
I (15402) APP_RECORDER: voice read begin
I (15441) AUDIO_RECORDER: Recorder update state, cur 3, event 1
I (15457) AUDIO_RECORDER: Recorder update state, cur 3, event 0
I (15563) AUDIO_RECORDER: Recorder update state, cur 4, event 1
I (16334) AUDIO_RECORDER: Recorder update state, cur 3, event 0
I (16348) AUDIO_RECORDER: Recorder update state, cur 4, event 1
I (16523) AUDIO_RECORDER: Recorder update state, cur 3, event 0
I (16528) AUDIO_RECORDER: Recorder update state, cur 4, event 1
I (16971) AUDIO_RECORDER: Recorder update state, cur 3, event 0
I (17094) uart_events: mPlayerStateRecord = 4
I (17271) AUDIO_RECORDER: Recorder update state, cur 4, event 4
I (17272) APP_RECORDER: rec_engine_cb - REC_EVENT_VAD_STOP 1
I (17272) APP_RECORDER: rec_engine_cb - REC_EVENT_VAD_STOP 2
I (17294) AUDIO_RECORDER: Recorder update state, cur 5, event 0
I (17308) APP_RECORDER: voice read stopped void_cmd_id = -100, voiceLen = 66560
I (17611) AUDIO_RECORDER: Recorder update state, cur 5, event 1
I (17772) AUDIO_RECORDER: Recorder update state, cur 2, event 4
I (17772) APP_RECORDER: rec_engine_cb - REC_EVENT_VAD_START 1
I (17772) APP_RECORDER: rec_engine_cb - REC_EVENT_VAD_START 2
I (17773) uart_events: app_player_pause_for_tone = 4
I (17773) APP_RECORDER: voice read begin
I (17801) AUDIO_RECORDER: Recorder update state, cur 3, event 1
I (18309) APP_RECORDER: reset_avd_timer_handler
I (18526) AUDIO_RECORDER: Recorder update state, cur 3, event 0
I (18827) AUDIO_RECORDER: Recorder update state, cur 4, event 4
I (18827) APP_RECORDER: rec_engine_cb - REC_EVENT_VAD_STOP 1
I (18828) APP_RECORDER: rec_engine_cb - REC_EVENT_VAD_STOP 2
I (18834) AUDIO_RECORDER: Recorder update state, cur 5, event 0
I (18848) APP_RECORDER: voice read stopped void_cmd_id = -101, voiceLen = 39936
I (19728) AUDIO_RECORDER: Recorder update state, cur 5, event 3
I (19728) APP_RECORDER: rec_engine_cb - REC_EVENT_WAKEUP_END
I (19728) uart_events: app_player_tone_resume

Looking further into the code, I found that in lines such as
AUDIO_RECORDER: Recorder update state, cur 3, event 0
the state is 3 (RECORDER_ST_SPEECHING) and it keeps receiving event 0 (RECORDER_EVENT_NOISE_DECT).
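
The `cur N, event M` lines in the log can be replayed with a small table-driven state machine. The sketch below is inferred only from the transitions visible in the log, not from the ESP-ADF source: apart from state 3 (`RECORDER_ST_SPEECHING`) and event 0 (`RECORDER_EVENT_NOISE_DECT`), which the post names, all state and event names are placeholders, and treating events 3 and 4 as timer expiries is a guess.

```c
#include <assert.h>
#include <stddef.h>

/* Numbering follows the "cur N, event M" values in the log. Only
 * ST_SPEECHING (3) and EV_NOISE (0) are named in the post; the other
 * names are placeholders chosen for this sketch. */
typedef enum { ST_IDLE, ST_WAKEUP, ST_WAIT_SPEECH, ST_SPEECHING,
               ST_WAIT_SILENCE, ST_WAIT_SLEEP, ST_COUNT } rec_state_t;
typedef enum { EV_NOISE, EV_SPEECH, EV_WAKE, EV_WAKEUP_TIMEOUT,
               EV_VAD_TIMEOUT, EV_COUNT } rec_event_t;

/* Transition table built from the log; -1 means "stay in the current state".
 * Column order: NOISE, SPEECH, WAKE, WAKEUP_TIMEOUT, VAD_TIMEOUT. */
static const int next_state[ST_COUNT][EV_COUNT] = {
    [ST_IDLE]         = { -1,              -1,             ST_WAKEUP, -1,      -1 },
    [ST_WAKEUP]       = { -1,              ST_WAIT_SPEECH, -1,        -1,      -1 },
    [ST_WAIT_SPEECH]  = { -1,              -1,             -1,        -1,      ST_SPEECHING },
    [ST_SPEECHING]    = { ST_WAIT_SILENCE, -1,             -1,        -1,      -1 },
    [ST_WAIT_SILENCE] = { -1,              ST_SPEECHING,   -1,        -1,      ST_WAIT_SLEEP },
    [ST_WAIT_SLEEP]   = { -1,              ST_WAIT_SPEECH, -1,        ST_IDLE, -1 },
};

/* Apply one event to the current state. */
static rec_state_t step(rec_state_t cur, rec_event_t ev)
{
    assert(cur < ST_COUNT && ev < EV_COUNT);
    int next = next_state[cur][ev];
    return next < 0 ? cur : (rec_state_t)next;
}

/* Apply a sequence of events and return the final state. */
static rec_state_t replay(rec_state_t start, const rec_event_t *evs, size_t n)
{
    rec_state_t st = start;
    for (size_t i = 0; i < n; i++) {
        st = step(st, evs[i]);
    }
    return st;
}
```

Under this reading, `cur 3, event 0` moves the recorder into a wait-for-silence state from which speech (event 1) returns it to state 3, and the session only ends when event 3 arrives in state 5, which matches the `REC_EVENT_WAKEUP_END` at timestamp 19728.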

Tracing further, I found:

Code:

static void fetch_task(void *parameters)
{
    recorder_sr_t *recorder_sr = (recorder_sr_t *)parameters;
    recorder_sr->fetch_running = true;

    while (recorder_sr->fetch_running) {
        xEventGroupWaitBits(recorder_sr->events, FETCH_TASK_RUNNING, false, true, portMAX_DELAY);

        afe_fetch_result_t *res = esp_afe->fetch(recorder_sr->afe_handle);
#ifdef CONFIG_USE_MULTINET
        recorder_mn_detect(recorder_sr, res->data, res->wakeup_state);
#endif
        if (recorder_sr->afe_monitor) {
            recorder_sr->afe_monitor(recorder_sr_afe_result_convert(recorder_sr, res), recorder_sr->afe_monitor_ctx);
        }
        recorder_sr_output(recorder_sr, res->data, res->data_size);
    }
    xEventGroupClearBits(recorder_sr->events, FETCH_TASK_RUNNING);
    xEventGroupSetBits(recorder_sr->events, FETCH_TASK_DESTROY);
    vTaskDelete(NULL);
}

Code:

static inline int recorder_sr_afe_result_convert(recorder_sr_t *recorder_sr, afe_fetch_result_t *result)
{
    int ret = SR_RESULT_UNKNOW;
    ESP_LOGV(TAG, "wake %d, vad %d", result->wakeup_state, result->vad_state);
    switch (result->wakeup_state) {
        case WAKENET_CHANNEL_VERIFIED:
            ret = SR_RESULT_VERIFIED;
            break;
        case WAKENET_NO_DETECT:
            if (recorder_sr->vad_enable) {
                if (result->vad_state == AFE_VAD_SILENCE) {
                    ret = SR_RESULT_NOISE;
                } else if (result->vad_state == AFE_VAD_SPEECH) {
                    ret = SR_RESULT_SPEECH;
                } else {
                    ESP_LOGE(TAG, "vad state error");
                }
            } else {
                ret = SR_RESULT_SPEECH;
            }
            break;
        case WAKENET_DETECTED:
            ret = SR_RESULT_WAKEUP;
            break;
        default:
            break;
    }
    return ret;
}
RECORDER_SR: wake 0, vad 1

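The serial output shows that in state WAKENET_NO_DETECT the function returns SR_RESULT_NOISE, which prevents continuous speech recognition: I have to say the wake word every time before a voice command is recognized. How can this be improved?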

Joseph Tang
Posts: 1
Joined: Tue Feb 07, 2023 1:32 am

Re: How to implement continuous speech recognition

Postby Joseph Tang » Tue Feb 07, 2023 2:04 am

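1. At present, audio recorder does not support continuous command words, because a wake word is needed to mark the start of command-word detection, and multinet also has its own detection state.
2. afe_suspend is used to suspend wake-word detection.
3. For command-word detection, see `recorder_mn_detect`.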
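
The idea in point 1, that the wake word marks the start of command-word detection, can be pictured as a time window that the wake word opens. The sketch below is a toy model in plain C, not an audio recorder or multinet API: `cmd_gate_t` and the `gate_*` functions are hypothetical, and refreshing the window after each accepted command is just one possible policy for multi-turn dialog, not something this thread confirms ESP-ADF can do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical gate: commands are accepted only inside a window that a
 * wake word opens. Timestamps are in milliseconds; counter wrap-around
 * is ignored to keep the sketch short. */
typedef struct {
    bool     awake;        /* true while the window is open */
    uint32_t deadline_ms;  /* time at which the window closes */
    uint32_t window_ms;    /* window length granted per wake word or command */
} cmd_gate_t;

static void gate_init(cmd_gate_t *g, uint32_t window_ms)
{
    g->awake = false;
    g->deadline_ms = 0;
    g->window_ms = window_ms;
}

/* The wake word opens (or re-opens) the window. */
static void gate_on_wake_word(cmd_gate_t *g, uint32_t now_ms)
{
    g->awake = true;
    g->deadline_ms = now_ms + g->window_ms;
}

/* Close the window if it has expired, then report whether it is open. */
static bool gate_is_open(cmd_gate_t *g, uint32_t now_ms)
{
    if (g->awake && now_ms >= g->deadline_ms) {
        g->awake = false;
    }
    return g->awake;
}

/* Returns true if a command detected at `now_ms` should be accepted.
 * Accepting a command extends the window so a follow-up command does
 * not need a new wake word. */
static bool gate_on_command(cmd_gate_t *g, uint32_t now_ms)
{
    if (!gate_is_open(g, now_ms)) {
        return false;
    }
    g->deadline_ms = now_ms + g->window_ms;
    return true;
}
```

With a 5000 ms window, a wake word at t = 1000 followed by commands at t = 4000 and t = 8000 would both be accepted under this policy, while a command arriving after the refreshed deadline would again require the wake word.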

Re: How to implement continuous speech recognition

Postby inkfish321 » Tue Feb 07, 2023 7:17 am

Is there any way to solve this problem, then? Or when might it be solved?

ESP_William
Posts: 135
Joined: Tue Apr 24, 2018 5:54 am

Re: How to implement continuous speech recognition

Postby ESP_William » Mon Feb 13, 2023 11:30 am

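This is a limitation of the ESP-ADF architecture and is not easy to change in the short term.
If you are in a hurry, you can develop with esp-skainet instead, for example this demo: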
https://github.com/espressif/esp-skaine ... ecognition
