语音识别 | Earth

通过腾讯云SDK实现语音识别功能，支持流式输出和客户端麦克风调用，以下是详细的实现流程：

一、准备环境

确保你的开发环境已经安装：
Python 3.6+
腾讯云SDK
PyAudio（用于麦克风录音）
腾讯云账号和API密钥

可以采用以下命令安装依赖：

pip install tencentcloud-sdk-python
pip install pyaudio
pip install websocket-client

二、获取腾讯云API密钥

登录腾讯云控制台，获取以下信息：

SecretId
SecretKey
AppId（语音识别应用ID）

在腾讯云控制台 > 访问管理 > API密钥管理中创建密钥。

三、配置环境变量

为了安全起见，建议将密钥信息配置为环境变量：

# Windows
set TENCENT_SECRET_ID=your_secret_id
set TENCENT_SECRET_KEY=your_secret_key
set TENCENT_APP_ID=your_app_id

# Linux/Mac
export TENCENT_SECRET_ID=your_secret_id
export TENCENT_SECRET_KEY=your_secret_key
export TENCENT_APP_ID=your_app_id

四、实现语音识别代码

4.1 基础配置

import os
import json
import time
import pyaudio
import threading
from tencentcloud.common import credential
from tencentcloud.common.profile.client_profile import ClientProfile
from tencentcloud.common.profile.http_profile import HttpProfile
from tencentcloud.asr.v20190614 import asr_client, models

# 配置信息
SECRET_ID = os.getenv('TENCENT_SECRET_ID')
SECRET_KEY = os.getenv('TENCENT_SECRET_KEY')
APP_ID = os.getenv('TENCENT_APP_ID')
REGION = "ap-beijing"  # 根据需要选择地域

# 音频参数
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000

4.2 实时语音识别类

class RealTimeASR:
    def __init__(self):
        self.cred = credential.Credential(SECRET_ID, SECRET_KEY)
        self.httpProfile = HttpProfile()
        self.httpProfile.endpoint = "asr.tencentcloudapi.com"
        
        self.clientProfile = ClientProfile()
        self.clientProfile.httpProfile = self.httpProfile
        
        self.client = asr_client.AsrClient(self.cred, REGION, self.clientProfile)
        
        self.audio = pyaudio.PyAudio()
        self.stream = None
        self.is_recording = False
        
    def start_recording(self):
        """开始录音"""
        self.stream = self.audio.open(
            format=FORMAT,
            channels=CHANNELS,
            rate=RATE,
            input=True,
            frames_per_buffer=CHUNK
        )
        self.is_recording = True
        print("开始录音...")
        
    def stop_recording(self):
        """停止录音"""
        self.is_recording = False
        if self.stream:
            self.stream.stop_stream()
            self.stream.close()
        print("录音结束")
        
    def recognize_stream(self, callback=None):
        """流式语音识别"""
        if not self.stream:
            print("请先开始录音")
            return
            
        audio_data = []
        
        while self.is_recording:
            try:
                data = self.stream.read(CHUNK, exception_on_overflow=False)
                audio_data.append(data)
                
                # 每收集一定量的音频数据就进行一次识别
                if len(audio_data) >= 10:  # 约0.64秒的音频
                    audio_bytes = b''.join(audio_data)
                    result = self._recognize_audio(audio_bytes)
                    
                    if result and callback:
                        callback(result)
                    
                    audio_data = []  # 清空缓存
                    
            except Exception as e:
                print(f"录音错误: {e}")
                break
                
    def _recognize_audio(self, audio_data):
        """调用腾讯云API进行语音识别"""
        try:
            req = models.CreateRecTaskRequest()
            req.EngineModelType = "16k_zh"  # 中文普通话
            req.ChannelNum = 1
            req.ResTextFormat = 0
            req.SourceType = 1
            
            # 将音频数据转换为base64
            import base64
            audio_base64 = base64.b64encode(audio_data).decode('utf-8')
            req.Data = audio_base64
            
            resp = self.client.CreateRecTask(req)
            
            # 获取任务ID
            task_id = resp.Data.TaskId
            
            # 轮询获取结果
            return self._get_recognition_result(task_id)
            
        except Exception as e:
            print(f"识别错误: {e}")
            return None
            
    def _get_recognition_result(self, task_id):
        """获取识别结果"""
        try:
            req = models.DescribeTaskStatusRequest()
            req.TaskId = task_id
            
            # 轮询等待结果
            for _ in range(30):  # 最多等待30秒
                resp = self.client.DescribeTaskStatus(req)
                
                if resp.Data.StatusStr == "success":
                    return resp.Data.Result
                elif resp.Data.StatusStr == "failed":
                    print("识别失败")
                    return None
                    
                time.sleep(1)  # 等待1秒后重试
                
            print("识别超时")
            return None
            
        except Exception as e:
            print(f"获取结果错误: {e}")
            return None

五、使用示例

def on_recognition_result(result):
    """识别结果回调函数"""
    print(f"识别结果: {result}")

def main():
    # 创建语音识别实例
    asr = RealTimeASR()
    
    try:
        # 开始录音
        asr.start_recording()
        
        # 创建识别线程
        recognition_thread = threading.Thread(
            target=asr.recognize_stream,
            args=(on_recognition_result,)
        )
        recognition_thread.start()
        
        # 等待用户输入停止
        input("按回车键停止录音...")
        
    except KeyboardInterrupt:
        print("\n用户中断")
    finally:
        # 停止录音
        asr.stop_recording()
        
        # 等待识别线程结束
        if 'recognition_thread' in locals():
            recognition_thread.join()
            
if __name__ == "__main__":
    main()

六、优化建议

6.1 错误处理

# 添加更完善的错误处理
try:
    # 语音识别代码
    pass
except Exception as e:
    print(f"发生错误: {e}")
    # 记录日志或进行其他处理

6.2 性能优化

调整音频缓存大小，平衡实时性和准确性
使用异步处理提高响应速度
添加音频质量检测，过滤噪音
实现断线重连机制

6.3 功能扩展

支持多语言识别
添加语音端点检测
实现语音转文字的实时显示
支持音频文件批量处理

七、注意事项

确保网络连接稳定，语音识别需要实时网络请求
注意API调用频率限制，避免超出配额
音频质量会直接影响识别准确率
妥善保管API密钥，避免泄露
根据实际需求选择合适的识别引擎模型

通过以上步骤，你就可以成功实现基于腾讯云SDK的实时语音识别功能了！

如果在使用过程中遇到问题，可以查看腾讯云官方文档或者在评论区留言交流。