Key features of Earkind include:
- The use of language models (LMs) combined with neural expressive text-to-speech and programmatic audio editing to create full podcast episodes and descriptions.
- Crawling title and abstract information from arXiv papers, as well as extracting other details from the raw PDF text using the chatGPT API.
- System and user prompts designed for each section and subsection of the podcast, allowing for 0 or 1-shot generation depending on prompt complexity.
- Engaging content presented as a conversation between characters, including the enthusiastic host, the sarcastic analyst, and the knowledgeable research expert.
- Editing the podcast with a variety of jingles, sound effects, and background music using Pydub, ensuring a polished and professional audio experience.
- Automatic generation of podcast descriptions with timestamps and titles using chatGPT.