GitHub - aarongeo1/Speech-Processing-Automation: Python-based toolset for automating the conversion and cleaning of speech transcript files, significantly enhancing the accuracy and efficiency of linguistic data processing.

Execution

To run the program, execute the Python script CodeRunner.py, located in the src directory. CodeRunner.py runs clean.py first and then transform.py, both of which are in the same directory. The scripts import only two standard-library modules, 'os' and 're'. The cleaned data can be found under /root/clean and the transformed data under /root/transformed.
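The run order described above can be sketched as follows. This is a minimal illustration, not the repository's actual CodeRunner.py: the names `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical, and the sketch assumes a `python` executable is available on the PATH.

```python
import os

# Order taken from the description above: clean.py runs before transform.py.
PIPELINE = ["clean.py", "transform.py"]


def build_commands(src_dir, python="python"):
    """Return one shell command per pipeline stage, in execution order."""
    return [f'{python} "{os.path.join(src_dir, name)}"' for name in PIPELINE]


def run_pipeline(src_dir, runner=os.system):
    """Run each stage in order and stop at the first non-zero exit status."""
    for command in build_commands(src_dir):
        status = runner(command)
        if status != 0:
            # A failed cleaning step should not feed the transform step.
            return status
    return 0
```

Calling `run_pipeline("src")` from the repository root would launch both scripts in sequence. Passing a different `runner` makes it possible to inspect the commands without executing them.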

Data Cleaning Module: Engineered a module to preprocess raw CHA (CHAT transcript) files, removing extraneous metadata, annotations, and non-alphabetic characters to produce cleaned text files ready for further analysis. Implemented regex-based transformations to ensure data integrity and uniformity.

Phonetic Transformation Tool: Designed and developed a script to map English words to their ARPABET phonetic representations, facilitating the study of phonetics in linguistic research. Integrated error handling and data validation mechanisms to maintain high accuracy levels.

Automation and Scalability: Automated the processing of an extensive dataset by recursively traversing directory structures, streamlining the workflow for cleaning and transforming hundreds of files with minimal manual intervention.

Performance Optimization: Employed efficient coding practices and optimized file handling operations to minimize processing time and resource usage, enabling large volumes of data to be processed.
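The cleaning, phonetic mapping, and recursive traversal steps described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in clean.py or transform.py: the function names are hypothetical, the three-word `ARPABET` table stands in for a full pronouncing dictionary, and the regular expressions cover only the most common CHAT conventions (header lines starting with `@`, dependent tiers starting with `%`, speaker lines starting with `*`, and bracketed annotations).

```python
import os
import re

# Hypothetical three-word lexicon; a real run would load a full dictionary.
ARPABET = {"the": "DH AH", "cat": "K AE T", "sat": "S AE T"}


def clean_line(line):
    """Return the lowercase words of one speaker line, or '' for other lines."""
    # Only speaker tiers ('*CHI:', '*MOT:', ...) carry utterance text.
    if not line.startswith("*"):
        return ""
    # Drop the speaker code at the start of the line.
    text = re.sub(r"^\*\w+:\s*", "", line)
    # Drop bracketed annotations such as '[//]' or '[= text]'.
    text = re.sub(r"\[[^\]]*\]", " ", text)
    # Replace everything that is not a letter or whitespace.
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    return " ".join(text.lower().split())


def to_arpabet(text, lexicon=ARPABET):
    """Map each word to its phonemes; unknown words are kept and flagged."""
    return " | ".join(lexicon.get(word, f"<{word}?>") for word in text.split())


def find_cha_files(root):
    """Yield every .cha path under root, descending into subdirectories."""
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            if name.lower().endswith(".cha"):
                yield os.path.join(folder, name)
```

Flagging unknown words instead of dropping them is one simple way to keep the validation step visible in the output; the repository may handle missing entries differently.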

Folders and files

Data
clean
src
transformed
Bias & Limitations.txt
LICENSE
README.md
justifications.txt
