ELFEN - Efficient Linguistic Feature Extraction for Natural Language Datasets
ELFEN (Efficient Linguistic Feature Extraction for Natural Language Datasets) is a Python package for extracting linguistic features from text datasets at scale. It provides an extensive set of features for analyzing text data and NLP model outputs. It is built on the dataframe library polars, which lets it handle large datasets efficiently. The preprocessing backbones are built on the popular NLP libraries spaCy and stanza, so both lightweight and state-of-the-art NLP models can be used for feature extraction in various languages.
Note
The package is actively maintained. If you encounter any issues or have suggestions, feel free to open an issue or a pull request on the GitHub repository.
Getting Started
Guides
API Reference
- Module documentation
- elfen.configs module
- elfen.custom module
- elfen.dependency module
- elfen.emotion module
- elfen.entities module
- elfen.extractor module
- elfen.features module
- elfen.information module
- elfen.lexical_richness module
- elfen.morphological module
- elfen.pos module
- elfen.preprocess module
- elfen.psycholinguistic module
- elfen.ratios module
- elfen.readability module
- elfen.resources module
- elfen.schemas module
- elfen.semantic module
- elfen.surface module
- elfen.util module