.. ELFEN documentation master file, created by
   sphinx-quickstart on Fri Dec 20 17:16:37 2024.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

ELFEN - Efficient Linguistic Feature Extraction for Natural Language Datasets
=============================================================================

ELFEN (Efficient Linguistic Feature Extraction for Natural Language Datasets)
is a Python package for extracting linguistic features from text datasets at
scale. It provides an extensive set of features that can be used to analyze
text data and NLP model outputs.

It is built on top of the modern dataframe package `polars`_, which allows it
to handle large datasets efficiently. The preprocessing backbones are built on
the popular NLP libraries `spaCy`_ and `stanza`_, so both lightweight and
state-of-the-art NLP models can be used for feature extraction in various
languages.

.. note::

   The package is actively maintained. If you encounter any issues or have
   any suggestions, please feel free to open an issue or a pull request on
   the `GitHub repository`_.

.. _GitHub repository: https://www.github.com/mmmaurer/elfen
.. _polars: https://www.pola.rs
.. _spaCy: https://spacy.io
.. _stanza: https://stanfordnlp.github.io/stanza/

.. toctree::
   :maxdepth: 4
   :caption: Getting Started

   installation
   quickstart

.. toctree::
   :caption: Guides

   tutorials
   custom_configuration
   feature_overview
   multilingual_support

.. toctree::
   :caption: API Reference

   elfen