Feature Overview

The table below gives an overview of the features implemented in ELFEN, grouped by feature area. For each feature, it lists the feature area, the feature name, the column name in the extracted dataframe, the extracting function, a description, and notes where applicable.

Feature

Feature Area

Feature name

Name in extracted dataframe

Function

Description

Notes

Raw sequence length/total number of characters

surface

raw_sequence_length

raw_sequence_length

get_raw_sequence_length

Number of characters in the text (including whitespace)

Number of tokens

surface

n_tokens

n_tokens

get_num_tokens

Number of tokens in the text

Number of sentences

surface

n_sentences

n_sentences

get_num_sentences

Number of sentences in the text

Number of tokens per sentence

surface

tokens_per_sentence

tokens_per_sentence

get_num_tokens_per_sentence

Average number of tokens per sentence: n_tokens / n_sentences

Number of characters

surface

n_characters

n_characters

get_num_characters

Number of characters in the text (excluding whitespace)

Characters per sentence

surface

characters_per_sentence

characters_per_sentence

get_chars_per_sentence

Average number of characters per sentence: n_characters / n_sentences

Raw sequence length per sentence

surface

raw_length_per_sentence

raw_length_per_sentence

get_raw_length_per_sentence

Average raw sequence length (characters including whitespace) per sentence: raw_sequence_length / n_sentences

Average word length

surface

avg_word_length

avg_word_length

get_avg_word_length

Average word length (in characters): n_characters / n_tokens
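The surface ratios above follow directly from three counts. A minimal sketch in plain Python, assuming whitespace tokenization and a sentence count supplied by the caller (both are simplifications for illustration; `surface_features` is a hypothetical helper, not an ELFEN function):

```python
def surface_features(text, tokens, n_sentences):
    """Compute surface counts and ratios as defined in the table rows above."""
    raw_sequence_length = len(text)  # includes whitespace
    n_characters = sum(not ch.isspace() for ch in text)  # excludes whitespace
    n_tokens = len(tokens)
    return {
        "raw_sequence_length": raw_sequence_length,
        "n_characters": n_characters,
        "n_tokens": n_tokens,
        "tokens_per_sentence": n_tokens / n_sentences,
        "characters_per_sentence": n_characters / n_sentences,
        "raw_length_per_sentence": raw_sequence_length / n_sentences,
        "avg_word_length": n_characters / n_tokens,
    }


text = "The cat sat. It purred."
tokens = text.split()  # assumption: whitespace tokenization, punctuation stays attached
features = surface_features(text, tokens, n_sentences=2)
```

A real tokenizer would split off punctuation, so the counts it produces would differ from this toy example.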

Number of types

surface

n_types

n_types

get_num_types

Number of types (unique tokens) in the text

Number of long words

surface

n_long_words

n_long_words

get_num_long_words

Number of long words (in terms of characters)

Threshold of what is considered a long word defaults to >6 characters; can be adapted in the config

Number of lemmas

surface

n_lemmas

n_lemmas

get_num_lemmas

Number of lemmas in the text

Token frequencies

surface

token_freqs

token_freqs

get_token_freqs

Token frequencies of the types in the text

As this produces a list in a column, the column needs special handling when writing to file (flat formats such as CSV cannot store list columns directly)

Number of lexical tokens

pos

n_lexical_tokens

n_lexical_tokens

get_num_lexical_tokens

Number of lexical tokens (tokens with upos tag NOUN, VERB, ADJ, ADV)

POS variability

pos

pos_variability

pos_variability

get_pos_variability

POS variability of the text: (number of unique upos tags in the text) / n_tokens

Number of tokens with upos tag {pos}

pos

n_per_pos

n_{pos}

get_num_per_pos

Number of tokens with a given upos tag in the text. Takes a list of upos tags to extract this feature for

pos_list defaults to all upos tags; if you only need a subset, this can be adapted in the config
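As an illustration of the two POS-based counts above, the sketch below works on a plain list of upos tags. It assumes the tags have already been produced by a tagger; `pos_features` is a hypothetical helper, not part of ELFEN.

```python
from collections import Counter


def pos_features(upos_tags):
    """Return POS variability and per-tag counts for one text."""
    counts = Counter(upos_tags)
    n_tokens = len(upos_tags)
    return {
        # number of distinct upos tags divided by the number of tokens
        "pos_variability": len(counts) / n_tokens,
        # one count per tag, analogous to the n_{pos} columns
        "n_per_pos": dict(counts),
    }


tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
feats = pos_features(tags)
```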

Lemma token ratio

lexical_richness

lemma_token_ratio

lemma_token_ratio

get_lemma_token_ratio

Lemma token ratio of the text: n_lemmas / n_tokens

Type token ratio

lexical_richness

ttr

ttr

get_ttr

Type token ratio of the text: n_types / n_tokens

Root type token ratio

lexical_richness

rttr

rttr

get_rttr

Root type token ratio of the text: sqrt(n_types / n_tokens)

Corrected type token ratio

lexical_richness

cttr

cttr

get_cttr

Corrected type token ratio of the text: n_types / sqrt(2 * n_tokens)

Herdan’s C

lexical_richness

herdan_c

herdan_c

get_herdan_c

Herdan’s C of a text: log(n_types) / log(n_tokens)

Summer’s type token ratio/index

lexical_richness

summer_index

summer_index

get_summer_index

Summer’s type token ratio of the text: log(log(n_types)) / log(log(n_tokens))
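Several of the ratio-based measures above can be computed from n_types and n_tokens alone. The following is a minimal sketch that transcribes the formulas for ttr, cttr, herdan_c, and summer_index exactly as stated in the table; it assumes natural logarithms and a pre-tokenized text, and `ttr_family` is a hypothetical name used for illustration only.

```python
import math


def ttr_family(tokens):
    """Type-token measures that depend only on n_types and n_tokens."""
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    return {
        "ttr": n_types / n_tokens,
        "cttr": n_types / math.sqrt(2 * n_tokens),
        "herdan_c": math.log(n_types) / math.log(n_tokens),
        # log(log(x)) is only defined for x > 1, so very short texts
        # (or texts with a single type) cannot be scored
        "summer_index": math.log(math.log(n_types)) / math.log(math.log(n_tokens)),
    }


tokens = "the cat saw the dog and the bird".split()
scores = ttr_family(tokens)
```

The logarithm-based measures are undefined for texts with fewer than two types, so callers would need to guard against such inputs.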

Dugast’s Uber index

lexical_richness

dugast_u

dugast_u

get_dugast_u

Dugast’s Uber index of the text: log(n_types)^2 / (log(n_tokens) - log( n_types))

Maas’ type token ratio/index

lexical_richness

maas_index

maas_index

get_maas_index

Maas’ type token ratio of the text: (n_tokens - n_types) / log(n_types)^2

Number of local hapax legomena

lexical_richness

n_hapax_legomena

n_hapax_legomena

get_n_hapax_legomena

Number of hapax legomena (tokens that occur only once) in the text

Number of global token hapax legomena

lexical_richness

n_global_token_hapax_legomena

n_global_token_hapax_legomena

get_n_global_token_hapax_legomena

Number of tokens in the text instance that are hapax legomena with respect to the entire corpus (tokens that occur only once in the corpus)

Number of global lemma hapax legomena

lexical_richness

n_global_lemma_hapax_legomena

n_global_lemma_hapax_legomena

get_n_global_lemma_hapax_legomena

Number of lemmas in the text instance that are hapax legomena with respect to the entire corpus (lemmas that occur only once in the corpus)
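The difference between local and global hapax legomena is where the frequencies are counted: within a single text, or across the whole corpus. A minimal sketch, assuming each text is already a list of tokens (or lemmas); the function names are hypothetical and not ELFEN's own.

```python
from collections import Counter


def local_hapax(tokens):
    """Number of tokens that occur exactly once within this text."""
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1)


def global_hapax_per_text(corpus):
    """For each text, count its tokens that occur exactly once in the whole corpus."""
    corpus_counts = Counter(tok for text in corpus for tok in text)
    return [sum(1 for tok in text if corpus_counts[tok] == 1) for text in corpus]


corpus = [["a", "b", "a", "c"], ["b", "d"]]
local_first = local_hapax(corpus[0])
global_counts = global_hapax_per_text(corpus)
```

In the first text, "b" is a local hapax but not a global one, because it also appears in the second text.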

Number of hapax dislegomena

lexical_richness

n_hapax_dislegomena

n_hapax_dislegomena

get_n_hapax_dislegomena

Number of hapax dislegomena (tokens that occur once or twice) in the text

Number of global token hapax dislegomena

lexical_richness

n_global_token_hapax_dislegomena

n_global_token_hapax_dislegomena

get_n_global_token_hapax_dislegomena

Number of tokens in the text instance that are hapax dislegomena with respect to the entire corpus (tokens that occur once or twice in the corpus)

Number of global lemma hapax dislegomena

lexical_richness

n_global_lemma_hapax_dislegomena

n_global_lemma_hapax_dislegomena

get_n_global_lemma_hapax_dislegomena

Number of lemmas in the text instance that are hapax dislegomena with respect to the entire corpus (lemmas that occur once or twice in the corpus)

Sichel’s S

lexical_richness

sichel_s

sichel_s

get_sichel_s

Sichel’s S of the text: n_hapax_dislegomena / n_types

Global Sichel’s S

lexical_richness

global_sichel_s

global_sichel_s

get_global_sichel_s

Global Sichel’s S of the text: n_global_token_hapax_dislegomena / n_types

Lexical density

lexical_richness

lexical_density

lexical_density

get_lexical_density

Lexical density of the text: n_lexical_tokens / n_tokens

Giroud’s index

lexical_richness

giroud_index

giroud_index

get_giroud_index

Giroud’s index of a text: n_types / sqrt(n_tokens)

Measure of Textual Lexical Diversity (MTLD)

lexical_richness

mtld

mtld

get_mtld

For definition, check https://link.springer.com/article/10.3758/BRM.42.2.381

Hypergeometric Distribution Diversity (HD-D)

lexical_richness

hdd

hdd

get_hdd

For definition, check https://link.springer.com/article/10.3758/BRM.42.2.381

Moving-average type token ratio (MATTR)

lexical_richness

mattr

mattr

get_mattr

Calculates the TTR for a sliding window of n tokens, then takes the average

Mean segmental type token ratio (MSTTR)

lexical_richness

msttr

msttr

get_msttr

Divides the text into n segments, calculates the TTR for all of them, then takes the average
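The two windowed measures above differ only in how the text is cut up: MATTR slides an overlapping window over the tokens, while MSTTR uses non-overlapping segments. The sketch below follows the descriptions in the table; the parameter names `window_size` and `n_segments`, their defaults, and the handling of short texts and leftover tokens are assumptions made for illustration.

```python
def ttr(tokens):
    """Plain type-token ratio of a token list."""
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window_size=5):
    """Average TTR over all sliding windows of window_size tokens."""
    if len(tokens) <= window_size:
        return ttr(tokens)  # assumption: fall back to plain TTR for short texts
    windows = [tokens[i:i + window_size]
               for i in range(len(tokens) - window_size + 1)]
    return sum(ttr(w) for w in windows) / len(windows)


def msttr(tokens, n_segments=2):
    """Average TTR over n_segments equal-length, non-overlapping segments."""
    seg_len = len(tokens) // n_segments
    if seg_len == 0:
        raise ValueError("text is shorter than the number of segments")
    # assumption: tokens left over after the last full segment are dropped
    segments = [tokens[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]
    return sum(ttr(s) for s in segments) / n_segments


tokens = ["a", "b", "a", "b", "c", "d"]
mattr_value = mattr(tokens, window_size=3)
msttr_value = msttr(tokens, n_segments=2)
```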

Yule’s K

lexical_richness

yule_k

yule_k

get_yule_k

Yule’s characteristic constant of vocabulary richness

For definition, check https://quantling.org/~hbaayen/publications/TweedieBaayen1998.pdf

Simpson’s D

lexical_richness

simpsons_d

simpsons_d

get_simpsons_d

For definition, check https://quantling.org/~hbaayen/publications/TweedieBaayen1998.pdf
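Both Yule's K and Simpson's D are functions of the frequency spectrum V(i, N), the number of types that occur exactly i times in a text of N tokens. The sketch below uses the definitions given by Tweedie and Baayen (1998), the reference linked above; whether ELFEN applies the same scaling constant is an assumption, and the function names are illustrative.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Map each frequency i to V(i, N), the number of types occurring i times."""
    return Counter(Counter(tokens).values())


def yule_k(tokens):
    """Yule's K: 10^4 * (sum_i i^2 * V(i, N) - N) / N^2."""
    n = len(tokens)
    spectrum = frequency_spectrum(tokens)
    return 1e4 * (sum(i * i * v for i, v in spectrum.items()) - n) / (n * n)


def simpsons_d(tokens):
    """Simpson's D: sum_i V(i, N) * (i / N) * ((i - 1) / (N - 1)); needs N > 1."""
    n = len(tokens)
    spectrum = frequency_spectrum(tokens)
    return sum(v * (i / n) * ((i - 1) / (n - 1)) for i, v in spectrum.items())


tokens = ["a", "a", "b"]
k_value = yule_k(tokens)
d_value = simpsons_d(tokens)
```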

Herdan’s Vm

lexical_richness

herdan_v

herdan_v

get_herdan_v

For definition, check https://quantling.org/~hbaayen/publications/TweedieBaayen1998.pdf

Number of syllables

readability

n_syllables

n_syllables

get_num_syllables

Number of syllables in the text

Only implemented for the spaCy backbone

Number of monosyllables

readability

n_monosyllables

n_monosyllables

get_num_monosyllables

Number of monosyllables (words with only one syllable) in the text

Number of polysyllables

readability

n_polysyllables

n_polysyllables

get_num_polysyllables

Number of polysyllables (words with three or more syllables) in the text

Flesch reading ease

readability

flesch_reading_ease

flesch_reading_ease

get_flesch_reading_ease

Flesch reading ease score of the text

For reference: https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests#Flesch_Reading_Ease

Flesch-Kincaid Grade Level

readability

flesch_kincaid_grade

flesch_kincaid_grade

get_flesch_kincaid_grade

Flesch-Kincaid grade level of the text

For reference: https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests#Flesch.E2.80.93Kincaid_Grade_Level
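Both Flesch scores combine the average sentence length with the average number of syllables per word. The sketch below uses the standard English-language constants from the references above; it assumes the three counts are already available, and it is not a statement about which constants ELFEN applies for other languages.

```python
def flesch_reading_ease(n_tokens, n_sentences, n_syllables):
    """Standard English Flesch reading ease; higher means easier to read."""
    return (206.835
            - 1.015 * (n_tokens / n_sentences)
            - 84.6 * (n_syllables / n_tokens))


def flesch_kincaid_grade(n_tokens, n_sentences, n_syllables):
    """Standard English Flesch-Kincaid grade level (US school grade)."""
    return (0.39 * (n_tokens / n_sentences)
            + 11.8 * (n_syllables / n_tokens)
            - 15.59)


# Illustrative counts: 100 tokens, 5 sentences, 150 syllables
fre = flesch_reading_ease(100, 5, 150)
fkg = flesch_kincaid_grade(100, 5, 150)
```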

Automated Readability Index (ARI)

readability

ari

ari

get_ari

For reference: https://en.wikipedia.org/wiki/Automated_readability_index

Simple Measure of Gobbledygook (SMOG)

readability

smog

smog

get_smog

For reference: https://en.wikipedia.org/wiki/SMOG

Coleman-Liau Index (CLI)

readability

cli

cli

get_cli

For reference: https://en.wikipedia.org/wiki/Coleman%E2%80%93Liau_index

Gunning-fog Index

readability

gunning_fog

gunning_fog

get_gunning_fog

For reference: https://en.wikipedia.org/wiki/Gunning_fog_index

LIX

readability

lix

lix

get_lix

For reference: https://en.wikipedia.org/wiki/Lix_(readability_test)

RIX

readability

rix

rix

get_rix

For reference: https://www.jstor.org/stable/40031755

Compressibility

information

compressibility

compressibility

get_compressibility

Compressibility is the ratio of the length of the compressed text to the length of the original text. This is a proxy for the Kolmogorov complexity of the text

Entropy

information

entropy

entropy

get_entropy

Shannon entropy of the text
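The two information-theoretic features above can be sketched with the standard library. This example assumes zlib as the compressor and computes entropy over the token distribution in bits; the choice of compressor, the unit of analysis (tokens rather than characters), and the logarithm base are all assumptions here, not confirmed details of ELFEN.

```python
import math
import zlib
from collections import Counter


def compressibility(text):
    """Length of the compressed text divided by the length of the original (bytes)."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical token distribution."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


repetitive = compressibility("ab" * 500)  # highly repetitive text compresses well
uniform_entropy = shannon_entropy(["a", "b"])
constant_entropy = shannon_entropy(["a", "a"])
```

Lower compressibility values indicate more redundant text. For very short strings the ratio can exceed 1 because of compression overhead.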

Number of named entities

entities

n_entities

n_entities

get_num_entities

Number of named entities in the text

Number of named entities of type {ent}

entities

n_per_entity_type

n_{ent}

get_num_per_entity_type

Number of named entities in the text with type {ent}. Takes a list of entity types to extract this feature for

ent_types defaults to all possible entity types; if you only need a subset, this can be adapted in the config

Number of hedge words

semantic

n_hedges

n_hedges

get_num_hedges

Number of hedge words in the text (words expressing uncertainty of the speaker).

Requires a hedge lexicon; currently only supported for English

Hedges token ratio

semantic

hedges_ratio

hedges_ratio

get_hedges_ratio

Ratio of hedges in the text: n_hedges / n_tokens

Average number of synsets

semantic

avg_n_synsets

avg_n_synsets

get_avg_num_synsets

Average number of WordNet synsets of lexical tokens; proxy for ambiguity/polysemy

Number of words with a low number of synsets per pos

semantic

n_low_synsets_per_pos

n_low_synsets_{pos}

get_low_synsets_{pos}

Number of lexical tokens with a low number of synsets per pos tag

Threshold defaults to 2

Number of words with a high number of synsets per pos

semantic

n_high_synsets_per_pos

n_high_synsets_{pos}

get_high_synsets_{pos}

Number of lexical tokens with a high number of synsets per pos tag

Threshold defaults to 5

Number of words with a low number of synsets

semantic

n_low_synsets

n_low_synsets

get_num_low_synsets

Number of lexical tokens with a low number of synsets

Threshold defaults to 2

Number of words with a high number of synsets

semantic

n_high_synsets

n_high_synsets

get_num_high_synsets

Number of lexical tokens with a high number of synsets

Threshold defaults to 5

Average valence

emotion

avg_valence

avg_valence

get_avg_valence

Average valence of the tokens in the text

For reference: https://saifmohammad.com/WebPages/nrc-vad.html

Number of low valence tokens

emotion

n_low_valence

n_low_valence

get_n_low_valence

Number of low valence tokens in the text

Threshold defaults to 0.33; For reference: https://saifmohammad.com/WebPages/nrc-vad.html

Number of high valence tokens

emotion

n_high_valence

n_high_valence

get_n_high_valence

Number of high valence tokens in the text

Threshold defaults to 0.66; For reference: https://saifmohammad.com/WebPages/nrc-vad.html
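The valence features above reduce to a lexicon lookup followed by an average and two thresholded counts. The sketch below uses a hypothetical three-word toy lexicon in place of the NRC VAD lexicon. It skips tokens missing from the lexicon and uses strict comparisons against the default thresholds of 0.33 and 0.66; both choices are assumptions for illustration.

```python
from statistics import mean

# Hypothetical toy ratings; real values would come from the NRC VAD lexicon.
TOY_VALENCE = {"happy": 0.95, "table": 0.5, "war": 0.1}


def valence_features(tokens, lexicon=TOY_VALENCE, low=0.33, high=0.66):
    """Average valence plus counts of low- and high-valence tokens."""
    # assumption: tokens without a lexicon entry are ignored
    ratings = [lexicon[t] for t in tokens if t in lexicon]
    return {
        "avg_valence": mean(ratings) if ratings else None,
        "n_low_valence": sum(r < low for r in ratings),
        "n_high_valence": sum(r > high for r in ratings),
    }


valence = valence_features(["happy", "war", "table", "unknown"])
```

The same pattern would apply to the arousal and dominance rows, with a different lexicon column.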

Average arousal

emotion

avg_arousal

avg_arousal

get_avg_arousal

Average arousal of the tokens in the text

For reference: https://saifmohammad.com/WebPages/nrc-vad.html

Maximum arousal

emotion

max_arousal

max_arousal

get_max_arousal

Maximum arousal of the tokens in the text

For reference: https://saifmohammad.com/WebPages/nrc-vad.html

Minimum arousal

emotion

min_arousal

min_arousal

get_min_arousal

Minimum arousal of the tokens in the text

For reference: https://saifmohammad.com/WebPages/nrc-vad.html

Standard deviation of arousal

emotion

sd_arousal

sd_arousal

get_sd_arousal

Standard deviation of the arousal ratings of the tokens in the text

For reference: https://saifmohammad.com/WebPages/nrc-vad.html

Number of low arousal tokens

emotion

n_low_arousal

n_low_arousal

get_n_low_arousal

Number of low arousal tokens in the text

Threshold defaults to 0.33; For reference: https://saifmohammad.com/WebPages/nrc-vad.html

Number of high arousal tokens

emotion

n_high_arousal

n_high_arousal

get_n_high_arousal

Number of high arousal tokens in the text

Threshold defaults to 0.66; For reference: https://saifmohammad.com/WebPages/nrc-vad.html

Maximum dominance

emotion

max_dominance

max_dominance

get_max_dominance

Maximum dominance of the tokens in the text

For reference: https://saifmohammad.com/WebPages/nrc-vad.html

Minimum dominance

emotion

min_dominance

min_dominance

get_min_dominance

Minimum dominance of the tokens in the text

For reference: https://saifmohammad.com/WebPages/nrc-vad.html

Standard deviation of dominance

emotion

sd_dominance

sd_dominance

get_sd_dominance

Standard deviation of the dominance ratings of the tokens in the text

For reference: https://saifmohammad.com/WebPages/nrc-vad.html

Average dominance

emotion

avg_dominance

avg_dominance

get_avg_dominance

Average dominance of the tokens in the text

For reference: https://saifmohammad.com/WebPages/nrc-vad.html

Number of low dominance tokens

emotion

n_low_dominance

n_low_dominance

get_n_low_dominance

Number of low dominance tokens in the text

Threshold defaults to 0.33; For reference: https://saifmohammad.com/WebPages/nrc-vad.html

Number of high dominance tokens

emotion

n_high_dominance

n_high_dominance

get_n_high_dominance

Number of high dominance tokens in the text

Threshold defaults to 0.66; For reference: https://saifmohammad.com/WebPages/nrc-vad.html

Maximum emotion intensity for {emotion}

emotion

max_intensity

max_intensity_{emotion}

get_max_intensity

Maximum intensity of an emotion; takes a list of emotions

For reference: https://saifmohammad.com/WebPages/AffectIntensity.htm

Minimum emotion intensity for {emotion}

emotion

min_intensity

min_intensity_{emotion}

get_min_intensity

Minimum intensity of an emotion; takes a list of emotions

For reference: https://saifmohammad.com/WebPages/AffectIntensity.htm

Standard deviation of emotion intensity for {emotion}

emotion

sd_intensity

sd_intensity_{emotion}

get_sd_intensity

Standard deviation of intensity of an emotion; takes a list of emotions

For reference: https://saifmohammad.com/WebPages/AffectIntensity.htm

Average emotion intensity for {emotion}

emotion

avg_intensity

avg_intensity_{emotion}

get_avg_intensity

Average intensity of an emotion; takes a list of emotions

For reference: https://saifmohammad.com/WebPages/AffectIntensity.htm

Number of high intensity tokens for {emotion}

emotion

n_high_intensity

n_high_intensity_{emotion}

get_n_high_intensity

Number of high intensity tokens for a given emotion

Threshold defaults to 0.66; For reference: https://saifmohammad.com/WebPages/AffectIntensity.htm

Number of low intensity tokens for {emotion}

emotion

n_low_intensity

n_low_intensity_{emotion}

get_n_low_intensity

Number of low intensity tokens for a given emotion

Threshold defaults to 0.33; For reference: https://saifmohammad.com/WebPages/AffectIntensity.htm

Maximum intensity for {emotion}

emotion

max_{emotion}

max_{emotion}

get_max_{emotion}

Maximum value for a given emotion; takes a list of emotions

For reference: https://saifmohammad.com/WebDocs/NRC-Emotion-Lexicon.htm

Minimum intensity for {emotion}

emotion

min_{emotion}

min_{emotion}

get_min_{emotion}

Minimum value for a given emotion; takes a list of emotions

For reference: https://saifmohammad.com/WebDocs/NRC-Emotion-Lexicon.htm

Standard deviation of intensity for {emotion}

emotion

sd_{emotion}

sd_{emotion}

get_sd_{emotion}

Standard deviation of values for a given emotion; takes a list of emotions

For reference: https://saifmohammad.com/WebDocs/NRC-Emotion-Lexicon.htm

Sentiment score

emotion

sentiment_score

sentiment_score

get_sentiment_score

Normalized difference between the number of positive and negative sentiment tokens in the text: (n_positive_sentiment - n_negative_sentiment) / n_tokens

Values lie in the range [-1, 1], where 0 is neutral, -1 is completely negative sentiment, and 1 is completely positive sentiment; For reference: https://saifmohammad.com/WebDocs/NRC-Emotion-Lexicon.htm
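The sentiment score formula above can be illustrated with two small word sets. The word lists here are hypothetical stand-ins for the positive and negative entries of the NRC Emotion Lexicon, and `sentiment_score` is an illustrative helper rather than the library's function.

```python
# Hypothetical toy word lists standing in for the lexicon entries.
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "awful"}


def sentiment_score(tokens, positive=POSITIVE, negative=NEGATIVE):
    """(n_positive_sentiment - n_negative_sentiment) / n_tokens."""
    n_positive = sum(t in positive for t in tokens)
    n_negative = sum(t in negative for t in tokens)
    return (n_positive - n_negative) / len(tokens)


score = sentiment_score(["good", "bad", "great", "film"])
all_negative = sentiment_score(["bad", "awful"])
```

The extreme values are reached only when every token in the text is a sentiment word of the same polarity.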

Number of negative sentiment tokens

emotion

n_negative_sentiment

n_negative_sentiment

get_n_negative_sentiment

Number of negative sentiment tokens

For reference: https://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm

Number of positive sentiment tokens

emotion

n_positive_sentiment

n_positive_sentiment

get_n_positive_sentiment

Number of positive sentiment tokens

For reference: https://saifmohammad.com/WebDocs/NRC-Emotion-Lexicon.htm

Average concreteness

psycholinguistic

avg_concreteness

avg_concreteness

get_avg_concreteness

Average human concreteness ratings of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-013-0403-5

Average standard deviation of concreteness

psycholinguistic

avg_sd_concreteness

avg_sd_concreteness

get_avg_sd_concreteness

Average standard deviation in the human concreteness ratings of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-013-0403-5

Number of low concreteness tokens

psycholinguistic

n_low_concreteness

n_low_concreteness

get_n_low_concreteness

Number of tokens with a low concreteness rating

For reference: https://link.springer.com/article/10.3758/s13428-013-0403-5

Number of high concreteness tokens

psycholinguistic

n_high_concreteness

n_high_concreteness

get_n_high_concreteness

Number of tokens with a high concreteness rating

For reference: https://link.springer.com/article/10.3758/s13428-013-0403-5

Number of tokens with controversial concreteness

psycholinguistic

n_controversial_concreteness

n_controversial_concreteness

get_n_controversial_concreteness

Number of tokens with a high standard deviation in the human concreteness rating

For reference: https://link.springer.com/article/10.3758/s13428-013-0403-5

Maximum concreteness

psycholinguistic

max_concreteness

max_concreteness

get_max_concreteness

Maximum concreteness rating of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-013-0403-5

Minimum concreteness

psycholinguistic

min_concreteness

min_concreteness

get_min_concreteness

Minimum concreteness rating of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-013-0403-5

Standard deviation of concreteness

psycholinguistic

sd_concreteness

sd_concreteness

get_sd_concreteness

Standard deviation of the concreteness ratings of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-013-0403-5

Average age of acquisition

psycholinguistic

avg_aoa

avg_aoa

get_avg_aoa

Average age of acquisition rating

For reference: https://link.springer.com/article/10.3758/s13428-016-0811-4

Average standard deviation of age of acquisition

psycholinguistic

avg_sd_aoa

avg_sd_aoa

get_avg_sd_aoa

Average standard deviation in the age of acquisition rating

For reference: https://link.springer.com/article/10.3758/s13428-016-0811-4

Number of low age of acquisition tokens

psycholinguistic

n_low_aoa

n_low_aoa

get_n_low_aoa

Number of low age of acquisition tokens

For reference: https://link.springer.com/article/10.3758/s13428-016-0811-4

Number of high age of acquisition tokens

psycholinguistic

n_high_aoa

n_high_aoa

get_n_high_aoa

Number of high age of acquisition tokens

For reference: https://link.springer.com/article/10.3758/s13428-016-0811-4

Number of tokens with controversial age of acquisition

psycholinguistic

n_controversial_aoa

n_controversial_aoa

get_n_controversial_aoa

Number of tokens with a high standard deviation in the age of acquisition rating

For reference: https://link.springer.com/article/10.3758/s13428-016-0811-4

Minimum age of acquisition

psycholinguistic

min_aoa

min_aoa

get_min_aoa

Minimum age of acquisition rating of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-016-0811-4

Maximum age of acquisition

psycholinguistic

max_aoa

max_aoa

get_max_aoa

Maximum age of acquisition rating of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-016-0811-4

Standard deviation of age of acquisition

psycholinguistic

sd_aoa

sd_aoa

get_sd_aoa

Standard deviation of the age of acquisition ratings of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-016-0811-4

Average prevalence

psycholinguistic

avg_prevalence

avg_prevalence

get_avg_prevalence

Average human prevalence ratings of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-018-1077-9

Number of low prevalence tokens

psycholinguistic

n_low_prevalence

n_low_prevalence

get_n_low_prevalence

Number of low prevalence tokens

For reference: https://link.springer.com/article/10.3758/s13428-018-1077-9

Number of high prevalence tokens

psycholinguistic

n_high_prevalence

n_high_prevalence

get_n_high_prevalence

Number of high prevalence tokens

For reference: https://link.springer.com/article/10.3758/s13428-018-1077-9

Minimum prevalence

psycholinguistic

min_prevalence

min_prevalence

get_min_prevalence

Minimum prevalence rating of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-018-1077-9

Maximum prevalence

psycholinguistic

max_prevalence

max_prevalence

get_max_prevalence

Maximum prevalence rating of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-018-1077-9

Standard deviation of prevalence

psycholinguistic

sd_prevalence

sd_prevalence

get_sd_prevalence

Standard deviation of the prevalence ratings of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-018-1077-9

Average socialness

psycholinguistic

avg_socialness

avg_socialness

get_avg_socialness

Average human socialness ratings of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-022-01810-x

Average standard deviation of socialness

psycholinguistic

avg_sd_socialness

avg_sd_socialness

get_avg_sd_socialness

Average standard deviation in the human socialness ratings of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-022-01810-x

Number of high socialness tokens

psycholinguistic

n_high_socialness

n_high_socialness

get_n_high_socialness

Number of high socialness tokens

For reference: https://link.springer.com/article/10.3758/s13428-022-01810-x

Number of low socialness tokens

psycholinguistic

n_low_socialness

n_low_socialness

get_n_low_socialness

Number of low socialness tokens

For reference: https://link.springer.com/article/10.3758/s13428-022-01810-x

Number of tokens with controversial socialness

psycholinguistic

n_controversial_socialness

n_controversial_socialness

get_n_controversial_socialness

Number of tokens with a high standard deviation in the human socialness rating

For reference: https://link.springer.com/article/10.3758/s13428-022-01810-x

Minimum socialness

psycholinguistic

min_socialness

min_socialness

get_min_socialness

Minimum socialness rating of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-022-01810-x

Maximum socialness

psycholinguistic

max_socialness

max_socialness

get_max_socialness

Maximum socialness rating of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-022-01810-x

Standard deviation of socialness

psycholinguistic

sd_socialness

sd_socialness

get_sd_socialness

Standard deviation of the socialness ratings of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-022-01810-x

Average iconicity

psycholinguistic

avg_iconicity

avg_iconicity

get_avg_iconicity

Average human iconicity ratings of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-023-02112-6

Average standard deviation of iconicity

psycholinguistic

avg_sd_iconicity

avg_sd_iconicity

get_avg_sd_iconicity

Average standard deviation in the human iconicity ratings of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-023-02112-6

Number of high iconicity tokens

psycholinguistic

n_high_iconicity

n_high_iconicity

get_n_high_iconicity

Number of high iconicity tokens

For reference: https://link.springer.com/article/10.3758/s13428-023-02112-6

Number of low iconicity tokens

psycholinguistic

n_low_iconicity

n_low_iconicity

get_n_low_iconicity

Number of low iconicity tokens

For reference: https://link.springer.com/article/10.3758/s13428-023-02112-6

Number of tokens with controversial iconicity

psycholinguistic

n_controversial_iconicity

n_controversial_iconicity

get_n_controversial_iconicity

Number of tokens with a high standard deviation in the human iconicity rating

For reference: https://link.springer.com/article/10.3758/s13428-023-02112-6

Minimum iconicity

psycholinguistic

min_iconicity

min_iconicity

get_min_iconicity

Minimum iconicity rating of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-023-02112-6

Maximum iconicity

psycholinguistic

max_iconicity

max_iconicity

get_max_iconicity

Maximum iconicity rating of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-023-02112-6

Standard deviation of iconicity

psycholinguistic

sd_iconicity

sd_iconicity

get_sd_iconicity

Standard deviation of the iconicity ratings of the tokens in the text

For reference: https://link.springer.com/article/10.3758/s13428-023-02112-6

Average sensorimotor score for {sensorimotor}

psycholinguistic

avg_sensorimotor

avg_{var}_sensorimotor

get_avg_sensorimotor

Average sensorimotor score for a given sensorimotor dimension

Average standard deviation of sensorimotor score

psycholinguistic

avg_sd_sensorimotor

avg_sd_{var}_sensorimotor

get_avg_sd_sensorimotor

Average standard deviation in the sensorimotor scores of the tokens in the text

Number of tokens with low sensorimotor rating

psycholinguistic

n_low_sensorimotor

n_low_{var}_sensorimotor

get_n_low_sensorimotor

Number of tokens with low sensorimotor rating

Number of tokens with high sensorimotor rating

psycholinguistic

n_high_sensorimotor

n_high_{var}_sensorimotor

get_n_high_sensorimotor

Number of tokens with high sensorimotor rating

Number of tokens with controversial sensorimotor rating

psycholinguistic

n_controversial_sensorimotor

n_controversial_{var}_sensorimotor

get_n_controversial_sensorimotor

Number of tokens with a high standard deviation in the sensorimotor rating

Minimum sensorimotor score

psycholinguistic

min_sensorimotor

min_{var}_sensorimotor

get_min_sensorimotor

Minimum sensorimotor rating of the tokens in the text

Maximum sensorimotor score

psycholinguistic

max_sensorimotor

max_{var}_sensorimotor

get_max_sensorimotor

Maximum sensorimotor rating of the tokens in the text

Standard deviation of sensorimotor score

psycholinguistic

sd_sensorimotor

sd_{var}_sensorimotor

get_sd_sensorimotor

Standard deviation of the sensorimotor ratings of the tokens in the text

Morphological feature counts

morphological

n_per_morph_feature

n_{pos}_{feature}_{val}

get_morph_feats

Number of tokens with a given {pos}, {feature}, and {val} combination (e.g. VERB VerbForm Inf). Takes a dictionary mapping pos tags to their associated features and values

Default dictionary is the full set of UD options; adapt if you do not need all of them (may be language-specific)

Dependency tree width

dependency

tree_width

tree_width

get_tree_width

Maximum number of siblings of a node at any level

Dependency tree depth

dependency

tree_depth

tree_depth

get_tree_depth

Maximum distance of a token to the root of the dependency tree
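Tree depth as described above can be computed by following head pointers from each token up to the root. The sketch below assumes the parse is given as a list of head indices in which the root points to itself (the convention spaCy uses); the example parse is hand-written for illustration.

```python
def tree_depth(heads):
    """Maximum number of head links between any token and the root.

    heads[i] is the index of token i's head; the root is its own head.
    """
    def depth(i):
        steps = 0
        while heads[i] != i:  # climb until the self-referencing root is reached
            i = heads[i]
            steps += 1
        return steps

    return max(depth(i) for i in range(len(heads)))


# "The cat sat on the mat": The->cat, cat->sat, sat (root), on->sat, the->mat, mat->on
heads = [1, 2, 2, 2, 5, 3]
depth_value = tree_depth(heads)
```

Here the deepest token is the second "the", which reaches the root via "mat" and "on".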

Tree branching factor

dependency

tree_branching

tree_branching

get_tree_branching

Average number of children of a token in the dependency tree

Tree ramification factor

dependency

ramification_factor

ramification_factor

get_ramification_factor

Average number of children per level

Number of noun chunks

dependency

n_noun_chunks

n_noun_chunks

get_n_noun_chunks

Number of noun chunks in the dependency tree

Number of dependencies of type {type}

dependency

n_per_dependency_type

n_dependency_{type}

get_n_per_dependency_type

{Feature} token ratio

ratios/normalization

{feature}_token_ratio

get_feature_token_ratio

{Feature} type ratio

ratios/normalization

{feature}_type_ratio

get_feature_type_ratio

{Feature} sentence ratio

ratios/normalization

{feature}_sentence_ratio

get_feature_sentence_ratio
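The normalization rows above carry no description, but their names suggest that a feature is divided by n_tokens, n_types, or n_sentences. The sketch below illustrates that reading on a plain dictionary of already extracted values. The division itself is an inference from the feature names, and `feature_ratio` and `DENOMINATORS` are hypothetical.

```python
# Assumed mapping from ratio kind to the count used as denominator.
DENOMINATORS = {"token": "n_tokens", "type": "n_types", "sentence": "n_sentences"}


def feature_ratio(row, feature, per):
    """Divide an extracted feature by the count selected with `per`."""
    return row[feature] / row[DENOMINATORS[per]]


row = {"n_hedges": 3, "n_tokens": 60, "n_types": 40, "n_sentences": 4}
hedges_per_token = feature_ratio(row, "n_hedges", "token")
hedges_per_sentence = feature_ratio(row, "n_hedges", "sentence")
```

Normalizing counts this way makes texts of different lengths comparable.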