Feature Overview
The table below gives an overview of the features implemented in ELFEN, grouped by feature area. For each feature, it lists the feature area, the feature name, the column name in the extracted dataframe, the extracting function, a description, and notes.
Feature |
Feature Area |
Feature name |
Name in extracted dataframe |
Function |
Description |
Notes |
|---|---|---|---|---|---|---|
Raw sequence length/total number of characters |
surface |
raw_sequence_length |
raw_sequence_length |
get_raw_sequence_length |
Number of characters in the text (including whitespaces) |
|
Number of tokens |
surface |
n_tokens |
n_tokens |
get_num_tokens |
Number of tokens in the text |
|
Number of sentences |
surface |
n_sentences |
n_sentences |
get_num_sentences |
Number of sentences in the text |
|
Number of tokens per sentence |
surface |
tokens_per_sentence |
tokens_per_sentence |
get_num_tokens_per_sentence |
Average number of tokens per sentence: n_tokens / n_sentences |
|
Number of characters |
surface |
n_characters |
n_characters |
get_num_characters |
Number of characters in the text (excluding whitespaces) |
|
Characters per sentence |
surface |
characters_per_sentence |
characters_per_sentence |
get_chars_per_sentence |
Average number of characters per sentence: n_characters / n_sentences |
|
Raw sequence length per sentence |
surface |
raw_length_per_sentence |
raw_length_per_sentence |
get_raw_length_per_sentence |
Average raw sequence length (characters including whitespaces) per sentence: raw_sequence_length / n_sentences |
|
Average word length |
surface |
avg_word_length |
avg_word_length |
get_avg_word_length |
Average word length (in characters): n_characters / n_tokens |
|
Number of types |
surface |
n_types |
n_types |
get_num_types |
Number of types (unique tokens) in the text |
|
Number of long words |
surface |
n_long_words |
n_long_words |
get_num_long_words |
Number of long words (in terms of characters) |
Threshold of what is considered a long word defaults to >6 characters; can be adapted in the config |
Number of lemmas |
surface |
n_lemmas |
n_lemmas |
get_num_lemmas |
Number of lemmas in the text |
|
Token frequencies |
surface |
token_freqs |
token_freqs |
get_token_freqs |
Token frequencies of the types in the text |
As this produces a list-valued column, writing the results to file has to be handled separately |
Number of lexical tokens |
pos |
n_lexical_tokens |
n_lexical_tokens |
get_num_lexical_tokens |
Number of lexical tokens (tokens with upos tag NOUN, VERB, ADJ, ADV) |
|
POS variability |
pos |
pos_variability |
pos_variability |
get_pos_variability |
POS variability of the text: (number of unique upos tags in the text) / n_tokens |
|
Number of tokens with upos tag {pos} |
pos |
n_per_pos |
n_{pos} |
get_num_per_pos |
Number of tokens with a given upos tag in the text. Takes a list of upos tags to extract this feature for |
pos_list defaults to all upos tags; if you only need a subset, this can be adapted in the config |
Lemma token ratio |
lexical_richness |
lemma_token_ratio |
lemma_token_ratio |
get_lemma_token_ratio |
Lemma token ratio of the text: n_lemmas / n_tokens |
|
Type token ratio |
lexical_richness |
ttr |
ttr |
get_ttr |
Type token ratio of the text: n_types / n_tokens |
|
Root type token ratio |
lexical_richness |
rttr |
rttr |
get_rttr |
Root type token ratio of the text: sqrt(n_types / n_tokens) |
|
Corrected type token ratio |
lexical_richness |
cttr |
cttr |
get_cttr |
Corrected type token ratio of the text: n_types / sqrt(2 * n_tokens) |
|
Herdan’s C |
lexical_richness |
herdan_c |
herdan_c |
get_herdan_c |
Herdan’s C of a text: log(n_types) / log(n_tokens) |
|
Summer’s type token ratio/index |
lexical_richness |
summer_index |
summer_index |
get_summer_index |
Summer’s type token ratio of the text: log(log(n_types)) / log(log(n_tokens)) |
|
Dugast’s Uber index |
lexical_richness |
dugast_u |
dugast_u |
get_dugast_u |
Dugast’s Uber index of the text: log(n_types)^2 / (log(n_tokens) - log(n_types)) |
|
Maas’ type token ratio/index |
lexical_richness |
maas_index |
maas_index |
get_maas_index |
Maas’ type token ratio of the text: (n_tokens - n_types) / log(n_types)^2 |
|
Number of local hapax legomena |
lexical_richness |
n_hapax_legomena |
n_hapax_legomena |
get_n_hapax_legomena |
Number of hapax legomena (tokens that occur only once) in the text |
|
Number of global token hapax legomena |
lexical_richness |
n_global_token_hapax_legomena |
n_global_token_hapax_legomena |
get_n_global_token_hapax_legomena |
Number of tokens in the text instance that are hapax legomena (occur only once) in the entire corpus |
|
Number of global lemma hapax legomena |
lexical_richness |
n_global_lemma_hapax_legomena |
n_global_lemma_hapax_legomena |
get_n_global_lemma_hapax_legomena |
Number of lemmas in the text instance that are hapax legomena (occur only once) in the entire corpus |
|
Number of hapax dislegomena |
lexical_richness |
n_hapax_dislegomena |
n_hapax_dislegomena |
get_n_hapax_dislegomena |
Number of hapax dislegomena (tokens that occur once or twice) in the text |
|
Number of global token hapax dislegomena |
lexical_richness |
n_global_token_hapax_dislegomena |
n_global_token_hapax_dislegomena |
get_n_global_token_hapax_dislegomena |
Number of tokens in the text instance that are hapax dislegomena (occur once or twice) in the entire corpus |
|
Number of global lemma hapax dislegomena |
lexical_richness |
n_global_lemma_hapax_dislegomena |
n_global_lemma_hapax_dislegomena |
get_n_global_lemma_hapax_dislegomena |
Number of lemmas in the text instance that are hapax dislegomena (occur once or twice) in the entire corpus |
|
Sichel’s S |
lexical_richness |
sichel_s |
sichel_s |
get_sichel_s |
Sichel’s S of the text: n_hapax_dislegomena / n_types |
|
Global Sichel’s S |
lexical_richness |
global_sichel_s |
global_sichel_s |
get_global_sichel_s |
Global Sichel’s S of the text: n_global_token_hapax_dislegomena / n_types |
|
Lexical density |
lexical_richness |
lexical_density |
lexical_density |
get_lexical_density |
Lexical density of the text: n_lexical_tokens / n_tokens |
|
Giroud’s index |
lexical_richness |
giroud_index |
giroud_index |
get_giroud_index |
Giroud’s index of a text: n_types / sqrt(n_tokens) |
|
Measure of Textual Lexical Diversity (MTLD) |
lexical_richness |
mtld |
mtld |
get_mtld |
For definition, check https://link.springer.com/article/10.3758/BRM.42.2.381 |
|
Hypergeometric Distribution Diversity (HD-D) |
lexical_richness |
hdd |
hdd |
get_hdd |
For definition, check https://link.springer.com/article/10.3758/BRM.42.2.381 |
|
Moving-average type token ratio (MATTR) |
lexical_richness |
mattr |
mattr |
get_mattr |
Calculates the TTR for a sliding window of n tokens, then takes the average |
|
Mean segmental type token ratio (MSTTR) |
lexical_richness |
msttr |
msttr |
get_msttr |
Divides the text into n segments, calculates the TTR for all of them, then takes the average |
|
Yule’s K |
lexical_richness |
yule_k |
yule_k |
get_yule_k |
Yule’s characteristic constant of vocabulary richness |
For definition, check https://quantling.org/~hbaayen/publications/TweedieBaayen1998.pdf |
Simpson’s D |
lexical_richness |
simpsons_d |
simpsons_d |
get_simpsons_d |
For definition, check https://quantling.org/~hbaayen/publications/TweedieBaayen1998.pdf |
|
Herdan’s Vm |
lexical_richness |
herdan_v |
herdan_v |
get_herdan_v |
For definition, check https://quantling.org/~hbaayen/publications/TweedieBaayen1998.pdf |
|
Number of syllables |
readability |
n_syllables |
n_syllables |
get_num_syllables |
Number of syllables in the text |
Only implemented for the spaCy backbone |
Number of monosyllables |
readability |
n_monosyllables |
n_monosyllables |
get_num_monosyllables |
Number of monosyllables (words with only one syllable) in the text |
|
Number of polysyllables |
readability |
n_polysyllables |
n_polysyllables |
get_num_polysyllables |
Number of polysyllables (words with three or more syllables) in the text |
|
Flesch reading ease |
readability |
flesch_reading_ease |
flesch_reading_ease |
get_flesch_reading_ease |
Flesch reading ease score of the text |
For reference: https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests#Flesch_Reading_Ease |
Flesch-Kincaid Grade Level |
readability |
flesch_kincaid_grade |
flesch_kincaid_grade |
get_flesch_kincaid_grade |
Flesch-Kincaid grade level of the text |
For reference: https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests#Flesch.E2.80.93Kincaid_Grade_Level |
Automated Readability Index (ARI) |
readability |
ari |
ari |
get_ari |
For reference: https://en.wikipedia.org/wiki/Automated_readability_index |
|
Simple Measure of Gobbledygook (SMOG) |
readability |
smog |
smog |
get_smog |
For reference: https://en.wikipedia.org/wiki/SMOG |
|
Coleman-Liau Index (CLI) |
readability |
cli |
cli |
get_cli |
For reference: https://en.wikipedia.org/wiki/Coleman%E2%80%93Liau_index |
|
Gunning-fog Index |
readability |
gunning_fog |
gunning_fog |
get_gunning_fog |
For reference: https://en.wikipedia.org/wiki/Gunning_fog_index |
|
LIX |
readability |
lix |
lix |
get_lix |
For reference: https://en.wikipedia.org/wiki/Lix_(readability_test) |
|
RIX |
readability |
rix |
rix |
get_rix |
For reference: https://www.jstor.org/stable/40031755 |
|
Compressibility |
information |
compressibility |
compressibility |
get_compressibility |
Compressibility is the ratio of the length of the compressed text to the length of the original text. This is a proxy for the Kolmogorov complexity of the text |
|
Entropy |
information |
entropy |
entropy |
get_entropy |
Shannon entropy of the text |
|
Number of named entities |
entities |
n_entitites |
n_entitites |
get_num_entities |
Number of named entities in the text |
|
Number of named entities of type {ent} |
entities |
n_per_entity_type |
n_{ent} |
get_num_per_entity_type |
Number of named entities in the text with type {ent}. Takes a list of entity types to extract this feature for |
ent_types defaults to all possible entity types; if you only need a subset, this can be adapted in the config |
Number of hedge words |
semantic |
n_hedges |
n_hedges |
get_num_hedges |
Number of hedge words in the text (words expressing uncertainty of the speaker). |
Requires a hedge lexicon; currently only supported for English |
Hedges token ratio |
semantic |
hedges_ratio |
hedges_ratio |
get_hedges_ratio |
Ratio of hedges in the text: n_hedges / n_tokens |
|
Average number of synsets |
semantic |
avg_n_synsets |
avg_n_synsets |
get_avg_num_synsets |
Average number of WordNet synsets of lexical tokens; proxy for ambiguity/polysemy |
|
Number of words with a low number of synsets per pos |
semantic |
n_low_synsets_per_pos |
n_low_synsets_{pos} |
get_low_synsets_{pos} |
Number of lexical tokens with a low number of synsets per pos tag |
Threshold defaults to 2 |
Number of words with a high number of synsets per pos |
semantic |
n_high_synsets_per_pos |
n_high_synsets_{pos} |
get_high_synsets_{pos} |
Number of lexical tokens with a high number of synsets per pos tag |
Threshold defaults to 5 |
Number of words with a low number of synsets |
semantic |
n_low_synsets |
n_low_synsets |
get_num_low_synsets |
Number of lexical tokens with a low number of synsets |
Threshold defaults to 2 |
Number of words with a high number of synsets |
semantic |
n_high_synsets |
n_high_synsets |
get_num_high_synsets |
Number of lexical tokens with a high number of synsets |
Threshold defaults to 5 |
Average valence |
emotion |
avg_valence |
avg_valence |
get_avg_valence |
Average valence of the tokens in the text |
For reference: https://saifmohammad.com/WebPages/nrc-vad.html |
Number of low valence tokens |
emotion |
n_low_valence |
n_low_valence |
get_n_low_valence |
Number of low valence tokens in the text |
Threshold defaults to 0.33; For reference: https://saifmohammad.com/WebPages/nrc-vad.html |
Number of high valence tokens |
emotion |
n_high_valence |
n_high_valence |
get_n_high_valence |
Number of high valence tokens in the text |
Threshold defaults to 0.66; For reference: https://saifmohammad.com/WebPages/nrc-vad.html |
Average arousal |
emotion |
avg_arousal |
avg_arousal |
get_avg_arousal |
Average arousal of the tokens in the text |
For reference: https://saifmohammad.com/WebPages/nrc-vad.html |
Maximum arousal |
emotion |
max_arousal |
max_arousal |
get_max_arousal |
Maximum arousal of the tokens in the text |
For reference: https://saifmohammad.com/WebPages/nrc-vad.html |
Minimum arousal |
emotion |
min_arousal |
min_arousal |
get_min_arousal |
Minimum arousal of the tokens in the text |
For reference: https://saifmohammad.com/WebPages/nrc-vad.html |
Standard deviation of arousal |
emotion |
sd_arousal |
sd_arousal |
get_sd_arousal |
Standard deviation of the arousal ratings of the tokens in the text |
For reference: https://saifmohammad.com/WebPages/nrc-vad.html |
Number of low arousal tokens |
emotion |
n_low_arousal |
n_low_arousal |
get_n_low_arousal |
Number of low arousal tokens in the text |
Threshold defaults to 0.33; For reference: https://saifmohammad.com/WebPages/nrc-vad.html |
Number of high arousal tokens |
emotion |
n_high_arousal |
n_high_arousal |
get_n_high_arousal |
Number of high arousal tokens in the text |
Threshold defaults to 0.66; For reference: https://saifmohammad.com/WebPages/nrc-vad.html |
Maximum dominance |
emotion |
max_dominance |
max_dominance |
get_max_dominance |
Maximum dominance of the tokens in the text |
For reference: https://saifmohammad.com/WebPages/nrc-vad.html |
Minimum dominance |
emotion |
min_dominance |
min_dominance |
get_min_dominance |
Minimum dominance of the tokens in the text |
For reference: https://saifmohammad.com/WebPages/nrc-vad.html |
Standard deviation of dominance |
emotion |
sd_dominance |
sd_dominance |
get_sd_dominance |
Standard deviation of the dominance ratings of the tokens in the text |
For reference: https://saifmohammad.com/WebPages/nrc-vad.html |
Average dominance |
emotion |
avg_dominance |
avg_dominance |
get_avg_dominance |
Average dominance of the tokens in the text |
For reference: https://saifmohammad.com/WebPages/nrc-vad.html |
Number of low dominance tokens |
emotion |
n_low_dominance |
n_low_dominance |
get_n_low_dominance |
Number of low dominance tokens in the text |
Threshold defaults to 0.33; For reference: https://saifmohammad.com/WebPages/nrc-vad.html |
Number of high dominance tokens |
emotion |
n_high_dominance |
n_high_dominance |
get_n_high_dominance |
Number of high dominance tokens in the text |
Threshold defaults to 0.66; For reference: https://saifmohammad.com/WebPages/nrc-vad.html |
Maximum emotion intensity for {emotion} |
emotion |
max_intensity |
max_intensity_{emotion} |
get_max_intensity |
Maximum intensity of an emotion; takes a list of emotions |
For reference: https://saifmohammad.com/WebPages/AffectIntensity.htm |
Minimum emotion intensity for {emotion} |
emotion |
min_intensity |
min_intensity_{emotion} |
get_min_intensity |
Minimum intensity of an emotion; takes a list of emotions |
For reference: https://saifmohammad.com/WebPages/AffectIntensity.htm |
Standard deviation of emotion intensity for {emotion} |
emotion |
sd_intensity |
sd_intensity_{emotion} |
get_sd_intensity |
Standard deviation of intensity of an emotion; takes a list of emotions |
For reference: https://saifmohammad.com/WebPages/AffectIntensity.htm |
Average emotion intensity for {emotion} |
emotion |
avg_intensity |
avg_intensity_{emotion} |
get_avg_intensity |
Average intensity of an emotion; takes a list of emotions |
For reference: https://saifmohammad.com/WebPages/AffectIntensity.htm |
Number of high intensity tokens for {emotion} |
emotion |
n_high_intensity |
n_high_intensity_{emotion} |
get_n_high_intensity |
Number of high intensity tokens for a given emotion |
Threshold defaults to 0.66; For reference: https://saifmohammad.com/WebPages/AffectIntensity.htm |
Number of low intensity tokens for {emotion} |
emotion |
n_low_intensity |
n_low_intensity_{emotion} |
get_n_low_intensity |
Number of low intensity tokens for a given emotion |
Threshold defaults to 0.33; For reference: https://saifmohammad.com/WebPages/AffectIntensity.htm |
Maximum intensity for {emotion} |
emotion |
max_{emotion} |
max_{emotion} |
get_max_{emotion} |
Maximum value for a given emotion; takes a list of emotions |
For reference: https://saifmohammad.com/WebDocs/NRC-Emotion-Lexicon.htm |
Minimum intensity for {emotion} |
emotion |
min_{emotion} |
min_{emotion} |
get_min_{emotion} |
Minimum value for a given emotion; takes a list of emotions |
For reference: https://saifmohammad.com/WebDocs/NRC-Emotion-Lexicon.htm |
Standard deviation of intensity for {emotion} |
emotion |
sd_{emotion} |
sd_{emotion} |
get_sd_{emotion} |
Standard deviation of values for a given emotion; takes a list of emotions |
For reference: https://saifmohammad.com/WebDocs/NRC-Emotion-Lexicon.htm |
Sentiment score |
emotion |
sentiment_score |
sentiment_score |
get_sentiment_score |
Normalized difference between the number of positive and negative sentiment tokens in the text: (n_positive_sentiment - n_negative_sentiment) / n_tokens |
Values lie in the range [-1, 1], where 0 is neutral, -1 is completely negative sentiment, and 1 is completely positive sentiment; For reference: https://saifmohammad.com/WebDocs/NRC-Emotion-Lexicon.htm |
Number of negative sentiment tokens |
emotion |
n_negative_sentiment |
n_negative_sentiment |
get_n_negative_sentiment |
Number of negative sentiment tokens |
For reference: https://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm |
Number of positive sentiment tokens |
emotion |
n_positive_sentiment |
n_positive_sentiment |
get_n_positive_sentiment |
Number of positive sentiment tokens |
For reference: https://saifmohammad.com/WebDocs/NRC-Emotion-Lexicon.htm |
Average concreteness |
psycholinguistic |
avg_concreteness |
avg_concreteness |
get_avg_concreteness |
Average human concreteness ratings of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-013-0403-5 |
Average standard deviation of concreteness |
psycholinguistic |
avg_sd_concreteness |
avg_sd_concreteness |
get_avg_sd_concreteness |
Average standard deviation in the human concreteness ratings of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-013-0403-5 |
Number of low concreteness tokens |
psycholinguistic |
n_low_concreteness |
n_low_concreteness |
get_n_low_concreteness |
Number of tokens with a low concreteness rating |
For reference: https://link.springer.com/article/10.3758/s13428-013-0403-5 |
Number of high concreteness tokens |
psycholinguistic |
n_high_concreteness |
n_high_concreteness |
get_n_high_concreteness |
Number of tokens with a high concreteness rating |
For reference: https://link.springer.com/article/10.3758/s13428-013-0403-5 |
Number of tokens with controversial concreteness |
psycholinguistic |
n_controversial_concreteness |
n_controversial_concreteness |
get_n_controversial_concreteness |
Number of tokens with a high standard deviation in the human concreteness rating |
For reference: https://link.springer.com/article/10.3758/s13428-013-0403-5 |
Maximum concreteness |
psycholinguistic |
max_concreteness |
max_concreteness |
get_max_concreteness |
Maximum concreteness rating of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-013-0403-5 |
Minimum concreteness |
psycholinguistic |
min_concreteness |
min_concreteness |
get_min_concreteness |
Minimum concreteness rating of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-013-0403-5 |
Standard deviation of concreteness |
psycholinguistic |
sd_concreteness |
sd_concreteness |
get_sd_concreteness |
Standard deviation of the concreteness ratings of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-013-0403-5 |
Average age of acquisition |
psycholinguistic |
avg_aoa |
avg_aoa |
get_avg_aoa |
Average age of acquisition rating |
For reference: https://link.springer.com/article/10.3758/s13428-016-0811-4 |
Average standard deviation of age of acquisition |
psycholinguistic |
avg_sd_aoa |
avg_sd_aoa |
get_avg_sd_aoa |
Average standard deviation in the age of acquisition rating |
For reference: https://link.springer.com/article/10.3758/s13428-016-0811-4 |
Number of low age of acquisition tokens |
psycholinguistic |
n_low_aoa |
n_low_aoa |
get_n_low_aoa |
Number of low age of acquisition tokens |
For reference: https://link.springer.com/article/10.3758/s13428-016-0811-4 |
Number of high age of acquisition tokens |
psycholinguistic |
n_high_aoa |
n_high_aoa |
get_n_high_aoa |
Number of high age of acquisition tokens |
For reference: https://link.springer.com/article/10.3758/s13428-016-0811-4 |
Number of tokens with controversial age of acquisition |
psycholinguistic |
n_controversial_aoa |
n_controversial_aoa |
get_n_controversial_aoa |
Number of tokens with a high standard deviation in the age of acquisition rating |
For reference: https://link.springer.com/article/10.3758/s13428-016-0811-4 |
Minimum age of acquisition |
psycholinguistic |
min_aoa |
min_aoa |
get_min_aoa |
Minimum age of acquisition rating of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-016-0811-4 |
Maximum age of acquisition |
psycholinguistic |
max_aoa |
max_aoa |
get_max_aoa |
Maximum age of acquisition rating of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-016-0811-4 |
Standard deviation of age of acquisition |
psycholinguistic |
sd_aoa |
sd_aoa |
get_sd_aoa |
Standard deviation of the age of acquisition ratings of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-016-0811-4 |
Average prevalence |
psycholinguistic |
avg_prevalence |
avg_prevalence |
get_avg_prevalence |
Average human prevalence ratings of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-018-1077-9 |
Number of low prevalence tokens |
psycholinguistic |
n_low_prevalence |
n_low_prevalence |
get_n_low_prevalence |
Number of low prevalence tokens |
For reference: https://link.springer.com/article/10.3758/s13428-018-1077-9 |
Number of high prevalence tokens |
psycholinguistic |
n_high_prevalence |
n_high_prevalence |
get_n_high_prevalence |
Number of high prevalence tokens |
For reference: https://link.springer.com/article/10.3758/s13428-018-1077-9 |
Minimum prevalence |
psycholinguistic |
min_prevalence |
min_prevalence |
get_min_prevalence |
Minimum prevalence rating of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-018-1077-9 |
Maximum prevalence |
psycholinguistic |
max_prevalence |
max_prevalence |
get_max_prevalence |
Maximum prevalence rating of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-018-1077-9 |
Standard deviation of prevalence |
psycholinguistic |
sd_prevalence |
sd_prevalence |
get_sd_prevalence |
Standard deviation of the prevalence ratings of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-018-1077-9 |
Average socialness |
psycholinguistic |
avg_socialness |
avg_socialness |
get_avg_socialness |
Average human socialness ratings of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-022-01810-x |
Average standard deviation of socialness |
psycholinguistic |
avg_sd_socialness |
avg_sd_socialness |
get_avg_sd_socialness |
Average standard deviation in the human socialness ratings of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-022-01810-x |
Number of high socialness tokens |
psycholinguistic |
n_high_socialness |
n_high_socialness |
get_n_high_socialness |
Number of high socialness tokens |
For reference: https://link.springer.com/article/10.3758/s13428-022-01810-x |
Number of low socialness tokens |
psycholinguistic |
n_low_socialness |
n_low_socialness |
get_n_low_socialness |
Number of low socialness tokens |
For reference: https://link.springer.com/article/10.3758/s13428-022-01810-x |
Number of tokens with controversial socialness |
psycholinguistic |
n_controversial_socialness |
n_controversial_socialness |
get_n_controversial_socialness |
Number of tokens with a high standard deviation in the human socialness rating |
For reference: https://link.springer.com/article/10.3758/s13428-022-01810-x |
Minimum socialness |
psycholinguistic |
min_socialness |
min_socialness |
get_min_socialness |
Minimum socialness rating of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-022-01810-x |
Maximum socialness |
psycholinguistic |
max_socialness |
max_socialness |
get_max_socialness |
Maximum socialness rating of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-022-01810-x |
Standard deviation of socialness |
psycholinguistic |
sd_socialness |
sd_socialness |
get_sd_socialness |
Standard deviation of the socialness ratings of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-022-01810-x |
Average iconicity |
psycholinguistic |
avg_iconicity |
avg_iconicity |
get_avg_iconicity |
Average human iconicity ratings of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-023-02112-6 |
Average standard deviation of iconicity |
psycholinguistic |
avg_sd_iconicity |
avg_sd_iconicity |
get_avg_sd_iconicity |
Average standard deviation in the human iconicity ratings of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-023-02112-6 |
Number of high iconicity tokens |
psycholinguistic |
n_high_iconicity |
n_high_iconicity |
get_n_high_iconicity |
Number of high iconicity tokens |
For reference: https://link.springer.com/article/10.3758/s13428-023-02112-6 |
Number of low iconicity tokens |
psycholinguistic |
n_low_iconicity |
n_low_iconicity |
get_n_low_iconicity |
Number of low iconicity tokens |
For reference: https://link.springer.com/article/10.3758/s13428-023-02112-6 |
Number of tokens with controversial iconicity |
psycholinguistic |
n_controversial_iconicity |
n_controversial_iconicity |
get_n_controversial_iconicity |
Number of tokens with a high standard deviation in the human iconicity rating |
For reference: https://link.springer.com/article/10.3758/s13428-023-02112-6 |
Minimum iconicity |
psycholinguistic |
min_iconicity |
min_iconicity |
get_min_iconicity |
Minimum iconicity rating of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-023-02112-6 |
Maximum iconicity |
psycholinguistic |
max_iconicity |
max_iconicity |
get_max_iconicity |
Maximum iconicity rating of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-023-02112-6 |
Standard deviation of iconicity |
psycholinguistic |
sd_iconicity |
sd_iconicity |
get_sd_iconicity |
Standard deviation of the iconicity ratings of the tokens in the text |
For reference: https://link.springer.com/article/10.3758/s13428-023-02112-6 |
Average sensorimotor score for {sensorimotor} |
psycholinguistic |
avg_sensorimotor |
avg_{var}_sensorimotor |
get_avg_sensorimotor |
Average sensorimotor score for a given sensorimotor dimension |
|
Average standard deviation of sensorimotor score |
psycholinguistic |
avg_sd_sensorimotor |
avg_sd_{var}_sensorimotor |
get_avg_sd_sensorimotor |
Average standard deviation in the sensorimotor scores of the tokens in the text |
|
Number of tokens with low sensorimotor rating |
psycholinguistic |
n_low_sensorimotor |
n_low_{var}_sensorimotor |
get_n_low_sensorimotor |
Number of tokens with low sensorimotor rating |
|
Number of tokens with high sensorimotor rating |
psycholinguistic |
n_high_sensorimotor |
n_high_{var}_sensorimotor |
get_n_high_sensorimotor |
Number of tokens with high sensorimotor rating |
|
Number of tokens with controversial sensorimotor rating |
psycholinguistic |
n_controversial_sensorimotor |
n_controversial_{var}_sensorimotor |
get_n_controversial_sensorimotor |
Number of tokens with a high standard deviation in the sensorimotor rating |
|
Minimum sensorimotor score |
psycholinguistic |
min_sensorimotor |
min_{var}_sensorimotor |
get_min_sensorimotor |
Minimum sensorimotor rating of the tokens in the text |
|
Maximum sensorimotor score |
psycholinguistic |
max_sensorimotor |
max_{var}_sensorimotor |
get_max_sensorimotor |
Maximum sensorimotor rating of the tokens in the text |
|
Standard deviation of sensorimotor score |
psycholinguistic |
sd_sensorimotor |
sd_{var}_sensorimotor |
get_sd_sensorimotor |
Standard deviation of the sensorimotor ratings of the tokens in the text |
|
Morphological feature counts |
morphological |
n_per_morph_feature |
n_{pos}_{feature}_{val} |
get_morph_feats |
Number of tokens with {pos} {feature} {val} (e.g. VERB VerbForm Inf), takes a dictionary of pos and associated features and values for them |
Default dictionary is the full set of UD options; adapt if you do not need all of them (may be language-specific) |
Dependency tree width |
dependency |
tree_width |
tree_width |
get_tree_width |
Maximum number of siblings of a node at any level |
|
Dependency tree depth |
dependency |
tree_depth |
tree_depth |
get_tree_depth |
Maximum distance of a token to the root of the dependency tree |
|
Tree branching factor |
dependency |
tree_branching |
tree_branching |
get_tree_branching |
Average number of children of a token in the dependency tree |
|
Tree ramification factor |
dependency |
ramification_factor |
ramification_factor |
get_ramification_factor |
Average number of children per level |
|
Number of noun chunks |
dependency |
n_noun_chunks |
n_noun_chunks |
get_n_noun_chunks |
Number of noun chunks in the dependency tree |
|
Number of dependencies of type {type} |
dependency |
n_per_dependency_type |
n_dependency_{type} |
get_n_per_dependency_type |
|
|
{Feature} token ratio |
ratios/normalization |
{feature}_token_ratio |
get_feature_token_ratio |
|
|
|
{Feature} type ratio |
ratios/normalization |
{feature}_type_ratio |
get_feature_type_ratio |
|
|
|
{Feature} sentence ratio |
ratios/normalization |
{feature}_sentence_ratio |
get_feature_sentence_ratio |
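
As a rough illustration of how some of the ratio-style lexical richness formulas in the table fit together, the following sketch computes `ttr`, `cttr`, `herdan_c`, and `giroud_index` from a plain list of tokens, using the formulas as written in the Description column. It is not ELFEN's implementation: the function name `lexical_richness_sketch` is hypothetical, tokenization is a naive whitespace split rather than a backbone pipeline, and counting hapax legomena per type is an assumption about what "tokens that occur only once" means.

```python
import math
from collections import Counter


def lexical_richness_sketch(tokens):
    """Compute a few type/token-based measures from a list of token strings.

    Hypothetical helper for illustration only; assumes at least two tokens,
    since log(n_tokens) is zero for a single token.
    """
    n_tokens = len(tokens)
    freqs = Counter(tokens)  # frequency of each type (unique token)
    n_types = len(freqs)
    # Types that occur exactly once in this text (assumed hapax definition).
    n_hapax = sum(1 for count in freqs.values() if count == 1)
    return {
        "n_tokens": n_tokens,
        "n_types": n_types,
        "n_hapax_legomena": n_hapax,
        "ttr": n_types / n_tokens,
        "cttr": n_types / math.sqrt(2 * n_tokens),
        "herdan_c": math.log(n_types) / math.log(n_tokens),
        "giroud_index": n_types / math.sqrt(n_tokens),
    }


# Naive whitespace tokenization; a real pipeline would use a tagger/tokenizer.
tokens = "the cat sat on the mat and the dog sat too".split()
features = lexical_richness_sketch(tokens)
print(features)
```

All four ratios depend only on `n_types` and `n_tokens`, which is why they are computed once and reused; very short texts make these measures unstable, so results on a single sentence like the one above are only meant to show the arithmetic.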