Package: tosca 0.3-1

Lars Koppers

tosca: Tools for Statistical Content Analysis

A framework for statistical analysis in content analysis. It provides a pipeline for preprocessing text corpora and linking to the latent Dirichlet allocation from the 'lda' package, along with plots for the descriptive analysis of text corpora and topic models. An implementation of Chang's intruder words and intruder topics is also provided. Sample data for the vignette is included in the toscaData package, which is available on GitHub: <https://github.com/Docma-TU/toscaData>.

Authors: Lars Koppers [aut, cre], Jonas Rieger [aut], Karin Boczek [ctb], Gerret von Nordheim [ctb]

tosca_0.3-1.tar.gz
tosca_0.3-1.zip (r-4.5) | tosca_0.3-1.zip (r-4.4) | tosca_0.3-1.zip (r-4.3)
tosca_0.3-1.tgz (r-4.4-any) | tosca_0.3-1.tgz (r-4.3-any)
tosca_0.3-1.tar.gz (r-4.5-noble) | tosca_0.3-1.tar.gz (r-4.4-noble)
tosca_0.3-1.tgz (r-4.4-emscripten) | tosca_0.3-1.tgz (r-4.3-emscripten)
tosca.pdf | tosca.html
tosca/json (API)

# Install 'tosca' in R:
install.packages('tosca', repos = c('https://docma-tu.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/docma-tu/tosca/issues

6.65 score | 17 stars | 1 packages | 59 scripts | 436 downloads | 51 exports | 42 dependencies

Last updated 2 years ago from: cbc61f52f0. Checks: OK: 1, ERROR: 6. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Oct 13 2024
R-4.5-win | ERROR | Oct 13 2024
R-4.5-linux | ERROR | Oct 13 2024
R-4.4-win | ERROR | Oct 13 2024
R-4.4-mac | ERROR | Oct 13 2024
R-4.3-win | ERROR | Oct 13 2024
R-4.3-mac | ERROR | Oct 13 2024

Exports: as.corpus.textmeta, as.meta, as.textmeta.corpus, cleanTexts, clusterTopics, deleteAndRenameDuplicates, duplist, filterCount, filterDate, filterID, filterWord, importance, intruderTopics, intruderWords, is.duplist, is.textmeta, is.textmeta_tidy, LDAgen, LDAprep, makeWordlist, mergeLDA, mergeTextmeta, plotArea, plotFreq, plotHeat, plotScot, plotTopic, plotTopicWord, plotWordpt, plotWordSub, precision, readTextmeta, readTextmeta.df, readWhatsApp, readWiki, readWikinews, recall, removeHTML, removeUmlauts, removeXML, sampling, showMeta, showTexts, textmeta, tidy.textmeta, topicCoherence, topicsInText, topTexts, topWords, vprecision, vrecall

Dependencies: askpass, base64enc, BH, cli, cpp11, curl, data.table, digest, fastmap, fastmatch, generics, glue, htmltools, httr, ISOcodes, jsonlite, lattice, lda, lifecycle, lubridate, magrittr, Matrix, mime, NLP, openssl, quanteda, R6, RColorBrewer, Rcpp, rlang, slam, SnowballC, stopwords, stringi, stringr, sys, timechange, tm, vctrs, WikipediR, xml2, yaml

tosca: Tools for Statistical Content Analysis

Rendered from Vignette.Rmd using knitr::rmarkdown on Oct 13 2024.

Last update: 2021-04-18
Started: 2018-08-30

Readme and manuals

Help Manual

Help page | Topics
Transform textmeta to corpus | as.corpus.textmeta
"meta" Component of "textmeta"-Objects | as.meta
Transform corpus to textmeta | as.textmeta.corpus
Data Preprocessing | cleanTexts
Cluster Analysis | clusterTopics
Deletes and Renames Articles with the same ID | deleteAndRenameDuplicates
Creating List of Duplicates | duplist is.duplist print.duplist summary.duplist
Subcorpus With Count Filter | filterCount filterCount.default filterCount.textmeta
Subcorpus With Date Filter | filterDate filterDate.default filterDate.textmeta
Subcorpus With ID Filter | filterID filterID.default filterID.textmeta
Subcorpus With Word Filter | filterWord filterWord.default filterWord.textmeta
Function to validate the fit of the LDA model | intruderTopics
Function to validate the fit of the LDA model | intruderWords
Function to fit LDA model | LDAgen
Create Lda-ready Dataset | LDAprep
Counts Words in Text Corpora | makeWordlist
Preparation of Different LDAs For Clustering | mergeLDA
Merge Textmeta Objects | mergeTextmeta
Plotting topics over time as stacked areas below plotted lines. | plotArea
Plotting Counts of specified Wordgroups over Time (relative to Corpus) | plotFreq
Plotting Topics over Time relative to Corpus | plotHeat
Plots Counts of Documents or Words over Time (relative to Corpus) | plotScot
Plotting Counts of Topics over Time (Relative to Corpus) | plotTopic
Plotting Counts of Topics-Words-Combination over Time (Relative to Words) | plotTopicWord
Plots Counts of Topics-Words-Combination over Time (Relative to Topics) | plotWordpt
Plotting Counts/Proportion of Words/Docs in LDA-generated Topic-Subcorpora over Time | plotWordSub
Precision and Recall | precision recall vprecision vrecall
Read Corpora as CSV | readTextmeta readTextmeta.df
Read WhatsApp files | readWhatsApp
Read Pages from Wikipedia | readWiki
Read files from Wikinews | readWikinews
Removes XML/HTML Tags and Umlauts | removeHTML removeUmlauts removeXML
Sample Texts | sampling
Export Readable Meta-Data of Articles. | showMeta
Exports Readable Text Lists | showTexts
"textmeta"-Objects | is.textmeta plot.textmeta print.textmeta summary.textmeta textmeta
Transform textmeta to an object with tidy text data | is.textmeta_tidy print.textmeta_tidy tidy.textmeta
Calculating Topic Coherence | topicCoherence
Coloring the words of a text corresponding to topic allocation | topicsInText
Get The IDs Of The Most Representive Texts | topTexts
Top Words per Topic | importance topWords