NEWS
vosonSML 0.35.0
Bug Fixes
- Changed reddit https collection method because of platform issues with the httr packages on Windows.
- Removed failing S3 dispatch methods from the Graph function.
- Replaced httr request methods with httr2 versions.
Major Changes
- Removed twitter functions from the package.
Minor Changes
- Added writeToFile to all methods.
- Added Merge support for mastodon.
- Changed the voson.msg option to voson.cat for cat message output.
- Changed verbose message technique and verbose = TRUE is now the default for most functions.
vosonSML 0.34.3
Bug Fixes
- Disabled metadata logging that occurs when the writeToFile parameter of Collect is used. This is due to a new package issue with R version 4.4.
vosonSML 0.34.2
Bug Fixes
- Fixed a reddit data collection issue for threads that are specified using shorter URLs without the title part and that contain continue thread links. These links were resolving to the main thread, resulting in duplication of comments and thread structures.
vosonSML 0.34.1
Minor Changes
- Added a parameter named subtype to the Mastodon network Create() function for creating variations of the activity and actor networks. For the activity network a subtype = tag parameter can be used to create a tag network of post tags that are colocated. For the actor network a subtype = server parameter can be used to create a server network, which is an actor network reduced to server associations.
vosonSML 0.34.0
Major Changes
- Added Mastodon authentication, collection and network creation. There are two options for Mastodon collection: a hashtag search for global or local server timeline posts that is optionally authenticated, Collect.search.mastodon(), and a public thread collection function that uses input URLs, is similar to Reddit thread collection and requires no authentication, Collect.thread.mastodon(). To access these methods via Collect an endpoint = "search" or endpoint = "thread" parameter should be passed to the function.
- The Mastodon authentication and collection uses the rtoot package and a function called ImportRtoot has been created for importing rtoot data into vosonSML. Imported data can be passed as input to the Create network functions.
Minor Changes
- Changed the default Reddit request wait time range from 3 to 5 seconds to 6 to 8 seconds to avoid a proposed platform rate limit of 10 requests per minute. This value can still be set manually using the waitTime = c(min, max) wait time range parameter.
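The arithmetic behind the new Reddit wait range can be checked with a short sketch. It assumes one wait is drawn uniformly between the two bounds before each request, which these notes do not state; the variable names are illustrative only.

```r
# Proposed platform limit from the note above: 10 requests per minute.
requests_per_minute <- 10
min_gap_secs <- 60 / requests_per_minute

# New default wait range in seconds, as passed via waitTime = c(min, max).
wait_time <- c(6, 8)

# Assumption: one wait is drawn uniformly from the range before each request.
set.seed(42)
waits <- runif(5, min = wait_time[1], max = wait_time[2])

# Every wait is at least the minimum gap, so the request rate stays
# at or below the proposed limit.
within_limit <- all(waits >= min_gap_secs)

# The previous lower bound of 3 seconds allowed up to 20 requests per minute.
old_max_rate <- 60 / 3
```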
vosonSML 0.33.2
Bug Fixes
- Fixed a bug in the regex for Reddit URL parsing in which thread IDs were limited to 6 characters.
- Fixed verbose output for 2mode networks to use the method specified by the option.
- Fixed an issue with adding text to Twitter networks caused by missing columns in the data.
- Added twitter tokenization functions that were recently removed from the tidytext and tokenizers packages due to a change in the ICU library unicode standard and the stringi package (tokenizers issue #82). This affects only the generation of semantic and 2mode twitter networks and the fix maintains their functionality until an alternative tweet tokenization method is implemented. Unfortunately these two twitter network types are not supported on systems using ICU library versions >= 72.0 at this time.
- Fixed an intermittent column mismatch error in Twitter caused by an unexpected type when merging data.
- Fixed the "number of tweet observations does not match number of users" error reported with rtweet v1.1.
- Fixed number of tweets requested count in verbose message for Twitter timeline collection.
- Fixed a bug in Reddit thread collection where URLs missing trailing slashes would trigger loop protection errors.
- Changed the default sort parameter value for Reddit thread collection to be NA. Default sort order on Reddit is not a fixed value.
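The two Reddit URL fixes above (thread IDs longer than 6 characters, and URLs without a trailing slash) can be illustrated with a small base R sketch. The pattern and the parse_thread_id() helper are hypothetical and are not the package's actual regex; they only show one way to accept IDs of any length with an optional title part and an optional trailing slash.

```r
# Hypothetical pattern: subreddit, thread ID of any length, optional title,
# optional trailing slash. Not the regex used inside vosonSML.
thread_pattern <- paste0(
  "^https?://(?:www\\.)?reddit\\.com/r/([^/]+)/comments/",
  "([[:alnum:]]+)(?:/([^/?#]*))?/?$"
)

# Illustrative helper: returns the thread ID, or NA for URLs that do not match.
parse_thread_id <- function(urls) {
  matched <- grepl(thread_pattern, urls, perl = TRUE)
  ids <- rep(NA_character_, length(urls))
  ids[matched] <- sub(thread_pattern, "\\2", urls[matched], perl = TRUE)
  ids
}

# Example URLs are made up for illustration.
urls <- c(
  "https://www.reddit.com/r/example/comments/1abcdefg/some_title/",
  "https://www.reddit.com/r/example/comments/1abcdefg",
  "https://www.reddit.com/r/example/comments/abc123/some_title",
  "https://www.reddit.com/r/example/"
)
ids <- parse_thread_id(urls)
```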
Major Changes
- Added sort parameter to Reddit collection. As this collection method is limited, it may be useful to request comments in sort order using the Reddit sort options top, new, controversial, old, qa and best.
- Added a Collect.listing function for subreddits on Reddit. This is not a search, however it allows complete metadata for a specified number of subreddit threads to be collected in sorted order. The sort options are hot, top, new and rising. There is a further time parameter period that can be set to hour, day, week, month, year or all if sort = top, meaning for example, results sorted by top threads over the last week.
Minor Changes
- Added simple log file output for Collect and Merge functions when writeToFile = TRUE. The log file is written in the same location as the data file with the .txt extension appended.
- Changed data output path option options(voson.data = "my-data") to now attempt to create the directory if it does not exist.
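As a minimal sketch of the voson.data option described above, the following uses only base R options() and getOption(). The directory creation step is a stand-in for what the package is described as doing, not its actual code, and a temporary directory is used so that nothing is left behind.

```r
# Use a temporary location for the example output directory.
data_dir <- file.path(tempdir(), "my-data")

# Set the output directory option.
options(voson.data = data_dir)

# Hypothetical stand-in for the package behaviour: create the directory
# if it does not already exist.
out_dir <- getOption("voson.data")
if (!is.null(out_dir) && !dir.exists(out_dir)) {
  dir.create(out_dir, recursive = TRUE)
}
dir_ready <- dir.exists(out_dir)

# Assigning NULL removes the option again.
options(voson.data = NULL)
option_removed <- is.null(getOption("voson.data"))
```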
vosonSML 0.32.8
Bug Fixes
- Fixed two issues that arose from the introduction of tibbles and verbose messaging in Collect.reddit().
- Fixed an error caused by unescaped regex parameters in hyperlinks processed by Collect.web() (#49).
vosonSML 0.32.7 (2022-08-16)
Major Changes
- Re-wrote and modified
vosonSML
Twitter
functions to support major changes made in rtweet
release version 1.0.2.
- Added an
endpoint
parameter to the Twitter
Collect
function. It is set to search
by default, which is the
usual collect behaviour, but can also now be set to timeline
to collect user timelines instead. See
Collect.timeline.twitter()
for parameters.
- Changed output message system.
vosonSML
functions are now silent by default. Using the verbose
parameter will
again print function output.
- Changed output messages to use the message() function instead of the cat() function by default. Setting the global option options(voson.msg = FALSE) will again redirect output to cat(). The option can be removed by assigning a value of NULL.
- Added the voson.data option allowing a directory path to be set for writeToFile output files. Files are output to the current working directory by default, however a new directory can now be set with options(voson.data = "my-data") for example. The directory path can be relative or a full path, but must be created beforehand or already exist. If the path is invalid or does not exist it will continue with the default behaviour. This option can be removed by assigning a value of NULL. This will not affect other file write operations performed by the user.
- The Twitter AddText() and AddUserData() functions now work with most Twitter network types. AddText() now adds columns for embedded tweet text and has a hashtags parameter to add a list of tweet hashtags as a network attribute. AddUserData() now adds an additional dataframe for missing_users. It lists the ids and screen names of users that did not have metadata embedded in the collected data. Using the lookupUsers parameter will retrieve the metadata using the twitter API. Additionally, passing the refresh = TRUE parameter will now retrieve and update the metadata for all users in the network.
- Twitter data collection now returns a named list of two dataframes containing
tweets
and users
.
- Removed the
ImportData
function and replaced it with ImportRtweet()
for rtweet
version 1.0 format data.
- Added
Merge()
and MergeFiles()
functions to support the merging of collected data from separate operations. These
functions support input of multiple Collect objects or .RDS
files, automatically detect the datasource type and
support the writeToFile
parameter for file output of merged data.
Minor Changes
- Re-wrote
YouTube
id extraction from url function to be more robust and added support for YouTube
shorts urls.
- Removed stand-alone GetYoutubeVideoIDs function. The YouTube collect function parameter videoIDs will now accept video ids or video urls.
- Added wrappers and aliases for some functions. Twitter auth objects can now be created with simplified auth_twitter_app(), auth_twitter_dev() and auth_twitter_user() functions for each token type. The collect_reddit_threads() and collect_web_hyperlinks() functions skip the unnecessary Authenticate step for Reddit and web data collection.
vosonSML 0.31.1
Bug Fixes
- Tweets were incorrectly ordered by status ID to summarise the collected tweet range. The Min ID and Max ID are not necessarily the earliest and latest tweets in the tweets collected and are therefore not ideal for delimiting subsequent collections. Instead the two Earliest Obs and two Latest Obs tweets as returned by the Twitter API are now reported.
Major Changes
- Added endpoint parameter to Collect, allowing search or timeline to be specified for a twitter data collection. If it is not specified the default is a twitter search.
- The timeline collection accepts a users vector of user names or IDs or a mixture of both, and will return up to 3,200 of each user's most recent tweets.
- Minimum required version of R has changed from 3.6 to 4.1.
Minor Changes
- Updated standard package documentation, added citation, code of conduct and README.Rmd.
- Replaced magrittr pipes with native pipe operators.
vosonSML 0.30.6
Minor Changes
- Updated standard package documentation, added citation and README.Rmd.
vosonSML 0.30.5
Major Changes
- Re-implemented
Create.actor.twitter
and Create.activity.twitter
to use dplyr
and data.table
techniques
consistent with other package network creation functions. Both functions are significantly faster for large collection
dataframes.
Minor Changes
- Create.actor.twitter includes two new parameters for mentions: inclMentions, which will process and include mentions edges in the network, and inclRtMentions, which will process and include mentions found in retweets. The inclMentions parameter is set to TRUE by default and inclRtMentions is set to FALSE. The inclRtMentions parameter is a subset of mentions, therefore for it to be set to TRUE, inclMentions must also be TRUE.
- Re-implemented and simplified the
Create.activity.twitter
network creation. Added author_id
and
author_screen_name
to nodes to assist with labels or re-creating tweet URLs from data.
- Added
rmEdgeTypes
parameter to Create.activity.twitter
and Create.actor.twitter
. These accept a list of edge
types that can be filtered out of the network during network creation.
- Removed label attributes from igraph graphs generated by the
Graph
function.
- Tidied up and renamed many of the utils functions. Removed unused functions.
- Added last observation tweet to minimum and maximum status ID values reported for twitter collections. Usually the
last observation and
Min ID
will be the same, but sometimes the Min ID
is outside of the expected collection
range. The last observation is a more reliable tweet to use as the starting point for subsequent search collections.
- Cleaned up package imports, suggests and added some interactive package checks to reduce the number of required
imports.
vosonSML 0.30.0
Major Changes
- Added a web crawler
Collect
method with hyperlink network creation. The Create
function with activity
type
parameter creates a network where nodes are web pages
and edges the hyperlinks
linking them (extracted from
a href
HTML tags). The actor
network has page or site domains
as the nodes and again the hyperlinks
from
linking pages between domains.
vosonSML 0.29.14
Minor Changes
- Prepending instead of appending S3 class names to Collect dataframes to avoid dplyr issues.
- Removed retryOnRateLimit set to FALSE if rate limit cannot be determined.
- ImportData will now accept a file path or a dataframe.
Bug Fixes
- S3 class names were being added to
Collect
dataframes after writeToFile
. Should no longer be required to manually
add class names or use ImportData
to load RDS files to use previously saved data with Create
functions.
vosonSML 0.29.13 (2020-07-18)
Minor Changes
- Minor documentation updates to
Create.semantic.twitter
, Create.twomode.twitter
and the Intro-to-vosonSML
vignette:
- Specified the
tidyr
, tidytext
and stopwords
package requirements in descriptions and examples
- Updated references to
twomode
networks as 2-mode
where possible
vosonSML 0.29.12
Bug Fixes
- Fixed an issue with custom classes assigned to dataframes causing a vctrs error when using dplyr functions. The classes are no longer needed post-method routing so they are simply removed.
- Replaced an instance of the deprecated dplyr::funs function that was generating a warning.
Minor Changes
- Minor documentation updates.
vosonSML 0.29.11
Bug Fixes
- Fixed a reddit collect bind_rows error on joining dataframes with different types for the structure column. Column type was being set to integer instead of character in cases when every thread comment has no replies or depth (except the OP).
vosonSML 0.29.10 (2020-04-25)
Minor Changes
- Reimplemented the Create.semantic.twitter and Create.twomode.twitter functions using the tidytext package. They now better support tokenization of tweet text and allow a range of stopword lists and sources to be used from the stopwords package. The semantic network function requires the tidytext and tidyr packages to be installed before use.
- New parameters have been added to
Create.semantic.twitter
:
- Numbers and urls can be removed or included from the term list using
removeNumbers
and removeUrls
, default
value is TRUE
.
- The
assoc
parameter has been added to choose which node associations or ties to include in the network. The
default value is "limited"
and includes only ties between most frequently occurring hashtags and terms in
tweets. A value of full
will also include ties between most frequently occurring hashtags and hashtags, and
terms with terms creating a more densely connected network.
- Parameters to specify stopwords language, e.g. stopwordsLang = "en", and source, e.g. stopwordsSrc = "smart", have been added. These correspond to the language and source parameters of the tidytext::get_stopwords function. The stopwords default value is TRUE.
- The network produced by the
Create.twomode.twitter
function is weighted by default but can be disabled by setting
the new weighted
parameter to FALSE
.
- Renamed the
replies_from_text
parameter to repliesFromText
and at_replies_only
to atRepliesOnly
in the
AddText.actor.youtube
function for consistency.
- Improved the usage examples in the README file.
- Removed
tm
package dependency.
vosonSML 0.29.9
Minor Changes
- Updated
Introduction to vosonSML
vignette Merging Collected Data
examples.
- Added new hex sticker to package documentation.
Bug Fixes
- Fixed a logic problem in Collect.youtube that was causing no video comments to be collected if there were no reply comments for any of the video's first maxComments number of top level comments. For example, if maxComments is set to 100 and the first 100 comments made to a video had no replies then no results would be returned.
vosonSML 0.29.8
Bug Fixes
- A recent intermittent problem with the Twitter API caused an issue with the
rtweet::rate_limit
function that
resulted in an error when using the rtweet retryonratelimit
search parameter. The rate_limit
function was being
called by vosonSML
to check the twitter rate limit regardless of whether the search parameter was set or not, and so
was failing Collect
with an error. A fix was made so that vosonSML
checks if rtweet::rate_limit
succeeds, and if
not automatically sets retryonratelimit
to FALSE
so that a twitter Collect
can still be performed without error
should this problem occur again.
Minor Changes
- Added some links to the
pkgdown
site navbar.
vosonSML 0.29.7
Minor Changes
- Added some guidance for merging collected data to the
Introduction to vosonSML
vignette.
vosonSML 0.29.6
Minor Changes
- Added
Introduction to vosonSML
vignette to the package.
- Minor changes and input checks added to
ImportData
.
- Added some unit testing for
Authenticate
and ImportData
.
vosonSML 0.29.5
Minor Changes
- Reddit JSON is now retrieved using
jsonlite::fromJSON
.
- Reddit 'Continue' threads are now followed with additional thread requests. Many more comments are now collected for
threads with large diameters or breadth. Continue threads also have a Reddit limit of 500 comments per thread request.
- Reddit comment IDs and timestamps are now extracted.
- Moved the tictoc package from dependency imports to suggested packages.
- Added some checks for whether the
rtweet
package is installed.
- Removed the
RedditExtractoR
package from imports.
- HTML decoded tweet text during network creation to replace '&amp;', '&lt;' and '&gt;' HTML codes.
- Added node type attribute to
twomode
networks.
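The HTML decoding of tweet text noted above can be sketched in base R, assuming the codes in question are the standard &amp;, &lt; and &gt; entities. The decode_html_entities() helper is hypothetical and is not the function used by the package. The ampersand entity is decoded last so that a double-encoded sequence is only decoded once.

```r
# Hypothetical helper: decode the three basic HTML entities in text.
decode_html_entities <- function(x) {
  x <- gsub("&lt;", "<", x, fixed = TRUE)
  x <- gsub("&gt;", ">", x, fixed = TRUE)
  # Decode &amp; last to avoid decoding the same text twice.
  gsub("&amp;", "&", x, fixed = TRUE)
}

decoded <- decode_html_entities(c("R &amp; D: a &lt; b &gt; c", "&amp;lt;"))
```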
vosonSML 0.29.4 (2019-11-23)
Minor Changes
- Renamed
bimodal
networks to twomode
.
vosonSML 0.29.3
Minor Changes
- Added output messages from supplemental functions such as
AddText()
and Graph()
. Also improved
consistency of output messages from Collect
and Create
functions.
Bug Fixes
- Added a fix for the reddit gsub locale error https://github.com/vosonlab/vosonSML/issues/21.
- Changed
bimodal
network hashtags to lowercase as filter terms when entered are converted to
lowercase.
- Fixed errors thrown when removing terms from
bimodal
and semantic
networks.
- Removed a duplicate
GetVideoData()
function call in AddVideoData
.
- Fixed data type errors in
AddText
functions related to strict typing by dplyr::if_else
function.
vosonSML 0.29.2
Minor Changes
- A feature was added to the youtube actor
AddText
function to redirect edges towards actors based
on the presence of a screen name
or @screen name
that may be found at the beginning of
a reply comment. Typically reply comments are directed towards a top-level comment, this
instead captures when reply comments are directed to other commenters in the thread.
vosonSML 0.29.1
Minor Changes
- Changed youtube
actor
network identifiers to be their unique Channel ID
instead of their
screen names
.
- Created the
AddVideoData
function to add collected video data to the youtube actor
network. The
main purpose of this function is to replace video identifiers with the Channel ID
of the video
publisher (actor) instead. To get the Channel ID
of video publishers an additional API lookup for
the videos in the network is required. Additional columns such as video Title
, Description
and
Published
time are also added to the network $edges
dataframe as well as returned in their own
dataframe called $videos
.
vosonSML 0.29.0
Major Changes
- Created the
AddText
function to add collected text data to networks. This feature applies
to activity
and actor
networks and will typically add a node attribute to activity networks
and an edge attribute to actor networks. For example, this function will add the column
vosonTxt_tweets
containing tweet text to $nodes
if passed an activity network, and to
$edges
if passed an actor network.
- Generation of
igraph
graph objects and subsequent writing to file has been removed from the
Create
function and placed in a new function Graph
. This change abstracts the graph creation
and makes it optional, but also allows supplemental network steps such as AddText
to be
performed prior to creating the final igraph object.
Minor Changes
- Removed
writeToFile
parameter from Create
functions and added it to Graph
.
- Removed
weightEdges
, textData
and cleanText
parameters from Create.actor.reddit
.
cleanText
is now a parameter of AddText.activity.reddit
and AddText.actor.reddit
.
- Replaced AddTwitterUserData with AddUserData function that works similarly to AddText. This function currently only applies to twitter actor networks and will add, or download and add if missing, user profile information to actors as node attributes.
vosonSML 0.28.1
Minor Changes
- Added
activity
network type for reddit. In the reddit activity network nodes are the
thread posts and comments, edges represent where comments are directed in the threads.
- Added github dev version badge to README.
vosonSML 0.28.0
Major Changes
- Added new
activity
network type for twitter and youtube Create
function. In this network
nodes are the items collected such as tweets returned from a twitter search and comments
posted to youtube videos. Edges represent the platform relationship between the tweets or
comments.
vosonSML 0.27.3
Minor Changes
- Added a new twitter actor network edge type
self-loop
. This aims to facilitate the later addition
of tweet text to the network graph for user tweets that have no ties to other users.
vosonSML 0.27.2 (2019-07-18)
Minor Changes
- Added twitter interactive web authorization of an app as provided by rtweet::create_token. This method is used when only the twitter app name and consumer keys are passed to Authenticate.twitter as parameters, e.g. Authenticate("twitter", appName = "An App", apiKey = "xxxxxxxxxxxx", apiSecret = "xxxxxxxxxxxx"). A browser tab will open asking the user to authorize the app to their twitter account to complete authentication. This is using Twitter's Application-user authentication: OAuth 1a (access token for user context) method.
- It is suspected that Reddit is rate-limiting some generic R UA strings. So a User-Agent string is now set for underlying R Collect functions (e.g. file) via the HTTPUserAgent option. It is temporarily set to package name and current version number for Collect, e.g. vosonSML v.0.27.2 (R Package).
- Removed hex sticker (and favicons for pkgdown site).
vosonSML 0.27.1
Bug Fixes
- Fixed a bug in
Create.semantic.twitter
in which a sum operation calculating edge
weights would set NA
values for all edges due to NA
values present in the hashtag fields.
This occurs when there are tweets with no hashtags in the twitter collection and is now
checked.
- Some UTF encoding issues in
Create.semantic.twitter
were also fixed.
Minor Changes
- Added '#' to hashtags and '@' to mentions in twitter semantic network to differentiate between
hashtags, mentions and common terms.
vosonSML 0.27.0
Bug Fixes
- Fixed a bug in Collect.twitter in which any additional twitter API parameters, e.g. lang or until, were not being passed properly to rtweet::search_tweets. This resulted in the additional parameters being ignored.
Major Changes
- Removed the
SaveCredential
and LoadCredential
functions, as well as the useCachedToken
parameter for Authenticate.twitter
. These were simply calling the saveRDS
and readRDS
functions and not performing any additional processing. Using saveRDS
and readRDS
directly
to save and load an Authenticate
credential object to file is simpler.
- Changed the way that the
cleanText
parameter works in Create.actor.reddit
so that it is
more permissive. Addresses encoding issues with apostrophes and pound symbols and removes
unicode characters not permitted by the XML 1.0 standard as used in graphml
files. This is
best effort and does not resolve all reddit
text encoding issues.
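Saving and loading a credential with saveRDS and readRDS, as suggested above, can be sketched as follows. The cred object here is a made-up stand-in list; a real credential would be the object returned by Authenticate, and its structure is not shown in these notes.

```r
# Stand-in for a credential object; its fields are placeholders only.
cred <- list(socialmedia = "twitter", auth = "token-placeholder")

# Save the credential to file, then load it back in a later session.
cred_path <- file.path(tempdir(), "twitter_auth.rds")
saveRDS(cred, cred_path)
restored <- readRDS(cred_path)

# The restored object is identical to the one that was saved.
same_object <- identical(cred, restored)
```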
Minor Changes
- Added
Collect.twitter
summary information that includes the earliest (min) and latest (max)
tweet status_id
collected with timestamp. The status_id
values can be used to frame
subsequent collections as since_id
or max_id
parameter values. If the until
date
parameter was used the timestamp can also be used as a quick confirmation.
- Added elapsed time output to the
Collect
method.
vosonSML 0.26.3 (2019-02-22)
Bug Fixes
- Fixed bugs in Create.actor.reddit that were incorrectly creating edges between top-level commenters and thread authors from different threads. These bugs were only observable when collecting multiple reddit threads.
Minor Changes
- Improved output for
reddit
collection. Removed the progress bar and added a table
of results summarising the number of comments collected for each thread.
- Added the user's twitter API reset time to twitter collection output.
vosonSML 0.26.2
Bug Fixes
- Fixed a bug in Create.actor.twitter and Create.bimodal.twitter in which the vertices dataframe provided to the graph_from_data_frame function contained duplicate names, raising an error.
Major Changes
- Revised and updated
roxygen
documentation and examples for all package functions.
- Updated all
Authenticate
, Collect
and Create
S3 methods to implement function routing
based on object class names.
Minor Changes
- Created a
pkgdown
web site for github hosted package documentation.
- Created a new hex sticker logo.
vosonSML 0.25.0
Major Changes
- Replaced the
twitteR
twitter collection implementation with the rtweet
package.
- A user's twitter authentication token can now be cached in the .twitter_oauth_token file and used for subsequent twitter API requests without re-authentication. A new authentication token can be cached by deleting this file and re-using the parameter useCachedToken = TRUE.