Page History

...

fetcher.class - the fully-qualified name of the class that will be used to fetch data
fetcher.realtime - if true, only the most recent token from each execution will be written to the stream; if false, all tokens produced from each execution will be written to the stream (if they that are newer than the stream's most recent token )will be written to the stream
fetcher.delay - how long to wait between executions (in milliseconds)

parser.class - the fully-qualified name of the class that will be used to parse data (note: some parser implementations may ignore the fetcher and perform the fetching themselves)

date.extractor.class - the fully-qualified name of the class that will be used to extract dates (note: some parsers may ignore the date extractor and perform timestamping themselves)

stream.assigner.class - the fully-qualified name of the class that will determine which stream tokens will be written to (note: some parsers may ignore the stream assigner)

...

Code Block

# the fetcher is ignored, but since we must instantiate something, an HTMLFetcher is used
fetcher.class = edu.uiuc.ncsa.datastream.util.fetch.fetcher.HTMLFetcher
# non-realtime, because we're performing a Twitter search and want all new results
fetcher.realtime = false
# wait 5 minutes between fetches (Twitter is rate-limited)
fetcher.delay = 300000

parser.class = edu.uiuc.ncsa.datastream.util.fetch.dataparser.TwitterParser
# the following four parameters are OAuth authentication parameters
parser.twitter.key = qeS5HHN1s69urz2SqtJISQqeS5HHN1s69urZ2SqtJISQ
parser.twitter.secret = sXcEHIlzMqDSsfxrUNe8D4bGOObxsqidmknpmBn8IsXcEHIlzMqDSsfxrUNe8D4bGOObxsqixmknpmBn8I
parser.twitter.token = 61353510-9TUfOSHMddWSklTzpV23kCqrnK23ev2WdFzlvNP1F9TUfOSHMddWSklTzpV23qCqrnK23ev2WdFzlvNP1F
parser.twitter.tokenSecret = nHyQR6Mi5zZvgptgOPDr0JjqGnoASbvyW5wAa5bKBEnHyQR6Mi5zZvgptgOPDr0JjqGnoASbvzW5wAa5bKBE

# the next few parameters specify a query against Twitter's query API
# for documentation on query syntax see http://search.twitter.com/api/
# this is the query itself. "car" means search for tweets containing the word "car"
parser.twitter.query = car
# here we specify a geographic centroid
parser.twitter.lat = 40.116349
parser.twitter.lon = -88.239183
# and a radius
parser.twitter.radius = 30
# in miles. this is the geographic region in which to search
parser.twitter.distanceUnits = miles

# the date extractor is ignored; the twitter4j API performs date extraction for us
date.extractor.class = edu.uiuc.ncsa.datastream.util.fetch.dateparser.SimpleDateExtractor

# here we're putting all search results into a single stream
stream.assigner.class =  edu.uiuc.ncsa.datastream.util.fetch.ConstantStreamAssigner
# the URI of the stream. this can be any valid URI
stream.assigner.constant.stream = urn:streams/snorb8/twitter


# set up multiple filters
onetimefilter.class = edu.uiuc.ncsa.datastream.util.fetch.filter.MultiFilter
filter.multi.package = edu.uiuc.ncsa.datastream.util.fetch.filter
filter.multi.classes = TypeRegisterFilter,StreamMetadataFilter
# StreamMetadataFilter allows us to include some metadata about our stream
filter.metadata.stream.label = Twitter search for car near CU
# TypeRegisterFilter allows us to associate a content type with token data
# in this case tweets are of type text/plain
filter.typeregister.mime = text/plain

...

Child pages

Versions Compared

Old Version 3

New Version 4

Key