
What algorithm authors need to submit

Template

algorithm.py, plot.py

algorithm.py

  • algorithm.py contains a function defined as def algorithm(df, params), which serves as the "main function" of your algorithm. It takes two input parameters: df is a pandas dataframe that contains the complete social media source data (see examples); params is a Python dictionary that holds all the user-specified parameters.
  • The algorithm() function returns a Python dictionary named output. It contains key-value pairs in which the key is the output name and the value is the output content in memory. The value can be a string, a list, a list of lists, a nested dictionary, binary data, and so on.
  • If you would like to plot your algorithm's results as a pie chart, bar chart, or network chart, you can use our helper module plot to do so. If you would like to produce your own plot, you must use the Python library Plotly and generate HTML strings.
  • Due to security concerns, we do not allow users to plug their algorithms directly into SMILE with access to real-time social media search results or to the remote storage. However, we recognize that authors may want to test their algorithms against mock data. In algorithm.py you can add a "__main__" check so that the script can run standalone. Within the "__main__" block, first download our mock social media dataset (example_dataset.csv) and put it in the same directory as the algorithm.py script. Then load that CSV file into a pandas dataframe. After that, add your parameters to the params variable. One parameter that will almost always be present is column: the column of text in the dataframe that you want to analyze. For example, the dataset we provide is a complete tweet payload, and you would normally want to analyze the text column only. column is a default parameter provided by the SMILE web app, and its value is the user's selection, which depends on the social media source type.
  • To test whether your algorithm works with our social media data, simply run python3 algorithm.py

Here is an example of algorithm.py for sentiment analysis. Here we have already constructed a Sentiment class that contains all the calculations for sentiment, negations, capitalized words, and so on. If your algorithm code is short enough, you can put it directly in the algorithm function instead.

Sentiment Analysis
import plot
import pandas as pd
from sentiment_analysis import Sentiment


def algorithm(df, params):
    """
    wrapper function to put each individual algorithm inside
    :param df: dataframe that contains all the input dataset
    :param params: algorithm specific parameters
    :return: a dictionary of { outputname: output content in memory }
    """

    output = {}

    # algorithm specific code
    # construct sentiment analysis
    SA = Sentiment(df, params['column'])

    sentiment_sentence, sentiment_doc = SA.sentiment(params['algorithm'])
    output['sentiment'] = sentiment_sentence
    output['doc'] = sentiment_doc

    if params['algorithm'] == 'vader':
        output['negation'] = SA.negated()
        output['allcap'] = SA.allcap()

    # plot
    labels = ['negative', 'neutral', 'positive']
    values = [sentiment_doc['neg'], sentiment_doc['neu'],
              sentiment_doc['pos']]
    output['div'] = plot.plot_pie_chart(labels, values,
                                        title='Sentiment of the dataset')

    return output


if __name__ == '__main__':
    """ 
    help user with no access to AWS test their model
    to test just run algorithm.py:
    python3 algorithm.py
    """

    # download our example dataset and place it under the same directory of this script
    df = pd.read_csv('example_dataset.csv')

    # add your parameters needed by the analysis
    params = {
        "column": "text",
        "algorithm": "vader"
    }

    # execute your algorithm
    output = algorithm(df, params)

    # see if the outputs are what you desired
    print(output.keys())
    print(output['sentiment'][:5])
    print(output['doc'])
    print(output['negation'][:5])
    print(output['allcap'][:5])
    print(output['div'][:100])

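The example above relies on the Sentiment class and the plot helper, which are not reproduced here. As a minimal self-contained sketch of the same algorithm(df, params) contract, the following uses a hypothetical word-count algorithm and a small inline dataframe in place of example_dataset.csv. The top_n parameter and the output names are assumptions made for illustration only.

```python
from collections import Counter

import pandas as pd


def algorithm(df, params):
    """
    hypothetical word-count algorithm (illustration only)
    :param df: dataframe that contains all the input dataset
    :param params: algorithm specific parameters
    :return: a dictionary of { outputname: output content in memory }
    """
    output = {}

    # keep only the user-selected text column and skip missing rows
    texts = df[params['column']].dropna().astype(str)

    # count lowercase, whitespace-separated tokens over the whole column
    counter = Counter()
    for text in texts:
        counter.update(text.lower().split())

    # list of lists [word, count], most frequent first
    top_n = params.get('top_n', 10)  # 'top_n' is an assumed parameter
    output['word_count'] = [[w, c] for w, c in counter.most_common(top_n)]
    output['total_words'] = sum(counter.values())

    return output


if __name__ == '__main__':
    # inline stand-in for example_dataset.csv
    df = pd.DataFrame({'text': ['Good morning', 'good night', None]})
    params = {'column': 'text', 'top_n': 1}

    output = algorithm(df, params)
    print(output['word_count'])   # [['good', 2]]
    print(output['total_words'])  # 4
```

Both output values are plain Python objects (a list of lists and an integer), which matches the kinds of values the output dictionary is described as holding.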

plot.py

We provide a plotting helper script that uses Plotly to generate HTML code. There are three types of plots available right now: pie chart, bar chart, and network chart.

pie chart
def plot_pie_chart(labels, values, title):
    """
    plot pie chart
    :param labels: list of label, shape must match parameter values
    :param values: list of values, shape must match parameter labels
    :param title: title to show
    :return: html code in a div
    """
network
def plot_network(graph, layout, relationships, title):
    """
    plot network graph
    :param graph: networkx graph
    :param layout: network layout
    :param relationships: reply, retweet, mention or anything else
    :param title: title to show
    :return: html code in a div
    """
bar chart
def plot_bar_chart(index, counts, title):
    """
    plot bar chart
    :param index: x - axis, usually the index
    :param counts: y - axis, usually counts
    :param title: title to show
    :return: html code in a div
    """


If you would like to write your own code to generate other types of plot, please make sure:

  • The plot must be interactive and delivered as HTML code (ideally wrapped in a <div> tag instead of a complete HTML page).
  • If you would like to use the Plotly library, write the final output command as follows (plot here is plotly.offline.plot):
from plotly.offline import plot

div = plot(fig, output_type='div', image='png', auto_open=False,
           image_filename='plot_img')

return div



requirement.txt

“Requirements files” are files containing a list of items to be installed using pip install like so: 

pip install -r requirements.txt

You must list every third-party library that your algorithm uses in a requirement file (ideally pinned to a specific version). Read more about requirements files in the pip documentation.

Here is an example:

requirement.txt
networkx==1.11 
plotly==2.7.0
nltk==3.2.5

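Since pinned versions are preferred, a small check can flag entries that are not pinned with ==. The helper below is a hypothetical sketch and not part of the SMILE templates. The unpinned pandas entry and the comment line are added here only to show what gets flagged.

```python
def unpinned_requirements(lines):
    """
    hypothetical helper: find requirement lines without an exact version
    :param lines: list of lines read from a requirement file
    :return: list of requirement lines that do not use '=='
    """
    unpinned = []
    for raw in lines:
        # drop trailing comments and surrounding whitespace
        line = raw.split('#', 1)[0].strip()
        if not line:
            continue
        if '==' not in line:
            unpinned.append(line)
    return unpinned


example_lines = [
    'networkx==1.11 ',
    'plotly==2.7.0',
    'nltk==3.2.5',
    'pandas',        # made-up unpinned entry for illustration
    '# a comment',
]
print(unpinned_requirements(example_lines))  # ['pandas']
```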


Complete deployment

Batch

Lambda

Access the templates:

Explanation:

Function

parameters

description

lambda_function.lambda_handler

parameters: params, context

At the time you create a Lambda function, you specify a handler, which is a function in your code, that AWS Lambda can invoke when the service executes your code.

Lambda uses params to pass event data to the handler. This parameter is usually of the Python dict type, but it can also be of list, str, int, float, or NoneType type. In our case, params is the event argument that SMILE passes to the Lambda function. It contains the parameters from the args section of your config file, along with default parameters such as remoteReadPath, resultPath, column, s3FolderName, and uid. Here is an example:

params
{
  "remoteReadPath": "cwang138/GraphQL/twitter-Tweet/pelosi/",
  "resultPath": "/NLP/sentiment/",
  "column": "text",
  "s3FolderName": "cwang138",
  "uid": "20872a1c-52fd-45b0-b26c-aa0536097bd6",
  "algorithm": "vader"
}

context provides runtime information to your handler.

Under normal circumstances, you should not need to modify anything within the lambda_handler function. It performs the following steps in order:

  • Given the params, construct the read and write paths both in local /tmp and in the remote S3 bucket.
  • Save all the parameters in a config.json file and store it remotely.
  • Prepare the input dataset and load it into a pandas dataframe.
  • Execute the user-specified algorithm. This algorithm must take a dataframe and the params as input and return a dictionary of output { output_name: output_data }.
  • Store each output from the algorithm in a file of the appropriate type (json, csv, html, pickle, etc.), upload the files to remote storage, and return a dictionary of { output_name: url_of_the_file } to the SMILE app.

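The load, execute, and store steps above can be outlined roughly as follows. This is a hypothetical sketch, not the real lambda_handler: the function name handler_sketch and the injected load_input and save_output callables are assumptions that stand in for the S3-backed helpers, and path construction and config.json handling are left out.

```python
def handler_sketch(params, algorithm, load_input, save_output):
    """
    hypothetical outline of the handler flow (not the real lambda_handler)
    :param params: parameters passed from SMILE
    :param algorithm: the user algorithm, called as algorithm(df, params)
    :param load_input: stand-in that returns the input dataset for params
    :param save_output: stand-in that stores one output and returns its url
    :return: a dictionary of { output_name: url_of_the_file }
    """
    # prepare the input dataset
    df = load_input(params)

    # execute the user specified algorithm
    output = algorithm(df, params)
    if not isinstance(output, dict):
        raise TypeError('algorithm() must return a dictionary')

    # store every output and collect the url of each stored file
    urls = {}
    for output_name, output_data in output.items():
        urls[output_name] = save_output(output_name, output_data)

    return urls


# usage with in-memory stand-ins instead of S3
rows = [{'text': 'hello'}, {'text': 'world'}]
urls = handler_sketch(
    {'column': 'text'},
    algorithm=lambda df, params: {'count': len(df)},
    load_input=lambda params: rows,
    save_output=lambda name, data: 'https://example.invalid/' + name + '.json',
)
print(urls)  # {'count': 'https://example.invalid/count.json'}
```

Passing the loader and saver in as arguments keeps the sketch runnable without AWS access, which is the same motivation as the "__main__" check in algorithm.py.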
lambda_function.algorithm

parameters: df, params

This is where you add your own algorithm. You can put your algorithm directly here if it is just a few lines, or you can write your own algorithm class and simply instantiate it and call its functions here.

Your input is a pandas dataframe that contains the social media dataset of your choice, as well as the parameters you specified in the args section of the configuration JSON file. The output is a dictionary of output { output_name: output_data }.

Here is an example:

sentiment analysis
import plot
from lambda_sentiment_analysis import Sentiment


def algorithm(df, params):
    """
    wrapper function to put each individual algorithm inside
    :param df: dataframe that contains all the input dataset
    :param params: algorithm specific parameters
    :return: a dictionary of { outputname: output content in memory }
    """

    output = {}

    # algorithm specific code
    # construct sentiment analysis
    SA = Sentiment(df, params['column'])

    sentiment_sentence, sentiment_doc = SA.sentiment(params['algorithm'])
    output['sentiment'] = sentiment_sentence
    output['doc'] = sentiment_doc

    if params['algorithm'] == 'vader':
        output['negation'] = SA.negated()
        output['allcap'] = SA.allcap()

    # plot
    labels = ['negative', 'neutral', 'positive']
    values = [sentiment_doc['neg'], sentiment_doc['neu'],
              sentiment_doc['pos']]
    output['div'] = plot.plot_pie_chart(labels, values,
                                        title='Sentiment of the dataset')

    return output


writeToS3.upload

parameters: localpath, remotepath, filename

Helper function to upload a local file to the S3 bucket.

writeToS3.createDirectory

parameters: DirectoryName

Helper function to create a folder in the S3 bucket.

writeToS3.generate_downloads

parameters: remotepath, filename

Helper function to generate a downloadable URL for a file stored in the S3 bucket.

writeToS3.downloadToDisk

parameters: filename, localpath, remotepath

Helper function to download a file to local disk.

writeToS3.getObject

parameters: remoteKey

Loads a remote object into memory.

writeToS3.listDir

parameters: remoteClass

Returns a list of folder names in the S3 bucket.

writeToS3.listFiles

parameters: foldername

Lists all the files under a given folder in the S3 bucket.

plot.plot_pie_chart

parameters: labels, values, title

Creates an interactive pie chart as HTML (div) code using Plotly.

labels are the pie chart labels; values are the values to plot; title is the plot title.

Note: labels and values must have matching shapes.

plot.plot_network

parameters: graph, layout, relationships, title

Given a networkx graph, layout settings, relationships, and a plot title, creates an interactive network chart as HTML code using Plotly.

plot.plot_bar_chart

parameters: index, counts, title

index denotes the x-axis; counts denotes the y-axis; title is the plot title. Creates an interactive bar chart as HTML code using Plotly.

dataset.organize_path_lambdaevent

Given the parameters passed from SMILE, constructs the local paths for reading and saving files and the corresponding read and save paths in the S3 bucket.

It outputs a path dictionary that contains remoteReadPath, localReadPath, localSavePath, remoteSavePath, and filename.

For example:

path
{
	"remoteReadPath":"cwang138/GraphQL/twitter-Tweet/Boeing737/",
	"filename":"Boeing737.csv",
	"localReadPath":"/tmp/cwang138/0d2cdb87-cf5f-486f-92d5-e75fd41fe439/",
	"localSavePath":"/tmp/cwang138/NLP/preprocessing/0d2cdb87-cf5f-486f-92d5-e75fd41fe439/",
	"remoteSavePath":"cwang138/NLP/preprocessing/0d2cdb87-cf5f-486f-92d5-e75fd41fe439/"
}
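The path dictionary above can be reproduced from the SMILE parameters with simple string concatenation. The sketch below is inferred only from this one example and is not the actual implementation of dataset.organize_path_lambdaevent. The function name organize_path_sketch is hypothetical, and the resultPath value /NLP/preprocessing/ is assumed from the save paths shown.

```python
def organize_path_sketch(params):
    """
    hypothetical reconstruction of the path dictionary (inferred from the
    example only, not the real organize_path_lambdaevent)
    :param params: parameters passed from SMILE
    :return: a dictionary of read and save paths
    """
    folder = params['s3FolderName']
    uid = params['uid']
    result_path = params['resultPath']    # assumed to look like '/NLP/preprocessing/'
    remote_read = params['remoteReadPath']

    # assumption: the dataset name is the last component of the read path
    dataset_name = remote_read.rstrip('/').split('/')[-1]

    return {
        'remoteReadPath': remote_read,
        'filename': dataset_name + '.csv',
        'localReadPath': '/tmp/' + folder + '/' + uid + '/',
        'localSavePath': '/tmp/' + folder + result_path + uid + '/',
        'remoteSavePath': folder + result_path + uid + '/',
    }


path = organize_path_sketch({
    'remoteReadPath': 'cwang138/GraphQL/twitter-Tweet/Boeing737/',
    'resultPath': '/NLP/preprocessing/',
    's3FolderName': 'cwang138',
    'uid': '0d2cdb87-cf5f-486f-92d5-e75fd41fe439',
})
print(path['filename'])        # Boeing737.csv
print(path['remoteSavePath'])  # cwang138/NLP/preprocessing/0d2cdb87-cf5f-486f-92d5-e75fd41fe439/
```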
dataset.get_remote_input

parameters: remoteReadPath, filename, localReadPath

Uses the parameters to download the input file from the S3 bucket to a local location and then loads it into a pandas dataframe.

dataset.save_remote_output

parameters: localSavePath, remoteSavePath, fname, output_data

Given the output in memory, first saves the output data to a local file and then uploads it to the remote S3 bucket. Returns a dictionary of { output_name: url_of_the_file }.

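The local half of that step, choosing a file type from the kind of data, might look like the sketch below. The function name save_local_output and the type-to-extension mapping are assumptions for illustration. The real save_remote_output may choose formats differently, and the upload to S3 is omitted. The sample output values are made up.

```python
import csv
import json
import os
import pickle
import tempfile


def save_local_output(local_save_path, fname, output_data):
    """
    hypothetical local save step (no S3 upload)
    :param local_save_path: directory to write into
    :param fname: output name, used as the file name without extension
    :param output_data: output content in memory
    :return: path of the file that was written
    """
    os.makedirs(local_save_path, exist_ok=True)
    base = os.path.join(local_save_path, fname)

    if isinstance(output_data, dict):
        # dictionaries are written as json
        path = base + '.json'
        with open(path, 'w') as f:
            json.dump(output_data, f)
    elif (isinstance(output_data, list) and output_data
          and all(isinstance(row, (list, tuple)) for row in output_data)):
        # a list of lists is written as csv, one inner list per row
        path = base + '.csv'
        with open(path, 'w', newline='') as f:
            csv.writer(f).writerows(output_data)
    elif isinstance(output_data, str):
        # strings are assumed to be html such as a plot div
        path = base + '.html'
        with open(path, 'w', encoding='utf-8') as f:
            f.write(output_data)
    else:
        # anything else falls back to pickle
        path = base + '.pickle'
        with open(path, 'wb') as f:
            pickle.dump(output_data, f)

    return path


# made-up outputs, one per supported type
outputs = {
    'doc': {'neg': 0.1, 'neu': 0.7, 'pos': 0.2},
    'sentiment': [['text', 'score'], ['hi', 0.5]],
    'div': '<div>plot</div>',
    'model': {1, 2, 3},
}
with tempfile.TemporaryDirectory() as tmp:
    saved_files = {name: os.path.basename(save_local_output(tmp, name, data))
                   for name, data in outputs.items()}
print(saved_files)
```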
requirement.txt

parameters: NA

Provide a list of the libraries your algorithm uses for easy installation.

Example:

requirement.txt
BeautifulSoup==3.2.0
Django==1.3
Fabric==1.2.0
Jinja2==2.5.5
PyYAML==3.09
Pygments==1.4
SQLAlchemy==0.7.1
South==0.7.3
amqplib==0.6.1
anyjson==0.3
...