# Template for submitting Python algorithm

## What algorithm authors need to submit

### algorithm.py

• algorithm.py contains a function named algorithm(df, params), which serves as the "main function" of your algorithm. It takes two input parameters: df is a pandas dataframe that contains the complete social media source data (see examples); params is a Python dictionary that holds all the user-specified parameters.
• The algorithm() function returns a Python dictionary named output. It holds key-value pairs in which the key is the output name and the value is the output content in memory. The value can be a string, a list, a list of lists, a nested dictionary, binary data, and so on.
• If you would like to plot your algorithm's results as a pie chart, bar chart, or network chart, you can use our helper script plot.py to do so. If you would like to produce your own plot, you must use the Python library Plotly and generate HTML strings.
• To test whether your algorithm works with our social media data, simply run python3 algorithm.py
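
Before submitting, it can help to check that the dictionary returned by algorithm() has the shape described above. The following is a minimal sketch, not part of the template: check_output is a hypothetical helper, and its list of allowed types covers only the types named in this section, so extend it if your outputs use others.

```python
def check_output(output):
    """Return a list of problems found in an algorithm's output dictionary."""
    # Assumption: only the value types named in this section are accepted.
    allowed = (str, bytes, list, dict)

    if not isinstance(output, dict):
        return ["output must be a dictionary, got %s" % type(output).__name__]

    problems = []
    for name, value in output.items():
        if not isinstance(name, str):
            problems.append("output name %r is not a string" % (name,))
        if not isinstance(value, allowed):
            problems.append(
                "output %r has unexpected type %s" % (name, type(value).__name__)
            )
    return problems


if __name__ == "__main__":
    example = {"doc": {"neg": 0.1, "neu": 0.7, "pos": 0.2}, "div": "<div></div>"}
    print(check_output(example))  # prints []
```

An empty list means no problems were found; each string in a non-empty list names one offending entry.
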

Here is an example of algorithm.py for sentiment analysis. We have already constructed a Sentiment class that contains all the calculations for sentiment, negations, capitalized words, and so on. If your algorithm code is short enough, you can put it directly in the algorithm function instead.

Sentiment Analysis
```
import plot
import pandas as pd
from sentiment_analysis import Sentiment


def algorithm(df, params):
    """
    wrapper function to put each individual algorithm inside
    :param df: dataframe that contains all the input dataset
    :param params: algorithm specific parameters
    :return: a dictionary of { outputname: output content in memory }
    """

    output = {}

    # algorithm specific code
    # construct sentiment analysis
    SA = Sentiment(df, params['column'])

    # sentence-level and document-level sentiment
    sentiment_sentence, sentiment_doc = SA.sentiment(params['algorithm'])
    output['sentiment'] = sentiment_sentence
    output['doc'] = sentiment_doc

    output['negation'] = SA.negated()
    output['allcap'] = SA.allcap()

    # plot
    labels = ['negative', 'neutral', 'positive']
    values = [sentiment_doc['neg'], sentiment_doc['neu'],
              sentiment_doc['pos']]
    output['div'] = plot.plot_pie_chart(labels, values,
                                        title='Sentiment of the dataset')

    return output


if __name__ == '__main__':
    """
    to test just run algorithm.py:
    python3 algorithm.py
    """

    # download our example dataset and place it under the same directory of this script
    # NOTE: 'example_dataset.csv' is a placeholder name; use the file you downloaded
    df = pd.read_csv('example_dataset.csv')

    params = {
        "column": "text",
        # algorithm() reads params['algorithm']; 'vader' is an assumed value,
        # replace it with whichever name your Sentiment class expects
        "algorithm": "vader",
    }

    output = algorithm(df, params)

    # see if the outputs are what you desired
    print(output.keys())
    print(output['sentiment'][:5])
    print(output['doc'])
    print(output['negation'][:5])
    print(output['allcap'][:5])
    print(output['div'][:100])
```

### plot.py

We provide a plotting helper script that uses Plotly to generate HTML code. Three types of plots are currently available: pie chart, bar chart, and network chart.

pie chart
```
def plot_pie_chart(labels, values, title):
    """
    plot pie chart
    :param labels: list of labels, shape must match parameter values
    :param values: list of values, shape must match parameter labels
    :param title: title to show
    :return: html code in a div
    """
```
network
```
def plot_network(graph, layout, relationships, title):
    """
    plot network graph
    :param graph: networkx graph
    :param layout: network layout
    :param relationships: reply, retweet, mention or anything else
    :param title: title to show
    :return: html code in a div
    """
```
bar chart
```
def plot_bar_chart(index, counts, title):
    """
    plot bar chart
    :param index: x - axis, usually the index
    :param counts: y - axis, usually counts
    :param title: title to show
    :return: html code in a div
    """
```

If you would like to write your own code to generate other types of plots, please make sure that:

• You produce an interactive plot as HTML code (ideally wrapped in a <div> tag rather than a complete HTML page).
• If you use the Plotly library, the final output commands look like this:
`div = plot(fig, output_type='div', image='png', auto_open=False, image_filename='plot_img')`
`return div`

### requirement.txt

“Requirements files” are files containing a list of items to be installed using pip install like so:

pip install -r requirements.txt

For any third-party libraries that your algorithm uses, you have to add them to a requirement file (ideally with specific versions). Read more about requirements files in the pip documentation.

Here is an example:

requirement.txt
```
networkx==1.11
plotly==2.7.0
nltk==3.2.5
```
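
If you are unsure which versions to pin, one option is to read them from the environment in which your algorithm already runs. The sketch below uses only the standard library's importlib.metadata (Python 3.8 or later); pinned_requirements is a hypothetical helper, not part of the template.

```python
from importlib import metadata


def pinned_requirements(packages):
    """Return 'name==version' lines for packages installed in this environment."""
    lines = []
    for name in packages:
        try:
            lines.append("%s==%s" % (name, metadata.version(name)))
        except metadata.PackageNotFoundError:
            # Not installed here: skip it rather than guess a version.
            continue
    return lines


if __name__ == "__main__":
    # Package names taken from the example above.
    for line in pinned_requirements(["networkx", "plotly", "nltk"]):
        print(line)
```

The printed lines can be pasted into the requirement file; packages that are not installed locally are left out, so add those by hand.
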

## Complete deployment

### Lambda

Access the templates:

Explanation:

`lambda_function.lambda_handler`

Parameters: params, context

At the time you create a Lambda function, you specify a handler, which is a function in your code, that AWS Lambda can invoke when the service executes your code.

Lambda uses params to pass event data to the handler. This parameter is usually of the Python dict type, but it can also be of list, str, int, float, or NoneType type. In our case, params is the argument that SMILE passes to the Lambda function: it contains the parameters from the args section of your config file, along with default parameters such as remoteReadPath, resultPath, column, s3FolderName and uid. Here is an example:

params
```
{
    "resultPath": "/NLP/sentiment/",
    "column": "text",
    "s3FolderName": "cwang138",
    "uid": "20872a1c-52fd-45b0-b26c-aa0536097bd6"
}
```

context provides runtime information to your handler.

Under normal circumstances, you should not need to modify anything within the lambda_handler function. It performs the following steps in order:

• Given the params, constructs the read and write paths, both locally under /tmp and remotely in the S3 bucket.
• Saves all the parameters in a config.json file and stores it remotely.
• Prepares the input dataset and loads it into a pandas dataframe.
• Executes the user-specified algorithm. This algorithm must take a dataframe and the params as input, and return a dictionary of outputs { output_name: output_data }.
• Stores each output from the algorithm in a file of the appropriate type (json, csv, html, pickle, etc.), uploads the files remotely, and returns a dictionary of { output_name: url_of_the_file } to the SMILE app.
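
The steps above can be sketched as a small local pipeline. This is an illustration under stated assumptions, not the template's actual handler: run_pipeline and load_rows are hypothetical names, the S3 upload is left out, every output is written as JSON, and the csv module stands in for pandas so that the sketch has no third-party dependencies.

```python
import csv
import json
import os
import tempfile


def load_rows(path):
    # Stand-in for loading the dataset into a pandas dataframe.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def run_pipeline(params, algorithm, input_file, work_dir):
    # 1. construct the local save path (the remote S3 path is omitted here)
    save_dir = os.path.join(work_dir, params["uid"])
    os.makedirs(save_dir, exist_ok=True)

    # 2. save all the parameters in config.json
    with open(os.path.join(save_dir, "config.json"), "w") as f:
        json.dump(params, f)

    # 3. load the input dataset
    data = load_rows(input_file)

    # 4. run the user-specified algorithm
    output = algorithm(data, params)

    # 5. store each output in its own file and report where it went
    locations = {}
    for name, content in output.items():
        path = os.path.join(save_dir, name + ".json")
        with open(path, "w") as f:
            json.dump(content, f)
        locations[name] = path  # the real handler returns a URL instead
    return locations


if __name__ == "__main__":
    def count_rows(data, params):
        # toy algorithm: count the rows in the dataset
        return {"count": len(data)}

    with tempfile.TemporaryDirectory() as tmp:
        input_file = os.path.join(tmp, "input.csv")
        with open(input_file, "w", newline="") as f:
            f.write("text\nhello\nworld\n")
        print(run_pipeline({"uid": "demo", "column": "text"}, count_rows, input_file, tmp))
```

Keeping the algorithm as a plain function argument mirrors the contract described above: the handler owns the paths and storage, and the algorithm only maps (data, params) to a dictionary of outputs.
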

`lambda_function.algorithm`

Parameters: df, params

This is where you add your own algorithm. If it is just a few lines, you can put it here directly; otherwise, you can write your own algorithm class and simply instantiate it and call its functions here.

Your input is a pandas dataframe that contains the social media dataset of your choice, as well as the parameters you specified in the args section of the configuration JSON file. The output is a dictionary { output_name: output_data }.

Here is an example:

sentiment analysis
```
import plot
from lambda_sentiment_analysis import Sentiment


def algorithm(df, params):
    """
    wrapper function to put each individual algorithm inside
    :param df: dataframe that contains all the input dataset
    :param params: algorithm specific parameters
    :return: a dictionary of { outputname: output content in memory }
    """

    output = {}

    # algorithm specific code
    # construct sentiment analysis
    SA = Sentiment(df, params['column'])

    sentiment_sentence, sentiment_doc = SA.sentiment(params['algorithm'])
    output['sentiment'] = sentiment_sentence
    output['doc'] = sentiment_doc

    output['negation'] = SA.negated()
    output['allcap'] = SA.allcap()

    # plot
    labels = ['negative', 'neutral', 'positive']
    values = [sentiment_doc['neg'], sentiment_doc['neu'],
              sentiment_doc['pos']]
    output['div'] = plot.plot_pie_chart(labels, values,
                                        title='Sentiment of the dataset')

    return output
```

| Function | Parameters | Description |
| --- | --- | --- |
| | localpath, remotepath, filename | Helper function to upload a local file to the S3 bucket. |
| writeToS3.createDirectory | DirectoryName | Helper function to create a folder in the S3 bucket. |
| | remotepath, filename | Helper function to generate a downloadable URL for a file stored in the S3 bucket. |
| | filename, localpath, remotepath | |
| writeToS3.getObject | | |
| writeToS3.listDir | remoteClass | Returns a list of folder names in the S3 bucket. |
| writeToS3.listFiles | foldername | Lists all the files under a given folder in the S3 bucket. |
| plot.plot_pie_chart | labels, values, title | Creates interactive pie chart HTML (div) code using Plotly. labels are the pie chart labels; values are the values to plot; title is the plot title. Note: labels and values must have matching shapes. |
| plot.plot_network | graph, layout, relationships, title | Given a networkx graph, layout settings, relationships, and a plot title, creates interactive network HTML code using Plotly. |
| plot.plot_bar_chart | index, counts, title | Creates interactive bar chart HTML code using Plotly. index denotes the x-axis; counts denotes the y-axis; title is the plot title. |

`dataset.organize_path_lambdaevent`

Given the parameters passed from SMILE, constructs the local paths for reading and saving files, as well as the corresponding paths in the S3 bucket.

It returns a path dictionary that contains remoteReadPath, localReadPath, localSavePath, remoteSavePath, and filename.

For example:

path
```
{
    "filename": "Boeing737.csv",
    "localSavePath": "/tmp/cwang138/NLP/preprocessing/0d2cdb87-cf5f-486f-92d5-e75fd41fe439/",
    "remoteSavePath": "cwang138/NLP/preprocessing/0d2cdb87-cf5f-486f-92d5-e75fd41fe439/"
}
```

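
Comparing this path example with the params example earlier on the page suggests that the save paths are built from s3FolderName, resultPath and uid, with the local path being the remote path under /tmp. The sketch below encodes that inferred pattern only; build_save_paths is a hypothetical name, and the actual implementation of organize_path_lambdaevent may differ.

```python
import posixpath


def build_save_paths(params):
    """Derive save paths from SMILE parameters (pattern inferred from the examples)."""
    # Strip slashes so the pieces join cleanly, then restore the trailing slash.
    remote = posixpath.join(
        params["s3FolderName"].strip("/"),
        params["resultPath"].strip("/"),
        params["uid"],
    ) + "/"
    return {
        "remoteSavePath": remote,
        "localSavePath": "/tmp/" + remote,
    }


if __name__ == "__main__":
    paths = build_save_paths({
        "s3FolderName": "cwang138",
        "resultPath": "/NLP/preprocessing/",
        "uid": "0d2cdb87-cf5f-486f-92d5-e75fd41fe439",
    })
    print(paths["remoteSavePath"])
    print(paths["localSavePath"])
```
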
`dataset.get_remote_input`

Parameters: filename

Uses the parameters to download the input file from the S3 bucket to a local location, and then loads it into a pandas dataframe.

`dataset.save_remote_output`

Parameters: localSavePath, remoteSavePath, fname, output_data

Given the output in memory, first saves the output data to a local file and then uploads it to the remote S3 bucket. Returns a dictionary of { output_name: url_of_the_file }.

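
The local half of this step, choosing a file type for each output, might look like the sketch below. The dispatch rules are assumptions made for illustration (strings as HTML, lists of lists as CSV, other lists and dictionaries as JSON, everything else pickled); save_local_output is a hypothetical name, and the S3 upload and URL generation are omitted.

```python
import csv
import json
import os
import pickle


def save_local_output(local_save_path, fname, output_data):
    """Write output_data under local_save_path and return the file path."""
    os.makedirs(local_save_path, exist_ok=True)
    base = os.path.join(local_save_path, fname)

    if isinstance(output_data, str):
        # Plotly divs and other text are kept as HTML.
        path = base + ".html"
        with open(path, "w", encoding="utf-8") as f:
            f.write(output_data)
    elif (isinstance(output_data, list) and output_data
          and all(isinstance(row, (list, tuple)) for row in output_data)):
        # A list of lists is written as CSV rows.
        path = base + ".csv"
        with open(path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(output_data)
    elif isinstance(output_data, (dict, list)):
        # Dictionaries and flat lists are written as JSON.
        path = base + ".json"
        with open(path, "w", encoding="utf-8") as f:
            json.dump(output_data, f)
    else:
        # Anything else (for example binary data) is pickled.
        path = base + ".pickle"
        with open(path, "wb") as f:
            pickle.dump(output_data, f)
    return path
```

Calling it once per entry of the algorithm's output dictionary yields one file per output name, which is the unit the handler then uploads.
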
`requirement.txt`

Parameters: NA

Provide a list of the libraries you use in your algorithm for easy installation.

Example:

requirement.txt
```
BeautifulSoup==3.2.0
Django==1.3
Fabric==1.2.0
Jinja2==2.5.5
PyYAML==3.09
Pygments==1.4
SQLAlchemy==0.7.1
South==0.7.3
amqplib==0.6.1
anyjson==0.3
...
```