# lad18-coursera-retrieval
The data retrieval app from Coursera
# Before you run the application
* Make sure that you have Coursera Data Coordinator credentials
* Make sure you know the ID of your organization
* Install the Coursera research exports tool
* Authenticate the "manage_research_exports" app using the Coursera OAuth2 client and your Coursera Data Coordinator credentials (see the sketch after this list)
  * In case the machine that will run the retrieval application doesn't have a browser, generate the pickle file on a machine that does and copy it over
* Install PostgreSQL and make sure there is a database and a user you have access to
  * In case you don't have a database set up to hold the data, comment out lines 47 and 48 in the automatic_requester.py file
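The setup steps above might look like the following. This is a sketch only: it assumes the pip packages `courseraresearchexports` and `courseraoauth2client` and their documented `authorize` command, so check those tools' own documentation for the exact package names and flags.

```sh
# Assumed package names for the Coursera export tooling.
pip install courseraoauth2client courseraresearchexports

# Authorize the "manage_research_exports" app with your Coursera Data
# Coordinator credentials; this normally opens a browser for the OAuth2 flow
# and caches the resulting credentials locally (the pickle file mentioned above).
courseraoauth2client config authorize --app manage_research_exports
```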
# Preparations for the first use
* Download the tables from Coursera using the Coursera research exports tool to make sure that it works
* Make some modifications to the SQL files that come with it (see the sketch after this list):
  * Add `unique` constraints where needed
  * Remove the name of your organization from the SQL files
  * Keep the modified files; they will shape your database
* Configure the config.json file (you can start from a copy of the template_config.json file; see the example after this list) with:
  * The ID of your organization
  * The date you want to start downloading data from, in the "fist_clickstream_date" field
  * A flag marking that this is the first time the tool runs
  * The location where the retrieved compressed files will be downloaded and extracted
  * The location where the retrieved compressed files will be stored after the data is pushed into the database
  * The comments that will accompany your requests to Coursera's service
  * The name of the database and the database username. It is recommended that this user matches the system user that will run the retrieval application.
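A hypothetical illustration of the kind of SQL edit meant above; the real table and column names come from the SQL files shipped with the Coursera research exports tool.

```sql
-- Hypothetical example only: the actual definitions come from the SQL files
-- shipped with the export tool. The edit removes the organization name from
-- the table name and adds a UNIQUE constraint on the natural key.
-- Before:  CREATE TABLE my_university_users ( user_id VARCHAR(64), full_name TEXT );
CREATE TABLE users (
    user_id   VARCHAR(64) UNIQUE,
    full_name TEXT
);
```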
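A minimal sketch of generating such a config. Apart from "fist_clickstream_date" (and "last_clickstream_date" for later runs), every key name below is a hypothetical stand-in, so take the real key names from template_config.json.

```python
import json

# Sketch only: except for "fist_clickstream_date", every key below is a
# hypothetical stand-in for whatever template_config.json actually uses.
config = {
    "org_id": "my-university",                      # hypothetical key
    "fist_clickstream_date": "2018-01-01",          # field named in this README
    "first_run": True,                              # hypothetical key
    "download_location": "/data/coursera/tmp",      # hypothetical key
    "storage_location": "/data/coursera/archive",   # hypothetical key
    "request_comment": "LAD18 scheduled retrieval", # hypothetical key
    "database_name": "coursera_data",               # hypothetical key
    "database_user": "coursera",                    # hypothetical key
}

with open("config.json", "w") as fh:
    json.dump(config, fh, indent=4)
```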
# Preparations for the following uses
* Configure the config.json file with:
  * The ID of your organization
  * The date you want to start downloading data from, in the "last_clickstream_date" field
  * A flag marking that it is no longer the first time the tool runs
  * The location where the retrieved compressed files will be downloaded and extracted
  * The location where the retrieved compressed files will be stored after the data is pushed into the database
  * The comments that will accompany your requests to Coursera's service
  * The name of the database and the database username. It is recommended that this user matches the system user that will run the retrieval application.
# Running the tool
python automatic_requester.py config.json
## What it will do
* First:
  * Issue a request for the tables that belong to your organization
  * Issue a request for the clickstream data that belongs to your organization, covering the interval between the "fist_clickstream_date" and today (or between the "last_clickstream_date" and today on later runs)
* Second:
  * Download the tables into their temporary location
  * Download the clickstream data into its temporary location
* Third:
  * Extract the downloaded files
  * Insert the values into the database (see the sketch after this list)
    * Table values are deleted and replaced with the newly downloaded values
    * Clickstream data is inserted directly
  * Remove the extracted files from the temporary location
  * Move the downloaded compressed files into the final storage location
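A rough sketch of the two load patterns described above, with hypothetical table, column, and file names and generic PostgreSQL statements; it is not the application's actual SQL.

```sql
-- Illustration only: hypothetical table, column, and file names.
-- Table exports are fully refreshed: old rows are deleted, then reloaded.
BEGIN;
DELETE FROM users;
COPY users FROM '/data/coursera/tmp/users.csv' WITH (FORMAT csv, HEADER true);
COMMIT;

-- Clickstream rows are only appended to what is already in the database.
INSERT INTO clickstream (user_id, event_key, event_timestamp)
VALUES ('abc123', 'page.view', '2018-01-01 00:15:00+00');
```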
# Schedule the task
It is recommended to use cron for this.
For better results, make sure the application runs after midnight UTC.
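For example, a crontab entry along these lines would run the retrieval every day at 01:00. The repository path and log file are placeholders, and it assumes the machine's clock is set to UTC so that 01:00 falls after midnight UTC.

```
# m h dom mon dow   command
0 1 * * * cd /path/to/lad18-coursera-retrieval && python automatic_requester.py config.json >> /var/log/coursera_retrieval.log 2>&1
```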