Create Event by Importing JSON files in Open Event Server

Apart from the usual way of creating an event in FOSSASIA's Orga Server project by sending POST requests to the Events API, another way of creating events is importing a zip file which is an archive of multiple JSON files. This way you can create a large event like FOSSASIA, with lots of data related to sessions, speakers, microlocations and sponsors, just by uploading JSON files to the system. Sample JSON files can be found in the open-event project of FOSSASIA. The basic workflow of importing an event and how it works is as follows:

  • The first step is similar to uploading files to the server: we need to send a POST request with multipart form data containing the zipped archive of JSON files.
  • The POST request starts a celery task to start importing data from JSON files and storing them in the database.
  • The celery task URL is returned as a response to the POST request. You can poll this task URL to get the status of the import. If the status is FAILURE, we get the error text along with it; if the status is SUCCESS, we get the resulting event data.
  • In the celery task, each JSON file is read separately and the data is stored in the db with the proper relations.
  • Sending a GET request to the above mentioned celery task URL after the task has been completed returns the event id along with the event URL.

Let’s see how each of these points work in the background.

Uploading ZIP containing JSON Files

For uploading a zip archive, instead of sending JSON data in the POST request we send multipart form data. The multipart/form-data format allows an entire file to be sent as data in the POST request along with the relevant file information. One can read about the various form content types here.

An example cURL request looks something like this:

curl -H "Authorization: JWT <access token>" -X POST -F 'file=@event1.zip' http://localhost:5000/v1/events/import/json

The above cURL request uploads the file event1.zip from your current directory with the key 'file' to the endpoint /v1/events/import/json. The user uploading the file needs to have a JWT authentication key, or in other words be logged in to the system, as it is necessary to create an event.
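
If you prefer Python over cURL, a minimal sketch of the same upload using the requests library (an assumption here; any HTTP client works) looks like this:

import requests

# hypothetical values: your server URL and a valid JWT access token
BASE_URL = "http://localhost:5000"
HEADERS = {"Authorization": "JWT <access token>"}

with open('event1.zip', 'rb') as archive:
    # the archive is sent as multipart/form-data under the key 'file'
    response = requests.post(BASE_URL + "/v1/events/import/json",
                             headers=HEADERS,
                             files={'file': archive})

# the response contains the celery task url used for polling
print(response.json()['task_url'])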

@import_routes.route('/events/import/<string:source_type>', methods=['POST'])
@jwt_required()
def import_event(source_type):
    if source_type == 'json':
        file_path = get_file_from_request(['zip'])
    else:
        file_path = None
        abort(404)
    from helpers.tasks import import_event_task
    task = import_event_task.delay(email=current_identity.email, file=file_path,
                                   source_type=source_type, creator_id=current_identity.id)
    # create import job
    create_import_job(task.id)

    # if testing
    if current_app.config.get('CELERY_ALWAYS_EAGER'):
        TASK_RESULTS[task.id] = {
            'result': task.get(),
            'state': task.state
        }
    return jsonify(
        task_url=url_for('tasks.celery_task', task_id=task.id)
    )


After the request is received, we check if a file exists under the key 'file' of the form-data. If it is there, we save the file and get the path to the saved file. Then we send this path over to the celery task and run the task with the .delay() function of celery. After the celery task is started, the corresponding data about the import job is stored in the database for future debugging and logging purposes. Finally, we return the task url for the celery task that we started.
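
The get_file_from_request() helper itself is not shown above; as a rough sketch (the actual implementation and error handling in the project may differ), it checks the 'file' key of the form-data, saves the upload under a temporary name and returns its path:

import os
import uuid
from flask import abort, current_app, request


def get_file_from_request(allowed_extensions=None):
    """Hypothetical sketch: save the uploaded file and return its path."""
    if 'file' not in request.files:
        abort(400, 'No file found under the key "file"')
    upload = request.files['file']
    extension = upload.filename.rsplit('.', 1)[-1].lower()
    if allowed_extensions and extension not in allowed_extensions:
        abort(400, 'Unsupported file type')
    # store the upload under a random name in a temporary upload directory
    upload_dir = os.path.join(current_app.config.get('BASE_DIR'), 'static', 'uploads', 'import')
    if not os.path.isdir(upload_dir):
        os.makedirs(upload_dir)
    file_path = os.path.join(upload_dir, str(uuid.uuid4()) + '.' + extension)
    upload.save(file_path)
    return file_path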

Celery Task to Import Data

Just like exporting an event, importing is also a time-consuming task and we don't want other application requests to be held up because of it. Hence, we use a celery queue to execute this task. Whenever an import task is started, it is added to the celery queue, and when it reaches the front of the queue it is executed.

For importing, we have created a celery task, import.event, which calls the import_event_task_base() function. This function uses the import helper functions to read the data from the JSON files and save it in the database. After the task is completed, we update the import job entry in the table with the status as either SUCCESS or FAILURE, depending on the outcome of the celery task.

As a result of the celery task, the newly created event's id and the frontend link from where the event can be visited are returned. These, along with the status of the celery task, are returned as the response to a GET request on the celery task URL. If the celery task fails, the state is changed to FAILURE and the error that celery encountered is returned as the error message under the result key. We also print an error traceback in the celery worker.

@celery.task(base=RequestContextTask, name='import.event', bind=True, throws=(BaseError,))
def import_event_task(self, email, file, source_type, creator_id):  # email is passed by the caller (current_identity.email)
    """Import Event Task"""
    task_id = self.request.id.__str__()  # str(async result)
    try:
        result = import_event_task_base(self, file, source_type, creator_id)
        update_import_job(task_id, result['id'], 'SUCCESS')
        # return item
    except BaseError as e:
        print(traceback.format_exc())
        update_import_job(task_id, e.message, e.status if hasattr(e, 'status') else 'failure')
        result = {'__error': True, 'result': e.to_dict()}
    except Exception as e:
        print(traceback.format_exc())
        update_import_job(task_id, e.message, e.status if hasattr(e, 'status') else 'failure')
        result = {'__error': True, 'result': ServerError().to_dict()}
    # send email
    send_import_mail(task_id, result)
    # return result
    return result


Save Data from JSON

In the import helpers, we have the functions that perform the main task of reading the JSON files, creating sqlalchemy model objects from them and saving them in the database. There are a few global dictionaries which maintain the order in which the files are to be imported and saved, as well as the mapping between files and models. The first JSON file to be imported is the event JSON file, since all the other tables to be imported are related to the event table. After that, the files are read in the following order:

  1. SocialLink
  2. CustomForms
  3. Microlocation
  4. Sponsor
  5. Speaker
  6. Track
  7. SessionType
  8. Session

This order helps maintain the foreign key constraints. For importing data from these files we use the function create_service_from_json(). It sorts the elements in the data list based on the key "id" and then loops over all the elements (dictionaries) contained in the list. In each iteration it deletes the unnecessary key-value pairs from the dictionary and sets the event_id for that element to the id of the newly created event from the import, instead of the old id present in the data. After all this is done, it creates a model object, using the filename-to-model mapping, from the dict data and saves it into the database.

def create_service_from_json(task_handle, data, srv, event_id, service_ids=None):
    """
    Given :data as json, create the service on server
    :service_ids are the mapping of ids of already created services.
        Used for mapping old ids to new
    """
    if service_ids is None:
        service_ids = {}
    global CUR_ID
    # sort by id
    data.sort(key=lambda k: k['id'])
    ids = {}
    ct = 0
    total = len(data)
    # start creating
    for obj in data:
        # update status
        ct += 1
        update_state(task_handle, 'Importing %s (%d/%d)' % (srv[0], ct, total))
        # trim id field
        old_id, obj = _trim_id(obj)
        CUR_ID = old_id
        # delete not needed fields
        obj = _delete_fields(srv, obj)
        # related
        obj = _fix_related_fields(srv, obj, service_ids)
        obj['event_id'] = event_id
        # create object
        new_obj = srv[1](**obj)
        db.session.add(new_obj)
        db.session.commit()
        ids[old_id] = new_obj.id
        # add uploads to queue
        _upload_media_queue(srv, new_obj)

    return ids


After the data has been saved, the next thing to do is upload all the media files to the server. We do this using the _upload_media_queue() function. It takes the paths to upload the files to from the storage.py helper file for the APIs, and then uploads the files using the various helper functions to static data storage services like AWS S3, Google Storage, etc.

Other than this, the import helpers also contain the function to create an import job, which keeps a record of every import along with the task url and the user id of the user who started the import task, as well as the status of the task. There is also the get_file_from_request() function, which saves the file uploaded through the POST request and returns the path to that file.
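
As a rough sketch, and assuming an ImportJob model with task, user and start-time columns (the actual model and column names in the project may differ), create_import_job() boils down to saving one such row:

from datetime import datetime
from flask_jwt import current_identity

from app.models import db
from app.models.import_jobs import ImportJob  # hypothetical import path


def create_import_job(task_id):
    """Hypothetical sketch: record the import task so that its status,
    task url and owner can be looked up and updated later."""
    new_job = ImportJob(task=task_id,                 # celery task id
                        user=current_identity,        # user who started the import
                        starts_at=datetime.utcnow())  # when the import was triggered
    db.session.add(new_job)
    db.session.commit()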

Get Response about Event Imported

The POST request returns a task url of the form /v1/tasks/ebe07632-392b-4ae9-8501-87ac27258ce5. To get the final result, you need to keep polling this URL. To know more about polling, read my previous blog about exporting an event or visit this link. When the task is completed, you get a "result" key along with the state. The state can either be SUCCESS or FAILURE. If it is FAILURE, you will get the corresponding error message due to which the celery task failed. If it is a success, you get data related to the event that was created by the import: the event id, the event name and the event url, which you can use to visit the event from the frontend. This data is also sent to the user as an email and a notification.
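
On the client side, a minimal polling sketch (again assuming the requests library) that waits for the import task to finish might look like this:

import time
import requests

BASE_URL = "http://localhost:5000"
HEADERS = {"Authorization": "JWT <access token>"}


def poll_import_task(task_url, interval=5):
    """Poll the celery task url until it reports SUCCESS or FAILURE."""
    while True:
        response = requests.get(BASE_URL + task_url, headers=HEADERS).json()
        state = response.get('state')
        if state == 'SUCCESS':
            return response['result']           # event id, name and url
        if state == 'FAILURE':
            raise RuntimeError(response.get('result'))
        time.sleep(interval)                    # still pending, try again later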

An example response looks something like this:

{
    "result": {
        "event_name": "FOSSASIA 2016",
        "id": "24",
        "url": "https://eventyay.com/events/ab3de6"
    },
    "state": "SUCCESS"
}

The corresponding event name and url are also sent to the user who started the import task. From the frontend, one can use the result object to show the name of the imported event along with the event url. Since the id and the identifier are both present in the result, one can also use them to send GET, PATCH and other API requests to the events/ endpoint and get the corresponding relationship urls from it to query the other APIs. Thus, the entire imported data becomes available to the frontend as well.


Uploading Files via APIs in the Open Event Server

There are two file upload endpoints: one for image uploads, and another for all other files being uploaded. The latter endpoint is to be used for uploading files such as slides, videos and other presentation materials for a session. So, in FOSSASIA's Orga Server project, when we need to upload a file, we make an API request to this endpoint, which in turn uploads the file to the server and returns the url for the uploaded file. We then store this url in the database with the corresponding row entry.

Sending Data

The endpoint /upload/file accepts a POST request containing a multipart/form-data payload. If a single file is uploaded, it is sent under the key "file", else an array of files is sent under the key "files[]".

A typical single file upload cURL request would look like this:

curl -H "Authorization: JWT <key>" -F file=@file.pdf -X POST http://localhost:5000/v1/upload/file

A typical multi-file upload cURL request would look something like this:

curl -H "Authorization: JWT <key>" -F 'files[]=@file1.pdf' -F 'files[]=@file2.pdf' -X POST http://localhost:5000/v1/upload/file

Thus, unlike other endpoints in the open event orga server project, we don't send a JSON-encoded request. Instead, it is a form-data request.
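
For completeness, here is a sketch of the same two requests in Python with the requests library; the keys 'file' and 'files[]' mirror what the backend code below expects:

import requests

BASE_URL = "http://localhost:5000"
HEADERS = {"Authorization": "JWT <key>"}

# single file upload under the key 'file'
with open('file.pdf', 'rb') as f:
    single = requests.post(BASE_URL + "/v1/upload/file",
                           headers=HEADERS, files={'file': f})
print(single.json()['url'])

# multiple files upload under the key 'files[]'
with open('file1.pdf', 'rb') as f1, open('file2.pdf', 'rb') as f2:
    multi = requests.post(BASE_URL + "/v1/upload/file",
                          headers=HEADERS,
                          files=[('files[]', f1), ('files[]', f2)])
print(multi.json()['url'])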

Saving Files

We use different services such as Amazon S3, Google Cloud Storage and so on for storing the files, depending on the settings chosen by the admin of the project. In the backend we have two cases to handle: single file upload and multiple files upload. One can also ask to save the files locally by passing the GET parameter force_local=true.
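
For example, a request forcing local storage would simply add the query parameter to the same endpoint:

curl -H "Authorization: JWT <key>" -F file=@file.pdf -X POST "http://localhost:5000/v1/upload/file?force_local=true"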

Single File Upload

if 'file' in request.files:
    files = request.files['file']
    file_uploaded = uploaded_file(files=files)
    if force_local == 'true':
        files_url = upload_local(
            file_uploaded,
            UPLOAD_PATHS['temp']['event'].format(uuid=uuid.uuid4())
        )
    else:
        files_url = upload(
            file_uploaded,
            UPLOAD_PATHS['temp']['event'].format(uuid=uuid.uuid4())
        )


We get the file that is to be uploaded using request.files['file'], with the key 'file' which was used in the payload. Then we use the uploaded_file() helper function to convert the file data received as payload into a proper file and store it in a temporary location. After this, if force_local is set to true, we use the upload_local helper function to upload it to the local storage, i.e. the server where the application is hosted; else we use whatever service is set by the admin in the admin settings.

In the uploaded_file() function of the helpers module, we extract the filename and the extension of the file from the form-data payload. Then we check if a suitable directory already exists; if it doesn't, we create a new directory and then save the file in it.

extension = files.filename.split('.')[1]
filename = get_file_name() + '.' + extension
filedir = current_app.config.get('BASE_DIR') + '/static/uploads/'
if not os.path.isdir(filedir):
    os.makedirs(filedir)
file_path = filedir + filename
files.save(file_path)


After that, the upload() function gets the settings key for either S3 or Google Storage and then uses the corresponding functions to upload this temporary file to the storage.
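
As a simplified sketch (the actual setting keys and helper names in storage.py may differ), the dispatching inside upload() is roughly of this shape:

def upload(uploaded_file, key):
    """Hypothetical sketch: pick a storage backend from the admin settings
    and upload the temporary file under the given key/path."""
    settings = get_settings()                        # admin settings stored in the db
    storage = settings.get('storage_place', 'local')
    if storage == 's3':
        return upload_to_aws(uploaded_file, key)     # hypothetical S3 helper
    elif storage == 'gs':
        return upload_to_gs(uploaded_file, key)      # hypothetical Google Storage helper
    return upload_local(uploaded_file, key)          # fall back to local storage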

Multiple File Upload

elif 'files[]' in request.files:
    files = request.files.getlist('files[]')
    files_uploaded = uploaded_file(files=files, multiple=True)
    files_url = []
    for file_uploaded in files_uploaded:
        if force_local == 'true':
            files_url.append(upload_local(
                file_uploaded,
                UPLOAD_PATHS['temp']['event'].format(uuid=uuid.uuid4())
            ))
        else:
            files_url.append(upload(
                file_uploaded,
                UPLOAD_PATHS['temp']['event'].format(uuid=uuid.uuid4())
            ))


In the case of multiple files upload, we get a list of files instead of a single file. Hence we get the list of files sent as form data using request.files.getlist('files[]'). Here 'files' is the key that is used, and since it is an array of file content it is written as files[]. We again use the uploaded_file() function to get back a list of temporary files from the content that has been uploaded as form-data. After that we loop over all the temporary files stored in the variable files_uploaded in the above code. Next, for every file in the list of temporary files, we use the upload() helper function to save these files in the storage system of the application.

In the uploaded_file() function of the helpers module, things work differently this time, since multiple files and their content are sent. We loop over all the files that are received, and for each of these files we find its filename and extension. Then we create the directory to save these files in and save the content of each file with the corresponding filename and extension. After a file has been saved, we append it to a list, and finally we return the entire list of uploaded files.

if multiple:
    files_uploaded = []
    for file in files:
        extension = file.filename.split('.')[1]
        filename = get_file_name() + '.' + extension
        filedir = current_app.config.get('BASE_DIR') + '/static/uploads/'
        if not os.path.isdir(filedir):
            os.makedirs(filedir)
        file_path = filedir + filename
        file.save(file_path)
        files_uploaded.append(UploadedFile(file_path, filename))


The upload() function then finally returns the urls for the files after saving them.

API Response

The file upload endpoint returns either a single url or a list of urls, depending on whether a single file or multiple files were uploaded. The url for a file depends on the storage system that has been used. After the url or list of urls is received, we jsonify the entire response so that we can send a proper JSON response that can be parsed in the frontend and used for saving the corresponding information to the database using the other API services.

A typical single file upload response looks like this:

{
    "url": "https://xyz.storage.com/asd/fgh/hjk/12332233.docx"
}

Multiple file upload response looks like this:

{
    "url": [
        "https://xyz.storage.com/asd/fgh/hjk/12332233.docx",
        "https://xyz.storage.com/asd/fgh/hjk/66777777.ppt"
    ]
}

You can find the related documentations and example payloads on how to use this endpoint to upload files here: http://open-event-api.herokuapp.com/#upload-file-upload.
