Setting up event batch file delivery

In addition to real-time notifications, event notifications and data can be sent in files via a configured integration channel to cloud storage - Amazon Web Services (AWS) or Google Cloud Platform (GCP) - or downloaded using a Secure File Transfer Protocol (SFTP) service. This article describes the setup required on Pismo's part and on your part for each method.

Event data is batched for 5 minutes or until it reaches 10 MB, whichever comes first, and then saved to the /main_stream/ path as a new AWS S3 bucket object containing event JSON data.

For information security, no S3 bucket is ever exposed publicly. A Key Management Service (KMS) key is created for each bucket so that stored objects are encrypted at rest and decrypted only when you download them.

Setting up event file delivery to an AWS account

The following shows the data flow for event file delivery to an AWS account:

  1. As processing occurs, event files are generated and saved in a dedicated S3 bucket exclusive to your Org.

  2. A listener detects the new file and generates a new bucket object event.

  3. The event is delivered as a real-time notification to your SNS topic. This requires that you have also set up real-time event delivery to your AWS account.

  4. Your AWS account receives the event and identifies the bucket, the object's path, and its size.

  5. Your routine creates an S3 session using a pre-authorized Identity and Access Management (IAM) Role to assume Pismo's IAM Role (sts:AssumeRole), then uses the AWS SDK/CLI to access Pismo's S3 bucket. The file is downloaded (s3:GetObject) and copied to a local target in your environment.

Steps 3, 4, and 5 can be adapted to your preferences. For example, a worker or application can replace a Lambda function to receive notifications and download files, and a file-based system, database, or other storage solution can replace a destination S3 bucket. The sketch below illustrates step 4; step 5 is sketched later in this article.
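
As a minimal sketch of step 4, the following Python Lambda handler subscribed to your SNS topic extracts the bucket, object key, and size from each notification. It assumes the message body follows the standard S3 event notification format; confirm the actual payload delivered to your topic:

import json

# Lambda handler subscribed to the SNS topic that receives new bucket
# object events (step 4). Extracts the bucket name, object key, and size.
def lambda_handler(event, context):
    for record in event["Records"]:
        # The SNS message body is a JSON string containing the S3 event.
        message = json.loads(record["Sns"]["Message"])
        for s3_record in message.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            size = s3_record["s3"]["object"]["size"]
            print(f"New event file: s3://{bucket}/{key} ({size} bytes)")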

Setup tutorial

Pismo provides a tutorial that steps you through the configurations below - AWS event file configuration tutorial.

Pismo configuration

  • S3 bucket - An S3 bucket is created to handle file transfers exclusively for your Org.

    S3 bucket's Amazon Resource Name (ARN) sample:
    arn:aws:s3:::pismo-dataplatform-tn-376a4170-3c93-4676-b294-ec0b4241c7ab

  • IAM Role - An IAM Role is created to allow file downloading from the bucket. This Role grants s3:GetObject permission on your data bucket, which allows file transfers, and ties the sts:AssumeRole action to the ARN of the resource responsible for file downloading (a Lambda function, for example).

    IAM Role's ARN sample:
    arn:aws:iam::{pismo_aws_account_id}:role/dataplatform-consumer-tn-376a4170-3c93-4676-b294-ec0b4241c7ab

    Your routine must assume this IAM Role before accessing the S3 bucket to retrieve files. Initially, no resource has permission to execute sts:AssumeRole. Pismo can grant this permission after you provide the ARN of your IAM Role.

Your configuration

Configure the following in your AWS environment:

  • Set up real-time event delivery - You must be able to receive Pismo event notifications in real time to know when a new event file is available for download. To set this up, see Setting up real-time event delivery to an AWS account.

  • IAM Role - Each processing resource in AWS needs an IAM Role. Pismo grants the Role that the resource uses permission to execute sts:AssumeRole on the Pismo side.

    Whenever your routine (Lambda, worker, application, or other method) runs, it should explicitly invoke sts:AssumeRole on Pismo's IAM Role (using the ARN described previously).

Once your configuration is complete, you can execute sts:AssumeRole and download S3 objects, as in the sketch below.
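
The following Python sketch illustrates step 5 with boto3: it assumes Pismo's IAM Role via sts:AssumeRole and downloads an event file via s3:GetObject. The role ARN is an illustrative placeholder; use the one Pismo provides for your Org:

import boto3

# Placeholder ARN - replace with the consumer role ARN Pismo provides.
PISMO_ROLE_ARN = "arn:aws:iam::111111111111:role/dataplatform-consumer-tn-example"

def download_event_file(bucket: str, key: str, local_path: str) -> None:
    # Exchange your routine's credentials for temporary credentials
    # scoped to Pismo's consumer role (sts:AssumeRole).
    sts = boto3.client("sts")
    credentials = sts.assume_role(
        RoleArn=PISMO_ROLE_ARN,
        RoleSessionName="pismo-event-file-download",
    )["Credentials"]

    # Build an S3 client with the temporary credentials and download
    # the object (s3:GetObject under the hood).
    s3 = boto3.client(
        "s3",
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
    )
    s3.download_file(bucket, key, local_path)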

Setting up event file delivery to a GCP account

The following shows the data flow for event file delivery to a GCP account:

  1. As processing occurs, event files are generated and saved in a dedicated S3 bucket exclusive to your Org.

  2. A listener detects the new file and generates a new bucket object event.

  3. The event is delivered as a real-time notification to your Pub/Sub topic. This requires that you have also set up real-time event delivery to your GCP account.

  4. A Pismo account event consumer receives the notification.

  5. The file is copied to your Google Cloud Storage bucket, where your systems can read it (see the sketch below).
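
As a minimal sketch of the consuming side, the following Python snippet lists and downloads objects that Pismo has copied into your bucket, using the google-cloud-storage client. The project and bucket names are illustrative placeholders:

from google.cloud import storage

# Illustrative names - use your real project_id and bucket identifier.
client = storage.Client(project="your-project-id")
bucket = client.bucket("your-event-files-bucket")

# Download every object Pismo has delivered to the bucket.
for blob in bucket.list_blobs():
    blob.download_to_filename(f"/tmp/{blob.name.replace('/', '_')}")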

Setup tutorial

Pismo provides a tutorial that steps you through the configurations below - GCP event file configuration tutorial.

Pismo configuration

Pismo creates an Identity and Access Management (IAM) service account to interact with your GCP account resources.

Your configuration

Configure the following in your GCP environment:

  • Project identifier - Provide Pismo with your account's project_id.

  • Cloud Storage bucket - Provide Pismo with your Google Cloud Storage bucket identifier.

  • Bucket access - Configure write permission for Pismo's service account so it can write objects to your GCP bucket, as shown in the sketch below.

To begin event file delivery, contact your Technical Account Manager to request the service account ID related to your GCP integration, and provide your GCP project ID and bucket identifier.
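
For the bucket access step, a minimal Python sketch using the google-cloud-storage client might grant write access through an IAM binding. The service account email and the roles/storage.objectCreator role are assumptions; confirm both with your Technical Account Manager:

from google.cloud import storage

# Assumed placeholder - use the service account ID your Technical
# Account Manager provides for your GCP integration.
PISMO_SERVICE_ACCOUNT = "pismo-integration@example.iam.gserviceaccount.com"

client = storage.Client(project="your-project-id")
bucket = client.bucket("your-event-files-bucket")

# Add an IAM binding that lets the service account create objects.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectCreator",  # assumed write-only role
    "members": {f"serviceAccount:{PISMO_SERVICE_ACCOUNT}"},
})
bucket.set_iam_policy(policy)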

Setting up event file delivery using SFTP

If you are unable or prefer not to exchange integration data between your organization's cloud account and Pismo, or if you are not using AWS or GCP (the supported cloud providers for event file delivery), you can consume the event files using an SFTP service.

The Pismo SFTP service abstracts S3 and serves it as an SFTP layer. When you connect using SFTP, you are, in fact, accessing the Pismo S3 bucket.

📘

Accessing JSON schemas

Keep in mind that, while you can access event JSON schemas rendered as HTML, you need an AWS account to access JSON schemas in Pismo's AWS documentation S3 storage bucket.

Your SFTP access uses a login username and an SSH RSA key. No password is generated; instead, create an SSH RSA key pair (or use an existing one) and provide Pismo with its public key, which is then linked to your login username for access.

The public key should be similar to this:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCwskg9UmvUCUCscqNPgpSMzMUOcpSLzESz+d8RFa+YLkMEO5doyLskdisssjtxFz8A+fW/m4XVB+DyLHoS8pxmRfML/+DyhIb40GJbGV0xcAYC0IrmNXb8ldzMU0FGyIh5r2dZtH5mK9MeDBIZeYASrVLjbyflUy6JWCUUKYFSnm1eCIzmgGYHfYQqf+doCSKItGxpB9G3HhvBXsEvlka93aZRkGiEGTNHDGQ1NooksIKJSltk2ik1XfSADZfex0xrKBmlq/uy2/HXZ3lPQrIaN9fswA7+BLES/s9LZ9C0FC2uD2AwQbpJigqUihHuC7q+zIssWWWksjdkxb1jOljvej
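
If you prefer to generate the pair programmatically rather than with a tool like ssh-keygen, here is a minimal Python sketch using paramiko (file names are illustrative):

import paramiko

# Generate a 4096-bit RSA key pair.
key = paramiko.RSAKey.generate(bits=4096)

# Keep the private key to yourself...
key.write_private_key_file("pismo_sftp_key")

# ...and send Pismo the public key in the "ssh-rsa AAAA..." format.
with open("pismo_sftp_key.pub", "w") as f:
    f.write(f"{key.get_name()} {key.get_base64()}\n")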

Open up a ticket at the Pismo Service Desk with your public keys attached and request SFTP access for your organization.

📘

You can associate up to 8 keys with your organization. Each Org ID has its own set of credentials, so sandbox and production access are not the same.

Accessing the SFTP service

After you receive your login information from Pismo, you can connect to the SFTP service.

Different environments have different access endpoints:

Environment           Endpoint                       Port
Development/Sandbox   sftp-main.data.pismolabs.io    22
Production            sftp-main.data.pismo.io        22

Make sure you are using the correct endpoint with the correct credentials to retrieve your files. Connecting to production using sandbox credentials will not work.

To connect to the SFTP service, you can use an SFTP client like FileZilla or WinSCP, or implement a system integration/automation in Java, .NET, Go, Python, or another programming language.
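
For example, a minimal Python sketch using paramiko connects to the sandbox endpoint with your private key and downloads the available files. The username and the /main_stream/ path are assumptions based on the description above; use the login and path confirmed for your organization:

import paramiko

HOST = "sftp-main.data.pismolabs.io"  # sandbox; use sftp-main.data.pismo.io in production
USERNAME = "your-login-username"      # assumed placeholder - provided by Pismo

# Authenticate with the private half of the key pair you registered.
key = paramiko.RSAKey.from_private_key_file("pismo_sftp_key")
transport = paramiko.Transport((HOST, 22))
transport.connect(username=USERNAME, pkey=key)
sftp = paramiko.SFTPClient.from_transport(transport)

try:
    # Download every event file in the batch delivery path.
    for filename in sftp.listdir("/main_stream/"):
        sftp.get(f"/main_stream/{filename}", f"./{filename}")
finally:
    sftp.close()
    transport.close()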

Knowing when files are available

To be notified when there is a new file available, you need to set up real-time event notifications through a cloud account. Otherwise, you will have to periodically open a connection and check for new files, as in the polling sketch below.
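
A minimal polling sketch, assuming the same paramiko setup and placeholder names as the previous example:

import time
import paramiko

def poll_once(seen: set[str]) -> None:
    # Open a fresh connection, download files not seen before, and close.
    key = paramiko.RSAKey.from_private_key_file("pismo_sftp_key")
    transport = paramiko.Transport(("sftp-main.data.pismolabs.io", 22))
    transport.connect(username="your-login-username", pkey=key)
    sftp = paramiko.SFTPClient.from_transport(transport)
    try:
        for filename in sftp.listdir("/main_stream/"):
            if filename not in seen:
                sftp.get(f"/main_stream/{filename}", f"./{filename}")
                seen.add(filename)
    finally:
        sftp.close()
        transport.close()

seen: set[str] = set()
while True:
    poll_once(seen)
    time.sleep(300)  # check every 5 minutes, matching the batching window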