Best Security Practices for Predictive Services

The premise of Dato Predictive Services is to make it easy to deploy your models into production by turning them into endpoints that can be managed and queried. This has a number of security implications:

  • A predictive service exposes a management endpoint for a GraphLab Create client to talk to.
  • A predictive service exposes a query endpoint for an HTTP client to talk to.
  • A predictive service can be deployed in the public cloud (EC2), potentially on more than one node.

In the following we provide a set of best practices and guidelines for setting up and managing a secure predictive service.

HTTPS

We recommend using HTTPS for the endpoints your predictive service exposes, so that all data transmitted between clients and the service is encrypted. For a predictive service deployed in EC2, this requires passing an SSL credential tuple to the create call:

import graphlab

ec2 = graphlab.deploy.Ec2Config(region='us-west-2',
                                instance_type='m3.xlarge',
                                aws_access_key_id='YOUR_ACCESS_KEY',
                                aws_secret_access_key='YOUR_SECRET_KEY')

deployment = graphlab.deploy.predictive_service.create(
    name='testing',
    ec2_config=ec2,
    state_path='s3://sample-testing/first',
    ssl_credentials=('privatekey.key', 'certificate.crt', True))

In this example, the third element of the tuple, True, indicates that the given certificate is self-signed.

You can create a self-signed certificate from a private key in a Linux shell as follows:

openssl genrsa 1024 > privatekey.key
openssl req -new -key privatekey.key -out CSR.csr
openssl x509 -req -days 365 -in CSR.csr -signkey privatekey.key -out certificate.crt

A predictive service created with these parameters will use HTTPS for both its management interface and its query endpoints.

On-Premises

For an on-premises deployment you need to combine your SSL certificate (a .crt or .cer file, either provided by a certificate authority or self-signed) with the private key you generated for it:

openssl genrsa 1024 > privatekey.key
openssl req -new -key privatekey.key -out CSR.csr
openssl x509 -req -days 365 -in CSR.csr -signkey privatekey.key -out certificate.crt

cat certificate.crt privatekey.key > certificate.pem

This creates the combined PEM file certificate.pem, which can be specified in the Predictive Services setup configuration file predictive_service.cfg (see also the chapter about Predictive Services on-premises):

...
# secure communication configuration
use_ssl=true
certificate_is_self_signed=true
certificate_path=/home/user/certs/certificate.pem
...

Make sure to secure any copies of your private key file, including the PEM file (which contains the private key).
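
As a simple precaution on the host, you can restrict read access to the key material to your own user (a minimal example for a Linux shell; adjust paths and ownership to your environment):

chmod 600 privatekey.key certificate.pem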

Changing the API key

A client submitting queries to a Predictive Services endpoint needs to specify an API key in the request's HTTP body. The key is generated when the predictive service is created, but it can be changed later using the set_api_key command:

ps.set_api_key('new_api_key')

The new API key applies to all endpoints deployed in the respective predictive service. Note that this affects any client that is currently querying the service: such clients need to switch to the new key.

You can retrieve the current value of the API key using the api_key property:

ps.api_key
'new_api_key'
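
To illustrate how the API key is used when querying an endpoint, here is a minimal sketch based on the Python requests library. The host name, the endpoint name recommender, and the exact layout of the JSON body are assumptions made for this sketch; consult the chapter on querying Predictive Services for the precise request schema. verify=False is only needed when the service uses a self-signed certificate.

import requests

# hypothetical host and endpoint name; replace with your deployment's values
url = 'https://my-predictive-service.example.com/query/recommender'

payload = {
    'api_key': 'new_api_key',        # the key returned by ps.api_key
    'data': {'users': ['user_1']}    # input expected by the deployed model
}

# verify=False skips certificate validation, e.g. for a self-signed certificate
response = requests.post(url, json=payload, verify=False)
print(response.json())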

Cross-Origin Resource Sharing

By default a predictive service does not allow cross-origin requests: a web page served from a different domain cannot query the service's endpoints from within the browser. To relax this restriction, you can specify a Cross-Origin Resource Sharing (CORS) origin as a string parameter to create:

ec2 = graphlab.deploy.Ec2Config(region='us-west-2',
                                instance_type='m3.xlarge',
                                aws_access_key_id='YOUR_ACCESS_KEY',
                                aws_secret_access_key='YOUR_SECRET_KEY')

deployment = graphlab.deploy.predictive_service.create(
    name='testing',
    ec2_config=ec2,
    state_path='s3://sample-testing/first',
    cors_origin='https://dato.com')

You can allow cross-origin requests from any domain by using * as the value for cors_origin.

You can find more information on CORS here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS

On-Premises

For an on-premises deployment of Predictive Services, which is set up through a deployment script (as opposed to using graphlab.deploy.predictive_service.create in the GraphLab Create client), you can use the set_CORS method of the predictive service object:

ps.set_CORS('https://dato.com')

Port Configuration

A predictive service running in EC2 uses either port 80 (HTTP) or port 443 (HTTPS) for control and data traffic, depending on whether SSL is configured.

On-Premises

For an on-premises deployment, you can override the default port (80 or 443) using the lb_port parameter in the setup configuration. We recommend closing any unused ports on the host machine.
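
As an illustration, the corresponding entry in predictive_service.cfg could look like the following sketch; the lb_port name comes from the setup configuration mentioned above, while the port number is just an example.

...
# listen on a non-default port
lb_port=9005
...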

EC2 Security Groups

When you create a predictive service, by default a new security group Dato_Predictive_Service will be created in the default subnet. If you want to manage the security group yourself, you can specify it when configuring the predictive service for EC2:

ec2 = graphlab.deploy.Ec2Config(security_group='YOUR_SECURITY_GROUP_NAME')

If this security group does not exist, it will be created.

CIDR Rules in EC2

To further restrict access to your predictive service, we recommend explicitly specifying CIDR rules when configuring your EC2 deployment. CIDR rules are specified as part of the Ec2Config object and require an explicit security group; see also graphlab.deploy.Ec2Config.
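
Here is a minimal sketch of restricting inbound access to a single address range. The cidr_ip parameter name and the example values are assumptions made for this sketch; please consult the graphlab.deploy.Ec2Config API documentation for the exact signature.

ec2 = graphlab.deploy.Ec2Config(
    region='us-west-2',
    instance_type='m3.xlarge',
    aws_access_key_id='YOUR_ACCESS_KEY',
    aws_secret_access_key='YOUR_SECRET_KEY',
    security_group='YOUR_SECURITY_GROUP_NAME',  # CIDR rules require an explicit security group
    cidr_ip='203.0.113.0/24')                   # assumed parameter; only this address range may connect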

For more information on CIDR rules, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#security-group-rules.