Wednesday, October 29, 2014

Netflix Asgard 1.5 Deployments

With the upgrade to v1.5 of Netflix Asgard, the deployment API has changed, and some of the old endpoints no longer exist (/cluster/deployment, for one).  Because of this, we had to update our deployment process to use the new APIs.  There does not seem to be much documentation out there for them, so I thought I'd put together some information in the hope that it helps others in the future.

The primary API endpoint for deployments in 1.5 is:

http://<host>/<region>/deployment

where <host> is the Asgard host and <region> is the EC2 region, such as us-west-2.  A full deployment endpoint might look something like this:

http://asgard.jonfenner.com/us-west-2/deployment
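
As a small sketch, the endpoint can be assembled from its two parts. The function name deployment_url and the scheme parameter are my own, not anything Asgard defines, and the snippet should run under either Python 2 or Python 3.

```python
def deployment_url(host, region, scheme='http'):
    """Return the base deployment endpoint for an Asgard host and EC2 region."""
    # host may include a port, e.g. 'localhost:8080'
    return '{0}://{1}/{2}/deployment'.format(scheme, host, region)


print(deployment_url('asgard.jonfenner.com', 'us-west-2'))
```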

Steps for a deployment are:
  1. Prepare for a deployment
  2. Start a deployment

Prepare for a deployment

Endpoint: http://<host>/<region>/deployment/prepare?id=<asg>

This returns the ASG's configuration as JSON, which can then be modified and used in the deployment process.
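
A minimal sketch of the prepare call, assuming the endpoint answers a plain GET with a JSON body (which is how the script in this post uses it). The helper names prepare_url and prepare are hypothetical, and the try/except import is only there so the snippet works on both Python 2 and Python 3.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
    from urllib.parse import quote
except ImportError:
    from urllib2 import urlopen  # Python 2
    from urllib import quote


def prepare_url(host, region, asg):
    """Build the prepare endpoint URL for the given ASG."""
    # Percent-encode the ASG name in case it contains reserved characters
    return 'http://{0}/{1}/deployment/prepare?id={2}'.format(
        host, region, quote(asg, safe=''))


def prepare(host, region, asg):
    """Fetch the prepared deployment definition and return it as a dict."""
    response = urlopen(prepare_url(host, region, asg))
    try:
        return json.loads(response.read().decode('utf-8'))
    finally:
        response.close()
```

Calling prepare('localhost:8080', 'us-west-2', '<asg>') against a running Asgard should give back a dict; the script further down relies on it containing lcOptions and asgOptions.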

Start a deployment

Endpoint: http://<host>/<region>/deployment/start

The deployment consists of "steps".  We've implemented the following:

- Create the new ASG (always starts out with 0 instances)
- Resize it to the appropriate number of instances
- Disable the old ASG
- Delete the old ASG

Each step is self-checking, so if it fails, none of the succeeding steps will execute.
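
The four steps above can be expressed as a list of step objects in the request body. This sketch reuses the step types and field names from the script in this post; the helper name build_steps and its parameters are my own, and the default timeout of 41 minutes is simply the value we happen to use.

```python
def build_steps(capacity, startup_timeout_minutes=41):
    """Return the ordered list of deployment steps for the request body."""
    if capacity < 0:
        raise ValueError('capacity must not be negative')
    return [
        # The new ASG always starts out with 0 instances
        {"type": "CreateAsg"},
        # Grow the new ASG to the desired size
        {"type": "Resize", "targetAsg": "Next", "capacity": capacity,
         "startUpTimeoutMinutes": startup_timeout_minutes},
        # Take the old ASG out of service, then remove it
        {"type": "DisableAsg", "targetAsg": "Previous"},
        {"type": "DeleteAsg", "targetAsg": "Previous"},
    ]
```

The order of the list matters, since a failed step stops everything after it.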

Below is a sample Python script that implements this:

#!/usr/bin/python

import sys
import urllib2
import json
import requests

version = '1.0'

print 'AMI Asgard Deployment Script V' + version

asgardhost = 'localhost:8080'
ec2region = 'us-west-2'
baseurl = 'http://' + asgardhost + '/' + ec2region + '/deployment'
notify = 'xxx@yyy.com'

# Expect exactly two arguments: the ASG id and the AMI id
if len(sys.argv) != 3:
   print 'Syntax: launch.py <ASG Id> <AMI Id>'
   sys.exit(1)

asgid=sys.argv[1]
amiid=sys.argv[2]

print 'Asgard Host: ' + asgardhost
print 'EC2 Region: ' + ec2region
print 'ASG: ' + asgid
print 'AMI to Launch: ' + amiid

# Ask Asgard for the deployment definition of the existing ASG
query = baseurl + '/prepare?id=' + asgid
f = urllib2.urlopen(query)
deflcjson = f.read()
f.close()

deflc = json.loads(deflcjson)

# Point the next launch configuration at the new AMI
deflc['lcOptions']['imageId'] = amiid

deflc['deploymentOptions'] = {
    "clusterName": asgid,
    "notificationDestination": notify,
    "steps": [
      { "type": "CreateAsg" },
      { "type": "Resize", "targetAsg": "Next", "capacity": deflc['asgOptions']['minSize'], "startUpTimeoutMinutes": 41},
      { "type": "DisableAsg", "targetAsg": "Previous" },
      { "type": "DeleteAsg", "targetAsg": "Previous" },
    ]
  }

# Submit the modified definition as JSON to start the deployment
posturl = baseurl + '/start'
headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}
response = requests.post(posturl, data=json.dumps(deflc), headers=headers)

print response
print response.text

Also, be sure you've implemented Eureka and healthchecks for all services.  Asgard waits for both a Eureka UP status and a positive healthcheck.