I know next to nothing when it comes to the how and why of https connections. Obviously, when I'm transmitting secure data like passwords or especially credit card information, https is a critical tool. What do I need to know about it, though? What are the most common mistakes you see developers making when they implement it in their projects? Are there times when https is just a bad idea? Thanks!
An HTTPS, or Secure Sockets Layer (SSL) certificate is served for a site, and is typically signed by a Certificate Authority (CA), which is effectively a trusted 3rd party that verifies some basic details about your site, and certifies it for use in browsers. If your browser trusts the CA, then it trusts any certificates signed by that CA (this is known as the trust chain).
Each HTTP (or HTTPS) request consists of two parts: a request, and a response. When you request something through HTTPS, there are actually a few things happening in the background:
- The client (browser) does a "handshake", where it requests the server's public key and identification.
- At this point, the browser can check for validity (does the site name match? is the date range current? is it signed by a CA it trusts?). It can even contact the CA and make sure the certificate is valid.
- The client creates a new pre-master secret, which is encrypted using the servers's public key (so only the server can decrypt it) and sent to the server
- The server and client both use this pre-master secret to generate the master secret, which is then used to create a symmetric session key for the actual data exchange
- Both sides send a message saying they're done the handshake
- The server then processes the request normally, and then encrypts the response using the session key
If the connection is kept open, the same symmetric key will be used for each.
If a new connection is established, and both sides still have the master secret, new session keys can be generated in an 'abbreviated handshake'. Typically a browser will store a master secret until it's closed, while a server will store it for a few minutes or several hours (depending on configuration).
For more on the length of sessions see How long does an HTTPS symmetric key last?
Certificates and Hostnames
Certificates are assigned a Common Name (CN), which for HTTPS is the domain name. The CN has to match exactly, eg, a certificate with a CN of "example.com" will NOT match the domain "www.example.com", and users will get a warning in their browser.
Before SNI, it was not possible to host multiple domain names on one IP. Because the certificate is fetched before the client even sends the actual HTTP request, and the HTTP request contains the Host: header line that tells the server what URL to use, there is no way for the server to know what certificate to serve for a given request. SNI adds the hostname to part of the TLS handshake, and so as long as it's supported on both client and server (and in 2015, it is widely supported) then the server can choose the correct certificate.
Even without SNI, one way to serve multiple hostnames is with certificates that include Subject Alternative Names (SANs), which are essentially additional domains the certificate is valid for. Google uses a single certificate to secure many of it's sites, for example.
Another way is to use wildcard certificates. It is possible to get a certificate like ".example.com" in which case "www.example.com" and "foo.example.com" will both be valid for that certificate. However, note that "example.com" does not match ".example.com", and neither does "foo.bar.example.com". If you use "www.example.com" for your certificate, you should redirect anyone at "example.com" to the "www." site. If they request https://example.com, unless you host it on a separate IP and have two certificates, the will get a certificate error.
Of course, you can mix both wildcard and SANs (as long as your CA lets you do this) and get a certificate for both "example.com" and with SANs ".example.com", "example.net", and ".example.net", for example.
Strictly speaking, if you are submitting a form, it doesn't matter if the form page itself is not encrypted, as long as the submit URL goes to an https:// URL. In reality, users have been trained (at least in theory) not to submit pages unless they see the little "lock icon", so even the form itself should be served via HTTPS to get this.
Traffic and Server Load
HTTPS traffic is much bigger than its equivalent HTTP traffic (due to encryption and certificate overhead), and it also puts a bigger strain on the server (encrypting and decrypting). If you have a heavily-loaded server, it may be desirable to be very selective about what content is served using HTTPS.
If you're not just using HTTPS for the entire site, it should automatically redirect to HTTPS as required. Whenever a user is logged in, they should be using HTTPS, and if you're using session cookies, the cookie should have the secure flag set. This prevents interception of the session cookie, which is especially important given the popularity of open (unencrypted) wifi networks.
Any resources on the page should come from the same scheme being used for the page. If you try to fetch images from http:// when the page is loaded with HTTPS, the user will get security warnings. You should either use fully-qualified URLs, or another easy way is to use absolute URLs that do not include the hostname (eg, src="/images/foo.png") because they work for both.
- This includes external resources (eg, Google Analytics)
Don't do POSTs (form submits) when changing from HTTPS to HTTP. Most browsers will flag this as a security warning.