Preparing a string for HMAC

I am writing a webservice which uses HMAC for message authentication. I am having some issues preparing the 'data' for digest, and am getting different digests for the same 'data' in Python vs NodeJS.

I am fairly sure that this issue is due to encoding, but I am not sure how to best approach this.

Python code:

import hmac
from hashlib import sha1

f = open('../test.txt')
raw = f.read()

raw = raw.strip()

hm = hmac.new('12345', raw, sha1)
res = hm.hexdigest()
print res

>> 5bff447a0fb82f3e7572d9fde362494f1ee2c25b

NodeJS (coffee) code:

fs = require 'fs'
http = require 'http'
{argv} = require 'optimist'
crypto = require 'crypto'

# Load the file
file = fs.readFileSync argv.file, 'utf-8'
file = file.trim()

# Create the signature
hash = crypto.createHmac('sha1', '12345').update(file).digest('hex')
console.log(hash)

>> a698f82ea8ff3c4e9ffe0670be2707c104d933aa

Edit: Also, the length of raw is 2 characters longer than file, but I can't work out where these two characters come from.


This is a problem with the encoding of the data you read from the filesystem; it has nothing to do with the algorithms you use.

When you work with string data in both Python and JavaScript, you should be very careful about which encoding your data is stored in. Try to work with data either as strings (which, in particular, have an encoding) or as "raw data", and don't mix the two. When reading and signing data, you probably shouldn't care about the encoding at all; keep the data as "raw" as your language allows.

Some points to note:

  • The filesystem stores "raw" bytes and knows nothing about the contents or the encoding of your file. Furthermore, for some files (JPEGs, for example) the concept of "encoding" is meaningless.
  • The same is true of crypto algorithms. They work with raw bytes and know nothing about their "character representation". That's why digital signatures work so well with all sorts of binary documents.
  • trim() in JavaScript and strip() in Python work with strings, and their behaviour can vary depending on the underlying encoding (try u's '.encode('utf-16').strip().decode('utf-16') in Python, for example). If possible, I'd avoid trimming altogether, so as not to mix the two ways of working with data.
  • Python 2.x (and, I suppose, JavaScript too) has a set of rules for implicit conversion between strings and raw data.
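To illustrate the second point, here is a minimal sketch, written for Python 3 (the question's code is Python 2) and using a made-up sample string. It shows that HMAC only ever sees bytes: the same text signed under two different encodings gives two different digests.

```python
import hashlib
import hmac

KEY = b"12345"  # the key from the question, expressed as bytes


def hmac_sha1_hex(data: bytes) -> str:
    """Return the hex HMAC-SHA1 of raw bytes under KEY."""
    return hmac.new(KEY, data, hashlib.sha1).hexdigest()


# Hypothetical sample text with one non-ASCII character (e with acute accent).
text = "caf\u00e9"

# The same characters become different byte sequences in different encodings,
# so the digests differ as well.
utf8_digest = hmac_sha1_hex(text.encode("utf-8"))
latin1_digest = hmac_sha1_hex(text.encode("latin-1"))

print(utf8_digest == latin1_digest)
```

For pure ASCII text the two encodings produce identical bytes, which is why problems like this often only show up once a file contains a non-ASCII character.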

In your code you work with binary data in Python, but convert it to a string in JavaScript when you specify the encoding of the file to read. Apparently the crypto module then does some sort of implicit conversion from the utf-8 string back to raw data, but I don't know exactly what it does.
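This difference between bytes and decoded characters could also account for the length mismatch mentioned in the question. The contents of test.txt are not shown, so the following Python 3 sketch uses a made-up sample and demonstrates only one possible cause: a single character that takes three bytes in UTF-8 makes the byte count exceed the character count by two.

```python
# Hypothetical file content: eight ASCII characters followed by a euro sign,
# which occupies three bytes in UTF-8.
raw = "price: 5\u20ac".encode("utf-8")

# Decoding turns the three bytes of the euro sign back into one character.
decoded = raw.decode("utf-8")

byte_count = len(raw)
char_count = len(decoded)
print(byte_count, char_count, byte_count - char_count)
```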

As described here, the cleanest way of handling raw data in node.js is to use buffers. You could read a buffer from the filesystem, but unfortunately the nodejs crypto library doesn't support them yet. As described here:

The Crypto module was added to Node before there was the concept of a unified Stream API, and before there were Buffer objects for handling binary data.

As such, the streaming classes don't have the typical methods found on other Node classes, and many methods accept and return Binary-encoded strings by default rather than Buffers.

That said, to make the example work, the current approach is to read the data by passing "binary" as the second argument to the call:

file = fs.readFileSync argv.file, "binary"

Also, as I said, I'd avoid stripping the data I just read from the file.
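On the Python side, the equivalent "raw" approach might look like the sketch below. It assumes Python 3, and sign_file is a hypothetical helper name, not something from the question. The file is opened in binary mode and its bytes are signed exactly as stored, with no decoding and no stripping. The demo writes its own temporary file so the snippet runs on its own.

```python
import hashlib
import hmac
import os
import tempfile


def sign_file(path: str, key: bytes) -> str:
    """Return the hex HMAC-SHA1 of a file's bytes, read without decoding."""
    with open(path, "rb") as f:  # "rb": binary mode, no encoding is applied
        data = f.read()
    return hmac.new(key, data, hashlib.sha1).hexdigest()


# Demo with a throwaway file; the content is made up for illustration.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("caf\u00e9\n".encode("utf-8"))
    demo_path = tmp.name

try:
    demo_digest = sign_file(demo_path, b"12345")
finally:
    os.remove(demo_path)  # clean up the temporary file

print(demo_digest)
```

Because nothing is trimmed, the trailing newline is part of the signed data; the other side has to sign the same bytes, trailing newline included, for the digests to match.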