I'll be looking at it too, let me know what you find.
I'm mostly tripped up by: angular_velocity+=_inv_inertia_tensor.xform(torque)*p_step;
Which is:
principal_inertia_axes_local = inertia_tensor.diagonalize().transposed();
principal_inertia_axes = get_transform().basis * principal_inertia_axes_local;
Basis tb = principal_inertia_axes;
Basis tbt = tb.transposed();
tb.scale(_inv_inertia);
_inv_inertia_tensor = tb * tbt;
And the inertia tensor: https://github.com/godotengine/godot/blob/93ab45b6b5c4f8e0619e963156c983009d399a9d/servers/physics/body_sw.cpp#L84
inertia_tensor.set_zero();
for (int i=0;i<get_shape_count();i++) {
const ShapeSW* shape=get_shape(i);
float area=get_shape_area(i);
float mass = area * this->mass / total_area;
Basis shape_inertia_tensor=shape->get_moment_of_inertia(mass).to_diagonal_matrix();
Transform shape_transform=get_shape_transform(i);
Basis shape_basis = shape_transform.basis.orthonormalized();
// NOTE: we don't take the scale of collision shapes into account when computing the inertia tensor!
shape_inertia_tensor = shape_basis * shape_inertia_tensor * shape_basis.transposed();
Vector3 shape_origin = shape_transform.origin - center_of_mass_local;
inertia_tensor += shape_inertia_tensor + (Basis()*shape_origin.dot(shape_origin)-shape_origin.outer(shape_origin))*mass;
}
I'm not very familiar with physics engines, so I'm not terribly confident about verifying results or debugging.