When mounting a SharePoint form library using gvfs over WebDAV, files stored as XML on the server are returned as the HTML presented to browsers. Other WebDAV clients on Mac and Windows successfully retrieve XML versions. What’s going on‽
Well, SharePoint supports a proprietary Translate: f request header to tell the server that it should return files verbatim. This is sent by Mac and Windows WebDAV clients, but not gvfs.
To test using curl, compare:
curl https://sharepoint.example.com/path/to/site/Library/foo.xml \
-u"username:password"
and
curl https://sharepoint.example.com/path/to/site/Library/foo.xml \
-u"username:password" -H"Translate: f"
I’ve created bug 688045 in the GNOME Bugzilla to ask that gvfs add this header to all requests. In the meantime, we’ll probably have to proxy SharePoint and add the header ourselves, or patch gvfs. Yay.
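The same request can be put together from Python. This is a minimal sketch using only the standard library (Python 3); the helper name `build_verbatim_request` is made up for illustration, and authentication is left out entirely.

```python
import urllib.request


def build_verbatim_request(url):
    """Build a GET request that asks SharePoint for the stored file as-is.

    Hypothetical helper: only the 'Translate: f' header comes from the
    post; the function name is illustrative.
    """
    return urllib.request.Request(url, headers={"Translate": "f"})


request = build_verbatim_request(
    "https://sharepoint.example.com/path/to/site/Library/foo.xml")
# urllib.request.urlopen(request) would send it; credentials (as with
# curl's -u option) would still need to be configured separately.
print(request.get_header("Translate"))
```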
I’m currently trying to work out how to poke the University’s SharePoint instance to get data out in useful formats. Often, it’s useful to see how other tools (e.g. WebDAV clients and SharePoint Designer) do it. As SharePoint is (sensibly) only available over HTTPS, I’ve had to set up a local Apache instance to act as a proxy.
Here’s the config:
SSLProxyEngine on
ProxyRequests on
ProxyPass / https://sharepoint.nexus.ox.ac.uk/
ProxyPassReverse / https://sharepoint.nexus.ox.ac.uk/
ProxyPassReverseCookieDomain .nexus.ox.ac.uk localhost
Header edit Set-Cookie secure ""
Line-by-line (almost):
SSLProxyEngine on
ProxyRequests on
ProxyPass / https://sharepoint.nexus.ox.ac.uk/
ProxyPassReverse / https://sharepoint.nexus.ox.ac.uk/
ProxyPassReverseCookieDomain .nexus.ox.ac.uk 192.168.122.1
Header edit Set-Cookie secure ""
This allows me to run clients against the local URL with traffic in the clear for me to snoop using Wireshark. It’s worked for gvfs, now to check SharePoint Designer (running on a local Windows 7 VM).
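To make the last directive concrete: Header edit applies a regular-expression search-and-replace to the header value, replacing the first match. The Python sketch below mimics that on a made-up cookie value; the function name and the cookie are illustrative and not taken from SharePoint's actual responses.

```python
import re


def strip_secure_flag(set_cookie_value):
    """Mimic `Header edit Set-Cookie secure ""` for one header value.

    Replaces the first occurrence of "secure" with the empty string, so
    that a client will send the cookie back over plain HTTP.
    """
    return re.sub("secure", "", set_cookie_value, count=1)


example = "session=abc123; path=/; secure; HttpOnly"  # hypothetical cookie
print(strip_secure_flag(example))
```

Note that the substitution leaves an empty attribute behind where the flag used to be.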
I’ve previously blogged about accessing Exchange (2007) using suds and Python. Turns out that things have changed slightly in Exchange 2010, so here’s an update.
First, you’ll need to use Alex Koshelev’s EWS-specific fork of suds, which you can grab from BitBucket. Next, you’ll need code a little like this:
import urllib2
from suds.client import Client
from suds.sax.element import Element
from suds.transport.http import HttpTransport

class Transport(HttpTransport):
    def __init__(self, **kwargs):
        realm, uri = kwargs.pop('realm'), kwargs.pop('uri')
        HttpTransport.__init__(self, **kwargs)
        self.handler = urllib2.HTTPBasicAuthHandler()
        self.handler.add_password(realm=realm,
                                  user=self.options.username,
                                  passwd=self.options.password,
                                  uri=uri)
        self.urlopener = urllib2.build_opener(self.handler)

transport = Transport(realm='nexus.ox.ac.uk',
                      uri='https://nexus.ox.ac.uk/',
                      username='abcd0123',
                      password='secret')
client = Client("https://nexus.ox.ac.uk/EWS/Services.wsdl",
                transport=transport)

ns = ('t', 'http://schemas.microsoft.com/exchange/services/2006/types')
soap_headers = Element('RequestServerVersion', ns=ns)
soap_headers.attributes.append('Version="Exchange2010_SP1"')
client.set_options(soapheaders=soap_headers)

address = client.factory.create('t:EmailAddress')
address.Address = 'first.last@unit.ox.ac.uk'
client.service.GetUserOofSettings(address)
Differences from the previous post are:
I was at the University’s Webmasters’ Workshop event at the OeRC on Friday, and got talking to Dan Q of the Bodleian Libraries about the soon-to-be-enforced ‘cookie law’. We realised that it’s possible to achieve cookie-like behaviour without actually setting a cookie. We’d initially thought that this would circumvent the ‘cookie law’, but having looked at the text of the legislation as quoted in the ICO’s guidance on cookies it appears that this cookie-less approach would also be unlawful, and is certainly against the spirit of the law. I present the idea here as a thought experiment, and to point out that one might need to be careful before implementing any ‘workarounds’ to continue to track visitors.
A cookie is simply an arbitrary bit of data handed to a browser that it will then hand back on subsequent requests. The cookie can be used to store a (semi-)permanent identifier that can be used to track the user, and it’s this functionality we want to duplicate.
In this approach, each page on a site pulls in a bit of JavaScript that uses XMLHttpRequest to retrieve /track/. This returns a never-expiring 301 Moved Permanently response with a redirect to a URL containing a tracking identifier, say /track/sgnklsfg/. The browser retrieves this URL, and receives another never-expiring document. The document is a bit of XML containing the identifier, which can be retrieved from the original XMLHttpRequest object.
This uses the browser's caching to maintain the identifier unchanged indefinitely. With the onset of Cross-Origin Resource Sharing, this would also allow the site owner to track users across domains. Dan Q also reckons it could be used to implement a shim around Google Analytics to eschew the use of cookies, which would be useful were the cookie law only about cookies.
Update: Dave King points out that similar functionality could be achieved using web storage.
Further update: The redirect is probably unnecessary. There's also the possibility that the cached resource containing the identifier might drop off the bottom of the browser cache after a relatively short time. In this case, Dave's suggestion is probably a more reliable way to track a user.
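In keeping with the thought experiment, the server side of the redirect-and-cache idea can be sketched as a pure function from request path to response. Everything here is an assumption made for illustration (Python 3, the function name, the ten-year cache lifetime, the XML element name); it is not code from any real deployment.

```python
import secrets
from xml.sax.saxutils import escape

# Ask caches to keep responses for roughly ten years (an arbitrary choice).
CACHE_FOREVER = "public, max-age=315360000"


def respond(path):
    """Return (status, headers, body) for a request to the given path."""
    if path == "/track/":
        # First visit: mint an identifier and redirect to a URL containing it.
        identifier = secrets.token_hex(8)
        headers = {"Location": "/track/%s/" % identifier,
                   "Cache-Control": CACHE_FOREVER}
        return 301, headers, b""
    if path.startswith("/track/") and path.endswith("/"):
        identifier = path[len("/track/"):-1]
        if identifier and "/" not in identifier:
            # The cached document simply echoes the identifier back.
            body = "<identifier>%s</identifier>" % escape(identifier)
            headers = {"Content-Type": "application/xml",
                       "Cache-Control": CACHE_FOREVER}
            return 200, headers, body.encode("utf-8")
    return 404, {}, b""


status, headers, _ = respond("/track/")
print(status, headers["Location"])
status, _, body = respond(headers["Location"])
print(status, body)
```

Because both responses are marked as cacheable for a long time, a browser that honours the headers would keep returning the same identifier without contacting the server again.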
The law is complicated, and I am not a lawyer. This is my interpretation of the law, and it is liable to differ from that of professionals.
The relevant section of the Privacy and Electronic Communications (EC Directive) Regulations 2003, as amended, is:
- Subject to paragraph (D), a person shall not store or gain access to information stored, in the terminal equipment of a subscriber or user unless the requirements of paragraph (B) are met.
- The requirements are that the subscriber or user of that terminal equipment--
- is provided with clear and comprehensive information about the purposes of the storage of, or access to, that information; and
- has given his or her consent.
- Where an electronic communications network is used by the same person to store or access information in the terminal equipment of a subscriber or user on more than one occasion, it is sufficient for the purposes of this regulation that the requirements of paragraph (B) are met in respect of the initial use.
For the purposes of paragraph (B), consent may be signified by a subscriber who amends or sets controls on the internet browser which the subscriber uses or by using another application or programme to signify consent.
- Paragraph (A) shall not apply to the technical storage of, or access to, information--
- for the sole purpose of carrying out the transmission of a communication over an electronic communications network; or
- where such storage or access is strictly necessary for the provision of an information society service requested by the subscriber or user.
This doesn't mention cookies by name, only the act of storing information in, or retrieving it from, the user's browser without consent, unless doing so is necessary in order to provide the requested service. A broad interpretation might be that as CSS generally contains no semantic content then it is not strictly necessary, and so requires the permission of the user. Likewise advertising. Other techniques for identifying the user, such as browser fingerprinting, access information stored in the terminal equipment without permission, and so are presumably unlawful. Likewise subscribing to orientation events would be forbidden as it isn't "strictly necessary" for providing a service, just convenient. It all seems a bit too woolly and all-encompassing. You might be interested in Silktide's page on what is affected by the "Cookie Law".
As mentioned earlier, the wording of the legislation would seem to suggest that this cookie-less approach would still be as unlawful as the equivalent using cookies.
Ever seen decorators which can be used like this?
@baked
def get_cake(flavour):
    # …

@baked(temperature=180, duration=25)
def get_cake(flavour):
    # …
Django’s template filter registration decorator is a good example of this, where it can be called as either a decorator, or a function that returns a decorator (specifically, a function that returns a function that takes a function and returns a function).
All these levels of indirection can get a little confusing. First, let’s look at a simple decorator function:
import functools

def baked(method):
    @functools.wraps(method)
    def f(*args, **kwargs):
        thing_to_be_baked = method(*args, **kwargs)
        return bake(thing_to_be_baked)
    return f
functools.wraps is a useful utility function that copies attributes (such as the name and docstring) from the wrapped function to the wrapping function.
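To see what functools.wraps buys you, here is a small self-contained sketch (Python 3). The `bake` helper is a stand-in, since the post never defines it, and the cake function is made up for the demonstration.

```python
import functools


def bake(thing):
    """Stand-in for the post's undefined bake() helper."""
    return "baked " + thing


def baked(method):
    @functools.wraps(method)
    def f(*args, **kwargs):
        return bake(method(*args, **kwargs))
    return f


@baked
def get_cake(flavour):
    """Return an unbaked cake of the given flavour."""
    return flavour + " cake"


# Thanks to functools.wraps, the wrapper reports the original metadata.
print(get_cake.__name__)
print(get_cake.__doc__)
print(get_cake("lemon"))
```

Without the `@functools.wraps(method)` line, `get_cake.__name__` would be `f` and the docstring would be lost.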
Next, here’s one that takes additional arguments:
import functools

def baked(temperature=None, duration=None):
    def decorator(method):
        @functools.wraps(method)
        def f(*args, **kwargs):
            thing_to_be_baked = method(*args, **kwargs)
            return bake(thing_to_be_baked, temperature, duration)
        return f
    return decorator
Here, baked is the function that returns a function (decorator) that takes a function (method) and returns another function (f). This is a lot of nesting, and still doesn’t handle the case where the user doesn’t want to supply the optional arguments.
We can reduce the nesting using functools.partial, while at the same time making the arguments optional:
import functools

def baked(method=None, temperature=None, duration=None):
    # If called without method, we've been called with optional arguments.
    # We return a decorator with the optional arguments filled in.
    # Next time round we'll be decorating method.
    if method is None:
        return functools.partial(baked, temperature=temperature, duration=duration)

    @functools.wraps(method)
    def f(*args, **kwargs):
        thing_to_be_baked = method(*args, **kwargs)
        # Pass the optional arguments through, as in the previous version.
        return bake(thing_to_be_baked, temperature, duration)
    return f
I spent the weekend at DevXS, a student developer event hosted by the lovely people at the University of Lincoln.
All in all, it was a great event, and I look forward to there being more of them. Joss Winn commented afterwards that it’s also quite likely a good way to encourage young developers to work in higher education. At the very least it’s going to make the attendees more aware that there are things they can build that will improve the student experience for them and their peers.
Want to know more? Tony Hirst has penned a blog post with his thoughts, and the official blog has a closing video, the list of winners — there were £1500 worth of prizes(!) — and loads more stuff about the event.
While there I knocked together a lightning talk about how easy it is to build stuff that interfaces with the physical world, based on my very positive experience with JeeNodes. The slides are available as a PDF.
This is mostly a note to myself, though might be useful for anyone else trying to get it working.
pylibacl is a Python module for accessing and modifying POSIX.1e Access Control Lists. We’re using these ACLs in the DataFlow project as we need finer-grained access control than is afforded by the standard Unix permissions model.
One of our developers was trying to get pylibacl working on Mac OS X, and ran into a bit of trouble when compiling. Basically, it isn’t supported, and won’t work. The pylibacl homepage says:
Todo: while Linux support is quite good, other OSes are not; this should be remedied…
The longer explanation is that OS X has a set of permissions that are different to those that pylibacl expects:
// </usr/include/sys/acl.h> on Mac OS X 10.6.7
typedef enum {
    ACL_READ_DATA = …,
    ACL_LIST_DIRECTORY = …,
    ACL_WRITE_DATA = …,
    …
} acl_perm_t;
On my GNU/Linux 2.6.35.14 box:
// </usr/include/sys/acl.h> on GNU/Linux 2.6.35.14
#define ACL_READ (0x04)
#define ACL_WRITE (0x02)
#define ACL_EXECUTE (0x01)
This leads to the following fun:
snow-leopard:~ root# pip install pylibacl
Downloading/unpacking pylibacl
Downloading pylibacl-0.4.0.tar.gz
Running setup.py egg_info for package pylibacl
warning: no files found matching 'MANIFEST'
Installing collected packages: pylibacl
Running setup.py install for pylibacl
building 'posix1e' extension
gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch ppc -arch x86_64 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -c acl.c -o build/temp.macosx-10.6-universal-2.6/acl.o
acl.c:52: error: ‘ACL_READ’ undeclared here (not in a function)
acl.c:53: error: ‘ACL_WRITE’ undeclared here (not in a function)
/usr/libexec/gcc/powerpc-apple-darwin10/4.2.1/as: assembler (/usr/bin/../libexec/gcc/darwin/ppc/as or /usr/bin/../local/libexec/gcc/darwin/ppc/as) for architecture ppc not installed
Installed assemblers are:
/usr/bin/../libexec/gcc/darwin/x86_64/as for architecture x86_64
/usr/bin/../libexec/gcc/darwin/i386/as for architecture i386
acl.c:52: error: ‘ACL_READ’ undeclared here (not in a function)
acl.c:53: error: ‘ACL_WRITE’ undeclared here (not in a function)
acl.c:1615: fatal error: error closing -: Broken pipe
compilation terminated.
acl.c:52: error: ‘ACL_READ’ undeclared here (not in a function)
acl.c:53: error: ‘ACL_WRITE’ undeclared here (not in a function)
lipo: can't open input file: /var/tmp//ccWGzg6m.out (No such file or directory)
error: command 'gcc-4.2' failed with exit status 1
Wikipedia also claims that OS X supports NFSv4 ACLs, which may also offer some explanation; I don’t know enough about NFSv4 ACLs to be able to tell!
In the longer term an interested party could probably fix pylibacl to work on OS X without too much difficulty — their code is available in a Git repository at git://git.k1024.org/pylibacl.git, and it already does a bit of platform-specific stuff. However, C isn’t exactly my area of expertise, Mac OS X isn’t one of our target platforms, and we’ve got plenty of other things to be getting on with.
I’ve just come back from the JISCexpo end-of-programme meeting in Manchester — I’d attended as a developer on the Open Citations Project. While there I met some of the University of Lincoln’s web team, and it was interesting to see how they were using the web to interact with current and prospective students.
Take a look at their main Twitter account:
They’re being a lot more interactive than I’ve seen elsewhere within HE. They’re actually responding to queries and pointing people in the right direction for further information. On the other hand, most of what we do is regurgitate RSS feeds.
This got me wondering whether we should strive to be using social media in a more bi-directional fashion. I’m not saying that it should be to the detriment of publishing news articles — one could mix them on the same account or have both news and conversational accounts.
Looking as though one would respond to tweets would give a “beneficial air of friendliness”, which could translate into “conversions” and new opportunities. The people that manage these accounts have a lot of knowledge about their University, department or college that they’re likely not going to think to share until asked, but which would be useful to a lot of people on the Internet.
Relatedly, one of their students had put together three videos[1, 2, 3] which he placed on YouTube and labelled as “banned adverts”². These have racked up about 2 million views between them.
Seeing these, their press people commissioned him to produce a clearing advert, which has garnered them even more (positive) publicity. It’s great to see them innovating in finding ways to generate interest among potential students and the wider Internet.
The Python tarfile module is a handy way to access files within tar archives without needing to unpack them first. You can iterate over files using the following pattern:
import tarfile

tar = tarfile.open(filename, 'r:gz')
for tar_info in tar:                   # tar_info is the metadata for a
                                       # file in the archive.
    file = tar.extractfile(tar_info)   # file is a file-like object.
    for line in file:                  # We can do standard file-like
        print line,                    # things.
Behind the scenes, each TarFile object maintains a list of members of the archive, and keeps this updated whenever you read or write members. This is fine for small archives, particularly if you want to access the metadata without having to re-read the archive. (TarFile objects have getmember, getmembers, and getnames methods for this kind of access.)
This list of members contains the TarInfo objects for every file in the archive. When you’ve got an archive with 18 million members (as I have), this list will no longer conceivably fit in memory. It’s not documented (as far as I can tell), but the solution is to periodically set the members attribute on the TarFile object to the empty list:
import tarfile

tar = tarfile.open(filename, 'r:gz')
for tar_info in tar:
    file = tar.extractfile(tar_info)
    do_something_with(file)
    tar.members = []
Obviously one loses some functionality as specified above, but hopefully now my scripts will terminate in reasonable time!
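For anyone wanting to try the pattern without an 18-million-member archive to hand, here is a self-contained sketch in Python 3 that builds a tiny archive in memory and streams through it. The function names and sample data are made up for illustration, and resetting `members` relies on the same undocumented behaviour described above, so treat it as an assumption rather than a supported API.

```python
import io
import tarfile


def iter_member_contents(fileobj):
    """Yield (name, bytes) for each regular file in a gzipped tar archive."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for tar_info in tar:
            if tar_info.isfile():
                yield tar_info.name, tar.extractfile(tar_info).read()
            # Drop the cached TarInfo objects so the list cannot grow.
            tar.members = []


def make_sample_archive(count):
    """Build a small in-memory .tar.gz purely for demonstration."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for i in range(count):
            data = ("contents of file %d\n" % i).encode("ascii")
            info = tarfile.TarInfo(name="file%d.txt" % i)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


for name, data in iter_member_contents(make_sample_archive(3)):
    print(name, len(data))
```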