es-discuss Digest, Vol 43, Issue 9
lmeyerov at gmail.com
Sun Sep 5 18:41:48 PDT 2010
Snipping together several emails:
> Of course not. But it does allow these to be frozen. SES makes use of that to freeze them (after cleaning them up) as the last step of its initialization <http://code.google.com/p/es-lab/source/browse/trunk/src/ses/initSES.js#557>.
So why not allow the ability to disable weak map enumeration? Perhaps default on vs. off is a distinction more important to SES than strict mode.
> 3) Sniffing is already ubiquitous in terms of language / host API variants
> Sure. What issue does that raise?
It's an example of (static) data leakage we've already accepted, and, yep, I won't lose sleep over it anytime soon!
> More generally, ES mostly has what one might call sources of "static nondeterminism" <http://code.google.com/p/google-caja/wiki/SourcesOfNonDeterminism>. By "static" I mean ways implementations of the spec may differ from each other. But these only open a non-overt channel if they may differ dynamically among runs on the same platform in a way that can be influenced.
That page suggests we agree that combining a language implementation with a predictable cost model (e.g., time iterating a short list vs a long one is observable) with "Date()" is enough to form a reliable covert channel. In the motivating mashup scenario for our discussion, I suspect we're just as likely (or more likely!) to get actual side channels from APIs with exploitable complexities as from enumerable weakmaps. I couldn't tell from the wiki page if you thought we should remove "Date()" in general, in Cajita in particular, or should just be generally aware of it.
Inclusion of "Date()" suggests a strict regimen well beyond ocap-style ES5 if you want object (e.g., API) sharing mashups without side channels.
> Nice. Excellent questions. The boundary between overt and non-overt is relative to the semantics of a platform. For the same concrete platform implementation, there can be multiple semantic accounts, and thereby multiple notions of where the boundary is between overt and non-overt.
I've thought about the meaning of covert vs. overt in terms of granularities of semantics for the past couple of years. What is visible in a model is overt; what is only visible in a finer model is covert in the coarse model. PL papers typically hide cost models and GC, though I view it more as a way to avoid distractions and developers do use some sort of high-level model of them when writing programs. Thus, aspects of them are indeed overt.
I don't know about side-channels; it seems too widely used a term for me to want to pick one clean meaning at the language level. In the case of your taxonomy breaker, it might even be orthogonal to overt vs. covert. If there's any sort of bug communicating what wasn't intended, irrespective of (c)overtness, that might be a side channel. E.g., maybe you forgot to disable the global readable debug log.
> Formal models for reasoning about confidentiality that ignore non-overt channels are generally unsound. Deterministic platforms aside, "correctly" preserving confidentiality involves reasoning about implementations and about multiple levels of abstraction.
As we can always make a model more fine-grained, by my definition of covert vs overt, any confidentiality result is unsound (which is not surprising: you better not be sniffing my electrons!). So perhaps you mean that we only admit attacks within a restricted range of semantic models. The question then is what is the finest model considered and where do we draw the line between overt and covert ones. This starts to be very involved...
> A correct program must only rely on the communications provided by overt channels. This brings us to an important realization from Fred Spiessen's thesis <http://www.evoluware.eu/fsp_thesis.pdf>: Formal models for reasoning about integrity that ignore non-overt channels remain sound. The correctness of a program at preserving integrity is within conventional correctness, which we can reason about soundly in a modular manner.
I didn't see which part of the thesis you're referring to, perhaps you can point to a section?
> Now to your questions. My earlier statements are with regard to using a conventional programming language semantics, in which memory allocation mechanics are mostly ignored. At <http://wiki.ecmascript.org/doku.php?id=strawman:gc_semantics> I captured some notes and references to papers that explore how to extend such conventional semantics to take account of some of these issues. At the opposite extreme, we could reason about the semantics of the machine instruction set, where the "program" we are reasoning about is the entire tower of software leading up the same program. At this level of abstraction, there is finite memory and no memory allocation. Under this perspective, we would get very different answers about what is correct or incorrect, and corresponding different answers about what is overt and non-overt.
1. "For code that can overtly sense what has been collected, such as by using weak references, such observation opens an overt information leakage channel" -- it seems here you are accepting weak references as an overt peephole into references they attach to. The more I think about it, the more I think about this as being an (acceptable) ocap explanation of weak references that isn't far from, say, (open) destructors.
I agree with your later comment about non-local reasoning. I've been trying to think of a way to make an enforceable decision at the allocation or sharing point -- e.g., declaring an object cannot be reachable by a weak reference -- but haven't thought of anything satisfying beyond being able to disable weak enumerability at the global level as a MOP capability.
Most selective controls are going to still enable implicit control dependencies similar to your example. I don't see ES5/Cajita/... providing reliable way to do both general object sharing and information flow control against these without adding some major restrictions and machinery that I suspect could also obviate the need for such controls.
2. If we want to be side-channel free to the extent of weak map enumeration mattering, more stringent GC progress guarantees seem essential. I haven't followed the literature here, but this sounds kind of neat at the research level (which also might be a concern for those whose address doesn't end in ".edu"!).
> Ok, let's suppose that we consider a semantics in which a shared object reference is an overt communications channel. Overt implies that correct programs may rely on this communications channel. This semantics introduces a non-modularity, in that providing an abstraction to two clients enables those clients to legitimately interact in ways out of control of their shared abstraction.
I totally agree! However, I'd point my finger at garbage collection: even without weak references, it breaks modularity. GC, for high-level semantic models, automates the memory allocation protocol, seemingly restoring modularity to programs. I don't recall the best paper describing this perspective; definitely surfaced in part in Dan Grossman's "The TM / GC Analogy" and maybe in Wilson's GC survey, but I think there was a more focused writeup elsewhere. However, the automation isn't perfect: memory leaks etc. still happen and we still use non-local reasoning when reference passing (for memory use, not confidentiality). Perhaps a more historically fair way of phrasing this is that "garbage collection didn't fully solve modularity wrt memory use."
> How would we program modularly in such a system? We would need to avoid ever sharing references except when we intend to enable communication.
Even without timing attacks, weak references, etc., I think ES5 is still unreliable for information flow properties. SES sounds like a step closer, but even there, I wouldn't trust 20,000+ lines of it without something like an information flow analysis. Writing information safe ES seems like it would be much less like ES5 than anything like SES, AdSafe, etc.
Furthermore, for most analyses I've read, I think the modularity (for confidentiality) isn't so negatively impacted. Yes, you now have to check that exposed tainted references outlive untrusted widgets (or finer ideas like collection is not impacted by high data), but I suspect most such analyses would disallow giving the items to beginwith! I don't recall any papers about such leaks; I think the timing attacks (even without traditional GC) are a bigger concern.
Rereading this email, I'm not sure the covert/overt/side channel perspective is adding much to this discussion. Our willingness to address timing attacks such as predictable GC, (concrete) acceptable levels of information leakage, and 'toggleability' seem more important. The others are interesting, but maybe not for here (say the ocap list).
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the es-discuss