Discussion about this post

Overcoming Bias Commenter:

Daniel, thanks for your perspective; it gives me lots to ponder.

Overcoming Bias Commenter:

Cyan,

See, this kind of terminological disagreement illustrates why I think it's better to use the codelength idea :-)

Can normalized maximum likelihood be used to send data? If so, then it implies an implicit prior over data sets, namely p(x) = 2^(-l(x)), where l(x) is the length of the code assigned to data set x. Whether or not this means it is "equivalent" to Bayes would seem to depend on what the word "Bayesian" means to you; in my lexicon it means a philosophical commitment to the necessity of using prior distributions that are essentially arbitrary. Once you've accepted that priors are necessary, the rules for updating them are mathematical theorems and no longer open to dispute.
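
To make the codelength-to-prior correspondence concrete, here is a minimal Python sketch (a toy example of my own, with made-up codelengths, not NML itself): any prefix code with lengths l(x) defines weights 2^(-l(x)) that satisfy Kraft's inequality and so behave like a prior over data sets, and conversely a prior p(x) yields ideal codelengths -log2 p(x).

```python
import math

# Toy prefix-code lengths (in bits) for five hypothetical data sets.
code_lengths = {"x1": 2, "x2": 2, "x3": 2, "x4": 3, "x5": 3}

# The implicit prior the code assigns to each data set: p(x) = 2^(-l(x)).
implicit_prior = {x: 2.0 ** -l for x, l in code_lengths.items()}

# Kraft's inequality: the weights of a prefix code sum to at most 1,
# so they form a valid (sub-)probability distribution, i.e. a prior.
assert sum(implicit_prior.values()) <= 1.0 + 1e-12

# The correspondence runs both ways: a prior p(x) yields ideal codelengths -log2 p(x).
for x, p in implicit_prior.items():
    assert math.isclose(-math.log2(p), code_lengths[x])

print(implicit_prior)  # {'x1': 0.25, 'x2': 0.25, 'x3': 0.25, 'x4': 0.125, 'x5': 0.125}
```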

Note that the above argument ("Can method X be used to send data? If so, then it implies an implicit prior over data sets...") works for a wide range of methods X (e.g. support vector machines, belief networks) which various people have claimed are not explicitly Bayesian.

It also means they are ALL subject to the mighty No Free Lunch Theorem, which says, roughly, that data compression cannot be achieved in general. All modeling and statistical learning techniques should therefore be prefaced by a disclaimer noting that "this method does not work in general, but if we make certain assumptions about the nature of the process generating the data..."
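
As a crude illustration of the "no compression in general" point, here is a pigeonhole count in Python (my own sketch, not a statement of any formal No Free Lunch result): there are more length-n binary strings than there are strictly shorter strings, so no lossless code can shrink every input.

```python
n = 8
inputs = 2 ** n                                   # all binary strings of length n: 256
shorter_outputs = sum(2 ** k for k in range(n))   # strings of length 0..n-1: 255
assert shorter_outputs < inputs                   # so at least one input cannot be compressed
print(inputs, shorter_outputs)
```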

Andrew, thanks for starting this discussion; I'm looking forward to future OB posts from you (don't tell Eliezer that you're into things like the Gibbs sampler and the Metropolis algorithm, though).
