For reasons discussed previously I believe that every scientific measurement lives on a finite sample set. But it is tiresome to work with enormous explicit finite sample sets, like, for example, the actual values that a 64 bit IEEE floating point number can take on... they're not actually evenly spaced, for example. What we tend to do is deal with discrete sample spaces with explicit values when the set is small enough (2 or 10 or 256 or something like that) and deal with "continuous" distributions as approximations when there are lots of values and the finite set of values are close enough together (for example a voltage measured by a 24 bit A/D converter in which the range 0-1V is represented by the numbers 0-16777215, so that the interval between sample values is about 0.06 micro-volts, which corresponds to 0.06 micro-amps for a microsecond into a microfarad capacitor, or around 374,000 electrons).
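(Just to check those constants, here's the arithmetic as a few lines of Python; the numbers are the ones quoted above, nothing new.)

```python
# Back-of-envelope check of the A/D converter numbers quoted above.
dV = 1.0 / 2**24                        # one code step over a 0-1V range, ~5.96e-8 V
C = 1e-6                                # one microfarad
charge = C * dV                         # charge needed to move the voltage one step
print(dV * 1e6, "microvolts")           # ~0.06
print(charge / 1.602e-19, "electrons")  # ~3.7e5, in line with the rough figure above
```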
Because of this, the nonstandard number system of IST corresponds pretty well to what we're typically doing. Suppose for example x ~ normal(0,1) in a statistical model. We can pick a large enough number, like 10, and a small enough number, like 0.000001, and grid out all the individual values between -10 and +10 in steps of 0.000001, and very rarely is anyone going to have a problem with this discrete distribution instead of the normal one. Anyone who does have a problem should remember that we're free to choose a smaller grid, and that their normal RNG might be giving them single precision floating point numbers that have 24 bit mantissas anyway... IST formalizes this by some stuff (axioms, lemmas, etc.) that proves the existence, in IST, of an infinitesimal number that is so small no "standard" math could distinguish it from zero, and yet it isn't zero.
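To make this concrete, here's a small sketch of mine (not part of the original argument) that snaps draws from a continuous normal RNG onto exactly that grid and checks that nothing you'd care about statistically can tell the difference:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw from a continuous normal, then snap to the grid described above:
# values between -10 and +10 in steps of 1e-6.
dx = 1e-6
z = rng.normal(0.0, 1.0, size=1_000_000)
z_grid = np.clip(np.round(z / dx) * dx, -10.0, 10.0)

# The rounding error is at most dx/2, so any summary statistic agrees
# to about 6 decimal places.
print(np.max(np.abs(z - z_grid)))                    # <= 5e-7
print(z.mean() - z_grid.mean(), z.std() - z_grid.std())
```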
So, now we could say we have the problem of picking a distribution to represent some data, and we know only that the data has mean 0 and standard deviation 1. We appeal to the idea that we'd like to maximize a measure of uncertainty conditional on mean 0 and standard deviation 1. In discrete outcomes, there's an obvious choice of uncertainty metric, it's one of the entropies

$$S = -\sum_i p_i \log(p_i)$$
Where the free choice of logarithm base is equivalent to a free choice of a scale constant, which is why I say "entropies" above. Informally, since the log of a number between 0 and 1 (a probability) is always negative, the negative of the log is positive. The smaller you make each of the $p_i$ values, the bigger you make each of the $-\log(p_i)$ values. So maximizing the entropy is like pushing down on all the probabilities. The fact that total probability stays equal to 1 limits how hard you can push down, so that in the end the total probability is spread out over more and more of the possible outcomes. If there are no constraints, all the probabilities become equal (the uniform distribution). Other constraints limit how hard you can push down in certain areas (ie. if you want a mean of 0 you probably can't push the whole range around 0 down too hard) so you wind up with more "lumpy" distributions or whatever depending on your constraints.
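A toy illustration of the "pushing down" picture, on a four-outcome sample space (the particular numbers are just mine, for illustration):

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    return -(p * np.log(p)).sum()

# With only the "sums to 1" constraint, the uniform distribution wins: any
# attempt to push one probability down forces another one up, and the
# entropy drops.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # log(4) ~ 1.386, the maximum on 4 outcomes
print(entropy([0.70, 0.10, 0.10, 0.10]))  # ~ 0.94, lumpier and lower
```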
The procedure for maximizing this sum subject to the constraints is detailed elsewhere. The basic technique is to take a derivative with respect to each of the $p_i$ values and set all the derivatives equal to 0. To add the constraints, you use the method of Lagrange multipliers. The result would be each

$$p_i = \exp(-\lambda_0 - \lambda_1 x_i - \lambda_2 x_i^2)$$

and the $\lambda_1, \lambda_2$ will depend on the constraints (mean 0, standard deviation 1) in our case, and the $\lambda_0$ chosen to normalize the total probability to 1.
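Here's a rough numerical sketch of that calculation on a coarse stand-in grid (dx = 0.01 rather than anything infinitesimal), solving for the multipliers via the usual convex dual instead of symbolically; the $\lambda_0$-style normalization is handled by dividing by the sum at the end. The point is just that the recovered $p_i$ really is $\exp(-x^2/2)$ up to normalization, i.e. a discretized standard normal:

```python
import numpy as np
from scipy.optimize import minimize

# Coarse stand-in for the grid: maximize entropy over {x_i} subject to
# sum p_i = 1, sum p_i x_i = 0, sum p_i x_i^2 = 1.
dx = 0.01
x = np.arange(-10, 10 + dx, dx)

def dual(lam):
    # Convex dual of the constrained problem: the maximizer has the form
    # exp(-lam0 - lam1*x - lam2*x^2); lam0 is fixed by normalization, so
    # we only need to search over (lam1, lam2).
    lam1, lam2 = lam
    logZ = np.log(np.sum(np.exp(-lam1 * x - lam2 * x**2)))
    return logZ + lam1 * 0.0 + lam2 * 1.0  # target mean 0 and E[x^2] = 1

lam1, lam2 = minimize(dual, x0=[0.0, 0.1], method="Nelder-Mead").x
p = np.exp(-lam1 * x - lam2 * x**2)
p /= p.sum()

print(lam1, lam2)                        # roughly 0 and 0.5, i.e. exp(-x^2/2)
print((p * x).sum(), (p * x**2).sum())   # roughly 0 and 1
print(np.abs(p - np.exp(-x**2 / 2) * dx / np.sqrt(2 * np.pi)).max())  # tiny
```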
Now, suppose you want to work with a "continuous" variable. In nonstandard analysis we can say that our model is that the possible outcomes are on an infinitesimal grid with grid size $dx$ and constrained to be between the values $-N\,dx$ and $N\,dx$ for $N$ a nonstandard integer. So the possible values are $x_i = -N\,dx + i\,dx$ for all the $i$ values between 0 and $2N$. We define a nonstandard probability density function $p(x)$ to be a constant over each interval of length $dx$, and the probability to land at the grid point in the center (or left side or some fixed part) of the interval is $p(x_i)\,dx$.
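In finite, runnable form (with dx merely small and N merely large, as stand-ins for infinitesimal and nonstandard), the construction looks like this:

```python
import numpy as np

# Finite stand-in for the infinitesimal grid: x_i = -N*dx + i*dx for i = 0..2N,
# with the "density" p constant over each cell and cell probability p(x_i)*dx.
dx, N = 1e-3, 10_000                         # so the domain is [-10, 10]
x = -N * dx + dx * np.arange(2 * N + 1)
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # any limited density works here
print((p * dx).sum())                        # ~1: a legitimate discrete distribution
```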
Now we calculate the nonstandard entropy

$$S = -\sum_i p(x_i)\,dx \log(p(x_i)\,dx)$$

Now clearly the argument to the logarithm is infinitesimal since $p(x_i)$ is limited and $dx$ is infinitesimal, so $-\log(p(x_i)\,dx)$ is nonstandard (very very large and positive). But, it's a perfectly good number. There is a finite number of terms in the sum so the sum is well defined. The value of the sum is of course a nonstandard number, but we could ask, how to set the $p(x_i)$ values such that the sum achieves its largest (nonstandard) value. Clearly the answer is going to be the same kind of expression as before, because we're doing the same calculation (hand waving goes here, feel free to formalize this in the comments) so we're going to wind up with:

$$p(x_i) = \exp(-\lambda_0 - \lambda_1 x_i - \lambda_2 x_i^2)$$

Where $p(x)$ refers to the nonstandard function which is constant over each interval, the standardization of this $p(x)$ is going to be the usual normal distribution.
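You can see the "very very large and positive" behavior numerically: the grid entropy of a (discretized) standard normal grows like -log(dx) as the grid is refined, while the maximizing shape stays put. A quick check, again with small-but-finite dx standing in for the infinitesimal:

```python
import numpy as np

def grid_entropy(dx, a=10.0):
    # -sum q_i log q_i where q_i = p(x_i) * dx are the cell probabilities
    # of a standard normal density on a grid of spacing dx over [-a, a].
    x = np.arange(-a, a, dx) + dx / 2
    q = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi) * dx
    q /= q.sum()
    return -(q * np.log(q)).sum()

for dx in [1e-1, 1e-2, 1e-3, 1e-4]:
    # The differential entropy of N(0,1) is 0.5*log(2*pi*e) ~ 1.4189; the
    # grid entropy exceeds it by about -log(dx), which is what "unlimited
    # for infinitesimal dx" looks like at finite precision.
    print(dx, grid_entropy(dx), grid_entropy(dx) + np.log(dx))
```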
The point is, just because the entropy is nonstandard doesn't mean it doesn't have a maximum, and so long as the maximum occurs for some function of x whose standardization exists, we can take that standard probability density as the maximum entropy result we should use. This procedure is justified in large part because the continuous function is being used to approximate a grid of points anyway!
If you don't like this result, you could always use the relative entropy (ie. replace the logarithm expression with $\log\!\left(\frac{p(x_i)\,dx}{q(x_i)\,dx}\right)$, relative to a nonstandard uniform distribution $q$ whose height is $1/(2N\,dx)$ across the whole domain $[-N\,dx, N\,dx]$). This seems to be the concept referred to by Jaynes as the limiting density of discrete points. Then, the $dx$ values in the logarithm cancel, and the entropy value itself isn't nonstandard, but the distribution $q(x)$ is, so it's still a nonstandard construct. Since $q(x)$ is just a constant anyway, it's basically just saying that by rescaling the argument of the logarithm by a nonstandard constant, we can recover a standard entropy to be maximized. But... and this is key, we are never USING the numerical entropy value itself, except as a means to pick out a probability density which turns out to have a perfectly well defined standardization, namely the normal distribution.
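And here's the relative-entropy version of the same check: measured against the uniform distribution on the same grid, the dx inside the logarithm cancels and the value stabilizes as the grid is refined (the particular half-width a = 10 is just my stand-in choice for the domain):

```python
import numpy as np

def entropy_relative_to_uniform(dx, a=10.0):
    # -sum q_i log(q_i / u_i), with q_i the standard-normal cell probabilities
    # and u_i the uniform distribution over the same grid. The dx inside the
    # logarithm cancels, so unlike the raw grid entropy this stays limited.
    x = np.arange(-a, a, dx) + dx / 2
    q = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi) * dx
    q /= q.sum()
    u = np.full_like(q, 1.0 / q.size)
    return -(q * np.log(q / u)).sum()

for dx in [1e-1, 1e-2, 1e-3, 1e-4]:
    # Stabilizes near 0.5*log(2*pi*e) - log(2*a) ~ -1.577, independent of dx.
    print(dx, entropy_relative_to_uniform(dx))
```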