
ldn ml_1

Louis edited this page Sep 28, 2016 · 1 revision

-3-4y PhDs at the Turing Institute -unique in data science -fun and interesting place; interested in PhDs or postdocs? good place -end of ad
-new information revolution, really influencing society, the sciences and the business world -everybody awash in data, and what we need is powerful tools for modelling, predicting and reasoning on large datasets
-start moving into content -this is an ML meetup so all know some ML; now ML means different things to different people -...delighted that the field is exploding in importance
-fascinating, but let's look at organising principles, foundations -what ties the field together -{pic}
-let's think of ML as probabilistic modelling -:to me a model is a description of possible data that one could observe from a system -if a model is meant to make predictions from data ... use maths to model
-Bayesian, or what Fisher renamed from "you Bayesians": "you [inverse probabilists]" applied rules to learning from data -adaptive models and learning from data
-should apologise given talks recently -the Goldman Sachs talk was the same -"few who were left should sit in the back and sleep, and that's probably what they're doing now..."

to model those who repeat this talk regard the percutant backlayin

-how to reason about hypotheses given data; -hypotheses: hidden variables, model parameters; anything not observed we call a hypothesis -before observing data we may have some assumptions about models, which we encode in a probability model called the prior -:. each hypothesis assigns a probability to data; compute the probability of the data I observe, called the likelihood

Bayes tells us: multiply the prior by the likelihood and renormalise, and call that learning

-go through the data observed -inference of hypotheses from data; inference in the human brain, at some theoretical level, may be thought of as inferring hypotheses given data -,e.g. data coming from the eye, -,ties nicely into what we see in vision, -,instead of words: plug into Bayes' rule -Bayes' rule isn't an axiom; it follows from the Sum Rule AND the Product Rule -...the sum rule tells us the marginal -...the product rule tells us the joint can be factored into a conditional and a marginal probability -Bayes' rule follows from these 2 rules -here Bayes' rule is written in the format typically seen in an ML problem -d: representing the data -m: the general model class -prior over params: the range of parameters we're willing to consider before observing the data
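The two rules and the ML-style form of Bayes' rule the notes refer to, written out with the notes' symbols ($d$ for data, $\theta$ for parameters, $m$ for the model class):

```latex
\text{sum rule:}\quad P(x) = \sum_y P(x, y)
\qquad
\text{product rule:}\quad P(x, y) = P(x \mid y)\, P(y)

\text{hence Bayes' rule:}\quad
P(\theta \mid d, m) \;=\; \frac{P(d \mid \theta, m)\, P(\theta \mid m)}{P(d \mid m)},
\qquad
P(d \mid m) \;=\; \int P(d \mid \theta, m)\, P(\theta \mid m)\, d\theta
```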

-likelihood function here; multiply the prior by the likelihood -get the posterior of the params given the data -...what we learned about the params given the data -to predict what is not observed (missing or new), the sum AND product rules tell us: x given data is an integral, i.e. an average over any parameter value, weighted by the posterior probability of the params -prediction from Bayesian basics is naturally ensembling or averaging -model comparison is applying Bayes' rule at the level of models, at a slightly higher level -those are the basics; could go deeper into the technicalities, e.g. how to choose the prior
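Prediction as posterior-weighted averaging can be sketched with Monte Carlo; the coin example and its Beta posterior are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Posterior over a coin's bias after 7 heads in 10 flips, with a uniform
# (Beta(1,1)) prior, is Beta(8, 4). Predict the next flip by averaging
# over posterior samples rather than using a single point estimate.
samples = rng.beta(8, 4, size=100_000)   # draws of theta ~ posterior
p_next_head = samples.mean()             # E[theta | data]

print(p_next_head)   # close to the exact value 8/12 ≈ 0.667
```

The `samples.mean()` line is the "ensembling" the notes mention: every parameter value gets a vote, weighted by its posterior probability.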

models are not the problem -models' problems are: how many dimensions? feature selection; fitting the order of a dynamical model; #units in an HMM; #layers in a NN -...the beauty of the Bayesian framework: these are all just variations on the same question -it works through Occam's razor -not a new concept; how does that work?

Yes, how does that work? What do Goldman Sachs think is 'as simple as possible', and what is elided in such groups' vision of the world?

-what the Bayesian Occam's razor tells you: the marginal likelihood naturally embodies a preference for simplicity -how does that come about? -it's the normalising constant, aka the integrated likelihood, aka the model evidence

-the term has several interpretations -it's an average over all possible parameter values

-the probability that randomly selected parameters from the prior would generate the observed data -&minus;log(...) is the number of bits of surprise -...entropy is the average surprise
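A minimal sketch of "bits of surprise" and entropy as average surprise; the four-outcome distribution is made up for illustration:

```python
import numpy as np

p = np.array([0.5, 0.25, 0.125, 0.125])   # a distribution over four outcomes

surprise_bits = -np.log2(p)               # -log2 p(x): bits of surprise per outcome
entropy = np.sum(p * surprise_bits)       # entropy = average surprise

print(surprise_bits)   # [1. 2. 3. 3.]
print(entropy)         # 1.75 bits
```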

Alan did not write bits

natural logs please

-people regularise models to avoid overfitting, do early stopping etc., to overcome the silliness of pure optimisation -what the basic sum-and-product rules say: if you want to know a second-order quantity, average with respect to the prior -...otherwise the model is not well defined -average over the prior and you get the marginal likelihood -if you apply this to model comparison problems, what you get, for any given dataset, is that you avoid selecting models that are too simple or too complicated
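A toy sketch of the Bayesian Occam's razor via the marginal likelihood; the fair-coin vs unknown-bias comparison is an assumed example, not one from the talk:

```python
from math import factorial

def evidence_fair(heads, n):
    """Marginal likelihood of a specific flip sequence under a fair coin
    (a simple model: no free parameters)."""
    return 0.5 ** n

def evidence_biased(heads, n):
    """Marginal likelihood under an unknown bias with a uniform prior
    (a more flexible model): the integral of theta^h (1-theta)^(n-h)
    over theta in [0,1] equals h! (n-h)! / (n+1)!."""
    return factorial(heads) * factorial(n - heads) / factorial(n + 1)

# Balanced data: the simpler model wins (the razor at work).
print(evidence_fair(5, 10) > evidence_biased(5, 10))   # True

# Skewed data: the flexible model earns its extra parameter.
print(evidence_fair(9, 10) < evidence_biased(9, 10))   # True
```

No explicit complexity penalty is added anywhere; the flexible model simply spreads its prior probability over many possible datasets, so it assigns less to any one of them.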

Is 'complicated' another word for 'complex', and if not, what is elided? Is 'too simple or too complicated' a mapping? Is it self-similar in this relation, or non-meronym/homonym? TODO:ascertain

-models that are too complicated get a small marginal likelihood: they have too many parameters relative to the data they need to explain -tutorial part of the talk over


-the big revolution in ML in the last few years has been in DL -coming back to that later -=cut=- -when is probabilistic modelling essential? -for many aspects of ML and intelligence you need to be careful about uncertainty -when doing forecasting you're uncertain about what's going to happen in the future -if you're going to make decisions where the consequences of your decision into the future are what matters, then for decision making you're interested in uncertainty

MINIMISE expected loss (maximise expected utility) etc.
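Decision making under uncertainty, choosing the action that minimises expected loss, can be sketched as follows; the rain/umbrella loss matrix is invented:

```python
import numpy as np

# Forecast: probability of rain tomorrow, from some probabilistic model.
p_rain = 0.3

# Loss matrix: rows = actions, columns = outcomes (rain, no rain).
losses = np.array([
    [1.0, 1.0],    # take umbrella: mild nuisance either way
    [10.0, 0.0],   # leave it: soaked if it rains
])

outcome_probs = np.array([p_rain, 1 - p_rain])
expected_loss = losses @ outcome_probs   # expected loss for each action
best_action = expected_loss.argmin()     # 0 = take umbrella

print(expected_loss)   # [1. 3.]
print(best_action)     # 0
```

The decision depends on the whole probability, not just the most likely outcome, which is why calibrated probabilities matter for what follows.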

-calibrated probabilities -many things give probabilities, but not often reasonable probabilities

Reasonable does not mean good [speaker implied otherwise]; it implies logically accessible.

-noise in tests: you benefit from a probabilistic framework for filling in missing data -all of data compression [all compression algorithms] corresponds to an explicit probabilistic model -better compressors <=> better probabilistic models
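The compressor <=> probabilistic model correspondence can be sketched via Shannon code lengths; the toy string and the two symbol models below are assumptions for illustration:

```python
import numpy as np

# A Shannon-style code assigns about -log2 p(x) bits to symbol x, so a model
# that puts more probability on the actual data compresses it better.
text = list("aaaaabbbc")

def code_length_bits(data, model):
    """Total bits to encode `data` under a symbol-probability `model`."""
    return sum(-np.log2(model[s]) for s in data)

uniform = {"a": 1/3, "b": 1/3, "c": 1/3}   # ignores symbol frequencies
fitted = {"a": 5/9, "b": 3/9, "c": 1/9}    # matches the data's frequencies

print(code_length_bits(text, uniform))   # ≈ 14.26 bits
print(code_length_bits(text, fitted))    # ≈ 12.17 bits: better model, shorter code
```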

-=cut=-

-what do we mean by automate? -there's a lot of data out there, and not enough ML-ers -even for "people like us",

and I don't know about that

-we're going to automate some aspects of model discovery from data -really ambitious -model discovery is collectively [actually] `TODO:soi-dis-quoi?` quite interesting

-=cut=-

-diagram of the system we developed -basic idea is that data comes in, -,something like a table, a table of numbers, -the process is going to search over some space of models for a good model of the data -could enumerate some models and test them out -not very exciting...

...um...

-take a compositional idea, e.g. how to compose structures out of Lego -simple elements -:essentially words in our language:- a language of models that is open-ended -captures all sorts of things, incl. Fourier analysis and linear regression; a search procedure tries to efficiently explore this language of models -not going to exhaustively enumerate
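A minimal sketch of the compositional idea: greedy search over a tiny "language" of kernels combined with + and *, scored by a GP-style log marginal likelihood. The base kernels, fixed hyperparameters, and toy data are all invented for illustration and far simpler than the actual system:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

# Toy data: a linear trend plus a periodic component.
x = np.linspace(0, 4, 40)
y = 0.5 * x + np.sin(3 * x) + 0.05 * rng.standard_normal(40)

# Base kernels: the "words" of the model language (fixed hyperparameters).
def lin(a, b): return np.outer(a, b)
def per(a, b): return np.exp(-2 * np.sin(1.5 * np.abs(a[:, None] - b[None, :])) ** 2)
def rbf(a, b): return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

base = {"LIN": lin, "PER": per, "RBF": rbf}

def log_evidence(kernel, x, y, noise=0.05):
    """GP log marginal likelihood: the score guiding the search."""
    K = kernel(x, x) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(x) * np.log(2 * np.pi)

# Greedy search: start from the best base kernel, then try extending the
# current best expression with "+ k" or "* k" for each base kernel k.
best_name, best_k = max(base.items(), key=lambda nk: log_evidence(nk[1], x, y))
for op, (n, k) in product(["+", "*"], base.items()):
    if op == "+":
        cand = lambda a, b, k1=best_k, k2=k: k1(a, b) + k2(a, b)
    else:
        cand = lambda a, b, k1=best_k, k2=k: k1(a, b) * k2(a, b)
    if log_evidence(cand, x, y) > log_evidence(best_k, x, y):
        best_name, best_k = f"({best_name} {op} {n})", cand   # greedy update

print(best_name)
```

Because the score is a marginal likelihood rather than raw data fit, the search prefers the simplest composite expression that explains the data, which is the Occam's razor point from earlier in the talk.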

enough greed; what happened to Occam, hey? nope, this isn't complexity given observation, nope.

-could use cross-validation; in fact there's a version of the codebase that just uses cross-validation -using the marginal likelihood is more efficient given...

...x-val not good if high-dim

end-to-end data-to-reports
