These fields are all optional and need only
be supplied if you would like a direct reply.
Subject
Your email address
Your real name
You must answer this!
If you don't, my spam filtering will
ensure that I never see your email.
What's 8 plus five (in digits only)?
Please make your changes here and then
Editing tips and layout rules.
File: IntroducingBigOh ''' <link rel="alternate" type="application/rss+xml" ''' href="/rss.xml" title="RSS Feed"> ********> width="25%" |>> ''' <a title="Subscribe to my feed" ''' rel="alternate" ''' href="https://www.solipsys.co.uk/rss.xml"> ''' <img style="border-width: 0px;" ''' src="https://www.feedburner.com/fb/images/pub/feed-icon32x32.png" ''' align="middle" ''' alt="" />Subscribe!</a> _ ''' <a href="https://twitter.com/ColinTheMathmo"> ''' <img src="https://www.solipsys.co.uk/new/images/TwitterButton.png" ''' title="By: TwitterButtons.net" ''' width="212" height="69" ''' alt="@ColinTheMathmo" ''' /></a> <<| ---- My latest posts can be found here: * ColinsBlog ---- Previous blog posts: * ConstantDifferences * AlgorithmsAndSizesOfInstances * IntroducingTimeComplexity * TheLinearFrog * SeventyVersusOneHundredRevisited * HowTheFarragoWorks * SeventyVersusOneHundred * PowersOfTwoInLexOrder * EmergingEExpanded * RageInducingSystemImplementation * TheBookIsNotAlwaysRight * EmergingE * ImpossibleToTranslate * WaitingInVain * NonRepeatingDecimals * RationalRepeats * WhyIsItLovely * CompilingCryptoConnections * ExploringConnectionsBetweenCryptoSystems * ElwynBerlekampHasLeftUs * RootCauseAnalysisAndThePhotocopierQuestion * TheUpDownTides * TheForeAftTide * TheSidewaysTide * WrappingUpWrappingUpTheEarth * TheOtherWrappingTheEarthProblem * WrappingTheEarth * TheRingOfSteel * RoundingUpTheRopes * OtherOtherOtherRopeAroundTheEarth * RopeAroundTheEarthRefined * TheOtherRopeAroundTheEarth * ElementaryEstimates * LatitudeCorrection * JustGiveMeTheAnswer * MoreMusingOnPollardRho * IdleThoughtsAboutPollardRho * WhenOptimisingCodeMeasure * ADogCalledMixture * AnotherPayPalScam * WhyTopPostingHasWon * UnexpectedInteractionOfFeatures * ArchimedesHatBoxTheorem * ConsideringASphere * ToLinkOrNotToLink * GenericAdviceForWritingAThesis * JustTeachMyChildTheMaths * NotASpectatorSport * LeftTruncatablePrime * TheDoctorAndTheLawyer * FourPointsTwoDistancesProof * MeetingRonGraham * NapkinRingVersusSphericalCap * TheFourPointsPuzzle * RadiusOfTheEarthPartTwo * GrepTimingAnomaly * TheRadiusOfTheEarth * ThisWorksToCureMyHiccoughs * PerhapsWeSavedOne * ThinkingAboutMastodon * DisappearingTrainsOnVirgin * TheIndependenceGame * OneOfMyFavouritePuzzles * ThinkingAboutRecursion * MemorisingTheTube * SpikeySpheres * SurprisinglyQuick * AnUnexpectedFraction * YouHaveToAdmireTheirOptimism * RepresentativesMatter * PythagorasByIncircle * APuzzleAboutPuzzles * HowNotToDoTwitter * Calculating52FactorialByHand * SmallThingsMightNotBeSoSmall * NotIfYouHurry * FactoringViaGraphThreeColouring * AnotherProofOfTheDoodleTheorem * WhenObviousIsNotObvious * GraphThreeColouring * TheDoodleTheorem * BeCarefulWhatYouSay * TheMutilatedChessboardRevisited * AMirrorCopied * TheOtherOtherRopeAroundTheEarth * PhotocopyAMirror * ThePointOfTheBanachTarskiTheorem * SieveOfEratosthenesInPython * FastPerrinTest * RussianPeasantMultiplication * FindingPerrinPseudoPrimes_Part2 * FindingPerrinPseudoPrimes_Part1 * TheUnwiseUpdate * MilesPerGallon * TrackingAnItemOnHackerNews * HackerNewsUserAges * PokingTheDustyCorners * ThereIsNoTimeForThis * PublicallySharingLinks * LearningTimesTables * GracefulDegradation * DiagrammingMathsTopics * OnTheRack * SquareRootByLongDivision * BeyondTheBoundary * FillInTheGaps * SoftwareChecklist * NASASpaceCrews * TheBirthdayParadox * TheTrapeziumConundrum * RevisitingTheAnt * TheAntAndTheRubberBand * IrrationalsExist * MultipleChoiceProbabilityPuzzle * RandomEratosthenes * WrappingUpSquareDissection * DissectingASquarePart2 * DissectingACircle * DissectingASquare * AnOddityInTennis * DecisionTreeForTennis * DecisionTreesInGames * AMatterOfConvention * DoYouNourishOrTarnish * BinarySearchReconsidered * TwoEqualsFour * TheLostPropertyOffice * TheForgivingUserInterface * SettingUpRSS * WithdrawingFromHackerNews ---- Additionally, some earlier writings: * RandomWritings. * ColinsBlog2010 * ColinsBlog2009 * ColinsBlog2008 * ColinsBlog2007 * ColinsBlogBefore2007 ******** !! What is "Big Oh"? So a brief recap: * A _Problem_ is a(n infinite) collection of _Instances;_ * An _Instance_ is a _"Challenge/Response"_ pair; * An _Algorithm,_ when given a _Challenge,_ returns the _Response;_ * An _Instance_ has a concept of its _Size;_ * For each _Size_ of _Instance,_ we look at the worst case time taken by an _Algorithm;_ * We chart that maximum time as a function of _Instance_Size;_ * We look at how that "maximum time taken" grows. Last time I said: ********> width="10%" ******** width="80%" [[[ If you do this for two different algorithms and the results are, apart from a scaling factor, pretty much the same, then the algorithms are said to have the same "Time Complexity". ]]] ******** width="10%" ********< This post is going to look at that carefully, because while it's true that if the "time taken" functions of two algorithms "look like" each other in the long run then they have the same time complexity, that's not enough. Algorithms can behave erratically, so the concept of "this looks like that" isn't really enough, we would like to do better. [[[>50 This post is deliberately gentle and repeats itself a lot. The ideas in here are important, and can take some time to assimilate. There is also, as a work in progress, and "Autobahn" version of these posts, a high-speed romp through the basic ideas. I'll link to that soon, and replace the content of this aside with a pointer. ]]] So we will look at how /much/ better we can do. In this post we will look at "Big Oh" notation. I'm expecting that people will mostly skim this, get the broad sweep of ideas, and then come back to it later to check on the details (if they care). So don't be too worried if some of the details don't make sense yet. The broad conclusion is this: ********> width="10%" ******** width="80%" [[[ "Big Oh" notation gives us a way of understanding that in the long run, one algorithm can be better than another, and gives us a way to quantify that. ]]] ******** width="10%" ********< !! Not really linear, but sort of. [[[> https://www.solipsys.co.uk/images/Chart_Linear2.png ---- |>> Fig 1: Are these "Linear"? <<| ]]] So while the statement from last time isn't technically true, it does give the right intuition. The problem with intuition is that it can let you down in cases where we see "pathological behaviour". Most of the time it doesn't matter, but it's nice to have definitions that cover /all/ cases, so we don't have to worry about exceptions. So let's have a look at the time-performance of two hypothetical algorithms on the same problem. They are different algorithms, and are quite complex, so their behaviours are not especially easy to understand. Even so, let's posit that they behave as shown in this chart. These two algorithms behave somewhat erratically, with some instance sizes taking much less than others, but broadly, both algorithms take time that grows linearly(ish) with the size of the instance. In some sense these two algorithms are behaving very similarly, and it would be great to have some way of capturing that mathematically. Then we remember two things: * Firstly, we really only care about how an algorithm behaves _at_worst,_ and ... * Secondly, we really only care about how it behaves _in_the_long_run._ So what we look for is a function that, in the long run, is larger than the "run time" for our algorithm. That gives us an upper bound. Well in this case that's easy, we can just use $2^n$. That will certainly exceed our algorithm's run time. Hmm. Not really either helpful or informative. What we really want is the /best/ upper bound. So here's what has been proven to be both useful /and/ informative. !! Getting an upper bound with "Domination" The concept we use is to think of one function "dominating" another. In formulas, we say that function $f(n)$ "dominates" function $g(n)$ if, after some time, $f(n)$ is always bigger than $g(n).$ OK, so that's not /quite/ true. We allow for a constant, and we say something more precise about "in the long run" and we get this: ********> width="10%" ******** width="80%" [[[ Function $f(n)$ *dominates* function $g(n)$ if there's a constant $c$ and a number $N$ such that whenever $n>N$ we have $c{\cdot}f(n) > |g(n)|$ ]]] ******** width="10%" ********< [[[>40 Yes, I am literally saying the same thing, but using different words. This is an important concept, and when you think about the exact details, it's not entirely obvious. ]]] In other words, $f$ dominates $g$ if, in the long run, $f$ is always bigger than $g$ (up to a constant). This isn't /quite/ enough ... Suppose by way of an example that the time taken by algorithm $A$ is linear, so we can write $T_A(n)=kn$ for some constant $k$. Using our definition, it is true to say that this is dominated by $n^2,$ because it doesn't matter what $k$ might be, eventually $n^2$ will be bigger than $k\cdot{n}$. But that isn't really very helpful, and isn't as informative as we might like. By itself the definition of "to dominate" doesn't really capture how the algorithm behaves. In a sense it's not aggressive enough, it's too "loose". But an upper bound is an upper bound, and what we're really asking for is for the best known upper bound. That brings us to our final form. So the set-up is this. We have a problem, $P$, and for that problem we have an algorithm, $A$. At best we can we compute an upper bound for how long $A$ takes for each size of instance of $P$. We'll call that $T_A(n).$ For simple problems we might be able to compute that exactly, but for sufficiently complex problems we can usually only get an upper bound. Then we say this: * Algorithm $A$ is, at worst, of complexity $C$, if $C$ dominates $T_A$. [[[> https://www.solipsys.co.uk/images/CompareTimes.png ---- |>> Comparing growth rates <<| ]]] !! It's actually more complicated than that ! (now there's a surprise) So look again at the plot for our hypothetical algorithm, and overlay various functions that increase as $n$ increases. It's not helpful, I know, but it's true to say that the time function for this algorithm is dominated by $2^n$. It's substantially better to say that it's dominated by $n^4$, and it's better again to say it's dominated by $n^2$. Better still, it's dominated by $n\cdot\log(n)$. But it's also dominated by the function $f(n)=n$, and by $g(n)=n+5,$ and perhaps more surprisingly, by $h(n)=n-\log(n).$ What? Yes, believe it or not the function $f(n)=n$ is dominated by $n-\log(n)$. To see this, we look at the definition. That asks if there is a $k$ and an $N$ such that when $n>N$ we have $k(n-\log(n))>n.$ Rearranging that, we want to know if there is a $k$ such that when $n$ is big enough, $(k-1)n>k\log(n).$ But just taking $k=2$ suffices, so yes, $n$ is dominated by $n-\log(n)$. You might think that would render the whole concept sufficiently non-intuitive as to be useless, but actually it lets us find the idea of the "dominating term". In essence, $n$ grows faster than $\log(n)$, so including any terms that are essentially just $\log(n)$ doesn't change things. It's the fastest growing term that matters, the slower growing don't change things enough to make a difference. [[[>40 Oh, and by the way ... yes, the algorithms shown in Fig 1 are "linear". ]]] We need to make that more precise, but before we can do so we need some new notation, and a few fundamental ideas about relationships between things. ---- |>> | |>> <<<< Prev <<<< ---- ConstantDifferences <<| | : | |>> >>>> Next >>>> ---- MathematicalRelations ... <<| | ---- ********> ''' <a href="https://mathstodon.xyz/@ColinTheMathmo"> ''' <img src="https://www.solipsys.co.uk/images/Mastodon_Mascot.png" ''' width="256" height="280" ''' alt="https://mathstodon.xyz/@ColinTheMathmo" ''' /></a> ******** ''' <a href="https://mathstodon.xyz/@ColinTheMathmo/">You can follow me on Mathstodon.</a> _ _ _ _ [[[> ''' <a href="https://twitter.com/ColinTheMathmo">Of course, you can also<br>follow me on twitter:</a> ''' <a href="https://twitter.com/ColinTheMathmo"> ''' <img src="https://www.solipsys.co.uk/new/images/TwitterButton.png" ''' title="By: TwitterButtons.net" ''' width="212" height="69" ''' alt="@ColinTheMathmo" ''' /></a> ''' <img src="/cgi-bin/CountHits.py?IntroducingBigOh" alt="" /> ]]] ********< ---- !! Send us a comment ... ''' <form action="https://www.solipsys.co.uk/cgi-bin/FormMail.pl" method=post> ''' <input type=hidden name="recipient" value="colinsblogcomment@solipsys.co.uk" > ''' <input type=hidden name="subject" value="Blog comment : IntroducingBigOh" > ''' <input type=hidden name="redirect" value="https://www.solipsys.co.uk/new/ThankYouForYourComment.html" > ''' <input type=hidden name="missing_fields_redirect" value="https://www.solipsys.co.uk/RequestError.html"> ''' <input type=hidden name="env_report" value="REMOTE_HOST, REMOTE_ADDR, HTTP_USER_AGENT" > ''' <input type=hidden name="print_blank_fields" value="1" > ********> width="47%" You can send us a message here. It doesn't get published, it just sends us an email, and is an easy way to ask any questions, or make any comments, without having to send a separate email. So just fill in the boxes and then ''' <font size="+4"><INPUT TYPE="submit" VALUE="CLICK HERE TO SEND"></font> ******** width="53%" ********< ''' <table cellpadding="5"> ''' <tr> ''' <td valign="top">Your name </td> <td valign="top">:</td> ''' <td> <input type=text name="realname" size="48"> </td> ''' <tr> ''' <td valign="top">Email </td> <td valign="top">:</td> ''' <td> <input type=text name="email" size="48"> </td> ''' </tr> ''' <tr> ''' <td valign="top">Message </td> <td valign="top">:</td> ''' <td> <TEXTAREA NAME="Message" ROWS=10 COLS=64></TEXTAREA> </td> ''' </tr> ''' </table> ''' <center> ''' <font size="+4"> ''' <INPUT TYPE="submit" VALUE="CLICK HERE TO SEND"> ''' </font> ''' </center> ''' </form> ********<