Integer statistic
-
MOSS 2007 | User Profiles | Part 2
[Windows] (MSDN Blogs)
Importing & processing data in SQL Server Tables
To fully understand the next phase of the import process, we first need some additional background. Several stored procedures, registered in the server's registry, are invoked at various stages of importing data from the gatherer pipeline. They can be found under the following registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Applications\<GUID>\Gatherer Archival Plugin\ProfileImport
The most important of them are the following:
· profile_pluginOnStartCrawl
· profile_pluginOnEndCrawl
· profile_pluginDataImport
Refer to the step-by-step process below, which explains the work these stored procedures do.
This phase is where we "crawl" the data. During this phase, the data is pushed into various temporary tables in the SSP database. The following tables are written to or updated during this phase:
· Profile_DataImport
· Profile_DeletedUsers
· Profile_DeletedUsers_Temp
· Profile_Lookup
· Profile_Stats
· ProfileImport
· ProfileImport_copy
· ProfileImportAlt
· ProfileList
· ProfileUI
1. The stored procedure profile_pluginOnStartCrawl is invoked. It updates the Profile_Stats table with the start time, import status and import type (Full/Incremental). This information is used to update the UI, for example with the count of profiles imported and whether a full crawl is in progress. Once the profile import starts, the UI status shows “Enumerating”.
2. Profile import uses a protocol handler to query data from Active Directory and feed it into the gatherer pipeline.
3. The data in the gatherer pipeline has been processed by the iFilter. It is now handed over to the Archival and Retrieval plug-in. The ARPI plug-in reads this data and writes it in chunks, alternating between the ProfileImport and ProfileImportAlt temporary tables. Both tables have the same structure. The table currently being written to is called the active buffer.
4. The profile_pluginDataImport stored procedure truncates the ProfileImport_Copy table and then copies the active buffer into it. This table therefore holds a subset of the data stored in the ProfileImport or ProfileImportAlt table and is used for various operations, such as evaluating the properties of existing user profiles and updating the signatures (signatures hold specific user property information in hashed form).
5. Once the records are in ProfileImport_Copy, we determine which records need to be updated. If a profile already exists, a subset of its properties is compared to decide whether anything has changed.
a) If they match, only the LastImported field in the UserProfile_Full table is updated.
b) If not, we move on to updating the record.
6. We continue using ProfileImport_copy to process the properties and occasionally write to the Profile_DataImport table. Profile_DataImport is another temporary table that we use to process changes to the imported data before finally writing to the UserProfile_Full table. This also minimizes write-lock times on UserProfile_Full, the final data store for user profiles.
7. This is the final step of the profile import cycle. After processing finishes, the active buffer, ProfileImport_copy, and Profile_DataImport are truncated so that they are empty for the next run.
8. At this stage, the stored procedure profile_pluginOnEndCrawl is executed. It updates the Profile_Stats table to indicate that the crawl is complete. In the UI, the “Profile import status” setting shows Idle, and the final import statistics are displayed at the bottom of the web page. At this point, profile import is complete and the process that updates membership statistics starts. You should see this as “Enumerating” in the user interface against the “Membership & BDC import status” setting.
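The buffer handling in steps 3 through 7 can be sketched in a few lines. This is a minimal, hypothetical in-memory model, not SharePoint code: the real work happens in SQL Server, and here Python lists and dicts stand in for the ProfileImport/ProfileImportAlt buffers, ProfileImport_Copy, Profile_DataImport and UserProfile_Full. Change detection is simplified to comparing property dicts, and records are assumed to carry an NTName key plus a props dict.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the two alternating ARPI buffer tables.
BUFFER_NAMES = ("ProfileImport", "ProfileImportAlt")


def run_import(records: List[dict], user_profile_full: Dict[str, dict],
               chunk_size: int, run_id: int) -> Tuple[Dict[str, int], List[str]]:
    buffers: Tuple[List[dict], List[dict]] = ([], [])
    active = 0                      # index of the current active buffer
    stats = {"updated": 0, "unchanged": 0}
    buffers_used: List[str] = []    # which buffer received each chunk

    for start in range(0, len(records), chunk_size):
        buf = buffers[active]
        # Step 3: the plug-in writes the next chunk into the active buffer.
        buf.extend(records[start:start + chunk_size])
        buffers_used.append(BUFFER_NAMES[active])

        # Step 4: "truncate" ProfileImport_Copy and copy the active buffer into it.
        profile_import_copy = list(buf)

        # Step 5: unchanged profiles only get LastImported refreshed;
        # changed or new ones are staged in Profile_DataImport.
        profile_data_import: List[dict] = []
        for rec in profile_import_copy:
            existing = user_profile_full.get(rec["NTName"])
            if existing is not None and existing["props"] == rec["props"]:
                existing["LastImported"] = run_id
                stats["unchanged"] += 1
            else:
                profile_data_import.append(rec)

        # Step 6: apply the staged changes to UserProfile_Full in one pass,
        # keeping the window during which the final table is written short.
        for rec in profile_data_import:
            user_profile_full[rec["NTName"]] = {"props": dict(rec["props"]),
                                                "LastImported": run_id}
        stats["updated"] += len(profile_data_import)

        # Step 7 (simplified): clear the processed buffer per chunk so the
        # alternation is visible, then switch to the other buffer.
        buf.clear()
        active ^= 1

    return stats, buffers_used
```

In this sequential sketch the alternation is only bookkeeping; in the real pipeline its point is that one buffer can be filled while the other is being processed.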
Key SQL Server Tables Used in Import Process
As discussed above, all the user profile information retrieved from the Active Directory store is ultimately written into the UserProfile_Full table in the SSP database. Each row in the UserProfile_Full table represents a user. The following are some of the useful fields written into the UserProfile_Full table:
Attribute | Description
RecordID | Primary key. The record identifier of the user profile.
DocID | The SharePoint document identifier associated with the user profile.
UserID | GUID that provides a universal identifier for the user profile.
NTName | SAM account name of the user's logon account.
PreferredName | Display name of the user.
Email | Email address.
SID | Security identifier.
Manager | Manager name.
SIPAddress | Session Initiation Protocol (SIP) address.
LastUpdate | Last date and time this record was updated.
LastUserUpdate | Last date and time the user updated data in this record.
LastImported | Last date and time the user profile data was updated during the import process.
bDeleted | Flag indicating that the user has been marked for deletion and is no longer active.
Signature | An MD5 hash that represents the properties of the user.
* NOTE: The Signature is checked to see whether its value matches the one already stored. If it matches, the properties for that user are not updated.
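The signature check can be illustrated with Python's hashlib. The article only says the signature is an MD5 hash of user properties; how SharePoint actually serializes the properties before hashing is not described, so the canonical "name=value" form below is a hypothetical choice. Its one important property is that it is order-independent, so the same set of property values always produces the same signature.

```python
import hashlib
from typing import Mapping


def profile_signature(props: Mapping[str, str]) -> str:
    """MD5 hex digest over a canonical, order-independent form of the properties."""
    # Hypothetical canonical form: "name=value" pairs sorted by name, one per line.
    canonical = "\n".join(f"{name}={props[name]}" for name in sorted(props))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def needs_update(stored_signature: str, props: Mapping[str, str]) -> bool:
    """True when the incoming properties hash differently from the stored signature."""
    return stored_signature != profile_signature(props)
```

Comparing one fixed-size digest instead of every property value keeps the per-profile check cheap, which matters when most profiles are unchanged between imports.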
Another table, UserProfileValue, also stores user profile information: it contains all the user profile property values. Its structure is as follows:
Column | Description
RecordID | A 64-bit integer that specifies the record ID of the user profile this property is associated with.
PropertyID | A 64-bit integer that specifies the identifier of the user profile property.
PropertyVal | A variant value that is the value of the user profile property.
Image | Image
Text | A string associated with the user profile.
VocValID | A string that specifies a database operation (add, delete, etc.) for a property in a user profile record.
OrderRank | An integer that specifies the display order of the user profile property in the user interface.
Privacy | The privacy type value of the user profile property.
RecordID links the UserProfile_Full and UserProfileValue tables together. UserProfile_Full holds the basic user profile information, while UserProfileValue holds additional information such as the image.
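To make the RecordID relationship concrete, here is a sketch against an in-memory SQLite mock rather than a real SSP database (querying the SSP database directly is unsupported). The schemas are a hypothetical, simplified subset of the columns listed above; PropertyVal is TEXT here instead of a SQL Server variant, and the sample rows are invented.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """In-memory mock with a simplified subset of the two tables' columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE UserProfile_Full (
            RecordID INTEGER PRIMARY KEY,
            NTName TEXT,
            PreferredName TEXT,
            bDeleted INTEGER
        );
        CREATE TABLE UserProfileValue (
            RecordID INTEGER,
            PropertyID INTEGER,
            PropertyVal TEXT,
            OrderRank INTEGER
        );
        INSERT INTO UserProfile_Full VALUES (1, 'alice', 'Alice', 0), (2, 'bob', 'Bob', 1);
        INSERT INTO UserProfileValue VALUES
            (1, 10, 'Seattle', 2), (1, 11, 'Engineering', 1), (2, 10, 'Paris', 1);
    """)
    return conn


def profile_properties(conn: sqlite3.Connection, ntname: str) -> List[Tuple[int, str]]:
    """Return (PropertyID, PropertyVal) pairs for an active profile, in display order."""
    return conn.execute(
        """
        SELECT v.PropertyID, v.PropertyVal
        FROM UserProfile_Full AS f
        JOIN UserProfileValue AS v ON v.RecordID = f.RecordID
        WHERE f.NTName = ? AND f.bDeleted = 0
        ORDER BY v.OrderRank
        """,
        (ntname,),
    ).fetchall()
```

With this mock, "alice" returns her two property rows ordered by OrderRank, while "bob" returns nothing because his bDeleted flag is set.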
Timer Jobs in User Profile Import
The user profile import begins with a timer job that is part of the Shared Services. The import therefore starts within the OWSTimer.exe process on the server that is also the indexer (remember that User Profiles depends on the crawler component). As soon as a full import is scheduled, a one-time timer job is created. This timer job is hidden, so you cannot see it on the Timer Job Status page in Central Administration. Timer jobs scoped to the Shared Services are usually hidden because there is nothing to configure for them. To view the names of the SSP-scoped timer jobs, you can use the following command:
stsadm -o enumssptimerjobs -title "Your Shared Service Name"
SSP Timer Job Id="a7297727-de3a-4411-91e8-2f3543137494" Display Name="User Profile Change Job"
SSP Timer Job Id="db46d3ed-f4f3-4d86-af8a-3f7336c30f2d" Display Name="Distribution List Import Job"
SSP Timer Job Id="dfcb8fb2-5212-4e78-b236-591635096f6e" Display Name="User Profile Change Cleanup Job"
SSP Timer Job Id="8ece801e-deed-494d-ba51-5d154e592a3b" Display Name="Audience Compilation Job"
SSP Timer Job Id="63a1a5e4-92fe-4a07-81a7-684b150adaa6" Display Name="User Profile Full Import Job"
SSP Timer Job Id="804ede6a-b9a2-43b5-86e8-8d1ea0ff39c5" Display Name="User Profile Incremental Import Job"
As you can see above, however, this output does not include the job schedule. To view the schedule for these timer jobs, you can run the following SQL query against the SSP database:
SELECT DisplayName, Recurrence, NextDueTime, Disabled FROM MIPScheduledJob WITH (NOLOCK)
CAUTION: Querying or modifying the database directly is unsupported. Refer to this link:
http://msdn.microsoft.com/en-us/library/bb861829.aspx
Note: A 0 in the Disabled column indicates False and 1 indicates True.
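Because the stsadm output gives job IDs and names but no schedule, while the query gives schedules keyed by DisplayName, the two can be lined up by name. The sketch below is a hypothetical helper, not part of any SharePoint tooling. It assumes the output lines look exactly like the sample shown above and that query rows arrive as (DisplayName, Recurrence, NextDueTime, Disabled) tuples, as any DB-API cursor would return them.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches lines such as:
# SSP Timer Job Id="63a1a5e4-..." Display Name="User Profile Full Import Job"
JOB_LINE = re.compile(
    r'SSP Timer Job Id="(?P<id>[0-9a-fA-F-]{36})"\s+Display Name="(?P<name>[^"]*)"'
)


def parse_ssp_timer_jobs(output: str) -> Dict[str, str]:
    """Map display name -> job id from the enumssptimerjobs output text."""
    jobs: Dict[str, str] = {}
    for line in output.splitlines():
        match = JOB_LINE.search(line)
        if match:
            jobs[match.group("name")] = match.group("id")
    return jobs


def merge_schedule(jobs: Dict[str, str],
                   rows: Iterable[Tuple[str, str, str, int]]) -> List[dict]:
    """Attach job ids to (DisplayName, Recurrence, NextDueTime, Disabled) rows."""
    merged = []
    for display_name, recurrence, next_due, disabled in rows:
        merged.append({
            "Id": jobs.get(display_name),      # None if the name was not listed
            "DisplayName": display_name,
            "Recurrence": recurrence,
            "NextDueTime": next_due,
            "Disabled": disabled == 1,         # 0 = False, 1 = True
        })
    return merged
```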
For more information on SSP timer jobs, refer to the "SSP timer jobs" section of this article:
http://technet.microsoft.com/en-us/library/cc678870.aspx#section2
Import Log
The import log contains information about the successes and failures that occurred during the user profile import. As mentioned earlier, the user profile import uses the crawler component, so the import log is viewed through the crawl log. You can access it as follows:
· Shared Services Administration: Shared Service Name > User Profile and Properties
· Click View Import Log
The following is a picture of a successful user profile import log. In the highlighted portions, you can see that the content source is PEOPLE_DL_IMPORT. The URL section also shows the domain for which the crawl occurred.
FAQs
How is user profile information updated? For example, if a user is removed from Active Directory, when is that change reflected in the SharePoint databases?
Answer: You must schedule a full import at regular intervals so that users who have been deleted or updated in the data source are removed from the user profile database. Incremental import only adds those user accounts that have been added since the last full import.
What happens to the My Site of a user after that user has been removed from Active Directory? Is the site removed automatically, or does it continue to exist?
Answer: Ownership of the My Site is assigned to the user's manager. A timer job called My Site Cleanup later removes this site from the database.
What account is used to access content in the data store (Active Directory)?
Answer: This is configurable. By default, the default content access account is used for connections that do not specify an import access account. If you choose Use Default Content Access Account, verify that the account has access to the source. We recommend specifying an account rather than relying on the default content access account.
What happens if we remove a user profile from SharePoint? Does it import it again automatically or does it need to be an explicit import?
Answer: It will be automatically imported on the next run.
-
Armadillo C++ Library 0.9.2
[Open Source] (Open Source Pixels)
Armadillo is a C++ linear algebra library (matrix maths) aiming towards a good balance between speed and ease of use. Integer, floating point, and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matrix decompositions are provided through optional integration with LAPACK and ATLAS libraries. A delayed evaluation approach, [...]
-
Armadillo C++ Library 0.9.2
[Tech, Linux, Shareware] (freshmeat.net Releases)
Armadillo is a C++ linear algebra library (matrix maths) aiming towards a good balance between speed and ease of use. Integer, floating point, and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matrix decompositions are provided through optional integration with LAPACK and ATLAS libraries. A delayed evaluation approach, based on template meta-programming, is used (during compile time) to combine several operations into one and reduce or eliminate the need for temporaries.
Changes: This release has minor speedups as well as bugfixes in the complex-number versions of several functions.
Release Tags: Minor enhancements, Minor bugfixes
Tags: Scientific/Engineering, Mathematics, Software Development, Libraries, machine learning, Statistics
Licenses: LGPL
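The delayed-evaluation idea in the release notes can be illustrated with a small Python analogue. This is not Armadillo's mechanism: Armadillo builds its expression types with C++ templates at compile time, while the hypothetical classes below build an expression tree at run time. The shared idea is that `a + b * 2 + c` only records the operations, and `eval()` then computes each output element in a single loop, so no temporary vectors are created for the intermediate results.

```python
from typing import Iterable, List


class Expr:
    """Base class for lazily evaluated element-wise vector expressions."""

    def __add__(self, other: "Expr") -> "Expr":
        return Add(self, other)

    def __mul__(self, k: float) -> "Expr":
        return Scale(self, k)

    __rmul__ = __mul__  # allows 2 * v as well as v * 2

    def at(self, i: int) -> float:
        raise NotImplementedError

    def __len__(self) -> int:
        raise NotImplementedError

    def eval(self) -> List[float]:
        # One pass over the indices; each element is computed straight from
        # the leaf vectors, so intermediate vectors are never materialized.
        return [self.at(i) for i in range(len(self))]


class Vec(Expr):
    def __init__(self, data: Iterable[float]):
        self.data = list(data)

    def at(self, i: int) -> float:
        return self.data[i]

    def __len__(self) -> int:
        return len(self.data)


class Add(Expr):
    def __init__(self, a: Expr, b: Expr):
        if len(a) != len(b):
            raise ValueError("size mismatch")
        self.a, self.b = a, b

    def at(self, i: int) -> float:
        return self.a.at(i) + self.b.at(i)

    def __len__(self) -> int:
        return len(self.a)


class Scale(Expr):
    def __init__(self, e: Expr, k: float):
        self.e, self.k = e, k

    def at(self, i: int) -> float:
        return self.k * self.e.at(i)

    def __len__(self) -> int:
        return len(self.e)
```

For example, with `a = Vec([1, 2, 3])`, `b = Vec([10, 20, 30])` and `c = Vec([100, 200, 300])`, the expression `a + b * 2 + c` is just a tree of `Add` and `Scale` nodes until `eval()` walks it once per element.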
-
Composite vs elementary particles
[Physics, Science] (The Reference Frame)
Tommaso Dorigo wrote a text about some limits on compositeness. You don't have to read it: nothing is excessively interesting about the experiments and their punch line is, of course, that no internal substructure of quarks and leptons has been seen by the colliders as of today.

Tommaso Dorigo's comrade Vladimir Lenin believed that an electron was a galaxy containing many electrons, and so on, indefinitely. This "inexhaustible" picture of Matryoshkas was clearly indefensible already during Lenin's life. First, it has to stop at the Planck scale. Second, two electrons must be exactly identical to make chemistry work, so they can't carry any substructure that would be as variable as that of a galaxy.
The typical characteristic energy scale where compositeness could possibly become compatible with the existing experiments is something like 5 TeV (or higher): it means that the internal pieces have to be really close to each other, closer than 1/(5 TeV) in the "hbar=c=1" units, if they're composite. Unless unexpected things happen, the LHC will just improve these limits.
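To translate "closer than 1/(5 TeV)" into ordinary length units, one multiplies by ħc. A small sketch, assuming the exact SI value ħc = 197.3269804 MeV·fm; the function name is ours:

```python
# hbar*c in MeV * fm (exact since the 2019 SI redefinition).
HBAR_C_MEV_FM = 197.3269804


def inverse_energy_to_meters(energy_gev: float) -> float:
    """Length corresponding to 1/E in hbar = c = 1 units, returned in meters."""
    energy_mev = energy_gev * 1e3
    length_fm = HBAR_C_MEV_FM / energy_mev   # 1/E expressed in femtometers
    return length_fm * 1e-15                 # 1 fm = 1e-15 m
```

For a 5 TeV (5000 GeV) compositeness scale this gives about 3.9e-20 m, roughly 25,000 times smaller than the ~0.2 fm that corresponds to 1/(1 GeV).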
Compositeness has been a traditional and repeatedly successful type of insight in physics but when you repeat some idea many times, it ceases to be revolutionary. It is getting pretty boring. And frankly speaking, there exists no convincing reason, theoretical or experimental, why the quarks and leptons in the Standard Model should be composite, i.e. composed out of smaller particles.
History of compositeness
Let me review a brief history of compositeness.
In ancient Greece, many philosophers thought that everything was made out of five classical elements:
air, water, aether, fire, earth
They invented this theory by pure thought, which is great, except that they overestimated their ability to guess the correct answer. ;-) Four of the elements were highly composite while the middle one, the aether, didn't exist at all. :-)
Democritus' atomist school hypothesized that matter was composed of atoms. That wasn't such a huge discovery - it's really one of the two possible answers. Matter can either be continuous or not. The shapes of their "indivisible" atoms were strange, contrived, but kind of practical - balls with hooks to make the interactions easy and diverse. :-)
But the idea was qualitatively correct. Once alchemy was supplemented by some rational thinking and careful work, chemistry was born. People learned that the mixing ratios were nice rational numbers, in some proper units. At the microscopic level, the compounds (materials) were made out of molecules, and each molecule is made out of atoms which are the basic building blocks of chemical elements (materials).
So generic matter was found to be made of molecules, which were bound states of several atoms. Everyone knows this story so I won't give you a PBS special here. Atoms were found to have a nucleus. In 1909, Rutherford was shocked when, although most alpha particles passed straight through the gold foil, a few of them bounced back (the particles were detected by flashes on a zinc sulfide screen). He summarized his surprise in his famous quote:
All science is either physics or stamp collecting
Oh no, I didn't mean this one, although it is true, too. I meant:
It was quite the most incredible event that has ever happened to me in my life. It was almost as incredible as if you fired a 15-inch shell at a piece of tissue paper and it came back and hit you. On consideration, I realized that this scattering backward must be the result of a single collision, and when I made calculations I saw that it was impossible to get anything of that order of magnitude unless you took a system in which the greater part of the mass of the atom was concentrated in a minute nucleus. It was then that I had the idea of an atom with a minute massive centre, carrying a charge.
The atomic nucleus was born. Because people understood the atomic weights, it became clear that the nucleus was composed out of protons and neutrons. By the 1960s, a whole jungle of hadrons - particles similar to protons and neutrons - had been known. The deep inelastic scattering experiments played a role very analogous to that of the Rutherford experiment: they helped to show that even protons and neutrons were composite.
The theory ultimately found to describe these particles - Quantum Chromodynamics, also motivated by several other key partial observations - described protons and neutrons as composites of three quarks (aside from a lot of gluon fuzz and quark-antiquark short-lived pairs).
The Standard Model assumes that leptons, quarks, photons, W-bosons, Z-bosons, the Higgs boson, the graviton (if I include gravity for a while), and all their anti-particles are point-like, described by fields that map points in spacetime to operators on the Hilbert space. They interact locally.
The Rutherford experiment story could be repeated, except that it now seems like a boring old idea. And there are no good reasons to think that the leptons and quarks are composite objects. There are no known experimental reasons, as Tommaso's article clarifies in detail.
But there are no good theoretical reasons, either. While the quarks have simplified the jungle of strongly interacting particles, no known model of a substructure of quarks and leptons is able to do the same thing today. Preons and rishons usually have to add a lot of new stuff that is as complicated as the quarks and gluons themselves, new gauge groups (hypercolor etc.), and they usually fail to get a realistic phenomenology (including three families), anyway.
Mass vs compositeness scale
In fact, there exists a good general argument to see that the traditional compositeness (even smaller point-like particles inside the known ones) is unlikely to win another match. Let's look at the masses and sizes of the composite objects. In most of this discussion, I will use the relativistic "E=mc^2" relationship between mass and energy.
Molecules typically change the total energies of the electrons in the atoms by millielectronvolts or so (that's the energy of a photon emitted when a molecule changes its state). The size of the molecule may be an inverse electronvolt. The molecules themselves are heavy - containing dozens of protons and neutrons i.e. dozens of GeV in the nuclei - but most of the mass is sitting at fixed places most of the time. It's the light and free electrons that should be credited with all the wonders of chemistry.
That was atomic and molecular physics - chemistry. Nuclear physics deals with energy differences comparable to fractions of a GeV, and the size of the nuclei is an inverse GeV, too. Note that in the molecular case, the energy difference was smaller than the inverse size (because of the small fine-structure constant and the heavy nuclei that keep the electrons localized). But in the nuclear case, the sizes and typical energy differences are linked more tightly: the coupling of the strong force is much closer to one.
There exist lighter hadrons - strongly interacting particles. Pions are the most important examples of mesons. They're light because they can be approximately described as Goldstone bosons - a type of particles that should ideally be massless because they're linked with a symmetry (in this case, an approximate SU(2) or, even less accurately, SU(3) symmetry between the flavors i.e. different types of quarks - up/down and perhaps strange).
But most of the strongly interacting particles have GeV-like masses. It's very hard to get particles that are substantially lighter than the inverse size of the bound state. As long as we don't ignore the latent "E=mc^2" energy of the nuclei or anything else, the natural expectation is that the size and the energy are inversely proportional to each other.
But the elementary fermions of the Standard Model are pretty light. The heaviest is the top quark, and all others are much lighter. And we have already studied physics at distances comparable to the inverse mass of these particles (in the hbar=c=1 units): and there's no compositeness. So it makes it natural to expect that there is no compositeness anywhere.
Strings and surprisingly light composites
However, you know that quarks and leptons may be viewed as composites, in some sense. If they're vibrating strings, i.e. if perturbative string theory is a good approximation of reality, they may be interpreted as eigenstates of a bound state of "pearls" - the so-called string bits - connected into loops. These string bits are strongly interacting. In fact, their interaction is determined by the string tension which is just huge - probably 10^{18} GeV or so.
So how is it possible that there exist light string vibrations whose mass is well below the string scale of 10^{18} GeV? Well, that's a good question but there are good answers, too. In string theory, one can actually show that some states may be exactly massless, or almost exactly massless, because of both old (non-stringy) and new (stringy) reasons.
The old reasons are primarily symmetries. The photons and gravitons are massless because the gauge bosons associated with unbroken local symmetries in spacetime have to be massless. Similarly, fermions may be massless because of supersymmetry - if they're paired with massless bosons - or because of the chiral symmetry (left-right asymmetric change of the phase of their wave function).
Additional spin-zero bosons may be massless because supersymmetry may pair them with massless fermions, whose masslessness was protected by the chiral symmetry, or because they're the Goldstone bosons connected with an exact or approximate symmetry.
At any rate, whenever you have a particle that is much lighter than the dimensional analysis would indicate - that is unnaturally light - you should ask why it is so because such an observation is "marginally incompatible" with the a priori expectations based on the Bayesian inference that lead to natural masses. There is never any sharp contradiction here - because it's just some Bayesian inference based on vague arguments and statistics - but those mental tools should be refined, too.
I have mentioned the non-stringy reasons why particles can be massless (or much lighter than expected). But there also exist purely stringy reasons. One class of them consists of index theorems. If you consider e.g. heterotic strings on Calabi-Yau manifolds, the first realistic realizations of the (nearly) real world within string theory, you find out that in the approximation of the geometry, the leptons and quarks are massless. They're surely much lighter than the string scale.
Why is it so? It's because supersymmetry links not only bosons and fermions but it usually pairs left-handed particles with their right-handed partners, too. This has to be true for all massive particles. But massless particles can come in "short multiplets" - the "partner" of a particle can be "zero" which fails to be a new independent normalizable "basis vector" or a new "particle species". (The square of this zero is linked to the mass of the particle.)
In fact, there exist sophisticated geometric methods to calculate the number of left-handed massless particles without partners. In the case of Calabi-Yau manifolds, this imbalance is linked to the homology of the manifold - its Hodge numbers. It's the number of topologically non-equivalent and independent non-contractible "holes" or codimension-p submanifolds of the Calabi-Yau manifold.
For a given topology, you can prove that these integers are nonzero, and they imply that there has to be an asymmetry between left-handed and right-handed fermions. Those "odd ones" have to be massless. They can eventually get some small masses from the supersymmetry breaking and from the interactions with the Higgs boson etc. But these effects are "small oscillations on the stringy background", much like the binding energies of the electrons in the molecules (analogy: corrections from the Higgs) were just small corrections to the huge, solid, and unchanging latent energy of the nuclei (analogy: stringy geometry).
There exist other, often surprising reasons why the light fields are light in various vacua of string theory. All these arguments may be viewed as "generalized geometry" in one way or another. The diversity of reasons that string theory is able to relate (or even identify) is amazing. And even in phenomenology, when people try to construct models where the Higgs is lighter than the generic models would imply, they get inspired by geometry - they "engineer" degrees of freedom that behave much like the extra dimensions of string theory. See e.g. Littlest Higgs model and deconstruction.
To summarize this portion of the text, particles that are much lighter than their inverse size (in the c=hbar=1 units) almost always have to have reasons to be light. The reasons include broken or unbroken symmetries, relationship with other particles protected by symmetries, or stringy arguments such as index theorems that are as powerful as the symmetries. All those arguments may work either exactly, or in some approximation. In the latter case, the particles are massive but much lighter than you would expect if you didn't know about the hierarchy of influences.
Compositeness of magnetic and electric particles
There's a much more general and equally important theme I want to mention in this article: the notion of compositeness is not physical in general. It depends on the description. However, when you know that the coupling between your lightest objects is weak, you may always divide your objects into elementary and composite ones.
Let me mention some examples.
Electrons and quarks carry the electric U(1) charge. In The Big Bang Theory, Sheldon Cooper tried to find the magnetic monopoles, and for a good reason. It's almost guaranteed that the magnetically charged particles - South poles of a magnet without the North poles, or vice versa - have to exist.
Why? For example, locality around black holes implies that it must be possible for the magnetic field to be "mostly outgoing" from a region. The region may be one side of a dipole magnet. However, one of the sides may be faster in its collapse to a black hole. Consequently, you must be able to create a black hole that carries a magnetic monopole charge, at least in principle.
So such microstates have to exist. And it's likely that the lightest microstates with this new kind of charge will look more like particles than the black holes. However, these particles may still be insanely heavy - like the GUT or string scale, 10^{18} GeV. At any rate, they should exist. While it's not clear whether there's any useful or well-known low-energy description, I think that good physicists agree that the monopoles should exist in the spectrum.
The funny thing is that the monopoles may always be viewed as composite objects. More precisely, in some field theories such as GUT theories that admit monopoles, they can be represented by classical solutions. They're topologically nontrivial configurations of the photon and generalized "gluon" fields that hold together, if you wish. In this sense, they're made out of infinitely many gauge bosons conspiring in a specific way. That's also why they're so heavy.
More generally, we use the word "solitons" for such composite objects that are most easily described as classical solutions involving fields that are associated with the light particles. (Of course, they should still be quantized, after you construct them: the world is a quantum world, and it applies to everyone.)
If you want to know, "instantons" are solutions similar to solitons, but they're localized in the (Euclideanized) time, too. Instantons are not static objects but isolated "histories" that contribute to the Feynman path integral. They change the results if you calculate the probability of a process (such as a rare decay of a seemingly stable particle).
The electrically charged particles look point-like and elementary - they're usually light - while the magnetic monopoles are heavy and look like a non-local, extended solution involving the elementary fields. A similar separation exists in string theory, too.
When the string coupling constant is low (weak coupling), the strings are the lightest, and therefore most elementary, objects in your theory. Other objects are "made out of strings". For example, D-branes can be understood as a special type of "solitons". In fact, the D-brane masses go like "1/g" in string units, and they're near the geometric average of the strings' mass ("1" in string units) and the field-theoretical solitons similar to magnetic monopoles (whose mass goes like "1/g^2").
In a different parameterization, the D-branes have masses that go like "1/g_{closed} = 1/g_{open}^2", which is the usual power law for the solitons, but with "g_{open}" replacing the gauge coupling from field theory.
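The mass hierarchy in the last two paragraphs can be written out explicitly. A small worked sketch in string units, with g the closed-string coupling of the previous paragraph and "~" meaning up to numerical factors:

```latex
M_{\text{string}} \sim 1, \qquad
M_{\text{D-brane}} \sim \frac{1}{g}, \qquad
M_{\text{soliton}} \sim \frac{1}{g^{2}}

\sqrt{M_{\text{string}}\, M_{\text{soliton}}}
  \sim \sqrt{1 \cdot \frac{1}{g^{2}}} = \frac{1}{g} \sim M_{\text{D-brane}}

g \equiv g_{\text{closed}} = g_{\text{open}}^{2}
  \quad\Longrightarrow\quad
M_{\text{D-brane}} \sim \frac{1}{g_{\text{closed}}} = \frac{1}{g_{\text{open}}^{2}}
```

The last line is the field-theory soliton law 1/g^2, with the open-string coupling playing the role of the gauge coupling.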
S-duality, evaporating compositeness, and bootstrap
While the separation into elementary (usually light) and composite (usually heavy) particles is clear at the weak coupling, it becomes ill-defined at the strong coupling ("g" of order one). In fact, many theories exhibit S-duality. If you make "g" much greater than one, physics will be totally equivalent to physics of another (or the same) theory at the coupling "1/g" which is much smaller than one. For the N=4 gauge theory, this map is an exact self-equivalence.
Such equivalence means that the magnetic monopoles and the electrically charged particles are equally elementary and equally composite! It was just a matter of the weak-coupling expansion that one group looked more elementary while the other looked more composite. For higher couplings, this "qualitative" difference goes away.
Less symmetric theories, such as N=2 gauge theories, usually have more complex prescriptions for how the electrically and magnetically charged states (and dyons, which have both) transform into each other, as shown by Seiberg and Witten (and their followers). It's still true that these theories show that the separation of particles into elementary and composite depends on the context and is not sharp and universal.
After all, that's what was expected for decades. Werner Heisenberg was among those who believed in "bootstrap", a self-consistent theory that defines its own rules and that prevents you from starting from a unique, constructive starting point that divides the objects into elementary and composite ones.
This bootstrap thinking was popular in the late 1960s, at the same time when string theory happened to be born, and it kind of influenced the birth of string theory, too. However, the philosophy was completely defeated at least for 30 years. Quantum Chromodynamics described the nuclei using a completely constructive, non-bootstrap theory with well-defined elementary fields. And even string theory itself abruptly became a constructive theory with very well-defined elementary degrees of freedom which are separated from the composite or "derived" ones.
So historically, string theory is sometimes linked to the bootstrap program, but it is a flawed correlation scientifically. String theory is as uncorrelated to the bootstrap program as field theory. And the bootstrap program hasn't really been successful (except for the classification of classes of two-dimensional conformal field theory).
But it's pretty likely that the bootstrap program will have to return to physics. Compositeness is not absolute, and if people ever find a description of string theory that is equally valid or equally "weakly or strongly coupled" in all situations (which may be a contradiction, who knows!), then such a formulation will also have to treat all objects as equally fundamental or non-fundamental, and only physical distinctions such as the mass and/or interaction strengths will be derivable.
We know that the distinction depends on the environment. And there's one more place that clearly shows that compositeness is doomed. At the Planck scale, the smallest black hole microstates are surely "somewhere in between" composite objects (black holes are a kind of "soliton of general relativity") and elementary particles (black hole microstates are just heavy particle species).
This transition has to be gradual. The peaceful co-existence of black holes as the dominant heavy-mass microstates, described by GR as solitons, with the low-energy limit of GR without black holes is a major consistency constraint: it is what makes quantum gravity so hard, and it guarantees that only the solutions linked to string theory may work.
Quantum gravity is not an "anything goes" business. It is a very fine reconciliation of two worlds with known descriptions. Both of these worlds, in some sense, describe everything when extrapolated properly, but the extrapolation that agrees with both (or all) limits is very nontrivial and doesn't allow you to make naive bureaucratic decisions such as the separation of composite particles from the elementary ones, or the counting of either. These things are ill-defined. -
Armadillo C++ Library 0.9.0
[Open Source] (Open Source Pixels)
Armadillo is a C++ linear algebra library (matrix maths) aiming towards a good balance between speed and ease of use. Integer, floating point, and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matrix decompositions are provided through optional integration with LAPACK and ATLAS libraries. A delayed evaluation approach, [...] -
Armadillo C++ Library 0.9.0
[Tech, Linux, Shareware] (freshmeat.net Releases)
Armadillo is a C++ linear algebra library (matrix maths) aiming towards a good balance between speed and ease of use. Integer, floating point, and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matrix decompositions are provided through optional integration with LAPACK and ATLAS libraries. A delayed evaluation approach, based on template meta-programming, is used (during compile time) to combine several operations into one and reduce or eliminate the need for temporaries.
Changes: This release has an extended and overhauled expression evaluation framework for faster handling of compound expressions. Several new functions were added and the documentation has been improved: there is now a conversion table between Matlab and Armadillo syntax.
Release Tags: Major architecture enhancement, Documentation Updates, Speedups
Tags: Scientific/Engineering, Mathematics, Software Development, Libraries, machine learning, Statistics
Licenses: LGPL
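To illustrate what "delayed evaluation based on template meta-programming" means, here is a minimal expression-template sketch. It is not Armadillo's implementation or API: `Vec`, `AddExpr` and `is_expr` are hypothetical names. Writing `a + b + c` only builds a small nested expression object; the element-wise additions happen in a single loop when the result is assigned, so no temporary vectors are allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical lazy "lhs + rhs" node: stores references, computes elements on demand.
template <typename L, typename R>
struct AddExpr {
    const L& lhs;
    const R& rhs;
    std::size_t size() const { return lhs.size(); }
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// Hypothetical dense vector type.
struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double value = 0.0) : data(n, value) {}
    std::size_t size() const { return data.size(); }
    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }

    // Evaluate any expression in one pass; sub-expressions never materialize as Vecs.
    template <typename E>
    Vec& operator=(const E& expr) {
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = expr[i];
        return *this;
    }
};

// Restrict the overloaded operator+ to our own expression types.
template <typename T> struct is_expr : std::false_type {};
template <> struct is_expr<Vec> : std::true_type {};
template <typename L, typename R> struct is_expr<AddExpr<L, R>> : std::true_type {};

template <typename L, typename R,
          typename = std::enable_if_t<is_expr<L>::value && is_expr<R>::value>>
AddExpr<L, R> operator+(const L& lhs, const R& rhs) {
    assert(lhs.size() == rhs.size());
    return AddExpr<L, R>{lhs, rhs};  // no arithmetic happens here
}
```

For example, `Vec a(3, 1.0), b(3, 2.0), c(3, 4.0), r(3); r = a + b + c;` fills `r` with 7.0 in one loop. Because `AddExpr` holds references to its operands, an expression has to be consumed in the same statement. Storing it with `auto` and evaluating it later would leave dangling references to the intermediate expression objects.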
-
Armadillo C++ Library 0.8.2
[Tech, Linux, Shareware] (freshmeat.net Releases)
Armadillo is a C++ linear algebra library (matrix maths) aiming towards a good balance between speed and ease of use. Integer, floating point, and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matrix decompositions are provided through optional integration with LAPACK and ATLAS libraries. A delayed evaluation approach, based on template meta-programming, is used (during compile time) to combine several operations into one and reduce or eliminate the need for temporaries.
Changes: Several bugs were fixed. Functionality was added for forward compatibility with future 0.9.x releases.
Release Tags: Minor bugfixes
Tags: Scientific/Engineering, Mathematics, Software Development, Libraries, machine learning
Licenses: LGPL
-
Value of the fumble
[Fantasy Football] (Footballguys.com Forums: The Shark Pool)
Value of the fumble
QUOTE
We handed this Quick Reads intro off to Adrian Peterson, but then he dropped it.
Maybe you saw that one coming. After the New Orleans Saints and Minnesota Vikings played a game that was more reminiscent of Pop Warner football on Sunday night, though, fumbles are on our mind.
Measuring the impact of turnovers on an offense can be an inexact science. While fans rue a turnover in the moment, all we see at the end of a season in a player's stat line is an integer. We know Pierre Thomas fumbled two times in the regular season, but there's no context for where the fumbles occurred on the field or how important they were. Most statistical lines also include only fumbles lost, which is nonsense; Football Outsiders' research has shown that fumble recoveries are a wholly random event and that no player or team consistently recovers more than 50 percent of the fumbles it puts on the ground year after year. (For anecdotal evidence of how fumble recoveries aren't skillful plays, watch some of the Keystone Kops work from the Superdome.)
Just as we give players credit for touchdowns, they also deserve criticism for turning the ball over. Applying the old adage of "a penny saved is a penny earned" to football, the points you lose by turning the ball over are just as valuable as the ones you gain by scoring.
We can do that in a pretty simple manner by evaluating the yard line at which a fumble takes place and then noting the point expectation for both the offense and the defense from that yard line. For example, say a team has the ball on the "70th" yard line: its opposition's 30-yard line, 70 yards away from its own end zone. On average, a team with the ball at this point will score 2.83 points per possession, so losing the ball 30 yards away from the end zone will cost that team 2.83 points.
Furthermore, the defense now gets the ball on its own 30-yard line, 70 yards away from the opposing end zone. A team at that 30-yard line will score an average of 0.4 points on that subsequent possession. Adding those two figures up, a fumble by a running back on the 30-yard line costs his team a total of 3.23 expected points. At Football Outsiders, we adjust these figures for the down and distance, as well as the game situation, but we'll keep things quick and simple for the purposes of this article.
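The two-step calculation above can be sketched as follows. The expected-points table is hypothetical apart from the two values quoted in the article (2.83 points at the opponent's 30, 0.4 points at a team's own 30). Like the article's simplified version, the sketch ignores down, distance and game situation. Yard lines are measured from the fumbling team's own end zone, so the recovering team takes over at `100 - yard_line` from its own goal line.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Expected points per possession, keyed by yards from the team's OWN end zone
// (70 == the opponent's 30-yard line). A real table would cover 1..99; this
// sketch requires the caller to supply every yard line it queries.
using EpTable = std::map<int, double>;

// Unadjusted cost of one fumble at `yard_line`.
double fumble_cost(const EpTable& ep, int yard_line) {
    assert(yard_line > 0 && yard_line < 100);
    double lost = ep.at(yard_line);          // points the offense no longer expects
    double gained = ep.at(100 - yard_line);  // points the opponent now expects
    return lost + gained;
}

// Season total: the sum of the per-fumble costs.
double season_cost(const EpTable& ep, const std::vector<int>& fumble_yard_lines) {
    double total = 0.0;
    for (int y : fumble_yard_lines) total += fumble_cost(ep, y);
    return total;
}
```

With `EpTable ep{{70, 2.83}, {30, 0.4}}`, `fumble_cost(ep, 70)` gives the 3.23 points from the text. Summing every fumble this way, whether kept or lost, is the kind of season total quoted for Peterson below.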
If we include fumbles both kept and lost, and blame Brett Favre for the fumble on the botched handoff as opposed to Peterson, the Vikings' star back fumbled nine times this season. Using the methodology above, those fumbles cost his team a whopping 28.9 points. That's the most of any running back in football this season, ahead of Steve Slaton (22.7 points), Matt Forte (22.6 points), Beanie Wells (20.5 points) and Tim Hightower (19.1 points).
That's one of the reasons why Peterson's DVOA consistently ranks far lower than his reputation and why he's not the best running back in football. He might be the most talented back in the league, and our advanced stats still don't consider the number of defenders in the box against him. But his inability to hold on to the football costs the Vikings pretty significantly over the course of a season. Usually, they're able to overcome Peterson's fumble issues; on Sunday, the botched handoff from Favre to Peterson might very well have cost them their season.
Here are our ratings for the best and worst players of the conference championships. Click here to learn more about what DVOA and DYAR numbers mean and how they are computed.
Quarterbacks
Rk Player Team CP/AT Yds TD INT Total DYAR Pass DYAR Rush DYAR
1. Peyton Manning IND 26/39 373 3 0 305 317 -11
Manning finishes with the fourth-highest single-game total of the season after picking apart the league's best pass defense. He got off to a slow start, taking two sacks on back-to-back passes after taking all of three sacks against Rex Ryan-authored defenses in five years, but Manning got into a rhythm in the second half. Consistently picking on weak links Drew Coleman and Dwight Lowery, Manning sailed right by Revis Island and completed 16 of his 21 attempts in the second half, gaining 155 yards while throwing for nine first downs and two touchdowns. That half accounted for 178 of his 317 passing DYAR on the day.
2. Mark Sanchez NYJ 17/29 257 2 1 97 99 -3
Sanchez didn't exactly lead the Jets to the playoffs -- the closest metaphor we could come up with was the cute girl in your study group who doesn't contribute anything at your meetings, but still earns the "A" for your work anyway -- but he sure was good when they got there. He finishes the postseason with a 40.9 percent DVOA, after a regular season where he was one of the worst quarterbacks in the league. While he threw a meaningless late interception, he also made big-time throws on his touchdown passes to Braylon Edwards and Dustin Keller. (Give credit to the "NFL Matchup" guys, who noted that the Jets went for big plays on the third series of games; the 80-yard touchdown pass to Edwards was the first play of the third series.)
3. Brett Favre MIN 28/46 310 1 2 86 103 -17
Favre is credited with the fumble inside the Saints' 5-yard line, which cost his team an estimated 3.67 points. His second interception was obviously egregious; while the Vikings needed to move the ball forward after their 12-men-in-the-huddle penalty (after which Favre tried to call a timeout, which would have been consecutive timeouts by the offense), even a slim chance at a 56-yard field goal is better than a dangerous pass over the middle. Those mistakes spoiled an otherwise great game by Favre, who abused Randall Gay and Tracy Porter while staying away from the dangerous Jabari Greer. He was very effective on third down, going 7-of-12 for 115 yards, yielding six first downs and a touchdown pass to Sidney Rice.
4. Drew Brees NO 17/31 197 3 0 39 53 -14
Brees is given minus-14 DYAR for the aborted snap on third-and-1 from his own 14-yard line; being stuffed in that situation isn't particularly bad, especially against the Vikings' defense, but nearly turning the ball over is disastrous. He was the master of coming up just short on third down, completing two different passes for 9 yards on third-and-10, and picking up 16 yards on third-and-18. On first down, he was awful, as 13 dropbacks yielded a strip sack, a very questionable 12-yard defensive pass interference penalty, four completions totaling 44 yards and seven incomplete passes. He has to play a lot better than this to beat Indianapolis.
Five most valuable running backs
Rk Player Team Rush Yds Rush TD Rec Yds Rec TD Total DYAR Rush DYAR Rec DYAR
1. Pierre Thomas NO 61 1 38 1 63 42 21
Thomas nearly fumbled on his fourth-down conversion in overtime, with Chad Greenway's helmet knocking the ball out of Thomas' hands, but also affixing it to his stomach in the process. Thomas had a 50 percent success rate against the league's best run defense, and he did great work dancing along the sideline en route to his 38-yard receiving touchdown in the first quarter.
2. Joseph Addai IND 80 0 13 0 15 4 10
From 63 DYAR to 15! That's quite the drop-off. Addai probably doesn't deserve all the blame for his second-quarter fumble, when Calvin Pace came free and nearly hit Addai before he was handed the ball. Take away the fumble, and he's way closer to Thomas' figure. Addai didn't have a run longer than 17 yards, but he had seven carries of 5 yards or more.
3. Shonn Greene NYJ 41 0 0 0 10 10 0
While Jets fans might have rued the absence of Leon Washington on Sunday, Washington's injury has allowed Greene to emerge as the team's best pure running back. He's still got a ways to go as a receiver, but Greene's got more than a bit of Marion Barber in him with regards to hitting the hole. He started off the second half with two consecutive 7-yard runs before leaving with an injury; the Jets had 25 rushing yards the rest of the way.
4. Reggie Bush NO 8 0 33 1 5 -20 25
That DYAR figure does not include his muffed punt, which was disastrous. It also doesn't account for his miraculous effort in turning a reverse from a huge loss into no gain, running about 25 yards in the process. For all the talk of his new rushing style and how he's matured, though, the results weren't there this week: Seven carries yielded 8 yards and a success rate of 14 percent.
5. Adrian Peterson MIN 115 3 14 0 1 -6 6
Yes, he had three touchdowns and 115 rushing yards; in addition to the two fumbles, though, he had a success rate of only 40 percent and averaged 2.1 yards per carry on 13 first-down carries against the league's fourth-worst run defense. Wonder how the Saints spent the whole game teeing off on Brett Favre? It was because he was stuck in third-and-8 all day.
Least valuable running back
Rk Player Team Rush Yds Rush TD Rec Yds Rec TD Total DYAR Rush DYAR Rec DYAR
1. Thomas Jones NYJ 42 0 28 0 -8 -26 18
Even cliffs are disavowing any knowledge of Thomas Jones falling off them at this point. Sixteen carries against the Colts resulted in a 19 percent success rate and exactly one first down. The two first downs he picked up on passes while down two touchdowns in the fourth quarter gave his numbers some boost, but what happened to this guy?
Five most valuable wide receivers and tight ends
Rk Player Team Rec Att Yds Avg TD Total DYAR
1. Pierre Garçon IND 11 15 151 13.7 1 81
If Dwight Lowery were a video game, Garcon would have the high score; he started off with 27- and 36-yard receptions and settled into a steady string of big plays as the game went along. Six of his 11 completions resulted in first downs, and a seventh was a great catch on a fade for a touchdown. His 23-yard catch in the fourth quarter on third-and-9 from the Jets' 35-yard line probably sealed the game.
2. Austin Collie IND 7 9 123 17.6 1 78
When Garcon missed the first Jets-Colts game, Collie had a dominant first half against Sheppard; with Garcon back, Collie shifted into the slot, and he made whichever Jets safety was matched up against him look foolish. A string of six consecutive catches yielded 112 yards, four first downs and a touchdown. Oh, and when Anthony Gonzalez returns next season, Collie will be the Colts' fifth receiver on offense (behind Gonzalez, Garcon, Reggie Wayne and Dallas Clark). Scary.
3. Jerricho Cotchery NYJ 5 7 102 20.4 0 37
The bomb from Brad Smith shows how tricky it can be to assign measures of value to individual players in football. Cotchery caught a 35-yard pass and ran for 10 yards after the catch. Those are the facts. We know that the ball was underthrown, though, so Cotchery probably deserves more credit than Smith does for the play. On the other hand, Cotchery was wide open because of the play call and the Jets' tendencies over the course of the season, not his speed or route-running, so really, Brian Schottenheimer deserves a lot of the credit, too. All five of Cotchery's completions resulted in first downs, and three of those plays were on third down.
4. Dustin Keller NYJ 6 7 63 10.5 1 32
There aren't many plays in the playbook for second-and-17, but the Jets pulled out one of theirs on a pass to Keller that picked up 19 yards and a first down in the third quarter. Keller also had a nice catch on the Jets' second touchdown. But as the game wore on, they needed better blocking, so that meant more Ben Hartsock.
5. Visanthe Shiancoe MIN 4 6 83 20.8 0 30
Someone who's not a Jet or Colt! Shiancoe's four completions each went for between 16 and 26 yards, including a sublime catch on the sidelines. He wasn't in friendly situations, either; he converted two third downs, a second-and-9, and even a second-and-20.
Least valuable wide receiver or tight end
Rk Player Team Rush Yds Rush TD Rec Yds Rec TD Total DYAR Rush DYAR Rec DYAR
1. Percy Harvin MIN 15 0 38 0 -30 -17 -13
Harvin ran for two first downs, but he also fumbled; in the passing game, outside of one 20-yard completion, he had seven other targets for 18 yards. After talk all week of whether he and Saints tight end Jeremy Shockey would play, neither had much of an impact on Sunday.
Special "The Pistol is not the Wildcat" Section
Rk Player Team Pass Rec TD Pass DYAR Rec DYAR Total DYAR
1. Brad Smith NYJ 1/1, 45 yds 2/3, 7 yds 1
-15 13
The Jets spent all season setting up Smith's pass from an option play, but when Smith got Jerricho Cotchery wide open, he showed why he's no longer a quarterback. His 35-yard duck was complete, but had Smith led Cotchery, it would have resulted in an easy touchdown instead of just 10 yards after catch. It was a throw closer to LaDainian Tomlinson or Antwaan Randle El than even, say, a Seneca Wallace. -
[ Programming & Design ] Open Question : Please help me with java program!!!!!!!?
[Q & A] (Yahoo! Answers: Latest Questions)
Here are the instructions:
Statistics
Background: Your instructor will provide you with a text file (numbers.txt) containing a large number (N <= 1000) of integers. The integers range in value from 0 to 100. The text file has been created with one value on each line. Because the sum of the numbers could be very large, you should use a long integer in your calculation to find the average. The number of integers in the file is unknown.
Your program must find the average, standard deviation, and mode of the list of numbers. The mode is defined as the value(s) present with the highest frequency.
Calculating the standard deviation consists of the following steps:
a. Find the average of the list of numbers.
b. Determine the difference of each number from the average, square each difference, and sum all the squared differences.
c. Divide this sum by (the number of values - 1).
d. Take the square root of the division result from step c.
Example, given this list of numbers: 7 4 5 9 10
The average = 7
Sum of squares of differences: 0 + 9 + 4 + 4 + 9 = 26
For a normal distribution, 68.3% of the data will lie within one standard deviation of the average, while 95.4% will lie within two standard deviations.
Assignment: Your program should print out the average, standard deviation, and mode of the data in numbers.txt. Format the real numbers to print with 2 decimal places. Your program must utilize proper modular design and parameter passing. Turn in your source code and run output.
LAB ASSIGNMENT A15.3
-
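For the statistics assignment above, here is a sketch of the three required computations. It is written in C++ rather than the assignment's Java, and the function names are hypothetical. It assumes every value is an integer from 0 to 100, as the assignment states. It sums into a `long long` (the C++ counterpart of Java's `long`) and uses the n - 1 divisor the assignment specifies.

```cpp
#include <cassert>
#include <cmath>
#include <sstream>  // std::istream (and std::istringstream for quick checks)
#include <vector>

// Read whitespace-separated integers until the stream runs out.
std::vector<int> read_values(std::istream& in) {
    std::vector<int> xs;
    int x;
    while (in >> x) xs.push_back(x);
    return xs;
}

// Average, accumulated in a 64-bit integer to avoid overflow.
double average(const std::vector<int>& xs) {
    assert(!xs.empty());
    long long sum = 0;
    for (int x : xs) sum += x;
    return static_cast<double>(sum) / xs.size();
}

// Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1)).
double sample_stddev(const std::vector<int>& xs) {
    assert(xs.size() >= 2);
    double mean = average(xs);
    double squares = 0.0;
    for (int x : xs) squares += (x - mean) * (x - mean);
    return std::sqrt(squares / (xs.size() - 1));
}

// All values that share the highest frequency, in ascending order.
std::vector<int> modes(const std::vector<int>& xs) {
    int counts[101] = {0};
    for (int x : xs) {
        assert(x >= 0 && x <= 100);
        ++counts[x];
    }
    int best = 0;
    for (int c : counts)
        if (c > best) best = c;
    std::vector<int> result;
    for (int v = 0; v <= 100; ++v)
        if (best > 0 && counts[v] == best) result.push_back(v);
    return result;
}
```

For the example list 7 4 5 9 10, `average` returns 7 and `sample_stddev` returns sqrt(26 / 4) = sqrt(6.5). Printed with `std::printf("%.2f\n", ...)`, that is 2.55. Every value occurs once, so `modes` returns all five values.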
Math/Stats: help me analyze a data set and determine the values that created it
[Q & A] (Ask MetaFilter)
Mathematics / Statistics Filter: I have some pairs of numbers that are the result of a process. Given just that data set, and a rule that relates them, can you determine the integer values that could have resulted in those sets?
Apologies for the phrasing of the FPP -- I know it doesn't make much sense. Hopefully some mathematics / statistics types will click through and see this longer version.
I have some sets of numbers, shown below. I'm trying to reverse engineer the numbers that could have resulted in these sets, based on some known mathematical relationships between them.
In general, a given Device(n) consumes a resource in integer Quantities at a floating point Rate(n), resulting in a total Cost for that consumption run. What I have is Device/Cost pairs for several different Devices, with several runs each, and I'm trying to determine the floating point Rate. The Rate is constant for a given Device. The Quantity consumed is different for each run, but the key point is that I know the Quantity values are integers.
So for a given Device run, we have:
Quantity(int) x Rate(float) = Cost(float)
All I have is Cost data for each Device, but I have multiple sets of these and am hoping there's some sort of numeric analysis that can tell me the likely Quantity values that fit.
Here's a sample of the data:
Device / Cost
Device1 / 1235
Device1 / 988
Device1 / 1003
Device1 / 1526
Device2 / 3652
Device2 / 1207
Device2 / 1729
Device2 / 518
Device3 / 745
Device3 / 2115
Device3 / 1415
Device3 / 334
So, for example, using the Device1 / 988 and Device1 / 1003 pair, I could eyeball it and see that the Cost difference of 15 is due to a difference of 1 unit of Quantity between the runs. Thus the first run consumed 66 x 14.97 ≈ 988 and the second run consumed 67 x 14.97 ≈ 1003. (Alas, the Rate values should be more in the 30-50 range, so 14.97 doesn't make much sense.) But I'm hoping that with a larger population of data, there's some analysis I can do that will give a more confident answer.
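One simple way to attack this is a brute-force grid search. This is offered as a sketch, not as a method anyone in the thread proposed. For each candidate Rate in the expected 30-50 range, measure how far every Cost/Rate lies from a whole number, and keep the Rate with the smallest total squared distance. The function names are hypothetical. The problem has a built-in ambiguity: if a Rate fits, so do Rate/2, Rate/3 and so on, which is why restricting the search range matters. Because the listed Costs look rounded to whole units, an exact zero error should not be expected on real data.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Sum of squared distances between each Cost/Rate and the nearest positive
// integer Quantity. 0 means every cost is an exact multiple of `rate`.
double integrality_error(const std::vector<double>& costs, double rate) {
    double err = 0.0;
    for (double c : costs) {
        double q = c / rate;
        double nearest = std::max(1.0, std::round(q));  // Quantity must be >= 1
        double d = q - nearest;
        err += d * d;
    }
    return err;
}

// Grid search over [lo, hi] in steps of `step`; returns the lowest-error rate.
double best_rate(const std::vector<double>& costs, double lo, double hi, double step) {
    assert(!costs.empty() && lo > 0.0 && hi >= lo && step > 0.0);
    double best = lo;
    double best_err = integrality_error(costs, lo);
    const long n = static_cast<long>(std::floor((hi - lo) / step));
    for (long i = 1; i <= n; ++i) {
        double r = lo + i * step;  // computed from i to avoid accumulated drift
        double e = integrality_error(costs, r);
        if (e < best_err) {
            best_err = e;
            best = r;
        }
    }
    return best;
}
```

With several runs per Device, comparing the best error with the runner-up gives only a rough sense of confidence. A run that contributes most of the error at the chosen Rate is a candidate outlier.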
Perhaps this can even be solved without ensuring that the Quantity values are integers, but it's a constraint that the data is supposed to have so I thought I'd mention it.
I'll monitor this thread for the next couple hours to answer any questions. And will add better tags! I used the science category because this seems like the kind of math that a lab scientist might be familiar with: analyzing a data set to work out the conditions that created it. I'm especially hoping for a statistical analysis that produces some sort of confidence measure, because a couple of these data points might be outliers, screwing up what might otherwise be a closed solution.
Note: this is not homework filter, or even do-my-job filter. It's just something I'm trying to reverse engineer. -
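What Is the Load Average?
[Corporate Blogs, Enterprise, RIA (Rich Internet Apps)] (Sun Bloggers)
Introduction
This time I'd like to introduce the implementation of the load average, which has traditionally been used on UNIX as an indicator of system load. The load average has been described as a figure that aggregates the number of running and runnable (waiting-to-run) threads, and older versions of Solaris implemented it that way. From Solaris 10 onward, however, the internal implementation changed: instead of counting threads, it adds up CPU processing time and the time threads spend waiting to run. (Only the calculation method changed; the resulting load average values did not.) Let's look at exactly how it changed.
Where the load average is used
First, let's look at where the load average appears. On Solaris, it is shown by commands such as prstat, uptime and w. The load average is expressed as three numbers, which roughly reflect the average system load over the last 1, 5 and 15 minutes respectively.
load averages in prstat
The three numbers following "load averages:" on the bottom line show, from left to right, the 1-, 5- and 15-minute load averages.
% prstat -n 3
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 10277 root     8056K 3944K sleep   59    0  10:06:12 0.1% named/5
 25401 dh129379 3432K 2944K cpu1    49    0   0:00:00 0.1% prstat/1
 25348 root     8672K 2624K sleep   59    0   0:00:02 0.0% sshd/1
Total: 109 processes, 308 lwps, load averages: 0.02, 0.02, 0.01
load average in uptime
The simplest way to check the load average is probably the uptime command. It is read the same way as the prstat output.
% uptime
 11:41am  up 218 day(s),  1:07,  17 users,  load average: 0.02, 0.02, 0.01
load average in w
The w command also displays the load average.
% w
  3:31pm  up 221 day(s),  4:56,  16 users,  load average: 0.02, 0.02, 0.02
User     tty           login@  idle   JCPU   PCPU  what
...
How to read the load average
If a single-threaded program keeps one CPU fully busy, the load average gradually approaches 1. If two threads keep two CPUs busy, the load average eventually reaches 2. Likewise, on a machine with only one CPU, if one thread occupies the CPU while another thread keeps waiting in the run queue, the load average also becomes 2. In other words, the load average reflects the demand for CPU resources, in units where 1 equals the amount of work one CPU can process.
In general, a load average greater than the number of CPUs is considered high load. Up to Solaris 9, the load average was calculated from the number of kernel threads in the scheduler queues; the rationale is that if the queues hold more threads than there are CPUs, some threads must wait, so the system can be considered heavily loaded. In Solaris 10 and later terms, the load is high when the time used by running threads plus the time runnable threads spend waiting is large.
As the name says, the load average is averaged load information, so you cannot extract short-term load peaks or the exact load at a particular moment from it. On Solaris, the mpstat and vmstat commands give more precise system load information. As described later, you can also use DTrace to obtain the data from which the load average is derived.
Programming interfaces for obtaining the load average
Instead of using commands such as uptime, you can also obtain load average information directly from a program written in C or another language.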
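Here is a minimal sketch of the CPU-count rule of thumb, assuming a POSIX-like system. It reads the 1-minute value with `getloadavg()` (declared in `<sys/loadavg.h>` on Solaris and in `<stdlib.h>` on glibc and the BSDs; not part of ISO C++). It reads the online CPU count with `sysconf(_SC_NPROCESSORS_ONLN)`, a widely supported but non-standard name. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>    // getloadavg() on glibc/BSD (an extension, not ISO C++)
#include <unistd.h>   // sysconf(), _SC_NPROCESSORS_ONLN
#ifdef __sun
#include <sys/loadavg.h>  // getloadavg() on Solaris
#endif

// Rule of thumb: a load above the CPU count means some work is waiting.
bool is_overloaded(double load, long ncpus) {
    assert(ncpus > 0);
    return load > static_cast<double>(ncpus);
}

// 1-minute load divided by the number of online CPUs,
// or a negative value if either query fails.
double one_minute_load_per_cpu() {
    double loads[3];
    if (getloadavg(loads, 3) < 1) return -1.0;
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncpus < 1) return -1.0;
    return loads[0] / static_cast<double>(ncpus);
}
```

A per-CPU value above 1.0 corresponds to the "load greater than the number of CPUs" situation described above. As the next paragraph notes, this is still an average and will not reveal short spikes.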
getloadavg(3C)
libc provides the getloadavg(3C) function. As described below, getloadavg(3C) invokes a system call.
A program that dumps the load average
#include <stdio.h>
#include <sys/loadavg.h>	/* getloadavg(3C), LOADAVG_NSTATS (Solaris) */

int
main(void)
{
	double loadavg[LOADAVG_NSTATS];

	if (getloadavg(loadavg, LOADAVG_NSTATS) == -1) {
		perror("getloadavg");
		return (1);
	}
	printf(" 1MIN : %f\n 5MIN : %f\n 15MIN : %f\n\n",
	    loadavg[0], loadavg[1], loadavg[2]);
	return (0);
}
Running it looks like this:
# ./loadavg
 1MIN : 1.968750
 5MIN : 1.089844
 15MIN : 0.460938

 1MIN : 1.968750
 5MIN : 1.093750
 15MIN : 0.460938
...
pset_getloadavg(3C)
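The load average for each processor set is obtained with the pset_getloadavg(3C) function.
Implementation of the load average
Let's follow the implementation to see how the load average value is calculated, starting from the getloadavg(3C) function used above.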
loadavg.h
The loadavg.h header, which you include when using getloadavg(3C), defines three macros: LOADAVG_1MIN, LOADAVG_5MIN and LOADAVG_15MIN. This shows that there are three load average values: 1-minute, 5-minute and 15-minute.
#define LOADAVG_1MIN	0
#define LOADAVG_5MIN	1
#define LOADAVG_15MIN	2
#define LOADAVG_NSTATS	3
getloadavg(3C)
続いて getloadavg(3C) の実装を見てみます。getloadavg(3C) の コード は 20 行ほどの簡単な物です。データ格納用のバッファを用意して __getloadavg() を呼び出し、その結果を浮動小数点数に直し、呼び出し元に返しています。なお FSCALE は (1 << 8) == 256 です。元の数値が下位 8bit 小数部の固定小数点数である為、256 で割って浮動小数点数にしています。
36 /* 37 * getloadavg -- get the time averaged run queues from the system 38 */ 39 int 40 getloadavg(double loadavg[], int nelem) 41 { 42 extern int __getloadavg(int *buf, int nelem); 43 44 int i, buf[LOADAVG_NSTATS]; 45 46 if (nelem > LOADAVG_NSTATS) 47 nelem = LOADAVG_NSTATS; 48 49 if ((nelem = __getloadavg(buf, nelem)) == -1) 50 return (-1); 51 52 for (i = 0; i < nelem; i++) 53 loadavg[i] = (double)buf[i] / FSCALE; 54 55 return (nelem); 56 }__getloadavg()
getloadavg(3C) から呼び出されている __getloadavg() の実装は ここ にあります。__getloadavg をラップしている SYSCALL2_RVAL1 が展開されると __getloadagv() になります。
34 SYSCALL2_RVAL1(__getloadavg,getloadavg) 35 RET 36 SET_SIZE(__getloadavg)SYSCALL2_RVAL1
The SYSCALL2_RVAL1 macro that wraps __getloadavg is defined as follows. When it is expanded, line 192 becomes ENTRY(__getloadavg), which creates the entry point of __getloadavg(). The body of __getloadavg() comes from SYSTRAP_RVAL1.
```c
191 #define SYSCALL2_RVAL1(entryname, trapname) \
192 	ENTRY(entryname); \
193 	SYSTRAP_RVAL1(trapname); \
194 	SYSCERROR
```

SYSTRAP_RVAL1
SYSTRAP_RVAL1 is defined as follows. It simply passes its argument unchanged to __SYSTRAP.
```c
#define SYSTRAP_RVAL1(name)  __SYSTRAP(name)
```

__SYSTRAP
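__SYSTRAP is defined as follows. Here SYS_/**/name expands to SYS_getloadavg (the empty comment pastes the two tokens together under traditional preprocessing). The instruction mov SYS_getloadavg, %g1 loads the system call number into register %g1, and ta SYSCALL_TRAPNUM traps into the kernel to make the system call (http://docs.sun.com/app/docs/doc/816-1681/6m83631kj).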
```
#define __SYSTRAP(name) \
	/* CSTYLED */ \
	mov	SYS_/**/name, %g1; \
	ta	SYSCALL_TRAPNUM
```

SYS_getloadavg
SYS_getloadavg is defined as follows: system call number 105 is SYS_getloadavg. This confirms that the __SYSTRAP macro above issues system call number 105.
```c
#define SYS_getloadavg  105
```

The system call
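The kernel function that runs when the getloadavg() system call is invoked is shown below. Essentially, it copies the contents of avenrun[] out to the caller. The exception is a process in a non-global zone while resource pools are enabled: that process receives the load average of its zone's processor set instead.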
```c
/*
 * Extract elements of the raw avenrun array from the kernel for the
 * implementation of getloadavg(3c)
 */
int
getloadavg(int *buf, int nelem)
{
	int *loadbuf = &avenrun[0];
	int loadavg[LOADAVG_NSTATS];
	int error;

	if (nelem < 0)
		return (set_errno(EINVAL));
	if (nelem > LOADAVG_NSTATS)
		nelem = LOADAVG_NSTATS;

	if (!INGLOBALZONE(curproc)) {
		mutex_enter(&cpu_lock);
		if (pool_pset_enabled()) {
			psetid_t psetid = zone_pset_get(curproc->p_zone);

			error = cpupart_get_loadavg(psetid, &loadavg[0], nelem);
			ASSERT(error == 0);	/* pset isn't going anywhere */
			loadbuf = &loadavg[0];
		}
		mutex_exit(&cpu_lock);
	}

	error = copyout(loadbuf, buf, nelem * sizeof (avenrun[0]));
	if (error)
		return (set_errno(EFAULT));
	return (nelem);
}
```

avenrun[]
The avenrun[] array read by the getloadavg() system call is defined in clock.c. It is an array of int, and its comment describes the contents as FSCALED (fixed-point) average run queue lengths.
```c
int avenrun[3];	/* FSCALED average run queue lengths */
```

In clock.c, the values in avenrun[] are computed by shifting the values of hp_avenrun[]. hp_avenrun[] holds a high-precision version of the load average and is updated by calcloadavg(). calcloadavg() computes the new hp_avenrun[] from the result of genloadavg() and the current value of hp_avenrun[]. genloadavg() in turn works from the loadavg variable, which holds recent load information. The data therefore flows as follows: loadavg variable -> genloadavg() -> calcloadavg() <-> hp_avenrun[] -> avenrun[]. The corresponding code in clock.c is:
```c
calcloadavg(genloadavg(&loadavg), hp_avenrun);
for (i = 0; i < 3; i++)
	/*
	 * At the moment avenrun[] can only hold 31
	 * bits of load average as it is a signed
	 * int in the API. We need to ensure that
	 * hp_avenrun[i] >> (16 - FSHIFT) will not be
	 * too large. If it is, we put the largest value
	 * that we can use into avenrun[i]. This is
	 * kludgey, but about all we can do until we
	 * avenrun[] is declared as an array of uint64[]
	 */
	if (hp_avenrun[i] < ((uint64_t)1<<(31+16-FSHIFT)))
		avenrun[i] = (int32_t)(hp_avenrun[i] >>
		    (16 - FSHIFT));
	else
		avenrun[i] = 0x7fffffff;
```

Other kernel code sees avenrun through this declaration in <sys/systm.h>:
```c
extern int avenrun[];	/* array of load averages */
```

clock.c
clock.c contains the clock routine, which is called at a fixed interval. Three functions in clock.c compute the load average: genloadavg(), loadavg_update() and calcloadavg().
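The last step of the clock.c excerpt above converts the 16.16 fixed-point hp_avenrun[] into avenrun[], which keeps 8 fractional bits. That step can be isolated as a minimal sketch. The sketch assumes FSHIFT is 8, which is consistent with FSCALE == 1 << 8 == 256. The function name hp_to_avenrun is hypothetical. The test values are the hp_avenrun[]/avenrun[] pairs from the loadavg.d dump in the supplementary section (1172 -> 4, 1113 -> 4, 691 -> 2).

```c
#include <stdint.h>

#define FSHIFT 8  /* assumed: FSCALE == (1 << FSHIFT) == 256 */

/*
 * Convert one hp_avenrun[] entry (16 fractional bits) to the value
 * stored in avenrun[] (FSHIFT fractional bits). Like clock.c, clamp to
 * the largest positive int32_t when the shifted value would not fit.
 */
int32_t hp_to_avenrun(uint64_t hp)
{
    if (hp < ((uint64_t)1 << (31 + 16 - FSHIFT)))
        return (int32_t)(hp >> (16 - FSHIFT));
    return 0x7fffffff;
}
```

For example, a load of exactly 1.0 is 65536 in hp_avenrun[] and 256 in avenrun[].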
The loadavg variable
loadavg is a variable of type struct loadavg_s, which is defined in <sys/cpuvar.h>. The lg_loads[] member of loadavg_s is a ring buffer of size S_LOADAVG_SZ, which is 11. lg_cur is the current position in the ring buffer. Because there are 11 elements, lg_cur ranges from 0 to 10. lg_len is the number of ring buffer elements that hold data. loadavg_update() overwrites one element of lg_loads[] every second and also updates lg_cur.
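The ring-buffer bookkeeping described here can be sketched on its own. The helpers below reproduce the expressions that loadavg_update() and genloadavg() use for lg_cur and lg_len, with S_LOADAVG_SZ == 11. The names ring_prev, ring_advance and ring_len_after are hypothetical and are not kernel functions.

```c
#define S_LOADAVG_SZ 11  /* ring buffer size in <sys/cpuvar.h> */

/* Index of the slot written one update before `cur` (wraps 0 -> 10). */
int ring_prev(int cur)
{
    return (cur - 1) >= 0 ? cur - 1 : S_LOADAVG_SZ + (cur - 1);
}

/* lg_cur after one update: moves forward and wraps 10 -> 0. */
int ring_advance(int cur)
{
    return (cur + 1) % S_LOADAVG_SZ;
}

/* lg_len after one update: counts filled slots, saturating at 11. */
unsigned int ring_len_after(unsigned int len)
{
    return (len + 1) < S_LOADAVG_SZ ? len + 1 : S_LOADAVG_SZ;
}
```

The loadavg.d dump in the supplementary section illustrates this. With lg_cur at 7, the most recent total sits in slot ring_prev(7) == 6.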
```c
#define S_LOADAVG_SZ  11
#define S_MOVAVG_SZ   10

struct loadavg_s {
	int lg_cur;		/* current loadavg entry */
	unsigned int lg_len;	/* number entries recorded */
	hrtime_t lg_total;	/* used to temporarily hold load totals */
	hrtime_t lg_loads[S_LOADAVG_SZ];	/* table of recorded entries */
};
```

The loadavg_update() function
loadavg_update() is called once per second from the clock() routine and updates the loadavg variable. It computes the load from cpu_acct[CMS_USER], cpu_acct[CMS_SYSTEM] and cpu_waitrq. cpu_acct[CMS_USER] is the CPU time consumed in user space, and cpu_acct[CMS_SYSTEM] is the CPU time consumed in the kernel. Together they give the time spent running. cpu_waitrq is the time threads spent waiting (wait) on each CPU's run queue (rq). In other words, instead of the traditional formula of running threads + runnable threads, Solaris uses running time + run-queue wait time.
These three values are the only data the load average is computed from. If you collect cpu_acct[CMS_USER], cpu_acct[CMS_SYSTEM] and cpu_waitrq yourself, for example with DTrace, you can compute the load average on your own.
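Here is a minimal sketch of that idea. Take two snapshots of the per-CPU counters a known interval apart. The load over that interval is the summed growth of user + system + run-queue wait time, divided by the interval. A negative per-CPU delta is clamped to zero, as loadavg_update() does. The struct cpu_sample and the function load_between are hypothetical. The counters are assumed to be already in nanoseconds, which is how loadavg_update() uses them after scalehrtime(). Values read directly from the kernel may be in unscaled units.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of one CPU's counters, in nanoseconds. */
struct cpu_sample {
    int64_t user_ns;    /* cpu_acct[CMS_USER] */
    int64_t system_ns;  /* cpu_acct[CMS_SYSTEM] */
    int64_t waitrq_ns;  /* cpu_waitrq */
};

/*
 * Average number of running-or-waiting threads between two snapshots
 * taken interval_ns apart, before any moving average or decay.
 */
double load_between(const struct cpu_sample *before,
                    const struct cpu_sample *after,
                    size_t ncpu, int64_t interval_ns)
{
    int64_t total = 0;

    for (size_t i = 0; i < ncpu; i++) {
        int64_t delta =
            (after[i].user_ns + after[i].system_ns + after[i].waitrq_ns) -
            (before[i].user_ns + before[i].system_ns + before[i].waitrq_ns);
        if (delta > 0)  /* clamp negative deltas to 0, like the kernel */
            total += delta;
    }
    return (double)total / (double)interval_ns;
}
```

Consider one second with CPU 0 running for 1 s and a thread queued for 0.5 s, while CPU 1 is busy for 1 s. The result is 2.5.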
```c
static void
loadavg_update()
{
	cpu_t *cp;
	cpupart_t *cpupart;
	hrtime_t cpu_total;
	int prev;

	cp = cpu_list;
	loadavg.lg_total = 0;

	/*
	 * first pass totals up per-cpu statistics for system and cpu
	 * partitions
	 */

	do {
		struct loadavg_s *lavg;

		lavg = &cp->cpu_loadavg;

		cpu_total = cp->cpu_acct[CMS_USER] +		/* the key part */
		    cp->cpu_acct[CMS_SYSTEM] + cp->cpu_waitrq;	/* the key part */
		/* compute delta against last total */
		scalehrtime(&cpu_total);
		prev = (lavg->lg_cur - 1) >= 0 ? lavg->lg_cur - 1 :
		    S_LOADAVG_SZ + (lavg->lg_cur - 1);
		if (lavg->lg_loads[prev] <= 0) {
			lavg->lg_loads[lavg->lg_cur] = cpu_total;
			cpu_total = 0;
		} else {
			lavg->lg_loads[lavg->lg_cur] = cpu_total;
			cpu_total = cpu_total - lavg->lg_loads[prev];
			if (cpu_total < 0)
				cpu_total = 0;
		}

		lavg->lg_cur = (lavg->lg_cur + 1) % S_LOADAVG_SZ;
		lavg->lg_len = (lavg->lg_len + 1) < S_LOADAVG_SZ ?
		    lavg->lg_len + 1 : S_LOADAVG_SZ;

		loadavg.lg_total += cpu_total;
		cp->cpu_part->cp_loadavg.lg_total += cpu_total;

	} while ((cp = cp->cpu_next) != cpu_list);

	loadavg.lg_loads[loadavg.lg_cur] = loadavg.lg_total;
	loadavg.lg_cur = (loadavg.lg_cur + 1) % S_LOADAVG_SZ;
	loadavg.lg_len = (loadavg.lg_len + 1) < S_LOADAVG_SZ ?
	    loadavg.lg_len + 1 : S_LOADAVG_SZ;
	/*
	 * Second pass updates counts
	 */
	cpupart = cp_list_head;

	do {
		struct loadavg_s *lavg;

		lavg = &cpupart->cp_loadavg;
		lavg->lg_loads[lavg->lg_cur] = lavg->lg_total;
		lavg->lg_total = 0;
		lavg->lg_cur = (lavg->lg_cur + 1) % S_LOADAVG_SZ;
		lavg->lg_len = (lavg->lg_len + 1) < S_LOADAVG_SZ ?
		    lavg->lg_len + 1 : S_LOADAVG_SZ;

	} while ((cpupart = cpupart->cp_next) != cp_list_head);

}
```

cpu_acct[] and cpu_waitrq are both hrtime_t values that hold accumulated elapsed time. Both are defined in cpuvar.h:
```c
volatile hrtime_t cpu_mstate_start;	/* cpu microstate start time */
volatile hrtime_t cpu_acct[NCMSTATES];	/* cpu microstate data */
hrtime_t cpu_intracct[NCMSTATES];	/* interrupt mstate data */
hrtime_t cpu_waitrq;			/* cpu run-queue wait time */
struct loadavg_s cpu_loadavg;		/* loadavg info for this cpu */
```

The genloadavg() function
The loadavg variable updated by loadavg_update() is consumed by genloadavg(), shown below. genloadavg() averages the lg_loads[] entries of loadavg over the last 10 seconds. It then converts the average from nanoseconds to seconds, scales it by 128 and returns the result. Note that genloadavg() adds no new data to the load average; it uses only the data prepared by loadavg_update().
```c
/*
 * Called before calcloadavg to get 10-sec moving loadavg together
 */

static int
genloadavg(struct loadavg_s *avgs)
{
	int avg;
	int spos; /* starting position */
	int cpos; /* moving current position */
	int i;
	int slen;
	hrtime_t hr_avg;

	/* 10-second snapshot, calculate first positon */
	if (avgs->lg_len == 0) {
		return (0);
	}
	/* take 10 samples if the ring buffer holds 10 or more entries */
	slen = avgs->lg_len < S_MOVAVG_SZ ? avgs->lg_len : S_MOVAVG_SZ;

	/* previous ring buffer position, i.e. the newest entry */
	spos = (avgs->lg_cur - 1) >= 0 ? avgs->lg_cur - 1 :
	    S_LOADAVG_SZ + (avgs->lg_cur - 1);
	for (i = hr_avg = 0; i < slen; i++) {
		cpos = (spos - i) >= 0 ? spos - i : S_LOADAVG_SZ + (spos - i);
		hr_avg += avgs->lg_loads[cpos];	/* sum the latest slen entries */
	}

	hr_avg = hr_avg / slen;	/* divide the sum by the number of entries */
	/* convert nanoseconds to seconds and multiply by 128 */
	avg = hr_avg / (NANOSEC / LGRP_LOADAVG_IN_THREAD_MAX);

	return (avg);
}
```

NANOSEC is 1000000000 and LGRP_LOADAVG_IN_THREAD_MAX is 128:
```c
#define NANOSEC  1000000000
#define LGRP_LOADAVG_IN_THREAD_MAX  128
```

The calcloadavg() function
calcloadavg(), shown below, updates hp_avenrun[] using the load computed by genloadavg(), and applies exponential decay to the load average. As its name suggests, the nrun argument was originally computed from the number of running and runnable threads. Today the return value of genloadavg() is passed instead. hp_avenrun[] is normally passed as hp_ave. hp_avenrun[] is updated only inside calcloadavg() and does not depend on any outside data. calcloadavg() therefore also uses only data prepared by loadavg_update(), by way of genloadavg().
```c
static void
calcloadavg(int nrun, uint64_t *hp_ave)
{
	static int64_t f[3] = { 135, 27, 9 };
	uint_t i;
	int64_t q, r;

	/*
	 * Compute load average over the last 1, 5, and 15 minutes
	 * (60, 300, and 900 seconds).  The constants in f[3] are for
	 * exponential decay:
	 * (1 - exp(-1/60)) << 13 = 135,
	 * (1 - exp(-1/300)) << 13 = 27,
	 * (1 - exp(-1/900)) << 13 = 9.
	 */

	/*
	 * a little hoop-jumping to avoid integer overflow
	 */
	for (i = 0; i < 3; i++) {
		q = (hp_ave[i]  >> 16) << 7;
		r = (hp_ave[i] & 0xffff) << 7;
		hp_ave[i] += ((nrun - q) * f[i] - ((r * f[i]) >> 16)) >> 4;
	}
}
```

Summary
User-land programs obtain the load average through the getloadavg() function, which copies the kernel's avenrun[] variable by way of a system call. avenrun[] is the average of the recent load held in the loadavg variable, with exponential decay applied.
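The exponential decay in calcloadavg() above can be written out in ordinary arithmetic. In this sketch of the derivation, h is one hp_avenrun[] entry in 16.16 fixed point, so the load is L = h / 2^16. n is the thread-equivalent load from genloadavg(), so nrun = 2^7 n. f is approximately 2^13 (1 - e^(-1/T)), with T = 60, 300 or 900 seconds. Integer truncation is ignored.

```latex
\begin{aligned}
q &= 2^{7}\lfloor L \rfloor, \qquad
r = 2^{7}\,(h \bmod 2^{16}) = 2^{23}\,(L - \lfloor L \rfloor) \\
\Delta h &= \frac{(2^{7} n - q)\,f - r f / 2^{16}}{2^{4}}
          = \frac{2^{7} f\,(n - \lfloor L \rfloor) - 2^{7} f\,(L - \lfloor L \rfloor)}{2^{4}}
          = 2^{3} f\,(n - L) \\
         &\approx 2^{16}\,(1 - e^{-1/T})\,(n - L) \\
L' &= L + \frac{\Delta h}{2^{16}} = e^{-1/T} L + (1 - e^{-1/T})\, n
\end{aligned}
```

Each second, the load therefore moves a fraction 1 - e^(-1/T) of the way toward n. Under a constant n it follows L(t) = n + (L(0) - n) e^(-t/T), which covers about 63% of a step change after T seconds.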
loadavg_update() updates the loadavg variable from CPU statistics. The current number of running or runnable threads appears nowhere in the loadavg calculation; it uses CPU processing time and run-queue wait time instead.
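genloadavg()'s 10-second moving average can be reproduced in user space as a minimal sketch. It assumes the ring buffer already holds per-second totals in nanoseconds, as loadavg.lg_loads[] does. It uses the same constants as the kernel code above. The names struct lavg_ring and moving_avg are hypothetical. The result is in units of 1/128 thread, so three threads' worth of running or waiting time every second yields 3 * 128 = 384.

```c
#include <stdint.h>

#define S_LOADAVG_SZ 11
#define S_MOVAVG_SZ 10
#define NANOSEC 1000000000LL
#define LGRP_LOADAVG_IN_THREAD_MAX 128

/* Hypothetical stand-in for struct loadavg_s (lg_total omitted). */
struct lavg_ring {
    int lg_cur;                      /* next slot to be written */
    unsigned int lg_len;             /* number of filled slots */
    int64_t lg_loads[S_LOADAVG_SZ];  /* per-second busy + wait time, ns */
};

/* Average of the newest min(lg_len, 10) entries, in 1/128-thread units. */
int moving_avg(const struct lavg_ring *avgs)
{
    int slen, spos, cpos, i;
    int64_t hr_avg = 0;

    if (avgs->lg_len == 0)
        return 0;
    slen = avgs->lg_len < S_MOVAVG_SZ ? (int)avgs->lg_len : S_MOVAVG_SZ;

    /* The newest entry is the slot just before lg_cur. */
    spos = (avgs->lg_cur - 1) >= 0 ? avgs->lg_cur - 1
                                   : S_LOADAVG_SZ + (avgs->lg_cur - 1);
    for (i = 0; i < slen; i++) {
        cpos = (spos - i) >= 0 ? spos - i : S_LOADAVG_SZ + (spos - i);
        hr_avg += avgs->lg_loads[cpos];
    }
    hr_avg /= slen;
    /* nanoseconds -> seconds, scaled by 128 */
    return (int)(hr_avg / (NANOSEC / LGRP_LOADAVG_IN_THREAD_MAX));
}
```

Once the buffer is full, the slot at lg_cur is the oldest entry and is left out of the average.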
Conclusion
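We have seen that the system-wide load average is computed from CPU statistics. Since Solaris 10, the number of running threads is no longer used to compute the load average. The reason for the change is that deriving system load from CPU accounting data, rather than counting threads, is less prone to sampling error and gives more accurate information. Other operating systems could likewise stop using thread counts for the load average, provided that, as in Solaris, CPU microstate accounting is maintained and always available. Note that the change did not alter the range of values the load average takes, and you read the numbers basically the same way as before. If you simply want to see the number of threads waiting to run, use the r column of vmstat.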
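calcloadavg() is an exponentially weighted moving average, so a steady input of n threads' worth of time drives the value toward n, just as a steady thread count did. The sketch below isolates one iteration of the kernel loop for a single average. decay_step is a hypothetical name, and f is 135, 27 or 9. The tests check two properties. First, one second of one busy thread, starting from zero, raises the 1-minute value by 1080/65536, about 0.0165, close to 1 - e^(-1/60). Second, 900 seconds of a steady load of 2 bring the value to 2 within a small truncation error.

```c
#include <stdint.h>

/*
 * One iteration of calcloadavg() for a single average.
 *   hp:   load in 16.16 fixed point (one hp_avenrun[] entry)
 *   nrun: load from genloadavg(), in units of 1/128 thread
 *   f:    decay constant: 135 (1 min), 27 (5 min) or 9 (15 min)
 */
uint64_t decay_step(uint64_t hp, int nrun, int64_t f)
{
    int64_t q = (int64_t)(hp >> 16) << 7;     /* integer part, x128 */
    int64_t r = (int64_t)(hp & 0xffff) << 7;  /* fractional part, x128 */

    /*
     * Same expression as the kernel. When the load is falling the
     * increment is negative; like the kernel, this relies on an
     * arithmetic right shift, and unsigned wrap-around then
     * subtracts it from hp.
     */
    hp += ((nrun - q) * f - ((r * f) >> 16)) >> 4;
    return hp;
}
```

The simulator in the supplementary section prints the whole curve for a chosen decay constant and thread count.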
Supplementary information
Programs for checking the behavior
A DTrace script that dumps all the variables related to the load average
Save the following script as loadavg.d. DTrace really is handy.
```d
#!/usr/sbin/dtrace -qs

fbt:genunix:loadavg_update:return
{
	printf("time : %d\n", timestamp / 1000000000);
	printf("\n");
	printf(" avenrun[0] : %d\n", `avenrun[0]);
	printf(" avenrun[1] : %d\n", `avenrun[1]);
	printf(" avenrun[2] : %d\n", `avenrun[2]);
	printf("\n");
	printf(" hp_avenrun[0] : %d\n", `hp_avenrun[0]);
	printf(" hp_avenrun[1] : %d\n", `hp_avenrun[1]);
	printf(" hp_avenrun[2] : %d\n", `hp_avenrun[2]);
	printf("\n");
	printf(" lg_cur : %d\n", `loadavg.lg_cur);
	printf(" lg_len : %d\n", `loadavg.lg_len);
	printf(" lg_total : %d\n", `loadavg.lg_total);
	printf(" lg_loads[0] : %d\n", `loadavg.lg_loads[0]);
	printf(" lg_loads[1] : %d\n", `loadavg.lg_loads[1]);
	printf(" lg_loads[2] : %d\n", `loadavg.lg_loads[2]);
	printf(" lg_loads[3] : %d\n", `loadavg.lg_loads[3]);
	printf(" lg_loads[4] : %d\n", `loadavg.lg_loads[4]);
	printf(" lg_loads[5] : %d\n", `loadavg.lg_loads[5]);
	printf(" lg_loads[6] : %d\n", `loadavg.lg_loads[6]);
	printf(" lg_loads[7] : %d\n", `loadavg.lg_loads[7]);
	printf(" lg_loads[8] : %d\n", `loadavg.lg_loads[8]);
	printf(" lg_loads[9] : %d\n", `loadavg.lg_loads[9]);
	printf(" lg_loads[10] : %d\n", `loadavg.lg_loads[10]);
	printf("\n");
	printf("\n");
}
```

Usage
```
# ./loadavg.d
time : 1208341

 avenrun[0] : 4
 avenrun[1] : 4
 avenrun[2] : 2

 hp_avenrun[0] : 1172
 hp_avenrun[1] : 1113
 hp_avenrun[2] : 691

 lg_cur : 7
 lg_len : 11
 lg_total : 334519769
 lg_loads[0] : 46801985
 lg_loads[1] : 60175047
 lg_loads[2] : 60416778
 lg_loads[3] : 60746759
 lg_loads[4] : 64462319
 lg_loads[5] : 34230600
 lg_loads[6] : 334519769
 lg_loads[7] : 2024692
 lg_loads[8] : 1697770
 lg_loads[9] : 6010137
 lg_loads[10] : 1915265
...
```

A shell script that dumps cpu_acct[] and cpu_waitrq
Save the following shell script as cpu_acct.sh. It also uses DTrace.
```sh
#!/bin/sh
# Build one tick-1sec clause per CPU id listed by psrinfo.
# /usr/ucb/echo (BSD echo) leaves the \t and \n escapes alone, so they
# reach DTrace's printf() unexpanded.
# Inside `...`, \` yields a literal backquote, so D sees `cpu[ncpu],
# the kernel's cpu[] array, rather than D's built-in cpu variable.
dscript=`psrinfo | awk '{print $1}' | while read i
do
/usr/ucb/echo ' tick-1sec { ncpu='$i'; printf("%d\t\t%u\t%u\t%u\t%u\n", ncpu,
\`cpu[ncpu]->cpu_acct[0],
\`cpu[ncpu]->cpu_acct[1],
\`cpu[ncpu]->cpu_acct[2],
\`cpu[ncpu]->cpu_waitrq) }'
done`
header='tick-1sec{printf("\nCPU\t\tUSER\t\tSYSTEM\t\tIDLE\t\t\tWaitRunQueue\n")}'
dtrace -qn "${header} ${dscript}"
```

Usage
```
# ./cpu_acct.sh

CPU  USER            SYSTEM           IDLE              WaitRunQueue
0    393058730424    3505065319348    6225195707147196  1928374528
1    675458639209    37806347146860   6190196714476028  23251910602
2    295121491846    636804044408     6227742143570932  214512326509
3    134573272271    163389092029     6228370750871859  52309235982
4    114391873741    4868500694500    6223678918587263  4457218207
5    71058590513     31310646602      6228554702708875  4141334771
6    464954088771    142789925351     6227999559480501  8135320823
7    398981333884    97614493289      6228154535173466  8623139580
^C
```
A load average simulator
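Set decay and n in the following code to suitable values to simulate how the load average evolves over time. decay is the constant used for the decay step and takes one of three values: 135 for the 1-minute load average, 27 for the 5-minute one and 9 for the 15-minute one. n is the number of running plus runnable threads. The program works as-is once compiled. If recompiling for every change is tedious, adapt it as you see fit, for example by taking decay and n as command-line arguments.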
```c
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	int64_t decay = 135;	/* exponential decay: 135 = 1 min, 27 = 5 min, 9 = 15 min */
	int64_t n = 2;		/* number of running + runnable threads */
	int64_t q, r;
	uint64_t lavg = 0;	/* 16.16 fixed point, like hp_avenrun[] */
	int i;

	n *= (1 << 7);		/* LGRP_LOADAVG_IN_THREAD_MAX scaling, as in genloadavg() */
	for (i = 0; i < 1500; i++) {	/* i: elapsed seconds (60 = 1 min, 300 = 5 min, 900 = 15 min) */
		/* same update as calcloadavg() */
		q = (lavg >> 16) << 7;
		r = (lavg & 0xffff) << 7;
		lavg += ((n - q) * decay - ((r * decay) >> 16)) >> 4;
		printf("%d:\t%f\n", i, (double)lavg / (1 << 16));
	}
	return (0);
}
```

Data related to the load average
They are not the load average itself, but Solaris provides several pieces of data related to it.
avenrun in kstat
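The avenrun_* statistics expose the kernel data from which the load average is derived; dividing a value by 256 gives the load average. The pset entries are per-processor-set load averages, and the system_misc entries are the system-wide load averages.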
```
% kstat -p | grep avenrun
unix:0:pset:avenrun_15min         3
unix:0:pset:avenrun_1min          5
unix:0:pset:avenrun_5min          5
unix:0:system_misc:avenrun_15min  3
unix:0:system_misc:avenrun_1min   5
unix:0:system_misc:avenrun_5min   5
```
The division by 256 corresponds to FSCALE in getloadavg(). It converts a fixed-point number with an 8-bit fractional part into a floating-point number.
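Here is a minimal sketch of that conversion, assuming FSCALE == 256 as in getloadavg(3C). The name fscale_to_double is hypothetical. Applied to the kstat output above, avenrun_1min = 5 corresponds to a load of 5/256 = 0.01953125, and avenrun_15min = 3 corresponds to 0.01171875. Likewise, the 1.968750 printed by the loadavg program earlier is the raw value 504.

```c
#define FSCALE (1 << 8)  /* 8 fractional bits, as in getloadavg(3C) */

/* Convert a raw avenrun[] or kstat avenrun_* value to a load average. */
double fscale_to_double(int raw)
{
    return (double)raw / FSCALE;
}
```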
The run queue in vmstat
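The r column under kthr in vmstat output shows how many kernel threads are waiting on the run queue. It is not the same thing as the load average, but when you assess load in terms of thread scheduling, it is the more direct indicator.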
```
% vmstat 1
 kthr      memory              page             disk        faults      cpu
 r b w    swap    free re  mf  pi po fr de sr 1m 1m 1m lf  in  sy  cs us sy  id
 0 0 0 7687208 6172792  9  10 349  7  7  0  0  7  7 14  2 377 517 366  2  2  97
 0 0 0 7314560 5881512  0  18 170  0  0  0  0  0  0  0  0 289 363 249  0  0 100
 0 0 0 7314560 5881576  2   2   0  0  0  0  0  0  0  0  0 233 119 152  0  1  99
 0 0 0 7314560 5881576  2   2   0  0  0  0  0  0  0  0  0 246 169 176  0  0 100
```
The getloadavg(3C) manual page
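The Solaris 10 getloadavg(3C) manual page says the following. Only the calculation method changed, not the values the load average takes, so the description is not wrong as it stands. Still, it may help to read the value as running time and run-queue wait time converted into an equivalent number of running and runnable threads.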
```
DESCRIPTION
     The getloadavg() function returns the number of processes in
     the system run queue averaged over various periods of time.
```

References
- http://docs.sun.com/app/docs/doc/817-0547/eyhuu
- http://perfcap.blogspot.com/2007/04/load-average-differences-between.html
- http://www.princeton.edu/~unix/Solaris/troubleshoot/cpuload.html
- http://www.runningunix.com/2009/01/what-is-load-average-in-solaris/
- http://members3.jcom.home.ne.jp/katsumi.nuruki/
- http://www.teamquest.com/resources/gunther/display/5/
Armadillo C++ Library 0.8.0
[Open Source] (Open Source Pixels)Armadillo is a C++ linear algebra library (matrix maths) aiming towards a good balance between speed and ease of use. Integer, floating point, and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matrix decompositions are provided through optional integration with LAPACK and ATLAS libraries. A delayed evaluation approach, ...
Armadillo is a C++ linear algebra library (matrix maths) aiming towards a good balance between speed and ease of use. Integer, floating point, and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matrix decompositions are provided through optional integration with LAPACK and ATLAS libraries. A delayed evaluation approach, based on template meta-programming, is used (during compile time) to combine several operations into one and reduce or eliminate the need for temporaries.
Changes: This release added several new functions such as pseudo-inverse. A class for on-the-fly statistics of vectors was also added. There are improvements and bugfixes in handling of submatrix views as well as speedups for some compound expressions.
Release Tags: major enhancements, Major bugfixes
Tags: Scientific/Engineering, Mathematics, Software Development, Libraries, machine learning
Licenses: LGPL
The Table protobuf message format
[Programming] (Planet MySQL)If you’ve ever opened up drizzled/message/table.proto in the Drizzle source tree you will have seen what’s in the table message: the structure that describes a database table in Drizzle. Previously I’ve talked about the Table message more generally, giving a fair bit of history of the FRM file and how we’ve replaced it with both the Table protobuf message and an infrastructure inside Drizzle so that Storage Engines own their own metadata. Yesterday I talked about the Sche ...
If you’ve ever opened up drizzled/message/table.proto in the Drizzle source tree you will have seen what’s in the table message: the structure that describes a database table in Drizzle. Previously I’ve talked about the Table message more generally, giving a fair bit of history of the FRM file and how we’ve replaced it with both the Table protobuf message and an infrastructure inside Drizzle so that Storage Engines own their own metadata. Yesterday I talked about the Schema protobuf message format in more detail, and this time I’m covering the Table protobuf message in a similar amount of detail. The first time we were loading (then only part of) the table definition out of a protobuf message was way back in January 2009 (I blogged about it too). It was an adventure untangling all sorts of things to get to a much nicer place (where we are now). The code in the server is not perfect… I’ll be the first to admit that some of it is rather strange, but that’s mostly all behind the scenes for people interested in the protobuf Table message! The Table message has several embedded messages in it too. We need to have information on the Storage Engine, Fields and Indexes (and each of those can have other properties). It is much more complex than the simple Schema message.
Let’s have a look at the basic structure of the Table message:

```protobuf
message Table {
  /* *SNIP* (Here goes the definitions for TableType, StorageEngine, Field,
     Index, ForeignKeyConstraint, TableOptions and TableStats) */

  required string name = 1;
  required TableType type = 5;
  required StorageEngine engine = 2;
  repeated Field field = 3;
  repeated Index indexes = 4;
  repeated ForeignKeyConstraint fk_constraint = 8;
  optional TableOptions options = 9;
  optional TableStats stats = 10;
}
```

(We’ve skipped the definitions for the embedded messages for now.) This all seems pretty logical: a table has a name, a type, is in a Storage Engine, has Fields, may have Indexes, may have foreign key constraints, and has some options and statistics (the statistics may go away at some point “soon”).

Let’s have a look at the TableType message definition:

```protobuf
enum TableType {
  STANDARD = 0;
  TEMPORARY = 1;
  INTERNAL = 2;
}
```

It’s pretty simple: the table type is either a standard table (what you get from CREATE TABLE), a temporary table (what you get from CREATE TEMPORARY TABLE) or an INTERNAL table (what you get when Drizzle uses a temporary table during query execution).

Next, the StorageEngine message:

```protobuf
message StorageEngine {
  message EngineOption {
    enum EngineOptionType {
      BOOL = 0;
      INTEGER = 1;
      STRING = 2;
    }

    required string option_name = 1;
    required string option_value = 2;
    required EngineOptionType option_type = 3;
  }

  required string name = 1;
  repeated EngineOption option = 2;
}
```

The main part is the “name” member, which is just the name of the storage engine (e.g. “PBXT”, “INNODB”, “ARCHIVE”). We do however have support specified in the StorageEngine message for engine specific options (in key value form). Expect these to be used more in the near future. Specifying Fields is probably the most complex part of the table message.
The Field message looks like this (with many embedded messages):

```protobuf
message Field {
  required string name = 1;
  required FieldType type = 2;
  optional FieldFormatType format = 3;
  optional FieldOptions options = 4;
  optional FieldConstraints constraints = 5;
  optional NumericFieldOptions numeric_options = 6;
  optional StringFieldOptions string_options = 7;

  optional string comment = 16; /* Reserve 0-15 for frequently accessed attributes */
  optional SetFieldOptions set_options = 17;
  optional TimestampFieldOptions timestamp_options = 18;
}
```

So… what does this all mean? Well, Fields have a type, they’re stored in a format, there are options attached to them, and there may be constraints as well as field type specific options.

The different field types should be fairly familiar by now:

```protobuf
enum FieldType {
  DOUBLE = 0;
  VARCHAR = 1;
  BLOB = 2;
  ENUM = 3;
  INTEGER = 4;
  BIGINT = 5;
  DECIMAL = 6;
  DATE = 7;
  TIME = 8;
  TIMESTAMP = 9;
  DATETIME = 10;
}
```

We also allow fields in different formats. Currently, these are default, fixed and dynamic. The idea is you can tell the engine (or the engine can tell you) how it’s storing the field. This is currently here as a nicety and the users for this are few and far between.

```protobuf
enum FieldFormatType {
  DefaultFormat= 0;
  FixedFormat= 1;
  DynamicFormat= 2;
}
```

The FieldOptions get interesting though:

```protobuf
message FieldOptions {
  optional string default_value = 1;
  optional string update_value = 2;
  optional bool default_null = 3 [default = false];
  optional bytes default_bin_value = 4;
}
```

You’ll no doubt be intrigued by the existence of both “default_value” and “default_bin_value”. Ordinarily, using a string to contain a textual representation of the default value (e.g. “foo” or “42”) is fine. However, BLOB columns can have defaults that aren’t representable in a text string, so you need binary data (e.g. when the default value contains bytes that are not valid text). For TIMESTAMP columns, we continue to support DEFAULT NOW() and the ability to update the timestamp column on UPDATE.
How is this represented in the table message? Well… default_value will be “NOW()” and update_value will be “NOW()”. It is intended that in the future it will be possible to have arbitrary SQL expressions for these. This does, of course, require support in the Drizzle server. The default_null bool should be rather obvious :) Well… that’s enough for today. Next time: more of the Field message!
An early look at Derby 10.6
[Corporate Blogs, Enterprise, RIA (Rich Internet Apps)] (Sun Bloggers)The Derby community is currently working on the next feature release, Derby 10.6, and a number of new features have already been implemented. If you haven't already done so, now would be a good time to download the development sources and kick the tires on your favourite new feature! Early feedback to the community (either on the mailing lists or in the JIRA bug tracker) increases the likelihood of getting improvements and bug fixes implemented before the final release. If you don't feel like ...
The Derby community is currently working on the next feature release, Derby 10.6, and a number of new features have already been implemented. If you haven't already done so, now would be a good time to download the development sources and kick the tires on your favourite new feature! Early feedback to the community (either on the mailing lists or in the JIRA bug tracker) increases the likelihood of getting improvements and bug fixes implemented before the final release.
If you don't feel like building jar files from the development sources yourself, you can download the binaries used in the nightly regression testing from http://dbtg.foundry.sun.com/derby/bits/trunk/. Make sure that you understand all the warnings about the difference between a production build and a development build before you start using it!
The release page on the wiki lists some of the features that are planned for 10.6. Let's take a quick look at some of those that are mostly done and ready for testing.
Store Java objects in columns
DERBY-651 tracks the ongoing work that enables storing of Java objects directly in the columns of a table. The objects must belong to a class that implements the java.io.Serializable interface.

For example, if you want to store java.util.List objects in a table, you first need to declare a user-defined type that maps to the Java type, and create a column with that type:

```java
stmt.execute("CREATE TYPE JAVA_UTIL_LIST " +
             "EXTERNAL NAME 'java.util.List' " +
             "LANGUAGE JAVA");
stmt.execute("CREATE TABLE T(C1 JAVA_UTIL_LIST)");
```

Next, you prepare a statement and use setObject() to insert the Java object that you want to store:

```java
PreparedStatement ps = conn.prepareStatement(
    "INSERT INTO T(C1) VALUES (?)");
ArrayList lst = new ArrayList();
lst.add("First element");
lst.add("Second element");
ps.setObject(1, lst);
ps.execute();
```

Finally, you execute a SELECT query and use getObject() to restore the object from the database:

```java
ResultSet rs = stmt.executeQuery("SELECT C1 FROM T");
while (rs.next()) {
    List list = (List) rs.getObject(1);
    System.out.println("Size of list: " + list.size());
}
```

You can also use your object types as arguments to user-defined functions or procedures:

```java
stmt.execute("CREATE FUNCTION LIST_SIZE(LST JAVA_UTIL_LIST) " +
             "RETURNS INTEGER " +
             "EXTERNAL NAME 'MyListFunctions.listSize' " +
             "LANGUAGE JAVA PARAMETER STYLE JAVA");
ps = conn.prepareStatement("values list_size(?)");
ps.setObject(1, lst);
```

Multi-level grouping
Derby's GROUP BY syntax has been extended with the ROLLUP keyword, which allows for multi-level grouping. There's a fine write-up about the feature on this wiki page. The example below shows how you can ask for the total amount of sales of each product in each state, the total amount of sales in each state, the total amount of sales in each region, and the amount of sales in all regions, in one single query:
```
ij> SELECT REGION, STATE, PRODUCT, SUM(SALES) FROM SALES_HISTORY
    GROUP BY ROLLUP(REGION, STATE, PRODUCT)
    ORDER BY REGION, STATE, PRODUCT;
REGION    |STA&|PRODUCT   |4
--------------------------------------
East      |MA  |Boats     |10      <-- all boat sales in Massachusetts
East      |MA  |Cars      |190
East      |MA  |NULL      |200     <-- sum of all sales in Massachusetts
East      |NY  |Boats     |570
East      |NY  |Cars      |10
East      |NY  |NULL      |580
East      |NULL|NULL      |780     <-- sum of all sales in region East
West      |AZ  |Boats     |40
West      |AZ  |Cars      |300
West      |AZ  |NULL      |340
West      |CA  |Boats     |570
West      |CA  |Cars      |750
West      |CA  |NULL      |1320
West      |NULL|NULL      |1660
NULL      |NULL|NULL      |2440    <-- sum of all sales in all regions

15 rows selected
```

Richer JOIN syntax
Derby already supports INNER JOIN, LEFT OUTER JOIN and RIGHT OUTER JOIN. In 10.6, the syntax will be enhanced with support for
- the CROSS JOIN operation
- named column joins
- sub-queries in ON clauses
CROSS JOIN is the simplest of the join operations, and SELECT * FROM T1 CROSS JOIN T2 is just another way of writing SELECT * FROM T1, T2. It may also be combined with other join operations:
```
ij> SELECT * FROM T1 CROSS JOIN T2 LEFT OUTER JOIN T3 ON T1.A = T3.B;
A          |B          |A          |B          |A          |B
-----------------------------------------------------------------------
5          |2          |2          |7          |NULL       |NULL

1 row selected
```
A named column join may save you some typing when the columns in the join key have the same name in the two tables being joined. In the example below, where two tables COUNTRIES and CITIES are joined, you no longer need to write the full join condition ON COUNTRIES.COUNTRY = CITIES.COUNTRY. Instead, you just say that the tables should be joined on the COUNTRY column with a USING clause:
```
ij> SELECT COUNTRY, COUNT(CITY_ID) FROM COUNTRIES JOIN CITIES USING (COUNTRY)
    GROUP BY COUNTRY;
COUNTRY                   |2
--------------------------------------
Afghanistan               |1
Argentina                 |1
Australia                 |2
. . .
United Kingdom            |1
United States             |37
Venezuela                 |1

45 rows selected
```

Explain plan
The XPLAIN functionality makes it possible to access run-time statistics with SQL queries. See this section in the development version of Tuning Derby for details about how to enable it and use it. Here's one of the more advanced examples from the manual, which shows how to get all queries that performed a table scan on the COUNTRIES table, and the number of pages and rows visited in the scan:
```
ij> SELECT ST.STMT_TEXT, SP.NO_VISITED_PAGES AS PAGES, SP.NO_VISITED_ROWS AS "ROWS"
    FROM STATS.SYSXPLAIN_SCAN_PROPS SP,
         STATS.SYSXPLAIN_RESULTSETS RS,
         STATS.SYSXPLAIN_STATEMENTS ST
    WHERE ST.STMT_ID = RS.STMT_ID AND
          RS.SCAN_RS_ID = SP.SCAN_RS_ID AND
          RS.OP_IDENTIFIER = 'TABLESCAN' AND
          SP.SCAN_OBJECT_NAME = 'COUNTRIES';
STMT_TEXT               |PAGES      |ROWS
------------------------------------------------------
SELECT * FROM COUNTRIES |2          |114

1 row selected
```

ORDER BY in sub-queries
Up till now, Derby has only accepted ORDER BY clauses in top-level SELECT queries. The ongoing work on DERBY-4397 will allow ORDER BY clauses in sub-queries as well. This is going to resolve the very old request for ordering of inserts (DERBY-4). There's also a request for OFFSET/FETCH in sub-queries (DERBY-4398), which would be very useful in combination with ORDER BY.
Dropping of in-memory databases
Derby 10.5 added support for in-memory databases, but there was no convenient API for deleting the in-memory databases and reclaiming the memory without taking down the Java process. In Derby 10.6, in-memory databases can be destroyed with an API similar to the one used for creating and shutting down databases. Simply add drop=true to the JDBC URL.

```
ij> connect 'jdbc:derby:memory:mydb;create=true';
ij> create table t (x int);
0 rows inserted/updated/deleted
ij> connect 'jdbc:derby:memory:mydb;drop=true';
ERROR 08006: Database 'memory:mydb' dropped.
```
Very nice if you're running JUnit tests and want to clear the database and free up the memory between each test case.
SHOW FUNCTIONS
Derby's interactive tool for running scripts or queries against a database, ij, lacked a command to show the functions that are stored in the database. Derby 10.6 adds SHOW FUNCTIONS to ij's list of SHOW commands, so now you can issue the following command to view all the functions in the APP schema:
```
ij> SHOW FUNCTIONS IN APP;
FUNCTION_SCHEM|FUNCTION_NAME                |REMARKS
----------------------------------------------------------
APP           |MY_FUNC                      |MyFunctions.f

1 row selected
```
Restricted table functions
In Derby 10.6 it will be possible to push predicates into table functions and process the restrictions before the function returns the results to the SQL engine. This can speed up execution of some table function queries significantly. Take this example:
SELECT * FROM TABLE ( MY_FUNCTION() ) AS T WHERE X = 5
If MY_FUNCTION() returns thousands of rows, and only a handful of them match the restriction X = 5, a lot of work is wasted generating and scanning rows that are just thrown away. With restricted table functions, Derby's SQL engine will pass information about the restriction down to the table function, and the table function may use this information to produce and return only the rows that are actually needed.
Details about how to use restricted table functions can be found in the functional spec attached to DERBY-4357.
ROW_NUMBER improvements
Previous releases of Derby only allowed the ROW_NUMBER function to appear in the select list of a query, and it could not be used to build up more complex expressions. Now these limitations have been removed, and you can write queries like this one without getting syntax errors:
SELECT X / ROW_NUMBER() OVER () FROM T ORDER BY ROW_NUMBER() OVER () DESC
© 2010 Something Simpler Systems Inc.
Produced with the financial participation of Telefilm Canada
