Manifesto of the motivations & processes of script
Explain in detail the basic algorithm to be employed by this script, and also point out its inherent flaws.
This commit is contained in:
		
							parent
							
								
									9ed0147eed
								
							
						
					
					
						commit
						26104f38d9
					
				
					 1 changed files with 72 additions and 0 deletions
				
			
		
							
								
								
									
										72
									
								
								extract-code-added-in-commits.plx
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
										72
									
								
								extract-code-added-in-commits.plx
									
										
									
									
									
										Normal file
									
								
							|  | @ -0,0 +1,72 @@ | ||||||
|  | #!/usr/bin/perl -w | ||||||
|  | # extract-code-added-in-commits.plx                                            -*- Perl -*- | ||||||
|  | # | ||||||
|  | # Copyright (C) 2016 Bradley M. Kuhn <bkuhn@ebb.org> | ||||||
|  | # | ||||||
|  | # This software's license gives you freedom; you can copy, convey, | ||||||
|  | # propogate, redistribute and/or modify this program under the terms of | ||||||
|  | # the GNU  General Public License (GPL) as published by the Free | ||||||
|  | # Software Foundation (FSF), either version 3 of the License, or (at your | ||||||
|  | # option) any later version of the GPL published by the FSF. | ||||||
|  | # | ||||||
|  | # This program is distributed in the hope that it will be useful, but | ||||||
|  | # WITHOUT ANY WARRANTY; without even the implied warranty of | ||||||
|  | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU | ||||||
|  | # General Public License for more details. | ||||||
|  | # | ||||||
|  | # You should have received a copy of the GNU General Public License | ||||||
|  | # along with this program in a file in the toplevel directory called | ||||||
|  | # "GPLv3".  If not, see <http://www.gnu.org/licenses/>. | ||||||
|  | # | ||||||
|  | # Motivation for this script: | ||||||
|  | 
 | ||||||
|  | #  This script takes as standard input a list of commit ids.  This is called | ||||||
|  | #  the "whitelisted commits" for the process. | ||||||
|  | 
 | ||||||
|  | #  The output is a series of directories for each COMMIT_ID (all placed under | ||||||
|  | #  the directory specified in $ARGV[1]).  Under each COMMIT_ID directory, | ||||||
|  | #  there is a redacted copy of the files specifically changed or added by the | ||||||
|  | #  operations perfomed in COMMIT_ID.  The redcated copy will contain only | ||||||
|  | #  lines that were added or changed in that file by any operation in the | ||||||
|  | #  "whitelisted commits". | ||||||
|  | 
 | ||||||
|  | #  Motivation for this process: | ||||||
|  | 
 | ||||||
|  | #   The idea is to create a corpus of code that we know received | ||||||
|  | #   contributions from the whitelisted commits.  Note that across the various | ||||||
|  | #   COMMIT_ID directories, there will be substantial duplication.  However, | ||||||
|  | #   the full corpus requires building but in some cases, where code has been | ||||||
|  | #   rewritten. | ||||||
|  | 
 | ||||||
|  | 
 | ||||||
|  | # Clear Flaw in this process: | ||||||
|  | 
 | ||||||
|  | # In "Estimating the Total Cost of a Linux Distribution", found at | ||||||
|  | # https://www.linuxfoundation.org/sites/main/files/publications/estimatinglinux.html, | ||||||
|  | # McPherson, Proffitt, and Hale-Evans write: | ||||||
|  | 
 | ||||||
|  | #   The biggest weakness in SLOC analysis is its focus on net additions to | ||||||
|  | #   software projects. Anyone who is familiar with kernel development, for | ||||||
|  | #   instance, realizes that the highest man-power cost in its development is | ||||||
|  | #   when code is deleted and modified. The amount of effort that goes into | ||||||
|  | #   deleting and changing code, not just adding to it, is not reflected in | ||||||
|  | #   the values associated with this estimate. Because in a collaborative | ||||||
|  | #   development model, code is developed and then changed and deleted, the | ||||||
|  | #   true value is far greater than the existing code base. Just think about | ||||||
|  | #   the process: when a few lines of code are added to the kernel, for | ||||||
|  | #   instance, many more have to be modified to be compatible with that | ||||||
|  | #   change. The work that goes into understanding the dependencies and | ||||||
|  | #   outcomes and then changing that code is not well represented in this | ||||||
|  | #   study. | ||||||
|  | 
 | ||||||
|  | # Therefore, this process, which ignores lines that are *deleted*, thus | ||||||
|  | # streamlining and improving code, ignore a fundamental tenant of software | ||||||
|  | # development: that making code smaller, more expressive, and more concise | ||||||
|  | # yeilds better designed software.  While the process herein *can* produce a | ||||||
|  | # clear list of code whose known introduction is directly attributable to the | ||||||
|  | # whitelisted commits, the analysis produced by this process does not do | ||||||
|  | # justice to the full weight of the contributions made in those whitelisted | ||||||
|  | # commits, since removed code is outright ignored in this process. | ||||||
|  | 
 | ||||||
|  | # In other words, this process measures only quantity of code written and | ||||||
|  | # fails to examine the quality of the code. | ||||||
		Loading…
	
	Add table
		
		Reference in a new issue
	
	 Bradley M. Kuhn
						Bradley M. Kuhn