Utility Function: Singularize()

Well, after doing pluralize(), it should be no surprise that I also wanted to do singularize(). There are actually more rules for this one, even though I tried harder to make the rules smarter rather than more numerous. In the end, the code is identical to pluralize(), save for the list of patterns to match. Again, these patterns are based on the singularization regular expressions I found on ThinkSharp.org, with my own modifications. The code and my test results follow.
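Since only the pattern list differs between pluralize() and singularize(), the matching loop can be written once and fed either rule list. Below is a minimal Java sketch of that idea. It calls java.util.regex directly (the post's PatternNew()/PatternFind()/PatternReplace() helpers are described in the comments as wrappers around Java's regex engine), and the class name InflectionRules and the two trimmed rule lists are hypothetical, not the post's full rule sets.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: one shared engine, two ordered rule lists.
public class InflectionRules {
    // One rule: a regex to look for and the replacement to apply ($1 = first group).
    record Rule(String regex, String replacement) {}

    // Shared engine: the first rule whose pattern is found wins, so list order is priority.
    static String apply(String word, List<Rule> rules) {
        for (Rule rule : rules) {
            Matcher m = Pattern.compile(rule.regex()).matcher(word);
            if (m.find()) {
                return m.replaceAll(rule.replacement());
            }
        }
        return word; // no rule matched: leave the word alone
    }

    // Heavily trimmed, illustrative rule lists; only the data differs between directions.
    static final List<Rule> PLURAL = List.of(
        new Rule("(quiz)$", "$1zes"),
        new Rule("$", "s"));
    static final List<Rule> SINGULAR = List.of(
        new Rule("(quiz)zes$", "$1"),
        new Rule("s$", ""));

    public static void main(String[] args) {
        System.out.println(apply("quiz", PLURAL));      // quizzes
        System.out.println(apply("quizzes", SINGULAR)); // quiz
        System.out.println(apply("cats", SINGULAR));    // cat
    }
}
```

Keeping the rules in an ordered list is what makes priority work: a specific rule such as (quiz)zes$ has to be tried before the catch-all s$, or "quizzes" would come back as "quizze".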


<cffunction name="Singularize" output="false" returntype="string">
    <cfargument name="item" type="string" required="true" />
    
    <cfset var local = StructNew() />
    
    <!--- Things that are singular or plural, or not countable --->
    <cfset local.uncountable = "sheep,fish,series,species,money,rice,information,equipment" />
        
    <!--- Words that do not follow the normal rules (singular = plural) --->
    <cfset local.irregular = {
        move   = "moves",
        sex    = "sexes",
        child  = "children",
        person = "people"
    } />

    
    <!--- Singularization rules, array to keep priority --->
    <cfset local.singularizations = ArrayNew(2) />
            
    <cfset local.singularizations[1][1] = "(quiz)zes$" />
    <cfset local.singularizations[1][2] = "$1" />
    
    <cfset local.singularizations[2][1] = "(matr)ices$" />
    <cfset local.singularizations[2][2] = "$1ix" />
    
    <cfset local.singularizations[3][1] = "(vert|ind)ices$" />
    <cfset local.singularizations[3][2] = "$1ex" />
    
    <cfset local.singularizations[4][1] = "^(ox)en$" />
    <cfset local.singularizations[4][2] = "$1" />
    
    <cfset local.singularizations[5][1] = "(alias|status)$" />
    <cfset local.singularizations[5][2] = "$1" />
    
    <cfset local.singularizations[6][1] = "(alias|status)es$" />
    <cfset local.singularizations[6][2] = "$1" />
    
    <cfset local.singularizations[7][1] = "(octop|vir)(i|us)$" />
    <cfset local.singularizations[7][2] = "$1us" />
    
    <cfset local.singularizations[8][1] = "(cris|ax|test)es$" />
    <cfset local.singularizations[8][2] = "$1is" />
    
    <cfset local.singularizations[9][1] = "(shoe)s$" />
    <cfset local.singularizations[9][2] = "$1" />
    
    <cfset local.singularizations[10][1] = "(o|bus)es$" />
    <cfset local.singularizations[10][2] = "$1" />
    
    <!--- [ml] is a character class; a "|" inside the brackets would match a literal pipe --->
    <cfset local.singularizations[11][1] = "([ml])ice$" />
    <cfset local.singularizations[11][2] = "$1ouse" />
    
    <!--- Escape: words already ending in -us or -is are singular, so return them unchanged --->
    <cfset local.singularizations[12][1] = "([a-zA-Z]+)?(us|is|sus|sis)$" />
    <cfset local.singularizations[12][2] = "$1$2" />
    
    <cfset local.singularizations[13][1] = "([a-zA-Z]+)?ses$" />
    <cfset local.singularizations[13][2] = "$1sis" />
    
    <cfset local.singularizations[14][1] = "(x|ch|ss|sh)es$" />
    <cfset local.singularizations[14][2] = "$1" />
    
    <cfset local.singularizations[15][1] = "(m)ovies$" />
    <cfset local.singularizations[15][2] = "$1ovie" />
    
    <cfset local.singularizations[16][1] = "(s)eries$" />
    <cfset local.singularizations[16][2] = "$1eries" />
    
    <cfset local.singularizations[17][1] = "([a-zA-Z]+)?xies$" />
    <cfset local.singularizations[17][2] = "$1xi" />
    
    <cfset local.singularizations[18][1] = "([^aeiouy]|qu)ies$" />
    <cfset local.singularizations[18][2] = "$1y" />
    
    <cfset local.singularizations[19][1] = "([lr])ves$" />
    <cfset local.singularizations[19][2] = "$1f" />
    
    <cfset local.singularizations[20][1] = "(tive)s$" />
    <cfset local.singularizations[20][2] = "$1" />
    
    <cfset local.singularizations[21][1] = "(hive)s$" />
    <cfset local.singularizations[21][2] = "$1" />
    
    <cfset local.singularizations[22][1] = "([^f])ves$" />
    <cfset local.singularizations[22][2] = "$1fe" />
    
    <!--- $1 already holds the whole stem; appending $2 would double the "a" in "analyses" --->
    <cfset local.singularizations[23][1] = "((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$" />
    <cfset local.singularizations[23][2] = "$1sis" />
    
    <cfset local.singularizations[24][1] = "([ti])a$" />
    <cfset local.singularizations[24][2] = "$1um" />
    
    <cfset local.singularizations[25][1] = "(n)ews$" />
    <cfset local.singularizations[25][2] = "$1ews" />
    
    <cfset local.singularizations[26][1] = "([a-zA-Z]+)?men$" />
    <cfset local.singularizations[26][2] = "$1man" />
    
    <!--- Catch-all: strip a trailing "s" --->
    <cfset local.singularizations[27][1] = "s$" />
    <cfset local.singularizations[27][2] = "" />
    
    <!--- Check if the item is in the uncountable list --->
    <cfif ListFindNoCase(local.uncountable, arguments.item)>
        <!--- If it is, set it as the return value --->
        <cfset local.returnValue = arguments.item />
    </cfif>
    
    <!--- Check if this value is in the irregular struct --->
    <cfif NOT StructKeyExists(local, "returnValue")>
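        <!--- Loop over each irregular item (keys are singular, values are plural) --->
        <cfloop collection="#local.irregular#" item="local.word">
            <cfif arguments.item eq local.word OR arguments.item eq local.irregular[local.word]>
                <!--- Return the singular key; implicit struct keys may be stored upper-case, so lower-case it --->
                <cfset local.returnValue = LCase(local.word) />
                <cfbreak />
            </cfif>
        </cfloop>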
    </cfif>
    
    <!--- Test the singularization rules --->
    <cfif NOT StructKeyExists(local, "returnValue")>
        <!--- Loop over each rule; the first one that matches wins --->
        <cfloop from="1" to="#ArrayLen(local.singularizations)#" index="local.x">
            <!--- Get a new pattern for this rule --->
            <cfset local.pattern = PatternNew(local.singularizations[local.x][1]) />
            
            <!--- See if the pattern matches --->
            <cfif PatternFind(local.pattern, arguments.item)>
                <cfset local.returnValue = PatternReplace(local.pattern, arguments.item, local.singularizations[local.x][2]) />
                <cfbreak />
            </cfif>
        </cfloop>
    </cfif>
    
    <cfif NOT StructKeyExists(local, "returnValue")>
        <cfset local.returnValue = arguments.item />
    </cfif>
    
    <cfreturn local.returnValue />
</cffunction>
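For readers outside ColdFusion, here is a rough Java port of the same three-stage lookup: the uncountable list, then the irregular words, then the ordered rules with the first match winning. It is a sketch under assumptions: it calls java.util.regex directly in place of PatternNew()/PatternFind()/PatternReplace(), the class name Singularizer is hypothetical, and the rule list follows the function above with the character class written as ([ml]), rule 4 anchored as ^(ox)en$, and rule 23's replacement as $1sis. The irregular stage returns the singular form whether it was given the singular or the plural.

```java
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java port of Singularize(); the CF helpers are replaced by java.util.regex calls.
public class Singularizer {
    // Same in both forms; checked case-insensitively, like ListFindNoCase().
    static final Set<String> UNCOUNTABLE = Set.of(
        "sheep", "fish", "series", "species", "money", "rice", "information", "equipment");

    // Singular -> plural, as in the post's irregular struct.
    static final Map<String, String> IRREGULAR = Map.of(
        "move", "moves", "sex", "sexes", "child", "children", "person", "people");

    // {pattern, replacement} pairs in priority order: the first pattern found wins.
    static final String[][] RULES = {
        {"(quiz)zes$", "$1"},
        {"(matr)ices$", "$1ix"},
        {"(vert|ind)ices$", "$1ex"},
        {"^(ox)en$", "$1"},
        {"(alias|status)$", "$1"},
        {"(alias|status)es$", "$1"},
        {"(octop|vir)(i|us)$", "$1us"},
        {"(cris|ax|test)es$", "$1is"},
        {"(shoe)s$", "$1"},
        {"(o|bus)es$", "$1"},
        {"([ml])ice$", "$1ouse"},
        {"([a-zA-Z]+)?(us|is|sus|sis)$", "$1$2"},
        {"([a-zA-Z]+)?ses$", "$1sis"},
        {"(x|ch|ss|sh)es$", "$1"},
        {"(m)ovies$", "$1ovie"},
        {"(s)eries$", "$1eries"},
        {"([a-zA-Z]+)?xies$", "$1xi"},
        {"([^aeiouy]|qu)ies$", "$1y"},
        {"([lr])ves$", "$1f"},
        {"(tive)s$", "$1"},
        {"(hive)s$", "$1"},
        {"([^f])ves$", "$1fe"},
        {"((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$", "$1sis"},
        {"([ti])a$", "$1um"},
        {"(n)ews$", "$1ews"},
        {"([a-zA-Z]+)?men$", "$1man"},
        {"s$", ""},
    };

    static String singularize(String item) {
        // Stage 1: uncountable words come back unchanged.
        if (UNCOUNTABLE.contains(item.toLowerCase())) {
            return item;
        }
        // Stage 2: irregular words map to their singular key.
        for (Map.Entry<String, String> e : IRREGULAR.entrySet()) {
            if (item.equalsIgnoreCase(e.getKey()) || item.equalsIgnoreCase(e.getValue())) {
                return e.getKey();
            }
        }
        // Stage 3: ordered regex rules; an unmatched optional group expands to "".
        for (String[] rule : RULES) {
            Matcher m = Pattern.compile(rule[0]).matcher(item);
            if (m.find()) {
                return m.replaceAll(rule[1]);
            }
        }
        return item;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"matrices", "mice", "theses", "taxies", "women", "children"}) {
            System.out.println(w + " -> " + singularize(w));
        }
    }
}
```

Like the CF version, this compiles each pattern on every call; caching the compiled Pattern objects once would be the obvious optimization if the function runs in a loop.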

Example Results

| Word | Singularize(Word) | Pluralize(Singularize(Word)) eq Word |
|---|---|---|
| matrices | matrix | True |
| complexes | complex | True |
| dicta | dictum | True |
| quizzes | quiz | True |
| oxen | ox | True |
| mice | mouse | True |
| indices | index | True |
| benches | bench | True |
| lilies | lily | True |
| dwarves | dwarf | True |
| theses | thesis | True |
| atria | atrium | True |
| tomatoes | tomato | True |
| buses | bus | True |
| aliases | alias | True |
| viri | virus | True |
| axes | axis | True |
| census | census | True |
| taxies | taxi | True |
| cats | cat | True |
| women | woman | True |
| men | man | True |

| Word | Singularize(Word) | Singularize(Word) eq Word |
|---|---|---|
| matrix | matrix | True |
| complex | complex | True |
| dictum | dictum | True |
| quiz | quiz | True |
| ox | ox | True |
| mouse | mouse | True |
| index | index | True |
| bench | bench | True |
| lily | lily | True |
| dwarf | dwarf | True |
| thesis | thesis | True |
| atrium | atrium | True |
| tomato | tomato | True |
| bus | bus | True |
| alias | alias | True |
| virus | virus | True |
| axis | axis | True |
| census | census | True |
| taxi | taxi | True |
| cat | cat | True |
| woman | woman | True |
| man | man | True |
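Several rows of the second table (census, thesis, axis, bus) pass only because of the "Escape" rule, which maps words already ending in -us or -is to themselves before the catch-all s$ rule can strip their final s. The small Java sketch below (hypothetical class EscapeRuleDemo, with java.util.regex standing in for the post's helpers) runs the same first-match loop with and without that rule.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: the escape rule protects words that are already singular.
public class EscapeRuleDemo {
    record Rule(String regex, String replacement) {}

    // Returns the result of the first rule whose pattern is found, or the word unchanged.
    static String firstMatch(String word, List<Rule> rules) {
        for (Rule rule : rules) {
            Matcher m = Pattern.compile(rule.regex()).matcher(word);
            if (m.find()) {
                return m.replaceAll(rule.replacement());
            }
        }
        return word;
    }

    // Catch-all rule from the end of the post's list.
    static final Rule STRIP_S = new Rule("s$", "");
    // Escape rule from the post: -us / -is words map to themselves.
    static final Rule ESCAPE = new Rule("([a-zA-Z]+)?(us|is|sus|sis)$", "$1$2");

    static final List<Rule> WITHOUT_ESCAPE = List.of(STRIP_S);
    static final List<Rule> WITH_ESCAPE = List.of(ESCAPE, STRIP_S);

    public static void main(String[] args) {
        for (String w : new String[] {"census", "axis", "cats"}) {
            System.out.println(w + ": " + firstMatch(w, WITHOUT_ESCAPE)
                + " without escape, " + firstMatch(w, WITH_ESCAPE) + " with escape");
        }
    }
}
```

Without the escape rule, "census" becomes "censu" and "axis" becomes "axi"; with it, both survive while ordinary plurals such as "cats" still lose their s.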

 

Comments

Peter: What are the functions PatternNew() and PatternReplace()?
Jon Hartmann: @Peter - You can find PatternNew() in this post: http://www.jonhartmann.com/index.cfm/2008/12/12/Ut...

And you can find PatternReplace() in this post: http://www.jonhartmann.com/index.cfm/2008/12/14/Ut...

Basically, they serve as wrappers for Java's regex engine, which has some features that ColdFusion's built-in regular expressions lack.
saurav: Great job done here!
Mic: Hi, what about PatternFind?
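The helper posts linked above are only referenced here, so the following is a guess at roughly what PatternNew(), PatternFind() and PatternReplace() do, based on the description of them as wrappers for Java's regex engine. The method names mirror the CF helpers and the argument order matches how Singularize() calls them, but the bodies (and the class name PatternHelpers) are assumptions, not the code from those posts.

```java
import java.util.regex.Pattern;

// Hypothetical Java equivalents of the CF helpers used by Singularize().
public class PatternHelpers {
    // Assumed counterpart of PatternNew(): compile the regex once.
    static Pattern patternNew(String regex) {
        return Pattern.compile(regex);
    }

    // Assumed counterpart of PatternFind(): true if the pattern occurs anywhere in the text.
    static boolean patternFind(Pattern pattern, String text) {
        return pattern.matcher(text).find();
    }

    // Assumed counterpart of PatternReplace(): replace every match; $1, $2 refer to groups.
    static String patternReplace(Pattern pattern, String text, String replacement) {
        return pattern.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        Pattern p = patternNew("(matr)ices$");
        if (patternFind(p, "matrices")) {
            System.out.println(patternReplace(p, "matrices", "$1ix")); // matrix
        }
    }
}
```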
Jon Hartmann, July 2011

I'm Jon Hartmann and I'm a Javascript fanatic, UX/UI evangelist and former ColdFusion master. I blog about mysterious error messages, user interface design questions, and all things baffling and irksome about programming for the web.
