Utility Function: Singularize()
- December 16, 2008 5:19 PM
- ColdFusion, Utility Function
Well, after doing pluralize(), it should be no surprise that I also wanted to do singularize(). There are actually more rules for this one, even though I tried harder to make the rules smarter rather than more numerous. In the end, the code is nearly identical to pluralize(), save for the different list of patterns to match. Again, these patterns are based on the singularization regular expressions I found on ThinkSharp.org, with my own modifications. The code and the test results follow.
<cffunction name="Singularize" output="false" returntype="string">
<cfargument name="item" type="string" required="true" />
<cfset var local = StructNew() />
<!--- Words whose singular and plural forms are the same, or that are uncountable --->
<cfset local.uncountable = "sheep,fish,series,species,money,rice,information,equipment" />
<!--- Irregular words that do not follow the normal rules: singular = "plural" --->
<cfset local.irregular = {
move = "moves",
sex = "sexes",
child = "children",
person = "people"
} />
<!--- Singularization rules, array to keep priority --->
<cfset local.singularizations = ArrayNew(2) />
<cfset local.singularizations[1][1] = "(quiz)zes$" />
<cfset local.singularizations[1][2] = "$1" />
<cfset local.singularizations[2][1] = "(matr)ices$" />
<cfset local.singularizations[2][2] = "$1ix" />
<cfset local.singularizations[3][1] = "(vert|ind)ices$" />
<cfset local.singularizations[3][2] = "$1ex" />
<cfset local.singularizations[4][1] = "^(ox)en" />
<cfset local.singularizations[4][2] = "$1" />
<cfset local.singularizations[5][1] = "(alias|status)$" />
<cfset local.singularizations[5][2] = "$1" />
<cfset local.singularizations[6][1] = "(alias|status)es$" />
<cfset local.singularizations[6][2] = "$1" />
<cfset local.singularizations[7][1] = "(octop|vir)(i|us)$" />
<cfset local.singularizations[7][2] = "$1us" />
<cfset local.singularizations[8][1] = "(cris|ax|test)es$" />
<cfset local.singularizations[8][2] = "$1is" />
<cfset local.singularizations[9][1] = "(shoe)s$" />
<cfset local.singularizations[9][2] = "$1" />
<cfset local.singularizations[10][1] = "(o|bus)es$" />
<cfset local.singularizations[10][2] = "$1" />
<cfset local.singularizations[11][1] = "([ml])ice$" />
<cfset local.singularizations[11][2] = "$1ouse" />
<!--- Escape hatch: words ending in us/is/sus/sis are treated as already singular and returned unchanged --->
<cfset local.singularizations[12][1] = "([a-zA-Z]+)?(us|is|sus|sis)$" />
<cfset local.singularizations[12][2] = "$1$2" />
<cfset local.singularizations[13][1] = "([a-zA-Z]+)?ses$" />
<cfset local.singularizations[13][2] = "$1sis" />
<cfset local.singularizations[14][1] = "(x|ch|ss|sh)es$" />
<cfset local.singularizations[14][2] = "$1" />
<cfset local.singularizations[15][1] = "(m)ovies$" />
<cfset local.singularizations[15][2] = "$1ovie" />
<cfset local.singularizations[16][1] = "(s)eries$" />
<cfset local.singularizations[16][2] = "$1eries" />
<cfset local.singularizations[17][1] = "([a-zA-Z]+)?xies$" />
<cfset local.singularizations[17][2] = "$1xi" />
<cfset local.singularizations[18][1] = "([^aeiouy]|qu)ies$" />
<cfset local.singularizations[18][2] = "$1y" />
<cfset local.singularizations[19][1] = "([lr])ves$" />
<cfset local.singularizations[19][2] = "$1f" />
<cfset local.singularizations[20][1] = "(tive)s$" />
<cfset local.singularizations[20][2] = "$1" />
<cfset local.singularizations[21][1] = "(hive)s$" />
<cfset local.singularizations[21][2] = "$1" />
<cfset local.singularizations[22][1] = "([^f])ves$" />
<cfset local.singularizations[22][2] = "$1fe" />
<cfset local.singularizations[23][1] = "((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$" />
<cfset local.singularizations[23][2] = "$1$2sis" />
<cfset local.singularizations[24][1] = "([ti])a$" />
<cfset local.singularizations[24][2] = "$1um" />
<cfset local.singularizations[25][1] = "(n)ews$" />
<cfset local.singularizations[25][2] = "$1ews" />
<cfset local.singularizations[26][1] = "([a-zA-Z]+)?men$" />
<cfset local.singularizations[26][2] = "$1man" />
<cfset local.singularizations[27][1] = "s$" />
<cfset local.singularizations[27][2] = "" />
<!--- Check if the item is in the uncountable list --->
<cfif ListFindNoCase(local.uncountable, arguments.item)>
<!--- If it is, set it as the return value --->
<cfset local.returnValue = arguments.item />
</cfif>
<!--- Check if this value is in the irregular struct --->
<cfif NOT StructKeyExists(local, "returnValue")>
<!--- Loop over each irregular item; the struct keys are the singular forms --->
<cfloop collection="#local.irregular#" item="local.word">
<cfif arguments.item eq local.word OR arguments.item eq local.irregular[local.word]>
<!--- Return the singular form (the key), lower-cased because CF may store struct keys in upper case --->
<cfset local.returnValue = LCase(local.word) />
<cfbreak />
</cfif>
</cfloop>
</cfif>
<!--- Test for singularization rules --->
<cfif NOT StructKeyExists(local, "returnValue")>
<!--- Loop over each rule in priority order; the first match wins --->
<cfloop from="1" to="#ArrayLen(local.singularizations)#" index="local.x">
<!--- Get a new pattern for this rule --->
<cfset local.pattern = PatternNew(local.singularizations[local.x][1]) />
<!--- See if the pattern matches --->
<cfif PatternFind(local.pattern, arguments.item)>
<cfset local.returnValue = PatternReplace(local.pattern, arguments.item, local.singularizations[local.x][2]) />
<cfbreak />
</cfif>
</cfloop>
</cfif>
<cfif NOT StructKeyExists(local, "returnValue")>
<cfset local.returnValue = arguments.item />
</cfif>
<cfreturn local.returnValue />
</cffunction>
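The function above tries its rules in array order and stops at the first one that matches, which is why specific patterns such as "(o|bus)es$" sit ahead of the catch-all "s$". As a rough illustration of that first-match-wins loop, here is a minimal sketch in plain Java, using a handful of the rules from the list above. Java is used only because the Pattern* helpers wrap Java's regex engine; the class name `SingularizeSketch` and its method are hypothetical, and this is not the code behind PatternNew(), PatternFind(), or PatternReplace().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a small subset of the rules above, tried in priority order.
class SingularizeSketch {
    // Each row is {pattern, replacement}; row order is the priority order.
    private static final String[][] RULES = {
        {"(quiz)zes$", "$1"},
        {"(alias|status)es$", "$1"},
        {"(o|bus)es$", "$1"},
        {"([ml])ice$", "$1ouse"},
        {"(x|ch|ss|sh)es$", "$1"},
        {"([^aeiouy]|qu)ies$", "$1y"},
        {"s$", ""}  // catch-all, so it has to come last
    };

    static String singularize(String word) {
        for (String[] rule : RULES) {
            Matcher m = Pattern.compile(rule[0]).matcher(word);
            if (m.find()) {
                // First matching rule wins; later rules are never consulted.
                return m.replaceFirst(rule[1]);
            }
        }
        // No rule matched: return the input unchanged.
        return word;
    }

    public static void main(String[] args) {
        System.out.println(singularize("buses"));    // bus
        System.out.println(singularize("tomatoes")); // tomato
        System.out.println(singularize("lilies"));   // lily
        System.out.println(singularize("cats"));     // cat
    }
}
```

If the "s$" row were moved to the top, "buses" would come back as "buse", which is the reason the rules live in an ordered array rather than a struct.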
Example Results
Word | Singularize(Word) | Pluralize(Singularize(Word)) eq Word |
---|---|---|
matrices | matrix | True |
complexes | complex | True |
dicta | dictum | True |
quizzes | quiz | True |
oxen | ox | True |
mice | mouse | True |
indices | index | True |
benches | bench | True |
lilies | lily | True |
dwarves | dwarf | True |
theses | thesis | True |
atria | atrium | True |
tomatoes | tomato | True |
buses | bus | True |
aliases | alias | True |
viri | virus | True |
axes | axis | True |
census | census | True |
taxies | taxi | True |
cats | cat | True |
women | woman | True |
men | man | True |

Input that is already singular comes back unchanged:

Word | Singularize(Word) | Singularize(Word) eq Word |
---|---|---|
matrix | matrix | True |
complex | complex | True |
dictum | dictum | True |
quiz | quiz | True |
ox | ox | True |
mouse | mouse | True |
index | index | True |
bench | bench | True |
lily | lily | True |
dwarf | dwarf | True |
thesis | thesis | True |
atrium | atrium | True |
tomato | tomato | True |
bus | bus | True |
alias | alias | True |
virus | virus | True |
axis | axis | True |
census | census | True |
taxi | taxi | True |
cat | cat | True |
woman | woman | True |
man | man | True |
Comments
And you can find PatternReplace() in this post: http://www.jonhartmann.com/index.cfm/2008/12/14/Ut...
Basically, they serve as wrappers for Java's RegEx engine, which has some additional features over CF's built-in regular expression functions.
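For anyone who has not read that post, the following is a guess at the general shape of such wrappers, written directly against java.util.regex. The names `patternNew`, `patternFind`, and `patternReplace` only mirror the CF function names; the bodies are assumptions, not the code from the linked post. It also shows why the replacement strings in the rule list use "$1": that is Java's backreference syntax, whereas CF's own REReplace() uses "\1".

```java
import java.util.regex.Pattern;

// Hypothetical stand-ins for the CF helpers; the real ones live in the linked post.
class PatternHelpersSketch {
    // Assumed behavior of PatternNew(): compile the regex into a reusable Pattern.
    static Pattern patternNew(String regex) {
        return Pattern.compile(regex);
    }

    // Assumed behavior of PatternFind(): true if the pattern matches anywhere in the input.
    static boolean patternFind(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    // Assumed behavior of PatternReplace(): replace every match, with "$1" meaning group 1.
    static String patternReplace(Pattern pattern, String input, String replacement) {
        return pattern.matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        Pattern p = patternNew("(vert|ind)ices$");
        System.out.println(patternFind(p, "indices"));            // true
        System.out.println(patternReplace(p, "indices", "$1ex")); // index
        System.out.println(patternFind(p, "index"));              // false
    }
}
```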